Prompt Caching
XeroML SDKs cache prompts in memory after the first fetch. After the initial load, prompt retrieval adds no network latency to your application: subsequent fetches are served from the local in-process cache without a round trip to the API.
Default Behavior
- First fetch: A network request is made to the XeroML API to retrieve the prompt
- Subsequent fetches: Served from in-process memory cache
- Default TTL: 300 seconds (5 minutes)
- After TTL expiry: The next fetch returns the stale version immediately and triggers a background network request to refresh the cache (stale-while-revalidate pattern)
This means your application never blocks on a prompt refresh, even after the cache expires.
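The behavior listed above can be sketched as a small in-process cache. This is a minimal illustration of the stale-while-revalidate pattern, not the SDK's actual implementation: the class `StaleWhileRevalidateCache`, its `fetch` callback, the injectable `clock`, and the `last_refresh` attribute are all hypothetical names introduced here.

```python
import threading
import time
from typing import Callable, Dict, Optional, Set, Tuple


class StaleWhileRevalidateCache:
    """Toy in-process cache (hypothetical; not the XeroML SDK's code)."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch          # stands in for the network request
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # name -> (value, fetched_at)
        self._refreshing: Set[str] = set()
        self._lock = threading.Lock()
        self.last_refresh: Optional[threading.Thread] = None

    def get(self, name: str) -> str:
        with self._lock:
            entry = self._entries.get(name)
        if entry is None:
            # First fetch: the caller waits for the request to finish.
            value = self._fetch(name)
            with self._lock:
                self._entries[name] = (value, self._clock())
            return value
        value, fetched_at = entry
        if self._clock() - fetched_at >= self._ttl:
            # Expired: start a refresh, but still return the stale value now.
            self._start_refresh(name)
        return value

    def _start_refresh(self, name: str) -> None:
        with self._lock:
            if name in self._refreshing:
                return  # a refresh for this prompt is already in flight
            self._refreshing.add(name)
        thread = threading.Thread(target=self._refresh, args=(name,), daemon=True)
        self.last_refresh = thread
        thread.start()

    def _refresh(self, name: str) -> None:
        try:
            value = self._fetch(name)
            with self._lock:
                self._entries[name] = (value, self._clock())
        except Exception:
            pass  # keep the stale entry; the next get() will retry
        finally:
            with self._lock:
                self._refreshing.discard(name)
```

In this sketch only the very first `get` for a prompt waits on `fetch`. Every later call returns whatever is cached, and an expired entry is replaced once the background thread finishes.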
Configuring Cache TTL
Set a custom TTL (in seconds) when fetching a prompt:
from xeroml import get_client

xeroml = get_client()

# Cache for 60 seconds
prompt = xeroml.get_prompt("my-prompt", cache_ttl_seconds=60)

# Disable caching (always fetch fresh)
prompt = xeroml.get_prompt("my-prompt", cache_ttl_seconds=0)
TypeScript:
const prompt = await xeroml.getPrompt("my-prompt", { cacheTtlSeconds: 60 });
Disabling the Cache
Set cache_ttl_seconds=0 to always fetch fresh from the API. This is appropriate for:
- Staging environments where you want prompt changes to be immediate
- Local development when iterating rapidly on prompts
- Low-traffic services where the latency cost is acceptable
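One way to apply the cases above is to choose the TTL from the deployment environment. The sketch below is only an illustration: the `APP_ENV` variable, the environment names, and the `cache_ttl_for` helper are assumptions made for this example and are not part of the XeroML SDK. Only `cache_ttl_seconds` and the 300-second default come from this page.

```python
import os

# Hypothetical mapping: environment names and values are illustrative only.
TTL_BY_ENVIRONMENT = {
    "development": 0,   # always fetch fresh while iterating on prompts
    "staging": 0,       # prompt changes should be visible immediately
    "production": 300,  # the documented default of 300 seconds
}


def cache_ttl_for(environment: str) -> int:
    """Return the TTL in seconds for an environment, defaulting to 300."""
    return TTL_BY_ENVIRONMENT.get(environment, 300)


# APP_ENV is an assumed variable name; use whatever your deployment defines.
environment = os.environ.get("APP_ENV", "production")
ttl = cache_ttl_for(environment)

# The value would then be passed to the SDK call shown earlier:
# prompt = xeroml.get_prompt("my-prompt", cache_ttl_seconds=ttl)
```

Falling back to the default for unknown environments keeps a misconfigured deployment from fetching on every call.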
Cache Invalidation
The cache is per-process and per-label. If you update the production label to point to a new version, each running instance picks up the update only after its own cached entry expires (up to 5 minutes by default) and the background refresh completes.
To propagate a change faster in production, use one of the following:
- Reduce the TTL for that prompt
- Restart your application instances
- Call the internal cache invalidation method (SDK-specific, see API reference)
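To make "per-process and per-label" concrete, here is a toy model of a cache keyed by prompt name and label, with an explicit invalidation step. It is a sketch under assumptions: `LabelKeyedCache`, its `invalidate` method, and the `server` dictionary are hypothetical and do not reflect the SDK's real invalidation method, which is described in the API reference. TTL handling is left out to keep the example short.

```python
from typing import Callable, Dict, Tuple


class LabelKeyedCache:
    """Toy cache keyed by (prompt name, label); hypothetical, no TTL."""

    def __init__(self, fetch: Callable[[str, str], str]) -> None:
        self._fetch = fetch
        self._entries: Dict[Tuple[str, str], str] = {}

    def get(self, name: str, label: str) -> str:
        key = (name, label)
        if key not in self._entries:
            self._entries[key] = self._fetch(name, label)
        return self._entries[key]

    def invalidate(self, name: str, label: str) -> None:
        # Drop one entry so that the next get() fetches it again.
        self._entries.pop((name, label), None)


# Simulated server state: which version each label points to.
server = {
    ("my-prompt", "production"): "version 1",
    ("my-prompt", "staging"): "version 2",
}
cache = LabelKeyedCache(lambda name, label: server[(name, label)])

before = cache.get("my-prompt", "production")
server[("my-prompt", "production")] = "version 2"  # label moved on the server
still_cached = cache.get("my-prompt", "production")
cache.invalidate("my-prompt", "production")
after = cache.get("my-prompt", "production")
```

Each process holds its own `_entries` dictionary, which is why moving a label on the server does not reach a running instance until its entry is refreshed or dropped.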
Availability Resilience
If the XeroML API is unavailable when the cache expires, the SDK continues serving the cached version. A cached prompt is therefore not a point of failure: your application uses the last known good version until connectivity is restored.
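The fallback described here can be sketched as follows. This is an illustration only: `LastKnownGoodCache` and its `refresh` method are hypothetical names, and the use of `ConnectionError` to signal an outage is an assumption. This page does not say what happens if the API is unreachable on the very first fetch, so the sketch simply re-raises in that case.

```python
from typing import Callable, Dict


class LastKnownGoodCache:
    """Toy fallback cache (hypothetical; not the XeroML SDK's code)."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch
        self._last_good: Dict[str, str] = {}

    def refresh(self, name: str) -> str:
        """Try to fetch a fresh copy; fall back to the last good one."""
        try:
            self._last_good[name] = self._fetch(name)
        except ConnectionError:
            if name not in self._last_good:
                raise  # nothing cached yet, so there is nothing to serve
        return self._last_good[name]


api_is_up = True


def fake_fetch(name: str) -> str:
    # Stands in for the network request; fails while the API is "down".
    if not api_is_up:
        raise ConnectionError("API unavailable")
    return "Hello from " + name


cache = LastKnownGoodCache(fake_fetch)
fresh = cache.refresh("my-prompt")    # API reachable: value is stored
api_is_up = False
fallback = cache.refresh("my-prompt")  # API down: stored value is served
```

The failed refresh leaves the stored value untouched, so later calls keep working until a fetch succeeds again.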