Prompt Caching

XeroML SDKs cache prompts in memory after the first fetch. After the initial load, prompt retrieval adds no network latency to your application: subsequent fetches are served from the local in-process cache, which is about as fast as reading a local variable.

Default Behavior

First fetch: A network request is made to the XeroML API to retrieve the prompt
Subsequent fetches: Served from in-process memory cache
Default TTL: 300 seconds (5 minutes)
After TTL expiry: The next fetch returns the stale version immediately and triggers a background network request to refresh it (stale-while-revalidate pattern)

This means that once a prompt is cached, your application never blocks on a prompt refresh, even after the cache expires.
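The stale-while-revalidate behavior described above can be sketched as a small in-process cache. This is a minimal illustration, not the SDK's actual implementation: the class `StaleWhileRevalidateCache`, its `fetch` callback (a stand-in for the network request), the injectable `clock`, and `wait_for_refresh` are all hypothetical names introduced for this example.

```python
import threading
import time
from typing import Callable, Dict, Tuple


class StaleWhileRevalidateCache:
    """Illustrative in-process prompt cache (not the SDK's real internals)."""

    def __init__(
        self,
        fetch: Callable[[str], str],  # hypothetical stand-in for the API call
        ttl_seconds: float = 300.0,   # default TTL stated on this page
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # name -> (value, fetched_at)
        self._inflight: Dict[str, threading.Thread] = {}  # refreshes in progress
        self._lock = threading.Lock()

    def get(self, name: str) -> str:
        with self._lock:
            entry = self._entries.get(name)
        if entry is None:
            # First fetch: nothing cached yet, so block on the network request.
            value = self._fetch(name)
            with self._lock:
                self._entries[name] = (value, self._clock())
            return value
        value, fetched_at = entry
        if self._clock() - fetched_at >= self._ttl:
            # Expired: start a refresh, but still return the stale value now.
            self._refresh_in_background(name)
        return value

    def _refresh_in_background(self, name: str) -> None:
        def worker() -> None:
            try:
                fresh = self._fetch(name)
                with self._lock:
                    self._entries[name] = (fresh, self._clock())
            finally:
                with self._lock:
                    self._inflight.pop(name, None)

        with self._lock:
            if name in self._inflight:
                return  # a refresh for this prompt is already running
            thread = threading.Thread(target=worker, daemon=True)
            self._inflight[name] = thread
            thread.start()

    def wait_for_refresh(self, name: str) -> None:
        """Block until any in-flight refresh finishes (for demos and tests)."""
        with self._lock:
            thread = self._inflight.get(name)
        if thread is not None:
            thread.join()
```

In this sketch, only the very first `get` for a prompt waits on `fetch`. A `get` after expiry returns the old value right away, and a later call sees the refreshed value once the background thread has finished.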

Configuring Cache TTL

Set a custom TTL (in seconds) when fetching a prompt:

from xeroml import get_client

xeroml = get_client()

# Cache for 60 seconds
prompt = xeroml.get_prompt("my-prompt", cache_ttl_seconds=60)

# Disable caching (always fetch fresh)
prompt = xeroml.get_prompt("my-prompt", cache_ttl_seconds=0)

TypeScript:

const prompt = await xeroml.getPrompt("my-prompt", { cacheTtlSeconds: 60 });

Disabling the Cache

Set cache_ttl_seconds=0 to fetch the prompt from the API on every call. Each fetch then incurs a network round trip, so this is appropriate for:

Staging environments where you want prompt changes to be immediate
Local development when iterating rapidly on prompts
Low-traffic services where the latency cost is acceptable
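One way to apply the cases above is to pick the TTL from the deployment environment. The sketch below is illustrative only: the helper `prompt_cache_ttl`, the environment names, and the `APP_ENV` variable are assumptions made for this example, not part of the SDK.

```python
import os

DEFAULT_TTL_SECONDS = 300  # default TTL stated on this page

# Hypothetical environment names; adjust to match your deployment.
UNCACHED_ENVIRONMENTS = {"development", "staging"}


def prompt_cache_ttl(environment: str) -> int:
    """Return 0 (caching disabled) where prompt edits should show up immediately."""
    if environment.lower() in UNCACHED_ENVIRONMENTS:
        return 0
    return DEFAULT_TTL_SECONDS


# APP_ENV is an assumed variable name, not something the SDK reads.
ttl = prompt_cache_ttl(os.environ.get("APP_ENV", "production"))
# The result would then be passed to the SDK call shown earlier:
# prompt = xeroml.get_prompt("my-prompt", cache_ttl_seconds=ttl)
```

Keeping the decision in one helper means production keeps the cached path while staging and local runs always see the latest prompt.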

Cache Invalidation

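The cache is per-process and per-label. If you update the production label to point to a new version, each running instance picks up the change only after its cached entry expires and is refreshed. That takes up to one TTL window (5 minutes by default), plus the time for the next fetch to trigger and complete the background refresh.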

For faster or immediate propagation in production, do one of the following:

Reduce the TTL for that prompt
Restart your application instances
Call the internal cache invalidation method (SDK-specific, see API reference)

Availability Resilience

If the XeroML API is unavailable when the cache expires, the SDK continues serving the cached version. Prompts are therefore not a point of failure: your application uses the last known good version until connectivity is restored.
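The fallback logic can be sketched in a few lines. This is a hedged illustration rather than the SDK's code: `refresh_or_keep` and its `fetch` callback are hypothetical, and the choice to re-raise when nothing has been cached yet is an assumption, since this page only describes the case where a cached version exists.

```python
from typing import Callable, Optional


def refresh_or_keep(cached: Optional[str], fetch: Callable[[], str]) -> str:
    """Try to refresh a prompt; on failure, keep the last known good version."""
    try:
        return fetch()
    except Exception:
        if cached is None:
            # Assumption: with nothing cached yet, there is no version to fall back to.
            raise
        return cached


def unavailable() -> str:
    """Simulates the API being unreachable."""
    raise ConnectionError("API unavailable")


# The cached version survives a failed refresh.
last_known_good = refresh_or_keep("v1", unavailable)
```

A successful refresh replaces the cached value, and a failed one leaves it untouched, so callers keep working through an outage.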