
Prompt Caching

XeroML SDKs cache prompts in memory after the first fetch. After that initial load, prompt retrieval adds no network latency to your application: subsequent fetches are served from the local in-process cache.

Default Behavior

  • First fetch: A network request is made to the XeroML API to retrieve the prompt
  • Subsequent fetches: Served from in-process memory cache
  • Default TTL: 300 seconds (5 minutes)
  • After TTL expiry: The next fetch returns the cached (stale) version immediately and triggers a background network request to refresh it (the stale-while-revalidate pattern)

Only the first fetch of a prompt in a process waits on the network. After that, your application never blocks on a prompt refresh, even once the cache has expired.
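The stale-while-revalidate behavior described above can be sketched as a small in-process cache. This is a minimal illustration, not the XeroML SDK's implementation: the `SWRCache` class is hypothetical, `fetch` stands in for the network request to the XeroML API, and `clock` is injectable so the timing can be tested.

```python
import threading
import time
from typing import Callable, Dict, Set, Tuple


class SWRCache:
    """Hypothetical prompt cache illustrating stale-while-revalidate.

    `fetch` stands in for the network request to the XeroML API; this is not
    the SDK's actual implementation.
    """

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # name -> (value, fetched_at)
        self._refreshing: Set[str] = set()  # names with a refresh in flight
        self._lock = threading.Lock()

    def get(self, name: str) -> str:
        if self._ttl <= 0:
            return self._fetch(name)  # caching disabled: always fetch fresh
        with self._lock:
            entry = self._entries.get(name)
        if entry is None:
            # First fetch: nothing cached yet, so this call blocks on the network.
            value = self._fetch(name)
            with self._lock:
                self._entries[name] = (value, self._clock())
            return value
        value, fetched_at = entry
        if self._clock() - fetched_at >= self._ttl:
            self._start_refresh(name)  # expired: refresh in the background
        return value  # fresh or stale, returned without waiting

    def _start_refresh(self, name: str) -> None:
        with self._lock:
            if name in self._refreshing:
                return  # avoid duplicate refreshes for the same prompt
            self._refreshing.add(name)
        threading.Thread(target=self._refresh, args=(name,), daemon=True).start()

    def _refresh(self, name: str) -> None:
        try:
            value = self._fetch(name)  # if this raises, the old entry stays cached
            with self._lock:
                self._entries[name] = (value, self._clock())
        finally:
            with self._lock:
                self._refreshing.discard(name)
```

With a 300-second TTL, a call made 301 seconds after the first fetch returns the cached value immediately and starts a refresh; later calls see the refreshed value once it has landed.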

Configuring Cache TTL

Set a custom TTL (in seconds) when fetching a prompt.

Python:

from xeroml import get_client
xeroml = get_client()
# Cache for 60 seconds
prompt = xeroml.get_prompt("my-prompt", cache_ttl_seconds=60)
# Disable caching (always fetch fresh)
prompt = xeroml.get_prompt("my-prompt", cache_ttl_seconds=0)

TypeScript:

// Assumes `xeroml` is an already-initialized XeroML client
// Cache for 60 seconds
const prompt = await xeroml.getPrompt("my-prompt", { cacheTtlSeconds: 60 });

Disabling the Cache

Set cache_ttl_seconds=0 to always fetch fresh from the API. This is appropriate for:

  • Staging environments where you want prompt changes to be immediate
  • Local development when iterating rapidly on prompts
  • Low-traffic services where the extra network round trip on every fetch is acceptable
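One way to apply these cases is to choose the TTL from the deployment environment. The sketch below assumes a hypothetical `APP_ENV` environment variable and illustrative TTL values; only the `cache_ttl_seconds` parameter comes from the SDK usage shown earlier.

```python
import os
from typing import Dict, Optional

# Illustrative TTLs (seconds) per environment; names and values are assumptions.
TTL_BY_ENV: Dict[str, int] = {
    "development": 0,   # always fetch fresh while iterating on prompts
    "staging": 0,       # prompt changes take effect immediately
    "production": 300,  # matches the SDK's 5-minute default
}
DEFAULT_TTL = 300


def cache_ttl_for(env: Optional[str] = None) -> int:
    """Pick a cache_ttl_seconds value for the given (or current) environment.

    APP_ENV is a hypothetical environment variable; unknown environments
    fall back to the 300-second default.
    """
    if env is None:
        env = os.environ.get("APP_ENV", "production")
    return TTL_BY_ENV.get(env, DEFAULT_TTL)


# Usage with the SDK call shown earlier:
# prompt = xeroml.get_prompt("my-prompt", cache_ttl_seconds=cache_ttl_for())
```

Keeping the mapping in one place means production keeps the default caching while staging and development bypass it, without changing call sites.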

Cache Invalidation

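The cache is per-process and per-label. If you update the production label to point to a new version, running instances pick up the change roughly one TTL window later (about 5 minutes by default). Because expired entries are refreshed in the background, the fetch that triggers the refresh still returns the previous version; later fetches return the new one once the refresh completes.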
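The propagation delay can be made concrete with a small simulation of one process that fetches a prompt at a fixed interval. This is a hypothetical model, not measured SDK behavior: it assumes a background refresh completes before the next fetch, and the function name is illustrative.

```python
from typing import Optional


def first_fetch_with_new_version(ttl: float, interval: float, first_fetch_at: float,
                                 update_at: float, max_fetches: int = 100_000) -> Optional[float]:
    """Return the time of the first fetch that returns the updated prompt.

    Hypothetical model: the fetch at `first_fetch_at` caches the old version,
    the label is updated at `update_at` (> first_fetch_at), and a background
    refresh finishes instantly but only benefits later fetches.
    """
    if ttl <= 0:
        raise ValueError("model assumes ttl > 0; ttl=0 bypasses the cache")
    cached_at = first_fetch_at
    cached_is_new = False
    for i in range(1, max_fetches + 1):
        t = first_fetch_at + i * interval
        if cached_is_new:
            return t  # the cached value is already the new version
        if t - cached_at >= ttl:
            # Expired: this fetch still returns the old value, then refreshes.
            cached_at = t
            cached_is_new = t >= update_at
    return None
```

With a 300-second TTL and a fetch every 10 seconds, a label change made at t=305 (just after the refresh at t=300) is first served at t=610, about 305 seconds later. In this model the delay can approach one TTL plus one fetch interval.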

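To propagate a change faster in production, use one of the following:

  1. Reduce the TTL for that prompt (this shortens the window but does not make updates instant)
  2. Restart your application instances, which clears their in-memory caches
  3. Call the internal cache invalidation method (SDK-specific; see the API reference)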
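Because the cache is keyed per process and per label, invalidation amounts to dropping the cached entries for a prompt so the next fetch goes to the API. The sketch below uses a hypothetical `LabelCache` class with an `invalidate` method; the real method name and signature are SDK-specific (see the API reference), and expired entries are refreshed synchronously here to keep the focus on invalidation.

```python
import time
from typing import Callable, Dict, Optional, Tuple

Key = Tuple[str, str]  # (prompt name, label)


class LabelCache:
    """Hypothetical per-process cache keyed by (name, label); not the SDK's class."""

    def __init__(self, fetch: Callable[[str, str], str], ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch  # stands in for the API request
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Key, Tuple[str, float]] = {}  # key -> (value, fetched_at)

    def get(self, name: str, label: str = "production") -> str:
        key = (name, label)
        entry = self._entries.get(key)
        if entry is not None and self._clock() - entry[1] < self._ttl:
            return entry[0]  # still within the TTL: serve from memory
        value = self._fetch(name, label)  # missing or expired: fetch synchronously
        self._entries[key] = (value, self._clock())
        return value

    def invalidate(self, name: str, label: Optional[str] = None) -> int:
        """Drop cached entries for `name` (one label, or all labels); return the count."""
        keys = [k for k in self._entries if k[0] == name and (label is None or k[1] == label)]
        for k in keys:
            del self._entries[k]
        return len(keys)
```

Invalidating one label leaves the other labels of the same prompt cached, matching the per-label keying described above.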

Availability Resilience

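If the XeroML API is unavailable when a cached prompt expires, the SDK continues serving the cached version. Once a prompt has been cached, the API is not a point of failure for it: your application uses the last known good version until connectivity is restored. The first fetch of a prompt in a new process still needs the API to be reachable, since there is no cached version yet.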
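This fallback can be sketched as a last-known-good wrapper around the fetch. The names are hypothetical, and network failures are modeled as `OSError` (which includes `ConnectionError` and `TimeoutError`); the SDK's actual error handling may differ.

```python
from typing import Callable, Dict


class PromptUnavailableError(Exception):
    """Hypothetical error: the prompt was never cached and the API is unreachable."""


def fetch_with_fallback(name: str, fetch: Callable[[str], str],
                        last_known_good: Dict[str, str]) -> str:
    """Try the API; on a network failure, fall back to the last cached version."""
    try:
        value = fetch(name)
    except OSError as exc:  # ConnectionError and TimeoutError are OSError subclasses
        if name in last_known_good:
            return last_known_good[name]  # keep serving the last known good version
        # Cold start: nothing cached yet, so there is nothing to fall back to.
        raise PromptUnavailableError(name) from exc
    last_known_good[name] = value  # remember the newest successful fetch
    return value
```

The cold-start branch is the one case where an outage surfaces to the caller, which is why the first fetch in a process is the only one that depends on the API being up.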