Observability & Application Tracing
AI applications are non-deterministic by nature. The same input can produce different outputs depending on the model, the prompt version, retrieved context, or external tool state. This makes traditional debugging approaches — inspecting logs, adding print statements — insufficient for production LLM systems.
Well-implemented observability gives you the tools to understand what’s happening inside your application and why.
What is Application Tracing?
Application tracing captures a structured record of every request that flows through your system. For LLM applications, this means recording:
- The exact prompt sent to the model (including system instructions and context)
- The model’s response and any tool calls made
- Token usage and associated cost
- Latency for every step in the pipeline
- Retrieval steps, embeddings, and any non-LLM operations
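The fields above can be modeled roughly as a record like the following. This is a hypothetical sketch of the data shape, not XeroML's actual schema; all names here are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class TraceStep:
    """One observation inside a trace: an LLM call, retrieval, or other operation."""
    name: str                   # e.g. "retrieval" or "chat-completion"
    input: str                  # exact prompt or query sent
    output: str                 # model response or retrieved context
    latency_ms: float           # wall-clock duration of this step
    prompt_tokens: int = 0      # token usage (zero for non-LLM steps)
    completion_tokens: int = 0
    cost_usd: float = 0.0       # cost attributed to this step


@dataclass
class Trace:
    """A full request: an ordered list of steps plus request-level totals."""
    trace_id: str
    steps: list[TraceStep] = field(default_factory=list)

    @property
    def total_cost_usd(self) -> float:
        return sum(s.cost_usd for s in self.steps)

    @property
    def total_latency_ms(self) -> float:
        return sum(s.latency_ms for s in self.steps)
```

Per-step fields make both debugging (inspect one step's exact input and output) and aggregation (sum cost and latency across the pipeline) straightforward.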
XeroML captures this data automatically during development and production with no manual instrumentation required for supported frameworks. The result is a detailed, searchable log of every request your application handles.
Getting Started
The fastest way to start is to instrument your existing application. XeroML supports 50+ frameworks and libraries natively.
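Under the hood, automatic instrumentation amounts to wrapping each operation so its inputs, output, and timing are recorded without touching your application logic. A minimal sketch of that pattern, using an in-memory buffer in place of the SDK (this is not the XeroML API, just an illustration of what instrumentation does):

```python
import functools
import time

TRACE_LOG: list[dict] = []  # stand-in for the SDK's background buffer


def observe(func):
    """Record inputs, output, and latency for each call, the way an
    auto-instrumentation layer would behind the scenes."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        TRACE_LOG.append({
            "name": func.__name__,
            "input": {"args": args, "kwargs": kwargs},
            "output": result,
            "latency_ms": (time.perf_counter() - start) * 1000,
        })
        return result
    return wrapper


@observe
def generate(prompt: str) -> str:
    # Placeholder for a real model call.
    return f"echo: {prompt}"
```

With supported frameworks, this wrapping happens for you; the decorator pattern is only what manual instrumentation of an unsupported library would look like.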
Once you have traces flowing, explore the core concepts to understand how XeroML structures your data:
→ Data Model: Traces, Observations, Sessions
What You Can Do with Traces
After instrumentation, traces unlock the following workflows:
**Debugging production issues.** When something goes wrong in production, open the trace for that request. You'll see the exact prompt, context, and response that caused the problem, rather than a stack trace pointing at library internals.

**Performance analysis.** Identify slow steps in your pipeline. Token usage summaries and per-step latency let you spot bottlenecks in retrieval, model calls, or post-processing.

**Evaluation.** Traces are the input to XeroML's evaluation system. Run LLM-as-a-judge evaluators on live traces, create datasets from interesting examples, and track quality over time.

**Prompt iteration.** When prompts are managed in XeroML, every trace links back to the prompt version that generated it, so you can compare quality across versions directly.
Core Features
| Feature | Description |
|---|---|
| Sessions | Group multiple traces from a single conversation or workflow |
| Environments | Separate production, staging, and development data |
| Tags | Categorize traces for filtering and reporting |
| Users | Track per-user token usage, costs, and feedback |
| Metadata | Attach arbitrary key-value data to any trace or observation |
| Releases & Versioning | Tag traces with application version for regression tracking |
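In practice, these features correspond to fields you attach to a trace. The following sketch shows one plausible shape for that annotation; the field names are assumptions for illustration, not XeroML's actual schema.

```python
def annotate_trace(trace: dict, *, session_id=None, user_id=None,
                   environment=None, release=None,
                   tags=None, metadata=None) -> dict:
    """Return a copy of `trace` with the grouping fields from the
    table above attached. Field names here are illustrative only."""
    annotated = dict(trace)
    for key, value in {
        "session_id": session_id,    # groups traces from one conversation
        "user_id": user_id,          # enables per-user cost and feedback views
        "environment": environment,  # e.g. "production" vs. "staging"
        "release": release,          # application version, for regression tracking
        "tags": tags,                # labels for filtering and reporting
        "metadata": metadata,        # arbitrary key-value payload
    }.items():
        if value is not None:
            annotated[key] = value
    return annotated
```

Keeping these as first-class fields, rather than free-form log text, is what makes filtering by session, user, or release cheap on the query side.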
FAQ
What’s the difference between observability and tracing?
Observability is the broader capability — the ability to understand a system’s internal state from its external outputs. It encompasses metrics, logs, and tracing. Application tracing is one specific tool within observability: it records the complete flow of a request, including every operation, its timing, inputs, and outputs.
How does XeroML differ from general APM tools?
General APM tools (Datadog, New Relic, etc.) are designed for traditional request/response services. They don’t natively understand token counts, model parameters, prompt templates, or evaluation scores. XeroML is built specifically for LLM applications and understands the semantics of your AI workloads out of the box.
What’s the performance impact?
XeroML SDKs send tracing data asynchronously in the background. Traces are batched locally before sending, so there is no added latency to your application’s response time. In short-lived processes (serverless functions, scripts), call flush() before your process exits to ensure all data is sent.
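The batch-then-flush behavior described above can be sketched with a background worker. This is a minimal illustration of the pattern, assuming a caller-supplied `send_batch` callable in place of the real network export; it is not the XeroML SDK itself.

```python
import queue
import threading


class BatchedExporter:
    """Buffers trace events and ships them from a background thread,
    so the request path never blocks on network I/O."""

    def __init__(self, send_batch, batch_size: int = 10):
        self._send_batch = send_batch   # e.g. an HTTP POST to the backend
        self._batch_size = batch_size
        self._queue: queue.Queue = queue.Queue()
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def enqueue(self, event: dict) -> None:
        self._queue.put(event)          # cheap; no network on the hot path

    def _run(self) -> None:
        batch = []
        while True:
            event = self._queue.get()
            if event is None:           # flush sentinel: drain the batch
                if batch:
                    self._send_batch(batch)
                    batch = []
            else:
                batch.append(event)
                if len(batch) >= self._batch_size:
                    self._send_batch(batch)
                    batch = []
            self._queue.task_done()

    def flush(self) -> None:
        """Block until every buffered event has been sent. This is the
        call to make before exit in serverless functions or scripts."""
        self._queue.put(None)
        self._queue.join()
```

Without the final `flush()`, a short-lived process can exit while events are still sitting in the queue, which is why the FAQ answer above singles out serverless functions and scripts.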
Does XeroML support OpenTelemetry?
Yes. XeroML’s SDKs are built on OpenTelemetry. This means you can use any OTEL-compatible instrumentation library alongside XeroML, and you can send traces to multiple destinations simultaneously.