Observation Types
XeroML supports three observation types, each designed for a specific kind of operation in an LLM application pipeline.
Generations
A generation represents an LLM call. XeroML applies special handling to generations:
- Automatically captures model name, prompt messages, completion, and finish reason
- Calculates token usage (input, output, total) and estimated cost
- Records latency from when the request is sent to the first token and to the full completion
- Captures model parameters (temperature, max_tokens, etc.)
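XeroML records these timings for you. The sketch below is library-independent and assumes a streamed response. It only shows what the two latency figures mean: time to first token versus time to full completion. `LatencyRecord`, `measure_stream`, and `fake_stream` are hypothetical names and are not part of XeroML.

```python
import time
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class LatencyRecord:
    time_to_first_token: float  # seconds from request send to first streamed chunk
    total_latency: float        # seconds from request send to last chunk
    completion: str


def measure_stream(stream: Iterable[str]) -> LatencyRecord:
    """Consume a token stream and record both latency figures (hypothetical helper)."""
    start = time.perf_counter()  # treat this moment as "request sent"
    first_token_at = None
    chunks = []
    for chunk in stream:
        if first_token_at is None:
            first_token_at = time.perf_counter()  # first chunk has arrived
        chunks.append(chunk)
    end = time.perf_counter()  # stream exhausted: completion finished
    if first_token_at is None:  # empty stream: no token ever arrived
        first_token_at = end
    return LatencyRecord(
        time_to_first_token=first_token_at - start,
        total_latency=end - start,
        completion="".join(chunks),
    )


def fake_stream(tokens: Iterable[str], delay: float = 0.01) -> Iterator[str]:
    """Hypothetical stand-in for a streaming LLM response."""
    for tok in tokens:
        time.sleep(delay)
        yield tok


record = measure_stream(fake_stream(["Hello", ", ", "world"]))
print(record.completion)  # Hello, world
```

The time to first token is always less than or equal to the total latency. For streamed responses, it is usually the number that users perceive as responsiveness.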
Generations are the primary unit for cost tracking and quality evaluation. Most LLM-as-a-judge evaluators run at the generation level.
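Cost is tracked per generation, so a trace's cost is the sum of the costs of the generations it contains. Below is a minimal sketch of that arithmetic. The model name `example-model` and its per-million-token prices are hypothetical placeholders, not XeroML's pricing data. `Usage` and `estimate_cost` are illustrative names, not library APIs.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    input: int   # prompt tokens
    output: int  # completion tokens

    @property
    def total(self) -> int:
        return self.input + self.output


# Hypothetical prices in USD per 1M tokens; real prices vary by model and provider.
PRICES_PER_MILLION = {"example-model": {"input": 2.50, "output": 10.00}}


def estimate_cost(model: str, usage: Usage) -> float:
    """Estimated cost of a single generation from its token usage."""
    price = PRICES_PER_MILLION[model]
    return (usage.input * price["input"] + usage.output * price["output"]) / 1_000_000


# Trace-level cost = sum over the generations inside the trace.
generations = [("example-model", Usage(1200, 300)), ("example-model", Usage(800, 150))]
trace_cost = sum(estimate_cost(m, u) for m, u in generations)
print(f"{trace_cost:.6f}")  # 0.009500
```

Input and output tokens are priced separately because most providers charge different rates for them. This is also why the usage split matters more than the total alone.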
```python
# Generations are created automatically by native integrations.
# For manual creation:
from xeroml import get_client

xeroml = get_client()

with xeroml.start_as_current_observation(
    name="gpt-call",
    type="generation",
    model="gpt-4o",
    input=messages,
) as gen:
    response = openai_client.chat.completions.create(...)
    gen.update(
        output=response.choices[0].message.content,
        usage={
            "input": response.usage.prompt_tokens,
            "output": response.usage.completion_tokens,
        },
    )
```
Spans
A span represents a logical grouping of operations, such as a pipeline stage, a tool call sequence, or any other multi-step process. Each span has a start time and an end time, and therefore a duration.
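Here is a toy, library-independent illustration of the start/end/duration idea. `timed_span` is a hypothetical helper, not XeroML's span implementation. It records the two timestamps around a block of work and derives the duration from them.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed_span(name: str, log: list):
    """Toy span: records start and end times; duration = end - start."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the wrapped block raises, so failed steps still get a duration.
        end = time.perf_counter()
        log.append({"name": name, "start": start, "end": end, "duration": end - start})


spans = []
with timed_span("retrieval", spans):
    time.sleep(0.02)  # stand-in for real work
print(spans[0]["name"], spans[0]["duration"])
```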
Use spans to create hierarchy in your traces. A RAG pipeline might have:
- A top-level span for the entire request
- A child span for the retrieval step
- A child generation for the LLM call
- A child span for post-processing
```python
from xeroml import get_client

xeroml = get_client()

with xeroml.start_as_current_observation(name="rag-pipeline", type="span") as pipeline:
    with xeroml.start_as_current_observation(name="retrieval", type="span") as retrieval:
        docs = retrieve_documents(query)
        retrieval.update(output={"num_docs": len(docs)})

    # The LLM call is a generation (captured automatically if you use the OpenAI wrapper)
    response = call_llm(query, docs)
    pipeline.update(output=response)
```
Events
An event is a point-in-time occurrence with no meaningful duration. Use events to log discrete moments in your application:
- Cache hit/miss
- Tool selection decision
- Safety filter triggered
- State transitions
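Conceptually, an event differs from a span in one way: it carries a single timestamp instead of a start/end pair. The dataclass below is a hypothetical, library-independent model of that idea, not XeroML's internal representation.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Event:
    """Toy event: one timestamp plus metadata, and no end time."""
    name: str
    metadata: dict = field(default_factory=dict)
    timestamp: float = field(default_factory=time.time)  # set at creation


events = [
    Event("cache-miss", {"cache_key": "user:42"}),
    Event("safety-filter-passed"),
]
for e in events:
    print(e.name, e.metadata)
# cache-miss {'cache_key': 'user:42'}
# safety-filter-passed {}
```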
```python
with xeroml.start_as_current_observation(
    name="cache-hit",
    type="event",
    metadata={"cache_key": key, "ttl_remaining": 300},
) as event:
    pass  # Events complete immediately
```
Nesting
All three types can be nested to any depth. XeroML uses OpenTelemetry context propagation to manage the hierarchy automatically — any observation created while another is active becomes its child.
```text
trace
└── span: "rag-pipeline"
    ├── span: "retrieval"
    │   └── event: "cache-miss"
    ├── generation: "gpt-4o call"
    └── span: "post-processing"
        └── event: "safety-filter-passed"
```