Observation Types

XeroML supports three observation types — generations, spans, and events — each designed for a specific kind of operation in an LLM application pipeline.

Generations

A generation represents an LLM call. XeroML applies special handling to generations:

  • Automatically captures model name, prompt messages, completion, and finish reason
  • Calculates token usage (input, output, total) and estimated cost
  • Records latency from request send to first token and full completion
  • Captures model parameters (temperature, max_tokens, etc.)

Generations are the primary unit for cost tracking and quality evaluation. Most LLM-as-a-judge evaluators run at the generation level.

# Generations are created automatically by native integrations.
# For manual creation:
with xeroml.start_as_current_observation(
    name="gpt-call",
    type="generation",
    model="gpt-4o",
    input=messages,
) as gen:
    response = openai_client.chat.completions.create(...)
    gen.update(
        output=response.choices[0].message.content,
        usage={"input": response.usage.prompt_tokens, "output": response.usage.completion_tokens},
    )

Spans

A span represents a logical grouping of operations — a pipeline stage, a tool call sequence, or any multi-step process. Spans have a start time and an end time, and therefore a duration.

Use spans to create hierarchy in your traces. A RAG pipeline might have:

  • A top-level span for the entire request
  • A child span for the retrieval step
  • A child generation for the LLM call
  • A child span for post-processing
from xeroml import get_client

xeroml = get_client()

with xeroml.start_as_current_observation(name="rag-pipeline", type="span") as pipeline:
    with xeroml.start_as_current_observation(name="retrieval", type="span") as retrieval:
        docs = retrieve_documents(query)
        retrieval.update(output={"num_docs": len(docs)})
    # The LLM call is a generation (captured automatically if using the openai wrapper)
    response = call_llm(query, docs)
    pipeline.update(output=response)

Events

An event is a point-in-time occurrence with no meaningful duration. Use events to log discrete moments in your application:

  • Cache hit/miss
  • Tool selection decision
  • Safety filter triggered
  • State transitions
with xeroml.start_as_current_observation(
    name="cache-hit",
    type="event",
    metadata={"cache_key": key, "ttl_remaining": 300},
) as event:
    pass  # events complete immediately

Nesting

All three types can be nested to any depth. XeroML uses OpenTelemetry context propagation to manage the hierarchy automatically — any observation created while another is active becomes its child.

trace
└── span: "rag-pipeline"
    ├── span: "retrieval"
    │   └── event: "cache-miss"
    ├── generation: "gpt-4o call"
    └── span: "post-processing"
        └── event: "safety-filter-passed"
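The context-propagation mechanism itself is easy to picture. The sketch below is a hypothetical, minimal stand-in for XeroML's internals (not its actual implementation), using Python's `contextvars` to track the currently active observation: whatever is active when a new observation starts becomes that observation's parent, which is exactly how the tree above is assembled.

```python
# Hypothetical sketch of context-based parent/child tracking.
import contextvars
from contextlib import contextmanager

# Holds the observation that is currently "active" in this context.
_current = contextvars.ContextVar("current_observation", default=None)

class Observation:
    def __init__(self, name, type):
        self.name = name
        self.type = type
        self.children = []

@contextmanager
def start_as_current_observation(name, type="span"):
    obs = Observation(name, type)
    parent = _current.get()
    if parent is not None:
        # Created while another observation is active -> becomes its child.
        parent.children.append(obs)
    token = _current.set(obs)
    try:
        yield obs
    finally:
        # Restore the previous active observation on exit.
        _current.reset(token)

# Rebuild part of the tree above: nesting in code produces nesting in the trace.
with start_as_current_observation("rag-pipeline") as pipeline:
    with start_as_current_observation("retrieval"):
        with start_as_current_observation("cache-miss", type="event"):
            pass
    with start_as_current_observation("gpt-4o call", type="generation"):
        pass

print([c.name for c in pipeline.children])  # → ['retrieval', 'gpt-4o call']
```

Because the active observation is stored in a `ContextVar` rather than a global, this scheme also keeps hierarchies separate across concurrent async tasks, which is the property OpenTelemetry context propagation provides.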