Observation Types

XeroML supports three observation types — generations, spans, and events — each designed for a specific kind of operation in an LLM application pipeline.

Generations

A generation represents an LLM call. XeroML applies special handling to generations:

  • Automatically captures model name, prompt messages, completion, and finish reason
  • Calculates token usage (input, output, total) and estimated cost
  • Records latency from request send to first token and full completion
  • Captures model parameters (temperature, max_tokens, etc.)

Generations are the primary unit for cost tracking and quality evaluation. Most LLM-as-a-judge evaluators run at the generation level.

# Generations are created automatically by native integrations.
# For manual creation:
with xeroml.start_as_current_observation(
    name="gpt-call",
    type="generation",
    model="gpt-4o",
    input=messages,
) as gen:
    response = openai_client.chat.completions.create(...)
    gen.update(
        output=response.choices[0].message.content,
        usage={"input": response.usage.prompt_tokens, "output": response.usage.completion_tokens},
    )

Spans

A span represents a logical grouping of operations — a pipeline stage, a tool call sequence, or any multi-step process. Spans have a start time and an end time, and therefore a duration.

Use spans to create hierarchy in your traces. A RAG pipeline might have:

  • A top-level span for the entire request
  • A child span for the retrieval step
  • A child generation for the LLM call
  • A child span for post-processing
from xeroml import get_client

xeroml = get_client()

with xeroml.start_as_current_observation(name="rag-pipeline", type="span") as pipeline:
    with xeroml.start_as_current_observation(name="retrieval", type="span") as retrieval:
        docs = retrieve_documents(query)
        retrieval.update(output={"num_docs": len(docs)})
    # The LLM call is a generation (captured automatically if using the openai wrapper)
    response = call_llm(query, docs)
    pipeline.update(output=response)

Events

An event is a point-in-time occurrence with no meaningful duration. Use events to log discrete moments in your application:

  • Cache hit/miss
  • Tool selection decision
  • Safety filter triggered
  • State transitions
with xeroml.start_as_current_observation(
    name="cache-hit",
    type="event",
    metadata={"cache_key": key, "ttl_remaining": 300},
) as event:
    pass  # events complete immediately

Nesting

All three types can be nested to any depth. XeroML uses OpenTelemetry context propagation to manage the hierarchy automatically — any observation created while another is active becomes its child.

trace
└── span: "rag-pipeline"
    ├── span: "retrieval"
    │   └── event: "cache-miss"
    ├── generation: "gpt-4o call"
    └── span: "post-processing"
        └── event: "safety-filter-passed"
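The context-propagation mechanism itself is easy to picture. The sketch below is a hypothetical, minimal stand-in for XeroML's internals (not its actual implementation), using Python's `contextvars` to track the currently active observation: whatever is active when a new observation starts becomes that observation's parent, which is exactly how the tree above is assembled.

```python
# Hypothetical sketch of context-based parent/child tracking.
import contextvars
from contextlib import contextmanager

# Holds the observation that is currently "active" in this context.
_current = contextvars.ContextVar("current_observation", default=None)

class Observation:
    def __init__(self, name, type):
        self.name = name
        self.type = type
        self.children = []

@contextmanager
def start_as_current_observation(name, type="span"):
    obs = Observation(name, type)
    parent = _current.get()
    if parent is not None:
        # Created while another observation is active -> becomes its child.
        parent.children.append(obs)
    token = _current.set(obs)
    try:
        yield obs
    finally:
        # Restore the previous active observation on exit.
        _current.reset(token)

# Rebuild part of the tree above: nesting in code produces nesting in the trace.
with start_as_current_observation("rag-pipeline") as pipeline:
    with start_as_current_observation("retrieval"):
        with start_as_current_observation("cache-miss", type="event"):
            pass
    with start_as_current_observation("gpt-4o call", type="generation"):
        pass

print([c.name for c in pipeline.children])  # → ['retrieval', 'gpt-4o call']
```

Because the active observation is stored in a `ContextVar` rather than a global, this scheme also keeps hierarchies separate across concurrent async tasks, which is the property OpenTelemetry context propagation provides.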