Annotation Queues
Annotation Queues are structured workflows for human review of traces. They let you route specific traces to reviewers who score them according to a rubric — creating high-quality ground truth labels that can serve as baselines for automated evaluators or dataset creation.
When to Use Annotation Queues
- Creating ground truth — when you need gold-standard labels from domain experts
- Evaluating subjective quality — dimensions like tone, brand voice, or domain-specific accuracy that LLM judges may not reliably score
- Reviewing flagged traces — routing low-confidence or edge-case traces for human inspection
- Calibrating automated evaluators — comparing LLM-as-a-Judge scores against human judgments to validate evaluator quality
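For calibration, one common check is how well an LLM judge's numeric scores track human rubric scores on the same traces. A minimal sketch using Pearson correlation — the score values below are illustrative, not pulled from any real queue:

```python
def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Illustrative data: LLM-as-a-Judge scores (0-1) vs. human rubric scores (1-5)
# for the same five traces.
judge = [0.9, 0.4, 0.8, 0.2, 0.7]
human = [5, 2, 4, 1, 4]
print(round(pearson(judge, human), 2))  # → 0.99
```

A high correlation suggests the automated evaluator ranks traces similarly to humans; a low one is a signal to revise the judge prompt or rubric.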
How Queues Work
- Create a queue with a scoring rubric (which dimensions to score, valid values)
- Add traces to the queue — manually, via filters, or automatically based on evaluator triggers
- Reviewers work through the queue in the annotation UI, scoring each trace
- Scores are stored as human-sourced scores in XeroML and linked to the traces
Multiple reviewers can score the same traces for inter-annotator agreement measurement.
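As a sketch of that measurement, Cohen's kappa quantifies agreement between two reviewers beyond chance. The labels below are illustrative, not tied to XeroML's export format:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two reviewers' labels over the same traces."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement under independence, from marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b.get(k, 0) for k in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Two reviewers' boolean "accuracy" verdicts on the same five traces:
reviewer_1 = [True, True, False, True, False]
reviewer_2 = [True, True, False, False, False]
print(round(cohens_kappa(reviewer_1, reviewer_2), 2))  # → 0.62
```

Values near 1.0 indicate strong agreement; values near 0 mean the reviewers agree no more than chance would predict, which usually points to an ambiguous rubric.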
Creating a Queue
- Navigate to Evaluations → Annotation Queues
- Click New Queue
- Define the scoring dimensions:
  - Name (e.g., accuracy, helpfulness)
  - Value type (numeric 1-5, boolean, categorical)
  - Instructions for reviewers
- Set access permissions (which team members can review)
- Save the queue
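Before clicking through the UI, it can help to draft the rubric as data and sanity-check it. The field names below are an illustrative sketch, not a XeroML schema:

```python
# Illustrative rubric draft; field names are assumptions, not a XeroML schema.
VALUE_TYPES = {"numeric", "boolean", "categorical"}

queue = {
    "name": "support-answer-review",
    "dimensions": [
        {"name": "accuracy", "type": "numeric", "range": (1, 5),
         "instructions": "Is the answer factually correct?"},
        {"name": "helpfulness", "type": "boolean",
         "instructions": "Would this answer resolve the user's issue?"},
    ],
}

def validate_queue(q):
    """Check every scoring dimension has a name, a known value type, and instructions."""
    for dim in q["dimensions"]:
        assert dim.get("name"), "dimension needs a name"
        assert dim.get("type") in VALUE_TYPES, f"unknown value type: {dim.get('type')}"
        assert dim.get("instructions"), "reviewers need instructions"
    return True

validate_queue(queue)
```

Writing instructions per dimension up front pays off later: vague instructions are the most common cause of low inter-annotator agreement.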
Adding Traces to a Queue
From the Traces list: Select traces and click Add to Queue
From an evaluator: Configure a live evaluator to route low-scoring traces to a queue automatically (e.g., route any trace scoring below 0.5 on accuracy to the human review queue)
Via API/SDK:
from xeroml import get_client
xeroml = get_client()
xeroml.add_to_annotation_queue(
    queue_id="queue_abc123",
    trace_id="trace_xyz789",
)

The Annotation Interface
Reviewers access their queue through the XeroML UI. Each item shows:
- The trace input and output
- The full observation tree (expandable)
- The scoring form for each dimension
- An optional free-text comment field
Reviewers can also view the full trace detail, linked prompt version, and any existing automated scores for context.
Exporting Annotations
Completed annotations can be exported for:
- Training fine-tuned evaluator models
- Inter-annotator agreement analysis
- Feeding into external analytics pipelines
Export via the UI or the Public API.
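As a sketch of downstream analysis, assuming each exported annotation is a record with a trace id, dimension name, and score (an illustrative shape — the real export schema may differ):

```python
from collections import defaultdict

def mean_scores_by_dimension(annotations):
    """Average human score per rubric dimension across exported annotations."""
    totals = defaultdict(lambda: [0.0, 0])
    for ann in annotations:
        t = totals[ann["dimension"]]
        t[0] += ann["score"]
        t[1] += 1
    return {dim: s / n for dim, (s, n) in totals.items()}

# Illustrative export records; field names are assumptions.
export = [
    {"trace_id": "trace_xyz789", "dimension": "accuracy", "score": 4},
    {"trace_id": "trace_xyz789", "dimension": "helpfulness", "score": 5},
    {"trace_id": "trace_abc123", "dimension": "accuracy", "score": 2},
]
print(mean_scores_by_dimension(export))  # → {'accuracy': 3.0, 'helpfulness': 5.0}
```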