Annotation Queues
Annotation Queues are structured workflows for human review of traces. They let you route specific traces to reviewers who score them according to a rubric — creating high-quality ground truth labels that can serve as baselines for automated evaluators or dataset creation.
When to Use Annotation Queues
- Creating ground truth — when you need gold-standard labels from domain experts
- Evaluating subjective quality — dimensions like tone, brand voice, or domain-specific accuracy that LLM judges may not reliably score
- Reviewing flagged traces — routing low-confidence or edge-case traces for human inspection
- Calibrating automated evaluators — comparing LLM-as-a-Judge scores against human judgments to validate evaluator quality
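For calibration, one common check is how well an LLM judge's numeric scores track human rubric scores on the same traces. A minimal sketch using Pearson correlation — the score values below are illustrative, not pulled from any real queue:

```python
def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Illustrative data: LLM-as-a-Judge scores (0-1) vs. human rubric scores (1-5)
# for the same five traces.
judge = [0.9, 0.4, 0.8, 0.2, 0.7]
human = [5, 2, 4, 1, 4]
print(round(pearson(judge, human), 2))  # → 0.99
```

A high correlation suggests the automated evaluator ranks traces similarly to humans; a low one is a signal to revise the judge prompt or rubric.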
How Queues Work
- Create a queue with a scoring rubric (which dimensions to score, valid values)
- Add traces to the queue — manually, via filters, or automatically based on evaluator triggers
- Reviewers work through the queue in the annotation UI, scoring each trace
- Scores are stored as human-sourced scores in XeroML and linked to the traces
Multiple reviewers can score the same traces for inter-annotator agreement measurement.
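As a sketch of that measurement, Cohen's kappa quantifies agreement between two reviewers beyond chance. The labels below are illustrative, not tied to XeroML's export format:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two reviewers' labels over the same traces."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement under independence, from marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b.get(k, 0) for k in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Two reviewers' boolean "accuracy" verdicts on the same five traces:
reviewer_1 = [True, True, False, True, False]
reviewer_2 = [True, True, False, False, False]
print(round(cohens_kappa(reviewer_1, reviewer_2), 2))  # → 0.62
```

Values near 1.0 indicate strong agreement; values near 0 mean the reviewers agree no more than chance would predict, which usually points to an ambiguous rubric.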
Creating a Queue
- Navigate to Evaluations → Annotation Queues
- Click New Queue
- Define the scoring dimensions:
  - Name (e.g., accuracy, helpfulness)
  - Value type (numeric 1-5, boolean, categorical)
  - Instructions for reviewers
- Set access permissions (which team members can review)
- Save the queue
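Before clicking through the UI, it can help to draft the rubric as data and sanity-check it. The field names below are an illustrative sketch, not a XeroML schema:

```python
# Illustrative rubric draft; field names are assumptions, not a XeroML schema.
VALUE_TYPES = {"numeric", "boolean", "categorical"}

queue = {
    "name": "support-answer-review",
    "dimensions": [
        {"name": "accuracy", "type": "numeric", "range": (1, 5),
         "instructions": "Is the answer factually correct?"},
        {"name": "helpfulness", "type": "boolean",
         "instructions": "Would this answer resolve the user's issue?"},
    ],
}

def validate_queue(q):
    """Check every scoring dimension has a name, a known value type, and instructions."""
    for dim in q["dimensions"]:
        assert dim.get("name"), "dimension needs a name"
        assert dim.get("type") in VALUE_TYPES, f"unknown value type: {dim.get('type')}"
        assert dim.get("instructions"), "reviewers need instructions"
    return True

validate_queue(queue)
```

Writing instructions per dimension up front pays off later: vague instructions are the most common cause of low inter-annotator agreement.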
Adding Traces to a Queue
From the Traces list: Select traces and click Add to Queue
From an evaluator: Configure a live evaluator to route low-scoring traces to a queue automatically (e.g., route any trace scoring below 0.5 on accuracy to the human review queue)
Via API/SDK:
from xeroml import get_client
xeroml = get_client()
xeroml.add_to_annotation_queue(
    queue_id="queue_abc123",
    trace_id="trace_xyz789",
)

The Annotation Interface
Reviewers access their queue through the XeroML UI. Each item shows:
- The trace input and output
- The full observation tree (expandable)
- The scoring form for each dimension
- An optional free-text comment field
Reviewers can also view the full trace detail, linked prompt version, and any existing automated scores for context.
Exporting Annotations
Completed annotations can be exported for:
- Training fine-tuned evaluator models
- Inter-annotator agreement analysis
- Feeding into external analytics pipelines
Export via the UI or the Public API.
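As a sketch of downstream analysis, assuming each exported annotation is a record with a trace id, dimension name, and score (an illustrative shape — the real export schema may differ):

```python
from collections import defaultdict

def mean_scores_by_dimension(annotations):
    """Average human score per rubric dimension across exported annotations."""
    totals = defaultdict(lambda: [0.0, 0])
    for ann in annotations:
        t = totals[ann["dimension"]]
        t[0] += ann["score"]
        t[1] += 1
    return {dim: s / n for dim, (s, n) in totals.items()}

# Illustrative export records; field names are assumptions.
export = [
    {"trace_id": "trace_xyz789", "dimension": "accuracy", "score": 4},
    {"trace_id": "trace_xyz789", "dimension": "helpfulness", "score": 5},
    {"trace_id": "trace_abc123", "dimension": "accuracy", "score": 2},
]
print(mean_scores_by_dimension(export))  # → {'accuracy': 3.0, 'helpfulness': 5.0}
```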