Annotation Queues

Annotation Queues are structured workflows for human review of traces. They let you route specific traces to reviewers who score them according to a rubric — creating high-quality ground truth labels that can serve as baselines for automated evaluators or dataset creation.

When to Use Annotation Queues

  • Creating ground truth — when you need gold-standard labels from domain experts
  • Evaluating subjective quality — dimensions like tone, brand voice, or domain-specific accuracy that LLM judges may not reliably score
  • Reviewing flagged traces — routing low-confidence or edge-case traces for human inspection
  • Calibrating automated evaluators — comparing LLM-as-a-Judge scores against human judgments to validate evaluator quality
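The calibration use case can be sketched with a simple agreement check. Everything below is illustrative (the score data and variable names are assumptions, not part of the XeroML API): the idea is just to compare LLM-judge verdicts against human labels on the same traces.

```python
# Sketch: compare automated judge scores against human ground truth.
# The score pairs below are illustrative, not real XeroML data.
judge_scores = {"trace_1": 1, "trace_2": 0, "trace_3": 1, "trace_4": 1}
human_scores = {"trace_1": 1, "trace_2": 0, "trace_3": 0, "trace_4": 1}

shared = judge_scores.keys() & human_scores.keys()
agreement = sum(judge_scores[t] == human_scores[t] for t in shared) / len(shared)
print(f"Judge/human agreement: {agreement:.0%}")  # 3 of 4 traces match -> 75%
```

A low agreement rate suggests the evaluator prompt or rubric needs revision before its scores are trusted at scale.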

How Queues Work

  1. Create a queue with a scoring rubric (which dimensions to score, valid values)
  2. Add traces to the queue — manually, via filters, or automatically based on evaluator triggers
  3. Reviewers work through the queue in the annotation UI, scoring each trace
  4. Scores are stored in XeroML as human-sourced scores, linked back to the reviewed traces

Multiple reviewers can score the same traces for inter-annotator agreement measurement.
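One common agreement measure is Cohen's kappa, which corrects raw agreement for chance. The sketch below computes it for two reviewers scoring the same traces (the reviewer data is invented for illustration; XeroML does not ship this helper):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Inter-annotator agreement between two reviewers, corrected for chance."""
    n = len(labels_a)
    # Observed agreement: fraction of traces where both reviewers match.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement under independence, from each reviewer's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Two reviewers scoring the same five traces on a pass/fail dimension:
reviewer_1 = ["pass", "pass", "fail", "pass", "fail"]
reviewer_2 = ["pass", "fail", "fail", "pass", "fail"]
print(round(cohens_kappa(reviewer_1, reviewer_2), 2))  # 0.62
```

Kappa near 1.0 indicates strong agreement; values near 0 mean the reviewers agree no more than chance would predict, which usually points to an ambiguous rubric.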

Creating a Queue

  1. Navigate to Evaluations → Annotation Queues
  2. Click New Queue
  3. Define the scoring dimensions:
    • Name (e.g., accuracy, helpfulness)
    • Value type (numeric 1-5, boolean, categorical)
    • Instructions for reviewers
  4. Set access permissions (which team members can review)
  5. Save the queue
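To make the dimension types concrete, here is a minimal validation sketch. The rubric structure and function below are assumptions for illustration only; XeroML enforces its own schema server-side.

```python
# Hypothetical rubric mirroring the three value types above.
rubric = {
    "accuracy": {"type": "numeric", "min": 1, "max": 5},
    "is_on_brand": {"type": "boolean"},
    "tone": {"type": "categorical", "values": ["formal", "casual", "neutral"]},
}

def validate_score(dimension, value):
    """Check a reviewer's score against its rubric definition."""
    spec = rubric[dimension]
    if spec["type"] == "numeric":
        return spec["min"] <= value <= spec["max"]
    if spec["type"] == "boolean":
        return isinstance(value, bool)
    return value in spec["values"]  # categorical

print(validate_score("accuracy", 4))        # True
print(validate_score("tone", "sarcastic"))  # False: not in the allowed values
```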

Adding Traces to a Queue

From the Traces list: Select traces and click Add to Queue

From an evaluator: Configure a live evaluator to route low-scoring traces to a queue automatically (e.g., route any trace scoring below 0.5 on accuracy to the human review queue)
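The evaluator-based routing rule amounts to a threshold check. The sketch below is illustrative only; in XeroML this is configured on the live evaluator in the UI, not written by hand, and the function name here is an assumption.

```python
def should_route_to_queue(scores, dimension="accuracy", threshold=0.5):
    """Route a trace to human review when its automated score is below the threshold."""
    score = scores.get(dimension)
    return score is not None and score < threshold

print(should_route_to_queue({"accuracy": 0.3}))  # True: send to the review queue
print(should_route_to_queue({"accuracy": 0.9}))  # False: no review needed
```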

Via API/SDK:

from xeroml import get_client

xeroml = get_client()

xeroml.add_to_annotation_queue(
    queue_id="queue_abc123",
    trace_id="trace_xyz789",
)

The Annotation Interface

Reviewers access their queue through the XeroML UI. Each item shows:

  • The trace input and output
  • The full observation tree (expandable)
  • The scoring form for each dimension
  • An optional free-text comment field

Reviewers can also view the full trace detail, linked prompt version, and any existing automated scores for context.

Exporting Annotations

Completed annotations can be exported for:

  • Training fine-tuned evaluator models
  • Inter-annotator agreement analysis
  • Feeding into external analytics pipelines

Export annotations via the UI or the Public API.
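Exported annotations are typically converted into line-delimited JSON before being fed into fine-tuning or analytics pipelines. The record fields below are assumptions for illustration, not the XeroML export schema:

```python
import json

# Illustrative exported annotation records (field names are assumed).
annotations = [
    {"trace_id": "trace_xyz789", "dimension": "accuracy", "value": 4, "reviewer": "alice"},
    {"trace_id": "trace_xyz790", "dimension": "accuracy", "value": 2, "reviewer": "bob"},
]

# Serialize to JSONL, a common input format for training pipelines.
jsonl = "\n".join(json.dumps(rec) for rec in annotations)
print(jsonl)
```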