Datasets

Datasets are collections of test cases used to evaluate your LLM application systematically. Each item in a dataset provides an input scenario (and optionally an expected output) that your application is run against during an experiment.

Creating a Dataset

  1. Navigate to Evaluation → Datasets
  2. Click New Dataset
  3. Enter a name. Use / separators to organize into virtual folders: evaluation/qa-dataset
  4. Optionally add a description and metadata
  5. Click Create
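The same dataset can also be created from code. A minimal sketch, assuming a configured client; xeroml.create_dataset appears later on this page, but the description keyword here is an assumption:

```python
# The "/" separators in the name create virtual folders, as in the UI steps above.
name = "evaluation/qa-dataset"

# Hedged sketch of the SDK call (the `description` keyword is an assumption):
# dataset = xeroml.create_dataset(
#     name=name,
#     description="QA pairs for regression evaluation",
# )

# The first path segment is the virtual folder; the rest is the dataset's leaf name.
folder, leaf = name.split("/", 1)
```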

Adding Dataset Items

Add items to an existing dataset with upsert_item:

dataset = xeroml.get_dataset("evaluation/qa-dataset")
dataset.upsert_item(
    input={"question": "How do I reset my API key?"},
    expected_output={"answer": "Go to Settings → API Keys and click Regenerate."},
    metadata={"source": "support-ticket-1234"},
)

Add multiple items at once:

items = [
    {"input": {"question": q}, "expected_output": {"answer": a}}
    for q, a in qa_pairs
]
for item in items:
    dataset.upsert_item(**item)
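The qa_pairs iterable above can come from any source. A minimal sketch that builds it from CSV text using only the standard library; the column names and the sample row are assumptions for illustration:

```python
import csv
import io

# Hypothetical CSV export; the "question"/"answer" column names are assumptions.
csv_text = "question,answer\nHow do I reset my API key?,Regenerate it under Settings.\n"

# Parse the CSV into (question, answer) tuples, then shape them as dataset items.
qa_pairs = [
    (row["question"], row["answer"])
    for row in csv.DictReader(io.StringIO(csv_text))
]
items = [
    {"input": {"question": q}, "expected_output": {"answer": a}}
    for q, a in qa_pairs
]
```

In a real pipeline, io.StringIO would be replaced by an open file handle, and each item would then be passed to dataset.upsert_item as shown above.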

Dataset Organization

Use / in dataset names to create virtual folder hierarchies:

evaluation/
    qa-dataset
    rag-evaluation
    safety-checks
experiments/
    prompt-v2-baseline
    model-comparison

The UI displays these as nested folders automatically.
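The grouping the UI performs can be reproduced locally by splitting names on the first /. A small illustrative sketch using the dataset names from the listing above:

```python
from collections import defaultdict

names = [
    "evaluation/qa-dataset",
    "evaluation/rag-evaluation",
    "evaluation/safety-checks",
    "experiments/prompt-v2-baseline",
    "experiments/model-comparison",
]

# Group dataset names into one level of virtual folders, as the UI does.
folders: dict[str, list[str]] = defaultdict(list)
for name in names:
    folder, _, leaf = name.partition("/")
    folders[folder].append(leaf)
```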

Schema Validation

Optionally enforce a JSON Schema on your dataset items to ensure consistency:

dataset = xeroml.create_dataset(
    name="structured-qa",
    input_schema={
        "type": "object",
        "properties": {
            "question": {"type": "string"},
            "context": {"type": "string"},
        },
        "required": ["question"],
    },
    expected_output_schema={
        "type": "object",
        "properties": {
            "answer": {"type": "string"},
        },
        "required": ["answer"],
    },
)

Items that don’t match the schema are rejected with a detailed error message.
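Validation happens when items are submitted, but a client-side pre-check can fail faster during bulk imports. A minimal sketch that mirrors only the required part of the schema above; this is not a full JSON Schema validator:

```python
def missing_required(item: dict, schema: dict) -> list[str]:
    """Return the keys from the schema's 'required' array that the item lacks."""
    return [key for key in schema.get("required", []) if key not in item]

# The input schema from the create_dataset example above.
input_schema = {
    "type": "object",
    "properties": {
        "question": {"type": "string"},
        "context": {"type": "string"},
    },
    "required": ["question"],
}
```

Checking items locally before calling upsert_item avoids a round trip per rejected item; the server-side schema check remains the source of truth.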

Dataset Versioning

Every modification to a dataset (add, update, delete, archive item) creates a new version timestamp. When running experiments, XeroML records which dataset version was used, making results reproducible.

To retrieve a dataset as it was at a specific point in time:

dataset = xeroml.get_dataset("evaluation/qa-dataset", as_of="2025-01-15T10:00:00Z")
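The as_of value in the example is a UTC timestamp in ISO 8601 / RFC 3339 form. One way to build it with the standard library, matching the trailing-Z convention used above:

```python
from datetime import datetime, timezone

# Build a UTC timestamp; replace the "+00:00" offset with the "Z" suffix.
as_of = (
    datetime(2025, 1, 15, 10, 0, tzinfo=timezone.utc)
    .isoformat()
    .replace("+00:00", "Z")
)
```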

Next Steps

Once you have a dataset, run experiments against it:

  - Experiments via SDK
  - Prompt Experiments (UI)