Prompt Experiments
Prompt Experiments let you test different prompt versions against a dataset directly from the XeroML UI. No SDK code required — select a dataset, choose prompt versions to compare, and XeroML runs the experiment using your configured LLM Connection.
When to Use
- Quick prompt iteration when you don’t want to set up SDK experiment code
- Comparing prompt versions before deciding which to deploy
- Sharing experiment results with non-technical team members
- Rapid prototyping with the Playground before formalizing an experiment
Running a Prompt Experiment
- Navigate to Evaluation → Experiments
- Click New Prompt Experiment
- Select the dataset to run against
- Select the prompt to evaluate and the versions to compare
- Fill in any prompt variables that stay constant across all runs (variables that differ per item are filled from the dataset)
- Select the model (from your configured LLM Connections)
- Optionally configure an evaluator to score outputs automatically
- Click Run
XeroML runs each prompt version against every dataset item and creates traces for each run. Results appear in the Experiments dashboard as they complete.
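Conceptually, the run loop is a cross product of prompt versions and dataset items. The sketch below illustrates that shape only; the function and variable names (`run_prompt_experiment`, `call_model`, and so on) are hypothetical and not part of any XeroML API.

```python
def run_prompt_experiment(versions, dataset, call_model):
    """Run every prompt version against every dataset item.

    `versions` maps a version name to a prompt template, `dataset` is a
    list of dicts of prompt variables, and `call_model` is any callable
    that takes the rendered prompt and returns the model output.
    """
    results = []
    for name, template in versions.items():
        for item in dataset:
            prompt = template.format(**item)   # fill per-item variables
            output = call_model(prompt)        # one trace per run in the UI
            results.append({"version": name, "item": item, "output": output})
    return results

# Example with a stand-in "model" that just upper-cases the prompt:
versions = {"v1": "Summarize: {text}", "v2": "Briefly summarize: {text}"}
dataset = [{"text": "alpha"}, {"text": "beta"}, {"text": "gamma"}]
results = run_prompt_experiment(versions, dataset, call_model=lambda p: p.upper())
# 2 versions x 3 items -> 6 runs
```

Two versions over a three-item dataset produce six runs, which is also how many traces you should expect to see in the dashboard.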
Viewing Results
After the experiment completes:
- Per-item results — compare each prompt version’s output on every dataset item, side by side
- Aggregate scores — average scores per version across all items
- Cost and latency — token usage and response time per version
Use this to make data-driven decisions about which prompt version to promote to production.
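The aggregate view boils down to averaging evaluator scores per version. A minimal sketch of that computation (the `aggregate_by_version` name and the run/score dict shape are assumptions for illustration, not XeroML's data model):

```python
from collections import defaultdict

def aggregate_by_version(scored_runs):
    """Average evaluator scores per prompt version."""
    totals = defaultdict(lambda: {"score": 0.0, "n": 0})
    for run in scored_runs:
        bucket = totals[run["version"]]
        bucket["score"] += run["score"]
        bucket["n"] += 1
    return {v: b["score"] / b["n"] for v, b in totals.items()}

runs = [
    {"version": "v1", "score": 1.0},
    {"version": "v1", "score": 0.5},
    {"version": "v2", "score": 1.0},
]
averages = aggregate_by_version(runs)  # {"v1": 0.75, "v2": 1.0}
```

The same grouping can be applied to token counts and latencies to reproduce the cost and latency columns.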
Requirements
- At least one LLM Connection configured in project settings
- A dataset with at least one item
- A prompt with at least one version
Limitations
- Prompt Experiments run in the UI are limited to the models available through your LLM Connections
- For experiments that require custom task logic (e.g., multi-step pipelines, retrieval), use Experiments via SDK
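To make the boundary concrete, here is the kind of multi-step task the UI cannot express: a retrieval step feeding a generation step. Everything here is a toy sketch with hypothetical names (`retrieve`, `custom_task`, `call_model`); an SDK experiment would wrap logic like this as its task function.

```python
def retrieve(query, corpus, k=2):
    """Toy retrieval step: rank corpus docs by word overlap with the query."""
    def overlap(doc):
        return len(set(query.lower().split()) & set(doc.lower().split()))
    return sorted(corpus, key=overlap, reverse=True)[:k]

def custom_task(item, corpus, call_model):
    """Multi-step pipeline: retrieve context, then generate an answer.

    This per-item logic is what UI Prompt Experiments cannot run, so it
    belongs in SDK-driven experiment code.
    """
    context = retrieve(item["question"], corpus)
    prompt = f"Context: {' | '.join(context)}\nQuestion: {item['question']}"
    return call_model(prompt)

corpus = ["cats sleep a lot", "dogs bark loudly", "fish swim in water"]
answer = custom_task(
    {"question": "why do dogs bark"},
    corpus,
    call_model=lambda p: p.splitlines()[0],  # stand-in model echoes the context line
)
```

In a real SDK experiment, `call_model` would be an actual LLM call and the task would run once per dataset item, just like the UI flow.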