Prompt Experiments
Prompt Experiments let you test different prompt versions against a dataset directly from the XeroML UI. No SDK code required — select a dataset, choose prompt versions to compare, and XeroML runs the experiment using your configured LLM Connection.
When to Use
- Quick prompt iteration when you don’t want to set up SDK experiment code
- Comparing prompt versions before deciding which to deploy
- Sharing experiment results with non-technical team members
- Rapid prototyping with the Playground before formalizing an experiment
Running a Prompt Experiment
- Navigate to Evaluation → Experiments
- Click New Prompt Experiment
- Select the dataset to run against
- Select the prompt to evaluate and the versions to compare
- Fill in any prompt variables that stay constant across all runs (variables that differ per item are filled from the dataset)
- Select the model (from your configured LLM Connections)
- Optionally configure an evaluator to score outputs automatically
- Click Run
XeroML runs each prompt version against every dataset item and creates traces for each run. Results appear in the Experiments dashboard as they complete.
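Conceptually, the run loop is a cross product of prompt versions and dataset items. The sketch below illustrates that shape only; the function and variable names (`run_prompt_experiment`, `call_model`, and so on) are hypothetical and not part of any XeroML API.

```python
def run_prompt_experiment(versions, dataset, call_model):
    """Run every prompt version against every dataset item.

    `versions` maps a version name to a prompt template, `dataset` is a
    list of dicts of prompt variables, and `call_model` is any callable
    that takes the rendered prompt and returns the model output.
    """
    results = []
    for name, template in versions.items():
        for item in dataset:
            prompt = template.format(**item)   # fill per-item variables
            output = call_model(prompt)        # one trace per run in the UI
            results.append({"version": name, "item": item, "output": output})
    return results

# Example with a stand-in "model" that just upper-cases the prompt:
versions = {"v1": "Summarize: {text}", "v2": "Briefly summarize: {text}"}
dataset = [{"text": "alpha"}, {"text": "beta"}, {"text": "gamma"}]
results = run_prompt_experiment(versions, dataset, call_model=lambda p: p.upper())
# 2 versions x 3 items -> 6 runs
```

Two versions over a three-item dataset produce six runs, which is also how many traces you should expect to see in the dashboard.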
Viewing Results
After the experiment completes:
- Per-item results — compare each prompt version’s output on every dataset item, side by side
- Aggregate scores — average scores per version across all items
- Cost and latency — token usage and response time per version
Use this to make data-driven decisions about which prompt version to promote to production.
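The aggregate view boils down to averaging evaluator scores per version. A minimal sketch of that computation (the `aggregate_by_version` name and the run/score dict shape are assumptions for illustration, not XeroML's data model):

```python
from collections import defaultdict

def aggregate_by_version(scored_runs):
    """Average evaluator scores per prompt version."""
    totals = defaultdict(lambda: {"score": 0.0, "n": 0})
    for run in scored_runs:
        bucket = totals[run["version"]]
        bucket["score"] += run["score"]
        bucket["n"] += 1
    return {v: b["score"] / b["n"] for v, b in totals.items()}

runs = [
    {"version": "v1", "score": 1.0},
    {"version": "v1", "score": 0.5},
    {"version": "v2", "score": 1.0},
]
averages = aggregate_by_version(runs)  # {"v1": 0.75, "v2": 1.0}
```

The same grouping can be applied to token counts and latencies to reproduce the cost and latency columns.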
Requirements
- At least one LLM Connection configured in project settings
- A dataset with at least one item
- A prompt with at least one version
Limitations
- Prompt Experiments run in the UI are limited to the models available through your LLM Connections
- For experiments that require custom task logic (e.g., multi-step pipelines, retrieval), use Experiments via SDK
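To make the boundary concrete, here is the kind of multi-step task the UI cannot express: a retrieval step feeding a generation step. Everything here is a toy sketch with hypothetical names (`retrieve`, `custom_task`, `call_model`); an SDK experiment would wrap logic like this as its task function.

```python
def retrieve(query, corpus, k=2):
    """Toy retrieval step: rank corpus docs by word overlap with the query."""
    def overlap(doc):
        return len(set(query.lower().split()) & set(doc.lower().split()))
    return sorted(corpus, key=overlap, reverse=True)[:k]

def custom_task(item, corpus, call_model):
    """Multi-step pipeline: retrieve context, then generate an answer.

    This per-item logic is what UI Prompt Experiments cannot run, so it
    belongs in SDK-driven experiment code.
    """
    context = retrieve(item["question"], corpus)
    prompt = f"Context: {' | '.join(context)}\nQuestion: {item['question']}"
    return call_model(prompt)

corpus = ["cats sleep a lot", "dogs bark loudly", "fish swim in water"]
answer = custom_task(
    {"question": "why do dogs bark"},
    corpus,
    call_model=lambda p: p.splitlines()[0],  # stand-in model echoes the context line
)
```

In a real SDK experiment, `call_model` would be an actual LLM call and the task would run once per dataset item, just like the UI flow.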