
XeroML Documentation

XeroML is an open-source LLM engineering platform. Collaboratively debug, analyze, and iterate on your LLM applications with natively integrated observability, prompt management, and evaluation.

What is XeroML?

XeroML gives teams the tools to collaboratively develop and monitor LLM applications. Whether you’re debugging a production issue, iterating on prompts, or running systematic evaluations, XeroML brings together the full lifecycle in one place.

Observability

Trace every LLM call, tool use, and retrieval step. Understand what’s happening inside your application with structured, searchable logs.

Get started with tracing →

Prompt Management

Version, deploy, and iterate on prompts without code changes. Separate prompt iteration from engineering deployments.

Manage your prompts →

Evaluation

Score outputs with LLM-as-a-judge, human annotation, or custom logic. Build datasets and run experiments systematically.

Set up evaluations →

API & Data Platform

Access all your data programmatically. Export traces, query metrics, and integrate with your existing data stack.

Explore the API →
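To illustrate what programmatic access might look like, here is a minimal sketch that builds a paginated trace-query URL. The base URL, endpoint path, and parameter names (`page`, `limit`, `name`) are assumptions for illustration, not the documented XeroML API.

```python
from urllib.parse import urlencode

# Hypothetical base URL and endpoint; the real API surface may differ.
BASE_URL = "https://cloud.example.com/api/public"

def traces_url(page=1, limit=50, name=None):
    """Build a query URL for exporting traces, with optional name filter."""
    params = {"page": page, "limit": limit}
    if name:
        params["name"] = name
    return f"{BASE_URL}/traces?{urlencode(params)}"

url = traces_url(page=2, name="chat-completion")
print(url)
```

In practice you would send this request with an authenticated HTTP client and page through results until the response is empty.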

Core Capabilities

Observability

AI applications are non-deterministic — an identical request that succeeded yesterday may fail today. XeroML’s application tracing captures structured logs of every request: the exact prompt sent, the model’s response, token usage, latency, and any tools or retrieval steps in between. This data is captured automatically with minimal performance impact, as all SDK calls are asynchronous.

Key tracing features:

  • Track all LLM and non-LLM operations (retrieval, embeddings, API calls)
  • Multi-turn conversation monitoring via Sessions
  • Visualize agent graphs with nested observations
  • 50+ native integrations: OpenAI, LangChain, LlamaIndex, Vercel AI SDK, and more
  • OpenTelemetry-based architecture to reduce vendor lock-in
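To make the structure of such a trace concrete, here is a minimal stdlib sketch of the data a trace record might hold — one trace containing observations for LLM and non-LLM operations, each with input, output, token usage, and latency. The `Trace` and `Observation` names and the toy token count are illustrative, not the XeroML SDK.

```python
import time
import uuid
from dataclasses import dataclass, field

@dataclass
class Observation:
    """One recorded step: an LLM call, retrieval, embedding, or API call."""
    name: str
    input: str
    output: str = ""
    tokens: int = 0
    latency_ms: float = 0.0

@dataclass
class Trace:
    """One request's worth of observations, grouped under a trace id."""
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    observations: list = field(default_factory=list)

    def observe(self, name, input, fn):
        """Run fn(input), recording its output and latency as an observation."""
        start = time.perf_counter()
        output = fn(input)
        self.observations.append(Observation(
            name=name,
            input=input,
            output=output,
            tokens=len(output.split()),  # toy token count for illustration
            latency_ms=(time.perf_counter() - start) * 1000,
        ))
        return output

trace = Trace()
answer = trace.observe(
    "llm-call",
    "What is XeroML?",
    lambda q: "An LLM engineering platform.",  # stand-in for a model call
)
```

A real SDK would ship these records asynchronously in the background, which is how the "minimal performance impact" claim above is typically achieved.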

Prompt Management

Rather than hardcoding prompts in your application, XeroML provides a central store where prompts are versioned, labeled, and deployed independently of code. Non-technical teammates can iterate on prompts via the UI while engineers control deployments through labels — no code review required for a text change.

  • Instant production deployment via labels (no code redeploy)
  • Client-side SDK caching — as fast as reading from memory
  • Link prompts to traces to analyze per-version performance
  • Interactive Playground for testing prompts before deployment
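The client-side caching point above can be sketched in a few lines: fetch a prompt by name and label once, then serve it from memory until a TTL expires. `PromptClient` and its method names are hypothetical; `fetch` stands in for a network call to the prompt store.

```python
import time

class PromptClient:
    """Illustrative prompt client with an in-memory TTL cache."""

    def __init__(self, fetch_fn, ttl_seconds=60.0):
        self._fetch = fetch_fn
        self._ttl = ttl_seconds
        self._cache = {}  # (name, label) -> (prompt, fetched_at)

    def get_prompt(self, name, label="production"):
        key = (name, label)
        cached = self._cache.get(key)
        if cached and time.monotonic() - cached[1] < self._ttl:
            return cached[0]  # cache hit: no network round trip
        prompt = self._fetch(name, label)
        self._cache[key] = (prompt, time.monotonic())
        return prompt

calls = []
def fetch(name, label):
    calls.append((name, label))  # count simulated network fetches
    return f"You are a helpful assistant. ({name}@{label})"

client = PromptClient(fetch)
first = client.get_prompt("support-bot")
second = client.get_prompt("support-bot")  # served from cache
```

Because repeated reads hit the cache, moving prompts out of code need not add a network round trip to every request.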

Evaluation

Quality assurance for LLM applications requires more than unit tests. XeroML supports multiple evaluation approaches that can be run in production and offline:

  • LLM-as-a-Judge — scalable automated evaluation for nuanced qualities
  • Human Annotation — structured annotation queues for ground truth
  • Custom Scores — numeric, boolean, or categorical scores via API/SDK
  • Datasets & Experiments — systematic testing before every deployment
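As a sketch of the custom-scores point, the snippet below validates a score record of each supported type (numeric, boolean, categorical) before it would be submitted via an API or SDK. The field names, category set, and `make_score` helper are assumptions for illustration.

```python
ALLOWED_CATEGORIES = {"good", "bad", "neutral"}  # example category set

def make_score(trace_id, name, value, data_type):
    """Validate a score record; raise if the value doesn't match its type."""
    if data_type == "NUMERIC":
        # bool is a subclass of int in Python, so exclude it explicitly
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError("numeric score requires a number")
    elif data_type == "BOOLEAN":
        if not isinstance(value, bool):
            raise TypeError("boolean score requires True/False")
    elif data_type == "CATEGORICAL":
        if value not in ALLOWED_CATEGORIES:
            raise ValueError(f"unknown category: {value!r}")
    else:
        raise ValueError(f"unknown data type: {data_type!r}")
    return {"traceId": trace_id, "name": name,
            "value": value, "dataType": data_type}

score = make_score("trace-123", "helpfulness", 0.9, "NUMERIC")
```

Validating scores client-side keeps experiment data clean before it ever reaches the evaluation store.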

Quickstart

Choose where you want to start:

  1. Trace your first LLM call → Observability: Get Started
  2. Manage prompts outside code → Prompt Management: Get Started
  3. Score your application outputs → Evaluation: Core Concepts

Open Source

XeroML is fully open source. The core platform, SDKs, and integrations are all MIT-licensed. You can self-host XeroML on your own infrastructure or use XeroML Cloud.