
XeroML Documentation

XeroML is compliance observability for every AI agent in finance — logged, auditable, regulator-ready.

What is XeroML?

XeroML gives finance teams the controls to monitor, verify, and govern AI agents end to end. Whether you’re investigating an incident, enforcing policy before execution, preparing audit evidence, or reviewing model behavior, XeroML keeps every action logged, auditable, and regulator-ready.

Observability

Trace every LLM call, tool use, and retrieval step. Understand what’s happening inside your application with structured, searchable logs.

Get started with tracing →

Prompt Management

Version, deploy, and iterate on prompts without code changes. Separate prompt iteration from engineering deployments.

Manage your prompts →

Evaluation

Score outputs with LLM-as-a-judge, human annotation, or custom logic. Build datasets and run experiments systematically.

Set up evaluations →

Compliance Operations

Enforce policy rules on traces, route risky outcomes to reviewers, and maintain tamper-evident audit records.

Explore compliance workflows →

API & Data Platform

Access all your data programmatically. Export traces, query metrics, and integrate with your existing data stack.

Explore the API →

Core Capabilities

Observability

AI applications are non-deterministic: a request that succeeded yesterday may fail today. XeroML’s application tracing captures a structured log of every request: the exact prompt sent, the model’s response, token usage, latency, and any tool calls or retrieval steps in between. Capture is automatic, and because all SDK calls are asynchronous, logging adds minimal overhead to your application.

Key tracing features:

  • Track all LLM and non-LLM operations (retrieval, embeddings, API calls)
  • Multi-turn conversation monitoring via Sessions
  • Visualize agent graphs with nested observations
  • 50+ native integrations: OpenAI, LangChain, LlamaIndex, Vercel AI SDK, and more
  • OpenTelemetry-based architecture to reduce vendor lock-in
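To make "nested observations" concrete, here is a minimal sketch of how a trace could be modeled as a tree of steps. The `Observation` class, its field names, and the helper functions are illustrative assumptions, not XeroML's actual schema or SDK.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Observation:
    """One step inside a trace (hypothetical shape, not XeroML's schema)."""
    name: str
    kind: str  # e.g. "agent", "llm", "retrieval", "tool"
    latency_ms: float
    input_tokens: int = 0
    output_tokens: int = 0
    children: list[Observation] = field(default_factory=list)


def total_tokens(obs: Observation) -> int:
    """Sum token usage over an observation and all of its nested children."""
    own = obs.input_tokens + obs.output_tokens
    return own + sum(total_tokens(child) for child in obs.children)


def render(obs: Observation, depth: int = 0) -> list[str]:
    """Return one indented line per observation, parents before children."""
    lines = [f"{'  ' * depth}{obs.kind}:{obs.name} ({obs.latency_ms:.0f} ms)"]
    for child in obs.children:
        lines.extend(render(child, depth + 1))
    return lines


# An agent run with one retrieval step and one LLM call nested beneath it.
root = Observation("answer-question", "agent", 950.0, children=[
    Observation("search-policies", "retrieval", 120.0),
    Observation("draft-answer", "llm", 800.0, input_tokens=420, output_tokens=85),
])

print("\n".join(render(root)))
print("total tokens:", total_tokens(root))
```

Keeping non-LLM steps such as retrieval in the same tree as LLM calls is what lets latency and token usage be attributed to the step that caused them.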

Prompt Management

Rather than hardcoding prompts in your application, XeroML provides a central store where prompts are versioned, labeled, and deployed independently of code. Non-technical teammates can iterate on prompts via the UI while engineers control deployments through labels — no code review required for a text change.

  • Instant production deployment via labels (no code redeploy)
  • Client-side SDK caching, so cached prompts are read from memory rather than fetched over the network
  • Link prompts to traces to analyze per-version performance
  • Interactive Playground for testing prompts before deployment
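The combination of versioning, labels, and client-side caching can be sketched as below. This is a simplified in-memory model under stated assumptions: `PromptStore`, `CachedPromptClient`, the `production` label name, and the 60-second TTL are all hypothetical and do not reflect XeroML's real SDK.

```python
import time
from typing import Callable


class PromptStore:
    """In-memory stand-in for a central prompt store (hypothetical)."""

    def __init__(self) -> None:
        self._versions: dict[str, list[str]] = {}
        self._labels: dict[tuple[str, str], int] = {}

    def publish(self, name: str, text: str) -> int:
        """Store a new version and return its number (versions start at 1)."""
        versions = self._versions.setdefault(name, [])
        versions.append(text)
        return len(versions)

    def set_label(self, name: str, label: str, version: int) -> None:
        """Point a label such as "production" at an existing version."""
        if not 1 <= version <= len(self._versions.get(name, [])):
            raise ValueError(f"unknown version {version} for prompt {name!r}")
        self._labels[(name, label)] = version

    def fetch(self, name: str, label: str) -> str:
        return self._versions[name][self._labels[(name, label)] - 1]


class CachedPromptClient:
    """Serves prompts from a local cache until the entry's TTL expires."""

    def __init__(self, store: PromptStore, ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._store = store
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: dict[tuple[str, str], tuple[float, str]] = {}

    def get(self, name: str, label: str = "production") -> str:
        key = (name, label)
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # fresh cache entry: no round trip to the store
        text = self._store.fetch(name, label)
        self._cache[key] = (now, text)
        return text


store = PromptStore()
v1 = store.publish("kyc-summary", "Summarize the KYC file.")
v2 = store.publish("kyc-summary", "Summarize the KYC file in three bullet points.")
store.set_label("kyc-summary", "production", v1)

fake_now = [0.0]  # injectable clock so the example does not need to sleep
client = CachedPromptClient(store, ttl_seconds=60.0, clock=lambda: fake_now[0])

first = client.get("kyc-summary")
store.set_label("kyc-summary", "production", v2)  # "deploy" without touching code
cached = client.get("kyc-summary")                # still served from the cache
fake_now[0] = 61.0
refreshed = client.get("kyc-summary")             # TTL expired, picks up v2
```

In this sketch, moving the label is the deployment step. A running client picks up the new version once its cached entry expires, so the TTL is the trade-off between read speed and how quickly a label change becomes visible.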

Evaluation

Quality assurance for LLM applications requires more than unit tests. XeroML supports multiple evaluation approaches that can be run in production and offline:

  • LLM-as-a-Judge — scalable automated evaluation for nuanced qualities
  • Human Annotation — structured annotation queues for ground truth
  • Custom Scores — numeric, boolean, or categorical scores via API/SDK
  • Datasets & Experiments — systematic testing before every deployment
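The three custom score types listed above could be represented along these lines. The `Score` record and the `make_score` and `mean_numeric` helpers are assumptions made for illustration, not XeroML's API.

```python
from dataclasses import dataclass
from typing import Optional, Union

ScoreValue = Union[float, bool, str]


@dataclass(frozen=True)
class Score:
    """A single evaluation result attached to a trace (hypothetical shape)."""
    trace_id: str
    name: str
    data_type: str  # "numeric", "boolean", or "categorical"
    value: ScoreValue


def make_score(trace_id: str, name: str, value: object) -> Score:
    """Infer the score type from the Python value and build a Score."""
    # bool must be checked before int/float because bool is a subclass of int.
    if isinstance(value, bool):
        return Score(trace_id, name, "boolean", value)
    if isinstance(value, (int, float)):
        return Score(trace_id, name, "numeric", float(value))
    if isinstance(value, str):
        return Score(trace_id, name, "categorical", value)
    raise TypeError(f"unsupported score value: {value!r}")


def mean_numeric(scores: list[Score], name: str) -> Optional[float]:
    """Average all numeric scores with the given name, or None if there are none."""
    values = [s.value for s in scores if s.name == name and s.data_type == "numeric"]
    return sum(values) / len(values) if values else None


scores = [
    make_score("trace-1", "accuracy", 0.5),
    make_score("trace-2", "accuracy", 1),
    make_score("trace-1", "pii_leak", False),
    make_score("trace-2", "risk_tier", "high"),
]
print(mean_numeric(scores, "accuracy"))
```

Recording the data type explicitly keeps boolean and categorical results out of numeric aggregates.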

Quickstart

Choose where you want to start:

  1. Trace your first LLM call → Observability: Get Started
  2. Manage prompts outside code → Prompt Management: Get Started
  3. Score your application outputs → Evaluation: Core Concepts
  4. Set up compliance guardrails → Compliance: Get Started

Regulator Readiness

XeroML is built for finance teams that need provable AI governance. Every agent decision can be traced, reviewed, and reported with complete audit context.
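One common way to make audit records tamper-evident, as mentioned under Compliance Operations, is a hash chain in which each record commits to the hash of the one before it. The sketch below illustrates that general technique only. This page does not describe how XeroML implements its audit records, and the record fields and function names here are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def _digest(prev_hash: str, payload: dict) -> str:
    """Hash the previous record's hash together with a canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_record(log: list, payload: dict) -> dict:
    """Append a record whose hash commits to the payload and the prior record."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    record = {
        "payload": payload,
        "prev_hash": prev_hash,
        "hash": _digest(prev_hash, payload),
    }
    log.append(record)
    return record


def verify(log: list) -> bool:
    """Recompute every hash in order; any edited or reordered record fails."""
    prev_hash = GENESIS
    for record in log:
        if record["prev_hash"] != prev_hash:
            return False
        if record["hash"] != _digest(prev_hash, record["payload"]):
            return False
        prev_hash = record["hash"]
    return True


log: list = []
append_record(log, {"agent": "loan-screener", "action": "approve", "amount": 12000})
append_record(log, {"agent": "loan-screener", "action": "escalate", "amount": 90000})

ok_before = verify(log)
log[0]["payload"]["amount"] = 1  # simulate someone editing an old record
ok_after = verify(log)
print(ok_before, ok_after)
```

A chain like this only detects tampering if the most recent hash is also stored somewhere the editor of the log cannot change. Otherwise the whole chain could be recomputed after an edit.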