Custom Scores

Custom scores let you attach evaluation results to traces from any source — your own deterministic checks, user feedback, A/B test outcomes, or external evaluation systems. They’re the most flexible evaluation method in XeroML.

Use Cases

  • User feedback — store thumbs up/down or star ratings collected in your UI
  • Deterministic checks — regex match, JSON schema validation, exact string comparison
  • Business metrics — order completed, support ticket resolved, conversion rate
  • External evaluators — scores from third-party safety filters or domain-specific models
  • Custom LLM judges — run your own evaluation logic and push results to XeroML
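Deterministic checks are the simplest place to start, since they need no model at all. Below is a minimal sketch of two such checks; the function names (`check_json_output`, `check_no_email_leak`) are illustrative, not part of the XeroML SDK. Each returns a boolean you could store as a BOOLEAN custom score.

```python
import json
import re

def check_json_output(raw: str, required_keys: set) -> bool:
    """Deterministic check: output parses as JSON and contains the required keys."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and required_keys <= data.keys()

def check_no_email_leak(raw: str) -> bool:
    """Deterministic check: output contains nothing shaped like an email address."""
    return re.search(r"[\w.+-]+@[\w-]+\.[\w.]+", raw) is None

# Each check yields a value suitable for a BOOLEAN custom score.
passed = check_json_output('{"answer": "42", "source": "docs"}', {"answer", "source"})
```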

Adding Scores via SDK

Score a specific trace by ID:

from xeroml import get_client

xeroml = get_client()

xeroml.score(
    trace_id="trace_abc123",
    name="user-feedback",
    value=1,  # thumbs up
    comment="User clicked helpful",
    data_type="NUMERIC",
)

Score with categorical values:

xeroml.score(
    trace_id="trace_abc123",
    name="safety-check",
    value="pass",
    data_type="CATEGORICAL",
)

Score a specific observation (span or generation):

xeroml.score(
    trace_id="trace_abc123",
    observation_id="obs_xyz789",
    name="retrieval-relevance",
    value=0.92,
    data_type="NUMERIC",
)

Passing the Trace ID to Your Frontend

To score traces from user interactions, you need the trace ID in your frontend. Generate it deterministically from identifiers both sides already know, so the frontend can recompute the same ID even when it isn't threaded through intermediate layers:

from xeroml import create_trace_id, propagate_attributes

# Generate a deterministic trace ID from a session + request identifier
trace_id = create_trace_id(seed=f"session-{session_id}-msg-{message_id}")

with propagate_attributes(trace_id=trace_id):
    response = handle_request(user_message)

# Return the trace_id to the frontend alongside the response
return {"response": response, "trace_id": trace_id}

Then on the frontend, when the user submits feedback, send the trace_id back and call the XeroML API (or your backend which calls XeroML) to store the score.
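The value of seeding is that the same seed always produces the same ID. XeroML's actual derivation scheme inside `create_trace_id` may differ, but a hash-based sketch conveys the idea:

```python
import hashlib

def deterministic_trace_id(seed: str) -> str:
    """Sketch of deriving a stable trace ID from a seed string.

    Illustrative only: XeroML's create_trace_id may use a different
    scheme. The point is that the same seed always yields the same ID,
    so any component that knows the session/message identifiers can
    recompute it without coordination.
    """
    # 32 hex characters, the shape of a typical 128-bit trace ID
    return hashlib.sha256(seed.encode("utf-8")).hexdigest()[:32]

# Same seed -> same ID, on any machine, at any time
a = deterministic_trace_id("session-42-msg-7")
b = deterministic_trace_id("session-42-msg-7")
```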

Score Data Types

Type          Values         Use for
NUMERIC       Any float      Ratings, confidence scores, latency penalties
BOOLEAN       true / false   Pass/fail checks, binary feedback
CATEGORICAL   Any string     Named categories, multi-class labels
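When scores arrive from heterogeneous sources, it can help to derive the data type from the Python value rather than hard-coding it at each call site. A hypothetical helper (not part of the XeroML SDK) might look like:

```python
def infer_data_type(value) -> str:
    """Map a Python value to a score data type string.

    Hypothetical helper, not part of XeroML. Note that bool must be
    checked before int/float, because bool is a subclass of int.
    """
    if isinstance(value, bool):
        return "BOOLEAN"
    if isinstance(value, (int, float)):
        return "NUMERIC"
    if isinstance(value, str):
        return "CATEGORICAL"
    raise TypeError(f"Unsupported score value type: {type(value).__name__}")
```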

Viewing Custom Scores

Custom scores appear in the trace detail view alongside automated scores. In the Metrics dashboard, you can chart any score name over time, compare across prompt versions, and filter traces by score value.