Custom Scores

Custom scores let you attach evaluation results to traces from any source — your own deterministic checks, user feedback, A/B test outcomes, or external evaluation systems. They’re the most flexible evaluation method in XeroML.

Use Cases

  • User feedback — store thumbs up/down or star ratings collected in your UI
  • Deterministic checks — regex match, JSON schema validation, exact string comparison
  • Business metrics — order completed, support ticket resolved, conversion rate
  • External evaluators — scores from third-party safety filters or domain-specific models
  • Custom LLM judges — run your own evaluation logic and push results to XeroML
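Deterministic checks are the simplest place to start, since they need no model at all. Below is a minimal sketch of two such checks; the function names (`check_json_output`, `check_no_email_leak`) are illustrative, not part of the XeroML SDK. Each returns a boolean you could store as a BOOLEAN custom score.

```python
import json
import re

def check_json_output(raw: str, required_keys: set) -> bool:
    """Deterministic check: output parses as JSON and contains the required keys."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and required_keys <= data.keys()

def check_no_email_leak(raw: str) -> bool:
    """Deterministic check: output contains nothing shaped like an email address."""
    return re.search(r"[\w.+-]+@[\w-]+\.[\w.]+", raw) is None

# Each check yields a value suitable for a BOOLEAN custom score.
passed = check_json_output('{"answer": "42", "source": "docs"}', {"answer", "source"})
```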

Adding Scores via SDK

Score a specific trace by ID:

from xeroml import get_client

xeroml = get_client()

xeroml.score(
    trace_id="trace_abc123",
    name="user-feedback",
    value=1,  # thumbs up
    comment="User clicked helpful",
    data_type="NUMERIC",
)

Score with categorical values:

xeroml.score(
    trace_id="trace_abc123",
    name="safety-check",
    value="pass",
    data_type="CATEGORICAL",
)

Score a specific observation (span or generation):

xeroml.score(
    trace_id="trace_abc123",
    observation_id="obs_xyz789",
    name="retrieval-relevance",
    value=0.92,
    data_type="NUMERIC",
)

Passing the Trace ID to Your Frontend

To score traces from user interactions, you need the trace ID in your frontend. Generate it deterministically from identifiers both sides already know, so the frontend can recompute the same ID even when it isn't threaded through intermediate layers:

from xeroml import create_trace_id, propagate_attributes

# Generate a deterministic trace ID from a session + request identifier
trace_id = create_trace_id(seed=f"session-{session_id}-msg-{message_id}")

with propagate_attributes(trace_id=trace_id):
    response = handle_request(user_message)

# Return the trace_id to the frontend alongside the response
return {"response": response, "trace_id": trace_id}

Then on the frontend, when the user submits feedback, send the trace_id back and call the XeroML API (or your backend which calls XeroML) to store the score.
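The value of seeding is that the same seed always produces the same ID. XeroML's actual derivation scheme inside `create_trace_id` may differ, but a hash-based sketch conveys the idea:

```python
import hashlib

def deterministic_trace_id(seed: str) -> str:
    """Sketch of deriving a stable trace ID from a seed string.

    Illustrative only: XeroML's create_trace_id may use a different
    scheme. The point is that the same seed always yields the same ID,
    so any component that knows the session/message identifiers can
    recompute it without coordination.
    """
    # 32 hex characters, the shape of a typical 128-bit trace ID
    return hashlib.sha256(seed.encode("utf-8")).hexdigest()[:32]

# Same seed -> same ID, on any machine, at any time
a = deterministic_trace_id("session-42-msg-7")
b = deterministic_trace_id("session-42-msg-7")
```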

Score Data Types

Type          Values         Use for
NUMERIC       Any float      Ratings, confidence scores, latency penalties
BOOLEAN       true / false   Pass/fail checks, binary feedback
CATEGORICAL   Any string     Named categories, multi-class labels
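When scores arrive from heterogeneous sources, it can help to derive the data type from the Python value rather than hard-coding it at each call site. A hypothetical helper (not part of the XeroML SDK) might look like:

```python
def infer_data_type(value) -> str:
    """Map a Python value to a score data type string.

    Hypothetical helper, not part of XeroML. Note that bool must be
    checked before int/float, because bool is a subclass of int.
    """
    if isinstance(value, bool):
        return "BOOLEAN"
    if isinstance(value, (int, float)):
        return "NUMERIC"
    if isinstance(value, str):
        return "CATEGORICAL"
    raise TypeError(f"Unsupported score value type: {type(value).__name__}")
```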

Viewing Custom Scores

Custom scores appear in the trace detail view alongside automated scores. In the Metrics dashboard, you can chart any score name over time, compare across prompt versions, and filter traces by score value.