OpenAI SDK (Python)

The XeroML OpenAI Python integration wraps the official OpenAI client and automatically traces all API calls — chat completions, embeddings, image generation, and more.

Installation

pip install xeroml

The OpenAI integration is included in the core xeroml package.

Setup

  1. Set environment variables:

    XEROML_SECRET_KEY="sk-xm-..."
    XEROML_PUBLIC_KEY="pk-xm-..."
    XEROML_BASE_URL="https://cloud.xeroml.com"
    OPENAI_API_KEY="sk-..."
  2. Replace your OpenAI import:

    # Before
    import openai
    # After
    from xeroml.openai import openai
  3. Use normally — all calls are automatically traced:

    from xeroml.openai import openai

    response = openai.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Explain XeroML in one sentence."},
        ],
    )

What Gets Captured

For each API call, XeroML automatically records:

  • Model name and parameters (temperature, max_tokens, etc.)
  • Input messages (prompt)
  • Completion response
  • Token usage (prompt tokens, completion tokens, total)
  • Estimated cost
  • Latency (time to first token, total response time)
  • Finish reason
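The cost estimate is derived from the reported token usage. As an illustration of the arithmetic (the per-million-token prices below are placeholders, not XeroML's actual pricing table), a per-call estimate can be computed like this:

```python
# Illustrative sketch of cost estimation from token counts.
# The prices below are hypothetical placeholders, not real pricing data.
PRICES_PER_MILLION = {
    "gpt-4o": {"prompt": 2.50, "completion": 10.00},  # hypothetical USD values
}

def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Return an estimated USD cost for one API call."""
    p = PRICES_PER_MILLION[model]
    return (prompt_tokens * p["prompt"] + completion_tokens * p["completion"]) / 1_000_000
```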

Adding User and Session Context

Use propagate_attributes() to attach user and session context to OpenAI calls:

from xeroml.openai import openai
from xeroml import propagate_attributes

def handle_chat(message: str, user_id: str, session_id: str) -> str:
    with propagate_attributes(user_id=user_id, session_id=session_id):
        response = openai.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": message}],
        )
    return response.choices[0].message.content
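Attribute propagation of this kind typically relies on context-local state, so every call made inside the `with` block (including calls in helper functions) sees the active user and session. A minimal sketch of that pattern using the standard-library `contextvars` module, for illustration only and not XeroML's actual implementation:

```python
from contextlib import contextmanager
from contextvars import ContextVar

# Context-local store for trace attributes; visible to all code running
# inside the with-block, including nested function calls.
_attributes: ContextVar[dict] = ContextVar("attributes", default={})

@contextmanager
def propagate_attributes(**attrs):
    """Merge attrs into the context for the duration of the block."""
    token = _attributes.set({**_attributes.get(), **attrs})
    try:
        yield
    finally:
        _attributes.reset(token)  # restore the previous attribute set

def current_attributes() -> dict:
    """Snapshot of the attributes active in the current context."""
    return dict(_attributes.get())
```

Because the blocks nest, an inner `propagate_attributes()` adds to rather than replaces the outer context, and the previous state is restored on exit.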

Async Support

The integration works identically with the async OpenAI client:

from xeroml.openai import AsyncOpenAI

async def generate(prompt: str) -> str:
    client = AsyncOpenAI()
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

Streaming

Streaming completions are fully supported. The trace is completed when the stream closes:

from xeroml.openai import openai

stream = openai.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True,
)

for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")
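If you also need the full completion text after streaming it to the user, the delta chunks can be accumulated as they arrive. A small helper following the OpenAI streaming chunk shape (`stream` is assumed to come from a `stream=True` call like the one above):

```python
def collect_stream(stream) -> str:
    """Print each delta as it arrives and return the full completion text."""
    parts = []
    for chunk in stream:
        # The final chunk's delta.content may be None, hence the `or ""`.
        delta = chunk.choices[0].delta.content or ""
        print(delta, end="")
        parts.append(delta)
    return "".join(parts)
```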