OpenAI SDK (Python)

The XeroML OpenAI Python integration wraps the official OpenAI client and automatically traces all API calls — chat completions, embeddings, image generation, and more.

Installation

pip install xeroml

The OpenAI integration is included in the core xeroml package.

Setup

  1. Set environment variables:

    XEROML_SECRET_KEY="sk-xm-..."
    XEROML_PUBLIC_KEY="pk-xm-..."
    XEROML_BASE_URL="https://cloud.xeroml.com"
    OPENAI_API_KEY="sk-..."
  2. Replace your OpenAI import:

    # Before
    import openai
    # After
    from xeroml.openai import openai
  3. Use normally — all calls are automatically traced:

    from xeroml.openai import openai

    response = openai.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Explain XeroML in one sentence."},
        ],
    )

What Gets Captured

For each API call, XeroML automatically records:

  • Model name and parameters (temperature, max_tokens, etc.)
  • Input messages (prompt)
  • Completion response
  • Token usage (prompt tokens, completion tokens, total)
  • Estimated cost
  • Latency (time to first token, total response time)
  • Finish reason
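The cost estimate is derived from the reported token usage. As an illustration of the arithmetic (the per-million-token prices below are placeholders, not XeroML's actual pricing table), a per-call estimate can be computed like this:

```python
# Illustrative sketch of cost estimation from token counts.
# The prices below are hypothetical placeholders, not real pricing data.
PRICES_PER_MILLION = {
    "gpt-4o": {"prompt": 2.50, "completion": 10.00},  # hypothetical USD values
}

def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Return an estimated USD cost for one API call."""
    p = PRICES_PER_MILLION[model]
    return (prompt_tokens * p["prompt"] + completion_tokens * p["completion"]) / 1_000_000
```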

Adding User and Session Context

Use propagate_attributes() to attach user and session context to OpenAI calls:

from xeroml.openai import openai
from xeroml import propagate_attributes

def handle_chat(message: str, user_id: str, session_id: str) -> str:
    with propagate_attributes(user_id=user_id, session_id=session_id):
        response = openai.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": message}],
        )
    return response.choices[0].message.content
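Attribute propagation of this kind typically relies on context-local state, so every call made inside the `with` block (including calls in helper functions) sees the active user and session. A minimal sketch of that pattern using the standard-library `contextvars` module, for illustration only and not XeroML's actual implementation:

```python
from contextlib import contextmanager
from contextvars import ContextVar

# Context-local store for trace attributes; visible to all code running
# inside the with-block, including nested function calls.
_attributes: ContextVar[dict] = ContextVar("attributes", default={})

@contextmanager
def propagate_attributes(**attrs):
    """Merge attrs into the context for the duration of the block."""
    token = _attributes.set({**_attributes.get(), **attrs})
    try:
        yield
    finally:
        _attributes.reset(token)  # restore the previous attribute set

def current_attributes() -> dict:
    """Snapshot of the attributes active in the current context."""
    return dict(_attributes.get())
```

Because the blocks nest, an inner `propagate_attributes()` adds to rather than replaces the outer context, and the previous state is restored on exit.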

Async Support

The integration works identically with the async OpenAI client:

from xeroml.openai import AsyncOpenAI

async def generate(prompt: str) -> str:
    client = AsyncOpenAI()
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

Streaming

Streaming completions are fully supported. The trace is completed when the stream closes:

from xeroml.openai import openai

stream = openai.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True,
)

for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")
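If you also need the full completion text after streaming it to the user, the delta chunks can be accumulated as they arrive. A small helper following the OpenAI streaming chunk shape (`stream` is assumed to come from a `stream=True` call like the one above):

```python
def collect_stream(stream) -> str:
    """Print each delta as it arrives and return the full completion text."""
    parts = []
    for chunk in stream:
        # The final chunk's delta.content may be None, hence the `or ""`.
        delta = chunk.choices[0].delta.content or ""
        print(delta, end="")
        parts.append(delta)
    return "".join(parts)
```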