Chat completions
Create a chat completion using the OpenAI wire format. Drop-in replacement
for the OpenAI base URL — point your SDK at https://api.modelux.ai/openai/v1
and existing code keeps working while requests flow through modelux’s
routing, budgets, and observability.
POST /openai/v1/chat/completions
Prefer Anthropic SDKs? Use /anthropic/v1/messages
instead — same routing pipeline, Anthropic wire shape.
Request
{
  "model": "@production",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "Hello!" }
  ],
  "temperature": 0.7,
  "max_tokens": 1024,
  "stream": false
}
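For callers not using an SDK, the request above could be assembled with the Python standard library roughly as follows. This is a minimal sketch: the Bearer `Authorization` header is an assumption carried over from the OpenAI wire format, and `build_chat_request` and the placeholder key are hypothetical names, not part of modelux.

```python
import json
import urllib.request

BASE_URL = "https://api.modelux.ai/openai/v1"  # base URL given on this page


def build_chat_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST to the chat completions endpoint."""
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Assumption: Bearer auth, as in the OpenAI wire format.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


payload = {
    "model": "@production",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    "temperature": 0.7,
    "max_tokens": 1024,
    "stream": False,
}
request = build_chat_request("YOUR_MODELUX_API_KEY", payload)
# To send it: urllib.request.urlopen(request), then json.load() the response.
```

Building the request separately from sending it keeps the payload easy to inspect or log before any network call is made.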
Model identifier
- Raw model name: gpt-4o-mini, claude-sonnet-4-5, gemini-2.5-flash
- Routing config slug: @production, @fallback, @experiment
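Client code sometimes needs to tell the two identifier kinds apart, for example when logging. The sketch below assumes, based only on the examples above, that a leading `@` marks a routing config slug; `is_routing_config` is a hypothetical helper, not a modelux API.

```python
def is_routing_config(model: str) -> bool:
    """Return True if the identifier looks like a routing config slug.

    Assumption: slugs start with "@", as in the examples on this page.
    """
    return model.startswith("@")


for name in ("@production", "gpt-4o-mini"):
    kind = "routing config" if is_routing_config(name) else "raw model"
    print(f"{name}: {kind}")
```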
modelux extensions
Pass extra fields under extra_body (OpenAI SDK) or top-level mlx:* keys:
| Field | Description |
|---|---|
| mlx:tags | Object of key-value tags for analytics + routing |
| mlx:end_user | End-user identifier (for per-user analytics + budgets) |
| mlx:cache | Cache controls: { skip: true } to bypass cache for this request |
| mlx:trace | Set true to include the decision trace in the response |
Example:
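from openai import OpenAI

# Point the SDK at modelux; the API key value is a placeholder.
client = OpenAI(
    base_url="https://api.modelux.ai/openai/v1",
    api_key="YOUR_MODELUX_API_KEY",
)

response = client.chat.completions.create(
    model="@production",
    messages=[...],
    # extra_body fields are merged into the request body as top-level keys.
    extra_body={
        "mlx:tags": {"tenant": "acme", "feature": "summarize"},
        "mlx:end_user": "user_abc",
    },
)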
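When several call sites set these fields, a small helper can keep the `mlx:*` keys consistent. A minimal sketch, assuming the field shapes listed in the table above; `build_mlx_extras` is a hypothetical name.

```python
from typing import Optional


def build_mlx_extras(
    tags: Optional[dict] = None,
    end_user: Optional[str] = None,
    skip_cache: bool = False,
    trace: bool = False,
) -> dict:
    """Collect the mlx:* fields into one dict, omitting any that are unused."""
    extras = {}
    if tags:
        extras["mlx:tags"] = dict(tags)
    if end_user is not None:
        extras["mlx:end_user"] = end_user
    if skip_cache:
        extras["mlx:cache"] = {"skip": True}
    if trace:
        extras["mlx:trace"] = True
    return extras


extras = build_mlx_extras(tags={"tenant": "acme"}, end_user="user_abc", skip_cache=True)
# Pass as extra_body=extras with the OpenAI SDK, or merge into a raw JSON body:
body = {
    "model": "@production",
    "messages": [{"role": "user", "content": "Hello!"}],
    **extras,
}
```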
Response
Standard OpenAI response shape:
{
  "id": "chatcmpl_...",
  "object": "chat.completion",
  "created": 1710000000,
  "model": "gpt-4o-mini",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Hello!" },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 8,
    "total_tokens": 20
  }
}
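Reading the fields most callers care about from that shape might look like the sketch below. `summarize_completion` is a hypothetical helper, and the sample body is the one shown above.

```python
import json


def summarize_completion(body: dict) -> dict:
    """Pull the first choice's text plus model and usage out of a response body."""
    choice = body["choices"][0]
    usage = body.get("usage", {})
    return {
        "model": body["model"],
        # content can be None when the model replies with tool calls instead of text
        "text": choice["message"].get("content"),
        "finish_reason": choice["finish_reason"],
        "total_tokens": usage.get("total_tokens"),
    }


sample = json.loads("""
{
  "id": "chatcmpl_...",
  "object": "chat.completion",
  "created": 1710000000,
  "model": "gpt-4o-mini",
  "choices": [
    {"index": 0,
     "message": {"role": "assistant", "content": "Hello!"},
     "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 12, "completion_tokens": 8, "total_tokens": 20}
}
""")
summary = summarize_completion(sample)
```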
Streaming
Set stream: true. modelux returns SSE events in the OpenAI streaming format.
Works the same with cascades and fallbacks — the stream starts when the first
successful attempt begins responding.
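For clients that read the stream without an SDK, the OpenAI streaming format is a sequence of `data:` lines carrying JSON chunks, ending with `data: [DONE]`. The following is a minimal sketch of consuming it; the helper names are hypothetical and the sample lines are illustrative, not captured modelux output.

```python
import json
from typing import Iterable, Iterator


def iter_sse_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded JSON chunks from SSE lines until the [DONE] sentinel."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return
        yield json.loads(data)


def collect_text(lines: Iterable[str]) -> str:
    """Concatenate the content deltas of every choice in the stream."""
    parts = []
    for chunk in iter_sse_chunks(lines):
        for choice in chunk.get("choices", []):
            piece = choice.get("delta", {}).get("content")
            if piece:
                parts.append(piece)
    return "".join(parts)


sample_stream = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":""}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"Hel"}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"lo!"}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]
text = collect_text(sample_stream)
```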
Tool / function calling
The tools, tool_choice, and tool_calls fields round-trip across OpenAI,
Anthropic, Google, Bedrock (Claude family), and Cohere: modelux translates
tool definitions and calls into each provider’s native shape. The same tool
schemas you’d write for OpenAI work against any of them; finish reasons
and tool-call IDs are normalized so SDK consumers see one shape.
Streaming tool calls: modelux accumulates tool_calls.function.arguments
deltas across chunks and surfaces complete tool calls on the chunk that
carries finish_reason: tool_calls.
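To make the accumulation concrete, here is a sketch of merging OpenAI-style `tool_calls` deltas by their `index`. It illustrates the general technique under the OpenAI streaming delta shape; it is not modelux's implementation, and `accumulate_tool_calls` and the sample chunks are hypothetical.

```python
import json


def accumulate_tool_calls(chunks: list) -> list:
    """Merge streamed tool_call deltas into complete calls, keyed by index."""
    calls = {}
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            for delta in choice.get("delta", {}).get("tool_calls") or []:
                slot = calls.setdefault(
                    delta["index"], {"id": None, "name": None, "arguments": ""}
                )
                if delta.get("id"):
                    slot["id"] = delta["id"]
                function = delta.get("function", {})
                if function.get("name"):
                    slot["name"] = function["name"]
                # Argument text arrives as fragments of one JSON string.
                slot["arguments"] += function.get("arguments") or ""
    return [calls[index] for index in sorted(calls)]


chunks = [
    {"choices": [{"index": 0, "delta": {"tool_calls": [
        {"index": 0, "id": "call_1",
         "function": {"name": "get_weather", "arguments": ""}}]}}]},
    {"choices": [{"index": 0, "delta": {"tool_calls": [
        {"index": 0, "function": {"arguments": '{"city": '}}]}}]},
    {"choices": [{"index": 0, "delta": {"tool_calls": [
        {"index": 0, "function": {"arguments": '"Paris"}'}}]}}]},
    {"choices": [{"index": 0, "delta": {}, "finish_reason": "tool_calls"}]},
]
calls = accumulate_tool_calls(chunks)
arguments = json.loads(calls[0]["arguments"])
```

The arguments string is only parsed after the last fragment has arrived, since an intermediate concatenation is usually not valid JSON.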
Tool result limitation: the OpenAI wire shape’s role: "tool" reply
only carries text content. Some agent frameworks produce image-typed tool
results; these are not currently round-tripped, and only the text portion
is forwarded. If you need image results in tool replies, file an issue.
Structured output (JSON mode)
response_format: { type: "json_object" } works across all providers.
response_format: { type: "json_schema", json_schema: {...} } works where the
underlying provider supports it; otherwise modelux falls back to JSON mode
and validates post-hoc.
For OpenAI specifically, strict: true schemas round-trip end-to-end
— the proxy forwards the schema verbatim and OpenAI guarantees the
returned content is valid JSON matching it:
{
  "model": "gpt-4o-mini",
  "messages": [
    {"role": "user", "content": "Capital of France with population in millions."}
  ],
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "capital_answer",
      "schema": {
        "type": "object",
        "properties": {
          "capital": {"type": "string"},
          "population_millions": {"type": "integer"}
        },
        "required": ["capital", "population_millions"],
        "additionalProperties": false
      },
      "strict": true
    }
  }
}
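For providers without native schema support, callers may still want their own check on the returned content. The sketch below shows what a post-hoc validation of a flat object schema could look like on the client side. It is an illustration only, not modelux's validator: it covers just `type`, `required`, and `additionalProperties` at the top level, and `validate_flat_object` is a hypothetical name.

```python
import json

_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def validate_flat_object(content: str, schema: dict) -> list:
    """Return a list of problems; an empty list means the content passed."""
    try:
        data = json.loads(content)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]

    errors = []
    properties = schema.get("properties", {})
    for key in schema.get("required", []):
        if key not in data:
            errors.append(f"missing required field: {key}")
    for key, value in data.items():
        if key not in properties:
            if schema.get("additionalProperties") is False:
                errors.append(f"unexpected field: {key}")
            continue
        type_name = properties[key].get("type")
        expected = _TYPES.get(type_name)
        if expected is None:
            continue  # unknown or missing type keyword: not checked here
        # bool is a subclass of int in Python, so reject it for non-boolean types
        bool_mismatch = isinstance(value, bool) and type_name != "boolean"
        if bool_mismatch or not isinstance(value, expected):
            errors.append(f"{key}: expected {type_name}")
    return errors


schema = {
    "type": "object",
    "properties": {
        "capital": {"type": "string"},
        "population_millions": {"type": "integer"},
    },
    "required": ["capital", "population_millions"],
    "additionalProperties": False,
}
ok = validate_flat_object('{"capital": "Paris", "population_millions": 2}', schema)
```

A full JSON Schema validator handles nesting, enums, and formats; this sketch only mirrors the shape of the example schema above.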
OpenAI-specific passthrough fields
These fields are forwarded byte-identical to OpenAI-family upstreams (OpenAI, Azure OpenAI, OpenAI-compatible providers). Other providers silently ignore them.
| Field | Type | Purpose |
|---|---|---|
| response_format | object | JSON mode / json_schema (above) |
| seed | integer | Best-effort reproducible sampling for the same (model, prompt, params) |
| logprobs | boolean | Include per-token log-probabilities on each choice |
| top_logprobs | integer (0–20) | Number of alternative tokens to score per position |
| parallel_tool_calls | boolean | Set false to force serial tool execution (default true upstream) |
Fields left unset are omitted from the upstream request, so seed: 0 or
logprobs: false reaches the provider only if the caller set it explicitly.
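The same unset-versus-explicit distinction is worth preserving on the client side. A minimal sketch, using `None` as the "unset" sentinel so that falsy values such as `0` and `False` are still sent; `build_payload` is a hypothetical helper.

```python
from typing import Optional


def build_payload(
    model: str,
    messages: list,
    *,
    seed: Optional[int] = None,
    logprobs: Optional[bool] = None,
    top_logprobs: Optional[int] = None,
    parallel_tool_calls: Optional[bool] = None,
) -> dict:
    """Build a request body, including optional fields only when explicitly set."""
    payload = {"model": model, "messages": messages}
    optional = {
        "seed": seed,
        "logprobs": logprobs,
        "top_logprobs": top_logprobs,
        "parallel_tool_calls": parallel_tool_calls,
    }
    # Compare against None rather than truthiness so 0 and False survive.
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload


messages = [{"role": "user", "content": "Hello!"}]
unset = build_payload("gpt-4o-mini", messages)
explicit = build_payload("gpt-4o-mini", messages, seed=0, logprobs=False)
```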
Request headers
modelux recognizes these optional headers on every proxy request
(/openai/v1/* and /anthropic/v1/* alike):
| Header | Purpose |
|---|---|
| X-Modelux-User-Id | End-user identifier (overrides user in body) |
| X-Modelux-User-Tags | key=value pairs for analytics + routing, e.g. tier=premium,cohort=beta |
| X-Modelux-Trace-Id | Groups LLM calls serving one agent run / user turn (OTel/LangSmith-aligned). Use this for tool-loop fan-out and agent workflows. |
| X-Modelux-Conversation-Id | Groups turns in a long-lived thread (a chatbot session). One conversation contains many trace IDs. |
| X-Modelux-Dry-Run | Set true to evaluate routing without calling the upstream |
| X-Modelux-Provider-Key | BYOK passthrough: provider key used for this single call; never stored. Only honored on direct calls (provider/model or known bare prefix), not @config. See Routing → BYOK passthrough. |
Body-metadata fallback. If your SDK doesn’t let you attach custom HTTP headers, you can pass the two grouping IDs (trace and conversation) via the OpenAI-compatible metadata map on the request body: modelux reads metadata.trace_id and metadata.conversation_id when the corresponding headers are absent. Anthropic callers can use the same keys inside their metadata object (alongside the existing metadata.user_id). The metadata map is parsed on ingress and stripped before the outbound provider call.
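The headers and the body-metadata fallback can be produced from the same inputs. The sketch below is one way to do that; both helper names are hypothetical, and the tag encoding assumes keys and values contain no `,` or `=` characters, since this page does not describe an escaping rule.

```python
from typing import Optional


def build_modelux_headers(
    user_id: Optional[str] = None,
    user_tags: Optional[dict] = None,
    trace_id: Optional[str] = None,
    conversation_id: Optional[str] = None,
    dry_run: bool = False,
) -> dict:
    """Build the optional X-Modelux-* request headers, skipping unset ones."""
    headers = {}
    if user_id:
        headers["X-Modelux-User-Id"] = user_id
    if user_tags:
        # Encoded as comma-separated key=value pairs, e.g. tier=premium,cohort=beta
        headers["X-Modelux-User-Tags"] = ",".join(
            f"{key}={value}" for key, value in user_tags.items()
        )
    if trace_id:
        headers["X-Modelux-Trace-Id"] = trace_id
    if conversation_id:
        headers["X-Modelux-Conversation-Id"] = conversation_id
    if dry_run:
        headers["X-Modelux-Dry-Run"] = "true"
    return headers


def grouping_metadata(
    trace_id: Optional[str] = None, conversation_id: Optional[str] = None
) -> dict:
    """Body fallback for SDKs that cannot set headers: the request's metadata map."""
    metadata = {}
    if trace_id:
        metadata["trace_id"] = trace_id
    if conversation_id:
        metadata["conversation_id"] = conversation_id
    return metadata


headers = build_modelux_headers(
    user_id="user_abc",
    user_tags={"tier": "premium", "cohort": "beta"},
    trace_id="trace_123",
)
metadata = grouping_metadata(trace_id="trace_123", conversation_id="conv_456")
```

With the OpenAI Python SDK, such a dict can be passed per call through the `extra_headers` argument of `client.chat.completions.create`.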
Response headers
modelux includes these response headers on every successful request:
x-modelux-request-id: req_a1b2c3
x-modelux-model-used: gpt-4o-mini
x-modelux-provider: openai
x-modelux-cost-usd: 0.002134
x-modelux-latency-ms: 238
x-modelux-config: @production
x-modelux-config-version: 4
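These headers are plain strings on the wire, so cost and latency tracking usually starts with converting them to typed values. A minimal sketch, assuming the header names and value formats shown above; `parse_modelux_headers` is a hypothetical helper, and `Decimal` is used for the cost to avoid float rounding when summing many requests.

```python
from decimal import Decimal


def parse_modelux_headers(headers: dict) -> dict:
    """Convert x-modelux-* response headers into typed values (None if absent)."""
    # Header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}

    def get(name, cast=str):
        raw = lowered.get(name)
        return cast(raw) if raw is not None else None

    return {
        "request_id": get("x-modelux-request-id"),
        "model_used": get("x-modelux-model-used"),
        "provider": get("x-modelux-provider"),
        "cost_usd": get("x-modelux-cost-usd", Decimal),
        "latency_ms": get("x-modelux-latency-ms", int),
        "config": get("x-modelux-config"),
        "config_version": get("x-modelux-config-version", int),
    }


info = parse_modelux_headers({
    "x-modelux-request-id": "req_a1b2c3",
    "x-modelux-model-used": "gpt-4o-mini",
    "x-modelux-provider": "openai",
    "x-modelux-cost-usd": "0.002134",
    "x-modelux-latency-ms": "238",
    "x-modelux-config": "@production",
    "x-modelux-config-version": "4",
})
```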