# Chat completions

Create a chat completion. OpenAI-compatible request and response shape.

`POST /v1/chat/completions`
## Request

```json
{
  "model": "@production",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "Hello!" }
  ],
  "temperature": 0.7,
  "max_tokens": 1024,
  "stream": false
}
```
## Model identifier

- Raw model name — `gpt-4o-mini`, `claude-sonnet-4-5`, `gemini-2.5-flash`
- Routing config slug — `@production`, `@fallback`, `@experiment`
## Modelux extensions

Pass extra fields under `extra_body` (OpenAI SDK) or as top-level `mlx:*` keys:

| Field | Description |
|---|---|
| `mlx:tags` | Object of key-value tags for analytics + routing |
| `mlx:end_user` | End-user identifier (for per-user analytics + budgets) |
| `mlx:cache` | Cache controls: `{ skip: true }` to bypass the cache for this request |
| `mlx:trace` | Set `true` to include the decision trace in the response |
Example:

```python
response = client.chat.completions.create(
    model="@production",
    messages=[...],
    extra_body={
        "mlx:tags": {"tenant": "acme", "feature": "summarize"},
        "mlx:end_user": "user_abc",
    },
)
```
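The `mlx:cache` and `mlx:trace` fields combine the same way. A minimal sketch of the wire-level request body — the plain JSON that `extra_body` produces, with illustrative values:

```python
import json

# Chat completion body with Modelux extensions sent as top-level "mlx:*"
# keys — the shape that extra_body produces on the wire.
body = {
    "model": "@production",
    "messages": [{"role": "user", "content": "Hello!"}],
    "mlx:cache": {"skip": True},  # bypass the cache for this request only
    "mlx:trace": True,            # include the routing decision trace
}

payload = json.dumps(body)
```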
## Response

Standard OpenAI response shape:

```json
{
  "id": "chatcmpl_...",
  "object": "chat.completion",
  "created": 1710000000,
  "model": "gpt-4o-mini",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Hello!" },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 8,
    "total_tokens": 20
  }
}
```
## Streaming

Set `stream: true`. Modelux returns SSE events in the OpenAI streaming format.
Works the same with cascades and fallbacks — the stream starts when the first
successful attempt begins responding.
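On the wire, each SSE event is a `data:` line carrying a `chat.completion.chunk` object, and the stream ends with a `data: [DONE]` sentinel. A minimal stdlib sketch of reassembling the streamed text (the event payloads below are illustrative):

```python
import json

def parse_sse_chunks(raw: str):
    """Yield parsed chat.completion.chunk objects from an SSE body."""
    for line in raw.splitlines():
        if not line.startswith("data: "):
            continue  # skip blank lines and comment/keepalive lines
        data = line[len("data: "):]
        if data == "[DONE]":  # OpenAI streams end with a [DONE] sentinel
            break
        yield json.loads(data)

# Illustrative SSE body in the OpenAI streaming format.
raw = (
    'data: {"object": "chat.completion.chunk", '
    '"choices": [{"index": 0, "delta": {"content": "Hel"}}]}\n'
    'data: {"object": "chat.completion.chunk", '
    '"choices": [{"index": 0, "delta": {"content": "lo!"}}]}\n'
    "data: [DONE]\n"
)

# Concatenate the content deltas to recover the full completion text.
text = "".join(
    chunk["choices"][0]["delta"].get("content", "")
    for chunk in parse_sse_chunks(raw)
)
```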
## Tool / function calling

Tool definitions pass through unchanged to the provider. Modelux normalizes tool-call behavior across OpenAI, Anthropic, and Google so the same tool schemas work across all three.
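So an OpenAI-style tool schema can be sent as-is regardless of which provider the route resolves to. A minimal sketch of such a request body (the `get_weather` tool is a hypothetical example):

```python
# An OpenAI-style tool schema; per the passthrough note above, the same
# shape is usable when the route resolves to Anthropic or Google.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool name
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

body = {
    "model": "@production",
    "messages": [{"role": "user", "content": "Weather in Paris?"}],
    "tools": tools,
}
```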
## Structured output (JSON mode)

`response_format: { "type": "json_object" }` works across all providers.
`response_format: { "type": "json_schema", "json_schema": {...} }` works where the
underlying provider supports it; otherwise Modelux falls back to JSON mode
and validates post-hoc.
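Both forms as request fragments — a minimal sketch (the `invoice` schema is a hypothetical example):

```python
# JSON mode: supported on every provider.
json_mode = {"response_format": {"type": "json_object"}}

# JSON schema mode: used where the provider supports it; otherwise
# Modelux falls back to JSON mode and validates post-hoc.
json_schema_mode = {
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "invoice",  # hypothetical schema name
            "schema": {
                "type": "object",
                "properties": {"total": {"type": "number"}},
                "required": ["total"],
            },
        },
    },
}
```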
## Headers

Modelux includes these response headers on every successful request:

```text
x-modelux-request-id: req_a1b2c3
x-modelux-model-used: gpt-4o-mini
x-modelux-provider: openai
x-modelux-cost-usd: 0.002134
x-modelux-latency-ms: 238
x-modelux-config: @production
x-modelux-config-version: 4
```
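With the OpenAI Python SDK, response headers are reachable through the raw-response interface (e.g. `client.chat.completions.with_raw_response.create(...)`, whose return value exposes `.headers`). A minimal sketch of reading the numeric headers, using the illustrative values above:

```python
# Modelux response headers (illustrative values from above; real values
# come from the HTTP response object).
headers = {
    "x-modelux-request-id": "req_a1b2c3",
    "x-modelux-model-used": "gpt-4o-mini",
    "x-modelux-cost-usd": "0.002134",
    "x-modelux-latency-ms": "238",
}

cost_usd = float(headers["x-modelux-cost-usd"])    # spend for this request
latency_ms = int(headers["x-modelux-latency-ms"])  # wall-clock latency
```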