
Chat completions

Create a chat completion. OpenAI-compatible request and response shape.

POST /v1/chat/completions

Request

{
  "model": "@production",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "Hello!" }
  ],
  "temperature": 0.7,
  "max_tokens": 1024,
  "stream": false
}
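As a sketch, the request above can be sent with only the standard library. The base URL and API key here are placeholder assumptions, not documented values:

```python
import json
import urllib.request

# Placeholder values (assumptions): substitute your real gateway URL and key.
BASE_URL = "https://modelux.example.com"
API_KEY = "mlx_placeholder_key"

payload = {
    "model": "@production",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    "temperature": 0.7,
    "max_tokens": 1024,
    "stream": False,
}

req = urllib.request.Request(
    f"{BASE_URL}/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    method="POST",
)
# urllib.request.urlopen(req) would perform the call; omitted here.
```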

Model identifier

  • Raw model name: gpt-4o-mini, claude-sonnet-4-5, gemini-2.5-flash
  • Routing config slug: @production, @fallback, @experiment

Modelux extensions

Pass extra fields under extra_body (OpenAI SDK) or top-level mlx:* keys:

Field          Description
mlx:tags       Object of key-value tags for analytics + routing
mlx:end_user   End-user identifier (for per-user analytics + budgets)
mlx:cache      Cache controls: { skip: true } to bypass cache for this request
mlx:trace      Set true to include the decision trace in the response

Example:

response = client.chat.completions.create(
    model="@production",
    messages=[...],
    extra_body={
        "mlx:tags": {"tenant": "acme", "feature": "summarize"},
        "mlx:end_user": "user_abc",
    },
)

Response

Standard OpenAI response shape:

{
  "id": "chatcmpl_...",
  "object": "chat.completion",
  "created": 1710000000,
  "model": "gpt-4o-mini",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Hello!" },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 8,
    "total_tokens": 20
  }
}
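Because the shape matches OpenAI's, extracting the reply and usage works the same as with any OpenAI-style response. A minimal sketch over the sample body above:

```python
# The sample response body from above, as a parsed dict.
response = {
    "id": "chatcmpl_...",
    "object": "chat.completion",
    "created": 1710000000,
    "model": "gpt-4o-mini",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello!"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 12, "completion_tokens": 8, "total_tokens": 20},
}

content = response["choices"][0]["message"]["content"]  # "Hello!"
total_tokens = response["usage"]["total_tokens"]        # 20
```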

Streaming

Set stream: true. Modelux returns SSE events in the OpenAI streaming format. Works the same with cascades and fallbacks — the stream starts when the first successful attempt begins responding.
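A sketch of consuming the stream client-side; the chunk payloads here are hypothetical, but follow the OpenAI streaming format (delta chunks over SSE, terminated by a `[DONE]` sentinel):

```python
import json

# Hypothetical SSE data lines in the OpenAI streaming format.
sse_lines = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo!"}}]}',
    "data: [DONE]",
]

parts = []
for line in sse_lines:
    data = line[len("data: "):]
    if data == "[DONE]":  # end-of-stream sentinel
        break
    chunk = json.loads(data)
    delta = chunk["choices"][0]["delta"].get("content")
    if delta:
        parts.append(delta)

text = "".join(parts)  # "Hello!"
```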

Tool / function calling

Tool definitions and tool-call messages pass through to the provider. Modelux normalizes tool-call behavior across OpenAI, Anthropic, and Google, so the same tool schemas work with all three.
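For illustration, a tool schema in the OpenAI function-calling format; `get_weather` is a hypothetical tool name. Per the note above, the same schema should work whichever provider the request routes to:

```python
# Hypothetical tool definition in the OpenAI function-calling format.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

# Sent alongside the normal request body, e.g.:
#   { "model": "@production", "messages": [...], "tools": tools }
```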

Structured output (JSON mode)

response_format: { type: "json_object" } works across all providers. response_format: { type: "json_schema", json_schema: {...} } works where the underlying provider supports it; otherwise Modelux falls back to JSON mode and validates post-hoc.
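A sketch of the two request fragments, plus the kind of post-hoc check the fallback path implies. The schema name and the `raw` output string are made-up examples:

```python
import json

# JSON mode: supported across all providers.
json_mode = {"response_format": {"type": "json_object"}}

# JSON schema mode: used natively where supported, else JSON mode + validation.
json_schema_mode = {
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "summary",  # hypothetical schema name
            "schema": {
                "type": "object",
                "properties": {"summary": {"type": "string"}},
                "required": ["summary"],
            },
        },
    }
}

# A client-side check analogous to post-hoc validation:
raw = '{"summary": "ok"}'  # hypothetical model output
parsed = json.loads(raw)
required = json_schema_mode["response_format"]["json_schema"]["schema"]["required"]
missing = [k for k in required if k not in parsed]  # [] when the output conforms
```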

Headers

Modelux sets the following response headers on every successful request:

x-modelux-request-id:   req_a1b2c3
x-modelux-model-used:   gpt-4o-mini
x-modelux-provider:     openai
x-modelux-cost-usd:     0.002134
x-modelux-latency-ms:   238
x-modelux-config:       @production
x-modelux-config-version: 4
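The numeric headers arrive as strings; a small sketch of reading them from a response's header mapping, using the documented sample values above:

```python
# Sample values from the header list above, as a plain dict.
headers = {
    "x-modelux-request-id": "req_a1b2c3",
    "x-modelux-model-used": "gpt-4o-mini",
    "x-modelux-provider": "openai",
    "x-modelux-cost-usd": "0.002134",
    "x-modelux-latency-ms": "238",
    "x-modelux-config": "@production",
    "x-modelux-config-version": "4",
}

cost_usd = float(headers["x-modelux-cost-usd"])    # 0.002134
latency_ms = int(headers["x-modelux-latency-ms"])  # 238
```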