Chat completions

Create a chat completion using the OpenAI wire format. The endpoint is a drop-in replacement for the OpenAI base URL: point your SDK at https://api.modelux.ai/openai/v1 and existing code keeps working while requests flow through modelux’s routing, budgets, and observability.

POST /openai/v1/chat/completions

Prefer Anthropic SDKs? Use /anthropic/v1/messages instead — same routing pipeline, Anthropic wire shape.

Request

{
  "model": "@production",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "Hello!" }
  ],
  "temperature": 0.7,
  "max_tokens": 1024,
  "stream": false
}
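
As a minimal sketch of sending the body above without an SDK, the following builds the POST request using only Python's standard library. The Bearer authorization scheme and the MODELUX_API_KEY variable name are assumptions carried over from OpenAI conventions; this page does not confirm them.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.modelux.ai/openai/v1"


def build_chat_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST to the chat completions endpoint."""
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Assumption: modelux accepts an OpenAI-style Bearer token.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


body = {
    "model": "@production",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    "temperature": 0.7,
    "max_tokens": 1024,
    "stream": False,
}

# MODELUX_API_KEY is a hypothetical variable name; the fallback is a dummy value.
request = build_chat_request(os.environ.get("MODELUX_API_KEY", "sk-placeholder"), body)

# To actually send it:
# with urllib.request.urlopen(request) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

The request is only constructed here, so the snippet runs without network access; uncomment the last lines to send it.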

Model identifier

  • Raw model name: gpt-4o-mini, claude-sonnet-4-5, gemini-2.5-flash
  • Routing config slug: @production, @fallback, @experiment

modelux extensions

Pass modelux-specific fields as top-level mlx:* keys in the request body. With the OpenAI SDK, supply them through extra_body, which merges them into the body:

| Field | Description |
| --- | --- |
| mlx:tags | Object of key-value tags for analytics + routing |
| mlx:end_user | End-user identifier (for per-user analytics + budgets) |
| mlx:cache | Cache controls: { skip: true } to bypass cache for this request |
| mlx:trace | Set true to include the decision trace in the response |

Example:

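import os

from openai import OpenAI

# MODELUX_API_KEY is a placeholder name for wherever you keep your modelux key.
client = OpenAI(base_url="https://api.modelux.ai/openai/v1", api_key=os.environ["MODELUX_API_KEY"])

response = client.chat.completions.create(
    model="@production",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_body={
        "mlx:tags": {"tenant": "acme", "feature": "summarize"},
        "mlx:end_user": "user_abc",
    },
)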
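
Without the OpenAI SDK, the same fields go directly into the JSON body as top-level mlx:* keys. A small sketch follows; the helper name with_modelux_fields is hypothetical, and only the four fields listed above are used.

```python
import json
from typing import Optional


def with_modelux_fields(
    body: dict,
    tags: Optional[dict] = None,
    end_user: Optional[str] = None,
    skip_cache: bool = False,
    trace: bool = False,
) -> dict:
    """Return a copy of an OpenAI-format body with mlx:* extension keys added."""
    out = dict(body)  # shallow copy so the caller's dict is left untouched
    if tags:
        out["mlx:tags"] = dict(tags)
    if end_user is not None:
        out["mlx:end_user"] = end_user
    if skip_cache:
        out["mlx:cache"] = {"skip": True}
    if trace:
        out["mlx:trace"] = True
    return out


payload = with_modelux_fields(
    {"model": "@production", "messages": [{"role": "user", "content": "Hello!"}]},
    tags={"tenant": "acme", "feature": "summarize"},
    end_user="user_abc",
    skip_cache=True,
)
print(json.dumps(payload, indent=2))
```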

Response

Standard OpenAI response shape:

{
  "id": "chatcmpl_...",
  "object": "chat.completion",
  "created": 1710000000,
  "model": "gpt-4o-mini",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Hello!" },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 8,
    "total_tokens": 20
  }
}

Streaming

Set stream: true. modelux returns SSE events in the OpenAI streaming format. Streaming behaves the same with cascades and fallbacks: the stream starts once the first successful attempt begins responding.
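
For readers consuming the stream without an SDK, here is a rough sketch of reading OpenAI-format SSE lines and joining the content deltas. It works on already-decoded lines, assumes a single choice per chunk, and uses invented sample data; the function name is hypothetical.

```python
import json
from typing import Iterable


def collect_stream_text(lines: Iterable[str]) -> str:
    """Concatenate delta.content from OpenAI-style SSE `data:` lines."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel in the OpenAI format
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                parts.append(text)
    return "".join(parts)


# Invented sample stream, not captured from a real response.
sample = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant"}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"index":0,"delta":{"content":"lo!"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "data: [DONE]",
]
print(collect_stream_text(sample))
```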

Tool / function calling

Tools, tool_choice, and tool_calls round-trip across OpenAI, Anthropic, Google, Bedrock (Claude family), and Cohere — modelux translates tool definitions and calls into each provider’s native shape. The same tool schemas you’d write for OpenAI work against any of them; finish reasons and tool-call IDs are normalized so SDK consumers see one shape.

Streaming tool calls: modelux accumulates tool_calls.function.arguments deltas across chunks and surfaces complete tool calls on the chunk that carries finish_reason: tool_calls.
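
The accumulation described above can be pictured with a short sketch. It illustrates the general technique on OpenAI-format deltas and is not modelux's actual implementation; the function name and the sample deltas are invented for illustration.

```python
import json
from typing import Dict, Iterable, List


def accumulate_tool_calls(deltas: Iterable[dict]) -> List[dict]:
    """Merge streamed tool_calls deltas into complete tool calls, keyed by index."""
    calls: Dict[int, dict] = {}
    for delta in deltas:
        for tc in delta.get("tool_calls", []):
            call = calls.setdefault(
                tc["index"],
                {"id": None, "type": "function", "function": {"name": "", "arguments": ""}},
            )
            if tc.get("id"):
                call["id"] = tc["id"]
            fn = tc.get("function", {})
            if fn.get("name"):
                call["function"]["name"] = fn["name"]
            # Argument JSON arrives as string fragments; concatenate them in order.
            call["function"]["arguments"] += fn.get("arguments") or ""
    return [calls[i] for i in sorted(calls)]


# Invented deltas for one tool call split across three chunks.
deltas = [
    {"tool_calls": [{"index": 0, "id": "call_1", "function": {"name": "get_weather", "arguments": ""}}]},
    {"tool_calls": [{"index": 0, "function": {"arguments": '{"city": '}}]},
    {"tool_calls": [{"index": 0, "function": {"arguments": '"Paris"}'}}]},
]
complete = accumulate_tool_calls(deltas)
print(json.loads(complete[0]["function"]["arguments"]))
```

The arguments string only becomes valid JSON once every fragment has been appended, which is why parsing is deferred until the final chunk.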

Tool result limitation: the OpenAI wire shape’s role: "tool" reply only carries text content. Image-typed tool results from agent frameworks that produce them are not currently round-tripped — only the text portion is forwarded. If you need image results in tool replies, file an issue.

Structured output (JSON mode)

response_format: { type: "json_object" } works across all providers. response_format: { type: "json_schema", json_schema: {...} } works where the underlying provider supports it; otherwise modelux falls back to JSON mode and validates post-hoc.

For OpenAI specifically, strict: true schemas round-trip end-to-end — the proxy forwards the schema verbatim and OpenAI guarantees the returned content is valid JSON matching it:

{
  "model": "gpt-4o-mini",
  "messages": [
    {"role": "user", "content": "Capital of France with population in millions."}
  ],
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "capital_answer",
      "schema": {
        "type": "object",
        "properties": {
          "capital":             {"type": "string"},
          "population_millions": {"type": "integer"}
        },
        "required": ["capital", "population_millions"],
        "additionalProperties": false
      },
      "strict": true
    }
  }
}
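
On the client side, the returned content can be parsed and checked in a few lines. This is a sketch assuming the schema above; parse_capital_answer is a hypothetical helper, and the sample content string is invented rather than real model output. A check like this is mostly useful when the request may be routed to a provider without native json_schema support.

```python
import json


def parse_capital_answer(content: str) -> dict:
    """Parse message content and verify it matches the capital_answer schema."""
    data = json.loads(content)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    # additionalProperties is false, so the key set must match exactly.
    if set(data) != {"capital", "population_millions"}:
        raise ValueError(f"unexpected keys: {sorted(data)}")
    if not isinstance(data["capital"], str):
        raise ValueError("capital must be a string")
    # bool is a subclass of int in Python, so exclude it explicitly.
    pop = data["population_millions"]
    if isinstance(pop, bool) or not isinstance(pop, int):
        raise ValueError("population_millions must be an integer")
    return data


# Invented stand-in for response["choices"][0]["message"]["content"].
answer = parse_capital_answer('{"capital": "Paris", "population_millions": 2}')
print(answer["capital"])
```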

OpenAI-specific passthrough fields

These fields are forwarded byte-identical to OpenAI-family upstreams (OpenAI, Azure OpenAI, OpenAI-compatible providers). Other providers silently ignore them.

| Field | Type | Purpose |
| --- | --- | --- |
| response_format | object | JSON mode / json_schema (above) |
| seed | integer | Best-effort reproducible sampling for the same (model, prompt, params) |
| logprobs | boolean | Include per-token log-probabilities on each choice |
| top_logprobs | integer (0–20) | Number of alternative tokens to score per position |
| parallel_tool_calls | boolean | Set false to force serial tool execution (default true upstream) |

When unset, these fields are omitted from the upstream request entirely, so seed: 0 or logprobs: false reaches the provider only if the caller set it explicitly.
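
One way to mirror this unset-versus-explicit distinction in client code is to treat None as the "unset" marker and drop such fields before sending. A sketch with a hypothetical helper name:

```python
from typing import Any, Optional


def build_body(model: str, messages: list, **optional: Optional[Any]) -> dict:
    """Build a request body, omitting optional fields left as None."""
    body = {"model": model, "messages": messages}
    # Only None means "unset"; falsy values such as 0 or False are kept.
    body.update({k: v for k, v in optional.items() if v is not None})
    return body


messages = [{"role": "user", "content": "Hello!"}]

unset = build_body("gpt-4o-mini", messages, seed=None, logprobs=None)
explicit = build_body("gpt-4o-mini", messages, seed=0, logprobs=False)

print(sorted(unset))
print(sorted(explicit))
```

Filtering on `is not None` rather than truthiness is the point of the example: a truthiness check would silently drop seed=0 and logprobs=False.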

Request headers

modelux recognizes these optional headers on every proxy request (/openai/v1/* and /anthropic/v1/* alike):

| Header | Purpose |
| --- | --- |
| X-Modelux-User-Id | End-user identifier (overrides user in body) |
| X-Modelux-User-Tags | key=value pairs for analytics + routing, e.g. tier=premium,cohort=beta |
| X-Modelux-Trace-Id | Groups LLM calls serving one agent run / user turn (OTel/LangSmith-aligned). Use this for tool-loop fan-out and agent workflows. |
| X-Modelux-Conversation-Id | Groups turns in a long-lived thread (a chatbot session). One conversation contains many trace IDs. |
| X-Modelux-Dry-Run | Set true to evaluate routing without calling the upstream |
| X-Modelux-Provider-Key | BYOK passthrough: provider key used for this single call; never stored. Only honored on direct calls (provider/model or known bare prefix), not @config. See Routing → BYOK passthrough. |

Body-metadata fallback. If your SDK doesn’t let you attach custom HTTP headers, you can pass the same two grouping IDs via the OpenAI-compatible metadata map on the request body: modelux reads metadata.trace_id and metadata.conversation_id when the corresponding headers are absent. Anthropic callers can use the same keys inside their metadata object (alongside the existing metadata.user_id). The metadata map is parsed on ingress and stripped before the outbound provider call.
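
As an illustration of both options, the sketch below builds the grouping headers and the body-metadata fallback. Helper names and ID values are hypothetical; the header names, the metadata keys, and the key=value,key=value tag format are taken from this section.

```python
from typing import Dict, Optional


def modelux_headers(
    user_id: Optional[str] = None,
    user_tags: Optional[Dict[str, str]] = None,
    trace_id: Optional[str] = None,
    conversation_id: Optional[str] = None,
) -> Dict[str, str]:
    """Build the optional X-Modelux-* request headers, skipping unset values."""
    headers: Dict[str, str] = {}
    if user_id:
        headers["X-Modelux-User-Id"] = user_id
    if user_tags:
        headers["X-Modelux-User-Tags"] = ",".join(f"{k}={v}" for k, v in user_tags.items())
    if trace_id:
        headers["X-Modelux-Trace-Id"] = trace_id
    if conversation_id:
        headers["X-Modelux-Conversation-Id"] = conversation_id
    return headers


def metadata_fallback(trace_id: str, conversation_id: str) -> Dict[str, str]:
    """Body-level equivalent for SDKs that cannot set custom headers."""
    return {"trace_id": trace_id, "conversation_id": conversation_id}


# Option 1: send the grouping IDs as headers.
headers = modelux_headers(
    user_tags={"tier": "premium", "cohort": "beta"},
    trace_id="trace_123",
    conversation_id="conv_456",
)

# Option 2: no custom headers available, so put the IDs in the body metadata.
body = {
    "model": "@production",
    "messages": [{"role": "user", "content": "Hello!"}],
    "metadata": metadata_fallback("trace_123", "conv_456"),
}
```

With the OpenAI Python SDK, a header dict like this can be attached per call through the extra_headers argument of client.chat.completions.create.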

Response headers

modelux includes these response headers on every successful request:

x-modelux-request-id:   req_a1b2c3
x-modelux-model-used:   gpt-4o-mini
x-modelux-provider:     openai
x-modelux-cost-usd:     0.002134
x-modelux-latency-ms:   238
x-modelux-config:       @production
x-modelux-config-version: 4
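
These headers are convenient for logging cost and latency per request. A small sketch that parses them into typed values follows; the RoutingInfo name is hypothetical, the numeric types are inferred from the sample values above rather than documented, and the code assumes all seven headers are present.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass
class RoutingInfo:
    request_id: str
    model_used: str
    provider: str
    cost_usd: float
    latency_ms: int
    config: str
    config_version: int


def parse_routing_info(headers: Mapping[str, str]) -> RoutingInfo:
    """Extract x-modelux-* response headers using a case-insensitive lookup."""
    h = {k.lower(): v.strip() for k, v in headers.items()}
    return RoutingInfo(
        request_id=h["x-modelux-request-id"],
        model_used=h["x-modelux-model-used"],
        provider=h["x-modelux-provider"],
        cost_usd=float(h["x-modelux-cost-usd"]),
        latency_ms=int(h["x-modelux-latency-ms"]),
        config=h["x-modelux-config"],
        config_version=int(h["x-modelux-config-version"]),
    )


# Sample values copied from the header listing above.
info = parse_routing_info({
    "x-modelux-request-id": "req_a1b2c3",
    "x-modelux-model-used": "gpt-4o-mini",
    "x-modelux-provider": "openai",
    "x-modelux-cost-usd": "0.002134",
    "x-modelux-latency-ms": "238",
    "x-modelux-config": "@production",
    "x-modelux-config-version": "4",
})
print(info.model_used, info.cost_usd, info.latency_ms)
```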