<!-- source: https://modelux.ai/docs/api/chat-completions -->

> POST /v1/chat/completions — the primary inference endpoint.

# Chat completions

Create a chat completion. OpenAI-compatible request and response shape.

```
POST /v1/chat/completions
```

## Request

```json
{
  "model": "@production",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "Hello!" }
  ],
  "temperature": 0.7,
  "max_tokens": 1024,
  "stream": false
}
```

### Model identifier

- **Raw model name** — `gpt-4o-mini`, `claude-sonnet-4-5`, `gemini-2.5-flash`
- **Routing config slug** — `@production`, `@fallback`, `@experiment`
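
Both forms occupy the same `model` field of the request body; only the string differs. A minimal sketch (the helper below is illustrative, assuming slugs are distinguished by their leading `@` as in the examples above):

```python
# Both identifier forms go in the same `model` field of the request body.
pinned = {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Hi"}]}
routed = {"model": "@production", "messages": [{"role": "user", "content": "Hi"}]}

# Illustrative helper: routing slugs are recognizable by the leading "@".
def is_routing_slug(model: str) -> bool:
    return model.startswith("@")

print(is_routing_slug(pinned["model"]), is_routing_slug(routed["model"]))  # False True
```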

## Modelux extensions

Pass extra fields under `extra_body` (OpenAI SDK) or as top-level `mlx:*` keys in the raw JSON body:

| Field | Description |
|---|---|
| `mlx:tags` | Object of key-value tags for analytics + routing |
| `mlx:end_user` | End-user identifier (for per-user analytics + budgets) |
| `mlx:cache` | Cache controls: `{ skip: true }` to bypass cache for this request |
| `mlx:trace` | Set `true` to include the decision trace in the response |

Example:

```python
response = client.chat.completions.create(
    model="@production",
    messages=[...],
    extra_body={
        "mlx:tags": {"tenant": "acme", "feature": "summarize"},
        "mlx:end_user": "user_abc",
    },
)
```
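
When calling the endpoint directly, without the SDK's `extra_body`, the same fields sit at the top level of the JSON body (the tag values here are illustrative):

```json
{
  "model": "@production",
  "messages": [{ "role": "user", "content": "Hello!" }],
  "mlx:tags": { "tenant": "acme", "feature": "summarize" },
  "mlx:trace": true
}
```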

## Response

Standard OpenAI response shape:

```json
{
  "id": "chatcmpl_...",
  "object": "chat.completion",
  "created": 1710000000,
  "model": "gpt-4o-mini",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Hello!" },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 8,
    "total_tokens": 20
  }
}
```

## Streaming

Set `stream: true`. Modelux returns SSE events in the OpenAI streaming format.
Works the same with cascades and fallbacks — the stream starts when the first
successful attempt begins responding.
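
Each SSE event carries a `data:` line holding one chunk in the OpenAI streaming shape, and `data: [DONE]` closes the stream. A minimal sketch of assembling the content deltas, using synthetic events rather than a live connection:

```python
import json

def sse_deltas(lines):
    """Yield content deltas from OpenAI-format SSE lines."""
    for line in lines:
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload.strip() == "[DONE]":  # end-of-stream sentinel
            break
        delta = json.loads(payload)["choices"][0]["delta"]
        if delta.get("content") is not None:
            yield delta["content"]

# Synthetic events in the OpenAI streaming shape:
events = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo!"}}]}',
    'data: [DONE]',
]
print("".join(sse_deltas(events)))  # Hello!
```

With the OpenAI SDK, passing `stream=True` handles this parsing for you.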

## Tool / function calling

Tool definitions pass through to the provider unchanged. Modelux normalizes
tool-call behavior across OpenAI, Anthropic, and Google, so the same tool
schemas work with all three.
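
For example, a single tool schema written in the OpenAI function-calling shape can be sent regardless of which provider serves the request (the `get_weather` tool below is illustrative):

```python
# An OpenAI-shape tool definition; per the section above, the same schema
# works whether the request is served by OpenAI, Anthropic, or Google.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

# Passed via the standard `tools` parameter:
# client.chat.completions.create(model="@production", messages=[...],
#                                tools=[get_weather_tool])
```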

## Structured output (JSON mode)

`response_format: { type: "json_object" }` works across all providers.
`response_format: { type: "json_schema", json_schema: {...} }` works where the
underlying provider supports it; otherwise Modelux falls back to JSON mode and
validates the output against the supplied schema after the response arrives.
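
For example, a `json_schema` request body (the schema and its name are illustrative) looks like:

```json
{
  "model": "@production",
  "messages": [{ "role": "user", "content": "Which city is the Louvre in?" }],
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "city_answer",
      "schema": {
        "type": "object",
        "properties": { "city": { "type": "string" } },
        "required": ["city"]
      }
    }
  }
}
```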

## Headers

Modelux adds the following response headers to every successful request:

```
x-modelux-request-id:     req_a1b2c3
x-modelux-model-used:     gpt-4o-mini
x-modelux-provider:       openai
x-modelux-cost-usd:       0.002134
x-modelux-latency-ms:     238
x-modelux-config:         @production
x-modelux-config-version: 4
```
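
Header values arrive as strings, so numeric fields need a cast before use. A minimal sketch, with a hard-coded `headers` mapping standing in for whatever your HTTP client exposes (e.g. `response.headers`):

```python
# Example values copied from the listing above; in practice they come
# from the HTTP response of a chat-completions call.
headers = {
    "x-modelux-request-id": "req_a1b2c3",
    "x-modelux-model-used": "gpt-4o-mini",
    "x-modelux-cost-usd": "0.002134",
    "x-modelux-latency-ms": "238",
}

cost_usd = float(headers["x-modelux-cost-usd"])    # header values are strings
latency_ms = int(headers["x-modelux-latency-ms"])
print(f"{headers['x-modelux-model-used']}: ${cost_usd:.6f}, {latency_ms} ms")
# gpt-4o-mini: $0.002134, 238 ms
```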
