# Capability matrix

> Which features are supported on each proxy surface and each upstream provider.

modelux exposes two proxy surfaces — OpenAI shape and Anthropic shape —
and routes requests across many upstream providers. The matrices below
show which features work on which combination; the footnotes explain
why the unsupported cells are unsupported.

## Proxy surfaces

What you can send to each endpoint:

| Feature | `/openai/v1/*` | `/anthropic/v1/*` |
|---|:---:|:---:|
| Chat / messages | ✓ | ✓ |
| Streaming (SSE) | ✓ | ✓ |
| Tool / function calling | ✓ | ✓ |
| Streaming tool calls | ✓ | ✓ |
| Vision input | ✓ | ✓ |
| Extended thinking (Claude) | n/a | ✓ |
| `count_tokens` | n/a | ✓ |
| Responses API (`/v1/responses`) | ✓ [^6] | n/a |
| Message batches (async) | ✓ | ✓ |
| Files (upload/download) | ✓ | ✓ [^5] |
| Embeddings | ✓ | n/a [^1] |
| Image generation | ✓ | n/a |
| Speech (TTS) | ✓ [^7] | n/a |
| Transcription (STT) | ✓ [^7] | n/a |
| BYOK passthrough (`X-Modelux-Provider-Key`) | ✓ | ✓ |
| Semantic cache | ✓ | ✓ |
| Dry-run (`X-Modelux-Dry-Run`) | ✓ | ✓ |
| `Cache-Control: no-store` opt-out | ✓ | ✓ |
| `GET /v1/models` | ✓ | ✓ |
| Auth via `Authorization: Bearer` | ✓ | ✓ |
| Auth via `x-api-key` (SDK drop-in) | n/a | ✓ |

[^1]: Anthropic doesn't have an embeddings endpoint upstream, so there's nothing
    for the proxy to translate to. Use `/openai/v1/embeddings` (which routes to
    OpenAI, Cohere, Google, or Voyage) regardless of which surface your
    chat traffic is on.

The semantic cache index is **shared across surfaces**: an entry stored
from `/openai/v1/chat/completions` with model `gpt-4o-mini` and a given
prompt serves a hit for the same model + prompt arriving on
`/anthropic/v1/messages`. The cached response is re-serialized into
whichever wire shape the calling endpoint expects.
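
A minimal sketch of that behavior, assuming a hypothetical
`api.modelux.ai` base URL and a placeholder key: the first call stores
the entry via the OpenAI surface, and the second is eligible to be
served from it via the Anthropic surface.

```python
import httpx

headers = {"Authorization": "Bearer mlx-..."}  # placeholder modelux key
prompt = "Explain idempotency in one sentence."

# Stored via the OpenAI surface...
httpx.post(
    "https://api.modelux.ai/openai/v1/chat/completions",  # hypothetical base URL
    headers=headers,
    json={"model": "gpt-4o-mini",
          "messages": [{"role": "user", "content": prompt}]},
)

# ...and eligible for a cross-surface hit, re-serialized into the
# Anthropic wire shape for this caller.
httpx.post(
    "https://api.modelux.ai/anthropic/v1/messages",
    headers=headers,
    json={"model": "gpt-4o-mini", "max_tokens": 256,
          "messages": [{"role": "user", "content": prompt}]},
)
```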

## Upstream providers

What each upstream supports through the proxy. Cells marked **drop**
mean the proxy silently skips that part of the request rather than
failing — a multi-modal request with an image can still reach a
provider that doesn't accept images, just without the image.

| Provider | Chat | Streaming | Tools | Streaming tools | Vision | Extended thinking | Embeddings |
|---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| OpenAI | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ | ✓ |
| Anthropic | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ |
| Google (Gemini) | ✓ | ✓ | ✓ | ✓ | base64 only [^2] | ✗ | ✗ [^3] |
| AWS Bedrock (Claude) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ |
| Cohere | ✓ | ✓ | ✓ | ✓ | drop | ✗ | ✓ |
| Azure OpenAI | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ | ✓ |
| Groq, Fireworks, DeepSeek, xAI, Mistral, Cerebras, Together, Perplexity | ✓ | ✓ | ✓ | ✓ | ✓ [^4] | ✗ | ✗ |

[^2]: Gemini accepts only base64-encoded `inlineData` for images. URL-source
    images get dropped on the cross-provider path.
[^3]: Gemini does have an embeddings endpoint (`/v1beta/models/...:embedContent`)
    but its request shape diverges from OpenAI's enough that we haven't built
    the adapter yet.
[^4]: All OpenAI-compatible providers; vision works wherever the underlying
    model does. Some smaller providers don't ship vision-capable models.
[^5]: Anthropic's Files API is currently a beta on the upstream side —
    pass `anthropic-beta: files-api-2025-04-14` (or the current tag your
    SDK uses) and the proxy forwards it untouched. The download endpoint
    is restricted to API-generated files (e.g., outputs from
    computer-use tools); user-uploaded files come back with an
    "is not downloadable" error from Anthropic.
[^6]: The Responses API is OpenAI-specific. `@config` routing,
    fallback within OpenAI, A/B tests, and budgets are all supported
    on the create endpoint, but cross-provider targets, ensemble
    policies, and semantic cache are not: the Responses item taxonomy
    (function_call, web_search_call, computer_call, reasoning,
    encrypted reasoning content, image outputs) has no faithful
    Claude/Gemini mapping and makes cache-key hashing fragile.
    Use `/openai/v1/chat/completions` when you need cross-provider
    routing or semantic cache; use `/openai/v1/responses` when you
    specifically need the Responses shape (reasoning effort, the
    web_search tool, computer use, image outputs, stored / chained
    responses) with the proxy's auth, routing, and observability layer.
[^7]: See [Audio](/docs/api/audio) for models, request shapes, and
    fallback behavior. OpenAI + Azure OpenAI only today.

## Cross-provider translation notes

When you send a request on one surface and modelux routes it to a
provider whose native API is shaped differently, these are the
translations modelux performs and the edges where fidelity is lost:

### Tools
- OpenAI ↔ Anthropic: `tools` ↔ `tools` (input_schema rename), `tool_choice`
  shapes mapped, assistant `tool_calls[]` ↔ assistant `tool_use` blocks,
  `role: "tool"` reply ↔ `tool_result` block.
- ↔ Google: tools become `functionDeclarations`; `tool_choice` becomes
  `toolConfig.functionCallingConfig` mode. Tool call IDs are synthesized
  on the proxy side because Gemini doesn't issue them.
- ↔ Cohere: tool reply content is wrapped in a `document` array (Cohere v2
  requirement); finish reasons are uppercase upstream and normalized to
  the OpenAI vocabulary.
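
As a concrete sketch of the OpenAI ↔ Anthropic mapping in the first
bullet (the helper name is ours; the field names are the two public
APIs'):

```python
# Sketch of the OpenAI -> Anthropic direction only; the real translation
# also covers tool_choice, assistant tool_calls, and tool results.
def openai_tool_to_anthropic(tool: dict) -> dict:
    fn = tool["function"]
    return {
        "name": fn["name"],
        "description": fn.get("description", ""),
        "input_schema": fn["parameters"],  # the "input_schema rename"
    }
```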

### Tool result content
**Text only on the cross-provider path.** OpenAI's `role: "tool"` reply
shape only carries text content, so image-typed `tool_result` blocks
(which Anthropic supports natively) get dropped during translation.
The text portion is preserved.
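
A minimal sketch of that lossy step, assuming list-form `tool_result`
content (illustrative helper, not the proxy's actual code):

```python
def tool_result_to_openai(block: dict) -> dict:
    # Keep the text parts, drop image-typed parts; OpenAI's role:"tool"
    # message can only carry text.
    text = "\n".join(
        part["text"] for part in block["content"] if part["type"] == "text"
    )
    return {"role": "tool", "tool_call_id": block["tool_use_id"], "content": text}
```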

### Extended thinking
Round-tripped verbatim when the upstream is Anthropic or Bedrock-Claude
(signature included so multi-turn reasoning resumes correctly).
Silently ignored by all other providers.

### Vision
Anthropic image source formats (`base64`, `url`) translate to OpenAI's
`image_url` data URI / HTTPS form on the OpenAI cross-provider path.
On Google's path, only base64 sources work — URL-only images get
dropped because Gemini requires inline data.
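
A sketch of the source mapping on the OpenAI path (illustrative
helper; the block shapes are the two public APIs'):

```python
def anthropic_image_to_openai(source: dict) -> dict | None:
    # base64 sources become a data URI; url sources pass through as HTTPS.
    if source["type"] == "base64":
        uri = f"data:{source['media_type']};base64,{source['data']}"
    elif source["type"] == "url":
        uri = source["url"]
    else:
        return None  # unknown source type: dropped, mirroring the proxy
    return {"type": "image_url", "image_url": {"url": uri}}
```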

### Stop reasons / finish reasons
Each provider's native stop reasons normalize into the OpenAI
vocabulary internally (`stop`, `length`, `tool_calls`, `content_filter`),
then translate back into Anthropic's vocabulary (`end_turn`,
`max_tokens`, `tool_use`) on the response side. SDK consumers see
exactly the values their SDK expects.
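
The mapping is small enough to sketch as two lookup tables
(illustrative, and covering only the values named above;
`content_filter` has no single Anthropic counterpart listed here):

```python
ANTHROPIC_TO_OPENAI = {
    "end_turn": "stop",
    "max_tokens": "length",
    "tool_use": "tool_calls",
}
OPENAI_TO_ANTHROPIC = {v: k for k, v in ANTHROPIC_TO_OPENAI.items()}

assert OPENAI_TO_ANTHROPIC["tool_calls"] == "tool_use"
```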

### Streaming event taxonomy
The OpenAI surface emits OpenAI-format SSE chunks (`chat.completion.chunk`).
The Anthropic surface emits Anthropic's full event taxonomy
(`message_start` → `content_block_start` → `content_block_delta` →
`content_block_stop` → `message_delta` → `message_stop`), with
`thinking_delta`/`signature_delta` for extended-thinking and
`input_json_delta` for tool argument streaming. Both shapes are
rebuilt from the same internal stream regardless of which provider
served the chunks.
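
A sketch of what a caller on the Anthropic surface observes, assuming
a hypothetical base URL and a placeholder key:

```python
import httpx

with httpx.stream(
    "POST",
    "https://api.modelux.ai/anthropic/v1/messages",  # hypothetical base URL
    headers={"x-api-key": "mlx-..."},  # placeholder key
    json={"model": "claude-sonnet-4-5", "max_tokens": 128, "stream": True,
          "messages": [{"role": "user", "content": "hi"}]},
) as resp:
    for line in resp.iter_lines():
        if line.startswith("event:"):
            # message_start, content_block_start, content_block_delta, ...
            print(line.removeprefix("event:").strip())
```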

### Native prompt caching (Anthropic)

Anthropic's `cache_control` field on a content block — typically
`{"type":"ephemeral"}` on a long system prompt or tool definition you
expect to reuse — passes through the proxy verbatim. Set it on the
caller side and you get Anthropic's native prompt-cache discount on
the next matching request, additive to modelux's semantic cache:

```json
{
  "model": "claude-sonnet-4-5",
  "messages": [{
    "role": "user",
    "content": [
      {"type": "text", "text": "long context to cache", "cache_control": {"type": "ephemeral"}},
      {"type": "text", "text": "Question about that context."}
    ]
  }]
}
```

Honored when the upstream is Anthropic or Bedrock-Claude. Other
providers ignore it.

Cache hit / priming token counts land in the request log row
(`cache_read_tokens`, `cache_creation_tokens`) so dashboards can
answer "did my cache_control tag actually hit?".

### Message batches (async)

Both providers' batch surfaces are proxied as authenticated thin
passthroughs: the proxy applies auth, rate limits, and entitlements
via the customer's modelux key, forwards the request upstream with
the org's stored credential (or BYOK via `X-Modelux-Provider-Key`),
and logs to ClickHouse so batch traffic shows up in the dashboard
alongside synchronous traffic. Per-sub-request analytics inside a
batch are deferred; query the results file for those. Each provider's
async discount (50% for both Anthropic and OpenAI) applies on the
upstream side untouched.

**Anthropic** (`/anthropic/v1/messages/batches/*`):

| Method | Path |
|---|---|
| `POST` | `/anthropic/v1/messages/batches` — create |
| `GET`  | `/anthropic/v1/messages/batches` — list (with `limit`, `after_id`, `before_id`) |
| `GET`  | `/anthropic/v1/messages/batches/{id}` — retrieve |
| `GET`  | `/anthropic/v1/messages/batches/{id}/results` — JSONL results stream |
| `POST` | `/anthropic/v1/messages/batches/{id}/cancel` |
| `DELETE` | `/anthropic/v1/messages/batches/{id}` |
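
For example, creating a batch through the proxy (base URL and key are
placeholders; the body shape is Anthropic's batches API):

```python
import httpx

resp = httpx.post(
    "https://api.modelux.ai/anthropic/v1/messages/batches",
    headers={"x-api-key": "mlx-..."},
    json={"requests": [{
        "custom_id": "req-1",
        "params": {"model": "claude-sonnet-4-5", "max_tokens": 256,
                   "messages": [{"role": "user", "content": "hello"}]},
    }]},
)
batch = resp.json()
print(batch["id"], batch["processing_status"])
```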

**OpenAI** (`/openai/v1/batches/*` + `/openai/v1/files/*`):

| Method | Path |
|---|---|
| `POST` | `/openai/v1/files` — upload batch input (multipart/form-data) |
| `GET`  | `/openai/v1/files` — list (with `purpose`, `limit`, `order`, `after`) |
| `GET`  | `/openai/v1/files/{id}` — retrieve metadata |
| `GET`  | `/openai/v1/files/{id}/content` — download bytes (results) |
| `DELETE` | `/openai/v1/files/{id}` |
| `POST` | `/openai/v1/batches` — create (references `input_file_id`) |
| `GET`  | `/openai/v1/batches` — list (with `limit`, `after`) |
| `GET`  | `/openai/v1/batches/{id}` — retrieve |
| `POST` | `/openai/v1/batches/{id}/cancel` |

OpenAI batches reference uploaded files by id, so `/v1/files` is also
proxied (same auth/BYOK/observability story).
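
The flow end to end, sketched (base URL and key are placeholders; the
field names are OpenAI's batch API):

```python
import httpx

base = "https://api.modelux.ai/openai/v1"  # hypothetical base URL
headers = {"Authorization": "Bearer mlx-..."}  # placeholder key

# 1. Upload the JSONL input with purpose=batch.
with open("input.jsonl", "rb") as f:
    up = httpx.post(f"{base}/files", headers=headers,
                    data={"purpose": "batch"}, files={"file": f})

# 2. Create a batch that references the uploaded file's id.
batch = httpx.post(f"{base}/batches", headers=headers, json={
    "input_file_id": up.json()["id"],
    "endpoint": "/v1/chat/completions",
    "completion_window": "24h",
}).json()
print(batch["id"], batch["status"])
```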

### Responses API (OpenAI)

The full `/v1/responses` surface is proxied:

| Method | Path |
|---|---|
| `POST` | `/openai/v1/responses` — create (sync or `stream:true` SSE) |
| `GET`  | `/openai/v1/responses/{id}` — retrieve (or `?stream=true` to replay) |
| `POST` | `/openai/v1/responses/{id}/cancel` — cancel a background response |
| `DELETE` | `/openai/v1/responses/{id}` |
| `GET`  | `/openai/v1/responses/{id}/input_items` — list input items |

Same auth, BYOK, and observability story as batches. Streaming
requests relay SSE events (`response.created` → deltas →
`response.completed`) through to the caller without buffering. The
proxy extracts usage (`input_tokens`, `output_tokens`, `cached_tokens`)
from the terminal `response.completed` event for analytics. The
Responses request shape isn't translated cross-provider — see the
matrix footnote for why.

On create, `@config` in the `model` field resolves through the
standard router: single-model, fallback_chain, and ab_test policies
all work (they yield one or more OpenAI targets that the proxy tries
in order on 5xx / 429 / transport failures). Non-OpenAI targets and
ensemble policies are rejected before any upstream call. See the
[Responses page](/docs/api/openai-responses) for examples.
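
For instance, a create call whose `model` is a routing config
(`@prod-chat` is a hypothetical config name; base URL and key are
placeholders):

```python
import httpx

resp = httpx.post(
    "https://api.modelux.ai/openai/v1/responses",  # hypothetical base URL
    headers={"Authorization": "Bearer mlx-..."},
    json={"model": "@prod-chat", "input": "Summarize our Q3 roadmap."},
)
body = resp.json()
print(body["id"], body["status"], body["model"])  # the model actually served
```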

### Files (Anthropic)

The Anthropic Files API beta is proxied alongside batches. Use it to
upload large documents/images once and reference them by id from
`/v1/messages` content blocks (instead of inlining base64 every turn).

| Method | Path |
|---|---|
| `POST` | `/anthropic/v1/files` — upload (multipart/form-data) |
| `GET`  | `/anthropic/v1/files` — list (with `limit`, `before_id`, `after_id`) |
| `GET`  | `/anthropic/v1/files/{id}` — metadata |
| `GET`  | `/anthropic/v1/files/{id}/content` — download (API-generated files only) |
| `DELETE` | `/anthropic/v1/files/{id}` |

The proxy forwards the caller's `anthropic-beta` header(s) verbatim
so the SDK's beta-tag declaration reaches the upstream — meaning the
proxy doesn't need to track which beta tag is current.
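
A sketch of the upload-then-reference flow (base URL, key, and beta
tag are placeholders; the `document`/`file` content-block shape is
Anthropic's Files beta):

```python
import httpx

base = "https://api.modelux.ai/anthropic/v1"  # hypothetical base URL
headers = {"x-api-key": "mlx-...",  # placeholder key
           "anthropic-beta": "files-api-2025-04-14"}

# Upload once...
with open("report.pdf", "rb") as f:
    up = httpx.post(f"{base}/files", headers=headers,
                    files={"file": ("report.pdf", f, "application/pdf")})
file_id = up.json()["id"]

# ...then reference by id instead of inlining base64 every turn.
msg = httpx.post(f"{base}/messages", headers=headers, json={
    "model": "claude-sonnet-4-5", "max_tokens": 512,
    "messages": [{"role": "user", "content": [
        {"type": "document", "source": {"type": "file", "file_id": file_id}},
        {"type": "text", "text": "Summarize this report."},
    ]}],
})
print(msg.json()["content"][0]["text"])
```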

## What's not in the matrix

The matrix is currently complete for both proxy surfaces. If you spot
a gap, file an issue.
