Capability matrix
modelux exposes two proxy surfaces — OpenAI shape and Anthropic shape — and routes requests across many upstream providers. The matrices below show which features work on which combination, and why the empty cells are empty.
Proxy surfaces
What you can send to each endpoint:
| Feature | /openai/v1/* | /anthropic/v1/* |
|---|---|---|
| Chat / messages | ✓ | ✓ |
| Streaming (SSE) | ✓ | ✓ |
| Tool / function calling | ✓ | ✓ |
| Streaming tool calls | ✓ | ✓ |
| Vision input | ✓ | ✓ |
| Extended thinking (Claude) | n/a | ✓ |
| count_tokens | n/a | ✓ |
| Responses API (/v1/responses) | ✓ 1 | n/a |
| Message batches (async) | ✓ | ✓ |
| Files (upload/download) | ✓ | ✓ 2 |
| Embeddings | ✓ | n/a 3 |
| Image generation | ✓ | n/a |
| Speech (TTS) | ✓ 4 | n/a |
| Transcription (STT) | ✓ 4 | n/a |
| BYOK passthrough (X-Modelux-Provider-Key) | ✓ | ✓ |
| Semantic cache | ✓ | ✓ |
| Dry-run (X-Modelux-Dry-Run) | ✓ | ✓ |
| Cache-Control: no-store opt-out | ✓ | ✓ |
| GET /v1/models | ✓ | ✓ |
| Auth via Authorization: Bearer | ✓ | ✓ |
| Auth via x-api-key (SDK drop-in) | n/a | ✓ |
The semantic cache index is shared across surfaces: an entry stored
from /openai/v1/chat/completions with model gpt-4o-mini and a given
prompt serves a hit for the same model + prompt arriving on
/anthropic/v1/messages. The cached response is re-serialized into
whichever wire shape the calling endpoint expects.
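A minimal sketch of that cross-surface hit, assuming a hypothetical modelux deployment at `https://modelux.example.com` and a `mlx-...` key format (both placeholders, substitute your own):

```python
from openai import OpenAI
from anthropic import Anthropic

# Both SDKs point at the proxy; the key is a modelux key, not a provider key.
openai_client = OpenAI(
    base_url="https://modelux.example.com/openai/v1",
    api_key="mlx-...",
)
anthropic_client = Anthropic(
    base_url="https://modelux.example.com/anthropic",
    api_key="mlx-...",
)

# First call populates the semantic cache.
openai_client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What is a monad?"}],
)

# Same model + prompt on the other surface can be served from that cache
# entry, re-serialized into the Anthropic wire shape.
anthropic_client.messages.create(
    model="gpt-4o-mini",
    max_tokens=1024,
    messages=[{"role": "user", "content": "What is a monad?"}],
)
```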
Upstream providers
What each upstream supports through the proxy. Cells marked drop mean the proxy silently skips that part of the request rather than failing — so a multimodal request with an image can still reach a provider that doesn’t accept images, just without the image.
| Provider | Chat | Streaming | Tools | Streaming tools | Vision | Extended thinking | Embeddings |
|---|---|---|---|---|---|---|---|
| OpenAI | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ | ✓ |
| Anthropic | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ |
| Google (Gemini) | ✓ | ✓ | ✓ | ✓ | base64 only 5 | ✗ | ✗ 6 |
| AWS Bedrock (Claude) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ |
| Cohere | ✓ | ✓ | ✓ | ✓ | drop | ✗ | ✓ |
| Azure OpenAI | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ | ✓ |
| Groq, Fireworks, DeepSeek, xAI, Mistral, Cerebras, Together, Perplexity | ✓ | ✓ | ✓ | ✓ | ✓ 7 | ✗ | ✗ |
Cross-provider translation notes
When you send a request on one surface and modelux routes it to a provider whose native API is shaped differently, these are the translations modelux performs and the edges where fidelity is lost:
Tools
- OpenAI ↔ Anthropic: `tools` ↔ `tools` (`input_schema` rename), `tool_choice` shapes mapped, assistant `tool_calls[]` ↔ assistant `tool_use` blocks, `role: "tool"` reply ↔ `tool_result` block (sketched after this list).
- ↔ Google: tools become `functionDeclarations`; `tool_choice` becomes the `toolConfig.functionCallingConfig` mode. Tool call IDs are synthesized on the proxy side because Gemini doesn’t issue them.
- ↔ Cohere: tool reply content is wrapped in a `document` array (a Cohere v2 requirement); finish reasons are uppercase upstream and normalized to the OpenAI vocabulary.
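To make the first mapping concrete, here is one tool definition in both wire shapes (the `get_weather` tool is a made-up example; the field layouts are the providers' documented ones):

```python
# OpenAI shape: schema lives under function.parameters.
openai_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {  # JSON Schema
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

# Anthropic shape: same JSON Schema, under the renamed input_schema field.
anthropic_tool = {
    "name": "get_weather",
    "description": "Look up current weather for a city.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}
```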
Tool result content
Text only on the cross-provider path. OpenAI’s role: "tool" reply
shape only carries text content, so image-typed tool_result blocks
(which Anthropic supports natively) get dropped during translation.
The text portion is preserved.
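Concretely, an Anthropic-shape tool result carrying text plus an image loses the image on this path, because the OpenAI tool message carries a flat string (IDs below are illustrative):

```python
# Anthropic tool_result with mixed content...
anthropic_tool_result = {
    "role": "user",
    "content": [{
        "type": "tool_result",
        "tool_use_id": "toolu_01A...",  # example id
        "content": [
            {"type": "text", "text": "Chart rendered. Revenue up 12%."},
            {"type": "image", "source": {
                "type": "base64", "media_type": "image/png", "data": "iVBOR...",
            }},
        ],
    }],
}

# ...becomes a text-only OpenAI tool message; the image block is dropped.
openai_tool_message = {
    "role": "tool",
    "tool_call_id": "toolu_01A...",
    "content": "Chart rendered. Revenue up 12%.",
}
```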
Extended thinking
Round-tripped verbatim when the upstream is Anthropic or Bedrock-Claude (signature included so multi-turn reasoning resumes correctly). Silently ignored by all other providers.
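A sketch of enabling it through the Anthropic surface (host and key hypothetical, as before); the `thinking` parameter is Anthropic's documented shape and passes through untouched:

```python
from anthropic import Anthropic

client = Anthropic(
    base_url="https://modelux.example.com/anthropic", api_key="mlx-..."
)

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=16000,  # must exceed the thinking budget
    thinking={"type": "enabled", "budget_tokens": 8000},
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
)

# Thinking blocks come back signed; echo them verbatim on the next turn
# so multi-turn reasoning resumes correctly.
for block in response.content:
    if block.type == "thinking":
        print(block.thinking)
```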
Vision
Anthropic image source formats (base64, url) translate to OpenAI’s
image_url data URI / HTTPS form on the OpenAI cross-provider path.
On Google’s path, only base64 sources work — URL-only images get
dropped because Gemini requires inline data.
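For example, an image block that survives every path, Google's included, because the source is already base64 (file name hypothetical):

```python
import base64

with open("diagram.png", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

# Anthropic-shape image block; translated to a data-URI image_url for
# OpenAI-compatible upstreams and to inline data for Gemini.
image_block = {
    "type": "image",
    "source": {"type": "base64", "media_type": "image/png", "data": b64},
}
```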
Stop reasons / finish reasons
Each provider’s native stop reasons normalize into the OpenAI
vocabulary internally (stop, length, tool_calls, content_filter),
then translate back into Anthropic’s vocabulary (end_turn,
max_tokens, tool_use) on the response side. SDK consumers see
exactly the values their SDK expects.
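An illustrative sketch of the two steps (the actual internal table isn't published; these pairs just reflect the vocabularies named above):

```python
# Step 1: normalize provider-native finish reasons to the OpenAI vocabulary.
TO_OPENAI = {
    "end_turn": "stop",        # Anthropic
    "max_tokens": "length",    # Anthropic
    "tool_use": "tool_calls",  # Anthropic
    "COMPLETE": "stop",        # Cohere (uppercase upstream)
    "MAX_TOKENS": "length",    # Cohere
}

# Step 2: re-translate on the response side when the caller used /anthropic/v1/*.
TO_ANTHROPIC = {
    "stop": "end_turn",
    "length": "max_tokens",
    "tool_calls": "tool_use",
}
```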
Streaming event taxonomy
The OpenAI surface emits OpenAI-format SSE chunks (chat.completion.chunk).
The Anthropic surface emits Anthropic’s full event taxonomy
(message_start → content_block_start → content_block_delta →
content_block_stop → message_delta → message_stop), with
thinking_delta/signature_delta for extended-thinking and
input_json_delta for tool argument streaming. Both shapes are
rebuilt from the same internal stream regardless of which provider
served the chunks.
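A sketch of consuming that taxonomy through the Anthropic SDK's streaming helper (host and key hypothetical):

```python
from anthropic import Anthropic

client = Anthropic(
    base_url="https://modelux.example.com/anthropic", api_key="mlx-..."
)

with client.messages.stream(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Stream me a haiku."}],
) as stream:
    for event in stream:
        # Events arrive in the order listed above regardless of which
        # upstream provider actually served the tokens.
        if event.type == "content_block_delta" and event.delta.type == "text_delta":
            print(event.delta.text, end="", flush=True)
```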
Native prompt caching (Anthropic)
Anthropic’s cache_control field on a content block — typically
{"type":"ephemeral"} on a long system prompt or tool definition you
expect to reuse — passes through the proxy verbatim. Set it on the
caller side and you get Anthropic’s native prompt-cache discount on
the next matching request, additive to modelux’s semantic cache:
```json
{
  "model": "claude-sonnet-4-5",
  "messages": [{
    "role": "user",
    "content": [
      {"type": "text", "text": "long context to cache", "cache_control": {"type": "ephemeral"}},
      {"type": "text", "text": "Question about that context."}
    ]
  }]
}
```
Honored when the upstream is Anthropic or Bedrock-Claude. Other providers ignore it.
Cache hit / priming token counts land in the request log row
(cache_read_tokens, cache_creation_tokens) so dashboards can
answer “did my cache_control tag actually hit?”.
Message batches (async)
Both providers’ batch surfaces are proxied as authenticated thin
passthroughs — auth + rate-limit + entitlements via the customer’s
modelux key, forwarded to the upstream with the org’s stored
credential (or BYOK via X-Modelux-Provider-Key), and logged to
ClickHouse so batch traffic shows up in the dashboard alongside
synchronous traffic. Per-sub-request analytics inside a batch is
deferred — query the results file for those. Each provider’s async
discount (Anthropic 50%, OpenAI 50%) applies on the upstream side
untouched.
Anthropic (/anthropic/v1/messages/batches/*):
| Method | Path |
|---|---|
| POST | /anthropic/v1/messages/batches — create |
| GET | /anthropic/v1/messages/batches — list (with limit, after_id, before_id) |
| GET | /anthropic/v1/messages/batches/{id} — retrieve |
| GET | /anthropic/v1/messages/batches/{id}/results — JSONL results stream |
| POST | /anthropic/v1/messages/batches/{id}/cancel |
| DELETE | /anthropic/v1/messages/batches/{id} |
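Through the proxy this is ordinary SDK usage. A sketch, reusing the hypothetical host and key from earlier examples:

```python
from anthropic import Anthropic

client = Anthropic(
    base_url="https://modelux.example.com/anthropic", api_key="mlx-..."
)

batch = client.messages.batches.create(
    requests=[{
        "custom_id": "req-1",
        "params": {
            "model": "claude-sonnet-4-5",
            "max_tokens": 256,
            "messages": [{"role": "user", "content": "Summarize: ..."}],
        },
    }]
)
print(batch.id, batch.processing_status)

# Once processing ends, page through the JSONL results stream.
for result in client.messages.batches.results(batch.id):
    print(result.custom_id, result.result.type)
```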
OpenAI (/openai/v1/batches/* + /openai/v1/files/*):
| Method | Path |
|---|---|
| POST | /openai/v1/files — upload batch input (multipart/form-data) |
| GET | /openai/v1/files — list (with purpose, limit, order, after) |
| GET | /openai/v1/files/{id} — retrieve metadata |
| GET | /openai/v1/files/{id}/content — download bytes (results) |
| DELETE | /openai/v1/files/{id} |
| POST | /openai/v1/batches — create (references input_file_id) |
| GET | /openai/v1/batches — list (with limit, after) |
| GET | /openai/v1/batches/{id} — retrieve |
| POST | /openai/v1/batches/{id}/cancel |
OpenAI batches reference uploaded files by id, so /v1/files is also
proxied (same auth/BYOK/observability story).
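The equivalent flow with the OpenAI SDK pointed at the proxy (host and key hypothetical; batch.jsonl is a local JSONL file of sub-requests):

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://modelux.example.com/openai/v1", api_key="mlx-..."
)

# 1. Upload the JSONL input through the proxied /v1/files endpoint.
batch_input = client.files.create(file=open("batch.jsonl", "rb"), purpose="batch")

# 2. Create the batch referencing the uploaded file.
batch = client.batches.create(
    input_file_id=batch_input.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)

# 3. Poll, then download results through /v1/files/{id}/content.
batch = client.batches.retrieve(batch.id)
if batch.status == "completed":
    results = client.files.content(batch.output_file_id)
    print(results.text)
```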
Responses API (OpenAI)
The full /v1/responses surface is proxied:
| Method | Path |
|---|---|
| POST | /openai/v1/responses — create (sync or stream:true SSE) |
| GET | /openai/v1/responses/{id} — retrieve (or ?stream=true to replay) |
| POST | /openai/v1/responses/{id}/cancel — cancel a background response |
| DELETE | /openai/v1/responses/{id} |
| GET | /openai/v1/responses/{id}/input_items — list input items |
Same auth + BYOK + observability story as batches. Streaming requests
relay SSE events (response.created → deltas → response.completed)
through to the caller without buffering. The proxy extracts usage
(input_tokens, output_tokens, cached_tokens) from the terminal
response.completed event for analytics. The Responses request shape
isn’t translated cross-provider — see footnote on the matrix for why.
On create, @config in the model field resolves through the
standard router: single-model, fallback_chain, and ab_test policies
all work (they yield one or more OpenAI targets that the proxy tries
in order on 5xx / 429 / transport failures). Non-OpenAI targets and
ensemble policies are rejected before any upstream call. See the
Responses page for examples.
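For orientation, a minimal create through the proxy (host and key hypothetical; `@prod-fallback` stands in for whatever config name you've defined):

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://modelux.example.com/openai/v1", api_key="mlx-..."
)

# "@config" in the model field resolves through the standard router; here
# a hypothetical fallback_chain config with OpenAI-only targets.
response = client.responses.create(
    model="@prod-fallback",
    input="Summarize the attached notes in three bullets.",
)
print(response.output_text)
```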
Files (Anthropic)
The Anthropic Files API beta is proxied alongside batches. Use it to
upload large documents/images once and reference them by id from
/v1/messages content blocks (instead of inlining base64 every turn).
| Method | Path |
|---|---|
| POST | /anthropic/v1/files — upload (multipart/form-data) |
| GET | /anthropic/v1/files — list (with limit, before_id, after_id) |
| GET | /anthropic/v1/files/{id} — metadata |
| GET | /anthropic/v1/files/{id}/content — download (API-generated files only) |
| DELETE | /anthropic/v1/files/{id} |
The proxy forwards the caller’s anthropic-beta header(s) verbatim
so the SDK’s beta-tag declaration reaches the upstream — meaning the
proxy doesn’t need to track which beta tag is current.
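A sketch of an upload using plain `requests` (host and key hypothetical; the beta tag is whatever your SDK currently declares, per footnote 2):

```python
import requests

with open("report.pdf", "rb") as f:
    resp = requests.post(
        "https://modelux.example.com/anthropic/v1/files",
        headers={
            "x-api-key": "mlx-...",                    # modelux key (SDK drop-in auth)
            "anthropic-beta": "files-api-2025-04-14",  # forwarded verbatim
        },
        files={"file": ("report.pdf", f, "application/pdf")},
    )
file_id = resp.json()["id"]

# Reference it from a /v1/messages content block instead of re-inlining
# base64 every turn:
# {"type": "document", "source": {"type": "file", "file_id": file_id}}
```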
What’s not in the matrix
The matrix is currently complete for both proxy surfaces. If you spot a gap, file an issue.
Footnotes
1. The Responses API is OpenAI-specific. `@config` routing, fallback-within-OpenAI, A/B tests, and budgets are all supported on the create endpoint — but cross-provider targets, ensemble policies, and semantic cache are not (the Responses item taxonomy includes function_call, web_search_call, computer_call, reasoning, encrypted reasoning content, and image outputs that have no faithful Claude/Gemini mapping and make cache-key hashing fragile). Use `/openai/v1/chat/completions` when you need cross-provider routing or semantic cache; use `/openai/v1/responses` when you specifically need the Responses shape (reasoning effort, the web_search tool, computer use, image outputs, stored / chained responses) with the proxy's auth + routing + observability layer.
2. Anthropic's Files API is currently a beta on the upstream side — pass `anthropic-beta: files-api-2025-04-14` (or the current tag your SDK uses) and the proxy forwards it untouched. The download endpoint is restricted to API-generated files (e.g., outputs from computer-use tools); user-uploaded files come back with an “is not downloadable” error from Anthropic.
3. Anthropic doesn't have an embeddings endpoint upstream, so there's nothing for the proxy to translate to. Use `/openai/v1/embeddings` (which routes to OpenAI, Cohere, Google, or Voyage) regardless of which surface your chat traffic is on.
4. See Audio for models, request shapes, and fallback behavior. OpenAI + Azure OpenAI only today.
5. Gemini accepts only base64-encoded `inlineData` for images. URL-source images get dropped on the cross-provider path.
6. Gemini does have an embeddings endpoint (`/v1beta/models/...:embedContent`), but its request shape diverges from OpenAI's enough that we haven't built the adapter yet.
7. All OpenAI-compatible providers; vision works wherever the underlying model does. Some smaller providers don't ship vision-capable models.