# Capability matrix

modelux exposes two proxy surfaces — OpenAI shape and Anthropic shape — and routes requests across many upstream providers. The matrices below show which features work on which combination, and why the empty cells are empty.

## Proxy surfaces

What you can send to each endpoint:

| Feature | `/openai/v1/*` | `/anthropic/v1/*` |
|---|---|---|
| Chat / messages | ✅ | ✅ |
| Streaming (SSE) | ✅ | ✅ |
| Tool / function calling | ✅ | ✅ |
| Streaming tool calls | ✅ | ✅ |
| Vision input | ✅ | ✅ |
| Extended thinking (Claude) | n/a | ✅ |
| `count_tokens` | n/a | ✅ |
| Responses API (`/v1/responses`) | ✅ ¹ | n/a |
| Message batches (async) | ✅ | ✅ |
| Files (upload/download) | ✅ | ✅ ² |
| Embeddings | ✅ | n/a ³ |
| Image generation | ✅ | n/a |
| Speech (TTS) | ✅ ⁴ | n/a |
| Transcription (STT) | ✅ ⁴ | n/a |
| BYOK passthrough (`X-Modelux-Provider-Key`) | ✅ | ✅ |
| Semantic cache | ✅ | ✅ |
| Dry-run (`X-Modelux-Dry-Run`) | ✅ | ✅ |
| `Cache-Control: no-store` opt-out | ✅ | ✅ |
| `GET /v1/models` | ✅ | ✅ |
| Auth via `Authorization: Bearer` | ✅ | ✅ |
| Auth via `x-api-key` (SDK drop-in) | n/a | ✅ |

The semantic cache index is shared across surfaces: an entry stored from `/openai/v1/chat/completions` with model `gpt-4o-mini` and a given prompt serves a hit for the same model + prompt arriving on `/anthropic/v1/messages`. The cached response is re-serialized into whichever wire shape the calling endpoint expects.
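
A minimal sketch of that round trip, assuming the proxy is reachable at the hypothetical host `api.modelux.example` and that one modelux key (`mlx-...`, placeholder) is valid on both surfaces:

```python
from anthropic import Anthropic
from openai import OpenAI

# Hypothetical proxy host; substitute your deployment's base URL.
openai_client = OpenAI(
    base_url="https://api.modelux.example/openai/v1",
    api_key="mlx-...",  # modelux key (placeholder)
)
anthropic_client = Anthropic(
    base_url="https://api.modelux.example/anthropic",  # SDK appends /v1/messages
    api_key="mlx-...",
)

PROMPT = "Explain idempotency keys in one paragraph."

# Populates the shared semantic cache under (model, prompt).
openai_client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": PROMPT}],
)

# Same model + prompt on the other surface: eligible for a cache hit,
# re-serialized into the Anthropic wire shape.
anthropic_client.messages.create(
    model="gpt-4o-mini",
    max_tokens=512,
    messages=[{"role": "user", "content": PROMPT}],
)
```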

## Upstream providers

What each upstream supports through the proxy. Cells marked *drop* mean the proxy silently skips that part of the request rather than failing — so a multi-modal request with an image can still reach a provider that doesn’t accept images, just without the image.

| Provider | Chat | Streaming | Tools | Streaming tools | Vision | Extended thinking | Embeddings |
|---|---|---|---|---|---|---|---|
| OpenAI | ✅ | ✅ | ✅ | ✅ | ✅ | | ✅ |
| Anthropic | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | |
| Google (Gemini) | ✅ | ✅ | ✅ | ✅ | base64 only ⁵ | | ⁶ |
| AWS Bedrock (Claude) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | |
| Cohere | ✅ | ✅ | ✅ | ✅ | drop | | ✅ |
| Azure OpenAI | ✅ | ✅ | ✅ | ✅ | ✅ | | ✅ |
| Groq, Fireworks, DeepSeek, xAI, Mistral, Cerebras, Together, Perplexity ⁷ | ✅ | ✅ | ✅ | ✅ | ✅ | | |

## Cross-provider translation notes

When you send a request on one surface and modelux routes it to a provider whose native API is shaped differently, these are the translations modelux performs and the edges where fidelity is lost:

### Tools

- **OpenAI ↔ Anthropic:** `tools` ↔ `tools` (the function’s JSON Schema moves between `parameters` and `input_schema`), `tool_choice` shapes are mapped, assistant `tool_calls[]` ↔ assistant `tool_use` blocks, and a `role: "tool"` reply ↔ a `tool_result` block (sketched below).
- **↔ Google:** `tools` become `functionDeclarations`; `tool_choice` becomes the `toolConfig.functionCallingConfig` mode. Tool call IDs are synthesized on the proxy side because Gemini doesn’t issue them.
- **↔ Cohere:** tool reply content is wrapped in a `document` array (a Cohere v2 requirement); finish reasons arrive uppercase from the upstream and are normalized to the OpenAI vocabulary.
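
A minimal before/after of the `tools` rename on the OpenAI ↔ Anthropic path. Both dict literals follow the providers' documented wire shapes; the weather tool itself is invented for illustration:

```python
# OpenAI function-tool definition (documented OpenAI shape).
openai_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {  # JSON Schema
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

# The same tool in Anthropic's shape: flattened, with the identical
# JSON Schema carried under input_schema instead of parameters.
anthropic_tool = {
    "name": "get_weather",
    "description": "Look up current weather for a city.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}
```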

### Tool result content

Text only on the cross-provider path. OpenAI’s `role: "tool"` reply shape only carries text content, so image-typed `tool_result` blocks (which Anthropic supports natively) get dropped during translation. The text portion is preserved.
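
For example, a mixed text-plus-image `tool_result` on the Anthropic side comes out as a text-only `role: "tool"` message. The shapes follow the providers' documented formats; the ID and content are illustrative:

```python
# Anthropic-side tool result carrying text and an image.
anthropic_tool_result = {
    "type": "tool_result",
    "tool_use_id": "toolu_01ABC",  # illustrative ID
    "content": [
        {"type": "text", "text": "Rendered the chart."},
        {"type": "image", "source": {"type": "base64",
                                     "media_type": "image/png",
                                     "data": "iVBORw0KGgo..."}},
    ],
}

# What survives translation to OpenAI's role:"tool" reply: text only.
openai_tool_reply = {
    "role": "tool",
    "tool_call_id": "toolu_01ABC",
    "content": "Rendered the chart.",
}
```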

### Extended thinking

Round-tripped verbatim when the upstream is Anthropic or Bedrock-Claude (signature included so multi-turn reasoning resumes correctly). Silently ignored by all other providers.
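
For reference, the block being round-tripped is Anthropic's documented thinking shape (values here are illustrative):

```python
# Round-tripped verbatim on Anthropic / Bedrock-Claude paths; the opaque
# signature must be sent back unmodified for multi-turn reasoning to resume.
thinking_block = {
    "type": "thinking",
    "thinking": "Considering the edge cases first...",  # illustrative
    "signature": "EuYBCkQYAiJA...",                     # opaque, truncated here
}
```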

### Vision

Anthropic image source formats (`base64`, `url`) translate to OpenAI’s `image_url` data URI / HTTPS form on the OpenAI cross-provider path. On Google’s path, only `base64` sources work — URL-only images get dropped because Gemini requires inline data.
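
Concretely, the base64 case maps like this (both dicts are the providers' documented image shapes; the payload is truncated):

```python
# Anthropic base64 image block.
anthropic_image = {
    "type": "image",
    "source": {
        "type": "base64",
        "media_type": "image/png",
        "data": "iVBORw0KGgo...",
    },
}

# What the proxy sends on the OpenAI cross-provider path: a data URI.
openai_image = {
    "type": "image_url",
    "image_url": {"url": "data:image/png;base64,iVBORw0KGgo..."},
}
```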

### Stop reasons / finish reasons

Each provider’s native stop reasons normalize into the OpenAI vocabulary internally (`stop`, `length`, `tool_calls`, `content_filter`), then translate back into Anthropic’s vocabulary (`end_turn`, `max_tokens`, `tool_use`) on the response side. SDK consumers see exactly the values their SDK expects.
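
Spelled out as literal tables, a sketch of the mapping rather than the proxy's actual source:

```python
# Normalization pairs named above; content_filter is part of the internal
# OpenAI vocabulary but has no Anthropic counterpart listed here.
ANTHROPIC_TO_OPENAI = {
    "end_turn": "stop",
    "max_tokens": "length",
    "tool_use": "tool_calls",
}
OPENAI_TO_ANTHROPIC = {value: key for key, value in ANTHROPIC_TO_OPENAI.items()}

assert OPENAI_TO_ANTHROPIC["tool_calls"] == "tool_use"
```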

### Streaming event taxonomy

The OpenAI surface emits OpenAI-format SSE chunks (`chat.completion.chunk`). The Anthropic surface emits Anthropic’s full event taxonomy (`message_start` → `content_block_start` → `content_block_delta` → `content_block_stop` → `message_delta` → `message_stop`), with `thinking_delta`/`signature_delta` for extended thinking and `input_json_delta` for tool argument streaming. Both shapes are rebuilt from the same internal stream regardless of which provider served the chunks.
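
One way to see both shapes side by side, using the same hypothetical proxy host and placeholder key as the earlier sketches:

```python
from anthropic import Anthropic
from openai import OpenAI

openai_client = OpenAI(base_url="https://api.modelux.example/openai/v1", api_key="mlx-...")
anthropic_client = Anthropic(base_url="https://api.modelux.example/anthropic", api_key="mlx-...")

# OpenAI surface: a stream of chat.completion.chunk objects.
for chunk in openai_client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "hi"}],
    stream=True,
):
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="")

# Anthropic surface: the full event taxonomy.
with anthropic_client.messages.stream(
    model="claude-sonnet-4-5",
    max_tokens=128,
    messages=[{"role": "user", "content": "hi"}],
) as stream:
    for event in stream:
        # message_start, content_block_start, content_block_delta, ...,
        # message_stop (the SDK also interleaves a few helper event types).
        print(event.type)
```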

## Native prompt caching (Anthropic)

Anthropic’s `cache_control` field on a content block — typically `{"type":"ephemeral"}` on a long system prompt or tool definition you expect to reuse — passes through the proxy verbatim. Set it on the caller side and you get Anthropic’s native prompt-cache discount on the next matching request, additive to modelux’s semantic cache:

```json
{
  "model": "claude-sonnet-4-5",
  "messages": [{
    "role": "user",
    "content": [
      {"type": "text", "text": "long context to cache", "cache_control": {"type": "ephemeral"}},
      {"type": "text", "text": "Question about that context."}
    ]
  }]
}
```

Honored when the upstream is Anthropic or Bedrock-Claude. Other providers ignore it.

Cache hit / priming token counts land in the request log row (`cache_read_tokens`, `cache_creation_tokens`) so dashboards can answer “did my `cache_control` tag actually hit?”.
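
The same counters appear in Anthropic's response `usage` block, so you can sanity-check a tag straight from the client (hypothetical host, placeholder key):

```python
from anthropic import Anthropic

client = Anthropic(base_url="https://api.modelux.example/anthropic", api_key="mlx-...")

resp = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=256,
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "long context to cache",
             "cache_control": {"type": "ephemeral"}},
            {"type": "text", "text": "Question about that context."},
        ],
    }],
)

print(resp.usage.cache_creation_input_tokens)  # > 0 when this call primed the cache
print(resp.usage.cache_read_input_tokens)      # > 0 when a prior prime was hit
```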

## Message batches (async)

Both providers’ batch surfaces are proxied as authenticated thin passthroughs: auth, rate limits, and entitlements apply via the customer’s modelux key; the request is forwarded upstream with the org’s stored credential (or BYOK via `X-Modelux-Provider-Key`); and everything is logged to ClickHouse, so batch traffic shows up in the dashboard alongside synchronous traffic. Per-sub-request analytics inside a batch is deferred; query the results file for those. Each provider’s async discount (Anthropic 50%, OpenAI 50%) applies on the upstream side untouched.

Anthropic (`/anthropic/v1/messages/batches/*`):

| Method | Path |
|---|---|
| POST | `/anthropic/v1/messages/batches` — create |
| GET | `/anthropic/v1/messages/batches` — list (with `limit`, `after_id`, `before_id`) |
| GET | `/anthropic/v1/messages/batches/{id}` — retrieve |
| GET | `/anthropic/v1/messages/batches/{id}/results` — JSONL results stream |
| POST | `/anthropic/v1/messages/batches/{id}/cancel` |
| DELETE | `/anthropic/v1/messages/batches/{id}` |
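
Through the proxy this is the regular Anthropic SDK batch flow pointed at the modelux base URL (hypothetical host, placeholder key):

```python
from anthropic import Anthropic

client = Anthropic(base_url="https://api.modelux.example/anthropic", api_key="mlx-...")

batch = client.messages.batches.create(
    requests=[{
        "custom_id": "doc-1",
        "params": {
            "model": "claude-sonnet-4-5",
            "max_tokens": 256,
            "messages": [{"role": "user", "content": "Summarize document 1."}],
        },
    }],
)

# Poll until processing_status == "ended", then stream the JSONL results.
batch = client.messages.batches.retrieve(batch.id)
if batch.processing_status == "ended":
    for entry in client.messages.batches.results(batch.id):
        print(entry.custom_id, entry.result.type)
```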

OpenAI (`/openai/v1/batches/*` + `/openai/v1/files/*`):

| Method | Path |
|---|---|
| POST | `/openai/v1/files` — upload batch input (multipart/form-data) |
| GET | `/openai/v1/files` — list (with `purpose`, `limit`, `order`, `after`) |
| GET | `/openai/v1/files/{id}` — retrieve metadata |
| GET | `/openai/v1/files/{id}/content` — download bytes (results) |
| DELETE | `/openai/v1/files/{id}` |
| POST | `/openai/v1/batches` — create (references `input_file_id`) |
| GET | `/openai/v1/batches` — list (with `limit`, `after`) |
| GET | `/openai/v1/batches/{id}` — retrieve |
| POST | `/openai/v1/batches/{id}/cancel` |

OpenAI batches reference uploaded files by id, so `/v1/files` is also proxied (same auth/BYOK/observability story).
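
The two-step flow with the OpenAI SDK (hypothetical host, placeholder key; `requests.jsonl` stands in for your batch input file):

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.modelux.example/openai/v1", api_key="mlx-...")

# 1. Upload the JSONL input via the proxied files endpoint.
batch_input = client.files.create(file=open("requests.jsonl", "rb"), purpose="batch")

# 2. Create the batch against the uploaded file.
batch = client.batches.create(
    input_file_id=batch_input.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)

# 3. Once completed, pull the results file through the same proxy.
batch = client.batches.retrieve(batch.id)
if batch.status == "completed":
    results_jsonl = client.files.content(batch.output_file_id).text
```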

## Responses API (OpenAI)

The full `/v1/responses` surface is proxied:

| Method | Path |
|---|---|
| POST | `/openai/v1/responses` — create (sync or `stream:true` SSE) |
| GET | `/openai/v1/responses/{id}` — retrieve (or `?stream=true` to replay) |
| POST | `/openai/v1/responses/{id}/cancel` — cancel a background response |
| DELETE | `/openai/v1/responses/{id}` |
| GET | `/openai/v1/responses/{id}/input_items` — list input items |

Same auth + BYOK + observability story as batches. Streaming requests relay SSE events (`response.created` → deltas → `response.completed`) through to the caller without buffering. The proxy extracts usage (`input_tokens`, `output_tokens`, `cached_tokens`) from the terminal `response.completed` event for analytics. The Responses request shape isn’t translated cross-provider; see footnote 1 on the matrix for why.

On create, `@config` in the `model` field resolves through the standard router: single-model, `fallback_chain`, and `ab_test` policies all work (they yield one or more OpenAI targets that the proxy tries in order on 5xx / 429 / transport failures). Non-OpenAI targets and ensemble policies are rejected before any upstream call. See the Responses page for examples.
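
A minimal create against a router config; `@prod-chat` is a hypothetical config name (hypothetical host, placeholder key):

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.modelux.example/openai/v1", api_key="mlx-...")

# The @config resolves to one or more OpenAI targets; the proxy retries
# down the list on 5xx / 429 / transport failures.
resp = client.responses.create(
    model="@prod-chat",
    input="Draft a release note for the batching feature.",
)
print(resp.output_text)
```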

## Files (Anthropic)

The Anthropic Files API beta is proxied alongside batches. Use it to upload large documents/images once and reference them by id from `/v1/messages` content blocks (instead of inlining base64 every turn).

| Method | Path |
|---|---|
| POST | `/anthropic/v1/files` — upload (multipart/form-data) |
| GET | `/anthropic/v1/files` — list (with `limit`, `before_id`, `after_id`) |
| GET | `/anthropic/v1/files/{id}` — metadata |
| GET | `/anthropic/v1/files/{id}/content` — download (API-generated files only) |
| DELETE | `/anthropic/v1/files/{id}` |

The proxy forwards the caller’s `anthropic-beta` header(s) verbatim so the SDK’s beta-tag declaration reaches the upstream — meaning the proxy doesn’t need to track which beta tag is current.
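
An upload-then-reference sketch, assuming an anthropic SDK version that ships the files beta helpers (hypothetical host, placeholder key; the `document`/`file` content-block shape is Anthropic's Files-beta format):

```python
from anthropic import Anthropic

client = Anthropic(base_url="https://api.modelux.example/anthropic", api_key="mlx-...")

# Upload once; the beta tag travels through the proxy verbatim.
uploaded = client.beta.files.upload(
    file=("report.pdf", open("report.pdf", "rb"), "application/pdf"),
    extra_headers={"anthropic-beta": "files-api-2025-04-14"},
)

# Reference the file by id instead of re-inlining base64 each turn.
msg = client.beta.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=512,
    betas=["files-api-2025-04-14"],
    messages=[{
        "role": "user",
        "content": [
            {"type": "document", "source": {"type": "file", "file_id": uploaded.id}},
            {"type": "text", "text": "Summarize this report."},
        ],
    }],
)
```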

## What’s not in the matrix

The matrix is currently complete for both proxy surfaces. If you spot a gap, file an issue.

## Footnotes

  1. The Responses API is OpenAI-specific. `@config` routing, fallback-within-OpenAI, A/B tests, and budgets are all supported on the create endpoint — but cross-provider targets, ensemble policies, and semantic cache are not (the Responses item taxonomy includes `function_call`, `web_search_call`, `computer_call`, `reasoning`, encrypted reasoning content, and image outputs that have no faithful Claude/Gemini mapping and make cache-key hashing fragile). Use `/openai/v1/chat/completions` when you need cross-provider routing or semantic cache; use `/openai/v1/responses` when you specifically need the Responses shape (reasoning effort, `web_search` tool, computer use, image outputs, stored / chained responses) with the proxy’s auth + routing + observability layer.

  2. Anthropic’s Files API is currently a beta on the upstream side — pass `anthropic-beta: files-api-2025-04-14` (or the current tag your SDK uses) and the proxy forwards it untouched. The download endpoint is restricted to API-generated files (e.g., outputs from computer-use tools); user-uploaded files come back with an “is not downloadable” error from Anthropic.

  3. Anthropic doesn’t have an embeddings endpoint upstream, so there’s nothing for the proxy to translate to. Use `/openai/v1/embeddings` (which routes to OpenAI, Cohere, Google, or Voyage) regardless of which surface your chat traffic is on.

  4. See Audio for models, request shapes, and fallback behavior. OpenAI + Azure OpenAI only today.

  5. Gemini accepts only base64-encoded `inlineData` for images. URL-source images get dropped on the cross-provider path.

  6. Gemini does have an embeddings endpoint (`/v1beta/models/...:embedContent`) but its request shape diverges from OpenAI’s enough that we haven’t built the adapter yet.

  7. All OpenAI-compatible providers; vision works wherever the underlying model does. Some smaller providers don’t ship vision-capable models.