
> POST /anthropic/v1/messages — Anthropic-shape inference with cross-provider routing.

# Messages

Create a message using the Anthropic Messages API wire format. Drop-in
replacement for the Anthropic base URL — point your SDK at
`https://api.modelux.ai/anthropic` and existing code keeps working while
requests flow through modelux's routing, budgets, and observability.
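
For example, the official Anthropic Python SDK needs nothing beyond a
`base_url` swap (key and model below are placeholders):

```python
import anthropic

# Point the official SDK at modelux instead of api.anthropic.com.
client = anthropic.Anthropic(
    api_key="mlx_sk_...",                         # modelux key, not an Anthropic key
    base_url="https://api.modelux.ai/anthropic",  # drop-in base URL swap
)

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=256,
    messages=[{"role": "user", "content": "Hello!"}],
)
print(message.content[0].text)
```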

```
POST /anthropic/v1/messages
POST /anthropic/v1/messages/count_tokens
```

The same routing pipeline backs the OpenAI surface at
[`/openai/v1/chat/completions`](/docs/api/chat-completions) — pick whichever
wire format your existing SDK speaks. Both accept the same model identifiers
(raw model names or `@config` slugs) and the same `X-Modelux-*` headers.

## Cross-provider routing

The headline pitch: an Anthropic-SDK-shaped request can route to **any**
provider's model. The proxy translates between Anthropic's content-block
format and the upstream provider's native shape on the way out, then
inverts the translation on the response.

```bash
# Anthropic SDK shape, Claude upstream:
curl https://api.modelux.ai/anthropic/v1/messages \
  -H "Authorization: Bearer mlx_sk_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-5",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

# Same SDK shape, OpenAI upstream — only the model name changed:
curl https://api.modelux.ai/anthropic/v1/messages \
  -H "Authorization: Bearer mlx_sk_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

The response is in Anthropic's shape (`{type: "message", role: "assistant",
content: [{type: "text", text: "..."}], usage: {input_tokens, output_tokens}, ...}`)
regardless of which upstream actually served the request. SDK consumers
deserialize one shape; behind the scenes they can hit OpenAI, Anthropic,
Google, or Bedrock-Claude.

## Request

```json
{
  "model": "claude-sonnet-4-5",
  "max_tokens": 1024,
  "system": "You are a helpful assistant.",
  "messages": [
    { "role": "user", "content": "Hello!" }
  ],
  "temperature": 0.7,
  "stream": false
}
```

### Model identifier

- **Raw model name** — `claude-sonnet-4-5`, `gpt-4o-mini`, `gemini-2.5-flash`,
  `anthropic.claude-3-5-haiku-20241022-v1:0` (Bedrock form)
- **Routing config slug** — `@production`, `@fallback`, `@experiment`

### Content blocks

Anthropic's content-block format is supported on both request and response
sides:

- **`text`** — text content
- **`image`** — vision input; both `source.type: "base64"` and
  `source.type: "url"` are supported (see the sketch after this list)
- **`tool_use`** — the assistant's request to invoke a tool (echo it
  back verbatim on multi-turn)
- **`tool_result`** — caller's reply with the tool's output
- **`thinking`** / **`redacted_thinking`** — extended-thinking blocks
  (round-tripped verbatim, including the opaque `signature` field)
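
A minimal vision request, sketched with the Python SDK (the image URL
is a placeholder):

```python
import anthropic

client = anthropic.Anthropic(api_key="mlx_sk_...", base_url="https://api.modelux.ai/anthropic")

# URL-sourced image block; for base64, use
# source={"type": "base64", "media_type": "image/png", "data": "<b64>"}.
message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=256,
    messages=[{
        "role": "user",
        "content": [
            {"type": "image",
             "source": {"type": "url", "url": "https://example.com/photo.png"}},
            {"type": "text", "text": "What is in this image?"},
        ],
    }],
)
```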

## Streaming

Set `stream: true`. modelux returns SSE events in Anthropic's streaming
format:

```
event: message_start
data: {"type":"message_start","message":{...}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":N}}

event: message_stop
data: {"type":"message_stop"}
```

Streaming works identically across providers. When the upstream is
Anthropic the events are forwarded 1:1; for OpenAI / Google / Bedrock /
Cohere the proxy translates the upstream stream into Anthropic's event
taxonomy on the way out.
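
The SDK streaming helpers therefore work unchanged. A sketch with the
Python SDK, using an OpenAI model to show the cross-provider case:

```python
import anthropic

client = anthropic.Anthropic(api_key="mlx_sk_...", base_url="https://api.modelux.ai/anthropic")

# OpenAI model upstream, but the stream arrives in Anthropic's event
# taxonomy, so the SDK helper consumes it unchanged.
with client.messages.stream(
    model="gpt-4o-mini",
    max_tokens=256,
    messages=[{"role": "user", "content": "Hello!"}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
print()
```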

## Tool calling

Tools are first-class on this endpoint and route across providers:

```json
{
  "model": "gpt-4o-mini",
  "max_tokens": 1024,
  "tools": [{
    "name": "get_weather",
    "description": "Get the current weather in a city",
    "input_schema": {
      "type": "object",
      "properties": {"city": {"type": "string"}},
      "required": ["city"]
    }
  }],
  "tool_choice": {"type": "auto"},
  "messages": [{"role": "user", "content": "Weather in SF?"}]
}
```

Response includes `tool_use` blocks regardless of upstream:

```json
{
  "type": "message",
  "role": "assistant",
  "content": [
    {"type": "text", "text": "Let me check."},
    {"type": "tool_use", "id": "...", "name": "get_weather", "input": {"city": "SF"}}
  ],
  "stop_reason": "tool_use",
  "usage": {"input_tokens": 12, "output_tokens": 18}
}
```

Echo the assistant's blocks back verbatim with a `tool_result` block on the
next turn:

```json
{
  "messages": [
    {"role": "user", "content": "Weather in SF?"},
    {"role": "assistant", "content": [
      {"type": "text", "text": "Let me check."},
      {"type": "tool_use", "id": "toolu_abc", "name": "get_weather", "input": {"city": "SF"}}
    ]},
    {"role": "user", "content": [
      {"type": "tool_result", "tool_use_id": "toolu_abc", "content": "72°F sunny"}
    ]}
  ]
}
```

`tool_choice` shapes supported: `{"type": "auto"}`, `{"type": "any"}`
(forces a tool call), `{"type": "tool", "name": "X"}` (forces a specific
tool). The proxy translates these to each upstream's equivalent (for
OpenAI: `auto`, `required`, or a named function; for Google: `AUTO`,
`ANY`, or `ANY` plus `allowedFunctionNames`; etc.).
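
Putting the round trip together, a single-round tool loop with the
Python SDK might look like this (the stubbed weather result stands in
for a real tool call):

```python
import anthropic

client = anthropic.Anthropic(api_key="mlx_sk_...", base_url="https://api.modelux.ai/anthropic")

tools = [{
    "name": "get_weather",
    "description": "Get the current weather in a city",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]
messages = [{"role": "user", "content": "Weather in SF?"}]

response = client.messages.create(
    model="gpt-4o-mini", max_tokens=1024, tools=tools, messages=messages,
)

if response.stop_reason == "tool_use":
    # Echo the assistant's blocks back verbatim, then answer each
    # tool_use block with a matching tool_result.
    messages.append({"role": "assistant", "content": response.content})
    messages.append({"role": "user", "content": [
        {"type": "tool_result",
         "tool_use_id": block.id,
         "content": "72°F sunny"}  # stand-in for a real weather lookup
        for block in response.content if block.type == "tool_use"
    ]})
    final = client.messages.create(
        model="gpt-4o-mini", max_tokens=1024, tools=tools, messages=messages,
    )
    print(final.content[0].text)
```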

**Tool result limitation:** `tool_result` content is currently text-only
on the cross-provider path. Image-typed `tool_result` content (which
Anthropic supports natively) is dropped during translation because
OpenAI's `role: "tool"` shape doesn't carry images, and honoring it only
for some upstreams would break the cross-provider promise. If you need
image results in tool replies, file an issue.

## Native prompt caching (cache_control)

Anthropic's `cache_control` marker passes through the proxy verbatim
everywhere the upstream API accepts it: per content block on
messages, per system block (when `system` is sent as an array of
blocks), and per tool. Set it on the caller side and the next
matching request hits Anthropic's native prompt cache at the
discounted price (cache reads ≈ 0.10× the input rate; cache writes ≈
1.25× for the 5-minute ephemeral tier).

```json
{
  "model": "claude-sonnet-4-5",
  "max_tokens": 256,
  "system": [
    {"type": "text", "text": "short instructions"},
    {"type": "text", "text": "long context to cache",
     "cache_control": {"type": "ephemeral"}}
  ],
  "tools": [
    {"name": "first", "input_schema": {"type":"object","properties":{}}},
    {"name": "last", "input_schema": {"type":"object","properties":{}},
     "cache_control": {"type": "ephemeral"}}
  ],
  "messages": [{
    "role": "user",
    "content": [
      {"type": "text", "text": "long retrieved chunks…",
       "cache_control": {"type": "ephemeral"}},
      {"type": "text", "text": "Question about that context."}
    ]
  }]
}
```

The standard pattern for tools is to mark the **last** tool — that
caches the entire tool list as the prefix. For system, mark the long
shared block; for messages, mark whatever you expect to reuse across
turns.

The marker is honored when the routed upstream is Anthropic or
Bedrock-Claude; other providers ignore it. Cache hit and write counts
land in the request log as `cache_read_tokens` and
`cache_creation_tokens`, so dashboards can answer "did my marker
actually hit?" The proxy applies the matching discount when computing
the logged `cost_usd`.
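
Assuming the proxy also surfaces Anthropic's native usage counters in
the response body (the Messages API reports them as
`usage.cache_creation_input_tokens` and `usage.cache_read_input_tokens`),
a caller can verify a hit without opening the dashboard:

```python
import anthropic

client = anthropic.Anthropic(api_key="mlx_sk_...", base_url="https://api.modelux.ai/anthropic")

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=256,
    system=[
        {"type": "text", "text": "short instructions"},
        {"type": "text", "text": "long context to cache",
         "cache_control": {"type": "ephemeral"}},
    ],
    messages=[{"role": "user", "content": "Question about that context."}],
)

# First call writes the prefix; a matching call inside the 5-minute
# window reads it back at the discounted rate.
print("cache writes:", response.usage.cache_creation_input_tokens)
print("cache reads: ", response.usage.cache_read_input_tokens)
```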

This is **separate from and additive to** modelux's own
[semantic cache](#semantic-caching) (which keys on prompt similarity
across surfaces and replays a cached response without an upstream
call). The two layers compose.

## Extended thinking

Pass the `thinking` config to enable Claude's extended thinking:

```json
{
  "model": "claude-sonnet-4-5",
  "max_tokens": 4000,
  "thinking": {"type": "enabled", "budget_tokens": 2000},
  "messages": [{"role": "user", "content": "Walk me through 17 × 24."}]
}
```

The response includes `thinking` content blocks ahead of the text answer:

```json
{
  "type": "message",
  "content": [
    {"type": "thinking", "thinking": "I'll break 24 into 20 + 4...", "signature": "..."},
    {"type": "text", "text": "17 × 24 = 408."}
  ],
  ...
}
```

For multi-turn conversations, **echo the thinking blocks back verbatim**
on subsequent turns — the `signature` field is the opaque token Anthropic
uses to resume reasoning. Streaming emits `thinking_delta` and
`signature_delta` events ahead of any text/tool_use blocks, preserving
the signature across the wire.
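
A sketch of that echo pattern with the Python SDK (the follow-up
question is illustrative):

```python
import anthropic

client = anthropic.Anthropic(api_key="mlx_sk_...", base_url="https://api.modelux.ai/anthropic")

ask = {"role": "user", "content": "Walk me through 17 × 24."}
first = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=4000,
    thinking={"type": "enabled", "budget_tokens": 2000},
    messages=[ask],
)

# Echo the thinking + text blocks back untouched; the signature inside
# the thinking block is what lets the model resume its reasoning.
followup = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=4000,
    thinking={"type": "enabled", "budget_tokens": 2000},
    messages=[
        ask,
        {"role": "assistant", "content": first.content},
        {"role": "user", "content": "Now 17 × 25."},
    ],
)
```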

Extended thinking is honored when the routed upstream is Anthropic or
Bedrock-Claude. Other providers ignore the `thinking` field silently.

## Semantic caching

When a project enables the semantic cache (Settings → Caching), responses
to similar prompts are reused without a real upstream call. The cache
index is keyed by `(project, model, embedding-of-canonical-prompt)` on
modelux's internal canonical message shape — so an entry stored from a
request on `/openai/v1/chat/completions` is reusable from a request on
`/anthropic/v1/messages` (same model, same prompt) and vice versa.

Cache hits are emitted in whichever wire shape the calling endpoint
expects: a hit served to `/anthropic/v1/messages` comes back as
`{type:"message", content:[…]}`; the same entry served to
`/openai/v1/chat/completions` comes back as the OpenAI completion shape.
Streaming hits replay the cached response as a well-formed SSE
sequence (`message_start` → `content_block_*` → `message_delta` →
`message_stop`).

Response headers identify cache behavior:

- `X-Modelux-Cache: HIT` (or `MISS`)
- `X-Modelux-Cache-Similarity: 0.9876` (cosine similarity, hits only)

Cache writes only happen on successful upstream calls (a partial or
errored stream isn't poisoned into the cache). High-temperature
(creative) requests skip the cache by default; clients can also set
`Cache-Control: no-store` to bypass per request.
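
With the Python SDK, the raw-response helper exposes those headers
alongside the parsed message; the `Cache-Control` line is optional:

```python
import anthropic

client = anthropic.Anthropic(api_key="mlx_sk_...", base_url="https://api.modelux.ai/anthropic")

raw = client.messages.with_raw_response.create(
    model="claude-sonnet-4-5",
    max_tokens=256,
    messages=[{"role": "user", "content": "Hello!"}],
    # extra_headers={"Cache-Control": "no-store"},  # uncomment to bypass
)
print(raw.headers.get("X-Modelux-Cache"))             # HIT or MISS
print(raw.headers.get("X-Modelux-Cache-Similarity"))  # hits only
message = raw.parse()                                 # the usual Message object
```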

## Count tokens

Pre-flight token estimation:

```
POST /anthropic/v1/messages/count_tokens
```

Same request body as `/anthropic/v1/messages` (model + messages, optionally
tools/system). Returns:

```json
{ "input_tokens": 42 }
```

The proxy forwards this verbatim to the Anthropic upstream, so token
counts come from Claude's actual tokenizer. It requires an Anthropic
credential configured for your org (or an `X-Modelux-Provider-Key`
header).
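
With a recent Python SDK this maps to `client.messages.count_tokens`:

```python
import anthropic

client = anthropic.Anthropic(api_key="mlx_sk_...", base_url="https://api.modelux.ai/anthropic")

count = client.messages.count_tokens(
    model="claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(count.input_tokens)
```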

## Dry-run

Set `X-Modelux-Dry-Run: true` (or `1`) to evaluate routing without
calling the upstream:

```bash
curl https://api.modelux.ai/anthropic/v1/messages \
  -H "x-api-key: mlx_sk_..." \
  -H "X-Modelux-Dry-Run: true" \
  -H "Content-Type: application/json" \
  -d '{"model":"@production","max_tokens":256,"messages":[{"role":"user","content":"hi"}]}'
```

Returns the same `dry_run` envelope the OpenAI surface uses — routing
policy, target model + provider, candidate trace, matched rule,
variant bucket. The shape is identical so that the dashboard's
"preview route" feature works regardless of which endpoint a caller uses.
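
The dry-run envelope isn't an Anthropic message, so a plain HTTP client
is the simplest programmatic consumer. A sketch with `httpx`:

```python
import httpx

# Evaluate routing for an @config slug without an upstream call.
resp = httpx.post(
    "https://api.modelux.ai/anthropic/v1/messages",
    headers={"x-api-key": "mlx_sk_...", "X-Modelux-Dry-Run": "true"},
    json={
        "model": "@production",
        "max_tokens": 256,
        "messages": [{"role": "user", "content": "hi"}],
    },
)
print(resp.json())  # routing policy, target, candidate trace, ...
```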

## Other endpoints

```
GET /anthropic/v1/models
```

Returns Anthropic's pagination envelope (`{data: [], has_more: false}`).
modelux doesn't yet expose a curated model registry, so the list is
empty; the endpoint exists so that SDK probes via `client.models.list()`
don't 404.

### Message batches

For async batch processing (50% upstream discount), the full
`/v1/messages/batches/*` surface is proxied. See
[Message batches (Anthropic)](/docs/api/anthropic-batches) for the
endpoint table and a complete walkthrough.

## Request headers

Same `X-Modelux-*` headers as the OpenAI surface. See
[Chat completions → Request headers](/docs/api/chat-completions#request-headers).

The modelux API key is accepted as either `Authorization: Bearer
mlx_sk_…` or `x-api-key: mlx_sk_…`, so the official Anthropic SDKs
(which set `x-api-key` from their `apiKey` constructor option) work
as drop-in clients with nothing more than a `baseURL` swap.

## Response headers

```
x-modelux-request-id:   req_a1b2c3
x-modelux-model-used:   claude-sonnet-4-5
x-modelux-provider:     anthropic
x-modelux-cost-usd:     0.002134
x-modelux-cache:        MISS
```

## Errors

Errors follow Anthropic's wire shape:

```json
{
  "type": "error",
  "error": {
    "type": "invalid_request_error",
    "message": "max_tokens must be > 0"
  }
}
```

| `error.type` | Status |
|---|---|
| `invalid_request_error` | 400 |
| `authentication_error` | 401 |
| `permission_error` | 402 (budget) / 403 |
| `not_found_error` | 404 |
| `rate_limit_error` | 429 |
| `api_error` | 5xx |
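
Since the wire shape matches Anthropic's, the official SDKs surface
these as their usual typed exceptions. A sketch with the Python SDK:

```python
import anthropic

client = anthropic.Anthropic(api_key="mlx_sk_...", base_url="https://api.modelux.ai/anthropic")

try:
    client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=0,  # invalid on purpose -> invalid_request_error (400)
        messages=[{"role": "user", "content": "hi"}],
    )
except anthropic.RateLimitError:
    pass  # 429 rate_limit_error: back off and retry
except anthropic.BadRequestError as e:
    print(e.status_code, e)  # 400 invalid_request_error
except anthropic.APIStatusError as e:
    print(e.status_code)  # the other statuses in the table above
```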
