> Distribute load across multiple providers (or keys) with weighted, session-sticky routing.

# Splitting traffic across providers

`traffic_split` spreads requests across N providers by weight. Common uses:

- Run multiple keys for the same provider to work around per-key rate limits
- Distribute load across providers to de-risk a single-provider outage
- Split traffic between two models at a chosen ratio without the formal `ab_test` framing

Unlike `ab_test`, each target is a concrete model + credential — no nested
sub-policies. Unlike `ensemble`, only one target serves each request.

## Session stickiness

The proxy does not roll a fresh target for every call. It picks one via a waterfall:

1. If the request carries an `X-Modelux-Conversation-Id` → the whole
   conversation pins to one target. Every turn lands on the same provider.
2. Otherwise, if there's an `X-Modelux-Trace-Id` → one agent run (the fan-out
   of tool loops) pins together.
3. Otherwise, roll weighted-random per request.

Rationale: switching providers mid-conversation produces visibly different
behavior to a chat user — tone, verbosity, tool-call style. Stickiness per
session preserves consistency; across many sessions the split converges to
the configured weights. No configuration knob — the right identifier wins
automatically based on what your caller sends.
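
To make this concrete, here is a minimal sketch of sticky weighted selection:
hash the stickiest identifier available and map it into the cumulative weight
distribution. This illustrates the behavior described above, not modelux's
actual implementation; the hashing scheme is an assumption.

```python
import hashlib
import random

def pick_target(targets, conversation_id=None, trace_id=None):
    """targets: list of (name, weight) pairs; returns the chosen name."""
    # Waterfall: conversation id beats trace id beats a fresh roll.
    session_key = conversation_id or trace_id
    total = sum(weight for _, weight in targets)
    if session_key is None:
        point = random.uniform(0, total)  # no identifier: independent roll
    else:
        # Same key -> same digest -> same point on the weight line, every time.
        digest = hashlib.sha256(session_key.encode()).digest()
        point = int.from_bytes(digest[:8], "big") / 2**64 * total
    cumulative = 0.0
    for name, weight in targets:
        cumulative += weight
        if point < cumulative:
            return name
    return targets[-1][0]  # guard against float rounding at the top edge
```

Every turn of one conversation hashes to the same point, so it lands on the
same target; across many distinct sessions the points spread uniformly and
the realized split converges to the configured weights.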

## Create the config

In the dashboard, go to **Routing -> Create config**, pick **Traffic split**,
and add targets:

```json
{
  "strategy": "traffic_split",
  "targets": [
    { "model": "gpt-4o-mini",      "provider_credential_id": "cred_oa_1", "weight": 70 },
    { "model": "claude-haiku-4-5", "provider_credential_id": "cred_an_1", "weight": 30 }
  ]
}
```

Weights are arbitrary positive numbers — modelux normalizes by the sum. `70/30`,
`7/3`, and `0.7/0.3` are equivalent.
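
A quick illustration of that normalization in plain Python (not modelux code):

```python
import math

def shares(weights):
    total = sum(weights)
    return [w / total for w in weights]

# 70/30, 7/3, and 0.7/0.3 all normalize to the same 0.7 / 0.3 split.
for ws in ([70, 30], [7, 3], [0.7, 0.3]):
    assert all(math.isclose(s, t) for s, t in zip(shares(ws), [0.7, 0.3]))
```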

## Rate-limit bypass with multiple keys

To fan out across multiple keys of the same provider, point several targets
at the same model with different `provider_credential_id`s and equal weights:

```json
{
  "strategy": "traffic_split",
  "targets": [
    { "model": "gpt-4o", "provider_credential_id": "cred_oa_primary",   "weight": 1 },
    { "model": "gpt-4o", "provider_credential_id": "cred_oa_secondary", "weight": 1 },
    { "model": "gpt-4o", "provider_credential_id": "cred_oa_tertiary",  "weight": 1 }
  ]
}
```

Each key receives ~1/3 of the load, keeping you under per-key rate limits.
Stickiness still applies: a conversation won't bounce between keys mid-stream.
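
As a sanity check, you can reuse the `pick_target` sketch from the
stickiness section to simulate many one-off conversations and watch the
per-key share converge toward 1/3:

```python
import uuid

keys = [("cred_oa_primary", 1), ("cred_oa_secondary", 1), ("cred_oa_tertiary", 1)]
counts = {name: 0 for name, _ in keys}
for _ in range(30_000):
    # Every simulated conversation gets a fresh id, as real sessions would.
    counts[pick_target(keys, conversation_id=f"conv_{uuid.uuid4().hex}")] += 1
print({name: round(n / 30_000, 3) for name, n in counts.items()})  # each ~0.333
```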

## Verify

Check **Logs** for the `routing.target_reason` tag on recent decisions.
`traffic_split` targets are tagged `weight N`, so you can confirm which
target was chosen and, over time, that the actual distribution matches the
configured weights.
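
If you export those decisions (the export mechanism depends on your setup),
a simple tally makes the comparison concrete; `decisions` here is a
hypothetical list of chosen-target names pulled from your logs:

```python
from collections import Counter

def observed_shares(decisions):
    """decisions: chosen-target names, one per logged routing decision."""
    counts = Counter(decisions)
    total = sum(counts.values())
    return {target: n / total for target, n in counts.items()}

# With the 70/30 config above, expect roughly
# {"gpt-4o-mini": 0.7, "claude-haiku-4-5": 0.3} once enough sessions accrue.
# Small samples wobble: stickiness assigns whole sessions, not single calls.
```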

## Call it from your app

```python
client.chat.completions.create(
    model="@production",   # the slug of the routing config
    messages=[...],
    extra_headers={
        "X-Modelux-Conversation-Id": "conv_abc123",   # pins the session
    },
)
```

Without a conversation ID, pass `X-Modelux-Trace-Id` to keep one agent run
pinned; with neither, every request rolls fresh.
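
For example, to pin an agent run (the `base_url` and API key below are
placeholders; point the client wherever your modelux proxy lives):

```python
from openai import OpenAI

# Placeholder endpoint and key -- substitute your actual proxy URL and key.
client = OpenAI(base_url="https://api.modelux.ai/v1", api_key="mlx_...")

client.chat.completions.create(
    model="@production",
    messages=[{"role": "user", "content": "Run the next step."}],
    extra_headers={
        # No conversation to pin, but keep this whole agent run on one target:
        "X-Modelux-Trace-Id": "trace_xyz789",
    },
)
```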
