Splitting traffic across providers

traffic_split spreads requests across N providers by weight. Common uses:

  • Run multiple keys for the same provider to work around per-key rate limits
  • Distribute load across providers to de-risk a single-provider outage
  • Split traffic between two models at a chosen ratio, without the formal ab_test framing

Unlike ab_test, each target is a concrete model + credential — no nested sub-policies. Unlike ensemble, only one target serves each request.

Session stickiness

Not every call rolls fresh. The proxy picks the target through a waterfall:

  1. If the request carries an X-Modelux-Conversation-Id → the whole conversation pins to one target. Every turn lands on the same provider.
  2. Otherwise, if there’s an X-Modelux-Trace-Id → one agent run (the fan-out of tool loops) pins together.
  3. Otherwise, roll weighted-random per request.

Rationale: switching providers mid-conversation produces visibly different behavior to a chat user — tone, verbosity, tool-call style. Stickiness per session preserves consistency; across many sessions the split converges to the configured weights. No configuration knob — the right identifier wins automatically based on what your caller sends.
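
Under the hood this is just "hash if sticky, roll if not". Here is a minimal sketch in Python; the function name, the stable-hash trick, and the header lookup are illustrative assumptions about the mechanism, not modelux internals:

import hashlib
import random

def pick_target(targets, headers):
    # Steps 1-2: a conversation ID wins over a trace ID; either pins all of
    # its requests to one target via a stable hash of the identifier.
    sticky_key = (headers.get("X-Modelux-Conversation-Id")
                  or headers.get("X-Modelux-Trace-Id"))
    weights = [t["weight"] for t in targets]
    if sticky_key:
        digest = int(hashlib.sha256(sticky_key.encode()).hexdigest(), 16)
        point = (digest % 10_000) / 10_000 * sum(weights)  # stable point in [0, total)
        for target, weight in zip(targets, weights):
            point -= weight
            if point < 0:
                return target
        return targets[-1]
    # Step 3: no identifier, so roll weighted-random per request.
    return random.choices(targets, weights=weights, k=1)[0]

The same identifier always hashes to the same point, so a session never moves; across many distinct identifiers the points spread out in proportion to the weights, which is why the split still converges to the configured ratio.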

Create the config

In the dashboard, go to Routing -> Create config, pick Traffic split, and add targets:

{
  "strategy": "traffic_split",
  "targets": [
    { "model": "gpt-4o-mini",      "provider_credential_id": "cred_oa_1", "weight": 70 },
    { "model": "claude-haiku-4-5", "provider_credential_id": "cred_an_1", "weight": 30 }
  ]
}

Weights are arbitrary positive numbers — modelux normalizes by the sum. 70/30, 7/3, and 0.7/0.3 are equivalent.
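
As a quick sanity check of that rule (plain Python, nothing modelux-specific):

weights = [70, 30]
total = sum(weights)                           # 100
probabilities = [w / total for w in weights]   # [0.7, 0.3] -> a 70/30 split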

Rate-limit bypass with multiple keys

To fan out across multiple keys of the same provider, point several targets at the same model with different provider_credential_ids and equal weights:

{
  "strategy": "traffic_split",
  "targets": [
    { "model": "gpt-4o", "provider_credential_id": "cred_oa_primary",   "weight": 1 },
    { "model": "gpt-4o", "provider_credential_id": "cred_oa_secondary", "weight": 1 },
    { "model": "gpt-4o", "provider_credential_id": "cred_oa_tertiary",  "weight": 1 }
  ]
}

Each key receives ~1/3 of the load, keeping every key under its per-key rate limit. Stickiness still applies: a conversation won't bounce between keys mid-stream.
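
If you rotate keys often, it can be easier to generate this config than to hand-edit it. A small helper, purely illustrative (the function and the credential IDs are placeholders, not a modelux API):

import json

def equal_split_config(model, credential_ids):
    """Build a traffic_split config fanning one model across several keys."""
    return {
        "strategy": "traffic_split",
        "targets": [
            {"model": model, "provider_credential_id": cred, "weight": 1}
            for cred in credential_ids
        ],
    }

print(json.dumps(
    equal_split_config("gpt-4o", ["cred_oa_primary", "cred_oa_secondary", "cred_oa_tertiary"]),
    indent=2,
))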

Verify

Check Logs for the routing.target_reason tag on recent decisions. traffic_split targets are tagged weight N, so you can confirm which target served each request and, over time, that the actual distribution matches the configured weights.
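
One way to eyeball the distribution, assuming you have exported recent log rows; the list-of-dicts shape below is an assumption for illustration, only the tag itself comes from the docs:

from collections import Counter

# Hypothetical export: one dict per routing decision, tagged as described above.
rows = [
    {"model": "gpt-4o-mini", "routing.target_reason": "weight 70"},
    {"model": "claude-haiku-4-5", "routing.target_reason": "weight 30"},
    # ... more rows ...
]

counts = Counter(r["model"] for r in rows
                 if r["routing.target_reason"].startswith("weight"))
total = sum(counts.values())
for model, n in counts.items():
    print(f"{model}: {n / total:.0%} of traffic")  # compare against configured weights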

Call it from your app

from openai import OpenAI

# Assumes the OpenAI Python SDK pointed at your modelux endpoint
# (both values below are placeholders).
client = OpenAI(base_url="<your-modelux-endpoint>", api_key="<your-modelux-key>")

client.chat.completions.create(
    model="@production",   # the slug of the routing config
    messages=[...],
    extra_headers={
        "X-Modelux-Conversation-Id": "conv_abc123",   # pins the session
    },
)

Without a conversation ID, pass X-Modelux-Trace-Id to keep one agent run pinned; with neither, every request rolls fresh.
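
For example, an agent run with no user-facing conversation can pin on the trace instead (the header value is illustrative):

client.chat.completions.create(
    model="@production",
    messages=[...],
    extra_headers={
        "X-Modelux-Trace-Id": "trace_xyz789",   # pins this agent run's tool loops together
    },
)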