Splitting traffic across providers
traffic_split spreads requests across N providers by weight. Common uses:
- Run multiple keys for the same provider to work around per-key rate limits
- Distribute load across providers to de-risk a single-provider outage
- Sit between two models at a chosen ratio without the formal ab_test framing
Unlike ab_test, each target is a concrete model + credential — no nested
sub-policies. Unlike ensemble, only one target serves each request.
Session stickiness
Each call doesn’t roll fresh. The proxy uses a waterfall:
- If the request carries an X-Modelux-Conversation-Id, the whole conversation pins to one target; every turn lands on the same provider.
- Otherwise, if there's an X-Modelux-Trace-Id, one agent run (the fan-out of tool loops) pins together.
- Otherwise, roll weighted-random per request.
Rationale: switching providers mid-conversation produces visibly different behavior for a chat user (tone, verbosity, tool-call style). Stickiness per session preserves consistency, and across many sessions the split still converges to the configured weights. There is no configuration knob; the most specific identifier wins automatically based on what your caller sends.
Create the config
In the dashboard, go to Routing -> Create config, pick Traffic split, and add targets:
{
"strategy": "traffic_split",
"targets": [
{ "model": "gpt-4o-mini", "provider_credential_id": "cred_oa_1", "weight": 70 },
{ "model": "claude-haiku-4-5", "provider_credential_id": "cred_an_1", "weight": 30 }
]
}
Weights are arbitrary positive numbers — modelux normalizes by the sum. 70/30,
7/3, and 0.7/0.3 are equivalent.
Rate-limit bypass with multiple keys
To fan out across multiple keys of the same provider, point several targets
at the same model with different provider_credential_ids and equal weights:
{
"strategy": "traffic_split",
"targets": [
{ "model": "gpt-4o", "provider_credential_id": "cred_oa_primary", "weight": 1 },
{ "model": "gpt-4o", "provider_credential_id": "cred_oa_secondary", "weight": 1 },
{ "model": "gpt-4o", "provider_credential_id": "cred_oa_tertiary", "weight": 1 }
]
}
Each key receives roughly a third of the load, which keeps every key under its individual rate limit. Stickiness still applies: a conversation won't bounce between keys mid-stream.
Verify
Check Logs for the routing.target_reason tag on recent decisions. traffic_split targets are tagged weight N, so you can confirm which target served each request and, over time, that the actual distribution matches the configured weights.
Call it from your app
from openai import OpenAI

client = OpenAI(
    base_url="https://api.modelux.example/v1",  # your modelux proxy endpoint (illustrative URL)
    api_key="mx_...",                           # your modelux API key
)

client.chat.completions.create(
    model="@production",  # the slug of the routing config
    messages=[...],
    extra_headers={
        "X-Modelux-Conversation-Id": "conv_abc123",  # pins the session
    },
)
Without a conversation ID, pass X-Modelux-Trace-Id to keep one agent run
pinned; with neither, every request rolls fresh.