Setting up a fallback chain
Fallback chains are the simplest way to improve reliability. You define an ordered list of model attempts, each with its own timeout. If an attempt fails (timeout, 429, or 5xx), Modelux automatically retries with the next model in the list.
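The retry loop described above can be sketched in a few lines. This is an illustration of the behavior, not Modelux's actual implementation; `run_fallback_chain`, `call_model`, and `RetryableError` are hypothetical names:

```python
class RetryableError(Exception):
    """Stands in for the failures a chain retries on: timeout, 429, 5xx."""

def run_fallback_chain(attempts, call_model):
    """Try each attempt in order; return the first success."""
    errors = []
    for attempt in attempts:
        try:
            # Each attempt gets its own timeout, so a slow model fails
            # fast instead of consuming the whole request budget.
            return call_model(attempt["model"], timeout_ms=attempt["timeout_ms"])
        except RetryableError as exc:
            errors.append((attempt["model"], exc))
    # Every attempt failed: surface the collected errors to the caller.
    raise RuntimeError(f"all attempts failed: {errors}")
```

The key property: callers see a single request that either succeeds or fails once, with the retries hidden inside.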
Why fallbacks?
- Every provider has outages, rate limits, and intermittent failures
- Different models fail at different times — a good fallback chain is rarely fully down
- Your app sees a consistent response — no retry logic to write
Create the config
In the dashboard, go to Routing -> Create config, pick Fallback, and add attempts:
{
  "strategy": "fallback",
  "attempts": [
    { "model": "claude-haiku-4-5", "timeout_ms": 2000 },
    { "model": "gpt-4o-mini", "timeout_ms": 3000 },
    { "model": "gemini-2.5-flash", "timeout_ms": 5000 }
  ],
  "retry_on": ["429", "5xx", "timeout"]
}
Tips
Primary: fast, cheap, usually good
Put your preferred cheap, fast model first, with an aggressive timeout: 1-2 s for simple prompts, longer for reasoning tasks.
Secondary: different provider
Diversify across providers — if OpenAI is down, Anthropic usually isn’t.
Tertiary: conservative timeout
Longer timeout on the last attempt; you’ve already burned latency budget on the first two.
Don’t chain too many
Three attempts is usually enough. More than that and you’ll hit your overall request timeout before the last attempts even run.
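The "don't chain too many" tip is easy to quantify: worst-case latency is roughly the sum of the per-attempt timeouts, since each attempt can run its full timeout before the next one starts. A quick check using the example config above:

```python
# Worst-case latency when every attempt times out: the sum of the
# per-attempt timeouts. Values are from the example config above.
attempts = [
    {"model": "claude-haiku-4-5", "timeout_ms": 2000},
    {"model": "gpt-4o-mini", "timeout_ms": 3000},
    {"model": "gemini-2.5-flash", "timeout_ms": 5000},
]
worst_case_ms = sum(a["timeout_ms"] for a in attempts)
print(worst_case_ms)  # 10000
```

At 10 s worst case, a fourth attempt would likely be cut off by a typical overall request timeout before it ever ran.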
Verify
Check the decision trace of a few requests in Logs to see which attempt actually served each request. Over time, the Analytics page will show attempt distribution.
Call it from your app
# `client` is any OpenAI-compatible client pointed at your Modelux endpoint
response = client.chat.completions.create(
    model="@production",  # the slug of the routing config
    messages=[...],
)
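If every attempt in the chain fails, the request itself fails, so the application still needs one layer of error handling. A minimal sketch, assuming an OpenAI-compatible `client` pointed at Modelux and that an exhausted chain surfaces as a raised exception; `ask` is an illustrative helper, not part of any SDK:

```python
def ask(client, prompt):
    """Send one prompt through the routing config; return text or None."""
    try:
        resp = client.chat.completions.create(
            model="@production",  # the slug of the routing config
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content
    except Exception:
        # All attempts exhausted (or a non-retryable error):
        # degrade gracefully instead of crashing the caller.
        return None
```

The point is that the fallback chain removes per-attempt retry logic from your app, not the final "everything failed" branch.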