
Setting up a fallback chain

Fallback chains are the simplest way to improve reliability. You define an ordered list of model attempts, each with its own timeout. If an attempt fails (timeout, 429, or 5xx), Modelux automatically moves on to the next model in the list.

Why fallbacks?

  • Every provider has outages, rate limits, and intermittent failures
  • Different models fail at different times — a good fallback chain is rarely fully down
  • Your app sees a consistent response — no retry logic to write

Create the config

In the dashboard, go to Routing -> Create config, pick Fallback, and add attempts:

```json
{
  "strategy": "fallback",
  "attempts": [
    { "model": "claude-haiku-4-5",   "timeout_ms": 2000 },
    { "model": "gpt-4o-mini",        "timeout_ms": 3000 },
    { "model": "gemini-2.5-flash",   "timeout_ms": 5000 }
  ],
  "retry_on": ["429", "5xx", "timeout"]
}
```
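Conceptually, a chain like this behaves as a loop over the attempts. The sketch below is illustrative only, not Modelux's actual (server-side) implementation; `call_model` and `ProviderError` are hypothetical stand-ins for a provider request and its failure modes:

```python
# Conceptual sketch of a fallback loop (hypothetical names; Modelux's
# real routing runs server-side and is not published here).

RETRYABLE = {"429", "5xx", "timeout"}  # mirrors "retry_on" in the config

class ProviderError(Exception):
    """Raised by a provider call; kind is "429", "5xx", or "timeout"."""
    def __init__(self, kind):
        super().__init__(kind)
        self.kind = kind

def call_with_fallback(attempts, prompt, call_model):
    """Try each attempt in order and return the first success.

    call_model(model, prompt, timeout_s) stands in for the actual
    provider request.
    """
    errors = []
    for attempt in attempts:
        try:
            return call_model(attempt["model"], prompt,
                              attempt["timeout_ms"] / 1000)
        except ProviderError as e:
            if e.kind not in RETRYABLE:
                raise  # non-retryable errors surface immediately
            errors.append((attempt["model"], e.kind))
    raise RuntimeError(f"all attempts failed: {errors}")
```

The key property: your app makes one call and gets one response, regardless of which attempt served it.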

Tips

Primary: fast, cheap, usually good

Put your preferred cheap+fast model first, and give it an aggressive timeout (1-2s for simple prompts, longer for reasoning tasks).

Secondary: different provider

Diversify across providers — if OpenAI is down, Anthropic usually isn’t.

Tertiary: conservative timeout

Use a longer timeout on the last attempt; you've already burned latency budget on the first two, so failing fast buys you nothing.

Don’t chain too many

Three attempts is usually enough. More than that and you’ll hit your overall request timeout before the last attempts even run.
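A quick sanity check on chain length: worst-case latency is roughly the sum of the per-attempt timeouts (ignoring network and queueing overhead). For the example config above:

```python
timeouts_ms = [2000, 3000, 5000]  # per-attempt timeouts from the example

# If every attempt times out, the request burns the whole budget
# before the caller sees a failure:
worst_case_ms = sum(timeouts_ms)
print(worst_case_ms)  # 10000 ms, i.e. 10 s
```

If that sum already exceeds your overall request timeout, the later attempts can never run and are dead weight in the chain.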

Verify

Check the decision trace of a few requests in Logs to see which attempt actually served each request. Over time, the Analytics page will show attempt distribution.

Call it from your app

```python
from openai import OpenAI

# OpenAI-compatible client pointed at your Modelux gateway; the base
# URL and API key are placeholders, substitute your dashboard values.
client = OpenAI(base_url=MODELUX_BASE_URL, api_key=MODELUX_API_KEY)

response = client.chat.completions.create(
    model="@production",   # the slug of the routing config
    messages=[...],
)
```