Stay up when providers don't.
Modelux sits in the path of every LLM request your team sends. That means two things are non-negotiable: the proxy has to be fast, and it has to stay up even when the underlying providers don't. This page explains how we do it. Last updated 2026-04-14.
Fallback chains, by config.
Every production routing config should have a fallback chain. When the primary provider 429s, 5xxs, or misses its timeout, Modelux advances to the next attempt without your app knowing. No retry loop in your code, no pager at 2am.
- Per-attempt timeouts bound tail latency
- Retries on 429, 5xx, and timeout
- Cross-provider by default
- Full decision trace per request
```json
{
  "strategy": "fallback",
  "attempts": [
    { "model": "claude-haiku-4-5", "timeout_ms": 2000 },
    { "model": "gpt-4o-mini", "timeout_ms": 3000 },
    { "model": "gemini-2.5-flash", "timeout_ms": 5000 }
  ],
  "retry_on": ["429", "5xx", "timeout"],
  "total_budget_ms": 8000
}
```

The routing layer is the reliability layer.
Failover is not a bolt-on. It's built into how Modelux picks a provider for every single request.
Fallback chains
Ordered lists of models with per-attempt timeouts. On a 429, 5xx, or timeout, the router advances to the next attempt — automatically, without a retry from your app.
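The advance-on-failure behavior can be sketched in a few lines. This is an illustrative model of what the router does, not Modelux internals; the function names and the exact retryable set are assumptions.

```python
# Statuses the chain advances past; anything else is surfaced immediately.
RETRYABLE = {429, 500, 502, 503, 504, "timeout"}

def run_chain(attempts, call):
    """Try each attempt in order. `call(model, timeout_ms)` -> (status, body)."""
    last = None
    for attempt in attempts:
        status, body = call(attempt["model"], attempt["timeout_ms"])
        if status == 200:
            return attempt["model"], body  # first success wins
        last = status
        if status not in RETRYABLE:
            break  # non-retryable error (e.g. 401): don't burn more attempts
    raise RuntimeError(f"chain ended with status {last}")
```

The key property is that the loop lives in the proxy, not in your application code: your app makes one request and gets one answer.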
Health-aware routing
Providers are continuously scored on error rate and latency. When a provider degrades, traffic shifts to healthier options before you'd notice it on a dashboard.
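One common way to score providers is an exponentially weighted error rate, which reacts quickly to a degrading provider while smoothing over single blips. A minimal sketch, assuming that approach; the class name, the weighting constant, and the score formula are illustrative, not Modelux's actual scoring.

```python
class ProviderHealth:
    """Track an exponentially weighted error rate per provider."""

    def __init__(self, alpha=0.2):
        self.alpha = alpha        # weight given to the newest observation
        self.error_rate = {}      # provider -> EWMA of failures, 0.0 .. 1.0

    def record(self, provider, ok):
        prev = self.error_rate.get(provider, 0.0)
        sample = 0.0 if ok else 1.0
        self.error_rate[provider] = (1 - self.alpha) * prev + self.alpha * sample

    def healthiest(self, providers):
        # Unseen providers default to 0.0, i.e. assumed healthy.
        return min(providers, key=lambda p: self.error_rate.get(p, 0.0))
```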
Latency-optimized routing
Live p50 measurements per model and provider decide which candidate is fastest right now. Rankings update as conditions change.
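A sliding-window median per candidate is one way to keep a live p50 that updates as conditions change. A sketch under that assumption; the window size and class name are hypothetical.

```python
from collections import defaultdict, deque
import statistics

class LatencyRouter:
    """Route to the candidate with the lowest p50 over recent samples."""

    def __init__(self, window=100):
        # Fixed-size window per candidate: old samples age out automatically.
        self.samples = defaultdict(lambda: deque(maxlen=window))

    def observe(self, candidate, latency_ms):
        self.samples[candidate].append(latency_ms)

    def fastest(self, candidates):
        def p50(c):
            s = self.samples[c]
            # No data yet -> rank last rather than guess.
            return statistics.median(s) if s else float("inf")
        return min(candidates, key=p50)
```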
Retries with budgets
Configurable retry policies bounded by a total-latency budget — so a retry storm can't turn a 2-second call into a 20-second call.
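The budget interaction can be sketched as a deadline that caps both the number of retries and each attempt's timeout. Hypothetical names; the `total_budget_ms` field matches the config above, the rest is illustrative.

```python
import time

def call_with_budget(attempts, call, total_budget_ms, now=time.monotonic):
    """Try attempts in order, but never run past the total budget."""
    deadline = now() + total_budget_ms / 1000
    for attempt in attempts:
        remaining_ms = (deadline - now()) * 1000
        if remaining_ms <= 0:
            break  # budget spent: stop retrying even if attempts remain
        # An attempt near the deadline gets only the time that's left.
        timeout = min(attempt["timeout_ms"], remaining_ms)
        status, body = call(attempt["model"], timeout)
        if status == 200:
            return body
    raise TimeoutError("budget or attempts exhausted")
```

This is what prevents a retry storm: with an 8-second budget, three 5-second attempts cannot stack into 15 seconds of waiting.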
Streaming passthrough
Streaming responses are forwarded chunk-by-chunk. No buffering, no extra round-trip, no head-of-line blocking.
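The passthrough property reduces to a generator that yields each upstream chunk the moment it arrives, never accumulating a full response. A minimal sketch; the time-to-first-chunk hook is an illustrative assumption, not a documented Modelux API.

```python
import time

def passthrough(upstream, on_first_chunk=None):
    """Yield chunks from `upstream` as they arrive; nothing is buffered."""
    start = time.monotonic()
    seen_first = False
    for chunk in upstream:
        if not seen_first:
            seen_first = True
            if on_first_chunk:
                # Report time-to-first-chunk, e.g. for a decision trace.
                on_first_chunk(time.monotonic() - start)
        yield chunk  # forwarded immediately, one chunk in flight at a time
```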
What this actually means for your app.
Fewer incidents. Faster recovery when there is one. Changes you can make without holding your breath.
Provider outages don't become yours
When a provider has a bad hour, your fallback chain routes around it. Users don't see an error; you don't get paged.
Bad policy changes don't become incidents
Test a new routing config against the last 24 hours of real traffic before it serves a single live request. See the cost, latency, and success-rate diff up front.
Rollback is a button
Every config change is versioned. If something regresses, roll back to any prior version in one click — no redeploy, no engineering ticket.
Your keys stay yours
Bring your own provider credentials. Modelux uses them only to make the call you asked for. No shared quota, no vendored keys, no surprise charges.
What we're engineering for.
These are the uptime targets the platform is designed to meet for the control plane and proxy data plane. They describe our own availability — upstream provider uptime is a separate number, and your fallback chain is what absorbs it.
| Plan | Uptime target | Contractual SLA | Support |
|---|---|---|---|
| Free | Best-effort | — | Community |
| Pro | 99.9% | — | |
| Team | 99.9% | — | Priority |
| Enterprise | 99.95% | Signed | Dedicated |
Need a custom SLA or dedicated capacity?
Enterprise customers can negotiate tighter uptime targets, reserved capacity, regional endpoints, and a signed SLA with credits. Email sales@modelux.ai and we'll get back within one business day.