$ modelux reliability

Stay up when providers don't.

Modelux sits on the path of every LLM request your team sends. That means two things are non-negotiable: the proxy has to be fast, and it has to stay up even when the underlying providers don't. This page is how we do it. Last updated 2026-04-14.

  • Uptime target: 99.9%+ across the control plane and proxy data plane
  • Proxy overhead: < 5 ms p99 (Go proxy, excluding the provider call)
  • Failover: sub-second, automatic on provider errors or timeouts
  • Streaming: chunk-by-chunk SSE passthrough, no buffering
# failover

Fallback chains, by config.

Every production routing config should have a fallback chain. When the primary provider 429s, 5xxs, or misses its timeout, Modelux advances to the next attempt without your app knowing. No retry loop in your code, no pager at 2am.

  • Per-attempt timeouts bound tail latency
  • Retries on 429, 5xx, and timeout
  • Cross-provider by default
  • Full decision trace per request
@production json
{
  "strategy": "fallback",
  "attempts": [
    { "model": "claude-haiku-4-5",   "timeout_ms": 2000 },
    { "model": "gpt-4o-mini",        "timeout_ms": 3000 },
    { "model": "gemini-2.5-flash",   "timeout_ms": 5000 }
  ],
  "retry_on": ["429", "5xx", "timeout"],
  "total_budget_ms": 8000
}
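The chain above boils down to a loop: try each attempt in order, clamp its timeout to the remaining budget, and stop on success or a non-retryable error. Here is an illustrative Python sketch of that loop (the production proxy is Go; `run_fallback` and the `call` signature are hypothetical names, not the Modelux API):

```python
import time

def run_fallback(attempts, retry_on, total_budget_ms, call):
    """Try each attempt in order until one succeeds, a non-retryable
    error occurs, or the total latency budget is exhausted."""
    deadline = time.monotonic() + total_budget_ms / 1000
    trace = []  # the per-request decision trace
    for attempt in attempts:
        remaining_ms = (deadline - time.monotonic()) * 1000
        if remaining_ms <= 0:
            break  # budget spent; stop advancing
        timeout_ms = min(attempt["timeout_ms"], remaining_ms)
        # call(model, timeout_ms) returns "ok", "429", "5xx", or "timeout"
        status = call(attempt["model"], timeout_ms)
        trace.append((attempt["model"], status))
        if status == "ok":
            return "ok", trace
        if status not in retry_on:
            return status, trace  # non-retryable error surfaces immediately
    return "exhausted", trace
```

With the sample config, a 429 from the first model simply advances the router to the second, and the trace records both attempts.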
# mechanics

The routing layer is the reliability layer.

Failover is not a bolt-on. It's built into how Modelux picks a provider for every single request.

Fallback chains

Ordered lists of models with per-attempt timeouts. On a 429, 5xx, or timeout, the router advances to the next attempt — automatically, without a retry from your app.

Health-aware routing

Providers are continuously scored on error rate and latency. When a provider degrades, traffic shifts to healthier options before you'd notice from graphs.
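The error-rate half of that score can be sketched as an exponentially weighted moving average: recent failures push a provider's score up fast, and sustained successes pull it back down. An illustrative Python sketch (class and function names are hypothetical, not the Modelux internals):

```python
class ProviderHealth:
    """Exponentially weighted error rate; lower is healthier."""
    def __init__(self, alpha=0.1):
        self.alpha = alpha       # weight of the newest sample
        self.error_rate = 0.0    # EWMA of 0 (success) / 1 (failure)

    def record(self, ok):
        sample = 0.0 if ok else 1.0
        self.error_rate += self.alpha * (sample - self.error_rate)

def pick_healthiest(providers):
    """Route to the provider with the lowest current error score."""
    return min(providers, key=lambda name: providers[name].error_rate)
```

A real score would blend latency in as well; the shape is the same, one decaying average per signal per provider.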

Latency-optimized routing

Live p50 measurements per model and provider decide which candidate is fastest right now. Rankings update as conditions change.
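One way to keep a live p50 is a sliding window of recent latency samples per candidate, re-ranked on every request. An illustrative Python sketch (again hypothetical names, not the real router):

```python
from collections import deque
from statistics import median

class LatencyTracker:
    """Rolling p50 over the most recent latency samples."""
    def __init__(self, window=100):
        self.samples = deque(maxlen=window)  # old samples fall off

    def record(self, ms):
        self.samples.append(ms)

    def p50(self):
        return median(self.samples) if self.samples else float("inf")

def fastest(trackers):
    """Rank candidates by current p50; unmeasured candidates rank last."""
    return min(trackers, key=lambda name: trackers[name].p50())
```

Because the window is bounded, a provider that was slow an hour ago but is fast now climbs back up the ranking as new samples arrive.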

Retries with budgets

Configurable retry policies bounded by a total-latency budget — so a retry storm can't turn a 2-second call into a 20-second call.
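Concretely, each attempt's timeout is clamped to whatever budget remains. A worked example with the sample config's numbers (illustrative helper, not the Modelux API):

```python
def capped_timeouts(timeouts_ms, budget_ms):
    """Clamp each attempt's timeout to the remaining budget, assuming the
    worst case where every prior attempt uses its full timeout."""
    spent, capped = 0, []
    for t in timeouts_ms:
        remaining = budget_ms - spent
        if remaining <= 0:
            break  # no budget left for further attempts
        capped.append(min(t, remaining))
        spent += capped[-1]
    return capped
```

With the sample config (timeouts 2000/3000/5000 ms, budget 8000 ms), the third attempt gets at most 3000 ms, so the worst case is the budget, never the sum of the timeouts.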

Streaming passthrough

Streaming responses are forwarded chunk-by-chunk. No buffering, no extra round-trip, no head-of-line blocking.
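The key property is that the proxy never accumulates the response: each event is re-emitted the moment it arrives, so time-to-first-token is the provider's, not the proxy's. A minimal generator-based sketch in Python (the real data path is Go; `slow_provider` is a stand-in for an upstream SSE stream):

```python
def sse_passthrough(upstream):
    """Forward each SSE event immediately; nothing is buffered."""
    for event in upstream:
        yield event  # re-emitted as soon as it arrives

def slow_provider():
    """Stand-in upstream that emits three SSE data events."""
    for token in ["Hel", "lo", "!"]:
        yield f"data: {token}\n\n"
```

Because `sse_passthrough` is a generator, the first event is available to the client before the upstream stream has finished.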

# what you get

What this actually means for your app.

Fewer incidents. Faster recovery when there is one. Changes you can make without holding your breath.

Provider outages don't become yours

When a provider has a bad hour, your fallback chain routes around it. Users don't see an error; you don't get paged.

Bad policy changes don't become incidents

Test a new routing config against the last 24 hours of real traffic before it serves a single live request. See the cost, latency, and success-rate diff up front.

Rollback is a button

Every config change is versioned. If something regresses, roll back to any prior version in one click — no redeploy, no engineering ticket.

Your keys stay yours

Bring your own provider credentials. Modelux uses them only to make the call you asked for. No shared quota, no vendored keys, no surprise charges.

# sla

What we're engineering for.

These are the uptime targets the platform is designed to meet for the control plane and proxy data plane. They describe our own availability — upstream provider uptime is a separate number, and your fallback chain is what absorbs it.

Plan         Uptime target   Contractual SLA   Support
Free         Best-effort     —                 Community
Pro          99.9%           —                 Email
Team         99.9%           —                 Priority
Enterprise   99.95%          Signed            Dedicated

Need a custom SLA or dedicated capacity?

Enterprise customers can negotiate tighter uptime targets, reserved capacity, regional endpoints, and a signed SLA with credits. Email sales@modelux.ai and we'll get back to you within one business day.