modelux
$ modelux reliability

Stay up when providers don't.

modelux sits on the path of every LLM request your team sends. That means two things are non-negotiable: the proxy has to be fast, and it has to stay up even when the underlying providers don't. This page explains how we do it.

  • Uptime target: 99.95% for the control plane and proxy data plane combined (Enterprise target; Pro and Team target 99.9%)
  • Median failover time: 180ms, from upstream timeout to fallback success
  • Provider coverage: 14+ (OpenAI, Anthropic, Google, Azure, Bedrock, Groq, Fireworks, DeepSeek, xAI, Mistral, Cerebras, Together, Perplexity, Cohere)
  • Proxy overhead: < 5ms at p99 (internal overhead only; see /performance for end-to-end numbers)
# upstream chaos

Designed to absorb upstream chaos.

Every major LLM provider has a bad day. modelux is built so those days don't reach your users — fallback chains route around degraded providers, health-aware routing shifts traffic before failure rates spike, and circuit breakers contain the blast radius.

Cross-provider fallback chains

Primary provider 429s, 5xxs, or misses its timeout? The router advances to the next attempt without your app knowing.

Health-aware routing

Continuous per-provider scoring on error rate and latency shifts traffic to healthier options before the failure rate spikes.

Per-attempt + total-budget timeouts

A retry storm can't turn a 2-second call into a 20-second call. Bounded latency is part of the contract.

Circuit breakers

A degraded provider, a slow analytics flush, or a noisy neighbor can't cascade. Each subsystem has its own breaker, timeout, and bulkhead.
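For readers unfamiliar with the pattern, here is a minimal closed/open/half-open circuit breaker in Python. It is an illustrative sketch, not modelux's implementation: the class name, the `failure_threshold` and `reset_after_s` parameters, and the injectable `clock` are all assumptions made for the example.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Minimal closed/open/half-open breaker (hypothetical, not modelux's code)."""

    def __init__(self, failure_threshold: int = 5, reset_after_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_after_s:
            return "half_open"  # cool-down elapsed: allow a probe through
        return "open"

    def allow(self) -> bool:
        # An open breaker sheds calls immediately instead of waiting on a sick dependency.
        return self.state != "open"

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        if self.state == "half_open":
            # A failed probe re-opens the breaker for another full cool-down.
            self.opened_at = self.clock()
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()
```

In this sketch a half-open breaker lets every caller through as a probe; a production breaker would usually admit one probe at a time and sit alongside the per-subsystem timeout and bulkhead mentioned above.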

Designed throughput by number of degraded providers:
  • 0 providers degraded: 100%
  • 1 provider degraded: 100%
  • 2 providers degraded: 99.6%
  • 3 providers degraded: 87.0%
Designed throughput with a fallback chain configured. Single-provider degradations don't reach end users.
# failover

Fallback chains, by config.

Every production routing config should have a fallback chain. When the primary provider 429s, 5xxs, or misses its timeout, modelux advances to the next attempt without your app knowing. No retry loop in your code, no pager at 2am.

  • Per-attempt timeouts bound tail latency
  • Retries on 429, 5xx, and timeout
  • Cross-provider by default
  • Full decision trace per request
  • Simulate the chain against real traffic before shipping — see failover frequency and whether the total-budget timeout holds
@production json
{
  "strategy": "fallback",
  "attempts": [
    { "model": "claude-haiku-4-5",   "timeout_ms": 2000 },
    { "model": "gpt-4o-mini",        "timeout_ms": 3000 },
    { "model": "gemini-2.5-flash",   "timeout_ms": 5000 }
  ],
  "retry_on": ["429", "5xx", "timeout"],
  "total_budget_ms": 8000
}
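The config above can be read as a small algorithm. The sketch below walks the `attempts` list in order, caps each attempt's `timeout_ms` by whatever remains of `total_budget_ms`, and advances only on the error kinds listed in `retry_on`. It is a hypothetical illustration, not modelux's router: `run_fallback`, `UpstreamError`, `ChainExhausted`, and the `call_model(model, timeout_s)` callback are invented for the example. It also assumes `call_model` already classifies failures into the same labels used in `retry_on`, and a real proxy would enforce the timeout on its HTTP client rather than trusting the callback.

```python
import time
from typing import Any, Callable


class UpstreamError(Exception):
    """Hypothetical upstream failure; `kind` is e.g. "429", "5xx", or "timeout"."""

    def __init__(self, kind: str):
        super().__init__(kind)
        self.kind = kind


class ChainExhausted(Exception):
    """Raised when every attempt failed or the total budget ran out."""


def run_fallback(config: dict[str, Any],
                 call_model: Callable[[str, float], str],
                 clock: Callable[[], float] = time.monotonic) -> tuple[str, list[str]]:
    """Try each attempt in order and return (response, decision trace)."""
    start = clock()
    budget_s = config["total_budget_ms"] / 1000
    retry_on = set(config["retry_on"])
    trace: list[str] = []
    for attempt in config["attempts"]:
        remaining_s = budget_s - (clock() - start)
        if remaining_s <= 0:
            trace.append("total budget exhausted")
            break
        # Per-attempt timeout, capped by whatever is left of the total budget.
        timeout_s = min(attempt["timeout_ms"] / 1000, remaining_s)
        try:
            response = call_model(attempt["model"], timeout_s)
        except UpstreamError as err:
            trace.append(f"{attempt['model']}: {err.kind}")
            if err.kind not in retry_on:
                raise  # non-retryable errors (e.g. a 400) surface immediately
            continue  # retryable: advance to the next attempt
        trace.append(f"{attempt['model']}: ok")
        return response, trace
    raise ChainExhausted("; ".join(trace))
```

One consequence worth noticing: the three per-attempt timeouts in the example config add up to 10,000 ms, but `total_budget_ms` is 8,000 ms. Under this sketch's reading, if the first two attempts both run to their timeouts (5,000 ms together), the third attempt gets at most 3,000 ms instead of its configured 5,000 ms, which is how the total budget keeps tail latency bounded.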
# mechanics

The routing layer is the reliability layer.

Failover is not a bolt-on. It's built into how modelux picks a provider for every single request.

Fallback chains

Ordered lists of models with per-attempt timeouts. On a 429, 5xx, or timeout, the router advances to the next attempt — automatically, without a retry from your app.

Health-aware routing

Providers are continuously scored on error rate and latency. When a provider degrades, traffic shifts to healthier options before the degradation would show up on your own dashboards.
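One common way to implement this kind of scoring is an exponentially weighted moving average (EWMA) of error rate and latency per provider. The sketch below shows how such a score could work; it does not describe modelux's actual scoring. `ProviderHealth`, `alpha`, `latency_slo_ms`, and the way the two signals are combined into one number are all illustrative choices.

```python
from dataclasses import dataclass


@dataclass
class ProviderHealth:
    """EWMA health tracker for one provider (hypothetical scoring, not modelux's)."""
    alpha: float = 0.2        # weight given to the newest observation
    error_rate: float = 0.0   # smoothed fraction of failed requests
    latency_ms: float = 0.0   # smoothed latency across requests
    samples: int = 0

    def observe(self, ok: bool, latency_ms: float) -> None:
        err = 0.0 if ok else 1.0
        if self.samples == 0:
            # Seed the averages with the first observation.
            self.error_rate, self.latency_ms = err, latency_ms
        else:
            a = self.alpha
            self.error_rate = a * err + (1 - a) * self.error_rate
            self.latency_ms = a * latency_ms + (1 - a) * self.latency_ms
        self.samples += 1

    def score(self, latency_slo_ms: float = 2000.0) -> float:
        """1.0 is perfectly healthy; errors and slow responses pull it toward 0."""
        latency_penalty = min(self.latency_ms / latency_slo_ms, 1.0)
        return (1.0 - self.error_rate) * (1.0 - 0.5 * latency_penalty)


def pick_healthiest(health: dict[str, ProviderHealth]) -> str:
    """Route to the provider with the highest current score."""
    return max(health, key=lambda name: health[name].score())
```

A higher `alpha` reacts faster when a provider goes bad, at the cost of more flapping on noisy traffic.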

Latency-optimized routing

Live p50 measurements per model and provider decide which candidate is fastest right now. Rankings update as conditions change.
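The following is a minimal sketch of latency-optimized selection. It assumes latencies are kept in a fixed-size rolling window per candidate and ranked by median (p50). `LatencyTracker`, the `window` size, and the candidate naming are hypothetical.

```python
from collections import deque
from statistics import median


class LatencyTracker:
    """Rolling p50 per (provider, model) candidate over its last `window` calls."""

    def __init__(self, window: int = 100):
        self.window = window
        self.samples: dict[str, deque] = {}

    def record(self, candidate: str, latency_ms: float) -> None:
        # deque(maxlen=...) silently evicts the oldest sample once the window is full.
        self.samples.setdefault(candidate, deque(maxlen=self.window)).append(latency_ms)

    def p50(self, candidate: str) -> float:
        s = self.samples.get(candidate)
        return median(s) if s else float("inf")  # unmeasured candidates rank last

    def fastest(self, candidates: list[str]) -> str:
        return min(candidates, key=self.p50)
```

Because an unmeasured candidate reports an infinite p50, this sketch never tries a new candidate while a measured one exists; a real router would need some exploration traffic to keep rankings fresh.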

Retries with budgets

Retry policies are configurable and bounded by a total-latency budget, so a retry storm can't turn a 2-second call into a 20-second call.
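The sketch below shows one way to bound retries by a total budget. Each try gets its own timeout capped by the remaining budget, and the backoff sleep (exponential with full jitter) is clipped so it never runs past the deadline. `retry_within_budget`, `RetryableError`, and all parameter names are assumptions for illustration; the clock and sleep functions are injectable so the behavior can be tested without waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical marker for a 429, 5xx, or timeout outcome."""


def retry_within_budget(
    call: Callable[[float], T],
    *,
    budget_ms: float,
    per_try_timeout_ms: float,
    max_tries: int = 3,
    base_backoff_ms: float = 100.0,
    now_ms: Callable[[], float] = lambda: time.monotonic() * 1000,
    sleep_ms: Callable[[float], None] = lambda ms: time.sleep(ms / 1000),
) -> T:
    """Retry `call` until it succeeds, tries run out, or the total budget is spent."""
    deadline = now_ms() + budget_ms
    for attempt in range(max_tries):
        remaining = deadline - now_ms()
        if remaining <= 0:
            break
        try:
            # Each try gets its own timeout, never more than the budget still left.
            return call(min(per_try_timeout_ms, remaining))
        except RetryableError:
            if attempt == max_tries - 1:
                raise
            # Exponential backoff with full jitter, clipped at the deadline.
            backoff = random.uniform(0, base_backoff_ms * 2 ** attempt)
            sleep_ms(min(backoff, max(0.0, deadline - now_ms())))
    raise TimeoutError(f"total budget of {budget_ms} ms exhausted")
```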

Streaming passthrough

Streaming responses are forwarded chunk-by-chunk. No buffering, no extra round-trip, no head-of-line blocking.
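To make the difference concrete, this asyncio sketch contrasts a passthrough stream, which yields each upstream chunk as soon as it arrives, with a buffered one that waits for the whole body. The fake upstream and all function names are hypothetical stand-ins for a provider's streaming response.

```python
import asyncio
from typing import AsyncIterator


async def fake_upstream(n: int, log: list[str]) -> AsyncIterator[bytes]:
    """Stand-in for a provider's streaming response (hypothetical)."""
    for i in range(n):
        await asyncio.sleep(0)  # simulate waiting on the network
        log.append(f"upstream sent {i}")
        yield f"chunk-{i}".encode()


async def passthrough(upstream: AsyncIterator[bytes]) -> AsyncIterator[bytes]:
    # Each chunk is forwarded as soon as it arrives; nothing is held back.
    async for chunk in upstream:
        yield chunk


async def buffered(upstream: AsyncIterator[bytes]) -> AsyncIterator[bytes]:
    # Anti-pattern for comparison: collect the whole body before sending anything.
    body = [chunk async for chunk in upstream]
    for chunk in body:
        yield chunk


async def consume(stream: AsyncIterator[bytes], log: list[str]) -> None:
    async for chunk in stream:
        log.append(f"client got {chunk.decode()}")
```

Running `consume(passthrough(...))` interleaves upstream and client log entries one chunk at a time, while `buffered` logs every upstream chunk before the client receives the first one.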

# what you get

What this actually means for your app.

Fewer incidents. Faster recovery when there is one. Changes you can make without holding your breath.

Provider outages don't become yours

When a provider has a bad hour, your fallback chain routes around it. Users don't see an error; you don't get paged.

Bad policy changes don't become incidents

Test a new fallback chain against up to 50,000 real requests from the last 7 days before it serves a single live one. You see failover frequency, p50/p95 latency, error rate, and a diff of the route distribution, plus a full decision trace for each replayed request. Routing-only mode is free; with-responses mode adds embedding-based quality scoring.
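Conceptually, a routing-only replay re-runs the candidate chain against logged requests and aggregates the results. The sketch below makes a simplifying assumption: each logged request carries an observed outcome (success flag and latency) for every model in the chain. `Outcome`, `replay`, and the nearest-rank percentile helper are hypothetical, and response-quality scoring is left out.

```python
import math
from dataclasses import dataclass


@dataclass
class Outcome:
    """Observed result for one model on one logged request (assumed available)."""
    ok: bool
    latency_ms: float  # for failures, assumed to include time spent waiting to time out


def nearest_rank(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def replay(chain: list[str], logged: list[dict[str, Outcome]],
           budget_ms: float) -> dict[str, float]:
    """Re-run a fallback chain over logged requests and summarize the result."""
    failovers = errors = 0
    served_latencies: list[float] = []
    for outcomes in logged:
        elapsed, served = 0.0, False
        for position, model in enumerate(chain):
            outcome = outcomes[model]
            elapsed += outcome.latency_ms
            if elapsed > budget_ms:
                break  # the total budget would have cut this request off
            if outcome.ok:
                served = True
                if position > 0:
                    failovers += 1  # served by something other than the primary
                break
        if served:
            served_latencies.append(elapsed)
        else:
            errors += 1
    n = max(len(logged), 1)  # avoid division by zero on an empty log
    return {
        "failover_rate": failovers / n,
        "error_rate": errors / n,
        "p50_ms": nearest_rank(served_latencies, 50) if served_latencies else math.nan,
        "p95_ms": nearest_rank(served_latencies, 95) if served_latencies else math.nan,
    }
```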

Rollback is a button

Every config change is versioned. If something regresses, roll back to any prior version in one click — no redeploy, no engineering ticket.
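Here is a minimal model of versioned configs. It assumes an append-only history in which a rollback re-publishes an old version as a new one, so nothing is deleted. `ConfigHistory` and its methods are hypothetical; the page does not describe modelux's storage model.

```python
import copy
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ConfigHistory:
    """Append-only version log for one routing config (hypothetical model)."""
    versions: list[dict[str, Any]] = field(default_factory=list)

    def publish(self, config: dict[str, Any]) -> int:
        # Store a deep copy so later edits to the caller's dict can't alter history.
        self.versions.append(copy.deepcopy(config))
        return len(self.versions)  # versions are numbered from 1

    @property
    def live(self) -> dict[str, Any]:
        return self.versions[-1]

    def rollback(self, version: int) -> int:
        # Re-publish the old content as a new version, so history is never rewritten.
        if not 1 <= version <= len(self.versions):
            raise ValueError(f"unknown version {version}")
        return self.publish(self.versions[version - 1])
```

Keeping rollbacks as new versions means the history always shows what was live and when, including the rollback itself.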

Your keys stay yours

Bring your own provider credentials. modelux uses them only to make the call you asked for. No shared quota, no vendored keys, no surprise charges.

# sla

What we're engineering for.

These are the uptime targets the platform is designed to meet for the control plane and proxy data plane. They describe our own availability — upstream provider uptime is a separate number, and your fallback chain is what absorbs it.

| Plan | Uptime target | Support |
|---|---|---|
| Free | Best-effort | Community |
| Pro | 99.9% | Email |
| Team | 99.9% | Priority |
| Enterprise | 99.95% | Dedicated |
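As a rough guide to what these targets mean, the arithmetic below converts an uptime percentage into permitted downtime. It assumes a 30-day measurement window, which the page does not specify.

```python
def allowed_downtime_minutes(uptime_pct: float, days: int = 30) -> float:
    """Minutes of downtime an uptime target permits over a window of `days`."""
    return days * 24 * 60 * (1 - uptime_pct / 100)


for plan, target in [("Pro / Team", 99.9), ("Enterprise", 99.95)]:
    print(f"{plan}: {allowed_downtime_minutes(target):.1f} min per 30 days")
```

Over a 30-day window, 99.9% allows about 43.2 minutes of downtime and 99.95% about 21.6 minutes.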

Make outages someone else's problem.

Configure a fallback chain in five minutes. The next time an upstream provider has a bad hour, your users won't be the ones to find out.