One control plane for every LLM.
Route, cap, trace.
Smart routing, spend caps you can trust, and a full trace of every call — across every provider, without changing your code.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.modelux.ai/v1",
    api_key="mlx_sk_...",
)

response = client.chat.completions.create(
    model="@production",  # your routing config
    messages=[{"role": "user", "content": "Hello!"}],
)

One plane. Six jobs.
Most LLM tools do one of these well. Modelux does all six, on the same data, with the same primitives.
Policy-driven routing
Fallback chains, cost-optimized and latency-optimized routing, ensembles, A/B tests, cascades, and a custom rule DSL, all working across every provider. Define once, change without redeploying.
- ▸ Eight built-in strategies
- ▸ Custom rule DSL over cost / latency / budget / tags
- ▸ Versioned configs with one-click rollback
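As an illustration of what a versioned config with custom rules might look like (the field names here are invented for the sketch, not Modelux's actual schema):

```json
{
  "slug": "@production",
  "version": 4,
  "strategy": "custom",
  "rules": [
    { "if": "tag == 'batch' && budget_remaining_pct < 20", "then": "route:gpt-4o-mini" },
    { "if": "latency_p95_ms > 800", "then": "route:gemini-2.5-flash" },
    { "else": "route:claude-haiku-4-5" }
  ]
}
```

Bumping `version` and rolling back would then be a config operation, not a redeploy.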
Spend caps that hold
Set a ceiling per project, per tag, or per end-user. Modelux auto-downgrades to a cheaper model or blocks the call before you blow past it — so a runaway loop doesn't become a runaway bill.
- ▸ Budgets per project, tag, or end-user
- ▸ Auto-downgrade or block at cap
- ▸ Every config change logged
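The cap behavior can be pictured with a minimal sketch in pure Python (a toy model, not the actual Modelux implementation): near the ceiling the router downgrades to a cheaper model, and at the ceiling it blocks outright.

```python
from dataclasses import dataclass

@dataclass
class Budget:
    ceiling_usd: float       # hard cap for the period
    spent_usd: float = 0.0   # running spend so far

def route(budget: Budget, est_cost_usd: float) -> str:
    """Decide what to do with a call estimated to cost est_cost_usd."""
    if budget.spent_usd + est_cost_usd > budget.ceiling_usd:
        return "block"                        # at cap: reject the call
    if budget.spent_usd > 0.8 * budget.ceiling_usd:
        return "downgrade"                    # near cap: use a cheaper model
    return "allow"

budget = Budget(ceiling_usd=10.0, spent_usd=8.5)
print(route(budget, 0.02))  # "downgrade": already past 80% of the ceiling
```

The 80% downgrade threshold is an arbitrary choice for the sketch; the point is that the decision happens before the call is made, so a runaway loop stops at the cap rather than after the bill arrives.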
Decision-level traces
Every request stores the full routing decision: attempts tried, reasons, per-attempt timings and costs. Searchable by tag. Latency percentiles per model. Per-request cost attribution.
- ▸ Decision trace per request
- ▸ Per-tag analytics, cost forecasting
- ▸ Export to S3 / Parquet when you want it
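A decision trace record might look roughly like this (an illustrative shape based on the trace output shown further down, not the exact log schema):

```json
{
  "request_id": "req_a1b2c3",
  "config": "@production",
  "attempts": [
    { "model": "claude-haiku-4-5", "outcome": "timeout", "elapsed_ms": 2000, "cost_usd": 0 },
    { "model": "gpt-4o-mini", "outcome": "200", "elapsed_ms": 238, "cost_usd": 0.002134 }
  ],
  "decision": "fallback -> attempt_2",
  "reason": "primary timeout, secondary healthy",
  "tags": ["project:playwright-test"]
}
```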
Replay before you ship
Take historical traffic and run it against a candidate routing config. See cost, latency, and success-rate diff before flipping production. Promote the winner with one audited click.
- ▸ Up to 24h of traffic, replayed
- ▸ Side-by-side diff on cost / latency / success
- ▸ Promote with audited version bump
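At its core, the replay diff is a comparison of the same traffic under two configs. A toy version of that comparison (the per-request record shape is hypothetical):

```python
def diff_replay(baseline: list[dict], candidate: list[dict]) -> dict:
    """Summarize cost / latency / success deltas between two replay runs.
    Each run is a list of per-request results: {cost_usd, latency_ms, ok}."""
    def summarize(run):
        n = len(run)
        return {
            "cost_usd": sum(r["cost_usd"] for r in run),
            "avg_latency_ms": sum(r["latency_ms"] for r in run) / n,
            "success_rate": sum(r["ok"] for r in run) / n,
        }
    base, cand = summarize(baseline), summarize(candidate)
    # positive delta = candidate is higher; round away float noise
    return {k: round(cand[k] - base[k], 6) for k in base}

baseline = [{"cost_usd": 0.002, "latency_ms": 240, "ok": True},
            {"cost_usd": 0.002, "latency_ms": 260, "ok": True}]
candidate = [{"cost_usd": 0.001, "latency_ms": 300, "ok": True},
             {"cost_usd": 0.001, "latency_ms": 320, "ok": True}]
print(diff_replay(baseline, candidate))
# candidate is cheaper (-0.002 total) but slower (+60ms avg), same success rate
```

The product does this over up to 24h of real traffic rather than four synthetic requests, but the decision it supports is the same: is the candidate config a trade you want to make?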
Stay up when providers don't
Every provider goes down a few times a year. Modelux routes around outages automatically — users don't see an error, you don't get paged, nothing changes in your app code.
- ▸ Automatic failover across providers
- ▸ Rollback a bad config in one click
- ▸ Test policy changes on real traffic first
Thin proxy. No added slowness.
Modelux sits between your app and every provider, so it's built to stay out of the way. Streaming responses come through chunk by chunk; routing decisions use live latency measurements to pick whichever provider is fastest right now.
- ▸ Streaming passes through unbuffered
- ▸ Routes to the provider that's fastest now
- ▸ Sub-5ms target overhead per request
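The "fastest right now" choice can be sketched as an exponentially weighted moving average of observed latency per provider (a toy model, not Modelux's actual estimator):

```python
class LatencyRouter:
    """Track a smoothed latency estimate per provider; route to the lowest."""
    def __init__(self, providers, alpha=0.3):
        self.alpha = alpha                        # EWMA smoothing factor
        self.est = {p: 0.0 for p in providers}    # smoothed latency (ms)
        self.seen = {p: False for p in providers}

    def observe(self, provider: str, latency_ms: float) -> None:
        if not self.seen[provider]:
            self.est[provider] = latency_ms       # first sample seeds the average
            self.seen[provider] = True
        else:
            a = self.alpha
            self.est[provider] = a * latency_ms + (1 - a) * self.est[provider]

    def pick(self) -> str:
        return min(self.est, key=self.est.get)

router = LatencyRouter(["openai", "anthropic"])
router.observe("openai", 300)
router.observe("anthropic", 200)
router.observe("openai", 120)   # openai EWMA: 0.3*120 + 0.7*300 = 246
print(router.pick())            # "anthropic" (200 < 246)
```

An EWMA reacts to a provider slowing down within a few requests while ignoring single outliers, which is roughly the behavior you want from live latency routing.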
From zero to routing in under a minute.
Add a provider
Paste your OpenAI, Anthropic, or Google API key. Modelux proxies the request through your own credentials — BYO keys, no markup.
Configure routing
Pick a strategy: single model, fallback chain, cost-optimized, ensemble, A/B test. Or write a custom rule DSL for complex logic.
Point your app
Change your base_url to https://api.modelux.ai/v1 and use a Modelux API key. That's it. Existing OpenAI SDK code just works.
Change providers with a config, not a deploy.
Every routing config lives at a stable @slug. Your app calls @production — we handle the rest. Swap models, add fallbacks, run A/B tests, build ensembles. Version-controlled, replayable, rollback-able.
- ▸ Fallback chains with per-provider timeouts
- ▸ Cost-optimized with quality tiers
- ▸ Ensembles with parallel aggregation
- ▸ A/B tests with percentage rollouts
- ▸ Custom rule DSL over cost, latency, budget
{
  "strategy": "fallback",
  "attempts": [
    { "model": "claude-haiku-4-5", "timeout_ms": 2000 },
    { "model": "gpt-4o-mini", "timeout_ms": 3000 },
    { "model": "gemini-2.5-flash", "timeout_ms": 5000 }
  ],
  "retry_on": ["429", "5xx", "timeout"]
}

$ modelux logs --trace req_a1b2c3
[modelux] req_a1b2c3  gpt-4o-mini  238ms  $0.002134  200
  attempt_1  claude-haiku-4-5  timeout (2000ms)
  attempt_2  gpt-4o-mini       200 OK  1,247 tokens
  decision:  fallback -> attempt_2
  reason:    primary timeout, secondary healthy
  cost:      input $0.000187 + output $0.001947
  latency:   tts_138ms ttlt_238ms
  project:   playwright-test
  config:    @production (v4)
Every request logged. Every decision explained.
Not just request/response capture — Modelux records the full routing decision trace: which attempts ran, why the router picked what it picked, per-attempt latency, costs, errors. Replay historical traffic against a new config before you ship it.
- ▸ Searchable request logs with structured tags
- ▸ Per-request cost attribution
- ▸ Latency percentiles per model/provider
- ▸ Replay simulator with cost diffs
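Per-request cost attribution rolls up into per-tag totals, which is at its core a group-by over the trace log. A minimal sketch (the record shape is illustrative):

```python
from collections import defaultdict

def cost_by_tag(traces: list[dict]) -> dict[str, float]:
    """Sum per-request cost under each tag attached to the request."""
    totals: dict[str, float] = defaultdict(float)
    for t in traces:
        for tag in t["tags"]:
            totals[tag] += t["cost_usd"]
    # round away float noise in the report
    return {tag: round(v, 6) for tag, v in totals.items()}

traces = [
    {"cost_usd": 0.002, "tags": ["project:checkout", "env:prod"]},
    {"cost_usd": 0.005, "tags": ["project:search", "env:prod"]},
    {"cost_usd": 0.001, "tags": ["project:checkout", "env:staging"]},
]
print(cost_by_tag(traces))
# project:checkout totals 0.003, env:prod totals 0.007
```

Because a request can carry several tags, the same dollar shows up under each dimension you slice by — per project, per environment, per end-user.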
Flat tiers. No per-token markup.
You pay providers directly with your own keys. Modelux charges a flat fee for the control plane — no per-token markup.
10k req/month. 1 project. Single + fallback routing.
100k req/month. 5 projects. All routing policies.
1M req/month. Unlimited projects. Team roles.
SSO, SAML, audit logs, dedicated support & SLA.