13× cheaper than GPT-5. Same accuracy.
Most LLM bills are over-provisioned. The frontier model gets used for traffic the cheap model would've answered correctly. modelux is the routing layer that fixes that — without you rewriting your app or your prompts.
Same accuracy. A fraction of the bill.
From the modelux ensembles benchmark: 150 real cases pulled from MMLU, GSM8K, and TriviaQA. Same prompts. Same scoring. The ensemble of gpt-4.1-mini + claude-haiku with confidence routing ties Claude Sonnet on accuracy and beats GPT-5, for roughly a seventh of Sonnet's spend and a thirteenth of GPT-5's.
- ▸ 74% accuracy — same as Sonnet, beats GPT-5 (73%)
- ▸ $0.04 vs $0.27 vs $0.54 for the same 150 cases
- ▸ Configurable as a one-line routing strategy
Three levers. Compose any combination.
Each one is a configuration change, not a code change. Combine them and the savings compound multiplicatively.
Ensembles
Run two or three small models in parallel. Pick the consensus answer. Match Sonnet at 6× lower cost. The latency stays bounded by the slowest small model — not the sum.
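A minimal sketch of that lever as configuration, assuming a JSON strategy block; the field names here are illustrative, not modelux's documented schema:

{
  "strategy": "ensemble",
  "models": ["gpt-4.1-mini", "claude-haiku-4-5"],   // run in parallel
  "vote": "consensus",                              // pick the agreeing answer
  "on_disagreement": "highest_confidence"           // confidence routing when they split
}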
Fallback downgrades
Mark a cheap model as the primary, a frontier model as the safety net. Most traffic resolves on the cheap one; the rest spills to frontier. You never pay frontier rates for traffic that didn't need them.
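A sketch of the same idea as config, again with assumed field names rather than the documented schema:

{
  "strategy": "fallback",
  "primary": "gpt-4o-mini",          // answers most traffic
  "fallback": "claude-sonnet-4-5",   // frontier safety net
  "escalate_on": ["low_confidence", "provider_error", "timeout"]
}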
Semantic cache
Embedding-keyed cache returns identical-meaning answers in under a millisecond, at zero provider cost. The hit rate compounds across users and sessions — repeat questions stop costing money.
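And a hedged sketch of switching the cache on; the knobs shown are assumptions about shape, not documented options:

{
  "semantic_cache": {
    "enabled": true,
    "similarity_threshold": 0.95,   // how close two questions must be to share an answer
    "ttl_hours": 24,
    "scope": "per_project"
  }
}

A higher threshold trades hit rate for stricter meaning-match; tune it per workload.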
The cheap models aren't bad. The expensive ones aren't always worth it.
Sorted cheap → expensive by blended price, list prices as of April 2026. The blended column is the cost of a typical chat mix: 1M input tokens plus 3M output tokens (75% output). Routing in modelux can pick the cheapest model that meets your quality bar per request; a sketch of such a rule follows the table.
| Model | Provider | Input / 1M | Output / 1M | Blended (1M in + 3M out) |
|---|---|---|---|---|
| gpt-4o-mini | openai | $0.15 | $0.60 | $1.95 |
| gpt-4.1-mini | openai | $0.40 | $1.60 | $5.20 |
| gpt-5-mini | openai | $0.25 | $2.00 | $6.25 |
| claude-haiku-4-5 | anthropic | $1.00 | $5.00 | $16.00 |
| gpt-4.1 | openai | $2.00 | $8.00 | $26.00 |
| gpt-5 | openai | $1.25 | $10.00 | $31.25 |
| gpt-4o | openai | $2.50 | $10.00 | $32.50 |
| gpt-5.4 | openai | $2.50 | $15.00 | $47.50 |
| claude-sonnet-4-5 | anthropic | $3.00 | $15.00 | $48.00 |
| claude-opus-4-5 | anthropic | $5.00 | $25.00 | $80.00 |
| gpt-5-pro | openai | $15.00 | $120.00 | $375.00 |
| gpt-5.4-pro | openai | $30.00 | $180.00 | $570.00 |
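For example, a cost-aware routing rule might look like the sketch below; treat the strategy name and fields as assumptions, not modelux's documented schema:

{
  "strategy": "cheapest_within_quality_bar",
  "candidates": ["gpt-4o-mini", "gpt-4.1-mini", "claude-haiku-4-5", "claude-sonnet-4-5"],
  "quality_bar": 0.90,          // per-request quality score the pick must clear
  "order_by": "blended_price"   // try the cheapest candidate first
}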
Hard caps. Enforced before the call leaves the proxy.
Set a daily, weekly, or monthly cap per project, key, or end-user. modelux checks the budget in under 5ms before every request and rejects with a clean error when the cap is hit. No surprise bills. No "we'll true up next month."
- ▸ Per-project, per-key, or per-end-user caps
- ▸ Soft warnings (Slack / email / webhook) before the hard cap hits
- ▸ Atomic enforcement on the hot path — no race conditions, no overshoot
{
"scope": "project",
"period": "monthly",
"limit_usd": 500,
"warn_at_pct": [50, 80, 95],
"warn_to": ["slack:#ops", "webhook:billing"],
"on_limit": "reject"
}
Projected savings aren't good enough? Measure them.
The ROI calculator gives you a ballpark. An experiment gives you the number. Replay up to 50,000 real requests from your own logs against a cheaper candidate config. Routing-only mode is free: it projects cost and latency from each request's real token counts and per-provider health metrics. When the candidate also changes response quality, flip to with-responses mode: modelux actually calls the candidate model and scores each response against the baseline with embedding similarity.
- ▸ Routing-only — $0 provider spend, projected numbers, full decision trace per row
- ▸ With-responses — measured cost, measured latency, cosine-similarity score per pair
- ▸ Per-experiment spend cap + auto-cancel on overrun
- ▸ Promote the winning candidate to a versioned production config in one click
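A sketch of launching a replay, with illustrative field names rather than the documented API; the spend cap and auto-cancel from the list above are the two knobs that matter in with-responses mode:

{
  "mode": "with_responses",
  "window": { "last_days": 7 },
  "max_spend_usd": 25,        // per-experiment spend cap
  "on_overrun": "cancel"      // auto-cancel instead of overshooting
}

A routing-only run spends nothing with providers and comes back with a projection like the summary below.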
{
"id": "sim_8f3c…",
"window": { "last_days": 7 },
"requests": 14218,
"mode": "routing_only",
"baseline": {
"cost_usd": 412.88
},
"candidate": {
"cost_usd": 178.40, // −56.8%
"route_distribution": {
"gpt-4o-mini": 0.72,
"claude-haiku-4-5": 0.23,
"gpt-4o": 0.05
}
}
}
Plug your numbers in.
The ROI calculator takes your monthly LLM spend, the share of traffic that could downgrade, and the expected cost ratio, then shows you what modelux saves. The free tier gets you live in five minutes; a routing-only experiment then replays your real traffic against the candidate so you replace the estimate with a measurement. No card required.
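As an illustration with made-up inputs: $10,000 a month in spend, 60% of traffic that could downgrade, and a target model that costs a tenth as much works out to roughly $10,000 × 0.60 × 0.90 ≈ $5,400 saved per month, before any cache hits or ensemble savings on top.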