Cost optimization
LLM bills scale with usage. Modelux gives you four levers to cut costs without code changes: smaller models for simpler traffic, ensembles of cheap models, budget enforcement, and caching.
Lever 1: Right-size the model
Most teams over-provision. GPT-4o isn’t needed for a classification task that GPT-4o-mini handles fine.
Use a cost-optimized routing config:
{
  "strategy": "cost_optimized",
  "quality_tier": "standard",
  "allowed_models": [
    "gpt-4o-mini",
    "claude-haiku-4-5",
    "gemini-2.5-flash"
  ]
}
Modelux picks the cheapest allowed model that meets the quality tier. You’ll see typical savings of 50-70% vs. using a frontier model for everything.
Lever 2: Ensembles
For quality-critical tasks where you’d otherwise reach for a frontier model, try a 3-model ensemble of cheap models. Voting across multiple cheap models often matches frontier-model quality at 20% of the cost.
See Ensembles for full configuration details.
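As a rough illustration, an ensemble config might pair the same cheap models from Lever 1 with a voting step. This is a sketch, not the exact schema — the `strategy` value, `ensemble_size`, and `aggregation` field names here are assumptions; see the Ensembles page for the authoritative configuration reference.

```json
{
  "strategy": "ensemble",
  "ensemble_size": 3,
  "models": [
    "gpt-4o-mini",
    "claude-haiku-4-5",
    "gemini-2.5-flash"
  ],
  "aggregation": "majority_vote"
}
```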
Lever 3: Budget caps with auto-downgrade
Set a monthly budget with auto-downgrade:
- Cap: $500/month on your production project
- At 80%: email alert
- At 100%: automatically downgrade all traffic to a cheaper routing config
Your app keeps serving requests — just at a lower cost — until the budget resets next month.
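The budget policy above might look something like the following sketch. The field names (`monthly_cap_usd`, `alerts`, `threshold_pct`, `fallback_config`) are illustrative assumptions, not the exact Modelux schema — the point is that alerting and downgrade are declared per threshold:

```json
{
  "budget": {
    "monthly_cap_usd": 500,
    "alerts": [
      { "threshold_pct": 80, "action": "email" },
      { "threshold_pct": 100, "action": "downgrade", "fallback_config": "my-cheap-fallback" }
    ]
  }
}
```

Here `my-cheap-fallback` stands in for the name of whatever cheaper routing config you want traffic downgraded to.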
Lever 4: Caching
Enable exact-match caching on routing configs where prompts repeat. Cache hits cost $0. Even 10% cache hit rate on a high-volume endpoint pays for Modelux many times over.
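Caching is enabled per routing config. A hedged sketch, assuming a `cache` block with `mode` and TTL fields (field names are illustrative, not the confirmed schema):

```json
{
  "strategy": "cost_optimized",
  "quality_tier": "standard",
  "cache": {
    "mode": "exact_match",
    "ttl_seconds": 86400
  }
}
```

Exact-match caching only pays off where identical prompts recur, so apply it to endpoints like FAQ answering or classification of repeated inputs rather than free-form chat.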
Measure the savings
Go to Analytics -> Period comparison. Set the baseline to the month before you enabled cost optimization. You’ll see period-over-period cost and volume diffs side-by-side.
A typical result
A team spending $10k/month on GPT-4o:
- Adds a cost-optimized config routing 60% of traffic to gpt-4o-mini + claude-haiku-4-5
- Keeps 40% on gpt-4o for complex queries
- New monthly bill: ~$5,200
- Modelux cost: $199/month (Team tier)
- Net savings: ~$4,600/month