
Cost optimization

LLM bills scale with usage. Modelux gives you four levers to cut costs without code changes: smaller models for simpler traffic, ensembles of cheap models, budget enforcement, and caching.

Lever 1: Right-size the model

Most teams over-provision. GPT-4o isn’t needed for a classification task that GPT-4o-mini handles fine.

Use a cost-optimized routing config:

```json
{
  "strategy": "cost_optimized",
  "quality_tier": "standard",
  "allowed_models": [
    "gpt-4o-mini",
    "claude-haiku-4-5",
    "gemini-2.5-flash"
  ]
}
```

Modelux picks the cheapest allowed model that meets the quality tier. You’ll see typical savings of 50-70% vs. using a frontier model for everything.
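The selection logic can be sketched as follows. The per-token prices, quality scores, and tier thresholds below are illustrative placeholders, not Modelux's actual numbers:

```python
# Sketch of cost-optimized routing: pick the cheapest allowed model
# that still meets the requested quality tier. All figures here are
# hypothetical, for illustration only.

ILLUSTRATIVE_PRICE = {       # USD per 1M input tokens (assumed)
    "gpt-4o-mini": 0.15,
    "claude-haiku-4-5": 0.80,
    "gemini-2.5-flash": 0.30,
}
ILLUSTRATIVE_QUALITY = {     # higher is better (assumed)
    "gpt-4o-mini": 0.72,
    "claude-haiku-4-5": 0.78,
    "gemini-2.5-flash": 0.74,
}
TIER_FLOOR = {"standard": 0.70, "premium": 0.85}  # assumed thresholds

def pick_model(config: dict) -> str:
    floor = TIER_FLOOR[config["quality_tier"]]
    candidates = [m for m in config["allowed_models"]
                  if ILLUSTRATIVE_QUALITY[m] >= floor]
    # Among models meeting the tier, take the cheapest.
    return min(candidates, key=lambda m: ILLUSTRATIVE_PRICE[m])

config = {
    "strategy": "cost_optimized",
    "quality_tier": "standard",
    "allowed_models": ["gpt-4o-mini", "claude-haiku-4-5", "gemini-2.5-flash"],
}
print(pick_model(config))  # the cheapest model that clears the tier floor
```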

Lever 2: Ensembles

For quality-critical tasks where you’d otherwise reach for a frontier model, try a 3-model ensemble of cheap models. Voting across multiple cheap models often matches frontier-model quality at 20% of the cost.

See Ensembles for full configuration details.
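The voting idea behind ensembles can be sketched like this; `call_model` is a hypothetical stand-in for a real provider call, with canned answers for illustration:

```python
from collections import Counter

def call_model(model: str, prompt: str) -> str:
    # Hypothetical stand-in: simulated answers instead of real API calls.
    canned = {
        "gpt-4o-mini": "positive",
        "claude-haiku-4-5": "positive",
        "gemini-2.5-flash": "negative",
    }
    return canned[model]

def ensemble_vote(models: list[str], prompt: str) -> str:
    # Query each cheap model and return the majority answer.
    votes = Counter(call_model(m, prompt) for m in models)
    answer, _count = votes.most_common(1)[0]
    return answer

models = ["gpt-4o-mini", "claude-haiku-4-5", "gemini-2.5-flash"]
print(ensemble_vote(models, "Classify the sentiment: 'great product'"))
```

Majority voting works best for tasks with a small answer space (classification, extraction); for open-ended generation, other aggregation strategies apply.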

Lever 3: Budget caps with auto-downgrade

Set a monthly budget with auto-downgrade:

  • Cap: $500/month on your production project
  • At 80%: email alert
  • At 100%: automatically downgrade all traffic to a cheaper routing config

Your app keeps serving requests — just at a lower cost — until the budget resets next month.
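The enforcement logic amounts to a threshold check on current spend. A minimal sketch, mirroring the $500 cap and 80% alert above (the action names are hypothetical):

```python
# Sketch of budget enforcement with auto-downgrade. Returns the action
# to take for the current month-to-date spend.
def enforce_budget(spend: float, cap: float = 500.0) -> str:
    if spend >= cap:
        return "downgrade"  # reroute all traffic to the cheaper config
    if spend >= 0.8 * cap:
        return "alert"      # send the 80% email alert
    return "ok"

print(enforce_budget(120.0))  # ok
print(enforce_budget(410.0))  # alert
print(enforce_budget(505.0))  # downgrade
```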

Lever 4: Caching

Enable exact-match caching on routing configs where prompts repeat. Cache hits cost $0. Even a 10% cache hit rate on a high-volume endpoint pays for Modelux many times over.
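Exact-match caching is the simplest possible cache: identical prompts return the stored response without a model call. A minimal sketch, with `expensive_model_call` as a hypothetical stand-in:

```python
# Sketch of exact-match caching: the second identical prompt is a
# cache hit and triggers no model call (and no cost).
cache: dict[str, str] = {}
calls_made = 0

def expensive_model_call(prompt: str) -> str:
    global calls_made
    calls_made += 1  # count billable calls
    return f"answer to: {prompt}"

def cached_completion(prompt: str) -> str:
    if prompt not in cache:
        cache[prompt] = expensive_model_call(prompt)
    return cache[prompt]

cached_completion("What is our refund policy?")
cached_completion("What is our refund policy?")  # cache hit, no new call
print(calls_made)  # 1
```

Note that exact-match means byte-identical prompts; any variation (extra whitespace, a different user name interpolated into the template) is a miss.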

Measure the savings

Go to Analytics -> Period comparison. Set the baseline to the month before you enabled cost optimization. You’ll see period-over-period cost and volume diffs side-by-side.
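The diff the comparison view reports is a plain percent change against the baseline period. For example, going from $10,000 to $5,200:

```python
# Period-over-period percent change, as shown in the comparison view.
def pct_change(baseline: float, current: float) -> float:
    return (current - baseline) / baseline * 100

print(round(pct_change(10_000, 5_200), 1))  # -48.0
```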

A typical result

A team spending $10k/month on GPT-4o:

  • Adds a cost-optimized config routing 60% of traffic to gpt-4o-mini + claude-haiku-4-5
  • Keeps 40% on gpt-4o for complex queries
  • New monthly bill: ~$5,200
  • Modelux cost: $199/month (Team tier)
  • Net savings: ~$4,600/month
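The arithmetic behind this example can be checked directly, assuming the cheap-model traffic runs at roughly 20% of the frontier price (consistent with the ensemble estimate above):

```python
# Worked numbers for the example above. The 20% cost ratio for the
# cheap-model traffic is an assumption, not a quoted price.
baseline = 10_000                    # prior monthly GPT-4o spend, USD
kept = 0.40 * baseline               # traffic staying on gpt-4o
routed = 0.60 * baseline * 0.20      # cheap models at ~20% of frontier cost
new_bill = kept + routed
net_savings = baseline - new_bill - 199  # minus the Team-tier fee

print(new_bill)     # 5200.0
print(net_savings)  # 4601.0 (~$4,600/month)
```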