Your app calls an LLM. Don't let it be the weakest link.
You wrote client.chat.completions.create(...) six months ago and haven't touched it since. Meanwhile your costs have doubled, your latency has crept up, and you still can't tell your CFO what feature X cost last month. Modelux fixes all of that without touching your call sites.
Change these two lines. Keep everything else.
Same OpenAI SDK. Same call sites. Same streaming, same tool calling, same structured outputs. You get fallbacks, analytics, and cost attribution for free.
from openai import OpenAI

client = OpenAI(
-     api_key=os.environ["OPENAI_API_KEY"],
+     base_url="https://api.modelux.ai/v1",
+     api_key=os.environ["MODELUX_API_KEY"],
)
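After the swap, your existing call sites keep working as-is. A minimal runnable sketch of a streaming call routed through Modelux (the model name and prompt are illustrative, everything else is standard OpenAI SDK usage):

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.modelux.ai/v1",
    api_key=os.environ["MODELUX_API_KEY"],
)

# The call site is unchanged: plain OpenAI SDK streaming.
stream = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative; use whatever model you call today
    messages=[{"role": "user", "content": "Summarize the release notes in two sentences."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()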
The problems that show up once LLM traffic is real.
OpenAI is down mid-launch
Your demo hits a 503 and recovery takes hours. Every request runs the same single-provider code path, so there is nothing to fall back to.
Costs are unpredictable
A prompt-engineering tweak doubles your bill overnight. Finance asks why; you open a spreadsheet.
Model tests require a deploy
Trying GPT-4o-mini vs Haiku on your summarization endpoint means a feature flag, a PR, and a rollout.
'Why was this slow?' is unanswerable
A user complains the response took 8 seconds. You have no per-request trace, no provider-level latency history, no way to know if it was the model or the network.
Each problem has a config-driven fix.
Fallbacks fix reliability in five minutes
Define a fallback chain once. Your app calls @production forever. Modelux retries across providers when one is degraded. You stop paging at 2am.
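From the app's side this is a one-line change at most. A sketch only: it assumes the @production alias is accepted wherever a model name normally goes, using the client configured in the migration snippet above.

# client: the OpenAI(...) instance pointed at the Modelux base_url, as shown earlier.
# "@production" is the alias; the fallback chain behind it can change in config
# without redeploying this code.
response = client.chat.completions.create(
    model="@production",
    messages=[{"role": "user", "content": "Draft a status update for the outage."}],
)
print(response.choices[0].message.content)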
Cost optimization without code changes
Flip traffic from gpt-4o to a cost-optimized config that routes 60% of requests to cheaper models meeting the same quality tier. Typical savings: 40-60%.
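The exact schema for a cost-optimized config isn't spelled out on this page, so treat the following as an illustrative shape only; the "weighted" strategy and its field names are hypothetical, not documented Modelux fields. It shows the idea: keep 40% of traffic on gpt-4o and send 60% to a cheaper model.

{
  "strategy": "weighted",
  "routes": [
    { "model": "gpt-4o", "weight": 0.4 },
    { "model": "gpt-4o-mini", "weight": 0.6 }
  ]
}

Because the split lives in config, shifting traffic is an edit on the Modelux side, not a code change or a deploy.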
A/B test models in production
Route 10% of traffic to a candidate config. Compare cost, latency, and your own quality signals. Promote the winner with one click.
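As with the cost example above, this is a hypothetical shape rather than the documented schema; a 90/10 canary between your current config and a candidate might be expressed along these lines.

{
  "strategy": "split",
  "targets": [
    { "config": "production", "weight": 0.90 },
    { "config": "candidate", "weight": 0.10 }
  ]
}

Promoting the winner is then a config change, not a redeploy.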
Full decision traces
Every request gets a full routing trace: which attempts ran and why, per-attempt latency, and cost. The "why was this slow" question now has an answer.
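The trace format isn't specified on this page; as a sketch of the kind of record described (attempts, outcomes, per-attempt latency and cost), one request's trace might look roughly like this, with all field names and values illustrative.

{
  "request_id": "req_abc123",
  "strategy": "fallback",
  "attempts": [
    { "model": "claude-haiku-4-5", "outcome": "timeout", "latency_ms": 2000 },
    { "model": "gpt-4o-mini", "outcome": "success", "latency_ms": 870, "cost_usd": 0.0004 }
  ],
  "total_latency_ms": 2870
}

A record like this answers the question directly: the slow part was the timed-out first attempt, not the model that finally responded.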
A reliability config you can ship tonight.
Three attempts across three providers. When the primary is slow or rate-limited, Modelux retries with the next one. Your app sees a single successful response every time; your p99 latency drops; your "LLM down" Slack channel goes quiet.
▸ Per-attempt timeouts
▸ Auto-retry on 429 / 5xx / timeout
▸ Full decision trace on every request
▸ Zero code changes in your app
{
  "strategy": "fallback",
  "attempts": [
    { "model": "claude-haiku-4-5", "timeout_ms": 2000 },
    { "model": "gpt-4o-mini", "timeout_ms": 3000 },
    { "model": "gemini-2.5-flash", "timeout_ms": 5000 }
  ],
  "retry_on": ["429", "5xx", "timeout"]
}

Two-minute migration. Free tier. No credit card.
Free tier covers 10k requests per month — enough to prove it works before anyone asks you to justify another vendor.