modelux
$ modelux --control-plane

One control plane for every LLM.
Route, cap, trace.

Smart routing, spend caps you can trust, and a full trace of every call — across every provider, without changing your code.

quickstart.py python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.modelux.ai/v1",
    api_key="mlx_sk_..."
)

response = client.chat.completions.create(
    model="@production",          # your routing config
    messages=[{"role": "user", "content": "Hello!"}]
)
Uptime target
99.9%+
multi-provider, multi-region
Proxy overhead
< 5ms
p99, excluding provider call
Failover
sub-second
health-aware + circuit breakers
Providers
7+
OpenAI, Anthropic, Google, Azure, Bedrock, Groq, Fireworks
# pillars

One plane. Six jobs.

Most LLM tools do one of these well. Modelux does all six, on the same data, with the same primitives.

01. Routing

Policy-driven routing

Fallback chains, cost- and latency-optimized routing, ensembles, A/B tests, cascades, and a custom rule DSL, all across every provider. Define once, change without redeploying.

  • Eight built-in strategies
  • Custom rule DSL over cost / latency / budget / tags
  • Versioned configs with one-click rollback
Learn more →
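As a sketch of the cost-optimized strategy, a config in the same shape as the @production example further down might look like this (illustrative only; `candidates`, `quality_tier`, and `max_cost_per_1k_tokens` are assumed field names, not documented Modelux schema):

```json
{
  "strategy": "cost_optimized",
  "candidates": ["gemini-2.5-flash", "gpt-4o-mini", "claude-haiku-4-5"],
  "quality_tier": "fast",
  "max_cost_per_1k_tokens": 0.002
}
```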
02. Budgets

Spend caps that hold

Set a ceiling per project, per tag, or per end-user. Modelux auto-downgrades to a cheaper model or blocks the call before you blow past it — so a runaway loop doesn't become a runaway bill.

  • Budgets per project, tag, or end-user
  • Auto-downgrade or block at cap
  • Every config change logged
Learn more →
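As a sketch, a per-project budget with the auto-downgrade behavior described above might be written like this (field names are illustrative assumptions, not Modelux's actual schema):

```json
{
  "scope": { "project": "support-bot" },
  "monthly_limit_usd": 500,
  "at_80_percent": { "action": "downgrade", "to": "gpt-4o-mini" },
  "at_100_percent": { "action": "block" }
}
```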
03. Observability

Decision-level traces

Every request records the full routing decision: attempts tried, reasons, per-attempt timings and costs. Searchable by tag. Latency percentiles per model. Per-request cost attribution.

  • Decision trace per request
  • Per-tag analytics, cost forecasting
  • Export to S3 / Parquet when you want it
Learn more →
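To show what decision-level data makes possible, here is a minimal sketch of per-tag cost attribution over exported trace records (the record shape is an assumption for illustration, not Modelux's actual export format):

```python
from collections import defaultdict

def cost_by_tag(traces):
    """Sum request cost per tag from decision-trace records.

    Each record is assumed to look like:
      {"tags": ["checkout", "prod"], "cost_usd": 0.002134}
    """
    totals = defaultdict(float)
    for trace in traces:
        for tag in trace.get("tags", []):
            totals[tag] += trace["cost_usd"]
    # Round to whole micro-dollars to keep the output readable.
    return {tag: round(total, 6) for tag, total in totals.items()}

traces = [
    {"tags": ["checkout"], "cost_usd": 0.002},
    {"tags": ["checkout", "prod"], "cost_usd": 0.003},
    {"tags": ["search"], "cost_usd": 0.001},
]
print(cost_by_tag(traces))
# {'checkout': 0.005, 'prod': 0.003, 'search': 0.001}
```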
04. Replay

Replay before you ship

Take historical traffic and run it against a candidate routing config. See cost, latency, and success-rate diff before flipping production. Promote the winner with one audited click.

  • Up to 24h of traffic, replayed
  • Side-by-side diff on cost / latency / success
  • Promote with audited version bump
Learn more →
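Conceptually, the replay diff boils down to comparing aggregate stats for the same traffic under two configs. A toy sketch of that comparison (not the Modelux implementation):

```python
def summarize(runs):
    """Aggregate cost, success rate, and latency over replayed requests."""
    n = len(runs)
    return {
        "cost_usd": round(sum(r["cost_usd"] for r in runs), 6),
        "p_success": sum(r["ok"] for r in runs) / n,
        "avg_latency_ms": sum(r["latency_ms"] for r in runs) / n,
    }

def diff(current, candidate):
    """Side-by-side delta: candidate minus current, per metric."""
    a, b = summarize(current), summarize(candidate)
    return {k: round(b[k] - a[k], 6) for k in a}

current = [
    {"cost_usd": 0.004, "latency_ms": 900, "ok": True},
    {"cost_usd": 0.004, "latency_ms": 1100, "ok": False},
]
candidate = [
    {"cost_usd": 0.002, "latency_ms": 400, "ok": True},
    {"cost_usd": 0.002, "latency_ms": 600, "ok": True},
]
print(diff(current, candidate))
# {'cost_usd': -0.004, 'p_success': 0.5, 'avg_latency_ms': -500.0}
```

A negative cost or latency delta means the candidate config is cheaper or faster on the replayed traffic.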
05. Reliability

Stay up when providers don't

Every provider goes down a few times a year. Modelux routes around outages automatically: users don't see an error, you don't get paged, and nothing changes in your app code.

  • Automatic failover across providers
  • Rollback a bad config in one click
  • Test policy changes on real traffic first
Learn more →
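The "health-aware + circuit breakers" behavior can be illustrated with a minimal breaker sketch (conceptual only, not Modelux internals): after N consecutive failures a provider is skipped until a cooldown elapses, then a single probe request is let through.

```python
import time

class CircuitBreaker:
    """Skip a provider after repeated failures; retry after a cooldown."""

    def __init__(self, max_failures=3, cooldown_s=30.0):
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.opened_at = None  # set when the breaker trips

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        if self.opened_at is None:
            return True
        if now - self.opened_at >= self.cooldown_s:
            # Half-open: allow one probe; one more failure re-trips.
            self.opened_at = None
            self.failures = self.max_failures - 1
            return True
        return False

    def record(self, ok, now=None):
        now = time.monotonic() if now is None else now
        if ok:
            self.failures, self.opened_at = 0, None
        else:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = now

breaker = CircuitBreaker(max_failures=2, cooldown_s=30.0)
breaker.record(ok=False, now=0.0)
breaker.record(ok=False, now=1.0)   # second failure: breaker opens
print(breaker.allow(now=5.0))       # False: still cooling down
print(breaker.allow(now=40.0))      # True: half-open probe allowed
```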
06. Performance

Thin proxy. Negligible overhead.

Modelux sits in the hot path of every LLM call, so it's built to stay out of the way. Streaming responses pass through chunk by chunk; routing decisions use live latency measurements to pick whichever provider is fastest right now.

  • Streaming passes through unbuffered
  • Routes to the provider that's fastest now
  • Sub-5ms target overhead per request
Learn more →
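"Routes to the provider that's fastest now" can be sketched as choosing the provider with the lowest median latency over a rolling window of recent observations (illustrative only, not Modelux's actual selection algorithm):

```python
from collections import defaultdict, deque
from statistics import median

class LatencyRouter:
    """Pick the provider with the lowest median latency in a rolling window."""

    def __init__(self, window=50):
        # Per-provider ring buffer of recent latency samples (ms).
        self.samples = defaultdict(lambda: deque(maxlen=window))

    def observe(self, provider, latency_ms):
        self.samples[provider].append(latency_ms)

    def pick(self):
        return min(self.samples, key=lambda p: median(self.samples[p]))

router = LatencyRouter()
for ms in (210, 190, 230):
    router.observe("openai", ms)
for ms in (120, 150, 110):
    router.observe("groq", ms)
print(router.pick())  # groq
```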
# how it works

From zero to routing in under a minute.

01

Add a provider

Paste your OpenAI, Anthropic, or Google API key. Modelux proxies the request through your own credentials — BYO keys, no markup.

02

Configure routing

Pick a strategy: single model, fallback chain, cost-optimized, ensemble, A/B test. Or write a custom rule DSL for complex logic.

03

Point your app

Change your base_url to https://api.modelux.ai/v1 and use a Modelux API key. That's it. Existing OpenAI SDK code just works.

# routing

Change providers with a config, not a deploy.

Every routing config lives at a stable @slug. Your app calls @production; we handle the rest. Swap models, add fallbacks, run A/B tests, build ensembles. Version-controlled, replayable, with one-click rollback.

  • Fallback chains with per-provider timeouts
  • Cost-optimized with quality tiers
  • Ensembles with parallel aggregation
  • A/B tests with percentage rollouts
  • Custom rule DSL over cost, latency, budget
@production json
{
  "strategy": "fallback",
  "attempts": [
    { "model": "claude-haiku-4-5",   "timeout_ms": 2000 },
    { "model": "gpt-4o-mini",        "timeout_ms": 3000 },
    { "model": "gemini-2.5-flash",   "timeout_ms": 5000 }
  ],
  "retry_on": ["429", "5xx", "timeout"]
}
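An A/B rollout in the same config format might look like this (illustrative; `split` and `sticky_by` are assumed field names, not documented schema):

```json
{
  "strategy": "ab_test",
  "split": [
    { "model": "gpt-4o-mini",      "percent": 90 },
    { "model": "claude-haiku-4-5", "percent": 10 }
  ],
  "sticky_by": "end_user_id"
}
```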
modelux logs --trace trace
$ modelux logs --trace req_a1b2c3
[modelux] req_a1b2c3   gpt-4o-mini   238ms  $0.002134  200

  attempt_1  claude-haiku-4-5   timeout (2000ms)
  attempt_2  gpt-4o-mini        200 OK  1,247 tokens

  decision:  fallback -> attempt_2
  reason:    primary timeout, secondary healthy
  cost:      input $0.000187 + output $0.001947
  latency:   ttft_138ms  ttlt_238ms
  project:   demo-app
  config:    @production (v4)
# observability

Every request logged. Every decision explained.

Not just request/response capture — Modelux records the full routing decision trace: which attempts ran, why the router picked what it picked, per-attempt latency, costs, errors. Replay historical traffic against a new config before you ship it.

  • Searchable request logs with structured tags
  • Per-request cost attribution
  • Latency percentiles per model/provider
  • Replay simulator with cost diffs
# pricing

Flat tiers. No per-token markup.

You pay providers directly with your own keys. Modelux charges a flat fee for the control plane — no per-token markup.

Free
$0 forever

10k req/month. 1 project. Single + fallback routing.

Pro
$49 / month

100k req/month. 5 projects. All routing policies.

Team
$199 / month

1M req/month. Unlimited projects. Team roles.

Enterprise
Custom

SSO, SAML, audit logs, dedicated support & SLA.

$ modelux init

Ship faster. Spend less.

Free tier. No credit card. 2-minute setup.

Start building