modelux
$ modelux --control-plane

One control plane for every LLM.
Route, cap, trace.

Smart routing, spend caps you can trust, and a full trace of every call — across every provider, without changing your code.

quickstart.py python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.modelux.ai/v1",
    api_key="mlx_sk_..."
)

response = client.chat.completions.create(
    model="@production",          # your routing config
    messages=[{"role": "user", "content": "Hello!"}]
)
Uptime target
99.9%+
multi-provider, multi-region
Proxy overhead
< 5ms
p99, excluding provider call
Failover
sub-second
health-aware + circuit breakers
Providers
7+
OpenAI, Anthropic, Google, Azure, Bedrock, Groq, Fireworks
# pillars

One plane. Six jobs.

Most LLM tools do one of these well. Modelux does all six, on the same data, with the same primitives.

01. Routing

Policy-driven routing

Fallback chains, cost- and latency-optimized routing, ensembles, A/B tests, cascades, and a custom rule DSL, all across every provider. Define once, change without redeploying.

  • Eight built-in strategies
  • Custom rule DSL over cost / latency / budget / tags
  • Versioned configs with one-click rollback
Learn more →
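As a sketch of the cost-optimized strategy, a config in the same shape as the @production example further down might look like this (illustrative only; `candidates`, `quality_tier`, and `max_cost_per_1k_tokens` are assumed field names, not documented Modelux schema):

```json
{
  "strategy": "cost_optimized",
  "candidates": ["gemini-2.5-flash", "gpt-4o-mini", "claude-haiku-4-5"],
  "quality_tier": "fast",
  "max_cost_per_1k_tokens": 0.002
}
```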
02. Budgets

Spend caps that hold

Set a ceiling per project, per tag, or per end-user. Modelux auto-downgrades to a cheaper model or blocks the call before you blow past it — so a runaway loop doesn't become a runaway bill.

  • Budgets per project, tag, or end-user
  • Auto-downgrade or block at cap
  • Every config change logged
Learn more →
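As a sketch, a per-project budget with the auto-downgrade behavior described above might be written like this (field names are illustrative assumptions, not Modelux's actual schema):

```json
{
  "scope": { "project": "support-bot" },
  "monthly_limit_usd": 500,
  "at_80_percent": { "action": "downgrade", "to": "gpt-4o-mini" },
  "at_100_percent": { "action": "block" }
}
```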
03. Observability

Decision-level traces

Every request records the full routing decision: attempts tried, reasons, per-attempt timings and costs. Searchable by tag. Latency percentiles per model. Per-request cost attribution.

  • Decision trace per request
  • Per-tag analytics, cost forecasting
  • Export to S3 / Parquet when you want it
Learn more →
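To show what decision-level data makes possible, here is a minimal sketch of per-tag cost attribution over exported trace records (the record shape is an assumption for illustration, not Modelux's actual export format):

```python
from collections import defaultdict

def cost_by_tag(traces):
    """Sum request cost per tag from decision-trace records.

    Each record is assumed to look like:
      {"tags": ["checkout", "prod"], "cost_usd": 0.002134}
    """
    totals = defaultdict(float)
    for trace in traces:
        for tag in trace.get("tags", []):
            totals[tag] += trace["cost_usd"]
    # Round to whole micro-dollars to keep the output readable.
    return {tag: round(total, 6) for tag, total in totals.items()}

traces = [
    {"tags": ["checkout"], "cost_usd": 0.002},
    {"tags": ["checkout", "prod"], "cost_usd": 0.003},
    {"tags": ["search"], "cost_usd": 0.001},
]
print(cost_by_tag(traces))
# {'checkout': 0.005, 'prod': 0.003, 'search': 0.001}
```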
04. Replay

Replay before you ship

Take historical traffic and run it against a candidate routing config. See cost, latency, and success-rate diff before flipping production. Promote the winner with one audited click.

  • Up to 24h of traffic, replayed
  • Side-by-side diff on cost / latency / success
  • Promote with audited version bump
Learn more →
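Conceptually, the replay diff boils down to comparing aggregate stats for the same traffic under two configs. A toy sketch of that comparison (not the Modelux implementation):

```python
def summarize(runs):
    """Aggregate cost, success rate, and latency over replayed requests."""
    n = len(runs)
    return {
        "cost_usd": round(sum(r["cost_usd"] for r in runs), 6),
        "p_success": sum(r["ok"] for r in runs) / n,
        "avg_latency_ms": sum(r["latency_ms"] for r in runs) / n,
    }

def diff(current, candidate):
    """Side-by-side delta: candidate minus current, per metric."""
    a, b = summarize(current), summarize(candidate)
    return {k: round(b[k] - a[k], 6) for k in a}

current = [
    {"cost_usd": 0.004, "latency_ms": 900, "ok": True},
    {"cost_usd": 0.004, "latency_ms": 1100, "ok": False},
]
candidate = [
    {"cost_usd": 0.002, "latency_ms": 400, "ok": True},
    {"cost_usd": 0.002, "latency_ms": 600, "ok": True},
]
print(diff(current, candidate))
# {'cost_usd': -0.004, 'p_success': 0.5, 'avg_latency_ms': -500.0}
```

A negative cost or latency delta means the candidate config is cheaper or faster on the replayed traffic.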
05. Reliability

Stay up when providers don't

Every provider goes down a few times a year. Modelux routes around outages automatically: users don't see an error, you don't get paged, and nothing changes in your app code.

  • Automatic failover across providers
  • Rollback a bad config in one click
  • Test policy changes on real traffic first
Learn more →
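The "health-aware + circuit breakers" behavior can be illustrated with a minimal breaker sketch (conceptual only, not Modelux internals): after N consecutive failures a provider is skipped until a cooldown elapses, then a single probe request is let through.

```python
import time

class CircuitBreaker:
    """Skip a provider after repeated failures; retry after a cooldown."""

    def __init__(self, max_failures=3, cooldown_s=30.0):
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.opened_at = None  # set when the breaker trips

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        if self.opened_at is None:
            return True
        if now - self.opened_at >= self.cooldown_s:
            # Half-open: allow one probe; one more failure re-trips.
            self.opened_at = None
            self.failures = self.max_failures - 1
            return True
        return False

    def record(self, ok, now=None):
        now = time.monotonic() if now is None else now
        if ok:
            self.failures, self.opened_at = 0, None
        else:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = now

breaker = CircuitBreaker(max_failures=2, cooldown_s=30.0)
breaker.record(ok=False, now=0.0)
breaker.record(ok=False, now=1.0)   # second failure: breaker opens
print(breaker.allow(now=5.0))       # False: still cooling down
print(breaker.allow(now=40.0))      # True: half-open probe allowed
```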
06. Performance

Thin proxy. Negligible overhead.

Modelux sits in the hot path of every LLM call, so it's built to stay out of the way. Streaming responses pass through chunk by chunk; routing decisions use live latency measurements to pick whichever provider is fastest right now.

  • Streaming passes through unbuffered
  • Routes to the provider that's fastest now
  • Sub-5ms target overhead per request
Learn more →
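"Routes to the provider that's fastest now" can be sketched as choosing the provider with the lowest median latency over a rolling window of recent observations (illustrative only, not Modelux's actual selection algorithm):

```python
from collections import defaultdict, deque
from statistics import median

class LatencyRouter:
    """Pick the provider with the lowest median latency in a rolling window."""

    def __init__(self, window=50):
        # Per-provider ring buffer of recent latency samples (ms).
        self.samples = defaultdict(lambda: deque(maxlen=window))

    def observe(self, provider, latency_ms):
        self.samples[provider].append(latency_ms)

    def pick(self):
        return min(self.samples, key=lambda p: median(self.samples[p]))

router = LatencyRouter()
for ms in (210, 190, 230):
    router.observe("openai", ms)
for ms in (120, 150, 110):
    router.observe("groq", ms)
print(router.pick())  # groq
```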
# how it works

From zero to routing in under a minute.

01

Add a provider

Paste your OpenAI, Anthropic, or Google API key. Modelux proxies the request through your own credentials — BYO keys, no markup.

02

Configure routing

Pick a strategy: single model, fallback chain, cost-optimized, ensemble, A/B test. Or write a custom rule DSL for complex logic.

03

Point your app

Change your base_url to https://api.modelux.ai/v1 and use a Modelux API key. That's it. Existing OpenAI SDK code just works.

# routing

Change providers with a config, not a deploy.

Every routing config lives at a stable @slug. Your app calls @production; we handle the rest. Swap models, add fallbacks, run A/B tests, build ensembles. Version-controlled, replayable, with one-click rollback.

  • Fallback chains with per-provider timeouts
  • Cost-optimized with quality tiers
  • Ensembles with parallel aggregation
  • A/B tests with percentage rollouts
  • Custom rule DSL over cost, latency, budget
@production json
{
  "strategy": "fallback",
  "attempts": [
    { "model": "claude-haiku-4-5",   "timeout_ms": 2000 },
    { "model": "gpt-4o-mini",        "timeout_ms": 3000 },
    { "model": "gemini-2.5-flash",   "timeout_ms": 5000 }
  ],
  "retry_on": ["429", "5xx", "timeout"]
}
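An A/B rollout in the same config format might look like this (illustrative; `split` and `sticky_by` are assumed field names, not documented schema):

```json
{
  "strategy": "ab_test",
  "split": [
    { "model": "gpt-4o-mini",      "percent": 90 },
    { "model": "claude-haiku-4-5", "percent": 10 }
  ],
  "sticky_by": "end_user_id"
}
```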
modelux logs --trace trace
$ modelux logs --trace req_a1b2c3
[modelux] req_a1b2c3   gpt-4o-mini   238ms  $0.002134  200

  attempt_1  claude-haiku-4-5   timeout (2000ms)
  attempt_2  gpt-4o-mini        200 OK  1,247 tokens

  decision:  fallback -> attempt_2
  reason:    primary timeout, secondary healthy
  cost:      input $0.000187 + output $0.001947
  latency:   ttft_138ms  ttlt_238ms
  project:   demo-app
  config:    @production (v4)
# observability

Every request logged. Every decision explained.

Not just request/response capture — Modelux records the full routing decision trace: which attempts ran, why the router picked what it picked, per-attempt latency, costs, errors. Replay historical traffic against a new config before you ship it.

  • Searchable request logs with structured tags
  • Per-request cost attribution
  • Latency percentiles per model/provider
  • Replay simulator with cost diffs
# pricing

Flat tiers. No per-token markup.

You pay providers directly with your own keys. Modelux charges a flat fee for the control plane — no per-token markup.

Free
$0 forever

10k req/month. 1 project. Single + fallback routing.

Pro
$49 / month

100k req/month. 5 projects. All routing policies.

Team
$199 / month

1M req/month. Unlimited projects. Team roles.

Enterprise
Custom

SSO, SAML, audit logs, dedicated support & SLA.

$ modelux init

Ship faster. Spend less.

Free tier. No credit card. 2-minute setup.

Start building