modelux
$ modelux features

Everything you need to run LLMs in production.

Gateway features to keep your stack flexible. Control plane features to keep it accountable. Built for engineering teams who treat LLMs like any other production dependency.

# routing

Eight routing strategies. All config, no code.

Every routing config is a named, versioned resource. Your app calls @production and doesn't care what's underneath.
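From the application side, an alias like @production just takes the place of a model name in an OpenAI-style request. The gateway URL and header names below are illustrative placeholders, not the real Modelux API:

```python
# Sketch: addressing a named routing config instead of a hard-coded model.
# The endpoint URL and headers are assumptions for illustration only.
import json

def build_chat_request(config_alias: str, prompt: str, api_key: str) -> dict:
    """Build an OpenAI-style chat request addressed to a routing config."""
    return {
        "url": "https://gateway.example.com/v1/chat/completions",  # hypothetical
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            # The alias stands in for a model name; the router resolves it.
            "model": config_alias,
            "messages": [{"role": "user", "content": prompt}],
        }),
    }

req = build_chat_request("@production", "Summarize this ticket.", "sk-demo")
```

Swapping what @production points at is then a config change, with no application deploy.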

strategy

Single model

Lock traffic to a specific model + provider. The simplest routing config.

strategy

Fallback chain

Ordered list of models with per-attempt timeouts. Auto-retry on 429, 5xx, and timeouts.
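As a sketch, a fallback chain might be declared like this; the field names are illustrative, not the exact schema:

@fallback-chain json
{
  "strategy": "fallback",
  "attempts": [
    { "model": "claude-haiku-4-5",  "timeout_ms": 2000 },
    { "model": "claude-sonnet-4-5", "timeout_ms": 5000 }
  ],
  "retry_on": ["429", "5xx", "timeout"]
}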

strategy

Cost-optimized

Pick the cheapest model that meets your quality tier. Allowlist specific models.

strategy

Latency-optimized

Route based on real-time p50 latency measurements across healthy providers.

strategy

Ensemble

Fan out to multiple models in parallel. Aggregate with voting, first-valid, or weighted consensus.

strategy

A/B test

Split traffic by percentage between configs. Compare quality and cost in production.
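A split between two named configs could be declared along these lines (illustrative field names, not the exact schema):

@ab-rollout json
{
  "strategy": "ab_test",
  "variants": [
    { "config": "@production", "traffic_pct": 90 },
    { "config": "@candidate",  "traffic_pct": 10 }
  ]
}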

strategy

Cascade

Sequential attempts with early stop on success. Useful for quality-tier fallbacks.

strategy

Custom rules

DSL over cost, latency, budget, and tags. Programmable policies for complex traffic.

# ensembles

Match frontier quality at 20% of the cost.

Run multiple smaller models in parallel and aggregate their outputs. Three-model ensembles of small open-weights models can match or exceed frontier single-model quality on many tasks, at a fraction of the cost.

  • Parallel fan-out with per-attempt timeouts
  • Aggregation: voting, first-valid, weighted
  • Weight tuning per-model
  • Live cost estimate in the builder UI
@quality-ensemble json
{
  "strategy": "ensemble",
  "aggregation": "weighted_vote",
  "members": [
    { "model": "claude-haiku-4-5",    "weight": 1.0 },
    { "model": "gpt-4o-mini",         "weight": 1.0 },
    { "model": "gemini-2.5-flash",    "weight": 0.8 }
  ],
  "timeout_ms": 5000
}
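One plausible reading of the `weighted_vote` aggregation above: sum the member weights behind each distinct answer and return the answer with the highest total. This is an illustrative sketch, not Modelux's exact aggregation code:

```python
from collections import defaultdict

def weighted_vote(answers: list[tuple[str, float]]) -> str:
    """Pick the answer with the largest total member weight.

    `answers` pairs each ensemble member's normalized output with its
    configured weight. Illustrative only; the real aggregator may also
    normalize outputs and break ties differently.
    """
    totals: dict[str, float] = defaultdict(float)
    for answer, weight in answers:
        totals[answer] += weight
    return max(totals, key=totals.__getitem__)

# Two members agree on "B"; their combined weight (1.8) beats "A" (1.0).
winner = weighted_vote([("A", 1.0), ("B", 1.0), ("B", 0.8)])
```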
# control plane

Ship changes with confidence.

The gateway is table stakes. The control plane is what makes Modelux different: budgets, replay, explainability, audit, versioning.

Budgets & spend governance

Set per-project, per-tag, or org-wide spend caps. Auto-downgrade to cheaper models near the cap. Alerts at 80% and 100%.
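A per-project budget might be expressed like this (the schema is a sketch, not the exact format):

@checkout-budget json
{
  "scope": "project:checkout",
  "monthly_cap_usd": 500,
  "alert_at_pct": [80, 100],
  "near_cap_action": "downgrade"
}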

Replay simulator

Take 24 hours of historical requests and replay them against a new routing config. Compare cost, latency, and success rate before shipping.

Decision explainability

Every request stores a full decision trace: which attempts ran, why the router picked what it picked, per-attempt cost and latency.
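An illustrative decision trace for a request that fell back once (the shape is a sketch, not the exact trace format):

@decision-trace json
{
  "request_id": "req_abc123",
  "config": "@production",
  "attempts": [
    { "model": "claude-haiku-4-5",  "outcome": "timeout", "latency_ms": 2000, "cost_usd": 0 },
    { "model": "claude-sonnet-4-5", "outcome": "success", "latency_ms": 1240, "cost_usd": 0.0031 }
  ],
  "reason": "attempt 1 exceeded its per-attempt timeout; router advanced to attempt 2"
}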

Audit log

Every config change, API key action, and team event is logged. Searchable. Exportable. Required for SOC 2.

Config versioning

Every routing config change creates a version. Diff changes, rollback with one click, promote from simulation results.

Webhooks

Subscribe to events: config changes, budget alerts, provider health changes, request anomalies. Signed HMAC payloads.
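Verifying a signed HMAC payload on the receiving end typically looks like the sketch below. The SHA-256 choice and hex encoding are assumptions; check the webhook documentation for the actual header name and signature format:

```python
import hashlib
import hmac

def verify_webhook(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Recompute the HMAC-SHA256 of the raw body and compare in constant time.

    Always verify against the raw request bytes (before JSON parsing), and
    use compare_digest to avoid timing side channels.
    """
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_hex)

secret = b"whsec_demo"
body = b'{"event":"budget.alert","project":"checkout"}'
sig = hmac.new(secret, body, hashlib.sha256).hexdigest()  # as a sender would
```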

# reliability & performance

Stay up when providers don't.

The routing layer is the reliability layer. Failover, health scoring, and circuit breakers are built in — not a bolt-on. See the full reliability page →

Multi-provider failover

Fallback chains with per-attempt timeouts. Automatic retries on 429, 5xx, and timeout — no retry loop in your app.

Health-aware routing

Providers scored continuously on error rate and latency. Traffic shifts to healthier options before you'd notice the problem on a graph.

Per-attempt timeouts

Each fallback attempt has its own timeout budget. A slow primary can't blow up your tail latency — the router moves on.

Streaming passthrough

Go proxy engineered for low overhead. Streaming responses forwarded chunk-by-chunk; no buffering, no head-of-line blocking.

# observability

Know what's happening. Know why.

Most LLM observability tools stop at request/response logging. We capture the full routing decision trace — so you can answer "why did this request go to that model?" for any request in your history.

Request logs

Every request captured: input, output, tokens, cost, latency, model, provider, decision trace. Searchable by tag, user, project.

Cost attribution

Per-request cost computation. Drill down by project, tag, end-user, model, or provider. Exportable to your warehouse.
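Per-request cost is a straightforward function of token counts and per-million-token prices. The prices below are placeholders, not real provider pricing:

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     in_per_million: float, out_per_million: float) -> float:
    """Cost of one request from token counts and per-million-token prices."""
    return (input_tokens * in_per_million
            + output_tokens * out_per_million) / 1_000_000

# Placeholder prices: $0.25/M input tokens, $1.25/M output tokens.
cost = request_cost_usd(1_200, 350, in_per_million=0.25, out_per_million=1.25)
```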

Latency percentiles

p50, p95, p99 latencies per model and provider. Detect regressions before they page you.
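As a reference point for what these numbers mean, a nearest-rank percentile over a latency sample can be computed like this (a sketch, not how Modelux computes them):

```python
import math

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample covering p% of the data."""
    ranked = sorted(samples)
    k = max(0, math.ceil(p / 100 * len(ranked)) - 1)
    return ranked[k]

latencies = list(range(1, 101))  # toy sample: 1..100 ms
p50, p95, p99 = (percentile(latencies, p) for p in (50, 95, 99))
```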

Error tracking

Group errors by type, provider, model. See which prompts consistently fail and why.

claude-code ~/my-app mcp
> create a cascade that tries haiku first
  and falls back to sonnet

[modelux] creating routing config @cascade-v1
  strategy:   cascade
  attempt_1:  claude-haiku-4-5   timeout 2s
  attempt_2:  claude-sonnet-4-5  timeout 5s
  retry_on:   [429, 5xx, timeout]

[modelux] config @cascade-v1 created. active.

> show me yesterday's spend by model

[modelux] fetching analytics report...
  gpt-4o-mini        $12.47   4,821 req
  claude-haiku-4-5   $ 8.22   2,140 req
  gemini-2.5-flash   $ 3.91   1,103 req
  total              $24.60   8,064 req
# ai-native

Manage Modelux from your AI.

Every dashboard action is also a tool in our MCP server. Connect Claude Code, Cursor, or any MCP-aware client and manage routing, budgets, providers, and analytics through natural language.

  • 80+ MCP tools covering the full API surface
  • REST API for everything (dashboard is a client)
  • Webhooks for event-driven integrations
  • OpenAPI spec for generated SDKs
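For MCP clients that use the common `mcpServers` config shape, registering the server might look like the sketch below; the command and package name are placeholders, not the published ones:

@mcp-client-config json
{
  "mcpServers": {
    "modelux": {
      "command": "npx",
      "args": ["-y", "modelux-mcp"],
      "env": { "MODELUX_API_KEY": "<your key>" }
    }
  }
}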
# integrations

Works with your stack.

[provider] OpenAI
[provider] Anthropic
[provider] Google
[provider] Azure OpenAI
[provider] AWS Bedrock
[provider] Groq
[provider] Fireworks
[client] OpenAI SDK
[client] Anthropic SDK
[client] LangChain
[client] LlamaIndex
[client] curl / HTTP
[management] MCP
[management] REST API
[management] Webhooks

Ready to ship?

Free tier. No credit card. Upgrade when you need it.