Routing
A routing config is a named, versioned resource that tells modelux how to
handle a request. Your application calls modelux with a routing config slug
(like @production) and modelux decides which model(s) and provider(s) to
actually invoke.
Routing configs live in modelux, not in your code. Change the routing behavior without redeploying your app.
The model field: three forms
modelux accepts three shapes in the OpenAI model field:
| Form | Example | What happens |
|---|---|---|
| @<slug> | @production | Runs the named routing config: applies policies, budgets, fallbacks, and records a full decision trace. |
| <provider>/<model> | openai/gpt-4o-mini | Direct call using your org’s default credential for that provider. Bypasses routing configs. |
| <model> (bare) | gpt-4o-mini | Infers the provider from the name’s prefix and routes directly. Lets OpenAI-SDK apps point at modelux with a one-line base-URL change. |
Bare-name prefix map
| Prefix | Provider |
|---|---|
| gpt-*, o1*, o3*, o4*, text-embedding-* | openai |
| claude-* | anthropic |
| gemini-* | google |
Unknown bare names return 400 invalid_request with the list of recognized
prefixes — you need to either use @config or provider/model.
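The three forms and the prefix map can be summarized in a short Python sketch. It only mirrors the behavior described here and is not modelux’s actual resolver; the function name classify_model_field and the shape of the returned dictionary are assumptions made for this example.

```python
# Illustrative only: mirrors the documented resolution rules, not modelux's code.
BARE_PREFIXES = {
    "gpt-": "openai",
    "o1": "openai",
    "o3": "openai",
    "o4": "openai",
    "text-embedding-": "openai",
    "claude-": "anthropic",
    "gemini-": "google",
}


def classify_model_field(model: str) -> dict:
    """Classify a model string into one of the three documented forms."""
    if model.startswith("@"):
        # @<slug>: run the named routing config.
        return {"form": "config", "slug": model[1:]}
    if "/" in model:
        # <provider>/<model>: direct call that bypasses routing configs.
        provider, name = model.split("/", 1)
        return {"form": "direct", "provider": provider, "model": name}
    for prefix, provider in BARE_PREFIXES.items():
        if model.startswith(prefix):
            # Bare name: the provider is inferred from the prefix.
            return {"form": "bare", "provider": provider, "model": model}
    # Unknown bare names are documented to return 400 invalid_request.
    raise ValueError(f"invalid_request: unrecognized model name {model!r}")
```

A check like this can run in your own code before a request is sent, for example to reject bare names in services that should only use @config.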
Bare-name traffic skips routing configs entirely — no budgets, no
fallbacks, no cost/latency optimization. The decision trace records
resolved: auto so you can see which requests bypassed policies in the
dashboard. For production traffic, prefer @config.
Strict deployments can disable bare-name resolution per project by setting
settings.auto_resolve_bare_model to false via the
update project API or the
update_project MCP tool.
BYOK passthrough — bring your own provider key
For compliance, multi-tenant SaaS, or rotation-sensitive deployments, you
can pass the provider API key on each request via the
X-Modelux-Provider-Key header. The proxy uses it for that single
outbound call and never stores it — only a masked fingerprint
(sk-proj-...TA0A) lands in request logs, so you can still answer
“which of my keys sent this traffic”.
curl https://api.modelux.ai/openai/v1/chat/completions \
  -H "Authorization: Bearer mlx_sk_..." \
  -H "X-Modelux-Provider-Key: sk-proj-your-openai-key" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"hi"}]}'
Rules:
- The modelux API key (mlx_sk_*) is still required: it identifies the org, project, budgets, and rate limits. The provider key only overrides which upstream credential the proxy uses for the outbound call.
- If a stored credential exists for the same provider, the header wins and the stored key is ignored. The stored credential’s base_url is still inherited, which lets you BYOK against Azure, Bedrock, Vertex, or a self-hosted endpoint: register the base URL once, override the key per request.
- Only honored on direct calls (provider/model or a known bare prefix). @config routing still uses whichever credentials the config references; passthrough and @config don’t compose.
- No cross-credential fallback, health monitoring, or credential-level budgets for passthrough traffic, by design: you opted out of the stored-credential surface.
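The same request can be assembled from Python. The sketch below uses only the standard library and builds the request without sending it. The URL and header names come from the curl example; the helper name build_byok_request and the client-side check that rejects @config slugs are assumptions for illustration, since this page does not say how the server responds to that combination.

```python
import json
import urllib.request

# URL taken from the curl example above.
CHAT_URL = "https://api.modelux.ai/openai/v1/chat/completions"


def build_byok_request(
    modelux_key: str, provider_key: str, model: str, messages: list
) -> urllib.request.Request:
    """Build (but do not send) a chat request carrying a per-request provider key."""
    if model.startswith("@"):
        # Passthrough is only honored on direct calls, so fail early (our own check).
        raise ValueError("X-Modelux-Provider-Key does not compose with @config routing")
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        CHAT_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {modelux_key}",  # modelux key is still required
            "X-Modelux-Provider-Key": provider_key,    # used for this one outbound call
            "Content-Type": "application/json",
        },
    )
```

Pass the returned object to urllib.request.urlopen to send it. Keeping the provider key as a per-call argument, rather than in module state, matches the per-request nature of the header.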
Logs record auth_mode = "passthrough" and provider_key_fingerprint = "sk-proj-...TA0A" for BYOK requests. The log detail page shows
Auth: passthrough (sk-proj-...TA0A) instead of a stored credential
name. Filter aggregate analytics by auth_mode to answer “what
fraction of my traffic is BYOK”.
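If you export request logs, the same question can be answered offline. This is a minimal sketch that assumes each record is a dictionary with an auth_mode field; the export format and the "stored" value used for non-BYOK records are placeholders, since only auth_mode = "passthrough" and provider_key_fingerprint are documented here.

```python
from collections import Counter


def byok_fraction(logs: list[dict]) -> float:
    """Return the fraction of log records whose auth_mode is "passthrough"."""
    if not logs:
        return 0.0
    counts = Counter(record.get("auth_mode") for record in logs)
    return counts["passthrough"] / len(logs)


# Hypothetical exported records; "stored" is a placeholder for non-BYOK traffic.
sample_logs = [
    {"auth_mode": "passthrough", "provider_key_fingerprint": "sk-proj-...TA0A"},
    {"auth_mode": "stored"},
    {"auth_mode": "stored"},
    {"auth_mode": "passthrough", "provider_key_fingerprint": "sk-proj-...TA0A"},
]
print(byok_fraction(sample_logs))  # 0.5
```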
Calling a routing config
Use the slug prefixed with @ as the model name:
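from openai import OpenAI

# Point the OpenAI SDK at modelux and authenticate with your modelux key.
client = OpenAI(base_url="https://api.modelux.ai/openai/v1", api_key="mlx_sk_...")
client.chat.completions.create(
    model="@production",  # routing config slug, prefixed with @
    messages=[{"role": "user", "content": "hi"}],
)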
Strategies
| Strategy | Description |
|---|---|
| single | Lock traffic to one model + provider. |
| fallback | Ordered list of attempts with per-attempt timeouts. Retries on 429, 5xx, timeout. |
| cost_optimized | Pick the cheapest model meeting a quality tier, from an allowlist. |
| latency_optimized | Route to the lowest-p50-latency healthy provider. |
| ensemble | Parallel fan-out + aggregation (voting, first-valid, weighted). |
| ab_test | Percentage-based split across sub-configs. |
| traffic_split | Weighted random split across providers, sticky per conversation. |
| cascade | Sequential attempts with early stop on success. |
| custom_rules | Programmable DSL over cost, latency, budget, tags. |
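To make the fallback row concrete, here is a rough Python sketch of "ordered attempts, retry on 429, 5xx, timeout". It illustrates the idea and is not modelux’s implementation: UpstreamError, is_retryable, and run_fallback are hypothetical names, and per-attempt timeouts are assumed to be enforced inside each attempt callable, which raises TimeoutError when it gives up.

```python
from typing import Callable, Optional, Sequence


class UpstreamError(Exception):
    """Hypothetical error carrying the HTTP status returned by a provider."""

    def __init__(self, status: int):
        super().__init__(f"upstream returned {status}")
        self.status = status


def is_retryable(exc: Exception) -> bool:
    # Matches the documented retry conditions: 429, 5xx, timeout.
    if isinstance(exc, TimeoutError):
        return True
    return isinstance(exc, UpstreamError) and (
        exc.status == 429 or 500 <= exc.status < 600
    )


def run_fallback(attempts: Sequence[Callable[[], str]]) -> str:
    """Try each attempt in order; move to the next only on a retryable failure."""
    last_error: Optional[Exception] = None
    for attempt in attempts:
        try:
            return attempt()
        except Exception as exc:
            if not is_retryable(exc):
                raise  # e.g. a 400 is surfaced immediately
            last_error = exc
    raise RuntimeError("all fallback attempts failed") from last_error
```

Each callable would wrap one model and provider pair. Non-retryable errors stop the chain in this sketch because retrying them against another provider would usually repeat the same failure.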
Versioning
Every save creates a new version. You can:
- Diff two versions side-by-side
- Rollback to any previous version with one click
- Promote a candidate from an experiment result
Model aliases
Instead of hardcoding gpt-4o-mini in every request, create a routing config
at @fast or @cheap and reference those slugs. Change the underlying model
later without touching your app code.
Tags
Tag requests with arbitrary key-value pairs to scope routing, analytics, and budgets:
client.chat.completions.create(
    model="@production",
    messages=[...],
    extra_body={
        "mlx:tags": {
            "tenant": "acme",
            "feature": "summarize",
        },
    },
)
Custom rules can branch on tags: if tenant == "enterprise" then use @premium else use @production.
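For readers who think in code, the rule above behaves like the following Python function. This is only an equivalent written for illustration; the actual rule syntax is the modelux DSL, and choose_config is a hypothetical name.

```python
def choose_config(tags: dict) -> str:
    """Python equivalent of: if tenant == "enterprise" then use @premium else use @production."""
    if tags.get("tenant") == "enterprise":
        return "@premium"
    return "@production"


print(choose_config({"tenant": "enterprise"}))                    # @premium
print(choose_config({"tenant": "acme", "feature": "summarize"}))  # @production
```

Requests without a tenant tag fall through to @production in this sketch, which matches the else branch of the rule.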