Routing

A routing config is a named, versioned resource that tells modelux how to handle a request. Your application calls modelux with a routing config slug (like @production) and modelux decides which model(s) and provider(s) to actually invoke.

Routing configs live in modelux, not in your code. Change the routing behavior without redeploying your app.

The `model` field: three forms

modelux accepts three shapes in the OpenAI model field:

Form	Example	What happens
`@<slug>`	`@production`	Runs the named routing config — applies policies, budgets, fallbacks, and records a full decision trace.
`<provider>/<model>`	`openai/gpt-4o-mini`	Direct call using your org’s default credential for that provider. Bypasses routing configs.
`<model>` (bare)	`gpt-4o-mini`	Infers the provider from the name’s prefix and routes directly. Lets OpenAI-SDK apps point at modelux with a one-line base-URL change.

Bare-name prefix map

Prefix	Provider
`gpt-`, `o1`, `o3`, `o4`, `text-embedding-*`	`openai`
`claude-*`	`anthropic`
`gemini-*`	`google`

Unknown bare names return 400 invalid_request along with the list of recognized prefixes. To call such a model, use either an @config slug or the provider/model form.
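The three forms and the prefix map above can be sketched as a small resolver. This is an illustration only, not modelux's actual implementation: the function name `resolve_model`, the `InvalidRequest` class, and the shape of the returned dict are all hypothetical, and only the prefixes and the `resolved: auto` marker come from this page.

```python
# Hypothetical sketch: classify the `model` field into the three documented forms.
# The prefix list mirrors the bare-name prefix map; all names here are illustrative.
BARE_PREFIXES = {
    "gpt-": "openai",
    "o1": "openai",
    "o3": "openai",
    "o4": "openai",
    "text-embedding-": "openai",
    "claude-": "anthropic",
    "gemini-": "google",
}


class InvalidRequest(ValueError):
    """Stands in for the documented 400 invalid_request response."""


def resolve_model(model: str) -> dict:
    if model.startswith("@"):
        # Named routing config: policies, budgets, and fallbacks apply.
        return {"kind": "config", "slug": model[1:]}
    if "/" in model:
        # Explicit provider/model: direct call, bypasses routing configs.
        provider, name = model.split("/", 1)
        return {"kind": "direct", "provider": provider, "model": name}
    for prefix, provider in BARE_PREFIXES.items():
        if model.startswith(prefix):
            # Bare name: provider inferred from the prefix, traced as "auto".
            return {"kind": "direct", "provider": provider,
                    "model": model, "resolved": "auto"}
    known = ", ".join(sorted(BARE_PREFIXES))
    raise InvalidRequest(f"unknown model {model!r}; recognized prefixes: {known}")
```

For example, `resolve_model("gpt-4o-mini")` yields a direct call to `openai` marked `resolved: auto`, while a name with no recognized prefix raises the stand-in error.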

Bare-name traffic skips routing configs entirely: no budgets, no fallbacks, no cost/latency optimization. The decision trace records resolved: auto, so the dashboard shows which requests bypassed policies. For production traffic, prefer @config.

Strict deployments can disable bare-name resolution per project by setting settings.auto_resolve_bare_model to false via the update project API or the update_project MCP tool.

BYOK passthrough — bring your own provider key

For compliance, multi-tenant SaaS, or rotation-sensitive deployments, you can pass the provider API key on each request via the X-Modelux-Provider-Key header. The proxy uses it for that single outbound call and never stores it — only a masked fingerprint (sk-proj-...TA0A) lands in request logs, so you can still answer “which of my keys sent this traffic”.

curl https://api.modelux.ai/openai/v1/chat/completions \
  -H "Authorization: Bearer mlx_sk_..." \
  -H "X-Modelux-Provider-Key: sk-proj-your-openai-key" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"hi"}]}'

Rules:

The modelux API key (mlx_sk_*) is still required — it identifies the org, project, budgets, and rate limits. The provider key only overrides which upstream credential the proxy uses for the outbound call.
If a stored credential exists for the same provider, the header wins — the stored key is ignored. The stored credential’s base_url is still inherited, which lets you BYOK against Azure, Bedrock, Vertex, or a self-hosted endpoint: register the base URL once, override the key per request.
The header is honored only on direct calls (provider/model or a known bare prefix). @config routing still uses whichever credentials the config references, so passthrough and @config do not compose.
Passthrough traffic gets no cross-credential fallback, health monitoring, or credential-level budgets. This is by design: supplying your own key opts you out of the stored-credential surface.
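The same request as the curl example can be assembled in Python. The sketch below uses only the standard library and builds the request without sending it. The URL and header names come from the curl example; the helper name `build_byok_request` and the client-side guard against `@config` slugs are assumptions added for illustration (the docs only say the header is not honored there, not that it is rejected).

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
CHAT_URL = "https://api.modelux.ai/openai/v1/chat/completions"


def build_byok_request(modelux_key: str, provider_key: str,
                       model: str, content: str) -> urllib.request.Request:
    """Build (but do not send) a chat request that carries a per-request provider key."""
    if model.startswith("@"):
        # Client-side choice, not documented server behavior: fail early
        # rather than send a header that @config routing would not honor.
        raise ValueError("passthrough applies to direct calls, not @config routing")
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": content}],
    }).encode("utf-8")
    return urllib.request.Request(
        CHAT_URL,
        data=body,
        method="POST",
        headers={
            # The modelux key is still required: it identifies org and project.
            "Authorization": f"Bearer {modelux_key}",
            # The provider key overrides the upstream credential for this call only.
            "X-Modelux-Provider-Key": provider_key,
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; the sketch stops short of that so it can be inspected without network access.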

Logs record auth_mode = "passthrough" and provider_key_fingerprint = "sk-proj-...TA0A" for BYOK requests. The log detail page shows Auth: passthrough (sk-proj-...TA0A) instead of a stored credential name. Filter aggregate analytics by auth_mode to answer “what fraction of my traffic is BYOK”.
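As a rough illustration of the "what fraction of my traffic is BYOK" question, the sketch below computes that ratio over a list of log records. The record shape (plain dicts) and the `"stored"` value for non-passthrough traffic are assumptions; only the `auth_mode = "passthrough"` value and the `provider_key_fingerprint` field name come from this page.

```python
def byok_fraction(records: list[dict]) -> float:
    """Return the share of records whose auth_mode is "passthrough"."""
    if not records:
        return 0.0
    passthrough = sum(1 for r in records if r.get("auth_mode") == "passthrough")
    return passthrough / len(records)


# Hypothetical log records; "stored" is an assumed value for non-BYOK traffic.
sample_logs = [
    {"auth_mode": "passthrough", "provider_key_fingerprint": "sk-proj-...TA0A"},
    {"auth_mode": "stored"},
    {"auth_mode": "stored"},
    {"auth_mode": "stored"},
]
```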

Calling a routing config

Use the slug prefixed with @ as the model name:

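from openai import OpenAI

# Base URL and key format follow the curl example earlier on this page.
client = OpenAI(base_url="https://api.modelux.ai/openai/v1", api_key="mlx_sk_...")
client.chat.completions.create(
    model="@production",
    messages=[{"role": "user", "content": "hi"}],
)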

Strategies

Strategy	Description
`single`	Lock traffic to one model + provider.
`fallback`	Ordered list of attempts with per-attempt timeouts. Retries on 429, 5xx, or timeout.
`cost_optimized`	Pick the cheapest model meeting a quality tier, from an allowlist.
`latency_optimized`	Route to the lowest-p50-latency healthy provider.
`ensemble`	Parallel fan-out + aggregation (voting, first-valid, weighted).
`ab_test`	Percentage-based split across sub-configs.
`traffic_split`	Weighted random split across providers, sticky per conversation.
`cascade`	Sequential attempts with early stop on success.
`custom_rules`	Programmable DSL over cost, latency, budget, tags.
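To make two of the strategies above concrete, here is a minimal sketch of `fallback` (ordered attempts, retrying on 429, 5xx, or timeout) and of a sticky weighted split as in `traffic_split`. It is not modelux's implementation: `UpstreamError`, `run_fallback`, `pick_sticky`, and the hash-based bucketing are assumptions chosen for illustration, and the `call` argument stands in for whatever actually invokes a provider.

```python
import hashlib


class UpstreamError(Exception):
    """Hypothetical error carrying an HTTP-like status (None means timeout)."""

    def __init__(self, status=None):
        super().__init__(f"upstream failed with status {status}")
        self.status = status


def is_retryable(err: UpstreamError) -> bool:
    # Mirrors the documented triggers: 429, any 5xx, or a timeout.
    return err.status is None or err.status == 429 or 500 <= err.status <= 599


def run_fallback(attempts, call):
    """Try each (provider, model, timeout_s) in order; return the first success."""
    last_error = None
    for provider, model, timeout_s in attempts:
        try:
            return call(provider, model, timeout_s)
        except UpstreamError as err:
            if not is_retryable(err):
                raise  # e.g. a 400: moving to the next attempt would not help
            last_error = err
    if last_error is None:
        raise ValueError("no attempts configured")
    raise last_error


def pick_sticky(conversation_id: str, weights: dict[str, int]) -> str:
    """Pick a provider by weight, always the same one for a given conversation."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    # A stable hash (not Python's salted hash()) keeps the choice sticky
    # across processes and restarts.
    digest = hashlib.sha256(conversation_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % total
    for provider, weight in weights.items():
        if bucket < weight:
            return provider
        bucket -= weight
    raise ValueError("weights must be non-negative")
```

Hashing the conversation ID is one simple way to get stickiness without storing state; a real router might instead persist the assignment per conversation.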

Versioning

Every save creates a new version. You can:

Diff two versions side-by-side
Roll back to any previous version with one click
Promote a candidate from an experiment result

Model aliases

Instead of hardcoding gpt-4o-mini in every request, create a routing config at @fast or @cheap and reference those slugs. Change the underlying model later without touching your app code.