
Routing

A routing config is a named, versioned resource that tells modelux how to handle a request. Your application calls modelux with a routing config slug (like @production) and modelux decides which model(s) and provider(s) to actually invoke.

Routing configs live in modelux, not in your code. Change the routing behavior without redeploying your app.

The model field: three forms

modelux accepts three shapes in the OpenAI model field:

| Form | Example | What happens |
|---|---|---|
| @<slug> | @production | Runs the named routing config: applies policies, budgets, and fallbacks, and records a full decision trace. |
| <provider>/<model> | openai/gpt-4o-mini | Direct call using your org’s default credential for that provider. Bypasses routing configs. |
| <model> (bare) | gpt-4o-mini | Infers the provider from the name’s prefix and routes directly. Lets OpenAI-SDK apps point at modelux with a one-line base-URL change. |

Bare-name prefix map

| Prefix | Provider |
|---|---|
| gpt-*, o1*, o3*, o4*, text-embedding-* | openai |
| claude-* | anthropic |
| gemini-* | google |

Unknown bare names return 400 invalid_request along with the list of recognized prefixes. To call a model outside that list, use @config or provider/model instead.
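The resolution rules above can be sketched client-side, for example to predict how a model string will be treated before sending it. This is an illustrative approximation, not modelux's implementation: the helper name `classify_model` and its return shape are assumptions, and the patterns are copied from the prefix map above.

```python
from fnmatch import fnmatchcase

# Prefix patterns copied from the bare-name prefix map above.
BARE_PREFIXES = {
    "openai": ["gpt-*", "o1*", "o3*", "o4*", "text-embedding-*"],
    "anthropic": ["claude-*"],
    "google": ["gemini-*"],
}


def classify_model(model: str) -> dict:
    """Classify a model string into one of the three documented forms.

    Hypothetical helper: the name and return shape are illustrative only.
    """
    if model.startswith("@"):
        return {"form": "config", "slug": model[1:]}
    if "/" in model:
        provider, name = model.split("/", 1)
        return {"form": "direct", "provider": provider, "model": name}
    for provider, patterns in BARE_PREFIXES.items():
        if any(fnmatchcase(model, pattern) for pattern in patterns):
            return {"form": "bare", "provider": provider, "model": model}
    # The proxy answers unknown bare names with 400 invalid_request;
    # this sketch raises a ValueError instead.
    raise ValueError(f"unrecognized bare model name: {model!r}")
```

Checking the form up front is one way to catch a model name that would bypass routing configs before it reaches production traffic.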

Bare-name traffic skips routing configs entirely — no budgets, no fallbacks, no cost/latency optimization. The decision trace records resolved: auto so you can see which requests bypassed policies in the dashboard. For production traffic, prefer @config.

Strict deployments can disable bare-name resolution per project by setting settings.auto_resolve_bare_model to false via the update project API or the update_project MCP tool.

BYOK passthrough — bring your own provider key

For compliance, multi-tenant SaaS, or rotation-sensitive deployments, you can pass the provider API key on each request via the X-Modelux-Provider-Key header. The proxy uses it for that single outbound call and never stores it — only a masked fingerprint (sk-proj-...TA0A) lands in request logs, so you can still answer “which of my keys sent this traffic”.

curl https://api.modelux.ai/openai/v1/chat/completions \
  -H "Authorization: Bearer mlx_sk_..." \
  -H "X-Modelux-Provider-Key: sk-proj-your-openai-key" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"hi"}]}'

Rules:

  • The modelux API key (mlx_sk_*) is still required — it identifies the org, project, budgets, and rate limits. The provider key only overrides which upstream credential the proxy uses for the outbound call.
  • If a stored credential exists for the same provider, the header wins — the stored key is ignored. The stored credential’s base_url is still inherited, which lets you BYOK against Azure, Bedrock, Vertex, or a self-hosted endpoint: register the base URL once, override the key per request.
  • Only honored on direct calls (provider/model or a known bare prefix). @config routing still uses whichever credentials the config references — passthrough + @config doesn’t compose.
  • No cross-credential fallback, health monitoring, or credential-level budgets for passthrough traffic — by design. You opted out of the stored-credential surface.
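The rules above can be sketched as a small request builder. Only the URL and header names come from the curl example; the function name `build_byok_request` is hypothetical, and rejecting @config slugs on the client is a choice made for this sketch (the docs only say passthrough is not honored there).

```python
import json

MODELUX_URL = "https://api.modelux.ai/openai/v1/chat/completions"


def build_byok_request(model, messages, modelux_key, provider_key):
    """Return (url, headers, body) for a BYOK passthrough call.

    Hypothetical helper; it builds the request but does not send it.
    """
    # Passthrough is only honored on direct calls, so this sketch
    # refuses @config slugs rather than sending a key that is not honored.
    if model.startswith("@"):
        raise ValueError("BYOK passthrough does not compose with @config routing")
    headers = {
        # The modelux key is still required: it identifies org and project.
        "Authorization": f"Bearer {modelux_key}",
        # The provider key overrides the upstream credential for this call.
        "X-Modelux-Provider-Key": provider_key,
        "Content-Type": "application/json",
    }
    body = json.dumps({"model": model, "messages": messages})
    return MODELUX_URL, headers, body
```

The returned tuple can be handed to whichever HTTP client the application already uses.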

Logs record auth_mode = "passthrough" and provider_key_fingerprint = "sk-proj-...TA0A" for BYOK requests. The log detail page shows Auth: passthrough (sk-proj-...TA0A) instead of a stored credential name. Filter aggregate analytics by auth_mode to answer “what fraction of my traffic is BYOK”.
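As a rough illustration of the BYOK-fraction question, the sketch below computes it from a list of log records. The record shape is hypothetical: only the auth_mode and provider_key_fingerprint field names and the "passthrough" value come from the text above, and "stored" is an assumed placeholder for non-BYOK traffic.

```python
def byok_fraction(logs):
    """Fraction of log records whose auth_mode is "passthrough"."""
    if not logs:
        return 0.0
    passthrough = sum(1 for record in logs if record.get("auth_mode") == "passthrough")
    return passthrough / len(logs)


# Hypothetical exported records; "stored" is an assumed value.
sample_logs = [
    {"auth_mode": "passthrough", "provider_key_fingerprint": "sk-proj-...TA0A"},
    {"auth_mode": "stored"},
    {"auth_mode": "stored"},
    {"auth_mode": "passthrough", "provider_key_fingerprint": "sk-proj-...TA0A"},
]
print(byok_fraction(sample_logs))  # 0.5
```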

Calling a routing config

Use the slug prefixed with @ as the model name:

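from openai import OpenAI

# Point the OpenAI SDK at modelux and authenticate with your modelux key.
client = OpenAI(base_url="https://api.modelux.ai/openai/v1", api_key="mlx_sk_...")
response = client.chat.completions.create(
    model="@production",  # routing config slug, prefixed with @
    messages=[{"role": "user", "content": "hi"}],
)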

Strategies

| Strategy | Description |
|---|---|
| single | Lock traffic to one model + provider. |
| fallback | Ordered list of attempts with per-attempt timeouts. Retries on 429, 5xx, timeout. |
| cost_optimized | Pick the cheapest model meeting a quality tier, from an allowlist. |
| latency_optimized | Route to the lowest-p50-latency healthy provider. |
| ensemble | Parallel fan-out + aggregation (voting, first-valid, weighted). |
| ab_test | Percentage-based split across sub-configs. |
| traffic_split | Weighted random split across providers, sticky per conversation. |
| cascade | Sequential attempts with early stop on success. |
| custom_rules | Programmable DSL over cost, latency, budget, tags. |
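To make the fallback strategy concrete, here is a conceptual sketch of an ordered attempt list that moves on only after a retryable failure. It is not modelux's implementation: `RetryableError`, `run_fallback`, and the two stand-in providers are hypothetical, and per-attempt timeouts are left out for brevity.

```python
class RetryableError(Exception):
    """Stands in for a 429, 5xx, or timeout from a provider."""


def run_fallback(attempts, request):
    """Try each (name, call) pair in order; move on only after a retryable failure."""
    errors = []
    for name, call in attempts:
        try:
            return name, call(request)
        except RetryableError as exc:
            errors.append((name, str(exc)))
    raise RuntimeError(f"all attempts failed: {errors}")


def flaky_primary(request):
    # Simulates a rate-limited provider.
    raise RetryableError("429 rate limited")


def healthy_secondary(request):
    return f"echo: {request}"


winner, result = run_fallback(
    [("primary", flaky_primary), ("secondary", healthy_secondary)],
    "hi",
)
print(winner, result)  # secondary echo: hi
```

Errors that are not retryable propagate immediately, so a malformed request is not replayed against every provider in the list.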

Versioning

Every save creates a new version. You can:

  • Diff two versions side-by-side
  • Roll back to any previous version with one click
  • Promote a candidate from an experiment result

Model aliases

Instead of hardcoding gpt-4o-mini in every request, create a routing config at @fast or @cheap and reference those slugs. Change the underlying model later without touching your app code.

Tags

Tag requests with arbitrary key-value pairs to scope routing, analytics, and budgets:

client.chat.completions.create(
    model="@production",
    messages=[...],
    extra_body={
        "mlx:tags": {
            "tenant": "acme",
            "feature": "summarize",
        },
    },
)

Custom rules can branch on tags: if tenant == "enterprise" then use @premium else use @production.
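The rule DSL syntax is not specified here, so the sketch below expresses the same branch in plain Python to show the intended semantics only. The function name `pick_config` is hypothetical.

```python
def pick_config(tags):
    """Python equivalent of the example rule: branch on the tenant tag."""
    if tags.get("tenant") == "enterprise":
        return "@premium"
    return "@production"


print(pick_config({"tenant": "enterprise"}))  # @premium
print(pick_config({"tenant": "acme", "feature": "summarize"}))  # @production
```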