Switch in five minutes. Keep your code, your providers, your keys.
modelux is an OpenAI-SDK drop-in. The only thing that changes is what sits between your app and the provider. Same SDK, same request shape, same provider credentials — and you pick up decision traces, fallback chains, semantic cache, and budgets on day one.
One swap. Every SDK. Every language.
If your app already uses the OpenAI SDK — Python, TypeScript, curl, anything — you only change two values: the base URL and the API key. Your model names, your messages, your tools, your streaming logic, your function calls all keep working as-is.
- ▸ Works with the official OpenAI SDK on every platform
- ▸ Bring your own provider keys (BYO) — modelux never proxies through shared quota
- ▸ Anthropic-shape requests work too via the `/anthropic/v1/*` surface
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.modelux.ai/v1",  # was: https://api.openai.com/v1
    api_key="mlx_sk_...",                  # was: sk-...
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # or "@production" for routing
    messages=[{"role": "user", "content": "Hello!"}],
)
```

Switching from Helicone.
Helicone is observability-only — logs and dashboards on top of whatever provider you call. modelux gives you the same observability, plus routing, budgets, fallback, replay, and semantic cache. The switch is shorter than the Helicone setup was.
Before (Helicone):

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://oai.hconeai.com/v1",
    api_key="sk-...",
    default_headers={
        "Helicone-Auth": f"Bearer {helicone_key}",
        "Helicone-Property-User": user_id,
    },
)
```

After (modelux):

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.modelux.ai/v1",
    api_key="mlx_sk_...",
)
# user / project tagging via X-Modelux-User-Id
# header (or the standard "user" field)
```

What stays: per-request observability — logs, costs, latencies, user/property tagging — and your existing OpenAI-SDK code.
What's new: cross-provider routing, fallback chains, semantic cache, replay against historical traffic, hard spend caps, ensembles.
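If you tag users through `Helicone-Property-User` today, the mapping to modelux-style tagging is mechanical. The `X-Modelux-User-Id` header name comes from the comment above; the helper itself is a hypothetical sketch, not part of either SDK:

```python
def migrate_tagging(helicone_headers: dict, body: dict):
    """Map a Helicone property header onto modelux-style user tagging.

    Hypothetical helper: sets the X-Modelux-User-Id header and mirrors
    the value into the standard OpenAI "user" request field.
    """
    headers = {}
    user_id = helicone_headers.get("Helicone-Property-User")
    if user_id:
        headers["X-Modelux-User-Id"] = user_id
        body = {**body, "user": user_id}  # standard OpenAI "user" field
    return headers, body

headers, body = migrate_tagging(
    {"Helicone-Property-User": "u_42"},
    {"model": "gpt-4o-mini", "messages": []},
)
```

Either channel keeps per-end-user cost and latency breakdowns attributable in the dashboard.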
Switching from Portkey.
Portkey gives you a gateway with virtual keys and a routing config. modelux gives you the same — without per-token markup, with decision traces and replay built in, and with flat-tier pricing that doesn't surprise you when traffic grows.
Before (Portkey):

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.portkey.ai/v1",
    api_key="sk-...",
    default_headers={
        "x-portkey-api-key": "pk-...",
        "x-portkey-virtual-key": "openai-prod",
        "x-portkey-config": "router-prod",
    },
)
```

After (modelux):

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.modelux.ai/v1",
    api_key="mlx_sk_...",
)

response = client.chat.completions.create(
    model="@production",  # routing config by name
    messages=[...],
)
```

What stays: routing strategies, virtual keys (BYO credentials), per-key analytics, structured request shape.
What's new: flat pricing, decision traces per request, replay of up to 50k requests of real traffic against candidate configs (with embedding-based quality scoring), ensembles, semantic cache, signed SLA on Enterprise.
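A useful mental model for `@production`: a named config resolves server-side to an ordered candidate list. The sketch below is client-side and illustrative only; the config contents and model identifiers are made up:

```python
# Hypothetical view of named routing configs: a "@name" resolves to an
# ordered provider/model chain; a plain model name passes through as-is.
ROUTING_CONFIGS = {
    "@production": [
        "openai/gpt-4o-mini",
        "anthropic/claude-3-5-haiku",
    ],
}

def resolve(model: str) -> list[str]:
    """Return the candidate chain for a named config, else the model itself."""
    if model.startswith("@"):
        return ROUTING_CONFIGS[model]
    return [model]

chain = resolve("@production")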
Switching from a self-hosted LiteLLM.
LiteLLM is great if running a proxy is your job. If it isn't, modelux is the same OpenAI-SDK shape with no infrastructure to operate — plus versioned configs, replay, and a dashboard you didn't have to build.
Before (self-hosted LiteLLM):

```shell
# 1. provision a host or pod
# 2. write litellm_config.yaml
# 3. run the proxy
$ litellm --config litellm_config.yaml --port 4000
```

```python
# in your app:
from openai import OpenAI

client = OpenAI(
    base_url="http://litellm.internal:4000",
    api_key="sk-litellm-...",
)
```

After (modelux):

```python
# no proxy to run, no config to deploy
from openai import OpenAI

client = OpenAI(
    base_url="https://api.modelux.ai/v1",
    api_key="mlx_sk_...",
)
# routing config edited in dashboard;
# rolled back with one click.
```

What stays: multi-provider unification, OpenAI-shape requests, BYO provider keys, the same model names you use today.
What's new: zero infra to run, versioned configs with one-click rollback, decision traces, replay, semantic cache, dashboard out of the box.
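"Versioned configs with one-click rollback" can be pictured as an append-only history where rollback simply re-publishes an older version. A minimal sketch under that assumption, not modelux's actual implementation:

```python
class ConfigHistory:
    """Toy model of versioned routing configs with rollback."""

    def __init__(self):
        self.versions = []  # append-only: rollback never rewrites history

    def publish(self, config: dict) -> int:
        self.versions.append(config)
        return len(self.versions)  # 1-based version number

    @property
    def active(self) -> dict:
        return self.versions[-1]

    def rollback(self) -> dict:
        """Re-publish the previous version as a new entry."""
        if len(self.versions) < 2:
            raise RuntimeError("nothing to roll back to")
        self.versions.append(self.versions[-2])
        return self.active

h = ConfigHistory()
h.publish({"primary": "gpt-4o-mini"})
h.publish({"primary": "claude-3-5-haiku"})
h.rollback()  # active config is v1 again, now recorded as v3
```

The append-only shape is what makes rollback auditable: you can always see that a rollback happened and when.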
Replay your migration before you flip the switch.
Once your first requests have flowed through modelux, you have a log slice you can replay. Point an experiment at the last 7 days of traffic and the candidate routing config you want to run in production. Before a single live user sees it, you get the cost, p50/p95 latency, error-rate, and route-distribution diff against your current baseline.
- ▸ Routing-only mode is free — no provider calls, projected numbers, up to 50,000 requests replayed per experiment
- ▸ with_responses mode actually calls the candidate model and scores each response against your baseline with embedding similarity
- ▸ Hard per-experiment spend caps with auto-cancel on overrun — you can't accidentally burn the migration budget
- ▸ Promote the winning candidate to a versioned production config in one audited click
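In routing-only mode the headline numbers are plain percentage changes against the baseline. A quick sketch of the arithmetic, using the figures from the sample report in this section:

```python
# Baseline vs candidate figures from the sample replay report.
baseline  = {"cost_usd": 412.88, "p50_ms": 980, "p95_ms": 3120}
candidate = {"cost_usd": 178.40, "p50_ms": 640, "p95_ms": 2210}

def pct_delta(before: float, after: float) -> float:
    """Percentage change; negative means the candidate is cheaper/faster."""
    return round((after / before - 1) * 100, 1)

diff = {k: pct_delta(baseline[k], candidate[k]) for k in baseline}
# diff["cost_usd"] → -56.8, matching the report
```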
```jsonc
{
  "id": "sim_8f3c…",
  "window": { "last_days": 7 },
  "requests": 14218,
  "mode": "routing_only",
  "baseline": {
    "cost_usd": 412.88,
    "p50_ms": 980, "p95_ms": 3120
  },
  "candidate": {
    "cost_usd": 178.40,   // −56.8%
    "p50_ms": 640,        // −34.7%
    "p95_ms": 2210        // −29.2%
  }
}
```

What you get the moment your first request lands.
No config required for any of these — they light up automatically once you swap the base URL and start sending traffic.
Decision traces on every request
Why each call went to which provider — including fallback advances and cache hits.
Cost & latency in the dashboard
Per-project, per-key, per-end-user breakdowns. No log piping, no warehouse setup.
Fallback chains by config
Cross-provider failover with per-attempt timeouts. Your app stops needing retry logic.
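To see why your app stops needing retry logic: the chain walks an ordered provider list, bounds each attempt with its own timeout, and records why each hop happened. A rough sketch; the function names and trace shape are illustrative, not modelux's API:

```python
import time

def call_with_fallback(chain, send, per_attempt_timeout_s=10.0):
    """Try each target in order; record a per-attempt decision trace."""
    trace = []
    for target in chain:
        start = time.monotonic()
        try:
            result = send(target, timeout=per_attempt_timeout_s)
            trace.append({"target": target, "outcome": "ok",
                          "latency_ms": round((time.monotonic() - start) * 1000)})
            return result, trace
        except Exception as exc:
            # failed attempt: note the error class and advance the chain
            trace.append({"target": target, "outcome": type(exc).__name__,
                          "latency_ms": round((time.monotonic() - start) * 1000)})
    raise RuntimeError(f"all providers failed: {trace}")

# Stub transport where the first provider times out (for illustration).
def flaky_send(target, timeout):
    if target == "openai/gpt-4o-mini":
        raise TimeoutError("upstream timeout")
    return f"answer from {target}"

result, trace = call_with_fallback(
    ["openai/gpt-4o-mini", "anthropic/claude-3-5-haiku"], flaky_send)
```

The trace is exactly what shows up as "fallback advances" in a decision trace: one entry per attempt, with outcome and latency.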
Semantic cache for free
Embedding-keyed cache. Repeat questions return in under a millisecond at $0 provider cost.
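To make "embedding-keyed" concrete, here is a toy sketch: prompts are embedded, and a lookup is a hit when a stored prompt is similar enough. The character-bigram embedding stands in for a real embedding model, and the 0.9 threshold is illustrative:

```python
import math

def embed(text: str) -> list[float]:
    # Toy stand-in for a real embedding model: hashed character-bigram counts.
    dims = 64
    v = [0.0] * dims
    t = text.lower()
    for a, b in zip(t, t[1:]):
        v[(ord(a) * 31 + ord(b)) % dims] += 1.0
    return v

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

class SemanticCache:
    def __init__(self, threshold=0.9):
        self.entries = []  # (embedding, cached answer)
        self.threshold = threshold

    def get(self, prompt):
        q = embed(prompt)
        for e, answer in self.entries:
            if cosine(q, e) >= self.threshold:
                return answer  # near-duplicate prompt: serve the cached answer
        return None

    def put(self, prompt, answer):
        self.entries.append((embed(prompt), answer))

cache = SemanticCache()
cache.put("What is your refund policy?", "30 days, no questions asked.")
hit = cache.get("what is your refund policy")  # paraphrase still hits
```

A real implementation would use an ANN index rather than a linear scan, which is how hits come back in under a millisecond.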
Ensembles in one config
Match Sonnet on accuracy at 6× lower cost — small models in parallel via confidence routing.
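Confidence routing, in sketch form: query the cheap models, take the most confident answer if it clears a bar, and escalate to the large model only when none does. Every name and threshold below is illustrative, not modelux's actual ensemble API:

```python
def confidence_route(prompt, small_models, big_model, threshold=0.8):
    """Fan out to cheap models; escalate only when confidence is low."""
    answers = [model(prompt) for model in small_models]  # parallel in practice
    best = max(answers, key=lambda a: a["confidence"])
    if best["confidence"] >= threshold:
        return best          # cheap path: a small model was confident enough
    return big_model(prompt)  # expensive path: escalate to the large model

# Stub models for illustration; real ones would be provider calls.
small = [
    lambda p: {"text": "4", "confidence": 0.95, "model": "small-a"},
    lambda p: {"text": "4", "confidence": 0.90, "model": "small-b"},
]
big = lambda p: {"text": "4", "confidence": 0.99, "model": "large-model"}

out = confidence_route("what is 2+2?", small, big)
```

The cost win comes from how often the cheap path suffices; the escalation path bounds the accuracy loss.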
Hard spend caps that fire fast
Daily / weekly / monthly caps per project, key, or end-user. Enforced before the call leaves the proxy.
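"Enforced before the call leaves the proxy" means a reservation check runs ahead of any provider call. A simplified daily-cap sketch under that assumption:

```python
import datetime

class SpendCap:
    """Pre-flight budget check: reject before the provider is ever called."""

    def __init__(self, daily_usd: float):
        self.daily_usd = daily_usd
        self.spent = {}  # date -> usd reserved so far

    def try_reserve(self, estimated_usd: float, today=None) -> bool:
        today = today or datetime.date.today()
        used = self.spent.get(today, 0.0)
        if used + estimated_usd > self.daily_usd:
            return False  # blocked: no provider call, no spend
        self.spent[today] = used + estimated_usd
        return True

cap = SpendCap(daily_usd=10.0)
ok = [cap.try_reserve(4.0) for _ in range(3)]  # third reservation exceeds the cap
```

Reserving an estimate up front (and reconciling after the response) is what lets a cap fire fast instead of after the bill arrives.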
Two lines. Five minutes. No card.
The free tier covers 10k requests/month — enough to point your app at modelux, watch real percentiles flow into the dashboard, and decide whether it sticks before talking to anyone.