modelux
$ modelux migration

Switch in five minutes. Keep your code, your providers, your keys.

modelux is an OpenAI-SDK drop-in. The only thing that changes is what sits between your app and the provider. Same SDK, same request shape, same provider credentials — and you pick up decision traces, fallback chains, semantic cache, and budgets on day one.

  • Lines to change: 2. Swap base_url and the API key; that's it.
  • Providers supported: 14+. OpenAI, Anthropic, Google, Azure, Bedrock, Groq, Fireworks, DeepSeek, xAI, Mistral, Cerebras, Together, Perplexity, Cohere.
  • Time to first request: < 5 min. Free tier, no card; sign up, paste a key, send a request.
  • Migration cost: $0. The free tier covers 10k requests/month, enough to verify before you commit.
# the universal step

One swap. Every SDK. Every language.

If your app already speaks the OpenAI API — the official SDK in Python or TypeScript, raw curl, anything — you only change two values: the base URL and the API key. Your model names, your messages, your tools, your streaming logic, and your function calls keep working as-is.

  • Works with the official OpenAI SDK on every platform
  • Bring your own provider keys (BYO) — modelux never proxies through shared quota
  • Anthropic-shape requests work too via the /anthropic/v1/* surface (sketch after the drop-in example below)
@drop-in python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.modelux.ai/v1",   # was: https://api.openai.com/v1
    api_key="mlx_sk_...",                   # was: sk-...
)

response = client.chat.completions.create(
    model="gpt-4o-mini",                    # or "@production" for routing
    messages=[{"role": "user", "content": "Hello!"}],
)
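Streaming and tool calls ride through the same client unchanged; a short continuation of the snippet above, using the same client and mlx_sk_ key:

@streaming python
stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,                            # streaming works exactly as before
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="")

For Anthropic-shape traffic, here is a minimal sketch with the official Anthropic SDK. The base path and key handling are assumptions (the /anthropic/v1/* surface is named above, but its exact mount point isn't), so treat the value shown in your dashboard as canonical:

@anthropic-surface python
from anthropic import Anthropic

anthropic_client = Anthropic(
    base_url="https://api.modelux.ai/anthropic",  # assumed mount; the SDK appends /v1/messages
    api_key="mlx_sk_...",
)

message = anthropic_client.messages.create(
    model="claude-3-5-sonnet-latest",             # or whatever Claude model you already call
    max_tokens=256,
    messages=[{"role": "user", "content": "Hello!"}],
)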
# from helicone

Switching from Helicone.

Helicone is observability-only — logs and dashboards on top of whatever provider you call. modelux gives you the same observability, plus routing, budgets, fallback, replay, and semantic cache. The switch is shorter than the Helicone setup was.

before — helicone python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://oai.hconeai.com/v1",
    api_key="sk-...",
    default_headers={
        "Helicone-Auth": f"Bearer {os.environ['HELICONE_API_KEY']}",
        "Helicone-Property-User": user_id,   # your app's end-user id
    },
)
after — modelux python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.modelux.ai/v1",
    api_key="mlx_sk_...",
)
# user / project tagging via X-Modelux-User-Id
# header (or the standard "user" field)
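Helicone's per-user property maps onto the tagging mentioned in the comment above. A sketch, assuming the X-Modelux-User-Id header is accepted at the client or request level; the standard OpenAI "user" field works regardless:

@user-tagging python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.modelux.ai/v1",
    api_key="mlx_sk_...",
    # client-wide tag, analogous to Helicone-Property-User
    default_headers={"X-Modelux-User-Id": "user_1234"},
)

# or per request, via the standard OpenAI "user" field
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}],
    user="user_1234",
)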
What carries over

Per-request observability — logs, costs, latencies, user/property tagging — and your existing OpenAI-SDK code.

What you gain

Cross-provider routing, fallback chains, semantic cache, replay against historical traffic, hard spend caps, ensembles.

# from portkey

Switching from Portkey.

Portkey gives you a gateway with virtual keys and a routing config. modelux gives you the same — without per-token markup, with decision traces and replay built in, and with flat-tier pricing that doesn't surprise you when traffic grows.

before — portkey python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.portkey.ai/v1",
    api_key="sk-...",
    default_headers={
        "x-portkey-api-key": "pk-...",
        "x-portkey-virtual-key": "openai-prod",
        "x-portkey-config": "router-prod",
    },
)
after — modelux python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.modelux.ai/v1",
    api_key="mlx_sk_...",
)

response = client.chat.completions.create(
    model="@production",   # routing config by name
    messages=[...],
)
What carries over

Routing strategies, virtual keys (BYO credentials), per-key analytics, structured request shape.

What you gain

Flat pricing, decision traces per request, replay up to 50k requests of real traffic against candidate configs (with embedding-based quality scoring), ensembles, semantic cache, signed-SLA on Enterprise.

# from litellm

Switching from a self-hosted LiteLLM proxy.

LiteLLM is great if running a proxy is your job. If it isn't, modelux is the same OpenAI-SDK shape with no infrastructure to operate — plus versioned configs, replay, and a dashboard you didn't have to build.

before — self-hosted litellm bash
# 1. provision a host or pod
# 2. write litellm_config.yaml
# 3. run the proxy
$ litellm --config litellm_config.yaml --port 4000

# in your app:
client = OpenAI(
    base_url="http://litellm.internal:4000",
    api_key="sk-litellm-...",
)
after — modelux managed python
# no proxy to run, no config to deploy
from openai import OpenAI

client = OpenAI(
    base_url="https://api.modelux.ai/v1",
    api_key="mlx_sk_...",
)
# routing config edited in dashboard;
# rolled back with one click.
What carries over

Multi-provider unification, OpenAI-shape requests, BYO provider keys, the same model names you use today.

What you gain

Zero infra to run, versioned configs with one-click rollback, decision traces, replay, semantic cache, dashboard out of the box.

# derisk

Replay your migration before you flip the switch.

Once your first requests have flowed through modelux, you have a log slice you can replay. Point an experiment at the last 7 days of traffic and the candidate routing config you want to run in production. Before a single live user sees it, you get the cost, p50/p95 latency, error-rate, and route-distribution diff against your current baseline.

  • Routing-only mode is free — no provider calls, projected numbers, up to 50,000 requests replayed per experiment
  • with_responses mode actually calls the candidate model and scores each response against your baseline with embedding similarity
  • Hard per-experiment spend caps with auto-cancel on overrun — you can't accidentally burn the migration budget
  • Promote the winning candidate to a versioned production config in one audited click
@migration-sim json
{
  "id": "sim_8f3c…",
  "window":   { "last_days": 7 },
  "requests": 14218,
  "mode": "routing_only",
  "baseline": {
    "cost_usd": 412.88,
    "p50_ms": 980,  "p95_ms": 3120
  },
  "candidate": {
    "cost_usd": 178.40,   // −56.8%
    "p50_ms": 640,        // −34.7%
    "p95_ms": 2210        // −29.2%
  }
}
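The deltas in the report are plain ratios against the baseline (1 − candidate/baseline). For with_responses mode, the scoring pipeline isn't spelled out here; a rough sketch of embedding-based similarity between a baseline answer and a candidate answer, using the same OpenAI-shape client, could look like this (the scoring modelux actually applies may differ):

@similarity-sketch python
import numpy as np
from openai import OpenAI

client = OpenAI(base_url="https://api.modelux.ai/v1", api_key="mlx_sk_...")

def response_similarity(baseline_answer: str, candidate_answer: str) -> float:
    """Cosine similarity between the embeddings of two responses."""
    emb = client.embeddings.create(
        model="text-embedding-3-small",
        input=[baseline_answer, candidate_answer],
    )
    a, b = (np.array(d.embedding) for d in emb.data)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# cost delta from the report above: 1 - 178.40 / 412.88 ≈ 0.568, i.e. −56.8%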
# day one

What you get the moment your first request lands.

Little to no setup is required: most of these light up automatically once you swap the base URL and start sending traffic, and the rest take a single config edit.

Decision traces on every request

Why each call went to which provider — including fallback advances and cache hits.

Cost & latency in the dashboard

Per-project, per-key, per-end-user breakdowns. No log piping, no warehouse setup.

Fallback chains by config

Cross-provider failover with per-attempt timeouts. Your app stops needing retry logic (see the sketch below these cards).

Semantic cache for free

Embedding-keyed cache. Repeat questions return in under a millisecond at $0 provider cost.

Ensembles in one config

Match Sonnet on accuracy at 6× lower cost — small models in parallel via confidence routing.

Hard spend caps that fire fast

Daily / weekly / monthly caps per project, key, or end-user. Enforced before the call leaves the proxy.
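The fallback card above says your app stops needing retry logic; concretely, this is the kind of client-side code the fallback chain replaces. The retry wrapper below is a generic example of what teams typically hand-roll, not anything modelux-specific:

@retry-before-after python
import time
from openai import OpenAI, APIError

client = OpenAI(base_url="https://api.modelux.ai/v1", api_key="mlx_sk_...")

# before: hand-rolled retries and backoff around a single provider
def ask_with_retries(prompt: str, attempts: int = 3) -> str:
    for attempt in range(attempts):
        try:
            r = client.chat.completions.create(
                model="gpt-4o-mini",
                messages=[{"role": "user", "content": prompt}],
                timeout=30,
            )
            return r.choices[0].message.content
        except APIError:
            if attempt == attempts - 1:
                raise
            time.sleep(2 ** attempt)   # exponential backoff

# after: one call; per-attempt timeouts and cross-provider failover live in the config
def ask(prompt: str) -> str:
    r = client.chat.completions.create(
        model="@production",
        messages=[{"role": "user", "content": prompt}],
    )
    return r.choices[0].message.content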

Two lines. Five minutes. No card.

The free tier covers 10k requests/month — enough to point your app at modelux, watch real percentiles flow into the dashboard, and decide whether it sticks before talking to anyone.