# Modelux Documentation (Full) > The control plane for LLMs. Every Modelux docs page concatenated into one file for ingestion by LLMs with long context windows. Source: https://modelux.ai/llms-full.txt Individual pages: https://modelux.ai/docs/.md Docs tree (JSON): https://modelux.ai/docs.json --- # Section: Getting Started --- > Modelux is the control plane for LLMs. Policy-driven routing, finance-grade budgets, decision traces, and replay across every provider. # Modelux Docs Welcome. Modelux is the **control plane for your LLM stack**. You point your OpenAI SDK at Modelux and get policy-driven routing across every provider, finance-grade budgets, full decision traces, and a replay simulator — without changing your application code. ## Start here - **[Quickstart](/docs/quickstart)** — Send your first request in under 2 minutes. - **[Concepts / Routing](/docs/concepts/routing)** — How routing configs work. - **[API Reference](/docs/api/overview)** — The proxy and management APIs. ## What Modelux does - **Policy-driven routing.** Fallback chains, cost-optimized, latency-optimized, ensembles, A/B tests, cascades, custom rule DSL across OpenAI, Anthropic, Google, Azure, Bedrock, Groq, Fireworks. - **Finance-grade budgets.** Scoped spend caps with auto-downgrade, alerts, and tag-based attribution. - **Decision-level observability.** Every request stores the full routing decision: attempts, reasons, per-attempt timings and costs. - **Replay & versioning.** Configs are versioned with one-click rollback. Replay historical traffic against candidate configs before you ship them. - **Audit & governance.** Audit log, role-based access, SSO/SAML, IP allowlists. - **AI-native management.** REST API + MCP server — manage everything from your AI agent. 
## What Modelux doesn't do - Prompt management / versioning (use a dedicated tool) - Model fine-tuning or hosting (we route to providers) - Prompt evaluation (planned, not shipped) --- > Send your first request through Modelux in under 2 minutes. # Quickstart Two minutes from zero to routing. This guide walks you through: creating an account, adding a provider, creating a project + API key, and sending your first request. ## 1. Create an account Go to [app.modelux.ai](https://app.modelux.ai/login) and sign in with Google or a passwordless email link. When you log in for the first time, Modelux creates a personal organization for you. ## 2. Add a provider Modelux is BYO-keys — we proxy requests using your own provider credentials. 1. Open **Providers** in the sidebar. 2. Click **Add provider**. 3. Pick a provider (OpenAI, Anthropic, Google, Azure, Bedrock, etc.). 4. Paste your API key. Modelux stores it encrypted and verifies it with a test call. ## 3. Create a project Projects group routing configs, API keys, and usage analytics. 1. Open **Projects** in the sidebar. 2. Click **Create project**. Give it a name like `my-app`. 3. Create an API key scoped to the project — it'll be shown once, prefixed with `mlx_sk_`. ## 4. Configure routing (optional) By default, you can call any model directly by name (`gpt-4o`, `claude-sonnet-4-5`, etc.) and Modelux will route it to the matching provider. For more advanced routing — fallbacks, ensembles, cost optimization — create a **routing config** under **Routing** in the sidebar. Each config gets a stable slug like `@production` that your app calls instead of a raw model name. ## 5. Send your first request The OpenAI SDK works unchanged. 
Just swap the `base_url` and API key: ```python from openai import OpenAI client = OpenAI( base_url="https://api.modelux.ai/v1", api_key="mlx_sk_...", ) response = client.chat.completions.create( model="gpt-4o-mini", # or "@production" for a routing config messages=[{"role": "user", "content": "Hello!"}], ) print(response.choices[0].message.content) ``` ```javascript import OpenAI from "openai"; const client = new OpenAI({ baseURL: "https://api.modelux.ai/v1", apiKey: process.env.MODELUX_API_KEY, }); const response = await client.chat.completions.create({ model: "gpt-4o-mini", messages: [{ role: "user", content: "Hello!" }], }); console.log(response.choices[0].message.content); ``` ```bash curl https://api.modelux.ai/v1/chat/completions \ -H "Authorization: Bearer $MODELUX_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Hello!"}] }' ``` ## What to do next - **[Routing concepts](/docs/concepts/routing)** — Understand how routing configs work. - **[Set up a fallback chain](/docs/guides/fallback-chain)** — Reliability in 5 minutes. - **[Cost optimization](/docs/guides/cost-optimization)** — Cut your bill with smart routing. - **[MCP setup](/docs/guides/mcp-setup)** — Manage Modelux from Claude Code. --- # Section: Concepts --- > How routing configs work in Modelux. # Routing A **routing config** is a named, versioned resource that tells Modelux how to handle a request. Your application calls Modelux with a routing config slug (like `@production`) and Modelux decides which model(s) and provider(s) to actually invoke. Routing configs live in Modelux, not in your code. Change the routing behavior without redeploying your app. ## Calling a routing config Use the slug prefixed with `@` as the model name: ```python client.chat.completions.create( model="@production", messages=[...] 
) ``` You can also call raw model names (`gpt-4o`, `claude-sonnet-4-5`) directly — Modelux auto-routes them to the matching provider using your credentials. ## Strategies | Strategy | Description | |---|---| | `single` | Lock traffic to one model + provider. | | `fallback` | Ordered list of attempts with per-attempt timeouts. Retries on 429, 5xx, timeout. | | `cost_optimized` | Pick the cheapest model meeting a quality tier, from an allowlist. | | `latency_optimized` | Route to the lowest-p50-latency healthy provider. | | `ensemble` | Parallel fan-out + aggregation (voting, first-valid, weighted). | | `ab_test` | Percentage-based split across sub-configs. | | `cascade` | Sequential attempts with early stop on success. | | `custom_rules` | Programmable DSL over cost, latency, budget, tags. | ## Versioning Every save creates a new version. You can: - **Diff** two versions side-by-side - **Rollback** to any previous version with one click - **Promote** a candidate from a simulation result ## Model aliases Instead of hardcoding `gpt-4o-mini` in every request, create a routing config at `@fast` or `@cheap` and reference those slugs. Change the underlying model later without touching your app code. ## Tags Tag requests with arbitrary key-value pairs to scope routing, analytics, and budgets: ```python client.chat.completions.create( model="@production", messages=[...], extra_body={ "mlx:tags": { "tenant": "acme", "feature": "summarize", }, }, ) ``` Custom rules can branch on tags: `if tenant == "enterprise" then use @premium else use @production`. --- > Managing provider credentials in Modelux. # Providers A **provider** is an upstream LLM vendor — OpenAI, Anthropic, Google, Azure OpenAI, AWS Bedrock, Groq, Fireworks. Modelux proxies your requests using provider credentials you supply (BYO keys). We don't mark up per-token costs. 
## Supported providers | Provider | Status | |---|---| | OpenAI | Shipped | | Anthropic | Shipped | | Google (Gemini) | Shipped | | Azure OpenAI | Shipped | | AWS Bedrock | Shipped | | Groq | In progress | | Fireworks | In progress | ## Adding a provider 1. Open **Providers** in the dashboard. 2. Click **Add provider**. 3. Select the vendor, paste your API key, optionally set a base URL for self-hosted or regional endpoints. 4. Modelux stores the credential encrypted and runs a verification call before marking it active. ## Health monitoring Modelux tracks provider health continuously: - **Success rate** — rolling window of 2xx vs 4xx/5xx - **p50 latency** — per-model, per-region where applicable - **Last check timestamp** — indicates how fresh the health signal is When a provider is marked unhealthy, health-aware routing strategies automatically prefer other providers until it recovers. ## Credential rotation Rotate a provider's API key without downtime: 1. Edit the provider in the dashboard 2. Paste the new key and save 3. Modelux verifies the new key, then atomically swaps it Old in-flight requests finish with the old key; new requests pick up the new key immediately. ## Custom base URLs For Azure OpenAI deployments, self-hosted vLLM endpoints, or regional Bedrock routes, set a custom base URL when creating the provider. Modelux will use that URL for all requests routed to this provider. --- > Projects, API keys, and access control. # Projects & API Keys A **project** is a logical grouping of API keys, routing configs, and usage analytics within an organization. Typical pattern: one project per app, one project per environment (prod/staging), or one project per customer. 
## Projects Projects give you: - **Scoped API keys** — each key belongs to exactly one project - **Scoped routing configs** — configs are created within a project - **Scoped analytics** — request logs, cost, and latency filterable per project - **Scoped budgets** — spend caps can apply to a single project ## API keys Modelux API keys use the prefix `mlx_sk_`. They're shown once at creation time — copy it somewhere safe. We store a SHA-256 hash, so we can't show you the key again later. Each key can have: - **Name** — for humans (e.g., "staging-server") - **Optional expiry** — auto-revoke after N days - **Optional rate limit** — requests per minute - **Revoked** status — revoke at any time without rotating other keys ## Using an API key Pass it as a Bearer token: ```bash Authorization: Bearer mlx_sk_xxxxxxxxxxxx ``` Or via the OpenAI SDK: ```python client = OpenAI( base_url="https://api.modelux.ai/v1", api_key="mlx_sk_...", ) ``` ## Project-scoped routing Routing configs belong to a project, but their slug (`@production`) is unique within the project. A request authenticated with a project's API key can only reference routing configs from that project. --- > Spend caps, alerts, and auto-downgrade. # Budgets Budgets let you set spending limits at multiple scopes and enforce them automatically. Modelux tracks per-request cost across every provider and applies your budget rules in real time. 
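For intuition, the per-request cost that budgets enforce is just token counts multiplied by per-token prices. A minimal sketch of that arithmetic (the prices below are illustrative placeholders, not a real rate card):

```python
# Illustrative per-1M-token prices in USD -- placeholders, not actual provider pricing.
PRICES = {
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
    "claude-haiku-4-5": {"input": 1.00, "output": 5.00},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one request: tokens times per-token price."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

cost = request_cost("gpt-4o-mini", input_tokens=1_200, output_tokens=400)
```

Budgets then compare the running sum of these per-request costs against your caps in real time.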
## Scopes A budget applies to one of: - **Organization** — total spend across all projects - **Project** — spend on a specific project - **End-user** — spend per end-user tag - **Tag** — spend scoped by any custom tag key ## Caps Each budget has: - **Monthly cap** — reset on the 1st of each month, or a custom reset schedule - **Soft cap** — alert threshold (defaults to 80%) - **Hard cap** — enforcement action when reached ## Actions When a hard cap is hit, Modelux can: - **Alert only** — send an email / webhook; continue serving - **Auto-downgrade** — automatically route to a cheaper model (configured per budget) - **Block** — return `402 Payment Required` with an upgrade prompt ## Alerts Every budget supports multiple alert thresholds with configurable actions. Alerts fire via: - Email to designated recipients - Webhook (Slack-compatible format supported) - Dashboard banner ## Forecasting Modelux shows a projected end-of-month spend based on current rate, plus period-over-period comparisons so you can catch cost regressions quickly. --- > Request logging, analytics, and decision traces. # Analytics & Logs Modelux captures every request in ClickHouse for fast analytics and keeps a full decision trace so you can answer "why did this request go to that model" for anything in your history. ## Request logs Every request gets a log entry with: - **Timestamp, model, provider, project** - **Input and output** (tokens + optional full content based on retention config) - **Cost** — input cost, output cost, total, in USD - **Latency** — time-to-first-token and total latency - **Status** — 2xx/4xx/5xx with error class on failures - **Decision trace** — attempts, reasons, per-attempt metrics - **Tags** — whatever custom tags you attached to the request Browse logs in the dashboard or query via the management API. Logs are searchable by tag, user, project, status, latency threshold, and time range. 
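The projected end-of-month spend in **Forecasting** is, at its simplest, a linear run-rate extrapolation. A sketch of that idea (Modelux's actual forecast model isn't documented here, so treat this as intuition only):

```python
import calendar
from datetime import date

def projected_month_spend(spend_to_date: float, today: date) -> float:
    """Extrapolate spend so far linearly to the end of the month."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return spend_to_date / today.day * days_in_month

# e.g. $150 spent by the 10th of a 30-day month projects to $450
projection = projected_month_spend(150.0, date(2025, 6, 10))
```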
## Analytics The analytics page aggregates logs into: - **Volume** — requests over time, stacked by model or provider - **Cost** — total cost over time, broken down by model/provider/project/tag - **Latency** — p50/p95/p99 percentiles per model - **Error rates** — grouped by error class and provider - **End users** — top users by spend, volume, latency - **Forecasts** — projected monthly spend with period-over-period comparison Filters: date range, project, model, provider, status, tags. ## Decision traces A decision trace answers: what routing strategy ran, which attempts were tried, why they succeeded or failed, and what was chosen. Example: ``` config: @production (v4) strategy: fallback attempt_1 claude-haiku-4-5 timeout (2000ms) attempt_2 gpt-4o-mini 200 OK decision: fallback → attempt_2 reason: primary timeout, secondary healthy ``` Click any request in the dashboard to see the full trace. ## Log retention Retention varies by tier: 7 days (Free), 30 days (Pro), 60 days (Team), 90+ days (Enterprise, configurable). Structured analytics are retained longer than raw request/response payloads. --- > Run multiple models in parallel and aggregate their outputs. # Ensembles An ensemble routing config fans a single request out to multiple models in parallel, then aggregates their responses into a single final output. Done right, ensembles of smaller models can match or exceed frontier-model quality at a fraction of the cost. 
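The aggregation step is easiest to picture as a weighted vote over member outputs. A minimal sketch, assuming classification-style outputs (intuition only, not Modelux's implementation):

```python
from collections import defaultdict

def weighted_vote(outputs: list[tuple[str, float]]) -> str:
    """Pick the answer with the highest total member weight."""
    tally = defaultdict(float)
    for answer, weight in outputs:
        tally[answer] += weight
    return max(tally, key=tally.get)

winner = weighted_vote([
    ("spam", 1.0),      # e.g. claude-haiku-4-5
    ("not_spam", 1.0),  # e.g. gpt-4o-mini
    ("spam", 0.8),      # e.g. gemini-2.5-flash
])
# "spam" wins the tally 1.8 to 1.0
```

The weights mirror the per-member `weight` fields in an ensemble config.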
## Aggregation strategies

| Strategy | Description | Best for |
|---|---|---|
| `first_valid` | Return the first attempt that passes validation | Latency-sensitive with reliability fallback |
| `weighted_vote` | Classification-style vote across outputs | Categorical / structured outputs |
| `weighted_average` | Numeric outputs combined by weight | Scoring, ratings |
| `llm_judge` | Send all outputs to a judge model for best-pick | Open-ended generation |

## Configuration

```json
{
  "strategy": "ensemble",
  "aggregation": "weighted_vote",
  "members": [
    { "model": "claude-haiku-4-5", "weight": 1.0 },
    { "model": "gpt-4o-mini", "weight": 1.0 },
    { "model": "gemini-2.5-flash", "weight": 0.8 }
  ],
  "timeout_ms": 5000
}
```

## Cost math

A 3-model ensemble of cheap models costs roughly 3x the cost of one cheap model. Example:

- Frontier model (e.g. GPT-4o): ~$0.015 per 1k tokens
- 3-model ensemble (haiku + 4o-mini + flash): ~$0.003 per 1k tokens

That's 5x cheaper, often at comparable quality for many tasks. The ensemble cost estimator in the dashboard shows live per-request cost based on your typical prompt size.

## When to use ensembles

Good fits:

- Structured / classification tasks where voting helps
- Quality-critical tasks where you'd otherwise use a frontier model
- Tasks where small model variance is the main quality issue

Less ideal:

- Streaming-heavy workloads (ensembles don't stream)
- Latency-critical (you wait for the slowest member, bounded by timeout)
- Tasks where cheap models already suffice on their own

---

> Exact-match and semantic caching.

# Caching

Modelux caches successful responses keyed by request content. Cache hits skip the provider call entirely and return the cached response with sub-millisecond latency at zero provider cost.
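For exact-match caching, you can picture the cache key as a hash of the raw request body, so any byte of difference is a miss. A sketch of that idea (Modelux's real key derivation is not documented here and presumably also scopes by project and config):

```python
import hashlib

def cache_key(raw_body: bytes) -> str:
    """Exact-match caching: identical request bodies hash to the same key."""
    return hashlib.sha256(raw_body).hexdigest()

a = cache_key(b'{"model": "gpt-4o-mini", "messages": [...]}')
b = cache_key(b'{"model": "gpt-4o-mini", "messages": [...]}')
c = cache_key(b'{"model": "gpt-4o", "messages": [...]}')
# a == b (cache hit); c differs by a few bytes, so it is a miss
```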
## Cache modes | Mode | Match behavior | |---|---| | `exact` | Request body must match byte-for-byte (default) | | `semantic` | Embedding similarity above a threshold (advanced) | ## TTL Configure TTL per routing config: ```json { "cache": { "mode": "exact", "ttl_seconds": 3600 } } ``` ## What gets cached - Successful chat completions and embeddings - Streaming responses (stitched into a complete response server-side) - Structured outputs (JSON mode) ## What doesn't get cached - Failed responses (4xx, 5xx) - Requests with `temperature > 0` unless you opt in explicitly - Requests with tool-calling (by default; opt-in available) ## Cache-control per-request Override the cache at request time: ```python client.chat.completions.create( model="@production", messages=[...], extra_body={"mlx:cache": {"skip": True}}, ) ``` --- > Event-driven integrations via webhooks. # Webhooks Webhooks let you react to Modelux events in your own infrastructure: budget alerts, config changes, provider health transitions, request anomalies. ## Event types - `budget.threshold_reached` - `budget.exceeded` - `routing_config.updated` - `routing_config.created` - `routing_config.deleted` - `provider.health_changed` - `api_key.revoked` - `request.anomaly_detected` ## Endpoint setup 1. Open **Integrations -> Webhooks** in the dashboard. 2. Click **Add endpoint**. Enter the destination URL. 3. Select which event types to subscribe to. 4. Modelux generates a signing secret. Save it to verify incoming payloads. ## Signature verification Every delivery includes an `X-Modelux-Signature` header with an HMAC-SHA256 of the raw body using your signing secret. 
```javascript
import crypto from "crypto";

function verify(body, signature, secret) {
  const expected = crypto
    .createHmac("sha256", secret)
    .update(body)
    .digest("hex");
  const sig = Buffer.from(signature);
  const exp = Buffer.from(expected);
  // timingSafeEqual throws if the buffers differ in length, so check first
  if (sig.length !== exp.length) return false;
  return crypto.timingSafeEqual(sig, exp);
}
```

## Delivery & retries

- Deliveries run asynchronously through a durable queue
- On non-2xx response or timeout: retried with exponential backoff up to 24h
- The dashboard shows per-delivery status, payload, response body, and a replay button for manual redelivery

## Slack-compatible format

Set the endpoint URL to a Slack webhook URL. Modelux detects it and formats the payload as a Slack message automatically.

---

# Section: Guides

---

> Move your app from direct OpenAI to Modelux in 3 lines.

# Migrating from OpenAI

If your app already uses the OpenAI SDK, you can point it at Modelux with three changes — no other code modifications needed.

## 1. Add OpenAI as a provider in Modelux

In the dashboard, add your existing OpenAI API key as a provider. Modelux will use that key to proxy requests — you keep your existing OpenAI account, billing, and rate limits.

## 2. Create a Modelux API key

Create a project, then generate an API key scoped to it. Copy the `mlx_sk_...` value.

## 3. Update your client config

Swap the API key and point `base_url` at Modelux:

```diff
 from openai import OpenAI

 client = OpenAI(
-    api_key=os.environ["OPENAI_API_KEY"],
+    base_url="https://api.modelux.ai/v1",
+    api_key=os.environ["MODELUX_API_KEY"],
 )
```

That's it. Your existing `client.chat.completions.create(...)` calls work unchanged. Model names like `gpt-4o-mini` are routed to OpenAI through your credentials.
## What you get for free Just by routing through Modelux, with zero other code changes: - **Full request logs** with searchable traces - **Per-request cost tracking** - **Latency percentiles** by model - **Team-level analytics** if multiple apps share one org ## Next steps Once traffic is flowing, you can add value without further code changes: - **Add a fallback chain** to improve reliability — create a routing config `@production` that falls back from `gpt-4o-mini` to `claude-haiku-4-5`, then update your app to call `model="@production"` instead of `gpt-4o-mini`. - **Set a monthly budget** with auto-downgrade to cap your spend. - **Enable the replay simulator** to test changes against historical traffic. ## Streaming still works Streaming responses pass through unchanged: ```python stream = client.chat.completions.create( model="@production", messages=[...], stream=True, ) for chunk in stream: print(chunk.choices[0].delta.content or "", end="") ``` --- > Build a reliable routing config with automatic failover. # Setting up a fallback chain Fallback chains are the simplest way to improve reliability. You define an ordered list of model attempts with per-attempt timeouts. If the first attempt fails (timeout, 429, or 5xx), Modelux automatically retries with the next model. ## Why fallbacks? - Every provider has outages, rate limits, and intermittent failures - Different models fail at different times — a good fallback chain is rarely fully down - Your app sees a consistent response — no retry logic to write ## Create the config In the dashboard, go to **Routing -> Create config**, pick **Fallback**, and add attempts: ```json { "strategy": "fallback", "attempts": [ { "model": "claude-haiku-4-5", "timeout_ms": 2000 }, { "model": "gpt-4o-mini", "timeout_ms": 3000 }, { "model": "gemini-2.5-flash", "timeout_ms": 5000 } ], "retry_on": ["429", "5xx", "timeout"] } ``` ## Tips ### Primary: fast, cheap, usually good Put your preferred cheap+fast model first. 
Use an aggressive timeout (1-2s for simple prompts, longer for reasoning tasks).

### Secondary: different provider

Diversify across providers — if OpenAI is down, Anthropic usually isn't.

### Tertiary: conservative timeout

Longer timeout on the last attempt; you've already burned latency budget on the first two.

### Don't chain too many

Three attempts is usually enough. More than that and you'll hit your overall request timeout before the last attempts even run.

## Verify

Check the decision trace of a few requests in **Logs** to see which attempt actually served each request. Over time, the **Analytics** page will show attempt distribution.

## Call it from your app

```python
client.chat.completions.create(
    model="@production",  # the slug of the routing config
    messages=[...],
)
```

---

> Strategies to cut your LLM bill with Modelux.

# Cost optimization

LLM bills scale with usage. Modelux gives you four levers to cut costs without code changes: smaller models for simpler traffic, ensembles of cheap models, budget enforcement, and caching.

## Lever 1: Right-size the model

Most teams over-provision. GPT-4o isn't needed for a classification task that GPT-4o-mini handles fine. Use a **cost-optimized** routing config:

```json
{
  "strategy": "cost_optimized",
  "quality_tier": "standard",
  "allowed_models": [
    "gpt-4o-mini",
    "claude-haiku-4-5",
    "gemini-2.5-flash"
  ]
}
```

Modelux picks the cheapest allowed model that meets the quality tier. You'll see typical savings of 50-70% vs. using a frontier model for everything.

## Lever 2: Ensembles

For quality-critical tasks where you'd otherwise reach for a frontier model, try a 3-model ensemble of cheap models. Voting across multiple cheap models often matches frontier-model quality at 20% of the cost. See [Ensembles](/docs/concepts/ensembles) for full configuration details.
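To make the ensemble lever concrete, here is the back-of-envelope math using the ballpark per-1k-token figures from the Ensembles page (your real token prices and volumes will differ):

```python
def monthly_cost(price_per_1k_tokens: float, tokens_per_month: int) -> float:
    """Monthly spend at a flat per-1k-token price."""
    return price_per_1k_tokens * tokens_per_month / 1_000

TOKENS_PER_MONTH = 500_000_000  # illustrative volume: 500M tokens/month

frontier = monthly_cost(0.015, TOKENS_PER_MONTH)  # single frontier model
ensemble = monthly_cost(0.003, TOKENS_PER_MONTH)  # 3-model cheap ensemble
savings = frontier - ensemble
# at these rates: $7,500 vs $1,500 per month, i.e. 5x cheaper
```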
## Lever 3: Budget caps with auto-downgrade Set a monthly budget with auto-downgrade: - Cap: $500/month on your production project - At 80%: email alert - At 100%: automatically downgrade all traffic to a cheaper routing config Your app keeps serving requests — just at a lower cost — until the budget resets next month. ## Lever 4: Caching Enable exact-match caching on routing configs where prompts repeat. Cache hits cost $0. Even 10% cache hit rate on a high-volume endpoint pays for Modelux many times over. ## Measure the savings Go to **Analytics -> Period comparison**. Set the baseline to the month before you enabled cost optimization. You'll see period-over-period cost and volume diffs side-by-side. ## A typical result A team spending $10k/month on GPT-4o: - Adds a cost-optimized config routing 60% of traffic to `gpt-4o-mini` + `claude-haiku-4-5` - Keeps 40% on `gpt-4o` for complex queries - New monthly bill: ~$5,200 - Modelux cost: $199/month (Team tier) - **Net savings: ~$4,600/month** --- > Run controlled experiments to compare models in production. # A/B testing models A/B tests route a configurable percentage of traffic to each sub-config so you can compare cost, latency, and quality in real production traffic. ## Why A/B test? - Changing models is high-stakes. Benchmarks don't match your specific use case. - Ensemble configs are especially tricky — aggregation behavior depends on your data distribution. - Cost/latency claims from vendors rarely match your real numbers. ## Create an A/B test ```json { "strategy": "ab_test", "variants": [ { "weight": 80, "config": "@production" }, { "weight": 20, "config": "@production-candidate" } ] } ``` Call the wrapper config from your app: ```python client.chat.completions.create( model="@experiment", messages=[...], ) ``` Modelux logs which variant ran per request, so you can compare. ## Read the results Go to **Analytics -> Compare variants**. 
Modelux shows side-by-side:

- Request volume
- Mean cost per request
- p50 / p95 latency
- Error rate

If you tag requests with a quality signal from your app (e.g., user thumbs-up/down), the analytics can also compare quality metrics across variants.

## Promote a variant

Once you've seen enough volume to be confident, promote the winner:

1. Go to **Simulations** or the routing config's versions view
2. Select the variant
3. Click **Promote** — Modelux atomically switches your traffic over

## Replay before you A/B

If you want signal before sending real traffic, use the **replay simulator**: take the last 24h of requests and run them through the candidate config. You'll see the cost/latency diff without risking production quality.

---

> Manage Modelux from Claude Code, Cursor, or any MCP client.

# Connecting your AI (MCP)

Modelux exposes every management action as a tool through the Model Context Protocol (MCP). Connect an MCP-aware AI client — Claude Code, Cursor, etc. — and you can manage routing, budgets, providers, analytics, and more through natural language.

## Get your MCP URL and management token

1. In the dashboard, go to **Settings -> API & MCP**.
2. Copy the MCP server URL: `https://api.modelux.ai/mcp`
3. Create a **management API key** (separate from the proxy API keys) — this token has broader scope and can modify your org's configuration.

## Configure Claude Code

Add the server to your Claude Code MCP settings (a `.mcp.json` in your project, or via `claude mcp add`); in Claude Desktop, the same block goes in `claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "modelux": {
      "url": "https://api.modelux.ai/mcp",
      "headers": {
        "Authorization": "Bearer mlx_mk_..."
      }
    }
  }
}
```

Restart Claude Code. It will connect to the server and list available tools.

## What you can ask

- "Create a cascade that tries haiku first then falls back to sonnet"
- "Show me yesterday's spend by model"
- "Set a $500/month cap on my production project with auto-downgrade to @cheap"
- "Which provider had the highest error rate last week?"
- "Rotate my OpenAI API key and verify it works" - "Replay the last 24 hours of traffic against the candidate config" ## Scope The MCP key is org-scoped. Whoever has the key can perform any management action within that org. Treat it like any privileged credential: - Don't commit it to source control - Rotate periodically - Revoke immediately if exposed ## Tools available Modelux exposes 80+ tools covering the full management API surface: projects, routing configs, providers, budgets, API keys, analytics, audit logs, webhooks, simulations, and org/member management. --- > Connect your identity provider to Modelux so your team signs in with their work account. # SAML SSO overview Modelux supports **SAML 2.0** for workforce single sign-on. Connect your identity provider (IdP) — Okta, Microsoft Entra ID, Google Workspace, JumpCloud, OneLogin, or any SAML-compliant IdP — and members of your org sign in with their work account instead of a password. SSO is available on the **Enterprise** plan. Pair it with [SCIM provisioning](/docs/guides/scim) to automate user lifecycle. ## When do I need this? You need SSO when any of these apply: - Your IT / security team requires federated auth for every SaaS - You want one-click deactivation when someone leaves the company - You need SOC 2 or ISO 27001 evidence for access control - Your organization has more than ~25 seats If none of those apply, password + Google + passwordless email login is probably simpler. ## How Modelux's SAML works Modelux is a **SAML 2.0 service provider (SP)**. Your IdP authenticates the user and sends a signed assertion back to Modelux, which creates the session. Two login entry points are supported: - **SP-initiated:** user visits `app.modelux.ai/login`, clicks "Use SAML SSO", enters their work email, and we redirect them to your IdP. - **IdP-initiated:** user clicks the Modelux tile in their workforce portal (Okta dashboard, myapps.microsoft.com, etc.) and lands straight in the Modelux dashboard. 
On first login, Modelux **just-in-time provisions** the user into your org with the default role you configured. If they already have a Modelux user with a matching verified email, we merge — no duplicate accounts.

## What you'll need

Before starting:

- An **admin** or **owner** role in your Modelux org
- An account with **app-administrator** rights in your IdP (to register a new SAML app)
- A verified email domain you can add a DNS TXT record to

## High-level setup

The flow is the same for every IdP:

1. In Modelux → **Settings → SSO**, note these three values. Your IdP needs them to register Modelux as a service provider:
   - **SP Entity ID** (e.g. `https://app.modelux.ai/api/sso/metadata`)
   - **Assertion Consumer Service URL** (e.g. `https://app.modelux.ai/api/sso/saml/acs`)
   - **SP metadata XML** (we publish the full metadata document — some IdPs let you import from a URL)
2. Create a new SAML app in your IdP and paste those values.
3. Copy three values **back** from the IdP into Modelux's **Settings → SSO** → Identity provider form:
   - **IdP Entity ID** (often called *Identifier* or *Issuer URL*)
   - **IdP SSO URL** (the SAML 2.0 Single Sign-On URL)
   - **IdP Certificate** (PEM-encoded x509 signing certificate)
4. Set a **default role** for new SSO users (member is usually right).
5. Click **Test connection** — this validates the cert parses without attempting a live login.
6. Click **Save**.
7. Add your email domain under **Email domains**, publish the TXT record we give you at `_modelux.your-domain.com`, and click **Verify**. Once verified, anyone signing in from `you@your-domain.com` is automatically routed to your org's IdP.
8. Once the flow works end-to-end for at least one test user, flip **Require SAML for all members** to block password / Google logins for people in your org.
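**Test connection** only checks that the certificate parses. If you want an extra sanity check when pasting or rotating a cert, you can compute its SHA-256 fingerprint locally and compare it against the one shown in the dashboard; a stdlib-only sketch:

```python
import base64
import hashlib

def cert_fingerprint(pem: str) -> str:
    """SHA-256 fingerprint of a PEM blob's DER bytes, colon-separated."""
    body = "".join(
        line for line in pem.strip().splitlines()
        if "-----" not in line  # drop the BEGIN/END CERTIFICATE markers
    )
    der = base64.b64decode(body)
    digest = hashlib.sha256(der).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))

# Placeholder content, not a real certificate; paste your IdP's PEM here.
sample = "-----BEGIN CERTIFICATE-----\ndGVzdA==\n-----END CERTIFICATE-----"
fp = cert_fingerprint(sample)
```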
## Provider-specific guides - [Okta](/docs/guides/sso-okta) - [Microsoft Entra ID](/docs/guides/sso-entra) (formerly Azure AD) - [Google Workspace](/docs/guides/sso-google) > Using JumpCloud, OneLogin, or another IdP? The setup follows the same > pattern — any IdP that exports standard SAML 2.0 metadata works. > Follow the high-level flow above. If you hit a snag, email > support@modelux.ai and we'll walk you through. ## Attribute mapping Modelux reads these attributes from the SAML assertion (first match wins per field): | Field | Claims we read | | --- | --- | | Email | `email`, `emailAddress`, `mail`, `http://schemas.xmlsoap.org/ws/2005/05/identity/claims/emailaddress`, `urn:oid:0.9.2342.19200300.100.1.3`, or the NameID if it is an email | | First name | `firstName`, `givenName`, `http://schemas.xmlsoap.org/ws/2005/05/identity/claims/givenname`, `urn:oid:2.5.4.42` | | Last name | `lastName`, `surname`, `sn`, `http://schemas.xmlsoap.org/ws/2005/05/identity/claims/surname`, `urn:oid:2.5.4.4` | | Display name | `displayName`, `name`, `http://schemas.xmlsoap.org/ws/2005/05/identity/claims/name`, `cn` | Most IdPs emit the first set of names by default. If yours is picky about URIs, use the OIDs — those are the SAML 2.0 canonical forms. ## Enforcement With **Require SAML for all members** enabled, Modelux rejects every non-SAML login for anyone who is a member of your org. Google and passwordless email sign-ins return a friendly banner pointing them at the SSO flow. Don't enable this until: - You've tested the flow end-to-end with a non-admin account - Your domain is DNS-verified - At least one owner can log in via SSO (so a bad IdP config doesn't lock every owner out — owners retain their SSO identity through provider changes) If you do lock yourself out, email support@modelux.ai — we can disable enforcement for you. ## Rotating the IdP certificate IdP signing certs expire. When yours is about to rotate: 1. Get the new PEM from your IdP 2. 
Paste it into the **IdP certificate** field in Modelux and save 3. The current fingerprint shown next to the field updates — verify it matches what your IdP shows Modelux stores the cert encrypted at rest and only ever audits the SHA-256 fingerprint. --- > Connect Okta as your SAML identity provider for Modelux. # SSO with Okta Step-by-step Okta setup. Assumes you have the **Super Admin** role in your Okta tenant and **admin** or **owner** in Modelux. If you haven't read the [SAML SSO overview](/docs/guides/sso), start there. ## 1. Collect Modelux SP details In a separate browser tab, open Modelux → **Settings → SSO**. You'll copy these values into Okta in step 3: - **SP Entity ID** - **Assertion Consumer Service URL** - **SP metadata XML** URL (optional — Okta doesn't accept metadata import for custom SAML apps; you'll paste the individual fields instead) Keep the tab open; you'll also paste values **back** from Okta. ## 2. Create a SAML app in Okta 1. In the Okta admin console: **Applications → Applications → Create App Integration → SAML 2.0 → Next**. 2. **General Settings:** - App name: `Modelux` - (Optional) Upload the Modelux logo - Click **Next**. ## 3. Configure SAML settings On the **Configure SAML** step: - **Single sign-on URL:** paste Modelux's **Assertion Consumer Service URL**. Check *"Use this for Recipient URL and Destination URL"*. - **Audience URI (SP Entity ID):** paste Modelux's **SP Entity ID**. - **Name ID format:** `EmailAddress` - **Application username:** `Email` ### Attribute Statements Add these (case-sensitive): | Name | Name format | Value | | --- | --- | --- | | `email` | Basic | `user.email` | | `firstName` | Basic | `user.firstName` | | `lastName` | Basic | `user.lastName` | | `displayName` | Basic | `user.displayName` | Click **Next**, pick *"I'm an Okta customer adding an internal app"*, then **Finish**. ## 4. Copy Okta → Modelux Back on your new Okta app's **Sign On** tab, click **View SAML setup instructions**. 
Copy these into the Modelux **Identity provider** form: | Okta field | Modelux field | | --- | --- | | *Identity Provider Issuer* | IdP Entity ID | | *Identity Provider Single Sign-On URL* | IdP SSO URL | | *X.509 Certificate* (download or copy the PEM block) | IdP certificate | Set **Default role** to `member` (the typical choice). Click **Test connection** in Modelux — this confirms the cert parses. Then click **Save**. ## 5. Assign users in Okta - On the Okta app's **Assignments** tab, click **Assign → Assign to People** (or Groups). - Add yourself (and a test user if possible) to the app. ## 6. Verify domain + test 1. In Modelux **Settings → SSO**, add your email domain (e.g. `acme.com`). Publish the DNS TXT record we give you at `_modelux.acme.com` and click **Verify**. 2. In an incognito window, visit [app.modelux.ai/login](https://app.modelux.ai/login). 3. Click **Use SAML SSO**, enter your work email, and confirm you land back in the Modelux dashboard logged in. ## 7. Turn on enforcement Once a non-admin test user has logged in successfully, return to **Settings → SSO** and toggle **Require SAML for all members**. This blocks password / Google logins for any member of your org. > Don't enable enforcement until at least one org **owner** has > successfully signed in via SAML. If the IdP config is wrong and > enforcement is on, everyone is locked out. ## SCIM provisioning (optional) To have Okta push user create / update / deactivate events into Modelux, enable SCIM on the same app. See the [SCIM provisioning guide](/docs/guides/scim#okta). ## Troubleshooting - **"Invalid SAML assertion"** — the cert in Modelux doesn't match the Okta signing cert. Re-copy the x509 from Okta's setup instructions. Make sure you include `-----BEGIN CERTIFICATE-----` and `-----END CERTIFICATE-----`. - **"No SSO configured for this email's domain"** — add the domain in Modelux and verify the TXT record. 
- **Landing on /login instead of the dashboard after clicking the Okta tile** — IdP-initiated SSO is supported, but the app's *Default RelayState* must be empty or `/`. Modelux accepts either, so most Okta setups work out of the box; if you've set a custom RelayState, clear it. --- > Connect Microsoft Entra ID (formerly Azure AD) as your SAML identity provider for Modelux. # SSO with Microsoft Entra ID Step-by-step Entra ID (formerly Azure AD) setup. Assumes you have **Cloud Application Administrator** (or Global Admin) in Entra and **admin** or **owner** in Modelux. If you haven't read the [SAML SSO overview](/docs/guides/sso), start there. ## 1. Collect Modelux SP details In a separate browser tab, open Modelux → **Settings → SSO** and keep these handy: - **SP Entity ID** - **Assertion Consumer Service URL** ## 2. Create an Enterprise Application 1. Go to the [Entra admin center](https://entra.microsoft.com/) → **Applications → Enterprise applications → New application**. 2. Click **Create your own application**. 3. Name it `Modelux`, pick **Integrate any other application you don't find in the gallery (Non-gallery)**, and click **Create**. ## 3. Enable SAML SSO 1. On the app's overview, click **Single sign-on → SAML**. 2. In the **Basic SAML Configuration** panel, click **Edit** and fill in: - **Identifier (Entity ID):** Modelux's **SP Entity ID** - **Reply URL (Assertion Consumer Service URL):** Modelux's **ACS URL** - Leave Sign on URL / Relay State / Logout URL blank - Save. ## 4. Configure attribute claims Entra ships reasonable defaults but they use the SAML 2.0 canonical URIs. Modelux reads those, so the defaults usually "just work." If you want to add friendlier names: 1. In the **Attributes & Claims** panel, click **Edit**. 2.
Add these claims alongside the defaults: | Claim name | Source | Source attribute | | --- | --- | --- | | `email` | Attribute | `user.mail` | | `firstName` | Attribute | `user.givenname` | | `lastName` | Attribute | `user.surname` | | `displayName` | Attribute | `user.displayname` | Make sure the **Unique User Identifier (Name ID)** claim maps to `user.mail` (or `user.userprincipalname` if mail isn't populated for every user). ## 5. Copy Entra → Modelux In the **SAML Certificates** panel: - Download **Certificate (Base64)** — this is the PEM x509. In the **Set up Modelux** panel: | Entra field | Modelux field | | --- | --- | | *Microsoft Entra Identifier* (or Azure AD Identifier) | IdP Entity ID | | *Login URL* | IdP SSO URL | | *Certificate (Base64)* (contents of the downloaded file) | IdP certificate | In Modelux's **Identity provider** form, paste these values, set **Default role** to `member`, click **Test connection**, then **Save**. ## 6. Assign users Back in the Entra Enterprise app → **Users and groups → Add user/group**. Assign yourself and a test user. > Entra defaults to requiring user assignment. If you disable *User > assignment required* on the app's Properties panel, any licensed user > in your tenant can sign in — only do this if that's intentional. ## 7. Verify domain + test 1. In Modelux **Settings → SSO**, add your email domain. Publish the DNS TXT record at `_modelux.` and click **Verify**. 2. In an incognito window, sign in at [app.modelux.ai/login](https://app.modelux.ai/login) via **Use SAML SSO**. ## 8. Turn on enforcement Once verified, toggle **Require SAML for all members** in Modelux. ## SCIM provisioning To automate user lifecycle from Entra, see the [SCIM provisioning guide](/docs/guides/scim#microsoft-entra-id). ## Troubleshooting - **"AADSTS50105" — the signed-in user is not assigned to a role**: add the user under **Enterprise application → Users and groups**, or disable *User assignment required*. 
- **Cert errors in Modelux**: Entra exports the cert as a `.cer` file. Open it in a text editor — if it starts with `-----BEGIN CERTIFICATE-----`, paste it as-is. If it's binary (raw DER), download the **Certificate (Base64)** variant instead. - **Attributes not appearing**: by default Entra emits claims under the `http://schemas.xmlsoap.org/ws/2005/05/identity/claims/...` URIs. Modelux reads those, so it should work without customization. If you overrode the defaults, make sure at least one of the email claims in the [attribute mapping table](/docs/guides/sso#attribute-mapping) is present. --- > Connect Google Workspace as your SAML identity provider for Modelux. # SSO with Google Workspace Step-by-step Google Workspace setup. Requires the **Super Admin** role in Google Workspace and **admin** or **owner** in Modelux. If you haven't read the [SAML SSO overview](/docs/guides/sso), start there. ## 1. Collect Modelux SP details In Modelux → **Settings → SSO**, note: - **SP Entity ID** - **Assertion Consumer Service URL** ## 2. Create a custom SAML app in Google 1. [admin.google.com](https://admin.google.com) → **Apps → Web and mobile apps → Add app → Add custom SAML app**. 2. App name: `Modelux`. Click **Continue**. 3. On the **Google Identity Provider details** page, you'll see three values: - SSO URL - Entity ID - Certificate (click **Download certificate** — this is a `.pem` file) 4. **Keep this browser tab open** — you'll paste these into Modelux in the next step. Click **Continue**. ## 3. Configure the SP side On the **Service Provider details** page: - **ACS URL:** Modelux's **Assertion Consumer Service URL** - **Entity ID:** Modelux's **SP Entity ID** - **Name ID format:** `EMAIL` - **Name ID:** `Basic Information > Primary email` - Click **Continue**. ## 4. 
Configure attribute mapping | Google directory attribute | App attribute | | --- | --- | | `Primary email` | `email` | | `First name` | `firstName` | | `Last name` | `lastName` | | `Full name` (if available) | `displayName` | Click **Finish**. ## 5. Copy Google → Modelux Back on the Google side, re-open the **Google Identity Provider details** panel (you can find it on the app's overview → **SP detailed configure** or by re-entering the wizard): | Google field | Modelux field | | --- | --- | | *SSO URL* | IdP SSO URL | | *Entity ID* | IdP Entity ID | | *Certificate* (contents of the downloaded `.pem`) | IdP certificate | Paste into Modelux's **Identity provider** form, set **Default role** to `member`, click **Test connection**, then **Save**. ## 6. Turn the app on for users In Google Admin, on the Modelux app page: 1. Under **User access**, click the tile and choose **ON for everyone** or **ON for certain organizational units / groups**. 2. Save. Propagation to end users takes a few minutes. ## 7. Verify domain + test 1. In Modelux **Settings → SSO**, add your email domain, publish the TXT record at `_modelux.`, and click **Verify**. 2. Open [app.modelux.ai/login](https://app.modelux.ai/login) in an incognito window → **Use SAML SSO** → enter your work email. ## 8. Turn on enforcement Once the test user logs in successfully, toggle **Require SAML for all members** in Modelux. ## SCIM provisioning Google Workspace supports SCIM but auto-provisioning setup is less common than Okta/Entra. Contact support@modelux.ai if you need help wiring Google's SCIM client. ## Troubleshooting - **"Clock skew"-style errors**: Google-signed assertions are valid for ~10 minutes. If your server clocks drift, verification fails. NTP usually handles this; check your server time matches real time within a few seconds. - **Certificate mismatch**: Google rotates signing certs periodically. 
If users start failing to log in, re-download the certificate from Google and paste the new PEM into Modelux. - **"No Primary email"**: some Google Workspace users (service accounts) don't have a primary email. Those can't use SSO — they weren't real users anyway. --- > Automate user lifecycle (create, update, deactivate) from your IdP using SCIM 2.0. # SCIM user provisioning SCIM 2.0 lets your IdP push user lifecycle events — create, update, deactivate — into Modelux automatically. Without it, you manage seats manually: when someone leaves the company, an admin has to remember to remove them from Modelux. SCIM is available on the **Enterprise** plan. You'll typically set it up alongside [SAML SSO](/docs/guides/sso). ## What SCIM does in Modelux Each SCIM "User" is a **membership** in your Modelux org: - **Create** → add the user to your org. If they already have a Modelux account with the same verified email, we link to it. - **Update / PATCH `active: false`** → mark the membership deactivated. The user can't sign in or use management API keys they created. - **Update / PATCH `active: true`** → reactivate. - **Delete** → remove the membership from your org. The user's global account (if shared with other orgs) is not deleted. Role is transmitted via a Modelux SCIM extension: `urn:modelux:params:scim:schemas:extension:2.0:User`. If your IdP doesn't fill it, new members get the **default role** from your SSO configuration. ## Create a SCIM token 1. In Modelux → **Settings → SSO**, scroll to the **SCIM tokens** card. 2. Click **Create token**. Give it a name like `Okta` or `Entra provisioning`. 3. Copy the token value — it starts with `mlx_scim_`. **It is shown exactly once.** If you lose it, revoke the token and create a new one. The base URL your IdP will POST to is: ``` https://app.modelux.ai/api/scim/v2 ``` ## Okta 1. On your Modelux SAML app in Okta → **General → App Settings**, enable **Provisioning → SCIM** (requires the Lifecycle Management add-on). 2. 
**Base URL:** `https://app.modelux.ai/api/scim/v2` 3. **Unique identifier field for users:** `userName` 4. **Supported provisioning actions:** check *Push New Users* and *Push Profile Updates*; Push Groups is not used. 5. **Authentication Mode:** HTTP Header 6. **Authorization:** `Bearer ` followed by your `mlx_scim_...` token 7. Click **Test API Credentials** — you should get a green checkmark. On the **To App** settings, enable *Create Users*, *Update User Attributes*, and *Deactivate Users*. ## Microsoft Entra ID 1. On your Modelux Enterprise app → **Provisioning → Get started → Automatic**. 2. **Tenant URL:** `https://app.modelux.ai/api/scim/v2` 3. **Secret token:** your `mlx_scim_...` token 4. Click **Test Connection** — Entra should report success. 5. Save, then under **Mappings → Provision Microsoft Entra ID Users**, confirm `userPrincipalName` or `mail` maps to `userName`, and the name/email attributes map to their SCIM equivalents. 6. Set **Provisioning Status** to **On** and save. Entra provisioning cycles every ~40 minutes. Use **Provision on demand** to test a single user immediately. ## Deactivation behavior A SCIM `active: false` patch keeps the membership row but sets `deactivatedAt`. This preserves the audit trail (when they were deactivated, and by which token) and keeps history intact. Reactivating clears the flag; the user gets their previous role back. A SCIM `DELETE /Users/{id}` removes the membership entirely. The global `User` record stays — they may be a member of other Modelux orgs, and their personal account (if any) shouldn't be collateral damage. ## Last-owner protection Modelux refuses to delete or deactivate the **last owner** of an org. SCIM returns a `409 mutability` error. Add another owner first, or contact support if you need a workaround. ## Testing - Create a test user in your IdP, assign them to the Modelux app, and trigger provisioning manually. They should appear under **Settings → Team** within a few seconds (Okta) or minutes (Entra). - Deactivate them in the IdP.
They should appear in Modelux with a *deactivated* indicator and be unable to sign in. - Reactivate and confirm they can sign in again. ## Auditing Every SCIM mutation emits an audit event under **Settings → Audit log**, scoped to the SCIM token that made the call. --- # Section: API Reference --- > Conventions for the Modelux proxy and management APIs. # API overview Modelux exposes two HTTP APIs: - **Proxy API** at `https://api.modelux.ai/v1/*` — OpenAI-compatible surface for running inference. Authenticated with a project API key. - **Management API** at `https://api.modelux.ai/manage/v1/*` — REST API for managing projects, routing configs, providers, budgets, analytics, etc. Authenticated with a management API key. ## Authentication All API calls require a bearer token: ``` Authorization: Bearer mlx_sk_... // proxy / project key Authorization: Bearer mlx_mk_... // management key ``` ### API key prefixes | Prefix | Use | |---|---| | `mlx_sk_` | Project API key — for the proxy API | | `mlx_mk_` | Management API key — for the management API and MCP | ## Base URLs | Environment | Base URL | |---|---| | Production | `https://api.modelux.ai` | | Dev (self-hosted) | `http://localhost:8080` (proxy), `http://localhost:5100` (app) | ## Error format Errors follow the OpenAI error shape for familiarity: ```json { "error": { "type": "invalid_request_error", "message": "Missing required field: messages", "code": "missing_field" } } ``` Management API errors use the same shape with additional fields where useful: ```json { "error": { "type": "not_found", "message": "routing config @missing not found", "code": "routing_config_not_found", "resource": "routing_config", "id": "@missing" } } ``` ## Status codes | Code | Meaning | |---|---| | `200` | Success | | `400` | Bad request — malformed input | | `401` | Missing or invalid authentication | | `402` | Payment required — budget exceeded | | `403` | Forbidden — authenticated but lacks permission | | `404` | Not found | | `409` | 
Conflict — duplicate resource | | `422` | Unprocessable entity — validation failed | | `429` | Rate limited | | `500` | Server error — check status page | | `502`/`504` | Upstream provider error or timeout | ## Rate limits Proxy API limits are per-API-key. Default: 600 requests/minute. Upgrade tiers increase this. Custom limits can be set per key. Management API: 60 req/min per management key. ## Idempotency Mutating management API endpoints accept an `Idempotency-Key` header. Same key + same body returns the original response within 24 hours. ## Pagination Management API list endpoints return: ```json { "data": [...], "next_cursor": "opaque_cursor_or_null", "has_more": true } ``` Pass `?cursor=...` to fetch the next page. --- > POST /v1/chat/completions — the primary inference endpoint. # Chat completions Create a chat completion. OpenAI-compatible request and response shape. ``` POST /v1/chat/completions ``` ## Request ```json { "model": "@production", "messages": [ { "role": "system", "content": "You are a helpful assistant." }, { "role": "user", "content": "Hello!" 
} ], "temperature": 0.7, "max_tokens": 1024, "stream": false } ``` ### Model identifier - **Raw model name** — `gpt-4o-mini`, `claude-sonnet-4-5`, `gemini-2.5-flash` - **Routing config slug** — `@production`, `@fallback`, `@experiment` ## Modelux extensions Pass extra fields under `extra_body` (OpenAI SDK) or top-level `mlx:*` keys: | Field | Description | |---|---| | `mlx:tags` | Object of key-value tags for analytics + routing | | `mlx:end_user` | End-user identifier (for per-user analytics + budgets) | | `mlx:cache` | Cache controls: `{ skip: true }` to bypass cache for this request | | `mlx:trace` | Set `true` to include the decision trace in the response | Example: ```python response = client.chat.completions.create( model="@production", messages=[...], extra_body={ "mlx:tags": {"tenant": "acme", "feature": "summarize"}, "mlx:end_user": "user_abc", }, ) ``` ## Response Standard OpenAI response shape: ```json { "id": "chatcmpl_...", "object": "chat.completion", "created": 1710000000, "model": "gpt-4o-mini", "choices": [ { "index": 0, "message": { "role": "assistant", "content": "Hello!" }, "finish_reason": "stop" } ], "usage": { "prompt_tokens": 12, "completion_tokens": 8, "total_tokens": 20 } } ``` ## Streaming Set `stream: true`. Modelux returns SSE events in the OpenAI streaming format. Works the same with cascades and fallbacks — the stream starts when the first successful attempt begins responding. ## Tool / function calling Passes through unchanged to the provider. Modelux normalizes tool-call behavior across OpenAI, Anthropic, and Google so the same tool schemas work across all three. ## Structured output (JSON mode) `response_format: { type: "json_object" }` works across all providers. `response_format: { type: "json_schema", json_schema: {...} }` works where the underlying provider supports it; otherwise Modelux falls back to JSON mode and validates post-hoc. 
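When the post-hoc fallback path applies, it can be worth guarding on the client side as well. A minimal sketch of such a guard (this is not Modelux's internal validator; `required_keys` stands in for whatever your schema actually requires):

```python
import json

def parse_json_reply(content: str, required_keys: set[str]) -> dict:
    """Parse a JSON-mode completion and check it has the keys we expect."""
    data = json.loads(content)  # raises ValueError on malformed JSON
    missing = required_keys - data.keys()
    if missing:
        raise ValueError(f"model reply missing keys: {sorted(missing)}")
    return data

# The completion content, as read from choices[0].message.content:
reply = '{"summary": "Hello", "sentiment": "positive"}'
parsed = parse_json_reply(reply, {"summary", "sentiment"})
```

On a `ValueError`, a retry (or a switch to a config whose providers support native `json_schema`) is the usual recovery.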
## Headers Modelux includes response headers on every successful request: ``` x-modelux-request-id: req_a1b2c3 x-modelux-model-used: gpt-4o-mini x-modelux-provider: openai x-modelux-cost-usd: 0.002134 x-modelux-latency-ms: 238 x-modelux-config: @production x-modelux-config-version: 4 ``` --- > POST /v1/embeddings — vector embeddings for text. # Embeddings Create vector embeddings. OpenAI-compatible request and response shape. ``` POST /v1/embeddings ``` ## Request ```json { "model": "text-embedding-3-small", "input": ["Hello world", "Another string"] } ``` Supports: - Single string or array of strings - Routing config slugs (`@embeddings`) just like chat completions - OpenAI, Google, Cohere, and Voyage embedding models through their respective providers ## Response ```json { "object": "list", "data": [ { "object": "embedding", "index": 0, "embedding": [0.012, -0.034, ...] }, { "object": "embedding", "index": 1, "embedding": [0.056, 0.089, ...] } ], "model": "text-embedding-3-small", "usage": { "prompt_tokens": 10, "total_tokens": 10 } } ``` ## Dimensions Pass `dimensions: N` to request a specific vector dimensionality (where the provider supports it, e.g. `text-embedding-3-small` and `text-embedding-3-large`). --- > REST endpoints for managing Modelux resources. # Management API overview The management API lets you do everything the dashboard does: create projects, edit routing configs, rotate credentials, fetch analytics, set budgets. Base URL: `https://api.modelux.ai/manage/v1` Authenticate with a management API key (`mlx_mk_...`), not a project key. 
## Resources | Resource | Reference | |---|---| | [Projects](/docs/api/management-projects) | List, create, get, update, delete | | [Routing configs](/docs/api/management-routing) | CRUD + versions, test, restore | | [Providers](/docs/api/management-providers) | CRUD + health, credential rotation | | [Budgets](/docs/api/management-budgets) | CRUD + alerts, events, reset | | [Analytics & logs](/docs/api/management-analytics) | Reports, decisions, logs, traces, replay | | [Webhooks](/docs/api/management-webhooks) | Endpoints, deliveries, event types | | API keys | List, create, revoke | | Simulations | Create, list, results, promote, estimate | | Audit log | List, get | | Org & members | Update org, invite, list, role updates | ## OpenAPI spec The full OpenAPI spec is available at: ``` https://api.modelux.ai/manage/v1/openapi.yaml ``` Use it to generate clients in any language or import into Postman / Insomnia. ## MCP tools Every management endpoint has a corresponding MCP tool. See [MCP setup](/docs/guides/mcp-setup) to connect Claude Code or another MCP client. ## Idempotency All mutating endpoints accept an `Idempotency-Key` header. Same key + same body within 24h returns the cached response. Ideal for retries. ## Pagination List endpoints return cursor-paginated responses: ```json { "data": [...], "next_cursor": "opaque_cursor_or_null", "has_more": true } ``` --- > Create, list, and manage projects via the Management API. # Projects API Projects group API keys, routing configs, and usage analytics. Most accounts use one project per app or environment. ## List projects ``` GET /manage/v1/projects ``` Returns all projects in the authenticated organization. 
**Query parameters** | Name | Type | Description | |---|---|---| | `cursor` | string | Opaque pagination cursor | | `limit` | integer | Max items per page (default 50, max 200) | **Response** ```json { "data": [ { "id": "proj_01HXY...", "name": "production", "slug": "production", "description": "main API traffic", "created_at": "2026-04-01T12:00:00Z" } ], "next_cursor": null, "has_more": false } ``` ## Get a project ``` GET /manage/v1/projects/{project_id} ``` ## Create a project ``` POST /manage/v1/projects ``` **Request body** ```json { "name": "staging", "description": "optional — shown in the dashboard" } ``` Returns the created project. ## Update a project ``` PATCH /manage/v1/projects/{project_id} ``` Supports partial updates of `name` and `description`. ## Delete a project ``` DELETE /manage/v1/projects/{project_id} ``` Soft-delete. Existing API keys scoped to this project are revoked. Historical analytics remain queryable with `include_deleted=true`. ## MCP tools | Tool | Maps to | |---|---| | `list_projects` | `GET /manage/v1/projects` | | `get_project` | `GET /manage/v1/projects/{id}` | | `create_project` | `POST /manage/v1/projects` | | `update_project` | `PATCH /manage/v1/projects/{id}` | | `delete_project` | `DELETE /manage/v1/projects/{id}` | ## See also - [Projects & API Keys (concept)](/docs/concepts/projects) - [Management API overview](/docs/api/management) --- > Create and manage routing configs, versions, and test them. # Routing Configs API Routing configs define how requests are dispatched to providers. Every config has a stable `@slug` your app calls instead of raw model names. 
## List routing configs ``` GET /manage/v1/routing-configs ``` **Query parameters** | Name | Type | Description | |---|---|---| | `project_id` | string | Filter to configs in one project | | `cursor` | string | Pagination cursor | | `limit` | integer | Max items per page | ## Get a routing config ``` GET /manage/v1/routing-configs/{config_id} ``` Returns the current active version. To fetch a specific version, use the versions endpoint below. ## Create a routing config ``` POST /manage/v1/routing-configs ``` **Request body** ```json { "project_id": "proj_01HXY...", "name": "production", "slug": "production", "strategy": "fallback", "config": { "attempts": [ { "model": "claude-haiku-4-5", "timeout_ms": 2000 }, { "model": "gpt-4o-mini", "timeout_ms": 3000 } ], "retry_on": ["429", "5xx", "timeout"] } } ``` Valid strategies: `single`, `fallback`, `cost_optimized`, `latency_optimized`, `ensemble`, `ab_test`, `cascade`, `custom_rules`. ## Update a routing config ``` PATCH /manage/v1/routing-configs/{config_id} ``` Any update creates a new version. The previous version stays queryable for rollback. ## Versions ``` GET /manage/v1/routing-configs/{config_id}/versions GET /manage/v1/routing-configs/{config_id}/versions/{version_id} POST /manage/v1/routing-configs/{config_id}/versions/{version_id}/restore ``` `restore` promotes an old version back to the active version (creating a new version that matches). ## Test a routing config ``` POST /manage/v1/routing-configs/{config_id}/test ``` **Request body** ```json { "messages": [{ "role": "user", "content": "Hello" }], "dry_run": true } ``` Returns the routing decision that would be made, without actually calling the provider. With `dry_run: false`, runs the request end-to-end for verification. ## Delete a routing config ``` DELETE /manage/v1/routing-configs/{config_id} ``` Soft-delete. The slug becomes reusable immediately for new configs. 
## MCP tools | Tool | Maps to | |---|---| | `list_routing_configs` | `GET /manage/v1/routing-configs` | | `get_routing_config` | `GET /manage/v1/routing-configs/{id}` | | `create_routing_config` | `POST /manage/v1/routing-configs` | | `update_routing_config` | `PATCH /manage/v1/routing-configs/{id}` | | `delete_routing_config` | `DELETE /manage/v1/routing-configs/{id}` | | `list_routing_config_versions` | `GET /manage/v1/routing-configs/{id}/versions` | | `get_routing_config_version` | `GET /manage/v1/routing-configs/{id}/versions/{version}` | | `restore_routing_config_version` | `POST /manage/v1/routing-configs/{id}/versions/{version}/restore` | | `test_routing_config` | `POST /manage/v1/routing-configs/{id}/test` | ## See also - [Routing (concept)](/docs/concepts/routing) - [Fallback chain guide](/docs/guides/fallback-chain) - [A/B testing guide](/docs/guides/ab-testing) --- > Add, rotate, and monitor provider credentials. # Providers API Provider credentials are the upstream API keys Modelux uses to proxy your requests. Stored encrypted; Modelux never logs plaintext keys. ## List providers ``` GET /manage/v1/providers ``` Returns all provider credentials for the organization. **Response** ```json { "data": [ { "id": "prov_01HXY...", "vendor": "openai", "name": "OpenAI Production", "base_url": null, "status": "active", "health": { "state": "healthy", "p50_latency_ms": 320, "last_check_at": "2026-04-14T12:00:00Z" }, "created_at": "2026-04-01T10:00:00Z" } ] } ``` ## Add a provider ``` POST /manage/v1/providers ``` **Request body** ```json { "vendor": "openai", "name": "OpenAI Production", "api_key": "sk-...", "base_url": null } ``` Valid vendors: `openai`, `anthropic`, `google`, `azure`, `bedrock`, `groq`, `fireworks`. For Azure OpenAI, set `base_url` to your resource endpoint. For Bedrock, pass IAM credentials instead of an API key (see the Bedrock section below). 
## Get a provider ``` GET /manage/v1/providers/{provider_id} ``` ## Update a provider (rotate key) ``` PATCH /manage/v1/providers/{provider_id} ``` **Request body** ```json { "api_key": "sk-new..." } ``` Modelux verifies the new key before swapping it atomically. In-flight requests finish with the old key. ## Delete a provider ``` DELETE /manage/v1/providers/{provider_id} ``` Fails if any routing config still references this provider directly. Detach first, then delete. ## Health ``` GET /manage/v1/providers/{provider_id}/health ``` Returns latency percentiles and success rate over rolling windows. ## Bedrock credentials For AWS Bedrock, send IAM credentials as the `api_key` field: ```json { "vendor": "bedrock", "name": "Bedrock US-West", "api_key": "AKIA...::wJalrXUtnFEMI...::us-west-2", "base_url": null } ``` Format: `ACCESS_KEY_ID::SECRET_ACCESS_KEY::REGION`. Optional `::SESSION_TOKEN` suffix for STS temporary credentials. ## MCP tools | Tool | Maps to | |---|---| | `list_providers` | `GET /manage/v1/providers` | | `get_provider` | `GET /manage/v1/providers/{id}` | | `add_provider` | `POST /manage/v1/providers` | | `update_provider` | `PATCH /manage/v1/providers/{id}` | | `delete_provider` | `DELETE /manage/v1/providers/{id}` | | `get_provider_health` | `GET /manage/v1/providers/{id}/health` | ## See also - [Providers (concept)](/docs/concepts/providers) --- > Create and manage spend caps and alerts. # Budgets API Budgets enforce spending limits across org, project, tag, or end-user scopes. Cap breaches can alert, auto-downgrade, or block. ## List budgets ``` GET /manage/v1/budgets ``` ## Get a budget ``` GET /manage/v1/budgets/{budget_id} ``` Returns the budget config, current spend, and period bounds. ## Create a budget ``` POST /manage/v1/budgets ``` **Request body** ```json { "name": "Q2 production cap", "scope": { "type": "project", "project_id": "proj_01HXY..." 
}, "cap_usd": 500, "period": "monthly", "action_at_cap": "auto_downgrade", "downgrade_to": "@cheap" } ``` Valid `scope.type`: `org`, `project`, `tag`, `end_user`. Valid `action_at_cap`: `alert`, `block`, `auto_downgrade`. ## Update a budget ``` PATCH /manage/v1/budgets/{budget_id} ``` ## Delete a budget ``` DELETE /manage/v1/budgets/{budget_id} ``` ## Reset a budget ``` POST /manage/v1/budgets/{budget_id}/reset ``` Clears current-period spend without waiting for the next reset date. Useful after resolving a cost anomaly. ## Alerts ``` GET /manage/v1/budgets/{budget_id}/alerts POST /manage/v1/budgets/{budget_id}/alerts DELETE /manage/v1/budgets/{budget_id}/alerts/{alert_id} ``` Each alert specifies a threshold percentage (e.g. 80) and one or more channels (email, webhook). ## Events ``` GET /manage/v1/budgets/{budget_id}/events ``` Returns the history of threshold crossings, cap actions, and resets. ## MCP tools | Tool | Maps to | |---|---| | `list_budgets` | `GET /manage/v1/budgets` | | `get_budget` | `GET /manage/v1/budgets/{id}` | | `create_budget` | `POST /manage/v1/budgets` | | `update_budget` | `PATCH /manage/v1/budgets/{id}` | | `delete_budget` | `DELETE /manage/v1/budgets/{id}` | | `reset_budget` | `POST /manage/v1/budgets/{id}/reset` | | `list_budget_alerts` | `GET /manage/v1/budgets/{id}/alerts` | | `create_budget_alert` | `POST /manage/v1/budgets/{id}/alerts` | | `delete_budget_alert` | `DELETE /manage/v1/budgets/{id}/alerts/{aid}` | | `list_budget_events` | `GET /manage/v1/budgets/{id}/events` | ## See also - [Budgets (concept)](/docs/concepts/budgets) - [Cost optimization guide](/docs/guides/cost-optimization) --- > Query request logs, analytics reports, and decision traces. # Analytics & Logs API Query aggregated metrics and individual request logs. 
## Analytics report ``` GET /manage/v1/analytics/report ``` **Query parameters** | Name | Type | Description | |---|---|---| | `start` | ISO 8601 | Window start | | `end` | ISO 8601 | Window end | | `granularity` | enum | `hour`, `day` | | `project_id` | string | Scope to one project | | `group_by` | enum | `model`, `provider`, `project`, `tag:`, `end_user` | | `tags` | JSON | Additional tag filters (e.g. `{"tenant":"acme"}`) | | `include_comparison` | bool | Include previous-period series | **Response (abbreviated)** ```json { "series": [ { "bucket": "2026-04-14T00:00:00Z", "requests": 1247, "cost_usd": 3.142, "input_tokens": 98213, "output_tokens": 45021, "errors": 3, "p50_latency_ms": 238, "p95_latency_ms": 801, "p99_latency_ms": 1420 } ], "totals": { "requests": 1247, "cost_usd": 3.142, "error_rate": 0.0024 } } ``` ## Decisions summary ``` GET /manage/v1/analytics/decisions ``` Aggregate routing decisions: per config, which attempts ran, how often fallbacks fired, why. ## Request logs ``` GET /manage/v1/logs ``` **Query parameters** | Name | Type | Description | |---|---|---| | `start` | ISO 8601 | Window start | | `end` | ISO 8601 | Window end | | `project_id` | string | Filter | | `status` | string | Filter by status class: `2xx`, `4xx`, `5xx` | | `model` | string | Filter by model name | | `provider` | string | Filter by provider | | `end_user` | string | Filter by end-user tag | | `tags` | JSON | Tag key-value filters | | `min_latency_ms` | integer | Slow-query filter | | `cursor` | string | Pagination | Returns a paginated list of request summaries. Use the single-request endpoint below for full details. ## Single request ``` GET /manage/v1/logs/{request_id} ``` Returns the full request: input messages (if retention allows), output, decision trace, per-attempt metrics, cost breakdown. ## Request trace ``` GET /manage/v1/logs/{request_id}/trace ``` Just the decision trace: attempts, timings, reasons, final decision. 
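The logs endpoint above paginates with an opaque `cursor` query parameter. A sketch of draining all pages, written against a pluggable fetch function so it runs standalone; the response field names `data` and `next_cursor` are assumptions, since the docs only specify the `cursor` parameter:

```python
def drain(fetch_page, params=None):
    """Follow cursor pagination until the server stops returning a cursor.

    `fetch_page(params)` stands in for a GET /manage/v1/logs call.
    The `data` / `next_cursor` response field names are assumptions.
    """
    params = dict(params or {})
    items = []
    while True:
        page = fetch_page(params)
        items.extend(page["data"])
        cursor = page.get("next_cursor")
        if not cursor:
            return items
        params["cursor"] = cursor

# Fake two-page backend standing in for the logs endpoint.
pages = {
    None: {"data": [{"id": "req_1"}, {"id": "req_2"}], "next_cursor": "c2"},
    "c2": {"data": [{"id": "req_3"}], "next_cursor": None},
}
logs = drain(lambda p: pages[p.get("cursor")], {"status": "5xx"})
print([r["id"] for r in logs])
# ['req_1', 'req_2', 'req_3']
```

The same loop applies to any cursor-paginated listing in this API, such as webhook deliveries.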
## Replay ``` POST /manage/v1/logs/{request_id}/replay ``` Re-run a request against a specified routing config. Useful for debugging individual requests after a config change. **Request body** ```json { "routing_config": "@candidate-v2", "dry_run": false } ``` ## MCP tools | Tool | Maps to | |---|---| | `get_analytics_report` | `GET /manage/v1/analytics/report` | | `get_decisions_summary` | `GET /manage/v1/analytics/decisions` | | `list_logs` | `GET /manage/v1/logs` | | `get_log` | `GET /manage/v1/logs/{id}` | | `get_request_trace` | `GET /manage/v1/logs/{id}/trace` | | `replay_log_entry` | `POST /manage/v1/logs/{id}/replay` | ## See also - [Analytics & Logs (concept)](/docs/concepts/analytics) --- > Manage webhook endpoints and deliveries. # Webhooks API Webhooks deliver Modelux events to your own infrastructure. Each endpoint subscribes to one or more event types and is called asynchronously with HMAC-signed payloads. ## List endpoints ``` GET /manage/v1/webhooks/endpoints ``` ## Get an endpoint ``` GET /manage/v1/webhooks/endpoints/{endpoint_id} ``` ## Create an endpoint ``` POST /manage/v1/webhooks/endpoints ``` **Request body** ```json { "url": "https://your-app.example.com/hooks/modelux", "event_types": [ "budget.threshold_reached", "routing_config.updated" ], "description": "Production webhook" } ``` Returns the endpoint with a generated `signing_secret`. Shown once — save it for verifying delivery signatures. ## Update an endpoint ``` PATCH /manage/v1/webhooks/endpoints/{endpoint_id} ``` Supports updating `url`, `event_types`, `description`, and `active` flag. ## Delete an endpoint ``` DELETE /manage/v1/webhooks/endpoints/{endpoint_id} ``` Soft-delete. In-flight deliveries still complete. ## Rotate signing secret ``` POST /manage/v1/webhooks/endpoints/{endpoint_id}/rotate-secret ``` Generates a new signing secret. Returned once in the response. The old secret remains valid for 24 hours to allow graceful rollover in your verifier. 
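Because the old secret stays valid for 24 hours after a rotation, a verifier should check deliveries against both secrets during the rollover window. A sketch of that check, assuming HMAC-SHA256 over the raw request body with a hex digest; the actual signature scheme and header name are not specified here, so confirm them before relying on this:

```python
import hashlib
import hmac

def verify(payload: bytes, signature: str, secrets) -> bool:
    """Check a webhook delivery signature against one or more secrets.

    Accepting a list of secrets lets the verifier keep both the old and
    new secret during the 24-hour rollover after a rotate-secret call.
    HMAC-SHA256 with a hex digest is an assumption about the scheme.
    """
    for secret in secrets:
        expected = hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()
        # Constant-time comparison to avoid leaking the signature via timing.
        if hmac.compare_digest(expected, signature):
            return True
    return False

body = b'{"event_type": "budget.threshold_reached"}'
old, new = "whsec_old", "whsec_new"
sig = hmac.new(new.encode(), body, hashlib.sha256).hexdigest()

print(verify(body, sig, [old, new]))   # True
print(verify(body, sig, [old]))        # False
```

Verify over the raw bytes of the body, not a re-serialized JSON object, since any reordering or whitespace change would break the signature.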
## Send test event ``` POST /manage/v1/webhooks/endpoints/{endpoint_id}/test ``` **Request body** ```json { "event_type": "budget.threshold_reached" } ``` Sends a synthetic event to the endpoint for connectivity testing. ## List deliveries ``` GET /manage/v1/webhooks/deliveries ``` **Query parameters** | Name | Type | Description | |---|---|---| | `endpoint_id` | string | Filter to one endpoint | | `status` | enum | `pending`, `delivered`, `failed` | | `event_type` | string | Filter by event type | | `cursor` | string | Pagination | ## Get a delivery ``` GET /manage/v1/webhooks/deliveries/{delivery_id} ``` Returns the delivery payload, response status, response body, and attempt history. ## Replay a delivery ``` POST /manage/v1/webhooks/deliveries/{delivery_id}/replay ``` Re-send the same payload. Useful after fixing an endpoint outage. ## List event types ``` GET /manage/v1/webhooks/event-types ``` Returns all event types with a short description. Useful for building configuration UIs. ## MCP tools | Tool | Maps to | |---|---| | `list_webhook_endpoints` | `GET /manage/v1/webhooks/endpoints` | | `get_webhook_endpoint` | `GET /manage/v1/webhooks/endpoints/{id}` | | `create_webhook_endpoint` | `POST /manage/v1/webhooks/endpoints` | | `update_webhook_endpoint` | `PATCH /manage/v1/webhooks/endpoints/{id}` | | `delete_webhook_endpoint` | `DELETE /manage/v1/webhooks/endpoints/{id}` | | `rotate_webhook_secret` | `POST /manage/v1/webhooks/endpoints/{id}/rotate-secret` | | `send_webhook_test` | `POST /manage/v1/webhooks/endpoints/{id}/test` | | `list_webhook_deliveries` | `GET /manage/v1/webhooks/deliveries` | | `get_webhook_delivery` | `GET /manage/v1/webhooks/deliveries/{id}` | | `replay_webhook_delivery` | `POST /manage/v1/webhooks/deliveries/{id}/replay` | | `list_webhook_event_types` | `GET /manage/v1/webhooks/event-types` | ## See also - [Webhooks (concept)](/docs/concepts/webhooks) ---