# Modelux Documentation (Full) > The control plane for LLMs. Every Modelux docs page concatenated into one file for ingestion by LLMs with long context windows. Source: https://modelux.ai/llms-full.txt Individual pages: https://modelux.ai/docs/.md Docs tree (JSON): https://modelux.ai/docs.json --- # Section: Getting Started --- > Modelux is the control plane for LLMs. Policy-driven routing, finance-grade budgets, decision traces, and replay across every provider. # Modelux Docs Welcome. Modelux is the **control plane for your LLM stack**. You point your OpenAI SDK at Modelux and get policy-driven routing across every provider, finance-grade budgets, full decision traces, and a replay simulator — without changing your application code. ## Start here - **[Quickstart](/docs/quickstart)** — Send your first request in under 2 minutes. - **[Concepts / Routing](/docs/concepts/routing)** — How routing configs work. - **[API Reference](/docs/api/overview)** — The proxy and management APIs. ## What Modelux does - **Policy-driven routing.** Fallback chains, cost-optimized, latency-optimized, ensembles, A/B tests, cascades, custom rule DSL across OpenAI, Anthropic, Google, Azure, Bedrock, Groq, Fireworks. - **Finance-grade budgets.** Scoped spend caps with auto-downgrade, alerts, and tag-based attribution. - **Decision-level observability.** Every request stores the full routing decision: attempts, reasons, per-attempt timings and costs. - **Replay & versioning.** Configs are versioned with one-click rollback. Replay historical traffic against candidate configs before you ship them. - **Audit & governance.** Audit log, role-based access, SSO/SAML, IP allowlists. - **AI-native management.** REST API + MCP server — manage everything from your AI agent. 
## What Modelux doesn't do - Prompt management / versioning (use a dedicated tool) - Model fine-tuning or hosting (we route to providers) - Prompt evaluation (planned, not shipped) --- > Send your first request through Modelux in under 2 minutes. # Quickstart Two minutes from zero to routing. This guide walks you through: creating an account, adding a provider, creating a project + API key, and sending your first request. ## 1. Create an account Go to [app.modelux.ai](https://app.modelux.ai/login) and sign in with Google or a passwordless email link. When you log in for the first time, Modelux creates a personal organization for you. ## 2. Add a provider Modelux is BYO-keys — we proxy requests using your own provider credentials. 1. Open **Providers** in the sidebar. 2. Click **Add provider**. 3. Pick a provider (OpenAI, Anthropic, Google, Azure, Bedrock, etc.). 4. Paste your API key. Modelux stores it encrypted and verifies it with a test call. ## 3. Create a project Projects group routing configs, API keys, and usage analytics. 1. Open **Projects** in the sidebar. 2. Click **Create project**. Give it a name like `my-app`. 3. Create an API key scoped to the project — it'll be shown once, prefixed with `mlx_sk_`. ## 4. Configure routing (optional) By default, you can call any model directly by name (`gpt-4o`, `claude-sonnet-4-5`, etc.) and Modelux will route it to the matching provider. For more advanced routing — fallbacks, ensembles, cost optimization — create a **routing config** under **Routing** in the sidebar. Each config gets a stable slug like `@production` that your app calls instead of a raw model name. ## 5. Send your first request The OpenAI SDK works unchanged. 
Just swap the `base_url` and API key: ```python from openai import OpenAI client = OpenAI( base_url="https://api.modelux.ai/v1", api_key="mlx_sk_...", ) response = client.chat.completions.create( model="gpt-4o-mini", # or "@production" for a routing config messages=[{"role": "user", "content": "Hello!"}], ) print(response.choices[0].message.content) ``` ```javascript import OpenAI from "openai"; const client = new OpenAI({ baseURL: "https://api.modelux.ai/v1", apiKey: process.env.MODELUX_API_KEY, }); const response = await client.chat.completions.create({ model: "gpt-4o-mini", messages: [{ role: "user", content: "Hello!" }], }); console.log(response.choices[0].message.content); ``` ```bash curl https://api.modelux.ai/v1/chat/completions \ -H "Authorization: Bearer $MODELUX_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Hello!"}] }' ``` ## What to do next - **[Routing concepts](/docs/concepts/routing)** — Understand how routing configs work. - **[Set up a fallback chain](/docs/guides/fallback-chain)** — Reliability in 5 minutes. - **[Cost optimization](/docs/guides/cost-optimization)** — Cut your bill with smart routing. - **[MCP setup](/docs/guides/mcp-setup)** — Manage Modelux from Claude Code. --- # Section: Concepts --- > How routing configs work in Modelux. # Routing A **routing config** is a named, versioned resource that tells Modelux how to handle a request. Your application calls Modelux with a routing config slug (like `@production`) and Modelux decides which model(s) and provider(s) to actually invoke. Routing configs live in Modelux, not in your code. Change the routing behavior without redeploying your app. ## Calling a routing config Use the slug prefixed with `@` as the model name: ```python client.chat.completions.create( model="@production", messages=[...] 
) ``` You can also call raw model names (`gpt-4o`, `claude-sonnet-4-5`) directly — Modelux auto-routes them to the matching provider using your credentials. ## Strategies | Strategy | Description | |---|---| | `single` | Lock traffic to one model + provider. | | `fallback` | Ordered list of attempts with per-attempt timeouts. Retries on 429, 5xx, timeout. | | `cost_optimized` | Pick the cheapest model meeting a quality tier, from an allowlist. | | `latency_optimized` | Route to the lowest-p50-latency healthy provider. | | `ensemble` | Parallel fan-out + aggregation (voting, first-valid, weighted). | | `ab_test` | Percentage-based split across sub-configs. | | `cascade` | Sequential attempts with early stop on success. | | `custom_rules` | Programmable DSL over cost, latency, budget, tags. | ## Versioning Every save creates a new version. You can: - **Diff** two versions side-by-side - **Rollback** to any previous version with one click - **Promote** a candidate from a simulation result ## Model aliases Instead of hardcoding `gpt-4o-mini` in every request, create a routing config at `@fast` or `@cheap` and reference those slugs. Change the underlying model later without touching your app code. ## Tags Tag requests with arbitrary key-value pairs to scope routing, analytics, and budgets: ```python client.chat.completions.create( model="@production", messages=[...], extra_body={ "mlx:tags": { "tenant": "acme", "feature": "summarize", }, }, ) ``` Custom rules can branch on tags: `if tenant == "enterprise" then use @premium else use @production`. --- > Managing provider credentials in Modelux. # Providers A **provider** is an upstream LLM vendor — OpenAI, Anthropic, Google, Azure OpenAI, AWS Bedrock, Groq, Fireworks. Modelux proxies your requests using provider credentials you supply (BYO keys). We don't mark up per-token costs. 
## Supported providers | Provider | Status | |---|---| | OpenAI | Shipped | | Anthropic | Shipped | | Google (Gemini) | Shipped | | Azure OpenAI | Shipped | | AWS Bedrock | Shipped | | Groq | In progress | | Fireworks | In progress | ## Adding a provider 1. Open **Providers** in the dashboard. 2. Click **Add provider**. 3. Select the vendor, paste your API key, optionally set a base URL for self-hosted or regional endpoints. 4. Modelux stores the credential encrypted and runs a verification call before marking it active. ## Health monitoring Modelux tracks provider health continuously: - **Success rate** — rolling window of 2xx vs 4xx/5xx - **p50 latency** — per-model, per-region where applicable - **Last check timestamp** — indicates how fresh the health signal is When a provider is marked unhealthy, health-aware routing strategies automatically prefer other providers until it recovers. ## Credential rotation Rotate a provider's API key without downtime: 1. Edit the provider in the dashboard 2. Paste the new key and save 3. Modelux verifies the new key, then atomically swaps it Old in-flight requests finish with the old key; new requests pick up the new key immediately. ## Custom base URLs For Azure OpenAI deployments, self-hosted vLLM endpoints, or regional Bedrock routes, set a custom base URL when creating the provider. Modelux will use that URL for all requests routed to this provider. --- > Projects, API keys, and access control. # Projects & API Keys A **project** is a logical grouping of API keys, routing configs, and usage analytics within an organization. Typical pattern: one project per app, one project per environment (prod/staging), or one project per customer. 
## Projects Projects give you: - **Scoped API keys** — each key belongs to exactly one project - **Scoped routing configs** — configs are created within a project - **Scoped analytics** — request logs, cost, and latency filterable per project - **Scoped budgets** — spend caps can apply to a single project ## API keys Modelux API keys use the prefix `mlx_sk_`. They're shown once at creation time — copy it somewhere safe. We store a SHA-256 hash, so we can't show you the key again later. Each key can have: - **Name** — for humans (e.g., "staging-server") - **Optional expiry** — auto-revoke after N days - **Optional rate limit** — requests per minute - **Revoked** status — revoke at any time without rotating other keys ## Using an API key Pass it as a Bearer token: ```bash Authorization: Bearer mlx_sk_xxxxxxxxxxxx ``` Or via the OpenAI SDK: ```python client = OpenAI( base_url="https://api.modelux.ai/v1", api_key="mlx_sk_...", ) ``` ## Project-scoped routing Routing configs belong to a project, but their slug (`@production`) is unique within the project. A request authenticated with a project's API key can only reference routing configs from that project. --- > Spend caps, alerts, and auto-downgrade. # Budgets Budgets let you set spending limits at multiple scopes and enforce them automatically. Modelux tracks per-request cost across every provider and applies your budget rules in real time. 
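For intuition, the per-request cost that budgets enforce is just token counts multiplied by per-token prices. A minimal sketch of that arithmetic (the prices below are illustrative placeholders, not a real rate card):

```python
# Illustrative per-1M-token prices in USD -- placeholders, not actual provider pricing.
PRICES = {
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
    "claude-haiku-4-5": {"input": 1.00, "output": 5.00},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one request: tokens times per-token price."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

cost = request_cost("gpt-4o-mini", input_tokens=1_200, output_tokens=400)
```

Budgets then compare the running sum of these per-request costs against your caps in real time.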
## Scopes A budget applies to one of: - **Organization** — total spend across all projects - **Project** — spend on a specific project - **End-user** — spend per end-user tag - **Tag** — spend scoped by any custom tag key ## Caps Each budget has: - **Monthly cap** — reset on the 1st of each month, or a custom reset schedule - **Soft cap** — alert threshold (defaults to 80%) - **Hard cap** — enforcement action when reached ## Actions When a hard cap is hit, Modelux can: - **Alert only** — send an email / webhook; continue serving - **Auto-downgrade** — automatically route to a cheaper model (configured per budget) - **Block** — return `402 Payment Required` with an upgrade prompt ## Alerts Every budget supports multiple alert thresholds with configurable actions. Alerts fire via: - Email to designated recipients - Webhook (Slack-compatible format supported) - Dashboard banner ## Forecasting Modelux shows a projected end-of-month spend based on current rate, plus period-over-period comparisons so you can catch cost regressions quickly. --- > Request logging, analytics, and decision traces. # Analytics & Logs Modelux captures every request in ClickHouse for fast analytics and keeps a full decision trace so you can answer "why did this request go to that model" for anything in your history. ## Request logs Every request gets a log entry with: - **Timestamp, model, provider, project** - **Input and output** (tokens + optional full content based on retention config) - **Cost** — input cost, output cost, total, in USD - **Latency** — time-to-first-token and total latency - **Status** — 2xx/4xx/5xx with error class on failures - **Decision trace** — attempts, reasons, per-attempt metrics - **Tags** — whatever custom tags you attached to the request Browse logs in the dashboard or query via the management API. Logs are searchable by tag, user, project, status, latency threshold, and time range. 
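The projected end-of-month spend in **Forecasting** is, at its simplest, a linear run-rate extrapolation. A sketch of that idea (Modelux's actual forecast model isn't documented here, so treat this as intuition only):

```python
import calendar
from datetime import date

def projected_month_spend(spend_to_date: float, today: date) -> float:
    """Extrapolate spend so far linearly to the end of the month."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return spend_to_date / today.day * days_in_month

# e.g. $150 spent by the 10th of a 30-day month projects to $450
projection = projected_month_spend(150.0, date(2025, 6, 10))
```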
## Analytics The analytics page aggregates logs into: - **Volume** — requests over time, stacked by model or provider - **Cost** — total cost over time, broken down by model/provider/project/tag - **Latency** — p50/p95/p99 percentiles per model - **Error rates** — grouped by error class and provider - **End users** — top users by spend, volume, latency - **Forecasts** — projected monthly spend with period-over-period comparison Filters: date range, project, model, provider, status, tags. ## Decision traces A decision trace answers: what routing strategy ran, which attempts were tried, why they succeeded or failed, and what was chosen. Example: ``` config: @production (v4) strategy: fallback attempt_1 claude-haiku-4-5 timeout (2000ms) attempt_2 gpt-4o-mini 200 OK decision: fallback → attempt_2 reason: primary timeout, secondary healthy ``` Click any request in the dashboard to see the full trace. ## Log retention Retention varies by tier: 7 days (Free), 30 days (Pro), 60 days (Team), 90+ days (Enterprise, configurable). Structured analytics are retained longer than raw request/response payloads. --- > Run multiple models in parallel and aggregate their outputs. # Ensembles An ensemble routing config fans a single request out to multiple models in parallel, then aggregates their responses into a single final output. Done right, ensembles of smaller models can match or exceed frontier-model quality at a fraction of the cost. 
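The aggregation step is easiest to picture as a weighted vote over member outputs. A minimal sketch, assuming classification-style outputs (intuition only, not Modelux's implementation):

```python
from collections import defaultdict

def weighted_vote(outputs: list[tuple[str, float]]) -> str:
    """Pick the answer with the highest total member weight."""
    tally = defaultdict(float)
    for answer, weight in outputs:
        tally[answer] += weight
    return max(tally, key=tally.get)

winner = weighted_vote([
    ("spam", 1.0),      # e.g. claude-haiku-4-5
    ("not_spam", 1.0),  # e.g. gpt-4o-mini
    ("spam", 0.8),      # e.g. gemini-2.5-flash
])
# "spam" wins the tally 1.8 to 1.0
```

The weights mirror the per-member `weight` fields in an ensemble config.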
## Aggregation strategies

| Strategy | Description | Best for |
|---|---|---|
| `first_valid` | Return the first attempt that passes validation | Latency-sensitive with reliability fallback |
| `weighted_vote` | Classification-style vote across outputs | Categorical / structured outputs |
| `weighted_average` | Numeric outputs combined by weight | Scoring, ratings |
| `llm_judge` | Send all outputs to a judge model for best-pick | Open-ended generation |

## Configuration

```json
{
  "strategy": "ensemble",
  "aggregation": "weighted_vote",
  "members": [
    { "model": "claude-haiku-4-5", "weight": 1.0 },
    { "model": "gpt-4o-mini", "weight": 1.0 },
    { "model": "gemini-2.5-flash", "weight": 0.8 }
  ],
  "timeout_ms": 5000
}
```

## Cost math

A 3-model ensemble of cheap models costs roughly 3x the cost of one cheap model. Example:

- Frontier model (e.g. GPT-4o): ~$0.015 per 1k tokens
- 3-model ensemble (haiku + 4o-mini + flash): ~$0.003 per 1k tokens

That's 5x cheaper, often at comparable quality for many tasks. The ensemble cost estimator in the dashboard shows live per-request cost based on your typical prompt size.

## When to use ensembles

Good fits:

- Structured / classification tasks where voting helps
- Quality-critical tasks where you'd otherwise use a frontier model
- Tasks where small model variance is the main quality issue

Less ideal:

- Streaming-heavy workloads (ensembles don't stream)
- Latency-critical (you wait for the slowest member, bounded by timeout)
- Tasks where cheap models already suffice on their own

---

> Exact-match and semantic caching.

# Caching

Modelux caches successful responses keyed by request content. Cache hits skip the provider call entirely and return the cached response with sub-millisecond latency at zero provider cost.
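For exact-match caching, you can picture the cache key as a hash of the raw request body, so any byte of difference is a miss. A sketch of that idea (Modelux's real key derivation is not documented here and presumably also scopes by project and config):

```python
import hashlib

def cache_key(raw_body: bytes) -> str:
    """Exact-match caching: identical request bodies hash to the same key."""
    return hashlib.sha256(raw_body).hexdigest()

a = cache_key(b'{"model": "gpt-4o-mini", "messages": [...]}')
b = cache_key(b'{"model": "gpt-4o-mini", "messages": [...]}')
c = cache_key(b'{"model": "gpt-4o", "messages": [...]}')
# a == b (cache hit); c differs by a few bytes, so it is a miss
```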
## Cache modes | Mode | Match behavior | |---|---| | `exact` | Request body must match byte-for-byte (default) | | `semantic` | Embedding similarity above a threshold (advanced) | ## TTL Configure TTL per routing config: ```json { "cache": { "mode": "exact", "ttl_seconds": 3600 } } ``` ## What gets cached - Successful chat completions and embeddings - Streaming responses (stitched into a complete response server-side) - Structured outputs (JSON mode) ## What doesn't get cached - Failed responses (4xx, 5xx) - Requests with `temperature > 0` unless you opt in explicitly - Requests with tool-calling (by default; opt-in available) ## Cache-control per-request Override the cache at request time: ```python client.chat.completions.create( model="@production", messages=[...], extra_body={"mlx:cache": {"skip": True}}, ) ``` --- > Event-driven integrations via webhooks. # Webhooks Webhooks let you react to Modelux events in your own infrastructure: budget alerts, config changes, provider health transitions, request anomalies. ## Event types - `budget.threshold_reached` - `budget.exceeded` - `routing_config.updated` - `routing_config.created` - `routing_config.deleted` - `provider.health_changed` - `api_key.revoked` - `request.anomaly_detected` ## Endpoint setup 1. Open **Integrations -> Webhooks** in the dashboard. 2. Click **Add endpoint**. Enter the destination URL. 3. Select which event types to subscribe to. 4. Modelux generates a signing secret. Save it to verify incoming payloads. ## Signature verification Every delivery includes an `X-Modelux-Signature` header with an HMAC-SHA256 of the raw body using your signing secret. 
```javascript
import crypto from "crypto";

function verify(body, signature, secret) {
  const expected = crypto
    .createHmac("sha256", secret)
    .update(body)
    .digest("hex");
  const sig = Buffer.from(signature);
  const exp = Buffer.from(expected);
  // timingSafeEqual throws if the buffers differ in length, so check first
  if (sig.length !== exp.length) return false;
  return crypto.timingSafeEqual(sig, exp);
}
```

## Delivery & retries

- Deliveries run asynchronously through a durable queue
- On non-2xx response or timeout: retried with exponential backoff up to 24h
- The dashboard shows per-delivery status, payload, response body, and a replay button for manual redelivery

## Slack-compatible format

Set the endpoint URL to a Slack webhook URL. Modelux detects it and formats the payload as a Slack message automatically.

---

# Section: Guides

---

> Move your app from direct OpenAI to Modelux in 3 lines.

# Migrating from OpenAI

If your app already uses the OpenAI SDK, you can point it at Modelux with three changes — no other code modifications needed.

## 1. Add OpenAI as a provider in Modelux

In the dashboard, add your existing OpenAI API key as a provider. Modelux will use that key to proxy requests — you keep your existing OpenAI account, billing, and rate limits.

## 2. Create a Modelux API key

Create a project, then generate an API key scoped to it. Copy the `mlx_sk_...` value.

## 3. Update your client config

Swap the API key and point `base_url` at Modelux:

```diff
 from openai import OpenAI

 client = OpenAI(
-    api_key=os.environ["OPENAI_API_KEY"],
+    base_url="https://api.modelux.ai/v1",
+    api_key=os.environ["MODELUX_API_KEY"],
 )
```

That's it. Your existing `client.chat.completions.create(...)` calls work unchanged. Model names like `gpt-4o-mini` are routed to OpenAI through your credentials.
## What you get for free Just by routing through Modelux, with zero other code changes: - **Full request logs** with searchable traces - **Per-request cost tracking** - **Latency percentiles** by model - **Team-level analytics** if multiple apps share one org ## Next steps Once traffic is flowing, you can add value without further code changes: - **Add a fallback chain** to improve reliability — create a routing config `@production` that falls back from `gpt-4o-mini` to `claude-haiku-4-5`, then update your app to call `model="@production"` instead of `gpt-4o-mini`. - **Set a monthly budget** with auto-downgrade to cap your spend. - **Enable the replay simulator** to test changes against historical traffic. ## Streaming still works Streaming responses pass through unchanged: ```python stream = client.chat.completions.create( model="@production", messages=[...], stream=True, ) for chunk in stream: print(chunk.choices[0].delta.content or "", end="") ``` --- > Build a reliable routing config with automatic failover. # Setting up a fallback chain Fallback chains are the simplest way to improve reliability. You define an ordered list of model attempts with per-attempt timeouts. If the first attempt fails (timeout, 429, or 5xx), Modelux automatically retries with the next model. ## Why fallbacks? - Every provider has outages, rate limits, and intermittent failures - Different models fail at different times — a good fallback chain is rarely fully down - Your app sees a consistent response — no retry logic to write ## Create the config In the dashboard, go to **Routing -> Create config**, pick **Fallback**, and add attempts: ```json { "strategy": "fallback", "attempts": [ { "model": "claude-haiku-4-5", "timeout_ms": 2000 }, { "model": "gpt-4o-mini", "timeout_ms": 3000 }, { "model": "gemini-2.5-flash", "timeout_ms": 5000 } ], "retry_on": ["429", "5xx", "timeout"] } ``` ## Tips ### Primary: fast, cheap, usually good Put your preferred cheap+fast model first. 
Use an aggressive timeout (1-2s for simple prompts, longer for reasoning tasks).

### Secondary: different provider

Diversify across providers — if OpenAI is down, Anthropic usually isn't.

### Tertiary: conservative timeout

Longer timeout on the last attempt; you've already burned latency budget on the first two.

### Don't chain too many

Three attempts is usually enough. More than that and you'll hit your overall request timeout before the last attempts even run.

## Verify

Check the decision trace of a few requests in **Logs** to see which attempt actually served each request. Over time, the **Analytics** page will show attempt distribution.

## Call it from your app

```python
client.chat.completions.create(
    model="@production",  # the slug of the routing config
    messages=[...],
)
```

---

> Strategies to cut your LLM bill with Modelux.

# Cost optimization

LLM bills scale with usage. Modelux gives you four levers to cut costs without code changes: smaller models for simpler traffic, ensembles of cheap models, budget enforcement, and caching.

## Lever 1: Right-size the model

Most teams over-provision. GPT-4o isn't needed for a classification task that GPT-4o-mini handles fine. Use a **cost-optimized** routing config:

```json
{
  "strategy": "cost_optimized",
  "quality_tier": "standard",
  "allowed_models": [
    "gpt-4o-mini",
    "claude-haiku-4-5",
    "gemini-2.5-flash"
  ]
}
```

Modelux picks the cheapest allowed model that meets the quality tier. You'll see typical savings of 50-70% vs. using a frontier model for everything.

## Lever 2: Ensembles

For quality-critical tasks where you'd otherwise reach for a frontier model, try a 3-model ensemble of cheap models. Voting across multiple cheap models often matches frontier-model quality at 20% of the cost. See [Ensembles](/docs/concepts/ensembles) for full configuration details.
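To make the ensemble lever concrete, here is the back-of-envelope math using the ballpark per-1k-token figures from the Ensembles page (your real token prices and volumes will differ):

```python
def monthly_cost(price_per_1k_tokens: float, tokens_per_month: int) -> float:
    """Monthly spend at a flat per-1k-token price."""
    return price_per_1k_tokens * tokens_per_month / 1_000

TOKENS_PER_MONTH = 500_000_000  # illustrative volume: 500M tokens/month

frontier = monthly_cost(0.015, TOKENS_PER_MONTH)  # single frontier model
ensemble = monthly_cost(0.003, TOKENS_PER_MONTH)  # 3-model cheap ensemble
savings = frontier - ensemble
# at these rates: $7,500 vs $1,500 per month, i.e. 5x cheaper
```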
## Lever 3: Budget caps with auto-downgrade Set a monthly budget with auto-downgrade: - Cap: $500/month on your production project - At 80%: email alert - At 100%: automatically downgrade all traffic to a cheaper routing config Your app keeps serving requests — just at a lower cost — until the budget resets next month. ## Lever 4: Caching Enable exact-match caching on routing configs where prompts repeat. Cache hits cost $0. Even 10% cache hit rate on a high-volume endpoint pays for Modelux many times over. ## Measure the savings Go to **Analytics -> Period comparison**. Set the baseline to the month before you enabled cost optimization. You'll see period-over-period cost and volume diffs side-by-side. ## A typical result A team spending $10k/month on GPT-4o: - Adds a cost-optimized config routing 60% of traffic to `gpt-4o-mini` + `claude-haiku-4-5` - Keeps 40% on `gpt-4o` for complex queries - New monthly bill: ~$5,200 - Modelux cost: $199/month (Team tier) - **Net savings: ~$4,600/month** --- > Run controlled experiments to compare models in production. # A/B testing models A/B tests route a configurable percentage of traffic to each sub-config so you can compare cost, latency, and quality in real production traffic. ## Why A/B test? - Changing models is high-stakes. Benchmarks don't match your specific use case. - Ensemble configs are especially tricky — aggregation behavior depends on your data distribution. - Cost/latency claims from vendors rarely match your real numbers. ## Create an A/B test ```json { "strategy": "ab_test", "variants": [ { "weight": 80, "config": "@production" }, { "weight": 20, "config": "@production-candidate" } ] } ``` Call the wrapper config from your app: ```python client.chat.completions.create( model="@experiment", messages=[...], ) ``` Modelux logs which variant ran per request, so you can compare. ## Read the results Go to **Analytics -> Compare variants**. 
Modelux shows side-by-side:

- Request volume
- Mean cost per request
- p50 / p95 latency
- Error rate

If you tag requests with a quality signal from your app (e.g., user thumbs-up/down), the analytics can also compare quality metrics across variants.

## Promote a variant

Once you've seen enough volume to be confident, promote the winner:

1. Go to **Simulations** or the routing config's versions view
2. Select the variant
3. Click **Promote** — Modelux atomically switches your traffic over

## Replay before you A/B

If you want signal before sending real traffic, use the **replay simulator**: take the last 24h of requests and run them through the candidate config. You'll see the cost/latency diff without risking production quality.

---

> Manage Modelux from Claude Code, Cursor, or any MCP client.

# Connecting your AI (MCP)

Modelux exposes every management action as a tool through the Model Context Protocol (MCP). Connect an MCP-aware AI client — Claude Code, Cursor, etc. — and you can manage routing, budgets, providers, analytics, and more through natural language.

## Get your MCP URL and management token

1. In the dashboard, go to **Settings -> API & MCP**.
2. Copy the MCP server URL: `https://api.modelux.ai/mcp`
3. Create a **management API key** (separate from the proxy API keys) — this token has broader scope and can modify your org's configuration.

## Configure Claude Code

Add the server to your Claude Code MCP settings (a `.mcp.json` in your project, or via `claude mcp add`); in Claude Desktop, the same block goes in `claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "modelux": {
      "url": "https://api.modelux.ai/mcp",
      "headers": {
        "Authorization": "Bearer mlx_mk_..."
      }
    }
  }
}
```

Restart Claude Code. It will connect to the server and list available tools.

## What you can ask

- "Create a cascade that tries haiku first then falls back to sonnet"
- "Show me yesterday's spend by model"
- "Set a $500/month cap on my production project with auto-downgrade to @cheap"
- "Which provider had the highest error rate last week?"
- "Rotate my OpenAI API key and verify it works" - "Replay the last 24 hours of traffic against the candidate config" ## Scope The MCP key is org-scoped. Whoever has the key can perform any management action within that org. Treat it like any privileged credential: - Don't commit it to source control - Rotate periodically - Revoke immediately if exposed ## Tools available Modelux exposes 80+ tools covering the full management API surface: projects, routing configs, providers, budgets, API keys, analytics, audit logs, webhooks, simulations, and org/member management. --- > Connect your identity provider to Modelux so your team signs in with their work account. # SAML SSO overview Modelux supports **SAML 2.0** for workforce single sign-on. Connect your identity provider (IdP) — Okta, Microsoft Entra ID, Google Workspace, JumpCloud, OneLogin, or any SAML-compliant IdP — and members of your org sign in with their work account instead of a password. SSO is available on the **Enterprise** plan. Pair it with [SCIM provisioning](/docs/guides/scim) to automate user lifecycle. ## When do I need this? You need SSO when any of these apply: - Your IT / security team requires federated auth for every SaaS - You want one-click deactivation when someone leaves the company - You need SOC 2 or ISO 27001 evidence for access control - Your organization has more than ~25 seats If none of those apply, password + Google + passwordless email login is probably simpler. ## How Modelux's SAML works Modelux is a **SAML 2.0 service provider (SP)**. Your IdP authenticates the user and sends a signed assertion back to Modelux, which creates the session. Two login entry points are supported: - **SP-initiated:** user visits `app.modelux.ai/login`, clicks "Use SAML SSO", enters their work email, and we redirect them to your IdP. - **IdP-initiated:** user clicks the Modelux tile in their workforce portal (Okta dashboard, myapps.microsoft.com, etc.) and lands straight in the Modelux dashboard. 
On first login, Modelux **just-in-time provisions** the user into your org with the default role you configured. If they already have a Modelux user with a matching verified email, we merge — no duplicate accounts.

## What you'll need

Before starting:

- An **admin** or **owner** role in your Modelux org
- An account with **app-administrator** rights in your IdP (to register a new SAML app)
- A verified email domain you can add a DNS TXT record to

## High-level setup

The flow is the same for every IdP:

1. In Modelux → **Settings → SSO**, note these three values. Your IdP needs them to register Modelux as a service provider:
   - **SP Entity ID** (e.g. `https://app.modelux.ai/api/sso/metadata`)
   - **Assertion Consumer Service URL** (e.g. `https://app.modelux.ai/api/sso/saml/acs`)
   - **SP metadata XML** (we publish the full metadata document — some IdPs let you import from a URL)
2. Create a new SAML app in your IdP and paste those values.
3. Copy three values **back** from the IdP into Modelux's **Settings → SSO** → Identity provider form:
   - **IdP Entity ID** (often called *Identifier* or *Issuer URL*)
   - **IdP SSO URL** (the SAML 2.0 Single Sign-On URL)
   - **IdP Certificate** (PEM-encoded x509 signing certificate)
4. Set a **default role** for new SSO users (member is usually right).
5. Click **Test connection** — this validates the cert parses without attempting a live login.
6. Click **Save**.
7. Add your email domain under **Email domains**, publish the TXT record we give you at `_modelux.your-domain.com`, and click **Verify**. Once verified, anyone signing in from `you@your-domain.com` is automatically routed to your org's IdP.
8. Once the flow works end-to-end for at least one test user, flip **Require SAML for all members** to block password / Google logins for people in your org.
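**Test connection** only checks that the certificate parses. If you want an extra sanity check when pasting or rotating a cert, you can compute its SHA-256 fingerprint locally and compare it against the one shown in the dashboard; a stdlib-only sketch:

```python
import base64
import hashlib

def cert_fingerprint(pem: str) -> str:
    """SHA-256 fingerprint of a PEM blob's DER bytes, colon-separated."""
    body = "".join(
        line for line in pem.strip().splitlines()
        if "-----" not in line  # drop the BEGIN/END CERTIFICATE markers
    )
    der = base64.b64decode(body)
    digest = hashlib.sha256(der).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))

# Placeholder content, not a real certificate; paste your IdP's PEM here.
sample = "-----BEGIN CERTIFICATE-----\ndGVzdA==\n-----END CERTIFICATE-----"
fp = cert_fingerprint(sample)
```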
## Provider-specific guides - [Okta](/docs/guides/sso-okta) - [Microsoft Entra ID](/docs/guides/sso-entra) (formerly Azure AD) - [Google Workspace](/docs/guides/sso-google) > Using JumpCloud, OneLogin, or another IdP? The setup follows the same > pattern — any IdP that exports standard SAML 2.0 metadata works. > Follow the high-level flow above. If you hit a snag, email > support@modelux.ai and we'll walk you through. ## Attribute mapping Modelux reads these attributes from the SAML assertion (first match wins per field): | Field | Claims we read | | --- | --- | | Email | `email`, `emailAddress`, `mail`, `http://schemas.xmlsoap.org/ws/2005/05/identity/claims/emailaddress`, `urn:oid:0.9.2342.19200300.100.1.3`, or the NameID if it is an email | | First name | `firstName`, `givenName`, `http://schemas.xmlsoap.org/ws/2005/05/identity/claims/givenname`, `urn:oid:2.5.4.42` | | Last name | `lastName`, `surname`, `sn`, `http://schemas.xmlsoap.org/ws/2005/05/identity/claims/surname`, `urn:oid:2.5.4.4` | | Display name | `displayName`, `name`, `http://schemas.xmlsoap.org/ws/2005/05/identity/claims/name`, `cn` | Most IdPs emit the first set of names by default. If yours is picky about URIs, use the OIDs — those are the SAML 2.0 canonical forms. ## Enforcement With **Require SAML for all members** enabled, Modelux rejects every non-SAML login for anyone who is a member of your org. Google and passwordless email sign-ins return a friendly banner pointing them at the SSO flow. Don't enable this until: - You've tested the flow end-to-end with a non-admin account - Your domain is DNS-verified - At least one owner can log in via SSO (so a bad IdP config doesn't lock every owner out — owners retain their SSO identity through provider changes) If you do lock yourself out, email support@modelux.ai — we can disable enforcement for you. ## Rotating the IdP certificate IdP signing certs expire. When yours is about to rotate: 1. Get the new PEM from your IdP 2. 
Paste it into the **IdP certificate** field in Modelux and save 3. The current fingerprint shown next to the field updates — verify it matches what your IdP shows Modelux stores the cert encrypted at rest and only ever audits the SHA-256 fingerprint. --- > Connect Okta as your SAML identity provider for Modelux. # SSO with Okta Step-by-step Okta setup. Assumes you have the **Super Admin** role in your Okta tenant and **admin** or **owner** in Modelux. If you haven't read the [SAML SSO overview](/docs/guides/sso), start there. ## 1. Collect Modelux SP details In a separate browser tab, open Modelux → **Settings → SSO**. You'll copy these values into Okta in step 3: - **SP Entity ID** - **Assertion Consumer Service URL** - **SP metadata XML** URL (optional — Okta doesn't accept metadata import for custom SAML apps; you'll paste the individual fields instead) Keep the tab open; you'll also paste values **back** from Okta. ## 2. Create a SAML app in Okta 1. In the Okta admin console: **Applications → Applications → Create App Integration → SAML 2.0 → Next**. 2. **General Settings:** - App name: `Modelux` - (Optional) Upload the Modelux logo - Click **Next**. ## 3. Configure SAML settings On the **Configure SAML** step: - **Single sign-on URL:** paste Modelux's **Assertion Consumer Service URL**. Check *"Use this for Recipient URL and Destination URL"*. - **Audience URI (SP Entity ID):** paste Modelux's **SP Entity ID**. - **Name ID format:** `EmailAddress` - **Application username:** `Email` ### Attribute Statements Add these (case-sensitive): | Name | Name format | Value | | --- | --- | --- | | `email` | Basic | `user.email` | | `firstName` | Basic | `user.firstName` | | `lastName` | Basic | `user.lastName` | | `displayName` | Basic | `user.displayName` | Click **Next**, pick *"I'm an Okta customer adding an internal app"*, then **Finish**. ## 4. Copy Okta → Modelux Back on your new Okta app's **Sign On** tab, click **View SAML setup instructions**. 
Copy these into the Modelux **Identity provider** form: | Okta field | Modelux field | | --- | --- | | *Identity Provider Issuer* | IdP Entity ID | | *Identity Provider Single Sign-On URL* | IdP SSO URL | | *X.509 Certificate* (download or copy the PEM block) | IdP certificate | Set **Default role** to `member` (the typical choice). Click **Test connection** in Modelux — this confirms the cert parses. Then click **Save**. ## 5. Assign users in Okta - On the Okta app's **Assignments** tab, click **Assign → Assign to People** (or Groups). - Add yourself (and a test user if possible) to the app. ## 6. Verify domain + test 1. In Modelux **Settings → SSO**, add your email domain (e.g. `acme.com`). Publish the DNS TXT record we give you at `_modelux.acme.com` and click **Verify**. 2. In an incognito window, visit [app.modelux.ai/login](https://app.modelux.ai/login). 3. Click **Use SAML SSO**, enter your work email, and confirm you land back in the Modelux dashboard logged in. ## 7. Turn on enforcement Once a non-admin test user has logged in successfully, return to **Settings → SSO** and toggle **Require SAML for all members**. This blocks password / Google logins for any member of your org. > Don't enable enforcement until at least one org **owner** has > successfully signed in via SAML. If the IdP config is wrong and > enforcement is on, everyone is locked out. ## SCIM provisioning (optional) To have Okta push user create / update / deactivate events into Modelux, enable SCIM on the same app. See the [SCIM provisioning guide](/docs/guides/scim#okta). ## Troubleshooting - **"Invalid SAML assertion"** — the cert in Modelux doesn't match the Okta signing cert. Re-copy the x509 from Okta's setup instructions. Make sure you include `-----BEGIN CERTIFICATE-----` and `-----END CERTIFICATE-----`. - **"No SSO configured for this email's domain"** — add the domain in Modelux and verify the TXT record. 
- **Landing on /login instead of the dashboard after clicking the Okta tile** — IdP-initiated SSO is supported, but the app's *Default RelayState* must be empty or `/`. Modelux accepts either, so most Okta setups work out of the box; if you've set a custom RelayState, clear it. --- > Connect Microsoft Entra ID (formerly Azure AD) as your SAML identity provider for Modelux. # SSO with Microsoft Entra ID Step-by-step Entra ID (formerly Azure AD) setup. Assumes you have **Cloud Application Administrator** (or Global Admin) in Entra and **admin** or **owner** in Modelux. If you haven't read the [SAML SSO overview](/docs/guides/sso), start there. ## 1. Collect Modelux SP details In a separate browser tab, open Modelux → **Settings → SSO** and keep these handy: - **SP Entity ID** - **Assertion Consumer Service URL** ## 2. Create an Enterprise Application 1. Go to the [Entra admin center](https://entra.microsoft.com/) → **Applications → Enterprise applications → New application**. 2. Click **Create your own application**. 3. Name it `Modelux`, pick **Integrate any other application you don't find in the gallery (Non-gallery)**, and click **Create**. ## 3. Enable SAML SSO 1. On the app's overview, click **Single sign-on → SAML**. 2. In the **Basic SAML Configuration** panel, click **Edit** and fill in: - **Identifier (Entity ID):** Modelux's **SP Entity ID** - **Reply URL (Assertion Consumer Service URL):** Modelux's **ACS URL** - Leave Sign on URL / Relay State / Logout URL blank - Save. ## 4. Configure attribute claims Entra ships reasonable defaults but they use the SAML 2.0 canonical URIs. Modelux reads those, so the defaults usually "just work." If you want to add friendlier names: 1. In the **Attributes & Claims** panel, click **Edit**. 2.
Add these claims alongside the defaults: | Claim name | Source | Source attribute | | --- | --- | --- | | `email` | Attribute | `user.mail` | | `firstName` | Attribute | `user.givenname` | | `lastName` | Attribute | `user.surname` | | `displayName` | Attribute | `user.displayname` | Make sure the **Unique User Identifier (Name ID)** claim maps to `user.mail` (or `user.userprincipalname` if mail isn't populated for every user). ## 5. Copy Entra → Modelux In the **SAML Certificates** panel: - Download **Certificate (Base64)** — this is the PEM x509. In the **Set up Modelux** panel: | Entra field | Modelux field | | --- | --- | | *Microsoft Entra Identifier* (or Azure AD Identifier) | IdP Entity ID | | *Login URL* | IdP SSO URL | | *Certificate (Base64)* (contents of the downloaded file) | IdP certificate | In Modelux's **Identity provider** form, paste these values, set **Default role** to `member`, click **Test connection**, then **Save**. ## 6. Assign users Back in the Entra Enterprise app → **Users and groups → Add user/group**. Assign yourself and a test user. > Entra defaults to requiring user assignment. If you disable *User > assignment required* on the app's Properties panel, any licensed user > in your tenant can sign in — only do this if that's intentional. ## 7. Verify domain + test 1. In Modelux **Settings → SSO**, add your email domain. Publish the DNS TXT record at `_modelux.` and click **Verify**. 2. In an incognito window, sign in at [app.modelux.ai/login](https://app.modelux.ai/login) via **Use SAML SSO**. ## 8. Turn on enforcement Once verified, toggle **Require SAML for all members** in Modelux. ## SCIM provisioning To automate user lifecycle from Entra, see the [SCIM provisioning guide](/docs/guides/scim#microsoft-entra-id). ## Troubleshooting - **"AADSTS50105" — the signed-in user is not assigned to a role**: add the user under **Enterprise application → Users and groups**, or disable *User assignment required*. 
- **Cert errors in Modelux**: Entra exports the cert as a `.cer` file. Open it in a text editor — if it starts with `-----BEGIN CERTIFICATE-----`, paste it as-is. If it's binary (raw DER), download the **Certificate (Base64)** variant instead. - **Attributes not appearing**: by default Entra emits claims under the `http://schemas.xmlsoap.org/ws/2005/05/identity/claims/...` URIs. Modelux reads those, so it should work without customization. If you overrode the defaults, make sure at least one of the email claims in the [attribute mapping table](/docs/guides/sso#attribute-mapping) is present. --- > Connect Google Workspace as your SAML identity provider for Modelux. # SSO with Google Workspace Step-by-step Google Workspace setup. Requires the **Super Admin** role in Google Workspace and **admin** or **owner** in Modelux. If you haven't read the [SAML SSO overview](/docs/guides/sso), start there. ## 1. Collect Modelux SP details In Modelux → **Settings → SSO**, note: - **SP Entity ID** - **Assertion Consumer Service URL** ## 2. Create a custom SAML app in Google 1. [admin.google.com](https://admin.google.com) → **Apps → Web and mobile apps → Add app → Add custom SAML app**. 2. App name: `Modelux`. Click **Continue**. 3. On the **Google Identity Provider details** page, you'll see three values: - SSO URL - Entity ID - Certificate (click **Download certificate** — this is a `.pem` file) 4. **Keep this browser tab open** — you'll paste these into Modelux in the next step. Click **Continue**. ## 3. Configure the SP side On the **Service Provider details** page: - **ACS URL:** Modelux's **Assertion Consumer Service URL** - **Entity ID:** Modelux's **SP Entity ID** - **Name ID format:** `EMAIL` - **Name ID:** `Basic Information > Primary email` - Click **Continue**. ## 4. 
Configure attribute mapping | Google directory attribute | App attribute | | --- | --- | | `Primary email` | `email` | | `First name` | `firstName` | | `Last name` | `lastName` | | `Full name` (if available) | `displayName` | Click **Finish**. ## 5. Copy Google → Modelux Back on the Google side, re-open the **Google Identity Provider details** panel (you can find it on the app's overview → **SP detailed configure** or by re-entering the wizard): | Google field | Modelux field | | --- | --- | | *SSO URL* | IdP SSO URL | | *Entity ID* | IdP Entity ID | | *Certificate* (contents of the downloaded `.pem`) | IdP certificate | Paste into Modelux's **Identity provider** form, set **Default role** to `member`, click **Test connection**, then **Save**. ## 6. Turn the app on for users In Google Admin, on the Modelux app page: 1. Under **User access**, click the tile and choose **ON for everyone** or **ON for certain organizational units / groups**. 2. Save. Propagation to end users takes a few minutes. ## 7. Verify domain + test 1. In Modelux **Settings → SSO**, add your email domain, publish the TXT record at `_modelux.`, and click **Verify**. 2. Open [app.modelux.ai/login](https://app.modelux.ai/login) in an incognito window → **Use SAML SSO** → enter your work email. ## 8. Turn on enforcement Once the test user logs in successfully, toggle **Require SAML for all members** in Modelux. ## SCIM provisioning Google Workspace supports SCIM but auto-provisioning setup is less common than Okta/Entra. Contact support@modelux.ai if you need help wiring Google's SCIM client. ## Troubleshooting - **"Clock skew"-style errors**: Google-signed assertions are valid for ~10 minutes. If your server clocks drift, verification fails. NTP usually handles this; check your server time matches real time within a few seconds. - **Certificate mismatch**: Google rotates signing certs periodically. 
If users start failing to log in, re-download the certificate from Google and paste the new PEM into Modelux. - **"No Primary email"**: some Google Workspace users (service accounts) don't have a primary email. Those can't use SSO — they weren't real users anyway. --- > Automate user lifecycle (create, update, deactivate) from your IdP using SCIM 2.0. # SCIM user provisioning SCIM 2.0 lets your IdP push user lifecycle events — create, update, deactivate — into Modelux automatically. Without it, you manage seats manually: when someone leaves the company, an admin has to remember to remove them from Modelux. SCIM is available on the **Enterprise** plan. You'll typically set it up alongside [SAML SSO](/docs/guides/sso). ## What SCIM does in Modelux Each SCIM "User" is a **membership** in your Modelux org: - **Create** → add the user to your org. If they already have a Modelux account with the same verified email, we link to it. - **Update / PATCH `active: false`** → mark the membership deactivated. The user can't sign in or use management API keys they created. - **Update / PATCH `active: true`** → reactivate. - **Delete** → remove the membership from your org. The user's global account (if shared with other orgs) is not deleted. Role is transmitted via a Modelux SCIM extension: `urn:modelux:params:scim:schemas:extension:2.0:User`. If your IdP doesn't fill it, new members get the **default role** from your SSO configuration. ## Create a SCIM token 1. In Modelux → **Settings → SSO**, scroll to the **SCIM tokens** card. 2. Click **Create token**. Give it a name like `Okta` or `Entra provisioning`. 3. Copy the token value — it starts with `mlx_scim_`. **It is shown exactly once.** If you lose it, revoke the token and create a new one. The base URL your IdP will POST to is: ``` https://app.modelux.ai/api/scim/v2 ``` ## Okta 1. On your Modelux SAML app in Okta → **General → App Settings**, enable **Provisioning → SCIM** (requires the Lifecycle Management add-on). 2. 
**Base URL:** `https://app.modelux.ai/api/scim/v2` 3. **Unique identifier field for users:** `userName` 4. **Supported provisioning actions:** check *Push New Users* and *Push Profile Updates*; Push Groups is not used. 5. **Authentication Mode:** HTTP Header 6. **Authorization:** `Bearer ` followed by your `mlx_scim_...` token 7. Click **Test API Credentials** — you should get a green checkmark. On the **To App** settings, enable *Create Users*, *Update User Attributes*, and *Deactivate Users*. ## Microsoft Entra ID 1. On your Modelux Enterprise app → **Provisioning → Get started → Automatic**. 2. **Tenant URL:** `https://app.modelux.ai/api/scim/v2` 3. **Secret token:** your `mlx_scim_...` token 4. Click **Test Connection** — Entra should report success. 5. Save, then under **Mappings → Provision Microsoft Entra ID Users**, confirm `userPrincipalName` or `mail` maps to `userName`, and the name/email attributes map to their SCIM equivalents. 6. Set **Provisioning Status** to **On** and save. Entra provisioning cycles every ~40 minutes. Use **Provision on demand** to test a single user immediately. ## Deactivation behavior A SCIM `active: false` patch keeps the membership row but sets `deactivatedAt`. This preserves the audit trail (when they were deactivated, and by which token) and keeps history intact. Reactivating clears the flag; the user gets their previous role back. A SCIM `DELETE /Users/{id}` removes the membership entirely. The global `User` record stays — they may be a member of other Modelux orgs, and their personal account (if any) shouldn't be collateral damage. ## Last-owner protection Modelux refuses to delete or deactivate the **last owner** of an org. SCIM returns a `409 mutability` error. Add another owner first, or contact support if you need a workaround. ## Testing - Create a test user in your IdP, assign them to the Modelux app, and trigger provisioning manually. They should appear under **Settings → Team** within a few seconds (Okta) or minutes (Entra). - Deactivate them in the IdP.
They should appear in Modelux with a *deactivated* indicator and be unable to sign in. - Reactivate and confirm they can sign in again. ## Auditing Every SCIM mutation emits an audit event under **Settings → Audit log**, scoped to the SCIM token that made the call. --- # Section: API Reference --- > Conventions for the Modelux proxy and management APIs. # API overview Modelux exposes two HTTP APIs: - **Proxy API** at `https://api.modelux.ai/v1/*` — OpenAI-compatible surface for running inference. Authenticated with a project API key. - **Management API** at `https://api.modelux.ai/manage/v1/*` — REST API for managing projects, routing configs, providers, budgets, analytics, etc. Authenticated with a management API key. ## Authentication All API calls require a bearer token: ``` Authorization: Bearer mlx_sk_... // proxy / project key Authorization: Bearer mlx_mk_... // management key ``` ### API key prefixes | Prefix | Use | |---|---| | `mlx_sk_` | Project API key — for the proxy API | | `mlx_mk_` | Management API key — for the management API and MCP | ## Base URLs | Environment | Base URL | |---|---| | Production | `https://api.modelux.ai` | | Dev (self-hosted) | `http://localhost:8080` (proxy), `http://localhost:5100` (app) | ## Error format Errors follow the OpenAI error shape for familiarity: ```json { "error": { "type": "invalid_request_error", "message": "Missing required field: messages", "code": "missing_field" } } ``` Management API errors use the same shape with additional fields where useful: ```json { "error": { "type": "not_found", "message": "routing config @missing not found", "code": "routing_config_not_found", "resource": "routing_config", "id": "@missing" } } ``` ## Status codes | Code | Meaning | |---|---| | `200` | Success | | `400` | Bad request — malformed input | | `401` | Missing or invalid authentication | | `402` | Payment required — budget exceeded | | `403` | Forbidden — authenticated but lacks permission | | `404` | Not found | | `409` | 
Conflict — duplicate resource | | `422` | Unprocessable entity — validation failed | | `429` | Rate limited | | `500` | Server error — check status page | | `502`/`504` | Upstream provider error or timeout | ## Rate limits Proxy API limits are per-API-key. Default: 600 requests/minute. Upgrade tiers increase this. Custom limits can be set per key. Management API: 60 req/min per management key. ## Idempotency Mutating management API endpoints accept an `Idempotency-Key` header. Same key + same body returns the original response within 24 hours. ## Pagination Management API list endpoints return: ```json { "data": [...], "next_cursor": "opaque_cursor_or_null", "has_more": true } ``` Pass `?cursor=...` to fetch the next page. --- > POST /v1/chat/completions — the primary inference endpoint. # Chat completions Create a chat completion. OpenAI-compatible request and response shape. ``` POST /v1/chat/completions ``` ## Request ```json { "model": "@production", "messages": [ { "role": "system", "content": "You are a helpful assistant." }, { "role": "user", "content": "Hello!" 
} ], "temperature": 0.7, "max_tokens": 1024, "stream": false } ``` ### Model identifier - **Raw model name** — `gpt-4o-mini`, `claude-sonnet-4-5`, `gemini-2.5-flash` - **Routing config slug** — `@production`, `@fallback`, `@experiment` ## Modelux extensions Pass extra fields under `extra_body` (OpenAI SDK) or top-level `mlx:*` keys: | Field | Description | |---|---| | `mlx:tags` | Object of key-value tags for analytics + routing | | `mlx:end_user` | End-user identifier (for per-user analytics + budgets) | | `mlx:cache` | Cache controls: `{ skip: true }` to bypass cache for this request | | `mlx:trace` | Set `true` to include the decision trace in the response | Example: ```python response = client.chat.completions.create( model="@production", messages=[...], extra_body={ "mlx:tags": {"tenant": "acme", "feature": "summarize"}, "mlx:end_user": "user_abc", }, ) ``` ## Response Standard OpenAI response shape: ```json { "id": "chatcmpl_...", "object": "chat.completion", "created": 1710000000, "model": "gpt-4o-mini", "choices": [ { "index": 0, "message": { "role": "assistant", "content": "Hello!" }, "finish_reason": "stop" } ], "usage": { "prompt_tokens": 12, "completion_tokens": 8, "total_tokens": 20 } } ``` ## Streaming Set `stream: true`. Modelux returns SSE events in the OpenAI streaming format. Works the same with cascades and fallbacks — the stream starts when the first successful attempt begins responding. ## Tool / function calling Passes through unchanged to the provider. Modelux normalizes tool-call behavior across OpenAI, Anthropic, and Google so the same tool schemas work across all three. ## Structured output (JSON mode) `response_format: { type: "json_object" }` works across all providers. `response_format: { type: "json_schema", json_schema: {...} }` works where the underlying provider supports it; otherwise Modelux falls back to JSON mode and validates post-hoc. 
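When the post-hoc fallback path applies, it can be worth guarding on the client side as well. A minimal sketch of such a guard (this is not Modelux's internal validator; `required_keys` stands in for whatever your schema actually requires):

```python
import json

def parse_json_reply(content: str, required_keys: set[str]) -> dict:
    """Parse a JSON-mode completion and check it has the keys we expect."""
    data = json.loads(content)  # raises ValueError on malformed JSON
    missing = required_keys - data.keys()
    if missing:
        raise ValueError(f"model reply missing keys: {sorted(missing)}")
    return data

# The completion content, as read from choices[0].message.content:
reply = '{"summary": "Hello", "sentiment": "positive"}'
parsed = parse_json_reply(reply, {"summary", "sentiment"})
```

On a `ValueError`, a retry (or a switch to a config whose providers support native `json_schema`) is the usual recovery.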
## Headers Modelux includes response headers on every successful request: ``` x-modelux-request-id: req_a1b2c3 x-modelux-model-used: gpt-4o-mini x-modelux-provider: openai x-modelux-cost-usd: 0.002134 x-modelux-latency-ms: 238 x-modelux-config: @production x-modelux-config-version: 4 ``` --- > POST /v1/embeddings — vector embeddings for text. # Embeddings Create vector embeddings. OpenAI-compatible request and response shape. ``` POST /v1/embeddings ``` ## Request ```json { "model": "text-embedding-3-small", "input": ["Hello world", "Another string"] } ``` Supports: - Single string or array of strings - Routing config slugs (`@embeddings`) just like chat completions - OpenAI, Google, Cohere, and Voyage embedding models through their respective providers ## Response ```json { "object": "list", "data": [ { "object": "embedding", "index": 0, "embedding": [0.012, -0.034, ...] }, { "object": "embedding", "index": 1, "embedding": [0.056, 0.089, ...] } ], "model": "text-embedding-3-small", "usage": { "prompt_tokens": 10, "total_tokens": 10 } } ``` ## Dimensions Pass `dimensions: N` to request a specific vector dimensionality (where the provider supports it, e.g. `text-embedding-3-small` and `text-embedding-3-large`). --- > REST endpoints for managing Modelux resources. # Management API overview The management API lets you do everything the dashboard does: create projects, edit routing configs, rotate credentials, fetch analytics, set budgets. Base URL: `https://api.modelux.ai/manage/v1` Authenticate with a management API key (`mlx_mk_...`), not a project key. 
## Resources | Resource | Reference | |---|---| | [Projects](/docs/api/management-projects) | List, create, get, update, delete | | [Routing configs](/docs/api/management-routing) | CRUD + versions, test, restore | | [Providers](/docs/api/management-providers) | CRUD + health, credential rotation | | [Budgets](/docs/api/management-budgets) | CRUD + alerts, events, reset | | [Analytics & logs](/docs/api/management-analytics) | Reports, decisions, logs, traces, replay | | [Webhooks](/docs/api/management-webhooks) | Endpoints, deliveries, event types | | API keys | List, create, revoke | | Simulations | Create, list, results, promote, estimate | | Audit log | List, get | | Org & members | Update org, invite, list, role updates | ## OpenAPI spec The full OpenAPI spec is available at: ``` https://api.modelux.ai/manage/v1/openapi.yaml ``` Use it to generate clients in any language or import into Postman / Insomnia. ## MCP tools Every management endpoint has a corresponding MCP tool. See [MCP setup](/docs/guides/mcp-setup) to connect Claude Code or another MCP client. ## Idempotency All mutating endpoints accept an `Idempotency-Key` header. Same key + same body within 24h returns the cached response. Ideal for retries. ## Pagination List endpoints return cursor-paginated responses: ```json { "data": [...], "next_cursor": "opaque_cursor_or_null", "has_more": true } ``` --- > Create, list, and manage projects via the Management API. # Projects API Projects group API keys, routing configs, and usage analytics. Most accounts use one project per app or environment. ## List projects ``` GET /manage/v1/projects ``` Returns all projects in the authenticated organization. 
**Query parameters** | Name | Type | Description | |---|---|---| | `cursor` | string | Opaque pagination cursor | | `limit` | integer | Max items per page (default 50, max 200) | **Response** ```json { "data": [ { "id": "proj_01HXY...", "name": "production", "slug": "production", "description": "main API traffic", "created_at": "2026-04-01T12:00:00Z" } ], "next_cursor": null, "has_more": false } ``` ## Get a project ``` GET /manage/v1/projects/{project_id} ``` ## Create a project ``` POST /manage/v1/projects ``` **Request body** ```json { "name": "staging", "description": "optional — shown in the dashboard" } ``` Returns the created project. ## Update a project ``` PATCH /manage/v1/projects/{project_id} ``` Supports partial updates of `name` and `description`. ## Delete a project ``` DELETE /manage/v1/projects/{project_id} ``` Soft-delete. Existing API keys scoped to this project are revoked. Historical analytics remain queryable with `include_deleted=true`. ## MCP tools | Tool | Maps to | |---|---| | `list_projects` | `GET /manage/v1/projects` | | `get_project` | `GET /manage/v1/projects/{id}` | | `create_project` | `POST /manage/v1/projects` | | `update_project` | `PATCH /manage/v1/projects/{id}` | | `delete_project` | `DELETE /manage/v1/projects/{id}` | ## See also - [Projects & API Keys (concept)](/docs/concepts/projects) - [Management API overview](/docs/api/management) --- > Create and manage routing configs, versions, and test them. # Routing Configs API Routing configs define how requests are dispatched to providers. Every config has a stable `@slug` your app calls instead of raw model names. 
## List routing configs ``` GET /manage/v1/routing-configs ``` **Query parameters** | Name | Type | Description | |---|---|---| | `project_id` | string | Filter to configs in one project | | `cursor` | string | Pagination cursor | | `limit` | integer | Max items per page | ## Get a routing config ``` GET /manage/v1/routing-configs/{config_id} ``` Returns the current active version. To fetch a specific version, use the versions endpoint below. ## Create a routing config ``` POST /manage/v1/routing-configs ``` **Request body** ```json { "project_id": "proj_01HXY...", "name": "production", "slug": "production", "strategy": "fallback", "config": { "attempts": [ { "model": "claude-haiku-4-5", "timeout_ms": 2000 }, { "model": "gpt-4o-mini", "timeout_ms": 3000 } ], "retry_on": ["429", "5xx", "timeout"] } } ``` Valid strategies: `single`, `fallback`, `cost_optimized`, `latency_optimized`, `ensemble`, `ab_test`, `cascade`, `custom_rules`. ## Update a routing config ``` PATCH /manage/v1/routing-configs/{config_id} ``` Any update creates a new version. The previous version stays queryable for rollback. ## Versions ``` GET /manage/v1/routing-configs/{config_id}/versions GET /manage/v1/routing-configs/{config_id}/versions/{version_id} POST /manage/v1/routing-configs/{config_id}/versions/{version_id}/restore ``` `restore` promotes an old version back to the active version (creating a new version that matches). ## Test a routing config ``` POST /manage/v1/routing-configs/{config_id}/test ``` **Request body** ```json { "messages": [{ "role": "user", "content": "Hello" }], "dry_run": true } ``` Returns the routing decision that would be made, without actually calling the provider. With `dry_run: false`, runs the request end-to-end for verification. ## Delete a routing config ``` DELETE /manage/v1/routing-configs/{config_id} ``` Soft-delete. The slug becomes reusable immediately for new configs. 
## MCP tools | Tool | Maps to | |---|---| | `list_routing_configs` | `GET /manage/v1/routing-configs` | | `get_routing_config` | `GET /manage/v1/routing-configs/{id}` | | `create_routing_config` | `POST /manage/v1/routing-configs` | | `update_routing_config` | `PATCH /manage/v1/routing-configs/{id}` | | `delete_routing_config` | `DELETE /manage/v1/routing-configs/{id}` | | `list_routing_config_versions` | `GET /manage/v1/routing-configs/{id}/versions` | | `get_routing_config_version` | `GET /manage/v1/routing-configs/{id}/versions/{version}` | | `restore_routing_config_version` | `POST /manage/v1/routing-configs/{id}/versions/{version}/restore` | | `test_routing_config` | `POST /manage/v1/routing-configs/{id}/test` | ## See also - [Routing (concept)](/docs/concepts/routing) - [Fallback chain guide](/docs/guides/fallback-chain) - [A/B testing guide](/docs/guides/ab-testing) --- > Add, rotate, and monitor provider credentials. # Providers API Provider credentials are the upstream API keys Modelux uses to proxy your requests. Stored encrypted; Modelux never logs plaintext keys. ## List providers ``` GET /manage/v1/providers ``` Returns all provider credentials for the organization. **Response** ```json { "data": [ { "id": "prov_01HXY...", "vendor": "openai", "name": "OpenAI Production", "base_url": null, "status": "active", "health": { "state": "healthy", "p50_latency_ms": 320, "last_check_at": "2026-04-14T12:00:00Z" }, "created_at": "2026-04-01T10:00:00Z" } ] } ``` ## Add a provider ``` POST /manage/v1/providers ``` **Request body** ```json { "vendor": "openai", "name": "OpenAI Production", "api_key": "sk-...", "base_url": null } ``` Valid vendors: `openai`, `anthropic`, `google`, `azure`, `bedrock`, `groq`, `fireworks`. For Azure OpenAI, set `base_url` to your resource endpoint. For Bedrock, pass IAM credentials instead of an API key (see the Bedrock section below). 
## Get a provider ``` GET /manage/v1/providers/{provider_id} ``` ## Update a provider (rotate key) ``` PATCH /manage/v1/providers/{provider_id} ``` **Request body** ```json { "api_key": "sk-new..." } ``` Modelux verifies the new key before swapping it atomically. In-flight requests finish with the old key. ## Delete a provider ``` DELETE /manage/v1/providers/{provider_id} ``` Fails if any routing config still references this provider directly. Detach first, then delete. ## Health ``` GET /manage/v1/providers/{provider_id}/health ``` Returns latency percentiles and success rate over rolling windows. ## Bedrock credentials For AWS Bedrock, send IAM credentials as the `api_key` field: ```json { "vendor": "bedrock", "name": "Bedrock US-West", "api_key": "AKIA...::wJalrXUtnFEMI...::us-west-2", "base_url": null } ``` Format: `ACCESS_KEY_ID::SECRET_ACCESS_KEY::REGION`. Optional `::SESSION_TOKEN` suffix for STS temporary credentials. ## MCP tools | Tool | Maps to | |---|---| | `list_providers` | `GET /manage/v1/providers` | | `get_provider` | `GET /manage/v1/providers/{id}` | | `add_provider` | `POST /manage/v1/providers` | | `update_provider` | `PATCH /manage/v1/providers/{id}` | | `delete_provider` | `DELETE /manage/v1/providers/{id}` | | `get_provider_health` | `GET /manage/v1/providers/{id}/health` | ## See also - [Providers (concept)](/docs/concepts/providers) --- > Create and manage spend caps and alerts. # Budgets API Budgets enforce spending limits across org, project, tag, or end-user scopes. Cap breaches can alert, auto-downgrade, or block. ## List budgets ``` GET /manage/v1/budgets ``` ## Get a budget ``` GET /manage/v1/budgets/{budget_id} ``` Returns the budget config, current spend, and period bounds. ## Create a budget ``` POST /manage/v1/budgets ``` **Request body** ```json { "name": "Q2 production cap", "scope": { "type": "project", "project_id": "proj_01HXY..." 
}, "cap_usd": 500, "period": "monthly", "action_at_cap": "auto_downgrade", "downgrade_to": "@cheap" } ``` Valid `scope.type`: `org`, `project`, `tag`, `end_user`. Valid `action_at_cap`: `alert`, `block`, `auto_downgrade`. ## Update a budget ``` PATCH /manage/v1/budgets/{budget_id} ``` ## Delete a budget ``` DELETE /manage/v1/budgets/{budget_id} ``` ## Reset a budget ``` POST /manage/v1/budgets/{budget_id}/reset ``` Clears current-period spend without waiting for the next reset date. Useful after resolving a cost anomaly. ## Alerts ``` GET /manage/v1/budgets/{budget_id}/alerts POST /manage/v1/budgets/{budget_id}/alerts DELETE /manage/v1/budgets/{budget_id}/alerts/{alert_id} ``` Each alert specifies a threshold percentage (e.g. 80) and one or more channels (email, webhook). ## Events ``` GET /manage/v1/budgets/{budget_id}/events ``` Returns the history of threshold crossings, cap actions, and resets. ## MCP tools | Tool | Maps to | |---|---| | `list_budgets` | `GET /manage/v1/budgets` | | `get_budget` | `GET /manage/v1/budgets/{id}` | | `create_budget` | `POST /manage/v1/budgets` | | `update_budget` | `PATCH /manage/v1/budgets/{id}` | | `delete_budget` | `DELETE /manage/v1/budgets/{id}` | | `reset_budget` | `POST /manage/v1/budgets/{id}/reset` | | `list_budget_alerts` | `GET /manage/v1/budgets/{id}/alerts` | | `create_budget_alert` | `POST /manage/v1/budgets/{id}/alerts` | | `delete_budget_alert` | `DELETE /manage/v1/budgets/{id}/alerts/{aid}` | | `list_budget_events` | `GET /manage/v1/budgets/{id}/events` | ## See also - [Budgets (concept)](/docs/concepts/budgets) - [Cost optimization guide](/docs/guides/cost-optimization) --- > Query request logs, analytics reports, and decision traces. # Analytics & Logs API Query aggregated metrics and individual request logs. 
## Analytics report ``` GET /manage/v1/analytics/report ``` **Query parameters** | Name | Type | Description | |---|---|---| | `start` | ISO 8601 | Window start | | `end` | ISO 8601 | Window end | | `granularity` | enum | `hour`, `day` | | `project_id` | string | Scope to one project | | `group_by` | enum | `model`, `provider`, `project`, `tag:`, `end_user` | | `tags` | JSON | Additional tag filters (e.g. `{"tenant":"acme"}`) | | `include_comparison` | bool | Include previous-period series | **Response (abbreviated)** ```json { "series": [ { "bucket": "2026-04-14T00:00:00Z", "requests": 1247, "cost_usd": 3.142, "input_tokens": 98213, "output_tokens": 45021, "errors": 3, "p50_latency_ms": 238, "p95_latency_ms": 801, "p99_latency_ms": 1420 } ], "totals": { "requests": 1247, "cost_usd": 3.142, "error_rate": 0.0024 } } ``` ## Decisions summary ``` GET /manage/v1/analytics/decisions ``` Aggregate routing decisions: per config, which attempts ran, how often fallbacks fired, why. ## Request logs ``` GET /manage/v1/logs ``` **Query parameters** | Name | Type | Description | |---|---|---| | `start` | ISO 8601 | Window start | | `end` | ISO 8601 | Window end | | `project_id` | string | Filter | | `status` | string | Filter by status class: `2xx`, `4xx`, `5xx` | | `model` | string | Filter by model name | | `provider` | string | Filter by provider | | `end_user` | string | Filter by end-user tag | | `tags` | JSON | Tag key-value filters | | `min_latency_ms` | integer | Slow-query filter | | `cursor` | string | Pagination | Returns a paginated list of request summaries. Use the single-request endpoint below for full details. ## Single request ``` GET /manage/v1/logs/{request_id} ``` Returns the full request: input messages (if retention allows), output, decision trace, per-attempt metrics, cost breakdown. ## Request trace ``` GET /manage/v1/logs/{request_id}/trace ``` Just the decision trace: attempts, timings, reasons, final decision. 
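The logs endpoint above paginates with an opaque `cursor` query parameter. A sketch of draining all pages, written against a pluggable fetch function so it runs standalone; the response field names `data` and `next_cursor` are assumptions, since the docs only specify the `cursor` parameter:

```python
def drain(fetch_page, params=None):
    """Follow cursor pagination until the server stops returning a cursor.

    `fetch_page(params)` stands in for a GET /manage/v1/logs call.
    The `data` / `next_cursor` response field names are assumptions.
    """
    params = dict(params or {})
    items = []
    while True:
        page = fetch_page(params)
        items.extend(page["data"])
        cursor = page.get("next_cursor")
        if not cursor:
            return items
        params["cursor"] = cursor

# Fake two-page backend standing in for the logs endpoint.
pages = {
    None: {"data": [{"id": "req_1"}, {"id": "req_2"}], "next_cursor": "c2"},
    "c2": {"data": [{"id": "req_3"}], "next_cursor": None},
}
logs = drain(lambda p: pages[p.get("cursor")], {"status": "5xx"})
print([r["id"] for r in logs])
# ['req_1', 'req_2', 'req_3']
```

The same loop applies to any cursor-paginated listing in this API, such as webhook deliveries.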
## Replay ``` POST /manage/v1/logs/{request_id}/replay ``` Re-run a request against a specified routing config. Useful for debugging individual requests after a config change. **Request body** ```json { "routing_config": "@candidate-v2", "dry_run": false } ``` ## MCP tools | Tool | Maps to | |---|---| | `get_analytics_report` | `GET /manage/v1/analytics/report` | | `get_decisions_summary` | `GET /manage/v1/analytics/decisions` | | `list_logs` | `GET /manage/v1/logs` | | `get_log` | `GET /manage/v1/logs/{id}` | | `get_request_trace` | `GET /manage/v1/logs/{id}/trace` | | `replay_log_entry` | `POST /manage/v1/logs/{id}/replay` | ## See also - [Analytics & Logs (concept)](/docs/concepts/analytics) --- > Manage webhook endpoints and deliveries. # Webhooks API Webhooks deliver Modelux events to your own infrastructure. Each endpoint subscribes to one or more event types and is called asynchronously with HMAC-signed payloads. ## List endpoints ``` GET /manage/v1/webhooks/endpoints ``` ## Get an endpoint ``` GET /manage/v1/webhooks/endpoints/{endpoint_id} ``` ## Create an endpoint ``` POST /manage/v1/webhooks/endpoints ``` **Request body** ```json { "url": "https://your-app.example.com/hooks/modelux", "event_types": [ "budget.threshold_reached", "routing_config.updated" ], "description": "Production webhook" } ``` Returns the endpoint with a generated `signing_secret`. Shown once — save it for verifying delivery signatures. ## Update an endpoint ``` PATCH /manage/v1/webhooks/endpoints/{endpoint_id} ``` Supports updating `url`, `event_types`, `description`, and `active` flag. ## Delete an endpoint ``` DELETE /manage/v1/webhooks/endpoints/{endpoint_id} ``` Soft-delete. In-flight deliveries still complete. ## Rotate signing secret ``` POST /manage/v1/webhooks/endpoints/{endpoint_id}/rotate-secret ``` Generates a new signing secret. Returned once in the response. The old secret remains valid for 24 hours to allow graceful rollover in your verifier. 
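Because the old secret stays valid for 24 hours after a rotation, a verifier should check deliveries against both secrets during the rollover window. A sketch of that check, assuming HMAC-SHA256 over the raw request body with a hex digest; the actual signature scheme and header name are not specified here, so confirm them before relying on this:

```python
import hashlib
import hmac

def verify(payload: bytes, signature: str, secrets) -> bool:
    """Check a webhook delivery signature against one or more secrets.

    Accepting a list of secrets lets the verifier keep both the old and
    new secret during the 24-hour rollover after a rotate-secret call.
    HMAC-SHA256 with a hex digest is an assumption about the scheme.
    """
    for secret in secrets:
        expected = hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()
        # Constant-time comparison to avoid leaking the signature via timing.
        if hmac.compare_digest(expected, signature):
            return True
    return False

body = b'{"event_type": "budget.threshold_reached"}'
old, new = "whsec_old", "whsec_new"
sig = hmac.new(new.encode(), body, hashlib.sha256).hexdigest()

print(verify(body, sig, [old, new]))   # True
print(verify(body, sig, [old]))        # False
```

Verify over the raw bytes of the body, not a re-serialized JSON object, since any reordering or whitespace change would break the signature.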
## Send test event ``` POST /manage/v1/webhooks/endpoints/{endpoint_id}/test ``` **Request body** ```json { "event_type": "budget.threshold_reached" } ``` Sends a synthetic event to the endpoint for connectivity testing. ## List deliveries ``` GET /manage/v1/webhooks/deliveries ``` **Query parameters** | Name | Type | Description | |---|---|---| | `endpoint_id` | string | Filter to one endpoint | | `status` | enum | `pending`, `delivered`, `failed` | | `event_type` | string | Filter by event type | | `cursor` | string | Pagination | ## Get a delivery ``` GET /manage/v1/webhooks/deliveries/{delivery_id} ``` Returns the delivery payload, response status, response body, and attempt history. ## Replay a delivery ``` POST /manage/v1/webhooks/deliveries/{delivery_id}/replay ``` Re-send the same payload. Useful after fixing an endpoint outage. ## List event types ``` GET /manage/v1/webhooks/event-types ``` Returns all event types with a short description. Useful for building configuration UIs. ## MCP tools | Tool | Maps to | |---|---| | `list_webhook_endpoints` | `GET /manage/v1/webhooks/endpoints` | | `get_webhook_endpoint` | `GET /manage/v1/webhooks/endpoints/{id}` | | `create_webhook_endpoint` | `POST /manage/v1/webhooks/endpoints` | | `update_webhook_endpoint` | `PATCH /manage/v1/webhooks/endpoints/{id}` | | `delete_webhook_endpoint` | `DELETE /manage/v1/webhooks/endpoints/{id}` | | `rotate_webhook_secret` | `POST /manage/v1/webhooks/endpoints/{id}/rotate-secret` | | `send_webhook_test` | `POST /manage/v1/webhooks/endpoints/{id}/test` | | `list_webhook_deliveries` | `GET /manage/v1/webhooks/deliveries` | | `get_webhook_delivery` | `GET /manage/v1/webhooks/deliveries/{id}` | | `replay_webhook_delivery` | `POST /manage/v1/webhooks/deliveries/{id}/replay` | | `list_webhook_event_types` | `GET /manage/v1/webhooks/event-types` | ## See also - [Webhooks (concept)](/docs/concepts/webhooks) ---