$ modelux changelog --all
[rss]
Changelog
What's shipped, newest first. Major features roll into the features page ; bug fixes and small improvements don't always appear here. Subscribe via RSS or the updates list .
- [batches]
Async batches + files (Anthropic + OpenAI)
- ▸
POST /anthropic/v1/messages/batcheswith the full retrieve / list / results / cancel / delete surface — drop-in for the official Anthropic SDK with no body changes. See the docs. - ▸
POST /openai/v1/batches+POST /openai/v1/filesfor the OpenAI side (batches reference uploaded JSONL files); multipart upload preserves the boundary, large result downloads stream through without buffering. See the docs. - ▸ Each provider's 50% async discount applies untouched on the upstream side.
- ▸ Authenticated thin passthrough — auth, BYOK (
X-Modelux-Provider-Key), rate limits, and observability all on top of byte-identical request forwarding. - ▸ Batch traffic shows up in the dashboard's Logs + Analytics with per-operation request_type breakdowns alongside synchronous traffic.
- ▸
- [responses]
OpenAI Responses API
- ▸
POST /openai/v1/responsesproxied as a thin authenticated passthrough — sync + SSE streaming + background mode. See the docs. - ▸ Stored / chained responses (
previous_response_id) work via the retrieve / cancel / delete / input_items endpoints. - ▸ Usage capture pulls input_tokens / output_tokens / cached_tokens out of the terminal
response.completedevent so streaming traffic shows up with full token + cost breakdowns in analytics. - ▸ OpenAI-specific request fields (
response_format,seed,logprobs,parallel_tool_calls) now pass through byte-identical on/openai/v1/chat/completionstoo — strictjson_schemastructured outputs and reproducible sampling work end-to-end.
- ▸
- [anthropic]
Anthropic prompt caching + Files API
- ▸
cache_controlmarkers pass through verbatim everywhere Anthropic accepts them: per-content-block on messages, per-system-block when system is sent as an array of blocks, and per-tool. Anthropic's native cache discount applies on the next matching request. - ▸ Cache hit / write counts land on the request log row (
cache_read_tokens,cache_creation_tokens) so you can answer "did my marker actually hit?". - ▸ Per-provider cost calculation now applies the right cache discount automatically (Anthropic 0.10× reads / 1.25× writes, OpenAI 0.50× reads, Google 0.25× reads).
- ▸
POST /anthropic/v1/filesproxied for the Anthropic Files API beta — upload documents/images once, reference by id frommessagescontent blocks. The proxy forwards youranthropic-betaheader so the SDK's beta-tag declaration reaches the upstream untouched. See the docs.
- ▸
- [billing]
Self-serve plans and billing
- ▸ Upgrade, downgrade, or switch between Free, Pro, and Team from Settings → Billing.
- ▸ Stripe-powered checkout and customer portal for payment methods, invoices, and tax IDs.
- ▸ Monthly or annual billing — annual is two months free (~17% off).
- ▸ Plan-based feature gating and usage meters surface what's included and how much you've used.
- [providers]
Nine new providers
- ▸ Added Groq, Fireworks, DeepSeek, xAI, Mistral, Cerebras, Together, Perplexity, and Cohere.
- ▸ All available through the same OpenAI-compatible surface — drop them into any routing config.
- ▸ Fourteen providers in total now, with normalized tool-calling and structured-output behavior.
- [people]
People and Customers
- ▸ New People entity represents the humans inside your company who use your API keys — attach a key to a Person, see per-person spend and activity.
- ▸ New Customers page shows end-user spend and volume in external-persona projects.
- ▸ Offboarding a Person revokes their keys in one step, with an explicit confirmation.
- ▸ Projects now declare a persona (internal vs. customer-facing) to keep the two surfaces distinct.
- [scim]
SCIM provisions People
- ▸ SCIM now creates Person records (not dashboard users), with one token scoped per project.
- ▸ Deactivating someone in your IdP automatically revokes their attached API keys.
- ▸ Matches how enterprises actually model employee access — your joiner/mover/leaver flow maps cleanly onto projects and keys.
- ▸ See the updated SCIM guide.
- [security]
Security settings
- ▸ SSO, SCIM, and related controls consolidated under Settings → Security with a status-first layout.
- ▸ At-a-glance status for SAML, domain verification, and SCIM tokens.
- [keys]
API key improvements
- ▸ Reveal a key after creation — the plaintext is encrypted at rest so you can copy it again later instead of re-minting.
- ▸ Optionally attach a key to a Person at creation time for clear ownership and spend attribution.
- ▸ New Person column on the keys list; clickable counts jump you straight to that person's keys.
- [onboarding]
Agent-first onboarding
- ▸ Rich, agent-friendly 401 responses now tell assistants exactly what's missing and how to unblock you.
- ▸ New
setup_statusMCP tool lets Claude Code, Cursor, and other agents inspect and complete onboarding end-to-end. - ▸ New onboarding checklist in the dashboard with an explicit "connect an assistant" step.
- [site]
Marketing site + developer docs
- ▸ Launched modelux.ai with terminal-themed marketing pages and developer docs.
- ▸ Every docs page available as raw markdown (
/docs/<slug>.md) and through/llms.txt+/llms-full.txtfor LLM ingestion. - ▸ Pagefind full-text search in the top nav on docs pages.
- ▸ JSON-LD structured data, OG images, sitemap, AI-crawler-friendly robots.txt.
- [analytics]
Users page, cost forecasting, period-over-period
- ▸ New Users page surfaces top end-users by spend, volume, and latency.
- ▸ Cost forecasting card projects end-of-month spend with trend confidence.
- ▸ Period-over-period comparison overlays a previous window on every chart.
- [analytics]
Tag filtering across logs and analytics
- ▸ Filter logs and analytics by arbitrary tag key-value pairs you attach at request time.
- ▸ New analytics dashboard with stacked series, per-provider health rollups, and per-tag breakdowns.
- [exports]
Warehouse export via S3 Parquet
- ▸ Configure scheduled exports of request logs, audit events, and aggregates to your own S3 bucket.
- ▸ Parquet format with predictable per-hour partitioning.
- ▸ BullMQ-backed worker with retries, backfills, and resumable cursors.
- ▸ Tests cover transforms, PII handling, cursors, and multi-tenant isolation.
- [integrations]
Integrations surface + developer API keys
- ▸ Consolidated integration settings under a single Integrations page: webhooks, MCP, exports, management tokens.
- ▸ Rotate management API keys and view MCP tool usage from one place.
- [mcp]
MCP server with 80+ management tools
- ▸ New MCP server at
api.modelux.ai/mcpexposes every management API action as an MCP tool. - ▸ Works with Claude Code, Cursor, and any MCP-compatible client.
- ▸ Natural-language workflows for creating configs, setting budgets, rotating credentials, inspecting logs.
- ▸ New MCP server at
- [routing]
Custom rule DSL
- ▸ New
custom_rulesrouting strategy with a small expression DSL over cost, latency, budget, and tags. - ▸ Test-harness endpoint lets you evaluate rules against sample requests before promoting.
- ▸ Tenant-aware routing: branch on
tags.tenantto dispatch enterprise traffic differently.
- ▸ New
- [audit]
Audit log + config versioning
- ▸ Every management-API mutation now writes an audit event with actor, target resource, diff, and source (UI, API, MCP).
- ▸ Routing configs and provider credentials keep a full version history.
- ▸ One-click rollback to any previous version.
- [replay]
Replay experiments
- ▸ Pick a window of historical traffic (up to 24h) and replay it against a candidate routing config.
- ▸ Side-by-side cost, latency, and success-rate diff vs. the current config.
- ▸ Promote the winner with a single click; promotion creates an audited new version.
- [budgets]
Finance-grade budgets with auto-downgrade
- ▸ Scoped budgets (org, project, tag, end-user) with soft-alert and hard-cap thresholds.
- ▸ At-cap actions: alert, block with 402, or auto-downgrade to a cheaper routing config.
- ▸ Budget-aware routing lets custom rules read
budget.used_pct. - ▸ Email + Slack-compatible webhook alerts on threshold crossings.
- [webhooks]
Webhook endpoints for events
- ▸ Subscribe to budget alerts, config changes, provider health transitions, and request anomalies.
- ▸ HMAC-SHA256 signatures, durable delivery queue with exponential backoff, replay from the dashboard.
- ▸ Slack-format auto-detection for webhook URLs pointing at Slack.
- [sdks]
Official Python + TypeScript SDKs
- ▸ Released
modeluxon PyPI and npm. - ▸ Thin wrappers over the OpenAI SDK with extra helpers for tags, end-user IDs, routing slugs, and decision traces.
- ▸ MIT licensed; source in the monorepo.
- ▸ Released
- [cache]
Semantic caching
- ▸ New semantic-match cache mode: request embeddings against a cache of recent responses, return on high similarity.
- ▸ Per-routing-config mode (
exact/semantic/ off), tunable similarity threshold. - ▸ Cache-hit metrics broken out in analytics.
- [routing]
Ensembles + cascades
- ▸ New
ensemblerouting strategy: parallel fan-out to N models, aggregation via weighted vote, first-valid, or LLM judge. - ▸ New
cascadestrategy: sequential attempts with early stop on success — quality-tier fallback made easy. - ▸ Live cost estimator in the routing config builder for both strategies.
- ▸ New
- [routing]
Cost- and latency-optimized routing
- ▸
cost_optimizedstrategy picks the cheapest allowed model meeting a quality tier. - ▸
latency_optimizedstrategy uses rolling p50 measurements to prefer the fastest healthy provider. - ▸ A/B testing strategy lands for controlled rollouts between configs.
- ▸
- [providers]
AWS Bedrock + Azure OpenAI adapters
- ▸ Added Bedrock with IAM credential format (access key::secret::region[::session]).
- ▸ Added Azure OpenAI with configurable base URLs per resource.
- ▸ Both adapters normalize tool-calling and structured-output behavior to the OpenAI shape.
- [dashboard]
Visual routing builder
- ▸ Drag-and-drop builder for fallback chains and ensembles.
- ▸ Live dry-run panel shows the decision trace for a sample prompt without calling the provider.
- ▸ Version diff view for every change.
- [observability]
Decision traces + full request logs
- ▸ Every request now records the full routing decision: attempts tried, reasons, per-attempt timings and costs.
- ▸ Log detail view in the dashboard with a routing trace card.
- ▸ Structured tags on every log entry for filtering and analytics group-by.
- [core]
Fallback routing, health tracking, retries
- ▸
fallbackrouting strategy with per-attempt timeouts and retry-on conditions (429, 5xx, timeout). - ▸ Per-provider rolling health (success rate, p50 latency) powers health-based routing.
- ▸ OpenAI SDK streaming (SSE) passes through unchanged.
- ▸
- [core]
modelux 1.0 — public beta
- ▸ OpenAI-compatible
/v1/chat/completionsand/v1/embeddingsacross OpenAI, Anthropic, and Google. - ▸ Projects, API keys, and BYO provider credentials.
- ▸ Per-request cost computation with per-model pricing tables.
- ▸ Free + Pro plans launched.
- ▸ OpenAI-compatible