modelux
$ modelux changelog --all [rss]

Changelog

What's shipped, newest first. Major features roll into the features page ; bug fixes and small improvements don't always appear here. Subscribe via RSS or the updates list .

  • [batches]

    Async batches + files (Anthropic + OpenAI)

    • POST /anthropic/v1/messages/batches with the full retrieve / list / results / cancel / delete surface — drop-in for the official Anthropic SDK with no body changes. See the docs.
    • POST /openai/v1/batches + POST /openai/v1/files for the OpenAI side (batches reference uploaded JSONL files); multipart upload preserves the boundary, large result downloads stream through without buffering. See the docs.
    • Each provider's 50% async discount applies untouched on the upstream side.
    • Authenticated thin passthrough — auth, BYOK (X-Modelux-Provider-Key), rate limits, and observability all on top of byte-identical request forwarding.
    • Batch traffic shows up in the dashboard's Logs + Analytics with per-operation request_type breakdowns alongside synchronous traffic.
  • [responses]

    OpenAI Responses API

    • POST /openai/v1/responses proxied as a thin authenticated passthrough — sync + SSE streaming + background mode. See the docs.
    • Stored / chained responses (previous_response_id) work via the retrieve / cancel / delete / input_items endpoints.
    • Usage capture pulls input_tokens / output_tokens / cached_tokens out of the terminal response.completed event so streaming traffic shows up with full token + cost breakdowns in analytics.
    • OpenAI-specific request fields (response_format, seed, logprobs, parallel_tool_calls) now pass through byte-identical on /openai/v1/chat/completions too — strict json_schema structured outputs and reproducible sampling work end-to-end.
  • [anthropic]

    Anthropic prompt caching + Files API

    • cache_control markers pass through verbatim everywhere Anthropic accepts them: per-content-block on messages, per-system-block when system is sent as an array of blocks, and per-tool. Anthropic's native cache discount applies on the next matching request.
    • Cache hit / write counts land on the request log row (cache_read_tokens, cache_creation_tokens) so you can answer "did my marker actually hit?".
    • Per-provider cost calculation now applies the right cache discount automatically (Anthropic 0.10× reads / 1.25× writes, OpenAI 0.50× reads, Google 0.25× reads).
    • POST /anthropic/v1/files proxied for the Anthropic Files API beta — upload documents/images once, reference by id from messages content blocks. The proxy forwards your anthropic-beta header so the SDK's beta-tag declaration reaches the upstream untouched. See the docs.
  • [billing]

    Self-serve plans and billing

    • Upgrade, downgrade, or switch between Free, Pro, and Team from Settings → Billing.
    • Stripe-powered checkout and customer portal for payment methods, invoices, and tax IDs.
    • Monthly or annual billing — annual is two months free (~17% off).
    • Plan-based feature gating and usage meters surface what's included and how much you've used.
  • [providers]

    Nine new providers

    • Added Groq, Fireworks, DeepSeek, xAI, Mistral, Cerebras, Together, Perplexity, and Cohere.
    • All available through the same OpenAI-compatible surface — drop them into any routing config.
    • Fourteen providers in total now, with normalized tool-calling and structured-output behavior.
  • [people]

    People and Customers

    • New People entity represents the humans inside your company who use your API keys — attach a key to a Person, see per-person spend and activity.
    • New Customers page shows end-user spend and volume in external-persona projects.
    • Offboarding a Person revokes their keys in one step, with an explicit confirmation.
    • Projects now declare a persona (internal vs. customer-facing) to keep the two surfaces distinct.
  • [scim]

    SCIM provisions People

    • SCIM now creates Person records (not dashboard users), with one token scoped per project.
    • Deactivating someone in your IdP automatically revokes their attached API keys.
    • Matches how enterprises actually model employee access — your joiner/mover/leaver flow maps cleanly onto projects and keys.
    • See the updated SCIM guide.
  • [security]

    Security settings

    • SSO, SCIM, and related controls consolidated under Settings → Security with a status-first layout.
    • At-a-glance status for SAML, domain verification, and SCIM tokens.
  • [keys]

    API key improvements

    • Reveal a key after creation — the plaintext is encrypted at rest so you can copy it again later instead of re-minting.
    • Optionally attach a key to a Person at creation time for clear ownership and spend attribution.
    • New Person column on the keys list; clickable counts jump you straight to that person's keys.
  • [onboarding]

    Agent-first onboarding

    • Rich, agent-friendly 401 responses now tell assistants exactly what's missing and how to unblock you.
    • New setup_status MCP tool lets Claude Code, Cursor, and other agents inspect and complete onboarding end-to-end.
    • New onboarding checklist in the dashboard with an explicit "connect an assistant" step.
  • [site]

    Marketing site + developer docs

    • Launched modelux.ai with terminal-themed marketing pages and developer docs.
    • Every docs page available as raw markdown (/docs/<slug>.md) and through /llms.txt + /llms-full.txt for LLM ingestion.
    • Pagefind full-text search in the top nav on docs pages.
    • JSON-LD structured data, OG images, sitemap, AI-crawler-friendly robots.txt.
  • [analytics]

    Users page, cost forecasting, period-over-period

    • New Users page surfaces top end-users by spend, volume, and latency.
    • Cost forecasting card projects end-of-month spend with trend confidence.
    • Period-over-period comparison overlays a previous window on every chart.
  • [analytics]

    Tag filtering across logs and analytics

    • Filter logs and analytics by arbitrary tag key-value pairs you attach at request time.
    • New analytics dashboard with stacked series, per-provider health rollups, and per-tag breakdowns.
  • [exports]

    Warehouse export via S3 Parquet

    • Configure scheduled exports of request logs, audit events, and aggregates to your own S3 bucket.
    • Parquet format with predictable per-hour partitioning.
    • BullMQ-backed worker with retries, backfills, and resumable cursors.
    • Tests cover transforms, PII handling, cursors, and multi-tenant isolation.
  • [integrations]

    Integrations surface + developer API keys

    • Consolidated integration settings under a single Integrations page: webhooks, MCP, exports, management tokens.
    • Rotate management API keys and view MCP tool usage from one place.
  • [mcp]

    MCP server with 80+ management tools

    • New MCP server at api.modelux.ai/mcp exposes every management API action as an MCP tool.
    • Works with Claude Code, Cursor, and any MCP-compatible client.
    • Natural-language workflows for creating configs, setting budgets, rotating credentials, inspecting logs.
  • [routing]

    Custom rule DSL

    • New custom_rules routing strategy with a small expression DSL over cost, latency, budget, and tags.
    • Test-harness endpoint lets you evaluate rules against sample requests before promoting.
    • Tenant-aware routing: branch on tags.tenant to dispatch enterprise traffic differently.
  • [audit]

    Audit log + config versioning

    • Every management-API mutation now writes an audit event with actor, target resource, diff, and source (UI, API, MCP).
    • Routing configs and provider credentials keep a full version history.
    • One-click rollback to any previous version.
  • [replay]

    Replay experiments

    • Pick a window of historical traffic (up to 24h) and replay it against a candidate routing config.
    • Side-by-side cost, latency, and success-rate diff vs. the current config.
    • Promote the winner with a single click; promotion creates an audited new version.
  • [budgets]

    Finance-grade budgets with auto-downgrade

    • Scoped budgets (org, project, tag, end-user) with soft-alert and hard-cap thresholds.
    • At-cap actions: alert, block with 402, or auto-downgrade to a cheaper routing config.
    • Budget-aware routing lets custom rules read budget.used_pct.
    • Email + Slack-compatible webhook alerts on threshold crossings.
  • [webhooks]

    Webhook endpoints for events

    • Subscribe to budget alerts, config changes, provider health transitions, and request anomalies.
    • HMAC-SHA256 signatures, durable delivery queue with exponential backoff, replay from the dashboard.
    • Slack-format auto-detection for webhook URLs pointing at Slack.
  • [sdks]

    Official Python + TypeScript SDKs

    • Released modelux on PyPI and npm.
    • Thin wrappers over the OpenAI SDK with extra helpers for tags, end-user IDs, routing slugs, and decision traces.
    • MIT licensed; source in the monorepo.
  • [cache]

    Semantic caching

    • New semantic-match cache mode: request embeddings against a cache of recent responses, return on high similarity.
    • Per-routing-config mode (exact / semantic / off), tunable similarity threshold.
    • Cache-hit metrics broken out in analytics.
  • [routing]

    Ensembles + cascades

    • New ensemble routing strategy: parallel fan-out to N models, aggregation via weighted vote, first-valid, or LLM judge.
    • New cascade strategy: sequential attempts with early stop on success — quality-tier fallback made easy.
    • Live cost estimator in the routing config builder for both strategies.
  • [routing]

    Cost- and latency-optimized routing

    • cost_optimized strategy picks the cheapest allowed model meeting a quality tier.
    • latency_optimized strategy uses rolling p50 measurements to prefer the fastest healthy provider.
    • A/B testing strategy lands for controlled rollouts between configs.
  • [providers]

    AWS Bedrock + Azure OpenAI adapters

    • Added Bedrock with IAM credential format (access key::secret::region[::session]).
    • Added Azure OpenAI with configurable base URLs per resource.
    • Both adapters normalize tool-calling and structured-output behavior to the OpenAI shape.
  • [dashboard]

    Visual routing builder

    • Drag-and-drop builder for fallback chains and ensembles.
    • Live dry-run panel shows the decision trace for a sample prompt without calling the provider.
    • Version diff view for every change.
  • [observability]

    Decision traces + full request logs

    • Every request now records the full routing decision: attempts tried, reasons, per-attempt timings and costs.
    • Log detail view in the dashboard with a routing trace card.
    • Structured tags on every log entry for filtering and analytics group-by.
  • [core]

    Fallback routing, health tracking, retries

    • fallback routing strategy with per-attempt timeouts and retry-on conditions (429, 5xx, timeout).
    • Per-provider rolling health (success rate, p50 latency) powers health-based routing.
    • OpenAI SDK streaming (SSE) passes through unchanged.
  • [core]

    modelux 1.0 — public beta

    • OpenAI-compatible /v1/chat/completions and /v1/embeddings across OpenAI, Anthropic, and Google.
    • Projects, API keys, and BYO provider credentials.
    • Per-request cost computation with per-model pricing tables.
    • Free + Pro plans launched.