Flat tiers. No per-token markup.
Pay for the control plane, not for the tokens. You keep your provider relationships — we handle routing, analytics, budgets, and replay.
Free
For individual developers and side projects.
Start free

- 10k requests / month
- 1 project, 2 API keys
- 1 provider credential
- Single-model + fallback routing
- 7-day log retention
- Community support
Pro
For small teams building LLM-powered products.
Start Pro trial

- 100k requests / month
- 5 projects, unlimited API keys
- Unlimited provider credentials
- All routing policies
- Ensembles, A/B tests, cascade
- 30-day log retention
- Email support
Team
For teams with meaningful LLM traffic.
Start Team trial

- 1M requests / month
- Unlimited projects
- Team roles & permissions
- Everything in Pro
- 60-day log retention
- Replay simulator + budgets
- Priority support
Enterprise
For scale, compliance, and dedicated support.
Talk to sales

- Unlimited or negotiated volume
- SSO / SAML / SCIM
- Audit logging
- IP allowlisting
- 90-day+ configurable retention
- Dedicated support & SLA
- Custom deployment options
Feature comparison
| Feature | Free | Pro | Team | Enterprise |
|---|---|---|---|---|
| Core | | | | |
| Monthly requests | 10k | 100k | 1M | Custom |
| Projects | 1 | 5 | Unlimited | Unlimited |
| Provider credentials | 1 | Unlimited | Unlimited | Unlimited |
| API keys | 2 | Unlimited | Unlimited | Unlimited |
| Routing | | | | |
| Single model | ✓ | ✓ | ✓ | ✓ |
| Fallback chains | ✓ | ✓ | ✓ | ✓ |
| Cost-optimized | — | ✓ | ✓ | ✓ |
| Latency-optimized | — | ✓ | ✓ | ✓ |
| Ensembles | — | ✓ | ✓ | ✓ |
| A/B tests | — | ✓ | ✓ | ✓ |
| Cascade | — | ✓ | ✓ | ✓ |
| Custom rule DSL | — | — | ✓ | ✓ |
| Control plane | | | | |
| Budgets & caps | — | ✓ | ✓ | ✓ |
| Replay simulator | — | — | ✓ | ✓ |
| Decision traces | ✓ | ✓ | ✓ | ✓ |
| Webhooks | — | ✓ | ✓ | ✓ |
| Audit logs | — | — | — | ✓ |
| Observability | | | | |
| Log retention | 7 days | 30 days | 60 days | 90+ days |
| Request analytics | ✓ | ✓ | ✓ | ✓ |
| Latency percentiles | ✓ | ✓ | ✓ | ✓ |
| Cost forecasting | — | ✓ | ✓ | ✓ |
| Warehouse export | — | — | — | ✓ |
| Reliability & performance | | | | |
| Multi-provider failover | ✓ | ✓ | ✓ | ✓ |
| Health-aware routing | ✓ | ✓ | ✓ | ✓ |
| Per-attempt timeouts & retries | ✓ | ✓ | ✓ | ✓ |
| Uptime target | Best-effort | 99.9% | 99.9% | 99.95% |
| Dedicated capacity | — | — | — | ✓ |
| Security & support | | | | |
| Team management | — | — | ✓ | ✓ |
| SSO / SAML | — | — | — | ✓ |
| IP allowlists | — | — | — | ✓ |
| Support | Community | Email | Priority | Dedicated |
| Contractual SLA | — | — | — | ✓ |
Questions you might have
Why flat tiers instead of per-token pricing?
Predictable cost. You already pay providers per token; stacking a per-token fee on top is double taxation. We charge a flat subscription so you always know what you'll pay. It also means we want you to send more traffic through Modelux, not less (more routing data = better decisions).
Do I pay Modelux for the LLM calls?
No. Modelux proxies your requests using your own provider credentials (BYO keys). You pay OpenAI, Anthropic, etc. directly. Modelux charges only for the control plane.
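A hedged sketch of what BYO-key proxying implies for your app's requests. The endpoint URL, header, and field names below are illustrative assumptions, not Modelux's documented API: the idea is that only your Modelux key travels with the request, while the provider key stays in a stored credential that the request references.

```python
# Illustrative sketch of a BYO-key proxied request. The URL, header, and
# field names are assumptions for illustration, not Modelux's actual API.
def build_proxy_request(prompt: str, modelux_key: str, credential_id: str) -> dict:
    return {
        "url": "https://api.modelux.example/v1/chat",  # hypothetical endpoint
        "headers": {
            # Only the Modelux control-plane key is sent with the request;
            # your OpenAI/Anthropic key never leaves the stored credential.
            "Authorization": f"Bearer {modelux_key}",
        },
        "json": {
            "credential": credential_id,  # reference to your stored provider key
            "messages": [{"role": "user", "content": prompt}],
        },
    }

req = build_proxy_request("Summarize this doc.", "mlx_live_abc", "cred_openai_prod")
```

Because the provider is billed against your stored credential, the LLM invoice stays between you and the provider.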
What happens if I exceed my tier's request limit?
Soft limit by default: service continues uninterrupted. Once you exceed your tier's limit by more than a 10% grace buffer, you get an email and a dashboard banner suggesting an upgrade. No overage charges. Enterprise customers can configure hard limits if needed.
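The soft-limit policy can be sketched as follows (the function and the status labels are ours, for illustration; only the limit-plus-10%-grace behavior comes from the answer above):

```python
def usage_status(used: int, tier_limit: int, grace: float = 0.10) -> str:
    """Soft-limit check: service is never cut off, only nudged."""
    if used <= tier_limit:
        return "ok"
    if used <= tier_limit * (1 + grace):
        return "grace"   # over the limit but inside the 10% buffer: no nudge yet
    return "nudge"       # email + dashboard banner suggesting an upgrade
```

On the Free tier (10k requests), 10,500 requests is still inside the grace buffer; 12,000 triggers the nudge, but requests keep flowing either way.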
Can I self-host?
Not today. Modelux is managed SaaS. If you need on-prem or VPC deployment, talk to us about Enterprise — we're evaluating dedicated deployments on a case-by-case basis.
How much can I save with smart routing?
Example: a team spending $10k/month on GPT-4o typically saves $4-5k/month by routing 60% of traffic to GPT-4o-mini for simpler queries. Ensembles of smaller models can match frontier-model quality at 20% of the cost. Modelux pays for itself many times over.
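Rough arithmetic behind the headline number. The 0.25 blended mini-to-4o cost ratio is our assumption for illustration, not a quoted price; plug in your own ratio:

```python
monthly_spend = 10_000    # $/month currently spent on GPT-4o
routed_share = 0.60       # fraction of traffic simple enough for GPT-4o-mini
mini_cost_ratio = 0.25    # assumed blended mini/4o cost ratio (our estimate)

# Savings = spend moved to mini, minus what mini still costs.
savings = monthly_spend * routed_share * (1 - mini_cost_ratio)
print(f"${savings:,.0f}/month saved")  # lands inside the $4-5k range above
```

Even with a conservative cost ratio, a majority-mini routing split recovers most of the original spend on those queries.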
Still have questions?
Talk to us