<!-- source: https://modelux.ai/docs/guides/ab-testing -->

> Run controlled experiments to compare models in production.

# A/B testing models

A/B tests route a configurable percentage of traffic to each sub-config so
you can compare cost, latency, and quality in real production traffic.

## Why A/B test?

- Changing models is high-stakes, and public benchmarks rarely reflect your specific use case.
- Ensemble configs are especially tricky: aggregation behavior depends on your data distribution.
- Vendors' cost and latency claims rarely match the numbers you see in production.

## Create an A/B test

Define a wrapper config (referenced as `@experiment` below) with the `ab_test` strategy and a weight for each variant:

```json
{
  "strategy": "ab_test",
  "variants": [
    { "weight": 80, "config": "@production" },
    { "weight": 20, "config": "@production-candidate" }
  ]
}
```
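
Weights are relative: the config above sends roughly 80% of requests to `@production` and 20% to the candidate. Conceptually, weighted routing is just a weighted random choice per request. The sketch below is a minimal illustration of that idea, not Modelux's actual implementation:

```python
import random

def pick_variant(variants, rng=random):
    """Choose a config for one request, proportionally to its weight."""
    weights = [v["weight"] for v in variants]
    configs = [v["config"] for v in variants]
    return rng.choices(configs, weights=weights, k=1)[0]

variants = [
    {"weight": 80, "config": "@production"},
    {"weight": 20, "config": "@production-candidate"},
]

# With a seeded RNG, 10,000 simulated requests split roughly 8,000 / 2,000.
rng = random.Random(0)
counts = {"@production": 0, "@production-candidate": 0}
for _ in range(10_000):
    counts[pick_variant(variants, rng)] += 1
```

Because the choice is per-request and random, short-lived tests can deviate from the nominal split; the ratio converges as volume grows.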

Call the wrapper config from your app:

```python
client.chat.completions.create(
    model="@experiment",
    messages=[...],
)
```

Modelux records which variant served each request, so every metric can be broken down and compared per variant.

## Read the results

Go to **Analytics -> Compare variants**. Modelux shows each variant side by side:

- Request volume
- Mean cost per request
- p50 / p95 latency
- Error rate
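
If you export per-request logs, the same metrics are straightforward to compute yourself. A sketch assuming each record carries `variant`, `cost`, `latency_ms`, and `error` fields (the field names are illustrative, not Modelux's export schema):

```python
from collections import defaultdict

def percentile(sorted_values, p):
    """Nearest-rank percentile over a pre-sorted list."""
    if not sorted_values:
        return None
    k = round(p / 100 * (len(sorted_values) - 1))
    return sorted_values[k]

def summarize(logs):
    """Per-variant request volume, mean cost, p50/p95 latency, error rate."""
    by_variant = defaultdict(list)
    for rec in logs:
        by_variant[rec["variant"]].append(rec)
    summary = {}
    for variant, recs in by_variant.items():
        latencies = sorted(r["latency_ms"] for r in recs)
        summary[variant] = {
            "requests": len(recs),
            "mean_cost": sum(r["cost"] for r in recs) / len(recs),
            "p50_latency_ms": percentile(latencies, 50),
            "p95_latency_ms": percentile(latencies, 95),
            "error_rate": sum(r["error"] for r in recs) / len(recs),
        }
    return summary
```

Mean cost hides tail behavior, which is why the latency columns report p50 and p95 rather than an average.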

If you tag requests with a quality signal from your app (e.g., user
thumbs-up/down), the analytics can also compare quality metrics across
variants.
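
Before acting on a quality signal like thumbs-up rate, it's worth checking that the gap between variants isn't noise. A standard two-proportion z-test is one way to do that; this is a generic statistics sketch, independent of Modelux, with made-up counts:

```python
import math

def two_proportion_z(success_a, total_a, success_b, total_b):
    """z-statistic for the difference between two proportions (thumbs-up rates)."""
    p_a = success_a / total_a
    p_b = success_b / total_b
    p_pool = (success_a + success_b) / (total_a + total_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / total_a + 1 / total_b))
    return (p_a - p_b) / se

# e.g. 840/1000 thumbs-up on production vs 220/250 on the candidate
z = two_proportion_z(840, 1000, 220, 250)
# |z| > 1.96 would indicate a difference significant at the 5% level
```

With an 80/20 split, the candidate accumulates samples four times more slowly, so expect to wait correspondingly longer before the test has power.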

## Promote a variant

Once you've seen enough volume to be confident, promote the winner:

1. Go to **Simulations** or the routing config's versions view
2. Select the variant
3. Click **Promote**. Modelux switches all traffic to the winner atomically

## Replay before you A/B

If you want signal before sending real traffic, use the **replay simulator**:
it takes your last 24h of logged requests and runs them through the candidate
config, so you see the cost/latency diff without risking production quality.
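
A replay harness is also easy to sketch outside Modelux. The example below assumes you have a list of logged requests and a callable that runs one request through a config; `run_config` is a hypothetical placeholder, not a Modelux API:

```python
import time

def replay(requests, run_config):
    """Replay logged requests through a config; aggregate latency and cost.

    `run_config(request)` is a placeholder for however you invoke the
    candidate config; here it is assumed to return the request's cost.
    """
    latencies, total_cost = [], 0.0
    for req in requests:
        start = time.perf_counter()
        total_cost += run_config(req)
        latencies.append(time.perf_counter() - start)
    return {
        "requests": len(requests),
        "total_cost": total_cost,
        "mean_latency_s": sum(latencies) / len(latencies),
    }

def cost_diff(baseline, candidate):
    """Relative cost delta of the candidate replay vs the baseline replay."""
    return (candidate["total_cost"] - baseline["total_cost"]) / baseline["total_cost"]
```

Note that a replay measures cost and latency, not quality: the candidate's answers to replayed prompts still need human or automated review before you trust them.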
