<!-- source: https://modelux.ai/docs/api/openai-batches -->

> /openai/v1/batches and /openai/v1/files — async batch processing at 50% upstream discount.

# Batches & files

modelux proxies the OpenAI Batches API (`/v1/batches/*`) and the
companion Files API (`/v1/files/*`) as thin, authenticated
passthroughs. OpenAI's batch input and results are file-based, so
both surfaces are needed for the end-to-end flow. OpenAI's 50% async
batch discount is applied upstream and passes through untouched.

```
# Files (used as batch input + carrier for batch results)
POST   /openai/v1/files                 multipart upload (purpose=batch)
GET    /openai/v1/files                 list (purpose / limit / order / after)
GET    /openai/v1/files/{id}            metadata
GET    /openai/v1/files/{id}/content    download bytes (results)
DELETE /openai/v1/files/{id}

# Batches
POST   /openai/v1/batches               create (references input_file_id)
GET    /openai/v1/batches               list (limit / after)
GET    /openai/v1/batches/{id}          retrieve
POST   /openai/v1/batches/{id}/cancel
```

## End-to-end walkthrough

The full lifecycle is upload → create → poll → download → (optional)
delete. The official OpenAI SDK works as a drop-in client; just point
`base_url` at modelux:

```python
import time

from openai import OpenAI

client = OpenAI(
    base_url="https://api.modelux.ai/openai/v1",
    api_key="mlx_sk_...",
)

# 1. Upload the JSONL batch input file.
input_file = client.files.create(
    file=open("requests.jsonl", "rb"),
    purpose="batch",
)

# 2. Create a batch referencing the file.
batch = client.batches.create(
    input_file_id=input_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
    metadata={"source": "nightly-rerank"},
)

# 3. Poll until done.
while True:
    b = client.batches.retrieve(batch.id)
    if b.status in ("completed", "failed", "expired", "cancelled"):
        break
    time.sleep(30)

# 4. Download the output file (a JSONL of responses keyed by custom_id).
if b.status == "completed":
    output = client.files.content(b.output_file_id).read()
```
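
Cancellation and cleanup use the same client. A short sketch, reusing
the `batch` and `input_file` objects from the walkthrough above:

```python
# Cancel a batch that hasn't finished yet. Status moves to "cancelling",
# then "cancelled"; sub-requests that already completed may still appear
# in the output file.
client.batches.cancel(batch.id)

# Optional cleanup once you've downloaded the results.
client.files.delete(input_file.id)
```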

## JSONL input format

Each line in the uploaded file is a single request to one of the
supported endpoints (chat completions, embeddings, and so on):

```json
{"custom_id":"row-1","method":"POST","url":"/v1/chat/completions","body":{"model":"gpt-4o-mini","messages":[{"role":"user","content":"Summarize: ..."}]}}
{"custom_id":"row-2","method":"POST","url":"/v1/chat/completions","body":{"model":"gpt-4o-mini","messages":[{"role":"user","content":"Classify: ..."}]}}
```

`custom_id` correlates each result back to the input row.
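
For reference, a minimal sketch that writes a `requests.jsonl` in this
shape (the prompts and `custom_id` scheme are illustrative):

```python
import json

rows = [
    {"custom_id": "row-1", "prompt": "Summarize: ..."},
    {"custom_id": "row-2", "prompt": "Classify: ..."},
]

with open("requests.jsonl", "w") as f:
    for row in rows:
        line = {
            "custom_id": row["custom_id"],
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "gpt-4o-mini",
                "messages": [{"role": "user", "content": row["prompt"]}],
            },
        }
        f.write(json.dumps(line) + "\n")
```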

## JSONL output format

Once the batch completes, `output_file_id` references a JSONL file
where each line is one response:

```json
{"id":"batch_req_...","custom_id":"row-1","response":{"status_code":200,"body":{...full chat completion...}}}
{"id":"batch_req_...","custom_id":"row-2","response":{"status_code":200,"body":{...full chat completion...}}}
```

Rows that fail are written to a separate `error_file_id`: same JSONL
shape, with `error.code` and `error.message` per row.
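
A common pattern is to index both files by `custom_id` and join the
results back to your input rows. A sketch, continuing from the
walkthrough's `client` and `b`:

```python
import json

def index_by_custom_id(raw: bytes) -> dict:
    """Index a batch result/error JSONL by custom_id."""
    return {
        row["custom_id"]: row
        for row in (json.loads(line) for line in raw.splitlines() if line.strip())
    }

results = index_by_custom_id(client.files.content(b.output_file_id).read())

errors = {}
if b.error_file_id:  # present only when some rows failed
    errors = index_by_custom_id(client.files.content(b.error_file_id).read())

for custom_id, row in results.items():
    body = row["response"]["body"]  # the full chat completion object
    print(custom_id, body["choices"][0]["message"]["content"])
```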

## Multipart uploads

POST `/openai/v1/files` accepts `multipart/form-data` exactly like
the upstream. The proxy preserves the boundary and forwards the body
unchanged. Required fields:

| Field | Value |
|---|---|
| `purpose` | `batch` (or `assistants`, `fine-tune`, `vision`, `user_data` per OpenAI's catalog) |
| `file` | the file part — JSONL for batch input |

The proxy buffers the upload body once before forwarding (OpenAI's
upload endpoint requires `Content-Length` and doesn't accept chunked
transfer encoding). The practical per-call limit is OpenAI's
documented file-size cap, currently 512 MB / 200M tokens for JSONL
batch files; the proxy doesn't enforce it locally, so oversize uploads
get the upstream's error response forwarded back.
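
Without the SDK, the upload is a plain multipart POST. A sketch using
`requests`, assuming the same bearer-token auth the SDK example above
uses:

```python
import requests

resp = requests.post(
    "https://api.modelux.ai/openai/v1/files",
    headers={"Authorization": "Bearer mlx_sk_..."},
    data={"purpose": "batch"},
    files={"file": ("requests.jsonl", open("requests.jsonl", "rb"))},
)
resp.raise_for_status()
file_id = resp.json()["id"]
```

`requests` assembles the multipart body in memory and sends a
`Content-Length` header, which fits the no-chunked-encoding constraint
above.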

## Streaming downloads

`GET /openai/v1/files/{id}/content` streams the upstream body
chunk-by-chunk via `io.Copy`; the proxy never buffers the file. Result
JSONLs of hundreds of MB stream through with no memory blow-up.
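
On the client side the same property is easy to keep by streaming the
download to disk. A sketch using the SDK's streaming-response helper
(file names are illustrative):

```python
# Stream the result file to disk without holding it all in memory.
with client.files.with_streaming_response.content(b.output_file_id) as response:
    with open("results.jsonl", "wb") as out:
        for chunk in response.iter_bytes():
            out.write(chunk)
```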

## BYOK

`X-Modelux-Provider-Key: sk-...` overrides the org's stored OpenAI
credential for this call. The header wins only for the upstream API
key; the base URL still comes from any stored credential, so
self-hosted OpenAI-compatible endpoints work with BYOK.
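
With the SDK, the header can be attached once per client via
`default_headers` (a sketch; the key values are placeholders):

```python
from openai import OpenAI

byok_client = OpenAI(
    base_url="https://api.modelux.ai/openai/v1",
    api_key="mlx_sk_...",  # modelux key: authenticates against the proxy
    default_headers={"X-Modelux-Provider-Key": "sk-..."},  # your OpenAI key for upstream
)

# Every call from this client now carries the override header.
batches = byok_client.batches.list(limit=10)
```

The SDK also accepts a per-call `extra_headers` argument if you only
want the override on specific requests.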

## Observability

Each endpoint logs to ClickHouse with a distinct `request_type`:

- `file_upload`
- `file_list`
- `file_retrieve`
- `file_download`
- `file_delete`
- `batch_create`
- `batch_list`
- `batch_retrieve`
- `batch_cancel`

Per-sub-request analytics inside a batch aren't expanded into
individual log rows; query the output JSONL directly for that
breakdown. The proxy captures `input_file_id` for create, the `id`
path parameter for retrieve/cancel/delete, and the upstream error
envelope (when one is returned) for failed calls.

## See also

- [Chat completions](/docs/api/chat-completions) — the synchronous OpenAI surface
- [Message batches (Anthropic)](/docs/api/anthropic-batches) — the Anthropic equivalent
- [Capability matrix](/docs/concepts/capability-matrix) — what's supported where
