Batches & files
modelux proxies the OpenAI Batches API (/v1/batches/*) and the
companion Files API (/v1/files/*) as authenticated thin
passthroughs. OpenAI’s batch inputs and results are file-based, so both
surfaces are needed for the end-to-end flow. OpenAI’s 50% async batch
discount is applied upstream and passes through unchanged.
# Files (used as batch input + carrier for batch results)
POST   /openai/v1/files                  multipart upload (purpose=batch)
GET    /openai/v1/files                  list (purpose / limit / order / after)
GET    /openai/v1/files/{id}             metadata
GET    /openai/v1/files/{id}/content     download bytes (results)
DELETE /openai/v1/files/{id}

# Batches
POST   /openai/v1/batches                create (references input_file_id)
GET    /openai/v1/batches                list (limit / after)
GET    /openai/v1/batches/{id}           retrieve
POST   /openai/v1/batches/{id}/cancel
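The list routes take cursor parameters (limit / after); with the SDK client from the
walkthrough below, auto-pagination walks the cursor for you, e.g.:

for b in client.batches.list(limit=100):  # the SDK auto-paginates across the after cursor
    print(b.id, b.status)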
End-to-end walkthrough
The full lifecycle is upload → create → poll → download → (optional)
delete. The official OpenAI SDK works as a drop-in client; just point
base_url at modelux:
from openai import OpenAI

client = OpenAI(
    base_url="https://api.modelux.ai/openai/v1",
    api_key="mlx_sk_...",
)

# 1. Upload the JSONL batch input file.
input_file = client.files.create(
    file=open("requests.jsonl", "rb"),
    purpose="batch",
)

# 2. Create a batch referencing the file.
batch = client.batches.create(
    input_file_id=input_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
    metadata={"source": "nightly-rerank"},
)

# 3. Poll until done.
import time

while True:
    b = client.batches.retrieve(batch.id)
    if b.status in ("completed", "failed", "expired", "cancelled"):
        break
    time.sleep(30)

# 4. Download the output file (a JSONL of responses keyed by custom_id).
if b.status == "completed":
    output = client.files.content(b.output_file_id).read()
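The optional final step in the lifecycle removes the uploaded input file once the
results are stored elsewhere; a minimal continuation of the walkthrough:

# 5. (Optional) Delete the input file after the results are saved elsewhere.
client.files.delete(input_file.id)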
JSONL input format
Each line in the upload file is one chat-completions (or embeddings, or other supported endpoint) request:
{"custom_id":"row-1","method":"POST","url":"/v1/chat/completions","body":{"model":"gpt-4o-mini","messages":[{"role":"user","content":"Summarize: ..."}]}}
{"custom_id":"row-2","method":"POST","url":"/v1/chat/completions","body":{"model":"gpt-4o-mini","messages":[{"role":"user","content":"Classify: ..."}]}}
custom_id correlates each result back to the input row.
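A small sketch of producing that file from a dict of prompts (row ids, model, and
prompt texts here are placeholders):

import json

prompts = {"row-1": "Summarize: ...", "row-2": "Classify: ..."}

with open("requests.jsonl", "w") as f:
    for custom_id, prompt in prompts.items():
        row = {
            "custom_id": custom_id,
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "gpt-4o-mini",
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        f.write(json.dumps(row) + "\n")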
JSONL output format
Once the batch completes, output_file_id references a JSONL file
where each line is one response:
{"id":"batch_req_...","custom_id":"row-1","response":{"status_code":200,"body":{...full chat completion...}}}
{"id":"batch_req_...","custom_id":"row-2","response":{"status_code":200,"body":{...full chat completion...}}}
Rows that fail are written to a separate error_file_id with the same JSONL
shape, carrying error.code and error.message per row.
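One way to index results back onto input rows, assuming output holds the bytes
downloaded in step 4 of the walkthrough:

import json

results = {}
for line in output.decode("utf-8").splitlines():
    row = json.loads(line)
    if row["response"]["status_code"] == 200:
        # The body is a full chat completion; keep the first choice's text.
        body = row["response"]["body"]
        results[row["custom_id"]] = body["choices"][0]["message"]["content"]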
Multipart uploads
POST /openai/v1/files accepts multipart/form-data exactly like
the upstream. The proxy preserves the boundary and forwards the body
unchanged. Required fields:
| Field | Value |
|---|---|
| purpose | batch (or assistants, fine-tune, vision, user_data per OpenAI’s catalog) |
| file | the file part (JSONL for batch input) |
The proxy buffers the upload body once before forwarding (OpenAI’s upload
endpoint requires Content-Length and doesn’t accept chunked transfer
encoding). The practical per-call limit is OpenAI’s documented file-size cap,
currently 512 MB / 200M tokens for JSONL batch files; the proxy doesn’t
enforce it locally, so oversize uploads get the upstream’s error response
forwarded back.
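For clients not using the SDK, a raw multipart upload against the proxy looks
roughly like this (a sketch using the third-party requests package; the key is a
placeholder):

import requests

resp = requests.post(
    "https://api.modelux.ai/openai/v1/files",
    headers={"Authorization": "Bearer mlx_sk_..."},
    data={"purpose": "batch"},  # required form field
    files={"file": ("requests.jsonl", open("requests.jsonl", "rb"))},  # JSONL file part
)
resp.raise_for_status()
file_id = resp.json()["id"]  # file id to pass as input_file_id when creating the batch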
Streaming downloads
GET /openai/v1/files/{id}/content streams the upstream body chunk-by-chunk
via io.Copy; the proxy never buffers the file. Result JSONLs of hundreds of
MB stream through without blowing up memory.
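To avoid holding a large result file in client memory as well, a sketch that streams
it straight to disk with the requests package (file_id and the key are placeholders):

import requests

url = f"https://api.modelux.ai/openai/v1/files/{file_id}/content"
with requests.get(url, headers={"Authorization": "Bearer mlx_sk_..."}, stream=True) as resp:
    resp.raise_for_status()
    with open("results.jsonl", "wb") as out:
        for chunk in resp.iter_content(chunk_size=1 << 20):  # 1 MiB chunks
            out.write(chunk)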
BYOK
X-Modelux-Provider-Key: sk-... overrides the org’s stored OpenAI credential
for this call. The override applies only to the upstream API key; the base
URL still comes from any stored credential, so self-hosted OpenAI-compatible
endpoints work with BYOK.
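With the OpenAI SDK, the header can be attached once per client via
default_headers (a standard constructor argument); both keys below are placeholders:

from openai import OpenAI

byok_client = OpenAI(
    base_url="https://api.modelux.ai/openai/v1",
    api_key="mlx_sk_...",                                  # authenticates to modelux
    default_headers={"X-Modelux-Provider-Key": "sk-..."},  # your own OpenAI key for upstream
)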
Observability
Each endpoint logs to ClickHouse with a distinct request_type:
file_upload, file_list, file_retrieve, file_download, file_delete,
batch_create, batch_list, batch_retrieve, batch_cancel
Sub-requests inside a batch aren’t expanded into individual log rows; query
the output JSONL directly for that breakdown. The proxy captures
input_file_id on create, the id path parameter on retrieve/cancel/delete, and
the upstream error envelope (when one is returned) on failed calls.
See also
- Chat completions — the synchronous OpenAI surface
- Message batches (Anthropic) — the Anthropic equivalent
- Capability matrix — what’s supported where