Batches & files
modelux proxies the OpenAI Batches API (/v1/batches/*) and the
companion Files API (/v1/files/*) as authenticated thin
passthroughs. OpenAI’s batch inputs and results are file-based, so both
surfaces are needed for the end-to-end flow. OpenAI’s 50% async batch
discount is applied upstream and passes through unchanged.
# Files (used as batch input + carrier for batch results)
POST   /openai/v1/files                  multipart upload (purpose=batch)
GET    /openai/v1/files                  list (purpose / limit / order / after)
GET    /openai/v1/files/{id}             metadata
GET    /openai/v1/files/{id}/content     download bytes (results)
DELETE /openai/v1/files/{id}

# Batches
POST   /openai/v1/batches                create (references input_file_id)
GET    /openai/v1/batches                list (limit / after)
GET    /openai/v1/batches/{id}           retrieve
POST   /openai/v1/batches/{id}/cancel
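The list routes take cursor parameters (limit / after); with the SDK client from the
walkthrough below, auto-pagination walks the cursor for you, e.g.:

for b in client.batches.list(limit=100):  # the SDK auto-paginates across the after cursor
    print(b.id, b.status)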
End-to-end walkthrough
The full lifecycle is upload → create → poll → download → (optional)
delete. The official OpenAI SDK works as a drop-in client; just point
base_url at modelux:
from openai import OpenAI

client = OpenAI(
    base_url="https://api.modelux.ai/openai/v1",
    api_key="mlx_sk_...",
)

# 1. Upload the JSONL batch input file.
input_file = client.files.create(
    file=open("requests.jsonl", "rb"),
    purpose="batch",
)

# 2. Create a batch referencing the file.
batch = client.batches.create(
    input_file_id=input_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
    metadata={"source": "nightly-rerank"},
)

# 3. Poll until done.
import time

while True:
    b = client.batches.retrieve(batch.id)
    if b.status in ("completed", "failed", "expired", "cancelled"):
        break
    time.sleep(30)

# 4. Download the output file (a JSONL of responses keyed by custom_id).
if b.status == "completed":
    output = client.files.content(b.output_file_id).read()
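The optional final step in the lifecycle removes the uploaded input file once the
results are stored elsewhere; a minimal continuation of the walkthrough:

# 5. (Optional) Delete the input file after the results are saved elsewhere.
client.files.delete(input_file.id)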
JSONL input format
Each line in the upload file is one chat-completions (or embeddings, or other supported endpoint) request:
{"custom_id":"row-1","method":"POST","url":"/v1/chat/completions","body":{"model":"gpt-4o-mini","messages":[{"role":"user","content":"Summarize: ..."}]}}
{"custom_id":"row-2","method":"POST","url":"/v1/chat/completions","body":{"model":"gpt-4o-mini","messages":[{"role":"user","content":"Classify: ..."}]}}
custom_id correlates each result back to the input row.
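A small sketch of producing that file from a dict of prompts (row ids, model, and
prompt texts here are placeholders):

import json

prompts = {"row-1": "Summarize: ...", "row-2": "Classify: ..."}

with open("requests.jsonl", "w") as f:
    for custom_id, prompt in prompts.items():
        row = {
            "custom_id": custom_id,
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "gpt-4o-mini",
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        f.write(json.dumps(row) + "\n")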
JSONL output format
Once the batch completes, output_file_id references a JSONL file
where each line is one response:
{"id":"batch_req_...","custom_id":"row-1","response":{"status_code":200,"body":{...full chat completion...}}}
{"id":"batch_req_...","custom_id":"row-2","response":{"status_code":200,"body":{...full chat completion...}}}
Rows that fail are written to a separate error_file_id with the same JSONL
shape, carrying error.code and error.message per row.
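One way to index results back onto input rows, assuming output holds the bytes
downloaded in step 4 of the walkthrough:

import json

results = {}
for line in output.decode("utf-8").splitlines():
    row = json.loads(line)
    if row["response"]["status_code"] == 200:
        # The body is a full chat completion; keep the first choice's text.
        body = row["response"]["body"]
        results[row["custom_id"]] = body["choices"][0]["message"]["content"]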
Multipart uploads
POST /openai/v1/files accepts multipart/form-data exactly like
the upstream. The proxy preserves the boundary and forwards the body
unchanged. Required fields:
| Field | Value |
|---|---|
| purpose | batch (or assistants, fine-tune, vision, user_data per OpenAI’s catalog) |
| file | the file part (JSONL for batch input) |
The proxy buffers the upload body once before forwarding (OpenAI’s upload
endpoint requires Content-Length and doesn’t accept chunked transfer
encoding). The practical per-call limit is OpenAI’s documented file-size cap,
currently 512 MB / 200M tokens for JSONL batch files; the proxy doesn’t
enforce it locally, so oversize uploads get the upstream’s error response
forwarded back.
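For clients not using the SDK, a raw multipart upload against the proxy looks
roughly like this (a sketch using the third-party requests package; the key is a
placeholder):

import requests

resp = requests.post(
    "https://api.modelux.ai/openai/v1/files",
    headers={"Authorization": "Bearer mlx_sk_..."},
    data={"purpose": "batch"},  # required form field
    files={"file": ("requests.jsonl", open("requests.jsonl", "rb"))},  # JSONL file part
)
resp.raise_for_status()
file_id = resp.json()["id"]  # file id to pass as input_file_id when creating the batch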
Streaming downloads
GET /openai/v1/files/{id}/content streams the upstream body chunk-by-chunk
via io.Copy; the proxy never buffers the file. Result JSONLs of hundreds of
MB stream through without blowing up memory.
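To avoid holding a large result file in client memory as well, a sketch that streams
it straight to disk with the requests package (file_id and the key are placeholders):

import requests

url = f"https://api.modelux.ai/openai/v1/files/{file_id}/content"
with requests.get(url, headers={"Authorization": "Bearer mlx_sk_..."}, stream=True) as resp:
    resp.raise_for_status()
    with open("results.jsonl", "wb") as out:
        for chunk in resp.iter_content(chunk_size=1 << 20):  # 1 MiB chunks
            out.write(chunk)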
BYOK
X-Modelux-Provider-Key: sk-... overrides the org’s stored OpenAI credential
for this call. The override applies only to the upstream API key; the base
URL still comes from any stored credential, so self-hosted OpenAI-compatible
endpoints work with BYOK.
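With the OpenAI SDK, the header can be attached once per client via
default_headers (a standard constructor argument); both keys below are placeholders:

from openai import OpenAI

byok_client = OpenAI(
    base_url="https://api.modelux.ai/openai/v1",
    api_key="mlx_sk_...",                                  # authenticates to modelux
    default_headers={"X-Modelux-Provider-Key": "sk-..."},  # your own OpenAI key for upstream
)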
Observability
Each endpoint logs to ClickHouse with a distinct request_type:
file_upload, file_list, file_retrieve, file_download, file_delete,
batch_create, batch_list, batch_retrieve, batch_cancel
Sub-requests inside a batch aren’t expanded into individual log rows; query
the output JSONL directly for that breakdown. The proxy captures
input_file_id on create, the id path parameter on retrieve/cancel/delete, and
the upstream error envelope (when one is returned) on failed calls.
See also
- Chat completions — the synchronous OpenAI surface
- Message batches (Anthropic) — the Anthropic equivalent
- Capability matrix — what’s supported where