What the Batch API actually is — and why the separate quota matters
The Batch API is OpenAI's asynchronous job system. You upload a JSONL file via the Files endpoint, create a Batch object pointing to that file, and OpenAI runs every request in the file within the 24-hour SLA window. Output comes back as another JSONL file you download. There is no streaming, no synchronous response, no per-request latency control — you trade real-time-ness for **50% off** both input and output token prices and access to a separate rate-limit pool.
**The separate-quota architecture is the point.** Real-time API calls consume your tier's RPM (requests per minute) and TPM (tokens per minute) budget — at Tier 3, that's a few hundred RPM and a few hundred thousand TPM on gpt-5.5. Submitting a 10-million-token batch consumes zero RPM and zero TPM against your real-time budget. The batch sits in a separate queue, governed only by the per-tier enqueued-token cap. This means a single account can run real-time customer-facing traffic *and* a multi-million-token evaluation batch concurrently with no contention.
This is why teams that have not unlocked Tier 5 yet can still process workloads at Tier-5-equivalent throughput — as long as the workload is asynchronous. Training-set generation, weekly classification at scale, large-scale evaluations, document summarization across an entire corpus, dense-retrieval embedding precompute — all of these belong in Batch, not in the real-time API, regardless of your tier.