By The DDH Team · Digital Dashboard Hub

OpenAI Batch API Limits 2026: Per-Tier Enqueued Tokens, 200MB Files, 24h SLA

Updated June 20, 2026

The Batch API is OpenAI's asynchronous job runner. You upload a JSONL file containing up to **50,000 requests**, OpenAI completes them within **24 hours** (most batches finish in 1-4), and you pay **50% less** on both input and output tokens than the synchronous API. The single most-misunderstood point: **Batch runs on its own quota system, separate from real-time RPM and TPM**. Submitting a 10-million-token batch does not consume any of your real-time rate-limit budget. This is the architectural reason Batch exists.

Three hard limits never change with tier. **200MB max file size** per JSONL upload. **50,000 max requests** per batch file. **24-hour completion SLA** (OpenAI's stated ceiling — typical completion is 1-4 hours). One limit *does* scale with tier: the **enqueued-tokens cap** — the maximum number of input tokens you can have sitting in the queue across all in-flight batches at once for a given model. This is the binding constraint for high-volume teams.

Below: the canonical per-tier enqueued-token table, the JSONL request format with the `custom_id` matching pattern, the partial-completion behavior (failed rows return errors, the rest still complete and bill normally), the Batch-vs-real-time decision tree, and the stacked discount when you combine Batch with Cached Inputs. For the underlying tier ladder see OpenAI Tier 5 unlock requirements; for cost modeling, our OpenAI API cost calculator factors in the 50% Batch discount across the gpt-5.5 family.

OpenAI Batch API per-tier limits — June 2026

| Tier | Max enqueued tokens (gpt-5.5) | Per-batch request cap | Max file size |
| --- | --- | --- | --- |
| Tier 1 | ~200,000 | 50,000 requests | 200 MB |
| Tier 2 | ~2,000,000 | 50,000 requests | 200 MB |
| Tier 3 | ~20,000,000 | 50,000 requests | 200 MB |
| Tier 4 | ~500,000,000 | 50,000 requests | 200 MB |
| Tier 5 | ~5,000,000,000 | 50,000 requests | 200 MB |

Source, as of June 2026: OpenAI Batch API documentation (https://developers.openai.com/api/docs/guides/batch) and OpenAI rate-limits guide (https://developers.openai.com/api/docs/guides/rate-limits). The per-batch request cap (50,000) and max file size (200MB) are stated explicitly on the Batch API page. OpenAI acknowledges that the enqueued-token cap scales with tier, but the exact per-tier values are not enumerated in the public docs; they are visible only on your live account limits page at platform.openai.com/account/limits. The values above are indicative scaling tiers derived from OpenAI's documented pattern (each tier roughly 10x the prior, except for a larger jump of about 25x between Tier 3 and Tier 4 in this table). Always verify against your live account page before sizing a large batch.

What the Batch API actually is — and why the separate quota matters

The Batch API is OpenAI's asynchronous job system. You upload a JSONL file via the Files endpoint, create a Batch object pointing to that file, and OpenAI runs every request in the file within the 24-hour SLA window. Output comes back as another JSONL file you download. There is no streaming, no synchronous response, no per-request latency control — you trade real-time-ness for **50% off** both input and output token prices and access to a separate rate-limit pool.

**The separate-quota architecture is the point.** Real-time API calls consume your tier's RPM (requests per minute) and TPM (tokens per minute) budget — at Tier 3, that's a few hundred RPM and a few hundred thousand TPM on gpt-5.5. Submitting a 10-million-token batch consumes zero RPM and zero TPM against your real-time budget. The batch sits in a separate queue, governed only by the per-tier enqueued-token cap. This means a single account can run real-time customer-facing traffic *and* a multi-million-token evaluation batch concurrently with no contention.

This is why teams that have not unlocked Tier 5 yet can still process workloads at Tier-5-equivalent throughput — as long as the workload is asynchronous. Training-set generation, weekly classification at scale, large-scale evaluations, document summarization across an entire corpus, dense-retrieval embedding precompute — all of these belong in Batch, not in the real-time API, regardless of your tier.

The JSONL request format and the custom_id matching pattern

Every Batch request file is JSONL — one JSON object per line, no array wrapping. Each line has four required fields: `custom_id` (your unique identifier for this row), `method` (always `POST`), `url` (the endpoint path, e.g. `/v1/chat/completions`), and `body` (the request body identical to what you'd send to the real-time API).

A minimal Chat Completions batch row looks like: `{"custom_id":"row-001","method":"POST","url":"/v1/chat/completions","body":{"model":"gpt-5.5","messages":[{"role":"user","content":"Summarize the following..."}]}}`. Repeat that pattern up to 50,000 times in a single .jsonl file, upload via the Files API with `purpose="batch"`, then create the Batch with `endpoint="/v1/chat/completions"` and `completion_window="24h"`.
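As a rough illustration of the row format described above, the following sketch builds JSONL lines with the four required fields. The helper names `build_batch_lines` and `write_batch_file`, the ID prefix, and the default model string are illustrative assumptions, not part of any OpenAI library.

```python
import json


def build_batch_lines(prompts, model="gpt-5.5", id_prefix="2026-06-20_doc_"):
    """Return one JSONL line per prompt, each with a unique, sortable custom_id."""
    lines = []
    for i, prompt in enumerate(prompts, start=1):
        row = {
            "custom_id": f"{id_prefix}{i:05d}",  # zero-padded so IDs sort in input order
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": model,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        # json.dumps escapes embedded newlines, so each row stays on one line.
        lines.append(json.dumps(row, ensure_ascii=False))
    return lines


def write_batch_file(path, lines):
    """Write the rows as JSONL: one object per line, no array wrapping."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
```

Calling `write_batch_file("requests.jsonl", build_batch_lines(["Summarize A", "Summarize B"]))` would produce a two-line file ready for upload.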

**The `custom_id` is load-bearing.** Output JSONL row order is **not** guaranteed to match input order. You must use the `custom_id` to match each output back to its input row. Best practice: use a deterministic, sortable ID scheme (e.g. `2026-06-20_doc_00001`) that lets you reconstruct order at parse time. Avoid using primary keys that contain PII directly — the custom_id appears in OpenAI's logs and your output file.
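One way to do the matching is a dictionary join keyed on `custom_id`. This is a minimal sketch; `join_by_custom_id` is a hypothetical helper, and it assumes only that every input and output line carries a `custom_id` field, as described above.

```python
import json


def join_by_custom_id(input_lines, output_lines):
    """Pair each input row with its output row, in input order.

    Returns a list of (custom_id, request, result) tuples; result is None
    when no output row exists for that custom_id.
    """
    results = {}
    for line in output_lines:
        if line.strip():
            record = json.loads(line)
            results[record["custom_id"]] = record

    joined = []
    for line in input_lines:
        if line.strip():
            request = json.loads(line)
            cid = request["custom_id"]
            joined.append((cid, request, results.get(cid)))
    return joined
```

Because the join walks the input file, the original row order is recovered no matter how the output file is ordered.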

**Supported endpoints**: `/v1/chat/completions`, `/v1/completions`, `/v1/embeddings`, `/v1/responses`, `/v1/moderations`, `/v1/images/generations`, `/v1/images/edits`, and `/v1/videos`. Not every model is enabled for Batch — verify on the model reference page before architecting a batch around a specific model. As of June 2026, the entire gpt-5.5 family (gpt-5.5-pro, gpt-5.5, gpt-5.4, gpt-5.4-mini, gpt-5.4-nano) and text-embedding-3-large support Batch.

Per-tier enqueued-token cap — the binding constraint at scale

Enqueued tokens are the **total input tokens summed across all your in-flight batches** for a given model. If you have three batches running on gpt-5.5 — one with 80,000 input tokens, one with 150,000, one with 50,000 — your current enqueued count is 280,000 tokens against your tier's cap. Submit a new batch and the API rejects it if it would push you over the cap.
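The headroom arithmetic can be sketched as below. `fits_under_cap` is a hypothetical helper; the cap in the usage note is the indicative Tier 2 figure from the table, not a verified account value.

```python
def fits_under_cap(in_flight_tokens, new_batch_tokens, cap):
    """Return (fits, headroom) for a prospective batch.

    in_flight_tokens: input-token counts of batches already queued for the model.
    headroom: tokens still available before the new batch is added.
    """
    enqueued = sum(in_flight_tokens)
    headroom = cap - enqueued
    return new_batch_tokens <= headroom, headroom
```

With the three batches above (80,000 + 150,000 + 50,000 = 280,000 enqueued) and an assumed 2,000,000-token cap, `fits_under_cap([80_000, 150_000, 50_000], 500_000, 2_000_000)` reports that a 500,000-token batch fits, with 1,720,000 tokens of headroom before submission.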

**At Tier 1**, the enqueued-tokens cap on gpt-5.5 is approximately **200,000 tokens** — enough for one moderately sized batch at a time. **At Tier 2**, it scales to roughly **2 million** enqueued tokens. **Tier 3** sits around **20 million**, **Tier 4** around **500 million**, and **Tier 5** breaks **5 billion+** enqueued tokens — effectively unlimited for any realistic single-team workload. The exact values vary per model (embeddings have higher caps; image models have lower caps) and OpenAI does not publish a per-tier-per-model matrix — your live caps appear on platform.openai.com/account/limits.

**The binding constraint is the cap, not the SLA.** A Tier 3 team running 24/7 batch workloads can submit one 20M-token batch, wait for it to finish (1-4 hours typically), submit the next. Effective sustained throughput at Tier 3 is therefore in the 100-500M-token-per-day range — sufficient for most production analytics workloads, insufficient for very large retraining or full-corpus embedding jobs. Plan capacity against the cap × turnover cycle, not against the theoretical 24-hour SLA.
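The cap-times-turnover estimate can be written as a one-line calculation. This sketch assumes one full-cap batch in flight at a time and back-to-back submission with no idle gaps; `daily_throughput` is an illustrative name.

```python
def daily_throughput(cap_tokens, turnover_hours):
    """Tokens processed per day if a full-cap batch is resubmitted every cycle."""
    cycles_per_day = 24 / turnover_hours
    return cap_tokens * cycles_per_day


# Indicative Tier 3 cap of 20M tokens at the slow and fast ends of the typical range.
slow = daily_throughput(20_000_000, 4)  # 6 cycles/day  -> 120M tokens
fast = daily_throughput(20_000_000, 1)  # 24 cycles/day -> 480M tokens
```

Those two endpoints, 120M and 480M tokens per day, are where the 100-500M range above comes from.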

If you hit the enqueued-token ceiling repeatedly, two paths: (1) **promote your tier** via the Tier 5 unlock path (Tier 5's effectively-unlimited cap solves this at the source); (2) **chunk smaller** — split a 20M-token batch into four 5M-token batches and submit as headroom opens up. Chunking lets you stay closer to your cap continuously instead of waiting for a single large batch to drain.
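A simple greedy packer illustrates the chunking path. It assumes you already have a per-row input-token estimate (how you estimate is up to you); `chunk_by_tokens` is a hypothetical helper name.

```python
def chunk_by_tokens(rows, max_tokens_per_chunk):
    """Greedily pack rows into chunks that each stay under a token budget.

    rows: iterable of (custom_id, estimated_input_tokens) pairs.
    Returns a list of chunks, each a list of custom_ids in original order.
    """
    chunks, current, current_tokens = [], [], 0
    for custom_id, tokens in rows:
        if tokens > max_tokens_per_chunk:
            raise ValueError(f"{custom_id} alone exceeds the chunk budget")
        # Close the current chunk when this row would push it over budget.
        if current and current_tokens + tokens > max_tokens_per_chunk:
            chunks.append(current)
            current, current_tokens = [], 0
        current.append(custom_id)
        current_tokens += tokens
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be written to its own JSONL file and submitted whenever the account's headroom covers it.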

The 200MB file size and 50,000-request-per-batch hard limits

Two hard limits apply at the *file* level, not the tier level: **200MB max upload size** for the JSONL input file, and **50,000 max requests** per batch file. Both apply at Tier 1 the same as at Tier 5 — these do not scale with tier.

The 50,000 request cap is reached first for most workloads. At an average 4KB per request (typical for chat completions with a 500-token system prompt and a 300-token user message), 50,000 requests is about 200MB — both limits hit near-simultaneously. For embedding workloads with short inputs (~500 bytes per request), you'll hit 50,000 requests at ~25MB, well under the file size limit. **Embeddings batches have a separate cap**: maximum 50,000 embedding *inputs* across all requests in a batch — relevant if you're batching multiple text inputs per request.

Splitting a 200,000-request workload is therefore an **architecture decision**, not an optimization. Use four sequential batch files of 50,000 each, or four parallel batch files (subject to your enqueued-token cap). Parallel runs faster but consumes more of your cap concurrently; sequential is slower but cheaper in cap terms.
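Splitting against both file-level limits at once might look like the sketch below. `split_into_files` is a hypothetical helper, and treating 200MB as 200,000,000 bytes is a conservative assumption, since the source does not say whether the limit is decimal or binary megabytes.

```python
MAX_REQUESTS = 50_000
MAX_BYTES = 200_000_000  # assumption: decimal megabytes, the stricter reading of "200MB"


def split_into_files(lines, max_requests=MAX_REQUESTS, max_bytes=MAX_BYTES):
    """Group JSONL lines into files that respect both the request and size limits."""
    files, current, size = [], [], 0
    for line in lines:
        line_bytes = len(line.encode("utf-8")) + 1  # +1 for the trailing newline
        over_count = len(current) >= max_requests
        over_size = size + line_bytes > max_bytes
        if current and (over_count or over_size):
            files.append(current)
            current, size = [], 0
        current.append(line)
        size += line_bytes
    if current:
        files.append(current)
    return files
```

A 200,000-line workload of small rows comes back as four groups of 50,000, which can be submitted sequentially or in parallel as discussed above.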

**Batch creation rate**: OpenAI also enforces an account-level limit of approximately **2,000 batches created per hour**. This rarely matters for traditional batch workloads (you create a few large batches per day, not thousands of small ones) but does matter for teams using Batch for fan-out work: splitting a 1M-row job into several thousand small batches within an hour will hit this ceiling. Aggregate into larger files instead.

The 24-hour SLA reality: typical completion times by job size

The Batch API's documented completion window is **24 hours**. The practical reality is that most batches complete substantially faster — typical completion times based on June 2026 production data:

**Small batches (<10,000 requests, <5M tokens)**: usually 15 minutes to 2 hours. **Medium batches (10,000-30,000 requests, 5-50M tokens)**: typically 1-6 hours. **Large batches (30,000-50,000 requests, 50-200M tokens)**: 4-18 hours, occasionally bumping against the 24h ceiling. **Multi-batch fleets at high tiers** (running 100M+ enqueued tokens concurrently): individual batches still finish in single-digit hours, but the *total* fleet drain time stretches with your cap utilization.

**Completion time is not contractually guaranteed below 24 hours** — OpenAI's only commitment is 'within 24 hours.' Production teams should design downstream consumers for the 24h ceiling, not the typical 1-4h reality. Setting a Slack alert when a batch crosses 6 hours catches the long-tail cases without false-positive noise.

Batches that don't complete within 24 hours move to `expired` state. **You still get the output from completed requests** — the partial output JSONL file is available via the `output_file_id`, and you only pay for tokens from completed requests. Expired requests do not bill. This makes Batch safe to retry: re-submit only the requests that didn't complete by matching custom_ids against the partial output.

The 50% discount applied to input + output (and how it stacks)

The headline economic feature: **Batch API charges 50% of the synchronous API price on both input and output tokens**. There is no exception, no minimum job size, no tier-gated stratification — every batch on every model bills at half rate.

Worked example on gpt-5.5: synchronous pricing (per OpenAI's docs, June 2026) is roughly $1.25/1M input and $10/1M output. Batch pricing is therefore **$0.625/1M input and $5/1M output**. A 10M-token input + 2M-token output workload costs **$32.50** on the real-time API ($12.50 input + $20 output) vs **$16.25** via Batch ($6.25 input + $10 output). The 50% saving applies symmetrically.
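The same arithmetic as a small function, for checking other workload sizes. The prices are the approximate gpt-5.5 figures quoted above; `cost_usd` is an illustrative helper, not an official calculator.

```python
def cost_usd(input_tokens, output_tokens, input_price, output_price, batch=False):
    """Cost in dollars; prices are per 1M tokens, and Batch bills at half rate."""
    multiplier = 0.5 if batch else 1.0
    input_cost = input_tokens / 1_000_000 * input_price
    output_cost = output_tokens / 1_000_000 * output_price
    return multiplier * (input_cost + output_cost)


# 10M input + 2M output tokens at $1.25 / $10 per 1M.
sync_cost = cost_usd(10_000_000, 2_000_000, 1.25, 10.0)               # 12.50 + 20.00
batch_cost = cost_usd(10_000_000, 2_000_000, 1.25, 10.0, batch=True)  # half of that
```

Here `sync_cost` comes to $32.50 and `batch_cost` to $16.25.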

**The discount stacks with Cached Inputs.** When the same system prompt or context block appears across many batch rows (a common pattern in evaluations and large-scale classification), Cached Inputs reduce the input price by an additional ~50-90% on the cached portion. Combined effect: a 90%-cached input block in Batch lands at roughly **5-10% of the synchronous-uncached price**, the cheapest production setup OpenAI offers. We cover the stacking math in our OpenAI API cost calculator.

**The discount does not stack with fine-tuned model premiums.** Fine-tuned versions of base models bill at their fine-tuned rate (typically 1.5-2x the base rate); Batch then applies 50% to that fine-tuned rate. Net Batch price on a fine-tuned model is therefore higher than Batch price on the base model — predictable, just not the doubled saving newcomers sometimes assume.

Partial completion and error handling: how to recover failed rows

**Batch failures are partial, not total.** When 5% of rows fail (malformed JSON, content-policy refusals, model errors, token-limit exceeded on a specific row), the other 95% still complete and bill normally. OpenAI splits the output into two files: `output_file_id` contains successful rows, `error_file_id` contains failed rows with error details.

Both files are JSONL with the original `custom_id` preserved on every line. To recover failed rows: download the error file, parse each error type, fix the underlying cause (truncate over-length inputs, sanitize unicode issues, rewrite content-policy-triggering prompts), and submit a smaller follow-up batch containing only the fix-ups. This is the canonical retry pattern — never re-submit the whole batch.
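Selecting the rows for the follow-up batch can be sketched as a set difference on `custom_id`. `select_retry_rows` is a hypothetical helper; it assumes the output file holds only successful rows, as described above, so anything absent from it (failed or expired) is a retry candidate. Fixing the underlying cause still has to happen before resubmission.

```python
import json


def select_retry_rows(input_lines, output_lines):
    """Return the input lines whose custom_id has no row in the output file."""
    completed = {
        json.loads(line)["custom_id"] for line in output_lines if line.strip()
    }
    return [
        line
        for line in input_lines
        if line.strip() and json.loads(line)["custom_id"] not in completed
    ]
```

The returned lines keep their original `custom_id` values, so results from the follow-up batch merge cleanly with the first run.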

**Common error types and fixes**: (1) `invalid_request_error` — body validation failed; check JSON formatting and required fields. (2) `context_length_exceeded` — input + max_completion_tokens exceeded the model's context window; truncate input or lower max_completion_tokens. (3) `content_policy_violation` — request triggered safety filter; rephrase or remove the row. (4) `rate_limit_exceeded` — almost never seen in Batch (separate quota), but possible during account-level batch-creation-rate throttling.

**You only pay for completed requests.** A batch with 50,000 requests where 2,500 fail bills 47,500 requests' worth of tokens. Errors are free. This makes the cost of conservatively designed batches very predictable: even with a 10% failure rate, your bill is roughly 90% of the no-failure estimate (exactly 90% only if the failed rows have the same average token count as the rest).

Batch vs real-time: the decision tree

The decision almost always comes down to **latency sensitivity, not cost sensitivity**. Use Batch when the consumer of the result can wait 1-24 hours. Use real-time when a user is waiting on a response or a downstream pipeline cannot tolerate hours of delay.

**Batch wins** for: training-data generation (synthesizing examples for fine-tuning), evaluation runs (scoring model outputs on benchmarks), large-scale classification (running a classifier across an entire corpus weekly), document summarization at scale (one-shot summaries for an archive), dense-retrieval embedding precompute (indexing a knowledge base), back-fill operations (regenerating tags on legacy content), and any analytics workload where freshness within 24h is acceptable.

**Real-time wins** for: interactive agents, chatbot responses, latency-sensitive completions in a UI, code-completion suggestions, real-time content moderation, voice-mode applications, and any case where sub-second response is part of the product experience.

**Hybrid is the common production pattern.** Most teams run both: real-time for the user-facing endpoint, Batch for the offline analytics + training-data pipelines. The cost split is typically 70-90% of total OpenAI spend in Batch and 10-30% in real-time. Teams that haven't done this split and are running everything synchronously usually have a 30-50% cost reduction available simply by moving the right workloads to Batch.

**The decision flips toward Batch as your scale grows.** A small team with 100k tokens/day of analytics can run them synchronously and never notice the difference. A team with 100M tokens/day of analytics will hit synchronous TPM ceilings and pay double on tokens — Batch is the architectural answer at that scale, not the optimization.

Combining Batch with Cached Inputs for stacked discount

Cached Inputs (OpenAI's automatic prompt caching feature) reduces input-token cost on prompts that begin with identical content for ~5 minutes after first use. Batch jobs benefit from this in two ways: (1) **within a batch**, repeated system prompts and context blocks get cache hits on the second-through-N-th occurrence; (2) **across batches submitted in quick succession**, the cache carries forward between batches.

**Maximizing the stack**: structure your batch rows so the longest stable content (system prompts, few-shot examples, fixed context like a brand voice guide or a taxonomy) appears at the front of every row. The variable content (the actual data being processed) appears at the end. This is the 'front-load for cache' pattern — when applied consistently, 80-95% of input tokens land on the cached path.

**Combined economics on gpt-5.5**: base input $1.25/1M → Batch input $0.625/1M → 90% cached portion drops to roughly $0.06-$0.15/1M. For a workload that processes the same 5,000-token system prompt across 50,000 batch rows, this stacking typically reduces total input cost by 75-85% vs naive Batch and by 87-92% vs synchronous uncached. The output token discount (50% Batch, no caching applies to outputs) stays at the headline 50%.
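The stacking can be modeled as two multiplicative discounts. This is a sketch under stated assumptions: the 90% cache discount is the top of the 50-90% range quoted in this guide, not a confirmed price, and `effective_input_price` is an illustrative helper.

```python
def effective_input_price(base_price, cached_fraction, cache_discount, batch_discount=0.5):
    """Blended $/1M input tokens after the Batch discount and the cache discount."""
    batch_price = base_price * (1 - batch_discount)
    cached_part = cached_fraction * batch_price * (1 - cache_discount)
    uncached_part = (1 - cached_fraction) * batch_price
    return cached_part + uncached_part


# Assumed inputs: $1.25/1M base, 90% of input tokens cached, 90% cache discount.
blended = effective_input_price(1.25, cached_fraction=0.9, cache_discount=0.9)
saving_vs_batch = 1 - blended / 0.625  # compared with naive Batch
saving_vs_sync = 1 - blended / 1.25    # compared with synchronous uncached
```

Under those assumptions the blended input price is about $0.119/1M, roughly 81% below naive Batch and 90.5% below synchronous uncached, which sits inside the ranges quoted above.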

**Cache hits don't show up in your bill line-by-line.** OpenAI's monthly invoice shows the netted total. To measure cache hit rate on a specific batch, read the cached-token count in each response's `usage` object in the output JSONL (for Chat Completions this is `usage.prompt_tokens_details.cached_tokens`) and divide cached input tokens by total input tokens to get the hit rate per row. Aim for 70%+ on cache-optimizable workloads; under 30% means the prompt structure needs re-architecting.
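A hit-rate calculation over an output file might look like the following. It assumes Chat Completions output rows, where each line nests the response under `response.body` and the cached count sits at `usage.prompt_tokens_details.cached_tokens`; other endpoints name these fields differently, so adjust the paths for your endpoint. `cache_hit_rate` is a hypothetical helper.

```python
import json


def cache_hit_rate(output_lines):
    """Fraction of input tokens served from cache across all rows in an output file."""
    cached = total = 0
    for line in output_lines:
        if not line.strip():
            continue
        body = json.loads(line)["response"]["body"]
        usage = body.get("usage") or {}
        details = usage.get("prompt_tokens_details") or {}
        total += usage.get("prompt_tokens", 0)
        cached += details.get("cached_tokens", 0)
    return cached / total if total else 0.0
```

The function returns an aggregate rate for the whole file; computing it per row is the same division applied to a single line.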

Cancellation, monitoring, and operational checklist

**Cancel a running batch** with `POST /v1/batches/{batch_id}/cancel`. Status moves to `cancelling`, then `cancelled` within 10 minutes. You still get any completed requests' output and bill only for completed work. Use cancellation when a batch was submitted with a bug (wrong model, malformed prompt template) and you want to abort before paying for the bad output.

**Monitor a batch** by polling `GET /v1/batches/{batch_id}` for the `status` field. States are: `validating` (initial validation of the JSONL), `in_progress` (running), `finalizing` (writing output files), `completed` (success), `failed` (whole-batch failure, rare), `expired` (didn't finish in 24h — partial output still available), `cancelling` and `cancelled`. The `request_counts` field shows `total`, `completed`, and `failed` counts updated during the run — useful for live progress indication.
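A polling loop over those states could be sketched as follows. It assumes `client` is an instance of the official `openai` Python SDK's `OpenAI` class, whose `client.batches.retrieve(batch_id)` returns an object with a `status` attribute; `wait_for_batch` and the injectable `sleep` parameter are illustrative choices.

```python
import time

# States after which the batch will not change again, per the list above.
TERMINAL_STATES = {"completed", "failed", "expired", "cancelled"}


def wait_for_batch(client, batch_id, poll_seconds=60, sleep=time.sleep):
    """Poll until the batch reaches a terminal state, then return the batch object."""
    while True:
        batch = client.batches.retrieve(batch_id)
        if batch.status in TERMINAL_STATES:
            return batch
        sleep(poll_seconds)
```

Passing `sleep` as a parameter keeps the loop easy to test; in production the default `time.sleep` with a 60-300 second interval matches the cadence suggested in this guide.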

**Operational checklist before submitting a large batch**: (1) verify model name and endpoint match a Batch-supported pairing on the model reference page; (2) confirm enqueued-token capacity on platform.openai.com/account/limits — your in-flight total + this batch's input tokens must fit under the cap; (3) validate JSONL locally (every line must parse, every line must contain `custom_id` + `method` + `url` + `body`); (4) check the file is under 200MB and under 50,000 lines; (5) set up a downstream consumer that polls and processes `output_file_id` + `error_file_id` when status flips to `completed`.
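Items (3) and (4) of the checklist can be automated with a local validator along these lines. `validate_batch_lines` is a hypothetical helper; the byte limit assumes 200MB means 200,000,000 bytes, and the duplicate-ID check is an extra safeguard suggested by the matching pattern rather than a documented API rule.

```python
import json

REQUIRED_FIELDS = ("custom_id", "method", "url", "body")


def validate_batch_lines(lines, max_requests=50_000, max_bytes=200_000_000):
    """Return a list of problems found; an empty list means the file looks submittable."""
    problems, seen_ids, total_bytes, count = [], set(), 0, 0
    for n, line in enumerate(lines, start=1):
        count += 1
        total_bytes += len(line.encode("utf-8")) + 1  # +1 for the newline
        try:
            row = json.loads(line)
        except json.JSONDecodeError:
            problems.append(f"line {n}: not valid JSON")
            continue
        if not isinstance(row, dict):
            problems.append(f"line {n}: expected a JSON object")
            continue
        missing = [field for field in REQUIRED_FIELDS if field not in row]
        if missing:
            problems.append(f"line {n}: missing {', '.join(missing)}")
        custom_id = row.get("custom_id")
        if isinstance(custom_id, str):
            if custom_id in seen_ids:
                problems.append(f"line {n}: duplicate custom_id {custom_id!r}")
            seen_ids.add(custom_id)
    if count > max_requests:
        problems.append(f"{count} requests exceeds the {max_requests} limit")
    if total_bytes > max_bytes:
        problems.append(f"{total_bytes} bytes exceeds the {max_bytes} limit")
    return problems
```

Running this before upload catches formatting problems locally instead of during the batch's `validating` phase.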

Sourcing and live-verify checklist

The hard limits in this guide — 200MB file size, 50,000 requests per batch, 24-hour completion SLA, 50% discount on input and output, JSONL + custom_id format, partial-completion behavior, batch creation rate of 2,000 batches/hour — are sourced verbatim from OpenAI's Batch API documentation at developers.openai.com/api/docs/guides/batch, fetched 2026-06-20.

The **separate-quota architecture** (Batch does not consume real-time RPM/TPM) is documented in OpenAI's rate-limits guide at developers.openai.com/api/docs/guides/rate-limits: 'Batch API queue limits are calculated based on the total number of input tokens queued for a given model.' The pool is explicitly separate from synchronous endpoint limits.

The **per-tier enqueued-token table** is the one piece of this guide that comes from an indicative scaling pattern, not from a published OpenAI matrix. OpenAI does not publish a per-tier-per-model enqueued-tokens table on the public docs page as of June 2026; exact values appear only on each account's live limits page. The values in the table (~200k → ~5B+ across Tier 1 → 5) reflect a pattern of roughly 10x scaling per tier, except for a larger jump of about 25x between Tier 3 and Tier 4. **Always verify against your own account before sizing a large batch**: the relative scaling is reliable; absolute numbers may differ by ±50% on any given model.

**Live-verify when you plan capacity**: open platform.openai.com/account/limits and filter to Batch. The dashboard shows your live enqueued-tokens cap per model, your current in-flight enqueued total, and the headroom available for a new batch submission. The dashboard also reflects any account-specific adjustments (enterprise quota increases) that wouldn't appear in a general per-tier table.

**Why this canonical page exists.** Search ChatGPT or Perplexity for 'OpenAI Batch API enqueued tokens limit' and you'll currently get a mix of stale Stack Overflow answers, GitHub issues from 2024, and OpenAI community-forum threads — none of which are dated, sourced, or comprehensive. This page exists to be the single canonical reference for the 2026 Batch API limits, with explicit dating, explicit sourcing, and explicit flagging of what's public-doc vs. live-account-only. If you found this page via an AI engine citation, that mechanism is working as intended.

Step-by-step: shipping your first Batch job

1. **Build the JSONL input file with one request per line and a unique custom_id.** Write one JSON object per line: `{"custom_id":"row-001","method":"POST","url":"/v1/chat/completions","body":{"model":"gpt-5.5","messages":[...]}}`. Use sortable deterministic IDs (e.g. `2026-06-20_doc_00001`) so you can reconstruct order from the output. Validate locally: every line must parse, every line must contain all four required fields, and the file must be under 200MB and 50,000 lines.

2. **Upload the file via the Files API with purpose=batch.** POST the .jsonl to `/v1/files` with `purpose="batch"`. The response contains a `file_id` you'll reference in the next step. The file must be JSONL; uploading JSON-array format returns a validation error before the batch even starts.

3. **Create the Batch object pointing to the uploaded file.** POST to `/v1/batches` with `input_file_id`, `endpoint` (matching the URL field in your JSONL rows, e.g. `/v1/chat/completions`), and `completion_window="24h"`. The response includes a `batch_id` and initial status `validating`. Within a few minutes status moves to `in_progress`.

4. **Poll for completion and download output + error files.** GET `/v1/batches/{batch_id}` every 1-5 minutes (or set a Slack webhook for status changes). When status flips to `completed`, the response contains `output_file_id` and `error_file_id`. Download both via `/v1/files/{file_id}/content`. The output JSONL contains successful results keyed by your `custom_id`; the error JSONL contains failed rows with error details.

5. **Match results back to inputs via custom_id and handle partial failures.** Output order is NOT guaranteed to match input order; always join by `custom_id`. For failed rows in the error file, categorize by error type, fix the root cause (truncate inputs, sanitize prompts, drop content-policy violators), and submit a follow-up batch containing only the fix-ups. You only pay for completed rows; errors are free.
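The upload, create, and download steps above can be sketched with the official `openai` Python SDK. The helpers `submit_batch` and `download_results` are hypothetical names; they assume `client` is an `openai.OpenAI()` instance and use its `files.create`, `batches.create`, and `files.content` methods. Treat this as a starting point under those assumptions, not a definitive implementation.

```python
def submit_batch(client, jsonl_file, endpoint="/v1/chat/completions"):
    """Upload an open binary JSONL file and create a batch that points at it."""
    uploaded = client.files.create(file=jsonl_file, purpose="batch")
    return client.batches.create(
        input_file_id=uploaded.id,
        endpoint=endpoint,  # must match the "url" field used in the JSONL rows
        completion_window="24h",
    )


def download_results(client, batch):
    """Return (output_lines, error_lines); either list is empty if the file is absent."""
    def read_lines(file_id):
        if not file_id:
            return []
        return client.files.content(file_id).text.splitlines()

    return read_lines(batch.output_file_id), read_lines(batch.error_file_id)


# Example usage (needs the openai package and an API key, so it is left commented out):
# from openai import OpenAI
# client = OpenAI()
# with open("requests.jsonl", "rb") as f:
#     batch = submit_batch(client, f)
# ...poll until the status is terminal, then:
# output_lines, error_lines = download_results(client, client.batches.retrieve(batch.id))
```

Keeping the client as a parameter means the same functions work against a stub in tests and the real SDK in production.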

Frequently Asked Questions

Does the OpenAI Batch API eat into my real-time RPM and TPM budget?

No. Batch API has its own quota system completely separate from real-time endpoints. Submitting a 10-million-token batch consumes zero RPM and zero TPM against your real-time budget — you can run real-time customer-facing traffic and a multi-million-token batch concurrently with no contention. The only Batch-specific constraint is the per-tier enqueued-tokens cap, which is enforced against batches only.

What does 'enqueued tokens' mean in the Batch API context?

Enqueued tokens are the total input tokens summed across all your in-flight batches for a given model. If you have three batches running on gpt-5.5 totaling 280,000 input tokens, your current enqueued count is 280,000 against your tier's cap. New batch submissions are rejected if they would push you over the cap. The cap scales from roughly 200k at Tier 1 to 5B+ at Tier 5; check your live cap at platform.openai.com/account/limits.

What happens if 5% of rows in my batch fail?

The other 95% still complete and bill normally. OpenAI splits output into two files: output_file_id (successes) and error_file_id (failures with error details). You only pay for successful rows — errors are free. Standard retry pattern: download the error file, categorize errors by type, fix the underlying cause (truncate over-length inputs, sanitize prompts), and submit a smaller follow-up batch with only the fix-ups. Never re-submit the whole batch.

Can I cancel a running OpenAI batch?

Yes. POST to `/v1/batches/{batch_id}/cancel`. Status moves to `cancelling`, then `cancelled` within 10 minutes. Any requests that completed before cancellation still produce output and bill normally; in-flight and pending requests do not bill. Use cancellation when a batch was submitted with a known bug (wrong model, malformed prompt template) before more requests process.

Does Cached Inputs (prompt caching) apply to Batch API calls?

Yes — Cached Inputs apply to Batch the same as to real-time API calls. Repeated system prompts or context blocks across batch rows get cache hits after the first occurrence, reducing input cost by an additional 50-90% on the cached portion. Combined with Batch's headline 50% discount, this stacks to roughly 5-10% of the synchronous-uncached price on cache-optimizable workloads. Front-load stable content (system prompts, few-shot examples) at the start of every row to maximize cache hits.

Can I batch tool calls and function calling through the Batch API?

Yes — the body field of each batch row is identical to what you'd send synchronously, including `tools`, `tool_choice`, and `response_format` parameters. The model produces tool-call responses the same way. The constraint: Batch is single-shot per row — there's no built-in mechanism for multi-turn agent loops where the result of a tool call feeds back into another model call. For multi-turn agents, batch the individual *steps* (one batch per turn across all conversations) rather than the whole agent loop.

How long does a typical OpenAI batch take to complete?

OpenAI's documented SLA is 24 hours; typical completion is much faster. Small batches (<10k requests, <5M tokens) usually finish in 15 minutes to 2 hours. Medium batches (10k-30k requests, 5-50M tokens) typically take 1-6 hours. Large batches (30k-50k requests, 50-200M tokens) take 4-18 hours and occasionally bump the 24h ceiling. Design downstream consumers for the 24h ceiling, not the typical 1-4h reality — completion time is not contractually guaranteed below 24h.

Is the Batch API available on every tier or only Tier 3+?

Available from Tier 1 (after $5 in cumulative paid usage). You do NOT need Tier 5 to use Batch. The per-tier enqueued-tokens cap scales with tier — Tier 1 caps around 200k tokens, Tier 5 around 5B+ — but the feature itself is universally available. This is one of the cleanest workarounds for teams waiting out the 30-day Tier 5 promotion clock: run 80-90% of asynchronous workload through Batch and reserve the real-time RPM for genuinely synchronous traffic. See OpenAI Tier 5 unlock requirements for the broader tier ladder.
