What Message Batches is — and why the separate quota pool changes the math
The Message Batches API is Anthropic's asynchronous bulk processing endpoint, exposed at `POST /v1/messages/batches`. You submit an array of Messages requests as a single payload, each tagged with a developer-supplied `custom_id`. Anthropic processes them in parallel on its own infrastructure, then makes the results available for download as a JSONL file from a `results_url` once the batch ends. Each individual request inside the batch uses the same parameters as a synchronous Messages call — same `model`, same `max_tokens`, same `messages`, same `system`, same `tools`. The only difference is the wrapper and the processing pattern.
The architecturally important detail: **batches run on a separate quota pool from real-time Messages**. This is not the case on every provider — some treat batch and real-time as a shared bucket and just discount the batch path. Anthropic gives batches their own rate-limit pool covering both batch HTTP requests and the in-flight request count inside batches. This means you can run a 100,000-request batch in the background while serving real-time traffic at your full per-tier RPM/ITPM/OTPM ceiling, with no interaction.
The use cases this unlocks at scale: **large-scale evaluations** (run 10,000 test cases through three model variants in a single batch each), **content moderation backlogs** (classify a week's user-generated content overnight at half price), **dataset enrichment** (summarize, tag, or extract structured data from a corpus), **synthetic data generation** (produce training examples at scale), **bulk transformations** (rewrite, translate, or reformat documents in bulk). For any workload where 'I need the answer within 24 hours' is acceptable, the batch path is the right answer — half the price, and your real-time capacity is untouched.