Workloads that ALWAYS belong in Batch
Eight shapes meet the criteria of (a) high token volume, (b) zero user-facing latency requirement, (c) predictable schedule or queue-able trigger. For these, Batch is the default and synchronous API calls are the anti-pattern.
**Nightly content generation.** Programmatic SEO pages, daily report narratives, email digests, scheduled social posts. The job runs once per day, results are consumed by humans the next morning. Latency budget: hours. Use Batch.
**Weekly classification runs over accumulated data.** Support ticket categorization, lead enrichment, sentiment analysis over the past week's traffic, abuse-detection re-scoring. Run Sunday night, results land Monday morning. Use Batch.
**Embedding bulk loads.** Backfilling a new vector index from a corpus of 100k+ documents, re-embedding when you switch embedding models, embedding a freshly scraped dataset. The job is a one-time or weekly operation — no user is staring at a loading spinner. Use Batch (OpenAI's text-embedding-3-large at 50% off comes to roughly $0.065 per million tokens batched).
**Evaluation runs.** Running your test set of 1,000 prompts through three candidate models to compare outputs, regression-testing prompt changes against historical examples, building a leaderboard. Evals run on a CI cadence (daily, on every prompt-set change) — minutes-to-hours latency is fine. Use Batch.
**A/B prompt testing at scale.** Generating outputs from two prompt variants against the same 5k inputs to compare downstream metrics. The team analyzes results the next day. Use Batch.
**Document summarization queues.** A pipeline that ingests new PDFs from a partner feed each day and produces summaries for an internal knowledge base. The PDF arrival is asynchronous, the summarization is asynchronous. Use Batch.
**Dataset labeling.** Generating training labels for a fine-tune dataset, producing synthetic data for distillation, building a golden test set. One-shot operation, no latency requirement. Use Batch.
**Retroactive analysis.** Going back through 6 months of chat transcripts to score them for some new metric, re-summarizing historical support cases under a new taxonomy, repricing a backlog of contracts. The data is historical, the analysis is asynchronous. Use Batch.
If any of those eight patterns describes a workload you currently run on the synchronous API, you are paying 2x for nothing. That is the entire pitch.