The 6 production workload patterns
**Pattern 1 — High-volume classification.** Categorizing customer-support tickets, sentiment-tagging social posts, flagging policy violations. Volume: 10K–10M+ requests per month. Per-task: 200–1000 tokens. At this volume it is typically the dominant cost driver. Quality requirement: above 92% accuracy on the specific task, not state-of-the-art on general reasoning.
**Pattern 2 — Structured extraction.** Pulling fields from documents, parsing emails into CRM entries, converting natural language to JSON. Volume: 1K–500K requests per month. Per-task: 1K–4K tokens. Quality requirement: structurally correct output (it passes schema validation) matters more than fluency.
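To make "passes schema validation" concrete, here is a minimal sketch of a structural check on a model's raw output, using only the Python standard library. The `CRM_SCHEMA` fields and the `validate_extraction` helper are hypothetical names chosen for illustration; a real pipeline would likely use a fuller schema language, but the pass/fail idea is the same.

```python
import json

# Hypothetical schema for a CRM entry: field name -> required Python type.
CRM_SCHEMA = {"name": str, "email": str, "priority": int}

def validate_extraction(raw: str, schema: dict = CRM_SCHEMA) -> tuple[bool, str]:
    """Return (ok, reason) for a model's raw output string."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return False, f"not valid JSON: {exc.msg}"
    if not isinstance(data, dict):
        return False, "top-level value is not an object"
    for field, expected in schema.items():
        if field not in data:
            return False, f"missing field: {field}"
        value = data[field]
        # bool is a subclass of int in Python, so reject it explicitly for int fields.
        wrong_bool = expected is int and isinstance(value, bool)
        if wrong_bool or not isinstance(value, expected):
            return False, f"wrong type for field: {field}"
    return True, "ok"

good = '{"name": "Ada", "email": "ada@example.com", "priority": 2}'
chatty = 'Sure! Here is the JSON you asked for.'
print(validate_extraction(good))
print(validate_extraction(chatty))
```

A fluent but unparseable reply fails this check outright, which is why structural correctness outranks fluency for this pattern.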
**Pattern 3 — Customer-facing chat (support, sales, assistant).** Volume: 5K–500K conversations per month, 10–40 turns each. Per-conversation: 5K–50K tokens. Quality requirement: high — model misbehavior directly affects user experience and product perception.
**Pattern 4 — Content generation (articles, marketing copy, reports).** Volume: 50–5K requests per month. Per-task: 5K–25K tokens. Quality requirement: very high — output goes to humans who evaluate it on craft, not just correctness.
**Pattern 5 — Agent/tool-use workflows.** Multi-step reasoning with tool calls, retries, planning. Volume: 1K–100K agent runs per month. Per-run: 20K–200K tokens (compounding across turns). Quality requirement: very high — bad plans cause downstream failures.
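One way to see why per-run token counts compound is to assume that every turn resends the full history so far, with no caching or truncation. That assumption is not stated in the text above, and the function name and parameters below are hypothetical. A rough sketch:

```python
def cumulative_input_tokens(turns: int, tokens_added_per_turn: int,
                            base_prompt: int = 0) -> int:
    """Total input tokens across one agent run, assuming the whole
    context is resent on every turn (no caching, no truncation)."""
    total = 0
    context = base_prompt
    for _ in range(turns):
        context += tokens_added_per_turn  # new message or tool output appended
        total += context                  # the full context is sent again
    return total

# Ten turns that each add 2,000 tokens to the context.
print(cumulative_input_tokens(turns=10, tokens_added_per_turn=2_000))
```

Under this assumption the total grows roughly with the square of the turn count: ten turns of 2,000 new tokens each sum to 110,000 input tokens, not 20,000.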
**Pattern 6 — Code generation / engineering assistance.** Volume: 5K–500K requests per month. Per-task: 5K–40K tokens. Quality requirement: high — bad code produces real engineering debt.
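The volume and per-task ranges above can be multiplied out to get a rough monthly token envelope for each pattern. The sketch below does only that arithmetic. The dictionary keys and helper names are illustrative, the figures are copied from the six patterns, and the "10M+" upper bound for classification is treated as 10M even though the text leaves it open-ended.

```python
# (low, high) units per month and (low, high) tokens per unit, from the patterns above.
# For chat, the unit is a conversation; for agents, a run; otherwise a request.
PATTERNS = {
    "classification":  ((10_000, 10_000_000), (200, 1_000)),  # "10M+" capped at 10M
    "extraction":      ((1_000, 500_000),     (1_000, 4_000)),
    "chat":            ((5_000, 500_000),     (5_000, 50_000)),
    "content":         ((50, 5_000),          (5_000, 25_000)),
    "agent workflows": ((1_000, 100_000),     (20_000, 200_000)),
    "code":            ((5_000, 500_000),     (5_000, 40_000)),
}

def monthly_token_range(volume, tokens):
    """Low estimate = low volume x low tokens; high estimate = high x high."""
    return volume[0] * tokens[0], volume[1] * tokens[1]

def human(n: int) -> str:
    """Format a token count with a K/M/B suffix."""
    for unit, size in (("B", 1_000_000_000), ("M", 1_000_000), ("K", 1_000)):
        if n >= size:
            return f"{n / size:g}{unit}"
    return str(n)

for name, (volume, tokens) in PATTERNS.items():
    low, high = monthly_token_range(volume, tokens)
    print(f"{name:16s} {human(low):>5s} to {human(high)} tokens/month")
```

These are envelope bounds rather than forecasts: the low end pairs the smallest volume with the smallest task, and the high end pairs the largest with the largest.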