Headline rate cards are abstract. What teams actually want to know is: on my workload, what is the monthly bill? The three case studies below walk through input-heavy, balanced, and output-heavy production workloads at realistic monthly volumes. All numbers are calculated directly from the standard rate card; cached and batched figures apply the discount stack from the section above (cache hits at 10% of input rate, on 80% of input tokens unless a case study states its own cacheable share; Batch API at 50% off both input and output where the provider offers it).
Case study 1: Northwind Marketing, customer-support ticket summarization. The team ingests 1M support tickets per month from Zendesk and runs each through an LLM that extracts product, sentiment, root cause, and a one-line theme. The workload is heavily input-skewed: 4,000 input tokens per call (the ticket transcript plus reference taxonomy) and 200 output tokens (structured JSON).
Standard-rate monthly bills at 1M calls: Claude Sonnet 4.6 = (4,000/1M × $3 × 1M) + (200/1M × $15 × 1M) = $12,000 + $3,000 = $15,000. gpt-5.4-mini = (4,000/1M × $0.75 × 1M) + (200/1M × $4.50 × 1M) = $3,000 + $900 = $3,900. Gemini 2.5 Flash = (4,000/1M × $0.30 × 1M) + (200/1M × $2.50 × 1M) = $1,200 + $500 = $1,700.
Apply the discount stack. The taxonomy is identical across all 1M calls, so roughly 2,500 of the 4,000 input tokens cache cleanly. Sonnet cached + batched lands near $4,100/month: 1,500 uncached tokens at $3, 2,500 cached tokens at $0.30, and 200 output tokens at $15 come to $8,250 across 1M calls, which the Batch API halves. By the same arithmetic gpt-5.4-mini cached + batched lands near $1,100/month. Gemini 2.5 Flash has no Batch API and weaker caching mechanics, so it sits at roughly $1,400/month.
Winner: gpt-5.4-mini. At standard rates it costs more than twice as much as Gemini Flash ($3,900 vs $1,700), but the Batch API and stronger prompt caching leave it roughly $300/month cheaper after discounts, and on Northwind's internal eval it scored 94% taxonomy-correct versus 89% for Gemini Flash. The higher accuracy adds review-queue savings on top of the lower bill.
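The Northwind arithmetic can be recomputed with a small helper. This is a minimal sketch, not a provider SDK call: `monthly_bill` and its parameter names are hypothetical, and it assumes the stack described above (cached tokens billed at 10% of the input rate, Batch API at 50% off the whole bill). Gemini 2.5 Flash is shown at standard rate only, since its caching behavior is not specified precisely enough here to model.

```python
def monthly_bill(calls, in_tokens, out_tokens, in_price, out_price,
                 cached_tokens=0, cache_rate=0.10, batch_discount=0.0):
    """Monthly bill in dollars. Prices are dollars per 1M tokens."""
    uncached = in_tokens - cached_tokens
    # Cached tokens are billed at cache_rate times the normal input price.
    input_cost = (uncached * in_price + cached_tokens * in_price * cache_rate) / 1e6
    output_cost = out_tokens * out_price / 1e6
    return calls * (input_cost + output_cost) * (1 - batch_discount)

# (input, output) dollars per 1M tokens, as quoted in the case study
RATES = {
    "Claude Sonnet 4.6": (3.00, 15.00),
    "gpt-5.4-mini": (0.75, 4.50),
    "Gemini 2.5 Flash": (0.30, 2.50),
}

CALLS, IN_TOKENS, OUT_TOKENS, CACHED = 1_000_000, 4_000, 200, 2_500

for model, (p_in, p_out) in RATES.items():
    standard = monthly_bill(CALLS, IN_TOKENS, OUT_TOKENS, p_in, p_out)
    print(f"{model}: standard ${standard:,.0f}")

for model in ("Claude Sonnet 4.6", "gpt-5.4-mini"):
    p_in, p_out = RATES[model]
    stacked = monthly_bill(CALLS, IN_TOKENS, OUT_TOKENS, p_in, p_out,
                           cached_tokens=CACHED, batch_discount=0.5)
    print(f"{model}: cached + batched ${stacked:,.0f}")
```

Under these assumptions the stacked figures come out at about $4,125 for Sonnet and about $1,106 for gpt-5.4-mini. Changing `cached_tokens` shows how sensitive an input-heavy bill is to the cacheable share.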
Case study 2: Cascade SaaS, in-product chatbot for a 220k-user analytics tool. The chatbot handles 500k user conversations per month at an average of two turns per session, so 1M LLM calls. The workload is balanced at 1,500 input tokens / 500 output tokens, typical for retrieval-augmented chat with three snippets of context.
Standard-rate monthly bills at 1M calls: gpt-5.5 = (1,500/1M × $5 × 1M) + (500/1M × $30 × 1M) = $7,500 + $15,000 = $22,500. Sonnet 4.6 = (1,500/1M × $3 × 1M) + (500/1M × $15 × 1M) = $4,500 + $7,500 = $12,000. Gemini 2.5 Pro = (1,500/1M × $1.25 × 1M) + (500/1M × $10 × 1M) = $1,875 + $5,000 = $6,875.
Cascade cannot use the Batch API because chat is synchronous, so the discount stack is cache-only. System prompt plus product docs total 900 of the 1,500 input tokens and cache reliably. Sonnet cached drops input from $4,500 to roughly $2,070 (600 uncached at $3 = $1,800, plus 900 cached at $0.30 = $270), for a total monthly bill of $9,570. gpt-5.5 cached drops to roughly $18,450 by the same arithmetic ($3,450 input + $15,000 output). Gemini 2.5 Pro's cache support is real-time-implicit and less aggressive, so its cached bill lands no lower than roughly $5,900, the figure a full 10%-rate cache on the same 900 tokens would give.
Winner: Sonnet 4.6. Gemini Pro is up to roughly $3,700/month cheaper, but Cascade's blind eval scored Sonnet 4.6 at 4.6/5 on response quality versus 4.1/5 for Gemini Pro, and the per-conversation cost difference (roughly $0.019 vs $0.012) is dwarfed by the LTV impact of a better chatbot in a $99/seat product. gpt-5.5 was eliminated on cost: it offered no measurable quality edge over Sonnet at nearly double the bill.
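For a synchronous workload only the input side can be discounted, so it helps to keep input and output costs separate. The sketch below recomputes the cache-only bills from the stated token split (900 of 1,500 input tokens cached at 10% of the input rate) and divides by 500k conversations. `cached_bill` is a hypothetical helper, and Gemini 2.5 Pro is left out because its implicit caching rate is not given here.

```python
CALLS = 1_000_000
CONVERSATIONS = 500_000
IN_TOKENS, OUT_TOKENS, CACHED_TOKENS = 1_500, 500, 900
CACHE_RATE = 0.10  # cached tokens billed at 10% of the input price

def cached_bill(in_price, out_price):
    """Return (monthly input cost, monthly output cost) in dollars."""
    uncached = IN_TOKENS - CACHED_TOKENS
    input_cost = CALLS * (uncached * in_price
                          + CACHED_TOKENS * in_price * CACHE_RATE) / 1e6
    output_cost = CALLS * OUT_TOKENS * out_price / 1e6
    return input_cost, output_cost

# (input, output) dollars per 1M tokens, as quoted in the case study
for model, (p_in, p_out) in {"gpt-5.5": (5.00, 30.00),
                             "Sonnet 4.6": (3.00, 15.00)}.items():
    input_cost, output_cost = cached_bill(p_in, p_out)
    total = input_cost + output_cost
    print(f"{model}: input ${input_cost:,.0f}, output ${output_cost:,.0f}, "
          f"total ${total:,.0f}, per conversation ${total / CONVERSATIONS:.4f}")
```

With these inputs Sonnet 4.6 comes to $2,070 input plus $7,500 output, and gpt-5.5 to $3,450 input plus $15,000 output. The output term is untouched by caching, which is why the balanced workload sees a smaller relative saving than the input-heavy one.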
Case study 3: Mesa AI, a developer-tooling startup running a coding assistant that processes 200k completions per day (6M calls per month). The workload is output-heavy: 2,000 input tokens (recent file context plus open-buffer diff) and 1,500 output tokens (the suggested patch).
Standard-rate monthly bills at 6M calls: gpt-5.4 = (2,000/1M × $2.50 × 6M) + (1,500/1M × $15 × 6M) = $30,000 + $135,000 = $165,000. Sonnet 4.6 = (2,000/1M × $3 × 6M) + (1,500/1M × $15 × 6M) = $36,000 + $135,000 = $171,000. Claude Fable 5 = (2,000/1M × $10 × 6M) + (1,500/1M × $50 × 6M) = $120,000 + $450,000 = $570,000. DeepSeek V4 at the estimate of $0.40/$1.20 = (2,000/1M × $0.40 × 6M) + (1,500/1M × $1.20 × 6M) = $4,800 + $10,800 = $15,600. The spread is roughly 37x between DeepSeek and Fable.
Apply the stack: code completion is synchronous so the Batch API does not apply; caching helps modestly on the input side (around 30% cache-hittable), shaving roughly $8,000 to $10,000 off the input bill for OpenAI and Anthropic (27% of $30,000 and of $36,000 respectively). Mesa ran a blind eval on 800 internal completion samples: gpt-5.4 hit 71% acceptance, Sonnet 4.6 hit 73%, Fable 5 hit 79%, DeepSeek V4 hit 64%.
Winner: a tiered routing strategy, not a single model. Mesa routes 75% of completions (single-line, in-buffer) to DeepSeek V4 at roughly $11,700/month for that slice, routes 20% (multi-line refactors) to Sonnet 4.6 at roughly $32,000/month, and reserves 5% (whole-file rewrites and explain-and-fix) for Fable 5 at roughly $25,000/month. Blended monthly bill: roughly $68,700 with 74% blended acceptance, versus $165,000 on gpt-5.4 alone for three points less acceptance, or $570,000 on Fable alone for five points more. The 74% figure assumes each model does better on its routed slice than on the mixed eval set; a plain volume-weighted average of the eval rates would be about 67%.
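The routing split can be sanity-checked with a few lines of arithmetic. This is a sketch under stated assumptions, not Mesa's actual accounting: every slice is given the same 2,000-in / 1,500-out token profile, 30% of input is treated as cacheable at 10% of the input rate for Sonnet and Fable only, and `slice_cost` and the `ROUTES` table are hypothetical names.

```python
CALLS = 6_000_000
IN_TOKENS, OUT_TOKENS = 2_000, 1_500
CACHE_RATE = 0.10  # cached tokens billed at 10% of the input price

# model: (input $/1M, output $/1M, eval acceptance, traffic share, cacheable input share)
ROUTES = {
    "DeepSeek V4": (0.40, 1.20, 0.64, 0.75, 0.0),
    "Sonnet 4.6": (3.00, 15.00, 0.73, 0.20, 0.3),
    "Fable 5": (10.00, 50.00, 0.79, 0.05, 0.3),
}

def slice_cost(p_in, p_out, share, cacheable):
    """Monthly dollars for the share of traffic routed to one model."""
    # Blend the cached and uncached input price into one effective rate.
    effective_in = p_in * (1 - cacheable * (1 - CACHE_RATE))
    per_call = (IN_TOKENS * effective_in + OUT_TOKENS * p_out) / 1e6
    return CALLS * share * per_call

total = 0.0
for model, (p_in, p_out, _, share, cacheable) in ROUTES.items():
    cost = slice_cost(p_in, p_out, share, cacheable)
    total += cost
    print(f"{model}: {share:.0%} of traffic, ${cost:,.0f}/month")

# Naive blend: weights each model's whole-eval acceptance by its traffic share.
weighted_acceptance = sum(acc * share for _, _, acc, share, _ in ROUTES.values())
print(f"Blended bill ${total:,.0f}, volume-weighted acceptance {weighted_acceptance:.1%}")
```

Under these assumptions the slices come to about $11,700, $32,300, and $26,900, for a blended bill near $70,800, within a few thousand dollars of the rounded figures in the case study. The volume-weighted acceptance is about 66.6%, so a higher blended figure has to come from per-slice acceptance rates, which the mixed eval set does not provide.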
What the three cases reveal. On input-heavy workloads the cheap tiers dominate because output is a rounding error: gpt-5.4-mini, Gemini Flash, and Haiku 4.5 are the contenders, and the choice usually turns on which provider's caching and batch story fits the pipeline. On balanced synchronous workloads the mid tier wins because quality differences show up in user-facing metrics and the absolute spread is small enough that the quality-adjusted winner usually beats the cheapest option; Sonnet 4.6 and Gemini 2.5 Pro are the most common landing spots. On output-heavy workloads no single model wins; routing per task type beats picking one model by 30-60% almost every time, because output cost is large enough that the savings from sending the easy slice to a cheap model more than pay for the expensive model on the hard slice.
Two arithmetic checks worth keeping in your head. First, the per-call cost rule of thumb: multiply input tokens (in thousands) by input price (per 1M, in dollars) and divide by 1,000 to get input dollars per call; do the same for output. At 1M calls per month the per-call cost in cents equals the monthly bill in tens of thousands of dollars, so a 2-cent call is $20k/month. Second, cache savings are bounded by input share of cost. On the Mesa case, input is only about 21% of the bill on Sonnet ($36,000 of $171,000), so caching cannot save more than about $32,400/month (90% of input spend) no matter how high the cache hit rate, and the 30% cacheable share assumed there saves under $10,000. On the Northwind case, input is 80% of the Sonnet bill, so caching is the single highest-leverage lever.
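Both checks fit in a few lines. The sketch below assumes cached tokens are billed at 10% of the input rate, so the most caching can ever save is 90% of input spend; `per_call_dollars` and `cache_saving_ceiling` are hypothetical helper names, and the inputs are the Sonnet 4.6 figures from the Mesa and Northwind cases.

```python
def per_call_dollars(in_k, in_price, out_k, out_price):
    """Rule of thumb: tokens in thousands times price per 1M, divided by 1,000."""
    return (in_k * in_price + out_k * out_price) / 1_000

def cache_saving_ceiling(calls, in_k, in_price, cache_rate=0.10):
    """Largest possible monthly saving if every input token were a cache hit."""
    monthly_input = calls * in_k * in_price / 1_000
    return monthly_input * (1 - cache_rate)

cases = {
    # name: (calls per month, input k-tokens, output k-tokens)
    "Mesa": (6_000_000, 2.0, 1.5),
    "Northwind": (1_000_000, 4.0, 0.2),
}
SONNET_IN, SONNET_OUT = 3.00, 15.00  # dollars per 1M tokens

for name, (calls, in_k, out_k) in cases.items():
    call_cost = per_call_dollars(in_k, SONNET_IN, out_k, SONNET_OUT)
    bill = calls * call_cost
    input_share = (calls * in_k * SONNET_IN / 1_000) / bill
    ceiling = cache_saving_ceiling(calls, in_k, SONNET_IN)
    print(f"{name}: {call_cost * 100:.2f} cents/call, bill ${bill:,.0f}, "
          f"input share {input_share:.0%}, max cache saving ${ceiling:,.0f}")
```

For Mesa this gives 2.85 cents per call, a $171,000 bill, and a ceiling of $32,400; for Northwind, 1.50 cents per call, a $15,000 bill, and a ceiling of $10,800. The ceiling as a fraction of the bill is what separates the two cases: roughly 19% for Mesa against 72% for Northwind.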
One-line summary of when each provider tends to win in 2026. OpenAI wins balanced workloads where ecosystem features (file search, code interpreter, structured outputs) matter and budget tolerates the premium. Anthropic wins long-context and agentic workloads where Sonnet's per-dollar quality and explicit cache control compound. Google wins input-heavy and multimodal workloads where raw per-token cost and 2M+ context are the constraint. Open-source and budget providers like DeepSeek win the easy slice of any tiered routing strategy.