The headline savings: Sonnet → Flash
The single biggest migration win in 2026 is the Sonnet 4.6 → Gemini 2.5 Flash swap on high-volume short-input workloads. Flash is plausibly a Sonnet replacement on: closed-set classification (sentiment, intent, routing), structured extraction (entity, attribute, schema-bound JSON output), simple summarization (single article, single email thread), short-form rewriting, and deterministic tool-calling where the tool list is small and the parameter shapes are tight. On those tasks, our internal evals show Flash matching Sonnet within 1-3% on accuracy and within 5-8% on JSON-validity rates, which is well inside the noise band for production triage workloads.
Flash is NOT a Sonnet replacement on: nuanced long-form writing (Flash's prose runs more generic), complex multi-step reasoning (Flash's chain-of-thought collapses on >4 hop problems unless you flip thinking-mode on), agentic chains with >5 tool calls (Flash's tool-use reliability degrades faster than Sonnet's on long horizons), legal or medical synthesis (Flash hallucinates citations more), and any task where Sonnet's writing style is the load-bearing feature. Test before you assume.
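One way to act on "test before you assume" is to score both models' raw outputs on the same labeled sample before switching any traffic. The sketch below is a minimal illustration, not a full eval harness. The function names, the `category` key, and the tolerance defaults are assumptions. The defaults take the upper ends of the 1-3% and 5-8% gaps quoted above and treat them as absolute drops. Collecting the raw outputs from each provider is deliberately left out.

```python
import json


def score(raw_outputs, gold_labels, key="category"):
    """Return (json_validity_rate, accuracy) for one model's raw string outputs."""
    if not gold_labels or len(raw_outputs) != len(gold_labels):
        raise ValueError("need a non-empty sample with one output per gold label")
    valid = correct = 0
    for raw, gold in zip(raw_outputs, gold_labels):
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            continue  # unparseable output counts as invalid and incorrect
        if not isinstance(parsed, dict):
            continue  # valid JSON but not an object: treat as a schema failure
        valid += 1
        if parsed.get(key) == gold:
            correct += 1
    n = len(gold_labels)
    return valid / n, correct / n


def within_tolerance(baseline, candidate,
                     max_accuracy_drop=0.03, max_validity_drop=0.08):
    """True if the candidate's drop vs the baseline stays inside both tolerances.

    Both arguments are (json_validity_rate, accuracy) tuples from score().
    The default tolerances are assumptions, not recommended thresholds.
    """
    base_validity, base_accuracy = baseline
    cand_validity, cand_accuracy = candidate
    return (base_accuracy - cand_accuracy <= max_accuracy_drop
            and base_validity - cand_validity <= max_validity_drop)


if __name__ == "__main__":
    # Hypothetical four-ticket sample; real evals need far more examples.
    gold = ["billing", "bug", "billing", "account"]
    baseline_raw = ['{"category": "billing"}', '{"category": "bug"}',
                    '{"category": "billing"}', '{"category": "account"}']
    candidate_raw = ['{"category": "billing"}', '{"category": "bug"}',
                     'not json', '{"category": "billing"}']
    print(within_tolerance(score(baseline_raw, gold), score(candidate_raw, gold)))
```

Scoring JSON validity separately from accuracy matters because a model can pick the right label and still break a downstream parser. A real run would use a sample large enough that a 1-3% gap is distinguishable from noise.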
**Worked example: 5M ticket classifications per month.** Average prompt: 600 input tokens (instructions + ticket body), 40 output tokens (JSON {category, priority, route}). Sonnet 4.6 cost, no caching: (5M × 600 / 1M × $3) + (5M × 40 / 1M × $15) = $9,000 + $3,000 = **$12,000/mo**. With Anthropic's 90% discount on cache reads applied to the static instruction block (~500 tokens cached, ~100 input tokens per call uncached) and cache-write charges ignored: (5M × 500 / 1M × $0.30) + (5M × 100 / 1M × $3) + $3,000 output = $750 + $1,500 + $3,000 = **$5,250/mo**, which is what most production teams actually pay. Gemini 2.5 Flash cost on the same workload, no caching: (5M × 600 / 1M × $0.30) + (5M × 40 / 1M × $2.50) = $900 + $500 = **$1,400/mo**. With Gemini's implicit caching (~75% discount on the cached portion): (5M × 500 / 1M × $0.075) + (5M × 100 / 1M × $0.30) + $500 output = $187.50 + $150 + $500 ≈ **$840/mo**. That is an **84% reduction** vs cached Sonnet, or a **93% reduction** vs un-cached Sonnet. On 5M monthly classifications, the cached-to-cached savings come to roughly $53,000/year.
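The arithmetic in this example can be re-run as a small Python sketch, which makes it easy to plug in your own volumes and prices. It assumes the cache discount applies only to the cached prefix tokens on every call, and it ignores cache-write surcharges and any minimum cacheable-prefix size. `monthly_cost` and its parameter names are illustrative, not part of any provider's API.

```python
def monthly_cost(calls, input_tokens, output_tokens,
                 input_price, output_price,
                 cached_tokens=0, cache_discount=0.0):
    """Monthly USD cost for a fixed-shape workload.

    Prices are USD per 1M tokens. cached_tokens is the portion of each
    prompt billed at input_price * (1 - cache_discount); the rest of the
    input and all output are billed at list price.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached_tokens = input_tokens - cached_tokens
    cached_price = input_price * (1 - cache_discount)
    input_cost = calls * (uncached_tokens * input_price
                          + cached_tokens * cached_price) / 1e6
    output_cost = calls * output_tokens * output_price / 1e6
    return input_cost + output_cost


if __name__ == "__main__":
    # Workload and prices as stated in the worked example.
    workload = dict(calls=5_000_000, input_tokens=600, output_tokens=40)
    sonnet = dict(input_price=3.00, output_price=15.00)
    flash = dict(input_price=0.30, output_price=2.50)

    scenarios = {
        "Sonnet, no cache": monthly_cost(**workload, **sonnet),
        "Sonnet, cached": monthly_cost(**workload, **sonnet,
                                       cached_tokens=500, cache_discount=0.90),
        "Flash, no cache": monthly_cost(**workload, **flash),
        "Flash, cached": monthly_cost(**workload, **flash,
                                      cached_tokens=500, cache_discount=0.75),
    }
    for name, cost in scenarios.items():
        print(f"{name:18s} ${cost:,.2f}/mo")
```

Because output tokens are never discounted, the output line caps how much caching can save. On this workload it is $3,000 of the cached Sonnet bill.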
The bigger the static prefix and the higher the call volume, the bigger the savings. The smaller the prompt and the higher the quality bar, the smaller the savings, and at low enough volume the migration engineering cost (45 min × prompt count × eng rate) eats them. At $150/hr that is $112.50 per prompt, so we see breakeven land around **150-200 prompts** (roughly $17,000-$22,500 of engineering time) for sub-$2,000/mo workloads. Above $5,000/mo Anthropic spend, migration almost always pays back in under a quarter.
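The breakeven reasoning can be sketched the same way: migration cost is the 45 min × prompt count × eng rate formula, and payback time is that cost divided by monthly savings. The `reduction` argument (the fraction of current spend the migration removes) is an input you supply from your own cost comparison. The function names and the example inputs are hypothetical.

```python
def migration_cost(prompt_count, minutes_per_prompt=45, eng_rate_per_hour=150.0):
    """One-off engineering cost in USD: minutes per prompt x prompts x hourly rate."""
    return prompt_count * (minutes_per_prompt / 60) * eng_rate_per_hour


def payback_months(prompt_count, monthly_spend, reduction,
                   minutes_per_prompt=45, eng_rate_per_hour=150.0):
    """Months until cumulative savings cover the migration cost.

    reduction is the fraction of monthly_spend removed by migrating,
    e.g. 0.85 for an 85% cut. Returns infinity if nothing is saved.
    """
    monthly_savings = monthly_spend * reduction
    if monthly_savings <= 0:
        return float("inf")
    cost = migration_cost(prompt_count, minutes_per_prompt, eng_rate_per_hour)
    return cost / monthly_savings


if __name__ == "__main__":
    # Hypothetical inputs: 200 prompts on a $2,000/mo bill vs 40 prompts on $5,000/mo,
    # both assuming an 85% reduction.
    for prompts, spend in [(200, 2_000), (40, 5_000)]:
        months = payback_months(prompts, spend, reduction=0.85)
        print(f"{prompts} prompts, ${spend:,}/mo spend: payback in {months:.1f} months")
```

Payback scales linearly with prompt count and inversely with spend, so a small bill spread across many prompts is the case where migration is hardest to justify.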