The cost-quality curve: why it's not linear
The relationship between model cost and output quality is not linear: it is a curve with a flat bottom, an inflection point, and a steep top. On simple tasks (extraction, classification, summarization, simple Q&A), the quality difference between Haiku 4.5 ($0.80/$4.00) and Opus 4.7 ($15/$75) is 5-10 percentage points, which is meaningful but not transformative. The price difference at those rates is about 19× on both input and output tokens. **On simple tasks, cheap models deliver 85-92% of flagship quality at roughly 1/19th of the per-token cost.** The cost-quality curve is nearly flat in this regime.
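The price ratio can be checked with a few lines of arithmetic. The sketch below assumes the `$x/$y` figures are USD per million input and output tokens, and uses a hypothetical request size of 2,000 input and 500 output tokens; the `Price` class and `request_cost` helper are illustrative names, not part of any SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """Per-token pricing, ASSUMED to be USD per million tokens."""
    input_per_mtok: float
    output_per_mtok: float


# Prices as quoted in the text above.
HAIKU = Price(input_per_mtok=0.80, output_per_mtok=4.00)
OPUS = Price(input_per_mtok=15.00, output_per_mtok=75.00)


def request_cost(price: Price, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request with the given token counts."""
    return (
        input_tokens * price.input_per_mtok
        + output_tokens * price.output_per_mtok
    ) / 1_000_000


# Hypothetical request size, chosen only for illustration.
haiku_cost = request_cost(HAIKU, input_tokens=2_000, output_tokens=500)
opus_cost = request_cost(OPUS, input_tokens=2_000, output_tokens=500)
ratio = opus_cost / haiku_cost

print(f"Haiku ${haiku_cost:.4f}, Opus ${opus_cost:.4f}, ratio {ratio:.2f}x")
```

Because the input and output prices differ by the same factor here, the ratio does not depend on the mix of input and output tokens.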
On complex tasks (multi-step coding, research synthesis, hard reasoning, novel problem-solving), the curve steepens dramatically. The quality gap between Opus 4.7 and Haiku 4.5 can be 15-25 percentage points on genuinely hard tasks. More importantly, the difference is not only in average quality but in the tail: Opus handles the hardest 5% of queries in your distribution significantly better than cheaper models do. For tasks where that long tail matters (production code, high-stakes decisions, complex analysis), the premium is worth paying.
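To see why averages can hide a tail gap, it helps to report a tail statistic next to the mean. The sketch below is a minimal illustration: `tail_mean` is a hypothetical helper, and the two score lists are made-up toy numbers, not measurements of any model.

```python
from statistics import mean


def tail_mean(scores: list[float], percent: int = 5) -> float:
    """Mean of the worst `percent`% of scores (always at least one score)."""
    k = max(1, len(scores) * percent // 100)
    return mean(sorted(scores)[:k])


# Toy, made-up quality scores in [0, 1] for 20 tasks each.
cheap_scores = [0.9] * 19 + [0.2]      # fine on most tasks, fails one badly
flagship_scores = [0.9] * 19 + [0.7]   # same on most tasks, degrades gently

for name, scores in [("cheap", cheap_scores), ("flagship", flagship_scores)]:
    print(f"{name}: mean={mean(scores):.3f}, worst-5% mean={tail_mean(scores):.3f}")
```

In this toy data the means differ by about one point while the worst-5% scores differ by fifty, which is the pattern to look for when deciding whether the tail justifies the premium.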
**Understanding where your workload sits on the cost-quality curve is the most important cost optimization decision you'll make.** If you've been running Opus 4.7 on every task without testing whether cheaper models meet your quality bar, you are almost certainly paying 5-20× more than necessary for a significant fraction of your workload. Conversely, if you've been running Haiku on every task to save money, you may be compromising quality on the hard 10-20% of tasks in ways you haven't measured.
The practical test: take 50 representative tasks from your production distribution. Run each through Haiku 4.5, Sonnet 4.6, and Opus 4.7, then rate the outputs blind, so that evaluators do not know which model produced which output. Calculate quality scores and costs for each model tier. The breakdown of where Haiku meets your bar, where you need Sonnet, and where you need Opus tells you the optimal allocation for your workload. Most teams find that roughly 50-60% Haiku, 30-35% Sonnet, and 5-15% Opus is the optimal allocation, but this varies significantly by domain.
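The bookkeeping for this test can be sketched in a few lines. The code below assumes each task ends up with one numeric rating per tier on a hypothetical 1-5 scale, and assigns every task to the cheapest tier that meets the bar. The function names, the rating scale, and the four sample ratings are all illustrative assumptions.

```python
import random
from collections import Counter

TIERS = ["haiku", "sonnet", "opus"]  # ordered cheapest to most expensive


def blind(outputs: dict[str, str], rng: random.Random) -> tuple[list[str], list[str]]:
    """Shuffle one task's outputs; return (texts to rate, hidden model key)."""
    models = list(outputs)
    rng.shuffle(models)
    return [outputs[m] for m in models], models


def cheapest_passing_tier(scores: dict[str, float], bar: float) -> str:
    """Cheapest tier whose rating meets the bar, or "none" if no tier does."""
    for tier in TIERS:
        if scores[tier] >= bar:
            return tier
    return "none"


def allocation(all_scores: list[dict[str, float]], bar: float) -> dict[str, float]:
    """Share of tasks assigned to each tier under the cheapest-passing rule."""
    counts = Counter(cheapest_passing_tier(s, bar) for s in all_scores)
    return {tier: counts[tier] / len(all_scores) for tier in TIERS + ["none"]}


# Toy, made-up ratings for four tasks (a real run would use about 50).
ratings = [
    {"haiku": 4, "sonnet": 5, "opus": 5},
    {"haiku": 3, "sonnet": 4, "opus": 5},
    {"haiku": 2, "sonnet": 3, "opus": 4},
    {"haiku": 4, "sonnet": 4, "opus": 4},
]
shares = allocation(ratings, bar=4)
print(shares)
```

Raters see only the shuffled texts from `blind`; the hidden key is joined back to the ratings after scoring. Tasks that land in `"none"` are ones where no tier met the bar, which is worth tracking separately from the routing decision.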
A second dimension of the cost-quality curve: **task complexity affects different quality dimensions differently.** Haiku is surprisingly competitive with Opus on factual recall from training data, but significantly behind on complex reasoning, code generation, and novel problem-solving. If your use case is primarily about retrieving and formatting factual information, the curve is flat and Haiku wins. If your use case requires genuine reasoning, the curve is steep and Opus earns its price.
The 2026 pricing landscape has made the cost-quality curve more important, not less. The gap between the cheapest and most expensive models is larger than ever: GPT-5 mini at $0.30/$1.20 vs Opus 4.7 at $15/$75 is a per-token price difference of 50× on input and 62.5× on output. The quality gap is real but does not match the price gap on routine tasks. Optimizing model allocation is the single largest cost optimization lever available to most production agent systems.
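A rough sense of how much allocation matters comes from comparing an all-Opus deployment with a routed one. The sketch below uses the Haiku and Opus prices quoted in this section, an allocation of 55% / 32% / 13% chosen from inside the ranges given earlier, and two assumptions that are not from the text: a Sonnet price of $3/$15 and a uniform request size of 2,000 input and 500 output tokens. Prices are again assumed to be USD per million tokens.

```python
# Prices as (input, output), ASSUMED to be USD per million tokens.
PRICES = {
    "haiku": (0.80, 4.00),     # from the text
    "sonnet": (3.00, 15.00),   # ASSUMPTION: not stated in the text
    "opus": (15.00, 75.00),    # from the text
}

# One point inside the allocation ranges quoted in the text; shares sum to 1.
ALLOCATION = {"haiku": 0.55, "sonnet": 0.32, "opus": 0.13}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request on the given model."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def blended_cost(allocation: dict[str, float], input_tokens: int, output_tokens: int) -> float:
    """Expected cost per request when traffic is split by `allocation`."""
    return sum(
        share * request_cost(model, input_tokens, output_tokens)
        for model, share in allocation.items()
    )


# Hypothetical uniform request size across all tiers.
all_opus = request_cost("opus", 2_000, 500)
routed = blended_cost(ALLOCATION, 2_000, 500)
savings_factor = all_opus / routed

print(f"all-Opus ${all_opus:.4f} vs routed ${routed:.4f} per request "
      f"({savings_factor:.1f}x cheaper)")
```

Under these assumptions the routed deployment costs roughly a quarter of the all-Opus one, and most of the remaining spend is the Opus share. The exact factor moves with the assumed Sonnet price, the token counts, and how hard tasks differ in length from easy ones.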