Q2 2026: the gpt-5-mini shock
The defining moment of H1 2026 was OpenAI's April 14, 2026 release of **gpt-5-mini at $0.25 input / $2.00 output per 1M tokens**: roughly 50% below the gpt-4o-mini price it replaced, with an input price just under DeepSeek V3.1's $0.27 floor, although its $2.00 output price remains well above DeepSeek's $1.10. The model benchmarks within 5-8% of gpt-5.4 on standard reasoning evals (MMLU, GPQA, HumanEval) per Artificial Analysis's April 22 report, while shipping at one-tenth the cost. OpenAI's own announcement framed it as 'the new default for production workloads', a clear concession that gpt-4o-mini pricing was no longer competitive.
The pressure was external. DeepSeek V3.1 had been live on Together and Fireworks since January 2026 at $0.27/$1.10, and Gemini 2.5 Flash had held $0.30/$2.50 since November 2025 with a 2M-token context window. Both routinely cleared 70-75% on MMLU and produced output that was indistinguishable from GPT-4-class output on most production tasks. OpenAI's choice was either to reprice its mid-tier or to watch developers migrate; per The Information's April 2026 reporting, internal data showed gpt-4o-mini API call volume had grown only 8% year-over-year while Gemini Flash API volume had grown 280%. The pricing decision was a defensive move.
Anthropic responded indirectly. Rather than cutting Sonnet 4.6 (which held $3/$15), Anthropic shipped a Haiku 4.5 price cut from $1/$5 to $0.80/$4 in May 2026 and deepened the cache discount from 75% to 90% on all models. For Sonnet, that takes the price of cached input from $0.75 to $0.30 per 1M tokens, against a $3.00 list price for uncached input, so the savings on cache-heavy workloads scale with the cache hit rate. The net result is that mid-tier API costs in late Q2 2026 are roughly 35-45% below where they sat in December 2025, depending on workload shape and cache hit rate.
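To make the cache arithmetic concrete, here is a minimal sketch of how the blended input price moves with cache hit rate. The function name and the hit rates in the usage lines are illustrative assumptions, and the sketch ignores any surcharge for writing to the cache, so treat the output as a lower bound rather than a bill.

```python
def effective_input_price(base_price: float, cache_discount: float, hit_rate: float) -> float:
    """Blended $/1M input tokens when a fraction `hit_rate` of input tokens
    is served from cache at `cache_discount` off the list price.

    Assumption: cache writes are billed at the list price (no surcharge).
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    if not 0.0 <= cache_discount <= 1.0:
        raise ValueError("cache_discount must be between 0 and 1")
    cached_price = base_price * (1.0 - cache_discount)
    return hit_rate * cached_price + (1.0 - hit_rate) * base_price


if __name__ == "__main__":
    # Sonnet 4.6 list price and discounts as quoted in the text above.
    for hit_rate in (0.0, 0.5, 0.8, 1.0):
        old = effective_input_price(3.00, 0.75, hit_rate)
        new = effective_input_price(3.00, 0.90, hit_rate)
        print(f"hit rate {hit_rate:.0%}: ${old:.2f} -> ${new:.2f} per 1M input tokens")
```

Only a workload whose input is served entirely from cache reaches the $0.30 figure; at a 50% hit rate the blended price is closer to $1.65, which is why cache hit rate matters as much as the headline discount.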
What this means for buyers: if you have not re-tested gpt-5-mini, Sonnet 4.6 (with cache), and Gemini 2.5 Flash against your production workload in the last 90 days, you are almost certainly overpaying. The price-quality curve has shifted enough that defaults from 2025 are no longer defaults in 2026. We document the test framework in the 3-way GPT/Claude/Gemini cost calculator — run your typical prompt against all three and compare blended cost per 1k requests.
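As a rough illustration of that comparison (not the calculator itself), the sketch below computes blended cost per 1k requests from the list prices quoted in this article. The dictionary keys are informal labels rather than API model IDs, the 2,000-input / 500-output token workload and the 80% cache hit rate are made-up assumptions, and cache discounts for gpt-5-mini and Gemini 2.5 Flash are set to zero only because this article does not quote them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_m: float           # $ per 1M input tokens
    output_per_m: float          # $ per 1M output tokens
    cache_discount: float = 0.0  # fraction off input price for cached tokens


# List prices as quoted in this article; keys are informal labels.
PRICES = {
    "gpt-5-mini": Pricing(0.25, 2.00),
    "sonnet-4.6": Pricing(3.00, 15.00, cache_discount=0.90),
    "gemini-2.5-flash": Pricing(0.30, 2.50),
}


def cost_per_1k_requests(p: Pricing, input_tokens: int, output_tokens: int,
                         cache_hit_rate: float = 0.0) -> float:
    """Blended $ cost of 1,000 identical requests (cache-write fees ignored)."""
    cached = input_tokens * cache_hit_rate
    uncached = input_tokens - cached
    billable_input = uncached + cached * (1.0 - p.cache_discount)
    per_request = (billable_input * p.input_per_m
                   + output_tokens * p.output_per_m) / 1_000_000
    return 1000 * per_request


if __name__ == "__main__":
    # Hypothetical workload: 2,000 input and 500 output tokens per request.
    for name, pricing in PRICES.items():
        hit = 0.8 if pricing.cache_discount else 0.0
        cost = cost_per_1k_requests(pricing, 2000, 500, cache_hit_rate=hit)
        print(f"{name:18s} ${cost:6.2f} per 1k requests (cache hit rate {hit:.0%})")
```

Under these assumptions the output side dominates: Sonnet's cached input becomes cheap, but its $15.00 output price keeps the blended figure well above the two budget models, so the result is only meaningful with your own token counts and hit rates plugged in.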
Mid-tier model price war — Apr-May 2026
| Model | Input ($/1M tokens) | Output ($/1M tokens) | Throughput (median) | Context (tokens) |
|---|---|---|---|---|
| OpenAI gpt-5-mini | $0.25 | $2.00 | 180 t/s | 400k |
| Anthropic Sonnet 4.6 | $3.00 | $15.00 | 85 t/s | 1M (200k standard) |
| Anthropic Haiku 4.5 | $0.80 | $4.00 | 165 t/s | 200k |
| Google Gemini 2.5 Flash | $0.30 | $2.50 | 240 t/s | 2M |
| DeepSeek V3.1 (Together) | $0.27 | $1.10 | 95 t/s | 128k |
| Llama 4 Maverick (Groq) | $0.50 | $0.85 | 750 t/s (LPU) | 256k |
Throughput numbers are median tokens-per-second measured by Artificial Analysis (artificialanalysis.ai) between April 15 and May 30, 2026. Groq's 750 t/s figure reflects LPU inference, not standard GPU. Context windows are the maximum supported; effective working context for many models degrades past 50-60% utilization (see Anthropic's 2026 long-context degradation report).