Per-query cost math by context tier
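As of mid-2026, frontier mid-tier models (Claude Sonnet class, GPT-4o class) cost roughly $2-3 per million input tokens. Efficiency-tier models (Gemini Flash, GPT-mini) cost $0.075-0.30 per million. These rates determine whether your context-tier choice matters financially. The per-query figures below use the top of the mid-tier range ($3) and the bottom of the efficiency range ($0.075), so they bracket the spread rather than describe a typical case.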
**8K context query (~5K tokens input):** $0.015 at mid-tier, $0.0004 at efficiency tier. Negligible per-query cost; cost is dominated by output tokens.
**32K context query (~25K tokens input):** $0.075 at mid-tier, $0.002 at efficiency tier. Still small per-query but starts to matter at volume.
**200K context query (~150K tokens input):** $0.45 at mid-tier, $0.011 at efficiency tier. Now meaningful: at mid-tier, $0.45 × 10K queries/month = $4,500/month on input alone.
**1M context query (~750K tokens input):** $2.25 at mid-tier, $0.056 at efficiency tier. At 10K queries/month on mid-tier: $22,500/month. At this price, cost separates the workloads that genuinely need long context from those using it speculatively.
For exact rates, see Anthropic's pricing page and OpenAI's model pricing page. Tiers vary by 4-10× across providers, but the relative scaling is consistent.
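The per-query figures above are all the same arithmetic: input tokens times the per-million rate, divided by one million. A minimal Python sketch follows. The rates in it are assumptions taken from the ends of the ranges quoted above ($3.00 and $0.075 per million input tokens), not any provider's published price, and the function and variable names are illustrative only.

```python
# Assumed rates in USD per million input tokens. These are the range
# endpoints the figures above appear to use, not a real provider's price list.
RATES_PER_MILLION = {"mid": 3.00, "efficiency": 0.075}

# Approximate input size (tokens) for each context tier, as given in the text.
TIER_INPUT_TOKENS = {"8K": 5_000, "32K": 25_000, "200K": 150_000, "1M": 750_000}


def input_cost(tokens: int, rate_per_million: float) -> float:
    """Input-token cost in USD for a single query."""
    return tokens * rate_per_million / 1_000_000


def monthly_input_cost(tokens: int, rate_per_million: float, queries: int) -> float:
    """Input-token cost in USD for a month of identical queries."""
    return input_cost(tokens, rate_per_million) * queries


if __name__ == "__main__":
    for tier, tokens in TIER_INPUT_TOKENS.items():
        mid = input_cost(tokens, RATES_PER_MILLION["mid"])
        eff = input_cost(tokens, RATES_PER_MILLION["efficiency"])
        monthly = monthly_input_cost(tokens, RATES_PER_MILLION["mid"], 10_000)
        print(f"{tier:>4}: mid ${mid:.3f}, efficiency ${eff:.4f}, "
              f"mid at 10K queries/month ${monthly:,.0f}")
```

Swapping in your own provider's rates and your real average input size per query is the quickest way to check whether a larger context tier is worth its cost at your volume. Output-token cost is deliberately left out here, as it is in the figures above.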