What's in this guide
This is a reference page. Skip to the table you need:
1. How per-token pricing actually works (input vs output, why output costs more).
2. OpenAI API pricing — the full gpt-5.5 and gpt-5.4 family plus codex and media models.
3. Anthropic / Claude API pricing — Opus 4.8, Sonnet 4.6, Haiku 4.5, Fable 5.
4. Google Gemini API pricing — Gemini 3.5, 3.1, and 2.5 tiers.
5. The all-models comparison table at a glance.
6. Prompt caching — how cache reads cut input cost by up to 90%.
7. Batch discounts — 50% off when latency doesn't matter.
8. Context-window pricing — why long context can quietly double a bill.
9. How to estimate your real monthly cost.
10. Sources & further reading.