By The DDH Team · Digital Dashboard Hub

Cost Per Token Across All Major AI Models (2026)

Input and output prices per million tokens for every major OpenAI, Anthropic, and Google model, plus the caching, batch, and context-window mechanics that decide your real monthly bill — current as of June 2026.

As of June 2026, AI model pricing is quoted per million tokens (MTok) and split into a cheaper input rate and a more expensive output rate. The workhorse models (gpt-5.4 at $2.50 in / $15.00 out, Claude Sonnet 4.6 at $3 / $15, and Gemini 2.5 Pro at $1.25 / $10) cluster closely, while the premium models (gpt-5.5-pro at $30 / $180, Claude Fable 5 at $10 / $50) cost about 3.3x to 12x the price of the same provider's workhorse. Prices below are pulled from each provider's live pricing page and should be re-checked there before you commit a budget.

Token pricing alone never predicts your bill. Prompt caching, batch discounts, and how much context you stuff into each call swing real costs by 2-10x. This guide lists every current price, then shows the mechanics that actually move the number — and you can plug your own volumes into our AI Prompt Cost Calculator (how it works) to estimate a monthly figure.

Per-token pricing, all major models (per 1M tokens, June 2026)

| Model | Input ($/MTok) | Output ($/MTok) | Tier |
| --- | --- | --- | --- |
| OpenAI gpt-5.5 | 5.00 | 30.00 | Frontier |
| OpenAI gpt-5.5-pro | 30.00 | 180.00 | Premium reasoning |
| OpenAI gpt-5.4 | 2.50 | 15.00 | Workhorse |
| OpenAI gpt-5.4-mini | 0.75 | 4.50 | Efficient |
| OpenAI gpt-5.4-nano | 0.20 | 1.25 | Bulk / cheap |
| OpenAI gpt-5.3-codex | 1.75 | 14.00 | Coding |
| Claude Opus 4.8 | 5.00 | 25.00 | Frontier |
| Claude Sonnet 4.6 | 3.00 | 15.00 | Workhorse |
| Claude Haiku 4.5 | 1.00 | 5.00 | Efficient |
| Claude Fable 5 | 10.00 | 50.00 | Premium |
| Gemini 3.5 Flash | 1.50 | 9.00 | Workhorse |
| Gemini 3.1 Pro (Preview) | 2.00 | 12.00 | Frontier (≤200k) |
| Gemini 3.1 Flash-Lite | 0.25 | 1.50 | Bulk / cheap |
| Gemini 2.5 Pro | 1.25 | 10.00 | Workhorse |
| Gemini 2.5 Flash | 0.30 | 2.50 | Efficient |
| Gemini 2.5 Flash-Lite | 0.10 | 0.40 | Cheapest |

Prices as of June 2026, per [OpenAI](https://developers.openai.com/api/docs/pricing), [Anthropic](https://claude.com/pricing) ([API detail](https://platform.claude.com/docs/en/about-claude/pricing)), and [Google Gemini](https://ai.google.dev/gemini-api/docs/pricing). Subject to change; confirm on the live pages.

What's in this guide

This is a reference page. Skim to the table you need:

1. How per-token pricing actually works (input vs output, why output costs more).

2. OpenAI API pricing — the full gpt-5.5 and gpt-5.4 family plus codex and media models.

3. Anthropic / Claude API pricing — Opus 4.8, Sonnet 4.6, Haiku 4.5, Fable 5.

4. Google Gemini API pricing — Gemini 3.5, 3.1, and 2.5 tiers.

5. The all-models comparison table at a glance.

6. Prompt caching — how cache reads cut input cost by up to 90%.

7. Batch discounts — 50% off when latency doesn't matter.

8. Context-window pricing — why long context can quietly double a bill.

9. How to estimate your real monthly cost.

10. Sources & further reading.


How per-token pricing works

Every major API meters usage in tokens, not words or characters. A token is a sub-word chunk; in English, roughly 1 token ≈ 4 characters ≈ 0.75 words (per Anthropic and OpenAI tokenization docs). A 500-word email is about 670 tokens; a 10-page document is roughly 5,000-6,000 tokens.
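The rule of thumb above can be turned into a quick estimator. This is a rough sketch for English text only; the function names are illustrative, and a provider's real tokenizer will give different counts, so treat the result as a ballpark.

```python
def estimate_tokens_from_words(word_count: int) -> int:
    """Ballpark token count from a word count (1 token is about 0.75 words)."""
    return round(word_count / 0.75)


def estimate_tokens_from_chars(char_count: int) -> int:
    """Ballpark token count from a character count (1 token is about 4 characters)."""
    return round(char_count / 4)


# The 500-word email from the text lands at roughly 670 tokens.
print(estimate_tokens_from_words(500))
```

For anything you plan to budget against, count tokens with the provider's own tokenizer rather than this approximation.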

Pricing is almost always split into two rates. Input tokens (your prompt, system message, and any context you attach) are billed at one rate; output tokens (what the model generates) are billed at a higher rate, from 4x to about 8x the input rate across the models in this guide, with 5-6x the most common. That asymmetry is why summarization (long input, short output) is cheap and generation (short input, long output) is comparatively expensive.

Rates are quoted per 1,000,000 tokens (1M, written MTok). So gpt-5.4 at $2.50 / $15.00 means $2.50 per million input tokens and $15.00 per million output tokens. A single request of 4,000 input + 1,000 output tokens on gpt-5.4 costs (4,000/1,000,000 × $2.50) + (1,000/1,000,000 × $15.00) = $0.01 + $0.015 = $0.025.
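The same arithmetic as a small helper, in case you want to script it. The function name and signature are illustrative; the rates are the gpt-5.4 figures quoted above.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_rate: float, output_rate: float) -> float:
    """Cost in dollars of one request, with rates given in $ per 1M tokens."""
    input_cost = input_tokens / 1_000_000 * input_rate
    output_cost = output_tokens / 1_000_000 * output_rate
    return input_cost + output_cost


# 4,000 input + 1,000 output tokens on gpt-5.4 ($2.50 / $15.00)
cost = request_cost(4_000, 1_000, 2.50, 15.00)
print(f"${cost:.3f}")  # $0.025
```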

Three modifiers change that base math: prompt caching (cheaper repeated input), batch processing (cheaper non-urgent jobs), and context-window tier pricing (some models charge more above a context threshold). All three are covered below. To turn token counts into dollars without doing the arithmetic by hand, use the AI Prompt Cost Calculator.


OpenAI API pricing (as of June 2026)

OpenAI's gpt-5.5 family is the frontier tier; the gpt-5.4 family is the cost-efficient workhorse line, with mini and nano variants for high-volume, low-stakes work. The gpt-5.3-codex model is tuned for coding agents. All figures below are per 1M tokens and taken from the OpenAI API pricing page; confirm there before budgeting.

```
Model            Input ($/MTok)   Output ($/MTok)
gpt-5.5          5.00             30.00
gpt-5.5-pro      30.00            180.00
gpt-5.4          2.50             15.00
gpt-5.4-mini     0.75             4.50
gpt-5.4-nano     0.20             1.25
gpt-5.3-codex    1.75             14.00
```

Media is priced separately: gpt-image-2 runs $8.00 input / $30.00 output per 1M tokens, and Sora-2 video is metered by the second — $0.10/sec at 720p and $0.50/sec at 1024p.

Picking within the family: gpt-5.4-nano at $0.20 / $1.25 is roughly 25x cheaper on input than gpt-5.5 and is the right default for classification, tagging, and routing. Reserve gpt-5.5-pro for genuinely hard reasoning — at $180/MTok output it is the most expensive output token of any model in this guide.


Anthropic / Claude API pricing (as of June 2026)

Anthropic's Claude line spans Haiku (fastest, cheapest), Sonnet (balanced), Opus (most capable general model), and Fable 5 (the premium tier). Notably, the Opus and Sonnet 4.6+ generations carry a flat price regardless of which point release you call. Figures are per 1M tokens from the Claude pricing page and the API pricing detail.

```
Model                          Input ($/MTok)   Output ($/MTok)
Claude Opus 4.8                5                25
Claude Opus 4.5 / 4.6 / 4.7    5                25
Claude Sonnet 4.6 / 4.5        3                15
Claude Haiku 4.5               1                5
Claude Fable 5                 10               50
```

Two structural advantages stand out. First, Claude Opus 4.8 matches gpt-5.5's input price ($5) but undercuts its output by $5/MTok ($25 vs $30) — and is dramatically cheaper than gpt-5.5-pro. Second, cache reads on Opus 4.8 cost just $0.50/MTok (10% of base input), which makes repeated-context workloads far cheaper than the headline rate suggests (see caching below).

Anthropic also bills server-side tools separately: the web search tool is $10 per 1,000 searches. If you build a research agent that searches on every turn, that line item can rival your token spend, so meter it explicitly.
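To see how a search line item can rival token spend, here is a rough comparison. The $10 per 1,000 searches comes from the text above; the workload (50,000 agent turns, one search each, 2,000 input + 300 output tokens per turn on Claude Sonnet 4.6) is a made-up assumption for illustration, as are the function names.

```python
SEARCH_PRICE_PER_1000 = 10.00  # $ per 1,000 web searches, as quoted above


def search_cost(num_searches: int) -> float:
    """Dollar cost of server-side web searches."""
    return num_searches / 1_000 * SEARCH_PRICE_PER_1000


def token_cost(turns: int, input_tokens: int, output_tokens: int,
               input_rate: float, output_rate: float) -> float:
    """Dollar cost of tokens across all turns, with rates in $ per 1M tokens."""
    per_turn = (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000
    return turns * per_turn


# Hypothetical research agent: one search per turn on Sonnet 4.6 ($3 / $15).
turns = 50_000
print(f"search: ${search_cost(turns):,.2f}")
print(f"tokens: ${token_cost(turns, 2_000, 300, 3.00, 15.00):,.2f}")
```

Under these assumptions the two line items come out close to each other, which is why the search count deserves its own meter.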


Google Gemini API pricing (as of June 2026)

Google's Gemini line is generally the cheapest of the three providers at comparable capability tiers, especially the Flash-Lite variants for high-volume work. Some tiers (Gemini 3.1 Pro Preview) quote the rate at or below a context threshold. Figures are per 1M tokens from the Gemini API pricing page.

```
Model                      Input ($/MTok)   Output ($/MTok)
Gemini 3.5 Flash           1.50             9.00
Gemini 3.1 Pro (Preview)   2.00             12.00   (≤200k context)
Gemini 3.1 Flash-Lite      0.25             1.50
Gemini 2.5 Pro             1.25             10.00
Gemini 2.5 Flash           0.30             2.50
Gemini 2.5 Flash-Lite      0.10             0.40
```

Gemini 2.5 Flash-Lite at $0.10 / $0.40 is the cheapest model in this entire guide: half the input cost of gpt-5.4-nano and just over a quarter of Gemini 3.1 Flash-Lite's output rate ($0.40 vs $1.50). For extraction, classification, and other bulk low-stakes tasks where you don't need frontier reasoning, it sets the price floor.

Note the 3.1 Pro Preview's context note: its $2.00 / $12.00 rate is quoted at or below 200k tokens. As with any provider, long-context calls can move into a different pricing tier, so check the live page for the exact thresholds before sending very large prompts.


All major models at a glance

The comparison table near the top of this page collapses every model into a single view so you can see where each lands. Output price is the number that usually dominates real bills, since generation tends to produce more tokens than you'd guess. The 'cheap workhorse' tier (gpt-5.4, Sonnet 4.6, Gemini 2.5 Pro) is where most production traffic should sit unless a task genuinely needs the frontier tier.


Prompt caching: the biggest lever on input cost

Prompt caching lets you reuse a large, stable chunk of input — a long system prompt, a knowledge base, a document — across many requests at a steep discount. Instead of paying full input price every call, you pay a one-time write cost and then a tiny read cost on every cache hit.

On Anthropic's API, the mechanics are explicit (pricing detail): a 5-minute cache write costs 1.25x the base input rate, a 1-hour write costs 2x, and a cache read (hit) costs just 0.1x base input — i.e. 10% of the input price. For Claude Opus 4.8 that means cache reads at $0.50/MTok instead of $5.00/MTok, a 90% saving on the cached portion.

The math: suppose you attach a 50,000-token knowledge base to 1,000 Opus 4.8 requests. Without caching that's 50M input tokens at $5/MTok = $250 just for the repeated context. With caching, and assuming the requests arrive close enough together that the cache never expires, you pay one 5-minute write (50,000 tokens × $5/MTok × 1.25 ≈ $0.31) plus 999 reads at 10% (≈49.95M tokens × $0.50/MTok ≈ $25). You cut the repeated-context cost from $250 to about $25, roughly 90% off.
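The knowledge-base example above can be sketched in a few lines. The multipliers (1.25x write, 0.1x read) are the Anthropic figures quoted in this section; the function names are illustrative, and the sketch assumes a single cache write with every later request hitting a still-warm cache.

```python
def uncached_prefix_cost(prefix_tokens: int, requests: int, input_rate: float) -> float:
    """Cost of re-sending the same prefix at full input price on every request."""
    return prefix_tokens * requests / 1_000_000 * input_rate


def cached_prefix_cost(prefix_tokens: int, requests: int, input_rate: float,
                       write_multiplier: float = 1.25,
                       read_multiplier: float = 0.10) -> float:
    """One cache write, then a cache read on each remaining request.

    Assumes the cache never expires between requests (no repeat writes).
    """
    write = prefix_tokens / 1_000_000 * input_rate * write_multiplier
    reads = prefix_tokens * (requests - 1) / 1_000_000 * input_rate * read_multiplier
    return write + reads


# 50,000-token knowledge base, 1,000 requests, Opus 4.8 input at $5/MTok
print(f"uncached: ${uncached_prefix_cost(50_000, 1_000, 5.00):.2f}")
print(f"cached:   ${cached_prefix_cost(50_000, 1_000, 5.00):.2f}")
```

If traffic is sparse enough that the cache expires between calls, each expiry adds another write at the higher rate, so real savings depend on request spacing.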

Caching pays off whenever the same large prefix appears across many calls within the cache window. It does nothing for one-off prompts or prompts where the bulk of the input changes every time. OpenAI and Google also offer caching; check each provider's pricing page for current discount rates and minimums.


Batch discounts: 50% off when latency doesn't matter

If a job doesn't need an immediate response — overnight summarization of a backlog, bulk classification, dataset labeling — batch APIs trade latency for a discount. Anthropic's Batch API is 50% off both input and output (pricing); OpenAI and Google offer comparable batch tiers (check their pricing pages for exact percentages and turnaround windows).

Stacking matters: batch and caching can combine. A nightly job that re-uses a fixed system prompt across thousands of records can take the cache-read discount on the prefix and the batch discount on the rest. For a workload that is both repetitive and non-urgent, the effective rate can land well under half the headline price.
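A rough way to picture the stacking is a blended input rate. This sketch assumes the two discounts simply multiply (a 0.1x cache read on the cached share, then 0.5x batch on everything), which is an assumption for illustration and not something the pricing pages are quoted as guaranteeing here; the function name and parameters are hypothetical.

```python
def effective_input_rate(base_rate: float, cached_fraction: float, batch: bool = False,
                         cache_read_multiplier: float = 0.10,
                         batch_multiplier: float = 0.50) -> float:
    """Blended $ per 1M input tokens, assuming the discounts multiply.

    cached_fraction is the share of input tokens served from cache (0.0 to 1.0).
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    blended = base_rate * (cached_fraction * cache_read_multiplier
                           + (1.0 - cached_fraction))
    return blended * batch_multiplier if batch else blended


# Sonnet 4.6 input at $3/MTok, half of each prompt cached, run as a batch job
rate = effective_input_rate(3.00, cached_fraction=0.5, batch=True)
print(f"${rate:.3f} per MTok vs $3.00 headline")
```

Check the provider's batch and caching documentation for how the discounts actually interact before relying on a blended figure like this.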

The trade-off is turnaround — batch jobs typically resolve within a window (often up to 24 hours) rather than in seconds. Use batch for pipelines, not for anything a user is waiting on.


Context-window pricing: the quiet bill multiplier

Modern models accept enormous context windows — Anthropic includes a 1M-token context window at standard pricing on Opus 4.6+, Sonnet 4.6, and Fable 5. That capability is a double-edged sword: every token you put into context is billed at the input rate on every call.

The trap is RAG and long-conversation apps that keep appending. If you grow a conversation to 100,000 tokens of context and make 20 more turns, you re-pay for those 100,000 input tokens on each of the 20 turns — 2M input tokens of overhead before counting any new content. On Opus 4.8 that's $10 in pure context-replay cost for a single conversation.

Two mitigations: (1) cache the stable portion of context so the replay is billed at 10% instead of 100%; (2) summarize or truncate old turns so the window doesn't grow unbounded. Some providers also tier pricing above a context threshold (e.g. Gemini 3.1 Pro Preview quotes its rate at ≤200k) — verify the threshold on the live pricing page before sending very large prompts.
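The replay overhead, and the effect of caching it, can be sketched as follows. The function name is illustrative; the sketch holds the context fixed at 100,000 tokens, as in the example above, and ignores cache-write cost and any new tokens added per turn.

```python
from typing import Optional


def replay_cost(context_tokens: int, turns: int, input_rate: float,
                cache_read_multiplier: Optional[float] = None) -> float:
    """Dollar cost of re-sending a fixed context on every turn.

    Pass cache_read_multiplier (e.g. 0.10) to bill the replay at the cache-read rate.
    """
    rate = input_rate if cache_read_multiplier is None else input_rate * cache_read_multiplier
    return context_tokens * turns / 1_000_000 * rate


# 100,000 tokens of context replayed over 20 turns on Opus 4.8 ($5/MTok input)
print(f"no cache:   ${replay_cost(100_000, 20, 5.00):.2f}")
print(f"with cache: ${replay_cost(100_000, 20, 5.00, cache_read_multiplier=0.10):.2f}")
```

In a real conversation the context grows each turn, so the uncached figure is a lower bound on the replay cost.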


How to estimate your real monthly cost

Headline per-token rates are the starting point, not the answer. To estimate a real monthly bill, work through five numbers: (1) requests per month, (2) average input tokens per request, (3) average output tokens per request, (4) which model, and (5) what fraction of input is cacheable or batchable.

Worked example. Say a support assistant handles 100,000 requests/month on Claude Sonnet 4.6 ($3 / $15), averaging 3,000 input + 500 output tokens, with a 2,000-token system prompt that's cacheable. Base input: 100k × 3,000 = 300M tokens; of that, ~200M is the repeating cacheable prompt. Cached: ~200M × $0.30/MTok (10% of $3) = $60; uncacheable input ~100M × $3 = $300; output 100k × 500 = 50M × $15 = $750. Total ≈ $1,110/month — versus roughly $1,650 without caching.
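The worked example maps onto a short function. The name and parameters are illustrative; like the example, this sketch bills the cached tokens at the 10% read rate on every request and leaves out cache-write cost.

```python
def monthly_cost(requests: int, input_tokens: int, output_tokens: int,
                 input_rate: float, output_rate: float,
                 cached_tokens: int = 0, cache_read_multiplier: float = 0.10) -> float:
    """Estimated monthly bill in dollars, with rates in $ per 1M tokens.

    cached_tokens is the part of each request's input that is served from cache.
    """
    cached = requests * cached_tokens / 1_000_000 * input_rate * cache_read_multiplier
    uncached = requests * (input_tokens - cached_tokens) / 1_000_000 * input_rate
    output = requests * output_tokens / 1_000_000 * output_rate
    return cached + uncached + output


# 100,000 requests/month on Claude Sonnet 4.6 ($3 / $15), 3,000 in + 500 out
with_cache = monthly_cost(100_000, 3_000, 500, 3.00, 15.00, cached_tokens=2_000)
without_cache = monthly_cost(100_000, 3_000, 500, 3.00, 15.00)
print(f"with caching:    ${with_cache:,.0f}")
print(f"without caching: ${without_cache:,.0f}")
```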

The two biggest estimation mistakes are underestimating output tokens (models are wordier than people expect) and ignoring context replay in multi-turn apps. Build a small spreadsheet, or skip the arithmetic and plug your volumes into the AI Prompt Cost Calculator — see how the calculator works for the methodology. Then validate against your first real week of API billing; estimates are directional until metered usage confirms them.

Cost-control checklist: route easy tasks to the cheapest capable model (Gemini 2.5 Flash-Lite, gpt-5.4-nano, Haiku 4.5); cache stable prefixes; batch anything non-urgent; cap output length; and trim context aggressively. These five levers routinely cut a bill by half or more — far more than switching providers for a fractional rate difference.


Sources & further reading

All prices in this guide are quoted as of June 2026 and are subject to change — always confirm on the live pages below before committing a budget.

OpenAI API pricing: https://developers.openai.com/api/docs/pricing

Anthropic / Claude pricing: https://claude.com/pricing

Claude API pricing detail (caching, batch, tools): https://platform.claude.com/docs/en/about-claude/pricing

Google Gemini API pricing: https://ai.google.dev/gemini-api/docs/pricing

Token-to-text rule of thumb (1 token ≈ 4 characters ≈ 0.75 words): per Anthropic and OpenAI tokenization documentation.

Estimate your own spend with the AI Prompt Cost Calculator and read the calculator methodology.

Frequently Asked Questions

Which AI model has the cheapest cost per token in 2026?

As of June 2026, Gemini 2.5 Flash-Lite is the cheapest at $0.10 input / $0.40 output per 1M tokens, per the Gemini pricing page. Among comparable cheap tiers, gpt-5.4-nano ($0.20 / $1.25) and Claude Haiku 4.5 ($1 / $5) are the OpenAI and Anthropic equivalents. These are best for high-volume, low-stakes tasks like classification and extraction, not frontier reasoning.

Why is output more expensive than input?

Generating tokens is more compute-intensive than reading them: output tokens are produced one at a time, each requiring its own forward pass through the model, while input tokens can be processed in parallel. Across the models in this guide, output rates run from 4x to about 8x the input rate, most often 5-6x. This is why summarization (long input, short output) is cheap and open-ended generation (short input, long output) is comparatively expensive. See each provider's pricing page for exact ratios.

How much does prompt caching actually save?

On Anthropic's API, a cache read costs 10% of the base input rate, so reusing a large stable prefix saves about 90% on that portion (pricing detail). For Claude Opus 4.8, cached input drops from $5.00 to $0.50 per 1M tokens. The catch: caching only helps when the same large prefix repeats across many calls within the cache window. It does nothing for one-off prompts.

What is the batch API discount?

Anthropic's Batch API is 50% off both input and output for jobs that don't need an immediate response (pricing). OpenAI and Google offer comparable batch tiers — check their pricing pages for exact percentages and turnaround windows. Batch and caching can stack, so a repetitive non-urgent pipeline can run at well under half the headline rate.

Does a bigger context window cost more?

The window itself is often included at standard pricing — Anthropic includes 1M-token context at standard rates on Opus 4.6+, Sonnet 4.6, and Fable 5. But you pay the input rate for every token you actually put into context, on every call. Multi-turn apps that keep appending re-pay for the whole context each turn, which quietly multiplies the bill. Caching the stable portion and trimming old turns are the main mitigations.

How do I estimate my real monthly AI cost?

Multiply requests/month by average input and output tokens per request, apply the model's per-token rates, then discount any cacheable or batchable portion. Output tokens and multi-turn context replay are the two most-underestimated costs. The fastest way is to skip the arithmetic and use the AI Prompt Cost Calculator (methodology here), then validate against your first real week of API billing.
