By The DDH Team · Digital Dashboard Hub

GPT vs Claude vs Gemini Cost Calculator: Side-by-Side Per-Call $ Math (2026)

All three major providers (OpenAI, Anthropic, and Google) price LLM API calls per million tokens, with separate input and output rates. As of June 2026, the per-call cost spread on a representative 1,000-in / 500-out workload is roughly 400x: Gemini 2.5 Flash-Lite at $0.00030 per call, gpt-5.5-pro at $0.120, with everything else stacked between.

Cost rarely drives the final model choice alone; quality and latency usually come first. But at an equivalent quality bar, the right model is often 3-10x cheaper than the default most teams pick. Below are the formula, side-by-side cost tables at three workload sizes, the discount stack (batch + cache) applied, and guidance on when to switch providers. For a quick estimate, our AI prompt cost calculator takes your token count and returns dollars; the free PDF cheat sheet prints the whole table for your monitor.

Per-call cost across GPT, Claude, Gemini — June 2026, 1,000 in / 500 out reference workload

| Model | Input $/1M | Output $/1M | Per-call cost | Per 1M calls |
| --- | --- | --- | --- | --- |
| OpenAI gpt-5.5-pro | $30.00 | $180.00 | $0.12000 | $120,000 |
| OpenAI gpt-5.5 | $5.00 | $30.00 | $0.02000 | $20,000 |
| OpenAI gpt-5.4 | $2.50 | $15.00 | $0.01000 | $10,000 |
| OpenAI gpt-5.4-mini | $0.75 | $4.50 | $0.00300 | $3,000 |
| OpenAI gpt-5.4-nano | $0.20 | $1.25 | $0.000825 | $825 |
| OpenAI o4-reasoning | $15.00 | $60.00 | $0.04500 | $45,000 |
| Anthropic Claude Fable 5 | $10.00 | $50.00 | $0.03500 | $35,000 |
| Anthropic Claude Opus 4.8 | $5.00 | $25.00 | $0.01750 | $17,500 |
| Anthropic Claude Sonnet 4.6 | $3.00 | $15.00 | $0.01050 | $10,500 |
| Anthropic Claude Haiku 4.5 | $1.00 | $5.00 | $0.00350 | $3,500 |
| Google Gemini 3.5 Flash | $1.50 | $9.00 | $0.00600 | $6,000 |
| Google Gemini 3.1 Pro Preview | $2.00 | $12.00 | $0.00800 | $8,000 |
| Google Gemini 2.5 Pro | $1.25 | $10.00 | $0.00625 | $6,250 |
| Google Gemini 2.5 Flash | $0.30 | $2.50 | $0.00155 | $1,550 |
| Google Gemini 2.5 Flash-Lite | $0.10 | $0.40 | $0.00030 | $300 |

Sources, as of June 2026: OpenAI (https://developers.openai.com/api/docs/pricing), Anthropic (https://claude.com/pricing), Google Gemini (https://ai.google.dev/gemini-api/docs/pricing). Per-call cost assumes 1,000 input tokens + 500 output tokens, standard rates with no batch or cache discount. Reasoning-model rows do not add hidden chain-of-thought tokens; budget 3-5x output for tasks that benefit from reasoning.

The formula every provider follows

Per-call cost is identical math across providers:

```
cost_per_call = (input_tokens / 1,000,000) * input_price + (output_tokens / 1,000,000) * output_price
```

Token-to-word conversion: roughly 1 token per 0.75 English words, or roughly 1 token per 4 characters. So a 750-word prompt is about 1,000 input tokens; a 375-word reply is about 500 output tokens.
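A minimal Python sketch of the formula and the word-to-token estimate above. The function names are illustrative and do not come from any provider SDK, and the 0.75 words-per-token ratio is the rough planning figure from the text, not an exact tokenizer count.

```python
def estimate_tokens(words: int, words_per_token: float = 0.75) -> int:
    """Rough planning estimate: about 0.75 English words per token."""
    return round(words / words_per_token)


def cost_per_call(input_tokens: int, output_tokens: int,
                  input_price: float, output_price: float) -> float:
    """Dollar cost of one call; prices are in dollars per 1M tokens."""
    return ((input_tokens / 1_000_000) * input_price
            + (output_tokens / 1_000_000) * output_price)


# Reference workload: a 750-word prompt and a 375-word reply,
# priced at the Claude Sonnet 4.6 rates ($3 in / $15 out per 1M tokens).
tokens_in = estimate_tokens(750)    # 1000
tokens_out = estimate_tokens(375)   # 500
print(f"${cost_per_call(tokens_in, tokens_out, 3.00, 15.00):.5f}")  # $0.01050
```

For budgeting a real route, swapping `estimate_tokens` for an exact tokenizer count is the safer choice; the estimate is only meant for early planning.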

Discounts adjust the formula, never replace it. Batch API (OpenAI and Anthropic) halves both input and output for asynchronous workloads with a 24-hour delivery window. Prompt caching reduces the input rate to 10% on cache-hit tokens. Vision and audio inputs are counted as input tokens at a vendor-specific conversion ratio.

Reasoning tokens on the o-series and Claude Fable 5 count as output even though they are not returned to you. A model that thinks for 2,000 tokens before producing a 200-token visible answer bills 2,200 output tokens.
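To see how hidden reasoning inflates the bill, here is a small sketch that adds reasoning tokens to the output count before pricing. The function name and the 1,000-token prompt are illustrative assumptions; the rates are the o4-reasoning row from the table, and the 2,000 / 200 split is the example from the paragraph above.

```python
def call_cost_with_reasoning(input_tokens: int, visible_tokens: int,
                             reasoning_tokens: int,
                             input_price: float, output_price: float) -> float:
    """Per-call dollars when hidden reasoning tokens bill as output."""
    billed_output = visible_tokens + reasoning_tokens
    return (input_tokens * input_price + billed_output * output_price) / 1_000_000


# 1,000-token prompt, 200-token visible answer, $15 / $60 per 1M tokens.
without_reasoning = call_cost_with_reasoning(1_000, 200, 0, 15.00, 60.00)
with_reasoning = call_cost_with_reasoning(1_000, 200, 2_000, 15.00, 60.00)

print(f"visible answer only:         ${without_reasoning:.3f}")  # $0.027
print(f"with 2,000 reasoning tokens: ${with_reasoning:.3f}")     # $0.147
```

The visible answer is identical in both cases; only the billed output count changes, which is why reasoning-heavy routes deserve their own row in any budget.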


Worked example 1: short Q&A at 1,000 calls

Reference workload: 1,000 input tokens, 500 output tokens, 1,000 calls. Standard rates, no discounts.

OpenAI gpt-5.5: 1,000 × $0.020 = $20.00. Anthropic Claude Sonnet 4.6: 1,000 × $0.0105 = $10.50. Google Gemini 2.5 Pro: 1,000 × $0.00625 = $6.25. Google Gemini 2.5 Flash: 1,000 × $0.00155 = $1.55. Google Gemini 2.5 Flash-Lite: 1,000 × $0.00030 = $0.30.

Same workload, $0.30 to $20 depending on model: a roughly 67x spread. At 1k calls per day the difference is small in absolute terms ($0.30 vs $20 per day, $9 vs $600 per month). At 1M+ calls per month the spread becomes the budget.

Quality note: Gemini 2.5 Flash-Lite trades depth for cost. For classification, extraction, and simple Q&A it often matches Sonnet 4.6 quality. For nuanced writing, reasoning, or code, the gap is larger and Sonnet/gpt-5.5 wins. Run a side-by-side eval on 100 representative samples before defaulting to the cheapest tier.


Worked example 2: high-volume batch at 1,000,000 calls

Same 1,000-in / 500-out reference, scaled to 1M calls — a typical full-production monthly volume.

Standard rates: gpt-5.5 = $20,000. Sonnet 4.6 = $10,500. Opus 4.8 = $17,500. Gemini 2.5 Pro = $6,250. Gemini 2.5 Flash = $1,550. Gemini 2.5 Flash-Lite = $300.

Apply Batch API discount (-50%) to OpenAI and Anthropic: gpt-5.5 = $10,000. Sonnet 4.6 = $5,250. Gemini does not offer a published batch tier as of June 2026, so Gemini rows are unchanged.

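Apply prompt caching where 800 of every 1,000 input tokens are a cache hit at 10% of the input rate. Sonnet 4.6 input drops from $3,000 to $840: $600 for the 200 uncached tokens plus $240 for the 800 cached tokens (800 tokens × 1M calls × $0.30 per 1M). The total is $8,340 standard or $4,170 batched. Caching alone cuts the input bill by 72% and the total bill by about 21%; caching plus batch together cut the total by about 60%.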
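The Sonnet 4.6 figures in this example can be reproduced with a few lines of Python. This is a sketch under the stated assumptions (800 of 1,000 input tokens cached at 10% of the input rate, batch at 50% off everything); the helper name `dollars` is illustrative.

```python
CALLS = 1_000_000
INPUT_PRICE, OUTPUT_PRICE = 3.00, 15.00   # Sonnet 4.6, dollars per 1M tokens
CACHE_RATE = 0.10                         # cache hits bill at 10% of input price
BATCH_RATE = 0.50                         # batch halves input and output


def dollars(tokens_per_call: int, price_per_million: float, calls: int) -> float:
    """Dollars for `tokens_per_call` tokens on each of `calls` calls."""
    return tokens_per_call * calls / 1_000_000 * price_per_million


uncached_input = dollars(200, INPUT_PRICE, CALLS)              # about 600
cached_input = dollars(800, INPUT_PRICE * CACHE_RATE, CALLS)   # about 240
output = dollars(500, OUTPUT_PRICE, CALLS)                     # about 7,500

standard_total = uncached_input + cached_input + output
batched_total = standard_total * BATCH_RATE

print(f"input with cache: ${uncached_input + cached_input:,.0f}")  # $840
print(f"total, standard:  ${standard_total:,.0f}")                 # $8,340
print(f"total, batched:   ${batched_total:,.0f}")                  # $4,170
```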

Picking the cheapest model that hits the quality bar matters more than negotiating discounts on the wrong model. A team running 1M calls per month on gpt-5.5-pro pays $120,000 monthly; the same workload on Sonnet 4.6 costs $10,500, a $109,500 monthly difference at standard rates, or about 11x. Always test the next tier down before committing budget. For prompt-quality strategies that survive a cheaper tier, our code prompt builder helps tighten instruction blocks.


Discount stack: batch + cache + lower tier

The three biggest cost levers compound. Apply each in order and the final bill on the same workload can run 5-15x lower than the headline rate.

Step 1: drop one model tier. The 80/20 of most workloads runs fine on the tier below the team's default. Eval on 100 representative samples; promote back up only on the routes where the cheaper model misses.

Step 2: cache stable prefixes. System prompt, tool definitions, reference documents (anything that repeats across calls) should sit at the front of the prompt and be marked cache-eligible (on Anthropic) or simply long and stable (on OpenAI, where caching is opportunistic). Cache hits bill at 10% of the input rate, so a prompt that is 60-90% stable prefix saves roughly 55-80% of the input bill.

Step 3: batch the asynchronous workloads. Anything not user-facing — nightly reports, weekly enrichments, backfills, eval runs — moves to the Batch API for a flat 50% off both input and output.

Worked compound: gpt-5.5 standard at 1M calls = $20,000. Drop to gpt-5.4-mini = $3,000 (matches quality for many tasks). Add caching on 80% of input tokens: the $750 input bill falls to $210, so $3,000 → $2,460. Add batch on the offline half of the calls: $2,460 → ~$1,845. Final bill: about $1,845, a 91% reduction from the standard headline.
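The three levers can be chained in a short Python sketch. The assumptions here are mine and are labeled in the code: 80% of input tokens hit the cache at 10% of the input rate, and half of the calls are offline and go through a 50%-off batch endpoint. `monthly_bill` is an illustrative name, not a provider API.

```python
def monthly_bill(calls, in_tok, out_tok, in_price, out_price,
                 cached_fraction=0.0, batch_share=0.0):
    """Monthly dollars; prices are per 1M tokens.

    cached_fraction: share of input tokens billed at 10% of the input rate.
    batch_share: share of calls sent through a 50%-off batch endpoint.
    """
    input_multiplier = (1 - cached_fraction) + 0.10 * cached_fraction
    per_call = (in_tok * in_price * input_multiplier
                + out_tok * out_price) / 1_000_000
    batch_multiplier = (1 - batch_share) + 0.50 * batch_share
    return calls * per_call * batch_multiplier


CALLS, IN_TOK, OUT_TOK = 1_000_000, 1_000, 500

baseline = monthly_bill(CALLS, IN_TOK, OUT_TOK, 5.00, 30.00)      # gpt-5.5
lower_tier = monthly_bill(CALLS, IN_TOK, OUT_TOK, 0.75, 4.50)     # gpt-5.4-mini
with_cache = monthly_bill(CALLS, IN_TOK, OUT_TOK, 0.75, 4.50,
                          cached_fraction=0.8)                    # assumed 80% hits
with_batch = monthly_bill(CALLS, IN_TOK, OUT_TOK, 0.75, 4.50,
                          cached_fraction=0.8, batch_share=0.5)   # assumed half offline

for label, value in [("baseline", baseline), ("lower tier", lower_tier),
                     ("+ cache", with_cache), ("+ batch", with_batch)]:
    print(f"{label:<10} ${value:>9,.0f}")
print(f"reduction  {1 - with_batch / baseline:.0%}")
```

Because output dominates the gpt-5.4-mini bill ($2,250 of $3,000), the tier drop does most of the work here; cache and batch only trim what is left.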


Quality-adjusted cost: what should you actually pay?

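Headline cost matters less than cost per correct answer. A model at $0.001 per call that fails 30% of the time can be worse than a model at $0.005 per call that fails 5%. On retries alone the cheap model still comes out ahead (about $0.0014 versus $0.0053 per correct answer), but the ordering flips once each failure also triggers a cascade to a higher tier or a manual review: at even a few cents of downstream cost per failure, the $0.005 model has the lower effective cost.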
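One way to make "cost per correct answer" concrete is the sketch below. It is a simplified model, not an established metric: every call costs the per-call price, each failure adds a handling cost (retry, escalation, or review), and the spend is spread over the calls that succeed. The $0.05 handling cost is a hypothetical figure chosen for illustration.

```python
def cost_per_correct_answer(per_call_cost: float, failure_rate: float,
                            failure_handling_cost: float = 0.0) -> float:
    """Expected dollars spent per correct answer under the simple model above."""
    expected_spend = per_call_cost + failure_rate * failure_handling_cost
    return expected_spend / (1 - failure_rate)


cheap = (0.001, 0.30)    # $0.001 per call, fails 30% of the time
pricey = (0.005, 0.05)   # $0.005 per call, fails 5% of the time

# Per-call price only: the cheap model is still ahead.
print(f"{cost_per_correct_answer(*cheap):.4f}")    # 0.0014
print(f"{cost_per_correct_answer(*pricey):.4f}")   # 0.0053

# Hypothetical $0.05 of downstream handling per failure: the order flips.
print(f"{cost_per_correct_answer(*cheap, failure_handling_cost=0.05):.4f}")   # 0.0229
print(f"{cost_per_correct_answer(*pricey, failure_handling_cost=0.05):.4f}")  # 0.0079
```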

Benchmark-adjusted cost (per published 2026 quality evals on standard chat workloads): Sonnet 4.6 and gpt-5.5 trade close on most benchmarks; Sonnet 4.6 is roughly half the per-call cost. Gemini 2.5 Pro lands between, with stronger long-context recall but mixed performance on multi-step reasoning. Haiku 4.5 and gpt-5.4-mini are interchangeable on most extraction tasks; Haiku usually wins on instruction adherence, gpt-5.4-mini on raw cost.

When in doubt, default to Sonnet 4.6 for chat and content workloads, gpt-5.4-mini for high-volume structured-output tasks, Gemini 2.5 Flash-Lite for ultra-cheap simple tasks. Cross-check against the deep dives at OpenAI API pricing and Anthropic Claude pricing.


When OpenAI wins, when Claude wins, when Gemini wins

OpenAI wins on: ecosystem maturity (vector store, file search, code interpreter natively integrated), the deepest reasoning model lineup (o4-reasoning, o4-mini-reasoning), and image generation tightly integrated with chat. Default to OpenAI when you need first-party tools beyond the LLM itself.

Anthropic wins on: per-dollar quality on Sonnet 4.6, the best prompt-caching mechanics (explicit cache control, 1-hour TTL option), strong instruction adherence, and the longest practical context with strong recall on Opus and Fable. Default to Claude when you are building agents that need to reason over long documents or multi-step plans.

Google Gemini wins on: lowest per-call cost at every tier, the largest practical windows (2M on 3.1 Pro Preview, 10M experimental on Flash-Lite), and the strongest multimodal performance (image, video, audio). Default to Gemini when cost is the constraint or when your workload includes substantial vision or video.

On non-flagship players: DeepSeek V4 is the price leader on open-source-style workloads; Mistral Large 3 wins on European data residency requirements; Llama 4 wins when you need to self-host. The full provider matrix is broader than three — for high-volume work, run a quarterly bake-off.


Building your own internal cost calculator

Replicate the math in five lines of Python or one Google Sheets formula:

```
=(input_tokens/1000000)*input_price + (output_tokens/1000000)*output_price
```

Build a row per (model, route) pair, plug in your real tokens per call (use the tokenizer endpoint for an exact count, or estimate words/0.75 for planning), and scale by daily call volume. Add a column for batch-eligible volume and apply -50% to that subset.

For prompt caching, estimate the cached portion as a fraction of total input — 60-90% is realistic for chatbots with stable system prompts, 0% for one-shot user prompts. Multiply the cached portion by 10% of the input rate, uncached portion by 100%.

Re-run the calculation at least quarterly. Provider pricing has moved quarterly through 2025-2026, and the cheapest model at each tier changes; the team that re-bakes the assumptions every 90 days saves 20-40% per year in steady state.
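The row-per-route layout described in this section might look like the following in Python. Treat it as a starting sketch: the `Route` class, the two example routes, and their volumes and cache fractions are placeholders to replace with your own measurements, and the 10% cache rate and 50% batch discount are the figures used throughout this article.

```python
from dataclasses import dataclass


@dataclass
class Route:
    """One (model, route) row; every value here is a placeholder."""
    name: str
    input_price: float             # dollars per 1M input tokens
    output_price: float            # dollars per 1M output tokens
    input_tokens: int              # per call
    output_tokens: int             # per call
    calls_per_day: int
    cached_fraction: float = 0.0   # share of input tokens that hit the cache
    batch_fraction: float = 0.0    # share of calls that can go through batch

    def monthly_cost(self, days: int = 30) -> float:
        # Cached tokens bill at 10% of the input rate; batched calls at 50%.
        input_multiplier = 1 - 0.9 * self.cached_fraction
        per_call = (self.input_tokens * self.input_price * input_multiplier
                    + self.output_tokens * self.output_price) / 1_000_000
        batch_multiplier = 1 - 0.5 * self.batch_fraction
        return per_call * self.calls_per_day * days * batch_multiplier


routes = [
    Route("chat / Sonnet 4.6", 3.00, 15.00, 1_500, 500, 20_000,
          cached_fraction=0.6),
    Route("nightly tagging / gpt-5.4-mini", 0.75, 4.50, 4_000, 200, 30_000,
          cached_fraction=0.6, batch_fraction=1.0),
]

for route in routes:
    print(f"{route.name:<32} ${route.monthly_cost():>10,.2f}")
print(f"{'total':<32} ${sum(r.monthly_cost() for r in routes):>10,.2f}")
```

Keeping prices as plain fields on each row makes the periodic re-run a matter of editing numbers rather than code.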


Three real-world case studies: what 1M-call/month workloads actually cost across providers

Headline rate cards are abstract. What teams actually want to know is: on my workload, what is the monthly bill? The three case studies below walk through input-heavy, balanced, and output-heavy production workloads at realistic monthly volumes. All numbers are calculated directly from the standard rate card; cached and batched figures apply the discount stack from the section above (cache hits at 10% of input rate on the cacheable share of input tokens given in each case; Batch API at 50% off both input and output where the provider offers it).

Case study 1: Northwind Marketing, customer-support ticket summarization. The team ingests 1M support tickets per month from Zendesk and runs each through an LLM that extracts product, sentiment, root cause, and a one-line theme. The workload is heavily input-skewed: 4,000 input tokens per call (the ticket transcript plus reference taxonomy) and 200 output tokens (structured JSON).

Standard-rate monthly bills at 1M calls: Claude Sonnet 4.6 = (4,000/1M × $3 × 1M) + (200/1M × $15 × 1M) = $12,000 + $3,000 = $15,000. gpt-5.4-mini = (4,000/1M × $0.75 × 1M) + (200/1M × $4.50 × 1M) = $3,000 + $900 = $3,900. Gemini 2.5 Flash = (4,000/1M × $0.30 × 1M) + (200/1M × $2.50 × 1M) = $1,200 + $500 = $1,700.

Apply the discount stack. The taxonomy is identical across all 1M calls, so roughly 2,500 of the 4,000 input tokens cache cleanly. Sonnet cached + batched lands near $4,100/month. gpt-5.4-mini cached + batched lands near $1,100/month. Gemini 2.5 Flash has no Batch API and weaker caching mechanics, so it sits at roughly $1,400/month.

Winner: gpt-5.4-mini. At standard rates it costs more than twice as much as Gemini Flash, but the Batch API and stronger prompt caching bring it to roughly $300/month below Gemini Flash after discounts, and on Northwind's internal eval it scored 94% taxonomy-correct versus 89% for Gemini Flash. It wins on both cost and accuracy.
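The Northwind figures for the two batch-capable models can be recomputed from the rate card with the sketch below. It assumes 2,500 of the 4,000 input tokens are cache hits at 10% of the input rate and that every call is batched at 50% off; Gemini 2.5 Flash is left out of the discounted calculation because the text gives no cache rate for it. Function names are illustrative.

```python
CALLS = 1_000_000
INPUT_TOKENS, OUTPUT_TOKENS = 4_000, 200
CACHED_TOKENS = 2_500              # shared taxonomy prefix
CACHE_RATE, BATCH_RATE = 0.10, 0.50


def standard_bill(in_price: float, out_price: float) -> float:
    """Monthly dollars at list rates; prices are per 1M tokens."""
    return CALLS * (INPUT_TOKENS * in_price + OUTPUT_TOKENS * out_price) / 1_000_000


def cached_batched_bill(in_price: float, out_price: float) -> float:
    """Monthly dollars with the taxonomy cached and every call batched."""
    uncached = INPUT_TOKENS - CACHED_TOKENS
    input_cost = uncached * in_price + CACHED_TOKENS * in_price * CACHE_RATE
    per_call = (input_cost + OUTPUT_TOKENS * out_price) / 1_000_000
    return CALLS * per_call * BATCH_RATE


for name, in_price, out_price in [("Claude Sonnet 4.6", 3.00, 15.00),
                                  ("gpt-5.4-mini", 0.75, 4.50)]:
    print(f"{name:<18} standard ${standard_bill(in_price, out_price):>9,.2f}"
          f"   cached + batched ${cached_batched_bill(in_price, out_price):>8,.2f}")
```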

Case study 2: Cascade SaaS, in-product chatbot for a 220k-user analytics tool. The chatbot handles 500k user conversations per month, averaging two turns per session, so 1M LLM calls. The workload is balanced at 1,500 input tokens / 500 output tokens, typical for retrieval-augmented chat with three snippets of context.

Standard-rate monthly bills at 1M calls: gpt-5.5 = (1,500/1M × $5 × 1M) + (500/1M × $30 × 1M) = $7,500 + $15,000 = $22,500. Sonnet 4.6 = (1,500/1M × $3 × 1M) + (500/1M × $15 × 1M) = $4,500 + $7,500 = $12,000. Gemini 2.5 Pro = (1,500/1M × $1.25 × 1M) + (500/1M × $10 × 1M) = $1,875 + $5,000 = $6,875.

Cascade cannot use the Batch API because chat is synchronous, so the discount stack is cache-only. System prompt plus product docs total 900 of the 1,500 input tokens and cache reliably. Sonnet cached drops input from $4,500 to $2,070 (600 uncached tokens at $3 per 1M = $1,800, plus 900 cached tokens at $0.30 per 1M = $270), for a total monthly bill of $9,570. gpt-5.5 cached drops to $18,450 ($3,450 input + $15,000 output). Gemini 2.5 Pro cache support is real-time-implicit and less aggressive; even a perfect hit rate on those 900 tokens would only bring its bill down to about $5,860, so its cached bill lands between that floor and the $6,875 standard rate.

Winner: Sonnet 4.6. Gemini Pro is about $3,000/month cheaper, but Cascade's blind eval scored Sonnet 4.6 at 4.6/5 on response quality versus 4.1/5 for Gemini Pro, and the per-call cost difference (roughly $0.006 vs $0.010) is dwarfed by the LTV impact of a better chatbot in a $99/seat product. gpt-5.5 was eliminated on cost: it offered no measurable quality edge over Sonnet at nearly double the bill.

Case study 3: Mesa AI, a developer-tooling startup running a coding assistant that processes 200k completions per day (6M calls per month). The workload is output-heavy: 2,000 input tokens (recent file context plus open-buffer diff) and 1,500 output tokens (the suggested patch).

Standard-rate monthly bills at 6M calls: gpt-5.4 = (2,000/1M × $2.50 × 6M) + (1,500/1M × $15 × 6M) = $30,000 + $135,000 = $165,000. Sonnet 4.6 = (2,000/1M × $3 × 6M) + (1,500/1M × $15 × 6M) = $36,000 + $135,000 = $171,000. Claude Fable 5 = (2,000/1M × $10 × 6M) + (1,500/1M × $50 × 6M) = $120,000 + $450,000 = $570,000. DeepSeek V4 at the estimate of $0.40/$1.20 = (2,000/1M × $0.40 × 6M) + (1,500/1M × $1.20 × 6M) = $4,800 + $10,800 = $15,600. The spread is roughly 36x between DeepSeek and Fable.

Apply the stack: code completion is synchronous so Batch API does not apply; caching helps modestly on the input side (around 30% cache-hittable), shaving roughly $8,000-$10,000 off the input bill for gpt-5.4 and Sonnet 4.6. Mesa ran a blind eval on 800 internal completion samples: gpt-5.4 hit 71% acceptance, Sonnet 4.6 hit 73%, Fable 5 hit 79%, DeepSeek V4 hit 64%.

Winner: a tiered routing strategy, not a single model. With the 30% input caching applied to the Anthropic slices, Mesa routes 75% of completions (single-line, in-buffer) to DeepSeek V4 at roughly $11,700/month for that slice, routes 20% (multi-line refactors) to Sonnet 4.6 at roughly $32,300/month, and reserves 5% (whole-file rewrites and explain-and-fix) for Fable 5 at roughly $26,900/month. Blended monthly bill: roughly $70,800 with 74% blended acceptance, versus $165,000 on gpt-5.4 alone for three points less acceptance, or $570,000 on Fable alone for five points more.
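A tiered-routing bill is a weighted sum of per-model bills, which the sketch below computes for the Mesa traffic split. The assumptions are labeled in the code: every slice uses the same 2,000-in / 1,500-out profile, the two Anthropic slices cache 30% of input at 10% of the input rate, and the DeepSeek slice gets no cache discount. In practice each slice would have its own token profile, so treat the output as an approximation.

```python
CALLS_PER_MONTH = 6_000_000
INPUT_TOKENS, OUTPUT_TOKENS = 2_000, 1_500   # assumed identical for every slice
CACHE_RATE = 0.10

# model: (input $/1M, output $/1M, traffic share, cached share of input)
ROUTES = {
    "DeepSeek V4": (0.40, 1.20, 0.75, 0.0),
    "Claude Sonnet 4.6": (3.00, 15.00, 0.20, 0.3),
    "Claude Fable 5": (10.00, 50.00, 0.05, 0.3),
}


def slice_cost(in_price, out_price, share, cached_share):
    """Monthly dollars for the share of traffic routed to one model."""
    input_multiplier = 1 - cached_share * (1 - CACHE_RATE)
    per_call = (INPUT_TOKENS * in_price * input_multiplier
                + OUTPUT_TOKENS * out_price) / 1_000_000
    return CALLS_PER_MONTH * share * per_call


slices = {model: slice_cost(*params) for model, params in ROUTES.items()}
blended = sum(slices.values())

for model, cost in slices.items():
    print(f"{model:<18} ${cost:>10,.0f}")
print(f"{'blended':<18} ${blended:>10,.0f}")
```

Changing a traffic share in `ROUTES` shows how sensitive the blended bill is to the size of the premium slice, which is usually the first number worth tuning.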

What the three cases reveal. On input-heavy workloads the cheap tiers dominate because output is a rounding error — gpt-5.4-mini, Gemini Flash, and Haiku 4.5 are the contenders, and the choice usually turns on which provider's caching and batch story fits the pipeline. On balanced synchronous workloads the mid tier wins because quality differences show up in user-facing metrics and the absolute spread is small enough that the quality-adjusted winner usually beats the cheapest option — Sonnet 4.6 and Gemini 2.5 Pro are the most common landing spots. On output-heavy workloads no single model wins; routing per task type beats picking one model by 30-60% almost every time, because output cost is large enough that the cheap model handles the easy slice and pays for the expensive model on the hard slice.

Two arithmetic checks worth keeping in your head. First, the per-call cost rule of thumb: multiply input tokens (in thousands) by input price (per 1M, in dollars) and divide by 1,000 to get input dollars per call; same for output. At 1M calls per month the per-call cost in cents equals the monthly bill in tens of thousands of dollars: a 2-cent call is $20k/month at 1M calls. Second, cache savings are bounded by input share of cost. On the Mesa case, input is only 21% of the bill on Sonnet ($36,000 of $171,000), so even a 100% cache hit rate could not save more than $32,400/month, and at the realistic 30% hit rate the saving is under $10,000. On the Northwind case, input is 80% of the Sonnet bill, so caching is the single highest-leverage lever.
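The bound on cache savings is simple enough to check in code. This sketch takes the Sonnet 4.6 input and output bills from the Mesa and Northwind cases and assumes cache hits bill at 10% of the input rate; the function names are illustrative.

```python
def input_share(input_bill: float, output_bill: float) -> float:
    """Fraction of the total bill that comes from input tokens."""
    return input_bill / (input_bill + output_bill)


def max_cache_saving(input_bill: float, hit_fraction: float = 1.0,
                     cache_rate: float = 0.10) -> float:
    """Dollars saved when hit_fraction of input tokens bill at cache_rate."""
    return input_bill * hit_fraction * (1 - cache_rate)


# Mesa on Sonnet 4.6: $36,000 input, $135,000 output per month.
print(f"Mesa input share:      {input_share(36_000, 135_000):.0%}")      # 21%
print(f"Mesa ceiling:          ${max_cache_saving(36_000):,.0f}")        # $32,400
print(f"Mesa at 30% hits:      ${max_cache_saving(36_000, 0.30):,.0f}")  # $9,720

# Northwind on Sonnet 4.6: $12,000 input, $3,000 output per month.
print(f"Northwind input share: {input_share(12_000, 3_000):.0%}")        # 80%
print(f"Northwind at 2,500 of 4,000 hits: "
      f"${max_cache_saving(12_000, 2_500 / 4_000):,.0f}")                # $6,750
```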

One-line summary of when each provider tends to win in 2026. OpenAI wins balanced workloads where ecosystem features (file search, code interpreter, structured outputs) matter and budget tolerates the premium. Anthropic wins long-context and agentic workloads where Sonnet's per-dollar quality and explicit cache control compound. Google wins input-heavy and multimodal workloads where raw per-token cost and 2M+ context are the constraint. Open-source and budget providers like DeepSeek win the easy slice of any tiered routing strategy.


Latency, reliability, and other costs not in the formula

Per-token cost is the largest line on the bill but it is not the only cost. Latency translates to UX cost — every second of additional response time costs conversion in user-facing apps; cumulative latency × call volume is real money. Gemini Flash family typically wins time-to-first-token; o4-reasoning typically loses by a wide margin on tasks where reasoning runs.

Reliability translates to retry cost: if a model fails 1% of calls and you retry, your effective cost is 1% higher. More importantly, if those failures cascade to a human review queue at $5 per review, a 1% failure rate at 1M calls = 10,000 failures = $50,000 in manual review. The model's accuracy directly affects review cost.

Rate limits translate to capacity cost. Per our LLM rate limits page, each provider tier caps requests per minute and tokens per minute; if your workload exceeds the cap, you queue, retry, or split across multiple keys — all add overhead. Budget for tier upgrades alongside model upgrades.

Bottom line: pick the model that minimizes (per-call cost + retry cost + review cost + capacity cost). For most teams that is a different model than the one minimizing per-call cost alone.
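The bottom-line sum can be sketched as a small comparison function. The two per-call prices come from the rate table (Gemini 2.5 Flash and Sonnet 4.6 on the reference workload), but the failure rates are hypothetical placeholders, not measured results, and the $5 review cost is the figure used in the reliability paragraph above. One retry per failure is assumed.

```python
def total_monthly_cost(calls, per_call_cost, failure_rate,
                       review_cost_per_failure, capacity_cost=0.0):
    """Per-call spend plus one retry per failure, manual review, and capacity."""
    failures = calls * failure_rate
    base = calls * per_call_cost
    retries = failures * per_call_cost
    reviews = failures * review_cost_per_failure
    return base + retries + reviews + capacity_cost


CALLS = 1_000_000
candidates = {
    # name: (per-call dollars, HYPOTHETICAL failure rate)
    "cheap tier ($0.00155/call)": (0.00155, 0.03),
    "mid tier ($0.01050/call)": (0.01050, 0.01),
}

totals = {
    name: total_monthly_cost(CALLS, cost, rate, review_cost_per_failure=5.00)
    for name, (cost, rate) in candidates.items()
}
best = min(totals, key=totals.get)

for name, total in totals.items():
    print(f"{name:<28} ${total:>12,.2f}")
print(f"lowest total cost: {best}")
```

With these placeholder rates the review queue dominates both totals, which is the point of the paragraph above: plug in measured failure rates before trusting the ranking.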

Frequently Asked Questions

Which is cheapest: GPT, Claude, or Gemini?

Gemini is the per-token price leader at every tier in 2026. Gemini 2.5 Flash-Lite at $0.10/$0.40 per 1M is the cheapest mainstream chat model; Claude Haiku 4.5 ($1/$5) and OpenAI gpt-5.4-nano ($0.20/$1.25) are the cheap-tier competitors. Match the cheapest tier that meets your quality bar.

Is Gemini cheaper than ChatGPT for production workloads?

Yes, at every tier. Gemini 2.5 Pro ($1.25/$10) is roughly 3-4x cheaper than gpt-5.5 ($5/$30) at comparable quality on most benchmarks. The decision usually turns on quality fit for your specific workload, ecosystem integration, and reasoning needs.

What is the cheapest reasoning model in 2026?

OpenAI o4-mini-reasoning at $3 input / $12 output is the cheapest reasoning tier among major providers as of June 2026. Claude Fable 5 ($10/$50) is the most expensive reasoning tier but offers the longest effective context for chain-of-thought work.

How do batch + cache discounts stack?

They multiply. A Claude Sonnet 4.6 input token that is both cache-hit (0.1x rate) and submitted via Batch (0.5x rate) bills at 0.05x, a 95% discount versus the standard input rate. On a 1M-call workload (1,000 input tokens per call) with 80% cache-eligible input, the effective input bill drops from $3,000 to roughly $420: $120 for the cached tokens at 0.05x plus $300 for the uncached 20% at the 0.5x batch rate.

Does the per-call cost include tool calls?

Tool call arguments are counted as output tokens, and the tool result you replay in the next turn is counted as input. An agent loop with 5 tool calls before the answer can bill 5-8x the output of a direct-answer call. Account for agent loops separately — see our AI agent cost calculator.

Why is output 5-6x input across providers?

Generating tokens requires a full forward pass per token while input is processed in a single batched pass. The 5-6x output ratio is standard across OpenAI (6x on most tiers), Anthropic (5x), and Google (4-8x depending on model).

Should I switch providers to save 30%?

Probably not on its own — switching costs (engineering time, eval drift, output format differences, prompt re-tuning) usually exceed a one-time 30% saving on a stable workload. Switching makes sense at 2x cost differences, on greenfield projects, or when the new provider unblocks a capability the current one cannot.

Where can I see live provider pricing?

OpenAI: developers.openai.com/api/docs/pricing. Anthropic: claude.com/pricing. Google: ai.google.dev/gemini-api/docs/pricing. All three update quarterly or faster — confirm before budgeting.

On an input-heavy workload (4k in / 200 out), which provider is actually cheapest at 1M calls?

Gemini 2.5 Flash leads on raw rate card: 4,000/1M × $0.30 × 1M + 200/1M × $2.50 × 1M = $1,700/month. gpt-5.4-mini starts at $3,900/month but overtakes it once you apply the Batch API (-50%) and aggressive prompt caching on the stable taxonomy portion, landing near $1,100/month all-in versus Gemini Flash's roughly $1,400 with weaker caching mechanics. For ticket-summarization and classification pipelines, gpt-5.4-mini with the full discount stack usually wins by a small margin and adds better instruction adherence.

For a synchronous chatbot at 500k conversations/month, is gpt-5.5 worth 2x the Sonnet bill?

Almost never. On a balanced 1.5k-in / 500-out workload, 1M chat calls cost $22,500/month on gpt-5.5 versus $12,000/month on Sonnet 4.6 at standard rates ($18,450 vs $9,570 with system-prompt caching on 900 of the 1,500 input tokens). Most blind evals score Sonnet 4.6 within a tenth of a point of gpt-5.5 on chat workloads; the roughly $9,000/month delta only makes sense if a specific OpenAI feature (file search, code interpreter, native vector store) is on the critical path.

How do coding assistants at 6M calls/month avoid a $500k+ monthly bill on premium models?

Tiered routing, not single-model selection. A coding assistant routing 75% of completions to DeepSeek V4 (~$11,700/month), 20% to Sonnet 4.6 (~$32,300/month), and 5% to Claude Fable 5 (~$26,900/month) lands near $70,800/month with blended acceptance within five points of running Fable on everything (which would cost $570,000/month). The cheap model handles single-line completions; the expensive model handles whole-file rewrites. Output-heavy workloads almost always reward routing over a single-model bet.
