Skip to contentNew: Does ChatGPT recommend your brand? Free 60-second AI visibility check →
By The DDH Team · Digital Dashboard Hub

AI Prompt Cost Calculator for GPT-5 (2026)

GPT-5 input tokens cost $2.50 per million at standard rate — but that headline number hides a 150x cost spread across the GPT-5 family alone. This guide gives you real prices for every tier, a comparison table across all major models, and the math to calculate your exact monthly bill before you commit to a model.

By DDH Research Team at Digital Dashboard HubUpdated

GPT-5 launched at $2.50/million input tokens and $10.00/million output tokens for the base tier — pricing confirmed on OpenAI's pricing page. If you're running 10 million tokens per month, that's a $25 input + $100 output baseline. But almost nobody should be paying that rate in 2026: GPT-5 mini cuts it to $0.40/$1.60, GPT-5 nano cuts it further to $0.15/$0.60, and prompt caching drops your effective input rate another 90% on cached tokens. The real question is whether you're using the right tier for each task.

Use our AI Prompt Cost Calculator to plug in your actual monthly token volume and get the line-item bill across every model in seconds. The rest of this guide breaks down the math so you understand exactly what drives each number — and where you can cut without touching output quality.

We cover: the full GPT-5 tier pricing breakdown, how it stacks up against Claude Opus 4.5, Gemini 2.5 Pro, and Llama 3.x open models, the per-call cost math for common workflow sizes, rate limits by tier, and the fastest path to cutting your bill 40-70% without a model swap.

Digital Dashboard Hub

Writing good prompts for ONE AI is hard. Writing them for GPT-5, Claude, Gemini, Perplexity, Midjourney and 6 more is a full-time job. DDH's AI Prompt Builder writes once, runs everywhere — locked to your niche, voice, and brand tone.

Free 14 days, no card — AICHAT30 = 30% off Pro.

2026 AI Model Pricing Comparison: Input / Output (per million tokens)

Feature
Input ($/1M)
Output ($/1M)
Context window
Rate limit (RPM)
GPT-5 (standard)$2.50$10.00128k10,000
GPT-5 mini$0.40$1.60128k30,000
GPT-5 nano$0.15$0.6032k60,000
GPT-5 (cached input)$0.25$10.00128k10,000
GPT-5 (Batch API)$1.25$5.00128kN/A (async)
Claude Opus 4.5$15.00$75.00200k4,000
Claude Opus 4 (cache read)$1.50$75.00200k4,000
Claude Sonnet 4.5$3.00$15.00200k8,000
Claude Haiku 3.5$0.80$4.00200k8,000
Gemini 2.5 Pro$1.25$10.001M2,000
Gemini 2.5 Flash$0.075$0.301M4,000
Llama 3.3 70B (hosted)$0.23$0.40128kvaries
Llama 3.1 8B (hosted)$0.05$0.08128kvaries

Prices sourced from openai.com/api/pricing, anthropic.com/pricing, and ai.google.dev/pricing as of June 2026. Llama hosted pricing from Together AI / Groq public rate cards. Rates subject to change — verify before committing to a model.

How to Read the GPT-5 Pricing Table

The GPT-5 family is not one model — it is at least four distinct price points with very different quality ceilings. GPT-5 standard ($2.50/$10.00 per million) is the mid-range flagship for coding, analysis, and reasoning tasks that need genuine capability. GPT-5 mini ($0.40/$1.60) is optimized for conversational and content tasks where you need coherent language but not deep reasoning. GPT-5 nano ($0.15/$0.60) targets classification, extraction, short-form generation, and any task where latency beats quality.

At the high end, OpenAI also offers GPT-5 Pro — a higher-compute variant for the most demanding agentic and research workloads, priced above standard. Check OpenAI API Pricing 2026 for the Pro tier breakdown, since OpenAI adjusts the Pro rate independently of the rest of the family.

The most important number in the table above is the cached input rate: $0.25/million for GPT-5 standard. If your prompts have a stable system message or retrieved context that stays constant across calls, prompt caching cuts your effective input cost 90%. For agentic loops that call the model 10-50 times per session with the same context prefix, caching alone often produces more savings than dropping to a cheaper model tier.


Per-Call Cost Math: What Does One Prompt Actually Cost?

Most developers think in per-call terms, not per-million-token terms. Here is the translation math for the most common prompt sizes. A typical chatbot turn — 500 input tokens + 300 output tokens — costs $0.00125 + $0.003 = **$0.00425 per call** on GPT-5 standard. Run that 100,000 times per month and you have $425. The same call on GPT-5 mini: $0.0002 + $0.00048 = $0.00068 per call, or $68/month. A 6x bill reduction by switching model tiers on a task that mini handles fine.

For a long-context document analysis call — 32k input + 2k output — the math changes. GPT-5 standard: 32,000 × $2.50/1M + 2,000 × $10.00/1M = $0.08 + $0.02 = **$0.10 per call**. At 10,000 calls per month, that is $1,000. With prompt caching on the 28k-token document prefix (cached at $0.25/1M) and only 4k tokens uncached: (28,000 × $0.25 + 4,000 × $2.50 + 2,000 × $10.00) / 1M = $0.007 + $0.010 + $0.020 = **$0.037 per call** — a 63% reduction with zero change to the model or output quality.

For agentic workflows with tool use, count every message in the conversation history. An agent that runs 15 turns with 6k tokens of accumulated context per turn generates 90,000 input tokens per session before you count output. At GPT-5 standard that is $0.225 in input alone per session. The cost-per-token-all-major-models-2026 guide has the full token-counting methodology for agentic chains.


GPT-5 vs Claude Opus 4.5: The Honest Cost Comparison

Claude Opus 4.5 is priced at $15.00/$75.00 per million — 6x the input cost and 7.5x the output cost of GPT-5 standard. That premium buys the highest reasoning benchmark scores and 200k-token context window for long document analysis. On tasks where Opus 4.5 is genuinely needed — complex multi-step reasoning, nuanced judgment calls, very long context — the quality difference justifies the cost. On tasks where GPT-5 standard or Claude Sonnet 4.5 is sufficient, you are paying a 6-7x premium for nothing.

The practical comparison is Opus 4.5 vs GPT-5 standard for code generation and analysis: both handle the vast majority of real-world coding tasks. The key cost lever on Opus 4.5 is prompt caching — cache reads cost $1.50/million (90% off $15.00), which brings the effective input rate below GPT-5 standard for cached prefixes. If your Opus 4.5 workflow has a 50k-token stable context that you cache, you pay $1.50/1M on the cached portion and $15.00/1M only on the dynamic portion. See the full breakdown in GPT-5.5 vs Claude Opus 4.8 2026.

The bottom line: Claude Opus 4.5 is the right choice when you need 200k-token context or benchmark-verified reasoning quality and you have prompt caching enabled. For standard-context tasks under 50k tokens, GPT-5 standard matches or exceeds Opus 4.5 performance at 6x lower cost.


Gemini 2.5 Pro: The Long-Context Cost Play

Gemini 2.5 Pro at $1.25/$10.00 per million sits below GPT-5 standard on input cost and matches it on output. The real differentiator is the 1-million-token context window — priced the same per token regardless of context length up to 1M. For workflows that involve processing entire codebases, large research corpora, or very long multi-turn histories, Gemini 2.5 Pro is often the lowest-cost frontier option because you are not paying the long-context premium that other providers charge above 32k or 128k tokens.

Gemini 2.5 Flash at $0.075/$0.30 per million is the most aggressive price point among frontier-class hosted models as of June 2026 — roughly half the cost of GPT-5 nano on input and half on output, with the same 1M-token window. For high-volume classification, summarization, and extraction tasks where you are comfortable with Flash-tier quality, this is the cheapest option in its class. The Google AI pricing page has tier-by-tier context window pricing details.

One practical note on Gemini rate limits: 2.5 Pro is capped at 2,000 RPM at standard tier, which is a real constraint for high-throughput production workloads. If you need >2k RPM, you need either enterprise quota or a parallel-request architecture. Gemini 2.5 Flash allows 4,000 RPM and is the better fit for burst-heavy pipelines.


Llama 3.x: When Open Models Beat Hosted APIs on Cost

Llama 3.3 70B hosted on Together AI or Groq runs at approximately $0.23/$0.40 per million tokens — about 90% cheaper than GPT-5 standard, with quality that benchmarks close to GPT-4-level on standard evals. Llama 3.1 8B hosted drops further to $0.05/$0.08 per million, making it the cheapest viable model for bulk classification and extraction tasks that do not require long-form reasoning.

Self-hosted Llama 3.x on your own GPU eliminates the per-token cost entirely — you pay only for compute. The break-even versus hosted APIs is roughly 1-2 million calls per day for the 8B model on a single A100, and 300-500k calls per day for the 70B model. Below those thresholds, hosted APIs win on total cost of ownership once you factor in DevOps, inference optimization, and model update cycles.

Where Llama 3.x makes the most sense in 2026: high-volume deterministic tasks (entity extraction, classification, tagging) where you have a fine-tuned checkpoint for your domain, latency is not critical, and you're running >500k calls/day. For everything else — especially tasks requiring instruction-following fidelity, tool use, or structured output reliability — GPT-5 mini or Gemini Flash at $0.15-$0.40/million input is competitive enough that self-hosting the ops burden is rarely worth it.


Rate Limits Matter as Much as Price

Price per token is only half the cost equation — rate limits determine whether you can actually run your workload at your target throughput. GPT-5 nano offers 60,000 RPM at standard tier, which is the highest RPM in the GPT-5 family and makes it the right fit for real-time high-throughput pipelines. GPT-5 standard allows 10,000 RPM — adequate for most production apps, but a bottleneck for very high-volume systems. Claude Opus 4 is the most constrained at 4,000 RPM, which matters for agentic systems that fire many calls in parallel.

Gemini 2.5 Pro's 2,000 RPM limit is the binding constraint that pushes high-throughput users to Flash (4,000 RPM) or to enterprise quota agreements. If your workload needs >4,000 RPM on Gemini, the enterprise tier adds dedicated capacity at negotiated pricing — contact Google directly. For most startups, the standard tier limits are fine; the constraint only bites above ~100 concurrent users making synchronous API calls.

One underused rate-limit hack: OpenAI's Batch API has no RPM limit — it is quota-managed by batch job throughput, not by requests per minute. If your workload is async-tolerant (anything with a 24-hour SLA), Batch API removes the rate limit constraint entirely and gives you 50% off. See our how much does ChatGPT cost 2026 guide for the full Batch API cost math.


How to Use a Prompt Cost Calculator Correctly

A prompt cost calculator is only as accurate as the token counts you feed it. The most common mistake: developers count their system prompt length but forget to count the conversation history that gets re-appended every turn. In a 20-turn chatbot, each turn includes all prior turns in the message array — so turn 20 might carry 15,000 tokens of accumulated history even if each individual message was only 500 tokens.

The right workflow: run your actual prompt through a tokenizer (tiktoken for OpenAI models, the Anthropic tokenizer for Claude, the SentencePiece tokenizer for Gemini/Llama) to get the exact token count per call. Then multiply by your expected monthly call volume, then apply the model's per-million rate. Our AI Prompt Cost Calculator does this in one step — paste your prompt text, select the model, enter call volume, and get the monthly cost estimate with a model-by-model comparison.

One nuance the calculator handles automatically: output tokens are priced differently from input tokens on every model in the table, and the ratio varies. GPT-5 standard outputs cost 4x inputs. Claude Opus 4.5 outputs cost 5x inputs. Gemini 2.5 Pro outputs cost 8x inputs. If your prompts generate verbose outputs, that ratio is the main cost driver — and it is the reason that capping max_output_tokens is the easiest cost cut that requires zero model changes.


The 5 Fastest Ways to Cut GPT-5 Costs Right Now

**1. Enable prompt caching on stable context.** If your system prompt, retrieved documents, or tool definitions are constant across calls, restructure your message array so the stable content comes first — OpenAI auto-detects cache-eligible prefixes and charges 90% less on cache hits. Setup time: under 2 hours for most codebases. Typical savings: 50-80% on input tokens.

**2. Drop to GPT-5 mini for non-reasoning tasks.** If your task is content generation, customer-facing chat, summarization, or translation, GPT-5 mini produces output that is effectively indistinguishable from GPT-5 standard at 6x lower cost. Run a blind evaluation on 100 real samples — most teams find mini handles 60-80% of their workload volume without quality regression. Full optimization playbook in reduce GPT-4 API costs guide.

**3. Cap max_output_tokens per task.** If you ask for a JSON object, set max_output_tokens to 400. If you ask for a 2-sentence summary, set it to 100. Default limits are 4k-16k tokens depending on the model, and models often pad to fill the space when unconstrained. Setting explicit limits eliminates tokens that neither the model nor your user benefits from.

**4. Move batch work to the Batch API.** Any job that can wait up to 24 hours — bulk classification, overnight content generation, scheduled report generation — qualifies for OpenAI's Batch API at 50% off standard rate. For async pipelines, this is a guaranteed 50% cut with 2-4 hours of implementation work.

**5. Use GPT-5 nano for classification and extraction.** At $0.15/$0.60 per million, nano is 16x cheaper than standard for tasks that fit within its 32k context and capability ceiling. Entity extraction, sentiment classification, routing decisions, and structured data extraction are all nano-tier tasks. See the how to cut OpenAI bill 50 percent guide for the full tiering decision tree.


Worked Example: A Real SaaS App Monthly Bill

To make this concrete, here is the monthly token budget for a hypothetical SaaS app running three AI-powered features: a customer-facing chatbot, a backend document classifier, and a weekly report generator. This mirrors the architecture of a real mid-market B2B product and illustrates how model selection affects total cost.

**Chatbot** (conversational, 200k calls/month, 800 input + 600 output tokens average): GPT-5 standard = 200k × ($2.50 × 0.8 + $10.00 × 0.6) / 1,000 = $400 + $1,200 = $1,600/month. GPT-5 mini = 200k × ($0.40 × 0.8 + $1.60 × 0.6) / 1,000 = $64 + $192 = $256/month. **Savings: $1,344/month by switching to mini.**

**Document classifier** (2M calls/month, 1,200 input + 50 output tokens): GPT-5 standard = $6,000 + $1,000 = $7,000/month. GPT-5 nano = $360 + $60 = $420/month. **Savings: $6,580/month.** This is the most dramatic example because classifiers need almost no output tokens and nano-tier quality is sufficient for most label sets.

**Weekly report generator** (500 calls/month, 5,000 input + 3,000 output tokens, async): GPT-5 standard Batch API ($1.25/$5.00) = 500 × ($1.25 × 5 + $5.00 × 3) / 1,000 = $3.125 + $7.50 = $10.63/month. This is already low-volume, but the Batch API alone cuts the synchronous cost in half. Total across all three features: from $8,610/month naive → $676/month optimized. **92% bill reduction** using only model tiering, Batch API, and no prompt caching yet applied.


Model Selection Framework: Which Model for Which Task

The right mental model is not 'best model = best results' but 'smallest model that produces acceptable output for this specific task = lowest cost.' The question is always: what is the minimum quality bar this task needs to pass, and which model tier clears that bar at the lowest price?

**Use GPT-5 nano / Gemini 2.5 Flash / Llama 3.1 8B** for: binary classification, multi-class classification with clear labels, entity extraction from structured text, yes/no routing decisions, short-form text formatting, translation of short strings, and any task where your output is fewer than 200 tokens and the input is deterministic. Cost range: $0.05-$0.15/million input.

**Use GPT-5 mini / Claude Haiku 3.5 / Llama 3.3 70B** for: customer-facing chat, content summarization, product copy generation, code commenting, email drafting, and most consumer-facing generation tasks where coherence and tone matter but deep reasoning does not. Cost range: $0.23-$0.80/million input.

**Use GPT-5 standard / Claude Sonnet 4.5 / Gemini 2.5 Pro** for: code generation from natural language, multi-step reasoning chains, research synthesis, structured data generation from unstructured text, and any task where a cheaper model demonstrably fails your quality eval. Cost range: $1.25-$3.00/million input.

**Use Claude Opus 4.5 / GPT-5 Pro** for: frontier reasoning benchmarks, very long context (>100k tokens), complex agentic tasks requiring judgment and planning, and any task where you have specifically validated that cheaper models fail. Cost range: $15.00+/million input. Use sparingly and always with prompt caching enabled.


Tracking Your AI Spend: What to Measure

Most teams discover their AI costs are higher than expected because they measure spend in dollars but make decisions in model names. The metrics that actually drive cost are: tokens per call (input and output separately), calls per day by feature, model per feature, and cache hit rate. If you are not logging these four numbers, you cannot optimize.

OpenAI's usage dashboard breaks down cost by model and endpoint, but not by feature — you need to tag your API calls with metadata (a custom header or a usage_metadata field in your request) to attribute cost to product features. Anthropic's dashboard similarly shows totals by model. For cross-model visibility and feature-level attribution, the standard approach is to log every API call with input tokens, output tokens, model, feature tag, and timestamp to your own data warehouse.

Once you have per-feature token data, the optimization order is straightforward: find the highest-cost feature, check what model it uses, run a quality eval on the next tier down, and switch if quality holds. Repeat. Most teams discover two or three high-volume features running on premium models where a cheaper model is indistinguishable on their actual user tasks. That is where the 50-80% savings comes from. For a complete spend-reduction playbook, see reduce GPT-4 API costs guide.


Future-Proofing: How AI Prices Are Trending

AI API prices have fallen 4-8x per year across every major provider since GPT-4 launched. The trend is driven by hardware efficiency gains (GPU memory bandwidth improvements), inference optimization (quantization, speculative decoding, continuous batching), and competition between OpenAI, Anthropic, Google, and the open-source ecosystem. There is no reason to expect this to stop in 2026-2027.

What this means practically: hard-coded cost assumptions in your architecture go stale fast. The model that was too expensive 6 months ago may now be affordable; the tier you chose for cost reasons may now have a cheaper alternative. Build your model selection as a configuration parameter, not a hard-coded string — so you can swap tiers as prices fall without touching application code. This is exactly the kind of prompt-level abstraction that tools like DDH's prompt generator are designed to support.

The open-model trajectory is equally important: Llama 3.x hosted at $0.05-$0.23/million is already at commodity pricing for many task types. As quantized local models improve, the cost floor for inference will approach near-zero for teams that can self-host. The premium models (Opus 4, GPT-5 standard and above) will continue to command a price premium, but that premium will increasingly only be justified for the tasks where frontier capability is genuinely necessary. Monitor prices monthly — our AI Prompt Cost Calculator is updated within 48 hours of every major price change.

Continue your research on adjacent topics — calculators, rate limits, head-to-head comparisons, and guides.

Frequently Asked Questions

What does GPT-5 cost per 1,000 tokens?

GPT-5 standard costs $0.0025 per 1,000 input tokens and $0.010 per 1,000 output tokens. GPT-5 mini costs $0.0004 per 1,000 input and $0.0016 per 1,000 output. GPT-5 nano costs $0.00015 per 1,000 input and $0.0006 per 1,000 output. Prices verified from OpenAI's pricing page as of June 2026.

How do I calculate the cost of my ChatGPT API calls?

Multiply your input token count by the model's per-million input rate, add your output token count multiplied by the output rate, divide by 1,000,000. For GPT-5 standard: (input_tokens × $2.50 + output_tokens × $10.00) / 1,000,000. Use our AI Prompt Cost Calculator at /blog/ai-prompt-cost-calculator to do this automatically with model comparison.

Is GPT-5 cheaper than Claude Opus 4?

Yes, significantly. GPT-5 standard is $2.50/$10.00 per million tokens; Claude Opus 4.5 is $15.00/$75.00 per million. GPT-5 is 6x cheaper on input and 7.5x cheaper on output. The premium for Opus 4.5 is justified for tasks requiring 200k-token context or Anthropic's highest-capability reasoning tier.

What is the cheapest AI model for high-volume classification?

Gemini 2.5 Flash at $0.075/$0.30 per million is the cheapest frontier-class option for hosted inference in June 2026. Llama 3.1 8B on Together AI or Groq is available at $0.05/$0.08 per million. For self-hosted workloads above 1M+ calls/day, a quantized Llama 3.1 8B can reduce cost further to near the GPU compute floor.

Does prompt caching work the same way on GPT-5 as on Claude?

The mechanism differs slightly. OpenAI auto-detects cache-eligible prefixes (no explicit markup required, cache window is ~5-10 minutes). Anthropic requires explicit cache_control markers in the prompt structure and supports up to 1-hour cache windows with extended caching. Both give 90% off on cache hits. The structuring requirement on Anthropic is slightly more explicit but gives you more control over which content gets cached.

What rate limits does GPT-5 have?

At standard tier: GPT-5 standard allows 10,000 RPM, GPT-5 mini allows 30,000 RPM, and GPT-5 nano allows 60,000 RPM. These limits can be increased via OpenAI's rate limit increase request form once your account has a billing history. Enterprise accounts get higher limits by default.

How accurate are AI prompt cost calculators?

They are accurate to within 1-5% of actual billing when you supply accurate token counts. The main sources of error are: using character counts instead of tokenizer counts (off by 20-40%), not counting conversation history tokens, and forgetting that system messages and tool definitions are billed as input tokens on every call. Use the actual tokenizer for your model for the most accurate estimates.

Will Llama 4 or future open models make hosted APIs obsolete?

For narrow high-volume tasks, open models already compete on cost today. For frontier reasoning, long-context, and instruction-following fidelity, hosted frontier models from OpenAI and Anthropic maintain a meaningful quality advantage. The gap will narrow — but it is unlikely to close entirely in 2026. The practical answer: benchmark your specific task on each model tier rather than picking based on price alone.

Calculate your exact GPT-5 cost in 30 seconds.

Paste your prompt, enter your monthly call volume, and get a line-item cost breakdown across GPT-5, Claude Opus 4, Gemini 2.5 Pro, and Llama 3.x — side by side. Free, no sign-up required.

Browse all prompt tools →