By The DDH Team · Digital Dashboard Hub

AI Prompt Cost Calculator: Estimate Token Costs Across Models (2026)

By DDH Research Team at Digital Dashboard Hub·Updated June 15, 2026

Stop writing AI prompts from scratch.

Tell us your business + your task + your model. We write the prompt — perfectly tuned for ChatGPT, Claude, Grok, Gemini, Midjourney, or any model. Plus 500+ pre-built prompts in your library.

To estimate the cost of an AI prompt, count the tokens in and out, then multiply each by the model's per-token price: cost = (input tokens ÷ 1,000,000 × input price) + (output tokens ÷ 1,000,000 × output price). A useful rule of thumb is that 1 token is roughly 4 characters or about 0.75 words of English, so 1,000 words is around 1,333 tokens.

Prices are quoted per million tokens and differ for input and output, with output usually costing several times more. Below are the formula, worked examples on real current prices, a full cross-provider table, and the caching and batch discounts that can cut bills substantially. Prices change often — always confirm against the live pricing pages linked in the table.

Digital Dashboard Hub

Writing good prompts for ONE AI is hard. Writing them for GPT-5, Claude, Gemini, Perplexity, Midjourney and 6 more is a full-time job. DDH's AI Prompt Builder writes once, runs everywhere — locked to your niche, voice, and brand tone.

Free 14 days, no card — AICHAT30 = 30% off Pro. →

API prices per 1M tokens (input / output) — as of June 2026

Feature	Input ($/1M)	Output ($/1M)
OpenAI gpt-5.5	$5.00	$30.00
OpenAI gpt-5.5-pro	$30.00	$180.00
OpenAI gpt-5.4	$2.50	$15.00
OpenAI gpt-5.4-mini	$0.75	$4.50
OpenAI gpt-5.4-nano	$0.20	$1.25
Anthropic Claude Opus 4.8	$5.00	$25.00
Anthropic Claude Sonnet 4.6	$3.00	$15.00
Anthropic Claude Haiku 4.5	$1.00	$5.00
Anthropic Claude Fable 5	$10.00	$50.00
Google Gemini 3.5 Flash	$1.50	$9.00
Google Gemini 3.1 Pro (Preview, ≤200k)	$2.00	$12.00
Google Gemini 2.5 Pro	$1.25	$10.00
Google Gemini 2.5 Flash	$0.30	$2.50
Google Gemini 2.5 Flash-Lite	$0.10	$0.40

Sources, as of June 2026: OpenAI (https://developers.openai.com/api/docs/pricing), Anthropic (https://claude.com/pricing and https://platform.claude.com/docs/en/about-claude/pricing), Google Gemini (https://ai.google.dev/gemini-api/docs/pricing). Prices change frequently — confirm on the live pages before budgeting.

How is token cost calculated?

Two numbers drive every estimate: how many tokens you send (input) and how many the model returns (output). Each has its own price, quoted per 1,000,000 tokens.

The formula:

``` cost = (input_tokens / 1,000,000) * input_price_per_M + (output_tokens / 1,000,000) * output_price_per_M ```

To estimate token counts before you have an exact tokenizer count, use the rule of thumb that 1 token is approximately 4 characters or about 0.75 words in English (a rough estimate per OpenAI and Anthropic documentation). So a 500-word prompt is roughly 500 ÷ 0.75 ≈ 667 input tokens. This is an approximation; whitespace, punctuation, code, and non-English text shift the ratio.

Output dominates many bills because output prices are typically several times the input price, and long generations add up fast. If a task can return a short structured answer instead of prose, that alone cuts cost.

Worked example 1: a single Q&A call

Say you send a 750-word prompt and get back a 750-word answer. At ~0.75 words per token, that's about 1,000 input tokens and 1,000 output tokens — 0.001 M each.

On gpt-5.4 ($2.50 input / $15.00 output per 1M): input = 0.001 × $2.50 = $0.0025; output = 0.001 × $15.00 = $0.015; total ≈ $0.0175 per call.

On Claude Sonnet 4.6 ($3.00 / $15.00): input = 0.001 × $3.00 = $0.003; output = 0.001 × $15.00 = $0.015; total ≈ $0.018 per call.

On Gemini 2.5 Flash ($0.30 / $2.50): input = 0.001 × $0.30 = $0.0003; output = 0.001 × $2.50 = $0.0025; total ≈ $0.0028 per call.

Same workload, roughly 6x cheaper on Gemini 2.5 Flash than on the mid-tier OpenAI or Anthropic models — which is the whole point of matching model tier to task difficulty.

Worked example 2: scaling to 100,000 calls

Now run that same 1,000-in / 1,000-out call 100,000 times — say a batch classification or summarization job. Multiply the per-call totals above by 100,000:

gpt-5.4: $0.0175 × 100,000 ≈ $1,750. Claude Sonnet 4.6: $0.018 × 100,000 ≈ $1,800. Gemini 2.5 Flash: $0.0028 × 100,000 ≈ $280.

At this scale, discounts matter. If the job is not latency-sensitive, Anthropic's Batch API gives 50% off both input and output, halving the Sonnet figure to roughly $900. Prompt caching helps when a large, identical prefix (system prompt, instructions, reference doc) repeats across calls. See the methods below.

How do caching and batch discounts change the math?

Two mechanisms can sharply reduce cost when your workload fits them.

Prompt caching (Anthropic): when many calls share the same large prefix, you cache it once and pay a reduced rate on the cached portion of later calls. Per Anthropic's pricing, a cache read (a hit) costs 0.1x the base input price — that is, 10% of the normal input rate for the cached tokens. Writing to the cache costs more than base input (1.25x for a 5-minute cache, 2x for a 1-hour cache), so caching pays off when the same prefix is reused enough times to amortize that write. Example: Claude Opus 4.8 input is $5/M, and its cache read is $0.50/M — a 90% saving on the repeated portion.

Batch API (Anthropic): 50% off both input and output for asynchronous, non-time-sensitive jobs. This stacks cleanly onto large offline workloads like the 100,000-call example above.

Match the discount to the shape of the work: caching for a big repeated prefix with varying tail, batch for large jobs you can wait on. Confirm current rates and mechanics on the linked pricing pages, since these terms change.

How do I estimate without an exact tokenizer?

For planning, character or word counts get you close. Take your prompt's character count and divide by 4, or its word count and divide by 0.75, to estimate input tokens. Estimate output tokens from the length of answer you expect.

Then plug both into the formula and the price for your chosen model. Treat the result as an order-of-magnitude estimate, not a billing guarantee — the exact figure depends on the model's tokenizer and your actual output length. To draft tighter prompts (fewer tokens, clearer instructions) for whichever model you choose, our ChatGPT prompt generator and code prompt builder help keep inputs lean.

Digital Dashboard Hub

The prompt patterns above work 10x better when they live in a library you actually own — tunable to your niche, exportable to GPT-5, Claude, Gemini, Perplexity, Midjourney, Llama. Stop pasting across 6 tools.

Try DDH's AI Prompt Builder — free 14 days, no card. AICHAT30 = 30% off Pro. →

Related calculators

OpenAI Pricing Calculator →GPT-5.5, 5.4, mini, nano — full per-call cost in one input.Claude Pricing Calculator →Opus 4.8, Sonnet 4.6, Haiku 4.5, Fable 5 — input + output combined.Context Window Comparison →Max input length and price per 1M for every current model.

Continue your research on adjacent topics — calculators, rate limits, head-to-head comparisons, and guides.

Related prompt tools

ChatGPT Prompt Generator→Code Prompt Builder→Blog Post Outline Generator→Product Description Generator→How to cut your OpenAI bill 50% (2026)→AI cost optimization checklist (2026)→How much does Claude cost? (2026)→

Frequently Asked Questions

How many tokens is 1,000 words?

Roughly 1,333 tokens, using the rule of thumb that 1 token is about 0.75 words (or about 4 characters) in English. This is an approximation; code, punctuation, and non-English text change the ratio.

Why is output more expensive than input?

Generating tokens is more computationally costly than reading them, so providers price output higher — often several times the input rate. On gpt-5.4, for example, output ($15/1M) is six times input ($2.50/1M), per OpenAI's pricing.

What's the cheapest way to run a large batch job?

For non-time-sensitive jobs, Anthropic's Batch API gives 50% off both input and output. Combine that with a low-cost model tier where quality allows. Confirm current terms at Anthropic's pricing.

How much does prompt caching save?

On Anthropic, a cache hit (read) costs 0.1x the base input price — 90% off the cached portion — though writing to the cache costs more than base input, so it pays off when a large prefix is reused enough times. See Anthropic's pricing detail.

Can I trust character-count estimates for budgeting?

As an order-of-magnitude estimate, yes — divide characters by 4 (or words by 0.75) for input tokens. For exact billing you need the model's tokenizer and your real output length, so treat the formula's result as a plan, not a guarantee.

Which current model is cheapest for simple tasks?

Among the tiers in the table, Gemini 2.5 Flash-Lite ($0.10 in / $0.40 out per 1M) and gpt-5.4-nano ($0.20 / $1.25) are the lowest-cost options as of June 2026. Match the cheapest tier that still meets your quality bar.

Do all providers charge separately for input and output?

Yes — OpenAI, Anthropic, and Google all quote distinct input and output prices per million tokens. The formula in this article applies to all three; only the per-token numbers differ. See each provider's linked pricing page for current figures.

Write leaner prompts

Fewer input tokens and tighter instructions mean lower bills. Start with our prompt generators.

Browse all prompt tools →