By The DDH Team · Digital Dashboard Hub

AI Prompt Cost Calculator: Estimate Token Costs Across Models (2026)

How to estimate what a prompt costs in tokens — with the formula, worked examples on real June-2026 prices, and the discounts that change the math.

To estimate the cost of an AI prompt, count the tokens in and out, then multiply each by the model's per-token price: cost = (input tokens ÷ 1,000,000 × input price) + (output tokens ÷ 1,000,000 × output price). A useful rule of thumb is that 1 token is roughly 4 characters or about 0.75 words of English, so 1,000 words is around 1,333 tokens.

Prices are quoted per million tokens and differ for input and output, with output usually costing several times more. Below are the formula, worked examples on real current prices, a full cross-provider table, and the caching and batch discounts that can cut bills substantially. Prices change often — always confirm against the live pricing pages linked in the table.

API prices per 1M tokens (input / output) — as of June 2026

| Model | Input ($/1M) | Output ($/1M) |
| --- | --- | --- |
| OpenAI gpt-5.5 | $5.00 | $30.00 |
| OpenAI gpt-5.5-pro | $30.00 | $180.00 |
| OpenAI gpt-5.4 | $2.50 | $15.00 |
| OpenAI gpt-5.4-mini | $0.75 | $4.50 |
| OpenAI gpt-5.4-nano | $0.20 | $1.25 |
| Anthropic Claude Opus 4.8 | $5.00 | $25.00 |
| Anthropic Claude Sonnet 4.6 | $3.00 | $15.00 |
| Anthropic Claude Haiku 4.5 | $1.00 | $5.00 |
| Anthropic Claude Fable 5 | $10.00 | $50.00 |
| Google Gemini 3.5 Flash | $1.50 | $9.00 |
| Google Gemini 3.1 Pro (Preview, ≤200k) | $2.00 | $12.00 |
| Google Gemini 2.5 Pro | $1.25 | $10.00 |
| Google Gemini 2.5 Flash | $0.30 | $2.50 |
| Google Gemini 2.5 Flash-Lite | $0.10 | $0.40 |

Sources, as of June 2026: OpenAI (https://developers.openai.com/api/docs/pricing), Anthropic (https://claude.com/pricing and https://platform.claude.com/docs/en/about-claude/pricing), Google Gemini (https://ai.google.dev/gemini-api/docs/pricing). Prices change frequently — confirm on the live pages before budgeting.

How is token cost calculated?

Two numbers drive every estimate: how many tokens you send (input) and how many the model returns (output). Each has its own price, quoted per 1,000,000 tokens.

The formula:

```
cost = (input_tokens / 1,000,000) * input_price_per_M + (output_tokens / 1,000,000) * output_price_per_M
```
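As a minimal sketch, the formula translates directly into a small Python function. The name `prompt_cost` and its parameter names are illustrative choices, not part of any provider's SDK; the prices in the usage line are the gpt-5.4 row from the table above.

```python
def prompt_cost(input_tokens: int, output_tokens: int,
                input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate the dollar cost of one call from token counts and per-1M prices."""
    per_million = 1_000_000
    input_cost = (input_tokens / per_million) * input_price_per_m
    output_cost = (output_tokens / per_million) * output_price_per_m
    return input_cost + output_cost


# 1,000 tokens in and 1,000 tokens out at $2.50 / $15.00 per 1M tokens.
print(f"${prompt_cost(1_000, 1_000, 2.50, 15.00):.4f}")
```

Keeping input and output as separate terms makes it easy to see which side of the call drives the bill.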

To estimate token counts before you have an exact tokenizer count, use the rule of thumb that 1 token is approximately 4 characters or about 0.75 words in English (a rough estimate per OpenAI and Anthropic documentation). So a 500-word prompt is roughly 500 ÷ 0.75 ≈ 667 input tokens. This is an approximation; whitespace, punctuation, code, and non-English text shift the ratio.
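The rule of thumb can be sketched as two small helpers. The function names are hypothetical, and the default ratios (0.75 words per token, 4 characters per token) are the rough English-text figures quoted above, not exact tokenizer behavior.

```python
def tokens_from_words(word_count: int, words_per_token: float = 0.75) -> int:
    """Rough token estimate from a word count (English prose)."""
    return round(word_count / words_per_token)


def tokens_from_chars(char_count: int, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from a character count (English prose)."""
    return round(char_count / chars_per_token)


print(tokens_from_words(500))    # 667
print(tokens_from_words(1_000))  # 1333
print(tokens_from_chars(2_000))  # 500
```

For code or non-English text, passing a different ratio is likely more honest than relying on the defaults.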

Output dominates many bills because output prices are typically several times the input price, and long generations add up fast. If a task can return a short structured answer instead of prose, that alone cuts cost.


Worked example 1: a single Q&A call

Say you send a 750-word prompt and get back a 750-word answer. At ~0.75 words per token, that's about 1,000 input tokens and 1,000 output tokens — 0.001 M each.

On gpt-5.4 ($2.50 input / $15.00 output per 1M): input = 0.001 × $2.50 = $0.0025; output = 0.001 × $15.00 = $0.015; total ≈ $0.0175 per call.

On Claude Sonnet 4.6 ($3.00 / $15.00): input = 0.001 × $3.00 = $0.003; output = 0.001 × $15.00 = $0.015; total ≈ $0.018 per call.

On Gemini 2.5 Flash ($0.30 / $2.50): input = 0.001 × $0.30 = $0.0003; output = 0.001 × $2.50 = $0.0025; total ≈ $0.0028 per call.

Same workload, roughly 6x cheaper on Gemini 2.5 Flash than on the mid-tier OpenAI or Anthropic models — which is the whole point of matching model tier to task difficulty.
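The comparison above can be reproduced with a short script. This is a sketch using the three price pairs quoted in this example; the `PRICES` dictionary and `call_cost` helper are illustrative names, and the prices should be re-checked against the live pricing pages.

```python
# (input, output) prices in dollars per 1M tokens, as quoted in this example.
PRICES = {
    "gpt-5.4": (2.50, 15.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Gemini 2.5 Flash": (0.30, 2.50),
}


def call_cost(input_tokens, output_tokens, input_price, output_price):
    """Cost of one call in dollars, given per-1M-token prices."""
    return (input_tokens / 1_000_000) * input_price + (output_tokens / 1_000_000) * output_price


# Same workload on every model: 1,000 tokens in, 1,000 tokens out.
costs = {name: call_cost(1_000, 1_000, *price) for name, price in PRICES.items()}
cheapest = min(costs, key=costs.get)

for name, cost in sorted(costs.items(), key=lambda item: item[1]):
    ratio = cost / costs[cheapest]
    print(f"{name}: ${cost:.4f} per call, {ratio:.1f}x the cheapest")
```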


Worked example 2: scaling to 100,000 calls

Now run that same 1,000-in / 1,000-out call 100,000 times — say a batch classification or summarization job. Multiply the per-call totals above by 100,000:

gpt-5.4: $0.0175 × 100,000 ≈ $1,750. Claude Sonnet 4.6: $0.018 × 100,000 ≈ $1,800. Gemini 2.5 Flash: $0.0028 × 100,000 ≈ $280.

At this scale, discounts matter. If the job is not latency-sensitive, Anthropic's Batch API gives 50% off both input and output, halving the Sonnet figure to roughly $900. Prompt caching helps when a large, identical prefix (system prompt, instructions, reference doc) repeats across calls. See the methods below.
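To make the scaling arithmetic concrete, here is a minimal sketch that multiplies the per-call cost by the number of calls and applies an optional flat discount. The `job_cost` name and its `discount` parameter are assumptions for illustration; the 50% figure is the batch discount described above, applied here to the Claude Sonnet 4.6 prices.

```python
def job_cost(input_tokens, output_tokens, input_price, output_price,
             calls, discount=0.0):
    """Total job cost in dollars; `discount` is a fraction (0.5 means 50% off)."""
    per_call = (input_tokens / 1_000_000) * input_price \
             + (output_tokens / 1_000_000) * output_price
    return per_call * calls * (1.0 - discount)


# 100,000 calls of 1,000 tokens in / 1,000 tokens out at $3.00 / $15.00 per 1M.
standard = job_cost(1_000, 1_000, 3.00, 15.00, calls=100_000)
batched = job_cost(1_000, 1_000, 3.00, 15.00, calls=100_000, discount=0.5)

print(f"standard: ${standard:,.2f}")
print(f"batch:    ${batched:,.2f}")
```

The two figures should land at roughly the $1,800 and $900 quoted in the text.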


How do caching and batch discounts change the math?

Two mechanisms can sharply reduce cost when your workload fits them.

Prompt caching (Anthropic): when many calls share the same large prefix, you cache it once and pay a reduced rate on the cached portion of later calls. Per Anthropic's pricing, a cache read (a hit) costs 0.1x the base input price — that is, 10% of the normal input rate for the cached tokens. Writing to the cache costs more than base input (1.25x for a 5-minute cache, 2x for a 1-hour cache), so caching pays off when the same prefix is reused enough times to amortize that write. Example: Claude Opus 4.8 input is $5/M, and its cache read is $0.50/M — a 90% saving on the repeated portion.

Batch API (Anthropic): 50% off both input and output for asynchronous, non-time-sensitive jobs. It applies directly to large offline workloads like the 100,000-call example above.

Match the discount to the shape of the work: caching for a big repeated prefix with varying tail, batch for large jobs you can wait on. Confirm current rates and mechanics on the linked pricing pages, since these terms change.
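The caching trade-off (a pricier write, then cheap reads) can be checked with a short sketch. It makes a simplifying assumption that is not guaranteed in practice: the first call writes the prefix to the cache and every later call is a cache hit within the cache lifetime. The function names are hypothetical; the multipliers are the 1.25x / 2x write and 0.1x read figures quoted above.

```python
def prefix_cost(prefix_tokens, calls, input_price_per_m,
                write_multiplier=1.25, read_multiplier=0.1):
    """Return (uncached, cached) dollar cost of the repeated prefix over `calls` calls.

    Assumes one cache write on the first call and a cache hit on every later call.
    """
    base = (prefix_tokens / 1_000_000) * input_price_per_m
    uncached = base * calls
    cached = base * (write_multiplier + (calls - 1) * read_multiplier)
    return uncached, cached


def break_even_calls(write_multiplier, read_multiplier=0.1):
    """Smallest number of calls at which caching is cheaper than not caching."""
    calls = 1
    while write_multiplier + (calls - 1) * read_multiplier >= calls:
        calls += 1
    return calls


# A 10,000-token prefix reused across 100 calls at $5.00 per 1M input tokens.
uncached, cached = prefix_cost(10_000, 100, 5.00)
print(f"uncached: ${uncached:.4f}, cached: ${cached:.4f}")
print(break_even_calls(1.25))  # 2 (5-minute cache)
print(break_even_calls(2.0))   # 3 (1-hour cache)
```

Under these assumptions the write premium is recovered after very few reuses; a workload whose calls arrive too far apart to hit the cache would not see that saving.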


How do I estimate without an exact tokenizer?

For planning, character or word counts get you close. Take your prompt's character count and divide by 4, or its word count and divide by 0.75, to estimate input tokens. Estimate output tokens from the length of answer you expect.

Then plug both into the formula and the price for your chosen model. Treat the result as an order-of-magnitude estimate, not a billing guarantee — the exact figure depends on the model's tokenizer and your actual output length. To draft tighter prompts (fewer tokens, clearer instructions) for whichever model you choose, our ChatGPT prompt generator and code prompt builder help keep inputs lean.
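Because the character rule and the word rule rarely agree exactly, one option is to compute both and treat them as a low/high range. The sketch below does that for a raw prompt string; `estimate_token_range` is a hypothetical helper, and splitting on whitespace is only an approximation of a word count.

```python
def estimate_token_range(text: str) -> tuple[int, int]:
    """Return a (low, high) token estimate from the two rules of thumb."""
    by_chars = len(text) / 4             # about 4 characters per token
    by_words = len(text.split()) / 0.75  # about 0.75 words per token
    low, high = sorted((by_chars, by_words))
    return round(low), round(high)


# A stand-in prompt: 300 repetitions of a four-letter word plus a space.
prompt = "word " * 300
low, high = estimate_token_range(prompt)
print(f"estimated input tokens: {low} to {high}")
```

Budgeting against the high end of the range leaves some margin for the tokenizer differences mentioned above.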

Frequently Asked Questions

How many tokens is 1,000 words?

Roughly 1,333 tokens, using the rule of thumb that 1 token is about 0.75 words (or about 4 characters) in English. This is an approximation; code, punctuation, and non-English text change the ratio.

Why is output more expensive than input?

Generating tokens is more computationally costly than reading them, so providers price output higher — often several times the input rate. On gpt-5.4, for example, output ($15/1M) is six times input ($2.50/1M), per OpenAI's pricing.

What's the cheapest way to run a large batch job?

For non-time-sensitive jobs, Anthropic's Batch API gives 50% off both input and output. Combine that with a low-cost model tier where quality allows. Confirm current terms at Anthropic's pricing.

How much does prompt caching save?

On Anthropic, a cache hit (read) costs 0.1x the base input price — 90% off the cached portion — though writing to the cache costs more than base input, so it pays off when a large prefix is reused enough times. See Anthropic's pricing detail.

Can I trust character-count estimates for budgeting?

As an order-of-magnitude estimate, yes — divide characters by 4 (or words by 0.75) for input tokens. For exact billing you need the model's tokenizer and your real output length, so treat the formula's result as a plan, not a guarantee.

Which current model is cheapest for simple tasks?

Among the tiers in the table, Gemini 2.5 Flash-Lite ($0.10 in / $0.40 out per 1M) and gpt-5.4-nano ($0.20 / $1.25) are the lowest-cost options as of June 2026. Match the cheapest tier that still meets your quality bar.
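Which tier is cheapest depends on the input/output mix, so it can help to rank the table for a specific workload. The following is a sketch that copies the June 2026 table into a dictionary (the `TABLE` and `rank_models` names are illustrative) and sorts the models by estimated cost per call.

```python
# (input, output) prices in dollars per 1M tokens, copied from the table above.
TABLE = {
    "OpenAI gpt-5.5": (5.00, 30.00),
    "OpenAI gpt-5.5-pro": (30.00, 180.00),
    "OpenAI gpt-5.4": (2.50, 15.00),
    "OpenAI gpt-5.4-mini": (0.75, 4.50),
    "OpenAI gpt-5.4-nano": (0.20, 1.25),
    "Anthropic Claude Opus 4.8": (5.00, 25.00),
    "Anthropic Claude Sonnet 4.6": (3.00, 15.00),
    "Anthropic Claude Haiku 4.5": (1.00, 5.00),
    "Anthropic Claude Fable 5": (10.00, 50.00),
    "Google Gemini 3.5 Flash": (1.50, 9.00),
    "Google Gemini 3.1 Pro (Preview, ≤200k)": (2.00, 12.00),
    "Google Gemini 2.5 Pro": (1.25, 10.00),
    "Google Gemini 2.5 Flash": (0.30, 2.50),
    "Google Gemini 2.5 Flash-Lite": (0.10, 0.40),
}


def rank_models(input_tokens, output_tokens, table=TABLE):
    """Return (model, cost_per_call) pairs sorted from cheapest to most expensive."""
    costs = [
        (name, (input_tokens / 1_000_000) * inp + (output_tokens / 1_000_000) * out)
        for name, (inp, out) in table.items()
    ]
    return sorted(costs, key=lambda pair: pair[1])


# Show the three cheapest tiers for a 1,000-in / 1,000-out call.
for name, cost in rank_models(1_000, 1_000)[:3]:
    print(f"{name}: ${cost:.5f} per call")
```

Cost ranking says nothing about quality, so the cheapest entry is only a starting point for testing against your own quality bar.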

Do all providers charge separately for input and output?

Yes — OpenAI, Anthropic, and Google all quote distinct input and output prices per million tokens. The formula in this article applies to all three; only the per-token numbers differ. See each provider's linked pricing page for current figures.
