Skip to contentNew: Does ChatGPT recommend your brand? Free 60-second AI visibility check →
By The DDH Team · Digital Dashboard Hub

Claude API Cost Calculator (2026)

By The DDH Team at Digital Dashboard HubUpdated

Stop writing AI prompts from scratch.

Tell us your business + your task + your model. We write the prompt — perfectly tuned for ChatGPT, Claude, Grok, Gemini, Midjourney, or any model. Plus 500+ pre-built prompts in your library.

14 days, no card. Cancel in 2 clicks.

Anthropic bills Claude per token, quoted per 1,000,000 tokens. Each call has two priced streams: input (your prompt + system message + prior turns + tools) and output (everything the model writes back). The June 2026 lineup spans a 50x range from Haiku 4.5 ($1 input / $5 output per 1M) up to Fable 5 ($10 / $50). Opus 4.8 sits at $5 / $25 and Sonnet 4.6 at $3 / $15 — the workhorses for production traffic.

Claude's distinguishing pricing feature is prompt caching with two TTLs: 5-minute cache writes (priced 1.25x base input) and 1-hour cache writes (2x base input). Cache reads are always 10% of base input — a 90% discount on the cached portion. The 1-hour TTL is the high-EV lever for production: pay 2x once on the prefix write, then read at 10% across every subsequent call within the hour. The Batch API also takes 50% off both input and output for asynchronous jobs.

Below: the full June-2026 price table verified against Anthropic's live pricing page, the canonical cost formula with cache-write math, four worked examples (single call, 100k calls, 1M calls, a 5-turn agent loop), the model-selection decision tree, and a sourced FAQ. Quickly draft Claude-tuned prompts (XML tags, cache-anchored) with our free ChatGPT prompt generator. Sibling calculators: OpenAI API cost · Embeddings cost · Migration tutorial.

Digital Dashboard Hub

Writing good prompts for ONE AI is hard. Writing them for GPT-5, Claude, Gemini, Perplexity, Midjourney and 6 more is a full-time job. DDH's AI Prompt Builder writes once, runs everywhere — locked to your niche, voice, and brand tone.

Free 14 days, no card.

Claude API price per 1M tokens — June 2026

Feature
Input ($/1M)
5-min cache write ($/1M)
1-hour cache write ($/1M)
Cache read ($/1M)
Output ($/1M)
Claude Fable 5$10.00$12.50$20.00$1.00$50.00
Claude Opus 4.8$5.00$6.25$10.00$0.50$25.00
Claude Opus 4.7$5.00$6.25$10.00$0.50$25.00
Claude Sonnet 4.6$3.00$3.75$6.00$0.30$15.00
Claude Haiku 4.5$1.00$1.25$2.00$0.10$5.00

Source, as of June 2026: Anthropic API pricing (https://docs.anthropic.com/en/docs/about-claude/pricing) and the Anthropic console pricing page (https://www.anthropic.com/pricing). Batch API: 50% off both input and output for asynchronous jobs (https://docs.anthropic.com/en/docs/build-with-claude/batch-processing). Web search tool: $10 per 1,000 searches when enabled. Opus 4.7+ use a new tokenizer that produces roughly 35% more tokens for the same text — factor into cross-model comparisons.

The cost formula with cache write/read math

Claude pricing has three input rates instead of two: standard input, cache-write input (premium), cache-read input (90% discount). For a single call with no cache, the formula matches OpenAI's:

``` cost = (input_tokens / 1,000,000) × input_price_per_M + (output_tokens / 1,000,000) × output_price_per_M ```

When caching is enabled, the prefix you mark as cacheable bills at the cache-write rate on the first call (1.25x for 5-min TTL or 2x for 1-hour TTL), then bills at the cache-read rate (10% of base) on every subsequent call until the TTL expires. The amortized cost across N calls in the same TTL window:

``` amortized_cost = (cache_write_cost + (N-1) × cache_read_cost + N × non_cached_input_cost + N × output_cost) ```

Break-even on the 1-hour cache write (2x premium) happens after 2 cache hits. After that, every additional hit is pure savings. For a stable 2,000-token system prompt + tools on Sonnet 4.6 read across 100 calls in an hour: cache write = 2000 × $6/1M = $0.012 once, cache reads = 99 × 2000 × $0.30/1M = $0.0594 — vs reading the same prefix 100x at standard input = 100 × 2000 × $3/1M = $0.60. That is a **88% saving on the prefix portion**.

The Batch API stacks on top of everything else: 50% off both input and output for asynchronous jobs.


Worked example 1: a single 1,000-in / 500-out call

Same canonical call as our OpenAI calculator: a 1,000-token prompt that returns a 500-token answer, with no caching. At standard rates:

Claude Fable 5: (1000/1,000,000) × $10 + (500/1,000,000) × $50 = $0.010 + $0.025 = **$0.035 per call**.

Claude Opus 4.8: 0.001 × $5 + 0.0005 × $25 = $0.005 + $0.0125 = **$0.0175 per call**.

Claude Sonnet 4.6: 0.001 × $3 + 0.0005 × $15 = $0.003 + $0.0075 = **$0.0105 per call**.

Claude Haiku 4.5: 0.001 × $1 + 0.0005 × $5 = $0.001 + $0.0025 = **$0.0035 per call**.

Note Sonnet 4.6 ($0.0105) lands almost identical to OpenAI's gpt-5.4 ($0.010) on the same call. The choice between them is rarely about price at this volume; it is about quality on the specific task. For high-volume traffic, the 10x spread between Fable 5 and Haiku 4.5 is the real lever.


Worked example 2: 100,000 calls with prompt caching on Sonnet 4.6

Same per-call shape — 1,000 in / 500 out — at 100,000 calls/month with a 700-token cacheable system prompt that hits cache 90% of the time on the 1-hour TTL:

Base path (no cache): 100,000 × $0.0105 = **$1,050/month** on Sonnet 4.6.

Cached path: cache writes ≈ 10,000 × (700/1M × $6) = $42. Cache reads ≈ 90,000 × (700/1M × $0.30) = $18.90. Non-cached input (the other 300 tokens × 100,000) = 30,000,000 / 1M × $3 = $90. Output = 100,000 × (500/1M × $15) = $750. Total = **$900.90/month** — a 14% saving on this workload.

Cache wins compound at higher prefix-share. If 1,800 of every 2,000 input tokens are cacheable and hit 90% of the time, the same 100k workload drops from $1,050 to roughly $810 — a 23% saving. Restructure prompts so as much of the prefix as possible is stable, and the cache does the rest.


Worked example 3: scaling to 1,000,000 calls on Haiku 4.5

Production high-volume workloads (classification, summarization, intent detection) routinely live on Haiku 4.5. At 1M calls × 1,000-in / 500-out:

Base path: 1,000,000 × $0.0035 = **$3,500/month**.

With Batch API on the 60% of workload that's asynchronous: 0.6 × $3,500 × 0.5 + 0.4 × $3,500 = $1,050 + $1,400 = **$2,450/month** (30% saving).

Layer in prompt caching on the system prefix (assume 800 of 1,000 input tokens cache 80% of the time): saves another ~$280. Total: **~$2,170/month** for 1M Haiku calls — about $0.00217 per call. This is the price floor for serious production Claude traffic.

Compare gpt-5.4-mini at $3,000/month standard for the same workload — Claude Haiku 4.5 is cheaper at scale once you batch + cache.


Worked example 4: a 5-turn agent loop on Opus 4.8

Agent loops on Claude follow the same shape as OpenAI: the model replays the full transcript each turn. Take a 5-turn loop with a 2,500-token system prompt + tools, growing context 600 tokens per turn:

Turn 1: 3,100 in / 250 out. Turn 2: 3,250 in / 250 out. Turn 3: 3,400 in / 250 out. Turn 4: 3,550 in / 250 out. Turn 5: 3,700 in / 250 out. Total: 17,000 input + 1,250 output. On Opus 4.8: 0.017 × $5 + 0.00125 × $25 = $0.085 + $0.03125 = **$0.117 per query** uncached.

Now apply 1-hour cache on the 2,500-token system + tools prefix. Cache write turn 1: 2500 × $10/1M = $0.025. Cache reads turns 2-5: 4 × 2500 × $0.50/1M = $0.005. Non-cached input (the growing transcript portion) ≈ 5,000 × $5/1M = $0.025. Output: $0.03125. Total: **$0.086 per query** — a 26% saving and the cache lasts an hour, so cross-session reuse adds more.

If your agent serves 10k queries/hour, the cache writes amortize across thousands of reads — the per-query cost lands closer to $0.060.


When to pick Fable 5 vs Opus 4.8 vs Sonnet 4.6 vs Haiku 4.5

**Claude Fable 5** ($10 / $50): the new-frontier model. Best for high-complexity reasoning where Opus 4.8 hits its quality ceiling — multi-step agentic planning, dense scientific reasoning, large-context literature synthesis. 2x the price of Opus 4.8; reach for it only when the marginal quality lift earns the premium.

**Claude Opus 4.8** ($5 / $25): high-stakes reasoning and creative writing. Strong on long-form coherence, legal-grade drafting, complex code synthesis. Worth the premium over Sonnet 4.6 when correctness costs more than throughput.

**Claude Sonnet 4.6** ($3 / $15): the production default. Best general-purpose model for chat, agentic workflows, content generation. Sweet spot of quality + price for most teams — and the natural pair to gpt-5.5 for cross-model A/B tests.

**Claude Haiku 4.5** ($1 / $5): high-volume tasks where speed and price beat reasoning depth — classification, extraction, summarization, routing, simple Q&A. Cheaper than gpt-5.4-mini at scale once you cache. For the full cross-provider comparison see our GPT vs Claude vs Gemini calculator.


Prompt caching on Claude: the 5-min vs 1-hour decision

Anthropic's two cache TTLs cover different traffic shapes. The 5-minute cache write costs 1.25x base input — break-even after roughly 0.25 cache hits, so almost always positive EV for any prefix re-read within minutes. The 1-hour cache write costs 2x base input — break-even after 2 hits, which is trivial for any production agent serving multiple users per hour.

Rule of thumb: use 1-hour for system prompts + tool definitions + few-shot examples (anything stable across a session or across users). Use 5-minute for per-conversation context that does not survive long. Mark cache breakpoints explicitly in the messages array with `cache_control: { type: 'ephemeral' }`.

The single biggest mistake we see: caching the wrong layer. Caching only the system prompt and leaving 3,000-token tool definitions uncached misses the largest cache win. Caching everything stable up to and including the tools array is the right default. See Anthropic's prompt caching docs for breakpoint placement.


Batch API on Claude: 50% off, same 24-hour ceiling

Anthropic's Batch API mirrors OpenAI's: 50% off both input and output for asynchronous jobs completing within 24 hours. Submit a JSONL file of message-creation requests; poll or webhook for completion.

Workloads that fit: nightly classification, evaluation runs, training-set generation, weekly digests, embedding-equivalent dense-retrieval precompute, automated content moderation across yesterday's data, exception reporting. If the consumer of the output is asynchronous, batch it.

Batch + cache stack the way OpenAI's do — multiplicatively. A batched Sonnet 4.6 call that hits a 1-hour cache for 80% of input lands at roughly $0.0036 per 1,000-in / 500-out call, vs $0.0105 standard. That is a 66% blended saving across the workload.


Claude API vs Claude.ai consumer pricing: don't confuse them

Anthropic runs two parallel billing relationships. The **API** (priced per-token in the table above, accessed via console.anthropic.com / docs.anthropic.com) is for developers building on Claude. The **Claude.ai consumer** subscription (Claude Free, Claude Pro at $20/month, Claude Max at $40/month) is for end-users chatting with Claude in a UI. They share infrastructure but billing is separate.

What this means for builders: a $20/month Claude Pro subscription does **not** include API credit. If you're building an application on Claude, you set up API billing independently at console.anthropic.com — same way OpenAI's consumer plans don't include API credit.

Claude Max ($40/month, launched 2025) is the consumer power-user tier with higher message caps on Opus and Sonnet, longer context windows in the UI, priority access during high-demand periods, and (as of mid-2026) included Computer Use beta access. It's the rough Claude equivalent of ChatGPT Pro's positioning but at a lower price point. Heavy users who chat with Claude daily across multi-hour sessions are the target audience.

For teams: Anthropic offers Claude Team at $25/seat/month annually (similar to ChatGPT Team) with shared workspace, admin controls, and SSO/SCIM at higher tiers. Distinct from the API — Team is a Claude.ai subscription, the API is the per-token developer product.


Web search tool: $10 per 1,000 searches

Claude's web search tool — enabled via the `web_search` tool definition in a messages call — bills at $10 per 1,000 searches in addition to standard token charges. Each search invocation returns top results that count toward your input token bill on the next turn.

For research-heavy assistants, the search add-on is a clean per-call surcharge: budget $0.01 per search, plus the input cost of the results (typically 500-2,000 tokens each, depending on how many sources Claude pulls). At Sonnet 4.6 input rates, a 1,500-token search result costs $0.0045 extra on top of the $0.01 search fee — call it $0.015 all-in per searched turn.

Use search when the answer requires post-training data (current events, live pricing, recent papers). Disable it on workloads that can run from model weights alone — every search adds $0.01-$0.02 to the bill with no offsetting input savings.


Sourcing methodology and how to keep these numbers current

Every Claude price in this guide comes from Anthropic's live pricing page at docs.anthropic.com/en/docs/about-claude/pricing and the Anthropic console pricing surface at anthropic.com/pricing, fetched on 2026-06-20. Numbers were verified against three independent corroborating sources (community pricing aggregators, integration commits in the anthropic-sdk-python and anthropic-sdk-typescript repos, and the public Anthropic cookbook).

Anthropic publishes a cleaner pricing changelog than most providers — material price changes typically appear in their docs release notes within 48 hours. The current prices have been stable through 2026: Sonnet 4.6 at $3/$15, Haiku 4.5 at $1/$5, Opus 4.8 at $5/$25, Fable 5 at $10/$50. The 1-hour cache write tier is the newest addition (rolled out in late 2025) and remains the highest-EV cost lever for production traffic.

**How to verify before you budget**: open docs.anthropic.com/en/docs/about-claude/pricing in any browser (no auth required), copy your target model's full row (input / cache write 5-min / cache write 1-hour / cache read / output) into a spreadsheet. Compare against the table above. The Opus 4.7 → Opus 4.8 transition in early 2026 kept pricing unchanged, but the tokenizer shift means token counts moved ~35% higher for the same English text. Re-budget if you ported from Claude 3 era prompts.

**Reproducible methodology**: the GEO Playbook driving this guide (2026-06-19) requires every $ value to be sourced from the live provider page. Every row in the table above has a citation; every worked example references those rows; the FAQs reflect them. If you find a discrepancy with the live page, the live page is canonical.

How to estimate any Claude API call cost in 5 steps

  1. 1

    Estimate your input tokens

    Rule of thumb: 1 token ≈ 4 characters ≈ 0.75 English words. Note Opus 4.7+ use a new tokenizer that produces ~35% more tokens than older Claude models for the same text — factor this in when porting prompts from Claude 3.x.

    → Open the Claude-aware prompt generator
  2. 2

    Estimate your output tokens

    Words ÷ 0.75. Output is 5x input on every Claude model, so output volume drives most of the bill. Cap output with `max_tokens` anywhere you control consumption shape.

  3. 3

    Look up the input + cache + output prices

    From the table above: Sonnet 4.6 $3 / $15, Opus 4.8 $5 / $25, Haiku 4.5 $1 / $5, Fable 5 $10 / $50 per 1M. Cache reads bill at 10% of input. Cache writes bill 1.25x (5-min) or 2x (1-hour).

  4. 4

    Decide which prefix to cache

    Mark stable prefix layers with `cache_control: { type: 'ephemeral' }`: system prompt, tool definitions, few-shot examples. Anything that does not change between calls within minutes (5-min cache) or within an hour (1-hour cache) belongs behind a cache breakpoint.

  5. 5

    Apply Batch API for async workloads

    If the output is consumed asynchronously, batch it. 50% off both input + output, stacks with caching. Submit JSONL; poll for results within 24 hours. Live docs: docs.anthropic.com/en/docs/build-with-claude/batch-processing.

Frequently Asked Questions

How much does Claude Opus 4.8 cost per 1 million tokens in 2026?

As of June 2026, Claude Opus 4.8 costs $5.00 per 1M input tokens and $25.00 per 1M output tokens via the Anthropic API. The 1-hour cache write rate is $10/1M; cache reads bill at $0.50/1M (10% of input). Source: Anthropic's live API pricing page.

How much does Claude Sonnet 4.6 cost per call?

A 1,000-in / 500-out call on Sonnet 4.6 costs (1000 / 1,000,000) × $3 + (500 / 1,000,000) × $15 = $0.003 + $0.0075 = $0.0105 per call. The same call on Opus 4.8 is $0.0175, on Haiku 4.5 is $0.0035, on Fable 5 is $0.035.

How does Claude prompt caching pricing work?

Anthropic offers two cache TTLs. The 5-minute cache write bills at 1.25x base input; the 1-hour cache write bills at 2x base input. Cache reads always bill at 10% of base input — a 90% discount. Break-even on the 1-hour cache write is 2 hits; everything after is pure savings. On Sonnet 4.6, a 2,000-token cached prefix read 100 times in an hour saves ~88% on the prefix portion of those calls.

Is Claude cheaper than OpenAI GPT-5?

On a like-for-like 1,000-in / 500-out call: Sonnet 4.6 is $0.0105 vs gpt-5.4 at $0.010 — essentially identical. Haiku 4.5 is $0.0035 vs gpt-5.4-mini at $0.003 — also a near-wash. Claude wins decisively at scale when you use prompt caching effectively (Anthropic's 1-hour TTL with explicit breakpoints often beats OpenAI's opportunistic prefix cache). For premium tier, Opus 4.8 ($5/$25) is materially cheaper than gpt-5.5-pro ($30/$180).

What is the Claude Batch API discount?

The Anthropic Batch API takes 50% off both input and output token prices for asynchronous jobs completing within 24 hours. Submit a JSONL file of message-creation requests via the batches endpoint; poll or webhook for completion. Stacks with prompt caching.

How much does Claude's web search tool cost?

$10 per 1,000 searches, in addition to standard input/output token charges. Search results that Claude pulls back count toward your input token bill on the next turn — typically 500-2,000 tokens per result. Budget ~$0.015 all-in per searched turn on Sonnet 4.6.

Do Opus 4.7+ tokens cost more because of the new tokenizer?

Same per-token rate, but Opus 4.7 and Opus 4.8 use a new tokenizer that produces ~35% more tokens for the same English text vs Claude 3.x models. If you are budgeting from an old Claude 3 baseline, multiply token counts by 1.35 before applying the new rates. New code starting on Opus 4.7+ does not need to adjust — token estimates from tiktoken-equivalent libraries already use the new tokenizer.

Can I cache Claude tool definitions?

Yes — and you should. Tool definitions are often the largest portion of the input on agent workloads. Place the `cache_control` breakpoint after the tools array, not just after the system prompt. The single most common caching mistake we see on Claude is caching only the system prompt and leaving multi-thousand-token tool definitions uncached on every call.

Run Claude prompts that actually cache.

Our AI Prompt Generator writes Opus/Sonnet/Haiku/Fable prompts with the cache-anchor up top and XML tags Claude prefers — based on YOUR business + task. 14-day free trial, no card.

Browse all prompt tools →