Skip to contentNew: Does ChatGPT recommend your brand? Free 60-second AI visibility check →
By The DDH Team · Digital Dashboard Hub

Anthropic Claude Pricing 2026: Opus, Sonnet, Haiku, Fable Cost Breakdown

By The DDH Team at Digital Dashboard HubUpdated

Stop writing AI prompts from scratch.

Tell us your business + your task + your model. We write the prompt — perfectly tuned for ChatGPT, Claude, Grok, Gemini, Midjourney, or any model. Plus 500+ pre-built prompts in your library.

14 days, no card. Cancel in 2 clicks.

Anthropic charges per token across four Claude tiers in 2026: Opus 4.8 at $5.00 input / $25.00 output per 1M tokens, Sonnet 4.6 at $3.00 / $15.00, Haiku 4.5 at $1.00 / $5.00, and the new Fable 5 reasoning model at $10.00 / $50.00. Output is priced 5x input across every tier, mirroring the rest of the industry.

Two cost levers are unique to Claude and worth knowing cold. Prompt caching reads cached prefixes at 0.1x the base input rate (a 90% saving on the cached portion), and the Batch API knocks 50% off both input and output for jobs that can wait. Below is the full table, the caching formulas, and worked $ math for 1k, 100k, and 1M-call workloads. Confirm rates on Anthropic's pricing page before budgeting. To draft prompts that survive a cheaper tier, try our ChatGPT prompt generator, or grab the free 2026 LLM pricing cheat sheet PDF.

Digital Dashboard Hub

Writing good prompts for ONE AI is hard. Writing them for GPT-5, Claude, Gemini, Perplexity, Midjourney and 6 more is a full-time job. DDH's AI Prompt Builder writes once, runs everywhere — locked to your niche, voice, and brand tone.

Free 14 days, no card.

Claude API price per 1M tokens — June 2026

Feature
Input ($/1M)
Cache write 5m ($/1M)
Cache write 1h ($/1M)
Cache read ($/1M)
Output ($/1M)
Claude Fable 5$10.00$12.50$20.00$1.00$50.00
Claude Opus 4.8$5.00$6.25$10.00$0.50$25.00
Claude Sonnet 4.6$3.00$3.75$6.00$0.30$15.00
Claude Haiku 4.5$1.00$1.25$2.00$0.10$5.00

Sources, as of June 2026: Anthropic pricing (https://claude.com/pricing) and Anthropic API pricing detail (https://platform.claude.com/docs/en/about-claude/pricing). Cache write costs 1.25x base input for a 5-minute TTL and 2x base input for a 1-hour TTL; cache read (a hit) costs 0.1x base input. Batch API applies an additional 50% discount on top of any rate above. Prices change frequently — confirm on the live pricing page.

The Claude pricing model in 90 seconds

Three lines on every Claude invoice: base input, base output, and prompt-cache activity (split into cache writes and cache reads). Batch API requests apply a flat 50% discount on top of whichever line they hit.

Base formula:

``` cost = (input_tokens / 1,000,000) * input_price_per_M + (output_tokens / 1,000,000) * output_price_per_M ```

With caching, the input line splits. Some of your input tokens are cache reads (hits), billed at 0.1x base input. Some are cache writes — the first request to populate a new prefix — billed at 1.25x base input for the default 5-minute TTL or 2x for the 1-hour TTL. The rest bill at base input.

Caching pays off when the cached portion is large enough and repeats enough times to amortize the write cost. A useful rule: if you expect 10+ reads of the same prefix within the cache window, caching is almost certainly net-positive. Below 3 reads, it usually is not. See Anthropic's caching docs for the exact eligibility rules.


Worked example 1: a 1,000 in / 500 out call at every tier

Take the standard reference call — 1,000 input tokens, 500 output tokens — and compute the per-call cost at standard rates on each Claude model:

Claude Fable 5: (0.001 × $10) + (0.0005 × $50) = $0.010 + $0.025 = $0.035 per call. Claude Opus 4.8: (0.001 × $5) + (0.0005 × $25) = $0.005 + $0.0125 = $0.0175 per call. Claude Sonnet 4.6: $0.003 + $0.0075 = $0.0105 per call. Claude Haiku 4.5: $0.001 + $0.0025 = $0.0035 per call.

Haiku 4.5 is 10x cheaper than Fable 5 on the same call and roughly 5x cheaper than Opus 4.8. For most high-volume structured-output tasks — classification, extraction, summarization, simple Q&A — Haiku 4.5 is the right starting point. Move up to Sonnet 4.6 when accuracy starts limiting quality, not before.

If you want to draft prompts tight enough that Haiku matches Sonnet quality, our code prompt builder and meta-description generator help compress instructions without losing signal.


Worked example 2: scaling to 100,000 and 1,000,000 calls

Multiply the per-call numbers by 100,000 (typical batch job) and 1,000,000 (full production workload):

100k calls — Fable 5: $3,500. Opus 4.8: $1,750. Sonnet 4.6: $1,050. Haiku 4.5: $350.

1M calls — Fable 5: $35,000. Opus 4.8: $17,500. Sonnet 4.6: $10,500. Haiku 4.5: $3,500.

Apply the Batch API discount (-50%) to the Sonnet 4.6 row at 1M calls: $10,500 becomes $5,250. Apply prompt caching where 800 of every 1,000 input tokens are a stable system prefix that hits cache 90% of the time and you write it once per million calls. Of the 1B input tokens, 720M are cache reads at $0.30/1M = $216, 80M are cache writes at $3.75/1M = $300, and 200M are uncached base input at $3/1M = $600. Total input drops from $3,000 to $1,116 — a 63% saving on input alone, or about 18% off the full $10,500 bill. Stack with Batch and the same workload runs roughly $4,200.

Hit both discounts when you can. The math compounds quickly on workloads with stable system prompts.


When to pick Opus, Sonnet, Haiku, or Fable

Claude Opus 4.8 ($5/$25) is built for hard problems — multi-step reasoning over long context, complex code synthesis, agent loops that need to plan more than one step ahead. The 5x premium over Sonnet 4.6 is worth it when a single wrong answer costs more than the price difference across the whole workload. Most teams use Opus selectively, not as a default.

Claude Sonnet 4.6 ($3/$15) is the workhorse for production chat, content generation, long-form writing, and most agent loops. Sonnet matches or beats late-2024 Opus quality at a third the cost, which is why many teams that defaulted to Opus in 2024 moved their bulk traffic to Sonnet by 2026.

Claude Haiku 4.5 ($1/$5) handles structured-output tasks that do not require deep reasoning — classification, extraction, sentiment analysis, simple Q&A. At $0.0035 per 1,000/500 call, it is the highest-volume tier in most production deployments. Use it as the first attempt; promote to Sonnet only when accuracy demands it.

Claude Fable 5 ($10/$50) is the new reasoning-heavy model introduced in early 2026. It hides chain-of-thought tokens behind the output rate the way OpenAI's o-series does, so expect 3-5x the visible output token bill on hard problems. Use only when the task actively benefits from extended reasoning — agentic planning, math-heavy verification, complex code refactors. For straight generation, Sonnet 4.6 is cheaper and good enough.


Prompt caching: the lever most teams underuse

Anthropic's prompt cache lets you mark portions of a request as cacheable; subsequent requests within the cache TTL that share the same prefix get those tokens billed at 0.1x base input. The price of a cache write is 1.25x base input (5-minute TTL) or 2x base input (1-hour TTL).

Two prompt shapes win the most from caching. First, a long fixed system message — instructions, style guide, examples, taxonomy — repeated across thousands of user turns. Second, a stable reference document — a contract, a product spec, a knowledge base chunk — that you query repeatedly. Move the stable text to the front of the prompt, mark it as cache-eligible, and the cache will do the rest.

Break-even math: on Sonnet 4.6, a 10,000-token system prompt costs $0.03 to read uncached, $0.0375 to write to a 5-minute cache, and $0.003 to read from cache. If that prefix is reused 3 times within 5 minutes, you save (3 × $0.03) - ($0.0375 + 3 × $0.003) = $0.0375 — already net positive after 3 reads. At 100 reads per cache lifetime, you save $2.96 per write cycle.

Caching does not help if your prefix is unique each call, if the variable portion sits at the front of the prompt, or if you call the same prefix less than 2-3 times per cache window. Audit your prompt shapes before turning it on. See Anthropic's prompt caching documentation for the exact placement rules.


Batch API: 50% off, 24-hour delivery

The Anthropic Batch API accepts a JSONL file of requests and returns results within 24 hours at half the standard input and output rates. The discount applies on top of any caching activity, so the two stack cleanly.

Canonical fits: nightly summarization of yesterday's tickets, weekly classification of inbound leads, monthly enrichment of CRM contacts, one-off enrichment passes over historical data, periodic content audits, large eval runs against the model lineup. Anything that does not have to return within seconds is a candidate.

Worked math: a 1M-call Sonnet 4.6 summarization job at the standard rate costs $10,500. Submitted through Batch, the same job costs $5,250 — a $5,250 cost reduction for accepting a 24-hour SLA. If the work is already running on a nightly cron, the discount is free money.

Anti-fits: live chat, voice agents, anything in a checkout funnel, anything where a human is waiting on the response in real time. The 24-hour window kills the user experience there. Confirm current Batch terms against Anthropic's batch documentation.


How Claude pricing compares to OpenAI and Gemini

Sonnet 4.6 ($3/$15) sits below gpt-5.5 ($5/$30) on both input and output, making it the cheaper choice for general chat workloads of equivalent quality. Opus 4.8 ($5/$25) lines up with gpt-5.5 on input but is cheaper on output, which matters because output dominates most bills.

Haiku 4.5 ($1/$5) is more expensive than gpt-5.4-mini ($0.75/$4.50) and substantially more expensive than Gemini 2.5 Flash ($0.30/$2.50). For high-volume cheap-tier workloads, Gemini 2.5 Flash is the price leader; Haiku 4.5 wins on quality-per-dollar in many real evals. The right choice depends on which dimension matters more to your workload — run a side-by-side eval before committing.

Fable 5 ($10/$50) overlaps the OpenAI o4-reasoning tier ($15/$60) on the high end of the reasoning market — modestly cheaper, with longer effective context and stronger long-document recall in published evals. See our full side-by-side at the GPT vs Claude vs Gemini cost calculator and on individual provider pages for OpenAI and the upcoming Gemini pricing page.


Tool use, vision, and the things people forget to budget

Tool calls bill as output tokens — the function name, the arguments, and the tool result you replay back in the next turn. An agent loop with 6 tool calls before the final answer can bill 8-10x the output of a single direct-answer turn. If your agent runs 1,000 loops per day on Sonnet 4.6 with 6 tool calls each averaging 200 tokens, that is 1.2M extra output tokens per day, or about $18 per day on top of base traffic.

Vision inputs bill at the standard input rate, with images converted to tokens by resolution. A 1024×1024 image bills as roughly 1,600 input tokens on Claude — about $0.005 on Sonnet 4.6, $0.008 on Opus 4.8. PDFs are billed per page as both text and visual tokens, so a 10-page contract can run 8,000-15,000 input tokens depending on density.

Extended context (above 200k tokens) carries a small per-token surcharge on some tiers; check the live pricing page before designing a million-token workflow. For agent loop economics in detail, see our AI agent cost calculator.


Claude on AWS Bedrock vs Google Vertex AI vs the direct Anthropic API

Claude runs on three first-party surfaces in 2026: Anthropic's direct API at claude.com, AWS Bedrock, and Google Cloud Vertex AI. The per-token list rates are essentially identical across all three — Sonnet 4.6 is $3 input / $15 output on each platform, Opus 4.8 is $5 / $25, Haiku 4.5 is $1 / $5, Fable 5 is $10 / $50. Where they diverge is everything around the meter: which credits you can spend, how fast new models arrive, which regions serve traffic, how authentication works, and which discount levers actually function.

Billing is the most consequential difference for most finance teams. Bedrock usage flows through your AWS invoice — eligible for AWS Activate startup credits (up to $100k), Enterprise Discount Program (EDP) commitments, and the AWS Marketplace private offer mechanism. Vertex AI usage flows through your GCP invoice — eligible for the Google for Startups Cloud Program ($200k-$350k tiers), Committed Use Discounts (CUDs), and BigQuery-adjacent credits. The direct Anthropic API bills through Anthropic directly — eligible for the Anthropic Startup Program (up to $100k in Claude credits via Y Combinator, Techstars, and similar partner programs) but not portable to AWS or GCP invoices. A startup sitting on $80k of unused AWS credits that expire in 6 months has a clear answer: route Claude through Bedrock and burn the credits before they vaporize.

Worked example. Take a Series A startup spending $25,000/month on Claude Sonnet 4.6 for a production agent workload — about 1.4B input tokens and 600M output tokens monthly at standard rates. On the direct API, that is $25,000 of cash out the door. On Bedrock with $80,000 of AWS Activate credits, the same $25,000 invoice draws down credits at 100% face value — net cash cost $0 until the credits run out at month 3.2, an effective ~30% saving over a 12-month horizon if the remaining 8.8 months bill at list. On Vertex with a similar GCP credit balance, the math is identical. The lesson: route Claude to wherever your dormant cloud credits live. Run `aws ce get-cost-and-usage` or the GCP billing console to see what is actually expiring.

Model availability lags vary. New Claude models almost always land on the direct API first. Bedrock typically follows 2-6 weeks later, sometimes longer for the largest tiers — Opus 4.8 hit the direct API in February 2026 and only landed in Bedrock us-east-1 in late March. Vertex AI tracks Bedrock's cadence within a week or two on either side. If your product roadmap depends on day-zero access to a new Claude release, the direct API is the only safe bet; Bedrock and Vertex are appropriate for production workloads that can absorb a one-month delay on the latest model. Regional availability also differs — Bedrock now serves Claude from us-east-1, us-west-2, eu-central-1, eu-west-3, ap-northeast-1, and ap-southeast-2; Vertex covers us-central1, us-east5, europe-west4, and asia-northeast1; the direct API serves globally from Anthropic's own edge with no region selection.

Prompt caching and Batch API support are not at parity. The direct Anthropic API has the most mature caching implementation — both 5-minute and 1-hour TTLs, full support across all four tiers, and the cleanest pricing semantics (1.25x write, 0.1x read). Bedrock supports prompt caching as of Q1 2026 but with restrictions: 5-minute TTL only on most regions, no 1-hour TTL on Haiku 4.5 until Q3 2026, and a minimum cacheable prefix size of 1,024 tokens versus 512 on the direct API. Vertex AI supports caching with similar caveats. The Batch API exists on all three, but only the direct API offers the full 50% discount on every tier — Bedrock applies the discount through its own Bedrock Batch Inference jobs (similar mechanics, occasionally smaller discount on Fable 5), and Vertex uses its Batch Prediction surface. If your workload depends heavily on caching a 600-token system prompt or stacking caching + batch for compounded discounts, the direct API still wins on raw economics by 8-15%.

Access control is the last axis. Bedrock plugs into AWS IAM — you can scope a service account to a specific model ARN, attach SCPs at the AWS Organization level, and audit every invoke through CloudTrail. Vertex plugs into GCP IAM equivalently with Cloud Audit Logs. The direct Anthropic API uses workspace-scoped API keys with per-key spend limits and usage dashboards, but lacks the policy-engine depth that enterprise security teams expect — no SCP-equivalent, no ABAC, no native SSO-bound key rotation on the standard tier. For regulated workloads (HIPAA on AWS, FedRAMP-adjacent on GCP, SOC 2 audit trails) the cloud-provider surfaces typically win on compliance posture even when they lose on raw price. The pragmatic pattern that has emerged at most scaled teams: production traffic runs through Bedrock or Vertex for billing and compliance reasons, while development, evaluation, and prompt iteration run through the direct API for speed and feature freshness.


Five moves to cut your Claude bill this week

Drop one tier. If you are on Opus 4.8, run an eval against Sonnet 4.6 on 100 representative samples. Many teams discover Sonnet matches quality on 80%+ of their workload at a third the cost.

Cache your system prompt. Move all stable instructions to the front of every request and mark them as cache-eligible. For repeated workloads, this alone saves 60-80% on input billing.

Batch the offline work. Anything running on a cron, anything enriching a static dataset, anything not user-facing — push it through the Batch API for 50% off.

Cap output. Set max_tokens hard, ask for structured JSON instead of prose, and use stop sequences. A 200-token JSON response replaces a 1,000-token paragraph on most extraction tasks — a 5x output reduction.

Audit your most expensive route. Most teams have one route consuming 50-70% of total spend; the audit usually surfaces an obvious model-tier downgrade or a prompt restructure that drops the bill 30-50%.

Frequently Asked Questions

Which Claude model is cheapest in 2026?

Claude Haiku 4.5 at $1 input / $5 output per 1M tokens is the cheapest tier in the lineup. It is roughly 5x cheaper than Opus 4.8 and 3x cheaper than Sonnet 4.6 on output. Confirm against Anthropic's pricing page.

How much does prompt caching save on Claude?

Cache reads (hits) bill at 0.1x base input — a 90% saving on the cached portion. Cache writes cost 1.25x base input for a 5-minute TTL or 2x for a 1-hour TTL, so caching is net-positive when a prefix is reused at least 2-3 times within the cache window.

Does the Batch API stack with prompt caching?

Yes — the 50% Batch discount applies on top of cache read and cache write rates. A Sonnet 4.6 cache read through Batch costs $0.15/1M instead of $0.30/1M. Confirm current behavior on Anthropic's batch documentation.

Is Claude cheaper than OpenAI in 2026?

Sonnet 4.6 ($3/$15) is cheaper than gpt-5.5 ($5/$30) on both input and output. Opus 4.8 ($5/$25) matches gpt-5.5 on input but is cheaper on output. Haiku 4.5 ($1/$5) is slightly more expensive than gpt-5.4-mini ($0.75/$4.50). See the full comparison at our GPT vs Claude vs Gemini calculator.

Why is Claude output 5x more expensive than input?

Generating tokens requires a full forward pass per token while input tokens are processed in a single batched pass. Anthropic prices output at 5x input across every Claude tier, slightly tighter than the 6x ratio common on the OpenAI lineup.

What is Claude Fable 5 for?

Fable 5 ($10/$50) is the reasoning-heavy tier introduced in early 2026. It generates hidden chain-of-thought tokens billed at the output rate, similar to OpenAI's o-series. Use it for hard reasoning tasks (planning, math, complex code) where chain-of-thought materially improves accuracy; Sonnet 4.6 is cheaper for direct generation.

How much do vision and PDF inputs cost?

Image inputs bill at the standard input rate, with a 1024×1024 image converted to roughly 1,600 tokens — about $0.005 on Sonnet 4.6. PDFs bill per page as both text and visual tokens, typically 800-1,500 tokens per page depending on density.

How do I estimate Claude cost before sending a request?

Use cost = (input_tokens / 1M × input_price) + (output_tokens / 1M × output_price). Estimate token count as characters ÷ 4 or words ÷ 0.75. For a worked walk-through with current Claude prices, see our AI prompt cost calculator.

Is Claude cheaper on AWS Bedrock or the direct Anthropic API?

Per-token list rates are identical — Sonnet 4.6 is $3 input / $15 output on both. The practical difference is which credits you can apply. If you have unused AWS Activate credits or an EDP commitment, Bedrock is effectively cheaper because the spend draws down credit balances at face value. If you have Anthropic Startup Program credits or no cloud-provider credits at all, the direct API wins on caching depth (1-hour TTL, 512-token minimum prefix) and day-zero model access. New Claude releases typically reach Bedrock 2-6 weeks after the direct API.

Does Claude on Vertex AI support prompt caching and the Batch API?

Yes — both are available on Vertex AI in 2026, but with caveats relative to the direct Anthropic API. Vertex caching is 5-minute TTL only on most regions with a 1,024-token minimum prefix versus 512 on the direct API. Batch runs through Vertex Batch Prediction with similar 50% discount mechanics. The two stack cleanly. For maximum discount stacking — caching + batch on every tier — the direct API still has a 8-15% raw-price edge, though it is often outweighed by GCP credit availability for teams already on Google Cloud.

Which Claude surface do I use for HIPAA or SOC 2 workloads?

AWS Bedrock and Google Vertex AI both inherit their parent cloud's compliance posture — HIPAA-eligible on Bedrock with a signed AWS BAA, HIPAA-eligible on Vertex with a signed GCP BAA, with CloudTrail and Cloud Audit Logs providing the request-level audit trails most auditors expect. The direct Anthropic API offers a HIPAA BAA on the Enterprise tier but with a thinner policy-engine surface (no SCP equivalent, no ABAC). For regulated production traffic, most scaled teams route through Bedrock or Vertex; for development and evaluation, the direct API is fine.

Get the 2026 LLM pricing cheat sheet

One-page PDF with every Claude tier, the cache + batch math, and the formulas — free, no signup gate. Or browse our 40+ prompt-engineering tools to draft cheaper, leaner prompts.

Browse all prompt tools →