Model card · Verified against Anthropic docs · 2026-06-20

Claude Sonnet 4.6: Full Spec Sheet (June 2026)

By The DDH Team at Digital Dashboard Hub·Updated June 20, 2026

Stop writing AI prompts from scratch.

Tell us your business + your task + your model. We write the prompt — perfectly tuned for ChatGPT, Claude, Grok, Gemini, Midjourney, or any model. Plus 500+ pre-built prompts in your library.

Claude Sonnet 4.6 is Anthropic's middle-tier model and the Claude 4 family's production workhorse. Released September 2025 as the successor to Sonnet 4 and Sonnet 3.7, it sits between Opus (flagship, $15/$75 per 1M) and Haiku (cost-volume tier, ~$1/$5). Sonnet has been Anthropic's most-deployed model since Claude 3 in 2024 because it captures most of Opus's discipline at roughly 20% of Opus's cost.

Headline numbers: $3 per 1M input tokens, $15 per 1M output, $0.30 per 1M for cached input reads (90% off), $3.75 per 1M for cache writes (5-min TTL). Context window is 200,000 tokens standard, with a 1,000,000-token (1M) beta available via the `context-1m-2025-08-07` beta header. Max output is 64,000 tokens. Modalities are text + vision input; text output only. Tool use, parallel tool calls, prompt caching, extended thinking, and the Batch API (50% off) are all supported.

Below: full spec table, when Sonnet is the right call vs Opus or GPT-5 mini, a side-by-side against the rest of the mid-tier menu, the minimal API request, and 8 FAQs. Sibling pages: Claude Opus 4.7 spec sheet · GPT-5 mini spec sheet · Gemini 2.5 Flash spec sheet. Write a Sonnet-tuned prompt free with our ChatGPT prompt generator (Claude mode).

Digital Dashboard Hub

Writing good prompts for ONE AI is hard. Writing them for GPT-5, Claude, Gemini, Perplexity, Midjourney and 6 more is a full-time job. DDH's AI Prompt Builder writes once, runs everywhere — locked to your niche, voice, and brand tone.

Free 14 days, no card. →

Claude Sonnet 4.6 — Full spec sheet (June 2026)

Feature	Sonnet 4.6 spec
Provider	Anthropic
Model ID (API)	claude-sonnet-4-6
Released	September 2025
Input price (per 1M)	$3.00
Cached input read (per 1M)	$0.30 (90% off)
Cache write (per 1M, 5-min TTL)	$3.75 (25% premium)
Cache write (per 1M, 1-hour TTL)	$6.00 (2× premium)
Output price (per 1M)	$15.00
Batch API discount	50% off input + output
Context window (standard)	200,000 tokens
Context window (1M beta)	1,000,000 tokens
1M beta input price (>200K input)	$6.00 (2× tier)
1M beta output price (>200K input)	$22.50 (1.5× tier)
Max output tokens	64,000 tokens
Modalities (input)	Text, image
Modalities (output)	Text
Tool use
Parallel tool use
Structured outputs (via tool schemas)
Streaming
Prompt caching
Extended thinking (reasoning)
Vision (image understanding)
Knowledge cutoff	March 2025
Endpoint	/v1/messages

Sources verified 2026-06-20: Anthropic models documentation (https://docs.anthropic.com/en/docs/about-claude/models), Anthropic pricing page (https://www.anthropic.com/pricing), Anthropic prompt caching docs (https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching). The 1M context beta requires the `anthropic-beta: context-1m-2025-08-07` header and bills at 2× input / 1.5× output tier for input >200K tokens. Re-verify the live pages before budgeting.

What Sonnet 4.6 actually is (and why it's the default)

Sonnet 4.6 is Anthropic's production-tier model: not as expensive as Opus, not as small as Haiku, with the full Claude 4 feature set. Anthropic positions Sonnet as the right choice for 'most production workloads' and the data backs that up — across customer deployments, Sonnet handles the bulk of traffic with Opus reserved for the hard subset and Haiku reserved for the cheap subset.

Same architecture as Opus 4.7 (same Claude 4 base), trained on the same data with the same RLHF discipline, with a smaller model size that trades a few percentage points of quality on hard reasoning for a 5× price reduction. The feature surface is identical: tool use, parallel tool calls, extended thinking, prompt caching, vision input, structured outputs via tool schemas, the Batch API.

Sonnet's standout feature vs every other mid-tier model: the optional 1M-token context window. Opt-in via the `anthropic-beta: context-1m-2025-08-07` header. Anthropic doubles input price and adds 50% to output for the portion of a request that exceeds 200K tokens — but the long-context option lets Sonnet compete with Gemini 2.5 Pro on document-scale workloads. No other Claude tier exposes 1M.

Pricing math: what Sonnet 4.6 actually costs per call

Standard rates: `cost = (input_tokens / 1M) × $3 + (output_tokens / 1M) × $15`. The representative 1,000-in / 500-out call: `0.001 × $3 + 0.0005 × $15 = $0.003 + $0.0075 = $0.0105`. About 1¢ per call — 5× cheaper than Opus on the same tokens, but 8× more expensive than gpt-5-mini.

Prompt caching is the largest cost lever. Mark stable blocks with `cache_control: {type: 'ephemeral'}` and Anthropic caches that prefix for 5 minutes (or 1 hour with `ttl: '1h'`). Cached reads drop from $3/M to $0.30/M — 90% off. First-write premiums: 25% on 5-min TTL, 100% on 1-hour TTL.

Worked: a customer-support pipeline with a 3,000-token cached system prompt, 100K calls/month, 600 dynamic input + 200 output. Without caching: `(0.0036 × $3 + 0.0002 × $15) × 100K = $4,080/month`. With caching (90% hit rate): system prefix bills at $0.30/M most of the time, total drops to ~$1,250/month. ~70% off — entirely from prompt structure, no model change.

Batch API on top: 50% off both streams for asynchronous workloads. The stack of caching + batching turns Sonnet into a price-competitive option even against gpt-5-mini for the right workload shape. Worked $ across the Claude family: Claude API cost calculator.

The 1M context beta — when it's worth the 2× input premium

Opt-in to 1M context by adding `anthropic-beta: context-1m-2025-08-07` to your API request headers. Sonnet then accepts up to 1,000,000 input tokens in a single call. The pricing changes when input exceeds 200K: input price tier shifts to $6/M (2×), output to $22.50/M (1.5×).

Worth the premium when: you need to fit an entire long document (a full book, a complete codebase chunk, a months-long conversation history) in one call for cross-section reasoning. The 1M context lets Sonnet do things no 200K model can — answer questions that require pulling from page 2 and page 600 of the same document.

Not worth the premium when: retrieval-augmented generation (RAG) over chunked documents is sufficient. RAG on Sonnet at standard 200K is dramatically cheaper than 1M-context Sonnet, and on most knowledge-Q&A workloads, well-tuned RAG hits the same answer quality.

Comparison: Gemini 2.5 Pro also offers 1M context at a different pricing structure ($2.50 input >200K vs Sonnet's $6). For raw 1M-context cost, Gemini 2.5 Pro is cheaper. For Anthropic-quality voice + discipline at 1M, Sonnet 4.6 1M beta is the only option.

Extended thinking on Sonnet 4.6

Sonnet 4.6 supports the same extended thinking feature as Opus 4.7. Configure via `thinking: {type: 'enabled', budget_tokens: 3000}` in the API call. Sonnet will burn up to 3,000 internal reasoning tokens before producing the visible answer.

The cost dynamic is different on Sonnet than on Opus because Sonnet's output rate is 5× cheaper. A 3,000-token thinking budget on Sonnet adds `0.003 × $15 = $0.045` to the call. The same budget on Opus adds `0.003 × $75 = $0.225`. Extended thinking is more economically defensible on Sonnet — you can afford to apply it to a wider class of tasks.

Typical Sonnet thinking budgets: 1,000-2,000 tokens for routine analysis tasks, 3,000-5,000 for code synthesis with non-trivial logic, 5,000+ for math/proof tasks. As on Opus, don't add thinking to classification or extraction — it adds cost without improving quality.

When to pick Sonnet 4.6 vs Opus 4.7 vs gpt-5-mini

**Pick Sonnet 4.6** for the production default in any workflow that benefits from Anthropic's voice, discipline, or tooling: customer-facing chat, content generation, structured-data pipelines, summarization with quality requirements, code review and explanation, agentic workflows that don't need Opus-tier planning. Most teams that standardize on Claude run >80% of traffic on Sonnet.

**Pick Opus 4.7** when Sonnet's quality on a specific hard subset isn't enough — complex agentic loops, multi-file code synthesis with strict correctness, legal/financial analysis where one wrong answer is costly, deep research synthesis. Pay 5× for the Opus boost only on the tasks that need it.

**Pick gpt-5-mini** ($0.25 / $2 per 1M) over Sonnet when: cost is the dominant constraint, the task is mechanical (classification, extraction), you're already in the OpenAI ecosystem, or you need the larger 400K context without the 1M-beta complexity. gpt-5-mini is 12× cheaper than Sonnet on input. Sonnet beats it on voice, long-form writing, and instruction-following discipline.

Cross-tier head-to-head: Claude Sonnet vs GPT-5 mini.

Tool use, structured outputs, and the XML-tag convention

Sonnet 4.6 ships with Anthropic's full tool-use API: define tools as JSON Schema in the `tools` parameter, Sonnet picks one (or several, in parallel) and returns the arguments in a `tool_use` content block. Parallel tool use is on by default; opt out with `disable_parallel_tool_use: true`.

Structured outputs follow the same tool-use pattern as Opus: define a tool whose input schema is your desired output schema, force the call with `tool_choice: {type: 'tool', name: 'extract_data'}`. The JSON Schema enforcement is reliable; invalid outputs are extremely rare on well-formed schemas.

XML-tag prompts work well: `<task>...</task>`, `<context>...</context>`, `<example>...</example>`, `<output_format>...</output_format>`. Sonnet, like Opus, is trained to attend to these reliably and Anthropic's advanced-pattern docs use them consistently. Markdown-headed or plain-paragraph prompts also work but tend to underperform vs XML-tagged equivalents on complex multi-section instructions.

Verified sources and how to re-check the numbers

Every number on this page was verified against Anthropic's live documentation on 2026-06-20. Sources: docs.anthropic.com/en/docs/about-claude/models for context window, modalities, and feature support; anthropic.com/pricing for input/output/cached prices and the 1M-beta tier; docs.anthropic.com/en/docs/build-with-claude/prompt-caching for cache write/read mechanics.

The 1M context beta requires the explicit `anthropic-beta: context-1m-2025-08-07` header. Anthropic versions beta features by ID — when the beta moves to general availability, the header changes. Watch docs.anthropic.com/en/release-notes for the GA announcement.

Methodology: when a number could not be cross-confirmed against an official Anthropic page on the verification date, it was omitted from this card rather than guessed.

Switch from gpt-5-mini (or Opus) to Sonnet 4.6 in 5 steps

1
Get an Anthropic API key
console.anthropic.com → Settings → API Keys → Create Key. Add a small credit purchase before the first call goes live. Set `ANTHROPIC_API_KEY=...` in `.env`.
2
Install the SDK and send a minimal call
`pip install anthropic` (Python) or `npm install @anthropic-ai/sdk` (Node). Python: `from anthropic import Anthropic; c = Anthropic(); r = c.messages.create(model='claude-sonnet-4-6', max_tokens=1024, messages=[{'role': 'user', 'content': 'Hello'}]); print(r.content[0].text)`. `max_tokens` is required.
3
Add prompt caching to your system prompt
Wrap your stable instructions: `system=[{'type': 'text', 'text': '...', 'cache_control': {'type': 'ephemeral'}}]`. 5-min TTL by default. Within the TTL, cached reads bill at $0.30/M instead of $3/M — 90% off the cached portion.
4
Convert your prompt to XML tags
Restructure: instead of 'You are a helpful assistant. Here is the user's request: ...', use `<role>helpful research assistant</role><task>{user_request}</task><output_format>JSON with fields summary, sources, confidence</output_format>`. Sonnet attends to XML tags more reliably than to markdown headings.
→ Open the ChatGPT prompt generator (Claude mode)
5
Opt in to 1M context only if you need it
Most Sonnet workloads fit in 200K. If you genuinely need to pass a 500K-token document in one call, add header `anthropic-beta: context-1m-2025-08-07`. Pricing for input >200K shifts to $6/M (2× tier). For most teams, RAG at 200K is dramatically cheaper than 1M context for the same answer quality.

Digital Dashboard Hub

The prompt patterns above work 10x better when they live in a library you actually own — tunable to your niche, exportable to GPT-5, Claude, Gemini, Perplexity, Midjourney, Llama. Stop pasting across 6 tools.

Try DDH's AI Prompt Builder — free 14 days, no card. →

Related calculators

OpenAI Pricing Calculator →GPT-5.5, 5.4, mini, nano — full per-call cost in one input.Claude Pricing Calculator →Opus 4.8, Sonnet 4.6, Haiku 4.5, Fable 5 — input + output combined.Context Window Comparison →Max input length and price per 1M for every current model.

Related prompt tools

Prompt generator (Claude mode)→Code prompt builder (XML-tagged)→Claude Opus 4.7 spec sheet→GPT-5 mini spec sheet→Claude API cost calculator→

Frequently Asked Questions

How much does Claude Sonnet 4.6 cost in 2026?

$3 per 1M input tokens, $15 per 1M output tokens, $0.30 per 1M for cached input reads (90% off). Cache writes cost $3.75/M (5-min TTL) or $6/M (1-hour TTL). Batch API takes 50% off both standard streams. The 1M context beta bills at $6/M input (2× tier) and $22.50/M output for input >200K. A representative 1,000-in / 500-out call costs ~$0.0105. Source: anthropic.com/pricing, verified 2026-06-20.

What is Claude Sonnet 4.6's context window?

200,000 tokens standard, with an optional 1,000,000-token (1M) beta via the `anthropic-beta: context-1m-2025-08-07` header. The 1M beta bills input >200K at 2× the standard rate and output at 1.5×. Sonnet is the only Claude tier currently exposing 1M — Opus 4.7 does not.

What is the difference between Claude Sonnet 4.6 and Claude Opus 4.7?

Same context (200K standard), same modalities, same feature surface (tool use, prompt caching, extended thinking). Opus 4.7 is $15/$75 per 1M — 5× more expensive than Sonnet 4.6's $3/$15. Opus wins on hard reasoning, complex code synthesis, multi-step planning. Sonnet wins on price-performance and is the right default for >80% of production traffic. Sonnet also has the 1M-context beta; Opus does not.

Does Sonnet 4.6 support extended thinking?

Yes. Configure via `thinking={'type': 'enabled', 'budget_tokens': 3000}`. Sonnet burns up to 3,000 internal reasoning tokens before producing the visible answer; thinking tokens bill at the output rate ($15/M). More economical than Opus extended thinking ($75/M) — apply it to a wider class of analysis tasks.

How does prompt caching work on Sonnet 4.6?

Explicit: mark blocks with `cache_control: {type: 'ephemeral'}` (5-min TTL default) or `{type: 'ephemeral', ttl: '1h'}` (1-hour TTL). First write costs 25% more (5-min) or 100% more (1-hour). Subsequent reads within TTL bill at 10% of input price. Largest cost lever on Sonnet — a stable cached system prefix typically cuts the input bill 70-90%.

Should I use Sonnet 4.6 or gpt-5-mini?

gpt-5-mini is 12× cheaper on input, 7.5× cheaper on output ($0.25/$2 vs $3/$15). Sonnet 4.6 wins on long-form writing voice, instruction-following discipline on complex prompts, prompt-caching mechanics, and the optional 1M context. For raw cost-driven mechanical tasks, gpt-5-mini. For Claude-quality production workloads, Sonnet. See Claude Sonnet vs GPT-5 mini.

Where is Sonnet 4.6 available?

Anthropic API (console.anthropic.com), Amazon Bedrock, Google Cloud Vertex AI, and via the Claude consumer apps (claude.ai Pro tier and above). Bedrock and Vertex pricing matches Anthropic direct as of June 2026.

Can I fine-tune Sonnet 4.6?

Fine-tuning on Claude models is limited. Anthropic offers fine-tuning on Claude Haiku via Amazon Bedrock; Sonnet fine-tuning is not generally available as of June 2026. For most use cases, Sonnet + a well-crafted XML-tagged prompt + cached system prefix closes the quality gap fine-tuning would address.

Sonnet is the production sweet spot. Make every call cache.

Our AI Prompt Generator writes Sonnet-tuned prompts (XML-tagged, cache-anchored, dynamic context last) based on YOUR business + task — so the 90%-off cached read price actually triggers. 14-day free trial of DDH Pro, no card.

Browse all prompt tools →