By The DDH Team · Digital Dashboard Hub

What Is a Context Window? (2026)

The context window is the model's working memory for a single request — and in 2026 it can hold an entire book.


A context window is the maximum number of tokens a model can consider in a single request. Both the input you send (system prompt, instructions, documents, conversation history) and the output it generates count toward it. Once you hit the limit, something has to give: older or less relevant content must be dropped, summarized, or left out from the start.

Think of it as the model's short-term working memory: it has no memory of anything outside the current window unless you put it there. If tokens are new to you, start with what is a token in AI.


Context window scale (rough English estimates)

Window size     Approximate equivalent
8k tokens       ~6,000 words / a short article
128k tokens     ~96,000 words / a short novel
200k tokens     ~150,000 words / a long book
1M tokens       ~750,000 words / several books or a codebase

Word equivalents use the rough estimate of ~0.75 words per token; actual token counts vary by language and content. 1M-token windows are available at standard pricing on Claude Opus 4.6+/Sonnet 4.6/Fable 5 per https://claude.com/pricing; Gemini long-context tiers are listed at https://ai.google.dev/gemini-api/docs/pricing. Verified June 2026.
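The conversions in the table can be reproduced with a few lines of Python. This is a minimal sketch that assumes the ~0.75 words/token rule of thumb; the function names are illustrative, and a real tokenizer will give different counts for code, non-English text, or unusual formatting.

```python
# Rough English-only rule of thumb from the note above (an estimate, not a tokenizer).
WORDS_PER_TOKEN = 0.75


def estimate_words(token_count: int) -> int:
    """Approximate how many English words fit in a given number of tokens."""
    return round(token_count * WORDS_PER_TOKEN)


def estimate_tokens(word_count: int) -> int:
    """Approximate how many tokens a given number of English words will use."""
    return round(word_count / WORDS_PER_TOKEN)


if __name__ == "__main__":
    for window in (8_000, 128_000, 200_000, 1_000_000):
        print(f"{window:>9,} tokens ~ {estimate_words(window):,} words")
```

For anything that has to fit a hard limit, count tokens with the provider's own tokenizer or token-counting endpoint rather than this estimate.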

How big are context windows in 2026?

Windows have grown enormously. As of June 2026, the top-tier models offer 1M-token context: Anthropic includes a 1M-token window at standard pricing on Claude Opus 4.6 and newer, Sonnet 4.6, and Fable 5 (see Anthropic pricing and the Claude API pricing detail). Google's Gemini line is also known for very large context, documented on the Gemini pricing page, where the Gemini 3.1 Pro preview tier is priced for context up to 200k tokens.

A 1M-token window is roughly 750,000 words — on the order of several long books at once. That is enough to drop entire codebases, long contracts, or months of transcripts into a single prompt without retrieval. Exact limits vary by model and change often, so check the provider's live docs before you architect around a specific number.


What are the cost implications of a big window?

A large window is a budget, not a free resource: you pay per input token for everything you put in it. Filling a 1M-token window on a model priced around $3–$5 per million input tokens costs roughly $3–$5 per call before the model writes a single word of output. Do that on every request and costs add up fast.
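The arithmetic is simple enough to sketch. The example below assumes flat per-token input pricing at the illustrative $3 and $5 per million rates mentioned above; the function name is hypothetical, and real price sheets may add tiers or surcharges.

```python
def input_cost_usd(input_tokens: int, price_per_million_usd: float) -> float:
    """Input-side cost of one call, assuming a flat per-token price."""
    return input_tokens / 1_000_000 * price_per_million_usd


if __name__ == "__main__":
    full_window = 1_000_000
    for price in (3.0, 5.0):  # illustrative rates, not a specific model's price
        per_call = input_cost_usd(full_window, price)
        per_thousand_calls = per_call * 1_000
        print(f"${price:.2f}/M: ${per_call:.2f} per call, "
              f"${per_thousand_calls:,.2f} per 1,000 calls")
```

Output tokens are billed separately and are not included in this figure.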

Some providers also adjust pricing at very large context. Note that on Gemini, the published preview rate for Gemini 3.1 Pro applies to context up to 200k tokens (see Gemini pricing); always confirm the long-context tier before relying on it. The practical takeaway: a bigger window lets you include more; it does not make including more cheaper.

Prompt caching is the main lever here. If a big chunk of your context is stable across calls (a long system prompt, a reference document, a codebase), cache it. On Anthropic, cache reads are about 10% of the base input price, so a reused 1M-token prefix on a $3-per-million model costs about $0.30 to re-read instead of $3 (details on the Claude API pricing page). To model the dollars for your own setup, use the AI prompt cost calculator.
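To see where caching starts to pay off, here is a rough comparison of repeated calls with and without a cached prefix. It assumes the ~10% cache-read rate quoted above, that the cache stays warm between calls, and a `write_multiplier` of 1.0 for the first call. That last value is a placeholder: providers may charge a premium to write to the cache, so check the pricing page and adjust it.

```python
def cost_without_cache(prefix_tokens: int, calls: int, price_per_million_usd: float) -> float:
    """Every call pays full input price for the shared prefix."""
    return calls * prefix_tokens / 1_000_000 * price_per_million_usd


def cost_with_cache(
    prefix_tokens: int,
    calls: int,
    price_per_million_usd: float,
    read_fraction: float = 0.10,    # ~10% of base input price, per the text above
    write_multiplier: float = 1.0,  # ASSUMPTION: placeholder, check provider pricing
) -> float:
    """First call writes the cache; later calls read it at a discount."""
    if calls <= 0:
        return 0.0
    base = prefix_tokens / 1_000_000 * price_per_million_usd
    first_call = base * write_multiplier
    later_calls = (calls - 1) * base * read_fraction
    return first_call + later_calls


if __name__ == "__main__":
    plain = cost_without_cache(1_000_000, calls=10, price_per_million_usd=3.0)
    cached = cost_with_cache(1_000_000, calls=10, price_per_million_usd=3.0)
    print(f"10 calls without cache: ${plain:.2f}")
    print(f"10 calls with cache:    ${cached:.2f}")
```

The sketch covers only the shared prefix; tokens that change on every call are still billed at the normal input rate.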


Does a bigger window mean better answers?

Not automatically. Having room to include more is not the same as the model using all of it well. Models can still lose track of details buried in the middle of a very long context, and irrelevant filler can dilute attention and add noise. More context can mean a worse answer if most of it is off-topic.

The skill is curation, not maximization. Put the most relevant, highest-signal material in the window and leave the rest out. This is precisely the problem retrieval-augmented generation solves — fetching only the passages that matter instead of stuffing everything in. See what is RAG.
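As a toy illustration of curation, the sketch below ranks passages by how many words they share with the query and keeps only the best matches. Word overlap is a deliberately simple stand-in: production retrieval systems usually rely on embeddings or a search index, and the function names and sample passages here are made up for the example.

```python
import re


def words(text: str) -> set[str]:
    """Lowercase word set used for a crude relevance score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def select_passages(query: str, passages: list[str], max_passages: int = 2) -> list[str]:
    """Keep the passages sharing the most words with the query; drop zero-overlap ones."""
    query_words = words(query)
    scored = [(len(query_words & words(p)), i, p) for i, p in enumerate(passages)]
    relevant = [item for item in scored if item[0] > 0]
    # Highest overlap first; ties keep their original order.
    relevant.sort(key=lambda item: (-item[0], item[1]))
    return [p for _, _, p in relevant[:max_passages]]


if __name__ == "__main__":
    passages = [
        "Refund policy: annual plans can be refunded within 30 days.",
        "The office is closed on public holidays.",
        "Monthly plans renew automatically each month.",
    ]
    for passage in select_passages("refund policy for annual plans", passages):
        print(passage)
```

Only the selected passages go into the prompt; the off-topic one never costs a token.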


How do I use a context window well?

Budget it deliberately. Estimate the tokens for each part of your prompt and leave clear headroom for the output — remember the model's answer competes for the same budget.
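One way to make that budgeting concrete is a small check run before each call. This is a sketch under assumptions: the part names, the 200k window, and the 8k output reservation are illustrative numbers, and the token counts would come from your own tokenizer or estimate.

```python
def remaining_input_budget(parts: dict[str, int], window: int, reserved_output: int) -> int:
    """Tokens still free for input after reserving room for the reply.

    A negative result means the prompt is over budget.
    """
    if reserved_output > window:
        raise ValueError("reserved_output cannot exceed the window size")
    used = sum(parts.values())
    return window - reserved_output - used


if __name__ == "__main__":
    # Hypothetical token counts for each part of the prompt.
    parts = {
        "system_prompt": 1_500,
        "instructions": 500,
        "documents": 150_000,
        "history": 30_000,
    }
    left = remaining_input_budget(parts, window=200_000, reserved_output=8_000)
    print(f"Input tokens still available: {left:,}")
```

If the result goes negative, the largest entry in `parts` is usually the first candidate for trimming, summarizing, or retrieval.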

Lead with the instructions and the most important material, and place key constraints where they are least likely to be lost. For long-running chats, summarize earlier turns instead of resending raw history. When your knowledge base dwarfs the window, retrieve the relevant chunks rather than including everything. And cache anything stable to keep repeated large prompts affordable.
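For long-running chats, the first step is deciding which turns to resend verbatim and which to hand off for summarizing. The sketch below keeps the most recent turns that fit a token budget and returns the older ones separately. It assumes a rough 4-characters-per-token estimate, and the summarization step itself (typically another model call) is left out.

```python
import math


def rough_tokens(text: str) -> int:
    """Crude estimate: about 4 characters per token for English text."""
    return math.ceil(len(text) / 4)


def split_history(turns: list[str], budget_tokens: int) -> tuple[list[str], list[str]]:
    """Return (older, recent): recent turns fit the budget, older ones should be summarized."""
    recent: list[str] = []
    used = 0
    # Walk backwards so the newest turns are kept first.
    for turn in reversed(turns):
        cost = rough_tokens(turn)
        if used + cost > budget_tokens:
            break
        recent.append(turn)
        used += cost
    recent.reverse()
    older = turns[: len(turns) - len(recent)]
    return older, recent


if __name__ == "__main__":
    history = ["a" * 40, "b" * 40, "c" * 40]  # three turns, ~10 tokens each
    older, recent = split_history(history, budget_tokens=25)
    print(f"{len(older)} turn(s) to summarize, {len(recent)} turn(s) to resend as-is")
```

The summary of the older turns then takes their place at the start of the conversation, so the prompt stays within budget as the chat grows.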

Frequently Asked Questions

What is a context window in simple terms?

It is the maximum amount of text, measured in tokens, that a model can consider in one request — including both your prompt and its reply. It works like the model's short-term memory: anything outside it is unseen unless you include it.

How big are context windows in 2026?

The leading models reach 1M tokens. As of June 2026, Anthropic includes a 1M-token window at standard pricing on Claude Opus 4.6+/Sonnet 4.6/Fable 5 (see Anthropic pricing), and Gemini also offers very large context (see Gemini pricing).

Does the output count toward the context window?

Yes. The window covers the whole request — input plus generated output. A very long prompt leaves less room for the answer, so budget headroom for the response.

Is a bigger context window always better?

No. A larger window lets you include more, but models can lose details in very long contexts and irrelevant filler can degrade answers. Curate for relevance rather than maximizing length.

How much does it cost to fill a 1M-token window?

You pay per input token for everything in the window, so a full 1M-token prompt at $3–$5 per million input tokens costs roughly $3–$5 per call before any output. Prompt caching cuts this for stable content: cache reads are about 10% of input price on Anthropic. Estimate yours with the AI prompt cost calculator.

What is the difference between context window and tokens?

A token is the unit of text (roughly 4 characters); the context window is how many tokens fit in one request. See what is a token in AI.

When should I use retrieval instead of a big window?

When your source material is far larger than the window, or when most of it is irrelevant to a given query. Retrieval-augmented generation fetches only the relevant passages, which is cheaper and often more accurate. See what is RAG.
