The context window is the maximum number of tokens the model can consider at once — your prompt, the conversation history, any attached documents, and the output it's generating all share that budget. In 2026, windows are large: Anthropic includes a 1M-token context window at standard pricing on its Opus 4.6+, Sonnet 4.6, and Fable 5 models, for example.
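The shared budget can be sketched as simple arithmetic. This is a minimal illustration, not a real tokenizer: `estimate_tokens` and `remaining_for_output` are hypothetical helpers, and the four-characters-per-token ratio is only a rough assumption.

```python
def estimate_tokens(text: str) -> int:
    # Hypothetical heuristic: roughly 4 characters per token, rounded up.
    # A real tokenizer will give different counts.
    return (len(text) + 3) // 4


def remaining_for_output(window: int, parts: list[str]) -> int:
    # Everything sent to the model draws on the same budget,
    # so whatever is left over is the room available for the reply.
    used = sum(estimate_tokens(p) for p in parts)
    return window - used


prompt = "p" * 40      # 10 estimated tokens
history = "h" * 80     # 20 estimated tokens
document = "d" * 120   # 30 estimated tokens
print(remaining_for_output(100, [prompt, history, document]))  # 40
```

A negative result would mean the input alone already exceeds the window, before the model has produced anything.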
Two facts matter for prompting. First, anything outside the window does not exist for the model: in a long conversation, early turns can fall out of context, and the model genuinely cannot 'remember' them. Second, even within the window, position matters: models tend to attend most reliably to the beginning and end of the context, so burying a critical instruction in the middle of a huge prompt is risky.
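One way early turns can disappear is sketched below, assuming an application that keeps only the most recent turns that fit a budget. Real chat products may truncate or summarize differently; `fit_history` is a hypothetical helper, and counting whitespace-separated words stands in for real token counting.

```python
def fit_history(turns: list[str], budget: int) -> list[str]:
    # Walk backwards from the newest turn, keeping turns until the budget is spent.
    # Word count is a stand-in for a real token count (an assumption).
    kept = []
    used = 0
    for turn in reversed(turns):
        cost = len(turn.split())
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


turns = [
    "my name is Ada",        # 4 words, the oldest turn
    "ok",                    # 1 word
    "what is two plus two",  # 5 words
    "four",                  # 1 word
]
print(fit_history(turns, budget=7))
# ['ok', 'what is two plus two', 'four']
```

The turn that introduced the name is gone, so a later question about it cannot be answered from the context that remains.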
Practical takeaways: put your most important instructions at the start (and optionally restate the key constraint at the end); for long documents, retrieve and include only the relevant chunks rather than pasting everything; and in long chats, periodically restate critical context, because older turns may have dropped out of the window. A bigger window is capacity, not a reason to fill it; lean context generally produces sharper, cheaper output.
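These takeaways can be sketched as a small prompt-assembly routine. It is a rough illustration under stated assumptions: `select_chunks` and `build_prompt` are hypothetical helpers, and plain keyword overlap stands in for whatever retrieval method a real system would use.

```python
def select_chunks(chunks: list[str], query: str, top_k: int = 2) -> list[str]:
    # Score each chunk by how many query words it shares (a crude relevance proxy),
    # drop chunks with no overlap, and keep the top_k best in a stable order.
    query_words = set(query.lower().split())
    scored = []
    for index, chunk in enumerate(chunks):
        overlap = len(query_words & set(chunk.lower().split()))
        if overlap > 0:
            scored.append((-overlap, index, chunk))
    scored.sort()
    return [chunk for _, _, chunk in scored[:top_k]]


def build_prompt(instructions: str, chunks: list[str], question: str,
                 key_constraint: str) -> str:
    # Instructions lead, the key constraint is restated last,
    # and only the selected chunks sit in the middle.
    parts = [instructions, *chunks, question, f"Reminder: {key_constraint}"]
    return "\n\n".join(parts)


chunks = [
    "refund policy lasts 30 days",
    "shipping takes a week",
    "refund requires receipt",
]
question = "what is the refund policy"
relevant = select_chunks(chunks, question)
prompt = build_prompt("Answer only from the excerpts.", relevant, question,
                      "answer only from the excerpts.")
print(prompt)
```

Here the shipping chunk is left out entirely, which keeps the prompt short and places the instruction at both edges of the context, where it is most likely to be noticed.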