Token — The basic unit a model reads and generates. A token is a sub-word chunk, not a whole word; in English, roughly 1 token ≈ 4 characters ≈ 0.75 words (per Anthropic and OpenAI tokenization docs). Pricing and context limits are both measured in tokens, so understanding them is the basis for both cost and capacity. A 500-word email is about 670 tokens.
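
The rule of thumb above can be turned into a quick estimator. This is a minimal sketch that assumes the 4-characters and 0.75-words ratios hold for your text; the function names are illustrative, and a provider's real tokenizer will give different counts, especially for code or non-English text.

```python
# Rule-of-thumb ratios for English text (approximations, not a real tokenizer).
CHARS_PER_TOKEN = 4
WORDS_PER_TOKEN = 0.75


def estimate_tokens_from_words(word_count: int) -> int:
    """Estimate a token count from a word count."""
    return round(word_count / WORDS_PER_TOKEN)


def estimate_tokens_from_text(text: str) -> int:
    """Estimate a token count from the character length of a string."""
    return round(len(text) / CHARS_PER_TOKEN)


# A 500-word email: 500 / 0.75 is about 667, i.e. roughly 670 tokens.
print(estimate_tokens_from_words(500))
```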
---
Context window — The maximum number of tokens a model can take in for a single call, covering the system prompt, your instruction, any retrieved documents, and the conversation history. Everything must fit inside it, and you pay the input rate for every token you place there. Windows are large in 2026 (Anthropic includes 1M tokens at standard pricing on Opus 4.6+, Sonnet 4.6, and Fable 5, per its pricing page), but a large window is neither free nor infinite. See "What Is a Context Window?"
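
A simple budget check makes the "everything must fit" point concrete. The sketch below is illustrative only: the 200,000-token window and the per-part token counts are made-up numbers, not figures for any particular model, and the function name is hypothetical.

```python
def context_usage(parts: dict[str, int], window: int) -> tuple[int, int]:
    """Return (tokens used, tokens remaining) for the parts placed in the window."""
    used = sum(parts.values())
    return used, window - used


# Hypothetical token counts for each part of a single call.
parts = {
    "system_prompt": 800,
    "instruction": 200,
    "retrieved_documents": 45_000,
    "conversation_history": 12_000,
}

WINDOW = 200_000  # assumed window size, for illustration only

used, remaining = context_usage(parts, WINDOW)
print(f"used={used} remaining={remaining}")
if remaining < 0:
    print("Over budget: trim documents or history before calling the model.")
```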
---
Prompt — The input you give a model: instructions, context, examples, and the question or task. In practice a prompt is usually structured into a role, an instruction, supporting context, and an output specification. The quality and structure of the prompt are, together, the single biggest controllable lever on output quality. The DAIR.ai guide is the canonical reference.
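
The four-part structure described above can be assembled mechanically. This is one possible sketch; the section labels and the `build_prompt` helper are hypothetical choices, not a format required by any provider.

```python
def build_prompt(role: str, instruction: str, context: str, output_spec: str) -> str:
    """Join the four parts into one prompt string, skipping any empty part."""
    sections = [
        ("Role", role),
        ("Instruction", instruction),
        ("Context", context),
        ("Output format", output_spec),
    ]
    return "\n\n".join(
        f"{label}:\n{body.strip()}" for label, body in sections if body.strip()
    )


prompt = build_prompt(
    role="You are a support agent for a printer company.",
    instruction="Summarize the customer's ticket.",
    context="Ticket: the printer shows as offline after a firmware update.",
    output_spec="Three bullet points, plain text.",
)
print(prompt)
```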
---
Completion — The model's generated output in response to a prompt. The term comes from the original framing of language models as text-completers: given a prefix, predict what comes next. Completion (output) tokens are billed at a higher rate than input tokens, typically 4-6x, because output is generated one token at a time, with a forward pass through the model for each new token, whereas input tokens can be processed in parallel.
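
The effect of the input/output price gap is easiest to see with arithmetic. In the sketch below the rates ($3 and $15 per million tokens, a 5x ratio) are hypothetical placeholders, not any provider's actual prices, and `call_cost` is an illustrative helper.

```python
def call_cost(
    input_tokens: int,
    output_tokens: int,
    input_rate_per_mtok: float,
    output_rate_per_mtok: float,
) -> float:
    """Cost in dollars for one call, given per-million-token rates."""
    return (
        input_tokens * input_rate_per_mtok + output_tokens * output_rate_per_mtok
    ) / 1_000_000


# Hypothetical rates: output costs 5x input.
cost = call_cost(
    input_tokens=10_000,
    output_tokens=1_000,
    input_rate_per_mtok=3.0,
    output_rate_per_mtok=15.0,
)
# 10,000 input tokens cost $0.03; 1,000 output tokens cost $0.015.
print(f"${cost:.3f}")
```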
---
Temperature — A decoding parameter (typically 0 to ~2) that controls randomness in the output. Low temperature makes the model more deterministic and focused — best for factual or structured tasks; high temperature increases variety and creativity at the cost of consistency. See the OpenAI API reference for how it interacts with other sampling parameters.
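
A common way to implement temperature is to divide the model's raw scores (logits) by it before applying softmax. The sketch below assumes that formulation and uses made-up logits; how a given provider handles edge cases such as temperature 0 is not covered here.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert logits to probabilities after dividing them by the temperature."""
    if temperature <= 0:
        # Assumption: callers treat temperature 0 as greedy (argmax) decoding instead.
        raise ValueError("temperature must be greater than 0")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
low = softmax_with_temperature(logits, 0.5)   # sharper: top token dominates
high = softmax_with_temperature(logits, 2.0)  # flatter: more variety
print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
```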
---
Top-p (nucleus sampling) — An alternative way to control randomness: where temperature reshapes the whole probability distribution, top-p restricts sampling to the smallest set of next-token candidates whose cumulative probability reaches p. A top-p of 0.1 means only the most likely tokens making up the top 10% of probability mass are considered. Providers generally advise tuning either temperature or top-p, not both at once (OpenAI API reference).
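
The filtering step can be sketched in a few lines. This example uses a made-up four-token distribution and a hypothetical `top_p_filter` helper; it keeps tokens until the running total reaches p, then renormalizes the survivors so they sum to 1.

```python
def top_p_filter(probs: dict[str, float], p: float) -> dict[str, float]:
    """Keep the smallest set of most-likely tokens whose cumulative probability reaches p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept = []
    cumulative = 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    # Renormalize so the kept probabilities sum to 1 before sampling.
    return {token: prob / total for token, prob in kept}


# Made-up next-token distribution.
probs = {"the": 0.5, "a": 0.3, "an": 0.15, "zebra": 0.05}

narrow = top_p_filter(probs, 0.1)  # only "the" survives
wide = top_p_filter(probs, 0.9)    # "the", "a", and "an" survive
print(narrow)
print(sorted(wide))
```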
---
System prompt — A high-priority instruction, separate from the user's message, that sets the model's role, rules, tone, and constraints for the whole conversation. It's where you define persona and guardrails that should persist across turns. Because it carries elevated authority, it's also a security-sensitive surface — exposing it is its own risk (see System Prompt Leakage below).