Skip to contentNew: Does ChatGPT recommend your brand? Free 60-second AI visibility check →
By The DDH Team · Digital Dashboard Hub

AI Prompt Engineering Terms: A Deep-Dive Glossary (2026)

Twenty-eight core prompt-engineering terms explained in two-to-four sentences each, grouped by theme, and linked to a canonical source — a deeper companion to a quick-reference glossary, current for 2026.

By The DDH Team at Digital Dashboard HubUpdated

This is a deep-dive glossary of 28 prompt-engineering terms grouped into five themes: foundations, prompting techniques, context and retrieval, model behavior and decoding, and safety. Each entry gives a two-to-four sentence explanation — enough to actually understand the concept, not just recognize the word — and links to a canonical source where one exists.

It's built as the in-depth companion to our quick-reference AI Prompt Engineering Glossary: use the quick glossary to look a term up fast, and this one when you want to understand it. Definitions draw on the DAIR.ai Prompt Engineering Guide, Learn Prompting, OWASP, and the original research papers.

Digital Dashboard Hub

Writing good prompts for ONE AI is hard. Writing them for GPT-5, Claude, Gemini, Perplexity, Midjourney and 6 more is a full-time job. DDH's AI Prompt Builder writes once, runs everywhere — locked to your niche, voice, and brand tone.

Free 14 days, no card.

The five term groups at a glance

Feature
What it covers
Example terms
FoundationsThe units and parameters everything else builds onToken, context window, temperature, top-p, system prompt
Prompting techniquesHow you structure the instruction to get better outputZero-shot, few-shot, chain-of-thought, ReAct, tree of thoughts
Context & retrievalGetting the right information into the windowRAG, embedding, vector database, chunking, grounding, memory
Model behavior & decodingHow models generate and where they failHallucination, in-context learning, reasoning model, fine-tuning
Safety & reliabilityThe risks and the controls around themPrompt injection, system prompt leakage, jailbreak, guardrails

Grouping and definitions synthesized from the [DAIR.ai Prompt Engineering Guide](https://www.promptingguide.ai/), [Learn Prompting](https://learnprompting.org/), and the [OWASP LLM Top 10](https://genai.owasp.org/llm-top-10/), June 2026.

What's in this guide

Five themed groups, 28 terms. Jump to the group you need:

1. Foundations — token, context window, prompt, completion, temperature, top-p, system prompt.

2. Prompting techniques — zero-shot, few-shot, chain-of-thought, ReAct, tree of thoughts, role prompting, prompt template, prompt chaining.

3. Context & retrieval — RAG, embedding, vector database, chunking, grounding, memory.

4. Model behavior & decoding — hallucination, in-context learning, reasoning model, fine-tuning, max tokens.

5. Safety & reliability — prompt injection, system prompt leakage, jailbreak, guardrails.

6. Sources & further reading.


Foundations

Token — The basic unit a model reads and generates. A token is a sub-word chunk, not a whole word; in English, roughly 1 token ≈ 4 characters ≈ 0.75 words (per Anthropic and OpenAI tokenization docs). Pricing and context limits are both measured in tokens, so understanding them is the basis for both cost and capacity. A 500-word email is about 670 tokens.

---

Context window — The maximum number of tokens a model can take in for a single call, covering the system prompt, your instruction, any retrieved documents, and the conversation history. Everything must fit inside it, and you pay the input rate for every token you place there. Windows are large in 2026 (Anthropic includes 1M tokens at standard pricing on Opus 4.6+, Sonnet 4.6, and Fable 5, per pricing), but large isn't free or infinite. See What Is a Context Window?.

---

Prompt — The input you give a model: instructions, context, examples, and the question or task. In practice a prompt is usually structured into a role, an instruction, supporting context, and an output specification. The quality and structure of the prompt is the single biggest controllable lever on output quality. The DAIR.ai guide is the canonical reference.

---

Completion — The model's generated output in response to a prompt. The term comes from the original framing of language models as text-completers: given a prefix, predict what comes next. Completion (output) tokens are billed at a higher rate than input tokens — typically 4-6x — because generating each token requires a full forward pass through the model.

---

Temperature — A decoding parameter (typically 0 to ~2) that controls randomness in the output. Low temperature makes the model more deterministic and focused — best for factual or structured tasks; high temperature increases variety and creativity at the cost of consistency. See the OpenAI API reference for how it interacts with other sampling parameters.

---

Top-p (nucleus sampling) — An alternative way to control randomness: instead of scaling all probabilities like temperature, top-p restricts sampling to the smallest set of next-token candidates whose cumulative probability exceeds p. A top-p of 0.1 means only the most likely tokens summing to 10% probability are considered. Providers generally advise tuning either temperature or top-p, not both at once (OpenAI API reference).

---

System prompt — A high-priority instruction, separate from the user's message, that sets the model's role, rules, tone, and constraints for the whole conversation. It's where you define persona and guardrails that should persist across turns. Because it carries elevated authority, it's also a security-sensitive surface — exposing it is its own risk (see System Prompt Leakage below).


Prompting techniques

Zero-shot prompting — Asking the model to do a task with only an instruction and no worked examples. It relies entirely on what the model already learned in training. Zero-shot is the simplest approach and often sufficient for common tasks; when it underperforms, adding examples (few-shot) is the usual next step. Covered in the DAIR.ai guide.

---

Few-shot prompting — Including a handful of input/output examples in the prompt to show the model the pattern you want before giving it the real input. It steers format and style far more reliably than instructions alone. The technique was popularized by Brown et al., 2020 in the GPT-3 paper (Language Models are Few-Shot Learners). Use our few-shot prompt templates as a starting point.

---

Chain-of-thought (CoT) — Prompting the model to show its reasoning step by step before giving a final answer, which improves performance on math, logic, and multi-step problems. Introduced by Wei et al., 2022 (Chain-of-Thought Prompting Elicits Reasoning in Large Language Models). The simplest trigger is adding "let's think step by step." See our Chain-of-Thought Prompting Guide.

---

ReAct (Reason + Act) — A pattern that interleaves reasoning steps with actions, such as calling a tool or searching, so the model can gather information mid-task and reason over what it finds. It's foundational to agentic systems. Introduced by Yao et al., 2022 (ReAct: Synergizing Reasoning and Acting in Language Models).

---

Tree of Thoughts (ToT) — A technique that has the model explore multiple reasoning paths in parallel — a tree of partial solutions — and evaluate or backtrack among them, rather than committing to one linear chain. It suits problems where the first line of reasoning may be wrong and exploration pays off. Introduced by Yao et al., 2023 (Tree of Thoughts).

---

Role (persona) prompting — Assigning the model a role — "you are an experienced editor" — to shape tone, vocabulary, and the framing of its answer. It's a cheap, effective way to set voice and focus, usually placed in the system prompt. The DAIR.ai guide and provider docs cover its use; our Brand Voice Generator applies the idea to consistent brand persona.

---

Prompt template — A reusable, fill-in-the-blank prompt with fixed structure and variable input slots, run once per task. It keeps output consistent across many runs and lets a team share high-quality prompts without re-deriving them. See Prompt Templates vs Prompt Chaining (2026) and The AI Prompt Templates Library (2026).

---

Prompt chaining — Breaking a task into a sequence of prompts where each step's output becomes the next step's input, giving focused steps and checkpoints between them. It's the backbone of multi-stage workflows and agents. See the comparison in Prompt Templates vs Prompt Chaining (2026).


Context & retrieval

RAG (Retrieval-Augmented Generation) — A pattern that retrieves the passages most relevant to a query from an external corpus and places only those in the context window, so the model answers grounded in specific, current facts rather than memory alone. It controls cost and improves accuracy by keeping context small and relevant. See What Is RAG?.

---

Embedding — A numerical vector that represents the meaning of a piece of text, such that semantically similar texts sit close together in vector space. Embeddings are what make semantic retrieval possible: you embed both the query and the corpus, then find the nearest matches. They're the foundation under RAG and vector search.

---

Vector database — A store optimized for holding embeddings and finding the nearest vectors to a query embedding quickly, even across millions of items. It's the retrieval engine in most RAG systems: given a query vector, it returns the most semantically similar chunks. The quality of retrieval depends heavily on how the corpus was chunked and embedded.

---

Chunking — Splitting documents into smaller passages before embedding them for retrieval. Chunk size is a real tradeoff: chunks too large dilute relevance and waste window space, while chunks too small lose the context needed to answer. Good chunking — respecting natural boundaries like paragraphs or sections — is one of the highest-leverage decisions in a RAG system.

---

Grounding — Tying a model's output to verifiable source material so claims can be traced and trusted, rather than relying on the model's parametric memory. RAG is the most common grounding technique; citing retrieved passages is grounding in action. Grounding is the primary defense against hallucination on factual tasks.

---

Memory — How a system carries context beyond a single call: short-term memory is the running conversation in the window during a session; long-term memory is durable facts persisted across sessions and reloaded when relevant. Both consume window budget, so the engineering work is selective — summarize old turns and retrieve only what the current task needs. Discussed in Prompt Engineering vs Context Engineering (2026).


Model behavior & decoding

Hallucination — When a model generates confident, fluent output that is factually wrong or fabricated — an invented statistic, citation, or quote. It happens because models predict plausible text, not verified truth. The main mitigations are grounding the model in retrieved sources (RAG) and verifying every factual claim against a real source — never trust an unverified number from a model.

---

In-context learning — The ability of a large model to learn a task from examples or instructions given in the prompt at inference time, without any weight updates. This is what makes few-shot prompting work: the model adapts its behavior from the examples in the window. The phenomenon was a central finding of the GPT-3 paper (Brown et al., 2020).

---

Reasoning model — A model class tuned to spend extra compute "thinking" before answering, producing better results on hard multi-step problems at the cost of higher latency and token usage. In 2026 these are the frontier tier (e.g. gpt-5.5-pro, Claude Opus 4.8, Gemini 3.1 Pro). They're worth their premium only when a task has genuine reasoning depth — see How to Choose an AI Model (2026).

---

Fine-tuning — Further training a base model on your own labeled examples to specialize its behavior, as opposed to steering it at inference time with prompts or context. It's powerful but heavier than prompting: it needs data, cost, and re-training when things change. The usual advice is to exhaust prompt and context engineering before reaching for fine-tuning.

---

Max tokens (output limit) — The cap you set on how many tokens a model may generate in a single completion. It bounds both cost and verbosity, and a cap set too low can truncate an answer mid-thought. It's distinct from the context window, which limits total input; max tokens limits the output specifically (OpenAI API reference).


Safety & reliability

Prompt injection — An attack where adversarial instructions hidden in user input or in retrieved content trick the model into ignoring its original instructions and doing something unintended. It is ranked #1 (LLM01:2025) on the OWASP LLM Top 10, making it the most prominent LLM security risk. It's especially dangerous in RAG and agent systems where the model processes untrusted external text.

---

System prompt leakage — When a model is manipulated into revealing its hidden system prompt, exposing rules, secrets, or logic the operator intended to keep private. It appears as LLM07:2025 on the OWASP LLM Top 10. The lesson is to never place real secrets (keys, credentials) in a system prompt and treat it as potentially discoverable.

---

Jailbreak — A class of prompt attacks designed to bypass a model's safety guardrails and elicit content or behavior it's meant to refuse, often via role-play framing, obfuscation, or instruction-override tricks. Jailbreaks overlap with prompt injection but specifically target the model's safety constraints rather than the application's instructions. Defending against them is an ongoing arms race; the OWASP LLM Top 10 tracks the broader risk landscape.

---

Guardrails — The controls placed around a model to keep its inputs and outputs within safe, on-policy bounds: input validation, output filtering, allow/deny lists, and separate moderation checks. They're a defense-in-depth layer because prompting alone can't guarantee safe behavior, especially against injection and jailbreaks. The OWASP LLM Top 10 is the standard reference for what guardrails need to defend against.


Sources & further reading

Definitions in this glossary draw on the canonical guides and original research below.

DAIR.ai Prompt Engineering Guide: https://www.promptingguide.ai/

Learn Prompting: https://learnprompting.org/

OWASP LLM Top 10: https://genai.owasp.org/llm-top-10/

Chain-of-Thought Prompting (Wei et al., 2022): https://arxiv.org/abs/2201.11903

Few-shot / in-context learning (Brown et al., 2020): https://arxiv.org/abs/2005.14165

ReAct (Yao et al., 2022): https://arxiv.org/abs/2210.03629

Tree of Thoughts (Yao et al., 2023): https://arxiv.org/abs/2305.10601

OpenAI API reference (temperature, top-p, max tokens): https://platform.openai.com/docs/api-reference/chat

Claude prompt engineering overview: https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/overview

Quick-reference companion: the AI Prompt Engineering Glossary.

Frequently Asked Questions

What is the difference between a token and a word?

A token is a sub-word chunk a model reads and generates, not a whole word. In English, roughly 1 token ≈ 4 characters ≈ 0.75 words (per Anthropic and OpenAI tokenization docs), so a 500-word email is about 670 tokens. Both pricing and context-window limits are measured in tokens, which is why the unit matters for cost and capacity.

What is the difference between chain-of-thought and ReAct?

Chain-of-thought prompts the model to reason step by step before answering, improving multi-step reasoning (Wei et al., 2022). ReAct goes further by interleaving reasoning with actions — like calling a tool or searching — so the model can gather information mid-task and reason over it (Yao et al., 2022). ReAct is foundational to agentic systems; chain-of-thought is a building block within it.

What is the most common AI security risk in prompting?

Prompt injection, where adversarial instructions hidden in user input or retrieved content trick the model into ignoring its original instructions. It is ranked #1 (LLM01:2025) on the OWASP LLM Top 10. A related risk, system prompt leakage (LLM07:2025), is when a model is manipulated into revealing its hidden system prompt — which is why you should never put real secrets in a system prompt.

What is the difference between temperature and top-p?

Both control randomness in output. Temperature scales the probability distribution — low values make output more deterministic, high values more creative. Top-p (nucleus sampling) instead restricts sampling to the smallest set of candidate tokens whose cumulative probability exceeds p. Providers generally advise tuning one or the other, not both at once — see the OpenAI API reference.

What is grounding and how does it reduce hallucination?

Grounding ties a model's output to verifiable source material rather than its internal memory, so claims can be traced and trusted. RAG — retrieving relevant passages and placing them in the context window — is the most common grounding technique, and citing those passages is grounding in action. It's the primary defense against hallucination on factual tasks, though a human should still verify the claims. See What Is RAG?.

When should I fine-tune instead of prompt?

Reach for fine-tuning only after exhausting prompt and context engineering. Fine-tuning further trains a model on your own labeled examples to specialize its behavior, which is powerful but heavy — it needs data, cost, and re-training when requirements change. Prompting and retrieval steer behavior at inference time with no training, so they're the cheaper, faster levers to try first.

Know the terms, then build the prompt.

Turn these techniques into ready-to-run prompts with 40+ free tools from Digital Dashboard Hub — no signup.

Browse all prompt tools →