Skip to content
LLM agents · Production patterns · Pattern-mismatch failures

The 7 Canonical LLM Agent Design Patterns and When Each One Wins

Most agent failures aren't model failures — they're pattern mismatches. The team built single-shot when the workload needed ReAct, or built Plan-Execute when Routing would have shipped in one day. Here are the 7 patterns and the decision tree.

By Andy Gaber, Founder, Digital Dashboard HubUpdated

If you've shipped LLM agents to production, you've probably seen one of two failure patterns: an agent that loops forever consuming tokens without finishing, or an agent that completes its task but produces output you can't trust. Both are symptoms of pattern mismatch — the underlying workload needed a different agent architecture than the one your team built. The 7 canonical patterns documented in agent-design literature each fit a specific workload shape; picking the right one is more decisive than picking the right model or tuning the right prompts.

Sources for this taxonomy include Anthropic's 'Building effective agents' guide, Yao et al. 2023 'ReAct: Synergizing Reasoning and Acting in Language Models' (arXiv:2210.03629), Shinn et al. 2023 'Reflexion' (arXiv:2303.11366), and the LangChain Agents documentation. The patterns below appear in production-grade agent literature across multiple framework vendors; the names and exact boundaries vary slightly between sources but the underlying shapes are stable.

Below: each pattern's structure, the workload signature that picks it, cost-quality math, and the failure modes you should expect when forcing a pattern onto a workload that doesn't fit.

**Research + further reading:** Additional authoritative sources informing this guide: OpenAI at platform.openai.com, Google Gemini at ai.google.dev, LlamaIndex at docs.llamaindex.ai, Pinecone at pinecone.io, Weaviate at weaviate.io. Cross-reference these for broader context, peer-reviewed research, and ongoing developments in this domain.

7 agent patterns at a glance: when each wins

Feature
Single-shot
Tool Use
ReAct
Plan-Execute
Reflection
Routing
Multi-Agent
Model calls per query11-33-151 + N2-525-50
Per-task cost (relative)1-3×3-15×5-15×2-5×1.2×5-50×
Latency1-5s2-8s10-60s30s-5m10-60s<5sminutes-hours
Best forSimple, boundedExternal info neededMulti-step reasoningComplex sequencesQuality-criticalDiverse query typesTrue role specialization
Failure modeMisses info needsTool confusion at >15 toolsInfinite loopsPlans fail in realityDiminishing returns 2-3 itersMisclassificationOver-engineering

Patterns synthesized from [Anthropic's 'Building effective agents' guide](https://www.anthropic.com/research/building-effective-agents), [LangChain Agents documentation](https://python.langchain.com/docs/concepts/agents/), [Yao et al. 2023 ReAct paper](https://arxiv.org/abs/2210.03629), and [Shinn et al. 2023 Reflexion paper](https://arxiv.org/abs/2303.11366). Names and exact boundaries vary slightly across sources; the underlying shapes are stable. Further reading: [OpenAI at platform.openai.com](https://platform.openai.com/docs/guides/prompt-engineering), [Google Gemini at ai.google.dev](https://ai.google.dev/), [LlamaIndex at docs.llamaindex.ai](https://docs.llamaindex.ai/).

Pattern 1 — Single-shot (the simplest baseline)

**Structure:** One LLM call. Input goes in, output comes out. No tools, no iteration, no decomposition. The 'agent' is really just structured prompting.

**Workload signature:** Task fully specified in the input, output bounded and verifiable, no external information needed beyond what's in the prompt + model knowledge. Examples: text summarization, paraphrasing, classification with stable label set, simple Q&A from supplied context.

**Cost:** 1 model call. Cheapest and fastest of all patterns. Latency: single-call latency (typically 1-5 seconds depending on model + output length).

**Failure mode:** Use single-shot when the workload genuinely needs information not in the prompt (then it hallucinates) or when the workload requires multi-step reasoning (then it produces shallow output).


Pattern 2 — Tool Use / Function Calling

**Structure:** LLM call with a defined tool schema. Model decides whether to invoke tools, parses tool outputs into prompt context, generates final response. Single iteration through the loop or capped iterations.

**Workload signature:** Task requires external information (database lookup, API call, calculation), tool surface is small (<10 tools), tools are simple (deterministic, predictable).

**Cost:** 1-3 model calls typically (initial + tool result + final response). Latency: 2-8 seconds typical.

**Failure mode:** Tool surface too large (>15 tools) — model gets confused about which to call. Tools too complex (chained dependencies) — needs Plan-Execute instead. See Anthropic's tool use documentation for boundaries.


Pattern 3 — ReAct (Reasoning + Acting interleaved)

**Structure:** Model alternates between 'Thought' (reasoning about what to do next) and 'Action' (invoking a tool) in a loop until task completion. Each iteration: reason, act, observe result, reason about result, act again.

**Workload signature:** Multi-step reasoning required where each step depends on the previous step's result. Tools available but not all are needed for every query. Tasks like research, data analysis, multi-step computation.

**Cost:** 3-15 model calls depending on task complexity. Latency: 10-60 seconds. Per-task cost can be 3-15× single-shot.

**Failure mode:** Loops forever if no termination condition is well-defined. Wastes tokens when single-shot would have sufficed. Original paper: Yao et al. 2023, arXiv:2210.03629.


Pattern 4 — Plan-and-Execute

**Structure:** Two-stage: a planner LLM call creates a multi-step plan, then an executor model executes each step (often using Tool Use pattern). Planner sees the big picture; executor focuses on individual steps.

**Workload signature:** Complex multi-step tasks where the sequence of steps matters and benefits from upfront planning. Tasks where executor capability differs from planner capability (e.g., planner is frontier model, executor is cheaper model).

**Cost:** 1 planner call + N executor calls (N = plan length, typically 5-15). Latency: 30 seconds to several minutes. Cost varies based on plan length.

**Failure mode:** Plans that don't survive contact with reality — executor encounters surprises that weren't anticipated. Requires re-planning logic if plans frequently fail. See Anthropic's orchestrator-worker pattern documentation.


Pattern 5 — Reflection / Self-Critique

**Structure:** Initial generation followed by a critique pass that evaluates the output against criteria, then a revision pass that improves based on the critique. Can iterate 2-5 times.

**Workload signature:** Quality matters more than cost or speed. Output has subjective dimensions (writing quality, code elegance, argument strength) that benefit from critique. Tasks where single-shot output is good-but-not-great and you want great.

**Cost:** 2-5× single-shot. Latency: 3-15× single-shot. Quality typically improves 20-40% on subjective rubrics.

**Failure mode:** Marginal improvement after 2-3 iterations (diminishing returns). Critique can produce false-positive flaws (model criticizes things that were actually fine). See Shinn et al. 2023 'Reflexion' paper for the academic framework.


Pattern 6 — Routing / Dispatch

**Structure:** A small classification model (or cheap LLM call) routes incoming requests to specialized handlers — different prompts, different models, different downstream pipelines. The router is fast; the handlers are deep.

**Workload signature:** Diverse inbound query types where different queries benefit from different processing. Examples: support ticket triage (billing → handler A, technical → handler B, refund → handler C), content classification before specialized generation.

**Cost:** 1 routing call + 1 handler call typically. Latency: small overhead from router (typically <500ms). Cost dominated by the handler tier chosen.

**Failure mode:** Router misclassifies edge cases, sending them to wrong handlers. Mitigation: log misclassifications, retrain router with corrected examples. Anthropic's routing pattern documentation covers boundaries.


Pattern 7 — Multi-Agent Collaboration

**Structure:** Multiple agents with different roles (researcher, critic, planner, executor) collaborate on a shared task via structured handoffs. Each agent has its own prompt, tools, and sometimes its own model.

**Workload signature:** Tasks that benefit from genuine specialization OR adversarial evaluation. Examples: deep research (researcher + skeptic + synthesizer), code generation with built-in code review (writer + reviewer + tester).

**Cost:** 5-50× single-shot depending on number of agents and rounds. Latency: minutes to hours. Highest cost of all patterns.

**Failure mode:** Pattern abuse — using multi-agent when single-shot + reflection would suffice. Most multi-agent systems in production should have been simpler patterns. Per Anthropic's research, the multi-agent overhead is justified only when the task genuinely requires multiple distinct cognitive roles.

Pattern picked by trend ('we need an agent'): team builds Multi-Agent or Plan-Execute for workloads that would have shipped in single-shot. Months of engineering, expensive token bills, marginal quality lift.
Pattern picked by workload signature: single-shot for simple, Tool Use for external info, ReAct for multi-step reasoning, Routing for diverse inputs, Plan-Execute for complex sequences, Reflection for quality-critical, Multi-Agent only for true role-specialization. Ships faster, costs less.

Pick the right pattern for your workload (4 steps)

  1. 1

    Describe the workload in 1-2 sentences (precisely)

    What's the input? What's the output? Does the output require information not in the prompt? Does it require multiple steps? Are the steps' results dependencies on each other? Is the input shape variable enough to need routing? Most teams skip this and pick a pattern from fashion, not workload.

    → Open the Code Prompt Builder
  2. 2

    Apply the decision tree (single-shot → tool use → ReAct → Plan-Execute → Multi-Agent)

    Start at single-shot. Add complexity only when the workload demands it. Most production agents are over-engineered; the right pattern is usually 1-2 patterns simpler than the team's initial impulse. Per Anthropic's agent guide, simpler patterns win for ~70% of production agent workloads.

  3. 3

    Prototype the chosen pattern with 50 representative inputs

    Don't ship to production from a 3-example demo. Run 50 representative inputs through the pattern; score output quality + measure cost + measure latency. If the pattern doesn't hit your quality bar at the cost you can sustain, the pattern is wrong (or the model is wrong, or both).

  4. 4

    Production deploy with logging + cost monitoring

    Log every model call, tool call, and pattern transition. Monitor per-query cost daily. Most patterns that pass prototyping fail in production due to long-tail edge cases that increase per-query cost. Logging is what surfaces these; without it, you get a $30K monthly LLM bill before noticing.

Where to start when building an agent

If your workload is fully specified by the input: Start with single-shot. Add tool use only if external info is required. Most agent workloads are over-engineered; single-shot handles more than teams assume.

If your workload needs external information: Tool Use (function calling) is the right baseline. Add ReAct only if multiple steps depend on each other. The boundary between Tool Use and ReAct is iteration count — Tool Use stays at 1-3 calls; ReAct loops.

If output quality matters more than cost: Reflection / self-critique is the highest-quality-lift pattern at moderate cost (2-5× single-shot). Don't reach for Multi-Agent unless Reflection isn't enough.

If you're considering Multi-Agent: Verify your workload genuinely needs multiple distinct cognitive roles. Most teams reach for Multi-Agent when simpler patterns would suffice. Per Anthropic's research, Multi-Agent overhead is justified only for genuine role specialization.

Frequently Asked Questions

What are the canonical LLM agent design patterns?

Seven patterns appear consistently in agent literature: Single-shot, Tool Use (function calling), ReAct (reasoning + acting), Plan-and-Execute, Reflection / self-critique, Routing / dispatch, and Multi-Agent collaboration. Each fits a specific workload shape; picking the right one is more decisive than picking the right model. Sources: Anthropic's agent guide, Yao et al. 2023 ReAct, Shinn et al. 2023 Reflexion.

When should I use ReAct vs. Plan-and-Execute?

ReAct when each step depends on the previous step's result and you can't plan upfront (e.g., interactive research where each tool call's output determines the next call). Plan-and-Execute when the task can be decomposed upfront into a sequence of steps that mostly hold (e.g., complex tasks with predictable structure). ReAct is more flexible but harder to budget cost-wise; Plan-Execute is more predictable but breaks when reality diverges from the plan. Per Anthropic's documentation, the boundary is whether you can predict the sequence of operations.

Why is Multi-Agent often the wrong choice?

Multi-Agent has 5-50× the cost of single-shot and minutes-to-hours of latency. The pattern is justified only when the task genuinely needs multiple distinct cognitive roles (researcher + critic + synthesizer where each role brings different capability). Most teams reach for Multi-Agent for tasks that Single-shot + Reflection would handle at 1/10 the cost. Per Anthropic's agent research, simpler patterns win for ~70% of production agent workloads.

What's the cost difference between single-shot and ReAct?

ReAct typically uses 3-15× the model calls of single-shot per task, depending on task complexity and average loop length. For high-volume workloads, this means 3-15× the per-query cost. The quality lift is substantial for multi-step reasoning tasks but minimal for tasks single-shot could handle. The economics make sense for high-stakes individual queries and break down for high-volume low-stakes workloads.

How do I prevent ReAct agents from looping forever?

Cap maximum iterations (typically 5-15) and define explicit termination conditions in the system prompt. Per Yao et al. 2023's ReAct paper, well-designed ReAct agents include a 'final answer' state that the model can transition to. Without explicit termination logic, ReAct can loop indefinitely on ambiguous tasks. Most production ReAct implementations use a combination of iteration caps + explicit final-answer prompting + cost monitoring to abort runaway tasks.

When does Routing pay off?

When you have diverse inbound query types that benefit from specialized handling. Examples: customer-support triage (billing → handler A, technical → handler B), content classification before generation (blog post vs. product description vs. tweet). The router is cheap (typically a classification model or small LLM call); the handlers can be expensive frontier models. Per Anthropic's routing pattern docs, Routing typically pays for itself once you have 3+ distinct query types with meaningfully different handling needs.

Can I combine multiple patterns in one agent?

Yes, and most production agents do. Common combinations: Routing → ReAct (route by query type, then use ReAct for the multi-step queries), Plan-Execute with Tool Use inside each step, Multi-Agent where each agent uses ReAct internally. The patterns aren't mutually exclusive; they describe distinct architectural elements that compose. The skill is using the simplest combination that fits the workload — over-engineering is the dominant failure mode.

Pick the right agent pattern before writing any code.

The ChatGPT Prompt Generator and Code Prompt Builder help structure the workload description that determines pattern choice. Free, no signup. Part of 40+ free prompt tools.

Browse all prompt tools →