By The DDH Team · Digital Dashboard Hub

Multi-Agent vs Single-Agent: When to Fan Out and When to Stay Simple

Before you can answer the multi-agent question, you need to answer the architecture question — and that starts with understanding your agent framework options. LangGraph, CrewAI, AutoGen, and Pydantic AI make different tradeoffs around how easily they implement multi-agent patterns. LangGraph's graph model with fan-out nodes is purpose-built for multi-agent orchestration. CrewAI's Crew abstraction makes role-based multi-agent the default pattern. Pydantic AI is single-agent-first. Knowing which framework fits your team also shapes which architecture is easiest to implement and maintain.

The two legitimate reasons to fan out to multiple agents are: (1) true task parallelism — independent subtasks that can run simultaneously, reducing total wall-clock time — and (2) context isolation — each agent starts with a clean context window, avoiding the token cost and quality degradation of long accumulated context. If neither benefit applies to your task, single-agent is almost always the right choice. The coordination overhead alone (10-30% additional tokens for orchestration messages and summaries) is not worth it when you get neither parallelism nor isolation in return.

This guide covers the cost math, failure-mode analysis, and decision framework for choosing between architectures. For the framework-specific implementation details once you've made the decision, see the agent framework decision matrix and the RAG vs agent architecture guide. For running cost estimates on your specific workload, the Claude API cost calculator will model your exact token counts against current pricing.

Multi-agent vs single-agent — when each wins

| Dimension | Single agent | Multi-agent |
| --- | --- | --- |
| Context requirements | Fits in one context window | Exceeds context window or needs isolation |
| Task parallelization | Sequential steps only | Truly parallel subtasks (independent) |
| Specialization need | Generalist task | Different experts needed per domain |
| Failure isolation | One failure = full retry | Agents fail independently (partial recovery) |
| Cost per task (simple) | Lower (no overhead) | Higher (orchestration cost) |
| Cost per task (complex) | Higher (long context) | Lower (parallel + isolated contexts) |
| Coordination overhead | None | Non-trivial (10-30% token overhead) |
| Debugging difficulty | Easier (single trace) | Harder (distributed trace) |
| Latency (parallelizable tasks) | Serial (slower) | Parallel (faster) |
| Latency (sequential tasks) | Fast | Slower (handoff overhead) |
| Memory management | Single context | Each agent has own context (can share) |
| Suitable frameworks | Pydantic AI, raw tool use | Any multi-agent framework (LangGraph, CrewAI, AutoGen) |
| Anthropic use case | Most tool-use agents | Research system per Anthropic engineering blog |

Sources, fetched 2026-06-21: https://www.anthropic.com/engineering/built-multi-agent-research-system, https://langchain-ai.github.io/langgraph/concepts/multi_agent/, https://docs.anthropic.com/en/docs/about-claude/pricing

The core question: does your task benefit from parallelization or specialization?

The multi-agent pattern earns its overhead in exactly two circumstances: when your task has truly independent subtasks that can run in parallel (reducing wall-clock time in proportion to the degree of parallelism), and when your task benefits from context isolation (each agent starts fresh, avoiding the quality degradation and increased token cost of long accumulated context). Every other argument for multi-agent ('it's more modular,' 'each agent has a clear role,' 'it's more like how teams work') is aesthetic, not economic.

**Parallelism requires genuine independence.** Two subtasks are truly independent if the output of one is not an input to the other. Searching for news about five different companies simultaneously is parallel — each search doesn't depend on the others. Searching for news, then summarizing findings, then writing a report is sequential — each step depends on the prior one. If your DAG (directed acyclic graph) of tasks is a straight line, multi-agent gives you coordination overhead with zero parallelism benefit.

Context isolation benefits accrue when your task has enough sequential steps that the accumulated input context becomes expensive or quality-degrading. **A common threshold: roughly 15-25 sequential steps**, beyond which a single agent's growing context costs more per turn (in tokens and quality) than the coordination overhead of splitting the task across agents with clean contexts. Below that threshold, single agent wins on simplicity every time.

The specialization argument is valid only when genuinely different capabilities are required. If step 3 of your task requires adversarial fact-checking that must be independent of step 1's generative work, a separate fact-checker agent with a clean context provides real quality value. If steps 1 through 5 all require the same kind of reasoning (summarization, extraction, Q&A), separate agents just multiply your API calls without adding capability.

When neither parallelism nor specialization applies, the multi-agent pattern is pure overhead: 10-30% more tokens spent on orchestration messages, summaries, and handoffs; a more complex observability setup; harder debugging; and more failure modes to manage. The simplest architecture that meets your requirements is almost always the right architecture. Start with single agent and refactor to multi-agent when you have concrete evidence of a bottleneck that multi-agent solves.


Anthropic's multi-agent research system: what they built and why

Anthropic's engineering team published a detailed account of building their multi-agent research system at https://www.anthropic.com/engineering/built-multi-agent-research-system — and it's the best publicly available case study for understanding when and why to fan out. Their system was designed to answer complex research questions that require synthesizing information from dozens of sources, a task that quickly overwhelms a single agent's context window.

**The architecture they chose is the canonical multi-agent pattern**: an orchestrator agent that receives the research question, decomposes it into subtopics, and fans out to specialized subagents — web search agents, document analysis agents, and synthesis agents. Each subagent receives a clean context containing only its specific subtask, not the full accumulated context of the research question. The subagents return summaries, not full outputs, to the orchestrator — this is a critical design choice that keeps the orchestrator's context manageable.

The reason Anthropic chose multi-agent for this specific use case is instructive. A 50-source research task would require a single agent to process hundreds of thousands of tokens of source material sequentially, accumulating context on every step. By the time the agent reached sources 40-50, it would be paying input token costs on the first 40 sources repeatedly, and the quality of synthesis would degrade as the context grew unwieldy. **Multi-agent with context isolation broke this cost and quality ceiling.**

Their orchestrator doesn't need to be the most expensive model. The orchestrator's job is routing and aggregation — it reads the original question, writes subtask instructions, and aggregates summaries. A well-designed orchestrator prompt on Sonnet 4.6 can handle this at $3/M input vs $15/M for Opus 4.7. The specialized subagents that actually read source documents benefit most from quality (Opus 4.7's reading comprehension edge shows up here) but only need to process their specific subsection of the task.

SWE-bench agents — the most production-relevant coding benchmark at https://swe-bench.github.io/ — use similar patterns. The top performers on SWE-bench Verified use a state-machine architecture (observe → plan → code → test → verify) that loops until a terminal condition is reached. This is structurally multi-step but not necessarily multi-agent; the distinction matters. Many high-performing SWE-bench solutions are single agents with LangGraph-style state machines, not multi-agent orchestrations. The key is the stateful loop, not the number of agent instances.

The lesson from Anthropic's system is not 'always use multi-agent for research.' It's more precise: **when your task requires processing more source material than fits cleanly in one context window, AND when the source material can be meaningfully divided into independent subtasks, AND when partial results (summaries per source) have value to the orchestrator — fan out.** If your research task can be answered by a single web search + synthesis, a single agent is faster and cheaper.


Cost math: when multi-agent is cheaper, when it's not

The cost comparison between single-agent and multi-agent is a function of one key variable: how quickly does the single agent's accumulated context cost exceed the multi-agent coordination overhead? Let's work through a concrete example using Claude Sonnet 4.6 at $3/M input tokens, $15/M output tokens as of June 2026 (see https://docs.anthropic.com/en/docs/about-claude/pricing).

**Single agent, 50-step research task scenario.** Each step generates roughly 500 tokens of output that becomes part of the next step's input context, on top of a fixed 2,000-token system prompt. After step 10 the context is 2,000 (system) + 10 × 500 (accumulated) = 7,000 tokens. After step 30: 2,000 + 30 × 500 = 17,000 tokens. After step 50: 2,000 + 50 × 500 = 27,000 tokens. Step k reads the system prompt plus the k − 1 prior outputs, so total input tokens across 50 steps = 2,000 × 50 + 500 × (0+1+2+...+49) = 100,000 + 612,500 = 712,500 tokens. At $3/M: **$2.14 in input costs alone.**

**Multi-agent, same 50-step task divided into 5 parallel agents of 10 steps each.** Each agent sees a fixed 2,000-token system prompt plus its own 10 steps of context (7,000 tokens after step 10). Total input tokens per agent: 2,000 × 10 + 500 × (0+1+...+9) = 20,000 + 22,500 = 42,500 tokens. Five agents: 212,500 tokens. Plus orchestration overhead: 5 task instructions (500 tokens each) + 5 summary responses (500 tokens each) = 5,000 tokens of orchestration. Total: ~217,500 tokens. At $3/M: **$0.65 in input costs.** Multi-agent is ~3.3× cheaper on this task.

Now run the same math on a 5-step task. Single agent: 2,000 × 5 + 500 × (0+1+2+3+4) = 10,000 + 5,000 = 15,000 tokens. Multi-agent with 5 parallel agents of 1 step each: 5 × 2,000 (system prompts) + 5 × 500 (outputs) + the same 5,000 tokens of orchestration overhead = 12,500 + 5,000 = 17,500 tokens. **Single agent is cheaper by about 14%.** For short tasks, the coordination overhead dominates and single agent wins.

**The crossover point is roughly 15-25 steps**, depending on your per-step output token count and system prompt size. Below the crossover: single agent. Above it: multi-agent with context isolation. This is a rough guide — run the calculation for your specific workload, factoring in your actual token counts. The Claude API cost calculator can model this precisely.
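To rerun this arithmetic with your own numbers, a small token model is enough. The sketch below assumes the same simplifications as the worked example (a fixed system prompt, a constant per-step output that is fully carried forward, input tokens only); the function names are illustrative, and the orchestration allowance is left as an explicit parameter so you can plug in whatever you measure.

```python
def single_agent_input_tokens(steps: int,
                              system_tokens: int = 2_000,
                              step_output_tokens: int = 500) -> int:
    """Total input tokens when step k re-reads the system prompt
    plus the outputs of the k - 1 steps before it."""
    return sum(system_tokens + step_output_tokens * prior
               for prior in range(steps))


def multi_agent_input_tokens(agents: int,
                             steps_per_agent: int,
                             orchestration_tokens: int,
                             system_tokens: int = 2_000,
                             step_output_tokens: int = 500) -> int:
    """Each subagent accumulates only its own steps; the orchestration
    allowance is an input you estimate or measure, not something derived."""
    per_agent = single_agent_input_tokens(
        steps_per_agent, system_tokens, step_output_tokens)
    return agents * per_agent + orchestration_tokens


def input_cost_usd(tokens: int, usd_per_million: float = 3.0) -> float:
    """Input-token cost at a flat per-million price."""
    return tokens * usd_per_million / 1_000_000


single = single_agent_input_tokens(50)
subagents_only = multi_agent_input_tokens(
    agents=5, steps_per_agent=10, orchestration_tokens=0)
print(single, round(input_cost_usd(single), 2))  # single-agent total and cost
print(subagents_only)                            # subagent total before overhead
```

Sweeping `steps` and `orchestration_tokens` over the ranges you actually observe shows where the two curves cross for your workload.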

A critical nuance: the cost comparison above only covers input tokens. Output tokens cost 5× more per million ($15/M vs $3/M on Sonnet 4.6). If your multi-agent system requires the orchestrator to regenerate information that was already generated by a subagent (rather than just passing summaries), you're paying output token rates on that regeneration. Keep orchestrator outputs thin — it should aggregate, not regenerate.


Coordination overhead: how much extra it costs

Coordination overhead is the token cost of running a multi-agent system that wouldn't exist in a single-agent architecture. It has three components: orchestrator messages (instructions from orchestrator to subagents), subagent responses to the orchestrator (summaries, status updates, results), and any shared memory or blackboard reads/writes that require LLM calls. **The empirical range is 10-30% additional tokens** over the equivalent single-agent implementation.

LangGraph's coordination overhead is at the low end when well-designed. Graph traversal itself doesn't add tokens — the framework's state management is in-memory, not in-context. The overhead comes from what you put in your state object. If you accumulate every subagent's full output in the shared state (a common mistake), you're paying for all of it on every subsequent node's input. If you accumulate only summaries and key outputs, LangGraph overhead can be under 10%.

CrewAI's coordination overhead is higher by design. Each agent-to-agent handoff includes the task instruction (roughly 200-500 tokens) plus the prior agent's output as context. In a 5-agent sequential Crew where each agent outputs 800 tokens, the accumulated handoff overhead by agent 5 is roughly 4 × 800 = 3,200 tokens of prior-agent context plus 5 × 400 = 2,000 tokens of task instructions = 5,200 tokens of pure coordination overhead. At $3/M on Sonnet 4.6, that's $0.016 per Crew run in coordination cost alone. For 10,000 Crew runs/day: **$160/day in overhead**.

**Raw orchestration with direct API calls has the lowest possible overhead.** If you build your multi-agent system as a Python script that calls the Anthropic or OpenAI API directly (no framework), you control exactly what goes into each agent's context and there's no framework-generated preamble. The coordination overhead is limited to whatever your orchestrator's routing logic explicitly sends — typically 100-300 tokens per handoff. For cost-sensitive production systems at scale, this lean approach can save 15-20% vs framework-based orchestration.

The practical recommendation: design your orchestrator prompts with token discipline. Every extra token in an orchestrator instruction gets paid N times (once per subagent). Every extra token in a subagent summary gets paid once by the orchestrator and potentially once by each subsequent subagent if it's included in shared context. **Tight prompt discipline in multi-agent systems is worth 3-5× more than in single-agent systems** because overhead multiplies across agents.

A useful heuristic: measure your coordination overhead at baseline by running your multi-agent system on 10 representative tasks and calculating the ratio of orchestration tokens (tokens in messages between agents) to task tokens (tokens in actual task inputs and outputs). If the ratio is above 25%, your coordination overhead is too high and warrants prompt trimming.
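That ratio is easy to compute once you log token counts per run. This is a minimal sketch that assumes you already separate orchestration tokens from task tokens in your own logging; the `RunTokens` record and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RunTokens:
    orchestration: int  # tokens in messages between agents
    task: int           # tokens in actual task inputs and outputs


def overhead_ratio(runs: list) -> float:
    """Orchestration tokens divided by task tokens, pooled across runs."""
    total_task = sum(run.task for run in runs)
    if total_task == 0:
        raise ValueError("no task tokens recorded")
    return sum(run.orchestration for run in runs) / total_task


def needs_trimming(runs: list, threshold: float = 0.25) -> bool:
    """True when the pooled ratio is above the 25% rule of thumb."""
    return overhead_ratio(runs) > threshold


sample = [RunTokens(orchestration=2_000, task=10_000),
          RunTokens(orchestration=4_000, task=10_000)]
print(f"{overhead_ratio(sample):.0%}", needs_trimming(sample))
```

Pooling the tokens before dividing keeps one unusually small task from skewing the ratio.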


Failure modes: how single vs multi-agent fails differently

Single-agent failure modes are simpler but deeper. When a single agent fails, the entire task fails — there is no partial recovery. The most common single-agent failures: context overflow (task requires more tokens than the context window, causing truncation or refusal), quality degradation late in context (model attention mechanisms perform worse on very long contexts — a known limitation of transformer-based models at 100k+ token contexts), tool call failure cascades (a failed tool call mid-task corrupts the agent's internal state), and stuck loops (the agent calls the same tool repeatedly without progress). All of these require a full task restart.

**Multi-agent failure modes are more numerous but shallower.** Individual agent failures are recoverable if the orchestrator is designed to handle them — a subagent that fails can be retried or bypassed without restarting the entire task. The orchestrator receives a failure signal instead of a summary and can route around it. This is a meaningful advantage for long-horizon tasks where a full restart is expensive.
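One way to get that failure signal in plain Python is `asyncio.gather` with `return_exceptions=True`, which hands exceptions back alongside successful results instead of aborting the whole batch. The sketch below is an assumption about how you might wire it, not any framework's API; `run_subagent` is a hypothetical stand-in that fails on purpose for sources whose name starts with "bad".

```python
import asyncio


async def run_subagent(source: str) -> str:
    # Hypothetical stand-in for a real subagent call.
    if source.startswith("bad"):
        raise RuntimeError(f"subagent failed on {source}")
    await asyncio.sleep(0)
    return f"summary of {source}"


async def fan_out_with_partial_recovery(sources: list) -> tuple:
    """Run all subagents; collect summaries and the sources that failed."""
    results = await asyncio.gather(
        *(run_subagent(source) for source in sources),
        return_exceptions=True,  # failures come back as values, not raises
    )
    summaries, failed = [], []
    for source, result in zip(sources, results):
        if isinstance(result, Exception):
            failed.append(source)  # candidate for retry or bypass
        else:
            summaries.append(result)
    return summaries, failed


summaries, failed = asyncio.run(
    fan_out_with_partial_recovery(["report-a", "bad-link", "report-c"]))
print(summaries, failed)
```

The orchestrator can then retry only the entries in `failed`, or synthesize from `summaries` alone if the partial result is good enough.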

The most dangerous multi-agent failure mode is the **agent loop** — agents calling each other in a cycle without a termination condition. Agent A calls agent B, which calls agent C, which calls agent A again. Without a max_iterations cap and a fallback completion condition, this burns through tokens and budget until hitting a rate limit or exhausting your API budget. This failure mode doesn't exist in single-agent systems. Always define max_iterations at the orchestrator level and implement a budget check before each subagent spawn.
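A guard like this can be very small. The sketch below is one possible shape, with a hypothetical `SpawnGuard` class and a hard-coded A → B → C → A handoff cycle to show the loop being cut off; the token estimate per call is an assumed figure you would replace with your own.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a spawn would break the iteration or token cap."""


class SpawnGuard:
    def __init__(self, max_iterations: int, max_tokens: int):
        self.max_iterations = max_iterations
        self.max_tokens = max_tokens
        self.iterations = 0
        self.tokens_spent = 0

    def check(self, estimated_tokens: int) -> None:
        """Call before every subagent spawn; raises instead of spawning."""
        if self.iterations >= self.max_iterations:
            raise BudgetExceeded("max_iterations reached")
        if self.tokens_spent + estimated_tokens > self.max_tokens:
            raise BudgetExceeded("token budget exhausted")
        self.iterations += 1
        self.tokens_spent += estimated_tokens


# A cyclic handoff with no natural termination condition.
HANDOFFS = {"A": "B", "B": "C", "C": "A"}


def run_cycle(guard: SpawnGuard, start: str = "A",
              tokens_per_call: int = 1_000) -> list:
    """Follow the handoffs until the guard stops the loop."""
    visited, agent = [], start
    try:
        while True:
            guard.check(tokens_per_call)
            visited.append(agent)
            agent = HANDOFFS[agent]
    except BudgetExceeded:
        return visited  # fallback completion: return what we have


print(run_cycle(SpawnGuard(max_iterations=5, max_tokens=100_000)))
```

Keeping both caps matters: the iteration cap catches tight loops of cheap calls, while the token cap catches a few very expensive ones.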

Coordination message misinterpretation is another multi-agent-specific failure. When the orchestrator writes a task instruction that the subagent misinterprets, the subagent completes the wrong task and returns a plausible-looking wrong answer. The orchestrator, not having context on whether the task was interpreted correctly, incorporates the wrong answer into its synthesis. This failure mode is hard to detect without LLM-as-judge evaluation on subagent outputs. In single-agent systems, the same LLM that made the interpretation error also writes the final output — the error is more visible.

**Which failure mode is worse depends on whether partial results have value.** For a 50-source research task, partial results (40 of 50 sources successfully analyzed) have significant value — you can synthesize from 40 sources even if 10 failed. Multi-agent is the better failure model here. For a code generation task where the code is only useful if the entire function works correctly, partial results (parts of the function generated by different agents) may not compose correctly. Single-agent's simpler 'all or nothing' failure model may be safer.

The cost of a failure also differs. A single-agent failure means restarting a task that may have consumed many tokens. A multi-agent failure may be partially recoverable, but the orchestration overhead means you've already spent tokens on coordination even before the failure occurred. For tasks with high failure rates, single-agent's lower overhead per attempt can make it cheaper in expectation, even if multi-agent's parallelism advantage would benefit successful runs.


Parallelization: when wall-clock time is the metric that matters

The cleanest argument for multi-agent is latency reduction via parallelization. **Truly independent subtasks run simultaneously instead of serially, so the wall-clock time is the duration of the slowest subtask, not the sum of all subtasks.** For interactive applications where response latency is user-visible, this can be the difference between a good product and an unusable one.

The math is straightforward: 10 serial LLM calls at 3 seconds each = 30 seconds total wall-clock time. Split the same 10 calls across 5 parallel agents that each make 2 serial calls at 3 seconds each, and the total wall-clock time is 6 seconds: a 5× latency reduction. For a research task that genuinely has independent subtasks, this is compelling. For an agentic customer support system where the customer is waiting, the difference between 30 seconds and 6 seconds is the difference between unusable and acceptable.
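The same arithmetic can be reproduced with simulated calls. This sketch uses plain `asyncio` with `asyncio.sleep` standing in for model latency (scaled down to 0.02 seconds so it runs quickly); `fake_llm_call` and the other names are placeholders, and real calls would add network variance that the simulation ignores.

```python
import asyncio
import time


async def fake_llm_call(call_id: int, seconds: float) -> int:
    # Stand-in for a real model request; only the waiting time is simulated.
    await asyncio.sleep(seconds)
    return call_id


async def run_agent(call_ids, seconds: float) -> list:
    # One agent makes its calls one after another.
    return [await fake_llm_call(call_id, seconds) for call_id in call_ids]


async def fan_out(n_agents: int, calls_per_agent: int, seconds: float) -> list:
    # Split the calls into one batch per agent and run the agents concurrently.
    batches = [
        range(agent * calls_per_agent, (agent + 1) * calls_per_agent)
        for agent in range(n_agents)
    ]
    per_agent = await asyncio.gather(
        *(run_agent(batch, seconds) for batch in batches))
    return [call_id for batch in per_agent for call_id in batch]


def timed(coro):
    """Run a coroutine to completion and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


serial_result, serial_seconds = timed(run_agent(range(10), 0.02))
parallel_result, parallel_seconds = timed(fan_out(5, 2, 0.02))
print(f"serial {serial_seconds:.2f}s, fanned out {parallel_seconds:.2f}s")
```

Both paths return the same ten results; only the wall-clock time differs.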

**LangGraph's fan-out/fan-in pattern is the cleanest implementation of parallel multi-agent execution.** A fan-out node spawns N parallel subgraph executions; a fan-in node waits for all N to complete and aggregates their results. LangGraph handles the async coordination automatically via its async executor — you just define the graph topology and the framework handles parallel execution. See the LangGraph multi-agent concepts documentation at https://langchain-ai.github.io/langgraph/concepts/multi_agent/ for implementation details.

Anthropic's research system uses this fan-out/fan-in pattern for exactly the parallelization benefit: the orchestrator decomposes a research question into N independent subtopics, fans out to N search agents simultaneously, waits for all results, then synthesizes. The wall-clock time is bounded by the slowest single search agent plus the orchestrator's synthesis time, not the sum of all search times. At 5 parallel agents, the search phase is up to ~5× faster than sequential execution; the synthesis step still runs serially.

**The prerequisite for parallelization is genuine independence.** Before using multi-agent for latency reduction, explicitly verify that your subtasks don't have hidden dependencies. If subtask 3 uses the output of subtask 2 as a filter, they're not independent — parallelizing them requires subtask 3 to run after subtask 2 completes, eliminating the latency benefit. Map your task's full dependency graph before parallelizing.
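Mapping the dependency graph can be done mechanically. The sketch below groups tasks into layers so that everything in one layer could run at the same time; the `deps` dictionary format (task name mapped to the set of tasks it waits on) and the task names are illustrative assumptions.

```python
def parallel_layers(deps: dict) -> list:
    """Group tasks into layers; a task lands in the first layer where
    all of its dependencies have already completed."""
    remaining = {task: set(needs) for task, needs in deps.items()}
    layers = []
    while remaining:
        ready = sorted(task for task, needs in remaining.items() if not needs)
        if not ready:
            raise ValueError("cycle or unknown dependency in task graph")
        layers.append(ready)
        for task in ready:
            del remaining[task]
        for needs in remaining.values():
            needs.difference_update(ready)
    return layers


# A straight line: nothing to parallelize.
pipeline = {"search": set(), "summarize": {"search"}, "report": {"summarize"}}
# A wide layer: three independent searches feeding one synthesis step.
research = {"news_a": set(), "news_b": set(), "news_c": set(),
            "synthesize": {"news_a", "news_b", "news_c"}}

print(parallel_layers(pipeline))
print(parallel_layers(research))
```

If every layer has width 1, the graph is a straight line and fanning out buys no latency. The widest layer is the most parallelism you can use.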

A practical note on rate limits: parallel agent execution amplifies your API request rate proportionally. Five parallel agents make 5× as many concurrent requests as a single agent. If you're on a lower-tier API plan with strict requests-per-minute limits, parallel multi-agent execution can hit rate limits that single-agent execution avoids. Check your provider's rate limits before assuming you can fan out to N agents — you may need a higher-tier plan or a request queuing layer.
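A request queuing layer does not have to be elaborate. As a minimal sketch, an `asyncio.Semaphore` caps how many subagent calls are in flight at once. Note the assumption: this limits concurrency, not requests per minute, so a strict RPM limit would still need a real rate limiter. The `worker` argument is any async callable you supply.

```python
import asyncio


async def bounded_fan_out(tasks: list, worker, max_concurrent: int) -> list:
    """Run worker(task) for every task with at most max_concurrent in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(task):
        async with semaphore:  # wait for a free slot before calling out
            return await worker(task)

    # gather preserves input order regardless of completion order
    return await asyncio.gather(*(guarded(task) for task in tasks))


async def demo_worker(task: int) -> int:
    # Stand-in for a subagent call.
    await asyncio.sleep(0.01)
    return task * 2


print(asyncio.run(bounded_fan_out(list(range(6)), demo_worker, max_concurrent=2)))
```

Setting `max_concurrent` just below your plan's concurrency ceiling trades a little latency for fewer rejected requests.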


Context isolation: the underrated multi-agent benefit

The parallelization benefit of multi-agent is widely discussed. The context isolation benefit is less frequently cited but is often the more important one in practice. **Each agent in a multi-agent system starts with a clean context window containing only its specific task context.** No prior-turn contamination from other subtasks, no quality degradation from long accumulated context, and no token cost from irrelevant prior steps.

Context contamination is a real quality risk in single-agent systems. Research on long-context behavior in large language models shows that performance degrades as context grows, particularly for information buried in the middle of a very long context (the 'lost in the middle' effect). A single agent processing a 50-step task accumulates all prior steps in context, so by step 40 the model has to pick the relevant details out of dozens of earlier outputs, and findings from the early and middle steps are the ones most likely to be overlooked. Multi-agent with context isolation sidesteps most of this degradation, because each subagent works in a short context of its own.

**Context isolation also enables model specialization — one of the strongest cost optimization patterns for multi-agent systems.** Because each agent is independent, you can assign different models to different agents based on the capability requirements of each subtask. Use Claude Opus 4.7 ($15/$75 per M tokens) for the orchestrator and complex reasoning agents. Use Claude Sonnet 4.6 ($3/$15 per M tokens) for general research and extraction agents. Use Claude Haiku 4.5 ($0.80/$4.00 per M tokens) for classification, routing, and simple filtering agents. This model mixing is invisible to single-agent architectures but is a first-class capability in multi-agent systems.

The model specialization math is significant. Consider a 5-agent research system in which 1 orchestrator runs on Opus 4.7 (20% of tokens), 3 research agents run on Sonnet 4.6 (60% of tokens), and 1 classification agent runs on Haiku 4.5 (20% of tokens), compared against a single-agent implementation that runs 100% of tokens on one model. Blended input price per million tokens for the multi-agent system: (0.20 × $15) + (0.60 × $3) + (0.20 × $0.80) = $3 + $1.80 + $0.16 = $4.96/M. For single Opus 4.7: $15/M. For single Sonnet 4.6: $3/M. The multi-agent system's blended price is about a third of Opus's and roughly 65% above Sonnet's, while the quality is closer to Opus where it matters most.
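The blended figure is a weighted average, which makes it easy to recompute for a different mix. This sketch uses the input prices quoted above and short placeholder keys for the three models; the token shares are whatever you measure in your own system.

```python
def blended_price(mix: dict, price_per_million: dict) -> float:
    """Weighted average price per million tokens for a model mix.
    `mix` maps model name to its share of total tokens (shares sum to 1)."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("token shares must sum to 1")
    return sum(share * price_per_million[model]
               for model, share in mix.items())


# Input prices per million tokens as quoted in the text.
prices = {"opus": 15.0, "sonnet": 3.0, "haiku": 0.80}
mix = {"opus": 0.20, "sonnet": 0.60, "haiku": 0.20}

print(round(blended_price(mix, prices), 2))
```

Because output tokens are priced separately, a full estimate would run the same function twice, once with input prices and once with output prices.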

**Context isolation also enables adversarial quality patterns** that are impossible in single-agent systems. You can fan out the same task to two independent agents and compare their outputs — if they agree, high confidence; if they disagree, escalate to a third agent or human review. This adversarial pattern is used in Anthropic's research system for fact-checking and is structurally impossible when both 'agents' share a context (the second call sees the first call's reasoning and anchors to it).

One underappreciated use of context isolation: preventing inter-task contamination in high-throughput production systems. When a single agent processes multiple user requests sequentially in a session context (without clearing between requests), earlier users' data can contaminate later users' responses — a serious privacy and quality risk. Multi-agent with per-request context isolation eliminates this risk by construction.


The hybrid: orchestrator + specialized subagents (the Anthropic pattern)

The production-grade multi-agent pattern in 2026 is not a flat collection of equal-rank agents. It's a hierarchy: one orchestrator agent that understands the full task and decomposes it, plus N specialized subagents that each execute a well-defined subtask and return a summary. **The orchestrator doesn't need to be the most capable model — it needs to be a good router and synthesizer.** The subagents need to be good at their specific domain.

This orchestrator + subagent pattern is described explicitly in Anthropic's multi-agent research system blog post (https://www.anthropic.com/engineering/built-multi-agent-research-system) and is implemented cleanly in LangGraph's subgraph architecture. The orchestrator is a LangGraph graph node; each subagent is a subgraph that can be called in parallel. The orchestrator's state accumulates only summaries from subagents, not their full context — this is the key architectural decision that keeps the orchestrator's context bounded.

**Shared memory and blackboard patterns** extend the orchestrator model for cases where subagents need to share information dynamically. A shared blackboard (a structured data store accessible to all agents) lets subagent A write a finding that subagent B reads without requiring the orchestrator to relay the information. This reduces orchestrator message overhead and allows more fluid inter-agent communication. LangGraph implements this via shared state; CrewAI via knowledge sources. The tradeoff: shared mutable state is harder to reason about and debug than explicit orchestrator-mediated communication.

When to add more agents vs add more tools to one agent is a nuanced decision. The rule of thumb: add an agent when the new task requires a clean context (to avoid contamination or reduce context cost) or requires a different model (for cost or quality reasons). Add a tool when the new capability can be expressed as a deterministic function that doesn't require LLM reasoning. Web search = tool. Document summarization = agent. Database lookup = tool. Comparative analysis = agent. This distinction keeps your agent count meaningful rather than architectural busy-work.

**The anti-pattern to avoid:** creating one agent per business concept ('a marketing agent, a finance agent, a legal agent') without checking whether those agents ever actually need context isolation from each other. If your marketing agent always runs after your finance agent and uses its output as input, they're not independent — they should be sequential nodes in a single agent, not separate agents. The multi-agent overhead is only worth paying when the agents genuinely benefit from isolation or parallelism.

The practical endpoint for most production systems: start with a single agent. When you observe a specific bottleneck — context too long and getting expensive, one subtask blocking others that could run in parallel, quality degrading late in context — extract exactly that bottleneck into a subagent. Don't pre-architect a multi-agent system before you have evidence of the problem it's solving. The Anthropic engineering team didn't build a multi-agent system because multi-agent is architecturally elegant — they built it because they had a specific task (research synthesis at scale) where single-agent hit concrete limits.

Deciding between single and multi-agent for your use case

    Step 1: Map your task into a DAG

    Draw out your task as a directed acyclic graph: nodes are subtasks, edges represent dependencies. Identify which nodes have no incoming edges from other task nodes (they're independent and can run in parallel) and which nodes have one or more incoming edges (they depend on prior results and must wait for them). If your DAG is a straight line (every node depends on the previous one), multi-agent gives you coordination overhead with zero parallelism benefit, and single agent is the right choice. If your DAG has a wide flat layer of independent nodes (research on 10 topics simultaneously, analyzing 10 documents in parallel), multi-agent is the right architecture for that layer.

    Step 2: Estimate context accumulation cost

    Calculate how expensive your single-agent context will become. Multiply your average per-step output token count by the number of sequential steps, then sum the triangular accumulation (step N pays for all prior steps' outputs). Use current Claude Sonnet 4.6 pricing ($3/M input, $15/M output) from https://docs.anthropic.com/en/docs/about-claude/pricing. Compare this against the coordination overhead cost of a multi-agent split (roughly 5 orchestration messages × 400 tokens each × $3/M = $0.006 per orchestration event). If the accumulated context cost exceeds the coordination overhead — which typically happens around 15-25 sequential steps — multi-agent wins on cost.

    Step 3: Identify whether specialization matters

    Ask: does step 3 of your task require a fundamentally different capability than step 1? If your task alternates between creative generation and analytical fact-checking, a multi-agent setup with a separate fact-checker agent (running on a clean context, without seeing the generative agent's reasoning) captures a quality gain impossible in single-agent. If all your steps require the same kind of reasoning and the same model, separate agents just multiply your API calls without adding capability. Specialization as a multi-agent justification requires a concrete quality difference, not just conceptual cleanliness.

    Step 4: Design for failure isolation

    In a multi-agent system, define what a partial result looks like and whether it has value before you commit to the architecture. If a research task where 8 of 10 subagents succeed gives you 80% of the value (you can synthesize from 8 sources), multi-agent's partial recovery model is an advantage. If a code generation task where 3 of 4 subagents produce correct code fragments and 1 produces a wrong fragment gives you zero value (the code doesn't compile), then the partial recovery model provides no benefit and single-agent's simpler all-or-nothing retry is preferable. Match your failure isolation design to whether partial results are useful.

    Step 5: Start single, refactor to multi-agent on evidence

    Build the single-agent version first. Run it on representative tasks. Observe specifically where it fails or underperforms: Is context getting too long and expensive? Is a specific step blocking other steps that could run in parallel? Is quality degrading late in the context window? Is one type of subtask failing consistently while others succeed? Each of these failures points to a specific multi-agent refactor: context too long → fan out to context-isolated subagents; serial bottleneck → parallel fan-out for the independent steps; quality degradation → clean-context subagent for the problematic step. Refactor exactly the bottleneck, not the whole architecture.

Frequently Asked Questions

When should I use multi-agent instead of single-agent?

Use multi-agent when you have one or both of: (1) truly parallel subtasks — independent work that can run simultaneously, where the wall-clock time reduction justifies the coordination overhead, or (2) context accumulation cost — a task long enough that the single agent's growing context becomes more expensive than the multi-agent coordination overhead (roughly 15-25+ sequential steps). If neither condition applies, single agent is faster to build, cheaper to run, and easier to debug. The most common mistake is choosing multi-agent for conceptual elegance ('it's more modular') rather than a concrete performance or cost benefit.

How much overhead does multi-agent coordination add to my API bill?

Empirically, 10-30% additional tokens over an equivalent single-agent implementation, depending on your framework and how carefully you design your orchestration messages. LangGraph with well-scoped state can come in lower, at roughly 5-10%. CrewAI's inter-agent handoffs add roughly 200-500 tokens per agent transition plus accumulated context. Raw orchestration (no framework) can be as low as 100-300 tokens per handoff. At Claude Sonnet 4.6 pricing ($3/M input), 10,000 agent runs/day with 20% overhead costs roughly $60/day in pure coordination, which corresponds to a baseline of about 10,000 input tokens per run (2,000 overhead tokens × 10,000 runs = 20M tokens/day). That is not trivial at scale: keep your orchestration messages tight and return only summaries, not full outputs, to the orchestrator.
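To rerun that estimate with your own numbers, a minimal sketch follows. The function name is hypothetical, and the 10,000 input tokens per run is an assumption chosen because it is the baseline that reproduces the $60/day figure; only input tokens are counted.

```python
def daily_coordination_cost(
    runs_per_day: int,
    base_input_tokens_per_run: int,  # assumption: single-agent baseline per run
    overhead_fraction: float,        # e.g. 0.20 for 20% coordination overhead
    usd_per_million_input: float,
) -> float:
    """Estimated daily spend on coordination tokens alone (input side only)."""
    overhead_tokens = runs_per_day * base_input_tokens_per_run * overhead_fraction
    return overhead_tokens / 1_000_000 * usd_per_million_input


# 10,000 runs/day, 10,000 input tokens/run, 20% overhead, $3 per million input tokens
cost = daily_coordination_cost(10_000, 10_000, 0.20, 3.0)
print(f"${cost:.2f}/day")  # $60.00/day
```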

What did Anthropic build for their multi-agent research system?

An orchestrator agent that receives a research question, decomposes it into independent subtopics, and fans out to specialized subagents — web search agents, document analysis agents, and synthesis agents. Each subagent receives a clean context window containing only its specific subtask. Subagents return summaries (not full outputs) to the orchestrator, which aggregates them into a final synthesis. The orchestrator's context stays bounded because it accumulates summaries, not raw source material. The full engineering account is at https://www.anthropic.com/engineering/built-multi-agent-research-system — essential reading before building any multi-agent system.

Is multi-agent always more expensive than single-agent?

No — and this is a common misconception. For tasks with 15-25+ sequential steps, single-agent context accumulation (paying input token costs on all prior steps on every turn) typically costs more than multi-agent coordination overhead. The crossover depends on your per-step output token count and system prompt size. Below the crossover: single agent is cheaper. Above it: multi-agent with context isolation is cheaper. The cost advantage of multi-agent also compounds when you use model routing — cheaper models for simpler subagents, more expensive models only for the complex orchestrator and reasoning agents.
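A toy model makes the crossover easier to reason about. Everything in this sketch is an assumption for illustration: the function names, the token counts, and the simplifications (input tokens only, one subagent per step, a single final orchestrator call, no prompt caching). Substitute your own measurements before drawing conclusions.

```python
from typing import Optional


def single_agent_input_tokens(steps: int, system_tokens: int, step_output_tokens: int) -> int:
    # On turn k (0-based) the agent re-reads the system prompt plus the
    # outputs of all k earlier steps, so input cost grows quadratically.
    return sum(system_tokens + k * step_output_tokens for k in range(steps))


def multi_agent_input_tokens(
    steps: int, system_tokens: int, handoff_tokens: int, summary_tokens: int
) -> int:
    # Each subagent starts clean: system prompt plus its task brief.
    subagents = steps * (system_tokens + handoff_tokens)
    # One final orchestrator call reads the system prompt plus every summary.
    orchestrator = system_tokens + steps * summary_tokens
    return subagents + orchestrator


def crossover_step(
    system_tokens: int,
    step_output_tokens: int,
    handoff_tokens: int,
    summary_tokens: int,
    max_steps: int = 200,
) -> Optional[int]:
    """Smallest step count at which the multi-agent model uses fewer input tokens."""
    for n in range(1, max_steps + 1):
        single = single_agent_input_tokens(n, system_tokens, step_output_tokens)
        multi = multi_agent_input_tokens(n, system_tokens, handoff_tokens, summary_tokens)
        if multi < single:
            return n
    return None


# Illustrative values: 2,000-token system prompt, 200-token step outputs,
# 1,200-token task briefs, 300-token summaries.
print(crossover_step(2000, 200, 1200, 300))  # 18
```

With these particular inputs the crossover lands at 18 steps. Smaller handoff briefs or larger per-step outputs pull it earlier, which is why the crossover has to be measured on your own workload.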

How does LangGraph support the multi-agent pattern?

LangGraph's graph model natively supports fan-out nodes (spawn N parallel subgraph executions), fan-in nodes (wait for all N, aggregate results), conditional edges (route to different subagents based on state), and subgraph nesting (encapsulate a complex agent as a reusable subgraph). The async executor handles parallel execution automatically — you define the graph topology and LangGraph handles concurrent scheduling. Shared state lets agents communicate without explicit orchestration messages. See the official documentation at https://langchain-ai.github.io/langgraph/concepts/multi_agent/ for implementation patterns and code examples.
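The fan-out and fan-in shape itself is framework-independent. The sketch below uses plain `asyncio` from the standard library rather than LangGraph's API, so it shows the pattern, not how LangGraph implements it. `run_subagent` and `orchestrate` are hypothetical names, and the sleep stands in for a real model call.

```python
import asyncio
from typing import List


async def run_subagent(subtopic: str) -> str:
    # Hypothetical stand-in for an LLM call made with a clean context
    # that contains only this subtopic.
    await asyncio.sleep(0.01)
    return f"summary of {subtopic}"


async def orchestrate(question: str, subtopics: List[str]) -> str:
    # Fan-out: start one subagent per independent subtopic.
    pending = [run_subagent(topic) for topic in subtopics]
    # Fan-in: wait for all of them; gather returns results in input order.
    summaries = await asyncio.gather(*pending)
    # The orchestrator only ever sees the summaries, never raw source material.
    return f"{question}: " + "; ".join(summaries)


result = asyncio.run(orchestrate("Q", ["a", "b", "c"]))
print(result)  # Q: summary of a; summary of b; summary of c
```

The three subagents run concurrently, so wall-clock time is close to the slowest subagent rather than the sum of all three.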

What's the biggest failure mode in multi-agent systems?

Agent loops — agents calling each other in a cycle without a termination condition. Agent A calls agent B, which calls agent C, which triggers agent A again. Without a max_iterations cap and a fallback completion condition at the orchestrator level, this burns through tokens and budget until hitting a rate limit or exhausting your API spend cap. This failure mode doesn't exist in single-agent systems and is the most common cause of unexpected cost spikes in multi-agent production deployments. Always set max_iterations in your agent framework, implement a per-task token budget check, and alert on any task exceeding 3× your expected average token count.
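Those three safeguards fit in a small guard object that the orchestrator calls after every agent step. This is a minimal sketch: `TaskGuard`, `LoopGuardError`, and the limits are hypothetical, and most frameworks expose their own iteration cap that you should set as well.

```python
class LoopGuardError(RuntimeError):
    """Raised when a task exceeds its iteration cap or token budget."""


class TaskGuard:
    def __init__(
        self,
        max_iterations: int,
        token_budget: int,
        expected_avg_tokens: int,
        alert_multiple: float = 3.0,  # alert at 3x the expected average
    ) -> None:
        self.max_iterations = max_iterations
        self.token_budget = token_budget
        self.alert_threshold = alert_multiple * expected_avg_tokens
        self.iterations = 0
        self.tokens_used = 0
        self.alerted = False

    def record_step(self, tokens: int) -> None:
        """Call once per agent step with the tokens that step consumed."""
        self.iterations += 1
        self.tokens_used += tokens
        if self.tokens_used > self.alert_threshold:
            # In production this would notify monitoring; here it only sets a flag.
            self.alerted = True
        if self.iterations > self.max_iterations:
            raise LoopGuardError(f"exceeded max_iterations={self.max_iterations}")
        if self.tokens_used > self.token_budget:
            raise LoopGuardError(f"exceeded token budget of {self.token_budget}")
```

A typical use is `guard.record_step(tokens_from_last_call)` inside the orchestrator loop, with the exception handler returning whatever partial result exists instead of continuing the cycle.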

Can I use different models for different agents in a multi-agent system?

Yes — and model routing across agents is one of the most impactful cost optimizations in multi-agent systems. Because each agent has its own context and is an independent LLM call, you can assign any model to any agent. Common production pattern: Claude Opus 4.7 ($15/$75 per M tokens) for the orchestrator and hard reasoning agents; Claude Sonnet 4.6 ($3/$15) for general research and extraction agents; Claude Haiku 4.5 ($0.80/$4.00) for classification, routing, and simple filtering. This model mixing brings your blended cost close to Sonnet while giving Opus-quality reasoning where the task demands it.
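To see how close to Sonnet a mixed deployment lands, weight each model's price by its share of your tokens. In this sketch the input prices are copied from the paragraph above and may not match current pricing. The 10/60/30 split and the function name are assumptions for illustration.

```python
from typing import Dict

# Input prices in USD per million tokens, as quoted in the text above.
INPUT_PRICE = {"opus": 15.00, "sonnet": 3.00, "haiku": 0.80}


def blended_input_price(token_share: Dict[str, float], prices: Dict[str, float]) -> float:
    """Weighted-average input price per million tokens across models."""
    total = sum(token_share.values())
    return sum(prices[model] * share for model, share in token_share.items()) / total


# Assumed split: 10% of input tokens on Opus, 60% on Sonnet, 30% on Haiku.
share = {"opus": 10, "sonnet": 60, "haiku": 30}
blended = blended_input_price(share, INPUT_PRICE)
print(f"${blended:.2f} per million input tokens")  # $3.54 per million input tokens
```

Under this split the blended input price is $3.54 per million tokens, against $3.00 for Sonnet alone and $15.00 for running everything on Opus.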

How does context isolation help in multi-agent systems?

Context isolation means each subagent starts with a clean context window containing only its specific task — no prior-turn contamination from other subtasks, no quality degradation from long accumulated context, and no token cost from irrelevant prior steps. This matters practically in two ways: (1) quality — models perform better on shorter, focused contexts than on long accumulated ones (the 'lost in the middle' effect); (2) cost — each agent pays only for its own task context, not the full accumulated history. For adversarial quality patterns (having two agents independently analyze the same source, then comparing results), context isolation is structurally required — a second agent that sees the first agent's reasoning will anchor to it, defeating the independence.
