By The DDH Team · Digital Dashboard Hub

Multi-Agent Cost Per Task (2026): Orchestrator + N Workers Math


When you move from a single-agent loop to a multi-agent system, costs compound at the system level, not just the per-turn level. An orchestrator reasons about task delegation, each worker runs its own context and tool loop, and inter-agent messages carry context that gets billed again on every handoff. A naive 3-worker CrewAI research task that looks like $0.15/task on paper often runs $0.50-$1.20/task in production because the orchestrator pass, the worker outputs fed back to the orchestrator, and the synthesis step are each full LLM calls. See our companion agent loop cost calculator for the single-agent baseline.

The two dominant multi-agent patterns in 2026 — CrewAI's role-based crew architecture and LangGraph's supervisor graph — have different cost structures. CrewAI's crew manager does sequential delegation with explicit agent-to-agent message passing; LangGraph's supervisor sends subtask descriptions to specialized nodes and routes results back through a shared state graph. Both patterns incur orchestrator cost + N × worker cost + synthesis cost, but the overhead per handoff differs.

Below: the multi-agent cost formula, the CrewAI vs LangGraph overhead comparison, worked $ examples at 1, 3, and 5 workers on a research-and-write task, reasoning model overhead math, and the table production teams use to decide when to add workers vs keep a single-agent loop. For the tool-call overhead component specifically, see our tool use overhead cost calculator. Model pricing from Anthropic and OpenAI, framework docs from docs.crewai.com and langchain-ai.github.io/langgraph, fetched June 2026.


Multi-agent task cost: 1 vs 3 vs 5 workers, CrewAI vs LangGraph, June 2026

| Metric | 1 worker | 3 workers | 5 workers |
| --- | --- | --- | --- |
| CrewAI on Sonnet 4.6 (total $/task) | $0.18 | $0.52 | $0.88 |
| CrewAI on GPT-5.5 (total $/task) | $0.30 | $0.87 | $1.46 |
| CrewAI on Opus 4.7 (total $/task) | $0.92 | $2.68 | $4.46 |
| LangGraph supervisor on Sonnet 4.6 (total $/task) | $0.15 | $0.42 | $0.70 |
| LangGraph supervisor on GPT-5.5 (total $/task) | $0.25 | $0.71 | $1.18 |
| LangGraph supervisor on Opus 4.7 (total $/task) | $0.78 | $2.24 | $3.72 |
| Orchestrator cost share (CrewAI) | 22% | 18% | 14% |
| Orchestrator cost share (LangGraph) | 18% | 14% | 11% |
| Worker cost share | 55% | 65% | 72% |
| Synthesis / reporting cost share | 23% | 17% | 14% |
| Added cost with reasoning model orchestrator (o4-mini) | +$0.08 | +$0.08 | +$0.08 |
| Monthly cost at 1K tasks/day, Sonnet 4.6 on LangGraph | $4,500 | $12,600 | $21,000 |

Sources, fetched 2026-06-21: Anthropic pricing (https://docs.anthropic.com/en/docs/about-claude/pricing) — Sonnet 4.6 $3/$15, Opus 4.7 $15/$75 per 1M input/output. OpenAI pricing (https://openai.com/api/pricing/) — GPT-5.5 $5/$25. CrewAI framework docs (https://docs.crewai.com/). LangGraph multi-agent docs (https://langchain-ai.github.io/langgraph/concepts/multi_agent/). Token model: orchestrator receives 3K task description + routes with 500-token subtask messages; each worker runs 3-turn loop with 5K average input per turn and 400 output tokens per turn; synthesis receives all worker outputs (~1.5K per worker) and produces 800-token final output. Reasoning model overhead (o4-mini at $1.10/$4.40 per 1M) modeled at 5K reasoning input + 1K output per orchestration pass.

The multi-agent cost formula

**The total cost of a multi-agent task has four components:** `total_cost = orchestrator_cost + Σ_workers(worker_loop_cost) + inter_agent_message_cost + synthesis_cost`. Each component is itself an LLM call (or loop) with its own input/output token bill. The orchestrator plans and routes; each worker executes; inter-agent messages carry context between calls; synthesis assembles the final output from all worker results.

**Orchestrator cost.** The orchestrator receives the full task description and outputs subtask assignments. On CrewAI, the manager LLM sees the full goal, the list of available agents with their role descriptions, and the current state — typically 2K-5K input tokens. It outputs a delegation decision: 200-500 tokens. On LangGraph supervisor, the supervisor node receives a shared state object (growing as workers report back) and outputs a routing decision. LangGraph supervisors are often more token-efficient because they operate on a structured state graph rather than natural-language plans.

**Worker loop cost.** Each worker runs its own agent loop — the same N-turn accumulation structure described in the agent loop cost calculator. For a 3-turn research worker with 5K average input per turn and 400 output tokens: 3 × 5K × input_rate / 1M + 3 × 400 × output_rate / 1M. On Sonnet 4.6: 15K × $3/1M + 1.2K × $15/1M = $0.045 + $0.018 = $0.063 per worker per task. For 3 workers: $0.189 just in worker execution.
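The worker-loop arithmetic above can be sketched in a few lines of Python. The function names are illustrative (they are not part of CrewAI or LangGraph), and the rates are the Sonnet 4.6 prices quoted in this article.

```python
def call_cost(input_tokens, output_tokens, input_rate, output_rate):
    """Dollar cost of LLM usage; rates are dollars per 1M tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


def worker_loop_cost(turns, input_per_turn, output_per_turn, input_rate, output_rate):
    """Cost of one worker running `turns` turns with average per-turn token counts."""
    return call_cost(turns * input_per_turn, turns * output_per_turn,
                     input_rate, output_rate)


# Sonnet 4.6 rates as quoted in this article ($ per 1M tokens).
SONNET_IN, SONNET_OUT = 3.00, 15.00

per_worker = worker_loop_cost(3, 5_000, 400, SONNET_IN, SONNET_OUT)
print(f"per worker: ${per_worker:.3f}")
print(f"3 workers:  ${3 * per_worker:.3f}")
```

Swapping in your own measured per-turn averages is the quickest way to see how far your workload sits from the 3-turn, 5K-input profile used here.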

**Inter-agent message cost.** Every time one agent passes output to another — worker result to orchestrator, subtask description from orchestrator to worker — those tokens are billed as input on the receiving agent's next call. In CrewAI's sequential delegation pattern, each worker output (~800 tokens) is included in the next agent's context. In a 5-worker chain: 4 × 800 = 3,200 tokens of inter-agent message overhead billed as input. On Opus 4.7: 3.2K × $15/1M = $0.048 extra per task from message passing alone.

**Synthesis cost.** After workers complete, the orchestrator (or a separate synthesis step) assembles the final output. On a research-and-write task, synthesis receives all worker outputs (~1.5K per worker for 3 workers = 4.5K tokens) plus the original task description (1K), emitting an 800-token final document: 5.5K × $3/1M + 0.8K × $15/1M = $0.0165 + $0.012 = $0.0285 on Sonnet 4.6. Small per-task, but synthesis is often run on a more expensive model to improve quality — running synthesis on Opus adds $0.0825 + $0.060 = $0.1425 instead.

**The formula simplifies to a multiplier.** Rule of thumb: a multi-agent task with N workers costs approximately (1.2 + N × 1.15) × single_worker_loop_cost. The 1.2 accounts for orchestrator overhead; the 1.15 per worker accounts for inter-agent message passing beyond the pure worker cost. This multiplier holds within ±15% for most production CrewAI and LangGraph deployments — it breaks down for very long worker loops (>8 turns) or large inter-agent payloads (>2K per message).
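As a quick sanity check, the rule of thumb can be written as a one-line estimator. This is a sketch of the heuristic exactly as stated above; the function name and the default factor arguments are illustrative, and the result inherits the ±15% uncertainty of the rule itself.

```python
def multi_agent_estimate(n_workers, single_worker_loop_cost,
                         orchestrator_factor=1.2, per_worker_factor=1.15):
    """Rule-of-thumb cost: (1.2 + N * 1.15) * single worker loop cost."""
    return (orchestrator_factor + n_workers * per_worker_factor) * single_worker_loop_cost


# $0.063 is the Sonnet 4.6 worker loop cost derived earlier in this section.
for n in (1, 3, 5):
    estimate = multi_agent_estimate(n, 0.063)
    low, high = estimate * 0.85, estimate * 1.15  # the stated +/-15% band
    print(f"{n} workers: ~${estimate:.3f} (range ${low:.3f} to ${high:.3f})")
```

Treat the output as a first-pass budget figure; long worker loops or large inter-agent payloads need the full component-by-component calculation.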


CrewAI cost structure: role-based crew patterns

**CrewAI's architecture** is built around role-defined agents (Researcher, Writer, Editor, etc.) coordinated by a manager that plans and delegates in natural language. The manager uses a configured LLM to create a plan, assign tasks to agents in sequence or in parallel, and aggregate results. This natural-language coordination is flexible but token-intensive — each delegation involves a natural-language instruction that includes agent role descriptions, task context, and any prior results. Source: CrewAI docs.

**CrewAI token overhead per delegation.** In a standard 3-agent crew (Researcher, Writer, Editor), each delegation message includes: agent role description (~300 tokens), task instructions (~500 tokens), relevant prior context (~800 tokens for writer/editor receiving researcher output). Total per delegation: ~1,600 tokens. Three delegations: ~4,800 tokens of orchestration input overhead per task, billed at whatever model runs the manager. On Sonnet 4.6: 4.8K × $3/1M = $0.0144 in orchestration overhead. On Opus 4.7: $0.072.

**When to run the manager on a cheaper model.** CrewAI lets you configure different LLMs for the manager vs individual agents. Running the manager on GPT-5.4 or Sonnet 4.6 (routing/planning tasks) while running workers on Opus 4.7 (complex reasoning tasks) is the standard cost optimization pattern. With the 3-turn worker profile used throughout (15K input, 1.2K output), an Opus worker costs 15K × $15/1M + 1.2K × $75/1M = $0.225 + $0.090 = $0.315. A 3-worker crew with a Sonnet manager and Opus workers therefore costs $0.0144 + 3 × $0.315 = $0.959, vs $0.072 + $0.945 = $1.017 for all-Opus: about $0.058 saved per task. The saving is small next to worker cost, but it comes at no quality cost, because the manager's job is routing, not reasoning, and it rarely needs a frontier model.

**CrewAI's process types affect cost.** Sequential process (one agent at a time, each receiving prior agent output) accumulates context across the pipeline — by the editor's turn, the context includes the researcher's 1,200-token output and the writer's 1,500-token draft. Hierarchical process (manager delegates all at once, collects in parallel) avoids that accumulation — each worker sees only its own task, not prior workers' outputs. For 5-worker crews, hierarchical process can cut total input tokens by 30-40% vs sequential. For tasks where workers need each other's context (iterative refinement, critique chains), sequential is required; for parallel research tasks, hierarchical saves money.

**Worked CrewAI 3-worker task on Sonnet 4.6.** Task: research a competitor, write a 1-page brief, edit for clarity. Manager: 3K input × $3/1M + 300 output × $15/1M = $0.009 + $0.0045 = $0.0135. Researcher (3-turn loop): 15K input + 1.2K output = $0.063. Writer (receives 1.2K researcher output, 2-turn loop): (5K + 1.2K) input + 0.8K output = (6.2K × $3 + 0.8K × $15) / 1M = $0.0186 + $0.012 = $0.0306. Editor (receives 1.5K draft, 1 turn): (3K + 1.5K) input + 500 output = $0.0135 + $0.0075 = $0.021. Synthesis: 4K input + 600 output = $0.012 + $0.009 = $0.021. **Total: $0.0135 + $0.063 + $0.0306 + $0.021 + $0.021 = $0.149/task on Sonnet 4.6.**
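The worked total above can be reproduced by summing the five components. The token counts come straight from the paragraph; the dictionary keys and helper name are illustrative.

```python
# Sonnet 4.6 rates as quoted in this article ($ per 1M tokens).
SONNET_IN, SONNET_OUT = 3.00, 15.00


def cost(input_tokens, output_tokens):
    return (input_tokens * SONNET_IN + output_tokens * SONNET_OUT) / 1_000_000


# (input tokens, output tokens) for each step of the worked example.
components = {
    "manager": cost(3_000, 300),
    "researcher": cost(15_000, 1_200),   # 3-turn loop
    "writer": cost(5_000 + 1_200, 800),  # base context + researcher output
    "editor": cost(3_000 + 1_500, 500),  # base context + draft
    "synthesis": cost(4_000, 600),
}

total = sum(components.values())
for name, value in components.items():
    print(f"{name:<11} ${value:.4f}  ({value / total:.0%} of task)")
print(f"{'total':<11} ${total:.4f}")
```

Printing each component's share makes it easy to see which step to optimize first; in this profile the researcher loop dominates.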

**At scale: 1,000 CrewAI tasks/day on Sonnet 4.6.** Monthly cost: 30,000 × $0.149 = **$4,470/month** for 3-worker sequential tasks. Add an 80% cache hit rate on stable agent role descriptions (each ~300 tokens, cached at $0.30/1M): savings ≈ 30K × 3 × 300 × 0.8 × ($3 − $0.30) / 1M = $58/month. The saving is modest because role descriptions are a small fraction of total tokens. The bigger lever is worker result size: keep researcher outputs under 500 tokens to avoid downstream accumulation. See CrewAI documentation on output structure for output truncation patterns.
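A small helper makes the cache-savings arithmetic explicit, including the hit rate. The function and parameter names are illustrative, and the rates are the Sonnet 4.6 input and cache-read prices quoted in this article.

```python
def monthly_cache_savings(tasks_per_month, cached_prefixes_per_task, prefix_tokens,
                          hit_rate, input_rate, cached_rate):
    """Dollars saved per month when a stable prompt prefix is served from cache.

    Rates are dollars per 1M tokens; hit_rate is the fraction of reads that hit.
    """
    cached_tokens = tasks_per_month * cached_prefixes_per_task * prefix_tokens * hit_rate
    return cached_tokens * (input_rate - cached_rate) / 1_000_000


# 30K tasks/month, 3 role descriptions of ~300 tokens each.
at_80_percent = monthly_cache_savings(30_000, 3, 300, 0.8, 3.00, 0.30)
at_100_percent = monthly_cache_savings(30_000, 3, 300, 1.0, 3.00, 0.30)
print(f"80% hit rate:  ${at_80_percent:.2f}/month")
print(f"100% hit rate: ${at_100_percent:.2f}/month")
```

With a perfect hit rate the same inputs give about $73/month; at the stated 80% hit rate the figure is about $58/month. Either way the saving is small next to the $4,470 monthly bill.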


LangGraph supervisor cost structure: graph-based coordination

**LangGraph's supervisor pattern** uses a central supervisor node that reads a shared state graph, decides which worker node to invoke next, and updates the state with each worker's output. The key difference from CrewAI: the supervisor operates on a structured state object (typed fields, not free-text plans) and routing decisions are often a simple classification ('which agent next?') rather than a natural-language plan. This makes LangGraph supervisors typically 20-30% more token-efficient than CrewAI managers for the same task. Source: LangGraph multi-agent concepts.

**LangGraph state as a cost lever.** The shared state object grows as workers add to it. A well-designed state schema keeps worker outputs in typed fields (e.g., `{research: string, draft: string, revision: string}`) rather than appending full conversation history. The supervisor sees only the current state fields, not the full conversation of each worker. This prevents the quadratic context accumulation that plagues sequential multi-agent chains — the supervisor's input stays bounded even as workers complete more work.
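A compact state schema might look like the following. This is a plain-Python sketch using `typing.TypedDict` and does not call the LangGraph API; the field names mirror the example above, while `record_output`, `supervisor_view`, and the `max_chars` cap are hypothetical helpers added for illustration.

```python
from typing import TypedDict


class PipelineState(TypedDict, total=False):
    """One typed field per worker result; no conversation history is stored."""
    task: str
    research: str
    draft: str
    revision: str


def record_output(state: PipelineState, field: str, text: str,
                  max_chars: int = 2_000) -> PipelineState:
    """Overwrite (not append to) a field, capping its size."""
    updated = dict(state)
    updated[field] = text[:max_chars]
    return updated  # type: ignore[return-value]


def supervisor_view(state: PipelineState, fields=("task", "research", "draft")) -> dict:
    """Only the fields the supervisor needs for its next routing decision."""
    return {name: state[name] for name in fields if name in state}


state: PipelineState = {"task": "Competitive brief on Acme"}
state = record_output(state, "research", "Finding A. Finding B. " * 500)
state = record_output(state, "research", "Finding A. Finding B.")  # replaces, not appends
print(supervisor_view(state))
```

Because each write replaces the field rather than appending to a message list, the supervisor's input size depends on the number of fields, not on how many turns each worker took.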

**LangGraph worker subgraphs.** Each worker node in LangGraph can be its own subgraph — a mini agent loop with its own tool set and memory. The worker subgraph cost is identical to the single-agent loop model: N turns × (input + output) billed at the worker node's configured model. The supervisor cost is typically 1 input call per routing decision (300-800 tokens input, 50-200 tokens output for a structured routing JSON). For a 3-worker graph with 2 supervisor routing decisions per worker: 6 supervisor calls × (500 input × $5/1M + 100 output × $25/1M) on GPT-5.5 = 6 × ($0.0025 + $0.0025) = $0.030 in supervisor overhead — leaner than CrewAI's natural-language delegation.

**Subgraph handoff tokens.** When a worker subgraph completes and passes results back to the supervisor, those results become part of the shared state — billed as input on the supervisor's next read. A researcher worker returning 1,200 tokens of findings into the state adds 1,200 tokens to the supervisor's next context read. For 3 workers returning 1,200 tokens each, the supervisor's final synthesis read has 3,600 tokens of worker results plus the original task: ~5,000 tokens input. On GPT-5.5: $0.025. Smaller than it looks because the supervisor only reads the state once per routing step, not once per prior turn.

**Worked LangGraph 3-worker task on GPT-5.5.** Task: research, write, edit. Supervisor routing (6 calls at 500 in / 100 out): 6 × ($0.0025 + $0.0025) = $0.030. Researcher worker (3-turn loop, same as above on GPT-5.5): 15K × $5/1M + 1.2K × $25/1M = $0.075 + $0.030 = $0.105. Writer worker (2-turn loop with 1.2K research context): (6.2K × $5 + 0.8K × $25) / 1M = $0.031 + $0.020 = $0.051. Editor worker (1 turn, 3K + 1.5K context): 4.5K × $5/1M + 500 × $25/1M = $0.0225 + $0.0125 = $0.035. Synthesis (5K input + 800 output): $0.025 + $0.020 = $0.045. **Total GPT-5.5 LangGraph: $0.030 + $0.105 + $0.051 + $0.035 + $0.045 = $0.266/task.** vs CrewAI on same model = $0.295. LangGraph is ~10% leaner on this task.

**At scale: 500 LangGraph tasks/day on GPT-5.5.** Monthly: 15,000 × $0.266 = **$3,990/month**. With a 50% cache hit rate on stable supervisor routing instructions: savings ≈ 15K × 6 × 200 × 0.5 × ($5 − $0.50) / 1M = $40.50/month. Small but consistent. Main lever: keep the shared state schema compact and avoid appending full conversation histories to the state object. Every 1K tokens you trim from the state saves supervisor and synthesis input tokens on every task.


1 vs 3 vs 5 workers: the cost-quality tradeoff

**Adding workers is not linear in value.** The cost of adding a 4th and 5th worker to a multi-agent task scales at roughly 1.0-1.2x per worker (worker loop cost + inter-agent message overhead), while the quality gain from adding workers beyond 3 typically follows diminishing returns. For most research-and-write tasks, a 3-worker crew achieves 90-95% of the quality of a 5-worker crew at 60% of the cost. The 5-worker case is justified primarily when: (1) tasks have truly independent parallel research tracks, (2) a specialized worker (e.g., a fact-checker or citation validator) can't be folded into a generalist worker, or (3) quality evaluation requires a dedicated critic agent.

**1-worker tasks on a powerful model.** For many 'multi-agent' tasks, a single capable model (Opus 4.7 or GPT-5.5) with well-designed tool use can match or exceed a 3-worker crew of weaker models in quality, even when it costs more per task. Opus 4.7 on a 5-turn research loop costs ~$0.683 uncached; 3 GPT-5.4 workers on the same task cost ~$0.35. The single Opus loop may produce better output than the 3-worker GPT-5.4 crew, so test your specific task before defaulting to multi-agent.

**The parallelism premium.** LangGraph supports true parallel worker execution via async subgraphs. When 3 workers run in parallel, wall-clock time drops to ~1x worker duration + supervisor overhead, vs 3x for sequential. But token costs are identical regardless of execution order — parallelism saves time, not money. Build for parallel execution when latency matters; optimize tokens separately for cost.
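The point that parallelism changes wall-clock time but not the token bill can be illustrated with a small `asyncio` simulation. The workers here are stand-ins that sleep instead of calling a model; the names, durations, and token counts are made up for illustration.

```python
import asyncio
import time


async def run_worker(name, seconds, tokens):
    """Stand-in for a worker loop: sleeps for its latency, returns tokens billed."""
    await asyncio.sleep(seconds)
    return tokens


async def run_parallel(workers):
    start = time.perf_counter()
    billed = await asyncio.gather(*(run_worker(*w) for w in workers))
    return sum(billed), time.perf_counter() - start


async def run_sequential(workers):
    start = time.perf_counter()
    total = 0
    for w in workers:
        total += await run_worker(*w)
    return total, time.perf_counter() - start


# (name, simulated latency in seconds, tokens billed) -- illustrative values.
WORKERS = [("researcher", 0.03, 16_200), ("analyst", 0.02, 7_000), ("writer", 0.02, 7_300)]

par_tokens, par_time = asyncio.run(run_parallel(WORKERS))
seq_tokens, seq_time = asyncio.run(run_sequential(WORKERS))
print(f"parallel:   {par_tokens} tokens in {par_time:.3f}s")
print(f"sequential: {seq_tokens} tokens in {seq_time:.3f}s")
```

Both runs bill the same number of tokens; only the elapsed time differs, which is why latency and cost should be optimized as separate problems.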

**Flat cost scaling vs quality scaling.** The cost of a 5-worker system is approximately: orchestrator ($0.03) + 5 × worker ($0.063 on Sonnet 4.6) + synthesis ($0.035) = $0.38/task. The same formula gives $0.03 + 3 × $0.063 + $0.035 = $0.254/task for 3 workers. Quality typically plateaus after 3 workers for most research tasks. At 10K tasks/month, the 5-worker configuration costs $3,800/month vs $2,540/month for 3 workers: a $1,260/month premium for marginal quality gains. Unless the task type genuinely benefits from 5 specialists (e.g., a 5-stage content pipeline: ideation/research/outline/draft/edit), stop at 3.

**When to scale workers vs upgrade model tier.** If your 3-worker Sonnet 4.6 crew produces mediocre quality, the fix is usually better prompts or a single worker upgrade to Opus 4.7 for the bottleneck step — not adding more Sonnet 4.6 workers. Adding weak workers to fix a quality problem never works; fixing the weakest agent in the chain does. Profile which step produces the worst output; upgrade that agent's model or rewrite its prompt before adding a 4th worker. Source: CrewAI best practices, LangGraph multi-agent docs.

**Worked comparison: 1-worker Opus vs 3-worker Sonnet.** 1-worker Opus 4.7 (5-turn research loop, uncached): $0.683. 3-worker Sonnet 4.6 (CrewAI, as modeled above): $0.149. Cost advantage for 3-worker Sonnet: 78% cheaper. Quality comparison: Sonnet per-turn SWE-bench ~65-68% vs Opus ~76%. For complex reasoning tasks, 1-worker Opus likely wins. For well-structured parallel research tasks, 3-worker Sonnet often wins on both cost AND quality (because task decomposition gives each worker a simpler, well-scoped job that Sonnet handles well).


Reasoning model overhead: when to use o4-mini as orchestrator

**Reasoning models (OpenAI o4-mini, o4)** are designed for complex planning and decision-making tasks. Using o4-mini as the multi-agent orchestrator — with GPT-5.4 or Sonnet 4.6 as workers — is a cost-effective pattern that concentrates reasoning budget where it matters most: the delegation and routing decisions. o4-mini is priced at approximately $1.10/$4.40 per 1M input/output with reasoning tokens billed as output. Source: OpenAI pricing.

**Reasoning token overhead.** o4-mini 'thinks' before routing — generating 2K-8K reasoning tokens internally before emitting the delegation decision. Those reasoning tokens bill at the output rate ($4.40/1M). For a single orchestration call with 5K reasoning tokens + 200-token routing output: 5K × $4.40/1M + 200 × $4.40/1M = $0.022 + $0.00088 = $0.023. vs GPT-5.4 routing the same task: 800 input × $2.50/1M + 200 output × $15/1M = $0.002 + $0.003 = $0.005. **o4-mini orchestration costs 4-5x more per routing call.** The premium buys a meaningfully more reliable delegation plan — worth it when routing mistakes are expensive (each wrong routing adds a full worker run at $0.063).

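**When o4-mini orchestrator pays.** If the routing task is complex enough that GPT-5.4 makes wrong delegations on 10% of tasks, and each wrong delegation adds a full worker re-run, the expected re-run cost is 0.10 × $0.063 = $0.0063 per task. o4-mini's orchestration premium is $0.018 per routing call, so on re-run cost alone it pays only if it lowers the routing error rate by more than $0.018 / $0.063 ≈ 29 percentage points. At a 10% baseline error rate that is impossible (the most it can save is $0.0063), so the premium is justified only when the baseline error rate is much higher or a wrong delegation costs far more than one Sonnet worker run, for example an Opus worker re-run or a bad final report. For clearly structured tasks (always delegate in the same sequence), the routing is trivial and o4-mini adds no value. For complex dynamic tasks where the right agent depends on prior results, o4-mini's planning quality is material.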
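The break-even check can be sketched as below, expressed as an absolute drop in routing error rate. The function names are illustrative; the $0.018 premium and $0.063 re-run cost are the figures used in this section, and the $0.315 re-run cost assumes an Opus 4.7 worker with the same 15K-input, 1.2K-output profile.

```python
def breakeven_error_rate_drop(premium_per_call, rerun_cost):
    """Absolute reduction in routing error rate needed to pay for the premium."""
    return premium_per_call / rerun_cost


def upgrade_pays(premium_per_call, rerun_cost, baseline_error_rate, upgraded_error_rate):
    """True if expected re-run savings exceed the orchestration premium."""
    expected_saving = (baseline_error_rate - upgraded_error_rate) * rerun_cost
    return expected_saving > premium_per_call


needed = breakeven_error_rate_drop(0.018, 0.063)
print(f"needed error-rate drop: {needed:.1%}")

# Sonnet worker re-run ($0.063), 10% baseline errors, perfect routing after upgrade.
print(upgrade_pays(0.018, 0.063, 0.10, 0.0))

# Assumed Opus worker re-run ($0.315), 10% baseline errors cut to 2%.
print(upgrade_pays(0.018, 0.315, 0.10, 0.02))
```

The comparison is sensitive to what a routing mistake really costs; if a bad delegation also degrades the final output, fold that cost into `rerun_cost` before deciding.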

**Hybrid orchestrator pattern.** Run o4-mini for the initial task decomposition (most complex routing decision) and GPT-5.4 for subsequent routing decisions (simpler, pattern-matching work). Cost: 1 o4-mini call ($0.023) + 4 GPT-5.4 calls ($0.020) = $0.043 in orchestration, vs 5 GPT-5.4 calls ($0.025). The $0.018 premium for the initial o4-mini call is worth it for tasks where task decomposition quality determines final output quality — research synthesis, code architecture, multi-step analysis. Source: OpenAI assistants overview.

**Claude extended thinking as an alternative.** Anthropic's extended thinking mode on Opus 4.7 provides similar deep reasoning to o4-mini at the orchestrator level. Extended thinking tokens are billed as output tokens, the same way OpenAI bills reasoning tokens, so a 4K thinking block on Opus costs 4K × $75/1M = $0.30. That is more than ten times the cost of an o4-mini orchestration call. Extended thinking on Opus is better justified when the orchestrator also needs Opus-level tool use or world knowledge, combining planning and execution in the same model rather than separating them.

**Bottom line on reasoning model overhead.** Budget $0.020-$0.030 per orchestration call for o4-mini, $0.30+ for extended thinking Opus. For a 3-worker system running 10K tasks/month, the o4-mini orchestration premium is $200-$300/month, small relative to total worker costs ($1,300-$2,280). Spend it on o4-mini if routing quality is your bottleneck; spend it on worker model upgrades if execution quality is your bottleneck.


Worked scenario: research task at 1K, 10K, 100K tasks/month

**Task profile: competitive analysis report.** Inputs: company name, 3 focus areas. Output: 800-word structured report. Pipeline: Researcher (web search, 3-turn loop), Analyst (synthesizes research, 2-turn loop), Writer (drafts report, 2-turn loop). Framework: LangGraph supervisor on Sonnet 4.6 workers. Supervisor model: GPT-5.4.

**Per-task cost breakdown.** Supervisor (GPT-5.4, 4 routing calls): 4 × (600 in × $2.50/1M + 150 out × $15/1M) = 4 × ($0.0015 + $0.00225) = $0.015. Researcher (Sonnet 4.6, 3-turn loop): 15K in × $3/1M + 1.2K out × $15/1M = $0.045 + $0.018 = $0.063. Analyst (Sonnet 4.6, 2-turn loop with 1.2K research context): (5K + 1.2K) × $3/1M + 0.8K × $15/1M = $0.0186 + $0.012 = $0.0306. Writer (2-turn loop with 1.5K analyst output): (5K + 1.5K) × $3/1M + 800 × $15/1M = $0.0195 + $0.012 = $0.0315. Synthesis: 4.5K in × $3/1M + 800 out × $15/1M = $0.0135 + $0.012 = $0.0255. **Total: $0.015 + $0.063 + $0.0306 + $0.0315 + $0.0255 = $0.1656/task.**

**1,000 tasks/month:** $165.60/month. Cache hit on stable supervisor routing instructions (each 600 tokens, hit 80% of the time): 4 × 1K × 0.8 × 600 × ($2.50 − $0.25) / 1M = $4.32 saved. **Cached total: $161/month.** Well within experimental budget for any production team.
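The monthly figure can be reproduced with a short script that applies the supervisor cache saving to the per-task cost. The function names are illustrative; the rates and token counts are the ones stated in this scenario.

```python
PER_TASK_UNCACHED = 0.1656  # from the per-task breakdown above


def supervisor_cache_saving_per_task(calls=4, prefix_tokens=600, hit_rate=0.8,
                                     input_rate=2.50, cached_rate=0.25):
    """Dollars saved per task by caching the supervisor's routing instructions."""
    return calls * prefix_tokens * hit_rate * (input_rate - cached_rate) / 1_000_000


def monthly_cost(tasks_per_month, per_task=PER_TASK_UNCACHED, saving_per_task=None):
    if saving_per_task is None:
        saving_per_task = supervisor_cache_saving_per_task()
    return tasks_per_month * (per_task - saving_per_task)


for volume in (1_000, 10_000, 100_000):
    uncached = volume * PER_TASK_UNCACHED
    print(f"{volume:>7} tasks/month: ${uncached:,.2f} uncached, "
          f"${monthly_cost(volume):,.2f} with supervisor cache")
```

Only the supervisor cache is modeled here; worker prompt caching and model switching would be further adjustments on top.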

**10,000 tasks/month:** $1,656/month uncached, ~$1,613/month with GPT-5.4 supervisor cache. At this volume, consider caching the stable Sonnet worker system prompts too (each worker has a ~500-token role description): savings ≈ 10K × 3 workers × 5 turns × 500 tokens × ($3 − $0.30) / 1M = $202/month. **Full cache: ~$1,411/month.** Meaningful but not transformative — worker execution dominates, not prefix overhead.

**100,000 tasks/month:** $16,560/month uncached. With all caching enabled: ~$14,110/month. This is where model choice starts to matter: switching the Analyst and Writer workers from Sonnet 4.6 to GPT-5.4 (about 17% cheaper input, same output rate) saves 100K × (6.2K + 6.5K) × ($3 − $2.50) / 1M ≈ $635/month. Switching the Researcher to GPT-5.4 saves more since it's the longest loop: 100K × (15K × ($3 − $2.50) / 1M) = $750/month. **Optimized 100K/month: ~$12,700.** From $16,560 uncached to $12,700 fully optimized = 23% total reduction. Prompt quality and worker result size constraints have higher expected value than model-switching at this tier.

**Annual at 100K tasks/month:** $198,720 uncached, about $152,400 optimized. The biggest single lever is keeping worker result sizes compact: every token trimmed from Researcher output is saved again on each downstream read (Analyst turns, Writer turns, and Synthesis). A 500-token trim saves 100K × 4 downstream reads × 500 × $3/1M = $600/month. Write concise output instructions for each worker in the prompt.


Inter-agent message cost: the hidden tax

**Inter-agent messages are billed as input on every receiving agent's call.** In a sequential CrewAI pipeline, each worker's output (~800-1,500 tokens) is prepended to the next worker's context. In a 5-worker sequential chain where each worker adds 1,000 tokens: worker 2 receives 1K extra input, worker 3 receives 2K extra, worker 4 receives 3K extra, worker 5 receives 4K extra. Total inter-agent message overhead: (0 + 1K + 2K + 3K + 4K) = 10K extra input tokens beyond the base worker contexts. On Sonnet 4.6: 10K × $3/1M = $0.030 per task from message passing alone.
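The accumulation in a sequential chain is a triangular sum, which a few lines of Python make concrete. The function name is illustrative; the 1,000-token output size and the Sonnet 4.6 input rate are the figures used above.

```python
def chain_message_overhead(n_workers, tokens_per_output):
    """Extra input tokens when each worker receives all prior workers' outputs.

    Worker k (0-indexed) receives k prior outputs, so the total is
    tokens_per_output * n * (n - 1) / 2.
    """
    return sum(k * tokens_per_output for k in range(n_workers))


SONNET_IN = 3.00  # $ per 1M input tokens, as quoted in this article

for n in (3, 5, 8):
    overhead = chain_message_overhead(n, 1_000)
    print(f"{n} workers: {overhead:>6} extra input tokens, "
          f"${overhead * SONNET_IN / 1_000_000:.3f} per task")
```

Because the total grows with n × (n − 1) / 2, going from 5 to 8 workers nearly triples the message-passing overhead rather than adding 60%.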

**LangGraph shared state vs CrewAI message chains.** CrewAI's default sequential process passes the full prior agent output as a message to the next agent. LangGraph's shared state approach stores worker outputs in typed fields — each downstream worker reads only the specific fields it needs, not the full conversation history. For a well-designed LangGraph state schema, the writer worker reads only `state.research` (800 tokens) rather than the full researcher conversation (3,000 tokens). This typically saves 1,000-2,000 tokens per downstream worker vs CrewAI sequential. Source: LangGraph state management docs.

**Truncate inter-agent outputs explicitly.** In both CrewAI and LangGraph, you can add output length limits to each agent's task description: 'Return your findings in under 300 words, structured as [format].' A 400-token research summary (roughly 300 words) vs a 1,500-token research dump saves 1,100 tokens on every downstream worker that receives it. In a 3-downstream-worker chain: 3 × 1,100 × $3/1M = $0.0099 per task. At 100K tasks/month: $990/month from output truncation. Set explicit output length targets in every worker's task description.

**Tool results vs agent results.** Tool results (web search, database lookups, code execution output) and agent results (prior worker outputs) both add to downstream context. The difference: tool results are bounded by the tool itself (you can truncate at the API layer), while agent results depend on what the LLM chose to output. For tool results, always post-process to extract only the relevant fields before passing to the next agent — pass 200-token extracted facts, not 2,000-token raw API responses.
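A post-processing step for tool results might look like this sketch. The response shape (`title`, `snippet`, `html`, `rank`) is a made-up example rather than any particular search API, and `extract_facts` is a hypothetical helper.

```python
def extract_facts(raw_results, fields=("title", "snippet"), max_items=3, max_chars=200):
    """Keep only the named fields from the top results, capping each value's length."""
    facts = []
    for item in raw_results[:max_items]:
        fact = {name: str(item[name])[:max_chars] for name in fields if name in item}
        if fact:
            facts.append(fact)
    return facts


# Illustrative raw tool output: useful fields buried in bulky markup.
raw = [
    {"title": "Acme pricing update", "snippet": "Acme raised list prices in Q2.",
     "html": "<div>...</div>" * 200, "rank": 1},
    {"title": "Acme hiring", "snippet": "Acme is hiring in sales.",
     "html": "<div>...</div>" * 200, "rank": 2},
]

compact = extract_facts(raw)
print(compact)
print(len(str(raw)), "characters raw vs", len(str(compact)), "characters passed on")
```

Doing this at the tool boundary means the saving applies to every downstream agent that would otherwise have received the raw payload.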

**The context accumulation cliff.** In a sequential multi-agent pipeline without truncation, each worker's context grows linearly with its position in the chain and with the turns run by earlier workers, so total billed input grows roughly quadratically with pipeline length. A 5-worker × 5-turn sequential pipeline without truncation can accumulate 50K-80K tokens in later workers' contexts, consuming a large share of the context window and incurring costs 10x what a well-designed pipeline pays. Use LangGraph's state graph with compact typed fields as the architecture, not sequential message passing, for any pipeline with more than 3 workers. Source: LangGraph multi-agent architecture.


When to use multi-agent vs single-agent

**Multi-agent makes sense when tasks can be genuinely decomposed into parallel or sequential stages that benefit from specialization.** A 'research and write' task where the researcher needs different tools (web search, database access) than the writer (formatting tools, citation manager) is a natural fit. A 'code and test' task where the coder and the tester need different context and prompts is another. Multi-agent is overkill when the 'different agents' are just the same model with slightly different system prompts — use a single model with conditional tool access instead.

**Single-agent with strong tools often beats multi-agent.** A single Opus 4.7 agent with 8 well-defined tools (search, code execution, file read/write, calculator, API call, screenshot, citation lookup, draft formatter) can execute many tasks that naive multi-agent crews can't complete cleanly — without the inter-agent message overhead, the coordination errors, and the compound cost of 3-5 model calls. Test the single-agent baseline with good tool coverage before building a multi-agent system. Anthropic's agents and tools overview is the right starting point.

**Multi-agent is justified at scale for quality, not cost.** Multi-agent systems almost always cost more than single-agent — you pay orchestrator + N workers + synthesis. The justification is quality and latency: specialized workers produce better outputs on their narrow task than generalist models, parallel execution reduces wall-clock time, and critic/editor agents catch mistakes before the final output. If cost is the primary constraint, a single high-quality model is usually cheaper. If quality on a specific component is the constraint, a specialized worker for that component is the right answer.

**The 3-worker sweet spot.** Most production multi-agent systems that ship have 2-4 workers. Beyond 4 workers, inter-agent coordination overhead and error propagation risks outweigh quality gains for all but the most structured pipelines. The sweet spot for most content/research tasks is a 3-worker crew: specialist (researcher/coder/analyst) → generalist (drafter/assembler) → critic (editor/reviewer). This covers 80% of production use cases at 3x the single-worker cost — often the right trade. Source: CrewAI use cases, LangGraph examples.

**AutoGen's nested conversation pattern.** Microsoft's AutoGen uses a nested conversation pattern where agents can spawn subconversations and report back to a parent agent. The cost structure mirrors LangGraph supervisor: supervisor reads state, routes to workers, collects results. AutoGen's strength is in code execution agents with persistent workspace — the nested conversation allows a coder agent to iteratively debug with a critic agent as a subloop while the parent waits. Cost modeling follows the same orchestrator + N workers + synthesis formula; the key variable is how many iterations the nested loops take. Budget for tail cases where nested loops don't converge and add per-task maximum turn limits to prevent runaway costs.
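A per-task turn and spend cap can be sketched independently of any framework. This does not use the AutoGen API; `run_nested_loop` and the `step` callback (which returns a done flag and the cost of that turn) are hypothetical names for illustration.

```python
def run_nested_loop(step, max_turns=6, max_cost=0.25):
    """Run a coder/critic style subloop until it converges or a cap is hit.

    `step(turn)` must return (done, cost_in_dollars) for that turn.
    """
    spent = 0.0
    turn = 0
    for turn in range(1, max_turns + 1):
        done, cost = step(turn)
        spent += cost
        if done:
            return {"converged": True, "turns": turn, "cost": spent}
        if spent >= max_cost:
            break  # budget cap reached before the turn cap
    return {"converged": False, "turns": turn, "cost": spent}


# A loop that never converges stops at the turn cap.
print(run_nested_loop(lambda turn: (False, 0.01)))

# An expensive loop stops at the budget cap first.
print(run_nested_loop(lambda turn: (False, 0.10)))

# A loop that converges on turn 3 exits early.
print(run_nested_loop(lambda turn: (turn == 3, 0.02)))
```

Returning a `converged` flag lets the parent agent decide whether to accept a partial result or escalate, instead of silently paying for more iterations.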

**Decision rule.** Choose multi-agent when: (1) the task has genuinely independent stages that benefit from separate context/tooling, (2) quality from specialization outweighs the 2-3x cost multiplier vs single-agent, (3) latency from sequential single-agent loops is unacceptable and parallel workers help. Choose single-agent when: the task can be accomplished with good tool design, cost is the binding constraint, or the 'agents' in a multi-agent design would have identical prompts and tool sets.


Sourcing and methodology

**Model pricing sourced June 2026.** Sonnet 4.6 $3/$15, Opus 4.7 $15/$75, cache reads $0.30/$1.50 per 1M from Anthropic pricing. GPT-5.5 $5/$25, GPT-5.4 $2.50/$15 per 1M from OpenAI pricing. o4-mini $1.10/$4.40 per 1M including reasoning tokens. All sourced from official pricing pages on the date of publication.

**Token model.** Each worker loop modeled as a 3-turn ReAct agent with 5K average input per turn and 400 output tokens per turn (15K total input, 1.2K total output per worker). Orchestrator modeled at 600 input + 150 output per routing call, 4 routing calls per 3-worker task. Synthesis: all worker outputs (3 × 800 = 2.4K) + task description (1K) = 3.4K input, 800 output. This represents a mid-weight research-and-write task — heavier tasks (multi-hour research pipelines, complex code generation) will have proportionally larger worker loop costs.

**Framework version references.** CrewAI docs at docs.crewai.com — current stable as of June 2026. LangGraph at langchain-ai.github.io/langgraph — v0.3 stable with supervisor pattern documented in multi-agent concepts. AutoGen at microsoft.github.io/autogen/stable — v0.5 stable. Framework overhead estimates are based on documented token patterns in each framework's examples and our own profiling; actual overhead varies by task type and framework configuration.

**This content is updated quarterly.** LLM pricing changes without notice; multi-agent framework APIs evolve quickly. Verify pricing before procurement decisions using the linked official pages. For custom token profiles, use our agent loop cost calculator per-worker and sum the components.

How to estimate your multi-agent task cost in 5 steps

  1. Map your agent graph: orchestrator, workers, synthesis, message flows

    Draw the data flow: what goes into each agent (orchestrator instructions + prior context), what comes out (worker results fed to next agent), how the final output is assembled. This diagram directly maps to the cost formula components. Count the number of model calls per task: orchestrator routing calls + worker loop turns + synthesis call. Each is a billable LLM invocation.

  2. Profile each agent's token footprint separately

    Log input + output tokens for each agent role in a test run. The orchestrator, each worker type, and the synthesis step have different token profiles. Don't average them: orchestrator calls are input-heavy and output-light (routing decisions); workers are loop-heavy; synthesis is input-heavy (all worker results) and output-medium. Model each separately, then sum for per-task cost.

  3. Add inter-agent message overhead

    Measure the average size of each inter-agent message (worker output passed to next worker or supervisor). In a sequential pipeline, each downstream worker's input includes all prior worker outputs, so track how fast this grows. Apply the truncation fix: add explicit output length limits to every worker's task description. Target 300-500 tokens per worker output for research/analysis tasks.

  4. Choose orchestrator model based on routing complexity, not status

    Run the orchestrator role on the cheapest model that produces correct routing decisions for your task type. For simple sequential tasks (always Researcher → Writer → Editor), a GPT-5.4-mini or Haiku 4.5 orchestrator at $0.001/routing call is fine. For complex dynamic tasks (routing depends on prior results, tasks can loop or branch), upgrade to GPT-5.4, Sonnet 4.6, or o4-mini. Don't default to Opus as orchestrator: you're paying for reasoning you mostly don't need at the routing step.

  5. Benchmark single-agent first, then multi-agent

    Before building a multi-agent system, run the same task as a single-agent loop with good tool coverage. Measure quality and cost. Multi-agent should only win when (a) the task has genuinely parallel stages, (b) specialization improves quality materially on at least one stage, and (c) the 2-3x cost premium is justified by the quality gain. If single-agent quality is acceptable, single-agent is the right answer.
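
Step 3 is the one most easily underestimated, so here is a rough sketch of how downstream input grows in a sequential pipeline. The function name and the token counts in the example (1,000 base input, 2,000-token worker outputs, a 400-token cap) are illustrative assumptions, not measurements from this article.

```python
from itertools import accumulate
from typing import List, Optional


def downstream_inputs(base_input: int, worker_outputs: List[int],
                      output_cap: Optional[int] = None) -> List[int]:
    """Input tokens seen by each worker in a sequential pipeline.

    Worker k receives its own base input plus the outputs of all earlier
    workers. output_cap, if set, models an explicit per-worker output limit.
    """
    if not worker_outputs:
        return []
    if output_cap is not None:
        worker_outputs = [min(o, output_cap) for o in worker_outputs]
    # Tokens carried into worker k = sum of outputs from workers before k.
    carried = [0] + list(accumulate(worker_outputs))[:-1]
    return [base_input + c for c in carried]


uncapped = downstream_inputs(1_000, [2_000, 2_000, 2_000])
capped = downstream_inputs(1_000, [2_000, 2_000, 2_000], output_cap=400)
print(uncapped, sum(uncapped))  # [1000, 3000, 5000] 9000
print(capped, sum(capped))      # [1000, 1400, 1800] 4200
```

In this hypothetical run, capping each worker's output roughly halves the total input tokens billed across the pipeline, and the saving grows with every additional downstream worker.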

Frequently Asked Questions

How much does a 3-worker CrewAI task cost on Claude Sonnet 4.6?

For a researcher-writer-editor crew on Sonnet 4.6 with sequential process: approximately $0.149/task uncached, $0.13-$0.14 with caching on stable role descriptions. Breakdown: orchestrator manager $0.014, researcher $0.063, writer $0.031, editor $0.021, synthesis $0.021. At 10,000 tasks/month: ~$1,490. Source: Anthropic pricing $3/$15 per 1M Sonnet 4.6, docs.crewai.com architecture docs.

Is LangGraph cheaper than CrewAI for multi-agent tasks?

LangGraph supervisor is typically 10-20% cheaper than CrewAI sequential for equivalent tasks, because the structured state graph approach reduces inter-agent message overhead vs CrewAI's natural-language delegation chain. LangGraph supervisors route with compact JSON decisions; CrewAI managers write natural-language plans. For parallel tasks, the gap is smaller; for sequential tasks with large worker outputs, LangGraph's structured state saves 1,000-2,000 tokens per downstream worker. Source: LangGraph multi-agent concepts (https://langchain-ai.github.io/langgraph/concepts/multi_agent/).

How much does the orchestrator cost in a multi-agent system?

For a GPT-5.4 orchestrator making 4 routing decisions per task (600 input + 150 output per call): 4 × ($2.50 × 0.0006 + $15 × 0.00015) = 4 × ($0.0015 + $0.00225) = $0.015 per task. For o4-mini with reasoning overhead (5K reasoning + 200 output per call): $0.023 per call. Orchestrator typically accounts for 11-22% of total multi-agent task cost, decreasing as worker count increases. Don't use Opus as orchestrator unless the routing decisions require frontier reasoning.

What is the cost difference between 1, 3, and 5 workers?

On Sonnet 4.6 (LangGraph supervisor), our reference research task: 1 worker ≈ $0.15/task, 3 workers ≈ $0.42/task, 5 workers ≈ $0.70/task. Cost scales approximately linearly with worker count (1.0x, 2.8x, 4.7x) because orchestrator and synthesis cost stay relatively fixed. Quality gains from 3→5 workers are diminishing for most tasks. The 3-worker point is the cost-quality sweet spot for research and content tasks.
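
One way to read the three data points above is as a fixed overhead plus a per-worker cost. The sketch below fits that line through the 1-worker and 5-worker figures and checks it against the 3-worker figure. The fixed/per-worker split is an assumption implied by the linear-scaling claim, not a breakdown published here.

```python
# Reference costs per task (USD) by worker count, from the answer above.
reference = {1: 0.15, 3: 0.42, 5: 0.70}

# Slope between the 1-worker and 5-worker points = marginal cost per worker.
per_worker = (reference[5] - reference[1]) / (5 - 1)
# Whatever is left at 1 worker is treated as fixed orchestrator + synthesis cost.
fixed = reference[1] - per_worker


def estimate(n_workers: int) -> float:
    """Linear estimate of per-task cost for a given worker count."""
    return fixed + n_workers * per_worker


print(f"per worker: ${per_worker:.4f}, fixed: ${fixed:.4f}")
print(f"3 workers: ${estimate(3):.3f} (reference ${reference[3]:.2f})")
```

Under this reading each added worker costs about $0.14 and the fixed share is small, which is why the orchestrator's share of total cost shrinks as worker count grows.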

Should I use a reasoning model like o4-mini as orchestrator?

Yes, if routing complexity justifies it. o4-mini (approximately $1.10/$4.40 per 1M input/output, with reasoning tokens billed as output) costs about $0.023 per orchestration call, roughly 6x GPT-5.4's $0.00375 per routing call (600 input + 150 output). The premium pays when routing mistakes trigger full worker re-runs at $0.063+ each and o4-mini reduces routing errors by more than 30%. For clearly structured sequential pipelines where routing is deterministic, o4-mini adds no value. Source: OpenAI pricing (https://openai.com/api/pricing/).
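
The break-even condition can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: the per-call costs are recomputed from the token profiles in the orchestrator answer (600 input + 150 output for GPT-5.4; 600 input + 5K reasoning + 200 output for o4-mini), and "reduces routing errors" is interpreted as an absolute drop in the per-call probability of a misroute that triggers one worker re-run.

```python
# USD per 1M tokens as (input, output).
GPT_5_4 = (2.50, 15.00)
O4_MINI = (1.10, 4.40)

WORKER_RERUN_COST = 0.063  # one full worker re-run, from the token model


def call_cost(input_tokens: int, output_tokens: int, price: tuple) -> float:
    """USD cost of a single LLM call."""
    in_price, out_price = price
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


cheap_call = call_cost(600, 150, GPT_5_4)               # plain routing call
reasoning_call = call_cost(600, 5_000 + 200, O4_MINI)   # reasoning billed as output
premium = reasoning_call - cheap_call

# Drop in misroute probability per call at which the premium pays for itself.
break_even = premium / WORKER_RERUN_COST


def premium_pays(error_reduction: float) -> bool:
    """True if avoided re-run cost exceeds the per-call premium."""
    return error_reduction * WORKER_RERUN_COST > premium


print(f"premium per call: ${premium:.5f}, break-even reduction: {break_even:.1%}")
```

With these inputs the break-even sits at roughly 31%, in line with the "more than 30%" threshold above. Costlier re-runs (for example, several workers repeating) lower the threshold proportionally.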

How do I reduce inter-agent message overhead in multi-agent systems?

Three tactics: (1) Use LangGraph's structured state graph with typed fields instead of CrewAI sequential message passing — downstream workers read only the specific fields they need, not full prior conversation history. (2) Set explicit output length limits in every worker's task description (target 300-500 tokens per worker output). (3) Post-process tool results before passing them downstream — extract the 200-token relevant facts, not the 2,000-token raw API response. Together these reduce downstream input tokens by 50-70% in typical pipelines.
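
Tactics (2) and (3) can be approximated with a small post-processing helper that keeps only named fields from a raw tool result and enforces a token budget before the text is handed downstream. This is an illustrative sketch: the helper names are hypothetical, and whitespace-separated words stand in for tokens, which real tokenizers will count differently.

```python
from typing import Dict, List


def approx_tokens(text: str) -> int:
    """Crude token estimate: whitespace-separated words (an assumption)."""
    return len(text.split())


def truncate_to_budget(text: str, max_tokens: int) -> str:
    """Keep at most max_tokens words from the start of the text."""
    words = text.split()
    if len(words) <= max_tokens:
        return text
    return " ".join(words[:max_tokens])


def compact_tool_result(raw: Dict[str, str], keep_fields: List[str],
                        max_tokens: int = 200) -> str:
    """Extract only the needed fields, then enforce the output budget."""
    kept = {k: raw[k] for k in keep_fields if k in raw}
    text = "; ".join(f"{k}: {v}" for k, v in kept.items())
    return truncate_to_budget(text, max_tokens)


# Hypothetical raw API response with a large field the next worker never needs.
raw_result = {"title": "Pricing", "summary": "word " * 50, "html": "x " * 2000}
compact = compact_tool_result(raw_result, keep_fields=["title", "summary"],
                              max_tokens=20)
print(approx_tokens(compact))  # 20
```

Blind truncation can cut a fact in half, so in practice the field selection does most of the work and the budget acts as a safety net.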

When is single-agent better than multi-agent for cost?

Almost always, from a pure cost perspective, when both run on the same model: multi-agent typically carries a 2-3x cost premium over a single-agent loop. The picture changes across model tiers. A single Opus 4.7 agent loop (5 turns, $0.683 uncached) costs roughly 4.6x a 3-worker Sonnet 4.6 crew ($0.149), though it often produces better quality on complex reasoning tasks. Multi-agent's justification is specialization (different tools/prompts for different roles) and parallelism (reduced wall-clock time). If cost is the binding constraint and task quality is acceptable with a single model, use single-agent. Test single-agent with good tools before building a multi-agent system.

How does AutoGen's cost compare to CrewAI and LangGraph?

AutoGen's nested conversation pattern follows the same orchestrator + N workers + synthesis cost formula. The key difference is that AutoGen's subconversations can run iteratively (a coder-critic loop within a parent conversation), which can compound costs if loops don't converge. Add per-task max turn limits to prevent runaway subconversation costs. For single-task comparisons without iterative subloops, AutoGen cost is comparable to LangGraph supervisor within ±15%. Source: Microsoft AutoGen docs (https://microsoft.github.io/autogen/stable/).
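
The effect of a max turn limit on worst-case cost can be sketched without any framework. The loop below is generic Python, not AutoGen's API: `turn` is a hypothetical callback that runs one coder-critic exchange and reports whether it converged and what it cost, and the $0.02 per-turn figure is purely illustrative.

```python
from typing import Callable, Tuple


def run_with_turn_cap(turn: Callable[[int], Tuple[bool, float]],
                      max_turns: int) -> Tuple[bool, int, float]:
    """Run an iterative sub-loop until it converges or hits max_turns.

    Returns (converged, turns_used, total_cost_usd).
    """
    total = 0.0
    for i in range(1, max_turns + 1):
        converged, cost = turn(i)
        total += cost
        if converged:
            return True, i, total
    return False, max_turns, total


# Hypothetical sub-loops at $0.02 per turn.
def converges_on_third(i: int) -> Tuple[bool, float]:
    return i == 3, 0.02


def never_converges(i: int) -> Tuple[bool, float]:
    return False, 0.02


print(run_with_turn_cap(converges_on_third, max_turns=6))
print(run_with_turn_cap(never_converges, max_turns=6))
```

The cap turns an unbounded cost into a known ceiling of `max_turns` times the per-turn cost, which is the number to carry into the per-task estimate for any iterative subconversation.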
