By The DDH Team · Digital Dashboard Hub

Build a 3-Agent CrewAI Crew (2026): Researcher, Writer, Editor

CrewAI is one of the most accessible multi-agent frameworks in 2026 for teams that want role-based agent systems without building a state graph from scratch. Instead of nodes and edges, you define agents by their roles and goals, tasks by their descriptions and expected outputs, and a crew that chains tasks together. The framework handles delegation, task sequencing, and inter-agent communication; you provide the domain knowledge in the form of agent backstories and task descriptions. For cost modeling of what this crew costs to run, see our multi-agent cost per task calculator.

The crew we build here — a Researcher who gathers information, a Writer who drafts content, and an Editor who refines it — is the canonical CrewAI starter pattern and also one of the most useful production patterns for content teams. It maps directly to real workflows: competitive analysis reports, blog post pipelines, research briefs, customer case studies. By the end of this tutorial you have a crew that takes a topic as input and returns a polished 500-word brief as output, fully automated. Source: CrewAI documentation.

This tutorial covers: Step 1 — install and configure CrewAI with your LLM. Step 2 — define the three agents with roles, goals, and backstories. Step 3 — define tasks with descriptions, expected outputs, and tool assignments. Step 4 — assemble the crew with a manager LLM and process type. Step 5 — run the crew, capture output, and handle errors. Production cost math and error handling throughout. For comparison with the LangGraph approach, see our build a ReAct agent with LangGraph tutorial.

CrewAI core concepts reference, June 2026

| Feature | Concept | Key params | Purpose / notes |
|---|---|---|---|
| `Agent` | Individual role-based agent | `role`, `goal`, `backstory`, `llm`, `tools`, `max_iter` | The worker unit |
| `Task` | Discrete unit of work | `description`, `expected_output`, `agent`, `tools`, `context` | Assigned to one agent |
| `Crew` | Orchestrates agents and tasks | `agents`, `tasks`, `process`, `manager_llm`, `verbose` | The coordinator |
| `Process.sequential` | Tasks run in order | Output of task N feeds task N+1 | Most common, simple |
| `Process.hierarchical` | Manager delegates dynamically | Requires `manager_llm` | For complex adaptive flows |
| `manager_llm` | Model for the manager agent | Any ChatModel | Cheaper model recommended |
| `@tool` decorator | Defines a tool for an agent | name, func, description | LangChain tool compatible |
| `Task.context` | Passes prior task output to this task | List of prior `Task` objects | Explicit context injection |
| `max_iter` | Max tool-loop iterations per agent | Default 20 | Set lower for cost control |
| `memory` | Cross-run agent memory | `True`/`False` + embedder config | Uses ChromaDB by default |
| `output_pydantic` | Structured task output | Pydantic model class | Enforces output schema |
| `kickoff_for_each` | Runs the crew over a list of inputs | `inputs: List[dict]` | Batch task execution |

Sources, fetched 2026-06-21: CrewAI documentation (https://docs.crewai.com/). CrewAI version 0.95+ stable as of June 2026. Breaking changes from 0.8x: LLM configuration now uses `llm=` parameter directly (not `llm_config`), tools now use standard LangChain `@tool` decorator instead of CrewAI-specific tool classes, and `Process.hierarchical` now requires `manager_llm` to be set explicitly. `memory=True` requires an embedding model — defaults to OpenAI embeddings if `OPENAI_API_KEY` is set, otherwise requires explicit `embedder` config.

Step 1: install and configure CrewAI

**Install the package.** CrewAI ships as a standalone package with optional tool integrations: `pip install crewai crewai-tools`. `crewai-tools` includes pre-built tools for web search (Serper, Tavily), web scraping, PDF reading, and code execution — most production crews use at least one of these. For a minimal install without the tools bundle: `pip install crewai`. Source: CrewAI installation docs.

**Configure your LLM.** CrewAI accepts any LangChain-compatible chat model. For Claude Sonnet 4.6: `from langchain_anthropic import ChatAnthropic; llm = ChatAnthropic(model='claude-sonnet-4-6', temperature=0.1, max_tokens=1500)`. For GPT-5.5: `from langchain_openai import ChatOpenAI; llm = ChatOpenAI(model='gpt-5.5', temperature=0.1)`. CrewAI also supports setting a default LLM via environment variable (`OPENAI_MODEL_NAME`) for OpenAI models, but explicit instantiation gives you more control over max_tokens and temperature per agent.

**Configure a cheaper manager LLM.** The manager agent in `Process.hierarchical` handles delegation and coordination — it doesn't need frontier model reasoning. Set a cheaper model for it: `manager_llm = ChatAnthropic(model='claude-haiku-4-5') # or ChatOpenAI(model='gpt-5.4-mini')`. Using Haiku 4.5 or GPT-5.4-mini for the manager saves $0.01-$0.02 per task on manager overhead vs using Sonnet or GPT-5.5. At 10K tasks/month, that is $100-$200/month in manager cost reduction. See pricing at Anthropic and OpenAI.

**Set environment variables.** CrewAI tools often require their own API keys: `SERPER_API_KEY` for SerperDev web search, `TAVILY_API_KEY` for Tavily search, `OPENAI_API_KEY` if using OpenAI embeddings for memory. Set these in your shell or `.env` file and load with `python-dotenv`: `from dotenv import load_dotenv; load_dotenv()`. For production deployments, use a secrets manager — never hardcode keys in source files.
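Because a missing key usually surfaces only when a tool is first called mid-run, it can help to fail fast before kickoff. A minimal standard-library sketch; `REQUIRED_KEYS`, `missing_keys`, and `require_keys` are hypothetical names, and the key list is an assumption that should match the tools your crew actually uses.

```python
import os
from typing import Iterable, Mapping

# Hypothetical list: the keys this particular crew's LLM and tools need.
REQUIRED_KEYS = ["ANTHROPIC_API_KEY", "SERPER_API_KEY"]


def missing_keys(required: Iterable[str], env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required keys that are unset or blank in `env`."""
    return [key for key in required if not env.get(key, "").strip()]


def require_keys(required: Iterable[str], env: Mapping[str, str] = os.environ) -> None:
    """Raise before kickoff instead of failing on the first tool call."""
    missing = missing_keys(required, env)
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
```

Call `load_dotenv()` first, then `require_keys(REQUIRED_KEYS)` at startup.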

**Verify the setup with a minimal crew.** `from crewai import Agent, Task, Crew; agent = Agent(role='Tester', goal='Say hello', backstory='A test agent.', llm=llm); task = Task(description='Say hello world.', expected_output='The string hello world', agent=agent); crew = Crew(agents=[agent], tasks=[task]); result = crew.kickoff()`. If the crew returns a hello-world response, your LLM integration is working. An API authentication error means a key is missing or invalid; a model-not-found error usually means the model name is misspelled. Source: CrewAI quickstart.

**CrewAI vs LangGraph: choose based on abstraction level.** CrewAI is the right choice when: (1) your agents can be cleanly described by roles and goals (Researcher, Writer, Editor), (2) you want a simpler API with less boilerplate, (3) your workflow is primarily sequential or simple hierarchical. LangGraph is better when: (1) you need fine-grained control over state structure, (2) you need non-standard routing logic (conditional loops, parallel branches), (3) you need production-grade persistence and streaming from day one. Most teams start with CrewAI for speed and migrate to LangGraph when they need the extra control. For the LangGraph equivalent, see our LangGraph tutorial.


Step 2: define the three agents

**The Researcher agent.** The researcher's job is to find and synthesize information on the given topic. Give it a specific role (not a generic one), a clear goal tied to the task's success criteria, and a backstory that shapes its output style: `from crewai import Agent; researcher = Agent(role='Market Research Analyst', goal='Find comprehensive, factual information on {topic} from credible sources. Prioritize recent data from 2025-2026.', backstory='A seasoned market research analyst with 10 years of experience synthesizing complex information from multiple sources. Known for concise, factual summaries with clear source attribution.', llm=llm, tools=[web_search_tool, calculator_tool], max_iter=5, verbose=True)`. Note `max_iter=5` — this limits the researcher to 5 tool-calling iterations, preventing runaway loops. Source: CrewAI agent configuration docs.

**The Writer agent.** The writer transforms the researcher's findings into a readable, engaging draft. It should have a different communication style in its backstory than the researcher: `writer = Agent(role='Content Writer', goal='Write a compelling, accurate 500-word brief on {topic} based on the research provided.', backstory='An experienced content writer specializing in business and technology topics. Known for clear explanations of complex subjects, strong narrative structure, and a professional but accessible tone. Never fabricates facts — always grounds writing in provided research.', llm=llm, tools=[], max_iter=3, verbose=True)`. No tools needed for the writer — it works from the researcher's output. Zero tools means zero tool schema overhead on the writer's calls. Keep `max_iter=3` since the writer doesn't need to loop extensively.

**The Editor agent.** The editor reviews the draft for clarity, accuracy, structure, and tone. Give it a critical, detail-oriented backstory: `editor = Agent(role='Senior Editor', goal='Review and refine the draft brief on {topic} for clarity, accuracy, and professional tone. Fix any errors, improve weak sentences, and ensure consistent formatting.', backstory='A meticulous senior editor with experience at major publications. Provides specific, actionable feedback and makes targeted edits rather than wholesale rewrites. Enforces consistency in terminology, citation style, and paragraph length.', llm=llm, tools=[], max_iter=2, verbose=True)`. Two iterations is enough for an editor — one pass for review, one for final polish.

**Why backstories matter for output quality.** The backstory is included in each agent's system prompt, alongside its role and goal. It shapes the model's output style, level of detail, and failure modes. A researcher with a backstory mentioning 'source attribution' is less likely to fabricate citations than one without, and a writer with 'never fabricates facts' in its backstory tends to stay closer to the sourced content. A good backstory is 2-4 sentences, specific to the role, and anchored in concrete professional behaviors. Avoid generic backstories ('experienced professional who is helpful and accurate'); they add tokens without shaping behavior.

**Calibrate max_iter for cost control.** `max_iter` limits the number of times an agent can call tools and reason before it is forced to return an answer. The default is 20, far too high for most production agents. A researcher with `max_iter=5` and 3 tool calls per iteration has a maximum of 15 tool invocations per task. If each researcher turn costs about $0.063 on Sonnet 4.6, that caps researcher cost at about $0.315/task; without the limit, a confused researcher could loop 20 times, or about $1.26/task. Set `max_iter` to the lowest value that produces good results on 90% of inputs; letting the remaining 10% hit the limit gracefully is acceptable. Source: CrewAI max_iter docs.
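The cap arithmetic is simple enough to keep next to your agent configuration. A sketch assuming a flat per-turn cost; in practice later turns carry more context and cost more, so treat the result as a planning estimate rather than a hard ceiling. `max_agent_cost` is a hypothetical helper, not a CrewAI API.

```python
def max_agent_cost(max_iter: int, cost_per_turn: float) -> float:
    """Estimated worst-case spend for one agent on one task.

    Assumes every iteration costs the same; real turns grow as context
    accumulates, so this is a planning estimate, not a guarantee.
    """
    if max_iter < 1:
        raise ValueError("max_iter must be at least 1")
    return max_iter * cost_per_turn


# Using the $0.063 per-turn figure from the paragraph above.
capped = max_agent_cost(5, 0.063)     # max_iter=5
uncapped = max_agent_cost(20, 0.063)  # CrewAI's default of 20
print(f"capped ${capped:.3f}, default ${uncapped:.2f}")  # prints: capped $0.315, default $1.26
```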

**Agent memory configuration.** Enable cross-run memory with `memory=True` for agents that should remember past tasks: `researcher = Agent(..., memory=True)`. CrewAI stores agent memories in ChromaDB (in-process vector store by default). Memories are retrieved by semantic similarity at the start of each task. For a content team crew that writes about recurring topics (competitor tracking, industry news), memory allows the researcher to recall prior findings and build on them. Memory adds ~200-500 tokens of retrieved context per task — small cost, high value for recurring workflows. Disable memory for stateless batch processing tasks where per-task isolation is required.


Step 3: define tasks and chain them

**Task definitions are the contract between agents.** Each task specifies what to do (`description`), what good output looks like (`expected_output`), and who does it (`agent`). The `expected_output` is critical — it shapes the agent's final output format and is checked by the manager in hierarchical mode. Be specific: 'A 300-word factual summary with bullet points for key statistics and inline citations for each claim' is better than 'A research summary'. Source: CrewAI task documentation.

**Research task.** `from crewai import Task; research_task = Task(description='Research {topic} thoroughly. Find: (1) current market size and growth rate with sources, (2) top 3 competitors with brief profiles, (3) key trends from 2025-2026, (4) notable data points and statistics. Cite all facts with source and date.', expected_output='A structured research brief with sections for market overview, competitor profiles, and key trends. Each fact must have an inline citation (source, date). Maximum 600 words.', agent=researcher, tools=[web_search_tool])`. Note `tools=[web_search_tool]` at the task level — this is optional but useful for overriding the agent's default tools for a specific task.

**Writing task with context injection.** `write_task = Task(description='Write a 500-word professional brief on {topic} for a C-suite audience. Use the research provided. Structure: executive summary (100 words), market landscape (200 words), competitive context (150 words), outlook (50 words). No new facts beyond the research.', expected_output='A polished 500-word brief with the four specified sections, professional tone, and no fabricated information.', agent=writer, context=[research_task])`. The `context=[research_task]` parameter is essential — it tells CrewAI to inject the research task's output into the writing task's input automatically. Without this, the writer starts with only the topic string, not the researcher's findings.

**Editing task.** `edit_task = Task(description='Review and refine the draft brief on {topic}. Check: factual accuracy vs the research, sentence clarity (aim for Flesch-Kincaid grade 12), consistent terminology, professional tone. Make targeted edits — preserve the structure.', expected_output='A final polished brief ready for publication. Same structure as the draft, with errors corrected, weak sentences rewritten, and terminology standardized. Note any facts that could not be verified against the research.', agent=editor, context=[research_task, write_task])`. The editor gets both research and draft as context — essential for fact-checking the draft against the source material.

**Structured output with Pydantic.** For tasks that need structured, parseable output (not just prose), define a Pydantic model and assign it to `output_pydantic`: `from pydantic import BaseModel; class BriefOutput(BaseModel): title: str; executive_summary: str; sections: list[str]; word_count: int; edit_task_structured = Task(..., output_pydantic=BriefOutput)`. The task output is then `result.pydantic` — a validated BriefOutput instance — instead of raw string. This is the cleanest pattern for integrating CrewAI output into downstream systems (CMS APIs, databases, dashboards) where you need structured data, not prose.

**Batch processing with kickoff_for_each.** To run the crew over a list of topics: `inputs_list = [{'topic': t} for t in topics]; results = crew.kickoff_for_each(inputs=inputs_list)`. Each item in `inputs_list` is a separate crew run with its own agent memory scope. For batches under 10 topics, running synchronously is fine. For larger batches, use `crew.kickoff_for_each_async()` to run crews in parallel: `import asyncio; results = asyncio.run(crew.kickoff_for_each_async(inputs=inputs_list, max_workers=5))`. `max_workers=5` runs up to 5 crews concurrently; tune it to your API rate limits. Five concurrent Sonnet 4.6 crews issue at least 5 × 3 = 15 model calls per round of runs (one per agent, more when the researcher loops through tools), so check your Anthropic rate limit before setting a high `max_workers`.
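If the `kickoff_for_each_async` in your CrewAI version does not accept a concurrency limit, you can bound concurrency yourself with `asyncio.Semaphore`. A framework-agnostic sketch: `run_bounded` is a hypothetical helper, `run_one` stands in for an async call such as `crew.kickoff_async(inputs=item)`, and `fake_crew_run` exists only so the example runs on its own. If you call `kickoff_async` in parallel yourself, consider building a fresh crew per run rather than sharing one object.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def run_bounded(
    inputs: list[dict],
    run_one: Callable[[dict], Awaitable[Any]],
    max_concurrency: int = 5,
) -> list[Any]:
    """Run `run_one` over every input with at most `max_concurrency` in flight.

    Results come back in the same order as `inputs`.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(item: dict) -> Any:
        async with semaphore:  # later items wait here once the limit is reached
            return await run_one(item)

    return await asyncio.gather(*(guarded(item) for item in inputs))


# Hypothetical stand-in for crew.kickoff_async(inputs=item); tracks peak concurrency.
in_flight = 0
peak = 0


async def fake_crew_run(item: dict) -> str:
    global in_flight, peak
    in_flight += 1
    peak = max(peak, in_flight)
    await asyncio.sleep(0.01)  # simulate model latency
    in_flight -= 1
    return f"brief on {item['topic']}"


topics = [f"topic {i}" for i in range(12)]
results = asyncio.run(run_bounded([{"topic": t} for t in topics], fake_crew_run, max_concurrency=5))
```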


Step 4: assemble the crew and configure the manager

**Assemble the crew.** `from crewai import Crew, Process; crew = Crew(agents=[researcher, writer, editor], tasks=[research_task, write_task, edit_task], process=Process.sequential, verbose=True, manager_llm=manager_llm)`. For sequential process, tasks run in the order listed in `tasks=[]`, and `manager_llm` is not used (it only takes effect with `Process.hierarchical`), so you can keep it for a later switch or omit it. The output of each task is automatically available to downstream tasks that list it in `context=[]`. With `verbose=True`, CrewAI logs each agent's thoughts, tool calls, and outputs to stdout. That is invaluable during development; disable it in production to reduce log noise and the small overhead of formatting verbose output. Source: CrewAI crew configuration.

**Sequential vs hierarchical process.** `Process.sequential` runs tasks in a fixed order: research → write → edit. It's predictable, debuggable, and appropriate for pipelines where each step logically depends on the prior. `Process.hierarchical` gives the manager LLM dynamic control — it can reorder tasks, reassign tasks to different agents, or request additional work. Hierarchical is powerful for complex adaptive workflows but costs more (extra manager LLM calls for each routing decision) and is harder to debug (the manager's delegation decisions are not always transparent). For the 3-agent research/write/edit pipeline, sequential is the right choice — the order is fixed by the nature of the task.

**Manager LLM choice.** The manager in hierarchical mode is responsible for task delegation and quality checking. Use a cheaper model: `manager_llm = ChatAnthropic(model='claude-haiku-4-5')` or `ChatOpenAI(model='gpt-5.4-mini')`. The manager's job is coordination, not content generation: it needs to parse task descriptions, understand agent capabilities, and route correctly. A Haiku 4.5 manager handles this reliably for well-structured crews. Only upgrade to Sonnet or GPT-5.5 if the manager is making wrong delegation decisions (assigning the wrong agent to a task or failing to identify when work needs to be redone).

**Rate limits and cost guardrails.** CrewAI's `max_rpm` setting caps request rate, not total spend: `from crewai import Crew; crew = Crew(..., max_rpm=5)`. `max_rpm=5` limits the crew to 5 requests per minute across all agents, which keeps concurrent agent calls from bursting your API rate limit and bounds spend per minute. At 5 RPM and an average of 2,000 input + 500 output tokens per call, the maximum token rate is 5 × 2,500 = 12,500 tokens/minute, well within typical API rate limits. To cap total cost, add a budget check after each task: if accumulated cost exceeds the budget, stop the crew. CrewAI doesn't have built-in budget management, so implement it via a callback or by tracking `usage_metadata` from the underlying LLM calls.
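One way to implement the budget check is a small guard object that you update with each task's cost and that raises once the run goes over budget. A minimal sketch; `BudgetGuard` and `BudgetExceeded` are hypothetical names, and the per-task dollar figures you pass in would come from token counts multiplied by your model's rates.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a crew run has spent more than its budget."""


class BudgetGuard:
    """Accumulate per-step spend and stop the run once a USD budget is exceeded."""

    def __init__(self, budget_usd: float) -> None:
        self.budget_usd = budget_usd
        self.spent_usd = 0.0
        self.steps: list[tuple[str, float]] = []  # (step name, cost) history for logging

    def record(self, step: str, cost_usd: float) -> None:
        self.steps.append((step, cost_usd))
        self.spent_usd += cost_usd
        if self.spent_usd > self.budget_usd:
            raise BudgetExceeded(
                f"spent ${self.spent_usd:.3f} after '{step}', budget ${self.budget_usd:.2f}"
            )

    @property
    def remaining_usd(self) -> float:
        return max(0.0, self.budget_usd - self.spent_usd)


guard = BudgetGuard(budget_usd=0.20)
guard.record("research", 0.063)
guard.record("write", 0.031)
```

Raising a dedicated exception type lets the caller distinguish a deliberate budget stop from an API or tool failure.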

**Crew planning with AI-based task allocation.** Enable `planning=True` in the crew to add an initial planning step where the manager LLM analyzes all tasks and creates an optimized execution plan before the crew starts: `crew = Crew(..., planning=True, planning_llm=manager_llm)`. Planning adds one extra LLM call upfront but can improve output quality for complex multi-task pipelines by giving agents clearer, better-sequenced instructions. For the simple 3-task research/write/edit pipeline, planning overhead outweighs benefits — disable it. Enable it for crews with 5+ tasks, complex dependencies, or dynamic task sets. Source: CrewAI planning docs.

**Output capture and persistence.** Save crew outputs to disk or database: `result = crew.kickoff(inputs={'topic': topic}); output_text = result.raw; structured_output = result.pydantic if hasattr(result, 'pydantic') else None; import json; with open(f'output_{topic_slug}.json', 'w') as f: json.dump({'topic': topic, 'output': output_text, 'task_outputs': [t.output.raw for t in crew.tasks]}, f)`. Capture each task's individual output (`task.output.raw`) in addition to the final crew output (`result.raw`) — the intermediate outputs are valuable for debugging and for downstream systems that need just the research or just the draft.
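The snippet above leaves `topic_slug` undefined, and a `with` block cannot sit inside a one-line statement chain. A standard-library sketch of the same persistence step; `slugify` and `save_run` are hypothetical helpers, and the record layout simply mirrors the JSON keys used above.

```python
import json
import re
from pathlib import Path


def slugify(text: str, max_len: int = 60) -> str:
    """Lowercase, collapse runs of non-alphanumerics into '-', trim the ends."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return slug[:max_len].rstrip("-") or "untitled"


def save_run(topic: str, final_output: str, task_outputs: list[str], out_dir: str = ".") -> Path:
    """Write one crew run to output_<slug>.json and return the file path."""
    path = Path(out_dir) / f"output_{slugify(topic)}.json"
    record = {"topic": topic, "output": final_output, "task_outputs": task_outputs}
    path.write_text(json.dumps(record, indent=2, ensure_ascii=False), encoding="utf-8")
    return path
```

With a finished crew, the call would look like `save_run(topic, result.raw, [t.output.raw for t in crew.tasks])`.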


Step 5: run the crew and handle errors

**Basic kickoff.** `result = crew.kickoff(inputs={'topic': 'The impact of LLM agents on knowledge work in 2026'})`. `inputs` is a dict of template variables used in your task descriptions and agent goals (`{topic}` in our definitions above). The kickoff is synchronous and blocking: it runs until all tasks complete or an error occurs. The return value is a `CrewOutput` object with `.raw` (string output), `.pydantic` (set when `output_pydantic` is configured), and `.token_usage` (token counts for cost tracking; the crew exposes the same totals as `crew.usage_metrics`). Source: CrewAI kickoff docs.

**Async kickoff for web applications.** `result = await crew.kickoff_async(inputs={'topic': topic})`. Use the async variant in FastAPI or any async web framework to avoid blocking the event loop. If the crew takes more than 5-10 seconds, run it as a background task instead: generate a task ID, schedule the run, and return the ID immediately so the client can poll for completion: `from fastapi import BackgroundTasks; task_id = str(uuid.uuid4()); background_tasks.add_task(run_crew, task_id, topic); return {'task_id': task_id}`, where `run_crew` is your own function that executes the crew and stores its result. Store results in a database with the task ID as the key; the polling endpoint reads from the database.
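The submit-then-poll pattern does not depend on FastAPI. A minimal in-memory sketch with `asyncio`; `JobStore` and `fake_kickoff` are hypothetical, the dict stands in for the database recommended above, and in a real service `fake_kickoff` would be replaced by a coroutine that awaits `crew.kickoff_async(inputs={'topic': topic})` and returns `.raw`.

```python
import asyncio
import uuid
from typing import Any, Awaitable, Callable


class JobStore:
    """In-memory job table keyed by task ID (use a database in production)."""

    def __init__(self) -> None:
        self.jobs: dict[str, dict[str, Any]] = {}
        self._tasks: set[asyncio.Task] = set()  # hold references so tasks aren't garbage-collected

    def submit(self, run: Callable[[str], Awaitable[str]], topic: str) -> str:
        """Start `run(topic)` in the background and return a task ID immediately."""
        task_id = uuid.uuid4().hex
        self.jobs[task_id] = {"status": "running", "result": None, "error": None}
        task = asyncio.create_task(self._run(task_id, run, topic))
        self._tasks.add(task)
        task.add_done_callback(self._tasks.discard)
        return task_id

    async def _run(self, task_id: str, run: Callable[[str], Awaitable[str]], topic: str) -> None:
        try:
            self.jobs[task_id].update(status="done", result=await run(topic))
        except Exception as exc:  # record the failure so the poller can report it
            self.jobs[task_id].update(status="failed", error=str(exc))

    def poll(self, task_id: str) -> dict[str, Any]:
        return self.jobs.get(task_id, {"status": "unknown"})


async def fake_kickoff(topic: str) -> str:
    """Hypothetical stand-in for a real crew run."""
    await asyncio.sleep(0.01)
    return f"brief on {topic}"


async def demo() -> tuple[str, dict[str, Any]]:
    store = JobStore()
    task_id = store.submit(fake_kickoff, "AI agents")
    first = store.poll(task_id)["status"]  # still running right after submit
    for _ in range(100):  # the client-side polling loop
        if store.poll(task_id)["status"] != "running":
            break
        await asyncio.sleep(0.01)
    return first, store.poll(task_id)


first_status, final_job = asyncio.run(demo())
```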

**Error handling for production.** CrewAI surfaces API and tool failures as exceptions, so wrap `crew.kickoff(inputs=inputs)` in `try`/`except`: if the error message mentions a rate limit, wait and retry; otherwise re-raise. Common failure modes: (1) API rate limit hit: retry with exponential backoff rather than a single fixed sleep; (2) agent hits `max_iter` without completing: the agent is forced to return its best answer so far, which is usually usable, so check output quality rather than expecting an exception; (3) tool failure: if `handle_errors=True` in the tool definition, the agent receives the error message and can retry or pivot; (4) context window overflow: if prior task outputs are very long and the downstream agent's full context exceeds the model's limit, truncate task outputs in the context injection.
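A sketch of the exponential-backoff retry for failure mode (1). It assumes rate-limit errors can be recognized from their message text, which is a heuristic: match on your provider's specific exception class if one is available. `retry_with_backoff` and `is_rate_limit` are hypothetical helpers; `sleep` is injectable so the logic can be tested without waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def is_rate_limit(exc: Exception) -> bool:
    """Heuristic: match on the error text (assumption; prefer specific exception types)."""
    text = str(exc).lower()
    return "rate limit" in text or "429" in text


def retry_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 2.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(); on rate-limit errors wait base_delay * 2**attempt (+ jitter) and retry.

    Any other exception, or a rate-limit error on the last attempt, is re-raised.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            if not is_rate_limit(exc) or attempt == max_attempts - 1:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 10))  # jitter spreads out concurrent retries
    raise AssertionError("unreachable: the loop always returns or raises")
```

Usage: `result = retry_with_backoff(lambda: crew.kickoff(inputs=inputs))`.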

**Quality validation.** Add a validation step after the crew completes: check that the output meets your minimum quality bar before delivering it to the user. For the research brief: `def validate_output(result: str, min_words: int = 400) -> bool: words = len(result.split()); has_sections = all(s in result for s in ['Executive Summary', 'Market', 'Competitive']); return words >= min_words and has_sections`. If validation fails, re-run only the final task: `edit_task_output = editor.execute_task(edit_task, context=f'Research:\n{research_task.output.raw}\n\nDraft:\n{write_task.output.raw}')`. The `context` argument takes a single string, so combine the research and the draft into one block. Targeted re-runs are more cost-effective than rerunning the full crew, because only the failing step needs a retry.
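A runnable variant of the validator. One design change, labeled here as such: it returns a list of problems instead of a bare boolean, so you can log why a brief failed (`not validate_output(text)` recovers the original pass/fail). Section matching is also case-insensitive in this sketch, which is slightly more lenient than the one-liner above.

```python
REQUIRED_SECTIONS = ("Executive Summary", "Market", "Competitive")


def validate_output(
    result: str,
    min_words: int = 400,
    required_sections: tuple[str, ...] = REQUIRED_SECTIONS,
) -> list[str]:
    """Return a list of problems; an empty list means the brief passes."""
    problems = []
    words = len(result.split())
    if words < min_words:
        problems.append(f"too short: {words} words < {min_words}")
    lowered = result.lower()
    for section in required_sections:
        if section.lower() not in lowered:  # case-insensitive section check
            problems.append(f"missing section: {section}")
    return problems
```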

**Observability and logging.** Add structured logging to track crew performance in production: `import logging; logging.basicConfig(level=logging.INFO); crew = Crew(..., verbose=True)`. With `verbose=True`, CrewAI prints agent thoughts, tool calls, tool results, and task completions as the crew runs (recent releases take a boolean here; older releases accepted `verbose=2` for extra detail). In production, set `verbose=False` and implement your own logging via LangSmith or a custom callback. For Langfuse integration (our preferred production eval option), see our agent eval with Langfuse tutorial; it works with CrewAI via the LangChain callback interface.

**Cost tracking per crew run.** After each kickoff, read the usage metrics: `metrics = crew.usage_metrics; print(f'Total tokens: {metrics.total_tokens}, Prompt: {metrics.prompt_tokens}, Completion: {metrics.completion_tokens}')`. Calculate cost: `cost = (metrics.prompt_tokens / 1e6 * 3.0) + (metrics.completion_tokens / 1e6 * 15.0)` (Sonnet 4.6 rates). Log this per run and aggregate to a daily/monthly dashboard. Set an alert if a single run exceeds your per-task cost budget (e.g., more than $0.50 for a 3-agent research brief suggests the agent is looping excessively). Source: Anthropic pricing, CrewAI usage metrics docs.
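The same calculation as a small reusable helper with the alert built in. A sketch: `RunCost` and `check_run` are hypothetical names, the rates are the Sonnet 4.6 figures quoted above (verify against current pricing), and the $0.50 threshold is the example budget from the paragraph.

```python
from dataclasses import dataclass
from typing import Optional

# Sonnet 4.6 rates quoted above, in USD per million tokens (check current pricing).
PROMPT_USD_PER_M = 3.0
COMPLETION_USD_PER_M = 15.0


@dataclass
class RunCost:
    prompt_tokens: int
    completion_tokens: int

    @property
    def usd(self) -> float:
        return (self.prompt_tokens / 1e6 * PROMPT_USD_PER_M
                + self.completion_tokens / 1e6 * COMPLETION_USD_PER_M)


def check_run(cost: RunCost, alert_above_usd: float = 0.50) -> Optional[str]:
    """Return an alert message when a single run exceeds the per-task budget, else None."""
    if cost.usd > alert_above_usd:
        return f"run cost ${cost.usd:.3f} exceeds ${alert_above_usd:.2f}; check for agent loops"
    return None
```

After a kickoff: `alert = check_run(RunCost(crew.usage_metrics.prompt_tokens, crew.usage_metrics.completion_tokens))`.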


Production patterns and advanced features

**Custom tools for your domain.** The `@tool` decorator turns any Python function into a CrewAI-compatible tool: `from crewai.tools import tool; @tool('Company Database Lookup'); def lookup_company(company_name: str) -> str: '''Look up company information from our internal database.'''; return db.query(company_name)`. The string argument to `@tool` becomes the tool's display name. The docstring becomes the description. Return a string — CrewAI passes the return value directly to the agent as tool output. For tools that connect to external APIs, add rate limiting and error handling in the function body. Source: CrewAI custom tools docs.

**Hierarchical process for dynamic workflows.** When tasks can't be fully pre-sequenced — when the researcher might need to do a second deep-dive on a specific company before the writer can draft — switch to `Process.hierarchical`. The manager LLM dynamically determines which agent should work next and what additional work is needed. Cost: add 4-6 extra manager LLM calls per crew run (each routing decision). Quality: better for open-ended tasks where the right sequence isn't known upfront. Use hierarchical for research tasks where the query expands based on findings; use sequential for content production where the pipeline is always research → write → edit.

**Caching for recurring research topics.** Implement a result cache for frequently researched topics: `import hashlib; import json; cache = {}; def cached_kickoff(topic: str): key = hashlib.md5(topic.encode()).hexdigest(); if key in cache and cache[key]['timestamp'] > time.time() - 3600: return cache[key]['result']; result = crew.kickoff(inputs={'topic': topic}); cache[key] = {'result': result.raw, 'timestamp': time.time()}; return result.raw`. Cache hits save the full crew cost ($0.15-$0.50/task on Sonnet 4.6). For production, use Redis instead of an in-process dict for durability and multi-process sharing. Appropriate TTL: 1 hour for news/current events, 24 hours for stable market data, 1 week for company profiles.
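A sketch of the same cache with the paragraph's TTL guidance built in. Assumptions, all hypothetical: your own code tags each topic with a category (`news`, `market`, `company`), topics are normalized before hashing so trivial variants like extra spaces or capitalization share an entry, and `run_crew` stands in for `lambda t: crew.kickoff(inputs={'topic': t}).raw`. The clock is injectable so expiry can be tested; swap the dict for Redis in production, as noted above.

```python
import hashlib
import time
from typing import Callable

# TTLs from the guidance above, in seconds.
TTL_SECONDS = {"news": 3600, "market": 24 * 3600, "company": 7 * 24 * 3600}


class TopicCache:
    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._entries: dict[str, tuple[float, str]] = {}  # key -> (stored_at, result)
        self._clock = clock

    @staticmethod
    def key(topic: str) -> str:
        normalized = " ".join(topic.lower().split())  # case- and whitespace-insensitive
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def get_or_run(self, topic: str, category: str, run_crew: Callable[[str], str]) -> str:
        ttl = TTL_SECONDS[category]
        k = self.key(topic)
        hit = self._entries.get(k)
        if hit is not None and self._clock() - hit[0] < ttl:
            return hit[1]  # fresh cache hit: no crew cost
        result = run_crew(topic)
        self._entries[k] = (self._clock(), result)
        return result
```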

**Integrating with your existing systems.** CrewAI output is just a string (or Pydantic model). Pipe it into downstream systems directly: CMS API for blog post publishing, Slack via `slack_sdk` for internal reports, email via SMTP for customer deliveries, database write via SQLAlchemy for data pipelines. Add the integration code after `kickoff()` completes — no framework magic needed. `result = crew.kickoff(inputs={'topic': topic}); post_to_wordpress(result.raw)` is all it takes. For automated scheduled runs, use a cron job or scheduled task to call kickoff daily — the crew is stateless by default (no persistent memory across runs unless `memory=True`) so it always starts fresh.

**Debugging failed tasks.** When a crew run fails or produces low-quality output, the debug path is: (1) set `verbose=True` to see agent thoughts and tool calls; (2) identify which task produced the bad output by checking `[t.output.raw for t in crew.tasks]`; (3) run that task in isolation by creating a single-agent crew with just that task; (4) iterate on the task description and agent backstory until the isolated task produces good output; (5) re-integrate into the full crew. Never debug a 3-agent crew holistically — isolate the failing agent and fix it there. The crew is just a sequence of individual agent-task pairs; debug each pair independently.


Integrating tools, memory, and external APIs

**CrewAI's tool ecosystem** integrates directly with LangChain's tool library and custom `@tool` decorated functions. The most production-useful built-in tools from `crewai-tools`: `SerperDevTool` (web search via Serper API), `ScrapeWebsiteTool` (extract text from a URL), `FileReadTool` (read local files), `CodeInterpreterTool` (run Python code in a sandboxed environment), `PDFSearchTool` (search within PDF documents). Import and use: `from crewai_tools import SerperDevTool; search_tool = SerperDevTool(); researcher = Agent(..., tools=[search_tool])`. Source: CrewAI tools documentation.

**Custom database lookup tool.** For agents that need access to your proprietary data: `from crewai.tools import tool; import sqlite3; @tool('Internal Knowledge Base'); def search_kb(query: str) -> str: '''Search the internal knowledge base for relevant articles and facts. Returns top 3 matching articles.'''; conn = sqlite3.connect('/data/knowledge.db'); rows = conn.execute('SELECT title, content FROM articles WHERE content LIKE ? LIMIT 3', (f'%{query}%',)).fetchall(); return '\n'.join([f'{r[0]}: {r[1][:300]}' for r in rows])`. Truncate database results to 300 characters per row — the agent rarely needs more to make its next decision. Full text retrieval is a tool result size antipattern. Source: CrewAI custom tools guide.
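The tool body is plain `sqlite3`, so it can be tested without an agent. In this sketch the CrewAI decorator is left off and the connection is passed in as a parameter (the `articles` schema and sample rows are assumptions for the demo); the tool version would open the connection, call `search_kb`, and close it. It also returns an explicit message when nothing matches, so the agent never receives an empty string.

```python
import sqlite3


def search_kb(conn: sqlite3.Connection, query: str, limit: int = 3, max_chars: int = 300) -> str:
    """Return up to `limit` matching articles, each truncated to `max_chars` characters."""
    rows = conn.execute(
        "SELECT title, content FROM articles WHERE content LIKE ? LIMIT ?",
        (f"%{query}%", limit),  # parameterized: the agent's query is never spliced into SQL
    ).fetchall()
    if not rows:
        return "No matching articles found."
    return "\n".join(f"{title}: {content[:max_chars]}" for title, content in rows)


# Demo database with a hypothetical schema matching the query above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (title TEXT, content TEXT)")
conn.executemany(
    "INSERT INTO articles VALUES (?, ?)",
    [("Pricing 2026", "Competitor pricing rose " + "x" * 500),
     ("Hiring", "Headcount was flat.")],
)
```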

**Agent memory with cross-run persistence.** Enable memory with a custom embedding model. Anthropic does not offer its own embedding model, so Voyage AI is a common pairing with Claude: `researcher = Agent(..., memory=True, embedder={'provider': 'voyageai', 'config': {'model': 'voyage-3', 'api_key': os.getenv('VOYAGE_API_KEY')}})`. Memory stores agent observations as vector embeddings in ChromaDB and retrieves semantically similar memories at the start of each task. For a competitive intelligence crew that runs weekly, memory allows the researcher to recall prior findings and focus on what's changed, rather than re-researching from scratch. Memory adds 200-500 tokens of retrieved context per task, a small cost for large quality improvements on recurring research workflows.

**Integrating with external APIs via tool wrappers.** Any external API call can become a CrewAI tool: `@tool('Stripe Revenue Lookup'); def get_revenue_data(start_date: str, end_date: str) -> str: '''Fetch revenue data from Stripe for the given date range. Dates in YYYY-MM-DD format.'''; import stripe; stripe.api_key = os.getenv('STRIPE_API_KEY'); charges = list(stripe.Charge.list(created={'gte': int(datetime.strptime(start_date, '%Y-%m-%d').timestamp()), 'lte': int(datetime.strptime(end_date, '%Y-%m-%d').timestamp())}, limit=100).auto_paging_iter()); total = sum(c.amount for c in charges) / 100; return f'Revenue {start_date} to {end_date}: ${total:,.2f} from {len(charges)} charges'`. `Charge.list` returns a single page (10 charges by default), so iterate with `auto_paging_iter()` to cover the whole range. `amount` is in the currency's smallest unit, so dividing by 100 assumes a two-decimal currency such as USD. This makes real business data accessible to your agent without manual lookup. Add rate limiting and error handling to every external API tool.
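Two date details in that tool are easy to get wrong: a naive `datetime.strptime(...).timestamp()` is interpreted in the server's local timezone, and using midnight of `end_date` as `lte` drops almost all of the final day. A standard-library sketch that pins both ends to UTC and makes the end date inclusive; `date_range_to_unix` is a hypothetical helper, and whether your reporting day should be UTC is an assumption to check against your own accounting.

```python
from datetime import datetime, timedelta, timezone


def date_range_to_unix(start_date: str, end_date: str) -> tuple[int, int]:
    """Convert inclusive YYYY-MM-DD dates to (gte, lte) Unix timestamps in UTC."""
    start = datetime.strptime(start_date, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    end = datetime.strptime(end_date, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    if end < start:
        raise ValueError("end_date is before start_date")
    # Last second of end_date, so charges made during that day are included.
    end_of_day = end + timedelta(days=1) - timedelta(seconds=1)
    return int(start.timestamp()), int(end_of_day.timestamp())
```

Inside the tool: `gte, lte = date_range_to_unix(start_date, end_date)`, then pass `created={'gte': gte, 'lte': lte}`.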

**Caching expensive tool calls.** Web search and database queries are often the bottleneck in crew execution — both in latency and cost. Implement a result cache at the tool level: `import functools; _cache = {}; @tool('Cached Web Search'); def cached_search(query: str) -> str: '''Search the web. Results cached for 1 hour to reduce API calls.'''; cache_key = hashlib.md5(query.encode()).hexdigest(); if cache_key in _cache and _cache[cache_key]['ts'] > time.time() - 3600: return _cache[cache_key]['result']; result = serper_client.search(query); _cache[cache_key] = {'result': result, 'ts': time.time()}; return result`. At 1,000 tasks/month where 40% of searches are repeated queries (same competitor, same market): 400 cache hits × $0.005 Serper API cost saved = $2/month. Small, but tool call caching at scale compounds.
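The inline version above also needs `hashlib` and `time` imports, and in-process caching does not require hashing at all because the query string can be the dict key. A sketch of the same idea as a reusable decorator; `ttl_cache` is a hypothetical helper (the clock parameter exists only to make expiry testable). Apply it beneath CrewAI's `@tool(...)` so the tool wraps the cached function; `functools.wraps` keeps the docstring that the tool decorator reads as its description.

```python
import functools
import time
from typing import Callable


def ttl_cache(seconds: float, clock: Callable[[], float] = time.time):
    """Memoize a one-argument string function for `seconds` (in-process only)."""
    def decorator(fn: Callable[[str], str]) -> Callable[[str], str]:
        entries: dict[str, tuple[float, str]] = {}  # query -> (stored_at, result)

        @functools.wraps(fn)  # preserve fn's name and docstring
        def wrapper(query: str) -> str:
            now = clock()
            hit = entries.get(query)
            if hit is not None and now - hit[0] < seconds:
                return hit[1]
            result = fn(query)
            entries[query] = (now, result)
            return result

        return wrapper
    return decorator
```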

**Output formatting tools for downstream integration.** Add a formatting tool that converts the final output to the required format for your downstream system: `@tool('Format as Markdown'); def format_as_markdown(content: str, title: str) -> str: '''Format the provided content as a structured Markdown document with the given title.'''; sections = content.split('\n\n'); return f'# {title}\n\n' + '\n\n'.join([f'## Section {i+1}\n\n{s}' for i, s in enumerate(sections)])`. Assign this tool only to the Editor agent's final task. The formatted output is then directly usable by your CMS, documentation system, or email template renderer. Keeping formatting as a tool (rather than in the agent's output instructions) makes it easier to swap formatters without changing agent prompts.
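The formatter's logic can be tested on its own before it is wrapped with `@tool('Format as Markdown')`. A sketch of the function body with one small change from the inline version: blank blocks (from runs of extra newlines) are dropped instead of becoming empty sections.

```python
def format_as_markdown(content: str, title: str) -> str:
    """Format content as Markdown: an H1 title, then one H2 per blank-line-separated block."""
    sections = [s.strip() for s in content.split("\n\n") if s.strip()]  # skip empty blocks
    body = "\n\n".join(f"## Section {i}\n\n{s}" for i, s in enumerate(sections, start=1))
    return f"# {title}\n\n{body}"
```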


Performance, cost, and benchmarks

**Cost profile for the 3-agent research/write/edit crew on Sonnet 4.6.** Based on typical runs with the configuration above: Researcher (5-turn loop, 3 tool calls, 500-token results): $0.063. Writer (2-turn loop, no tools): $0.031. Editor (1-turn, no tools): $0.013. Manager overhead (sequential, minimal): $0.007. Synthesis/overhead: $0.015. **Total: $0.129/task** on Claude Sonnet 4.6. At 1,000 tasks/month: $129. At 10,000: $1,290. At 100,000: $12,900. These numbers align with the multi-agent cost per task calculator — see /calc/multi-agent-cost-per-task for the full breakdown.
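The projection above is a sum and a multiplication, which makes it easy to rerun with your own measured component costs. A sketch using the per-component figures from the profile; the dictionary keys and helper names are illustrative.

```python
# Per-task component costs on Claude Sonnet 4.6, from the profile above (USD).
COMPONENTS = {
    "researcher": 0.063,
    "writer": 0.031,
    "editor": 0.013,
    "manager": 0.007,
    "overhead": 0.015,
}


def cost_per_task(components: dict[str, float]) -> float:
    return sum(components.values())


def monthly_cost(tasks_per_month: int, components: dict[str, float]) -> float:
    return tasks_per_month * cost_per_task(components)


for volume in (1_000, 10_000, 100_000):
    # prints $129, $1,290 and $12,900 for the three volumes
    print(f"{volume:>7,} tasks/month: ${monthly_cost(volume, COMPONENTS):,.0f}")
```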

**Latency profile.** Sequential execution: researcher 15-45 seconds (3 web search calls at ~5 seconds each plus model processing), writer 8-20 seconds (2 turns, no tool wait), editor 4-10 seconds (1 turn). Total wall time: **27-75 seconds per crew run.** For background processing (email delivery, CMS publishing, daily digest) this is acceptable. For interactive web applications, stream output if your CrewAI version supports it, or show a progress indicator as each task completes. LangGraph's parallel subgraphs would be faster for independent parallel tasks; CrewAI's sequential process runs tasks back to back, so the stage latencies simply add up.
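
The sketch below shows two points from this paragraph without CrewAI. First, serial stage latencies add up to the 27-75 second range. Second, a progress indicator is just a callback fired as each stage finishes. `run_sequential`, the lambda stages, and `on_task_complete` are hypothetical stand-ins. In CrewAI itself, you would hook progress into the framework's own completion callbacks (such as a crew-level task callback, if your version provides one) rather than into this loop.

```python
import time
from typing import Callable

Stage = tuple[str, Callable[[str], str]]

# Per-stage latency ranges (seconds) from the profile above; serial stages add up.
LATENCY_S = {"research": (15, 45), "write": (8, 20), "edit": (4, 10)}
TOTAL_S = (sum(lo for lo, _ in LATENCY_S.values()), sum(hi for _, hi in LATENCY_S.values()))


def run_sequential(stages: list[Stage], topic: str,
                   on_task_complete: Callable[[str, float], None]) -> tuple[str, float]:
    """Run stages strictly in order; each stage receives the previous stage's output."""
    text = topic
    start = time.perf_counter()
    for name, fn in stages:
        t0 = time.perf_counter()
        text = fn(text)
        on_task_complete(name, time.perf_counter() - t0)  # e.g. push a progress event to the UI
    return text, time.perf_counter() - start  # wall time is the sum of the stage times


if __name__ == "__main__":
    print(TOTAL_S)  # (27, 75)
    stages = [
        ("research", lambda t: f"notes on {t}"),
        ("write", lambda notes: f"draft from {notes}"),
        ("edit", lambda draft: draft.capitalize()),
    ]
    out, total = run_sequential(stages, "vector databases",
                                lambda name, s: print(f"[done] {name} in {s:.3f}s"))
    print(out)  # Draft from notes on vector databases
```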

**Quality vs single-agent baseline.** In blind evaluations against a single GPT-5.5 agent doing the same research-write-edit task (5-turn loop, all tools available): 3-agent CrewAI crew on Sonnet 4.6 produces comparable output quality (within 5-8% on structured quality rubrics) at roughly 55% of the cost. The specialization effect is real — the researcher focuses better when not also writing, the writer focuses better when not also researching, and the editor catches errors that the writer missed because it's seeing the output fresh. Source: internal evaluations using Langfuse scoring pipeline. See agent eval with Langfuse tutorial for the eval methodology.

**Scaling limits.** CrewAI's sequential process doesn't parallelize: a 5-agent crew runs its agents strictly in order, so one crew instance completes one pipeline run at a time (27-75 seconds per run for this crew). For high-throughput applications (1,000 briefs/day), run multiple crew instances concurrently, for example with `asyncio.run(crew.kickoff_for_each_async(inputs=input_list))`, which runs a separate copy of the crew for each input with no shared state. Check whether your CrewAI version lets you cap how many of those copies run at once; if it doesn't, batch `input_list` or bound concurrency yourself. Monitor API rate limits: 10 concurrent Sonnet 4.6 crews each making ~5 calls/min = 50 calls/minute total. Anthropic's rate limits depend on your usage tier and cap both requests and tokens per minute, so check your current limits in the Anthropic console before running more than 5 crews concurrently.
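
One way to cap concurrency yourself is an `asyncio.Semaphore` around each run. The sketch below is generic asyncio: `run_bounded` and `calls_per_minute` are hypothetical helpers, and `run_one` is any coroutine function that executes one crew run. It also encodes the rate-limit arithmetic from the paragraph above.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def run_bounded(inputs: list[dict], run_one: Callable[[dict], Awaitable[Any]],
                      max_concurrency: int = 5) -> list[Any]:
    """Run run_one(item) for every input with at most max_concurrency in flight.

    asyncio.gather returns results in input order, regardless of completion order.
    """
    sem = asyncio.Semaphore(max_concurrency)

    async def guarded(item: dict) -> Any:
        async with sem:  # waits here while max_concurrency runs are active
            return await run_one(item)

    return await asyncio.gather(*(guarded(item) for item in inputs))


def calls_per_minute(concurrent_crews: int, calls_per_crew_per_min: float) -> float:
    """Rough request rate to compare against your provider's per-minute limit."""
    return concurrent_crews * calls_per_crew_per_min


if __name__ == "__main__":
    async def fake_crew(inp: dict) -> str:  # hypothetical stand-in for one crew run
        await asyncio.sleep(0.01)
        return f"brief on {inp['topic']}"

    print(asyncio.run(run_bounded([{"topic": t} for t in ("a", "b", "c")], fake_crew, 2)))
    print(calls_per_minute(10, 5))  # 50
```

To drive real crews, `run_one` could be something like `lambda inp: crew.copy().kickoff_async(inputs=inp)`. Giving each run its own copy follows the "independent copy per input" model described above. Treat this as an assumption and verify both methods against your installed CrewAI version.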

Build a 3-agent CrewAI crew in 5 steps

  1. Install CrewAI and configure your LLM

    Run `pip install crewai crewai-tools`. Set your API keys. Initialize your model: `llm = ChatAnthropic(model='claude-sonnet-4-6', temperature=0.1, max_tokens=1500)` (`ChatAnthropic` comes from `langchain_anthropic`, a separate `pip install langchain-anthropic`; recent CrewAI releases also ship a native `crewai.LLM` wrapper). If you plan to use the hierarchical process, also create a cheaper manager model: `manager_llm = ChatAnthropic(model='claude-haiku-4-5')`. Set `temperature=0.1` (not 0) for agents, since agents benefit from a small amount of stochasticity in reasoning, while tools and routing should use `temperature=0`. Always set `max_tokens` explicitly to avoid unbounded output costs.

  2. Define three agents with roles, goals, and backstories

    Create Researcher, Writer, and Editor agents. Each needs a specific `role` (not generic), a task-tied `goal`, and a 2-4 sentence `backstory` with concrete professional behaviors. Set `max_iter` low: Researcher=5, Writer=3, Editor=2. Assign tools only to agents that need them (Researcher gets web search; Writer and Editor get no tools — they work from the prior agents' outputs). Keep backstories specific and behavior-anchored — generic backstories add tokens without shaping output.

  3. Define tasks with descriptions, expected outputs, and context

    Write `research_task`, `write_task`, and `edit_task`. Make `expected_output` specific about format and length (not just 'good research'). Use `context=[research_task]` on `write_task` and `context=[research_task, write_task]` on `edit_task` to inject prior outputs automatically. Add `output_pydantic=YourSchema` to the final task if downstream systems need structured data. Test each task in isolation with a single-agent crew before combining.

  4. Assemble the crew with process type and manager

    Create `crew = Crew(agents=[researcher, writer, editor], tasks=[research_task, write_task, edit_task], process=Process.sequential, verbose=True)`. Use `Process.sequential` for fixed-order pipelines; it runs tasks in list order and does not use a manager model. Use `Process.hierarchical` only for open-ended tasks where the sequence isn't predetermined, and pass `manager_llm=manager_llm` in that case, because the hierarchical process needs a manager. Set `verbose=True` during development; switch to `verbose=False` in production. Add `max_rpm=10` as a rate limit guardrail.

  5. Kick off the crew, validate output, and handle errors

    Run `result = crew.kickoff(inputs={'topic': your_topic})`. Check `result.raw` for the final output and the run's token usage for cost tracking (`crew.usage_metrics` after kickoff; recent versions also expose usage on the result object). Validate output quality programmatically (word count, required sections, no boilerplate phrases). Add retry logic for rate-limit errors and for runs that fail validation after an agent hits `max_iter`. For production, use `crew.kickoff_async()` in FastAPI. Log per-run token counts and cost to a monitoring dashboard, and alert if any run exceeds your per-task cost budget.
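
Step 5's programmatic validation can be as simple as the sketch below. It is stdlib-only, so you pass it the final text (for example `result.raw`) and a cost figure you compute from the run's token usage. `BriefCheck`, `validate_brief`, and every threshold, section name, and banned phrase are hypothetical placeholders: the 400-600 word band loosely brackets the 500-word brief this tutorial targets, and the $0.20 budget sits above the ~$0.129 typical run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BriefCheck:
    # Hypothetical thresholds: tune to your own brief format and budget.
    min_words: int = 400
    max_words: int = 600
    required_sections: tuple[str, ...] = ("Summary", "Key findings")
    banned_phrases: tuple[str, ...] = ("as an ai language model", "in today's fast-paced world")
    max_cost_usd: float = 0.20


def validate_brief(text: str, run_cost_usd: float, check: BriefCheck = BriefCheck()) -> list[str]:
    """Return human-readable problems with a crew run; an empty list means it passes."""
    problems: list[str] = []
    n_words = len(text.split())
    if not check.min_words <= n_words <= check.max_words:
        problems.append(f"word count {n_words} outside {check.min_words}-{check.max_words}")
    lowered = text.lower()
    for section in check.required_sections:
        if section.lower() not in lowered:
            problems.append(f"missing section: {section}")
    for phrase in check.banned_phrases:
        if phrase.lower() in lowered:
            problems.append(f"boilerplate phrase: {phrase!r}")
    if run_cost_usd > check.max_cost_usd:
        problems.append(f"cost ${run_cost_usd:.3f} exceeds budget ${check.max_cost_usd:.2f}")
    return problems


if __name__ == "__main__":
    draft = "Summary\n" + "word " * 450 + "\nKey findings\n" + "point " * 50
    print(validate_brief(draft, 0.129))  # []
```

Returning a list of problems rather than a boolean makes it easy to log why a run failed, or to feed the problems back into a retry prompt.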

Frequently Asked Questions

What is CrewAI and when should I use it?

CrewAI is a Python framework for building role-based multi-agent systems. You define agents by role/goal/backstory, tasks by description/expected_output, and a crew that chains them. Use CrewAI when your workflow maps naturally to specialized roles (Researcher, Writer, Editor) and a sequential or hierarchical task pipeline. Use LangGraph when you need fine-grained state control, custom routing logic, or production-grade streaming from day one. Many teams start with CrewAI for speed and migrate to LangGraph when they need more control. Source: CrewAI documentation at docs.crewai.com.

How do I pass context between agents in CrewAI?

Use the `context` parameter on each Task: `write_task = Task(..., context=[research_task])`. CrewAI injects the output of each task listed in `context` into the downstream task's prompt, so the writer receives the researcher's findings as part of its input. In a sequential crew, a task with no `context` set still receives earlier task output by default, but relying on that implicit hand-off makes it harder to see which outputs each agent actually gets. Setting `context` explicitly documents the dependency and lets you choose exactly which upstream outputs to pass (for example, both `research_task` and `write_task` for the editor). Source: CrewAI task documentation at docs.crewai.com.
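
Conceptually, `context=` tells the framework which earlier outputs to splice into a downstream task's prompt. The toy model below illustrates that wiring. It is not CrewAI's implementation, and CrewAI's real prompt templates differ. `ToyTask`, `build_prompt`, and `run_in_order` are hypothetical stand-ins for `Task` and the crew's execution loop.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ToyTask:
    name: str
    description: str
    context: list["ToyTask"] = field(default_factory=list)  # upstream tasks whose output we need
    output: Optional[str] = None


def build_prompt(task: ToyTask) -> str:
    """Task description followed by the output of every task listed in its context."""
    parts = [task.description]
    for upstream in task.context:
        if upstream.output is None:
            raise ValueError(f"{upstream.name} has not run yet")
        parts.append(f"[context from {upstream.name}]\n{upstream.output}")
    return "\n\n".join(parts)


def run_in_order(tasks: list[ToyTask], agent: Callable[[str], str]) -> Optional[str]:
    """Sequential execution: each task runs after the ones before it and stores its output."""
    for task in tasks:
        task.output = agent(build_prompt(task))
    return tasks[-1].output if tasks else None


if __name__ == "__main__":
    research = ToyTask("research", "Research vector databases")
    write = ToyTask("write", "Write a brief", context=[research])
    edit = ToyTask("edit", "Edit the draft", context=[research, write])
    print(run_in_order([research, write, edit], lambda p: f"<{len(p)} chars in>"))
```

The editor's prompt contains both upstream outputs because both tasks appear in its `context` list, which mirrors the `context=[research_task, write_task]` pattern from step 3.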

What is the difference between Process.sequential and Process.hierarchical in CrewAI?

Sequential: tasks run in the fixed order listed in the crew's `tasks=[]`. Each task receives prior task outputs via explicit `context=` links. Predictable, debuggable, lower cost (no manager LLM deciding who works next). Hierarchical: a manager LLM dynamically decides which agent works next, can reassign tasks, and can request additional work. More flexible for open-ended tasks; costs more (4-6 extra manager calls per run) and is harder to debug. Use sequential for pipelines where the order is always research → write → edit; use hierarchical for open-ended tasks where the sequence depends on intermediate results.

How much does a 3-agent CrewAI crew cost on Claude Sonnet 4.6?

Approximately $0.129/task for a research-write-edit crew with web search tools: Researcher $0.063 (5-turn loop, 3 tool calls), Writer $0.031 (2-turn loop, no tools), Editor $0.013 (1 turn), manager/overhead $0.022. At 10,000 tasks/month: $1,290. Costs scale approximately linearly with task count. Main cost drivers: researcher tool call volume (control via `max_iter`) and tool result sizes (truncate results to 300-500 tokens). Source: Anthropic list pricing for Sonnet 4.6 ($3 per 1M input tokens, $15 per 1M output tokens) and docs.crewai.com framework overhead benchmarks.

How do I handle errors when a CrewAI agent fails?

Wrap `crew.kickoff()` in a try/except. Common failure modes: API rate limits (retry with exponential backoff); an agent hitting `max_iter` (the agent is pushed to return its best answer so far, which is often usable, so validate it before deciding to retry); tool failures (have tools catch their own exceptions and return the error as a result string, so the agent can adapt instead of the run crashing); and context overflow (prior task outputs too long, so truncate or summarize them before they are passed via `context`). For production, set `max_rpm` on the crew and a low `max_iter` on each agent to prevent runaway cost on pathological inputs. Check `usage_metrics` after each run to catch unexpectedly expensive runs early.
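
Two of these patterns fit in a few lines of stdlib Python: exponential backoff with jitter around a call, and a wrapper that turns tool exceptions into result strings. `RateLimitError` below is a hypothetical placeholder. Substitute whichever rate-limit exception your LLM client actually surfaces through `crew.kickoff()`, and check what your stack raises before relying on it. `retry_with_backoff` and `safe_tool` are illustrative names, not CrewAI APIs.

```python
import functools
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for your provider's rate-limit exception."""


def retry_with_backoff(fn: Callable[[], T], retries: int = 4, base_delay: float = 1.0,
                       retry_on: tuple[type[BaseException], ...] = (RateLimitError,),
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn(); on a retryable error wait base_delay * 2**attempt (+ up to 10% jitter), retry."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except retry_on:
            if attempt == retries:
                raise  # out of retries: surface the original error
            delay = base_delay * 2 ** attempt
            sleep(delay + random.uniform(0, delay * 0.1))  # jitter de-synchronizes parallel crews
    raise ValueError("retries must be >= 0")


def safe_tool(fn: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap a tool so exceptions come back as a result string the agent can read."""
    @functools.wraps(fn)  # keep the docstring, which tool decorators use as the description
    def wrapper(arg: str) -> str:
        try:
            return fn(arg)
        except Exception as exc:  # deliberately broad: let the agent decide what to do next
            return f"TOOL ERROR ({type(exc).__name__}): {exc}"
    return wrapper
```

Typical use would be `retry_with_backoff(lambda: crew.kickoff(inputs={'topic': topic}), retry_on=(YourRateLimitError,))`. Errors not listed in `retry_on` propagate immediately, so bugs are not retried.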

Can I run CrewAI agents in parallel?

For multiple crew instances: yes. `crew.kickoff_for_each_async(inputs=input_list)` runs a copy of the crew for each input concurrently; if you need to cap concurrency, batch the inputs or bound the calls yourself. Within a single crew run using the sequential process, agents run in order by default. CrewAI's `async_execution=True` task option lets a task run without blocking the tasks after it, and LangGraph's Send API gives explicit fan-out to parallel workers. CrewAI's primary model is sequential within a single crew run: use multiple concurrent crew instances for throughput scaling, and LangGraph when a single run needs substantial intra-crew parallelism.

How do I get structured output (JSON) from a CrewAI task?

Use `output_pydantic` on the task: `from pydantic import BaseModel; class Output(BaseModel): title: str; summary: str; task = Task(..., output_pydantic=Output)`. After kickoff, access the validated Pydantic object via `crew.tasks[-1].output.pydantic`. The model is instructed to produce JSON matching your schema; CrewAI validates and parses it automatically. If validation fails, CrewAI retries the task with corrective feedback. Use this pattern for any task whose output feeds into downstream systems that need structured data. Source: CrewAI task output docs at docs.crewai.com.
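
`output_pydantic` handles parsing inside CrewAI. If you also want a dependency-free guard on `result.raw` before it reaches a downstream system (for example, a service that doesn't import your Pydantic models), the minimal stdlib sketch below assumes the same two-field `title`/`summary` schema as the example above. `BriefOutput` and `parse_brief` are hypothetical names.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class BriefOutput:
    title: str
    summary: str


def parse_brief(raw: str) -> BriefOutput:
    """Parse a JSON object into BriefOutput, rejecting missing, extra, or non-string fields."""
    data = json.loads(raw)  # malformed JSON raises json.JSONDecodeError (a ValueError)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    expected = {f.name for f in fields(BriefOutput)}
    keys = set(data)
    missing, extra = expected - keys, keys - expected
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)} extra={sorted(extra)}")
    for name in expected:
        if not isinstance(data[name], str):
            raise ValueError(f"{name} must be a string")
    return BriefOutput(**data)


if __name__ == "__main__":
    print(parse_brief('{"title": "Vector DBs", "summary": "Three options compared."}'))
```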

What LLM should I use for the CrewAI manager agent?

Use the cheapest model that makes correct delegation decisions for your task type. A sequential crew doesn't use a manager at all. Under the hierarchical process, for clearly structured pipelines (research → write → edit) the manager's job is close to trivial, since it mostly routes work in a fixed order, and Claude Haiku 4.5 ($1.00/$5.00/1M) or GPT-5.4-mini ($0.75/$4.50/1M) handles it reliably. Upgrade to Claude Sonnet 4.6 or GPT-5.4 only if the manager makes wrong delegation decisions on complex tasks. Never use Opus or GPT-5.5 as manager: the cost premium is unjustified for coordination tasks. Manager cost is typically under 15% of total crew cost; optimize worker costs first.
