By The DDH Team · Digital Dashboard Hub

Build a ReAct Agent with LangGraph (2026): 5-Step Tutorial

LangGraph lets you build agents as explicit state graphs — nodes that transform state, edges that route between nodes based on conditions, and a persistent store that keeps state across invocations. Unlike the implicit loops in simpler agent frameworks, LangGraph's graph-first model gives you full observability into what the agent is doing at each step and fine-grained control over when it loops, when it stops, and when it escalates. For cost modeling of the loop, see our agent loop cost calculator — a LangGraph ReAct agent with 5 tools and 5 turns follows the exact cost model described there.

The ReAct pattern (Reason + Act) is the most common agent architecture in 2026: the model reasons about what to do, decides which tool to call, receives the result, reasons again, and loops until done. LangGraph implements this as a graph with a model node (reasoning), a tool node (acting), and a conditional edge that loops back to the model when tool calls are pending or terminates when the model stops calling tools. It sounds simple — and it is, once you see the graph structure clearly. Source: LangGraph documentation and multi-agent concepts.

This tutorial covers: Step 1 — install and configure LangGraph with your model. Step 2 — define state schema and build the graph structure. Step 3 — add tools and wire the tool node with parallel execution. Step 4 — add conditional edges for the ReAct loop. Step 5 — add memory for cross-session persistence and enable streaming. Each step has working code. By the end you have a production-ready agent you can deploy or extend. For cost control, see the tool use overhead cost calculator and the multi-agent cost per task calculator for scaling this pattern to multiple workers.

LangGraph ReAct agent: component reference, June 2026

| Component | Role | Key config | Notes |
|---|---|---|---|
| `StateGraph` | Main graph container | TypedDict state schema | Thread-safe, serializable |
| `MessagesState` | Built-in message list state | TypedDict with a `messages` field | Handles message deduplication |
| `ToolNode` | Executes tool calls in parallel | Binds tool list | Appends tool result messages to state |
| `model.bind_tools()` | Attaches tools to model node | Tool schema auto-serialized | Enables parallel calls |
| `should_continue()` | Conditional edge function | Checks last message for `tool_calls` | Routes to tools or END |
| `MemorySaver` | In-process checkpointing | Thread-scoped persistence | For dev/small scale |
| `AsyncPostgresSaver` | Production checkpointing | Cross-process persistence | Requires pg connection |
| `interrupt_before` | Human-in-the-loop gate | Pauses before tool execution | Resumes with `graph.invoke(None, config)` |
| `stream_mode` | Streaming output control | `'values'`, `'updates'`, or `'messages'` | `'messages'` = token streaming |
| `recursion_limit` | Max loop iterations | Default 25 | Prevents infinite loops |
| subgraph | Worker node in supervisor | Nested `StateGraph` | Enables multi-agent |
| `add_conditional_edges` | Branching routing logic | Function → node map | Core ReAct routing |

Sources, fetched 2026-06-21: LangGraph documentation (https://langchain-ai.github.io/langgraph/), LangGraph multi-agent concepts (https://langchain-ai.github.io/langgraph/concepts/multi_agent/), LangChain Python docs (https://python.langchain.com/docs/introduction/). All APIs and imports reflect LangGraph stable release as of June 2026. Breaking changes from v0.2 to v0.3 include: StateGraph now requires explicit input/output schema specification for production graphs; MemorySaver is still recommended for development; AsyncSqliteSaver added as a lightweight production option alongside AsyncPostgresSaver.

Step 1: install and configure LangGraph

**Install the package and dependencies.** LangGraph requires `langgraph`, the LangChain model integration for your chosen provider, and any tool dependencies. For a Claude-backed agent: `pip install langgraph langchain-anthropic` — this pulls in LangGraph's core graph engine and Anthropic's LangChain wrapper. For OpenAI: `pip install langgraph langchain-openai`. Source: LangGraph installation guide.

**Configure your API key.** `import os; os.environ['ANTHROPIC_API_KEY'] = '<your_key>'` or export it in your shell environment. Never hardcode API keys in application code — use environment variables or a secrets manager (AWS Secrets Manager, HashiCorp Vault, Doppler). LangGraph reads keys via the underlying LangChain model integration, which uses the standard `ANTHROPIC_API_KEY` / `OPENAI_API_KEY` environment variable convention.
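A fail-fast check at startup makes a missing key obvious before the first model call. The sketch below uses only the standard library; `require_env` is a hypothetical helper name, not part of LangGraph or LangChain.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; export it in your shell or load it from your secrets manager"
        )
    return value


# Typical use at application startup (commented out so the sketch runs without a real key):
# api_key = require_env("ANTHROPIC_API_KEY")
```

Calling this once at startup keeps the key out of source code and turns a confusing authentication error later in the run into an immediate, readable failure.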

**Initialize the model.** `from langchain_anthropic import ChatAnthropic; llm = ChatAnthropic(model='claude-sonnet-4-6', temperature=0, max_tokens=1000)`. Set `temperature=0` for deterministic, reproducible agent behavior. Set `max_tokens` explicitly — LangGraph agents can loop for many turns, and unbounded output on each turn is the most common cause of unexpectedly high costs. For Opus 4.7: `model='claude-opus-4-7'`. For OpenAI: `from langchain_openai import ChatOpenAI; llm = ChatOpenAI(model='gpt-5.5', temperature=0)`.

**Verify your installation.** Run a quick smoke test: `result = llm.invoke('Say hi'); print(result.content)`. If you get a response, the model is configured correctly. If you get an authentication error, the API key is missing or malformed. If you get a model-not-found error, check the model ID — LangChain model IDs exactly mirror the provider's documented model names. See LangChain Anthropic integration docs for the full list.

**Set up LangSmith tracing for observability (optional but recommended).** `os.environ['LANGCHAIN_TRACING_V2'] = 'true'; os.environ['LANGCHAIN_API_KEY'] = '<langsmith_key>'`. LangSmith traces every LangGraph execution — nodes, edges, inputs, outputs, token counts, latency. For production agents, tracing is how you debug unexpected loops, high-cost runs, and tool failures. LangSmith Hobby plan is free; Pro is $39/month at usage-based pricing. Running without tracing in production is flying blind. See langchain-ai.github.io/langgraph for the full observability setup.

**LangGraph v0.3 breaking changes from earlier versions.** If you're migrating from LangGraph v0.1 or v0.2: `StateGraph` initialization changed, and you now pass the state schema class as the type argument directly (`StateGraph(AgentState)` not `StateGraph(state_schema=AgentState)`). The `add_node` / `add_edge` API is unchanged. `MemorySaver` is still importable from `langgraph.checkpoint.memory`; recent releases also expose the same in-memory checkpointer under the name `InMemorySaver`, so check which name your installed version documents. The `create_react_agent` shortcut still works but gives you less control over node structure; use explicit graph construction for production agents.


Step 2: define state and build the graph structure

**The state schema is the single most important design decision in a LangGraph agent.** State is a TypedDict (or Pydantic BaseModel) that flows through every node. Every node reads from state and writes back to state. Every edge condition reads from state to make routing decisions. Define it carefully — it's your agent's interface. For a simple ReAct agent, the built-in `MessagesState` handles the common case: `from langgraph.graph import MessagesState; class AgentState(MessagesState): pass`. `MessagesState` provides a `messages` field that accumulates the conversation history with proper message deduplication.

**Create the StateGraph.** `from langgraph.graph import StateGraph, END; graph_builder = StateGraph(AgentState)`. The `StateGraph` is a builder — you add nodes and edges to it, then compile it into a runnable graph. Nothing executes during the build phase; this is pure graph construction. The `END` sentinel is LangGraph's special terminal node — any edge pointing to END terminates the graph run.

**Add the model node.** The model node is a function that takes state and returns a partial state update: `def call_model(state: AgentState): response = llm_with_tools.invoke(state['messages']); return {'messages': [response]}`. The function signature must accept a state dict and return a dict of fields to update. Note `llm_with_tools`: this is the model after binding tools (step 3). Register it: `graph_builder.add_node('model', call_model)`.

**Add the tool node.** `from langgraph.prebuilt import ToolNode; tool_node = ToolNode(tools); graph_builder.add_node('tools', tool_node)`. LangGraph's built-in `ToolNode` reads tool call requests from the last message in state, executes them concurrently (via `asyncio.gather` on the async path; synchronous invocations run the calls on worker threads), and appends tool result messages to state. You don't need to write the tool execution loop manually; `ToolNode` handles it. This concurrent execution is the key latency advantage over hand-rolled tool loops.

**Set the entry point and compile.** `graph_builder.set_entry_point('model')` tells LangGraph where to start on each invocation. Then compile: `graph = graph_builder.compile()`. The compiled graph is the runnable object — it validates that all edges point to existing nodes, all entry/exit points are consistent, and the state schema is coherent. Fix all compile errors before testing; they surface structural bugs early. Source: LangGraph quickstart.

**Compiled graph vs builder.** The compiled `graph` object is what you call `.invoke()` or `.stream()` on. The `graph_builder` is only for construction. A common mistake: calling `graph_builder.invoke()` (which fails) instead of `graph.invoke()`. After `compile()`, use `graph` everywhere. You can call `graph.get_graph().draw_mermaid()` to visualize the node-edge structure — paste the output into mermaid.live to see the graph diagram. Essential for debugging complex multi-agent graphs.


Step 3: define tools and bind them to the model

**Define tools as Python functions with `@tool` decorator.** LangGraph uses LangChain's tool system. The `@tool` decorator turns a Python function into a tool with auto-generated JSON schema from the type annotations and docstring: `from langchain_core.tools import tool; @tool; def search_web(query: str) -> str: '''Search the web for current information. Returns top results with titles and excerpts.''' # ... implementation`. The docstring becomes the tool description (keep it concise for token efficiency — see our tool use overhead cost calculator). The type annotations become the input_schema. Returns should be a string.

**Implement real tool logic.** For a web search tool, use a search API (Serper, Tavily, Brave Search). Example with Tavily: `from tavily import TavilyClient; client = TavilyClient(api_key=os.getenv('TAVILY_API_KEY')); @tool; def search_web(query: str) -> str: '''Search the web for current info.'''; results = client.search(query, max_results=3); return '\n'.join([f"{r['title']}: {r['content'][:300]}" for r in results['results']])`. Note the result truncation (`[:300]`) — this directly reduces tool result input tokens on subsequent turns. Always truncate at the tool wrapper level. Source: LangGraph tool documentation.

**Add a calculator tool for numeric reasoning.** `@tool; def calculate(expression: str) -> str: '''Evaluate a math expression. Input: valid Python math expression string.'''; return str(eval(expression, {'__builtins__': {}}, {}))`. Emptying `__builtins__` blocks the most obvious attacks, but it is not a real sandbox: attribute access on literals can still reach dangerous objects, and an expression such as `9**9**9` can exhaust CPU and memory. For production, use a proper safe math evaluator library (e.g., `simpleeval`). Calculator tools are the most frequently misimplemented; raw `eval` with no sandboxing is a critical security vulnerability. See OWASP LLM Top 10 on prompt injection via tool misuse.
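One way to avoid `eval` entirely is to parse the expression and allow only arithmetic nodes. The following is a minimal sketch using the standard-library `ast` module; `safe_calculate`, the operator whitelist, and the exponent cap are illustrative choices, not part of LangChain or `simpleeval`.

```python
import ast
import operator

# Whitelisted operators; any other syntax element is rejected.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
    ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}
_MAX_EXPONENT = 100  # assumed cap to keep results small


def _eval_node(node):
    # Plain int/float literals only (bool, str, etc. are rejected).
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        left, right = _eval_node(node.left), _eval_node(node.right)
        if isinstance(node.op, ast.Pow) and abs(right) > _MAX_EXPONENT:
            raise ValueError("exponent too large")
        return _BIN_OPS[type(node.op)](left, right)
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_eval_node(node.operand))
    raise ValueError(f"unsupported expression element: {type(node).__name__}")


def safe_calculate(expression: str) -> str:
    """Evaluate basic arithmetic; return the result or an error string the model can read."""
    try:
        tree = ast.parse(expression, mode="eval")
        return str(_eval_node(tree.body))
    except (ValueError, SyntaxError, ZeroDivisionError, OverflowError) as exc:
        return f"Error: {exc}"
```

Function calls, names, and attribute access never reach an evaluator here because they are not in the whitelist, so an input like `__import__('os')` comes back as an error string instead of executing.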

**Bind tools to the model.** `llm_with_tools = llm.bind_tools(tools)`. `bind_tools()` serializes your tool list into the format expected by the model provider (Anthropic's `{name, description, input_schema}` format or OpenAI's `{type: 'function', function: {...}}` format) and attaches them to every call made through `llm_with_tools`. After this call, any invocation of `llm_with_tools` will pass the tool schemas. Replace the `llm` reference in your model node function with `llm_with_tools`.

**Control parallel tool calling.** By default, both Claude and GPT-5 may call multiple tools per turn, and LangGraph's `ToolNode` executes the calls from one turn concurrently. To control this behavior: (1) if your tools have rate limits, cap concurrency inside the tools themselves (for example with a semaphore or rate limiter), or ask the provider not to emit parallel calls where the integration supports it (for OpenAI models, `llm.bind_tools(tools, parallel_tool_calls=False)`); (2) use `tool_choice='auto'` (default) for the model to decide or `tool_choice='none'` to disable tools on a specific call; (3) instruct the model in the system prompt about parallelism preferences. For cost analysis of parallel vs sequential, see our tool use overhead cost calculator.

**Test tools in isolation before integrating into the graph.** `result = search_web.invoke({'query': 'LangGraph latest version'})`. Tools are callable LangChain objects — test them independently to verify output format and error handling before adding them to the agent. Common failure modes: tool returns non-string type (LangGraph expects string results from tools), tool raises exception instead of returning error message (unhandled exceptions abort the graph run instead of letting the model recover), tool result is too long (truncate before returning). Verify in isolation, then integrate.
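The three failure modes above (non-string result, raised exception, oversized result) can be handled in one wrapper. This is a plain-Python sketch so it can be tested without LangGraph; `as_safe_tool` and `MAX_RESULT_CHARS` are hypothetical names, and the 300-character cutoff simply mirrors the truncation used in the search example.

```python
from functools import wraps
from typing import Callable

MAX_RESULT_CHARS = 300  # assumed cutoff, matching the [:300] truncation above


def as_safe_tool(func: Callable[..., object], max_chars: int = MAX_RESULT_CHARS) -> Callable[..., str]:
    """Wrap a function so it always returns a bounded string, even on failure."""

    @wraps(func)
    def wrapper(*args, **kwargs) -> str:
        try:
            result = str(func(*args, **kwargs))  # coerce non-string results
        except Exception as exc:
            # Return the error as text so the model can react to it.
            return f"Tool error ({type(exc).__name__}): {exc}"
        if len(result) > max_chars:
            return result[:max_chars] + " [truncated]"
        return result

    return wrapper


def flaky_lookup(key: str) -> dict:
    """Hypothetical tool body used only to exercise the wrapper."""
    if key == "missing":
        raise KeyError(key)
    return {"key": key, "value": "x" * 1000}


safe_lookup = as_safe_tool(flaky_lookup)
```

Testing `safe_lookup` directly, before it is registered as a tool, verifies the output format and the error path in isolation.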


Step 4: add conditional edges for the ReAct loop

**The conditional edge is what makes the ReAct loop.** After the model node runs, the graph needs to decide: did the model call a tool (loop to the tools node) or not (terminate)? This is the `should_continue` function: `from langchain_core.messages import AIMessage; def should_continue(state: AgentState) -> str: last_message = state['messages'][-1]; if hasattr(last_message, 'tool_calls') and last_message.tool_calls: return 'tools'; return END`. The function reads state, checks if the last message has tool calls, and returns the name of the next node ('tools') or END.

**Register the conditional edge.** `graph_builder.add_conditional_edges('model', should_continue, {'tools': 'tools', END: END})`. The third argument is a mapping from return values to node names — this is where you route 'tools' → the tools node and END → the graph end. After the tools node runs and appends tool results to state, add a direct edge back to the model: `graph_builder.add_edge('tools', 'model')`. This creates the loop: model → [if tools] → tools → model → [repeat until no tools] → END.
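The control flow this wiring produces can be emulated in plain Python, which helps when reasoning about how many steps a run takes. Everything below is a stand-in: `Message`, `run_loop`, and the scripted model and tools are hypothetical, and `END = "__end__"` is an assumed placeholder for `langgraph.graph.END`. It is not LangGraph code.

```python
from dataclasses import dataclass, field

END = "__end__"  # assumed stand-in for langgraph.graph.END


@dataclass
class Message:
    content: str
    tool_calls: list = field(default_factory=list)


def should_continue(state: dict) -> str:
    """Route to 'tools' when the last message requests tool calls, otherwise stop."""
    last = state["messages"][-1]
    return "tools" if getattr(last, "tool_calls", None) else END


def run_loop(state: dict, model_node, tools_node, max_steps: int = 25):
    """Emulate model -> (tools -> model)* -> END, counting node executions."""
    steps, node = 0, "model"
    while node != END:
        if steps >= max_steps:
            raise RuntimeError("step limit reached")
        if node == "model":
            state["messages"].append(model_node(state))
            node = should_continue(state)   # the conditional edge
        else:
            state["messages"].extend(tools_node(state))
            node = "model"                  # the fixed tools -> model edge
        steps += 1
    return state, steps


def scripted_model(state: dict) -> Message:
    # Hypothetical LLM stand-in: request one tool call, then answer.
    if not any(m.content.startswith("tool:") for m in state["messages"]):
        return Message("", tool_calls=[{"name": "calculate", "args": {"expression": "2+2"}}])
    return Message("The answer is 4.")


def scripted_tools(state: dict) -> list:
    # One result message per requested tool call.
    return [Message(f"tool:{c['name']} -> 4") for c in state["messages"][-1].tool_calls]
```

With one tool-calling turn the emulation executes three nodes (model, tools, model), which is the shape to keep in mind when choosing a step limit.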

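**Add a recursion limit to prevent infinite loops.** The limit is a runtime setting, so pass it in the config when you invoke the graph rather than to `compile()`: `graph = graph_builder.compile(checkpointer=checkpointer); graph.invoke(inputs, config={'recursion_limit': 12})`. The default is 25, and it counts graph steps (node executions), not conversational turns: each tool-calling turn costs one model step plus one tools step. For production agents, set it to the maximum sensible step count for your task. A 5-turn research agent needs about 11 steps (5 × 2 for the tool-calling turns + 1 final synthesis), so `recursion_limit=12` leaves a little headroom. When the limit is hit, LangGraph raises `GraphRecursionError` (importable from `langgraph.errors`); catch it and return a graceful error to the user rather than propagating the exception. Runaway loops are the most common cause of unexpectedly high agent costs.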
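Assuming the limit counts graph steps (one per node execution) rather than conversational turns, the value can be derived from the number of tool-calling turns you want to allow. A small sketch; `recursion_limit_for` and `invoke_config` are hypothetical helpers, and the default headroom of 2 is an arbitrary safety margin.

```python
def recursion_limit_for(max_tool_turns: int, headroom: int = 2) -> int:
    """Steps needed for N tool-calling turns: N x (model + tools) + 1 final model step."""
    if max_tool_turns < 0:
        raise ValueError("max_tool_turns must be non-negative")
    return 2 * max_tool_turns + 1 + headroom


def invoke_config(max_tool_turns: int, thread_id: str) -> dict:
    """Build a config dict of the shape passed as graph.invoke(inputs, config=...)."""
    return {
        "recursion_limit": recursion_limit_for(max_tool_turns),
        "configurable": {"thread_id": thread_id},
    }
```

Deriving the limit from the intended turn count keeps the two numbers from drifting apart when the task budget changes.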

**Add an early termination condition.** Beyond the recursion limit, you can add a semantic exit condition: have the model mark a complete answer with 'FINAL ANSWER:' and route on that marker. `def should_continue(state): last = state['messages'][-1]; if isinstance(last, AIMessage) and isinstance(last.content, str) and 'FINAL ANSWER' in last.content: return END; if getattr(last, 'tool_calls', None): return 'tools'; return END`. The marker check has to run before the tool-call check; placed after it, the marker branch can only return what the fallthrough already returns and has no effect. This stops the loop on turns where the answer is already complete. Note that ending a run while tool calls are still pending leaves unanswered tool calls in the message history, which can cause problems if the same thread is continued later. Include 'If you have a complete answer, prefix your response with FINAL ANSWER:' in the system prompt.

**Human-in-the-loop via interrupt_before.** Add `interrupt_before=['tools']` to the compile call to pause the graph before any tool execution: `graph = graph_builder.compile(checkpointer=checkpointer, interrupt_before=['tools'])`. When the graph is paused, it saves state to the checkpointer. A human can review the pending tool calls, approve or reject, and resume: `graph.invoke(None, config={'configurable': {'thread_id': thread_id}})` (passing `None` as input continues from the checkpoint). Essential for agents that take actions with real-world consequences (email sending, database writes, file modifications). Source: LangGraph human-in-the-loop docs.

**Debug the routing function.** Add logging to `should_continue`: `import logging; logger = logging.getLogger(); def should_continue(state): last = state['messages'][-1]; route = 'tools' if (hasattr(last, 'tool_calls') and last.tool_calls) else END; logger.debug(f'Routing: {route}, tool_calls: {getattr(last, "tool_calls", None)}'); return route`. Unexpected routing (agent loops more than expected, agent terminates before tool results are used) is the most common LangGraph debugging task — the routing function is always the first place to look. LangSmith traces also show the routing path for each run.


Step 5: add memory and enable streaming

**Add a checkpointer for cross-turn memory.** Without a checkpointer, each `graph.invoke()` call starts fresh with no memory of prior turns. Add `MemorySaver` for development: `from langgraph.checkpoint.memory import MemorySaver; checkpointer = MemorySaver(); graph = graph_builder.compile(checkpointer=checkpointer)`. For production, use `AsyncPostgresSaver` or `AsyncSqliteSaver`: `from langgraph.checkpoint.postgres.aio import AsyncPostgresSaver; async with AsyncPostgresSaver.from_conn_string(DATABASE_URL) as checkpointer: graph = graph_builder.compile(checkpointer=checkpointer)`. The checkpointer serializes full graph state (including message history and any custom state fields) to the backend on every node completion. Source: LangGraph persistence docs.

**Thread IDs are how memory is scoped.** Every invocation that should share memory must use the same `thread_id`: `config = {'configurable': {'thread_id': 'user-123-session-456'}}; result = graph.invoke({'messages': [HumanMessage(content='Hello')]}, config=config)`. A second invocation with the same `thread_id` loads the prior state from the checkpointer and continues the conversation. Use user-scoped thread IDs for chat applications (`f'user-{user_id}'`), task-scoped for agent runs (`f'task-{task_id}'`). Thread IDs that are too broad (shared across unrelated tasks) cause cross-contamination; too narrow (per-call) loses memory entirely.

**Stream output for responsive UIs.** Instead of `graph.invoke()`, use `graph.stream()`: `for chunk in graph.stream({'messages': [HumanMessage(content=query)]}, config=config, stream_mode='messages'): if chunk[1].get('langgraph_node') == 'model': token = chunk[0].content; print(token, end='', flush=True)`. `stream_mode='messages'` emits tokens as they're generated from the model node — this is the streaming mode for chat UIs. `stream_mode='updates'` emits full state updates after each node completes — better for monitoring agent progress. `stream_mode='values'` emits the full state after each step — best for debugging.

**Async streaming for production servers.** For FastAPI or any async web framework, use the async streaming API: `async for chunk in graph.astream({'messages': [HumanMessage(content=query)]}, config=config, stream_mode='messages'): yield chunk[0].content`. Wrap in a FastAPI `StreamingResponse` with `media_type='text/event-stream'` for SSE delivery to the browser. The async pattern is mandatory for production — synchronous streaming blocks the event loop and prevents concurrent request handling.

**Custom state fields for agent memory beyond messages.** Add structured fields to AgentState for persistent agent knowledge: `class AgentState(MessagesState): scratchpad: str; retrieved_docs: list[str]`. `MessagesState` is a TypedDict, so fields cannot carry default values; supply initial values in the input to the first invocation, or read them with `state.get(...)` inside nodes. The model node can read and write these fields; the tools node can populate them with retrieved data. Use `Annotated[list, operator.add]` for fields that should accumulate (append) rather than overwrite: `from typing import Annotated; import operator; class AgentState(MessagesState): context_docs: Annotated[list, operator.add]`. This prevents the common bug where each node overwrites the full field instead of appending to it. Source: LangGraph state management docs.
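The difference between an overwriting field and an accumulating one can be shown with a small emulation of the merge step. This is a plain-Python sketch of the idea, not LangGraph's implementation: `apply_update` is a hypothetical helper, and the state class here is a bare `TypedDict` rather than a `MessagesState` subclass.

```python
import operator
from typing import Annotated, TypedDict, get_type_hints


class AgentState(TypedDict):
    scratchpad: str                              # no reducer: last write wins
    context_docs: Annotated[list, operator.add]  # reducer: updates are concatenated


def apply_update(state: dict, update: dict, schema=AgentState) -> dict:
    """Merge a node's partial update into state, honoring Annotated reducers."""
    hints = get_type_hints(schema, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        metadata = getattr(hints[key], "__metadata__", ())
        reducer = metadata[0] if metadata and callable(metadata[0]) else None
        if reducer is not None and key in merged:
            merged[key] = reducer(merged[key], value)  # accumulate
        else:
            merged[key] = value                        # overwrite (or first write)
    return merged
```

Applying two updates in a row leaves `scratchpad` holding only the second value, while `context_docs` holds the items from both.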

**Cost monitoring in production.** Add a token counting callback to your model node: `from langchain_core.callbacks import BaseCallbackHandler; class TokenCounter(BaseCallbackHandler): def on_llm_end(self, response, **kwargs): usage = response.llm_output.get('usage', {}); log_tokens(usage.get('input_tokens', 0), usage.get('output_tokens', 0))`. Aggregate across turns to compute per-session and per-task cost. Set a per-session token limit and raise `GraphInterrupt` if exceeded — this prevents runaway cost on pathological inputs without breaking normal usage. See our agent loop cost calculator for the cost model these token counts map to.
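The aggregation and budget check can live in a small object that the callback (or the model node) feeds after every model call. A minimal sketch; `SessionTokenBudget` and `TokenBudgetExceeded` are hypothetical names, and the usage dict is assumed to use the `input_tokens` / `output_tokens` keys shown in the callback above.

```python
from typing import Optional


class TokenBudgetExceeded(RuntimeError):
    """Raised when a session goes over its token allowance (hypothetical name)."""


class SessionTokenBudget:
    """Accumulate token usage across turns and enforce a per-session cap."""

    def __init__(self, max_total_tokens: int):
        self.max_total_tokens = max_total_tokens
        self.input_tokens = 0
        self.output_tokens = 0

    @property
    def total(self) -> int:
        return self.input_tokens + self.output_tokens

    def record(self, usage: Optional[dict]) -> None:
        """Add one model call's usage; missing usage counts as zero."""
        usage = usage or {}
        self.input_tokens += int(usage.get("input_tokens", 0))
        self.output_tokens += int(usage.get("output_tokens", 0))
        if self.total > self.max_total_tokens:
            raise TokenBudgetExceeded(
                f"session used {self.total} tokens, limit is {self.max_total_tokens}"
            )
```

Keeping one budget object per `thread_id` gives per-session totals; summing the same counters by task ID gives per-task cost.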


Production patterns and common mistakes

**Pattern 1: the prebuilt ReAct agent.** For simple agents that fit the default ReAct pattern exactly, use LangGraph's prebuilt shortcut: `from langgraph.prebuilt import create_react_agent; agent = create_react_agent(llm, tools, checkpointer=checkpointer)`. This creates the same model → conditional → tools → model loop in one line. Use it for prototyping and simple production agents. Extend to the manual graph-building approach when you need custom state fields, non-standard routing logic, or nested subgraphs. Source: LangGraph prebuilt agent docs.

**Pattern 2: structured output from the final turn.** Use the tool-forcing pattern to extract structured output from the last model turn without free-text synthesis: define a `FinalAnswer` tool that wraps your desired output schema, add it to the tools list, and add routing logic that terminates after `FinalAnswer` is called. `def should_continue(state): last = state['messages'][-1]; if hasattr(last, 'tool_calls') and last.tool_calls: if last.tool_calls[0]['name'] == 'FinalAnswer': return END; return 'tools'; return END`. The final answer is extracted as `last.tool_calls[0]['args']` — a validated dict matching your schema.
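The routing and extraction logic for this pattern can be exercised without a model. The sketch below uses `SimpleNamespace` objects as stand-ins for AI messages and an assumed `END = "__end__"` placeholder for `langgraph.graph.END`; unlike the inline version, it checks every tool call in the turn rather than only the first, which is a design choice of this example.

```python
from types import SimpleNamespace

END = "__end__"  # assumed stand-in for langgraph.graph.END


def should_continue(state: dict) -> str:
    """Stop when there are no tool calls or when FinalAnswer was called."""
    tool_calls = getattr(state["messages"][-1], "tool_calls", None) or []
    if not tool_calls:
        return END
    if any(call["name"] == "FinalAnswer" for call in tool_calls):
        return END
    return "tools"


def extract_final_answer(state: dict):
    """Return the FinalAnswer arguments from the last message, or None."""
    for call in getattr(state["messages"][-1], "tool_calls", None) or []:
        if call["name"] == "FinalAnswer":
            return call["args"]
    return None


# Hypothetical messages for illustration only.
searching = SimpleNamespace(tool_calls=[{"name": "search_web", "args": {"query": "x"}}])
finished = SimpleNamespace(
    tool_calls=[{"name": "FinalAnswer", "args": {"summary": "done", "confidence": 0.9}}]
)
```

Because the answer arrives as tool arguments, downstream code reads a dict with known keys instead of parsing free text.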

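**Mistake 1: default values in state fields.** TypedDict state fields cannot carry defaults: type checkers reject `scratchpad: str = ''`, and the default is never applied to the state dict, so nodes must handle missing keys (`state.get('scratchpad', '')`) or the caller must supply initial values. If you define state as a dataclass or Pydantic model instead, don't use `{}` or `[]` as a default value; Python's mutable default trap applies. Use `field(default_factory=list)` from `dataclasses` or Pydantic's `Field(default_factory=list)`. A shared mutable default leaks state across graph invocations, one of the hardest bugs to diagnose in LangGraph because it manifests as unexpected cross-session context contamination.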

**Mistake 2: not handling tool errors gracefully.** If a tool raises an exception, `ToolNode` will propagate it as a graph error by default. Add error handling: `from langgraph.prebuilt import ToolNode; tool_node = ToolNode(tools, handle_tool_errors=True)`. With `handle_tool_errors=True`, tool exceptions become tool result messages with the error text, allowing the model to reason about the failure and retry or escalate. Without this, a single failed API call terminates the entire agent run.

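**Mistake 3: large state objects.** LangGraph serializes and stores the full state on every checkpoint. A state with 50K tokens of accumulated messages in a `MemorySaver` will eventually cause memory issues in development or storage costs in production. Implement periodic state pruning: after each successful task completion, trim the messages list to the last 2-3 messages + the final answer. Because `messages` uses an append-style reducer, returning a shortened list from a node does not delete anything. Instead, have a cleanup node that runs before the graph terminates return `RemoveMessage` entries (from `langchain_core.messages`) for the messages to drop, for example `return {'messages': [RemoveMessage(id=m.id) for m in state['messages'][:-3]]}`. Take care not to keep a tool result whose originating tool call was removed.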

**Mistake 4: synchronous tool execution in async graphs.** If your graph uses `astream()` or `ainvoke()` but your tools use synchronous blocking I/O (requests library, synchronous database calls), each tool call blocks the async event loop. Wrap synchronous tools with `asyncio.to_thread()` (or `loop.run_in_executor()`), or rewrite them using `httpx.AsyncClient` and async database drivers. A blocking tool in an async agent is a silent latency bomb: it doesn't error, it just stalls the event loop, degrading performance for all concurrent requests.
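Offloading a blocking call to a worker thread keeps the event loop free. A minimal sketch with the standard library (`asyncio.to_thread` requires Python 3.9+); `slow_lookup` is a hypothetical blocking tool body that uses `time.sleep` in place of real I/O.

```python
import asyncio
import time


def slow_lookup(query: str) -> str:
    """Hypothetical blocking tool body (stands in for requests / sync DB calls)."""
    time.sleep(0.2)
    return f"result for {query}"


async def lookup_async(query: str) -> str:
    # Run the blocking function in a worker thread so the event loop stays responsive.
    return await asyncio.to_thread(slow_lookup, query)


async def run_many(queries: list) -> list:
    # The wrapped calls overlap instead of running one after another.
    return await asyncio.gather(*(lookup_async(q) for q in queries))
```

`asyncio.gather` returns results in the order the queries were given, regardless of which thread finishes first.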


Extending to multi-agent with LangGraph supervisor

**The supervisor pattern** extends the ReAct agent you just built into a multi-agent system. A supervisor node receives a task, routes to worker subgraphs, and assembles results. Each worker is its own compiled LangGraph graph: `worker_graph = build_worker_graph(); supervisor_graph = StateGraph(SupervisorState); supervisor_graph.add_node('worker_1', worker_graph.invoke)`. The supervisor is itself a LangGraph graph whose nodes are other graphs. Source: LangGraph multi-agent concepts.

**Shared state vs subgraph isolation.** By default, worker subgraphs have their own state scope — they cannot directly read the supervisor's state. Workers receive their task as input and return their output, which the supervisor incorporates into the shared state. For workers that need shared context (e.g., all workers need access to the original task description and retrieved documents), pass the shared context as input when invoking the worker subgraph. Don't give workers direct write access to the supervisor state — that creates race conditions in parallel execution.

**Parallel worker execution.** LangGraph supports parallel node execution via the `Send` API: `from langgraph.types import Send; def route_to_workers(state): return [Send('worker', {'task': t}) for t in state['tasks']]`. This fans out all workers simultaneously. Results are collected back into the supervisor state when all workers complete. This is the pattern for genuinely parallel workloads — each `Send` creates an independent subgraph invocation that runs concurrently. For cost modeling of parallel workers, see our multi-agent cost per task calculator.

**Supervisor routing with a model.** For dynamic routing where the supervisor needs to reason about which worker to invoke next, use a model as the supervisor: `def supervisor_node(state): decision = llm.invoke(state['messages'] + [supervisor_prompt]); return {'next_worker': parse_decision(decision)}`. Add a conditional edge that routes based on `next_worker`. The supervisor model is typically a smaller, faster model (GPT-5.4-mini, Haiku 4.5) — it needs to classify task types, not reason about the task content.

**Production deployment.** Compile the graph with a production checkpointer (Postgres or Redis-backed), wrap in a FastAPI endpoint, and deploy as a containerized service. LangGraph Server (the hosted version at langchain-ai.github.io/langgraph) provides a production deployment target with built-in observability. For self-hosted deployments, the standard pattern is to expose `graph.ainvoke()` and `graph.astream()` through your own FastAPI routes, deployed on AWS ECS or GCP Cloud Run with an RDS Postgres checkpointer. Budget $50-$150/month for a small production agent deployment (compute + Postgres + LangSmith Pro).


Performance, cost, and reliability benchmarks

**Latency profile.** A LangGraph ReAct agent turn (model call + tool execution + state update) takes 600ms-2,000ms in production, dominated by model TTFT (600-900ms for Claude Sonnet 4.6, 400-650ms for GPT-5.4). Tool execution adds 100-1,500ms depending on the tool (web search ~500ms, calculator ~5ms, database lookup ~50ms). State serialization to Postgres checkpointer adds 10-30ms per turn. Total 5-turn loop wall time: 3-10 seconds on typical production infrastructure — within acceptable range for async/background tasks but not for real-time chat without streaming.

**Cost per 5-turn loop.** Claude Sonnet 4.6, 2K stable system + 5 tools (400 schema tokens), 3 tool calls per query, 500-token tool results, 300 output tokens per turn: approximately $0.137/query uncached, $0.110 with caching. See our full breakdown in the agent loop cost calculator. At 10,000 queries/month: $1,100-$1,370. At 100,000 queries/month: $11,000-$13,700. At these per-query costs, the pattern is cost-effective to deploy in production for most SaaS applications.
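The monthly figures are the per-query estimates multiplied by volume, which is easy to rerun for your own traffic. A small sketch using only the two per-query numbers quoted above; `monthly_cost_range` is a hypothetical helper.

```python
CACHED_PER_QUERY = 0.110    # USD per query with caching, from the estimate above
UNCACHED_PER_QUERY = 0.137  # USD per query uncached, from the estimate above


def monthly_cost_range(queries_per_month: int) -> tuple:
    """Return (low, high) monthly cost in USD for a given query volume."""
    low = round(queries_per_month * CACHED_PER_QUERY, 2)
    high = round(queries_per_month * UNCACHED_PER_QUERY, 2)
    return low, high
```

The range scales linearly with volume, so the cached and uncached totals stay in the same ratio at every traffic level.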

**Reliability patterns.** Add retry logic to tool calls: `from tenacity import retry, stop_after_attempt, wait_exponential; @retry(stop=stop_after_attempt(3), wait=wait_exponential(min=1, max=10)); def search_web(query: str) -> str: ...`. Set `handle_tool_errors=True` on `ToolNode`. Add a maximum session token budget and raise `GraphInterrupt` when exceeded. Log all graph errors to a monitoring system (Datadog, Sentry). The most common production failure modes are: API rate limits on tool providers (add retry with exponential backoff), model refusing to terminate (add recursion_limit), and state checkpoint failures (use a redundant Postgres instance). Source: LangGraph error handling docs.
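The retry-with-exponential-backoff behavior can also be written with the standard library, which makes the timing explicit and easy to test. This is a stand-in sketch, not the `tenacity` implementation; `retry_with_backoff` and its parameters are hypothetical, and the `sleep` argument exists only so tests can avoid real delays.

```python
import time
from functools import wraps


def retry_with_backoff(attempts=3, base_delay=1.0, max_delay=10.0,
                       exceptions=(Exception,), sleep=time.sleep):
    """Retry a function, doubling the wait after each failure up to max_delay."""

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == attempts:
                        raise  # out of attempts: surface the last error
                    sleep(min(max_delay, base_delay * 2 ** (attempt - 1)))

        return wrapper

    return decorator
```

With the defaults, a call that fails twice and then succeeds waits 1 second and then 2 seconds before returning; a call that keeps failing re-raises the last exception after the third attempt, which `handle_tool_errors=True` can then turn into a tool result message.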

**Eval framework integration.** Use LangSmith evaluation datasets to run regression tests against your agent: `from langsmith import Client; client = Client(); dataset = client.create_dataset('agent-evals'); client.create_examples(inputs=[...], outputs=[...], dataset_id=dataset.id)`. Run the agent on the dataset and score outputs automatically. Integrate into CI/CD to catch quality regressions before deployment. See our agent eval with Langfuse tutorial for an alternative eval stack with deeper trace analysis.

Build a LangGraph ReAct agent in 5 steps

1. **Install and configure LangGraph with your model.** Run `pip install langgraph langchain-anthropic` (or `langchain-openai`). Set your API key in the environment. Initialize the model: `llm = ChatAnthropic(model='claude-sonnet-4-6', temperature=0, max_tokens=1000)`. Set `temperature=0` for deterministic agent behavior and always set `max_tokens` explicitly; unbounded output per turn is the most common cause of unexpected agent costs. Enable LangSmith tracing with `LANGCHAIN_TRACING_V2=true` for production observability.

2. **Define AgentState and build the StateGraph.** Import `StateGraph, END, MessagesState` from `langgraph.graph`. Define `class AgentState(MessagesState): pass` for a basic message-accumulating state. Create `graph_builder = StateGraph(AgentState)`. Add a model node: `def call_model(state): return {'messages': [llm_with_tools.invoke(state['messages'])]}`. Add it: `graph_builder.add_node('model', call_model)`. Set entry point: `graph_builder.set_entry_point('model')`. The graph is a builder until you call `compile()`; nothing executes during construction.

3. **Define tools with @tool and bind them to the model.** Decorate Python functions with `@tool`. Keep descriptions to one sentence for token efficiency. Truncate tool results at the wrapper level (target 300-500 tokens per result). Bind: `llm_with_tools = llm.bind_tools(tools)`. Create the tool node: `from langgraph.prebuilt import ToolNode; tool_node = ToolNode(tools, handle_tool_errors=True)`. Add it: `graph_builder.add_node('tools', tool_node)`. Test each tool independently with `tool.invoke({'param': 'value'})` before integrating into the graph.

4. **Add conditional edges for the ReAct loop.** Write the routing function: `def should_continue(state): last = state['messages'][-1]; return 'tools' if (hasattr(last, 'tool_calls') and last.tool_calls) else END`. Add: `graph_builder.add_conditional_edges('model', should_continue, {'tools': 'tools', END: END})`. Add the return edge: `graph_builder.add_edge('tools', 'model')`. Compile with `graph = graph_builder.compile()` and pass the recursion limit at call time: `graph.invoke(inputs, config={'recursion_limit': 12})`. Test the loop with a query that requires 2-3 tool calls before reaching an answer.

5. **Add a checkpointer for memory and enable streaming.** Development: `from langgraph.checkpoint.memory import MemorySaver; graph = graph_builder.compile(checkpointer=MemorySaver())`. Production: use `AsyncPostgresSaver` or `AsyncSqliteSaver` with a connection string. Invoke with thread ID: `graph.invoke({'messages': [HumanMessage(content=query)]}, config={'configurable': {'thread_id': 'session-123'}, 'recursion_limit': 12})`. Stream with `graph.stream(..., stream_mode='messages')` for token-by-token output. Use `graph.astream()` in async FastAPI endpoints.

Frequently Asked Questions

What is LangGraph and why use it for agents?

LangGraph is a graph-based agent framework from the LangChain team that models agents as typed state graphs: nodes that transform state, edges that route between nodes, and a checkpointer that persists state across invocations. It provides explicit control over agent loop structure, built-in parallelism via ToolNode and the Send API, human-in-the-loop interrupts, and durable persistence. It is a widely used choice for stateful production agents in 2026. Source: LangGraph documentation at langchain-ai.github.io/langgraph/

What is the ReAct pattern in LangGraph?

ReAct (Reason + Act) is the pattern where an agent alternates between reasoning (model output: what tool to call and why) and acting (executing the tool and observing the result). In LangGraph, it's implemented as: model node → conditional edge (tool calls pending? → tools node : END) → tools node → model node (loop). The graph loops until the model produces output with no tool calls, then terminates. LangGraph's `create_react_agent` prebuilt implements this in one line; manual graph construction gives you more control over each component.

How do I add memory to a LangGraph agent?

Use a checkpointer — `MemorySaver` for development, `AsyncPostgresSaver` or `AsyncSqliteSaver` for production. Pass it to `graph_builder.compile(checkpointer=checkpointer)`. Invoke with a `thread_id` in the config: `graph.invoke(input, config={'configurable': {'thread_id': 'session-123'}})`. All invocations with the same thread_id share memory — the graph loads prior state from the checkpointer and continues from where it left off. Source: LangGraph persistence docs at langchain-ai.github.io/langgraph/

How does LangGraph handle parallel tool calls?

LangGraph's built-in `ToolNode` executes multiple tool calls emitted in a single model output turn in parallel (via `asyncio.gather` when the graph runs asynchronously). You don't need to write parallel execution logic; ToolNode handles it automatically. Both Claude and GPT-5 may emit multiple tool_calls in a single response turn; ToolNode fans them out concurrently and returns one `ToolMessage` per call, each matched to its request by `tool_call_id`. Set `handle_tool_errors=True` on ToolNode so individual tool failures return error messages rather than aborting the graph.
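
To make the fan-out concrete, here is a plain-Python sketch of the same idea. It is an illustration under stated assumptions, not ToolNode's actual implementation: `run_tool_calls`, `get_weather`, and `get_time` are hypothetical names, and results are plain dicts rather than `ToolMessage` objects.

```python
import asyncio


async def run_tool_calls(tool_calls, tools):
    """Run every requested tool concurrently; return one result per call, in order."""

    async def run_one(call):
        try:
            content = await tools[call["name"]](**call["args"])
        except Exception as exc:  # report the failure instead of aborting the batch
            content = f"Error: {exc!r}"
        return {"tool_call_id": call["id"], "content": content}

    # gather preserves the order of its arguments, so results line up with calls.
    return await asyncio.gather(*(run_one(c) for c in tool_calls))


# Hypothetical tools for the demo: one succeeds, one fails.
async def get_weather(city: str) -> str:
    await asyncio.sleep(0.01)
    return f"Sunny in {city}"


async def get_time(city: str) -> str:
    raise ValueError("clock service unavailable")


calls = [
    {"id": "call_1", "name": "get_weather", "args": {"city": "Oslo"}},
    {"id": "call_2", "name": "get_time", "args": {"city": "Oslo"}},
]
registry = {"get_weather": get_weather, "get_time": get_time}
results = asyncio.run(run_tool_calls(calls, registry))
for result in results:
    print(result)
```

The failing tool yields an error string in its own slot while the other result still comes back, which is the behavior error handling on the tool node is meant to give you.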

What is the recursion_limit and why does it matter?

The recursion_limit is the maximum number of graph steps per invocation. In a ReAct graph each tool-calling turn consumes two steps (the model node, then the tools node), so a limit of 10 allows roughly four tool rounds plus a final answer. The default is 25. When it is exceeded, LangGraph raises `GraphRecursionError`. Set it per invocation in the config, for example `graph.invoke(inputs, config={'recursion_limit': 10})`, and size it to the maximum sensible turn count for your task. The default already stops a stuck agent (a model that keeps calling tools that return errors), but only after 25 steps of accumulated cost, so a tighter limit keeps the worst case cheap. Always catch `GraphRecursionError` in production and return a graceful error to the user.
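
A small sketch of sizing and enforcing the limit, assuming the two-node model/tools loop from this tutorial. `recursion_limit_for` and `invoke_with_limit` are hypothetical helper names; the step formula is an approximation for this graph shape, not a general rule.

```python
def recursion_limit_for(max_tool_rounds: int) -> int:
    """Steps for a model -> tools loop: each round costs one model step and one
    tools step, plus a final model step that produces the answer."""
    return 2 * max_tool_rounds + 1


def invoke_with_limit(graph, inputs, max_tool_rounds: int = 5, config=None):
    """Invoke a compiled graph and turn an exhausted step budget into a friendly reply."""
    from langgraph.errors import GraphRecursionError  # lazy: only needed at call time

    run_config = {**(config or {}), "recursion_limit": recursion_limit_for(max_tool_rounds)}
    try:
        return graph.invoke(inputs, config=run_config)
    except GraphRecursionError:
        return {"error": "The agent could not finish within its step budget."}


print(recursion_limit_for(5))  # 11
```

Pass the thread config through `config` when the graph has a checkpointer, for example `invoke_with_limit(graph, inputs, config={'configurable': {'thread_id': 'session-123'}})`.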

How do I stream LangGraph output to a browser?

Use `graph.astream(input, config=config, stream_mode='messages')` in an async context. In FastAPI, wrap it in `StreamingResponse` with `media_type='text/event-stream'`: `return StreamingResponse(generate(), media_type='text/event-stream')` where `generate()` is an async generator that yields `f'data: {token}\n\n'` for each token chunk. With `stream_mode='messages'`, each streamed item is a `(message_chunk, metadata)` pair emitted as model nodes generate tokens, so take the token text from the chunk's `content`. This is the Server-Sent Events (SSE) pattern: the browser receives tokens in real time as the model generates them, enabling a streaming chat UI.
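
The generator described above might look like this. It is a sketch under assumptions: `sse_event` and `sse_stream` are hypothetical names, the demo replaces `graph.astream(...)` with a fake stream of `(chunk, metadata)` pairs, and chunks whose `content` is not a plain string (some providers emit lists of content blocks) are simply skipped.

```python
import asyncio
from types import SimpleNamespace


def sse_event(data: str) -> str:
    """Format one Server-Sent Event; multi-line payloads get one 'data:' line each."""
    return "".join(f"data: {line}\n" for line in data.split("\n")) + "\n"


async def sse_stream(message_stream):
    """Turn (message_chunk, metadata) pairs into SSE frames, skipping empty chunks."""
    async for chunk, _metadata in message_stream:
        text = getattr(chunk, "content", "")
        if isinstance(text, str) and text:
            yield sse_event(text)


# Demo with a stand-in for graph.astream(..., stream_mode="messages").
async def fake_stream():
    for token in ["Hel", "", "lo\nworld"]:
        yield SimpleNamespace(content=token), {"langgraph_node": "model"}


async def collect():
    return [frame async for frame in sse_stream(fake_stream())]


frames = asyncio.run(collect())
print(frames)  # ['data: Hel\n\n', 'data: lo\ndata: world\n\n']
```

In a FastAPI endpoint this would be used as `StreamingResponse(sse_stream(graph.astream(inputs, config=config, stream_mode='messages')), media_type='text/event-stream')`.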

What is the difference between MemorySaver and AsyncPostgresSaver?

MemorySaver stores checkpoint state in Python in-process memory. It is fast, zero-config, and well suited to development, but it loses all state on process restart and can't be shared across multiple processes or servers. AsyncPostgresSaver stores checkpoints in a Postgres database: durable, multi-process-safe, and queryable. Use MemorySaver for development and single-process testing; use AsyncPostgresSaver (or AsyncSqliteSaver for lighter-weight, single-node deployments) for a deployed service. All three implement the same checkpointer interface, so you swap the object passed to `compile()`. The async savers expect the graph to be run with `ainvoke` or `astream`, and the Postgres saver needs its tables created once with `setup()` before first use.

How much does a LangGraph agent cost per query?

For a 5-turn ReAct agent on Claude Sonnet 4.6 with 3 tool calls per query: approximately $0.137/query uncached, $0.110/query with Anthropic's 90% cache discount on the stable system prefix. At 10K queries/month: $1,100-$1,370. At 100K queries/month: $11,000-$13,700. The main cost drivers are context accumulation across turns (cache your system prompt and tool definitions) and tool result sizes (truncate results to 300-500 tokens at the wrapper level). See the agent loop cost calculator for the full breakdown: /calc/agent-loop-cost-claude-vs-gpt5.
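
The monthly figures above are straight multiplication of the per-query cost by volume, and the truncation advice can be applied with a small wrapper. Both helpers below are hypothetical sketches; the truncation assumes roughly 4 characters per token, which is a rough rule of thumb rather than a tokenizer-accurate count.

```python
def monthly_cost(cost_per_query: float, queries_per_month: int) -> float:
    """Monthly spend in dollars for a given per-query cost."""
    return round(cost_per_query * queries_per_month, 2)


def truncate_result(text: str, max_tokens: int = 400, chars_per_token: int = 4) -> str:
    """Cap a tool result at roughly max_tokens (assumes ~4 characters per token)."""
    limit = max_tokens * chars_per_token
    if len(text) <= limit:
        return text
    return text[:limit] + " ...[truncated]"


UNCACHED, CACHED = 0.137, 0.110  # per-query figures quoted in the answer above

print(monthly_cost(CACHED, 10_000), monthly_cost(UNCACHED, 10_000))    # 1100.0 1370.0
print(monthly_cost(CACHED, 100_000), monthly_cost(UNCACHED, 100_000))  # 11000.0 13700.0
print(len(truncate_result("x" * 5000)))  # 1615
```

Call `truncate_result` on the return value inside each tool function so oversized results never reach the model's context.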
