The agent loop cost formula
**The canonical formula for a single agent loop with N turns is:** `total_cost = Σ_t [ (cached_input_t / 1M) × cache_rate + (new_input_t / 1M) × input_rate + (output_t / 1M) × output_rate + (tool_result_t / 1M) × input_rate ]`, where the sum runs over turns t = 1..N and every rate is in dollars per 1M tokens. Every turn t has four priced components: cached tokens from the stable prefix, new context tokens added since the previous turn (excluding tool results, which are counted separately so they are not billed twice), model output tokens (the assistant's reply), and tool result tokens fed back in the following user message.
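The formula can be sketched in a few lines of Python. This is a minimal illustration, not a billing implementation: the `Turn` dataclass and `loop_cost` function are hypothetical names, and the cache and output rates in the example are placeholders to replace with your provider's published prices (only the $15/1M input rate comes from the text).

```python
from dataclasses import dataclass


@dataclass
class Turn:
    cached_input: int  # tokens read from the cached stable prefix
    new_input: int     # new context tokens since the previous turn (no tool results)
    output: int        # tokens the model generated on this turn
    tool_result: int   # tool result tokens fed back in, billed at the input rate


def loop_cost(turns, cache_rate, input_rate, output_rate):
    """Sum the four priced components over all turns. Rates are dollars per 1M tokens."""
    total = 0.0
    for t in turns:
        total += (
            t.cached_input / 1_000_000 * cache_rate
            + t.new_input / 1_000_000 * input_rate
            + t.output / 1_000_000 * output_rate
            + t.tool_result / 1_000_000 * input_rate
        )
    return total


# Illustrative two-turn loop. cache_rate and output_rate are placeholder values.
example_turns = [
    Turn(cached_input=2000, new_input=500, output=300, tool_result=600),
    Turn(cached_input=2000, new_input=900, output=200, tool_result=0),
]
example_cost = loop_cost(example_turns, cache_rate=1.5, input_rate=15.0, output_rate=75.0)
print(f"loop cost: ${example_cost:.4f}")
```

Keeping tool results as a separate field makes it easy to report how much of the bill comes from tools, even though they are priced at the same rate as other input.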
**Why tool result tokens matter.** In a ReAct-style agent, each tool call returns a result that becomes part of the next turn's input. A `search()` call returning 800 tokens of context is billed as input on the turn that receives it — just like prompt tokens. In a 5-turn loop with 3 tool calls returning ~600 tokens each, tool results add 1,800 tokens of billable input across the loop. At $15/1M (Opus) that is $0.027 — small per query, but at 100K queries/month that is $2,700/month from tool results alone.
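The arithmetic above can be checked with a short sketch. The helper name `tool_result_cost` is hypothetical, and the calculation counts each tool result once, on the turn that first receives it, as the paragraph does; any replay of that result on later turns is not included.

```python
def tool_result_cost(calls, tokens_per_result, input_rate):
    """Cost of tool results billed once as input. input_rate is dollars per 1M tokens."""
    return calls * tokens_per_result / 1_000_000 * input_rate


# Figures from the text: 3 tool calls, ~600 tokens each, $15/1M input.
per_query = tool_result_cost(calls=3, tokens_per_result=600, input_rate=15.0)
monthly = per_query * 100_000  # 100K queries per month

print(f"per query: ${per_query:.3f}, per month: ${monthly:,.0f}")
```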
**Context accumulation is the key driver of cost growth.** Turn 1 starts at your system prompt plus the user query. Turn 2 replays turn 1's full output, the tool call, and the tool result, plus whatever the model added. By turn 5, a loop that started at 3,000 tokens might have 12,000-15,000 tokens of accumulated context. **The cost is not linear in the number of turns; it is roughly quadratic for standard token replay.** Each additional turn pays for its own new tokens and re-sends the accumulated tokens of all prior turns.
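To see where the quadratic shape comes from, here is a rough simulation of full replay. It assumes, purely for illustration, that the context grows by a fixed 2,500 tokens per turn (which lands turn 5 inside the 12,000-15,000 range mentioned above) and that nothing is cached; `replayed_input_tokens` is a hypothetical helper.

```python
def replayed_input_tokens(base, growth_per_turn, n_turns):
    """Return (context size at each turn, total input tokens sent) under full replay."""
    context = base
    sizes = []
    total = 0
    for _ in range(n_turns):
        sizes.append(context)       # the whole context is sent as input this turn
        total += context
        context += growth_per_turn  # output + tool result appended for the next turn
    return sizes, total


sizes_5, total_5 = replayed_input_tokens(base=3000, growth_per_turn=2500, n_turns=5)
sizes_10, total_10 = replayed_input_tokens(base=3000, growth_per_turn=2500, n_turns=10)

print(sizes_5)            # per-turn context sizes
print(total_5, total_10)  # total input tokens for 5 and 10 turns
```

Under these assumptions the total equals `n * base + growth * n * (n - 1) / 2`, so doubling the number of turns from 5 to 10 more than triples the input tokens sent. The `n * (n - 1) / 2` term is the quadratic part.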
**Windowed context (summarization) changes the shape.** If your agent framework summarizes prior turns instead of replaying them verbatim, the context growth flattens. A 200-token summary replacing a 2,000-token prior turn saves 1,800 input tokens on every subsequent turn. At $15/1M input (Opus), that is $0.027 saved on each later turn that would otherwise replay the full text; across 100K queries, that is $2,700/month for every such turn. If turn 1 is summarized in a 5-turn loop, four later turns benefit, so the saving is about $0.108 per query before subtracting the cost of generating the summary. Many production agent frameworks (LangGraph, CrewAI) support configurable summarization; enable it when token cost is a constraint.
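A small sketch of the savings calculation follows. It is an estimate under stated assumptions: the hypothetical `summarization_savings` helper ignores the cost of producing the summary itself and assumes the replaced text would otherwise be billed at the full input rate on every later turn.

```python
def summarization_savings(original_tokens, summary_tokens, later_turns, input_rate):
    """Dollars saved by replaying a summary instead of the original on later turns.

    input_rate is dollars per 1M tokens. The cost of generating the summary
    is not subtracted here.
    """
    saved_tokens = (original_tokens - summary_tokens) * later_turns
    return saved_tokens / 1_000_000 * input_rate


# Figures from the text: a 2,000-token turn replaced by a 200-token summary at $15/1M.
saving_one_turn = summarization_savings(2000, 200, later_turns=1, input_rate=15.0)

# Assumed scenario: turn 1 of a 5-turn loop is summarized, so 4 later turns benefit.
saving_four_turns = summarization_savings(2000, 200, later_turns=4, input_rate=15.0)

print(f"one later turn: ${saving_one_turn:.3f}, four later turns: ${saving_four_turns:.3f}")
```

The saving scales with how many turns remain after the summarization point, so summarizing early in a long loop is worth more than summarizing near the end.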
**Output tokens are the minority of per-turn cost in most agents.** A ReAct agent typically emits 100-400 output tokens per turn (the reasoning step plus the tool call specification). The input bill, from context replay plus tool results, usually accounts for 70-85% of total loop cost. This means input optimization (caching, summarization, context pruning) has a higher expected payoff than output optimization for agent workloads. Source: Anthropic agents and tools overview.
**The formula is per-query, not per-turn.** When you see cost benchmarks for agent workloads, always clarify whether they cite per-turn cost or per-query cost. A 5-turn loop at $0.02/turn (Sonnet 4.6) is $0.10/query — a 5x difference from a naive single-turn reading. The per-query figure is what maps to your production budget.