Why agents are expensive: the multiplication effect
A single LLM call bills you once for input and once for output. An agent doesn't. Every step in the loop re-sends the entire context — system prompt, all tool definitions, every prior tool call, every prior tool result. The model has no memory between calls; the loop framework reconstitutes the conversation by replaying it.
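To make the replay concrete, here is a minimal sketch of a stateless loop. The `AgentLoop` class and the whitespace-based `count_tokens` helper are hypothetical stand-ins invented for illustration (a real tokenizer counts differently, and no model is actually called); the only point is that each request is rebuilt from the full history, so the billed input grows every step.

```python
from dataclasses import dataclass, field


def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


@dataclass
class AgentLoop:
    """Hypothetical loop that records what each step would send to the model."""
    system_prompt: str
    tool_definitions: str
    history: list[str] = field(default_factory=list)
    input_tokens_per_step: list[int] = field(default_factory=list)

    def build_request(self) -> str:
        # The model is stateless, so every request carries the full context.
        return "\n".join([self.system_prompt, self.tool_definitions, *self.history])

    def step(self, tool_call: str, tool_result: str) -> None:
        request = self.build_request()
        self.input_tokens_per_step.append(count_tokens(request))
        # A real loop would send `request` to the model here. This sketch only
        # records the input size, then grows the history for the next step.
        self.history.append(tool_call)
        self.history.append(tool_result)


loop = AgentLoop(
    system_prompt="You are a research agent.",
    tool_definitions="web_search(query) fetch_url(url)",
)
for i in range(4):
    loop.step(f"call web_search step {i}", f"result for step {i}")

print(loop.input_tokens_per_step)  # [7, 15, 23, 31]
```

The static part (7 "tokens" here) appears in every entry; only the history term changes from step to step.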
Worked example. You build a research agent. Your system prompt is 1,500 tokens (persona, response format, safety rules, output schema). Your tool block defines 12 tools (web_search, fetch_url, summarize, extract_table, save_note, query_db, and so on) totaling 3,500 tokens. That's a 5,000-token static prefix. Now add a user task that takes 8 steps to complete, with the trajectory growing by roughly 800 tokens of tool calls and tool results per step.
Naive replay math at $3/1M input on Claude Sonnet 4.6: step 1 sends 5,000 tokens, step 2 sends 5,800, step 3 sends 6,600, all the way to step 8 sending 10,600. Sum: 62,400 input tokens replayed across the loop. Add the per-step output (roughly 400 tokens × 8 = 3,200 output tokens at $15/1M) and you bill $0.187 input + $0.048 output = **$0.235 per task**.
Now imagine the same task running as a single LLM call on the same model: 5,000 input + 1,000 output = $0.015 + $0.015 = **$0.030**. The agent costs roughly 8x as much as a single call for the same final answer. Multiply across 10,000 tasks/day and you're paying about $2,350/day for work a non-agentic version would do for $300/day.
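The arithmetic above can be checked with a short script. This is a sketch under the example's own assumptions: the prices, the 5,000-token prefix, the 800-token growth per step, and the 400-token outputs all come from the text, while the function names are made up for illustration.

```python
INPUT_PRICE_PER_MTOK = 3.00    # USD per 1M input tokens (figure from the text)
OUTPUT_PRICE_PER_MTOK = 15.00  # USD per 1M output tokens (figure from the text)


def cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Bill for one or more calls, given total input and output tokens."""
    return (input_tokens * INPUT_PRICE_PER_MTOK
            + output_tokens * OUTPUT_PRICE_PER_MTOK) / 1_000_000


def agent_input_tokens(prefix: int, growth_per_step: int, steps: int) -> list[int]:
    """Input tokens sent at each step when the full context is replayed."""
    return [prefix + growth_per_step * i for i in range(steps)]


per_step = agent_input_tokens(prefix=5_000, growth_per_step=800, steps=8)
agent_cost = cost_usd(sum(per_step), output_tokens=400 * 8)
single_cost = cost_usd(input_tokens=5_000, output_tokens=1_000)

print("input per step:", per_step)
print("total input:", sum(per_step))
print(f"agent: ${agent_cost:.3f}  single call: ${single_cost:.3f}  "
      f"ratio: {agent_cost / single_cost:.1f}x")
```

Changing `steps` or `growth_per_step` shows how quickly the ratio moves: the input total grows quadratically with the number of steps, not linearly.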
This is the multiplication effect: every static token in your prefix gets re-billed once per step. A 5k prefix on an 8-step task is 40k input tokens (8 × 5,000) before you've sent a single dynamic token. Fixing this multiplication, not picking a smaller model, is the entire game.
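One way to see how much of the bill is pure prefix replay is to split the input into its static and dynamic parts. The sketch below assumes the same simplified model as the worked example (a fixed prefix plus a constant number of new trajectory tokens per step); `replayed_input_split` is a hypothetical helper, not part of any library.

```python
def replayed_input_split(prefix: int, growth_per_step: int, steps: int) -> tuple[int, int]:
    """Return (static, dynamic) input tokens billed across a full-replay loop."""
    # The prefix is sent once per step.
    static = prefix * steps
    # Step k (1-based) carries k - 1 earlier turns, so the dynamic total is
    # growth * (0 + 1 + ... + (steps - 1)) = growth * steps * (steps - 1) / 2.
    dynamic = growth_per_step * steps * (steps - 1) // 2
    return static, dynamic


static, dynamic = replayed_input_split(prefix=5_000, growth_per_step=800, steps=8)
share = static / (static + dynamic)
print(f"static: {static:,}  dynamic: {dynamic:,}  static share: {share:.0%}")
```

With the example's numbers, the static prefix is the larger of the two terms, which is why it is the first thing worth attacking.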