The four components of tool use overhead
**Tool schema tokens (input, every call).** When you pass `tools=[...]` to the Claude or OpenAI API, those tool definitions are tokenized and billed as input tokens on every call, including calls where the model never invokes a tool. At the $15/1M input rate these examples assume for Opus 4.7, a 200-token tool definition costs $0.003 per call. With 10 tools (2,000 tokens of schemas) and 100,000 calls/month, that is $3,000/month from tool definitions alone, even if only 5% of calls actually use a tool. Source: Anthropic tool use overview, OpenAI function calling.
**Function call argument tokens (output, per tool invocation).** When the model decides to call a tool, it emits a structured JSON object specifying the tool name and arguments. This JSON is billed as output tokens — the most expensive token class. A typical function call argument block is 50-150 tokens (tool name ~10 tokens, argument JSON ~40-140 tokens depending on parameter complexity). At $75/1M output (Opus), a 100-token call argument costs $0.0075. For an agent that makes 3 tool calls per query at 100K queries/month: $0.0075 × 3 × 100K = $2,250/month in function call argument output alone.
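The monthly figures for schema tokens and call arguments can be reproduced with a few lines of arithmetic. This is a minimal sketch: the function names are illustrative, and the rates are the $15/1M input and $75/1M output figures assumed in the text, not values read from any pricing API.

```python
# Rates assumed in the text, converted to dollars per token.
INPUT_RATE = 15 / 1_000_000
OUTPUT_RATE = 75 / 1_000_000


def monthly_schema_cost(schema_tokens, calls_per_month, input_rate=INPUT_RATE):
    """Tool schemas are billed as input on every call, used or not."""
    return schema_tokens * input_rate * calls_per_month


def monthly_call_args_cost(args_tokens, tool_calls_per_query, queries_per_month,
                           output_rate=OUTPUT_RATE):
    """Call arguments are billed as output once per tool invocation."""
    return args_tokens * output_rate * tool_calls_per_query * queries_per_month


# 10 tools x 200 tokens each, 100,000 calls/month
print(f"${monthly_schema_cost(2_000, 100_000):,.2f}")        # $3,000.00
# 100-token arguments, 3 tool calls per query, 100,000 queries/month
print(f"${monthly_call_args_cost(100, 3, 100_000):,.2f}")    # $2,250.00
```

Swapping in your own token counts and rates is usually enough to see which of the two fixed costs dominates for a given agent.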
**Tool result tokens (input, next turn).** After your code executes the tool, it returns a result that is fed back into the conversation as input on the next turn (a `tool_result` block in a user message on the Claude API, a `tool`-role message on OpenAI's Chat Completions API). These tokens are billed at the standard input rate. Tool results are the most variable component: a search tool returning 3,000 tokens of raw web content adds $0.045 to the per-query cost on Opus 4.7, while a summarized 200-token result adds only $0.003. **Controlling tool result size is the highest-leverage way to reduce tool overhead.** Return extracted facts, not raw API responses.
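One way to return extracted facts instead of a raw response is to whitelist the fields the model needs and cap their length before the result goes back into the conversation. The sketch below assumes a hypothetical search API whose response is a dict with a `results` list; the field names (`title`, `url`, `snippet`, `raw_html`) and the function name are illustrative, not taken from any real service.

```python
import json


def compact_search_result(raw, max_items=3, max_snippet_chars=200):
    """Keep only the fields the model needs and cap snippet length."""
    compact = []
    for item in raw.get("results", [])[:max_items]:
        snippet = item.get("snippet", "")
        if len(snippet) > max_snippet_chars:
            snippet = snippet[:max_snippet_chars].rstrip() + "..."
        compact.append({
            "title": item.get("title", ""),
            "url": item.get("url", ""),
            "snippet": snippet,
        })
    # Compact separators avoid spending tokens on whitespace.
    return json.dumps(compact, separators=(",", ":"))


# Hypothetical raw response; raw_html stands in for bulky page content.
raw_response = {
    "results": [
        {
            "title": "Pricing overview",
            "url": "https://example.com/pricing",
            "snippet": "Input and output tokens are billed separately. " * 20,
            "raw_html": "<div>" + "x" * 5000 + "</div>",
            "score": 0.91,
        }
    ]
}

tool_result = compact_search_result(raw_response)
print(len(json.dumps(raw_response)), "chars raw ->", len(tool_result), "chars compact")
```

Character counts are only a proxy for tokens here; the point is that the trimming happens in your code, before the result is ever billed as input.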
**Tool result accumulation in multi-turn loops (input, every subsequent turn).** Tool results fed into the conversation don't disappear: they stay in the context and are replayed as input on every subsequent turn. A 500-token tool result added at turn 2 of a 5-turn loop is billed again at turns 3, 4, and 5, which is 3 additional billings of $0.0075 each on Opus. For a 5-turn loop that adds one 500-token result per turn, the result from turn t is replayed on the 5 − t turns that follow it, so the accumulation overhead is 500 × (4 + 3 + 2 + 1 + 0) = 5,000 tokens (equivalently, 5 results × 500 tokens × an average of (5 − 1)/2 = 2 replays each). At Opus input rates that is $0.075/query beyond the initial tool result input cost.
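The accumulation arithmetic above can be checked by counting, for each turn's result, how many later turns replay it. This is a small sketch under the same assumptions as the text: results are never pruned from context, and the input rate is $15/1M. The function name is illustrative.

```python
INPUT_RATE = 15 / 1_000_000  # dollars per input token, rate assumed in the text


def replayed_tokens(results_by_turn):
    """Tokens re-billed as input because earlier results stay in context.

    results_by_turn[i] is the size of the tool result added at turn i + 1.
    """
    total_turns = len(results_by_turn)
    total = 0
    for turn, tokens in enumerate(results_by_turn, start=1):
        turns_remaining = total_turns - turn  # later turns that replay it
        total += tokens * turns_remaining
    return total


# One 500-token result added at turn 2 of a 5-turn loop
single = replayed_tokens([0, 500, 0, 0, 0])
# One 500-token result added on every turn of a 5-turn loop
every_turn = replayed_tokens([500] * 5)

print(single, f"${single * INPUT_RATE:.4f}")          # 1500 $0.0225
print(every_turn, f"${every_turn * INPUT_RATE:.4f}")  # 5000 $0.0750
```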
**Why tool overhead is underestimated.** Most teams estimate agent cost as: system prompt + user query + outputs. They forget to add: (1) tool schema tokens (constant overhead on every call), (2) function call argument output, (3) tool result input on the current turn, and (4) tool result accumulation in subsequent turns. For a 10-tool agent with 3 tool calls per turn and verbose results, the actual input token count can be 2-3x the estimate from prompt + query alone. Profile a real run with the `usage` field in the API response to see the actual numbers.
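A simple way to profile a run is to log the usage numbers from each API response and compare the total against the prompt-plus-query estimate. The sketch below assumes each response's usage has been copied into a plain dict with `input_tokens` and `output_tokens` keys (the field names the Claude Messages API reports; OpenAI's Chat Completions API calls them `prompt_tokens` and `completion_tokens`). The token counts are made-up illustrative values, not measurements.

```python
def summarize_usage(usages, estimated_input_per_call):
    """Compare billed input tokens against a prompt + query estimate."""
    actual_input = sum(u["input_tokens"] for u in usages)
    actual_output = sum(u["output_tokens"] for u in usages)
    estimated_input = estimated_input_per_call * len(usages)
    return {
        "actual_input": actual_input,
        "actual_output": actual_output,
        "estimated_input": estimated_input,
        "input_ratio": actual_input / estimated_input,
    }


# Illustrative usage records from a hypothetical 3-call agent run.
run_usage = [
    {"input_tokens": 3_500, "output_tokens": 110},
    {"input_tokens": 4_000, "output_tokens": 105},
    {"input_tokens": 4_500, "output_tokens": 320},
]

# Assumed estimate: 1,600 input tokens per call for system prompt + query.
summary = summarize_usage(run_usage, estimated_input_per_call=1_600)
print(f"actual input is {summary['input_ratio']:.1f}x the estimate")  # 2.5x
```

If the ratio on a real run is well above 1, the gap is the schema, tool result, and replay tokens that the simple estimate leaves out.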
**The formula for total tool use overhead per turn:** `tool_overhead = schema_tokens × input_rate + call_args_tokens × output_rate + result_tokens × input_rate`, with rates expressed in dollars per token. For the multi-turn case, add result accumulation, since each prior turn's results are replayed as input: `total_result_replay = Σ_{t=1}^{T} result_tokens_t × (T − t) × input_rate`, where T is the total number of turns and `result_tokens_t` is the size of the result added at turn t. This is why tool result truncation has compounding value: every token you cut from a tool result saves input cost on every turn that comes after it.
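To see the compounding effect of truncation, the per-turn formula and the replay term can be combined into one function and evaluated with and without trimmed results. This is a sketch under stated assumptions: one tool call per turn, schemas billed on every turn, nothing pruned from context, and the $15/1M input and $75/1M output rates used in the text. The function and variable names are illustrative.

```python
INPUT_RATE = 15 / 1_000_000   # dollars per input token (assumed)
OUTPUT_RATE = 75 / 1_000_000  # dollars per output token (assumed)


def loop_tool_cost(schema_tokens, call_args_by_turn, results_by_turn,
                   input_rate=INPUT_RATE, output_rate=OUTPUT_RATE):
    """Per-turn tool overhead plus result replay, summed over the loop."""
    total_turns = len(results_by_turn)
    cost = 0.0
    turns = zip(call_args_by_turn, results_by_turn)
    for t, (args, result) in enumerate(turns, start=1):
        # Per-turn formula: schema + call arguments + this turn's result
        cost += schema_tokens * input_rate
        cost += args * output_rate
        cost += result * input_rate
        # Replay term: this result is billed again on the T - t later turns
        cost += result * (total_turns - t) * input_rate
    return cost


verbose = loop_tool_cost(2_000, [100] * 5, [500] * 5)
trimmed = loop_tool_cost(2_000, [100] * 5, [200] * 5)

print(f"verbose results: ${verbose:.4f}")            # $0.3000
print(f"trimmed results: ${trimmed:.4f}")            # $0.2325
print(f"saved per query: ${verbose - trimmed:.4f}")  # $0.0675
```

Cutting 300 tokens from each of five results removes 1,500 tokens of first-time input but 4,500 tokens of billed input overall, because the earlier a result appears, the more turns replay it.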