Skip to contentNew: Does ChatGPT recommend your brand? Free 60-second AI visibility check →
By The DDH Team · Digital Dashboard Hub

Tool Use Overhead Cost (2026): Quantifying Function Call Token Overhead

By The DDH Team at Digital Dashboard HubUpdated

Stop writing AI prompts from scratch.

Tell us your business + your task + your model. We write the prompt — perfectly tuned for ChatGPT, Claude, Grok, Gemini, Midjourney, or any model. Plus 500+ pre-built prompts in your library.

14 days, no card. Cancel in 2 clicks.

Every tool-use call has a hidden token overhead beyond your actual prompt and response. Tool schemas (the JSON definition of each tool's name, description, and parameters) are sent as input on every API call — even if the model doesn't call any tool that turn. Function-call arguments (the JSON the model emits when invoking a tool) are billed as output tokens. Tool results (the data your code returns after executing the tool) are billed as input tokens on the subsequent turn. In a 5-tool agent loop running 5 turns, these overheads can add 30-60% to the naive token estimate. See our agent loop cost calculator for the full loop model.

The scale of the overhead depends on three decisions you control: how many tools you define (schema tokens grow with tool count), how verbose your tool descriptions are (the single biggest variable in schema token cost), and how large your tool results are (the most impactful variable in multi-turn cost). A team that ships a 12-tool agent without profiling tool overhead often discovers they're spending 40-50% of their LLM budget on schema tokens alone. This page gives you the formula and the data to fix that before it hits your invoice. Pricing from Anthropic docs and OpenAI function calling docs, fetched June 2026.

Below: the full overhead taxonomy, per-component $/call math, the parallel tool call analysis (fan-out saves turns but not tokens), the schema optimization playbook, and a worked scenario comparing a naive 12-tool definition vs an optimized 12-tool definition on the same agent workload. For the orchestrator-plus-workers cost model, see our multi-agent cost per task calculator. For full agent loop cost with tool overhead included, see the agent loop cost calculator.

Digital Dashboard Hub

Writing good prompts for ONE AI is hard. Writing them for GPT-5, Claude, Gemini, Perplexity, Midjourney and 6 more is a full-time job. DDH's AI Prompt Builder writes once, runs everywhere — locked to your niche, voice, and brand tone.

Free 14 days, no card.

Tool use overhead cost per component — 4 models, June 2026

Feature
Claude Opus 4.7
Claude Sonnet 4.6
GPT-5.5
GPT-5.4
Input price ($/1M)$15.00$3.00$5.00$2.50
Output price ($/1M)$75.00$15.00$25.00$15.00
Single tool schema (200 tokens, $/call)$0.003$0.0006$0.001$0.0005
5-tool schema bundle (1K tokens, $/call)$0.015$0.003$0.005$0.0025
12-tool schema bundle (2.4K tokens, $/call)$0.036$0.0072$0.012$0.006
Function call arguments output (100 tokens, $/call)$0.0075$0.0015$0.0025$0.0015
Tool result input (500 tokens, $/call)$0.0075$0.0015$0.0025$0.00125
Tool result input (2K tokens, $/call)$0.030$0.006$0.010$0.005
1 tool call overhead (schema+call+result, 500-token result)$0.018$0.0036$0.006$0.003
3 parallel tool calls (shared schema, 3×500 result)$0.059$0.0117$0.0195$0.00975
Schema as % of 5K-input call cost (5-tool bundle)17%17%17%17%
Schema overhead at 100K calls/month (12 tools)$3,600$720$1,200$600

Sources, fetched 2026-06-21: Anthropic tool use overview (https://docs.anthropic.com/en/docs/agents-and-tools/tool-use/overview), OpenAI function calling guide (https://platform.openai.com/docs/guides/function-calling). Schema token estimates: single tool = name (20 tokens) + description (100 tokens) + parameters JSON schema (80 tokens) = 200 tokens. 5-tool bundle = 5 × 200 tokens. 12-tool bundle = 12 × 200 tokens. Function call argument output = tool name (10 tokens) + argument JSON (90 tokens) = 100 tokens per call. Tool result input = 500 tokens for a typical search/lookup result; 2K tokens for a detailed API response. Schema token count assumes average verbosity — terse descriptions run 100 tokens/tool, verbose run 350 tokens/tool.

The four components of tool use overhead

**Tool schema tokens (input, every call).** When you pass `tools=[...]` to the Claude or OpenAI API, those tool definitions are tokenized and billed as input tokens on every call — including calls where the model never invokes a tool. A 200-token tool definition costs $0.003 on Opus 4.7 per call; if you have 10 tools (2,000 tokens of schemas) and make 100,000 calls/month, that is $3,000/month from tool definitions alone, even if only 5% of calls actually use a tool. Source: Anthropic tool use overview, OpenAI function calling.

**Function call argument tokens (output, per tool invocation).** When the model decides to call a tool, it emits a structured JSON object specifying the tool name and arguments. This JSON is billed as output tokens — the most expensive token class. A typical function call argument block is 50-150 tokens (tool name ~10 tokens, argument JSON ~40-140 tokens depending on parameter complexity). At $75/1M output (Opus), a 100-token call argument costs $0.0075. For an agent that makes 3 tool calls per query at 100K queries/month: $0.0075 × 3 × 100K = $2,250/month in function call argument output alone.

**Tool result tokens (input, next turn).** After your code executes the tool, it returns a result that is fed back into the conversation as user-message input on the next turn. These tokens are billed at the standard input rate. Tool results are the most variable component — a search tool returning 3,000 tokens of raw web content adds $0.045 to the per-query cost on Opus 4.7, while a summarized 200-token result adds only $0.003. **Controlling tool result size is the highest-EV lever for reducing tool overhead.** Return extracted facts, not raw API responses.

**Tool result accumulation in multi-turn loops (input, every subsequent turn).** Tool results fed into the conversation don't disappear — they accumulate in the context and are replayed as input on every subsequent turn. A 500-token tool result added at turn 2 of a 5-turn loop is billed again at turns 3, 4, and 5 — $0.0075 × 3 additional billings on Opus. For a loop with 5 tool results of 500 tokens each, the accumulation overhead across a 5-turn loop is roughly: Σ tool_results × turns_remaining = 5 × 500 × (5−1)/2 average = 5,000 tokens of accumulated result overhead. At Opus: $0.075/query beyond the initial tool result input cost.

**Why tool overhead is underestimated.** Most teams estimate agent cost as: system prompt + user query + outputs. They forget to add: (1) tool schema tokens (constant overhead on every call), (2) function call argument output, (3) tool result input on the current turn, and (4) tool result accumulation in subsequent turns. For a 10-tool agent with 3 tool calls per turn and verbose results, the actual input token count can be 2-3x the estimate from prompt + query alone. Profile a real run with the `usage` field in the API response to see the actual numbers.

**The formula for total tool use overhead per turn:** `tool_overhead = schema_tokens × input_rate + call_args_tokens × output_rate + result_tokens × input_rate`. For the multi-turn case, add result accumulation: each prior turn's results are replayed as input. The clean calculation: `total_result_replay = Σ_{t=1}^{T} result_tokens_t × (T − t) × input_rate`, where T is total turns. This is why tool result truncation has compounding value — every token you cut from a tool result saves input cost on every turn that comes after it.


Schema token costs: the constant per-call overhead

**Schema tokens are billed as input on every API call**, regardless of whether any tool is invoked. This means a 2,000-token tool bundle passed to 1,000,000 calls/month costs: 1M × 2,000 × $3/1M = $6,000/month on Sonnet 4.6, or $30,000/month on Opus 4.7 — from schema tokens alone, before any actual task content. **Most teams with 10+ tools have their biggest cost lever hiding in their tool definitions.**

**How verbose are your tool schemas?** The token count per tool scales with: (1) tool name (short = good; `search_web` is 4 tokens, `perform_comprehensive_web_search_with_query_refinement` is 12 tokens), (2) description (this is the largest variable — a one-sentence description is ~50 tokens; a detailed multi-sentence description with examples is 200-400 tokens), (3) parameter schemas (one string parameter = ~30 tokens; a complex nested object with 5 fields each with type + description = 200 tokens). The median well-described tool is 150-250 tokens; verbose tools with examples run 400-600 tokens each.

**Tool schema optimization playbook.** Audit each tool definition: cut description to the minimum sentence that allows correct invocation. Delete parameter descriptions that just restate the parameter name ('`query` (string): the search query' adds nothing over '`query` (string)'). For optional parameters with sensible defaults, remove them from the schema entirely and hardcode the defaults in your tool implementation. Remove tools that are never called — run usage logging and kill tools with <1% invocation rate. A typical optimization pass cuts tool schema size by 30-50%.

**Dynamic tool loading.** Instead of passing all tools on every call, pass only the tools relevant to the current step. A 10-tool agent where 3 tools are relevant per turn on average saves (10−3) × 200 tokens × calls × input_rate per call. At 100K calls/month on Opus 4.7: 100K × 1,400 tokens × $15/1M = $2,100/month saved. LangGraph and CrewAI both support per-step tool injection via state-conditional tool selection. Source: LangGraph tool node docs.

**Caching tool schemas.** On Anthropic, you can cache the tools array by including it in the cached prefix block (the `cache_control: {type: 'ephemeral'}` marker on the system message that precedes the tools parameter). On OpenAI, tool definitions cache automatically as part of the prompt prefix if they're stable. Cached schema tokens bill at 10% of standard input rate. For 2,000 schema tokens cached at Opus 4.7's $1.50/1M cache rate vs $15/1M standard: savings = 2,000 × ($15 − $1.50) / 1M × calls = $0.027 per call. At 100K calls/month: $2,700/month from schema caching alone. Source: Anthropic caching guide.

**Real schema size comparison: naive vs optimized 5-tool bundle.** Naive version (verbose descriptions + nested schemas): web_search (~380 tokens), code_execution (~420 tokens), file_read (~210 tokens), calculator (~180 tokens), send_email (~350 tokens). Total: 1,540 tokens. Optimized version (terse descriptions + flat schemas): web_search (~100 tokens), code_execution (~130 tokens), file_read (~70 tokens), calculator (~50 tokens), send_email (~90 tokens). Total: 440 tokens. **71% reduction in schema overhead.** At Opus 4.7, 100K calls/month: $(1,540 − 440) × 100K × $15/1M = $1,650/month from schema optimization alone.


Function call argument tokens: the output overhead

**Function call arguments are billed as output tokens** — at 5x the input price on Opus 4.7 ($75/1M vs $15/1M input). Every tool invocation emits a JSON block containing the tool name and its arguments. The size of this block depends on parameter complexity: a simple `{"query": "search term"}` is about 8 tokens; a complex `{"query": "...", "filters": {"date_range": "...", "sources": [...], "limit": 10, "mode": "semantic"}}` can be 80-120 tokens.

**Minimize argument verbosity where possible.** Use short parameter names (`q` instead of `search_query_string`), use enums instead of free text where the space is finite (`mode: 'fast' | 'deep'` instead of a free-text instructions field), and avoid redundant arguments that your tool implementation can infer from context. Each token saved from function call arguments saves at output rate — 5x more expensive per token than the equivalent schema token savings.

**Parallel tool calls emit multiple argument blocks in one response.** When the model fans out 3 tool calls in parallel, it emits 3 JSON argument blocks in the same output — all billed at output rate. For 3 tools at 80 arguments tokens each: 240 output tokens × $75/1M (Opus) = $0.018 per parallel fan-out. Compared to 3 sequential calls where context accumulates between turns, parallel is still usually cheaper (saves 2 full input replays) but the output token cost is identical whether calls are parallel or sequential. Source: Anthropic tool use docs.

**Structured output arguments via tool forcing.** If you use tool forcing (forcing the model to call a specific tool as a structured output mechanism), the 'function call arguments' are your entire structured output — potentially 500-2,000 tokens. This is fine for structured extraction tasks but surprises teams who are used to thinking of tool arguments as small. A forced tool with a 1,500-token JSON output costs 1,500 × output_rate — the same as any other 1,500-token completion. The tool-forcing pattern is not free. Source: OpenAI function calling docs.

**Tool call argument cost at scale.** For an agent making 3 tool calls per query with 80 tokens average argument output: 3 × 80 = 240 output tokens per query. At 100K queries/month on GPT-5.5: 100K × 240 × $25/1M = $600/month from function call arguments. Modest in isolation, but combine with schema overhead ($1,200/month) and result tokens ($2,500/month) and the tool overhead total is $4,300/month — 30-40% of total agent cost for a mid-complexity workload. Profile each component separately to know which to optimize first.


Tool result tokens: the biggest variable

**Tool results are the most impactful variable in per-query tool overhead** — and the one you have the most control over. A web search API returning full document excerpts might return 3,000-5,000 tokens per call; a properly designed search wrapper that extracts relevant sentences returns 200-400 tokens. That 10-15x difference in result size directly multiplies your tool overhead cost and amplifies through context accumulation on every subsequent turn.

**The compounding effect of large results.** In a 5-turn agent, a 2,000-token result fed in at turn 2 is replayed as input at turns 3, 4, and 5. Total billing for that one tool result: 4 × 2,000 = 8,000 input tokens across the loop. At Opus 4.7: 8K × $15/1M = $0.120 from a single tool result. If you had returned a 200-token summary instead: 4 × 200 = 800 input tokens, $0.012. **$0.108 saved per query from truncating one tool result.** At 100K queries/month: $10,800/month from that single tool result size decision.

**Tool result processing patterns.** (1) API-level truncation: configure your tool wrapper to hard-truncate results at N tokens/characters before returning. Set N based on what the agent actually needs — typically 300-600 tokens for search results, 100-200 tokens for API status responses. (2) LLM-based summarization: add an intermediate summarization step that compresses tool results before feeding them to the main agent. This adds one small LLM call (200 tokens in, 150 tokens out on GPT-5.4-mini = $0.0005) to save $0.108 on Opus. Net positive EV at any call volume. (3) Extraction functions: post-process structured API responses with code to extract only the relevant fields, not the full JSON. Return `{'temperature': 72, 'condition': 'sunny'}` not the full 800-token weather API response.

**Binary vs text results.** Code execution tools often return both stdout and error messages — potentially thousands of tokens for a verbose stack trace. Capture only the last 200 lines of stdout and the first 500 characters of error messages; the full trace is rarely needed by the LLM for its next decision. File reading tools should never return entire files — return the relevant section (lines 50-75 of the error module, not the whole 400-line file). Anthropic's tool use best practices explicitly recommend result preprocessing.

**Parallel result aggregation.** When 3 tools run in parallel (fan-out pattern), all 3 results arrive simultaneously and are fed into the next turn's input as a batch. Total result input: 3 × result_size. On Opus 4.7 with 500-token results: 3 × 500 × $15/1M = $0.0225 per fan-out on the result-receipt turn, plus the same 1,500 tokens replay on every subsequent turn. The parallel execution saves you 2 full context replays vs sequential (no intermediate turns) but pays identical result costs. **For large result sizes (>1,000 tokens each), parallel tool calls can cost more than sequential** because 3 × 1,000 tokens hits the model's context budget faster and triggers more aggressive context pruning. Use parallel calls for small, fast results; use sequential for large results that need incremental reasoning.

**Worst-case: the documentation scraper agent.** Teams building agents that read documentation via web scraping often feed 5,000-10,000 tokens of raw HTML per tool call. At 5 scraper calls × 7,500 tokens average × Opus rate × 5 turns of accumulation: 5 × 7,500 × 5 × $15/1M = $2.81 per query in tool result tokens alone — more than the entire agent cost budget for most use cases. Fix: use a dedicated scraping preprocessor that extracts text + structure and returns under 800 tokens per page, or use a retrieval system that returns only the 3 most relevant paragraphs.


Parallel tool calls: fan-out cost analysis

**Parallel tool calling** is the pattern where the model emits multiple tool call blocks in a single response turn, your code executes them concurrently, and you return all results in a single user message. The benefit is latency — 3 tools that each take 500ms complete in 500ms total instead of 1,500ms sequential. The cost impact is different from what most teams assume.

**Parallel calls save input turns, not tokens.** Consider 3 sequential tool calls vs 3 parallel in the same query: Sequential: turn 1 (system + query + 1 tool call) → result 1 → turn 2 (+ result 1 + 1 tool call) → result 2 → turn 3 (+ results 1,2 + 1 tool call) → result 3 → turn 4 (synthesis with all results). Input tokens: 3,000 + 3,500 + 4,000 + 4,500 = 15,000. Parallel: turn 1 (system + query, emits 3 tool calls) → results 1,2,3 together → turn 2 (system + query + 3 tool calls + 3 results, synthesis). Input tokens: 3,000 + (3,000 + 3 × 500 + 3 × 100) = 3,000 + 4,800 = 7,800. **Parallel saves approximately 7,200 input tokens — nearly half the input cost** — by eliminating the 2 intermediate turns.

**Parallel is better for cost AND latency when result sizes are small.** At Opus 4.7 with 500-token results: sequential total input cost 15K × $15/1M = $0.225; parallel total input cost 7.8K × $15/1M = $0.117. 48% savings. At Sonnet 4.6: sequential $0.045, parallel $0.0234. Both models benefit proportionally — the savings are the same percentage regardless of model tier because the token reduction is the same.

**Parallel can cost more when result sizes are large.** If each result is 3,000 tokens (not 500): parallel turn 2 has 3,000 + 3 × 100 (call args) + 3 × 3,000 (results) = 12,300 input tokens. Sequential turn 4 has 3,000 + 3 × 100 + 3 × 3,000 = 12,300 input tokens for synthesis. Sequential also has intermediate turns 2 and 3 at 3,100 and 6,200 tokens respectively. Still, parallel comes out ahead (12,300 total input) vs sequential (3,100 + 6,200 + 9,300 + 12,300 = 30,900 total). **Parallel is nearly always cheaper for input cost.** The context accumulation in sequential is the dominant factor.

**Output cost is identical for parallel vs sequential.** Each tool call argument block is billed at output rate regardless of whether calls are parallel or sequential. 3 parallel calls emit 3 × 80 = 240 output tokens; 3 sequential calls also emit 3 × 80 = 240 output tokens. No savings there. Source: OpenAI parallel function calling docs, Anthropic parallel tool use docs.

**When the model refuses to parallelize.** Claude Opus 4.7 tends to be more conservative than GPT-5.5 about parallel tool calling — it often calls 1-2 tools per turn even when multiple are available, reasoning that it needs the first result to inform the second call. This is appropriate behavior for genuinely dependent tasks (you need search result A before you can formulate search query B) but suboptimal for independent tasks (look up 3 company names in a database simultaneously). To encourage parallel calls: (1) add explicit instruction in the system prompt: 'When multiple independent information lookups are needed, call them in parallel'; (2) ensure tool descriptions clearly state when tools are independent of each other; (3) use OpenAI's GPT-5.5 if parallel tool calling behavior is critical for your cost model.


Schema optimization: the highest-EV cost reduction

**Schema optimization is the easiest cost reduction because it requires zero changes to agent logic or model choice** — you only edit the tool definitions. Start by running your agent with usage logging enabled for one week to get the actual schema token count per call (visible in `usage.input_tokens` minus your known prompt + context size). Compare to the expected schema token count by counting tokens in your `tools` array. The gap is often surprising — verbose descriptions can add 500-1,000 tokens per tool.

**Tier 1 optimization: trim descriptions to one sentence per tool.** Most tool descriptions that span multiple sentences or include examples can be cut to a single sentence without affecting model behavior. Anthropic's models and GPT-5 are strong enough to infer correct usage from a concise description. 'Search the web for information relevant to a query. Returns the top 10 organic search results with titles, URLs, and 150-word excerpts. Prefer this tool when the user needs current information or factual grounding. Always specify a clear, focused search query for best results.' (78 tokens) → 'Search the web for current information. Returns top results with excerpts.' (14 tokens). 64-token reduction. At Opus, 100K calls/month: $0.96/month per tool from this optimization.

**Tier 2 optimization: strip unnecessary parameter descriptions.** Parameter descriptions that restate the parameter type or name ('`limit` (integer): the maximum number of results to return') add tokens with zero informational value — the parameter name and type already communicate this. Delete them. Keep descriptions only for parameters with non-obvious semantics (enum values that need explanation, parameters with unusual units or formats). On a typical 5-tool bundle, parameter description cleanup saves 150-300 tokens per bundle.

**Tier 3 optimization: use dynamic tool loading.** Profile which tools are actually invoked per query type and load only the relevant tools per step. A 3-phase agent (research phase: web search + calculator; drafting phase: text tools + formatter; review phase: checker tools) can load a different 3-4 tool subset per phase instead of all 12 tools constantly. Implementation: in LangGraph, use conditional edges that pass different tool subsets based on the current state phase. In CrewAI, configure each agent with only its role-relevant tools. Expected savings: 40-60% reduction in schema tokens for agents with clear phase separation.

**Tier 4 optimization: schema caching.** Once your tool schemas are stable (after Tier 1-3 optimization), ensure they're in the cacheable prefix of your prompt. On Anthropic, include tools in the system message block that carries `cache_control: {type: 'ephemeral'}`. On OpenAI, ensure tools come at the start of the message array and are identical across calls (no per-call dynamic tool injection unless necessary). Cached schema tokens on Anthropic bill at $0.30/1M (Sonnet) instead of $3/1M — a 90% reduction on schema costs for the cached portion. This is the largest single lever for high-call-volume agents.

**Putting it together: 12-tool agent, 100K calls/month on Sonnet 4.6.** Baseline (verbose, uncached): 12 × 350 tokens = 4,200 schema tokens. Monthly cost: 100K × 4,200 × $3/1M = $1,260/month. After Tier 1-2 optimization (150 tokens/tool): 12 × 150 = 1,800 tokens. Monthly cost: $540/month. After dynamic loading (average 4 tools active, 150 tokens/tool): 600 schema tokens. Monthly cost: $180/month. After caching on Anthropic (90% discount): $18/month. **From $1,260 to $18/month in schema costs — a 99% reduction** — purely from tool definition hygiene with no change to agent logic or model. Source: Anthropic tool use overview.


Worked scenario: naive vs optimized 10-tool agent

**Naive agent configuration.** Tools: web_search, code_execute, file_read, file_write, calculator, send_email, calendar_create, database_query, screenshot, send_slack. Tool schemas: verbose descriptions + nested parameter schemas. Average tokens per tool: 380. Total schema bundle: 3,800 tokens. Tool results: raw API responses, not truncated. Average result size: 1,800 tokens. Agent runs 3 tool calls per query, 4 turns per query. Queries: 50,000/month. Model: Claude Sonnet 4.6 ($3/$15).

**Naive monthly cost.** Schema tokens: 50K × 4 turns × 3,800 × $3/1M = $2,280/month. Function call arguments: 50K × 3 calls × 80 tokens × $15/1M = $180/month. Tool result input (current turn): 50K × 3 calls × 1,800 × $3/1M = $810/month. Tool result accumulation (results replayed in later turns): 50K × 1,800 × 3 results × average 1.5 subsequent turns × $3/1M = $1,215/month. Task prompt + output: 50K × (2,000 context + 500 output) × effective_rate = $600/month. **Naive total: $5,085/month.**

**Optimized agent configuration.** Tool descriptions trimmed to 1 sentence, parameter descriptions stripped for obvious parameters. Average tokens per tool: 130. Total schema bundle: 1,300 tokens. Dynamic loading: 4 tools per turn average (300 tools active tokens vs 1,300 full bundle). Schema cached on Anthropic (90% discount). Tool results: LLM-summarized to 300 tokens average per result. Tool calls parallelized: 3 parallel calls in 1 output turn instead of 3 sequential turns — reduces to 2 total turns per query.

**Optimized monthly cost.** Schema tokens (cached 4-tool subset, 600 tokens, 90% discount): 50K × 2 turns × 600 × $0.30/1M = $18/month. Function call arguments: 50K × 3 calls × 80 tokens × $15/1M = $180/month (same). Tool result input (current turn, 300 tokens): 50K × 3 calls × 300 × $3/1M = $135/month. Tool result accumulation (parallel: all 3 results arrive at same turn, only 1 subsequent turn): 50K × 300 × 3 × 0.5 avg subsequent turns × $3/1M = $67.50/month. Task prompt + output (2 turns vs 4): $300/month. **Optimized total: $700.50/month.**

**Result: 86% cost reduction.** From $5,085/month to $700/month — $4,385/month or $52,620/year — with no change to the model, no quality sacrifice, and no architectural redesign. The four levers: schema trimming ($2,262 saved), dynamic loading ($1,080 saved), result truncation ($1,890 saved), parallel tool calls ($153 saved). **Tool result size and schema overhead dominate; parallel execution is the smallest lever.** Fix them in order of EV. Source: Anthropic tool use overview, OpenAI function calling guide.

**The production profiling workflow.** (1) Enable usage logging: log `usage.input_tokens` and `usage.output_tokens` for every API call. (2) In parallel, log your known prompt + context token count (exclude tools). (3) The delta is your tool overhead. (4) If tool overhead exceeds 20% of total input tokens, schema optimization is the next action. (5) Log tool invocation frequency per tool — tools called on fewer than 5% of queries are candidates for removal or dynamic loading. This profiling run takes one week and typically surfaces $500-$5,000/month in low-hanging optimizations.


Claude vs OpenAI tool use: API differences and cost implications

**Anthropic's tool schema format** uses `{name, description, input_schema}` — flat, no enclosing wrapper. OpenAI's format uses `{type: 'function', function: {name, description, parameters}}` — one extra nesting level that adds ~15 tokens per tool definition in the raw JSON serialization. At 10 tools and 100K calls/month, that is 10 × 15 × 100K × input_rate in overhead from the extra wrapper. On GPT-5.5: 15M tokens × $5/1M = $75/month from schema format verbosity alone. Not huge, but it's a real token — use terse field names in your OpenAI tool definitions to compensate. Source: Anthropic tool use docs, OpenAI function calling docs.

**Tool result format differs.** On Anthropic, tool results are returned as `tool_result` content blocks within a user message: `{role: 'user', content: [{type: 'tool_result', tool_use_id: '...', content: '...'}]}`. The wrapper JSON (`type`, `tool_use_id` fields) adds ~20 tokens overhead per result. On OpenAI, tool results are a `tool` role message: `{role: 'tool', tool_call_id: '...', content: '...'}`. The wrapper is similarly ~15-20 tokens. Negligible at small scale; at 1M tool results/month: 20M tokens of wrapper overhead = $60 on Sonnet. Not worth special optimization but worth knowing when debugging unexpectedly high token counts.

**Anthropic supports tool caching more explicitly.** Because Anthropic's caching is explicit (you mark the cache breakpoint), you can ensure the tools array is consistently cached by always placing it before the first user message and marking the system block with `cache_control`. OpenAI's automatic caching requires the tools array to be identical and at a stable prefix position — any variation (dynamic tool injection, per-user tool customization) breaks the cache. For agents with strictly static tool sets, both providers cache tools effectively. For agents with dynamic tools, Anthropic's explicit model gives you more control.

**Strict mode vs tool-forcing for structured output.** OpenAI's `response_format: {type: 'json_schema', strict: true}` produces guaranteed-valid structured output without the function-call overhead — no tool call arguments, no tool result round-trip. On Anthropic, structured output typically uses tool forcing (define a single output tool, force the model to call it) — which does incur function call argument output tokens. For high-volume structured extraction tasks, OpenAI's strict mode has a 10-30% cost advantage over Anthropic's tool-forcing approach. Source: OpenAI structured outputs docs.

**The computer-use tool is expensive.** Anthropic's Claude Computer Use API (Opus 4.7 supported) sends screenshots as vision input and emits coordinate-click actions as tool call arguments. A typical screenshot is 1,000-3,000 tokens as vision input. A web browsing session with 10 screenshot-click cycles: 10 × 2,000 vision tokens (input) + 10 × 50 action tokens (output) = 20,500 tokens. At Opus: 20K × $15/1M + 500 × $75/1M = $0.300 + $0.0375 = $0.3375 per browsing session in tool overhead. Computer use agents are expensive by nature — budget $0.30-$1.00 per task and use them only when the task genuinely requires UI interaction that cannot be replaced with an API call.


Sourcing and measurement methodology

**Pricing sourced June 2026.** Anthropic: Sonnet 4.6 $3/$15, Opus 4.7 $15/$75, cache read $0.30/$1.50 per 1M from Anthropic pricing. OpenAI: GPT-5.5 $5/$25, GPT-5.4 $2.50/$15, cache read $0.50/$0.25 per 1M from OpenAI pricing. All prices from official pages on date of publication.

**Schema token measurement methodology.** Token counts estimated using Anthropic's tokenizer for Claude models and tiktoken (cl100k_base) for OpenAI models. Tool definitions tokenize slightly differently between providers due to schema format differences — estimates are accurate to ±10%. For exact counts, use `anthropic.count_tokens()` with your actual `tools` array or the OpenAI token counter tool. Never guess — measure.

**Tool result token measurement.** All result sizes are based on common API response shapes: web search results (top 5 results × 100-token excerpt = 500 tokens), code execution output (varies widely; budgeted at 300 tokens for median), database lookups (50-200 tokens for typical SQL responses), file reads (depends entirely on file; model at 500 tokens for a typical section). Measure your actual tool results in production — they vary more than any other component.

**This page is updated quarterly.** API schema formats, caching mechanics, and pricing all evolve. Verify against live sources: Anthropic tool use overview, OpenAI function calling guide, and both providers' pricing pages before making tool architecture decisions that affect monthly spend over $500.

Measure and reduce tool use overhead in 5 steps

  1. 1

    Profile actual tool overhead: enable usage logging for one week

    Log `usage.input_tokens` and `usage.output_tokens` from every API call. Separately log your prompt + context token count (excluding tools). The delta between logged input tokens and your estimated prompt tokens is your schema + tool result overhead. If it exceeds 25% of total input tokens, schema optimization is the next action. Log tool invocation frequency per tool to identify low-use tools for removal.

  2. 2

    Trim tool descriptions to one sentence each

    Rewrite every tool description as a single sentence that covers: what the tool does, when to use it, and what it returns. Delete multi-sentence descriptions, inline examples, and usage notes — Claude and GPT-5 infer correct usage from concise descriptions. This single change typically cuts schema size by 40-60% with no change to model behavior. Verify by running your eval suite before and after.

  3. 3

    Set hard limits on tool result size

    Add a post-processing wrapper to every tool that truncates the result to a maximum of N tokens before returning it to the agent. Set N based on what the agent actually needs: search results → 400 tokens, API status → 100 tokens, file reads → 500 tokens (extract the relevant section, not the full file), code output → last 200 lines. Test that your agent's behavior is unchanged — most agents don't need more than 400 tokens of result context to make the next decision correctly.

  4. 4

    Enable parallel tool calling for independent lookups

    Identify turns where the agent makes multiple independent lookups sequentially (search query A, then search query B, then database lookup). Explicitly instruct the model to call these in parallel: 'When multiple independent information lookups are needed, call all relevant tools simultaneously.' Parallel calling reduces turn count, eliminating intermediate input replays. Budget: saves approximately 1 full context replay per parallelized batch — measured in input tokens saved at each turn that would have been an intermediate single-tool turn.

  5. 5

    Cache tool schemas on every call

    On Anthropic: place your tools in the system message block that carries `cache_control: {type: 'ephemeral'}`. Verify the cache is hitting by checking `usage.cache_read_input_tokens` in API responses — it should equal your schema token count on cache-hit calls. On OpenAI: ensure your tool definitions are identical across calls and positioned at the start of the message array. Track cache hit rate; anything below 70% on stable tool sets means something is invalidating the cache (dynamic tool injection, unstable serialization order).

Frequently Asked Questions

How many tokens does a tool schema add to each API call?

A well-written tool schema (concise description + flat parameters) is 100-200 tokens per tool. A verbose schema with multi-sentence descriptions and nested parameter objects is 300-600 tokens. On Anthropic, tool schemas are billed as input tokens on every API call regardless of whether any tool is invoked. 10 tools at 200 tokens each = 2,000 schema tokens per call. At 100K calls/month on Sonnet 4.6 ($3/1M): $600/month from schema overhead alone. Source: Anthropic tool use overview (https://docs.anthropic.com/en/docs/agents-and-tools/tool-use/overview).

Are tool schemas billed even if no tool is called?

Yes. On both Anthropic and OpenAI, tool definitions passed in the API request are tokenized and billed as input tokens on every call, whether or not the model invokes any tool. This is why schema optimization matters most for agents that call tools infrequently — you pay the schema overhead on every call, but only pay function call argument and result tokens when a tool is actually invoked. Use dynamic tool loading to pass only relevant tools when invocation rate is low.

Are function call arguments billed as input or output tokens?

Output tokens. When the model emits a tool call (specifying the tool name and argument JSON), those tokens are billed at the output token rate — the more expensive rate. On Opus 4.7 ($75/1M output), 100 argument tokens cost $0.0075 per call. On Sonnet 4.6 ($15/1M output), the same costs $0.0015. Minimize argument verbosity: use short parameter names and avoid redundant fields that your tool implementation can infer.

How do tool results accumulate cost in multi-turn agent loops?

Tool results fed into the conversation at turn T are replayed as input on every subsequent turn T+1, T+2, etc. A 1,000-token result added at turn 2 of a 5-turn loop is billed again at turns 3, 4, and 5 — 3 additional input billings. At Opus 4.7 ($15/1M): 3 × 1,000 × $15/1M = $0.045 in accumulation cost from that single result. Truncate tool results aggressively; every token you cut saves on every downstream turn.

Do parallel tool calls cost less than sequential calls?

Parallel tool calls save input cost by eliminating intermediate turns — when 3 tools run in parallel (1 output turn, 1 result-receipt turn), you avoid the 2 intermediate single-tool turns that sequential execution requires. The savings are approximately the input token cost of 2 full context replays. Function call argument and tool result token costs are identical whether calls are parallel or sequential. For small result sizes (under 500 tokens each), parallel execution saves 30-50% on input tokens. Source: Anthropic tool use docs, OpenAI function calling guide.

What is the cost difference between Anthropic and OpenAI for tool use?

Schema format differences account for ~15 tokens/tool more on OpenAI (extra function wrapper nesting) vs Anthropic. Tool result format overhead is similar (~20 tokens per result on both). The main cost difference is model pricing: Sonnet 4.6 ($3/$15) is cheaper than GPT-5.5 ($5/$25) for tool-heavy agents. Caching depth differs: Anthropic's 90% cache discount on schema tokens is more aggressive than OpenAI's 50%, making high-call-volume tool agents meaningfully cheaper on Anthropic when schemas are stable.

How much can I save by optimizing tool schemas?

Typically 40-70% of schema token overhead. A 12-tool bundle trimmed from 350 to 130 tokens per tool (63% reduction) saves 12 × 220 = 2,640 tokens per call. At 100K calls/month on Sonnet 4.6: 264M tokens × $3/1M = $792/month from schema trimming alone. Add dynamic loading (average 4 tools per call) and caching (90% discount on Anthropic): total schema cost drops from ~$1,260/month to ~$18/month. Schema optimization has the best ROI of any tool use optimization.

How do I profile tool use overhead in my agent?

Log the `usage` field from every API response: `input_tokens`, `output_tokens`, `cache_read_input_tokens`, `cache_creation_input_tokens`. Separately estimate your prompt + context tokens by tokenizing your messages array without the tools parameter. The difference between `input_tokens` and your estimated prompt tokens is your schema + tool result overhead. Also log tool invocation frequency per tool name — tools called under 5% of the time are candidates for dynamic loading or removal.

Tight tool descriptions pay dividends on every API call.

Our AI Prompt Generator writes concise, cache-anchored tool definitions that cut schema overhead by 40-70% — with no change to model behavior. Works with Claude and GPT-5 tool use. 14-day free trial, no card.

Browse all prompt tools →