Skip to contentNew: Does ChatGPT recommend your brand? Free 60-second AI visibility check →
By The DDH Team · Digital Dashboard Hub

Anthropic Tool Use Limits 2026: Max Tools, Token Costs, Parallel Calls, and Caching

By The DDH Team at Digital Dashboard HubUpdated

Stop writing AI prompts from scratch.

Tell us your business + your task + your model. We write the prompt — perfectly tuned for ChatGPT, Claude, Grok, Gemini, Midjourney, or any model. Plus 500+ pre-built prompts in your library.

14 days, no card. Cancel in 2 clicks.

If you're already familiar with Claude's overall request quotas, start with our Claude API rate limits 2026 page — it covers tokens-per-minute (TPM), requests-per-minute (RPM), and tier-by-tier breakdowns that apply to every Claude call, tool use included. Understanding those ceilings before you design your tool use architecture will save you from hitting ITPM walls mid-production.

Tool use on the Anthropic API — documented at https://docs.anthropic.com/en/docs/agents-and-tools/tool-use/overview — lets Claude call structured functions you define, inspect the results, and continue reasoning until a task is complete. The system is flexible and supports parallel tool calling, JSON schema validation, prompt caching for tool definitions, and the specialized computer_use beta. But every one of those features interacts with Claude's billing and rate-limit system in ways that aren't always obvious from a first read of the docs.

This page covers the hard numbers: the 64-tool ceiling, token costs per tool definition, parallel fan-out behavior across Opus 4.7 and Sonnet 4.6, JSON schema restrictions, computer_use beta rules, and how to cache tool definitions for a 90% cost reduction. See also our Claude API cost calculator, the OpenAI to Claude migration tutorial if you're porting an existing tool use agent, and the Claude API rate limits 2026 reference for the full ITPM and RPM picture.

Digital Dashboard Hub

Writing good prompts for ONE AI is hard. Writing them for GPT-5, Claude, Gemini, Perplexity, Midjourney and 6 more is a full-time job. DDH's AI Prompt Builder writes once, runs everywhere — locked to your niche, voice, and brand tone.

Free 14 days, no card.

Anthropic tool use limits — June 2026

Feature
Limit
Value
Notes
Max tools per request64Hard limit as of June 2026; most production systems use 5–20
Max tool name length64 charactersASCII alphanumeric and underscores only; no spaces or special chars
Max tool description length~8,192 tokensCounted as input tokens; verbose descriptions inflate cost
Parallel tool calls per turnUp to 64Model decides how many to fan out; depends on tool_choice setting
Tool result max size~100,000 tokensCounted as input tokens on next turn; truncate API responses before returning
computer_use tool accessRestricted — betaRequires X-Anthropic-Beta: computer-use-2024-10-22 header; Sonnet 4.6 + Opus 4.7 only
tool_choice optionsauto / any / tool"any" forces at least one call; named tool forces a specific function
JSON schema supportDraft-07 subsetNo $ref, no recursive schemas; properties/required/enum/array/nested all supported
Cache eligibilityYesTool definitions cacheable as part of system prompt prefix; 90% discount on cache reads
Token cost of tool definitionsStandard input rateBilled as input tokens at model-specific rate ($3/M Sonnet 4.6, $15/M Opus 4.7)
ITPM impactFull input token consumptionTool defs + results count toward ITPM; cache_read_input_tokens are exempt from ITPM
Streaming tool callsSupportedtool_use blocks emitted as partial_json deltas in stream; handle incomplete JSON safely

Sources, fetched 2026-06-21: https://docs.anthropic.com/en/docs/agents-and-tools/tool-use/overview, https://docs.anthropic.com/en/docs/about-claude/pricing

The 64-tool ceiling and why it rarely binds

Anthropic enforces a hard limit of 64 tools per API request. This number sounds generous until you consider that a well-designed tool use agent rarely needs more than 10–15 tools at once, and the real constraint almost always surfaces somewhere else — usually in the token budget consumed by tool definitions themselves. **The 64-tool ceiling exists as a safety rail, not a design target.**

In practice, production systems typically register 5 to 20 tools per request. A customer support agent might need search_knowledge_base, create_ticket, update_ticket, lookup_order, and send_email — that's five tools and a clean, focused design. A more complex coding assistant might add read_file, write_file, run_tests, search_web, and get_documentation, landing around ten. These are the natural size ranges for effective Claude agents, and they're well below the 64-tool ceiling.

The ceiling becomes a genuine constraint in two specific patterns: **broad agent architectures** that try to expose every possible capability to a single Claude instance, and **tool discovery patterns** where the agent's tool list is dynamically generated from a large tool registry. In both cases, the recommended solution is tool namespacing — grouping related operations under a single dispatcher tool — or dynamic tool injection, where you only include the tools relevant to the current task context.

Dynamic tool injection is the more powerful approach. Rather than passing all 64 tools on every request, you analyze the user's message, select the 5–10 most likely relevant tools, and inject only those. This keeps your input token overhead low, makes the model's decisions easier (fewer options to choose from), and leaves headroom for future tool additions. The Anthropic documentation at https://docs.anthropic.com/en/docs/agents-and-tools/tool-use/overview describes this pattern in the context of agentic loops.

Tool namespacing works differently: you define a single tool like call_internal_api with parameters action (an enum of all available operations) and payload. Claude calls this one tool and your server routes internally. This trades the structured output quality of individual tools for a lower count, and it's a reasonable compromise when you genuinely need more than 64 distinct operations in a single agent context. The tradeoff is that Claude gets less schema guidance per operation, which can hurt parameter accuracy.


Tool definition token cost: the hidden budget item

Every tool definition you pass to the Anthropic API is billed as input tokens at the standard per-model rate. This is the most commonly overlooked cost driver in tool use agent design. A well-written tool definition — including name, description, and JSON schema for its parameters — typically consumes 200 to 500 tokens. **A production agent with 20 tools at 400 tokens each adds 8,000 tokens of overhead to every single API call.**

The cost math becomes significant at scale. On Claude Sonnet 4.6 ($3.00 per million input tokens), 8,000 tokens of tool overhead costs $0.024 per call. That might sound trivial until you run the numbers: 10,000 calls/day × $0.024 = $240/month just in tool definition overhead, before you count the actual message content or completions. On Claude Opus 4.7 ($15.00 per million input tokens), the same overhead costs $120/day at 10k calls. **Tool definition inflation is a real P&L line for high-volume agents.**

The antidote is prompt caching. When tool definitions are placed in the cacheable prefix block of your request, subsequent requests with the same tool set pay the cache read rate — which is approximately 90% cheaper than fresh input tokens. On Sonnet 4.6, the effective rate drops from $3.00/M to $0.30/M for cached tool definitions. The 8,000-token overhead now costs $0.0024 per call instead of $0.024, a 10x reduction. For agents that use a stable tool set across many calls, **caching tool definitions is the single highest-ROI optimization available.**

Concise descriptions also compound savings. A tool description that explains the same concept in 150 tokens instead of 400 tokens is not just aesthetically cleaner — it's cheaper by $0.00075 per call on Sonnet 4.6. Multiply that by 10 tools and 10,000 daily calls and you're saving $7.50/day, $225/month. The practice of writing tight tool descriptions — one sentence for the function, one sentence for when to use it, nothing more — pays real dividends at scale.

The token cost applies to tool results as well, not just definitions. When Claude calls a tool and your server returns a result, that result is included as input tokens in the next API call. A tool that returns a 5,000-token JSON blob on every call adds $0.015 per turn on Sonnet 4.6. Best practice: truncate, summarize, or filter tool results to the minimum information Claude needs to continue reasoning. A tool result containing a database query response should return the relevant rows, not the full schema metadata.

For Opus 4.7 specifically, the cost math is 5x more severe. **At $15/M input tokens, a single Opus call with 20 tools at 500 tokens each costs $0.15 in tool overhead alone**, before counting message content. Opus is best reserved for tasks requiring deep reasoning or complex judgment — not for high-frequency tool use loops where Sonnet 4.6 delivers comparable tool selection accuracy at one-fifth the cost. See the pricing page at https://docs.anthropic.com/en/docs/about-claude/pricing for current rates.


Parallel tool calling: how Claude fans out and how to control it

Claude supports parallel tool calling: in a single response, Claude can return multiple tool_use blocks, each requesting a different tool call. Your server executes these concurrently, collects all the results, and returns them in a single user turn before Claude continues. **Parallel tool calling can collapse what would be a 5-turn sequential agent loop into a 2-turn exchange**, dramatically reducing latency and cost for agent tasks with independent sub-problems.

The tool_choice parameter controls whether and how Claude uses tools. The default is tool_choice: { type: "auto" }, which lets Claude decide whether to call any tools, and if so, how many to fan out in parallel. This is the right setting for most general-purpose agents where you want Claude to reason about which tools are actually needed. The model will naturally batch tool calls that can be parallelized — for example, calling both search_product and get_user_profile simultaneously when both are needed to answer a product recommendation question.

Setting tool_choice: { type: "any" } forces Claude to make at least one tool call in its response. It cannot return a plain text answer without invoking a tool. This is the right setting for extraction pipelines, structured output workflows, and any scenario where you know a tool call is always warranted. Without tool_choice: any, Claude occasionally decides to answer directly from its parametric knowledge instead of calling the tool — which is usually the wrong behavior for deterministic data retrieval.

Setting tool_choice: { type: "tool", name: "specific_tool_name" } forces Claude to call exactly that tool. This is useful for deterministic extraction when you know precisely which schema you want the output in, or for testing that a specific tool is functioning correctly in your agent. It eliminates the model's decision overhead for tool selection and produces consistent, testable behavior.

**Opus 4.7 and Sonnet 4.6 exhibit meaningfully different parallel fan-out behavior.** Opus tends to be more conservative — it may make one tool call, analyze the result, then make a second rather than fanning out both in parallel. This is actually better behavior for tasks where the second tool call depends on the first result, but it increases latency for genuinely independent calls. Sonnet 4.6 is more aggressive about parallel fan-out, which is usually what you want for high-throughput agent pipelines. Haiku 4.5 also supports tool use but is best suited to simple, single-tool classification tasks rather than complex multi-tool orchestration.

For production agent architectures, design your server to handle parallel tool execution from the start. When Claude returns three tool_use blocks in one response, you should fire all three concurrently (using Promise.all in Node.js or asyncio.gather in Python), collect the results, and return them together. Serializing parallel tool calls is a common performance mistake that turns a 1-second agent turn into a 3-second agent turn.


JSON schema constraints: what's supported and what's not

Claude's tool use system validates tool parameter schemas against a subset of JSON Schema draft-07. Not all JSON Schema features are available, and attempting to use unsupported features can cause silent failures (Claude ignores the unsupported constraint) or API errors (if the schema fails server-side validation). **Knowing what's in and out of scope before you write your schemas saves debugging time.**

What is fully supported: type (string, number, integer, boolean, array, object), properties (with nested objects), required (array of required property names), enum (for constrained string/number values), array (with items specifying element schema), minLength/maxLength (for strings), minimum/maximum (for numbers), and nested object types up to several levels of depth. These cover the vast majority of real tool parameter structures and are the safe set to build on.

What is NOT supported or has limited support: $ref (JSON Schema references to external or self-defined schema fragments), recursive schemas (a schema that references itself), anyOf at the top level on some model versions (use a union-typed enum instead), and allOf/oneOf in complex compositions. **Attempting to use $ref will either cause an API error or be silently ignored — do not rely on schema references for tool definitions.** If your tool parameters are complex enough that you'd want $ref, simplify the schema by inlining the referenced definition.

Practical workarounds for common patterns: if you want to accept either a string or null, use type: ["string", "null"] rather than anyOf. If you want to describe a polymorphic input (one of several object shapes), use a single object type with all possible properties marked as optional, and document the valid combinations in the description field. This trades type system precision for compatibility.

**Schema validation before sending is strongly recommended for production systems.** Use the ajv library (Node.js) or jsonschema (Python) to validate your tool schemas against draft-07 before including them in API requests. Catching a malformed schema in a test is far less costly than discovering it causes Claude to misinterpret tool parameters in production. The Anthropic API will accept some invalid schemas without erroring — the failure mode is silent degradation in tool call accuracy.

One additional constraint: tool names must be 1–64 characters, composed only of ASCII alphanumeric characters and underscores. No hyphens, no dots, no camelCase with special characters. The practical naming convention that works well is snake_case (search_knowledge_base, create_support_ticket, get_user_profile). Names should be unambiguous verbs that describe what the tool does — Claude uses the name as part of its tool selection reasoning.


Computer-use tool: special rules and restrictions

The computer_use tool set is a specialized beta capability that lets Claude directly control a computer — taking screenshots, clicking elements, typing text, and executing bash commands. It is fundamentally different from function tools: instead of calling a structured function you define, Claude is issued raw computer control instructions. **Computer-use requires opt-in via a beta header and is restricted to specific Claude models.**

To enable computer-use, you must pass the header X-Anthropic-Beta: computer-use-2024-10-22 with your API request. Without this header, computer_use tool definitions will be rejected. The beta header requirement also means computer-use is subject to change — Anthropic may update the API contract, add new capabilities, or change the required header value as the feature moves toward GA.

**Computer-use is only supported on Claude Sonnet 4.6 and Claude Opus 4.7.** Haiku 4.5 does not support the computer_use beta tools. The recommended model for computer-use tasks is Sonnet 4.6 — it has strong performance for UI navigation and form-filling at a much lower cost per turn than Opus 4.7. Reserve Opus 4.7 for computer-use tasks requiring complex reasoning (multi-step form completion across multiple apps, long-horizon workflows with conditional logic).

The computer_use beta includes three specific tools: computer (for taking screenshots and controlling mouse/keyboard), text_editor (for viewing and editing file contents with undo/redo support), and bash (for running shell commands and capturing output). These tools work together in an agent loop where Claude observes the screen state, decides on an action, executes it, and then takes a new screenshot to observe the result of the action.

**Cost implications are significant**: every screenshot Claude takes is converted to image tokens and billed at the standard input token rate. A 1366×768 screenshot encodes to approximately 1,000–1,500 tokens depending on content complexity. An agent that takes 20 screenshots per task costs 20,000–30,000 input tokens just in screenshots. On Sonnet 4.6 at $3/M, that's $0.09 in vision overhead per task — which compounds quickly at scale. Strategies to manage this: compress screenshots before returning them, use lower-resolution displays, and design workflows that minimize the number of screenshots needed before acting.

Latency is also a meaningful concern for computer-use agents. Each action-observe cycle involves a full API round trip: action → screenshot → API call → action. Even at low latency, a 30-step computer-use workflow takes 60–90 seconds. Plan your user experience accordingly, use streaming to show progress, and consider whether a structured tool use approach (calling a purpose-built API) is more appropriate than computer-use before defaulting to the latter.


Tool result tokens: sizing and ITPM impact

When Claude calls a tool and you return the result in a tool_result message, those result tokens are counted as input tokens in the subsequent API call. This is one of the most important — and most overlooked — interactions between tool use and Anthropic's rate limits. **If your agent runs at 100,000 ITPM and your tool results average 5,000 tokens each, a single 3-tool parallel call consumes 15,000 of those tokens before Claude even starts reasoning.**

The hard limit on tool result size is approximately 100,000 tokens per tool result content block. This is effectively the same as Claude's maximum context window, which makes sense — the result must fit in context. In practice, very few legitimate tool results should exceed a few thousand tokens. If your tool is returning 100,000 tokens, you almost certainly should be returning a summary, a reference, or a paginated subset instead.

**The most common mistake is returning raw API responses as tool results.** A REST API call might return 50KB of JSON with deeply nested metadata, most of which Claude doesn't need. Truncating that response to the 10–20 most relevant fields — before returning it from your tool — can reduce tool result token size by 90%, dramatically lowering both cost and ITPM pressure. Build this truncation into your tool implementation, not into Claude's reasoning loop.

The 'summarize before returning' pattern takes this further: for tools that query large data sources (databases, document stores, log systems), have your tool implementation call a lightweight summarization step before returning. Instead of returning 500 raw log lines, return 'Database query returned 847 rows. Top 5 by revenue: [...]'. This is especially valuable for iterative agent loops where Claude needs to make decisions based on large dataset outputs — the summary gives Claude what it needs to reason about the next action without inflating the context window.

Caching tool results is possible when the underlying data is stable. If a tool result won't change between turns (a static configuration lookup, a rarely-updated product catalog), you can mark the tool_result content block with cache_control: { type: "ephemeral" } to make it eligible for caching. Subsequent turns that see the same cached block get the 90% cache discount on those tokens. However, most tool results are dynamic and won't benefit from caching — the pattern is most valuable for large static context that you inject via tools rather than the system prompt.

For ITPM management in high-throughput agents, instrument each turn to log both input_tokens and cache_read_input_tokens from the usage object in the response. The ratio of cache_read to total input tokens tells you how efficiently you're using the cache. A well-optimized agent processing a stable tool set should see cache_read fractions above 60–70%. If you're seeing near-zero cache reads, your tool definitions or system prompt are changing between calls in ways that invalidate the cache.


Caching tool definitions to save ITPM and cost

Prompt caching is the highest-impact optimization for tool use agents with stable tool sets. By placing your tool definitions in the cacheable prefix of the request, you pay the cached token rate (approximately 90% cheaper than fresh input) on all subsequent calls that use the same tool set. On Sonnet 4.6, this means tool definition reads cost $0.30/M instead of $3.00/M — a 10x reduction on what is often the largest per-call overhead item.

**The mechanics**: to make tool definitions cacheable, you add a cache_control marker to the tools array in your API request. Specifically, the last tool definition in your array gets cache_control: { type: "ephemeral" }, which signals to Anthropic's infrastructure that everything up to and including that point should be cached as a prefix. All subsequent requests with the same tool set (in the same order, with the same content) will get a cache hit and pay the lower rate.

Worked cost math for a production system: 20 tools × 300 tokens per definition = 6,000 tokens of tool overhead. Uncached on Sonnet 4.6 at $3.00/M: $0.018 per call. Cached at $0.30/M: $0.0018 per call. At 10,000 calls/day, uncached costs $180/day versus $18/day cached — a **$162/day savings, or $4,860/month, from a single configuration change.** The cache write incurs a one-time cost of approximately 25% above the standard input rate, but this cost is amortized across all subsequent cache reads.

Cache hit conditions: the tool definitions must be byte-for-byte identical across requests, in the same order, with the same cache_control markers in the same positions. Any change to a tool's name, description, or schema — or a reordering of tools in the array — invalidates the cache and triggers a cache write. This means you should sort your tool definitions deterministically (alphabetically by name is a common convention) and avoid any dynamic string interpolation in tool descriptions.

For agents that do need to vary their tool set (dynamic tool injection), consider a two-tier approach: put a stable core set of 5–10 always-present tools in the cacheable prefix, and inject variable tools after the cache_control marker. The core tools get the cached rate; the injected tools are billed at the full input rate. This hybrid approach captures most of the caching benefit even for agents with variable tool contexts.

**Tool caching interacts directly with ITPM limits.** Cached token reads (cache_read_input_tokens) are exempt from ITPM counting. This means that at high request volumes, caching your tool definitions not only reduces your dollar cost — it also frees up ITPM headroom that you'd otherwise spend on repetitive tool definition ingestion. For agents approaching their ITPM ceiling, aggressive caching can meaningfully increase effective throughput without a tier upgrade.


Comparing tool use across Opus 4.7, Sonnet 4.6, and Haiku 4.5

All three current Claude models support tool use, but they differ in behavior, capability, and cost in ways that matter for production agent design. The right model choice depends on the complexity of your tool selection logic, the precision required in parameter generation, and the cost sensitivity of your workload.

**Claude Opus 4.7** ($15/M input, $75/M output) is the most capable model for complex tool use scenarios: tasks requiring multi-step reasoning about which tools to combine, edge cases in parameter generation, or high-stakes decisions about tool sequencing. Opus is more conservative in parallel fan-out — it tends to make one careful tool call, analyze the result, then decide on the next action rather than fanning out optimistically. This conservative behavior is actually advantageous for tasks where incorrect tool calls have side effects (writing to a database, sending an email), but it increases latency and cost for tasks where aggressive parallelism would be correct.

**Claude Sonnet 4.6** ($3/M input, $15/M output) is the recommended model for the vast majority of production tool use agents. It has strong tool selection accuracy, supports parallel fan-out aggressively, and handles complex JSON schemas reliably. **Sonnet 4.6 delivers approximately 80% of Opus 4.7's tool use quality at 20% of the input cost** — the ROI calculation strongly favors Sonnet for any high-frequency agent workload. Sonnet also supports computer_use and has full compatibility with all tool_choice options.

**Claude Haiku 4.5** (lower cost tier) is well-suited to high-volume tool classification tasks: routing user inputs to the right tool, performing simple named-entity extraction into a structured schema, or making binary tool-use decisions (call tool A or tool B based on message type). Haiku is significantly faster and cheaper than Sonnet for these narrow, well-defined tool call patterns. The tradeoff is that Haiku struggles with complex multi-tool orchestration and may produce less accurate parameters for tools with elaborate schemas.

For cost-optimized multi-agent architectures, a common pattern is to use Haiku as a routing layer (classify the request, select the relevant tool subset, call the simple tools) and invoke Sonnet only when the task requires complex reasoning or multiple coordinated tool calls. This model-mixing pattern can reduce overall agent cost by 40–60% compared to using Sonnet for every step.

One model-specific caveat: computer_use is NOT supported on Haiku 4.5. If you're building a computer-use agent and want to use Haiku for cost optimization on intermediate steps, you'll need to structure your architecture so the computer_use tool calls go through Sonnet 4.6 or Opus 4.7, while Haiku handles upstream routing or downstream summarization. The Anthropic documentation at https://docs.anthropic.com/en/docs/agents-and-tools/tool-use/overview includes model compatibility notes for each tool type.

Deploying a Claude tool use agent in production

  1. 1

    Design your tool schema for token efficiency

    Write tool descriptions that explain what the tool does and when to use it in under 200 tokens each. Use enum for any parameter with a constrained set of valid values — this reduces token consumption and improves Claude's parameter accuracy. Avoid padding descriptions with examples unless the tool behavior is genuinely non-obvious. For a 10-tool agent, the difference between 400-token and 150-token average descriptions is 2,500 tokens of input overhead per call — $0.0075 on Sonnet 4.6, but $375/day at 50k calls. Validate all schemas against JSON Schema draft-07 before deploying; use ajv (Node.js) or jsonschema (Python) in your CI pipeline.

  2. 2

    Cache your tool definitions in the prefix

    Place your full tool definitions array in the cacheable prefix of every request. Add cache_control: { type: "ephemeral" } to the last tool in the array. Sort your tools alphabetically by name to ensure byte-for-byte consistency across calls. Test that your cache hit rate (cache_read_input_tokens / total_input_tokens) reaches above 70% after the first call in a session. Monitor this ratio in your observability stack and alert if it drops — a sudden drop usually means something changed in your tool definitions, invalidating the cache.

  3. 3

    Set tool_choice appropriately for your use case

    Use tool_choice: auto for general-purpose assistants where Claude should decide whether a tool is needed. Use tool_choice: any for extraction pipelines, structured output workflows, and any flow where you know a tool call is always warranted — this prevents Claude from answering directly from parametric knowledge instead of calling your data source. Use tool_choice: { type: "tool", name: "your_tool" } for deterministic extraction flows where you always want a specific output schema, or during testing when you want to verify a specific tool's parameter generation in isolation.

  4. 4

    Handle parallel tool calls with concurrent execution

    When Claude returns multiple tool_use blocks in a single response, execute all of them concurrently. In Node.js, use Promise.all([executeTool(call1), executeTool(call2), executeTool(call3)]). In Python, use asyncio.gather(). Collecting all results before returning the tool_result message is required — you cannot return partial results mid-turn. Also ensure your server can safely execute the same tool multiple times concurrently (thread safety, connection pool sizing). Document in your agent design which tools are safe for parallel execution and which require serialization (e.g., tools that mutate shared state).

  5. 5

    Monitor ITPM usage from tool results on every turn

    Log usage.input_tokens, usage.cache_read_input_tokens, and usage.output_tokens from every API response. Set an alert when any single turn's tool results exceed 10,000 tokens — this is a signal that a tool is returning over-large results that should be truncated. Track your ITPM utilization as a rolling 60-second window; if it regularly exceeds 80% of your tier's limit, you have three levers: increase caching, reduce tool result sizes, or upgrade to a higher ITPM tier. The Anthropic rate limits documentation at https://docs.anthropic.com/en/docs/about-claude/pricing details tier-by-tier ITPM ceilings.

Frequently Asked Questions

How many tools can you pass to Claude at once?

Claude supports a hard limit of 64 tools per API request as of June 2026. However, the practical constraint for most agents isn't the 64-tool ceiling — it's the token cost of tool definitions. Each tool definition is billed as input tokens (at $3/M for Sonnet 4.6, $15/M for Opus 4.7), so a full 64-tool request at 400 tokens per definition adds 25,600 tokens of overhead per call. Most production systems use 5–20 tools and apply dynamic tool injection to stay lean. If you're genuinely close to the 64-tool ceiling, consider tool namespacing: group related operations under a single dispatcher tool that routes internally.

Do tool definitions count toward Claude's ITPM limit?

Yes. Tool definitions are billed as input tokens and count toward your tokens-per-minute (ITPM) limit on every call where they appear uncached. However, cached tool definition reads (reported as cache_read_input_tokens in the API response) are exempt from ITPM counting. This means that caching your tool definitions — by placing them in the cacheable prefix with a cache_control marker — both reduces your dollar cost by ~90% and frees up ITPM headroom. For high-throughput agents approaching their ITPM ceiling, aggressive tool definition caching can meaningfully increase effective throughput without requiring a tier upgrade. See https://docs.anthropic.com/en/docs/agents-and-tools/tool-use/overview for caching implementation details.

Can Claude call multiple tools in one response?

Yes — Claude supports parallel tool calling. In a single API response, Claude can return multiple tool_use content blocks, each requesting a different tool call. Your server should execute these concurrently (using Promise.all or asyncio.gather), collect all results, and return them in a single user turn before Claude continues. This parallel pattern can collapse a multi-step agent loop into far fewer API round trips, reducing both latency and cost. The tool_choice: auto setting lets Claude decide how many tools to fan out; setting tool_choice: any guarantees at least one tool call per response but does not limit parallel calls. Claude Sonnet 4.6 tends to fan out more aggressively than Opus 4.7.

What JSON schema features does Claude tool use support?

Claude tool use supports a subset of JSON Schema draft-07: type, properties, required, enum, array (with items), nested objects, and numeric/string constraints like minimum, maximum, minLength, maxLength. What is NOT supported: $ref (schema references), recursive schemas, and complex anyOf/allOf/oneOf compositions at the top level. Attempting to use $ref will either cause an API error or be silently ignored. Workarounds: inline referenced definitions rather than using $ref; use type: ["string", "null"] for nullable fields instead of anyOf; describe polymorphic inputs in the description field rather than attempting oneOf schemas. Validate schemas with ajv or jsonschema in your CI pipeline before deploying.

How do I use the computer_use tool with Claude?

The computer_use beta requires the header X-Anthropic-Beta: computer-use-2024-10-22 on your API request. It is only supported on Claude Sonnet 4.6 and Claude Opus 4.7 — Haiku 4.5 does not support computer_use. The beta provides three tools: computer (screenshots + mouse/keyboard control), text_editor (file viewing and editing), and bash (shell command execution). Key cost consideration: each screenshot is billed as image input tokens (~1,000–1,500 tokens for a 1366×768 display). A 20-screenshot agent task costs 20,000–30,000 tokens just in screenshot overhead. Use lower-resolution displays and design workflows that minimize unnecessary screenshot-observe cycles. See https://docs.anthropic.com/en/docs/agents-and-tools/tool-use/overview for full implementation guidance.

What is the tool result size limit?

Tool results can be up to approximately 100,000 tokens per content block — effectively the full context window size. However, large tool results are a major cost and ITPM driver, since they count as input tokens on the next API call. Best practice is to truncate tool results to the minimum information Claude needs to continue reasoning. A database query result should return the relevant rows with key columns, not full schema metadata. An API response should be filtered to the 10–20 most relevant fields. The 'summarize before returning' pattern — where your tool implementation performs a lightweight summarization step before returning — is particularly effective for tools that query large data sources. Target tool result sizes under 2,000 tokens for routine operations.

Can I cache tool definitions to save money?

Yes — and it's one of the highest-ROI optimizations for tool use agents. Place your tool definitions in the cacheable prefix block of your request and add cache_control: { type: "ephemeral" } to the last tool in the array. Subsequent requests with the same tool set pay approximately 90% less for those tokens: $0.30/M instead of $3.00/M on Sonnet 4.6. A practical example: 20 tools × 300 tokens = 6,000 tokens. Uncached: $0.018/call. Cached: $0.0018/call. At 10,000 calls/day that's $162/day in savings. Cache hits are also exempt from ITPM counting, giving you additional throughput headroom. The cache requires byte-for-byte identical tool definitions — sort tools alphabetically by name to ensure consistency.

How does tool_choice: any differ from tool_choice: auto?

tool_choice: auto (the default) lets Claude decide whether to use any tools at all in its response. Claude may return a plain text answer if it judges that no tool call is necessary — which is often correct for conversational exchanges, but wrong for extraction and data retrieval workflows. tool_choice: any forces Claude to make at least one tool call in every response. It cannot answer with plain text alone. This is the right setting for structured output pipelines, extraction tasks, and any scenario where you need a guaranteed tool invocation every turn. tool_choice: { type: "tool", name: "specific_tool" } goes further, forcing Claude to call exactly the named tool — useful for deterministic extraction workflows and testing individual tool behavior in isolation.

Build smarter tool use agents from day one.

Our AI Prompt Generator writes tool-use-ready system prompts for Claude — with cache-anchored tool definitions, concise descriptions, and parallel-call patterns baked in. 14-day free trial, no card.

Browse all prompt tools →