Skip to contentNew: Does ChatGPT recommend your brand? Free 60-second AI visibility check →
By The DDH Team · Digital Dashboard Hub

Agent Loop Cost: Claude Opus 4.7 vs GPT-5.5 vs GPT-5.4 (2026)

By The DDH Team at Digital Dashboard HubUpdated

Stop writing AI prompts from scratch.

Tell us your business + your task + your model. We write the prompt — perfectly tuned for ChatGPT, Claude, Grok, Gemini, Midjourney, or any model. Plus 500+ pre-built prompts in your library.

14 days, no card. Cancel in 2 clicks.

Agent loops cost more than single-shot LLM calls — often 5-10x more — because each subsequent turn in the loop replays the full conversation history plus the new tool outputs, ballooning the input token count with every step. A 5-turn agent on GPT-5.5 that starts at 3,000 input tokens might finish turn 5 with 12,000 input tokens accumulated across the loop, while emitting only 300 output tokens per turn. That shape — heavy input growth, modest output per turn — is the cost profile you need to model before choosing a model. See our multi-agent cost per task calculator for the orchestrator-plus-workers variant.

The four models we compare — Claude Opus 4.7 ($15/$75 per 1M input/output), Claude Sonnet 4.6 ($3/$15), GPT-5.5 ($5/$25), and GPT-5.4 ($2.50/$15) — span a 6x range on input price and a 5x range on output price. At a single call that delta is small; run 100,000 agent queries per month and it becomes the dominant line item in your infrastructure budget. Pricing sourced from Anthropic pricing docs and OpenAI pricing, fetched June 2026.

Below: the canonical agent loop cost formula, a full per-turn breakdown for each model, the impact of prompt caching on the stable prefix, worked scenarios at 1k, 10k, and 100k queries/month, and the 10-question FAQ production teams ask before pinning a model for their agent. For the full model-vs-model comparison without the agent loop focus, see GPT-5 vs Claude Opus 4.7. To estimate your own spend, our OpenAI API cost calculator and Claude API cost calculator accept custom token profiles.

Digital Dashboard Hub

Writing good prompts for ONE AI is hard. Writing them for GPT-5, Claude, Gemini, Perplexity, Midjourney and 6 more is a full-time job. DDH's AI Prompt Builder writes once, runs everywhere — locked to your niche, voice, and brand tone.

Free 14 days, no card.

5-turn agent loop total cost per query — 4 models, June 2026

Feature
Claude Opus 4.7
Claude Sonnet 4.6
GPT-5.5
GPT-5.4
Input price ($/1M)$15.00$3.00$5.00$2.50
Output price ($/1M)$75.00$15.00$25.00$15.00
Cache read price ($/1M)$1.50$0.30$0.50$0.25
Turn 1 total tokens (3K in / 300 out)$0.0675$0.0135$0.0225$0.0105
Turn 2 (5K in / 300 out)$0.0975$0.0195$0.0325$0.0170
Turn 3 (7.5K in / 300 out)$0.135$0.027$0.045$0.023
Turn 4 (10K in / 300 out)$0.1725$0.0345$0.0575$0.0300
Turn 5 (12.5K in / 300 out)$0.210$0.042$0.0700$0.036
Total uncached (5 turns)$0.683$0.137$0.228$0.116
Total cached prefix (2K system)$0.541$0.108$0.203$0.103
Queries/month at $500 budget7323,6502,1934,310
Monthly cost at 10K queries$6,830$1,370$2,280$1,160

Sources, fetched 2026-06-21: Anthropic pricing (https://docs.anthropic.com/en/docs/about-claude/pricing), OpenAI pricing (https://openai.com/api/pricing/). Token counts modeled on a ReAct-style agent with 2K stable system prompt + tool definitions, tool results averaging 500 tokens per turn, and 300 output tokens per turn. Cache discount applied to the 2K stable prefix only (Anthropic 90% off = $1.50/1M; OpenAI 50% off = $0.50/1M for GPT-5.5, $0.25/1M for GPT-5.4). Cache write cost omitted from cached total for readability; in production add cache write at 1.25x input rate per Anthropic and OpenAI docs.

The agent loop cost formula

**The canonical formula for a single agent loop with N turns is:** `total_cost = Σ_t [ (cached_input_t / 1M) × cache_rate + (new_input_t / 1M) × input_rate + (output_t / 1M) × output_rate + (tool_result_t / 1M) × input_rate ]`. Every turn t has four priced components: cached tokens from the stable prefix, new context tokens added since the previous turn, model output tokens (the assistant's reply), and tool result tokens fed back in the following user message.

**Why tool result tokens matter.** In a ReAct-style agent, each tool call returns a result that becomes part of the next turn's input. A `search()` call returning 800 tokens of context is billed as input on the turn that receives it — just like prompt tokens. In a 5-turn loop with 3 tool calls returning ~600 tokens each, tool results add 1,800 tokens of billable input across the loop. At $15/1M (Opus) that is $0.027 — small per query, but at 100K queries/month that is $2,700/month from tool results alone.

**Context accumulation is the key driver of cost growth.** Turn 1 starts at your system prompt + the user query. Turn 2 replays turns 1-1's full output, the tool call, the tool result, plus whatever the model added. By turn 5, a loop that started at 3,000 tokens might have 12,000-15,000 tokens of accumulated context. **The cost is not linear in number of turns — it's roughly quadratic for standard token replay.** Each additional turn adds both its own tokens and the accumulated tokens of all prior turns.

**Windowed context (summarization) changes the shape.** If your agent framework summarizes prior turns instead of replaying them verbatim, the context growth flattens. A 200-token summary replacing a 2,000-token prior turn saves 1,800 input tokens on every subsequent turn. In a 5-turn loop at $15/1M input (Opus), that saves $0.027 per summarization event — across 100K queries, $2,700/month. Many production agent frameworks (LangGraph, CrewAI) support configurable summarization; enable it when token cost is a constraint.

**Output tokens are the minority of per-turn cost in most agents.** A ReAct agent typically emits 100-400 output tokens per turn (the reasoning step + the tool call specification). The input bill — from context replay plus tool results — usually accounts for 70-85% of total loop cost. This means input optimization (caching, summarization, context pruning) is higher-EV than output optimization for agent workloads. Source: Anthropic agents and tools overview.

**The formula is per-query, not per-turn.** When you see cost benchmarks for agent workloads, always clarify whether they cite per-turn cost or per-query cost. A 5-turn loop at $0.02/turn (Sonnet 4.6) is $0.10/query — a 5x difference from a naive single-turn reading. The per-query figure is what maps to your production budget.


Per-model cost breakdown: Opus 4.7 at full and cached rates

**Claude Opus 4.7 at $15/$75 per 1M input/output** is the most expensive of the four models by 3-6x depending on the comparison. For our reference 5-turn loop (2K stable system + tool defs, 500-token tool result per turn, 300 output per turn, context growing from 3K to 12.5K across turns 1-5), the uncached total is **$0.683 per query**. At 10,000 queries/month, that is $6,830/month — a line item that requires the per-call quality gain to be material to justify.

**With Anthropic's 90% cache-read discount on the 2K stable prefix**, the effective input on cached tokens drops from $15/1M to $1.50/1M. Across 5 turns, caching the prefix saves approximately 5 × 2K × ($15 − $1.50) / 1M = $0.135. Total cached cost: **$0.548 per query** (subtract cache write cost which is $0.0025 at 1.25x). At 10K queries/month, $5,480 — still the most expensive option, but the 90% cache discount is the most aggressive of any major provider.

**The SWE-bench edge is real but narrow.** Opus 4.7 scores ~76% on SWE-bench Verified vs GPT-5.5's ~74%. In an agent loop with 5 turns, if each turn has a 76% vs 74% correctness rate, the probability of completing the task without a failed turn is approximately 0.76^5 = 0.253 for Opus vs 0.74^5 = 0.221 for GPT-5.5. That is a 14% relative improvement in full-loop success rate — not trivial, but whether it is worth $0.455/query more depends entirely on the cost of a failed query in your use case.

**When Opus justifies its price.** High-stakes agent workflows where a failed run requires human escalation (support tickets, contract review, medical coding), highly cacheable prompts where the 90% discount closes the gap to roughly 2x vs GPT-5.5, and teams with existing Anthropic integrations where the migration cost of switching is real. Source: Anthropic pricing.

**When Opus does not justify its price.** High-volume agents on well-defined tasks (data extraction, classification loops, structured generation) where GPT-5.4 or Sonnet 4.6 quality is indistinguishable, and any workflow where context is uncacheable (unique per-query system prompts, dynamic tool definitions). At $6,830/month vs $1,160/month (GPT-5.4) for 10K queries, the $5,670 monthly delta needs a measurable business outcome to justify.


Per-model cost breakdown: Sonnet 4.6 and the mid-tier case

**Claude Sonnet 4.6 at $3/$15 per 1M input/output** is the most cost-effective Claude model for production agent workloads that don't require Opus-level reasoning. For our reference 5-turn loop, the uncached total is **$0.137 per query** — about 5x cheaper than Opus. With the 90% cache discount on the 2K prefix, total drops to roughly **$0.110 per query**.

**Sonnet 4.6 vs GPT-5.5: the $3 vs $5 input price matters at scale.** At 100K queries/month on our reference loop, uncached Sonnet 4.6 costs $13,700 vs GPT-5.5 at $22,800 — a $9,100/month difference. With 80% cache hit on both: Sonnet ≈ $11,000 vs GPT-5.5 ≈ $20,300. Sonnet 4.6 beats GPT-5.5 on price and beats it on cache discount depth (Anthropic's 90% vs OpenAI's 50%), making it the default for most mid-tier production agents.

**Sonnet 4.6 vs GPT-5.4: pricing is nearly identical.** GPT-5.4 is $2.50/$15 vs Sonnet 4.6's $3/$15. Input is 20% cheaper on GPT-5.4; output is identical. For output-heavy workloads the two are at parity. For input-heavy agent loops (which have high context accumulation), GPT-5.4 saves roughly 17% on total cost. The decision comes down to which API ecosystem your team is already standardized on.

**Quality at Sonnet/GPT-5.4 tier.** SWE-bench Verified places Sonnet 4.6 at approximately 65-68% and GPT-5.4 at approximately 62-65%. Both are well below the Opus/GPT-5.5 tier, but for structured agent tasks (research pipelines, data transformation, code scaffolding) the gap to the flagship tier often doesn't matter in practice — the structure of the task, the quality of the tool definitions, and the prompt design determine more of the output quality than the model tier does.

**Recommended pattern:** use Sonnet 4.6 as the default agent model, route escalation steps to Opus 4.7 via an explicit router when confidence is low or the task type requires it. This hybrid brings per-query cost to roughly $0.12-0.15 at 80% Sonnet / 20% Opus mix — about 80% cheaper than all-Opus while preserving quality on hard paths. Source: LangGraph multi-agent concepts.


GPT-5.5 agent loop: the OpenAI cost profile

**GPT-5.5 at $5/$25 per 1M input/output** runs our reference 5-turn loop at **$0.228 per query uncached** and approximately **$0.203 per query** with 50% cache discount on the 2K stable prefix. At 10K queries/month: $2,280 uncached, $2,030 cached. It sits between Sonnet 4.6 and Opus 4.7 in both price and quality.

**OpenAI's prompt cache is automatic and prefix-only.** You do not set explicit cache breakpoints — the API caches the prompt prefix automatically when it detects a repeated pattern. The advantage is simplicity; the disadvantage is less control. You cannot cache an arbitrary middle section of your context; only the literal start of the message array caches. For agent loops where the system prompt + tool definitions are a stable prefix (which they usually are), the automatic cache fires well. For dynamic system prompts, it doesn't fire at all.

**GPT-5.5's strict JSON mode is a real agent advantage.** Tool results that need to conform to a schema are guaranteed valid JSON with `response_format: {type: 'json_schema', strict: true}` — zero post-call validation failures, no retry loop needed. On an agent loop where each turn includes a structured tool call, eliminating retry loops (which add entire extra turns and thus double the per-query cost on failure) is worth quantifying. If your baseline failure rate without strict mode is 5%, strict mode saves roughly 0.05 × $0.228 = $0.011 per query in expected retry costs.

**GPT-5.5 400K context window matters for long-horizon agents.** A research agent that accumulates 50+ documents in context across a 10-turn loop might need 150K-200K tokens in context by the final turn. GPT-5.5 handles this natively; Opus 4.7 maxes out at 200K and Sonnet 4.6 at 200K as well. For ultra-long-horizon agents, GPT-5.5 is the only option in the four-model comparison. Source: OpenAI platform docs.

**The GPT-5.5 sweet spot for agents.** High-volume agents at 100K+ queries/month where Opus is too expensive but you want the 400K context window, strict JSON mode, and the OpenAI ecosystem (Assistants API, code interpreter, retrieval). If you're already on the OpenAI platform and your agent query profile matches, GPT-5.5 is the natural default — pay 20% more than Sonnet 4.6 for tighter structured output guarantees.


GPT-5.4 agent loop: the budget flagship

**GPT-5.4 at $2.50/$15 per 1M input/output** runs our 5-turn reference loop at **$0.116 per query uncached** and approximately **$0.103 per query** with the 50% cache discount. This makes it the cheapest of the four models for agent use — slightly cheaper than Sonnet 4.6 on input-heavy loops, identical on output. At 100K queries/month, GPT-5.4 uncached costs $11,600 vs Sonnet 4.6's $13,700.

**GPT-5.4 quality for agents.** On SWE-bench Verified, GPT-5.4 scores approximately 62-65% — lower than the flagship tier by 10-12 points. For well-defined agent tasks with tight tool definitions and clear success criteria (data extraction pipelines, structured report generation, classification chains), that quality gap rarely surfaces as a user-visible failure. For open-ended research or complex multi-step reasoning, the gap matters and you'll see higher retry rates.

**The GPT-5.4 hidden cost: retry amplification.** If GPT-5.4 fails on 8% of agent turns vs GPT-5.5's 3%, and each failure adds one full retry turn, the effective cost per query rises by 0.08 × (average turn cost) on top of the base cost. For a loop where average turn cost is $0.023, that adds $0.0018/query — small but non-zero. At 100K queries/month, $1,800/month in retry overhead that partially offsets the lower base price. Always measure failure rate empirically before assuming GPT-5.4 is cheaper end-to-end.

**GPT-5.4 wins when.** The task is well-scoped and repetitive — classification loops, data extraction agents, structured summarization pipelines, internal tooling. The context is short (under 20K tokens total across the loop) so context accumulation doesn't run away. You're already on the OpenAI platform and want to minimize switching costs. Volume is 50K+ queries/month where even the $0.013/query savings vs Sonnet 4.6 compounds to $1,300/month.

**GPT-5.4 vs Sonnet 4.6: the real decision.** For most production agents, these two models are a coin flip on cost and quality. The differentiator is ecosystem: if you're on Anthropic, Sonnet 4.6 is the default mid-tier choice; if you're on OpenAI, GPT-5.4 is. Don't over-engineer the choice. Run a 30-task blind eval on your specific workload, pick whichever wins, and revisit quarterly. See OpenAI function calling docs and Anthropic tool use overview for the API-level comparison.


Prompt caching impact on agent loops

**Prompt caching is the single highest-EV optimization for agent loop cost.** In a 5-turn loop, the stable system prompt + tool definitions prefix appears in every turn's input — 5 times for every query. Without caching, you pay full input price for that prefix 5 times. With caching, you pay it once (at write time) and 4 times at the cache read rate. **For Anthropic's 90% cache discount on a 2K-token prefix at $15/1M, the saving across 5 turns is: 4 × 2K × ($15 − $1.50) / 1M = $0.108 per query.** That is 16% off the uncached Opus total for zero code complexity.

**Anthropic's explicit cache breakpoints give you more control.** Claude requires you to mark the cacheable prefix with `cache_control: {type: 'ephemeral'}`. This means you can cache any contiguous prefix — system prompt only, system prompt + tool definitions, system prompt + tools + a stable few-shot example. The explicit control also means you know exactly what is cached and can structure the prompt to maximize cache hit rate. Source: Anthropic tool use and caching docs.

**OpenAI's automatic caching is simpler but less controllable.** The cache fires on prefix matches without any code change. For agent loops where the system prompt and tool definitions are truly static (the most common case), the automatic cache fires reliably. For loops where anything in the prefix varies per query (user context embedded in system prompt, dynamic tool configurations), the cache doesn't fire and you pay full price. Fix: move all dynamic context to user messages, keep the system prompt and tool definitions truly static.

**Cache write cost.** Anthropic charges 1.25x the input rate for a 5-minute cache write (2x for 1-hour TTL). OpenAI's caching is automatic — no write premium, but you also have no TTL control. Anthropic's 1-hour TTL cache write at 2x input on a 2K-token prefix costs: 2K × $30/1M = $0.060 — you need 8+ reads within the hour to break even vs uncached. For production agents running 10+ queries per hour, the 1-hour TTL is almost always positive EV.

**Context accumulation undermines caching past turn 2.** The system prefix caches cleanly. But turn 2's input includes turn 1's output — which is dynamic. The dynamic portion after the stable prefix cannot be cached. This means caching savings are front-loaded: the 2K prefix saves on every turn, but turns 3+ have large uncacheable tails (accumulated tool results, prior reasoning). The net effect: caching helps more on short (2-3 turn) loops than on long (7-10 turn) loops, because long loops have larger uncacheable tails relative to the stable prefix. For long-horizon agents, context summarization has higher EV than caching past the prefix.


Worked scenario: 10,000 queries/month on each model

**Scenario parameters.** 10,000 agent queries/month. 5-turn ReAct loop per query. 2K stable system prompt + tool definitions (cacheable). Tool results ~500 tokens per turn. 300 output tokens per turn. Context grows as: turn 1 = 3K in, turn 2 = 5K in, turn 3 = 7.5K in, turn 4 = 10K in, turn 5 = 12.5K in. Total input per query across all 5 turns = 38K tokens. Total output per query = 1.5K tokens.

**Claude Opus 4.7 uncached:** 10K × (38K × $15/1M + 1.5K × $75/1M) = 10K × ($0.570 + $0.1125) = 10K × $0.6825 = **$6,825/month**. With 80% cache hit on the 2K prefix per turn (5 turns × 2K × 0.8 = 8K cached tokens vs uncached): savings = 10K × 8K × ($15 − $1.50) / 1M = 10K × 0.008 × $13.50 = $1,080. **Cached Opus total: $5,745/month.**

**Claude Sonnet 4.6 uncached:** 10K × (38K × $3/1M + 1.5K × $15/1M) = 10K × ($0.114 + $0.0225) = 10K × $0.1365 = **$1,365/month**. Cached savings = 10K × 8K × ($3 − $0.30) / 1M = $216. **Cached Sonnet total: $1,149/month.**

**GPT-5.5 uncached:** 10K × (38K × $5/1M + 1.5K × $25/1M) = 10K × ($0.190 + $0.0375) = 10K × $0.2275 = **$2,275/month**. Cached savings = 10K × 8K × ($5 − $0.50) / 1M = $360. **Cached GPT-5.5 total: $1,915/month.**

**GPT-5.4 uncached:** 10K × (38K × $2.50/1M + 1.5K × $15/1M) = 10K × ($0.095 + $0.0225) = 10K × $0.1175 = **$1,175/month**. Cached savings = 10K × 8K × ($2.50 − $0.25) / 1M = $180. **Cached GPT-5.4 total: $995/month.**

**Summary at 10K queries/month (cached):** GPT-5.4 $995 < Sonnet 4.6 $1,149 < GPT-5.5 $1,915 < Opus 4.7 $5,745. The Opus premium at this volume is $4,750/month vs GPT-5.4. That is $57,000/year — the cost of a full-time engineer — as the price of the per-call quality edge. Quantify the business value of Opus's quality advantage before committing to that cost.


Scaling to 100,000 queries/month: where model choice becomes critical

**At 100K queries/month**, the monthly cost differences multiply by 10 from the 10K scenario: Claude Opus 4.7 cached ≈ $57,450/month; Claude Sonnet 4.6 cached ≈ $11,490/month; GPT-5.5 cached ≈ $19,150/month; GPT-5.4 cached ≈ $9,950/month. **The Opus-to-GPT-5.4 gap is $47,500/month — $570,000/year.**

**At this volume, hybrid routing has the highest EV of any optimization.** If 80% of agent queries are routine (extraction, classification, structured generation) and 20% require complex reasoning, routing 80% to GPT-5.4/Sonnet and 20% to Opus gives a blended per-query cost of: 0.8 × $0.1175 + 0.2 × $0.6825 = $0.094 + $0.137 = $0.231 uncached per query. At 100K queries/month: $23,100 — vs $68,250 all-Opus. A **66% cost reduction with no quality loss on the 80% of routine queries.** The router itself can be a cheap model (GPT-5.4-mini or Haiku) that classifies query complexity for a fraction of a cent.

**Context pruning becomes mandatory at scale.** At $0.6825/query uncached on Opus, if you can cut average input tokens from 38K to 25K (by summarizing earlier turns instead of replaying verbatim), you save: 100K × 13K × $15/1M = $19,500/month. Context pruning at scale is worth more than model switching for Opus workloads. LangGraph's persistence and memory module supports configurable context management with sliding windows and summarization.

**Output length caps matter less than you'd expect for agent loops.** Because each turn emits only 100-400 output tokens (the reasoning step is concise by design in well-built agents), output cost is typically 15-25% of total loop cost. Input optimization — caching, pruning, summarization — is 4-6x higher EV than output optimization for agents. This is the opposite of single-shot chat workloads, where output dominates.

**Budget modeling at 100K queries/month.** Establish three budgets: (1) worst-case (all Opus, no cache) = $68,250; (2) realistic (Sonnet with cache, hybrid routing for complex queries) = $14,000-$17,000; (3) optimized (GPT-5.4 or Sonnet with full cache + context pruning + routing) = $8,000-$10,000. The gap between worst-case and optimized is $58,000+/month — engineering the stack to hit the optimized budget is worth 2-3 weeks of a senior engineer's time. Use our multi-agent cost per task calculator to model the orchestrator overhead separately.


Tool call overhead in the agent loop

**Tool definitions add to every turn's input.** A well-defined tool schema (name, description, parameters with type annotations) is typically 150-300 tokens per tool. A loop with 5 tools defined adds 750-1,500 tokens to every single turn's input — across a 5-turn loop and 100K queries/month, that is 100K × 5 turns × 1,125 tokens (midpoint) = 562.5B tokens/month from tool definitions alone. At $3/1M (Sonnet): $1,687/month just for the tool schema overhead. **Cache your tool definitions** — they're the most stable part of your prompt and the easiest win.

**Tool call arguments are billed as output.** When the model emits a tool call like `{"name": "search", "arguments": {"query": "...", "limit": 10}}`, the JSON is output tokens. A typical tool call argument block is 30-100 tokens. At $15/1M output (Sonnet) for 3 tool calls per query at 65 tokens each: $15 × 0.000195 = $0.003/query just in tool-call argument output. Not huge per-query, but at 100K queries: $300/month from tool-call JSON overhead.

**Parallel tool calls reduce turns but not tokens.** If the model calls 3 tools in parallel in a single turn (one model output, three tool results fed back in the next user message), you save the overhead of 2 extra turns (input replay cost of 2 full context replays) while paying the same tool argument and result token costs. **For long loops with many tool calls, parallel tool execution can cut your total turn count and thus your input accumulation cost by 30-50%.** Both Claude and GPT-5.5/5.4 support parallel tool calls. Source: OpenAI function calling docs, Anthropic tool use overview.

**Tool result size is the variable you have the most control over.** A search tool that returns 3,000 tokens of raw document context per call adds 3,000 tokens to the next turn's input — and to every subsequent turn's accumulated context. Truncate tool results to the minimum useful size. Return summaries, not raw content. A search tool that returns 200-token summaries instead of 2,000-token documents saves 1,800 tokens × number of search calls × number of subsequent turns × input price. At 5 searches in a loop with 3 subsequent turns each on Opus: 5 × 1,800 × 3 × $15/1M = $0.405/query. At 10K queries/month, $4,050/month from tool result size alone. See our tool use overhead cost calculator for the full breakdown.

**The compound effect.** Context accumulation, tool result size, tool definition overhead, and parallel tool execution each contribute 5-30% to total loop cost. Optimizing all four compounds multiplicatively: a loop that costs $0.683/query (Opus uncached) can be brought to $0.30-$0.35/query through caching + result truncation + parallel tools + context summarization — a 50% cost reduction without changing the model. Benchmark your baseline cost first, then apply optimizations in order of EV.


Sourcing and methodology

**Model pricing sourced June 2026.** Claude Opus 4.7: $15/$75 per 1M input/output, cache read $1.50/1M from Anthropic pricing. Claude Sonnet 4.6: $3/$15, cache read $0.30/1M from same. GPT-5.5: $5/$25, cache read ~$0.50/1M from OpenAI pricing. GPT-5.4: $2.50/$15, cache read ~$0.25/1M from same. All prices verified against live pricing pages on date of publication.

**Token model.** Reference loop: 2K stable system prompt + tool definitions (5 tools × 250 tokens each = 1.25K, rounded to 2K total with system), tool results 500 tokens per turn (3 tools × 167 tokens average), 300 output tokens per turn, context replay is cumulative (turn T includes all prior turns' output, tool results, and new input). This is a mid-weight agent — heavier than a simple chatbot, lighter than a full autonomous research agent.

**SWE-bench numbers.** Sourced from swe-bench.github.io and vendor release notes: Opus 4.7 ~76%, GPT-5.5 ~74%, Sonnet 4.6 ~65-68%, GPT-5.4 ~62-65%. These are SWE-bench Verified (500-task human-validated subset), not the full benchmark. Agent task performance varies significantly by task type; these are indicative baselines only.

**Cache modeling.** Cache hit rate assumed at 80% of calls after the first call per session. Cache write cost modeled at 1.25x input rate per Anthropic's 5-minute TTL (conservative). OpenAI cache modeled at 50% discount with automatic activation on stable prefix. Actual cache hit rates in production depend on session length, system prompt stability, and traffic patterns — validate with your own usage_metadata.

**This article is updated quarterly.** LLM pricing changes without notice. Before using these numbers for procurement decisions, verify against the live pricing pages: Anthropic, OpenAI. For custom token profiles and monthly estimates, use our Claude API cost calculator and OpenAI API cost calculator which take your specific numbers.

How to calculate your agent loop cost in 5 steps

  1. 1

    Profile your loop: turns, tokens per turn, tool result size

    Add logging to your agent to capture: number of turns per query (p50 and p90), input tokens at each turn (watch for growth), output tokens per turn, tool result sizes per call. One week of production logs gives you the distribution. p50 cost is your budget anchor; p90 is your tail spend. Don't use average-turn cost — the accumulation shape matters.

  2. 2

    Apply the formula: Σ turns × (cached_input + new_input + output + tool_results)

    For each turn t: cost_t = (cached_prefix_tokens / 1M) × cache_rate + (new_input_tokens / 1M) × input_rate + (output_tokens / 1M) × output_rate + (tool_result_tokens / 1M) × input_rate. Sum across all turns for per-query cost. Multiply by monthly query volume for the monthly budget.

  3. 3

    Identify and cache your stable prefix

    Mark everything that is identical across all queries: system prompt, tool definitions, static few-shot examples. On Anthropic, add cache_control markers to that prefix block. On OpenAI, ensure your system prompt and tool definitions come first in the message array and contain no dynamic interpolations. Measure cache hit rate in production via usage_metadata.cache_read_input_tokens.

  4. 4

    Benchmark quality per model on your actual task distribution

    Take 30 representative queries from production (not synthetic benchmarks). Run all 4 models. Blind-rate the 5-turn outputs on success/failure. Compute the per-model failure rate. Add failure cost (expected extra turns × per-turn cost × failure rate) to the base per-query cost. This gives you the true effective cost per successful completion.

  5. 5

    Design a hybrid router for 50-70% cost reduction

    Classify queries by complexity before routing to the agent: simple (classification, extraction, structured generation) → GPT-5.4 or Sonnet 4.6; complex (multi-step reasoning, open-ended research, code synthesis) → Opus 4.7 or GPT-5.5. The router itself costs ~$0.001 per query on a small model. A well-calibrated router at 75/25 split typically cuts blended cost by 50-65% with no measurable quality loss on the easy tier.

Frequently Asked Questions

How much does a 5-turn agent loop cost on Claude Opus 4.7?

For a reference loop with 2K stable system prompt, 500-token tool results per turn, 300 output tokens per turn, and context growing from 3K to 12.5K across turns 1-5: approximately $0.683 per query uncached and $0.548 with Anthropic's 90% cache discount on the stable prefix. At 10K queries/month: $5,480-$6,830. Source: Anthropic pricing $15/$75 per 1M input/output, cache read $1.50/1M. https://docs.anthropic.com/en/docs/about-claude/pricing

Is GPT-5.5 or Claude Sonnet 4.6 cheaper for agent loops?

Claude Sonnet 4.6 is cheaper. GPT-5.5 is $5/$25 per 1M input/output; Sonnet 4.6 is $3/$15. For a 38K-input / 1.5K-output 5-turn loop: GPT-5.5 costs $0.228/query vs Sonnet 4.6's $0.137/query uncached — a 40% advantage for Sonnet. With caching, Sonnet's 90% cache discount widens the gap further vs GPT-5.5's 50% discount. Source: OpenAI pricing, Anthropic pricing, both fetched June 2026.

How does prompt caching reduce agent loop cost?

The stable system prompt and tool definitions appear in every turn's input — 5 times per query in a 5-turn loop. Caching prices those repeated tokens at 10% of normal input rate (Anthropic 90% off, OpenAI 50% off). On a 2K stable prefix across 5 turns on Opus 4.7: 5 × 2K × ($15 − $1.50) / 1M = $0.135 saved per query. At 10K queries/month, $1,350/month from caching alone, with no code changes beyond marking the breakpoint.

What is the cost formula for an agent loop?

cost = Σ_t [ (cached_input_t / 1M) × cache_rate + (new_input_t / 1M) × input_rate + (output_t / 1M) × output_rate + (tool_result_t / 1M) × input_rate ]. Sum across all N turns. The key driver is context accumulation: each turn replays all prior turns' content, so input tokens compound across the loop. Cached tokens are billed at the cache read rate for the cacheable prefix only.

How do tool result tokens affect agent loop cost?

Tool results are fed back as user-message input on the turn following the tool call, billed at the standard input rate. A search tool returning 800 tokens per call adds 800 tokens to that turn's input — and to every subsequent turn's accumulated context. In a 5-turn loop with 3 tool calls at 600 tokens each, tool results add ~1,800 tokens of billable input across the loop: at $15/1M (Opus), $0.027/query, or $2,700/month at 100K queries. Truncate tool results to the minimum useful size.

Does Claude or GPT-5 handle parallel tool calls?

Both Claude and GPT-5.5/GPT-5.4 support parallel tool calls — emitting multiple tool calls in a single model output turn. Claude (Opus 4.7, Sonnet 4.6) typically fans out 2-3 tools per turn; GPT-5.5 is more aggressive, often fanning out 4-6 tools per turn. Parallel tool calls reduce total loop turns (and thus context accumulation cost) while paying the same tool argument + result token costs. Enabling parallel tool execution can cut turn count by 30-50% on tool-heavy agents. Source: Anthropic tool use docs, OpenAI function calling docs.

How does the agent loop cost compare at 100K queries/month?

At 100K queries/month with 80% cache hit on stable prefix: Claude Opus 4.7 ≈ $57,450/month; GPT-5.5 ≈ $19,150/month; Claude Sonnet 4.6 ≈ $11,490/month; GPT-5.4 ≈ $9,950/month. A hybrid router (80% Sonnet/GPT-5.4, 20% Opus for hard queries) brings blended cost to ~$16,000/month — a 72% reduction vs all-Opus.

When is Claude Opus 4.7 worth the premium for agent loops?

When three conditions hold simultaneously: (1) the task requires near-frontier reasoning quality and the 14% relative improvement in full-loop success rate (Opus ~76% vs GPT-5.5 ~74% SWE-bench, compounding over 5 turns) measurably reduces human escalations or retries; (2) the system prompt is long and highly cacheable (Anthropic's 90% cache discount closes the Opus-to-GPT-5.5 price gap materially); (3) the business cost of a failed agent run (human review time, delayed output, error propagation) exceeds $0.45/query — the price differential vs GPT-5.5.

Your prompts are driving your agent costs. Tighten them.

Every token your agent loop sends is billed. Our AI Prompt Generator writes cache-anchored, tool-use-ready prompts that trim 20-40% off agent input tokens without sacrificing quality. Works with Claude, GPT-5, and every major agent framework. 14-day free trial, no card.

Browse all prompt tools →