The 6 variables that determine the real delta
**1. Per-token price**. The headline number from the table above, and useful only as a starting point. For Sonnet 4.6 vs gpt-5.4 the Claude premium is small (+20% input / 0% output). For Haiku 4.5 vs gpt-5-mini it's brutal (+220% / +100%). For Opus 4.7 vs gpt-5.5 it's structurally indefensible outside frontier reasoning workloads (+1,100% / +650%). Treat this as a sanity check only: the next five variables can swing the final bill by 3-5x in either direction.
**2. Output-length multiplier**. The single most-ignored variable. In our internal evals across customer-support, summarization, and code-generation prompts, Claude Sonnet 4.6 produces 20-40% longer outputs than gpt-5.4 on identical instructions. Claude is trained to over-explain, hedge, and structure with headers; GPT is trained to be terser. Multiply your projected Claude output token count by 1.2-1.4x before you trust any cost projection. On a workload where output is 80% of cost, a 1.3x verbosity multiplier adds about 24% to the bill at equal per-token prices, while a 30% discount on the remaining 20% input side saves only 6%. The verbosity penalty swallows the cache discount several times over.
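The arithmetic behind that last claim can be sketched as follows. This is a minimal illustration, not a billing model: it assumes per-token prices are held equal so that only verbosity and the input-side discount move the bill, and the function name `bill_change` is hypothetical.

```python
def bill_change(output_share, verbosity, input_discount):
    """Relative change in total bill versus a baseline bill of 1.0.

    output_share:   fraction of the baseline bill spent on output tokens
    verbosity:      output-length multiplier (1.3 means 30% longer outputs)
    input_discount: fractional discount applied to the whole input side
    Assumption: per-token prices are otherwise unchanged.
    """
    input_share = 1.0 - output_share
    new_bill = input_share * (1.0 - input_discount) + output_share * verbosity
    return new_bill - 1.0


# The example from the text: 80% output cost, 1.3x verbosity, 30% input discount.
delta = bill_change(output_share=0.80, verbosity=1.3, input_discount=0.30)
print(f"net bill change: {delta:+.1%}")
```

With these inputs the bill still rises by roughly 18% after the input-side discount is applied, which is why the verbosity multiplier belongs in any projection before the cache math.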
**3. Cache discount differential**. Anthropic prompt caching gives 90% off cached input tokens (5-minute default TTL, 1-hour TTL available at 2x write cost). OpenAI's automatic cache gives 50% off cached tokens with no TTL controls. For any workload with a stable system prompt or stable retrieval context, cached tokens bill at 10% of list price on Anthropic versus 50% on OpenAI, a 5x advantage to Anthropic on the cached portion. Caching flips a Claude-loses comparison to a Claude-wins comparison roughly when more than 60% of input tokens are cacheable and each cached prefix is reused more than twice within the TTL window.
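A rough way to see how the cache discount interacts with the list-price premium is to compute a blended input-price multiplier. The sketch below is a simplification under stated assumptions: it ignores the cache-write surcharge, treats `hit_rate` (the share of cacheable tokens actually served from cache) as a given input, and uses hypothetical function and variable names. It is not meant to reproduce the 60% / 2x breakeven quoted above, only to show the shape of the calculation.

```python
def blended_input_multiplier(cacheable, hit_rate, cached_discount):
    """Effective input price as a fraction of list price.

    cacheable:       fraction of input tokens that sit in a stable prefix
    hit_rate:        fraction of those tokens actually read from cache
    cached_discount: discount on cached reads (0.90 or 0.50 in the text)
    Assumption: cache-write surcharges are ignored.
    """
    cached = cacheable * hit_rate
    return (1.0 - cached) + cached * (1.0 - cached_discount)


# 60% of input cacheable, every cacheable token served from cache.
anthropic = blended_input_multiplier(0.60, 1.0, 0.90)
openai = blended_input_multiplier(0.60, 1.0, 0.50)

# Apply the +20% Sonnet 4.6 input premium from variable 1.
claude_relative_input_cost = 1.20 * anthropic

print(f"Anthropic blended multiplier: {anthropic:.2f}")
print(f"OpenAI blended multiplier:    {openai:.2f}")
print(f"Claude input cost vs OpenAI list: {claude_relative_input_cost:.3f}")
```

In this idealized case the Claude input side comes out cheaper despite the 20% list premium. Lowering `hit_rate` or adding the write surcharge narrows the gap, so measure real hit rates before relying on it.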
**4. Batch discount differential**. Both providers offer batch mode at ~50% off list price for 24-hour-tolerant workloads, so the discount itself is nearly a wash. The difference is operational: Anthropic's batch API throughput limits are tighter, and certain models (Opus 4.7) gate batch behind enterprise tiers. If you're already on OpenAI batch, the move to Anthropic batch may require throughput renegotiation.
**5. Prompt-rewrite tax**. GPT prompts written without XML structuring, without explicit response-format anchors, and without cache-prefix discipline under-deliver on Claude, typically scoring 10-25% lower on the same task in our internal evals. Rewriting 50 production prompts to Claude-native conventions takes roughly 25 engineer-hours, or $3-5k of internal cost. This is a one-time cost but an unavoidable one; budget for it.
**6. Latency and retry tax**. Anthropic Sonnet 4.6 median TTFT is 800ms-1.2s; Haiku 4.5 is 400-600ms. OpenAI gpt-5.4 is 400-700ms; gpt-5-mini is 200-400ms. If your service has a strict p99 latency SLA, Anthropic's higher TTFT may force you to increase retry budgets, fall back to faster models, or accept higher timeout rates, and each of those options has a cost.
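To put a number on the retry tax, one option is to estimate the expected number of billed attempts per request. The sketch below assumes independent attempts, a fixed per-attempt timeout probability, and that every timed-out attempt is billed in full. The 5% timeout rate and the retry budget of 2 are illustrative placeholders, not measured values, and the function names are hypothetical.

```python
def expected_attempts(timeout_rate, max_retries):
    """Expected attempts per request when each attempt times out
    independently with probability timeout_rate.

    Attempt k+1 only happens if the first k attempts all timed out,
    which has probability timeout_rate ** k.
    """
    return sum(timeout_rate ** k for k in range(max_retries + 1))


def residual_failure_rate(timeout_rate, max_retries):
    """Probability that every attempt, including all retries, times out."""
    return timeout_rate ** (max_retries + 1)


# Illustrative placeholders: 5% of attempts miss the SLA, up to 2 retries.
cost_multiplier = expected_attempts(timeout_rate=0.05, max_retries=2)
still_failing = residual_failure_rate(timeout_rate=0.05, max_retries=2)

print(f"cost multiplier from retries: {cost_multiplier:.4f}")
print(f"requests failing after retries: {still_failing:.6f}")
```

Multiply the projected per-request cost by `cost_multiplier` to fold the retry tax into the comparison, and substitute timeout rates measured against your own SLA.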