The cost formula with cache write/read math
Claude pricing has three input rates instead of two: standard input, cache-write input (premium), cache-read input (90% discount). For a single call with no cache, the formula matches OpenAI's:
```
cost = (input_tokens / 1,000,000) × input_price_per_M
     + (output_tokens / 1,000,000) × output_price_per_M
```
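As a minimal sketch, the uncached formula translates directly into a small Python helper. The function name `call_cost` and the example token counts are hypothetical, and the $15 per million output price is an assumed figure used only for illustration; substitute the rates from the current price sheet.

```python
def call_cost(input_tokens: int, output_tokens: int,
              input_price_per_m: float, output_price_per_m: float) -> float:
    """Return the USD cost of a single call with no caching."""
    input_cost = (input_tokens / 1_000_000) * input_price_per_m
    output_cost = (output_tokens / 1_000_000) * output_price_per_m
    return input_cost + output_cost


# Illustrative numbers only: 2,000 input tokens and 500 output tokens,
# at $3 per million input and an assumed $15 per million output.
example = call_cost(2_000, 500, 3.00, 15.00)
print(f"${example:.4f}")
```

Keeping prices as parameters rather than constants makes it easy to rerun the same arithmetic when rates change.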
When caching is enabled, the prefix you mark as cacheable bills at the cache-write rate on the first call (1.25x base input for the 5-minute TTL, 2x for the 1-hour TTL), then at the cache-read rate (10% of base) on every subsequent call until the TTL expires. The total cost across N calls in the same TTL window, where each term is the per-call cost of that component, and the amortized per-call cost that follows from it:
```
total_cost = cache_write_cost + (N-1) × cache_read_cost + N × non_cached_input_cost + N × output_cost
amortized_cost_per_call = total_cost / N
```
Break-even on the 1-hour cache write (2x premium) happens after 2 cache hits: one write plus two reads costs 2 + 2 × 0.1 = 2.2x the base prefix price, versus 3x for three uncached calls. After that, every additional hit saves 90% of the prefix cost. For a stable 2,000-token system prompt + tools on Sonnet 4.6 read across 100 calls in an hour: cache write = 2000 × $6/1M = $0.012 once, cache reads = 99 × 2000 × $0.30/1M = $0.0594, for a total of $0.0714. Reading the same prefix 100x at standard input costs 100 × 2000 × $3/1M = $0.60. That is an **88% saving on the prefix portion**.
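The arithmetic above can be reproduced with a short Python sketch. The function names are hypothetical, and the multipliers (2x write for the 1-hour TTL, 1.25x for the 5-minute TTL, 0.1x read) are the ones stated in this section. The sketch assumes every call after the first lands inside the TTL window and covers the cached prefix only, not the uncached input or the output.

```python
def prefix_cost_cached(prefix_tokens: int, n_calls: int, base_price_per_m: float,
                       write_multiplier: float = 2.0,
                       read_multiplier: float = 0.1) -> float:
    """USD cost of the cached prefix: one write, then (n_calls - 1) reads."""
    per_token = base_price_per_m / 1_000_000
    write = prefix_tokens * per_token * write_multiplier
    reads = (n_calls - 1) * prefix_tokens * per_token * read_multiplier
    return write + reads


def prefix_cost_uncached(prefix_tokens: int, n_calls: int,
                         base_price_per_m: float) -> float:
    """USD cost of sending the same prefix at the standard rate on every call."""
    return n_calls * prefix_tokens * base_price_per_m / 1_000_000


def break_even_hits(write_multiplier: float = 2.0,
                    read_multiplier: float = 0.1) -> int:
    """Smallest number of cache hits at which caching is strictly cheaper."""
    hits = 0
    # Cached: one write + `hits` reads. Uncached: (1 + hits) full-price calls.
    while write_multiplier + hits * read_multiplier >= 1 + hits:
        hits += 1
    return hits


cached = prefix_cost_cached(2_000, 100, 3.00)      # 1-hour TTL, $3/M base
uncached = prefix_cost_uncached(2_000, 100, 3.00)
saving = 1 - cached / uncached

print(f"cached=${cached:.4f} uncached=${uncached:.2f} saving={saving:.1%}")
print("break-even hits (1-hour TTL):", break_even_hits())
print("break-even hits (5-min TTL):", break_even_hits(write_multiplier=1.25))
```

With the 1.25x write multiplier of the 5-minute TTL, the same break-even check passes after a single hit, since 1.25 + 0.1 = 1.35x is already below the 2x cost of two uncached calls.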
The Batch API stacks on top of everything else: 50% off both input and output for asynchronous jobs.