Skip to contentNew: Does ChatGPT recommend your brand? Free 60-second AI visibility check →
By The DDH Team · Digital Dashboard Hub

Claude API Rate Limits 2026: RPM, ITPM, OTPM by Tier and Model

By The DDH Team at Digital Dashboard HubUpdated

Stop writing AI prompts from scratch.

Tell us your business + your task + your model. We write the prompt — perfectly tuned for ChatGPT, Claude, Grok, Gemini, Midjourney, or any model. Plus 500+ pre-built prompts in your library.

14 days, no card. Cancel in 2 clicks.

Anthropic's rate limits are structured differently from OpenAI's. Where OpenAI uses a combined **tokens-per-minute (TPM)** budget that mixes input and output, Claude splits the budget into **ITPM (input tokens per minute)** and **OTPM (output tokens per minute)** measured separately, each with its own ceiling. This matters for app design: an agent that reads massive context but emits short answers hits a different ceiling than a generator that emits long markdown from a short prompt. You can saturate ITPM while OTPM sits idle, and vice versa.

Tier promotion on Claude is also faster than OpenAI's: **Tier 1 unlocks at $5 in credit purchases**, **Tier 2 at $40**, **Tier 3 at $200**, and **Tier 4 at $400** — with promotion happening immediately upon reaching the threshold. There is no 30-day waiting clock like OpenAI's Tier 5 gate. The ceiling at Tier 4 is generous: Claude Opus 4.7 reaches **10,000,000 ITPM and 800,000 OTPM**, Sonnet 4.6 reaches **2,000,000 ITPM and 400,000 OTPM**, and Claude Fable 5 (Anthropic's newest flagship) reaches **4,000,000 ITPM and 800,000 OTPM**. Above Tier 4 is **Monthly Invoicing / Custom**, which removes the monthly spend cap entirely and is negotiated via sales.

Below: the canonical per-tier table sourced from docs.anthropic.com/en/api/rate-limits (fetched 2026-06-20), then the structural decisions every Claude-on-API team needs to understand — the split ITPM/OTPM model, prompt caching's effect on ITPM consumption, the Message Batches separate quota pool, and the difference between a **429 rate-limit error** and a **529 capacity-overload error**. For Anthropic vs OpenAI side-by-side, see our OpenAI Tier 5 unlock requirements; for token economics that determine how fast you accumulate credit toward Tier 4, see the Claude API cost calculator.

Digital Dashboard Hub

Writing good prompts for ONE AI is hard. Writing them for GPT-5, Claude, Gemini, Perplexity, Midjourney and 6 more is a full-time job. DDH's AI Prompt Builder writes once, runs everywhere — locked to your niche, voice, and brand tone.

Free 14 days, no card.

Claude API rate limits by tier — June 2026 (Sonnet 4.6 baseline)

Feature
Sonnet 4.6 RPM
ITPM
OTPM
Tier 1 ($5 credit)5030,0008,000
Tier 2 ($40 credit)1,000450,00090,000
Tier 3 ($200 credit)2,000800,000160,000
Tier 4 ($400 credit)4,0002,000,000400,000
Custom / Monthly InvoicingNegotiatedNegotiatedNegotiated

Source, as of June 2026: Anthropic rate-limits documentation (https://docs.anthropic.com/en/api/rate-limits), fetched 2026-06-20. The Sonnet 4.x limit is a *combined* ceiling that applies across all Sonnet 4.x model traffic (Sonnet 4.5 + Sonnet 4.6). Other models scale differently — Claude Opus 4.7 at Tier 4 reaches 10,000,000 ITPM and 800,000 OTPM; Claude Haiku 4.5 reaches 4,000,000 ITPM and 800,000 OTPM; Claude Fable 5 reaches 4,000,000 ITPM and 800,000 OTPM. Custom tier (Monthly Invoicing) removes the monthly spend cap and uses negotiated rate-limit ceilings — contact sales via console.anthropic.com/settings/limits. Verify your own organization's live limits at console.anthropic.com/settings/limits or via the [Rate Limits API](https://docs.anthropic.com/en/api/admin-api/usage-cost/rate-limits).

How Anthropic's tier ladder actually works (spend, not days)

Anthropic's tier ladder is **spend-only**: you advance to the next tier the moment your cumulative credit purchases (excluding tax) cross the threshold. There is no minimum-days requirement like OpenAI's Tier 5 wait. **Tier 1** requires **$5 in credit purchases** and gives you a **$500/month spend cap**. **Tier 2** requires **$40** and keeps the **$500/month cap**. **Tier 3** requires **$200** and lifts the cap to **$1,000/month**. **Tier 4** requires **$400** and lifts the cap to **$200,000/month**. Above Tier 4, **Monthly Invoicing** removes the monthly cap entirely (negotiated via sales, Net-30 terms by default).

Two non-obvious mechanics: First, the threshold is on **credit purchased**, not credit consumed. Buy $400 of credit on day 1 and you are immediately promoted to Tier 4 — even if you've only burned $12 of it. This is the opposite of OpenAI's policy, where only *consumed* paid usage counts toward the tier. Second, the **max-credit-purchase-per-transaction** limit at lower tiers (Tier 1 and Tier 2 cap single deposits at **$500**) is an anti-overfunding guardrail, not a tier requirement — you can hit Tier 4 in four $100 deposits or stack purchases to clear $400 quickly.

Promotion is instant and automatic. Once the threshold clears, your organization moves up in the same API call cycle. Check your live tier and limits at console.anthropic.com/settings/limits or read them programmatically via the Rate Limits API. **Claude Platform on AWS** is an exception: organizations start at Tier 1 and there is no automatic tier advancement — rate-limit increases go through your Anthropic account representative.


Why Anthropic splits ITPM and OTPM (and why it matters for app design)

Almost every other major API provider uses a **combined TPM** model: a single ceiling that counts all tokens — input plus output, cached plus uncached — against one budget. Anthropic splits these into two ceilings: **ITPM** (input tokens per minute) and **OTPM** (output tokens per minute). They are measured and enforced independently. You can hit your ITPM ceiling while your OTPM ceiling is 90% idle, and the API will throw 429s for ITPM exhaustion while your output budget sits untouched.

The ratio between ITPM and OTPM on Claude is roughly **5:1 across most models and tiers**. At Tier 4, Sonnet 4.6 gives you **2,000,000 ITPM** vs **400,000 OTPM**. At Tier 1, that ratio holds: **30,000 ITPM** vs **8,000 OTPM**. The implication: Anthropic expects your workloads to be **input-heavy** (long contexts, retrieved documents, conversation history, tool definitions) and **output-light** (concise answers, structured JSON, short summaries). Designs that flip the ratio — short prompts generating long markdown reports — saturate OTPM first and leave ITPM idle.

Three concrete design implications. **First**: if you're building an agent that reads large context and emits short tool calls, your binding constraint is ITPM — fix it with prompt caching (next section). **Second**: if you're building a content generator that emits long-form prose from short briefs, your binding constraint is OTPM — fix it with model selection (Claude Fable 5 and Haiku 4.5 have higher OTPM ceilings than Sonnet at every tier) or batching. **Third**: monitor both. Anthropic returns separate `anthropic-ratelimit-input-tokens-remaining` and `anthropic-ratelimit-output-tokens-remaining` headers on every response — log both and alert on whichever crosses 80% first.


Per-model rate-limit scaling: Fable 5, Opus 4.7, Sonnet 4.6, Haiku 4.5

Rate limits scale **per model** on Claude. The RPM ceiling is identical across all models at a given tier (50 at Tier 1, 1,000 at Tier 2, 2,000 at Tier 3, 4,000 at Tier 4), but ITPM and OTPM vary dramatically by model. Counterintuitively, **Opus has the highest ITPM ceiling at every tier** — not the lowest. At Tier 4, Opus 4.7 reaches **10,000,000 ITPM** vs Sonnet 4.6's **2,000,000 ITPM**. The reason: Opus traffic is lower-volume per customer, so Anthropic provisions per-org headroom generously to encourage usage of the flagship reasoning model.

**Claude Fable 5** (Anthropic's newest flagship as of June 2026) sits between Opus and Sonnet on ITPM scaling. At Tier 1: **100,000 ITPM / 20,000 OTPM**. At Tier 4: **4,000,000 ITPM / 800,000 OTPM**. Fable 5 is positioned for production workloads that need Opus-class quality on a larger volume than Opus pricing supports — a middle tier that Anthropic now markets as the default for new builds.

**Claude Opus 4.7** has the **most generous ITPM ceiling** but the **most aggressively capped RPM-to-token ratio** — meaning at Tier 1, 50 RPM with 500,000 ITPM means each request can average 10,000 input tokens before you saturate. At Tier 4, 4,000 RPM with 10,000,000 ITPM gives you 2,500 input tokens per request on average before ITPM is the binding constraint. For high-volume Opus traffic, ITPM is rarely the gate — RPM is.

**Claude Sonnet 4.6** is the most-deployed Claude model in production as of 2026 and has the **tightest ITPM ceiling**. At Tier 4, 2,000,000 ITPM is generous in absolute terms but represents the *combined* limit across Sonnet 4.5 and Sonnet 4.6 traffic in your organization — they share the bucket. Teams that A/B test between Sonnet versions need to plan against the combined ceiling, not against each version individually.

**Claude Haiku 4.5** has **generous OTPM scaling**: at Tier 4 it reaches **800,000 OTPM** (matching Fable 5 and Opus). Haiku is the right choice for high-volume output workloads — short-answer classification, structured-JSON extraction, lightweight chat — where the cost-per-token and the OTPM headroom both matter. The deprecated Claude Haiku 3.5 (retired everywhere except Bedrock and Vertex AI) has a tighter ceiling (400,000 ITPM / 80,000 OTPM at Tier 4) and additionally counts cached reads toward ITPM, which Haiku 4.5 does not.


Prompt caching: the lever that multiplies effective ITPM

This is the single most important rate-limit lever on Claude — and the one most teams miss. **On every Claude model except the deprecated Haiku 3.5, `cache_read_input_tokens` do NOT count toward your ITPM ceiling.** Cached reads are billed at 10% of base input price *and* consume zero ITPM budget. The only tokens that count toward ITPM are `input_tokens` (uncached tokens after the last cache breakpoint) and `cache_creation_input_tokens` (tokens being written to cache for the first time).

The throughput multiplier is dramatic. Anthropic's own documentation gives this example: **with a 2,000,000 ITPM ceiling and an 80% cache hit rate, you can effectively process 10,000,000 total input tokens per minute** — 2M uncached plus 8M cached. That's a 5x effective rate-limit increase from prompt caching alone, with no tier upgrade and no extra cost beyond the cache-write surcharge (which is paid once per cached prefix, not per request).

The structural play: **build prompts with a stable, long, cacheable prefix and a short, variable suffix**. System instructions, tool definitions, large reference documents, and conversation history should sit *above* the cache breakpoint. The user's per-request question, the dynamic context, and the retrieved chunks specific to this call should sit *below* the breakpoint. On a typical RAG agent with a 50k-token system prompt + tool definitions + few-shot examples (cacheable) plus a 2k-token retrieved context + user query (uncached), the ITPM-billed portion is the 2k tail. The 50k prefix is essentially free against your rate limit.

Two operational notes. **First**: monitor cache hit rate on the Usage page in the Claude Console. If your cache hit rate is under 60%, your prompt architecture is leaving rate-limit headroom on the table. **Second**: `input_tokens` in the API response only counts tokens *after* the last cache breakpoint — it can look misleadingly small. Total input is `cache_read_input_tokens + cache_creation_input_tokens + input_tokens`. Log all three or your token-accounting math will be wrong.


Message Batches API: a separate quota pool with 50% discount

The **Message Batches API** runs on a **completely separate rate-limit pool** from the real-time Messages API. Hitting your Sonnet 4.6 ITPM ceiling on synchronous traffic does not affect your batch throughput at all — they share nothing. This is the cleanest workaround for teams that need higher effective throughput without waiting on Tier promotion or negotiating a Custom tier.

The Message Batches limits scale with tier. At **Tier 1**: 50 RPM, **100,000 batch requests in processing queue**, 100,000 requests per batch. At **Tier 2**: 1,000 RPM, **200,000 in queue**. At **Tier 3**: 2,000 RPM, **300,000 in queue**. At **Tier 4**: 4,000 RPM, **500,000 in queue**. The processing queue limit is the binding constraint — it caps how many in-flight batch requests can be pending simultaneously. The per-batch limit (100,000 requests per submitted batch) is identical across all tiers.

Pricing: **Message Batches run at 50% off both input and output token prices**, identical to OpenAI's Batch API discount. Completion happens within 24 hours; most batches complete much faster. This makes batch the right answer for any non-real-time workload: evaluation runs, training-set generation, weekly classification jobs, A/B variant generation, embedding precomputes for retrieval indices.

For Claude-specific patterns and per-tier batch behavior, see our dedicated Anthropic Message Batches limits page. The summary: batch is not just a cost play, it's a *rate-limit play*. A team rate-limited on real-time Sonnet 4.6 ITPM can move 80%+ of its async workload to batch and reserve real-time budget for the genuinely synchronous traffic — without any tier change.


429 vs 529: the error you get tells you what to do

Claude returns **two distinct error codes** that look similar but mean different things. **HTTP 429** is a rate-limit error — *your* organization has exceeded a per-minute ceiling (RPM, ITPM, or OTPM). **HTTP 529** is a capacity-overload error — *Anthropic's* infrastructure is at capacity right now and cannot accept more traffic, independent of your org's limits. They require different retry strategies.

**429 handling**: the response includes a `retry-after` header indicating seconds to wait, plus `anthropic-ratelimit-{requests,input-tokens,output-tokens}-{limit,remaining,reset}` headers showing exactly which budget you exhausted. Production pattern: read `retry-after` and wait that long, then resume. For sustained ITPM saturation, the right answer is not aggressive retry — it's prompt caching (above) or Tier promotion (below).

**529 handling**: there is no `retry-after` header (or it is conservative). The right pattern is **exponential backoff with jitter**, capped at 60 seconds — 1s, 2s, 4s, 8s, 16s, 32s with ±25% jitter. 529s typically clear within 1-5 minutes during normal capacity squeezes, longer during major model launches or incidents. Anthropic posts capacity events at status.anthropic.com; subscribe to incident feeds for production-critical workloads.

Two extra mechanics worth knowing. **Acceleration limits**: if your organization's traffic spikes sharply, you can hit 429s even when your stated ITPM/OTPM/RPM look fine — Anthropic enforces a separate acceleration limit that requires gradual ramps. Plan production rollouts as graduated load increases over 15-30 minutes rather than instant flips. **Priority Tier**: Anthropic's Priority Tier offers committed-spend service levels with separate, higher rate-limit pools and explicit capacity guarantees — the right answer if your workload cannot tolerate 529s. See Service Tiers for the commitment levels.


The path from Tier 1 to Tier 4: spend thresholds and what each unlocks

**Tier 1** ($5 in credit purchases): 50 RPM across all models. Sonnet 4.6 ITPM 30k / OTPM 8k. Opus 4.7 ITPM 500k / OTPM 80k. Fable 5 ITPM 100k / OTPM 20k. Haiku 4.5 ITPM 50k / OTPM 10k. Monthly spend cap: **$500**. This tier supports prototype + small-team production work — enough for a 50-DAU Claude-backed app or an internal-tool deployment. The Opus ITPM ceiling of 500k is the standout — Opus traffic is unusually generous at Tier 1 because per-token pricing is high.

**Tier 2** ($40 in credit purchases): 1,000 RPM. Sonnet 4.6 jumps to ITPM 450k / OTPM 90k (15x ITPM increase from Tier 1). Opus 4.7 jumps to ITPM 2M / OTPM 200k. Fable 5 to ITPM 500k / OTPM 100k. Haiku 4.5 to ITPM 450k / OTPM 90k. Monthly spend cap stays at **$500**. This is the typical 'real production traffic' tier for early-stage SaaS — supports 500-1,000 DAU with caching applied.

**Tier 3** ($200 in credit purchases): 2,000 RPM. Sonnet 4.6 ITPM 800k / OTPM 160k. Opus 4.7 ITPM 5M / OTPM 400k. Fable 5 ITPM 1.5M / OTPM 300k. Haiku 4.5 ITPM 1M / OTPM 200k. Monthly cap rises to **$1,000**. This tier handles mid-scale production for most B2B SaaS — supports 5k+ DAU on a well-cached prompt architecture.

**Tier 4** ($400 in credit purchases): 4,000 RPM. Sonnet 4.6 ITPM 2M / OTPM 400k. Opus 4.7 ITPM 10M / OTPM 800k. Fable 5 ITPM 4M / OTPM 800k. Haiku 4.5 ITPM 4M / OTPM 800k. Monthly cap rises to **$200,000**. This is the top of the *standard* tier ladder — sufficient for most production workloads under $200k/month combined spend. Above this, you negotiate a Custom tier.

**Total credit purchases to reach Tier 4 from zero: $400.** Promotion is immediate at each threshold. A team that wants Tier 4 capacity for a Tuesday product launch can buy $400 of credit on Monday and be at Tier 4 within minutes. This is the single biggest practical difference from OpenAI, where Tier 5 requires a 30-day wait regardless of how fast you spend.


Custom / Monthly Invoicing tier: when to switch and what you get

Above Tier 4 sits **Monthly Invoicing** (also called the Custom tier), Anthropic's equivalent of an enterprise contract. It is not on the self-serve ladder — you contact sales via console.anthropic.com/settings/limits and a Custom tier is provisioned for your organization. The default payment terms are **Net-30 invoicing** instead of pre-paid credits, which simplifies enterprise procurement.

What you actually get at Custom tier: **(1) no monthly spend cap** — Tier 4's $200k/month ceiling is removed; (2) **negotiated per-model rate-limit ceilings** above the Tier 4 levels (typical custom-tier orgs run at 2-10x Tier 4 throughput); (3) **a named account team** including a CSM and a technical contact; (4) **Service Tier eligibility** for committed-spend Priority Tier with explicit capacity guarantees; (5) **custom data-handling and security review** for HIPAA, FedRAMP, or other regulatory environments; (6) **support SLA** with response-time commitments.

Threshold for serious Custom-tier conversations: typically **$5,000+/month committed spend** for a baseline contract, **$20k+/month** for the more interesting throughput negotiations. Below that, Tier 4 self-serve is faster and cleaner — you don't need an account team to move 50 RPM more on Sonnet.

Indicators you should switch to Custom: (a) consistently saturating Tier 4 ITPM or OTPM during peak hours; (b) approaching the $200k/month spend cap; (c) regulatory or procurement requirement for a signed contract with IP indemnity; (d) need for Priority Tier capacity guarantees to handle a public launch or known traffic event; (e) need for regional inference (`inference_geo` controls beyond the shared pool).


Claude vs GPT-5 rate limits: the migration decision

Teams choosing between Anthropic and OpenAI for production typically compare three dimensions: **ceiling height**, **promotion friction**, and **caching multiplier**. Claude wins on caching and promotion friction; OpenAI wins on absolute RPM ceiling at the top tier; the ITPM/OTPM comparison depends entirely on how cache-friendly your workload is.

**Promotion friction**: Anthropic's $400-purchase path to Tier 4 takes one transaction and clears immediately. OpenAI's $1,000-paid-usage path to Tier 5 requires 30 days minimum since first successful payment, with no skip available. For teams launching production traffic this week, this is a 30-day gap that matters.

**Caching multiplier**: Claude's policy that `cache_read_input_tokens` don't count toward ITPM is structurally different from OpenAI's caching. OpenAI's prompt caching gives you a price discount (50% off cached input on most models as of 2026) but cached tokens still count toward your TPM budget. Anthropic gives you both — the price discount (10% of base) *and* the rate-limit exemption. On a workload with 80% cache hit rate, Claude's effective ITPM is 5x its stated limit; OpenAI's is unchanged.

**Absolute ceiling at the top**: OpenAI's Tier 5 RPM on flagship models is higher in raw numbers than Anthropic's Tier 4 RPM (both top out at 4,000 RPM on Anthropic vs ~10,000+ on OpenAI flagship at Tier 5 as of mid-2026 — verify against your account). For very-high-RPM low-token-per-request workloads (real-time classification, edge inference), OpenAI's ceiling is meaningfully higher. For high-token-per-request workloads (long-context agents, RAG with large retrieved chunks), Anthropic's combination of generous ITPM + caching exemption is the winning architecture.

Migration tactic for teams considering the switch: **run a 2-week parallel test** with the same prompt structure on both providers. Measure cache hit rate, ITPM/TPM utilization, and 429/529 incidence per model. The team that picks based on the spec sheet often picks wrong; the team that picks based on observed production behavior usually picks right. For the OpenAI side of this comparison, see OpenAI Tier 5 unlock requirements.


Sourcing and live-verify checklist

**Per-tier table source**: Anthropic's official rate-limits documentation at docs.anthropic.com/en/api/rate-limits, fetched 2026-06-20. The full ITPM/OTPM tables per model per tier are listed verbatim on that page in the tabbed view (Tier 1 / Tier 2 / Tier 3 / Tier 4 / Custom). The spend thresholds ($5, $40, $200, $400) and monthly spend caps ($500, $500, $1,000, $200,000) appear in the 'Requirements to advance tier' table on the same page.

**Cache-aware ITPM policy** is documented in the 'Cache-aware ITPM' section of the same doc. The footnote that flags Claude Haiku 3.5 as the exception (the only model that counts `cache_read_input_tokens` toward ITPM) is verbatim from the docs.

**Message Batches per-tier limits** are documented in the 'Message Batches API' section of the same page. Cross-reference our deeper-dive Anthropic Message Batches limits for Batches-specific patterns.

**Live-verify when you budget**: open console.anthropic.com/settings/limits when logged into your Anthropic account. The page shows your live tier, your per-model ITPM/OTPM ceilings (including any custom adjustments), and your current monthly spend against the cap. The Rate Limits API returns the same data programmatically — useful for capacity-planning dashboards.

**Account-level rate-limit headers**: every API response includes `anthropic-ratelimit-input-tokens-{limit,remaining,reset}` and `anthropic-ratelimit-output-tokens-{limit,remaining,reset}` showing your live consumption against each ceiling. Log these on every call to track utilization without separate dashboard queries.

**If the docs page returns 404 or redirects**: as of June 2026, docs.anthropic.com/en/api/rate-limits 301-redirects to platform.claude.com/docs/en/api/rate-limits (the rebranded developer portal). Both URLs serve the same content. If both fail, check docs.claude.com for the current canonical path.

**Why this page exists**: ChatGPT and Perplexity routinely cite outdated rate-limit numbers from Reddit threads, blog posts from 2024, or Anthropic forum posts that pre-date the Fable 5 launch. The official doc is authoritative but tab-gated (the per-tier numbers live behind a tab control that some scrapers miss). This page exists to be the clean, single-URL, dated reference that AI engines have a better citation target for than the noisy secondary sources. If you found this page via ChatGPT or Perplexity, the mechanism is working.

Step-by-step: unlocking Anthropic Tier 4

  1. 1

    Create your organization at console.anthropic.com

    Sign up at console.anthropic.com and complete the standard onboarding. You start at Tier 0 (no API access until you make your first credit purchase). Add a payment method — cards clear faster than ACH/wire transfers, and the credit-purchase threshold is what advances the tier, not days elapsed.

  2. 2

    Buy $5 of credit to unlock Tier 1

    Navigate to console.anthropic.com/settings/billing and purchase $5 of API credit. Promotion to Tier 1 is immediate — your organization unlocks 50 RPM, 30k ITPM and 8k OTPM on Sonnet 4.6 (and the corresponding per-model ceilings on Fable 5, Opus 4.7, Haiku 4.5). The monthly spend cap at Tier 1 is $500.

  3. 3

    Stack credit purchases to $40 for Tier 2, $200 for Tier 3, $400 for Tier 4

    Each threshold is on *cumulative credit purchases* (not consumed usage). A single $400 transaction promotes you to Tier 4 immediately. Tier 1 and Tier 2 have a max single-transaction deposit of $500, so you can hit Tier 4 in a single $400 purchase. Tier 3 and Tier 4 raise the max single transaction to $1,000 and $200,000 respectively.

  4. 4

    Verify your live tier and per-model ceilings

    Open console.anthropic.com/settings/limits. The page shows your current tier, per-model RPM/ITPM/OTPM ceilings (including any custom adjustments above the standard table), and your current monthly spend against the cap. If the page disagrees with the standard tier table, the live page wins — it reflects any account-specific overrides.

  5. 5

    Architect for the cache to multiply your effective ITPM

    Tier 4's stated 2,000,000 Sonnet 4.6 ITPM becomes effective 10,000,000 ITPM at 80% cache hit rate. Front-load your prompts with the stable, cacheable prefix (system instructions, tool definitions, large reference documents); put the variable, per-request content (user query, retrieved chunks) below the last cache breakpoint. Monitor your cache hit rate on the Usage page — under 60% means you're leaving rate-limit headroom unclaimed.

Frequently Asked Questions

What does ITPM mean on the Claude API?

ITPM stands for **input tokens per minute** — the maximum number of input tokens your organization can submit to a specific Claude model per minute. It is one of three rate-limit ceilings Anthropic enforces (alongside RPM and OTPM) and is the binding constraint for most context-heavy workloads (agents, RAG, long conversations). For most models, only uncached input tokens count toward ITPM — `cache_read_input_tokens` do not consume your ITPM budget.

Why doesn't Claude use a combined TPM like OpenAI?

Anthropic splits input and output budgets because Claude workloads tend to be heavily input-skewed (long contexts, large tool definitions, retrieved documents) and lightly output-skewed (concise answers, structured JSON). Separating ITPM and OTPM lets Anthropic provision generous input ceilings (typically 5x the output ceiling at the same tier) without giving away the output capacity. It also gives you cleaner cost visibility — you can tune input and output independently in app design.

Do cached reads count toward Claude's ITPM rate limit?

On every Claude model except the deprecated Claude Haiku 3.5, **no** — `cache_read_input_tokens` do not count toward your ITPM budget. Only `input_tokens` (uncached tokens after the last cache breakpoint) and `cache_creation_input_tokens` (tokens being written to cache) consume ITPM. This is why prompt caching is the highest-leverage rate-limit play on Claude: an 80% cache hit rate gives you ~5x effective ITPM with zero tier change.

What is a 529 error and how does it differ from 429?

A **429** means your organization exceeded a per-minute rate-limit ceiling (RPM, ITPM, or OTPM) — your problem, fixable via caching, tier promotion, or batching. A **529** means Anthropic's infrastructure is at capacity right now — their problem, not yours, fixable only by exponential-backoff retry until capacity returns. 429s include a `retry-after` header; 529s typically don't and require client-side backoff with jitter (1s, 2s, 4s, 8s, 16s, 32s capped at 60s).

Can I share API keys across tiers or organizations?

Rate limits are enforced at the **organization** level, and API keys belong to a single organization — you cannot share a key across organizations. Within a single org, all API keys share the same rate-limit budget; creating a second key in the same org gives you a separate audit identity but no additional rate-limit headroom. To get a separate rate-limit budget, create a separate organization with its own billing and its own credit-purchase history.

How does the Message Batches API help with rate limits?

Message Batches runs on a **completely separate rate-limit pool** from the real-time Messages API. Saturating your Sonnet 4.6 ITPM ceiling on synchronous traffic does not affect batch throughput at all. Batch jobs run at **50% off both input and output** with a 24-hour completion window, and the per-tier limits are generous (Tier 4 allows 500,000 batch requests in the processing queue). For any non-real-time workload, batch is the cleanest rate-limit workaround. See our Anthropic Message Batches limits page for tier-by-tier details.

Tier 4 vs Custom: when should I switch?

Switch from Tier 4 self-serve to Custom (Monthly Invoicing) when one of: (a) you consistently saturate Tier 4 ITPM or OTPM during peak hours; (b) you're approaching the $200,000/month Tier 4 spend cap; (c) procurement requires a signed contract with IP indemnity or custom data-handling terms; (d) you need Service Tier / Priority Tier capacity guarantees for a known traffic event; (e) you need named-account-team support. Threshold for serious Custom conversations: typically $5k+/month committed spend.

How do I migrate from OpenAI Tier 5 to Claude Tier 4?

Operationally: (1) create an Anthropic organization at console.anthropic.com; (2) buy $400 of credit to unlock Tier 4 immediately (no 30-day wait); (3) translate your OpenAI prompt structure to Claude's caching model — front-load stable content above the cache breakpoint to capture the ITPM exemption; (4) re-architect for split ITPM/OTPM instead of combined TPM (your tuning levers are different); (5) run 2 weeks of parallel traffic on both providers and compare cache hit rate, utilization, and 429/529 incidence per model. Most teams find Claude's caching multiplier compensates for any raw-ceiling gap at the top tier. See OpenAI Tier 5 unlock requirements for the OpenAI-side context.

Tier 4 raises Claude's ceiling. Caching captures the savings.

ITPM is the binding constraint on Claude apps. The fix isn't a higher tier — it's a prompt whose cache-anchor stays stable across calls. Our AI Prompt Generator writes Claude-tuned prompts (Opus, Sonnet, Haiku, Fable) with the cacheable prefix up top, based on YOUR business + task. 14-day free trial, no card.

Browse all prompt tools →