AI Rate Limits & Quotas
Hitting a 429 is almost always a tier problem, not a code problem. Every page on this hub documents the exact RPM, TPM, batch cap, and tier-unlock path for one provider — sourced from the provider's live limits dashboard.
If you're choosing a provider, start with the limits page that matches your traffic profile. If you're already on one, bookmark the unlock path so your next scale-up doesn't take a week.
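One rough way to describe a traffic profile before comparing limits pages is to convert expected load into peak requests per minute and peak tokens per minute. The sketch below is only an illustration: the `TrafficProfile` class, the `required_limits` function, the burst factor, and the sample numbers are all assumptions, not figures from any provider.

```python
from dataclasses import dataclass


@dataclass
class TrafficProfile:
    """Hypothetical description of expected API traffic."""
    requests_per_day: int
    avg_input_tokens: int
    avg_output_tokens: int
    burst_factor: float = 3.0  # assumed peak-to-average ratio; tune to your own traffic


def required_limits(profile: TrafficProfile) -> dict:
    """Estimate the peak per-minute rates a tier would need to cover."""
    avg_rpm = profile.requests_per_day / (24 * 60)
    peak_rpm = avg_rpm * profile.burst_factor
    return {
        "peak_rpm": peak_rpm,
        "peak_input_tpm": peak_rpm * profile.avg_input_tokens,
        "peak_output_tpm": peak_rpm * profile.avg_output_tokens,
    }


# Illustrative numbers only: 144,000 requests/day, 1,000 input and 200 output tokens each.
estimate = required_limits(TrafficProfile(144_000, 1_000, 200, burst_factor=2.0))
print(estimate)
```

Compare the estimates against the RPM and token-per-minute figures on the relevant limits page. For providers that publish a single TPM figure, the sum of the two token rates is the closer comparison, though exact counting rules vary by provider.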
12 pages · updated 2026
Anthropic Message Batches API Limits (2026): 100k Requests, 256MB, 24h, 50% Off
Exact limits for Anthropic's Message Batches API in 2026: 100,000 requests per batch, 256MB max payload, 24-hour processing SLA (most batches finish in under 1 hour), 29-day results retention, 50% discount on input + output + cache writes. Separate quota pool from real-time Messages. Sourced from Anthropic's batch-processing docs.
Azure OpenAI Quota Management 2026: TPM, PTU, Regional Caps & Increase Requests
Canonical 2026 reference for Azure OpenAI quota: per-subscription/per-region/per-model TPM, deployment SKUs (Standard, Global Standard, Data Zone, Regional Provisioned, Global Provisioned), Quota Tiers (0-6), default gpt-5.5 / gpt-5.4 allocations, PTU sizing and hourly billing, the quota increase request form, Dynamic Quota + spillover, and Azure vs OpenAI direct migration math. Sourced from Microsoft Learn, June 2026.
Claude API Rate Limits 2026: RPM, ITPM, OTPM by Tier and Model
Exact Claude API rate limits in 2026 across Tier 1, 2, 3, 4, and Custom. Per-tier RPM, ITPM (input tokens per minute), and OTPM (output tokens per minute) for Claude Fable 5, Opus 4.7, Sonnet 4.6, and Haiku 4.5. Why Anthropic splits ITPM/OTPM instead of using combined TPM, how prompt caching multiplies effective throughput, Message Batches as a separate quota pool, 429 vs 529 handling, and the Tier 4 unlock path. Sourced from Anthropic's official rate-limits documentation.
DALL·E 3 Rate Limit by Tier (2026): Full IPM Table + Workarounds
Exact DALL·E 3 rate limits at every OpenAI usage tier in 2026: Free → Tier 5, images per minute, per-image prices by resolution and quality, batch and concurrency workarounds, and what to do when you hit the cap. Sourced from OpenAI's live model documentation.
Fireworks AI Rate Limits 2026: Developer, Enterprise, On-Demand Deployments
Exact Fireworks AI rate limits in 2026 — Developer spending-tier ladder ($50 → $50,000 monthly caps), the 6,000 RPM account-wide ceiling, per-model serverless defaults for Llama 3.3 70B, DeepSeek V3/R1, Qwen 2.5, FireFunction V2, FLUX.1, the on-demand deployment alternative (per-GPU-hour, no rate limit), Business + Enterprise upgrade path, and the 429 vs 503 distinction. Sourced from Fireworks' live docs.
Gemini API Rate Limits 2026: Free Tier, Paid Tiers, Per-Model Quotas
The canonical 2026 reference for Google Gemini API rate limits. Free, Tier 1, Tier 2, Tier 3 thresholds; per-model RPM, TPM, RPD on Gemini 2.5 Pro, 2.5 Flash, 2.5 Flash-Lite; AI Studio vs Vertex AI quota systems; 429 / RESOURCE_EXHAUSTED handling; Batch API + Context Caching levers. Sourced from Google's live rate-limits documentation.
Groq API Rate Limits 2026: RPM, TPM, RPD, TPD per Model — Free vs Dev vs Enterprise
Exact Groq API rate limits in June 2026 across Free, Developer, and Enterprise tiers. RPM, TPM, RPD, and TPD per model for Llama 3.3 70B Versatile, Llama 3.1 8B Instant, DeepSeek R1 Distill 70B, Qwen 2.5 32B, GPT-OSS 120B, and Whisper Large v3 Turbo. Which dimension binds first, how to upgrade, and how Groq's LPU speed translates to actual throughput. Sourced from console.groq.com/docs.
OpenAI Batch API Limits 2026: Per-Tier Enqueued Tokens, 200MB Files, 24h SLA
Exactly what the OpenAI Batch API allows in 2026. Per-tier enqueued-token caps (Tier 1 → 5), 200MB max file size, 50,000 requests per batch, 24-hour SLA, the 50% discount on input + output, JSONL + custom_id schema, partial-completion behavior, and the Batch-vs-real-time decision tree. Sourced from OpenAI's live Batch API documentation.
OpenAI Tier 1 vs Tier 5 (2026): What Each Usage Tier Unlocks
What every OpenAI usage tier unlocks in 2026 — Free → Tier 5. Monthly caps, indicative gpt-5.5 RPM/TPM, image limits, fine-tuning, batch quotas, prompt cache eligibility, priority routing. Sourced from OpenAI's rate-limits doc.
OpenAI Tier 5 Unlock Requirements (2026): The Canonical Doc
Exactly what it takes to unlock OpenAI usage Tier 5 in 2026. $1,000 paid + 30 days since first payment. Full thresholds for every tier (Free → Tier 5), monthly usage caps, rate-limit gains per tier, payment-history strategies, common stuck-at-Tier-4 traps. Sourced from OpenAI's official rate-limits page.
Replicate Rate Limits 2026: Predictions/Sec, Concurrency & Cold Starts
Exact Replicate rate limits in 2026: 600 predictions/min default, per-model concurrency caps, and the 30-90s cold-start problem that dominates latency. When to switch to always-on dedicated deployments, GPU class pricing (A100 vs H100 vs L40S), webhooks for long-running predictions, and self-hosted Cog. Sourced from Replicate's live docs.
Together AI Rate Limits 2026: Build, Scale, Enterprise — Per-Model Ceilings
Exact Together AI rate limits in 2026 across Build, Scale, and Enterprise tiers. Per-model RPM/TPM for Llama 3.3 70B/8B, DeepSeek R1, Qwen 2.5, FLUX.1, BGE embeddings. When to switch from serverless to dedicated endpoints (per-GPU-hour math). 429 handling, embedding + fine-tuning quotas, batch API. Sourced from Together's live docs.