What the IPM ceiling actually means
The IPM (images per minute) limit is enforced at the organization level: every API key in the same OpenAI org draws from the same bucket. A 5,000 IPM ceiling at Tier 3 is the *total* across all keys, all environments, and all team members. Requests that burst above the limit return HTTP 429 with a `retry-after` header indicating how long to wait before retrying.
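A minimal sketch of how a client might honor that header is shown below. The names `retry_delay`, `call_with_retry`, and `send` are hypothetical helpers, not part of any OpenAI SDK, and the sketch assumes `retry-after` carries a number of seconds; when the header is missing or non-numeric it falls back to capped exponential backoff.

```python
import time
from typing import Any, Callable, Mapping, Tuple

def retry_delay(headers: Mapping[str, str], attempt: int,
                base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before the next attempt.

    Assumption: retry-after is a number of seconds. Header names are
    case-insensitive in HTTP, so keys are lowercased before lookup.
    """
    lowered = {k.lower(): v for k, v in headers.items()}
    value = lowered.get("retry-after")
    if value is not None:
        try:
            return max(0.0, float(value))
        except ValueError:
            pass  # not a plain number; fall through to backoff
    return min(cap, base * (2 ** attempt))

def call_with_retry(send: Callable[[], Tuple[int, Mapping[str, str], Any]],
                    max_attempts: int = 5,
                    sleep: Callable[[float], None] = time.sleep) -> Tuple[int, Any]:
    """Call send() until it returns a non-429 status or attempts run out.

    send is a hypothetical callable returning (status, headers, body).
    """
    status, body = 429, None
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            break
        if attempt < max_attempts - 1:
            sleep(retry_delay(headers, attempt))
    return status, body
```

Because the bucket is shared across the whole org, every worker that can hit the limit needs this kind of handling; a single misbehaving client that retries immediately takes capacity away from all the others.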
DALL·E 3 has no separate tokens-per-minute (TPM) limit the way chat models do; image generation is gated purely on IPM. This matters when comparing against the newer per-token image models (gpt-image-1.5, gpt-image-2), which have both IPM and TPM ceilings. For high-volume, cost-sensitive image workloads, DALL·E 3's pure-IPM model is simpler to plan against.
Burst tolerance is short. The 5,000 IPM ceiling at Tier 3 does not mean you can generate 5,000 images in the first second and then idle for the remaining 59. Enforcement is token-bucket style: capacity refills at a steady rate (about 83 images per second at 5,000 IPM) rather than resetting once a minute. Sustained traffic at 80-90% of your ceiling is typical for production teams; spikes to 100% usually return 429s on at least some requests.
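To make the token-bucket behavior concrete, here is a small simulation. It is a sketch of the general technique, not OpenAI's actual enforcement: the class name `TokenBucket` is hypothetical, and the burst capacity of 100 is an assumed value chosen only for illustration, since the real burst allowance is not stated here.

```python
class TokenBucket:
    """Generic token bucket: refills continuously, holds at most `capacity`."""

    def __init__(self, rate_per_minute: float, capacity: float, now: float = 0.0):
        self.rate = rate_per_minute / 60.0  # tokens added per second
        self.capacity = capacity            # assumed burst size (illustrative)
        self.tokens = capacity              # start with a full bucket
        self.last = now                     # time of the last refill, in seconds

    def try_acquire(self, now: float, n: float = 1.0) -> bool:
        """Refill for the elapsed time, then take n tokens if available."""
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False  # the caller would see this as an HTTP 429


# Fire 5,000 requests at t=0 against a 5,000 IPM limit.
bucket = TokenBucket(rate_per_minute=5000, capacity=100)
accepted = sum(bucket.try_acquire(now=0.0) for _ in range(5000))
rejected = 5000 - accepted
# accepted == 100, rejected == 4900: only the burst capacity gets through
```

Under these assumptions, a front-loaded burst is mostly rejected even though the per-minute total never exceeds the ceiling. Spreading the same 5,000 requests evenly across the minute keeps each one within the refill rate, which is why pacing clients slightly below the ceiling tends to be more reliable than sending in bursts.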