Per-token price is at parity — but PTU minimums change the math
The first myth to retire: 'Azure charges more per token.' In June 2026 this is false. The published per-million-token rates on Azure OpenAI Service exactly match OpenAI direct for every shared model — gpt-5.4 at $2.50 input / $15.00 output, gpt-5.4-mini at $0.25 / $2.00, gpt-5-nano at $0.05 / $0.40, the embeddings line at $0.13 / 1M for text-embedding-3-large. Cached-input pricing is also at parity (50% discount on both). If your workload is pure pay-as-you-go, you pay the same number for the same number of tokens against the same model on either platform.
Where the math diverges is PTU — Azure's Provisioned Throughput Unit deployment model. A PTU buys you a reserved slice of model capacity with guaranteed throughput, predictable latency, and isolation from other tenants. The standard PTU SKU starts at 100 PTU minimum, priced at roughly $8-12 per PTU per hour (varies by model and region — gpt-5.4 runs higher than gpt-5.4-mini, East US 2 is cheaper than EU regions). 100 PTU × $10/hr × 730 hrs/month ≈ **$73,000/month** at list, or **$24,000-30,000/month** with typical 1-year reserved commitments.
PTUs are billed for **reserved capacity, not consumed tokens**. If you deploy 100 PTU and only use 40% of it, you still pay for 100. That idle-capacity waste is the hidden Azure premium that no published comparison surfaces. Teams that try Azure without doing the utilization math end up paying $30k/month for what would have been $12k of pay-as-you-go traffic on OpenAI direct.
The decision rule: PTU only makes sense if (a) you have predictable, sustained throughput well above the PTU floor, (b) the latency / capacity guarantees are load-bearing for your product (real-time voice, low-latency agentic loops), or (c) you have a Microsoft EA discount large enough to absorb the commitment. Below ~$50k/month of consistent OpenAI spend, the PTU math almost never pencils.