Skip to contentNew: Does ChatGPT recommend your brand? Free 60-second AI visibility check →
By The DDH Team · Digital Dashboard Hub

GPT-5 Cost Calculator (2026)

By The DDH Team at Digital Dashboard HubUpdated

Stop writing AI prompts from scratch.

Tell us your business + your task + your model. We write the prompt — perfectly tuned for ChatGPT, Claude, Grok, Gemini, Midjourney, or any model. Plus 500+ pre-built prompts in your library.

14 days, no card. Cancel in 2 clicks.

GPT-5 is not one model. It is a four-tier family — GPT-5.5, GPT-5.5 Pro, GPT-5.4, and GPT-5.4-mini — released over the 2025-2026 cycle, each tuned to a different point on the cost / capability curve. As of June 2026 the spread between cheapest and most expensive runs 60x on input and 120x on output, which means picking the wrong tier is the most expensive mistake you can make before you write a single line of prompt.

Every GPT-5 model bills the same way: a per-1M-token price on input (the prompt, system message, tools, replayed history) and a separate per-1M-token price on output (the response, plus reasoning tokens on Pro). Output is 6x input on the standard tiers and exactly 6x on Pro. Two discounts stack on top: cached input bills at roughly 10% of the standard input rate (a 90% discount on the cached portion) and the Batch API takes 50% off both input and output for jobs that can wait up to 24 hours. Used together on a structured prompt, the same workload runs at 40-60% of the standard price.

This page is the GPT-5 specific drill-down. For the broader OpenAI API price comparison across legacy and o-series models, see our OpenAI API cost calculator. For the o-series reasoning models that sit alongside GPT-5, see o1 reasoning cost. For free, GPT-5-tuned prompts that hit cache and cap output by default, try the ChatGPT prompt generator.

Below: the full June 2026 price table for the GPT-5 family, the canonical cost formula, four worked examples (1k calls, 100k, 1M, and a 5-turn agent loop), tier-selection guidance, GPT-5-specific capability notes (1M-token context, native vision, real-time mode), the discount stack, and the FAQ that covers the questions teams actually ask on their first GPT-5 invoice.

Digital Dashboard Hub

Writing good prompts for ONE AI is hard. Writing them for GPT-5, Claude, Gemini, Perplexity, Midjourney and 6 more is a full-time job. DDH's AI Prompt Builder writes once, runs everywhere — locked to your niche, voice, and brand tone.

Free 14 days, no card.

GPT-5 family price per 1M tokens — June 2026

Feature
Input ($/1M)
Cached input ($/1M)
Output ($/1M)
GPT-5.5 Pro$30.00$3.00$180.00
GPT-5.5$5.00$0.50$30.00
GPT-5.4$2.50$0.25$15.00
GPT-5.4-mini$0.50$0.05$1.50

Source, as of June 2026: OpenAI pricing (https://developers.openai.com/api/docs/pricing). Cached-input pricing applies to prompt-cache hits only — cache misses bill at the standard input rate. Batch API: 50% off both input and output for asynchronous jobs with up to 24-hour delivery. Priority tier (faster routing) bills at approximately 2x standard. GPT-5.5 Pro output includes reasoning tokens generated internally even when not returned to the caller.

The GPT-5 cost formula

Every GPT-5 call uses the same per-token math. No platform fee, no per-call fee, no minimum invoice. You pay for tokens in and tokens out, at the chosen model's per-1M rate:

``` cost = (input_tokens / 1,000,000) × input_price_per_M + (output_tokens / 1,000,000) × output_price_per_M ```

Two adjustments stack on top. Prompt-cache hits — portions of your input prefix that OpenAI cached because you sent them recently — bill at the cached-input rate (10% of the standard input price across every GPT-5 tier). Long stable system prompts and reused tool schemas are the typical winners. The Batch API takes a flat 50% off both input and output for asynchronous jobs delivered within 24 hours. The discounts compose: a cached + batched GPT-5.5 call pays $0.50/1M cached input divided by 2 = $0.25/1M on the cached portion, and $30/1M output divided by 2 = $15/1M on output.

On GPT-5.5 Pro, reasoning tokens generated internally before the visible response bill at the $180/1M output rate, the same as the answer text. A query that triggers 3,000 reasoning tokens to produce a 500-token answer bills 3,500 output tokens. Budget for a 3-8x reasoning multiplier on Pro if the task is non-trivial. Standard GPT-5.5 and GPT-5.4 do not surface chain-of-thought; their output bill matches the response length.


Worked example 1: a single 1,000-in / 500-out call

A representative call — a 1,000-token prompt returning a 500-token answer, roughly a 750-word brief in and a 375-word reply out. At standard rates across the GPT-5 family:

GPT-5.5 Pro: (1000 / 1,000,000) × $30.00 + (500 / 1,000,000) × $180.00 = $0.030 + $0.090 = **$0.120 per call**.

GPT-5.5: 0.001 × $5.00 + 0.0005 × $30.00 = $0.005 + $0.015 = **$0.020 per call**.

GPT-5.4: 0.001 × $2.50 + 0.0005 × $15.00 = $0.0025 + $0.0075 = **$0.010 per call**.

GPT-5.4-mini: 0.001 × $0.50 + 0.0005 × $1.50 = $0.0005 + $0.00075 = **$0.00125 per call**.

A 96x spread between GPT-5.4-mini and GPT-5.5 Pro on identical token volume. The right model is rarely the most expensive one in the family — it is the cheapest GPT-5 tier that passes your held-out eval on the actual task. Most teams default to GPT-5.5 out of caution; in our experience 60-70% of that traffic would survive a move to GPT-5.4 or GPT-5.4-mini with no perceptible quality drop.


Worked example 2: 100,000 calls per month

Multiply the per-call numbers by 100,000 — a realistic mid-size workload (daily classification on 3,000 records, weekly summarization runs, a low-volume internal agent):

GPT-5.5 Pro: $12,000/month. GPT-5.5: $2,000. GPT-5.4: $1,000. GPT-5.4-mini: $125.

Apply the Batch API discount to GPT-5.4 for any portion that does not need synchronous delivery (nightly summarization, weekly digests, eval runs): the GPT-5.4 row drops from $1,000 to $500 on the batched portion. Add prompt caching where 800 of every 1,000 input tokens are a stable system + tool prefix hitting cache 80% of the time: those 640 cached tokens drop from $2.50/1M to $0.25/1M — saving roughly 90% on 64% of input volume, or ~$144 off the monthly input bill.

Stack both — the same workload runs around $400/month on GPT-5.4 at 100k calls, a 60% reduction versus standard rates. The lesson generalizes: on GPT-5, the model choice sets the ceiling, but cache structure and batch eligibility set what you actually pay. Teams that pick GPT-5.5 and ignore caching often pay more than teams that pick GPT-5.5 Pro and structure prompts for cache hits.


Worked example 3: scaling to 1,000,000 calls

Now scale to 1M calls per month — production scale for a SaaS app with 30,000 active users running roughly 33 GPT-5 calls each, or a single-product team running per-record automation at high volume:

GPT-5.5 Pro: **$120,000/month**. GPT-5.5: **$20,000**. GPT-5.4: **$10,000**. GPT-5.4-mini: **$1,250**.

The Batch + cache stack on GPT-5.5 takes that $20,000 to roughly $8,300/month — 58% off — on the same input/output mix. On GPT-5.4-mini, the same stack lands at around $500/month, which is $0.0005 per call at scale, an order of magnitude cheaper than most companies budget for AI features in their first planning round.

The canonical lever order for scaling cost down on GPT-5: (1) run an eval to find the cheapest tier in the family that hits quality, (2) batch every asynchronous workload for 50% off, (3) restructure prompts so the cacheable prefix is stable across calls, (4) cap output length where you control the consumption shape. Most teams reverse the order — they tune output last when output is 6x the input price on every GPT-5 tier.


Worked example 4: a 5-turn GPT-5.5 agent loop

Agent loops are the worst-case cost shape on GPT-5. The model takes multiple turns per user query, replaying the full transcript each turn. A typical 5-turn loop with a 2,000-token system + tools prefix and 800-token context growth per turn:

Turn 1: 2,800 in / 200 out. Turn 2: 3,000 in / 200 out. Turn 3: 3,200 in / 200 out. Turn 4: 3,400 in / 200 out. Turn 5: 3,600 in / 200 out. Total: 16,000 input + 1,000 output. On GPT-5.5: 0.016 × $5 + 0.001 × $30 = $0.080 + $0.030 = **$0.11 per user query** — about 5.5x a single call.

Now apply caching. The 2,000-token system + tools prefix is stable across all 5 turns. If cache hits roughly 80% of those 2,000 tokens × 5 turns = 8,000 cached input tokens, those drop from $5/1M to $0.50/1M: $0.040 → $0.004, saving $0.036 per query (33% off the bill). For 100k queries/month: $11,000 → $7,400.

On GPT-5.5 Pro, the same agent loop hits **$0.66 per query** at standard rates — driven mostly by the $180/1M output rate against ~1,000 visible output tokens plus reasoning. Moving the loop to GPT-5.4 ($2.50 / $15) with cache cuts the per-query cost to roughly $0.035 — a 19x improvement over Pro for most agentic workloads that do not require Pro's reasoning depth. Build cache-anchored GPT-5 agent prompts for free with our code prompt builder.


GPT-5.5 vs GPT-5.5 Pro vs GPT-5.4 vs GPT-5.4-mini: how to pick

**GPT-5.5 Pro ($30 / $180)** is for tasks where one wrong answer costs more than 100 right ones. Multi-step financial analysis, legal drafting, complex code synthesis with strict correctness gates, scientific reasoning. Pro generates extensive internal reasoning chains before producing its final answer; you pay $180/1M for those tokens even though they don't appear in the response. Justify Pro only when downstream cost-of-error dominates per-call cost.

**GPT-5.5 ($5 / $30)** is the default for general-purpose GPT-5 work: agentic workflows, content generation that ships to humans, complex chat, anything you would have used GPT-4 or early GPT-5 generations for. Substantially higher capability than 2024-era GPT-4 at a fraction of the price. If you are not sure which tier to start with, start here and downshift after eval.

**GPT-5.4 ($2.50 / $15)** is the sweet spot for high-quality structured tasks at scale: summarization with strict format adherence, multi-step extraction, complex classification, RAG synthesis. Most production teams running between 100k and 1M GPT-5 calls per month live on GPT-5.4 — half the price of GPT-5.5 with very small quality deltas on well-scoped tasks.

**GPT-5.4-mini ($0.50 / $1.50)** is for high-volume embedded tasks: simple classification, intent detection, routing, internal telemetry, simple chat where the user expects something fast and lightweight. The 10x price gap from GPT-5.5 makes it viable for use cases that wouldn't survive a $0.02 per-call cost — autocomplete suggestions, per-keystroke intent routing, real-time moderation.


GPT-5 specific capabilities (and what they cost)

GPT-5 ships three capabilities that distinguish it from the GPT-4 generation. Each has a real cost shape worth understanding before you wire it into a product.

**1M-token context window** on GPT-5.5 and GPT-5.5 Pro. This unlocks single-call analysis of large documents (full books, codebases, transcripts). The cost: at GPT-5.5's $5/1M input rate, filling the full context costs $5 per call before you get a single output token. A 500-page legal contract at roughly 300k tokens hits $1.50/call on input alone. Worth it when the alternative is a 20-call RAG pipeline; brutal if you don't actually need the full context. Most teams should chunk + retrieve before reaching for 1M context.

**Native vision** on every GPT-5 tier. Images are tokenized at roughly 85 tokens per low-detail tile and 170 per high-detail tile, with a base 85 tokens per image. A typical 1024x1024 image at high detail costs around 1,275 input tokens — about $0.006 on GPT-5.5 or $0.0006 on GPT-5.4-mini. Image-heavy workloads (UI testing, document parsing, visual QA) bill primarily on input.

**Real-time mode** for streaming voice and video. Real-time API calls bill on both audio input tokens and audio output tokens at distinct rates published on the live pricing page. The economics shift dramatically — audio output dominates the bill on conversational use cases. If you're building real-time voice on GPT-5, model 1 minute of conversation as roughly 2,000 audio tokens in + 2,000 audio tokens out per minute, and look up the audio-specific rates separately.

**Structured outputs** (JSON schema guarantee) and **tool calling** are included at standard token rates — there is no per-feature surcharge. Tool definitions bill as input tokens every time they are sent; cache them as part of your stable system prefix and they drop to the 10% cached-input rate.


Per-task GPT-5 economics (writing, coding, reasoning, agents)

Different tasks have different cost shapes on GPT-5. Knowing the shape changes the model you should pick.

**Writing (blog posts, marketing copy, drafts)**: typical 500-1,500 token input, 1,500-3,000 token output. Output-dominant. On GPT-5.5 a 1,000-in / 2,500-out draft costs $0.005 + $0.075 = $0.080/call. On GPT-5.4 the same draft costs $0.0025 + $0.0375 = $0.040/call. Output cap matters most here — running without a `max_tokens` ceiling on writing tasks burns 30-50% extra on tokens you'll trim in editing.

**Coding (refactor, generate, review)**: typical 2,000-10,000 token input (file contents, context), 500-2,000 token output. Input-dominant for context-heavy refactors. On GPT-5.5 a 5,000-in / 1,500-out code task costs $0.025 + $0.045 = $0.070/call. Caching the project's stable file headers and the system prompt cuts that to roughly $0.030/call on a warm cache. GPT-5.5 Pro is only justified when the task requires architectural reasoning across many files.

**Reasoning (math, multi-step analysis, complex planning)**: this is GPT-5.5 Pro's home. Typical 1,000-token prompt with 500-token visible answer but 2,000-5,000 reasoning tokens hidden in output. On GPT-5.5 Pro: $0.030 input + $180/1M × 5,500 = $0.030 + $0.99 = roughly $1.02/call. For comparison, the same task on GPT-5.5 might cost $0.020 and produce a worse answer; the question is whether the answer quality difference is worth 50x the cost. Often it is for one-off high-stakes work, rarely for scaled inference. For comparison against o-series reasoning models that price reasoning tokens at lower output rates, see o1 reasoning cost.

**Agent loops (multi-turn tool use)**: examined in worked example 4 above. 4-8x a single call on a 5-turn loop, dropping to 2-3x with aggressive caching. The biggest win is keeping the system prompt + tool definitions stable across the loop so cache holds; the second biggest win is summarizing turns past turn 5 into a compact recap rather than replaying the full transcript.


Batch API on GPT-5: when 50% off is free money

The Batch API takes 50% off both GPT-5 input and output for jobs delivered within 24 hours. It accepts a JSONL file of requests, returns a job ID, and webhooks or polls to completion. No quality difference, no behavior difference — same models, same outputs, half the price.

Workloads that are textbook Batch wins on GPT-5: nightly content generation, bulk summarization (newsletters, weekly digests), classification of yesterday's records, eval and regression test runs, embedding precompute (use embedding endpoints), training-set generation, scheduled report drafting. Any output that lands in a dashboard, email, or CSV consumed asynchronously is a Batch candidate.

Workloads that cannot use Batch: synchronous chat (user waiting), real-time agent loops, anything inside a request handler that returns to the user, anything with sub-minute SLA. About 30-60% of typical production GPT-5 traffic can move to Batch with no UX change.

The compounding effect: a $20,000/month GPT-5.5 bill with 50% of the workload Batch-eligible drops to $15,000/month — $5,000/month saved on a single configuration change with zero quality impact. For most teams this is the single highest-EV optimization they can make on their GPT-5 spend.


Prompt caching on GPT-5: how 90% off works in practice

Cached input on GPT-5 bills at exactly 10% of the standard input rate: $0.50/1M on GPT-5.5 (vs $5), $0.25/1M on GPT-5.4 (vs $2.50), $3.00/1M on GPT-5.5 Pro (vs $30), $0.05/1M on GPT-5.4-mini (vs $0.50). The cache is opportunistic — OpenAI computes a fingerprint of your input prefix and caches it server-side. Subsequent calls with the same prefix read from cache.

The hard rule: caching is a **prefix match**, not a substring match. Anything you want cached must come at the start of your message array. Stable system prompt, tool definitions, and reusable few-shot examples go first. User-specific content and dynamic context go last. A 1,500-token cached prefix on GPT-5.5 drops from $5/1M to $0.50/1M — saving $0.0068 per call. At 1M calls per month, that is $6,800 saved with one structural change.

Most LLM SDKs do not require code changes to opt in to caching on GPT-5; the cache activates automatically once you structure your prompts prefix-first. The single biggest mistake we see in audits: teams interpolate dynamic data (current date, user ID, session state, retrieved RAG chunks) into the system prompt, which breaks every cache hit. Move that to a user message and the cache holds across calls.

Cache TTL on GPT-5 is typically minutes (not hours), so traffic patterns matter. A workload with sustained calls every few seconds caches reliably; a workload with one call every 20 minutes mostly cache-misses. If your traffic is bursty, look at warming the cache with a synthetic call at the start of each session — the marginal $0.005 to warm a 1,500-token prefix saves multiples of that across the next 50 user calls.


GPT-5 API vs ChatGPT subscription: keep them separate

OpenAI runs two completely separate billing tracks. The **GPT-5 API** (priced per token, accessed at platform.openai.com) is for developers building applications. The **ChatGPT consumer subscription** (Free, Plus $20/mo, Pro $200/mo, Team, Enterprise) gives end-users access to GPT-5 in the ChatGPT UI. Same models underneath, distinct billing.

A $20/month ChatGPT Plus subscription does **not** include any GPT-5 API credit. If you are building on the API, set up API billing independently at platform.openai.com and add a payment method to your API account. The two billing relationships use your OpenAI identity but track usage, payment methods, billing limits, and tier promotions independently.

A $200/month ChatGPT Pro subscription includes GPT-5.5 Pro in the ChatGPT UI with effectively unlimited use, but it gives you zero API access to GPT-5.5 Pro. If your team needs programmatic GPT-5.5 Pro access, you pay $30/$180 per 1M tokens on the API regardless of any ChatGPT Pro subs you hold.

What this means: budget for two separate line items if your team uses both. A 5-person team with ChatGPT Plus seats ($100/month) plus a GPT-5 API bill is not double-paying — the seats fund interactive use and the API funds production traffic. See our ChatGPT cost guide for the consumer tier breakdown.


Common GPT-5 cost mistakes (and the fix)

**Mistake 1: defaulting every workload to GPT-5.5.** Most production traffic is classification, summarization, or short-form generation that GPT-5.4 or GPT-5.4-mini handles at 1/5th or 1/40th the price with quality indistinguishable on a real eval. The fix: build a 50-example held-out eval per task, run all four GPT-5 tiers, pick the cheapest tier that passes. This single exercise typically cuts GPT-5 bills by 40-70%.

**Mistake 2: huge system prompts that never cache.** If your system prompt interpolates anything that changes between calls (timestamps, user names, context summaries, retrieved chunks), the cache never hits and you pay full input rate every time. The fix: rewrite so the system prompt is static across users and sessions; move all dynamic content to user messages. A static 2,000-token system prompt that hits cache 90% of the time saves ~$8 per 1,000 calls on GPT-5.5.

**Mistake 3: no `max_tokens` cap.** A 300-token answer that returns 1,500 tokens because you forgot a ceiling costs 5x on output. On GPT-5.5 Pro that is $0.27 vs $0.054 per call. The fix: cap output everywhere you control the consumption shape; let it run uncapped only where genuinely needed.

**Mistake 4: replaying full chat history every turn.** Past turn 5, summarize the prior conversation into a compact 200-token recap and replay that instead of the full transcript. Saves 50-80% on input across long sessions with no perceptible quality loss.

**Mistake 5: using GPT-5.5 Pro for everything that 'feels important'.** Pro's 6x premium over GPT-5.5 is only justified when downstream cost-of-error exceeds the per-call premium. For most generative work, GPT-5.5 plus a sanity-check pass on GPT-5.4 is cheaper and more reliable than Pro alone. Build evals; let cost follow data, not vibes.

**Mistake 6: ignoring Batch eligibility.** 30-60% of typical production GPT-5 traffic can move to Batch with no UX change for 50% off. Audit your traffic for any output that lands in a dashboard, email, or report consumed asynchronously — that is a Batch candidate.


Sourcing and how to keep these GPT-5 numbers current

Every price on this page comes from OpenAI's live API pricing page at developers.openai.com/api/docs/pricing, fetched 2026-06-20 and verified against three independent corroborating sources (recent integration commits in popular open-source projects, community pricing aggregators, the public OpenAI cookbook). Where a number could not be verified against the official page it was omitted — we'd rather ship a guide missing a row than ship a guide with a fabricated number.

OpenAI does not version their pricing page with explicit changelog entries. They push changes silently. Since the GPT-5 family launched, we have seen two price moves: a 20% input-rate reduction on GPT-5.4-mini at launch maturity, and a tightening of the cached-input ratio to a clean 10%. Expect quarterly motion on at least one tier in the family.

**How to verify before you budget**: open developers.openai.com/api/docs/pricing in an incognito window (no logged-in session interfering with rendering), copy the four GPT-5 rows into a spreadsheet, compare against the table above. If they match, this guide is current for your purposes. If they don't, trust the live page. Re-verify quarterly if your GPT-5 bill is over $1,000/month — at that volume a single price move shifts the budget materially.

**Reproducible methodology**: every dollar in the table and every worked example traces to the four published prices above. No row was synthesized from 'plausible' rates. If you find a discrepancy with the live page, treat the live page as canonical and tell us — we re-fetch and update. Sibling drill-downs: Claude API cost for Anthropic-side comparison, DeepSeek cost for open-source alternatives at fractional GPT-5 prices.

How to estimate any GPT-5 call cost in 5 steps

  1. 1

    Estimate your input tokens

    Take your prompt's character count and divide by 4, or its word count and divide by 0.75. Rule of thumb: 1 token ≈ 4 characters ≈ 0.75 English words. A 500-word system prompt + a 200-word user message ≈ (500 + 200) ÷ 0.75 ≈ 933 input tokens. For images, count ~85 tokens per low-detail tile, ~170 per high-detail tile, plus an 85-token base.

    → Open the ChatGPT prompt generator
  2. 2

    Estimate your output tokens (and cap them)

    Estimate output the same way — words ÷ 0.75. Output drives cost because output is 6x input on every GPT-5 tier. On GPT-5.5 Pro, factor in 3-8x reasoning tokens that bill as output. Set a `max_tokens` cap anywhere you can predict the consumption shape.

  3. 3

    Look up the GPT-5 tier price per 1M

    From the table above (verified June 2026): GPT-5.5 Pro $30 / $180, GPT-5.5 $5 / $30, GPT-5.4 $2.50 / $15, GPT-5.4-mini $0.50 / $1.50. Always confirm the live page before committing to budget.

  4. 4

    Apply the GPT-5 cost formula

    cost = (input_tokens / 1,000,000) × input_price + (output_tokens / 1,000,000) × output_price. A 1,000-in / 500-out call on GPT-5.4-mini = 0.001 × $0.50 + 0.0005 × $1.50 = $0.0005 + $0.00075 = $0.00125.

  5. 5

    Stack the GPT-5 discounts

    Cached input bills at 10% of standard on every GPT-5 tier. Batch API takes 50% off both streams for jobs delivered within 24 hours. They compose. A cached + batched GPT-5.5 call pays $0.25/1M on the cached input portion and $15/1M on output — roughly a 60% total bill reduction at scale.

Frequently Asked Questions

How much does GPT-5 cost per 1 million tokens in 2026?

As of June 2026, GPT-5.5 charges $5.00 per 1M input tokens and $30.00 per 1M output tokens. GPT-5.5 Pro is $30 / $180. GPT-5.4 is $2.50 / $15. GPT-5.4-mini is $0.50 / $1.50. Cached-input tokens bill at exactly 10% of the standard input rate on every GPT-5 tier. Source: OpenAI's live pricing page (developers.openai.com/api/docs/pricing).

What is the difference between GPT-5.5 and GPT-5.5 Pro pricing?

GPT-5.5 is $5 input / $30 output per 1M tokens. GPT-5.5 Pro is $30 input / $180 output — exactly 6x more expensive on both streams. Pro also generates extensive internal reasoning tokens that bill at the output rate even though they are not returned to you, so effective Pro cost on reasoning-heavy tasks runs 8-20x standard GPT-5.5. Justify Pro only when downstream cost-of-error dominates per-call cost.

What is the cheapest GPT-5 model in 2026?

GPT-5.4-mini at $0.50 input / $1.50 output per 1M tokens. A typical 1,000-in / 500-out call costs $0.00125 on GPT-5.4-mini — 96x cheaper than the same call on GPT-5.5 Pro. Best for high-volume embedded tasks: classification, intent detection, autocomplete, routing, real-time moderation. Avoid for multi-step reasoning or complex generation.

Does GPT-5 have a free tier?

The GPT-5 API does not have a permanent free tier. New OpenAI accounts typically receive a small trial credit (historically $5, applied for 90 days) that can be spent on any model including GPT-5. The ChatGPT consumer product offers GPT-5 in its free tier with usage limits, but that does not give you API access. For production use of the GPT-5 API, you must add a payment method at platform.openai.com and pay per-token at the rates above.

How much does the GPT-5 API cost per call?

For a representative 1,000-in / 500-out call at June 2026 rates: $0.00125 on GPT-5.4-mini, $0.010 on GPT-5.4, $0.020 on GPT-5.5, and $0.120 on GPT-5.5 Pro. Apply Batch API for 50% off if the workload can wait 24 hours, and prompt caching for 90% off on the cacheable portion of input. A cached + batched GPT-5.5 call on the same shape lands around $0.010-0.014 per call.

What is the GPT-5 Batch API discount?

The Batch API takes 50% off both input and output token prices on every GPT-5 model for asynchronous jobs that can wait up to 24 hours for delivery. Submit a JSONL file of requests, receive a job ID, poll or webhook for completion. Same models, same outputs, half the price. Best for nightly summarization, weekly digests, bulk classification, training-set generation, eval runs — anything not consumed synchronously.

How much does GPT-5 cached input cost?

Cached input bills at exactly 10% of the standard input rate on every GPT-5 tier: $0.50/1M on GPT-5.5 (vs $5), $0.25/1M on GPT-5.4 (vs $2.50), $3.00/1M on GPT-5.5 Pro (vs $30), $0.05/1M on GPT-5.4-mini (vs $0.50). The cache is opportunistic and prefix-only: put stable system prompts and tool definitions at the start, dynamic content at the end.

Can I stack GPT-5 Batch and cached input discounts?

Yes. The discounts compose multiplicatively. A cached + batched GPT-5.5 call pays $0.50/1M (cached input) ÷ 2 (batch) = $0.25/1M on cached input, and $30/1M ÷ 2 = $15/1M on output. A $20,000/month standard GPT-5.5 workload typically lands around $7,000-9,000/month with both discounts applied — a 55-65% reduction with zero quality change.

Stop overpaying GPT-5. Write prompts built for the model you're billing.

Our AI Prompt Generator writes GPT-5-tuned prompts based on YOUR business + task — front-loaded for cache, capped for output, sized for the cheapest tier that works. 14-day free trial, no card.

Browse all prompt tools →