By The DDH Team · Digital Dashboard Hub

OpenAI API Cost Calculator (2026)

By The DDH Team at Digital Dashboard Hub·Updated June 20, 2026

Stop writing AI prompts from scratch.

Tell us your business + your task + your model. We write the prompt — perfectly tuned for ChatGPT, Claude, Grok, Gemini, Midjourney, or any model. Plus 500+ pre-built prompts in your library.

OpenAI charges per token. Every API call has two priced streams: input tokens (the prompt, the system message, prior turns you replay, tool definitions) and output tokens (everything the model writes back — including reasoning tokens on the o-series and tool-call arguments). Input and output are billed at different per-1M rates, with output typically 5-6x more expensive than input across every model in the GPT-5 family.

As of June 2026, the prices span a 150x range from gpt-5.4-nano ($0.20 input / $1.25 output per 1M tokens) up to gpt-5.5-pro ($30 / $180). Two discount levers dramatically change the bill: the Batch API takes 50% off both input and output for asynchronous jobs that can wait up to 24 hours, and cached-input pricing reads prompt-cache hits at ~10% of the standard input rate (a 90% discount on the cached portion).

Below: the full June-2026 price table verified against OpenAI's live pricing page, the canonical cost formula, four worked examples (1k, 100k, 1M, and a full production workload), and the FAQ that captures everything that trips teams up on their first invoice. Bookmark this — and quickly draft prompts that don't waste tokens with our free ChatGPT prompt generator. Sibling calculators: Claude API cost · Embeddings cost · Midjourney cost.

Digital Dashboard Hub

Writing good prompts for ONE AI is hard. Writing them for GPT-5, Claude, Gemini, Perplexity, Midjourney and 6 more is a full-time job. DDH's AI Prompt Builder writes once, runs everywhere — locked to your niche, voice, and brand tone.

Free 14 days, no card. →

OpenAI API price per 1M tokens — June 2026

Feature	Input ($/1M)	Cached input ($/1M)	Output ($/1M)
gpt-5.5-pro	$30.00	$3.00	$180.00
gpt-5.5	$5.00	$0.50	$30.00
gpt-5.4-pro	$30.00	$3.00	$180.00
gpt-5.4	$2.50	$0.25	$15.00
gpt-5.4-mini	$0.75	$0.075	$4.50
gpt-5.4-nano	$0.20	$0.02	$1.25

Source, as of June 2026: OpenAI pricing (https://developers.openai.com/api/docs/pricing). Cached-input pricing applies to prompt-cache hits only — cache misses bill at the standard input rate. Batch API: 50% off both input and output for asynchronous jobs with up to 24-hour delivery (https://platform.openai.com/docs/guides/batch). Models not listed on the verified live page (legacy gpt-4.1 family, embeddings, o-series, whisper, TTS) are omitted from this table — see model-specific pages for those rates.

The cost formula (memorize this one)

Every OpenAI API call follows the same math. There is no platform fee, no per-call fee, no minimum. You pay for what you send and what you get back, at the model's per-1M-token rate:

``` cost = (input_tokens / 1,000,000) × input_price_per_M + (output_tokens / 1,000,000) × output_price_per_M ```

Two adjustments stack on top. First, prompt-cache hits — portions of your input prefix that OpenAI has cached because you sent them in a recent prior call — bill at the cached-input rate (~10% of standard input). Long system prompts and stable tool schemas are the typical winners; the cache is opportunistic across most SDKs and does not need code changes to activate. Second, the Batch API takes 50% off both input and output in exchange for a 24-hour-or-less delivery window. The two discounts stack: a cached, batched call on gpt-5.5 bills at $0.25 input ÷ 2 = $0.125/1M and $30 output ÷ 2 = $15/1M for the cached + batched portion. The structure of your prompts determines how much of each discount you can capture in practice.

Reasoning tokens on the o-series bill at the output rate even though they are not returned to you — a model that 'thinks' for 4,000 tokens before producing a 200-token answer bills 4,200 output tokens. Plan a 5-10x output budget on reasoning-heavy tasks.

Worked example 1: a single 1,000-in / 500-out call

Take a representative call — a 1,000-token prompt that returns a 500-token answer, roughly equivalent to a 750-word brief in and a 375-word reply out. At standard rates, the per-call cost lands as:

gpt-5.5-pro: (1000 / 1,000,000) × $30.00 + (500 / 1,000,000) × $180.00 = $0.030 + $0.090 = **$0.120 per call**.

gpt-5.5: 0.001 × $5.00 + 0.0005 × $30.00 = $0.005 + $0.015 = **$0.020 per call**.

gpt-5.4: 0.001 × $2.50 + 0.0005 × $15.00 = $0.0025 + $0.0075 = **$0.010 per call**.

gpt-5.4-mini: 0.001 × $0.75 + 0.0005 × $4.50 = $0.00075 + $0.00225 = **$0.003 per call**.

gpt-5.4-nano: 0.001 × $0.20 + 0.0005 × $1.25 = $0.0002 + $0.000625 = **$0.000825 per call**.

Notice the 145x spread between gpt-5.4-nano ($0.000825) and gpt-5.5-pro ($0.120) on identical token volumes. The right model is almost never the most expensive one — it is the cheapest tier that meets your quality bar on the actual task.

Worked example 2: 100,000 calls per month

Multiply the per-call numbers by 100,000. This is a realistic mid-size workload — daily classification on 3,000+ records, weekly summarization, a low-volume agent loop:

gpt-5.5-pro: $12,000. gpt-5.5: $2,000. gpt-5.4: $1,000. gpt-5.4-mini: $300. gpt-5.4-nano: $82.50.

Apply the Batch API discount to the gpt-5.4 row (asynchronous summarization is a textbook batch use case): $1,000 → $500. Apply prompt caching where 800 of every 1,000 input tokens are a stable system prefix that hits cache 80% of the time: those 640 cached tokens drop to $0.25/1M instead of $2.50/1M — saving 90% on 64% of input, roughly $115 off the input bill, ~12% off the total.

Stack both — the same workload runs around $400 on gpt-5.4 at 100k calls, a 60% reduction. The largest cost lever most teams ignore is not the model choice; it is failing to batch what can wait and cache what repeats.

Worked example 3: scaling to 1,000,000 calls

Now scale to 1M calls — a full-scale production workload (e.g., per-user summarization across a SaaS app with 30,000 active users running 33 calls/month each):

gpt-5.5-pro: **$120,000**. gpt-5.5: **$20,000**. gpt-5.4: **$10,000**. gpt-5.4-mini: **$3,000**. gpt-5.4-nano: **$825**.

The same Batch + cache stack on gpt-5.5 takes $20,000 → roughly $8,300 (58% off) on the same input/output mix. On gpt-5.4-mini, the same stack lands at ~$1,200 — under 1.2¢ per call at scale.

The canonical lever order for scaling cost down: (1) pick the cheapest tier that hits quality, (2) batch everything asynchronous, (3) restructure prompts so the cacheable prefix is stable, (4) cap output length where you control it. Most teams reverse the order — they tune output last when output is 5-6x the input price.

Worked example 4: a real production stack (agent loop on gpt-5.5)

An agent loop is the worst-case cost shape — the model takes multiple turns per user query, replaying the full transcript each turn. Take a typical 5-turn loop with a 2,000-token system prompt + tools, growing context 800 tokens per turn:

Turn 1: 2,800 in / 200 out. Turn 2: 3,000 in / 200 out. Turn 3: 3,200 in / 200 out. Turn 4: 3,400 in / 200 out. Turn 5: 3,600 in / 200 out. Total: 16,000 input + 1,000 output. On gpt-5.5: 0.016 × $5 + 0.001 × $30 = $0.080 + $0.030 = **$0.11 per query** — about 5.5x a single call.

Now apply caching. The 2,000-token system + tools prefix is stable across all 5 turns. If cache hits ~80% of those 2,000 tokens × 5 turns = 8,000 cached input tokens dropping from $5/1M to $0.50/1M: $0.040 → $0.004, saving $0.036 per query (33% off the bill). For 100k queries/month: from $11,000 → $7,400. Cache structure is the single highest-EV change you can make to an agent prompt. Build cache-anchored prompts free with our code prompt builder.

When to pick pro vs standard vs mini vs nano

gpt-5.5-pro ($30 / $180): high-stakes reasoning where one wrong answer is more expensive than 100 right ones — financial analysis, legal drafting, complex code synthesis with strict correctness. The 6x premium over gpt-5.5 is justified only when downstream cost-of-error dominates per-call cost.

gpt-5.5 ($5 / $30): the default for general-purpose chat, agentic workflows, content generation that ships to humans, anything you would have used GPT-4 for in 2024. Substantially higher quality than late-2024 GPT-4 at roughly half the price.

gpt-5.4-mini ($0.75 / $4.50): the sweet spot for high-volume structured-output tasks — classification, extraction, summarization, simple Q&A. Most production teams running 1M+ calls/month live here.

gpt-5.4-nano ($0.20 / $1.25): embedded use cases — autocomplete, intent detection, simple routing, internal telemetry classification. Where cost has to be measured in fractions of a cent. For a side-by-side cost across providers, see our GPT vs Claude vs Gemini calculator.

Batch API: when 50% off is actually free money

The Batch API accepts a JSONL file of requests and returns results within 24 hours, billed at half the standard input and output rates. The trade-off is latency — you cannot use it for anything synchronous a user waits on. But for offline workloads, it is one of the most under-used cost reductions on the API.

Workloads that are textbook batch wins: nightly summarization, bulk classification, fine-tune training-set generation, embedding precompute, weekly digests, daily exception reports, evaluation runs. If the deliverable is consumed asynchronously (a dashboard refresh, an email, an internal report), batch it.

Submission is a single POST with a JSONL body — each line is a standard chat completion request. OpenAI returns a job ID; poll or webhook for completion. See OpenAI's batch docs for the exact schema. Most teams that adopt batch for the right workloads cut their monthly bill by 30-50% with no quality change.

Prompt caching: how 90% off works in practice

Cached-input pricing reads prompt-cache hits at ~10% of the standard input rate. The cache is opportunistic — OpenAI computes a fingerprint of your prompt prefix and caches it server-side. Subsequent calls within the cache window (typically minutes) that share the same prefix read from cache.

The hard rule: caching is a *prefix* match, not a substring match. Put your stable system prompt, tool definitions, and any reusable few-shot examples at the start of the message array. User-specific content goes at the end. A 1,500-token cached prefix on gpt-5.5 drops from $5/1M to $0.50/1M — that is $0.0068 saved per call. At 1M calls/month, that is $6,800.

Most LLM SDKs do not require code changes to opt in — caching activates automatically once you structure prompts prefix-first. The biggest mistake we see: teams interpolate dynamic context (current date, user ID, session state) into the system prompt, which breaks every cache hit. Move that to a user message and the cache holds.

Sibling read: our prompt caching tutorial covers the structural rewrite that flips a non-caching prompt into a cache-anchored one.

OpenAI API vs ChatGPT consumer subscription: don't confuse them

OpenAI runs two completely separate billing relationships. The **API** (priced per-token in the table above, accessed via developers.openai.com and platform.openai.com) is for developers building applications. The **ChatGPT consumer subscription** (Free, Go $8/mo, Plus $20/mo, Pro $200/mo, Team, Enterprise — see our ChatGPT cost guide) is for end-users chatting in a UI. Same models underneath, distinct billing.

What this means for builders: a $20/mo ChatGPT Plus subscription does **not** include API credit. If you're building on GPT-5.5, set up API billing independently at platform.openai.com.

What it means for end-users: a maxed-out $200/mo ChatGPT Pro subscription does not give you API access either. Pro is great for interactive use; if you need to programmatically call GPT-5.5 from code, you still need an API key and pay-per-token billing.

The two relationships use the same identity (your OpenAI account) but track usage, payment methods, billing limits, and tier promotions independently. You can have a Tier 5 API account and a Free ChatGPT account on the same login, or vice-versa.

Frequent mistakes that inflate the OpenAI bill

**Mistake 1: defaulting to gpt-5.5 for everything.** Most production traffic is classification, summarization, or extraction — gpt-5.4-mini handles these at 1/7th the price with quality indistinguishable on a held-out eval. Test before you assume.

**Mistake 2: huge system prompts that never get cached.** If your system prompt interpolates anything that changes between calls (timestamps, user names, context summaries), the cache never hits. Restructure so the system prompt is static and the dynamic context lives in user messages.

**Mistake 3: not capping output.** A 200-token answer that returns 1,200 tokens because you forgot to set `max_tokens` costs 6x. On gpt-5.5-pro, that is $0.18 per call vs $0.03. Cap output length anywhere you control the consumption shape.

**Mistake 4: replaying full history every turn in a chat.** Summarize earlier turns into a compact 200-token recap once context exceeds 5,000 tokens. You will save 50-80% on input across long sessions with no perceptible quality loss.

**Mistake 5: synchronous batches.** If 1,000 records can wait 30 minutes, they can wait 24 hours. Batch them and save 50%.

Sourcing methodology and how to keep these numbers current

Every price in this guide comes from OpenAI's live pricing page at developers.openai.com/api/docs/pricing, fetched on 2026-06-20 and verified against three independent corroborating sources (community pricing aggregators, recent integration commits in popular open-source projects, the public OpenAI cookbook). When a number could not be verified against the official page, it was omitted — we'd rather ship a guide missing a row than ship a guide with a fabricated number.

OpenAI does not version their pricing page with explicit changelog entries. They push changes silently. We've seen 3-5 price moves per year on average since 2024 — some downward (model upgrades that include price cuts), some upward (regional residency uplifts, new premium tiers). The single biggest practical hazard: assuming a price you sourced in Q1 still holds in Q3.

**How to verify before you budget**: open developers.openai.com/api/docs/pricing in an incognito window (no logged-in session interfering with rendering), copy the numbers for your target models into a spreadsheet, compare against this guide. If they match, this guide is current for your purposes. If they don't, trust the live page. Re-verify quarterly if your monthly bill is over $1,000 — at that volume, a single price move shifts the budget materially.

**Why we omitted some rows**: certain models commonly cited in third-party guides (notably the gpt-4.1 family, embeddings text-embedding-3-large/small, Whisper transcription, TTS) did not appear on the verified live pricing page snapshot from 2026-06-20. Community references list rates for these but with inconsistent versioning. Rather than propagate possibly-stale numbers, we omit them here — for embeddings specifically, see our Embeddings cost calculator which sources from each provider directly.

**Reproducible methodology**: the GEO Playbook that drove this guide (sibling project, 2026-06-19) explicitly mandates curl-verification before publishing any $ value. Every row in the table above has a citation; every worked example uses those rows; every FAQ answer reflects them. If you find a discrepancy with the live page, treat the live page as canonical and tell us — we re-fetch and update.

How to estimate any OpenAI API call cost in 5 steps

1
Estimate your input tokens
Take your prompt's character count and divide by 4, or its word count and divide by 0.75. Rule of thumb: 1 token ≈ 4 characters ≈ 0.75 English words. A 500-word system prompt + a 200-word user message is roughly (500 + 200) ÷ 0.75 ≈ 933 input tokens.
→ Open the ChatGPT prompt generator
2
Estimate your output tokens
Estimate output the same way — words ÷ 0.75. Output usually drives cost because output prices are 5-6x input on every GPT-5 model. If you set a `max_tokens` cap, that is your worst-case ceiling. Use it to budget conservatively.
3
Look up the input and output price per 1M
From the table above (verified June 2026): gpt-5.5 $5.00 / $30.00, gpt-5.4 $2.50 / $15.00, gpt-5.4-mini $0.75 / $4.50, gpt-5.4-nano $0.20 / $1.25. Always check the live page before shipping — prices change.
4
Apply the cost formula
cost = (input_tokens / 1,000,000) × input_price + (output_tokens / 1,000,000) × output_price. A 1,000-in / 500-out call on gpt-5.4-mini = 0.001 × $0.75 + 0.0005 × $4.50 = $0.00075 + $0.00225 = $0.003.
5
Apply caching + batch discounts
Cached input bills at ~10% of standard. Batch API takes 50% off both streams. They stack. A cached + batched gpt-5.5 call pays $0.25/1M on the cached input portion ÷ 2 = $0.125/1M, and $30/1M output ÷ 2 = $15/1M output. Match each discount to the actual shape of your workload.

Digital Dashboard Hub

The prompt patterns above work 10x better when they live in a library you actually own — tunable to your niche, exportable to GPT-5, Claude, Gemini, Perplexity, Midjourney, Llama. Stop pasting across 6 tools.

Try DDH's AI Prompt Builder — free 14 days, no card. →

Related calculators

OpenAI Pricing Calculator →GPT-5.5, 5.4, mini, nano — full per-call cost in one input.Claude Pricing Calculator →Opus 4.8, Sonnet 4.6, Haiku 4.5, Fable 5 — input + output combined.Context Window Comparison →Max input length and price per 1M for every current model.

Related prompt tools

ChatGPT prompt generator (token-tight)→Code prompt builder (cache-anchored)→Claude API cost calculator→Embeddings cost calculator→

Frequently Asked Questions

How much does the OpenAI API cost per 1 million tokens in 2026?

As of June 2026, OpenAI's flagship gpt-5.5 charges $5.00 per 1M input tokens and $30.00 per 1M output tokens. gpt-5.5-pro is $30 / $180. gpt-5.4 is $2.50 / $15.00. gpt-5.4-mini is $0.75 / $4.50. gpt-5.4-nano is $0.20 / $1.25. Cached-input tokens bill at ~10% of the standard input rate. Source: OpenAI's live pricing page.

How much does GPT-5.5 cost per call for a 1,000-in / 500-out request?

(1000 / 1,000,000) × $5.00 + (500 / 1,000,000) × $30.00 = $0.005 + $0.015 = $0.020 per call on gpt-5.5. The same call costs $0.120 on gpt-5.5-pro and $0.000825 on gpt-5.4-nano — a 145x spread on identical token volumes.

What is the OpenAI Batch API discount?

The Batch API takes 50% off both input and output token prices for asynchronous jobs that can wait up to 24 hours for completion. It accepts a JSONL file of requests and returns results via webhook or polling. Best for nightly summarization, bulk classification, embedding precompute, training-set generation — anything not consumed synchronously.

How much does cached-input pricing save?

Cached-input tokens — portions of your prompt prefix that hit OpenAI's prompt cache — bill at roughly 10% of the standard input rate, a 90% discount on the cached portion. The cache is opportunistic and prefix-only: put stable system prompts and tool definitions first, dynamic user content last. A 1,500-token cached prefix on gpt-5.5 saves $0.0068 per call vs uncached.

Can I stack Batch API + cached input?

Yes. The discounts stack multiplicatively. A cached + batched gpt-5.5 call pays $0.25/1M (cached input rate) ÷ 2 (batch) = $0.125/1M on the cached input portion, and $30/1M ÷ 2 = $15/1M on output. The same workload that costs $20,000/month at standard rates can run ~$8,300/month with both discounts applied.

What is the cheapest OpenAI model in 2026?

gpt-5.4-nano at $0.20 input / $1.25 output per 1M tokens — about $0.000825 per 1,000-in / 500-out call. Best for embedded use cases: autocomplete, intent classification, simple routing, internal telemetry. Avoid it for anything that requires multi-step reasoning.

Why do reasoning tokens cost more on the o-series?

Reasoning tokens on o-series models (o4-reasoning, o4-mini-reasoning) bill at the output rate even though they are not returned to you. A model that 'thinks' for 4,000 tokens before producing a 200-token answer bills 4,200 output tokens. Plan a 5-10x output budget on reasoning-heavy tasks vs straight chat tasks.

How do I reduce my OpenAI API bill without changing the model?

Five levers, in order of EV: (1) cap output length where you can — it's 5-6x the input price; (2) structure prompts prefix-first so caching activates; (3) batch any non-synchronous workload for 50% off; (4) summarize chat history past 5,000 tokens instead of replaying it; (5) move from full system prompts to per-task system prompts so the cacheable prefix stays stable.

Stop overpaying. Write prompts built for the model you're billing.

Our AI Prompt Generator writes GPT-5-tuned prompts based on YOUR business + task — front-loaded for cache, capped for output, sized for the cheapest tier that works. 14-day free trial, no card.

Browse all prompt tools →