Skip to contentNew: Does ChatGPT recommend your brand? Free 60-second AI visibility check →
By The DDH Team · Digital Dashboard Hub

OpenAI API Pricing 2026: The Full Per-Model Cost Table

By The DDH Team at Digital Dashboard HubUpdated

Stop writing AI prompts from scratch.

Tell us your business + your task + your model. We write the prompt — perfectly tuned for ChatGPT, Claude, Grok, Gemini, Midjourney, or any model. Plus 500+ pre-built prompts in your library.

14 days, no card. Cancel in 2 clicks.

OpenAI charges per token, quoted in dollars per 1,000,000 tokens, and bills input and output separately. As of June 2026, the GPT-5.5 family sits at $5.00 input / $30.00 output per 1M tokens for the standard tier and $30.00 / $180.00 for gpt-5.5-pro, while the lighter gpt-5.4-nano runs $0.20 / $1.25 — a 150x spread between the cheapest and most expensive flagship endpoints. Output is almost always 5-6x more expensive than input on every model in the lineup.

Two discount levers materially change the bill: the Batch API knocks 50% off both input and output for asynchronous jobs that can wait up to 24 hours, and cached-input pricing reads prompt-cache hits at roughly 10% of the standard input rate. Below is the full price table sourced from OpenAI's live pricing page, then worked examples that translate the numbers into actual dollars per 1k, 100k, and 1M calls. Confirm rates against the OpenAI pricing page before you budget — these change often. Quick-estimate your own workload with our AI prompt cost calculator, or grab the free 2026 LLM pricing PDF cheat sheet for a printable reference.

Digital Dashboard Hub

Writing good prompts for ONE AI is hard. Writing them for GPT-5, Claude, Gemini, Perplexity, Midjourney and 6 more is a full-time job. DDH's AI Prompt Builder writes once, runs everywhere — locked to your niche, voice, and brand tone.

Free 14 days, no card.

OpenAI API price per 1M tokens — June 2026

Feature
Input ($/1M)
Cached input ($/1M)
Output ($/1M)
gpt-5.5-pro$30.00$3.00$180.00
gpt-5.5$5.00$0.50$30.00
gpt-5.4$2.50$0.25$15.00
gpt-5.4-mini$0.75$0.075$4.50
gpt-5.4-nano$0.20$0.02$1.25
o4-reasoning$15.00$1.50$60.00
o4-mini-reasoning$3.00$0.30$12.00
gpt-4.1$2.00$0.50$8.00
gpt-4.1-mini$0.40$0.10$1.60
gpt-4.1-nano$0.10$0.025$0.40
text-embedding-3-large$0.13
text-embedding-3-small$0.02

Sources, as of June 2026: OpenAI pricing (https://developers.openai.com/api/docs/pricing), OpenAI Batch API docs (https://platform.openai.com/docs/guides/batch). Cached-input pricing applies only to prompt-cache hits where the same prefix is reused within the cache window; cache misses bill at the standard input rate.

How OpenAI bills you, line by line

Each API call generates two billable streams: input tokens (the prompt, the system message, any tool definitions, and any prior turns you replay) and output tokens (everything the model writes back, including reasoning tokens on the o-series and tool-call arguments). They are priced separately and listed independently on the invoice.

The formula is unchanged from prior versions of the API:

``` cost = (input_tokens / 1,000,000) * input_price_per_M + (output_tokens / 1,000,000) * output_price_per_M ```

Two adjustments matter in 2026. First, cached-input tokens — portions of your prompt that hit OpenAI's prompt cache within the cache window — bill at roughly 10% of the standard input rate. The cache is opportunistic and does not require code changes for many SDKs; long system prompts and reused tool schemas are the typical winners. Second, requests submitted through the Batch API receive 50% off both input and output, in exchange for a delivery window of up to 24 hours. These two discounts stack on top of base prices and are the single largest cost lever most teams ignore.

Reasoning tokens on the o-series (o4-reasoning, o4-mini-reasoning) bill at the output rate even though they are not returned to you. A model that 'thinks' for 4,000 tokens before producing a 200-token answer bills 4,200 output tokens. Plan for a 5-10x output budget on reasoning-heavy tasks compared to direct chat tasks.


Worked example 1: a 1,000 in / 500 out call at every tier

Take a representative call — a 1,000-token prompt that returns a 500-token answer, roughly equivalent to a 750-word brief in and a 375-word reply out. The per-call cost at standard rates lands as follows:

gpt-5.5-pro: (1000/1,000,000 × $30.00) + (500/1,000,000 × $180.00) = $0.030 + $0.090 = $0.120 per call. gpt-5.5: (0.001 × $5.00) + (0.0005 × $30.00) = $0.005 + $0.015 = $0.020 per call. gpt-5.4: $0.0025 + $0.0075 = $0.010. gpt-5.4-mini: $0.00075 + $0.00225 = $0.003. gpt-5.4-nano: $0.0002 + $0.000625 = $0.000825. o4-reasoning (assuming 2,000 reasoning + 500 visible output): $0.015 input + $0.150 output = $0.165 per call.

Notice the 145x spread between gpt-5.4-nano ($0.000825) and gpt-5.5-pro ($0.120) on identical token volumes. The right model is almost never the most expensive one; it is the cheapest tier that meets your quality bar.

If you want to pressure-test the cheapest tier first, draft cleaner prompts that survive a smaller model with our ChatGPT prompt generator. Tighter inputs reduce token count and shift the workload down the price ladder.


Worked example 2: scaling to 100,000 and 1,000,000 calls

Multiply the per-call numbers above by 100,000 (a midsize batch classification or summarization job) and 1,000,000 (a full-scale production workload):

100k calls — gpt-5.5-pro: $12,000. gpt-5.5: $2,000. gpt-5.4: $1,000. gpt-5.4-mini: $300. gpt-5.4-nano: $82.50. o4-reasoning (with 2k reasoning tokens): $16,500.

1M calls — gpt-5.5-pro: $120,000. gpt-5.5: $20,000. gpt-5.4: $10,000. gpt-5.4-mini: $3,000. gpt-5.4-nano: $825. o4-reasoning: $165,000.

Now apply the Batch API discount (-50% in and out) to the gpt-5.5 row: $20,000 becomes $10,000 at 1M calls. Apply prompt caching where 800 of every 1,000 input tokens are a stable system prefix that hits cache 80% of the time: those 640 cached tokens drop to $0.50/1M instead of $5/1M, saving 90% on 64% of input — roughly $2,880 off the $4,000 input bill at 1M calls, or about 14% of the total. Stack both discounts and the same workload runs around $8,300 — a 58% savings over the standard rate.

These are the canonical levers. Match the model tier to task difficulty first, then batch what can wait, then cache what repeats.


When to choose pro, standard, mini, or nano

gpt-5.5-pro is built for high-stakes reasoning where a single wrong answer is more expensive than 100 right ones — financial analysis, legal drafting, complex code synthesis with strict correctness requirements. The 6x premium over gpt-5.5 is justified only when downstream cost-of-error dominates per-call cost. For most production chat traffic it is overkill.

gpt-5.5 is the default for general-purpose chat, agentic workflows, content generation that ships to humans, and any task where you would have used GPT-4 in 2024. At $5/$30 it is roughly half the price of late-2024 GPT-4 at substantially higher quality.

gpt-5.4-mini ($0.75/$4.50) is the sweet spot for high-volume structured-output tasks: classification, extraction, summarization, simple Q&A. Most teams running 1M+ calls per month sit here. gpt-5.4-nano ($0.20/$1.25) is for embedded use cases — autocomplete, intent detection, simple routing — where the cost has to be measured in fractions of a cent.

The o-series (o4-reasoning, o4-mini-reasoning) bills reasoning tokens at the output rate, so use it only when chain-of-thought materially improves accuracy on hard problems. For straightforward generation, the non-reasoning models are 5-10x cheaper for equivalent quality. See OpenAI's reasoning guide for the canonical breakdown.


Batch API: when 50% off is actually free money

The Batch API accepts a JSONL file of requests and returns results within 24 hours, billed at half the standard input and output rates. The trade-off is latency — you cannot use it for anything a user is waiting on synchronously. But for offline workloads it is one of the most under-used cost reductions in the API.

Canonical fits: nightly summarization of yesterday's tickets, weekly classification of marketing leads, monthly enrichment of CRM contacts, one-off enrichment of a 500k-row dataset. If the task does not have to return within seconds, batch it.

Anti-fits: live chat, voice agents, anything in a checkout funnel, anything where humans are reading the response in real time. The latency window kills the user experience.

Worked math: a 1M-call gpt-5.5 summarization job costs $20,000 at the standard rate. The same job through Batch costs $10,000. If the work can wait until tomorrow, the discount is free. Confirm current Batch terms against OpenAI's batch documentation.


Prompt caching: 10% pricing on repeated prefixes

OpenAI's prompt cache stores recent prompt prefixes and re-serves matching prefixes from cache instead of re-tokenizing them, billing the matched portion at roughly 10% of the standard input rate. The cache is automatic for most SDK paths; what you control is whether your prompts have a stable, reusable prefix worth caching.

Cache-friendly prompt structure: a long fixed system message (instructions, style guide, examples), a stable middle block (tool definitions, reference docs), then a short variable tail (the user's actual question). The longer the cached portion and the more often it repeats within the cache window, the larger the savings.

Worked math: a chatbot with a 2,000-token system prompt that hits cache on 90% of the 100,000 daily calls. Without caching, system prompts alone cost (2,000 × 100,000 / 1,000,000) × $5 = $1,000 per day on gpt-5.5. With 90% cache hits at $0.50/1M, the cached 1.8M input tokens cost $0.90 — a 99.9% saving on the cached portion — and the remaining 10% bills at $1.00, total $1.90 per day for system-prompt input. Same workload, $998 less.

Caching does not help if your prompts are unique each call or if the variable portion is at the front of the prompt. Move stable text to the front, variable text to the back, and the cache will do the rest. See OpenAI's prompt caching docs for the cache window and eligibility rules.


Vision, audio, and tool-use surcharges

Image inputs on the GPT-5.5 family are converted to tokens based on resolution. A 1024×1024 image bills as roughly 765 input tokens on the standard tier; a 2048×2048 image bills as roughly 1,445 tokens. At $5/1M on gpt-5.5, that is $0.0038 and $0.0072 per image respectively — non-trivial when you process millions of images per month.

Audio input through the realtime and audio endpoints bills separately from text and at higher rates — roughly $40/1M input tokens and $80/1M output tokens on gpt-5.5-audio as of June 2026. A 1-minute spoken exchange runs $0.06-$0.12 depending on speech density.

Tool calls themselves are billed as output tokens — both the function name, arguments, and the tool result you echo back into the model. Agentic loops with 5-10 tool calls per turn can bill 10x the output of a direct-answer turn, which is why agent costs are nearly always output-dominated. We break down agent loop math in our AI agent cost calculator.


Realtime API and voice/audio pricing deep dive

Voice agents bill on a completely different rate card from text chat, and the gap is wide enough that engineers used to text-token economics routinely under-budget realtime deployments by 4-6x. As of June 2026, gpt-5.5-realtime — the conversational endpoint that streams audio in and audio out over a persistent WebSocket — bills audio input at $40.00 per 1M tokens and audio output at $80.00 per 1M tokens. That is 8x the text input rate ($5.00) and ~2.7x the text output rate ($30.00) on the same underlying model. Mixed-modality sessions are billed per stream: a turn where the user speaks and the model replies with audio plus a tool-call text payload generates audio input tokens, audio output tokens, and a small text output charge in the same invoice line.

Audio tokens are not characters or seconds — they are a discrete chunked representation of the waveform. The current rule of thumb is roughly 1 audio token per 0.1 seconds of speech at the standard 24kHz sample rate, which works out to ~600 audio tokens per minute of speech in each direction. For a sanity check on input bills, take the speaker's wall-clock minutes, multiply by 600, divide by 1,000,000, and multiply by $40. A 10-minute customer-service call where the user speaks for 4 minutes and the agent speaks for 6 minutes generates ~2,400 input audio tokens and ~3,600 output audio tokens. That is (2,400/1,000,000 × $40) + (3,600/1,000,000 × $80) = $0.096 + $0.288 = $0.384 per call before any tool-use or text overhead.

Worked example — a 5-minute voice agent call. Assume a realistic split: the user speaks for 2 minutes (1,200 input audio tokens), the agent speaks for 3 minutes (1,800 output audio tokens), and the agent also runs two tool calls returning ~400 text output tokens of structured arguments and ~600 text input tokens of tool results echoed back into context. Audio input: 1,200/1M × $40 = $0.048. Audio output: 1,800/1M × $80 = $0.144. Text output (tool calls + final text fragments): 400/1M × $30 = $0.012. Text input (tool results + system prompt of ~1,500 tokens): 2,100/1M × $5 = $0.0105. Total: ~$0.215 per 5-minute call, or roughly $2.58 per hour of live voice. Run 1,000 calls a day and the realtime bill alone is ~$6,450/month — before transcription, before logging, before any LLM fallback.

Whisper-3 transcription, used for asynchronous speech-to-text where you do not need a streamed model response, remains the cheapest audio entrypoint at $0.006 per minute of audio (billed in 1-second increments, minimum 1 second). A 10,000-minute transcription backlog — say a month of recorded support calls — costs exactly $60. The newer whisper-3-large endpoint, which adds diarization and word-level timestamps, bills at $0.011 per minute. For applications that only need post-call analytics rather than live conversation, transcribing with Whisper-3 and then running the transcript through gpt-5.4-mini is roughly 30-50x cheaper than routing the same audio through gpt-5.5-realtime.

Text-to-speech sits on its own rate card and is priced per character rather than per token. The standard tts-1-2026 voice runs $15.00 per 1M characters; the higher-fidelity tts-1-hd-2026 voice runs $30.00 per 1M characters. A 200-word reply averages ~1,100 characters, so a single TTS render costs $0.0165 on standard and $0.033 on HD. The trade-off versus realtime audio output is latency and interruptibility: TTS is non-streaming-friendly for back-and-forth conversation but ~5x cheaper than gpt-5.5-realtime audio output for IVR, notification readouts, and pre-rendered narration. A common production pattern is to use gpt-5.4-mini ($0.75/$4.50 text rates) to draft the response, then route to tts-1-2026 — total cost on that 200-word reply is roughly $0.018 input/output text plus $0.0165 TTS, versus ~$0.10+ if the same content were generated as streamed audio through the realtime endpoint.

Prompt caching applies to realtime sessions but only to the text portion of the prompt — the system message, tool schemas, and any text-form conversation history. Audio tokens themselves are not cached; each chunk of speech is unique enough that the cache cannot match it. The practical implication: structure your realtime system prompt the same way you would for chat — long stable instructions and tool definitions at the front, dynamic per-call context at the back — and the 90% cached-input discount applies to that text portion across the WebSocket session. For a voice agent with a 3,000-token system prompt running 1,000 calls a day, caching the system prefix drops text input cost from $15.00/day to ~$1.65/day. It is a small slice of the realtime bill but stacks cleanly with everything else. Confirm current realtime audio rates against OpenAI's realtime API docs before locking pricing into a customer contract — voice rates have moved twice in the last 12 months.


How to lower your OpenAI bill this week

Five actions ordered by typical impact. First, drop one model tier. If you are on gpt-5.5, run a side-by-side eval against gpt-5.4-mini on 100 representative samples; many teams find equivalent quality at 1/6 the cost. Second, batch everything that does not need a synchronous response — historical data backfills, daily reports, classification queues — and take the 50% Batch discount. Third, restructure your prompts to put stable text first so prompt caching kicks in.

Fourth, cap output. Set max_tokens aggressively and ask for structured JSON instead of prose; a 200-token JSON object replaces a 1,000-token paragraph for most extraction tasks. Fifth, monitor with a per-route cost dashboard — most teams have one route that accounts for 60% of spend and a long tail of cheap routes; the audit alone usually reveals an obvious cut.

If you want to draft tighter prompts to start, our code prompt builder and meta-description generator help compress instruction blocks without losing fidelity. Cross-check rates against Anthropic Claude pricing and the LLM cost comparison calculator before locking in a provider.

Frequently Asked Questions

What is OpenAI's cheapest model in 2026?

gpt-5.4-nano at $0.20 input / $1.25 output per 1M tokens is the cheapest general-purpose chat model. text-embedding-3-small at $0.02/1M is cheaper still but only produces embeddings, not generated text. Confirm against OpenAI's pricing page.

How much does the Batch API save?

50% off both input and output. A $20,000 gpt-5.5 job at the standard rate runs $10,000 through Batch, in exchange for a delivery window of up to 24 hours. Best for offline workloads — see OpenAI's batch guide.

Are cached input tokens really 90% cheaper?

Yes — cached-input tokens bill at roughly 10% of the standard input rate (so gpt-5.5 cached input is $0.50/1M instead of $5.00/1M). The catch is the prefix must hit OpenAI's prompt cache within the cache window, which favors long stable system prompts and stable tool schemas at the front of the request.

Why is output so much more expensive than input?

Generating tokens requires running the full forward pass for each token, while input tokens are processed in one batched pass. OpenAI typically prices output 5-8x input across the lineup — for example, $5 in / $30 out on gpt-5.5 is a 6x ratio.

Do o-series reasoning tokens count as output?

Yes. The o4-reasoning and o4-mini-reasoning models bill the hidden chain-of-thought at the output rate, even though those tokens are not returned to you. Budget 5-10x the visible output token count when using reasoning models.

Is OpenAI cheaper than Anthropic in 2026?

It depends on the tier. gpt-5.5 ($5/$30) is more expensive than Claude Sonnet 4.6 ($3/$15) and Claude Opus 4.8 ($5/$25) on output. gpt-5.4-mini ($0.75/$4.50) is cheaper than Claude Haiku 4.5 ($1/$5). Compare side by side at our LLM cost calculator.

How do I estimate cost before sending a request?

Use the formula cost = (input_tokens / 1M × input_price) + (output_tokens / 1M × output_price). Estimate token count as roughly characters ÷ 4 or words ÷ 0.75. For a worked walk-through, see our AI prompt cost calculator.

Does OpenAI bill for failed or refused responses?

Yes — any tokens the model produces are billed, including refusal messages and tool-call attempts that error out. The exception is requests that fail before any tokens are emitted (rate limits, auth errors, malformed input).

How much does gpt-5.5-realtime actually cost per minute of voice?

At June 2026 rates ($40/1M audio input, $80/1M audio output, and ~600 audio tokens per minute of speech), a balanced 1-minute exchange — 30 seconds of user speech and 30 seconds of agent response — runs roughly (300/1M × $40) + (300/1M × $80) = $0.012 + $0.024 = $0.036, before any text-side system prompt or tool-call charges. Plan on $0.04-$0.08 per realtime minute once a typical system prompt and 1-2 tool calls are included. See OpenAI's realtime API docs for current rates.

Should I use Whisper plus a text model, or just gpt-5.5-realtime?

If you need a live back-and-forth conversation with interruption handling, use gpt-5.5-realtime — Whisper-plus-text adds 1-3 seconds of latency that breaks natural turn-taking. If you only need post-call analytics, summarization, or asynchronous transcription, Whisper-3 at $0.006/min plus gpt-5.4-mini at $0.75/$4.50 is 30-50x cheaper than streaming the same audio through the realtime endpoint. The dividing line is whether a human is waiting in real time.

Does prompt caching work with the realtime API?

Partially. The text portion of a realtime session — system message, tool schemas, prior text-form turns — is eligible for the 90% cached-input discount the same way a chat completion is. Audio tokens themselves are not cached. Keep stable text instructions at the front of the realtime system prompt and the caching discount will apply to that portion across the WebSocket session, even though it has no effect on the audio-token bill.

Get the 2026 LLM pricing cheat sheet

One-page PDF with every model in this article, the discount math, and the formulas — free, no signup gate. Or browse our 40+ prompt-engineering tools to draft cheaper, leaner prompts.

Browse all prompt tools →