Skip to contentNew: Does ChatGPT recommend your brand? Free 60-second AI visibility check →
By The DDH Team · Digital Dashboard Hub

o1 / o3 Reasoning Cost Calculator (2026)

By The DDH Team at Digital Dashboard HubUpdated

Stop writing AI prompts from scratch.

Tell us your business + your task + your model. We write the prompt — perfectly tuned for ChatGPT, Claude, Grok, Gemini, Midjourney, or any model. Plus 500+ pre-built prompts in your library.

14 days, no card. Cancel in 2 clicks.

OpenAI's o-series reasoning models — o3, o3-mini, and the deprecated o1 — bill differently from every chat model on the API. Before the model produces a single user-visible token, it generates internal reasoning tokens: a private chain-of-thought scratchpad that the model uses to plan, verify, and refine its answer. Those reasoning tokens are NEVER returned to the caller. But they bill at the full output rate, every single one of them. A 200-token answer that took 4,000 reasoning tokens to produce bills 4,200 output tokens — not 200.

This single mechanic is responsible for nearly every billing-surprise story we hear about reasoning models. A team estimates cost by counting the words in their answer, runs production for a week, and gets an invoice 5-15x what they budgeted for. The fix is not to avoid reasoning models — for the right workloads (math, code synthesis, multi-step planning, formal verification) they are dramatically better than chat. The fix is to budget against the reasoning-token shape, not the visible answer.

As of June 2026, the o-series ladder is: **o3 at $2.00 input / $8.00 output per 1M tokens**, **o3-mini at $0.55 / $2.20**, and **o1 at $15 / $60** (deprecated — migrate). The o1 to o3 transition was an 87% price drop on the flagship reasoning model — one of the largest single-model price cuts in API history (VentureBeat coverage). Reasoning is now ~7x cheaper than it was a year ago, and the math below reflects that reset.

Below: the full June-2026 reasoning-model price table, the reasoning-token cost formula (the one you actually need), four worked $-math examples that show the thinking-token premium in dollars, a decision tree for when reasoning beats chat, and a sourced FAQ. Quickly draft reasoning-tuned prompts that minimize thinking-token bloat with our free ChatGPT prompt generator. Sibling calculators: GPT-5 cost · OpenAI API cost · DeepSeek cost.

Digital Dashboard Hub

Writing good prompts for ONE AI is hard. Writing them for GPT-5, Claude, Gemini, Perplexity, Midjourney and 6 more is a full-time job. DDH's AI Prompt Builder writes once, runs everywhere — locked to your niche, voice, and brand tone.

Free 14 days, no card.

OpenAI o-series reasoning model prices — June 2026

Feature
Input ($/1M)
Output ($/1M, incl. reasoning)
Context window
o3$2.00$8.00200K
o3-mini$0.55$2.20200K
o1 (deprecated — migrate to o3)$15.00$60.00200K

Source, as of June 2026: OpenAI pricing (https://developers.openai.com/api/docs/pricing). Reasoning tokens bill at the output rate even though they are not returned to the caller. No published cached-input discount on the o-series as of this verification date. 200K context window applies to all three rows. o1 remains on the pricing page for migration-window compatibility but is end-of-life — every new build should target o3 or o3-mini.

The reasoning-token cost formula (the one nobody warns you about)

On chat models like GPT-5.5, the cost formula is straightforward — you pay for the input tokens you sent and the output tokens the model wrote back. On the o-series, there is a third term that does not appear in any response field but absolutely appears on your invoice:

``` cost = (input_tokens / 1,000,000) × input_price + (reasoning_tokens / 1,000,000) × output_price ← invisible to caller + (visible_output / 1,000,000) × output_price ```

The reasoning_tokens count is reported in the API response under `usage.completion_tokens_details.reasoning_tokens`. Read it. Log it. If you skip this field you have no idea what you are actually paying per call — the `content` field shows you the 200-token answer, but the `reasoning_tokens` field is where the 4,000-token bill hides.

Practical reasoning-token shape we see across production deployments: simple math / classification with reasoning enabled = 200-800 reasoning tokens; multi-step code generation = 1,500-5,000 reasoning tokens; complex planning / proof-style tasks = 5,000-25,000 reasoning tokens; agentic loops with self-verification = 20,000-80,000 reasoning tokens per query. Budget the full envelope, not the visible answer.


Worked example 1: the 'cheap' classification call that is not cheap

A team migrates a classification pipeline from gpt-5.4-mini to o3-mini, hoping for higher accuracy on edge cases. Input: 500 tokens (the rubric + the document). Visible output: 50 tokens (a JSON label). Looks identical to chat in shape — so they budget against chat math.

**Chat estimate (wrong)**: 0.0005 × $0.55 + 0.00005 × $2.20 = $0.000275 + $0.00011 = **$0.000385 per call**. At 100k calls/month, $38.50.

**Reality**: o3-mini generates ~1,200 reasoning tokens before producing each 50-token label. Real output bill = (1,200 + 50) / 1,000,000 × $2.20 = $0.00275. Total: $0.000275 input + $0.00275 output = **$0.003 per call** — 7.8x the chat-shape estimate. At 100k calls/month, $300, not $38.50.

Lesson: even on the 'mini' tier of the reasoning ladder, the thinking-token tail dominates. If your classification task does NOT need multi-step reasoning, stay on gpt-5.4-mini ($0.50 / $1.50 input/output) — same call shape lands at $0.000125 per call, 24x cheaper than o3-mini. The reasoning premium only pays for itself when reasoning is actually the bottleneck. Cross-reference: OpenAI API cost calculator for the full chat ladder.


Worked example 2: the 200-token answer that cost $0.0336

The hero example. A user asks o3 a math-heavy product question. The visible answer is 200 tokens of clean prose. The model thought for 4,000 reasoning tokens to get there — running through unit conversions, edge-case checks, and a final verification pass.

Input: 1,000 tokens (the question + a 500-token system prompt). Output bill: (4,000 reasoning + 200 visible) / 1,000,000 × $8.00 = 4,200 / 1,000,000 × $8 = **$0.0336 in output alone**. Plus input: 0.001 × $2 = $0.002. **Total per call: $0.0356.**

Compare to the chat-shape estimate against the visible answer: 0.001 × $2 input + 0.0002 × $8 output = $0.002 + $0.0016 = $0.0036. **Actual cost is 9.9x the visible-answer estimate.**

Compare to gpt-5.5 on the same input/output: 0.001 × $5 + 0.0002 × $30 = $0.005 + $0.006 = $0.011 per call. o3 is 3.2x more expensive than gpt-5.5 on this call — but the o3 answer correctly handles the math edge case, and the gpt-5.5 answer is subtly wrong. Whether the 3.2x premium is worth it is a downstream-cost-of-error question, not a token-price question.


Worked example 3: agentic code-synthesis loop on o3

A coding agent generates a 150-line Python module from a spec. The agent runs in a 4-turn loop: plan → write → self-review → patch. Each turn the model thinks heavily before producing visible output.

Per-turn shape: input ~2,500 tokens (system + tools + growing transcript), visible output ~400 tokens, reasoning ~6,000 tokens. Per turn output bill: (6,000 + 400) / 1,000,000 × $8 = **$0.0512**. Per turn input: 0.0025 × $2 = $0.005. Per turn total: $0.0562.

**4-turn total: ~$0.225 per code-synthesis run on o3.** Compare to gpt-5.5-pro on the same loop (no reasoning tokens, but more visible output ~800 tokens/turn): 0.0025 × $30 input + 0.0008 × $180 output = $0.075 + $0.144 = $0.219/turn × 4 = $0.876. **o3 is 3.9x cheaper than gpt-5.5-pro on this workload despite the reasoning premium, because pro's output rate is so much higher.**

The takeaway: on tasks where reasoning is what you are buying, o3's $2 / $8 ladder beats every premium chat tier. The trap is using o3 for tasks where reasoning is not the bottleneck — that is where the thinking-token tail makes you pay for compute that does not improve the answer.


Worked example 4: 100,000 reasoning calls/month — the budget reality

Scale the per-call numbers to a realistic monthly workload. Assume 100k calls/month, mixed shape: 1,200 input tokens average, 300 visible output, 3,500 reasoning tokens (the production median we see on real o3 traffic).

Per call: 0.0012 × $2 + (3,500 + 300) / 1,000,000 × $8 = $0.0024 + $0.0304 = $0.0328. **Monthly on o3: $3,280.**

On o3-mini (assuming same shape; mini typically uses 30-50% fewer reasoning tokens — say 2,000 instead of 3,500): 0.0012 × $0.55 + (2,000 + 300) / 1,000,000 × $2.20 = $0.00066 + $0.00506 = $0.00572. **Monthly on o3-mini: $572.** o3-mini is 5.7x cheaper for the same call shape with a small accuracy delta.

On the old o1 (for comparison only — migrate): 0.0012 × $15 + 3,800/1,000,000 × $60 = $0.018 + $0.228 = $0.246. **Monthly on o1: $24,600.** o3 is 7.5x cheaper than o1 for the same workload — the 87% price drop is real and you should be capturing it.

The lever order for keeping reasoning costs in check: (1) cap `max_completion_tokens` to bound the worst case, (2) use `reasoning_effort: 'low'` where the task tolerates less thinking, (3) drop to o3-mini wherever quality holds, (4) route only the truly reasoning-bound queries to o3 — let gpt-5.4-mini handle the rest. See our DeepSeek cost calculator for the open-weights reasoning alternative.


The 87% o1 to o3 price drop — and what it changes

When o3 launched at $2 / $8 versus o1's $15 / $60, OpenAI announced an effective 80-87% price reduction on the flagship reasoning model (VentureBeat coverage). On input, o3 is 7.5x cheaper than o1. On output (where reasoning tokens bill), o3 is also 7.5x cheaper. Net effect: any o1 workload moved to o3 lands at ~13% of the previous cost with quality-on-benchmark improvements at the same time.

This is not a marginal price tweak — it is a re-pricing of the reasoning category. Workloads that were uneconomic on o1 ($25k/month for 100k mid-complexity calls) are now under $4k/month on o3. Reasoning models have moved from 'premium escape valve for hard problems' to 'plausible default for any task where chain-of-thought helps.'

What this means for your migration plan: if you have ANY o1 traffic still running, the migration is overdue. Code change: replace model ID `o1` with `o3`, leave everything else identical (same context window, same reasoning-token billing mechanic, same response shape). You will see a 7-8x cost reduction on the same workload before any other optimization.

What this means for your build decisions: when you were avoiding reasoning models because of the $60/M output rate, reconsider. At $8/M, o3 is competitive with gpt-5.5 ($30/M output) once you factor in the better answer quality on reasoning-bound tasks. The dollar argument for chat-instead-of-reasoning has weakened materially.


Decision tree: when reasoning models beat chat models

**Use o3 / o3-mini when**: (1) the task has objectively-verifiable correctness — math, code that runs and passes tests, logic puzzles, formal extraction with a ground truth; (2) the task has multi-step dependencies that chat models miss (multi-constraint scheduling, multi-hop reasoning over a knowledge base, plan-then-execute); (3) you have an eval showing a real accuracy lift over the equivalent chat model on YOUR task — not on a benchmark.

**Stick with chat models (gpt-5.4 / gpt-5.5) when**: (1) the task is open-ended generation — content, copy, conversation, brainstorming — where 'correctness' is taste, not truth; (2) the task is simple extraction / classification where chat models already hit 95%+ accuracy (paying 5-15x for a reasoning model gets you the last 1-3%, often not worth it); (3) latency matters and you can't wait for the model to think (reasoning models add 5-30 seconds of latency from internal thinking before any output streams).

**Use o3-mini specifically when**: (1) you want the reasoning shape but the visible answer is short and the cost-of-error is moderate; (2) classification with hard edge cases where chat-tier gets ~90% and you need 96%+; (3) you have a high-volume workload where the o3 → o3-mini drop (4x cheaper input, 3.6x cheaper output) is the difference between a viable and a non-viable deployment.

**The eval test**: before you commit a workload to reasoning models, run 100 representative queries through both o3 and your best chat model. Score correctness. If the accuracy lift is <5% absolute, stay on chat — the reasoning premium will not pay for itself. If the lift is >10%, reasoning is almost certainly worth it. Between 5% and 10% is a downstream-cost-of-error judgement call.

Compare the open-weights alternative: DeepSeek-R1 costs $0.55 / $2.19 per 1M tokens — nearly identical to o3-mini's $0.55 / $2.20. The cost gap between proprietary reasoning (o3-mini) and open-weights reasoning (R1) has closed completely; the differentiation is now quality, latency, and tool-integration, not price.


How to control reasoning-token bloat (the levers that work)

**Lever 1 — `reasoning_effort` parameter.** The o-series accepts a `reasoning_effort` value of 'low', 'medium', or 'high'. Low cuts internal thinking by 50-70% and clips reasoning-token bills proportionally. For tasks where the model's first plausible answer is usually right, 'low' is the right default. Reserve 'high' for tasks where you've measurably seen 'medium' produce wrong answers.

**Lever 2 — `max_completion_tokens` cap.** Sets a hard ceiling on (reasoning + visible) output combined. Set this to your worst-case acceptable bill per call. If the model hits the cap, you'll see `finish_reason: 'length'` — handle it explicitly (retry with more budget, or degrade to a chat-model fallback).

**Lever 3 — bounded scratchpad in the prompt.** Counter-intuitively, instructing the model 'work through this in at most 3 steps' or 'verify only the critical constraint' shapes the reasoning trace and reduces token count without measurably hurting accuracy on most tasks. Reasoning models respect prompt-level reasoning bounds well.

**Lever 4 — pre-decompose the task.** If you can break a multi-step reasoning task into 3 simpler chat-model calls + 1 reasoning-model call (instead of one big reasoning-model call), the chat calls bill at $0.50/M and the single reasoning call has a much smaller scratchpad to manage. Common 50-70% savings on agentic workloads.

**Lever 5 — log `reasoning_tokens` on every call.** OpenAI exposes the count in `usage.completion_tokens_details.reasoning_tokens`. Send it to your observability stack. The first time you see a 25k-reasoning-token outlier in production, you'll understand why this lever matters more than the other four combined — catch the outliers, not the median.

**Lever 6 — route the task, don't route the model.** Build a classifier in front of your reasoning model: simple queries route to gpt-5.4-mini ($0.50 / $1.50), complex queries route to o3. A 100k-call/month workload where 70% can go to chat and 30% needs reasoning lands at ~$1,200/month combined vs $3,280 if everything goes to o3. The router itself costs almost nothing.


o3 vs o3-mini: when the 4x cheaper tier is actually enough

o3-mini at $0.55 / $2.20 is roughly 4x cheaper than o3 on input and 3.6x cheaper on output. It also typically generates 30-50% fewer reasoning tokens for the same task — the smaller model is faster to converge on an answer. Combined effect: o3-mini is often 5-6x cheaper than o3 in production for the same workload.

Where o3-mini holds quality: structured extraction with hard edge cases, mid-complexity code generation (single function, well-specified), classification with 5-15 classes and ambiguous boundaries, multi-hop Q&A over a small knowledge base.

Where o3-mini falls short and you need full o3: long-horizon agentic planning (>5 sequential reasoning steps), proof-style mathematical work, code synthesis above ~200 lines, tasks where the eval shows o3-mini at <85% accuracy.

Default-on-mini policy: ship every new reasoning workload on o3-mini first. Run a 200-sample eval against o3. If o3-mini is within 3 percentage points of o3 accuracy, keep mini. If gap is 3-7 points, decide based on cost-of-error. If gap is >7 points, move to o3. This policy keeps 60-80% of typical reasoning traffic on the cheaper tier with no measurable quality impact at the product layer.


Why there's no cached-input discount on the o-series (and what to do about it)

Unlike the GPT-5 chat family — where cached-input pricing reads prompt-cache hits at ~10% of the standard input rate (a 90% discount) — the o-series does NOT publish a cached-input discount as of June 2026. Every input token bills at full $2/M (o3) or $0.55/M (o3-mini) regardless of cache state.

Why this matters: on chat models, structuring your prompt prefix-first to maximize cache hits can shave 30-50% off the input bill. That lever is not available on reasoning models. Every long system prompt costs full price every call.

Practical implication: on o-series workloads, keep system prompts SHORT. A 2,000-token reasoning-model system prompt that would cache to $0.20/M effective on gpt-5.5 instead costs full $2/M on o3 — the same tokens, 10x more expensive. Trim ruthlessly. Move stable context to the user-message-prefix only if it has to be there at all.

Workaround for repeated reasoning patterns: pre-compute the reasoning step once with o3, store the conclusion, and serve subsequent identical-shape queries from a chat-model + retrieval pipeline that just retrieves the cached conclusion. This pattern (reason once, serve from cache) routes the expensive reasoning to a tiny fraction of traffic. See our code prompt builder for the cache-anchored prompt patterns that work on chat tiers.

Watch the OpenAI changelog — if/when caching ships for o-series, the cost math in this guide shifts materially. As of 2026-06-20 it has not.


Migrating off o1: the checklist

o1 is deprecated. Pricing remains on the page for migration compatibility but new builds should target o3 or o3-mini. The migration is one of the simplest model swaps OpenAI has ever shipped:

**Step 1**: replace `model: 'o1'` with `model: 'o3'` (or `model: 'o3-mini'`) in your API calls. Same endpoint, same request shape, same response shape. The o-series API contract is stable across the o1 → o3 transition.

**Step 2**: re-tune `reasoning_effort`. o3 is faster at converging than o1 was — workloads that needed 'high' on o1 frequently land at 'medium' on o3 with equal or better quality. Test before assuming 'high' is still required.

**Step 3**: re-baseline your cost budget. The 7.5x price drop on both input and output means your monthly bill should fall by ~85% for the same workload. If it doesn't fall by that much, you're probably emitting more reasoning tokens — check whether `reasoning_effort` defaulted higher on the new model.

**Step 4**: re-run your eval suite. Quality should be equal or better on every benchmark we have data on; if a specific task regresses, file an issue and consider whether reasoning_effort or prompt structure needs adjusting for the new model.

**Step 5**: archive o1-specific code paths. The longer o1 stays in your codebase the more likely an engineer adds another call against it. Remove the legacy ID, force a build break, migrate everything.


Sourcing methodology — how to keep these numbers current

Every price in this guide comes from OpenAI's live pricing page at developers.openai.com/api/docs/pricing, fetched on 2026-06-20 and cross-verified against the deprecation notices on o1 and the launch posts for o3. Where a number could not be verified against the official page (e.g., cached-input pricing for o-series) we explicitly note it is not published rather than fabricating a value.

OpenAI does not version their pricing page with explicit changelog entries — changes ship silently. The o-series category has been particularly volatile: o3 alone has seen one major price drop (the 87% cut from o1) and one quiet adjustment in reasoning-token billing semantics since launch. Re-verify quarterly if your monthly reasoning bill exceeds $1,000.

**How to verify before you budget**: open developers.openai.com/api/docs/pricing in an incognito window, find the o-series section, and confirm the four numbers ($2 / $8 for o3, $0.55 / $2.20 for o3-mini) match this guide. If they match, this guide is current. If they don't, trust the live page and ping us.

**The reasoning-token billing semantics are documented separately** at platform.openai.com/docs/guides/reasoning. That page explicitly states reasoning tokens bill at the output rate and are reported under `usage.completion_tokens_details.reasoning_tokens`. The structural behavior — internal scratchpad, never returned, fully billed — has been stable since o1 launch and applies identically to o3 and o3-mini.

**Why we omit some commonly-cited numbers**: third-party guides sometimes list o-series cached-input rates or volume discounts that do not appear on OpenAI's live page. Rather than propagate possibly-stale or possibly-fabricated rates, we omit them. If OpenAI publishes a cached-input rate for o-series after this guide ships, we'll re-fetch and update — until then, plan against full input rates.

How to estimate any o-series reasoning call cost in 5 steps

  1. 1

    Estimate your input tokens

    Same chat-model rule: characters ÷ 4 or words ÷ 0.75. Keep system prompts short on o-series (no cached-input discount means every token bills at full rate every call).

    → Open the ChatGPT prompt generator (reasoning-tuned)
  2. 2

    Estimate your VISIBLE output tokens

    Estimate the user-facing answer length the same way — words ÷ 0.75. This is the tip of the iceberg on reasoning models; the reasoning-token tail underneath usually dominates the bill.

  3. 3

    Estimate your REASONING tokens (the hidden term)

    Production medians we see: simple math/classification 200-800; multi-step code 1,500-5,000; complex planning 5,000-25,000; agentic self-verification loops 20,000-80,000. For a first build, budget 3,000-5,000 reasoning tokens per call and refine against actual `usage.completion_tokens_details.reasoning_tokens` from logs.

  4. 4

    Apply the reasoning cost formula

    cost = (input_tokens / 1M) × input_price + ((reasoning_tokens + visible_output) / 1M) × output_price. Example o3 call: 1,000 input + 4,000 reasoning + 200 visible = 0.001 × $2 + 0.0042 × $8 = $0.002 + $0.0336 = $0.0356 per call. That $0.0356 is ~10x what the visible-output-only estimate would have shown.

  5. 5

    Tune reasoning_effort + max_completion_tokens

    Default to `reasoning_effort: 'low'` and lift only when an eval shows quality gains. Always set `max_completion_tokens` so a single runaway scratchpad cannot bill 80k output tokens — that's $0.64 on o3 from one bad query.

Frequently Asked Questions

How much does o3 cost per 1M tokens in 2026?

As of June 2026, OpenAI's o3 charges $2.00 per 1M input tokens and $8.00 per 1M output tokens — with the critical caveat that internal reasoning tokens bill at the output rate even though they are not returned to the caller. A typical o3 call generating 3,500 reasoning tokens + 300 visible output tokens bills 3,800 tokens against the $8/M output rate ($0.0304), plus input. Sourced from OpenAI's live pricing page.

What are reasoning tokens and why do they cost extra?

Reasoning tokens are internal chain-of-thought scratchpad tokens that o-series models generate before producing the user-visible answer. They are how the model plans, verifies, and refines its response. They are never returned to the caller (the `content` field shows only the visible answer), but they bill at the full output rate. A 200-token answer that took 4,000 reasoning tokens to produce bills 4,200 output tokens — not 200. This is the single mechanic that makes reasoning models cost 5-15x chat models on identical-looking workloads.

Do reasoning tokens count toward output billing?

Yes. Every reasoning token bills at the model's output rate, identically to visible output tokens. The API response reports the count under `usage.completion_tokens_details.reasoning_tokens` — log this field on every call or you have no visibility into your actual cost shape. The `total_tokens` field includes reasoning tokens in the output sum.

Is o3 cheaper than o1?

Yes — dramatically. o3 prices at $2 input / $8 output per 1M tokens; o1 (now deprecated) was $15 / $60. That's a 7.5x reduction on both input and output, or roughly 87% off. The same workload that cost $24,600/month on o1 lands at ~$3,280/month on o3 with quality at parity or better. Every o1 workload should be migrated to o3. See: https://venturebeat.com/ai/openai-announces-80-price-drop-for-o3-its-most-powerful-reasoning-model

o3 vs o3-mini pricing — when is mini enough?

o3-mini at $0.55 / $2.20 per 1M tokens is roughly 4x cheaper on input and 3.6x cheaper on output than o3. It also generates 30-50% fewer reasoning tokens for typical tasks. Default policy: ship every new reasoning workload on o3-mini first, run a 200-sample eval against o3, keep mini if accuracy is within 3 points. Mini handles structured extraction, mid-complexity code, classification with hard edge cases. Move to full o3 for long-horizon agentic planning, proof-style math, or 200+ line code synthesis.

How do I reduce my o3 API cost?

Six levers: (1) set `reasoning_effort: 'low'` as your default and lift only when needed; (2) cap `max_completion_tokens` so a runaway scratchpad can't bill 80k tokens; (3) drop to o3-mini wherever the eval allows; (4) pre-decompose multi-step tasks into chat-model + one reasoning-model call; (5) keep system prompts short (no cached-input discount on o-series — every token bills full rate every call); (6) build a router that sends only truly reasoning-bound queries to o3 and routes the rest to gpt-5.4-mini at $0.50 / $1.50.

o3 vs DeepSeek R1 cost — which is cheaper?

Nearly identical at the headline rate. DeepSeek-R1 is $0.55 / $2.19 per 1M tokens — essentially the same as o3-mini's $0.55 / $2.20. The gap to full o3 ($2 / $8) is roughly 4x in DeepSeek's favor. DeepSeek-R1 also offers a published 90% cache-hit input discount that o-series does not. For pure cost on heavy reasoning workloads, R1 wins; for tool-use, function-calling depth, and OpenAI-ecosystem integration, o3 or o3-mini still wins. See our DeepSeek cost calculator for the full open-weights cost picture.

Why is there no cached-input discount on o-series?

As of June 2026, OpenAI has not published cached-input pricing for the o-series. Every input token on o3 bills at the full $2/M rate regardless of cache state — there is no $0.20/M cached-tier like on gpt-5.5. The structural workaround: keep system prompts short on reasoning models (every token costs full price every call), and consider a 'reason once, serve from cache' architecture where you pre-compute the reasoning step with o3 and serve subsequent identical-shape queries from a chat-model + retrieval pipeline. Watch the OpenAI changelog — if cached-input ships for o-series, the cost math shifts materially.

Stop overpaying on reasoning tokens.

o-series bills 5-15x chat models on identical token volumes. Our AI Prompt Generator writes reasoning-tuned prompts that minimize thinking-token bloat — based on YOUR business + task. 14-day free trial, no card.

Browse all prompt tools →