Model card · Verified against OpenAI docs · 2026-06-20

o3: Full Spec Sheet (June 2026)

By The DDH Team at Digital Dashboard Hub · Updated June 20, 2026

o3 is OpenAI's production reasoning model, released in mid-2025 as the successor to the o1 family and the broader reasoning-model line. It is the model that brought reasoning-class quality into the price range where production deployment makes sense: $2/M input, $8/M output, $0.50/M cached input — roughly 1.3% of o1-pro's price for the bulk of o1-pro's quality on standard reasoning benchmarks.

Headline numbers: $2 per 1M input tokens, $8 per 1M output, $0.50 per 1M for cached input (75% off). Context window is 200,000 tokens. Max output (including reasoning tokens) is 100,000 tokens per response. Knowledge cutoff is June 2024. Modalities are text + image input; text output. Function calling, parallel tool calls, structured outputs, prompt caching, the Batch API (50% off), and streaming are all supported. The Responses API is the recommended endpoint.

Below: full spec table, when o3 is the right call vs GPT-5 with reasoning_effort or o1-pro, the minimal API request, and 8 FAQs.

o3 — Full spec sheet (June 2026)

Feature	o3 spec
Provider	OpenAI
Model ID (API)	o3
Released	Mid-2025 (general availability)
Input price (per 1M)	$2.00
Cached input price (per 1M)	$0.50 (75% off)
Output price (per 1M)	$8.00
Reasoning tokens billing	Billed at output rate ($8/M)
Batch API discount	50% off input + output
Context window	200,000 tokens
Max output tokens (incl. reasoning)	100,000 tokens
Modalities (input)	Text, image
Modalities (output)	Text
Function calling	Supported
Parallel tool calls	Supported
Structured outputs (JSON Schema)	Supported
Streaming	Supported
Prompt caching (automatic)	Supported
Vision (image understanding)	Supported
Reasoning effort control	low / medium / high
Knowledge cutoff	June 2024
Endpoint	/v1/responses, /v1/chat/completions

Sources verified 2026-06-20: OpenAI model page (https://platform.openai.com/docs/models/o3), OpenAI pricing page (https://openai.com/api/pricing), OpenAI Responses API reference (https://platform.openai.com/docs/api-reference/responses). o3 pricing was cut significantly in mid-2025 (originally launched at $10/$40 per 1M; current $2/$8 reflects a post-launch reduction). Re-verify the live pages before budgeting.

What o3 actually is (and why it changed the reasoning-model menu)

o3 is OpenAI's third generation of reasoning model in the post-GPT-4 era: the successor to o1 (Sep 2024) and o3-mini (Jan 2025), preceded by the experimental o3-preview that OpenAI quietly retired ahead of o3 GA. The release brought two things that o1 didn't have: production-grade pricing ($2/$8 vs o1's ~$15/$60) and a full feature surface (streaming, prompt caching, the standard chat completions endpoint).

Functionally, o3 is a reasoning model in the same lineage as o1 and o1-pro — it produces a long internal chain of thought before writing the visible answer. Reasoning tokens bill at the output rate ($8/M on o3) and are not returned to you. The key behavioral difference from non-reasoning models (GPT-5, Claude, Gemini): you don't scaffold the chain of thought yourself. Prompt o3 with the bare problem statement; the model handles the reasoning structure internally.

o3 has been the production default for reasoning-class workloads on OpenAI since mid-2025. It supplanted o1 entirely (o1 is deprecated for new code as of June 2026) and made o1-pro a narrow-niche tool for the hardest single-call decisions, where another 5-10% quality gain has to justify a roughly 75× list-price premium ($150/$600 vs $2/$8 per 1M).

Pricing math: what o3 actually costs per call

Standard rates: `cost = (input_tokens / 1M) × $2 + (output_tokens / 1M) × $8`. The representative 1,000-in / 500-out call WITHOUT reasoning tokens: `0.001 × $2 + 0.0005 × $8 = $0.002 + $0.004 = $0.006`. About 0.6¢ per call — same order of magnitude as GPT-5 ($0.00625 on the same call).

The reasoning-token line item is where o3 differs from GPT-5. A 1,000-in / 500-visible-out o3 call with `reasoning_effort: medium` typically burns 2,000-5,000 reasoning tokens. At $8/M output, 3,000 reasoning tokens add $0.024, which brings the call to about $0.030: five times the no-reasoning figure, and typically several times the cost of the same prompt on GPT-5 with `reasoning_effort: low`.
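The arithmetic in the two paragraphs above can be sketched as a small estimator. This is a minimal sketch, not an official billing formula: the function name `o3_call_cost` and its parameters are made up for illustration, and the rates are the ones quoted on this page, so re-check them against the live pricing page.

```python
# Per-1M-token rates quoted on this page (USD). Assumed current; verify before budgeting.
O3_INPUT_PER_M = 2.00
O3_OUTPUT_PER_M = 8.00  # reasoning tokens are billed at this rate too


def o3_call_cost(input_tokens: int, visible_output_tokens: int, reasoning_tokens: int = 0) -> float:
    """Estimate the USD cost of one uncached, non-batch o3 call."""
    billed_output = visible_output_tokens + reasoning_tokens
    input_cost = (input_tokens / 1_000_000) * O3_INPUT_PER_M
    output_cost = (billed_output / 1_000_000) * O3_OUTPUT_PER_M
    return input_cost + output_cost


base = o3_call_cost(1_000, 500)
with_reasoning = o3_call_cost(1_000, 500, reasoning_tokens=3_000)
print(f"no reasoning tokens:    ${base:.4f}")
print(f"3,000 reasoning tokens: ${with_reasoning:.4f}")
```

Because reasoning tokens are not returned in the response body, the realistic input to an estimator like this is the reasoning-token count reported in the usage data of past calls rather than a guess.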

Apply prompt caching: o3 supports automatic prompt caching with the same prefix-first mechanics as GPT-5. Cached input bills at $0.50/M (75% off). For workloads with a stable system prefix, caching cuts the input bill by up to 75%; the realized saving scales with the share of input tokens that hit the cache, so an 80% hit rate yields a 60% reduction.

Batch API: 50% off both input and output. Asynchronous reasoning workloads (daily evals, weekly research synthesis, monthly compliance analysis) should always run on batch. The cost stack of caching + batching can bring o3's effective per-call cost below GPT-5's `reasoning_effort: high` cost for the same problem class.
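To see how the caching and batch discounts combine, here is a rough estimator. It is a sketch under stated assumptions: `o3_effective_cost` and `cached_fraction` are illustrative names, and the code assumes the batch discount applies as a flat 50% on top of the cached-input rate, which you should confirm against your own invoices.

```python
INPUT_PER_M = 2.00         # uncached input, USD per 1M tokens (rate quoted on this page)
CACHED_INPUT_PER_M = 0.50  # cached input, USD per 1M tokens
OUTPUT_PER_M = 8.00        # visible output + reasoning tokens
BATCH_MULTIPLIER = 0.5     # assumption: batch halves the whole bill, cached input included


def o3_effective_cost(input_tokens: int, billed_output_tokens: int,
                      cached_fraction: float = 0.0, batch: bool = False) -> float:
    """Estimate USD cost for one call given a cache hit share and batch usage."""
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    cached = input_tokens * cached_fraction
    uncached = input_tokens - cached
    cost = (uncached / 1e6) * INPUT_PER_M
    cost += (cached / 1e6) * CACHED_INPUT_PER_M
    cost += (billed_output_tokens / 1e6) * OUTPUT_PER_M
    return cost * BATCH_MULTIPLIER if batch else cost


# 10,000 input tokens, 3,500 billed output tokens (500 visible + 3,000 reasoning).
plain = o3_effective_cost(10_000, 3_500)
cached = o3_effective_cost(10_000, 3_500, cached_fraction=0.8)
cached_batch = o3_effective_cost(10_000, 3_500, cached_fraction=0.8, batch=True)
print(f"plain: ${plain:.4f}  cached: ${cached:.4f}  cached + batch: ${cached_batch:.4f}")
```

Note that caching only touches the input side. On reasoning-heavy calls the output line dominates, so the batch discount usually moves the total more than caching does.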

Reasoning effort on o3

o3 exposes `reasoning_effort` with three levels: `low`, `medium` (default), and `high`. Same parameter shape as GPT-5 but without the `minimal` option (reasoning models always reason at least a small amount).

`low`: 500-2,000 reasoning tokens typically. Use for routine problem-solving, structured analysis, code review tasks where the answer is mechanical but benefits from a brief reasoning pass.

`medium` (default): 2,000-8,000 reasoning tokens. The right default for general reasoning workloads — complex code synthesis, multi-step analysis, math problems of moderate difficulty.

`high`: 5,000-30,000+ reasoning tokens. Use for the hardest problems where correctness is the constraint and you've explicitly budgeted for the higher per-call cost. Most teams over-use `high` — profile your actual quality gains before defaulting to it.

Always cap with `max_output_tokens`. The default ceiling (100K total output including reasoning) is a guardrail, not a target. Set explicit caps based on the realistic output budget for the task.
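One way to make the effort level and the output cap explicit on every call is to build the request arguments in a single place. The helper below is a sketch: `build_o3_request` is a hypothetical name, while the argument shape (`reasoning={'effort': ...}`, `max_output_tokens`) mirrors the calls shown elsewhere on this page.

```python
VALID_EFFORTS = ("low", "medium", "high")  # o3 has no "minimal" level
O3_MAX_OUTPUT = 100_000                    # ceiling quoted on this page, reasoning included


def build_o3_request(prompt: str, effort: str = "medium", max_output_tokens: int = 10_000) -> dict:
    """Return keyword arguments for a Responses API call with an explicit cap."""
    if effort not in VALID_EFFORTS:
        raise ValueError(f"effort must be one of {VALID_EFFORTS}, got {effort!r}")
    if not 0 < max_output_tokens <= O3_MAX_OUTPUT:
        raise ValueError(f"max_output_tokens must be in 1..{O3_MAX_OUTPUT}")
    return {
        "model": "o3",
        "input": prompt,
        "reasoning": {"effort": effort},
        "max_output_tokens": max_output_tokens,
    }


request = build_o3_request("Review this function for off-by-one errors.", effort="low", max_output_tokens=4_000)
print(request)
```

The result can be passed as `client.responses.create(**request)`. Keep in mind that the cap covers reasoning tokens as well as the visible answer, so a cap set too close to the expected answer length can leave no room for the answer itself.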

Vision, function calling, and structured outputs on o3

o3 accepts text + image input via the standard OpenAI message format. Vision reasoning on o3 is strong on tasks that require step-by-step interpretation of visual data — chart analysis, multi-step diagrams, complex visual logic puzzles. For simple image classification or single-step VQA, the reasoning premium isn't worth it; use GPT-5 or gpt-5-mini.

Function calling is fully supported via the standard `tools` parameter. Parallel tool calls are on by default. Reasoning models invoke tools within their reasoning chain — o3 can issue a tool call, see the result, reason about it, and issue another tool call before producing the final answer.
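The tool-call round trip described above can be sketched without touching the network. The example assumes the Responses API conventions of a flat function-tool definition and `function_call` / `function_call_output` items; `get_weather`, `REGISTRY`, and `run_tool_calls` are hypothetical names, and the model output is faked with `SimpleNamespace` objects so the snippet runs offline.

```python
import json
from types import SimpleNamespace


def get_weather(city: str) -> str:
    """Hypothetical local tool; a real one would query a weather service."""
    return f"Sunny in {city}"


# Tool definition to pass in the `tools` parameter of the request.
WEATHER_TOOL = {
    "type": "function",
    "name": "get_weather",
    "description": "Return a short weather summary for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
        "additionalProperties": False,
    },
}

REGISTRY = {"get_weather": get_weather}


def run_tool_calls(output_items, registry):
    """Execute every function_call item and build the items to send back."""
    results = []
    for item in output_items:
        if getattr(item, "type", None) != "function_call":
            continue  # skip reasoning and message items
        args = json.loads(item.arguments)  # arguments arrive as a JSON string
        result = registry[item.name](**args)
        results.append({
            "type": "function_call_output",
            "call_id": item.call_id,
            "output": str(result),
        })
    return results


# Faked model output: one reasoning item followed by one tool call.
fake_output = [
    SimpleNamespace(type="reasoning"),
    SimpleNamespace(type="function_call", name="get_weather",
                    arguments='{"city": "Oslo"}', call_id="call_1"),
]
tool_results = run_tool_calls(fake_output, REGISTRY)
print(tool_results)
```

In a live loop, the returned items would go into the `input` of a follow-up request, repeating until the response contains no further `function_call` items.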

Structured outputs (JSON Schema enforcement) work via the standard `text.format` parameter on Responses API or `response_format` on chat completions. Same shape as GPT-5; outputs are guaranteed to validate against the schema.
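The two endpoints nest the schema differently, which is easy to get wrong. The sketch below builds both shapes from one schema; the helper names and the `verdict` schema are invented for illustration, and the field layout reflects the Responses API and chat completions formats as I understand them, so confirm against the current API reference.

```python
def responses_text_format(name: str, schema: dict) -> dict:
    """Value for the `text` parameter of the Responses API (flat layout)."""
    return {"format": {"type": "json_schema", "name": name, "schema": schema, "strict": True}}


def chat_response_format(name: str, schema: dict) -> dict:
    """Value for `response_format` on chat completions (nested `json_schema` key)."""
    return {"type": "json_schema", "json_schema": {"name": name, "schema": schema, "strict": True}}


# Hypothetical schema: strict mode expects every property to be listed in
# `required` and `additionalProperties` set to False.
VERDICT_SCHEMA = {
    "type": "object",
    "properties": {
        "verdict": {"type": "string", "enum": ["pass", "fail"]},
        "reason": {"type": "string"},
    },
    "required": ["verdict", "reason"],
    "additionalProperties": False,
}

text_param = responses_text_format("verdict", VERDICT_SCHEMA)
response_format_param = chat_response_format("verdict", VERDICT_SCHEMA)
print(text_param["format"]["type"], response_format_param["type"])
```

With the Responses API this would be used as `client.responses.create(model='o3', input=..., text=text_param)`.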

Streaming is supported — the visible answer tokens stream as they're produced, after internal reasoning completes. The first byte of streaming output typically arrives 2-30 seconds into the call depending on reasoning depth.
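When streaming from the Responses API, the visible text arrives as a sequence of typed events. The collector below is a sketch that assumes text chunks come as events with `type == 'response.output_text.delta'` and a `delta` attribute; the event stream is faked here so the snippet runs offline, and `collect_visible_text` is an illustrative name.

```python
from types import SimpleNamespace


def collect_visible_text(events) -> str:
    """Concatenate the text deltas from a stream of response events."""
    parts = []
    for event in events:
        if getattr(event, "type", None) == "response.output_text.delta":
            parts.append(event.delta)
    return "".join(parts)


# Faked stream: lifecycle events carry no text, delta events do.
fake_stream = [
    SimpleNamespace(type="response.created"),
    SimpleNamespace(type="response.output_text.delta", delta="Assume finitely "),
    SimpleNamespace(type="response.output_text.delta", delta="many primes."),
    SimpleNamespace(type="response.completed"),
]
print(collect_visible_text(fake_stream))
```

With a real client, the iterable would be the object returned by `client.responses.create(..., stream=True)`. For a live UI you would print each delta as it arrives instead of joining them at the end.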

When to pick o3 vs GPT-5 vs o1-pro

**Pick o3** when: the task is structured reasoning at production scale — code synthesis with non-trivial logic, mathematical problem solving, structured analysis with multi-step inference, agentic workflows that need genuine reasoning per turn. o3 is the right default for any 'reasoning-class' workload that runs more than a few times per day.

**Pick GPT-5 with `reasoning_effort: high`** when: you want reasoning quality close to o3 in a model with the full GPT-5 feature surface (broader knowledge, slightly better instruction following on non-reasoning tasks, native multimodal across the same input types). At list prices, GPT-5 high-reasoning is ~$0.06 per call for the same 1,000-in / 500-out + 5K-reasoning workload that costs ~$0.05 on o3, so the two are in the same cost range. Pick GPT-5 when the task mix is heterogeneous; pick o3 when reasoning is consistently the bottleneck.
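A quick list-price comparison for the workload named above can be sketched as follows. The GPT-5 rates ($1.25 input / $10 output per 1M) are the ones quoted in this page's FAQ, and the sketch assumes both models burn exactly 5,000 reasoning tokens, which will not hold in practice; `RATES` and `call_cost` are illustrative names.

```python
# (input, output) USD per 1M tokens, as quoted on this page. Verify before relying on them.
RATES = {
    "o3": (2.00, 8.00),
    "gpt-5": (1.25, 10.00),
}


def call_cost(model: str, input_tokens: int, visible_output_tokens: int, reasoning_tokens: int) -> float:
    """List-price cost of one uncached call; reasoning bills at the output rate."""
    input_rate, output_rate = RATES[model]
    billed_output = visible_output_tokens + reasoning_tokens
    return (input_tokens / 1e6) * input_rate + (billed_output / 1e6) * output_rate


costs = {model: call_cost(model, 1_000, 500, 5_000) for model in RATES}
for model, cost in costs.items():
    print(f"{model}: ${cost:.4f}")
```

The gap between the two comes almost entirely from the output rate, so the comparison flips only if one model needs materially fewer reasoning tokens than the other on your workload. That is worth measuring rather than assuming.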

**Pick o1-pro** only when: you've benchmarked all three and o1-pro wins on your specific workload by a margin that justifies 30-60× the per-call cost, AND the wall-clock latency (60-180 seconds, no streaming) is acceptable. Narrow use case; most teams don't need it. See our o1-pro spec sheet for the explicit comparison.

How o3 differs from o1 (and why you should migrate)

If you're on o1 and considering whether to migrate to o3: yes. o1 is deprecated for new code as of June 2026, and OpenAI will eventually sunset it. o3 is dramatically cheaper ($2/$8 vs o1's ~$15/$60), faster, and typically produces equal or better quality on standard reasoning benchmarks.

Migration is trivial: change `model='o1'` to `model='o3'` in your API calls. Same parameter shape, same reasoning-effort dial, same response structure. No prompt rewrites needed for most workloads.

Run a side-by-side eval before the cutover: sample 100-300 production inputs, compare o1 vs o3 outputs, and blind-score them. Most teams find o3 wins on quality at 13% of o1's price. If your eval shows o3 losing on a subset, route that subset to GPT-5 `reasoning_effort: high` (much cheaper than o1-pro) or to o1-pro for the narrow band where the cost pencils out.
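The blind-scoring step can be sketched as below. This is one possible setup, not a prescribed methodology: `blind_pairs` and `tally` are hypothetical helpers, the outputs are assumed to be already collected from both models, and the votes would come from human raters or a grading script of your choice.

```python
import random
from collections import Counter


def blind_pairs(inputs, o1_outputs, o3_outputs, seed=0):
    """Randomize which model appears on the left so raters cannot tell them apart."""
    rng = random.Random(seed)  # fixed seed keeps the assignment reproducible
    pairs = []
    for prompt, o1_out, o3_out in zip(inputs, o1_outputs, o3_outputs):
        o3_on_left = rng.random() < 0.5
        left, right = (o3_out, o1_out) if o3_on_left else (o1_out, o3_out)
        pairs.append({
            "prompt": prompt,
            "left": left,
            "right": right,
            "left_is": "o3" if o3_on_left else "o1",  # hidden from raters
        })
    return pairs


def tally(pairs, votes):
    """Map each 'left' / 'right' / 'tie' vote back to the model it favored."""
    counts = Counter()
    for pair, vote in zip(pairs, votes):
        if vote == "tie":
            counts["tie"] += 1
        elif vote == "left":
            counts[pair["left_is"]] += 1
        else:
            counts["o1" if pair["left_is"] == "o3" else "o3"] += 1
    return counts


prompts = ["q1", "q2", "q3", "q4"]
pairs = blind_pairs(prompts, ["a1", "a2", "a3", "a4"], ["b1", "b2", "b3", "b4"])
# Placeholder votes for the demo: a rater who always picks the left answer.
print(tally(pairs, ["left"] * len(pairs)))
```

Slicing the tally by input category is what surfaces the subsets where o3 loses.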

Verified sources and how to re-check the numbers

Every number on this page was verified against OpenAI's live documentation on 2026-06-20. Sources: platform.openai.com/docs/models/o3 for context, modalities, and feature support; openai.com/api/pricing for input/output/cached prices; platform.openai.com/docs/api-reference/responses for the Responses API contract.

o3 pricing was cut significantly post-launch (originally $10/$40 per 1M; current $2/$8 reflects a mid-2025 reduction). OpenAI does not publish formal pricing changelogs — watch openai.com/api/pricing directly if your monthly o3 spend exceeds $500. Further reductions are plausible as the reasoning-model architecture matures.

Methodology: when a number could not be cross-confirmed against an official OpenAI page on the verification date, it was omitted from this card rather than guessed.

Make your first o3 API call in 5 steps

1. Get an OpenAI API key
platform.openai.com → dashboard → API keys → Create new secret key. Copy to `.env` as `OPENAI_API_KEY=...`. o3 is available on API tier 1+ (basic verified account).

2. Install or update the SDK
Python: `pip install openai`. Node: `npm install openai`. The latest stable SDK release supports o3, the Responses API, structured outputs, vision input, prompt caching, and streaming; no further version pinning is needed.

3. Send a minimal call
Python: `from openai import OpenAI; client = OpenAI(); r = client.responses.create(model='o3', input='Prove that there are infinitely many primes.', reasoning={'effort': 'medium'}); print(r.output_text)`. Streaming is supported: use `stream=True` for token-by-token visible output.

4. Cap output and set explicit reasoning effort
For production cost predictability: `client.responses.create(model='o3', input=prompt, reasoning={'effort': 'low'}, max_output_tokens=10000)`. Use reasoning effort `low` for routine problems, `medium` (default) for general reasoning, and `high` only when correctness dominates cost.

5. Add structured outputs and let caching activate
Force typed output on the Responses API: `text={'format': {'type': 'json_schema', 'name': 'result', 'schema': {...}, 'strict': True}}`, where `'result'` is a placeholder name (the nested `json_schema` key belongs to the chat completions `response_format` shape). Structure prompts prefix-first (stable system + tools at the start) so automatic prompt caching activates on the cacheable prefix, which bills the cached portion of the input at 75% off.
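Pulled together, the call from the steps above fits in one small wrapper. This is a sketch: `ask_o3` is a hypothetical helper that takes the client as an argument, and a stub client stands in for the real one so the snippet runs without an API key or network access.

```python
from types import SimpleNamespace


def ask_o3(client, prompt: str, effort: str = "medium", max_output_tokens: int = 10_000) -> str:
    """Send one capped Responses API call to o3 and return the visible answer."""
    response = client.responses.create(
        model="o3",
        input=prompt,
        reasoning={"effort": effort},
        max_output_tokens=max_output_tokens,
    )
    return response.output_text


class StubResponses:
    """Offline stand-in for `client.responses`; records what it was asked."""

    def create(self, **kwargs):
        self.last_kwargs = kwargs
        return SimpleNamespace(output_text=f"[stub reply to: {kwargs['input']}]")


stub_client = SimpleNamespace(responses=StubResponses())
print(ask_o3(stub_client, "Prove that there are infinitely many primes.", effort="low"))

# Real usage (needs `pip install openai` and OPENAI_API_KEY in the environment):
#   from openai import OpenAI
#   print(ask_o3(OpenAI(), "Prove that there are infinitely many primes."))
```

Passing the client in, rather than creating it inside the function, keeps the wrapper easy to test and lets one configured client be shared across calls.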

Frequently Asked Questions

How much does o3 cost in 2026?

$2 per 1M input tokens, $8 per 1M output tokens, $0.50 per 1M for cached input (75% off). Reasoning tokens bill at the output rate ($8/M). Batch API takes another 50% off both streams. A representative 1,000-in / 500-visible-out call with 3,000 reasoning tokens costs ~$0.03. Source: openai.com/api/pricing, verified 2026-06-20.

What is the difference between o3 and o1-pro?

Same reasoning-model family. o3 is the production successor; o1-pro is a pro-tier high-effort configuration of the prior generation. o3 is $2/$8 per 1M and o1-pro is $150/$600, which makes o3 about 75× cheaper at list price. o3 supports streaming, prompt caching, and the full standard API surface; o1-pro doesn't. For most reasoning workloads, o3 is the right pick over o1-pro by a wide margin. See our o1-pro spec sheet.

Should I migrate from o1 to o3?

Yes. o1 is deprecated for new code as of June 2026 and OpenAI will eventually sunset it. o3 is dramatically cheaper ($2/$8 vs o1's ~$15/$60), supports the full feature surface (streaming, prompt caching), and produces equal or better quality on standard reasoning benchmarks. Migration is a model-ID swap (`'o1'` → `'o3'`) plus an eval pass.

What is o3's context window?

200,000 tokens. Max output (including reasoning tokens) is 100,000 tokens per response. Smaller than GPT-5 (400K context, 128K output) and dramatically smaller than Gemini 2.5 Pro (1M).

Does o3 support prompt caching?

Yes — automatic and prefix-based, same mechanics as GPT-5. Cached input bills at $0.50/M (75% off). Structure prompts prefix-first (stable system + tools at the start, dynamic user content at the end) and caching activates automatically. No code changes required for most SDKs.

What is reasoning_effort on o3?

An API parameter controlling how many internal reasoning tokens o3 burns before producing the visible answer. Three levels: `low` (500-2,000 reasoning tokens, routine problems), `medium` (2,000-8,000, default, general reasoning), `high` (5,000-30,000+, hardest problems where correctness dominates cost). Reasoning tokens bill at the output rate ($8/M).

Does o3 support vision?

Yes — text + image input via the standard OpenAI message format. Strong on tasks requiring step-by-step interpretation of visual data (chart analysis, complex diagrams, visual logic puzzles). For simple image classification or single-step VQA, the reasoning premium isn't justified — use GPT-5 or gpt-5-mini instead.

Should I use o3 or GPT-5 with high reasoning effort?

Comparable cost and quality at the high end. o3 at $2/$8 per 1M and GPT-5 at $1.25/$10 per 1M with `reasoning_effort: high` produce similar quality on reasoning benchmarks. Pick GPT-5 when the task mix is heterogeneous (some reasoning, some classification, some chat) — the unified model is easier to operate. Pick o3 when reasoning is consistently the bottleneck — the dedicated reasoning model is slightly more reliable on hard problems.
