Skip to contentNew: Does ChatGPT recommend your brand? Free 60-second AI visibility check →
Model card · Verified against OpenAI docs · 2026-06-20

o1-pro: Full Spec Sheet (June 2026)

By The DDH Team at Digital Dashboard HubUpdated

Stop writing AI prompts from scratch.

Tell us your business + your task + your model. We write the prompt — perfectly tuned for ChatGPT, Claude, Grok, Gemini, Midjourney, or any model. Plus 500+ pre-built prompts in your library.

14 days, no card. Cancel in 2 clicks.

o1-pro is OpenAI's premium reasoning model, released in December 2024 as part of the original o1 family. It is the highest-effort reasoning configuration OpenAI exposes through the API — more compute per query than o1 (standard), more reasoning depth than any tier of GPT-5, with pricing to match. Where the consumer-facing 'ChatGPT Pro' tier ($200/month) gates access to o1-pro in the chat UI, the API exposes it directly to developers as model ID `o1-pro` via the Responses API.

Headline numbers: $150 per 1M input tokens, $600 per 1M output tokens. No cached-input pricing — o1-pro does not support automatic prompt caching as of June 2026. Context window is 200,000 tokens. Max output (including reasoning tokens, which are billed at the output rate but not returned to you) is 100,000 tokens. Knowledge cutoff is October 2023. Modalities are text + vision input; text output only. Function calling, structured outputs, and the Responses API are supported. Streaming is not supported on o1-pro (responses are returned as a single block once reasoning completes).

Below: full spec table, when o1-pro is the right call vs GPT-5 with `reasoning_effort: high` (much cheaper) or o3 (also cheaper), the minimal API request, and 8 FAQs. Sibling pages: o3 spec sheet · GPT-5 spec sheet · o1 reasoning cost calculator. Write an o1-pro-tuned prompt free with our ChatGPT prompt generator.

Digital Dashboard Hub

Writing good prompts for ONE AI is hard. Writing them for GPT-5, Claude, Gemini, Perplexity, Midjourney and 6 more is a full-time job. DDH's AI Prompt Builder writes once, runs everywhere — locked to your niche, voice, and brand tone.

Free 14 days, no card.

o1-pro — Full spec sheet (June 2026)

Feature
o1-pro spec
ProviderOpenAI
Model ID (API)o1-pro
ReleasedDecember 2024
Input price (per 1M)$150.00
Cached input priceNot supported
Output price (per 1M)$600.00
Reasoning tokens billingBilled at output rate ($600/M)
Batch API discount50% off (where supported)
Context window200,000 tokens
Max output tokens (incl. reasoning)100,000 tokens
Modalities (input)Text, image
Modalities (output)Text
Function calling
Structured outputs (JSON Schema)
Streaming
Prompt caching
Vision (image understanding)
Knowledge cutoffOctober 2023
Endpoint/v1/responses

Sources verified 2026-06-20: OpenAI model page (https://platform.openai.com/docs/models/o1-pro), OpenAI pricing page (https://openai.com/api/pricing). o1-pro is only available via the Responses API endpoint; it is not available on the Chat Completions API. Streaming and prompt caching are not supported on o1-pro as of June 2026. Re-verify the live pages before budgeting — o1-pro pricing is the most likely model to see significant downward revisions over time.

What o1-pro actually is (and why it's so expensive)

o1-pro is OpenAI's highest-effort reasoning configuration. The base o1 model (released September 2024) introduced 'reasoning models' — models that produce a long internal chain of thought before writing the visible answer. o1-pro takes the same base model and runs it with substantially more reasoning compute per query: more tokens of internal reasoning, longer wall-clock per response, higher quality on the hardest tasks at correspondingly higher cost.

Pricing-wise, o1-pro is in a tier by itself. $150 input / $600 output per 1M is roughly 60× more expensive than GPT-5 ($1.25/$10) on input and 60× more expensive on output. A 1,000-in / 500-out call costs `0.001 × $150 + 0.0005 × $600 = $0.15 + $0.30 = $0.45` per call — and that's before reasoning tokens.

Reasoning tokens are the hidden cost. o1-pro can burn 10,000-50,000+ reasoning tokens on a complex problem before writing the visible answer. Those tokens bill at the output rate ($600/M) but are not returned to you. A query with 30,000 reasoning tokens + 500 visible output tokens bills `30,500 × $600/M = $18.30` for the output portion alone. A single hard query can cost $5-20.

OpenAI launched o1-pro at this price point because the unit economics of pro-tier reasoning compute are extreme — the model effectively runs the same prompt many times internally to find the best answer. As reasoning-model architecture matures, prices have fallen (o3 at $2/$8 per 1M is dramatically cheaper than o1-pro for many of the same workloads). Expect o1-pro's price to drop or the model to be deprecated in favor of higher-quality successors over 2026-2027.


Pricing math: when o1-pro pencils and when it doesn't

Base case: a 2,000-token problem with 20,000 reasoning tokens + 1,000 visible output. `(0.002 × $150) + (0.021 × $600) = $0.30 + $12.60 = $12.90` per call. Compare with GPT-5 `reasoning_effort: high` on the same problem: `(0.002 × $1.25) + (0.020 × $10) = $0.0025 + $0.20 = $0.2025`. o1-pro is ~64× more expensive on identical token volumes.

When o1-pro pencils: a single high-stakes decision that one wrong answer would cost >$100 to fix downstream. Legal contract analysis where a missed clause costs $50K. Medical diagnosis decision support where a wrong call costs a treatment cycle. Algorithmic-trading strategy synthesis where a bad backtest costs the day's P&L.

When o1-pro does NOT pencil: anything that runs at production volume. Anything where GPT-5 `reasoning_effort: high` or o3 closes 90% of the quality gap at <2% of the price. Anything where the cost of error is bounded (a chat response, a content draft, a code suggestion).

Most teams who experiment with o1-pro discover that the marginal quality gain over GPT-5 + high reasoning effort or o3 is not worth the 30-60× price premium for their actual workloads. The honest use case for o1-pro is narrow and shrinking as cheaper reasoning models close the gap. Worked $: o1 reasoning cost calculator.


What o1-pro doesn't support (the parameter gotchas)

**Streaming is not supported.** o1-pro returns the response as a single block once internal reasoning completes. For a hard problem with 30K reasoning tokens, that's 60-180 seconds of wall-clock before the first byte arrives. Plan UX accordingly — a 'thinking…' placeholder visible to the user, async job submission patterns, or both.

**Prompt caching is not supported.** Unlike GPT-5 (where caching automatically activates on stable prefixes), o1-pro charges full input rate on every call regardless of prefix stability. There is no cost optimization via caching available.

**Temperature, top_p, presence_penalty, frequency_penalty are ignored.** Reasoning models on OpenAI ignore the sampling parameters that work on GPT-5 / GPT-4 — the model controls its own internal sampling strategy as part of reasoning. The API accepts the parameters for compatibility but does not apply them.

**No system messages in the traditional sense.** o1-pro uses 'developer' messages (a new role introduced for reasoning models) for instructions that previously went in `system`. The Responses API handles this distinction automatically; chat completions pre-o1 patterns need adaptation.

**Only the Responses API.** o1-pro is not exposed on `/v1/chat/completions`. Must use `/v1/responses`. Migration from GPT-4o chat code requires API surface changes, not just a model-ID swap.


Reasoning tokens: budget like output, plan for explosion

Reasoning tokens on o1-pro bill at the output rate ($600/M) but are not returned to you. They are visible only in the response's `usage.output_tokens_details.reasoning_tokens` field after the fact — you cannot see them mid-stream (since streaming isn't supported anyway).

Typical reasoning token counts on o1-pro per query class: simple Q&A 500-2,000 reasoning tokens, moderate analysis 3,000-10,000, hard reasoning 15,000-40,000, math/proof tasks can hit 50K+. The variance is high — the same prompt run twice can produce 5K vs 25K reasoning tokens depending on how the internal search converges.

Cap with `max_output_tokens` to prevent runaway cost on the long tail. Setting `max_output_tokens: 20000` caps the combined reasoning + visible output at 20K tokens. The model truncates reasoning rather than producing a malformed visible answer. For production cost predictability, treat max_output_tokens as a hard guardrail.


When to pick o1-pro vs o3 vs GPT-5 high-reasoning

**Pick o1-pro** only when: a single decision is worth >$100 of cost-of-error, AND you've benchmarked it against o3 and GPT-5 `reasoning_effort: high` and o1-pro wins on the eval, AND the wall-clock latency (60-180 seconds) is acceptable for the use case. This is a narrow band; most teams discover they don't need o1-pro after running the comparison.

**Pick o3** ($2/$8 per 1M) for production reasoning at scale. o3 closes most of o1-pro's quality gap at ~1% of the price and supports streaming, prompt caching, and the full standard API surface. See o3 spec sheet for the side-by-side.

**Pick GPT-5 with `reasoning_effort: high`** when: you want reasoning depth in a model that also supports streaming, prompt caching, and the unified GPT-5 feature set. ~1% of o1-pro's price for typically 80-95% of o1-pro's quality on most tasks.

Honest assessment: o1-pro's economic case is shrinking. As of June 2026, it is the right call for fewer workloads than it was at launch in December 2024. Re-evaluate quarterly against the latest reasoning-model menu.


Vision and function calling on o1-pro

o1-pro supports text + image input. Pass images as URLs or base64-encoded data inside a user (or developer) message's content array. Vision reasoning on o1-pro is among the strongest of any production model — multi-image reasoning, chart analysis, diagram interpretation, complex visual logic puzzles. The price premium that doesn't pencil for text-only tasks is more defensible for vision-reasoning tasks where the cost-of-error is real.

Function calling is supported via the Responses API's standard tools mechanism. Parallel tool calls are supported. Structured outputs via the `text.format` parameter (JSON Schema enforcement) work as on GPT-5.

The Responses API supports stateful conversations (`previous_response_id` parameter) so you can chain o1-pro calls without re-sending the full conversation history. Given o1-pro's input price ($150/M), this is meaningful — replaying a 5,000-token conversation history bills $0.75 per turn at input alone. Use stateful mode to avoid that.


Verified sources and how to re-check the numbers

Every number on this page was verified against OpenAI's live documentation on 2026-06-20. Sources: platform.openai.com/docs/models/o1-pro for context, modalities, parameter support; openai.com/api/pricing for input/output prices.

o1-pro pricing has been stable since launch in December 2024. Expect downward revisions or a successor model in 2026-2027 — the unit economics of pro-tier reasoning will improve as the architecture matures, and OpenAI's pricing typically follows compute-cost reductions.

Methodology: when a number could not be cross-confirmed against an official OpenAI page on the verification date, it was omitted from this card rather than guessed.

Make your first o1-pro API call in 5 steps

  1. 1

    Confirm you actually need o1-pro

    Run your problem against GPT-5 (`reasoning_effort: high`) and o3 first. Benchmark all three on the same 30-50 representative inputs. If o1-pro wins by a margin that justifies 30-60× the price on your eval, proceed. If not, pick the cheapest model that hits your quality bar.

  2. 2

    Get an OpenAI API key with API access tier

    platform.openai.com → dashboard → API keys. o1-pro requires API tier 1+ (a basic verified account). High-volume use cases benefit from higher tiers for rate limits. Set `OPENAI_API_KEY=...` in `.env`.

  3. 3

    Use the Responses API only

    Python: `from openai import OpenAI; client = OpenAI(); r = client.responses.create(model='o1-pro', input='Your hard problem here', max_output_tokens=20000); print(r.output_text)`. Note: streaming is not supported, expect 60-180 second wall-clock before the response arrives.

  4. 4

    Set max_output_tokens as a hard guardrail

    o1-pro reasoning tokens can run away. `max_output_tokens=20000` caps combined reasoning + visible output at 20K tokens — about $12 worst-case. Without a cap, a runaway reasoning chain can produce $30-50 calls. Always set the cap in production.

    → Open the ChatGPT prompt generator
  5. 5

    Use stateful conversations to avoid input replay

    Pass `previous_response_id` on follow-up calls instead of re-sending the full conversation history. At $150/M input, replaying a 10K-token history is $1.50 per turn. Stateful mode amortizes that cost across the conversation.

Frequently Asked Questions

How much does o1-pro cost in 2026?

$150 per 1M input tokens, $600 per 1M output tokens. Reasoning tokens bill at the output rate ($600/M) but are not returned to you. No cached-input pricing. A 1,000-in / 500-out call without reasoning tokens costs ~$0.45; the same call with 20,000 reasoning tokens costs ~$12.90. Source: openai.com/api/pricing, verified 2026-06-20.

What is the difference between o1 and o1-pro?

Same base reasoning model. o1-pro runs with substantially more internal reasoning compute per query — more tokens of reasoning, longer wall-clock, higher quality on the hardest tasks. Price reflects the difference: o1 is ~$15/$60 per 1M; o1-pro is $150/$600 — 10× more expensive. For most teams, o1 or the newer o3 ($2/$8) is the right pick over o1-pro.

What is o1-pro's context window?

200,000 tokens. Max output (including reasoning tokens) is 100,000 tokens per response. The reasoning tokens are the dominant output budget consumer for hard problems — typical hard queries burn 10K-40K reasoning tokens before producing the visible answer.

Does o1-pro support streaming?

No. o1-pro returns the response as a single block once internal reasoning completes. For a hard query with 30K reasoning tokens, expect 60-180 seconds of wall-clock before the first byte arrives. Plan UX with explicit 'thinking' state or async job submission patterns.

Does o1-pro support prompt caching?

No. Unlike GPT-5 (where prompt caching automatically activates on stable prefixes for 90% off the cached portion), o1-pro charges the full $150/M input rate on every call. There is no cost optimization via caching.

What temperature and sampling parameters does o1-pro accept?

The API accepts `temperature`, `top_p`, `presence_penalty`, `frequency_penalty` for backward compatibility but does NOT apply them. Reasoning models control their own internal sampling strategy as part of reasoning. Only `max_output_tokens`, `tools`, `tool_choice`, and `text.format` (structured outputs) materially affect o1-pro behavior.

Where is o1-pro available?

OpenAI API (Responses endpoint only — not chat completions) and ChatGPT Pro ($200/month consumer tier). API and consumer billing are separate — a ChatGPT Pro subscription does NOT include API o1-pro credit, and an API tier high enough to use o1-pro does not give access to the ChatGPT Pro UI.

Should I migrate from o1-pro to o3?

Probably yes for most workloads. o3 is $2/$8 per 1M — about 1% of o1-pro's price — and closes most of o1-pro's quality gap on standard benchmarks. o3 supports streaming, prompt caching, and the full standard API surface. Run a side-by-side on your specific workload before committing; for the narrow class of tasks where o1-pro wins, the price premium can still be justified.

o1-pro is the most expensive reasoning model on the menu. Make every token count.

Our AI Prompt Generator writes o1-pro-tuned prompts (problem statement upfront, no CoT scaffolding, structured-output ready, max-output capped) based on YOUR business + task. 14-day free trial of DDH Pro, no card.

Browse all prompt tools →