Model card · Verified against OpenAI docs · 2026-06-20

GPT-5: Full Spec Sheet (June 2026)

By The DDH Team at Digital Dashboard Hub·Updated June 20, 2026

Stop writing AI prompts from scratch.

Tell us your business + your task + your model. We write the prompt — perfectly tuned for ChatGPT, Claude, Grok, Gemini, Midjourney, or any model. Plus 500+ pre-built prompts in your library.

GPT-5 is OpenAI's flagship general-purpose model, released August 2025. It replaced GPT-4o and the GPT-4.5 preview line and consolidated reasoning, multimodal input, and tool use into a single endpoint. As of June 2026 it is the default model for ChatGPT Plus, Pro, Team, and Enterprise, and the recommended choice on the API for any task that crossed the threshold where GPT-4o stopped being good enough.

The headline numbers: $1.25 per 1M input tokens, $10 per 1M output, $0.125 per 1M for cached input (a 90% discount on the cached prefix). The context window is 400,000 tokens — combined input + output — with a hard ceiling of 128,000 output tokens per response. Knowledge cutoff is September 30, 2024. Modalities are text and image input; text output only. Function calling, parallel tool calls, structured outputs (JSON Schema), the Responses API, prompt caching, and the Batch API (50% off) are all supported.

Below is the full spec table, the minimal cURL + Python request, when GPT-5 is the right call vs gpt-5-mini or gpt-5-nano, a side-by-side against Claude Opus 4.7 and Gemini 2.5 Pro, and the FAQs that cover every nuance the docs gloss over. Sibling pages: GPT-5 mini spec sheet · Claude Opus 4.7 spec sheet · Gemini 2.5 Pro spec sheet. Write a GPT-5-tuned prompt free with our ChatGPT prompt generator.

Digital Dashboard Hub

Writing good prompts for ONE AI is hard. Writing them for GPT-5, Claude, Gemini, Perplexity, Midjourney and 6 more is a full-time job. DDH's AI Prompt Builder writes once, runs everywhere — locked to your niche, voice, and brand tone.

Free 14 days, no card. →

GPT-5 — Full spec sheet (June 2026)

Feature	GPT-5 spec
Provider	OpenAI
Model ID (API)	gpt-5
Released	August 2025
Input price (per 1M)	$1.25
Cached input price (per 1M)	$0.125 (90% off)
Output price (per 1M)	$10.00
Batch API discount	50% off input + output
Context window (input + output)	400,000 tokens
Max output tokens	128,000 tokens
Modalities (input)	Text, image
Modalities (output)	Text
Function calling
Parallel tool calls
Structured outputs (JSON Schema)
Streaming
Prompt caching (automatic)
Vision (image understanding)
Reasoning effort control	minimal / low / medium / high
Knowledge cutoff	September 30, 2024
Endpoint	/v1/responses, /v1/chat/completions

Sources verified 2026-06-20: OpenAI model page (https://platform.openai.com/docs/models/gpt-5), OpenAI pricing page (https://openai.com/api/pricing), OpenAI Responses API reference (https://platform.openai.com/docs/api-reference/responses). Prices and limits change without notice — re-verify the live pages before budgeting.

What GPT-5 actually is (and what changed from GPT-4o)

GPT-5 is OpenAI's first model to ship reasoning, multimodal input, and tool use as a unified, single-model surface rather than three separate endpoints. Where GPT-4o, o1, and o3-mini were three distinct API surfaces in 2024-2025, GPT-5 collapses them: a single `gpt-5` model ID with a `reasoning_effort` parameter (`minimal`, `low`, `medium`, `high`) that scales how many internal reasoning tokens the model burns before answering.

Practically, this means you no longer pick a 'chat model' vs a 'reasoning model'. You pick GPT-5 and dial reasoning effort to match the task. A classification call uses `reasoning_effort: minimal` and bills like GPT-4o. A code-synthesis or math-proof call uses `reasoning_effort: high` and burns several thousand reasoning tokens — billed at the output rate even though they're not returned in the response.

Vision is built in: pass an image URL or base64-encoded image in any user message and GPT-5 will analyze it. Function calling, parallel tool calls, structured outputs (force the model to return JSON conforming to a JSON Schema), and prompt caching are all on by default. The Responses API (`/v1/responses`) is OpenAI's recommended endpoint for new code; chat completions still works for backward compatibility.

Pricing math: what GPT-5 actually costs per call

The pricing formula is the standard per-token model: `cost = (input_tokens / 1M) × $1.25 + (output_tokens / 1M) × $10`. A representative 1,000-token-in / 500-token-out call: `0.001 × $1.25 + 0.0005 × $10 = $0.00125 + $0.005 = $0.00625`. Roughly 0.6¢ per call.

Apply prompt caching: 80% of a 2,000-token system prompt cached across calls drops 1,600 input tokens from $1.25/M to $0.125/M — saving $0.0018 per call. At 100,000 calls/month, that is $180 off the bill at no quality cost.

Apply the Batch API: a JSONL upload of 1,000 requests with up-to-24-hour delivery runs at $0.625/M input and $5/M output — 50% off both streams. The 1,000-in / 500-out workload drops to $0.003125 per call.

Reasoning tokens are the hidden line item. With `reasoning_effort: high`, GPT-5 can burn 4,000-10,000 reasoning tokens before producing a 500-token visible answer. Those reasoning tokens bill at the output rate ($10/M) but are not returned to you. A high-reasoning call that produces a 500-token answer with 5,000 reasoning tokens bills 5,500 output tokens = $0.055 — 8.8× a no-reasoning call. Budget accordingly. For a full cost walkthrough across the GPT-5 family, see our OpenAI API cost calculator.

Context window: 400K total, 128K output ceiling

GPT-5's 400,000-token context window is combined input + output — not 400K of each. The hard ceiling on output is 128,000 tokens per response, so a maximum-length response leaves 272,000 tokens for input. In practice, most production workloads run with input in the 5K-50K range and output capped well below 4K via `max_output_tokens`, which is the configuration that hits the price-performance sweet spot.

400K is enough to fit roughly a 300,000-word document (≈600 pages of single-spaced text) or a full 1,500-line codebase chunk with metadata. It is less than Gemini 2.5 Pro's 1M window but more than Claude Opus 4.7's 200K. For retrieval-augmented workflows, GPT-5's window is comfortable. For long-document summarization where the entire input must fit in one shot, Gemini 2.5 Pro is the only frontier model with a larger window.

Cap output length always. The default `max_output_tokens` is the model maximum (128K), and a model that decides to ramble can run away in cost. Setting `max_output_tokens: 2000` for a normal Q&A request, `max_output_tokens: 8000` for code generation, and only lifting that cap when the task genuinely needs it (long-form report, full codebase review) is the discipline that separates predictable bills from surprise invoices.

Modalities: text in, text + image in, text out

GPT-5 accepts text and images as input. There is no audio input or output on the standard `gpt-5` endpoint (OpenAI's Realtime API uses a separate model for speech-to-speech). Image input is sent as either a URL OpenAI can fetch or a base64-encoded data URL inside a message's `content` array. Each image is billed at a fixed token cost depending on resolution detail (`low` is ~85 tokens, `high` is up to ~1,100 tokens per image for the standard tile size).

Output is text only — no native image generation. To generate images, call `gpt-image-1` (DALL·E 3's successor) or send a tool call from GPT-5 to your image-gen pipeline. For audio, integrate the Realtime API separately or use Whisper for transcription before passing text to GPT-5.

Vision quality on GPT-5 is meaningfully stronger than GPT-4o on charts, diagrams, handwritten text, and multi-image reasoning. It still struggles with very small text (license plates, dense screenshots at low resolution), and as with every vision model, OCR is better handled by a dedicated OCR tool when the volume justifies it.

Function calling, tools, and structured outputs

GPT-5 supports the full function-calling API surface: define tools as JSON Schema, GPT-5 picks one (or several, in parallel) and returns the arguments to call. Parallel tool calls let GPT-5 invoke multiple tools in a single response when it would speed up the task — e.g., fetching two API endpoints simultaneously. This is on by default; pass `parallel_tool_calls: false` to disable.

Structured outputs (introduced in 2024 for GPT-4o, refined in GPT-5) let you pass a `response_format: { type: 'json_schema', json_schema: {...} }` parameter and OpenAI guarantees the model's output validates against that schema. No more parsing free-form JSON and retrying on validation errors. Supports nested objects, arrays, enums, oneOf — the JSON Schema subset that grew across 2025.

The new Responses API (`/v1/responses`) is OpenAI's recommended endpoint for new integrations as of mid-2025. It supports stateful conversations (server-side message storage), built-in tools (file search, web search, code interpreter, computer use), and a cleaner streaming protocol. Chat completions (`/v1/chat/completions`) still works for everything GPT-5 does and remains the lowest-friction path for migrating from earlier GPT models.

Reasoning effort: GPT-5's most important parameter

GPT-5 exposes a `reasoning_effort` parameter with four levels: `minimal`, `low`, `medium` (default), and `high`. This is the dial that turns GPT-5 from a fast chat model into a slow reasoning model on a single call basis — no separate model ID needed.

`minimal`: near-zero reasoning tokens, fastest response, billed essentially as input + visible output. Use for classification, extraction, simple Q&A, format conversions — anything where the answer is mechanical.

`low`: a few hundred reasoning tokens. The sweet spot for general-purpose chat, content generation, and routine code tasks. Adds <$0.005 per call vs minimal on most workloads.

`medium` (default): typically 1,000-3,000 reasoning tokens. Use for analysis, multi-step planning, code synthesis where correctness matters.

`high`: 4,000-10,000+ reasoning tokens. Use for math proofs, complex code synthesis with strict correctness requirements, legal/financial analysis. Bills 5-10× a no-reasoning call. Most teams over-use `high` and under-use `low`; profile your traffic and right-size each prompt's effort level.

When to pick GPT-5 vs gpt-5-mini vs gpt-5-nano

**Pick GPT-5** when output quality is the bottleneck — code synthesis, complex reasoning, multi-step planning, vision tasks with high accuracy requirements, anything that ships to humans where 'good enough' wasn't good enough on gpt-4o. The 8× price premium over gpt-5-mini is justified when downstream cost-of-error dominates per-call cost.

**Pick gpt-5-mini** ($0.25 / $2 per 1M) when you need GPT-5's instruction following and structured outputs but the underlying task is mechanical: classification, extraction, summarization, structured-data transformation, simple chat. Most production workloads with >100K calls/month live on gpt-5-mini.

**Pick gpt-5-nano** ($0.05 / $0.40 per 1M, where available) for embedded use cases — intent routing, content moderation, autocomplete-style suggestions, internal telemetry classification. Avoid for anything requiring multi-step reasoning.

Cross-vendor: pick **Claude Opus 4.7** for long-form writing where Anthropic's voice and refusal calibration matters more than raw IQ; pick **Gemini 2.5 Pro** when you need a 1M context window in a single call or native video understanding. See GPT-5 vs Claude Opus 4.7 for the side-by-side.

Verified sources and how to re-check the numbers

Every number on this page was verified against OpenAI's live documentation on 2026-06-20. The canonical sources: platform.openai.com/docs/models/gpt-5 for context window, modalities, and parameter support; openai.com/api/pricing for input/output/cached prices; platform.openai.com/docs/api-reference/responses for the Responses API contract.

OpenAI does not version their pricing or model pages with explicit changelog entries. Prices have moved 3-5 times per year on average since GPT-4 launched in 2023, almost always downward as a model matures. Re-verify quarterly if your monthly bill exceeds $1,000 — a single price move shifts the budget materially at scale.

Methodology: when a number could not be cross-confirmed against the official OpenAI page on the verification date, it was omitted from this card rather than guessed. If you find a discrepancy against the live OpenAI page, treat the live page as canonical.

Make your first GPT-5 API call in 5 steps

1
Get an OpenAI API key
Sign in at platform.openai.com → dashboard → API keys → Create new secret key. Copy it to a `.env` file as `OPENAI_API_KEY=...`. Never commit keys to git.
2
Install the SDK
Python: `pip install openai`. Node: `npm install openai`. The SDK supports GPT-5, the Responses API, structured outputs, vision input, and prompt caching with no version-pinning needed beyond the latest stable release.
3
Send a minimal call
Python: `from openai import OpenAI; client = OpenAI(); r = client.responses.create(model='gpt-5', input='Explain caching prefixes in one sentence.'); print(r.output_text)`. That is the entire round-trip — model ID, input, response.
4
Add reasoning effort + max output cap
For predictable cost: `client.responses.create(model='gpt-5', input=prompt, reasoning={'effort': 'low'}, max_output_tokens=2000)`. Reasoning effort `low` is the right default for most chat-style workloads; cap output to keep the bill bounded.
→ Open the ChatGPT prompt generator
5
Add structured outputs for production
Force a typed response: pass `text={'format': {'type': 'json_schema', 'json_schema': {...}}}` in Responses API or `response_format` in chat completions. The model is guaranteed to return JSON that validates — no parse-and-retry loops.

Digital Dashboard Hub

The prompt patterns above work 10x better when they live in a library you actually own — tunable to your niche, exportable to GPT-5, Claude, Gemini, Perplexity, Midjourney, Llama. Stop pasting across 6 tools.

Try DDH's AI Prompt Builder — free 14 days, no card. →

Related calculators

OpenAI Pricing Calculator →GPT-5.5, 5.4, mini, nano — full per-call cost in one input.Claude Pricing Calculator →Opus 4.8, Sonnet 4.6, Haiku 4.5, Fable 5 — input + output combined.Context Window Comparison →Max input length and price per 1M for every current model.

Related prompt tools

ChatGPT prompt generator (GPT-5-tuned)→Code prompt builder (cache-anchored)→GPT-5 mini spec sheet→Claude Opus 4.7 spec sheet→OpenAI API cost calculator→

Frequently Asked Questions

How much does GPT-5 cost in 2026?

$1.25 per 1M input tokens, $10 per 1M output tokens, $0.125 per 1M for cached input (90% off). Batch API takes another 50% off both streams for asynchronous jobs with up-to-24-hour delivery. A representative 1,000-in / 500-out call costs ~$0.00625. Source: openai.com/api/pricing, verified 2026-06-20.

What is GPT-5's context window?

400,000 tokens — combined input + output. The maximum output per response is capped at 128,000 tokens, so a maximum-length response leaves 272,000 tokens for input. Larger than Claude Opus 4.7 (200K), smaller than Gemini 2.5 Pro (1M).

What is GPT-5's knowledge cutoff?

September 30, 2024 per OpenAI's model card. For anything after that date — events, releases, API changes — GPT-5 has no knowledge unless you provide it via context or a web-search tool call.

What is the difference between GPT-5 and GPT-5 mini?

Same context window (400K), same modalities (text + image input), same feature set (function calling, structured outputs, prompt caching). The difference is quality and price: GPT-5 is $1.25/$10 per 1M, mini is $0.25/$2 — about 5× cheaper. GPT-5 has stronger reasoning, code synthesis, and multi-step planning; mini is the right pick for high-volume mechanical tasks. See our GPT-5 mini spec sheet for the side-by-side.

Does GPT-5 support vision?

Yes. Pass images as URLs or base64 data URLs inside a user message's content array. GPT-5's vision is meaningfully stronger than GPT-4o on charts, diagrams, handwritten text, and multi-image reasoning. Output is text only — no native image generation; use `gpt-image-1` for that.

What is reasoning_effort and how do I use it?

GPT-5 exposes a `reasoning_effort` parameter with four levels: `minimal`, `low`, `medium` (default), `high`. It controls how many internal reasoning tokens GPT-5 burns before producing the visible answer. Reasoning tokens bill at the output rate. Use `minimal` for classification/extraction, `low` for chat, `medium` for analysis, `high` only when correctness dominates cost (proofs, complex code synthesis).

Is GPT-5 available in the API or only in ChatGPT?

Both. Model ID `gpt-5` is available on platform.openai.com via the Responses API (`/v1/responses`, recommended for new code) and chat completions (`/v1/chat/completions`, backward compatible). It is also the default model for ChatGPT Plus, Pro, Team, and Enterprise. API billing and ChatGPT subscription billing are separate — a ChatGPT Plus subscription does not include API credit.

Can I fine-tune GPT-5?

OpenAI has not opened public fine-tuning on GPT-5 as of June 2026 — fine-tuning is available on gpt-4.1, gpt-4o, and gpt-4o-mini. For most use cases, GPT-5 + structured outputs + a well-engineered prompt closes the quality gap that fine-tuning would address. Check platform.openai.com/docs/guides/fine-tuning for the current model availability.

Stop overpaying on GPT-5. Write prompts built for the model.

Our AI Prompt Generator writes GPT-5-tuned prompts (system+developer+user split, structured-output-ready, cache-anchored) based on YOUR business + task. 14-day free trial of DDH Pro, no card.

Browse all prompt tools →