Model card · Verified against OpenAI docs · 2026-06-20

GPT-5 mini: Full Spec Sheet (June 2026)

By The DDH Team at Digital Dashboard Hub · Updated June 20, 2026

gpt-5-mini is OpenAI's mid-tier sibling to GPT-5, released alongside the flagship in August 2025. It has the same context window, the same modalities (text + image input), the same feature set (function calling, structured outputs, parallel tool calls, prompt caching, the Responses API), and the same reasoning-effort parameter. The meaningful differences are the underlying model size, the quality on hard tasks that follows from it, and the price. The knowledge cutoff is also a few months earlier.

Headline numbers: $0.25 per 1M input tokens, $2 per 1M output, $0.025 per 1M for cached input (90% off). That is 5× cheaper than GPT-5 on both input and output. Context window is 400,000 tokens combined; max output is 128,000 tokens. Knowledge cutoff is May 31, 2024. The same Responses API endpoint serves it.

For production teams running over 100,000 calls/month, gpt-5-mini is usually the right default. It fits classification, extraction, summarization, structured-data transformation, routine chat, and content scaffolding: anything where the GPT-5 flagship would be overpaying for capability you don't use. Below: the full spec table, when to pick it over GPT-5 or Claude Sonnet 4.6, a 5-step migration guide, and 8 FAQs.

gpt-5-mini — Full spec sheet (June 2026)

Feature	gpt-5-mini spec
Provider	OpenAI
Model ID (API)	gpt-5-mini
Released	August 2025
Input price (per 1M)	$0.25
Cached input price (per 1M)	$0.025 (90% off)
Output price (per 1M)	$2.00
Batch API discount	50% off input + output
Context window (input + output)	400,000 tokens
Max output tokens	128,000 tokens
Modalities (input)	Text, image
Modalities (output)	Text
Function calling	Yes
Parallel tool calls	Yes
Structured outputs (JSON Schema)	Yes
Streaming	Yes
Prompt caching (automatic)	Yes
Vision (image understanding)	Yes
Reasoning effort control	minimal / low / medium / high
Knowledge cutoff	May 31, 2024
Endpoint	/v1/responses, /v1/chat/completions

Sources verified 2026-06-20: OpenAI model page (https://platform.openai.com/docs/models/gpt-5-mini), OpenAI pricing page (https://openai.com/api/pricing). Prices change without notice — re-verify before budgeting.

What gpt-5-mini actually is (vs gpt-5)

gpt-5-mini is a smaller, faster, cheaper variant of GPT-5 built on the same architecture and training pipeline. OpenAI does not publish parameter counts for either model, but in practice gpt-5-mini is roughly the GPT-5 family's equivalent of gpt-4o-mini in the GPT-4 family — same instruction-following discipline, weaker on hard reasoning, dramatically cheaper.

Crucially, the feature surface is identical: gpt-5-mini supports the same function calling, parallel tool calls, structured outputs (JSON Schema-validated), prompt caching, the Responses API endpoint, vision input, and reasoning-effort control. Anything you write for GPT-5 runs on gpt-5-mini with a single model-ID change. The difference shows up on tasks that need multi-step reasoning, complex code synthesis, or strict factual accuracy.
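To make the "single model-ID change" concrete, here is a minimal sketch that builds two Responses API request bodies with structured output enabled and checks which fields differ. Nothing is sent over the network. The ticket schema, the `build_payload` helper, and the prompt text are hypothetical examples, not part of OpenAI's docs; the `text.format` shape reflects our understanding of the Responses API and should be checked against the live reference.

```python
import json

# Hypothetical JSON Schema for a support-ticket classifier (illustrative only).
TICKET_SCHEMA = {
    "type": "object",
    "properties": {
        "category": {"type": "string", "enum": ["billing", "bug", "other"]},
        "urgent": {"type": "boolean"},
    },
    "required": ["category", "urgent"],
    "additionalProperties": False,
}


def build_payload(model: str, ticket_text: str) -> dict:
    """Build a Responses API request body; nothing is sent over the network."""
    return {
        "model": model,
        "instructions": "Classify the support ticket.",
        "input": ticket_text,
        "text": {
            "format": {
                "type": "json_schema",
                "name": "ticket_label",
                "schema": TICKET_SCHEMA,
                "strict": True,
            }
        },
    }


flagship = build_payload("gpt-5", "I was charged twice this month.")
mini = build_payload("gpt-5-mini", "I was charged twice this month.")

# Top-level keys whose values differ between the two request bodies.
changed = sorted(k for k in flagship if flagship[k] != mini[k])
print(changed)  # ['model']
print(json.dumps(mini, indent=2))
```

Keeping the payload builder parameterized on the model ID also makes it easy to run the same inputs through both tiers during an eval.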

OpenAI's positioning: gpt-5-mini is the 'default for production' tier. Most chat assistants, structured-data extractors, classification pipelines, content scaffolders, and routing agents should live here. Reserve GPT-5 for the small fraction of traffic where cost-of-error genuinely dominates per-call cost.

Pricing math: what gpt-5-mini actually costs per call

Standard rates: `cost = (input_tokens / 1M) × $0.25 + (output_tokens / 1M) × $2`. The representative 1,000-in / 500-out call: `0.001 × $0.25 + 0.0005 × $2 = $0.00025 + $0.001 = $0.00125`. About 0.125¢ per call — one-fifth the GPT-5 cost on identical tokens.

At 1,000,000 calls/month with that profile, the standard rate runs $1,250 vs GPT-5's $6,250. On prompts that carry a stable 1,500-token system prefix, an 80% prompt-cache hit rate cuts the input cost further. For asynchronous workloads (summarization, classification, evaluation), the Batch API takes another 50% off both input and output.

Real-world example: a customer-support classification pipeline at 2M calls/month averaging 800-in / 100-out on gpt-5-mini runs `0.0008 × $0.25 + 0.0001 × $2 = $0.0004` per call, or $800/month. If roughly 80% of input tokens are billed at the cached rate, that drops to about $500/month. OpenAI's automatic caching applies only to prompts of at least 1,024 tokens, so check that your prompts are long enough to qualify. On GPT-5, the same uncached workload is $4,000/month. Model choice is the largest cost lever in the GPT-5 family by a wide margin.
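The arithmetic in this section can be reproduced with a small calculator. This is a sketch, not an official pricing tool: the gpt-5-mini rates come from the spec table, the GPT-5 rates are derived from the 5× figure on this page, the GPT-5 cached rate assumes the same 90% discount, and `cached_fraction` (the share of input tokens billed at the cached rate) is a parameter we introduce for illustration.

```python
# USD per 1M tokens. gpt-5-mini rates are from the spec table on this page.
PRICES = {
    "gpt-5-mini": {"input": 0.25, "cached_input": 0.025, "output": 2.00},
    # Derived from the page's 5x claim; cached rate is an assumption (90% off).
    "gpt-5": {"input": 1.25, "cached_input": 0.125, "output": 10.00},
}


def call_cost(model, input_tokens, output_tokens, cached_fraction=0.0):
    """Cost in USD of one call; cached_fraction is the share of cached input."""
    p = PRICES[model]
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    total = (
        fresh * p["input"]
        + cached * p["cached_input"]
        + output_tokens * p["output"]
    )
    return total / 1_000_000


def monthly_cost(model, calls, input_tokens, output_tokens, cached_fraction=0.0):
    """Monthly cost in USD for a uniform call profile."""
    return calls * call_cost(model, input_tokens, output_tokens, cached_fraction)


# Representative 1,000-in / 500-out call.
print(f"${call_cost('gpt-5-mini', 1_000, 500):.5f}")  # $0.00125

# Support classifier: 2M calls/month at 800-in / 100-out.
print(f"${monthly_cost('gpt-5-mini', 2_000_000, 800, 100):,.2f}")       # $800.00
print(f"${monthly_cost('gpt-5-mini', 2_000_000, 800, 100, 0.8):,.2f}")  # $512.00
print(f"${monthly_cost('gpt-5', 2_000_000, 800, 100):,.2f}")            # $4,000.00
```

The 0.8 cached fraction is one assumption that lands near the ~$500 figure above; your real hit rate depends on how stable the prompt prefix is.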

Context window and output cap — same shape as GPT-5

gpt-5-mini ships with the same 400,000-token context window as GPT-5 and the same 128,000-token max output ceiling. There is no smaller-context variant; OpenAI standardized the window across the GPT-5 family so prompt-engineering work is portable across tiers.

Practical implication: prompts that work on GPT-5 fit on gpt-5-mini without context-related rewrites. The price gap is purely a quality gap, not a context gap. If you're moving traffic from GPT-5 to gpt-5-mini for cost reasons, the migration is a model-ID swap plus an eval pass — not a prompt rewrite.

As always, cap output. Set `max_output_tokens` to the realistic ceiling for your task (1,500 for chat, 4,000 for code, 8,000 for long-form). The default ceiling (128K) is a defensive guard, not a target.
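A minimal request with the cap applied might look like the sketch below. It only prepares the HTTP request with the standard library and does not send it. The `OUTPUT_CAPS` table and `build_request` helper are our own illustrative names, and reading the key from `OPENAI_API_KEY` is an assumption about your environment. As far as we know, `max_output_tokens` also counts reasoning tokens, so a very tight cap combined with a high reasoning effort can truncate the visible answer.

```python
import json
import os
import urllib.request

# Output ceilings per task type, taken from the guidance above.
OUTPUT_CAPS = {"chat": 1_500, "code": 4_000, "long_form": 8_000}


def build_request(task: str, prompt: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST to the Responses API endpoint."""
    body = {
        "model": "gpt-5-mini",
        "input": prompt,
        "max_output_tokens": OUTPUT_CAPS[task],
    }
    return urllib.request.Request(
        "https://api.openai.com/v1/responses",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumes the API key is stored in the OPENAI_API_KEY variable.
            "Authorization": f"Bearer {os.environ.get('OPENAI_API_KEY', '')}",
        },
        method="POST",
    )


req = build_request("chat", "Summarize our refund policy in two sentences.")
sent_body = json.loads(req.data)
print(req.get_method(), req.full_url, sent_body["max_output_tokens"])
# To actually send it: urllib.request.urlopen(req)
```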

Where gpt-5-mini wins and where it loses to GPT-5

**Wins (use gpt-5-mini)**: classification, named-entity extraction, summarization of structured input, format conversions (JSON ↔ YAML ↔ Markdown), simple chat assistance, routing/dispatch agents, content scaffolding (outlines, first drafts), structured-output pipelines, internal telemetry classification, document tagging, sentiment analysis. Quality on these is indistinguishable from GPT-5 on a held-out eval for most teams.

**Loses (use GPT-5)**: complex code synthesis with multi-file context, math proofs, multi-step planning that requires backtracking, legal/financial analysis where correctness is non-negotiable, fine-grained vision reasoning (counting objects in dense scenes, reading small text), competitive coding-benchmark-level tasks. The 5× price premium for GPT-5 is justified when these are the bottleneck.

Run an eval before you commit. Take 200 representative inputs from your production traffic, run both models, blind-score the outputs. Most teams discover gpt-5-mini covers 70-90% of traffic with no measurable quality drop and reserve GPT-5 for the rest — typically routed via a simple complexity classifier.
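The routing idea can be sketched in a few lines. This is a deliberately naive stand-in for a real complexity classifier: the task labels are hypothetical names we made up to mirror the wins/loses lists above, and the sample traffic mix is invented for illustration.

```python
from collections import Counter

# Hypothetical task labels mirroring the "Loses (use GPT-5)" list above.
FLAGSHIP_TASKS = frozenset(
    {
        "multi_file_code",
        "math_proof",
        "multi_step_planning",
        "legal_financial",
        "dense_vision",
    }
)


def pick_model(task_type: str) -> str:
    """Default to gpt-5-mini; escalate only for known hard task types."""
    return "gpt-5" if task_type in FLAGSHIP_TASKS else "gpt-5-mini"


# Invented traffic sample: mostly mechanical work plus a few hard requests.
traffic = ["classification"] * 6 + ["chat"] * 2 + ["multi_file_code", "math_proof"]

routed = Counter(pick_model(t) for t in traffic)
mini_share = routed["gpt-5-mini"] / len(traffic)
print(dict(routed), f"{mini_share:.0%}")
```

In production the task label would come from your own classifier or request metadata; the point is that the default branch is the cheap model and escalation is the exception.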

Compared to: Claude Sonnet 4.6 and Gemini 2.5 Flash

gpt-5-mini at $0.25 / $2 undercuts both Claude Sonnet 4.6 ($3 / $15; Anthropic's mid-tier, slightly higher quality, much higher price) and Gemini 2.5 Flash ($0.30 / $2.50; Google's mid-tier, comparable price, 1M context).

vs **Claude Sonnet 4.6**: 12× cheaper on input, 7.5× cheaper on output. Sonnet wins on long-form writing voice and instruction-following discipline on complex prompts. Prompt caching is roughly a wash, since both vendors discount cached reads by 90%. gpt-5-mini wins on raw throughput and price.

vs **Gemini 2.5 Flash**: comparable price ($0.25 vs $0.30 input), comparable output ($2 vs $2.50), but Flash has a 1M context window vs gpt-5-mini's 400K. Flash also supports native audio input. gpt-5-mini wins on structured outputs (OpenAI's JSON Schema enforcement is more mature) and the broader OpenAI tooling ecosystem.

Cross-tier: at one-fifth the price of GPT-5, gpt-5-mini covers the same feature set; most teams should default here and escalate only on demonstrated need.
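The per-token ratios above blend differently depending on your input/output mix. The sketch below prices one 1,000-in / 500-out call on each model using the list prices quoted in this section (GPT-5's derived from the 5× figure). The dictionary keys are display labels for this example, not necessarily valid API model IDs.

```python
# (input, output) in USD per 1M tokens, as quoted in this section.
# Keys are labels for this comparison only, not guaranteed API model IDs.
PRICES = {
    "gpt-5-mini": (0.25, 2.00),
    "gpt-5": (1.25, 10.00),  # derived from the page's 5x claim
    "claude-sonnet-4.6": (3.00, 15.00),
    "gemini-2.5-flash": (0.30, 2.50),
}


def cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Uncached, non-batch cost in USD for a single call."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


baseline = cost("gpt-5-mini", 1_000, 500)
ratios = {m: cost(m, 1_000, 500) / baseline for m in PRICES}

for model, ratio in ratios.items():
    print(f"{model:<18} ${cost(model, 1_000, 500):.5f}  {ratio:.1f}x")
```

Swap in your own token profile: output-heavy workloads pull the blended Sonnet ratio toward 7.5×, input-heavy ones toward 12×.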

Reasoning effort on gpt-5-mini

gpt-5-mini exposes the same `reasoning_effort` parameter as GPT-5 (`minimal`, `low`, `medium`, `high`); on the Responses API it is passed as `reasoning.effort`. The same rules apply: reasoning tokens bill at the output rate, are not returned to you, and can dramatically inflate cost if defaulted to `high`.

On gpt-5-mini specifically, `minimal` is the right default for high-volume mechanical tasks (classification, extraction). The cost-per-call stays under $0.001 for typical inputs. `low` is the right default for chat and content generation. `medium` and `high` are usually wasted on gpt-5-mini — if the task needs serious reasoning, the better move is escalating to GPT-5 with `medium`, not running gpt-5-mini at `high`.

Common mistake: leaving `reasoning_effort` at the default (`medium`) on a classification workload at 1M+ calls/month. The reasoning tokens can silently inflate the bill 2-3×. Always set the effort level explicitly on production prompts.
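To see how reasoning tokens move the bill, the sketch below adds them to the billed output for an 800-in / 100-out classification call. The per-effort reasoning-token counts are invented placeholders, not measurements; read the real counts from the usage data in your own API responses before drawing conclusions.

```python
INPUT_PRICE = 0.25   # USD per 1M input tokens (spec table)
OUTPUT_PRICE = 2.00  # USD per 1M output tokens; reasoning tokens bill here too


def call_cost(input_tokens, visible_output_tokens, reasoning_tokens=0):
    """Cost in USD of one call, with reasoning tokens billed as output."""
    billed_output = visible_output_tokens + reasoning_tokens
    return (input_tokens * INPUT_PRICE + billed_output * OUTPUT_PRICE) / 1_000_000


# Hypothetical reasoning-token counts per effort level (placeholders only).
ASSUMED_REASONING_TOKENS = {"minimal": 0, "low": 100, "medium": 300, "high": 1_000}

base = call_cost(800, 100)
multipliers = {
    effort: call_cost(800, 100, tokens) / base
    for effort, tokens in ASSUMED_REASONING_TOKENS.items()
}

for effort, mult in multipliers.items():
    tokens = ASSUMED_REASONING_TOKENS[effort]
    print(f"{effort:<8} ${call_cost(800, 100, tokens):.4f}  {mult:.1f}x")
```

Under these assumed counts, `medium` lands at 2.5× the `minimal` cost, which is the kind of silent multiplier described above.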

Verified sources and how to re-check the numbers

Every number on this page was verified against OpenAI's live documentation on 2026-06-20. Sources: platform.openai.com/docs/models/gpt-5-mini, openai.com/api/pricing, platform.openai.com/docs/api-reference/responses.

Prices on the GPT-5 family have moved once since launch (a downward adjustment on cached input in late 2025). OpenAI does not version their pricing page with changelog entries. Re-verify the live page quarterly if your monthly bill exceeds $500.

If you find a discrepancy with the live page, treat the live page as canonical. We re-fetch and update this card monthly.

Switch from GPT-5 to gpt-5-mini in 5 steps

1. **Pick a representative slice of production traffic.** Sample 200-500 real inputs from the last week of your gpt-5 traffic. Stratify across task types (classification, extraction, chat, code) so the eval covers your actual workload, not just the easy cases.
2. **Run both models on the same inputs.** Identical prompt, identical parameters, only the model ID changes (`gpt-5` → `gpt-5-mini`). Log both outputs and track cost: gpt-5-mini is 5× cheaper per token, so the delta on identical tokens is the lower bound on savings.
3. **Blind-score the outputs.** Have a human or an LLM judge score outputs without knowing which model produced which. On most production tasks (classification, extraction, summarization, routine chat), gpt-5-mini scores within 2-5% of gpt-5. That delta is usually within the noise of the eval.
4. **Set explicit reasoning_effort + max_output_tokens.** On production gpt-5-mini calls: `reasoning_effort='minimal'` for classification/extraction, `'low'` for chat. Cap `max_output_tokens` to the realistic ceiling for the task. Both prevent silent cost creep.
5. **Migrate the model ID, monitor for 7 days.** Change `model='gpt-5'` to `model='gpt-5-mini'` in production. Monitor your quality metrics (acceptance rate, manual-review rate, downstream error rate) for 7 days. If quality holds, the migration is done. If it slips on a subset of traffic, route that subset back to gpt-5 via a complexity classifier.
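The blind-scoring step can be prototyped before any judge is wired in. The sketch below shuffles the presentation order per item so the scorer never sees a model name, then unblinds and averages. The `exact_match` scorer is a trivial stand-in for a human or LLM judge, and the sample labels and outputs are invented.

```python
import random
from statistics import mean


def exact_match(output: str, expected: str) -> float:
    """Stand-in scorer; a real eval would use a human or LLM judge here."""
    return 1.0 if output.strip().lower() == expected else 0.0


def blind_eval(expected, outputs_by_model, scorer, seed=0):
    """Score each output without exposing the model name to the scorer."""
    rng = random.Random(seed)
    scores = {model: [] for model in outputs_by_model}
    for i, want in enumerate(expected):
        candidates = [(model, outs[i]) for model, outs in outputs_by_model.items()]
        rng.shuffle(candidates)  # random presentation order per item
        for model, text in candidates:
            # The scorer receives only the text and the reference answer.
            scores[model].append(scorer(text, want))
    return {model: mean(vals) for model, vals in scores.items()}


# Invented sample: four classification inputs and each model's output.
expected = ["billing", "bug", "other", "billing"]
outputs = {
    "gpt-5": ["billing", "bug", "other", "billing"],
    "gpt-5-mini": ["billing", "bug", "other", "bug"],
}

results = blind_eval(expected, outputs, exact_match)
delta = results["gpt-5"] - results["gpt-5-mini"]
print(results, f"delta={delta:.2f}")
```

With only four items the delta is meaningless; on the 200-500 input sample from step 1 it becomes the number you compare against the cost savings.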

Frequently Asked Questions

How much does gpt-5-mini cost in 2026?

$0.25 per 1M input tokens, $2 per 1M output tokens, $0.025 per 1M for cached input (90% off). Batch API takes another 50% off both streams. A representative 1,000-in / 500-out call costs ~$0.00125 — about 5× cheaper than GPT-5 on identical tokens. Source: openai.com/api/pricing, verified 2026-06-20.

What is the difference between gpt-5 and gpt-5-mini?

Same context window (400K), same modalities (text + image input), same feature set (function calling, structured outputs, parallel tool calls, prompt caching, reasoning_effort). The difference is model size, output quality, and price. gpt-5-mini is 5× cheaper on both input and output. Use gpt-5-mini as the default; escalate to gpt-5 for complex code synthesis, multi-step planning, or correctness-critical tasks.

What is gpt-5-mini's context window?

400,000 tokens combined input + output, with a 128,000-token max output ceiling. Identical to gpt-5 — OpenAI standardized the window across the GPT-5 family so prompts are portable across tiers.

Does gpt-5-mini support function calling and structured outputs?

Yes — full parity with gpt-5. Function calling, parallel tool calls, structured outputs (JSON Schema validation guaranteed by the API), the Responses API, streaming, and prompt caching are all supported.

What is gpt-5-mini's knowledge cutoff?

May 31, 2024 per OpenAI's model card. Slightly earlier than gpt-5's cutoff (September 30, 2024). For anything after May 2024, provide context in the prompt or use a web-search tool call.

Is gpt-5-mini available in ChatGPT or only the API?

Primarily the API. ChatGPT users on Free and Plus tiers see a mix of gpt-5 and gpt-5-mini depending on load and rate limits, but the model selection UI in ChatGPT does not expose gpt-5-mini as an explicit choice. For deterministic gpt-5-mini use, call the API.

Should I use gpt-5-mini or gpt-4o-mini?

gpt-5-mini in almost every case. Both sit in the budget tier, but the prices are not the same: gpt-4o-mini is $0.15/$0.60 against gpt-5-mini's $0.25/$2, so gpt-5-mini costs about 1.7× more on input and 3.3× more on output. In exchange you get the full GPT-5 feature set, larger context (400K vs 128K), better structured-output enforcement, and the unified reasoning_effort parameter. gpt-4o-mini remains available for backward compatibility on legacy fine-tunes.

Can I fine-tune gpt-5-mini?

OpenAI has not opened public fine-tuning on gpt-5-mini as of June 2026. Fine-tuning remains available on gpt-4.1, gpt-4o, and gpt-4o-mini. For most use cases, gpt-5-mini + structured outputs + a well-engineered prompt closes the quality gap fine-tuning would address. Check platform.openai.com/docs/guides/fine-tuning for current model availability.
