Model card · Verified against OpenAI docs · 2026-06-20

GPT-5 mini: Full Spec Sheet (June 2026)

By The DDH Team at Digital Dashboard Hub


gpt-5-mini is OpenAI's mid-tier sibling to GPT-5, released alongside the flagship in August 2025. Same context window, same modalities (text + image input), same feature set (function calling, structured outputs, parallel tool calls, prompt caching, the Responses API), same reasoning-effort parameter. The only meaningful difference is the underlying model size — and the pricing that follows.

Headline numbers: $0.25 per 1M input tokens, $2 per 1M output, $0.025 per 1M for cached input (90% off). That is 5× cheaper than GPT-5 on both input and output. Context window is 400,000 tokens combined; max output is 128,000 tokens. Knowledge cutoff is May 31, 2024. The same Responses API endpoint serves it.

For production workloads above roughly 100,000 calls/month, gpt-5-mini is the right default. That covers classification, extraction, summarization, structured-data transformation, routine chat, and content scaffolding: anything where the GPT-5 flagship would mean paying for capability you don't use. Below: the full spec table, when to pick it over GPT-5 or Claude Sonnet 4.6, and 8 FAQs.


gpt-5-mini — Full spec sheet (June 2026)

| Feature | gpt-5-mini spec |
| --- | --- |
| Provider | OpenAI |
| Model ID (API) | `gpt-5-mini` |
| Released | August 2025 |
| Input price (per 1M tokens) | $0.25 |
| Cached input price (per 1M tokens) | $0.025 (90% off) |
| Output price (per 1M tokens) | $2.00 |
| Batch API discount | 50% off input + output |
| Context window (input + output) | 400,000 tokens |
| Max output tokens | 128,000 tokens |
| Modalities (input) | Text, image |
| Modalities (output) | Text |
| Function calling | Supported |
| Parallel tool calls | Supported |
| Structured outputs (JSON Schema) | Supported |
| Streaming | Supported |
| Prompt caching (automatic) | Supported |
| Vision (image understanding) | Supported |
| Reasoning effort control | minimal / low / medium / high |
| Knowledge cutoff | May 31, 2024 |
| Endpoint | `/v1/responses`, `/v1/chat/completions` |

Sources verified 2026-06-20: OpenAI model page (https://platform.openai.com/docs/models/gpt-5-mini), OpenAI pricing page (https://openai.com/api/pricing). Prices change without notice — re-verify before budgeting.

What gpt-5-mini actually is (vs gpt-5)

gpt-5-mini is a smaller, faster, cheaper variant of GPT-5 built on the same architecture and training pipeline. OpenAI does not publish parameter counts for either model, but in practice gpt-5-mini is roughly the GPT-5 family's equivalent of gpt-4o-mini in the GPT-4 family — same instruction-following discipline, weaker on hard reasoning, dramatically cheaper.

Crucially, the feature surface is identical: gpt-5-mini supports the same function calling, parallel tool calls, structured outputs (JSON Schema-validated), prompt caching, the Responses API endpoint, vision input, and reasoning-effort control. Anything you write for GPT-5 runs on gpt-5-mini with a single model-ID change. The difference shows up on tasks that need multi-step reasoning, complex code synthesis, or strict factual accuracy.

OpenAI's positioning: gpt-5-mini is the 'default for production' tier. Most chat assistants, structured-data extractors, classification pipelines, content scaffolders, and routing agents should live here. Reserve GPT-5 for the small fraction of traffic where cost-of-error genuinely dominates per-call cost.


Pricing math: what gpt-5-mini actually costs per call

Standard rates: `cost = (input_tokens / 1M) × $0.25 + (output_tokens / 1M) × $2`. The representative 1,000-in / 500-out call: `0.001 × $0.25 + 0.0005 × $2 = $0.00025 + $0.001 = $0.00125`. About 0.125¢ per call — one-fifth the GPT-5 cost on identical tokens.
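The formula above is easy to keep next to your code as a small helper. This is a minimal sketch that hard-codes the rates quoted on this page; the function and constant names are illustrative, and the rates should be re-checked against the live pricing page before you rely on them.

```python
# Rates quoted on this page, in USD per 1M tokens (re-verify before budgeting).
INPUT_RATE_PER_M = 0.25
OUTPUT_RATE_PER_M = 2.00


def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Standard-rate cost in USD of one gpt-5-mini call."""
    input_cost = (input_tokens / 1_000_000) * INPUT_RATE_PER_M
    output_cost = (output_tokens / 1_000_000) * OUTPUT_RATE_PER_M
    return input_cost + output_cost


# The representative 1,000-in / 500-out call from the text.
print(f"${call_cost(1_000, 500):.5f}")
```

Swapping in another model's rates gives a like-for-like comparison on the same token counts.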

At 1,000,000 calls/month with that profile, the standard rate runs $1,250 vs GPT-5's $6,250. Two further levers apply. Prompt caching bills the cached part of the input at $0.025 per 1M instead of $0.25, so a prompt with a stable 1,500-token system prefix and an 80% cache hit rate pays far less for input. The Batch API takes 50% off both input and output for asynchronous workloads (summarization, classification, evaluation).

Real-world example: a customer-support classification pipeline at 2M calls/month, averaging 800 tokens in and 100 out, costs `0.0008 × $0.25 + 0.0001 × $2 = $0.0002 + $0.0002 = $0.0004` per call on gpt-5-mini, or $800/month. With prompt caching activating on the stable system prefix, that falls to roughly $500/month. On GPT-5, the same workload is $4,000/month. The model choice is the largest cost lever in the GPT-5 family by a wide margin.
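To project a monthly bill with the discounts folded in, the same arithmetic can be wrapped in a function. This is a sketch under stated assumptions: `cached_fraction` is a hypothetical knob for the share of input tokens billed at the cached rate, and applying the batch discount to the whole call cost is an assumption, so check how the two discounts interact on your own invoices.

```python
# Rates quoted on this page, in USD per 1M tokens (re-verify before budgeting).
INPUT_RATE = 0.25
CACHED_INPUT_RATE = 0.025
OUTPUT_RATE = 2.00
BATCH_MULTIPLIER = 0.5  # Batch API: 50% off input and output


def monthly_cost(calls, input_tokens, output_tokens,
                 cached_fraction=0.0, batch=False):
    """Estimate monthly spend in USD for a uniform workload."""
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    cached = input_tokens * cached_fraction  # tokens billed at the cached rate
    fresh = input_tokens - cached            # tokens billed at the full rate
    per_call = (fresh * INPUT_RATE
                + cached * CACHED_INPUT_RATE
                + output_tokens * OUTPUT_RATE) / 1_000_000
    if batch:
        # Assumption: the batch discount applies to the whole call cost.
        per_call *= BATCH_MULTIPLIER
    return calls * per_call


print(round(monthly_cost(2_000_000, 800, 100), 2))                        # no discounts
print(round(monthly_cost(2_000_000, 800, 100, cached_fraction=0.75), 2))  # hypothetical cache share
```

With no discounts the 2M-call support pipeline comes out at $800/month. A hypothetical 75% cached share brings it to about $530, in the neighborhood of the figure quoted above.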


Context window and output cap — same shape as GPT-5

gpt-5-mini ships with the same 400,000-token context window as GPT-5 and the same 128,000-token max output ceiling. There is no smaller-context variant; OpenAI standardized the window across the GPT-5 family so prompt-engineering work is portable across tiers.

Practical implication: prompts that work on GPT-5 fit on gpt-5-mini without context-related rewrites. The price gap is purely a quality gap, not a context gap. If you're moving traffic from GPT-5 to gpt-5-mini for cost reasons, the migration is a model-ID swap plus an eval pass — not a prompt rewrite.
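Because the window is shared between input and output, a pre-flight check can catch oversized requests before they reach the API. The sketch below uses the two limits quoted on this page; the function name is illustrative, and token counts are assumed to come from whatever tokenizer you already use.

```python
CONTEXT_WINDOW = 400_000  # combined input + output tokens, per this page
MAX_OUTPUT = 128_000      # max output tokens, per this page


def fits_context(input_tokens: int, max_output_tokens: int) -> bool:
    """Return True if the request stays within both documented limits."""
    if max_output_tokens > MAX_OUTPUT:
        return False
    return input_tokens + max_output_tokens <= CONTEXT_WINDOW


print(fits_context(300_000, 100_000))  # exactly at the combined limit
print(fits_context(300_000, 128_000))  # 428,000 tokens: over the window
```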

As always, cap output. Set `max_output_tokens` to the realistic ceiling for your task (1,500 for chat, 4,000 for code, 8,000 for long-form). The default ceiling (128K) is a defensive guard, not a target.
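One way to make the caps hard to forget is to build request arguments from a per-task table. This sketch only assembles a dictionary and sends nothing; the task names and helper are hypothetical, and the caps are the ones suggested above.

```python
# Per-task output ceilings suggested in the text above.
OUTPUT_CAPS = {"chat": 1_500, "code": 4_000, "long_form": 8_000}


def build_request(task: str, prompt: str) -> dict:
    """Assemble Responses API arguments with an explicit output cap."""
    if task not in OUTPUT_CAPS:
        raise ValueError(f"unknown task type: {task!r}")
    return {
        "model": "gpt-5-mini",
        "input": prompt,
        "max_output_tokens": OUTPUT_CAPS[task],
    }


request = build_request("chat", "Summarize our refund policy in two sentences.")
print(request["max_output_tokens"])
```

Assuming the official `openai` Python SDK is installed and an API key is configured, the dictionary could be passed as `client.responses.create(**request)`.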


Where gpt-5-mini wins and where it loses to GPT-5

**Wins (use gpt-5-mini)**: classification, named-entity extraction, summarization of structured input, format conversions (JSON ↔ YAML ↔ Markdown), simple chat assistance, routing/dispatch agents, content scaffolding (outlines, first drafts), structured-output pipelines, internal telemetry classification, document tagging, sentiment analysis. Quality on these is indistinguishable from GPT-5 on a held-out eval for most teams.

**Loses (use GPT-5)**: complex code synthesis with multi-file context, math proofs, multi-step planning that requires backtracking, legal/financial analysis where correctness is non-negotiable, fine-grained vision reasoning (counting objects in dense scenes, reading small text), competitive coding-benchmark-level tasks. The 5× price premium for GPT-5 is justified when these are the bottleneck.

Run an eval before you commit. Take 200 representative inputs from your production traffic, run both models, blind-score the outputs. Most teams discover gpt-5-mini covers 70-90% of traffic with no measurable quality drop and reserve GPT-5 for the rest — typically routed via a simple complexity classifier.
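The "simple complexity classifier" can start out as a few heuristics and be replaced with something learned once you have eval data. Everything in this sketch is hypothetical: the hint words, the length threshold, and the function name are placeholders to tune against your own traffic.

```python
# Hypothetical signals that a request needs the flagship model.
HARD_TASK_HINTS = ("prove", "refactor", "multi-file", "legal", "financial")


def pick_model(prompt: str, max_easy_chars: int = 4_000) -> str:
    """Route long or hard-looking prompts to gpt-5, everything else to gpt-5-mini."""
    text = prompt.lower()
    looks_hard = any(hint in text for hint in HARD_TASK_HINTS)
    if looks_hard or len(text) > max_easy_chars:
        return "gpt-5"
    return "gpt-5-mini"


print(pick_model("Classify this ticket: printer is offline"))
print(pick_model("Prove that the sum of two even numbers is even"))
```

Logging the routing decision next to your quality metrics makes it possible to tell later whether the heuristic escalates too often or too rarely.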


Compared to: Claude Sonnet 4.6 and Gemini 2.5 Flash

gpt-5-mini at $0.25 / $2 undercuts both comparison models: far below Claude Sonnet 4.6 ($3 / $15; Anthropic's mid-tier, slightly higher quality, much higher price) and slightly below Gemini 2.5 Flash ($0.30 / $2.50; Google's mid-tier, comparable price, 1M context).

vs **Claude Sonnet 4.6**: 12× cheaper on input, 7.5× cheaper on output. Sonnet wins on long-form writing voice, instruction-following discipline on complex prompts, and prompt caching savings (90% off cached reads via Anthropic). gpt-5-mini wins on raw throughput and price. See Claude Sonnet vs GPT-5 mini for the side-by-side.

vs **Gemini 2.5 Flash**: comparable price ($0.25 vs $0.30 input), comparable output ($2 vs $2.50), but Flash has a 1M context window vs gpt-5-mini's 400K. Flash also supports native audio input. gpt-5-mini wins on structured outputs (OpenAI's JSON Schema enforcement is more mature) and the broader OpenAI tooling ecosystem.

Cross-tier: gpt-5-mini at 5× cheaper than GPT-5 covers the same feature set; most teams should default here and escalate only on demonstrated need.
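The multiples in this section come straight from dividing the list prices. The sketch below reproduces them from the figures quoted on this page; the dictionary keys are plain labels for this example, not API model IDs.

```python
# (input, output) prices in USD per 1M tokens, as quoted on this page.
PRICES = {
    "gpt-5-mini": (0.25, 2.00),
    "claude-sonnet-4.6": (3.00, 15.00),
    "gemini-2.5-flash": (0.30, 2.50),
}


def price_ratio(other: str, base: str = "gpt-5-mini") -> tuple:
    """How many times more `other` costs than `base` on (input, output)."""
    other_in, other_out = PRICES[other]
    base_in, base_out = PRICES[base]
    return other_in / base_in, other_out / base_out


for name in ("claude-sonnet-4.6", "gemini-2.5-flash"):
    in_ratio, out_ratio = price_ratio(name)
    print(f"{name}: {in_ratio:.2f}x input, {out_ratio:.2f}x output")
```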


Reasoning effort on gpt-5-mini

gpt-5-mini exposes the same reasoning-effort control as GPT-5 (`minimal`, `low`, `medium`, `high`). In Chat Completions it is the `reasoning_effort` parameter; in the Responses API it is set as `reasoning.effort`. The same rules apply: reasoning tokens bill at the output rate, are not returned to you, and can dramatically inflate cost if the effort is set to `high`.

On gpt-5-mini specifically, `minimal` is the right default for high-volume mechanical tasks (classification, extraction). The cost-per-call stays under $0.001 for typical inputs. `low` is the right default for chat and content generation. `medium` and `high` are usually wasted on gpt-5-mini — if the task needs serious reasoning, the better move is escalating to GPT-5 with `medium`, not running gpt-5-mini at `high`.

Common mistake: leaving `reasoning_effort` at the default (`medium`) on a classification workload at 1M+ calls/month. The reasoning tokens silently 2-3× the bill. Always explicitly set the effort level on production prompts.
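Because reasoning tokens are billed as output, their effect on a short classification call is easy to estimate. In this sketch the reasoning-token count is a hypothetical input; the real number varies by prompt and effort level and shows up in the usage data returned with each response.

```python
INPUT_RATE = 0.25   # USD per 1M input tokens, per this page
OUTPUT_RATE = 2.00  # USD per 1M output tokens, per this page


def cost_with_reasoning(input_tokens, visible_output_tokens, reasoning_tokens):
    """Per-call cost in USD when hidden reasoning tokens bill at the output rate."""
    billed_output = visible_output_tokens + reasoning_tokens
    return (input_tokens * INPUT_RATE + billed_output * OUTPUT_RATE) / 1_000_000


baseline = cost_with_reasoning(800, 100, 0)         # no reasoning tokens
with_reasoning = cost_with_reasoning(800, 100, 300)  # hypothetical 300 reasoning tokens
print(f"{with_reasoning / baseline:.1f}x the baseline cost")
```

On an 800-in / 100-out call, 300 hidden reasoning tokens are enough to multiply the bill by 2.5, which is how a default effort level becomes a 2-3× surprise at volume.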


Verified sources and how to re-check the numbers

Every number on this page was verified against OpenAI's live documentation on 2026-06-20. Sources: platform.openai.com/docs/models/gpt-5-mini, openai.com/api/pricing, platform.openai.com/docs/api-reference/responses.

Prices on the GPT-5 family have moved once since launch (a downward adjustment on cached input in late 2025). OpenAI does not version their pricing page with changelog entries. Re-verify the live page quarterly if your monthly bill exceeds $500.

If you find a discrepancy with the live page, treat the live page as canonical. We re-fetch and update this card monthly.

Switch from GPT-5 to gpt-5-mini in 5 steps

1. Pick a representative slice of production traffic

   Sample 200-500 real inputs from the last week of your gpt-5 traffic. Stratify across task types (classification, extraction, chat, code) so the eval covers your actual workload, not just the easy cases.

2. Run both models on the same inputs

   Identical prompt, identical parameters; only the model ID changes (`gpt-5` → `gpt-5-mini`). Log both outputs and track cost. gpt-5-mini is 5× cheaper per token, so the measured cost delta is the lower bound on your savings.

3. Blind-score the outputs

   Have a human or an LLM-judge score outputs without knowing which model produced which. On most production tasks (classification, extraction, summarization, routine chat), gpt-5-mini scores within 2-5% of gpt-5. That delta is usually within the noise of the eval.

4. Set explicit reasoning_effort + max_output_tokens

   On production gpt-5-mini calls: `reasoning_effort='minimal'` for classification/extraction, `'low'` for chat. Cap `max_output_tokens` to the realistic ceiling for the task. Both prevent silent cost creep.

5. Migrate the model ID, monitor for 7 days

   Change `model='gpt-5'` to `model='gpt-5-mini'` in production. Monitor your quality metrics (acceptance rate, manual-review rate, downstream error rate) for 7 days. If quality holds, the migration is done. If it slips on a subset of traffic, route that subset back to gpt-5 via a complexity classifier.
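The blind-scoring step is the one most often done informally, so here is a minimal sketch of it. It assumes you already have two equal-length lists of outputs for the same inputs; the function names and the `left`/`right`/`tie` vocabulary are illustrative choices, not part of any library.

```python
import random


def blind_pairs(outputs_a, outputs_b, seed=0):
    """Randomize left/right order so the scorer cannot tell the models apart."""
    rng = random.Random(seed)  # seeded so the shuffle is reproducible
    pairs = []
    for a, b in zip(outputs_a, outputs_b):
        flipped = rng.random() < 0.5
        left, right = (b, a) if flipped else (a, b)
        pairs.append({"left": left, "right": right, "left_is_a": not flipped})
    return pairs


def tally(pairs, choices):
    """Count wins per model; choices[i] is 'left', 'right', or 'tie'."""
    wins = {"a": 0, "b": 0, "tie": 0}
    for pair, choice in zip(pairs, choices):
        if choice == "tie":
            wins["tie"] += 1
        elif (choice == "left") == pair["left_is_a"]:
            wins["a"] += 1
        else:
            wins["b"] += 1
    return wins


pairs = blind_pairs(["mini answer 1", "mini answer 2"],
                    ["flagship answer 1", "flagship answer 2"])
print(tally(pairs, ["tie", "tie"]))
```

Show the scorer only the `left` and `right` fields, and keep `left_is_a` hidden until the tally. A high tie count on a task type is the signal that the cheaper model is safe there.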

Frequently Asked Questions

How much does gpt-5-mini cost in 2026?

$0.25 per 1M input tokens, $2 per 1M output tokens, $0.025 per 1M for cached input (90% off). Batch API takes another 50% off both streams. A representative 1,000-in / 500-out call costs ~$0.00125 — about 5× cheaper than GPT-5 on identical tokens. Source: openai.com/api/pricing, verified 2026-06-20.

What is the difference between gpt-5 and gpt-5-mini?

Same context window (400K), same modalities (text + image input), same feature set (function calling, structured outputs, parallel tool calls, prompt caching, reasoning_effort). The difference is model size, output quality, and price. gpt-5-mini is 5× cheaper on both input and output. Use gpt-5-mini as the default; escalate to gpt-5 for complex code synthesis, multi-step planning, or correctness-critical tasks.

What is gpt-5-mini's context window?

400,000 tokens combined input + output, with a 128,000-token max output ceiling. Identical to gpt-5 — OpenAI standardized the window across the GPT-5 family so prompts are portable across tiers.

Does gpt-5-mini support function calling and structured outputs?

Yes — full parity with gpt-5. Function calling, parallel tool calls, structured outputs (JSON Schema validation guaranteed by the API), the Responses API, streaming, and prompt caching are all supported.
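As an illustration, a structured-output request for a ticket classifier might be assembled like this. It is a sketch, not a definitive payload: the schema, the `ticket_label` name, and the helper are hypothetical, and the `text.format` and `reasoning` shapes reflect our understanding of the Responses API, so check them against the current API reference. Nothing is sent over the network here.

```python
def build_classification_request(ticket_text: str) -> dict:
    """Assemble Responses API arguments that request schema-constrained JSON."""
    schema = {
        "type": "object",
        "properties": {
            "category": {"type": "string",
                         "enum": ["billing", "technical", "other"]},
            "urgent": {"type": "boolean"},
        },
        "required": ["category", "urgent"],
        "additionalProperties": False,
    }
    return {
        "model": "gpt-5-mini",
        "input": ticket_text,
        "reasoning": {"effort": "minimal"},  # mechanical task, minimal effort
        "max_output_tokens": 200,            # a small label needs few tokens
        "text": {
            "format": {
                "type": "json_schema",
                "name": "ticket_label",      # hypothetical schema name
                "schema": schema,
                "strict": True,
            }
        },
    }


request = build_classification_request("I was charged twice this month.")
print(sorted(request))
```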

What is gpt-5-mini's knowledge cutoff?

May 31, 2024 per OpenAI's model card. Slightly earlier than gpt-5's cutoff (September 30, 2024). For anything after May 2024, provide context in the prompt or use a web-search tool call.

Is gpt-5-mini available in ChatGPT or only the API?

Primarily the API. ChatGPT users on Free and Plus tiers see a mix of gpt-5 and gpt-5-mini depending on load and rate limits, but the model selection UI in ChatGPT does not expose gpt-5-mini as an explicit choice. For deterministic gpt-5-mini use, call the API.

Should I use gpt-5-mini or gpt-4o-mini?

gpt-5-mini in almost every case. It costs more than gpt-4o-mini ($0.25/$2 vs $0.15/$0.60 per 1M tokens), but both sit in the low-cost tier, and gpt-5-mini adds the full GPT-5 feature set, a larger context window (400K vs 128K), better structured-output enforcement, and the unified reasoning_effort parameter. gpt-4o-mini remains available for backward compatibility on legacy fine-tunes.

Can I fine-tune gpt-5-mini?

OpenAI has not opened public fine-tuning on gpt-5-mini as of June 2026. Fine-tuning remains available on gpt-4.1, gpt-4o, and gpt-4o-mini. For most use cases, gpt-5-mini + structured outputs + a well-engineered prompt closes the quality gap fine-tuning would address. Check platform.openai.com/docs/guides/fine-tuning for current model availability.
