Model card · Verified against Google docs · 2026-06-20

Gemini 2.5 Flash: Full Spec Sheet (June 2026)

By The DDH Team at Digital Dashboard Hub

Gemini 2.5 Flash is Google DeepMind's mid-tier model in the Gemini 2.x family, released alongside Pro in early 2025 and refreshed mid-year. It is the production workhorse of the Gemini menu: 1M context, native multimodal (text, image, audio, video, PDF), built-in tools (code execution, Search grounding), and a flat pricing structure that doesn't have Pro's >200K tier bump.

Headline numbers: $0.30 per 1M input for text, image, video. $1.00 per 1M input for audio. $2.50 per 1M output (standard); $3.50 per 1M output when thinking mode is enabled. Cached input is $0.075/M (text/image/video) — 75% off. Context window is 1,000,000 tokens. Max output is 65,536 tokens. Knowledge cutoff is January 2025. Function calling, parallel calls, structured outputs, code execution, and Google Search grounding all supported.

Below: full spec table, when Flash is the right call vs Gemini 2.5 Pro or gpt-5-mini, the audio input price gotcha, the minimal API request, and 8 FAQs.

Gemini 2.5 Flash — Full spec sheet (June 2026)

| Feature | Gemini 2.5 Flash spec |
| --- | --- |
| Provider | Google DeepMind |
| Model ID (API) | gemini-2.5-flash |
| Released | April 2025 (refreshed mid-2025) |
| Input price, text/image/video (per 1M tokens) | $0.30 |
| Input price, audio (per 1M tokens) | $1.00 |
| Cached input, text/image/video (per 1M tokens) | $0.075 (75% off) |
| Cached input, audio (per 1M tokens) | $0.25 (75% off) |
| Output price, standard (per 1M tokens) | $2.50 |
| Output price, with thinking (per 1M tokens) | $3.50 |
| Batch API discount | 50% off input + output |
| Context window | 1,000,000 tokens |
| Max output tokens | 65,536 tokens |
| Modalities (input) | Text, image, audio, video, PDF |
| Modalities (output) | Text |
| Function calling | Supported |
| Parallel function calling | Supported |
| Structured outputs (JSON Schema) | Supported |
| Streaming | Supported |
| Code execution (built-in tool) | Supported |
| Google Search grounding (built-in tool) | Supported |
| Thinking mode (reasoning) | Supported |
| Video understanding | Supported |
| Audio understanding | Supported |
| Knowledge cutoff | January 2025 |
| Endpoint (Google AI) | generativelanguage.googleapis.com/v1/models/gemini-2.5-flash:generateContent |

Sources verified 2026-06-20: Google Gemini API models documentation (https://ai.google.dev/gemini-api/docs/models/gemini), Google AI Studio pricing (https://ai.google.dev/pricing). Audio input is billed at a different (higher) rate than text/image/video input — a common gotcha when budgeting multimodal workloads. Output price increases by $1/M when thinking mode is enabled. Re-verify the live pages before budgeting.

What Flash 2.5 actually is (vs Gemini 2.5 Pro)

Gemini 2.5 Flash is the smaller, faster sibling of Gemini 2.5 Pro built on the same architecture and training pipeline. Same 1M context window, same native multimodal capability, same built-in tools (code execution, Search grounding), same thinking mode. Same feature surface — different model size, dramatically different price.

Flash at $0.30/$2.50 per 1M is approximately 4× cheaper than Gemini 2.5 Pro (≤200K tier) on both input and output. The trade-off is quality on hard reasoning tasks: Flash is calibrated for production volume, such as chat, classification, extraction, structured-data pipelines, and multimodal Q&A on shorter inputs. For complex code synthesis, multi-step reasoning, or long-form analysis where Pro's extra quality pencils out, escalate to Pro.

Critically, Flash does NOT have Pro's >200K input tier bump. The flat $0.30/M input rate holds from 1 token to 1M tokens. For workloads that occasionally cross 200K (full PDF processing, long video summarization), Flash is dramatically cheaper than Pro's >200K tier ($0.30/M vs $2.50/M).


Pricing math: what Flash actually costs per call

Standard text-only call: `cost = (input_tokens / 1M) × $0.30 + (output_tokens / 1M) × $2.50`. The representative 1,000-in / 500-out call: `0.001 × $0.30 + 0.0005 × $2.50 = $0.0003 + $0.00125 = $0.00155`. About 0.155¢ per call — cheapest of any model in this guide.
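
The formula above can be sketched as a small helper. The function name and constants are illustrative, and the rates are simply the ones quoted on this page, so re-check them against the live pricing page before relying on the output.

```python
# Per-1M-token rates as quoted on this page (assumed current; re-verify before budgeting).
INPUT_RATE_USD = 0.30    # text/image/video input
OUTPUT_RATE_USD = 2.50   # standard output, thinking disabled


def text_call_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one text-only call with thinking disabled."""
    input_cost = input_tokens / 1_000_000 * INPUT_RATE_USD
    output_cost = output_tokens / 1_000_000 * OUTPUT_RATE_USD
    return input_cost + output_cost


# The representative 1,000-in / 500-out call from the text.
print(f"${text_call_cost(1_000, 500):.5f}")
```

Multiplying the result by expected monthly call volume gives a first-pass budget for text-only workloads.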

Audio input gotcha: pass audio data and the input portion of that audio bills at $1/M, not $0.30/M. A 10-minute audio recording is ~19,200 tokens at the audio rate: `0.0192 × $1 = $0.0192` per audio input. Roughly 3× what text-only billing would suggest. Always check whether your input mix includes audio when modeling per-call cost.

Thinking mode toggles output pricing. With thinking enabled (default for many use cases): output bills at $3.50/M instead of $2.50/M, plus the thinking tokens themselves bill at the same $3.50/M. Disable thinking entirely with `thinking_config: {thinking_budget: 0}` to lock output at $2.50/M for fastest, cheapest responses on mechanical tasks.
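
To see how much the toggle moves the bill, here is a minimal sketch of the output-side arithmetic as this page describes it: visible output and thinking tokens both bill at $3.50/M when thinking is on. The helper name and the 1,000-token thinking figure are hypothetical, chosen only for illustration.

```python
STANDARD_OUTPUT_RATE_USD = 2.50  # per 1M output tokens, thinking disabled
THINKING_OUTPUT_RATE_USD = 3.50  # per 1M tokens, applied to visible output and thinking tokens


def output_cost(visible_tokens: int, thinking_tokens: int = 0,
                thinking_enabled: bool = False) -> float:
    """Return the USD output cost of one call under the rates quoted on this page."""
    if not thinking_enabled:
        return visible_tokens / 1_000_000 * STANDARD_OUTPUT_RATE_USD
    billed_tokens = visible_tokens + thinking_tokens
    return billed_tokens / 1_000_000 * THINKING_OUTPUT_RATE_USD


# 500 visible output tokens, with and without a hypothetical 1,000 thinking tokens.
off = output_cost(500)
on = output_cost(500, thinking_tokens=1_000, thinking_enabled=True)
print(f"thinking off: ${off:.5f}  thinking on: ${on:.5f}")
```

Under these assumptions the gap comes mostly from the extra thinking tokens rather than from the $1/M rate difference.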

Caching: explicit, via the same `cachedContents` API endpoint as Gemini 2.5 Pro. Pre-create a cached block, reference by ID. Cached portion reads at $0.075/M (text/image/video) or $0.25/M (audio). Largest cost lever on workloads with stable prefixes.
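
A rough way to size the caching lever is to compare input cost with and without a cached prefix. The sketch below uses only the text/image/video rates quoted here; the function name and the 50,000-token prefix are hypothetical, and any cache storage charge is ignored because this page does not cover it.

```python
FRESH_INPUT_RATE_USD = 0.30    # per 1M text/image/video input tokens
CACHED_INPUT_RATE_USD = 0.075  # per 1M cached text/image/video tokens


def input_cost(prefix_tokens: int, fresh_tokens: int, cached: bool) -> float:
    """Return USD input cost for a call made of a stable prefix plus fresh tokens."""
    prefix_rate = CACHED_INPUT_RATE_USD if cached else FRESH_INPUT_RATE_USD
    total = prefix_tokens * prefix_rate + fresh_tokens * FRESH_INPUT_RATE_USD
    return total / 1_000_000


# Hypothetical workload: 50,000-token stable prefix, 1,000 fresh tokens per call.
uncached = input_cost(50_000, 1_000, cached=False)
cached = input_cost(50_000, 1_000, cached=True)
print(f"uncached: ${uncached:.5f}  cached: ${cached:.5f}")
```

The larger the stable prefix is relative to the fresh part of each request, the closer the saving gets to the full 75%.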


Audio and video: native, but watch the token math

Flash supports the same native multimodal as Pro: pass text, images, audio, video, or PDF in the `contents` array in any combination. The token accounting matches Pro: video bills at ~258 tokens per second; audio at ~32 tokens per second; images at fixed token cost per resolution tier.

Practical: a 5-minute video bills `300 × 258 = 77,400` input tokens. On Flash text/image/video rate, that's `0.0774 × $0.30 = $0.0232` for the video input. A 30-minute meeting video is ~465K input tokens = $0.1395. Tractable cost for analytical workloads that would be expensive on Pro.

Pure audio is more expensive per token (the $1/M rate), but audio token density is lower (32 tokens/sec vs video's 258 tokens/sec). A 30-minute audio-only meeting is ~57,600 audio tokens = $0.0576. Still cheap, just be explicit about the audio rate.
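
The video and audio figures above follow from the same two-step calculation: seconds to tokens, then tokens to dollars. A minimal sketch, using the approximate token densities and rates quoted in this section (the helper name and table layout are illustrative):

```python
# kind -> (approximate tokens per second, USD per 1M input tokens), as quoted on this page.
MEDIA_RATES = {
    "video": (258, 0.30),
    "audio": (32, 1.00),
}


def media_input_cost(seconds: int, kind: str) -> tuple[int, float]:
    """Return (input tokens, USD cost) for a media clip of the given length."""
    if kind not in MEDIA_RATES:
        raise ValueError(f"unknown media kind: {kind!r}")
    tokens_per_second, usd_per_million = MEDIA_RATES[kind]
    tokens = seconds * tokens_per_second
    return tokens, tokens / 1_000_000 * usd_per_million


for label, seconds, kind in [("5-min video", 300, "video"), ("30-min audio", 1800, "audio")]:
    tokens, cost = media_input_cost(seconds, kind)
    print(f"{label}: {tokens:,} tokens, ${cost:.4f}")
```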

For files over 20MB, use the Google File API: upload once, reference across multiple analytical calls. Cuts re-upload time and bandwidth, doesn't change per-token billing.
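
The size rule can be turned into a simple pre-flight check. This is a sketch only: the helper name is hypothetical, and reading "20MB" as 20 × 1024 × 1024 bytes is an assumption, so adjust the threshold to whatever limit the current documentation states.

```python
# "20MB" from the text, interpreted here as 20 MiB (assumption).
INLINE_LIMIT_BYTES = 20 * 1024 * 1024


def should_upload_via_file_api(size_bytes: int) -> bool:
    """Return True when a file is too large to send inline and should be uploaded first."""
    return size_bytes > INLINE_LIMIT_BYTES


# In practice size_bytes would come from os.path.getsize(path).
print(should_upload_via_file_api(5 * 1024 * 1024))    # small clip, send inline
print(should_upload_via_file_api(300 * 1024 * 1024))  # long recording, upload first
```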


Function calling, structured outputs, and built-in tools

Flash supports the full Gemini function-calling surface: declare functions in `tools`, model picks one (or several in parallel) and returns the arguments. Parallel function calling is on by default.
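
When the model returns several calls in one turn, the client has to pull each one out of the response and route it to local code. The sketch below assumes the REST JSON shape as commonly documented (`candidates` → `content` → `parts` → `functionCall` with `name` and `args`); the sample response and the two handler functions are hand-written and hypothetical, not real API output.

```python
from typing import Any, Callable


def extract_function_calls(response: dict[str, Any]) -> list[tuple[str, dict[str, Any]]]:
    """Collect (name, args) pairs from every functionCall part in the first candidate."""
    parts = response["candidates"][0]["content"]["parts"]
    return [
        (part["functionCall"]["name"], part["functionCall"].get("args", {}))
        for part in parts
        if "functionCall" in part
    ]


def dispatch(calls: list[tuple[str, dict[str, Any]]],
             handlers: dict[str, Callable[..., Any]]) -> list[Any]:
    """Run each requested call against the matching local handler, in order."""
    return [handlers[name](**args) for name, args in calls]


# Hypothetical response containing two parallel function calls.
sample_response = {
    "candidates": [{
        "content": {
            "role": "model",
            "parts": [
                {"functionCall": {"name": "get_weather", "args": {"city": "Paris"}}},
                {"functionCall": {"name": "get_time", "args": {"city": "Paris"}}},
            ],
        }
    }]
}

handlers = {
    "get_weather": lambda city: f"weather lookup for {city}",
    "get_time": lambda city: f"time lookup for {city}",
}

results = dispatch(extract_function_calls(sample_response), handlers)
print(results)
```

A production version would also handle unknown function names and send each result back to the model in a follow-up turn.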

Structured outputs via `responseSchema` (JSON Schema subset). Pass in `generationConfig`, output is guaranteed to validate. Identical mechanism to Gemini 2.5 Pro.
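
As a sketch of what that looks like in a raw REST request body, the helper below builds a `generationConfig` carrying a schema and then parses a reply defensively. The `responseMimeType` field and the uppercase type names reflect the REST API as commonly documented and should be checked against the current reference; the invoice schema and sample reply are hypothetical.

```python
import json
from typing import Any

# Hypothetical schema for an invoice-extraction task.
INVOICE_SCHEMA = {
    "type": "OBJECT",
    "properties": {
        "vendor": {"type": "STRING"},
        "total": {"type": "NUMBER"},
    },
    "required": ["vendor", "total"],
}


def build_generation_config(schema: dict[str, Any]) -> dict[str, Any]:
    """Return a generationConfig fragment that requests JSON matching the schema."""
    return {"responseMimeType": "application/json", "responseSchema": schema}


def parse_structured_reply(reply_text: str, required: list[str]) -> dict[str, Any]:
    """Parse the model's JSON text and confirm the required keys are present."""
    data = json.loads(reply_text)
    missing = [key for key in required if key not in data]
    if missing:
        raise ValueError(f"reply is missing required keys: {missing}")
    return data


config = build_generation_config(INVOICE_SCHEMA)
reply = parse_structured_reply('{"vendor": "Acme", "total": 12.5}', INVOICE_SCHEMA["required"])
print(config["responseMimeType"], reply)
```

The local key check is cheap insurance in a pipeline even when the schema is enforced on the server side.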

Built-in tools: code execution (sandboxed Python), Google Search grounding (cites results with attribution), URL context (fetches and reads URLs). Same as Pro — same orchestration savings. For agentic workflows where a tool's main cost is calling out to your own infrastructure, Flash + built-in tools often replaces a custom orchestrator that would have cost more to build than the LLM bill saves.


Thinking mode on Flash: usually leave it off

Thinking mode is configurable on Flash via `thinking_config: {thinking_budget: N}`. The default varies by deployment context but tends to include a small thinking budget.

On Flash specifically, thinking mode is often a net loss. Flash is calibrated for fast, cheap responses on production-volume tasks (classification, extraction, chat). Adding thinking adds output cost (+$1/M on output) AND adds latency AND adds thinking tokens (billed at $3.50/M). For most Flash workloads, explicit `thinking_budget: 0` is the right call — it locks the cheaper output rate and removes a hidden cost lever.

When to enable thinking on Flash: when you've measured a specific quality gap on a known task and a small budget (500-1,500 tokens) closes it. For anything harder, escalate to Gemini 2.5 Pro rather than turning up thinking on Flash.


When to pick Flash vs Pro vs gpt-5-mini vs Sonnet 4.6

**Pick Gemini 2.5 Flash** when you need 1M context at the lowest price, when native multimodal (especially video) matters for the task, when built-in tools replace custom orchestration, or when you're in the Google ecosystem and Vertex AI billing simplifies procurement. The cheapest frontier model with this feature surface.

**Pick Gemini 2.5 Pro** over Flash when: the task is hard reasoning / code synthesis / complex multi-step planning where Flash's quality isn't enough. Pay the 4× premium only when the quality gap is measurable on your eval.

**Pick gpt-5-mini** over Flash when: structured outputs with the most-mature JSON Schema enforcement matter, you're in the OpenAI tooling ecosystem, or you don't need video/audio input and don't need 1M context. Comparable price ($0.25/$2 vs $0.30/$2.50). Pick by ecosystem fit.

**Pick Claude Sonnet 4.6** over Flash when: long-form writing voice, refusal-calibration discipline, or extended thinking matters more than 1M context. Sonnet is 10× more expensive than Flash on input ($3 vs $0.30) — the premium is justified only when Anthropic's quality differential is the bottleneck.


Verified sources and how to re-check the numbers

Every number on this page was verified against Google's live documentation on 2026-06-20. Sources: ai.google.dev/gemini-api/docs/models/gemini for feature support; ai.google.dev/pricing for input/output/cached prices and the audio-input tier; cloud.google.com/vertex-ai/generative-ai/pricing for Vertex AI pricing (currently identical to AI Studio direct).

Flash pricing has moved twice since launch (both downward). Google publishes price changes via the Vertex AI release notes; subscribe if your monthly Flash bill exceeds $500.

Methodology: when a number could not be cross-confirmed against an official Google page on the verification date, it was omitted from this card rather than guessed.

Make your first Flash call in 5 steps

1. Get an API key

    aistudio.google.com → Get API key → Create. Free tier is generous for development; production at scale moves to Vertex AI on Google Cloud for better quota and SLAs.

2. Install the SDK

    Python: `pip install google-genai`. Node: `npm install @google/genai`. The `google-genai` SDK is the canonical client for 2026; the older `google-generativeai` is deprecated.

3. Send a minimal Flash call

    Python: `from google import genai; c = genai.Client(); r = c.models.generate_content(model='gemini-2.5-flash', contents='Hello'); print(r.text)`. Swap `gemini-2.5-pro` → `gemini-2.5-flash` to migrate from Pro.

4. Disable thinking for production mechanical tasks

    For classification/extraction at high volume: `c.models.generate_content(model='gemini-2.5-flash', contents=prompt, config={'thinking_config': {'thinking_budget': 0}})`. Locks the cheaper $2.50/M output rate and removes thinking-token cost.

5. Add structured outputs and cache the prefix

    For production pipelines: pre-create a cached system prefix via `client.caches.create()`, reference by name in subsequent calls. Pass `responseSchema` for guaranteed-typed output. Both reduce cost; both improve reliability.
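
For readers who prefer to see the wire format, steps 3 and 4 can be approximated without the SDK by building the REST request by hand. This is a sketch under assumptions: the camelCase field names (`thinkingConfig`, `thinkingBudget`) and the `x-goog-api-key` header follow the REST API as commonly documented, and the `v1beta` default is an assumption (the spec table lists the `v1` path), so confirm which version accepts `thinkingConfig`. The request is built but deliberately not sent.

```python
import json
import urllib.request

BASE_URL = "https://generativelanguage.googleapis.com"


def build_flash_request(prompt: str, api_key: str, api_version: str = "v1beta",
                        thinking_budget: int = 0) -> urllib.request.Request:
    """Build (but do not send) a generateContent request for gemini-2.5-flash."""
    url = f"{BASE_URL}/{api_version}/models/gemini-2.5-flash:generateContent"
    body = {
        "contents": [{"parts": [{"text": prompt}]}],
        # A budget of 0 requests no thinking, matching step 4.
        "generationConfig": {"thinkingConfig": {"thinkingBudget": thinking_budget}},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )


request = build_flash_request("Classify this ticket: 'refund please'", api_key="YOUR_API_KEY")
print(request.get_method(), request.full_url)
# To send with a real key: urllib.request.urlopen(request), then json.load() the response.
```

Keeping request construction separate from sending makes it easy to log or unit-test the payload before any tokens are billed.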

Frequently Asked Questions

How much does Gemini 2.5 Flash cost in 2026?

$0.30 per 1M input for text/image/video, $1.00 per 1M input for audio. Output $2.50/M standard or $3.50/M with thinking mode enabled. Cached input bills at 75% off ($0.075/M text/image/video, $0.25/M audio). Batch API takes another 50% off both streams. A representative 1,000-in / 500-out text call costs ~$0.00155. Source: ai.google.dev/pricing, verified 2026-06-20.

What is Gemini 2.5 Flash's context window?

1,000,000 tokens — same as Gemini 2.5 Pro. Unlike Pro, Flash does NOT have a >200K input tier price bump; the $0.30/M input rate holds from 1 token to 1M tokens. For workloads that cross 200K input, Flash is dramatically cheaper than Pro.

What is the difference between Gemini 2.5 Flash and Pro?

Same context (1M), same modalities, same built-in tools, same thinking mode. Flash is 4× cheaper than Pro (≤200K tier) on input and 4× cheaper on output. Pro wins on hard reasoning, complex code synthesis, multi-step planning. Flash wins on price-performance for production volume. See our Gemini 2.5 Pro spec sheet.

Why is audio input more expensive than video?

Audio bills at $1/M input vs $0.30/M for text/image/video. The reasoning: audio processing on Flash is more compute-intensive per token than other modalities. The practical impact is modest because audio token density is lower than video (32 tokens/sec audio vs 258 tokens/sec video), but always model audio at the higher rate when budgeting.

Should I enable thinking mode on Flash?

Usually no. Thinking mode on Flash adds output cost (+$1/M on visible output) AND adds thinking tokens at $3.50/M AND adds latency. For most Flash workloads (classification, extraction, chat, structured pipelines), explicit `thinking_budget: 0` is correct. Enable thinking only when you've measured a specific quality gap a small budget (500-1,500 tokens) closes.

Does Flash support function calling and structured outputs?

Yes — full parity with Gemini 2.5 Pro. Function calling with parallel call support, structured outputs via `responseSchema` JSON Schema, built-in tools (code execution, Google Search grounding, URL context). Identical API surface to Pro; swap model ID to migrate.

Where is Gemini 2.5 Flash available?

Google AI Studio (direct, free tier + paid), Google Cloud Vertex AI (enterprise tier), and the Gemini consumer apps (gemini.google.com — Flash powers most of the free-tier consumer experience).

Should I use Flash or gpt-5-mini?

Comparable price ($0.30/$2.50 vs $0.25/$2). Flash wins on native multimodal (video, audio), 1M context, built-in tools (code execution, Search grounding). gpt-5-mini wins on structured-output enforcement maturity and the OpenAI tooling ecosystem. Pick by which ecosystem fits your stack.
