Model card · Verified against Google docs · 2026-06-20

Gemini 2.5 Pro: Full Spec Sheet (June 2026)

By The DDH Team at Digital Dashboard Hub·Updated June 20, 2026

Stop writing AI prompts from scratch.

Tell us your business + your task + your model. We write the prompt — perfectly tuned for ChatGPT, Claude, Grok, Gemini, Midjourney, or any model. Plus 500+ pre-built prompts in your library.

Gemini 2.5 Pro is Google DeepMind's flagship general-purpose model, released March 2025 as the successor to Gemini 2.0 Pro. It is the only frontier model from a major provider with a native 1,000,000-token context window in general availability (Anthropic's Sonnet has 1M in beta; OpenAI's GPT-5 maxes at 400K). It is also the only frontier model with native video understanding — pass an MP4 file directly to the model and ask questions about it.

Headline numbers: tiered pricing by input size. For input ≤200,000 tokens: $1.25 per 1M input / $10 per 1M output. For input >200,000 tokens: $2.50 per 1M input / $15 per 1M output. Cached input is $0.31/M (≤200K tier) or $0.625/M (>200K tier) — 75% off. Context window is 1,000,000 tokens (2M in private preview). Max output is 65,536 tokens. Modalities are text, image, audio, video, and PDF input; text output only. Function calling, structured outputs, code execution, and thinking mode are all supported.

Below: full spec table, when Gemini 2.5 Pro is the right call vs Claude Opus or GPT-5, when the 1M context is worth the >200K tier price bump, the minimal API request, and 8 FAQs. Sibling pages: Gemini 2.5 Flash spec sheet · GPT-5 spec sheet · Claude Opus 4.7 spec sheet. Write a Gemini-tuned prompt free with our ChatGPT prompt generator.

Digital Dashboard Hub

Writing good prompts for ONE AI is hard. Writing them for GPT-5, Claude, Gemini, Perplexity, Midjourney and 6 more is a full-time job. DDH's AI Prompt Builder writes once, runs everywhere — locked to your niche, voice, and brand tone.

Free 14 days, no card. →

Gemini 2.5 Pro — Full spec sheet (June 2026)

Feature	Gemini 2.5 Pro spec
Provider	Google DeepMind
Model ID (API)	gemini-2.5-pro
Released	March 2025
Input price ≤200K (per 1M)	$1.25
Input price >200K (per 1M)	$2.50
Cached input ≤200K (per 1M)	$0.31 (75% off)
Cached input >200K (per 1M)	$0.625 (75% off)
Output price ≤200K (per 1M)	$10.00
Output price >200K (per 1M)	$15.00
Batch API discount	50% off input + output
Context window	1,000,000 tokens
Max output tokens	65,536 tokens
Modalities (input)	Text, image, audio, video, PDF
Modalities (output)	Text
Function calling
Parallel function calling
Structured outputs (JSON Schema)
Streaming
Code execution (built-in tool)
Google Search grounding (built-in tool)
Thinking mode (reasoning)
Video understanding
Audio understanding
Knowledge cutoff	January 2025
Endpoint (Google AI)	generativelanguage.googleapis.com/v1/models/gemini-2.5-pro:generateContent
Endpoint (Vertex AI)	{LOCATION}-aiplatform.googleapis.com

Sources verified 2026-06-20: Google Gemini API models documentation (https://ai.google.dev/gemini-api/docs/models/gemini), Google AI Studio pricing (https://ai.google.dev/pricing), Vertex AI Gemini pricing (https://cloud.google.com/vertex-ai/generative-ai/pricing). Pricing tier shifts at 200K input tokens — calls under 200K input bill at the lower tier even if context window allocation is higher. Re-verify the live pages before budgeting.

What Gemini 2.5 Pro actually is (and what makes it unique)

Gemini 2.5 Pro is Google DeepMind's flagship model in the Gemini 2.x family, released in March 2025. It succeeded Gemini 2.0 Pro (which itself replaced Gemini 1.5 Pro in late 2024) and brought three step-changes: native thinking mode (configurable reasoning budget per call), tier-2-quality vision matching GPT-5's vision benchmarks, and stable 1M-token context behavior with recall that holds across the full window.

What makes Gemini 2.5 Pro structurally different from GPT-5 or Claude Opus: it is natively multimodal across more modalities than either. Text, image, audio, video, and PDF inputs all flow through the same `contents` array. Pass an MP4 video file, an audio recording, a stack of PDFs and a free-form text question — Gemini accepts all of it in one call and reasons across them. GPT-5 supports text + image. Claude supports text + image. Only Gemini 2.5 Pro (and its Flash sibling) support video and audio natively in production.

Thinking mode (Google's name for configurable reasoning) is enabled by default on Gemini 2.5 Pro with a model-decided budget. Force a specific budget with `thinking_config: {thinking_budget: 5000}`; disable thinking entirely with `thinking_budget: 0` for fastest possible response. Thinking tokens bill at the output rate like reasoning tokens on GPT-5 and thinking tokens on Claude.

Pricing math: the 200K input tier and what it means

Gemini 2.5 Pro uses a unique tiered pricing model among frontier providers. Below 200,000 input tokens per call: $1.25/M input, $10/M output. Above 200,000 input tokens: $2.50/M input, $15/M output. The tier applies to the entire call — if you send 250K input tokens, the full 250K bills at the higher tier, not just the portion above 200K.

Worked: a 100K-token input + 1K output call bills `(0.100 × $1.25) + (0.001 × $10) = $0.125 + $0.01 = $0.135`. The same input as 250K + 1K output bills `(0.250 × $2.50) + (0.001 × $15) = $0.625 + $0.015 = $0.640`. Crossing the 200K threshold is a 2× input price + 1.5× output price step function, not a smooth ramp.

Implication: keep calls under 200K input where you can. If you're at 195K, padding to 205K to fit one more chunk is a 5× price increase. If you're going to cross 200K, go all the way — 250K and 500K bill at the same per-token rate.

Caching: explicit, via the `cachedContents` API endpoint. Pre-create a cached content block (1-hour default TTL, configurable up to 24 hours), reference it by ID in subsequent calls. Cached portion reads at 75% off ($0.31/M in the ≤200K tier, $0.625/M in the >200K tier). Largest cost lever on long-context workloads. Worked $ across providers: GPT/Claude/Gemini cost calculator.

The 1M context window: when it actually matters

Gemini 2.5 Pro accepts 1,000,000 tokens in a single call. For reference: a full feature-length screenplay is ~30K tokens, a 300-page novel is ~150K tokens, the codebase of a medium SaaS application is 200-500K tokens, the full Lord of the Rings trilogy is ~600K tokens. Gemini 2.5 Pro fits any of these in one call.

Recall holds across the full 1M window — Google's needle-in-haystack benchmarks show >99% recall through ~1M tokens for the Pro model. The practical bottleneck is cost and latency, not recall. A 1M-token call bills `1.0 × $2.50 + (output × $15) = $2.50+` per call before output costs and runs 30-60 seconds end-to-end at typical streaming rates.

When 1M actually matters: full-codebase reasoning (refactor planning across an entire repo), full-document Q&A on books or legal contracts, long-form audio/video understanding (transcribe and analyze a 1-hour meeting in one shot), multi-document research synthesis.

When 1M doesn't matter: classification, extraction, chat, structured-data tasks, anything that fits in 50K tokens with RAG. For most production workloads, the smaller ≤200K tier on Gemini 2.5 Pro (or even Gemini 2.5 Flash at $0.30/M) is the right pick.

Multimodal: video, audio, PDF — natively in one call

Pass a video file as an inline base64 blob or via the Google File API (recommended for files >20MB). Gemini extracts frames at 1 FPS by default, transcribes audio, and reasons across the combined stream. Token accounting: video bills at ~258 tokens per second of footage (compressed across video + audio).

A 5-minute video bills `300 seconds × 258 tokens = 77,400 tokens` — still under the 200K tier. A 30-minute meeting recording is ~465K tokens — over the 200K threshold, into the >200K tier. Plan accordingly.

PDFs are processed page-by-page with image + text extraction. A typical text-heavy PDF page is ~258 tokens; image-heavy pages can be substantially more. Pass via the File API for documents over 20MB.

Audio (without video) bills at ~32 tokens per second. A 10-minute audio recording is ~19,200 tokens. Use for meeting transcription + summarization, podcast analysis, voice-note structuring.

Practical caveat: multimodal inputs inflate token counts fast. A naive 'summarize this hour-long meeting' call can easily exceed 200K tokens (1 hour video = ~930K tokens). Cache the video via the File API and reference it across multiple analytical calls instead of re-uploading.

Function calling, structured outputs, and built-in tools

Gemini 2.5 Pro supports JSON Schema function calling: declare functions in the `tools` parameter, the model picks one (or several in parallel) and returns the arguments. Parallel function calling is supported and on by default for the Pro model.

Structured outputs are first-class: pass a `responseSchema` (JSON Schema subset) in `generationConfig` and Google guarantees the model's output validates against that schema. Supports nested objects, arrays, enums — comparable to OpenAI's structured outputs and Anthropic's tool-use-as-output pattern.

Built-in tools that you don't have to implement yourself: **code execution** (the model writes and runs Python in a sandbox, sees the output, iterates), **Google Search grounding** (the model issues a Google Search and cites the results in its response, with attribution), **URL context** (the model fetches and reads URLs in the conversation). Built-in tools are unique to Gemini in the frontier-model menu and dramatically reduce orchestration code for agentic workflows.

Thinking mode: Google's reasoning dial

Thinking mode is enabled by default on Gemini 2.5 Pro with a model-decided thinking budget. Override with `thinking_config: {thinking_budget: N}` where N is the maximum thinking tokens for the call. Set N=0 to disable thinking entirely; set N=-1 (or very high) to let the model decide dynamically.

Thinking tokens bill at the output rate (same as reasoning tokens on GPT-5 and thinking tokens on Claude). On Gemini 2.5 Pro: `$10/M` in the ≤200K tier, `$15/M` in the >200K tier. A 3,000-token thinking budget on a ≤200K call adds $0.03 to the call.

When to set explicit thinking budgets: cost control (cap at 1,000 for routine tasks), quality control (boost to 5,000-10,000 for complex reasoning), latency control (set to 0 for fastest possible response on simple tasks). When to leave thinking on auto: general-purpose chat where Gemini's calibration is well-tuned for the task mix.

When to pick Gemini 2.5 Pro vs Claude Opus 4.7 vs GPT-5

**Pick Gemini 2.5 Pro** when you need native multimodal across video/audio/PDF, when you need 1M context in general availability, when built-in tools (code execution, Search grounding) replace custom orchestration, or when you're already in the Google Cloud / Workspace ecosystem and Vertex AI billing simplifies procurement.

**Pick Claude Opus 4.7** when long-form writing voice, refusal-calibration discipline, or hard reasoning is the bottleneck. Opus is $15/$75 vs Gemini 2.5 Pro's $1.25/$10 (≤200K) — Gemini is dramatically cheaper for everything except the narrow tasks where Opus's quality premium pencils.

**Pick GPT-5** when you need 400K context without the >200K tier bump on Gemini, when you're in the OpenAI tooling ecosystem (Responses API, Assistants, ChatGPT Pro), or when structured outputs with the most-mature JSON Schema enforcement matter.

Cross-vendor head-to-head: GPT-4o vs Gemini 2.5 Pro.

Verified sources and how to re-check the numbers

Every number on this page was verified against Google's live documentation on 2026-06-20. Sources: ai.google.dev/gemini-api/docs/models/gemini for context, modalities, and feature support; ai.google.dev/pricing for AI Studio direct pricing; cloud.google.com/vertex-ai/generative-ai/pricing for Vertex AI pricing (currently identical to AI Studio direct).

Google's pricing updates are announced via the Vertex AI release notes and the ai.google.dev changelog. Prices have moved twice on Gemini 2.5 Pro since launch (both downward). Re-verify quarterly if your bill is significant.

Methodology: when a number could not be cross-confirmed against an official Google page on the verification date, it was omitted from this card rather than guessed.

Make your first Gemini 2.5 Pro call in 5 steps

1
Get an API key
Easiest path: aistudio.google.com → Get API key → Create. Copy to `.env` as `GEMINI_API_KEY=...`. For production at scale, use Vertex AI on Google Cloud instead (better quota, SLAs, regional control).
2
Install the SDK
Python: `pip install google-genai`. Node: `npm install @google/genai`. The `google-genai` SDK is the current canonical client as of 2026; the older `google-generativeai` SDK is deprecated for new code.
3
Send a minimal call
Python: `from google import genai; client = genai.Client(); r = client.models.generate_content(model='gemini-2.5-pro', contents='Hello'); print(r.text)`. That is the entire round-trip.
4
Add structured outputs and explicit thinking budget
For production: `client.models.generate_content(model='gemini-2.5-pro', contents=prompt, config={'response_mime_type': 'application/json', 'response_schema': MySchema, 'thinking_config': {'thinking_budget': 2000}})`. Forces typed output and caps thinking cost.
→ Open the ChatGPT prompt generator
5
Use the File API for large multimodal inputs
For PDFs/videos/audio over 20MB: `file = client.files.upload(file='meeting.mp4'); r = client.models.generate_content(model='gemini-2.5-pro', contents=[file, 'Summarize the key decisions'])`. Files persist server-side for 48 hours and can be referenced across multiple calls without re-uploading.

Digital Dashboard Hub

The prompt patterns above work 10x better when they live in a library you actually own — tunable to your niche, exportable to GPT-5, Claude, Gemini, Perplexity, Midjourney, Llama. Stop pasting across 6 tools.

Try DDH's AI Prompt Builder — free 14 days, no card. →

Related calculators

OpenAI Pricing Calculator →GPT-5.5, 5.4, mini, nano — full per-call cost in one input.Claude Pricing Calculator →Opus 4.8, Sonnet 4.6, Haiku 4.5, Fable 5 — input + output combined.Context Window Comparison →Max input length and price per 1M for every current model.

Related prompt tools

Prompt generator (Gemini-tuned)→Gemini 2.5 Flash spec sheet→GPT-5 spec sheet→Claude Opus 4.7 spec sheet→GPT/Claude/Gemini cost calculator→

Frequently Asked Questions

How much does Gemini 2.5 Pro cost in 2026?

Tiered by input size. For input ≤200K tokens: $1.25 per 1M input, $10 per 1M output. For input >200K tokens: $2.50 per 1M input, $15 per 1M output. Cached input bills at 75% off the tier rate ($0.31/M ≤200K, $0.625/M >200K). Batch API takes another 50% off both streams. Source: ai.google.dev/pricing, verified 2026-06-20.

What is Gemini 2.5 Pro's context window?

1,000,000 tokens — the largest of any frontier model in general availability. A 2M-token context is in private preview. Recall holds across the full 1M window per Google's needle-in-haystack benchmarks (>99% accuracy through 1M tokens for the Pro model).

What is the 200K input price tier?

Gemini 2.5 Pro uses tiered pricing: calls with ≤200,000 input tokens bill at $1.25/$10 per 1M; calls with >200,000 input tokens bill at $2.50/$15 per 1M. The tier applies to the entire call, not just the portion above 200K. Implication: a 195K-input call is dramatically cheaper than a 205K-input call. Plan accordingly.

Does Gemini 2.5 Pro support video and audio?

Yes — natively, in the same call as text input. Video bills at ~258 tokens per second of footage; audio at ~32 tokens per second. Pass via inline base64 (small files) or the Google File API (recommended for >20MB). A 5-minute video is ~77K tokens; a 30-minute meeting is ~465K (crosses into the >200K tier).

What is thinking mode on Gemini 2.5 Pro?

Google's name for explicit chain-of-thought reasoning, on by default with a model-decided budget. Override with `thinking_config={'thinking_budget': N}` where N is the max thinking tokens. Set N=0 to disable for fastest response. Thinking tokens bill at the output rate. Use for hard reasoning tasks; disable for classification/extraction.

What is the difference between Gemini 2.5 Pro and 2.5 Flash?

Same context (1M), same modalities, same multimodal support, same built-in tools. Flash is smaller and faster, with a flat pricing structure ($0.30/$2.50 per 1M for text/image/video input). Use Pro for hard reasoning and code synthesis; Flash for production volume and the broad mid-tier sweet spot. See our Gemini 2.5 Flash spec sheet.

Does Gemini 2.5 Pro support function calling and structured outputs?

Yes to both. Function calling with parallel call support; structured outputs via `responseSchema` JSON Schema in `generationConfig`. Output is guaranteed to validate against the schema. Built-in tools (code execution, Google Search grounding, URL context) are unique to Gemini — reduces custom orchestration for agentic workflows.

Where is Gemini 2.5 Pro available?

Google AI Studio (direct, free tier + paid), Google Cloud Vertex AI (enterprise tier, regional control, SLAs), and the Gemini consumer apps (gemini.google.com Pro and Advanced tiers). API and consumer billing are separate.

1M context is power. Wasted 1M context is bill.

Our AI Prompt Generator writes Gemini-tuned prompts (long-context structured, contents+parts ready, thinking-budget capped) based on YOUR business + task — so you spend the 1M where it matters. 14-day free trial of DDH Pro, no card.

Browse all prompt tools →