By The DDH Team · Digital Dashboard Hub

Gemini API Free Tier Rate Limits (2026): RPM, TPM & RPD by Model

Every Gemini API free tier rate limit in one place — requests per minute, tokens per minute, and requests per day for Gemini 2.5 Pro, Gemini 2.5 Flash, Gemini 2.0 Flash, and Flash-Lite. Plus the paid tier upgrade path and the data-usage tradeoff that most developers miss.

The Gemini API free tier is one of the most generous free tiers in the LLM space — but it comes with hard caps that can surprise you mid-build. If your integration suddenly starts returning 429 errors, you've hit one of three limits: requests per minute (RPM), tokens per minute (TPM), or requests per day (RPD). Which limit you hit first depends entirely on your usage pattern.

This guide covers the exact numbers for every major Gemini model as of June 2026, sourced from Google's official rate limits page and Google's pricing page. It also explains the data-usage tradeoff on the free tier versus paid tiers, how to upgrade, and how free tier compares to paid Tier 1, Tier 2, and Tier 3.

If you're also evaluating cost across providers, see our LLM rate limits comparison for 2026 or the OpenAI API pricing guide for 2026. And if you want to calculate your actual cost at paid tier, our AI Prompt Cost Calculator covers Gemini, OpenAI, and Anthropic side-by-side.

Gemini API free tier rate limits by model (2026)

| Model | RPM (requests/min) | TPM (tokens/min) | RPD (requests/day) |
|---|---|---|---|
| Gemini 2.5 Pro | 5 | 250,000 | 25 |
| Gemini 2.5 Flash | 10 | 250,000 | 500 |
| Gemini 2.5 Flash-8B | 15 | 250,000 | 1,500 |
| Gemini 2.0 Flash | 15 | 1,000,000 | 1,500 |
| Gemini 2.0 Flash-Lite | 30 | 1,000,000 | 1,500 |
| Gemini 1.5 Pro | 2 | 32,000 | 50 |
| Gemini 1.5 Flash | 15 | 1,000,000 | 1,500 |
| Gemini 1.5 Flash-8B | 15 | 1,000,000 | 1,500 |

Source: [ai.google.dev/gemini-api/docs/rate-limits](https://ai.google.dev/gemini-api/docs/rate-limits). Limits are per API key. Paid tier limits differ significantly — see sections below. Numbers current as of June 2026; Google updates these without versioning, so verify directly before shipping.

What the three rate limit dimensions actually mean

Google enforces three independent rate limit windows on the free tier. RPM (requests per minute) caps how many API calls you can make in any 60-second rolling window. TPM (tokens per minute) caps the combined input + output tokens across all requests in that window. RPD (requests per day) is a hard 24-hour cap that resets at midnight Pacific Time.

You can hit any of the three independently. A few very large requests (say, three 100k-token document analyses against Gemini 2.5 Pro in the same minute) would exceed the 250,000 TPM budget while staying under the 5 RPM cap. A burst of short requests could exhaust RPM before you've moved much of the TPM budget. And a sustained day of development can exhaust RPD even if individual bursts are well within RPM.
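To make the independence of the three limits concrete, here is a minimal sketch that checks a planned workload against the free tier numbers from the table in this guide. The dictionary keys and the function name `limits_exceeded` are illustrative assumptions, not part of any Google SDK, and the limits are hard-coded rather than fetched.

```python
# Free-tier limits copied from this guide's table (hard-coded, not fetched).
# The model keys are labels chosen for this sketch.
FREE_TIER_LIMITS = {
    "gemini-2.5-pro": {"rpm": 5, "tpm": 250_000, "rpd": 25},
    "gemini-2.5-flash": {"rpm": 10, "tpm": 250_000, "rpd": 500},
    "gemini-2.0-flash": {"rpm": 15, "tpm": 1_000_000, "rpd": 1_500},
}


def limits_exceeded(model, requests_per_minute, tokens_per_request, requests_per_day):
    """Return the names of the limits a planned workload would exceed."""
    limits = FREE_TIER_LIMITS[model]
    exceeded = []
    if requests_per_minute > limits["rpm"]:
        exceeded.append("rpm")
    # TPM counts input + output tokens summed over every request in the minute.
    if requests_per_minute * tokens_per_request > limits["tpm"]:
        exceeded.append("tpm")
    if requests_per_day > limits["rpd"]:
        exceeded.append("rpd")
    return exceeded


# Three 100k-token requests in one minute: under 5 RPM, over 250,000 TPM.
print(limits_exceeded("gemini-2.5-pro", 3, 100_000, 3))
# Twenty short requests in one minute: over 10 RPM, far under TPM.
print(limits_exceeded("gemini-2.5-flash", 20, 500, 100))
```

Running a check like this before a batch job shows which limit will bind first, which in turn tells you whether to throttle, trim tokens, or switch models.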

The practical implication: when you receive a 429 error on the free tier, the error body tells you which limit fired. Read it. The retry strategy differs: an RPM 429 needs only a 60-second backoff, an RPD 429 needs you to wait until midnight Pacific, and a TPM 429 needs you to either wait or switch to a smaller model for that request.

The most surprising limit for developers new to Gemini is the RPD on Gemini 2.5 Pro: only 25 requests per day on the free tier. At 5 RPM that sounds like you can burn through it in 5 minutes — and you can. If you're building a demo or running evals, budget those 25 daily calls carefully or upgrade to a paid key immediately.


Gemini 2.5 Pro free tier limits in depth

Gemini 2.5 Pro is Google's frontier reasoning model as of mid-2026. On the free tier it carries the tightest limits of any current Gemini model: 5 RPM, 250,000 TPM, and only 25 RPD. The low RPD is the binding constraint for most developers — 25 requests per day is roughly one every 58 minutes if you spread them out.
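The pacing arithmetic for a daily cap is simple enough to sketch. The helper name `daily_budget_pacing` is hypothetical; it only restates the two calculations used in this guide (how fast the cap can be burned at full RPM, and how far apart requests sit when spread over 24 hours).

```python
def daily_budget_pacing(rpm, rpd):
    """Return (minutes to exhaust RPD at full RPM, minutes between evenly spaced requests)."""
    fastest_burn_minutes = rpd / rpm
    even_spacing_minutes = 24 * 60 / rpd
    return fastest_burn_minutes, even_spacing_minutes


burn, spacing = daily_budget_pacing(rpm=5, rpd=25)
print(f"Exhausted in {burn:.0f} minutes at full speed")   # Exhausted in 5 minutes at full speed
print(f"One request every {spacing:.1f} minutes if spread evenly")  # One request every 57.6 minutes if spread evenly
```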

The 250,000 TPM budget sounds large, but Gemini 2.5 Pro's context window goes up to 1 million tokens. A single max-context call is four times the entire per-minute TPM budget. In practice, free tier is viable for personal projects and API exploration, but not for any production workload that needs more than a few dozen requests per day.

For comparison, the paid Tier 1 limits for Gemini 2.5 Pro are 150 RPM, 2,000,000 TPM, and no RPD cap (unlimited daily requests). That's a 30x increase in RPM and 8x increase in TPM — plus removal of the daily cap. See Google's pricing page for the current per-token cost at each paid tier.

If you're evaluating Gemini 2.5 Pro against other frontier models on cost, our how much does ChatGPT cost in 2026 post has the full side-by-side. The short version: Gemini 2.5 Pro is priced competitively against GPT-5-class models, and the free tier lets you validate before committing.


Gemini 2.5 Flash free tier limits in depth

Gemini 2.5 Flash is the workhorse model — faster and cheaper than Pro, still capable of complex reasoning. On the free tier: 10 RPM, 250,000 TPM, 500 RPD. The higher RPD (500 vs 25 for Pro) makes Flash meaningfully more usable for extended development sessions and small-scale demos.

The 10 RPM limit is the binding constraint for most Flash users. At sustained usage you can make one request every 6 seconds — fast enough for interactive development but too slow for concurrent request patterns. If you're building a web app that could serve multiple users simultaneously, the free tier will back-pressure you to serialized requests unless you add queuing.
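One way to add the queuing mentioned above is a small client-side throttle that spaces calls evenly. This is a minimal sketch, not part of any Gemini SDK: the class name `MinIntervalLimiter` is an assumption, it is not thread-safe, and even spacing is deliberately stricter than a rolling-window limit so that it stays on the safe side.

```python
import time


class MinIntervalLimiter:
    """Spaces calls at least 60/rpm seconds apart (simple, single-threaded)."""

    def __init__(self, rpm, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / rpm
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self):
        """Block until the next request is allowed; return the seconds waited."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay > 0:
            self._sleep(delay)
        # Reserve the following slot relative to when this call is allowed to run.
        self._next_allowed = max(now, self._next_allowed) + self.interval
        return delay


limiter = MinIntervalLimiter(rpm=10)  # 10 RPM -> one request every 6 seconds
print(limiter.interval)  # 6.0
```

Call `limiter.wait()` immediately before each API request. For concurrent code, the same idea would need a lock around `wait()` or a single worker draining a queue.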

Gemini 2.5 Flash-8B (the smaller variant) gets 15 RPM, 250,000 TPM, and 1,500 RPD on the free tier — a better deal for tasks where the 8B model is sufficient. If your workload involves straightforward classification, summarization, or short-form generation, Flash-8B on the free tier gives you 3x the daily budget of standard Flash.

Gemini 2.5 Flash thinking tokens (the extended reasoning mode) count against the same TPM budget. Heavy thinking-mode usage on the free tier can burn through 250,000 TPM faster than you'd expect, since thinking tokens are billed separately but still count toward the per-minute cap.


Gemini 2.0 Flash and Flash-Lite free tier limits

Gemini 2.0 Flash stands out in the free tier lineup because it carries a 1,000,000 TPM budget — 4x the TPM of the 2.5 series on the free tier. Paired with 15 RPM and 1,500 RPD, it's the most generous free tier in the current Gemini lineup for high-token-volume use cases. If you're processing long documents or building multi-turn chat applications and don't need 2.5-series reasoning quality, 2.0 Flash often gives you more runway before you need to upgrade.

Gemini 2.0 Flash-Lite pushes further: 30 RPM, 1,000,000 TPM, 1,500 RPD. The doubled RPM (30 vs 15 for Flash) means you can sustain one request every two seconds before hitting the rate limit. For lightweight extraction tasks, template filling, or rapid iteration on short prompts, Flash-Lite on the free tier is the most permissive option available without a billing account.

One caveat: Gemini 2.0 Flash and Flash-Lite are slightly older generation than 2.5 Flash. On benchmarks for reasoning and instruction-following, 2.5 Flash-8B typically outperforms 2.0 Flash despite the latter's more generous free-tier limits. Choose based on your quality requirements, not just the rate limit ceiling.


The data-usage tradeoff: what Google does with your free tier prompts

This is the detail most developers skip and later regret. On the Gemini API free tier, Google's terms allow your prompts, inputs, and outputs to be used to improve Google's products and models. That means the content you send through the free tier may be reviewed by human raters or used in training data. Google has confirmed this in their rate limits documentation and pricing terms.

On all paid tiers (Tier 1, Tier 2, Tier 3), Google does not use your prompts or outputs to train or improve their models. Your data stays your data. This is the primary reason to move to a paid key even if you're within the free tier rate limits — not throughput, but data privacy.

The practical implication: anything you send through a free-tier Gemini API key should be treated as potentially visible to Google. For personal projects, experimentation, and public-facing prompts, this is likely fine. For any enterprise use case, customer data, proprietary content, or regulated data (HIPAA, GDPR-sensitive), you must use a billing-enabled account. This is non-negotiable regardless of volume.

This data tradeoff is also relevant for prompt engineering workflows. If you're using the free tier to iterate on prompt templates for a commercial product, the prompt structure itself could theoretically inform Google's training. Whether that matters depends on how proprietary your prompting approach is — but it's worth knowing before you test production prompts on a free key.


Paid tier rate limits: Tier 1, Tier 2, and Tier 3

When you add a billing account to your Google AI Studio project, your API key automatically unlocks Tier 1 rate limits. No application required — billing is the only gate. Tier 1 limits for Gemini 2.5 Flash are 2,000 RPM, 4,000,000 TPM, and no RPD cap. For Gemini 2.0 Flash, Tier 1 is 2,000 RPM, 4,000,000 TPM, unlimited RPD.

Tier 2 and Tier 3 are available to accounts with higher spend history. Google auto-upgrades accounts that consistently spend above threshold amounts; the exact thresholds are documented at ai.google.dev/gemini-api/docs/rate-limits. Tier 2 roughly doubles Tier 1 limits; Tier 3 is for high-volume production workloads and can be further extended via a quota increase request.

The jump from free tier to Tier 1 is dramatic: for Gemini 2.5 Flash, RPM goes from 10 to 2,000 (200x), TPM goes from 250,000 to 4,000,000 (16x), and the RPD cap disappears entirely. For most production use cases, Tier 1 is more than sufficient.

It's worth noting that even at low Tier 1 usage, monthly costs on Gemini Flash are modest. Gemini 2.5 Flash is priced at $0.30 per million input tokens and $2.50 per million output tokens as of June 2026 (verify current prices at ai.google.dev/pricing). For a typical application sending 1,000 tokens in and receiving 500 tokens out per request, that's $0.0003 for input plus $0.00125 for output, or roughly $0.0016 per call, so you'd need about 65,000 calls per month to hit $100 in API spend. For most developers, the cost to upgrade from free is negligible.
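The per-call arithmetic can be checked with a few lines of Python. This sketch uses only the two Gemini 2.5 Flash prices quoted in this guide; the function names are illustrative, and the prices should be re-verified against Google's pricing page before relying on the result.

```python
# Prices per million tokens as quoted in this guide (June 2026); verify before use.
INPUT_PRICE_PER_M = 0.30
OUTPUT_PRICE_PER_M = 2.50


def cost_per_call(input_tokens, output_tokens):
    """USD cost of one request given its input and output token counts."""
    return (input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


def calls_for_budget(budget_usd, input_tokens, output_tokens):
    """Whole number of calls that fit inside a monthly budget."""
    return int(budget_usd / cost_per_call(input_tokens, output_tokens))


print(f"${cost_per_call(1_000, 500):.5f} per call")   # $0.00155 per call
print(calls_for_budget(100, 1_000, 500))               # 64516
```

Output tokens dominate the bill here: the 500 output tokens cost about four times as much as the 1,000 input tokens.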


How to upgrade from free tier to paid tier

The upgrade process is entirely self-serve through Google AI Studio. Navigate to your project settings, select Billing, and link a Google Cloud billing account. If you don't have a Google Cloud billing account, you'll create one during this flow — it requires a credit card but you won't be charged until you exceed free tier credits (Google provides $300 in Cloud credits for new accounts).

Once billing is linked, your existing API key immediately operates at Tier 1 limits. There is no waiting period, no application, and no approval step. Your free tier RPM/RPD limits are replaced by Tier 1 limits the moment billing is active.

One common confusion: Google AI Studio and Google Cloud Vertex AI are separate products that both offer Gemini access. AI Studio is the developer-facing API (what this guide covers). Vertex AI is the enterprise product with different pricing, different SLAs, and different rate limit structures. If you're evaluating Gemini for enterprise or regulated workloads, Vertex AI is typically the right path — but the free tier discussion in this guide applies specifically to the Gemini API via AI Studio.

For a broader look at how Gemini's context window and token accounting works — which affects how fast you burn through TPM — see our Gemini 1.5 Pro context length explained post. Token counting in Gemini can differ from OpenAI's tokenizer in ways that affect your effective throughput.


When you'll hit each limit first: usage pattern analysis

Different usage patterns collide with different limits. Interactive development (one request at a time, medium-length prompts): you'll almost certainly hit RPD before RPM or TPM. The 25 RPD on Gemini 2.5 Pro runs out in a single focused work session. Upgrade to paid immediately if you're doing serious development on Pro.

Batch processing (many short requests fired concurrently): you'll hit RPM first. The free tier RPM limits range from 2 to 30 depending on the model, so concurrent requests from multi-threaded code will back-pressure immediately. Solution: either rate-limit your client-side concurrency, or upgrade to paid Tier 1 (2,000 RPM on the Flash models).

Long-document analysis (sending 50k+ token documents): you'll hit TPM. At 250,000 TPM for Gemini 2.5-series models, two 100k-token requests fit in a minute and a third pushes you over the budget. Either space requests out with deliberate delays, or use Gemini 2.0 Flash (1,000,000 TPM free tier) for document processing tasks where Flash quality is sufficient.
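For the "deliberate delays" option, a rough way to pick the delay is to work out how many equally sized requests fit in one minute's token budget and space them evenly. This is a sketch under the assumption that every request uses about the same number of tokens (input plus output); `min_spacing_seconds` is a hypothetical helper name.

```python
def min_spacing_seconds(tokens_per_request, tpm_limit):
    """Seconds between equally sized requests so no 60-second window exceeds TPM."""
    requests_per_window = tpm_limit // tokens_per_request
    if requests_per_window == 0:
        raise ValueError("a single request exceeds the per-minute token budget")
    return 60.0 / requests_per_window


print(min_spacing_seconds(100_000, 250_000))    # 30.0  (2.5-series free tier)
print(min_spacing_seconds(100_000, 1_000_000))  # 6.0   (2.0 Flash free tier)
```

The floor division matters: 250,000 / 100,000 is 2.5, but only two whole requests fit, so spacing by the average rate (24 seconds) would let three requests land in one window.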

For cost modeling across your expected usage pattern, our AI Prompt Cost Calculator lets you input request volume, token length, and model selection to see exactly when you'd exceed free tier and what the paid tier cost would be. It's the fastest way to decide whether the free tier works for your specific workload.


Handling 429 errors gracefully in your Gemini integration

When the Gemini API returns a 429 (Too Many Requests), the response body includes a `message` field that identifies which limit fired and, in many cases, a `retryDelay` value. Build your client to parse this rather than applying a fixed retry interval.

For RPM 429s, exponential backoff starting at 60 seconds is the standard pattern. The Google AI Python and Node SDKs include built-in retry logic with jitter; if you're using the REST API directly, implement exponential backoff with a maximum of 3-5 retries before surfacing an error to the user. A simple pattern, counting attempts from 1 so the first wait is 60 seconds: wait = min(2^attempt * 30, 300) seconds.
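The backoff formula can be written out directly. This sketch assumes attempts are counted from 1 (which is what makes the first wait 60 seconds) and adds optional random jitter; the function name and the `jitter` parameter are illustrative, not taken from any SDK.

```python
import random


def backoff_seconds(attempt, base=30.0, cap=300.0, jitter=0.0):
    """wait = min(2^attempt * base, cap), plus up to `jitter` random seconds."""
    if attempt < 1:
        raise ValueError("attempt is counted from 1")
    delay = min((2 ** attempt) * base, cap)
    return delay + random.uniform(0, jitter)


print([backoff_seconds(a) for a in range(1, 5)])  # [60.0, 120.0, 240.0, 300.0]
```

With a cap of 300 seconds, the fourth attempt and every later one wait five minutes, so a limit of 3-5 retries keeps the worst-case total wait bounded.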

For RPD 429s, no amount of backoff will help until the daily quota resets at midnight Pacific Time. Your application should detect this error code specifically (the message typically contains 'daily limit' or 'quota exceeded') and fail fast with a user-friendly message rather than retrying. Logging these events is useful for understanding when you're pushing against the daily cap.

For TPM 429s, the effective retry window depends on how much of your TPM budget was consumed by the most recent requests. A simple approach: wait 60 seconds (the full minute window resets) before retrying. For latency-sensitive applications, consider switching to a smaller or less capable model as a fallback when TPM limits are hit — Flash-Lite's larger TPM budget makes it a reasonable fallback for Pro or standard Flash under load.
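The three strategies above can be sketched as a small dispatcher. The substrings matched here ("daily", "per day", "token") are assumptions made for illustration, as are the sample messages; the actual wording of Gemini error messages should be checked against real responses before relying on string matching like this.

```python
# Strategy per limit type, following the guidance in this section.
RETRY_PLANS = {
    "rpd": {"retry": False, "wait_seconds": None, "action": "fail fast until the daily reset"},
    "tpm": {"retry": True, "wait_seconds": 60, "action": "wait a full minute or fall back to a smaller model"},
    "rpm": {"retry": True, "wait_seconds": 60, "action": "back off, then retry"},
}


def classify_429(message):
    """Guess which limit fired from the error message (substring heuristics, assumed)."""
    text = message.lower()
    if "daily" in text or "per day" in text:
        return "rpd"
    if "token" in text:
        return "tpm"
    return "rpm"


def plan_for(message):
    """Return the retry plan for a 429 error message."""
    return RETRY_PLANS[classify_429(message)]


# Hypothetical message text, for illustration only.
print(plan_for("Daily limit reached for this model")["action"])
```

If the response carries a `retryDelay` value, prefer it over the fixed `wait_seconds` shown here.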

See our LLM rate limits guide for 2026 for a comparison of how Gemini's rate limiting behavior compares to OpenAI and Anthropic — the error formats and retry semantics differ meaningfully between providers.


Model selection strategy for free tier usage

If you're building under free tier constraints, model selection is your primary lever. The hierarchy from most to least restrictive RPD: Gemini 2.5 Pro (25/day) → Gemini 1.5 Pro (50/day) → Gemini 2.5 Flash (500/day) → everything else (1,500/day). Match your task requirements to the least-capable model that delivers acceptable quality.

Gemini 2.0 Flash and Flash-Lite are underrated options on the free tier precisely because of their 1,000,000 TPM budget. For token-heavy workflows like long document processing, RAG pipelines, or multi-turn conversations with long history, the extra TPM headroom often matters more than the model quality gap between 2.0 and 2.5.

A practical tiering strategy for free-tier builds: use Gemini 2.0 Flash-Lite for high-volume light tasks (classification, extraction, formatting), Gemini 2.5 Flash for complex generation tasks, and reserve Gemini 2.5 Pro for the specific requests that genuinely need frontier reasoning. This matches the tiering approach in our AI cost optimization checklist for 2026 — the same model-tiering principle that cuts paid API bills 40-80% also maximizes your free tier budget.
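The tiering strategy above maps naturally onto a lookup table. In this minimal sketch the task labels, the `pick_model` function, and its `needs_frontier_reasoning` flag are hypothetical; the model identifier strings are assumed spellings and should be checked against the current model list.

```python
# Task-to-model routing following the tiering strategy described above.
LIGHT_TASKS = {"classification", "extraction", "formatting"}

LIGHT_MODEL = "gemini-2.0-flash-lite"   # high-volume light tasks
DEFAULT_MODEL = "gemini-2.5-flash"      # complex generation
FRONTIER_MODEL = "gemini-2.5-pro"       # reserved: only 25 requests/day on free tier


def pick_model(task_type, needs_frontier_reasoning=False):
    """Choose the least capable model expected to handle the task."""
    if needs_frontier_reasoning:
        return FRONTIER_MODEL
    if task_type in LIGHT_TASKS:
        return LIGHT_MODEL
    return DEFAULT_MODEL


print(pick_model("classification"))                              # gemini-2.0-flash-lite
print(pick_model("report_drafting"))                             # gemini-2.5-flash
print(pick_model("proof_check", needs_frontier_reasoning=True))  # gemini-2.5-pro
```

Keeping the frontier model behind an explicit flag makes it harder to spend the 25 daily Pro requests by accident.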

One underused feature on the free tier: Gemini's context caching. If your requests share a large stable prefix (a long system prompt, a document you're repeatedly querying), context caching can dramatically reduce your effective TPM consumption per request. Cached tokens count at a reduced rate toward TPM, which means you can fit more requests within the per-minute budget. This is available on both free and paid tiers for models that support it — check ai.google.dev/gemini-api/docs/rate-limits for the current list of caching-eligible models.


Gemini free tier vs OpenAI free tier: how they compare

OpenAI's free tier (via the Playground) is not available as a standalone API key — there's no free RPM/TPM/RPD equivalent for OpenAI API access. You get $5 in free credits when you create an account, but those expire and API access requires a billing account after that. By contrast, the Gemini API's free tier is indefinite and rate-limit-based, not credit-based. This makes Gemini's free tier structurally more useful for ongoing development.

The tradeoff is throughput. Gemini's free tier is rate-limited to protect infrastructure, while OpenAI's credits-based approach lets you burst freely until the credits run out. For short-term experimentation, OpenAI's model (burst then pay) can be more convenient. For ongoing development over weeks or months, Gemini's persistent free tier is the better deal.

For a full cost comparison once you move to paid tiers, see our how much does ChatGPT cost in 2026 post and our OpenAI API pricing guide for 2026. On many workloads, Gemini 2.5 Flash is meaningfully cheaper per token than GPT-4o-class models, which makes the migration economics straightforward once you've validated quality on the free tier.

One dimension where OpenAI still has an advantage: the Batch API with 50% discount on async workloads. Gemini has context caching but no equivalent batch pricing tier as of June 2026. If your workload is primarily async batch processing, the OpenAI Batch API or Anthropic Message Batches API may be more cost-effective at scale even if Gemini's base per-token price is lower. See our AI cost optimization checklist for how to model this tradeoff with real numbers.


Common mistakes that waste your free tier budget

The most common mistake is testing against Gemini 2.5 Pro when your production workload will use Gemini 2.5 Flash. Pro's 25 RPD limit is gone by noon; Flash's 500 RPD gives you a full day of development. Unless your application specifically requires Pro-tier reasoning, do your initial development and prompt engineering on Flash.

The second most common mistake is not setting output token limits. On the free tier, every output token counts against your TPM budget. If you're iterating on prompts and the model is generating 2,000-token responses when you only need 200, you're spending 10x the output tokens you need against your TPM budget. Always set `maxOutputTokens` explicitly during development; it also forces you to think about the output contract your prompts should produce.
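As a sketch of where the cap goes when calling the REST API directly, the request body for `generateContent` carries it under `generationConfig.maxOutputTokens`. The helper name `build_generate_request` is hypothetical, and the code only builds the JSON body; it makes no network call.

```python
import json


def build_generate_request(prompt, max_output_tokens):
    """Build a generateContent request body with an explicit output cap."""
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {"maxOutputTokens": max_output_tokens},
    }


body = build_generate_request("Summarize this ticket in two sentences.", 200)
print(json.dumps(body, indent=2))
```

Treat the cap as part of the prompt's contract: if responses are regularly cut off at the limit, either raise it deliberately or tighten the instructions.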

Third: ignoring the midnight Pacific reset on RPD. Developers who work in non-Pacific timezones sometimes get confused when their API calls start working again at what seems like an arbitrary time. If you're hitting RPD limits regularly, schedule your heavy test runs to start shortly after midnight Pacific (03:00 Eastern, which is 08:00 UTC during standard time and 07:00 UTC during daylight saving time) to maximize your daily window.
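To schedule around the reset without doing timezone math by hand, the next midnight Pacific can be computed with Python's standard `zoneinfo` module. This is a minimal sketch assuming the reset follows the `America/Los_Angeles` wall clock; the function name is illustrative.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

PACIFIC = ZoneInfo("America/Los_Angeles")


def next_pacific_midnight(now_utc):
    """Return the next midnight in Pacific Time, expressed in UTC."""
    local = now_utc.astimezone(PACIFIC)
    next_day = (local + timedelta(days=1)).date()
    midnight = datetime(next_day.year, next_day.month, next_day.day, tzinfo=PACIFIC)
    return midnight.astimezone(timezone.utc)


winter = datetime(2026, 1, 15, 12, 0, tzinfo=timezone.utc)
summer = datetime(2026, 6, 15, 12, 0, tzinfo=timezone.utc)
print(next_pacific_midnight(winter).isoformat())  # 2026-01-16T08:00:00+00:00
print(next_pacific_midnight(summer).isoformat())  # 2026-06-16T07:00:00+00:00
```

The UTC hour shifts between 08:00 and 07:00 with daylight saving time, which is why a hard-coded UTC cron time drifts by an hour for part of the year.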

Fourth: using the free tier key for any prompt that contains customer data or proprietary business logic. As covered in the data-usage section above, free tier prompts can be used by Google for model improvement. Establish a hard rule in your team: free keys are for synthetic test data only, paid keys for anything real.

For how token counting works at the prompt level — which affects how quickly you exhaust TPM — see our how many tokens does a typical prompt use guide. Understanding token density in your prompts is the fastest way to predict your actual rate limit exposure.

Frequently Asked Questions

What are the Gemini API free tier rate limits for Gemini 2.5 Pro?

Gemini 2.5 Pro on the free tier is capped at 5 RPM (requests per minute), 250,000 TPM (tokens per minute), and 25 RPD (requests per day). The RPD limit is the binding constraint for most developers — 25 requests per day runs out quickly during active development. Source: ai.google.dev/gemini-api/docs/rate-limits.

How many requests per minute does Gemini 2.5 Flash allow on the free tier?

Gemini 2.5 Flash allows 10 RPM on the free tier, with 250,000 TPM and 500 RPD. The smaller Gemini 2.5 Flash-8B variant allows 15 RPM with the same TPM and 1,500 RPD.

Does Google use my prompts to train its models on the free tier?

Yes. On the Gemini API free tier, Google's terms permit your inputs and outputs to be used to improve Google products and models, which may include human review. On all paid tiers, Google does not use your data for training. This data-usage difference is a strong reason to upgrade to a paid key before sending any real customer data or proprietary content.

What happens when I hit the RPD limit on the free tier?

You'll receive a 429 error with a message indicating the daily quota is exhausted. The limit resets at midnight Pacific Time. No backoff will help — you either wait for the reset or upgrade to a paid tier. Billing-enabled accounts on Tier 1 have no RPD cap.

How do I upgrade from the Gemini API free tier to paid?

Link a Google Cloud billing account to your Google AI Studio project. The upgrade is immediate — your existing API key moves to Tier 1 rate limits as soon as billing is active. No application or approval is required. Tier 1 dramatically increases limits: Gemini 2.5 Flash goes from 10 RPM to 2,000 RPM, and the daily cap is removed.

Which Gemini model has the highest free tier rate limits?

Gemini 2.0 Flash-Lite has the highest combined free tier limits: 30 RPM, 1,000,000 TPM, and 1,500 RPD. If you need maximum throughput on the free tier and Flash-Lite's quality meets your requirements, it offers the most headroom before hitting limits.

Are the Gemini API free tier limits per API key or per account?

Rate limits apply per API key. A Google account can have multiple projects, and each project gets its own API key with its own independent rate limit budget. This can be used to spread development work across multiple keys, though Google's terms require each key to represent a distinct use case — creating multiple keys solely to multiply free tier capacity may violate the terms of service.

What is the difference between Gemini API free tier and Vertex AI?

The Gemini API via Google AI Studio (what this guide covers) has a free tier with rate-limited access. Vertex AI is Google's enterprise platform for Gemini access — it has different pricing, different SLAs, and no free tier (you pay from the first token). Vertex AI is appropriate for regulated industries, enterprise data governance requirements, or production deployments that need guaranteed SLAs. The Gemini API free tier is for development and prototyping.

How do I avoid hitting the free tier TPM limit?

Set maxOutputTokens on every request to the minimum your use case requires. Use context caching for repeated system prompts or documents — cached tokens consume less of your TPM budget. Consider using Gemini 2.0 Flash or Flash-Lite (1,000,000 TPM free) instead of 2.5-series models (250,000 TPM free) for token-heavy workloads. And see our how many tokens a typical prompt uses guide for sizing estimates.
