AI Studio quotas vs Vertex AI quotas — two different systems, same models
The single most expensive misunderstanding among teams shipping Gemini in 2026 is the assumption that reaching Tier 3 on Google AI Studio gives the same effective throughput as a Vertex AI project with properly provisioned quota. It does not. They are two separate quota systems, with different request paths, different SDKs or SDK configuration, different billing surfaces, and different ceilings.
**Google AI Studio** (the ai.google.dev / aistudio.google.com surface, called via the `google-genai` SDK or the Gemini API REST endpoint at `generativelanguage.googleapis.com`) uses the four-tier ladder: Free, then Tier 1 through Tier 3. Quota is account-level (your Google identity plus the linked Cloud billing account). Promotion is automatic, based on cumulative paid spend and elapsed time. Region routing is opaque: Google picks the data-plane region.
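To make the AI Studio request path concrete, here is a minimal sketch that builds (but does not send) a `generateContent` request. The host, the `v1beta` path, and the `x-goog-api-key` header follow the public Gemini API as I understand it. The helper name `build_ai_studio_request` and the model ID are illustrative assumptions, not part of any SDK.

```python
import json

# Public Gemini API host used by the AI Studio surface.
AI_STUDIO_HOST = "https://generativelanguage.googleapis.com"


def build_ai_studio_request(model: str, prompt: str, api_key: str,
                            api_version: str = "v1beta") -> dict:
    """Return the URL, headers, and JSON body for a generateContent call.

    Hypothetical helper for illustration only; nothing is sent over the network.
    """
    url = f"{AI_STUDIO_HOST}/{api_version}/models/{model}:generateContent"
    headers = {
        "Content-Type": "application/json",
        "x-goog-api-key": api_key,  # API-key auth; no project or region in the URL
    }
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return {"url": url, "headers": headers, "body": json.dumps(body)}
```

Note that nothing in the URL names a project or a region, which matches the opaque routing described above.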
**Vertex AI** (the cloud.google.com/vertex-ai surface, called via the `google-cloud-aiplatform` SDK, the `google-genai` SDK configured with `vertexai=True`, or the Vertex REST endpoint) uses GCP project quotas. Quota is project-level and regional: you request `generate_content_requests_per_minute_per_project_per_base_model` for `us-central1` separately from `europe-west4`. There is no tier ladder. Default per-project quotas are typically generous (60 RPM on Gemini 2.5 Pro in most regions at project creation, though defaults vary by model and region, so confirm on the quota page), and increases are requested through the Cloud Console quota page or through your Google Cloud account team.
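Because Vertex AI quota is scoped per project, per region, and per base model, client-side throttling has to be keyed the same way. The sketch below shows the regional endpoint shape and a simple sliding-window counter. The URL pattern applies to regional locations such as `us-central1`. The class `RegionalRpmBudget`, the function `vertex_endpoint`, and the limits you pass in are hypothetical and should mirror whatever your quota page actually shows.

```python
import time
from collections import defaultdict, deque


def vertex_endpoint(project: str, location: str, model: str) -> str:
    """Regional Vertex AI generateContent URL (assumes a regional location)."""
    return (
        f"https://{location}-aiplatform.googleapis.com/v1/projects/{project}"
        f"/locations/{location}/publishers/google/models/{model}:generateContent"
    )


class RegionalRpmBudget:
    """Client-side sliding-window request counter keyed by (region, model).

    Hypothetical helper: it mirrors the per-region, per-model shape of the
    quota but does not read real quota values from Google Cloud.
    """

    def __init__(self, limits: dict, clock=time.monotonic):
        self.limits = limits            # {(region, model): requests per minute}
        self.clock = clock              # injectable for testing
        self.sent = defaultdict(deque)  # timestamps of recent requests per key

    def try_acquire(self, region: str, model: str) -> bool:
        key = (region, model)
        limit = self.limits.get(key, 0)  # unknown region/model: no budget
        now = self.clock()
        window = self.sent[key]
        # Drop timestamps that have aged out of the 60-second window.
        while window and now - window[0] >= 60.0:
            window.popleft()
        if len(window) >= limit:
            return False
        window.append(now)
        return True
```

Exhausting the budget for `us-central1` leaves the `europe-west4` budget untouched, which is the practical consequence of regional quota.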
Which surface to use: **AI Studio** if you are prototyping, building a SaaS feature with moderate traffic, or want the simplest billing relationship. **Vertex AI** if you need data-residency controls, VPC-SC perimeters, customer-managed encryption keys (CMEK), HIPAA / FedRAMP compliance, or throughput above what Tier 3 AI Studio comfortably provides. Most production teams above ~50M tokens/day end up on Vertex AI for the quota headroom alone.
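The decision rule above can be written down as a small helper. This is only a sketch of the heuristic as stated in this section: the `Workload` fields, the function names, and the hard 50M tokens/day cutoff are illustrative assumptions, and the real threshold depends on your traffic shape and current tier limits.

```python
from dataclasses import dataclass

MINUTES_PER_DAY = 24 * 60
# The ~50M tokens/day figure from the text, treated here as a hard cutoff (assumption).
VERTEX_VOLUME_HINT_TOKENS_PER_DAY = 50_000_000


@dataclass
class Workload:
    tokens_per_day: int
    needs_data_residency: bool = False
    needs_vpc_sc: bool = False
    needs_cmek: bool = False
    needs_hipaa_or_fedramp: bool = False


def average_tokens_per_minute(tokens_per_day: int) -> float:
    """Daily volume spread evenly over the day; real peaks will be higher."""
    return tokens_per_day / MINUTES_PER_DAY


def suggest_surface(w: Workload) -> str:
    """Return 'vertex-ai' or 'ai-studio' using the criteria listed above."""
    needs_enterprise_controls = any([
        w.needs_data_residency,
        w.needs_vpc_sc,
        w.needs_cmek,
        w.needs_hipaa_or_fedramp,
    ])
    if needs_enterprise_controls or w.tokens_per_day >= VERTEX_VOLUME_HINT_TOKENS_PER_DAY:
        return "vertex-ai"
    return "ai-studio"
```

For scale, 50M tokens/day averages out to roughly 34,700 tokens per minute. Traffic is rarely flat, so quota should be sized against peak minutes, not the daily average.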