By The DDH Team · Digital Dashboard Hub

Pinecone Quota Tiers and Limits (2026): Starter, Standard, and Enterprise

Pinecone is the most widely cited managed vector database for retrieval-augmented generation (RAG) workloads, but its quota system has grown more complex as the product moved from pod-based to serverless architecture. Developers routinely hit one of three ceilings before they realize what happened: the Starter tier pausing an index after seven days of inactivity, the 200-namespace-per-index cap on Starter returning errors at scale, or the 40 KB per-vector metadata limit rejecting oversized upserts. This page catalogs every documented limit across all three tiers as of June 2026.

Pinecone's billing model changed significantly when serverless became the default architecture. Instead of paying for reserved pod capacity, you now pay for read units and write units consumed — a model that is cheaper at low throughput but can surprise teams doing bulk upserts. Understanding the unit math before you commit to a plan is the single highest-value action you can take during architecture planning. The vector DB cost calculator on this site can help you estimate monthly spend under serverless pricing before you provision anything.

If you are evaluating Pinecone against open-source or alternative managed options, the Pinecone vs Weaviate vs Qdrant comparison covers architectural trade-offs in depth, and the Qdrant Cloud quotas page provides the parallel breakdown for Qdrant's managed offering. For teams building their first RAG pipeline, the build RAG with Pinecone tutorial walks through index creation, upsert, and query end-to-end. All numbers on this page come from pinecone.io/pricing and docs.pinecone.io/reference/quotas-and-limits — verify before procurement, as pricing and limits have changed multiple times in the past 18 months.

Pinecone Plan Limits at a Glance (June 2026)

| Feature | Free / Starter | Standard (entry) | Enterprise |
|---|---|---|---|
| Vectors stored | 100,000 | Unlimited (pay-per-GB) | Unlimited (custom pricing) |
| Storage included | 2 GB | None included; $0.33/GB-month | Negotiated |
| Indexes | 1 | Multiple | Multiple (unlimited in practice) |
| Namespaces per index | 200 | Unlimited | Unlimited |
| Metadata per vector | 40 KB max | 40 KB max | 40 KB max |
| Inactivity policy | Paused after 7 days; data preserved | No inactivity pause | No inactivity pause |
| Write units | Included in free allowance | $0.33 per 1M write units | Custom / negotiated |
| Read units | Included in free allowance | $8.25 per 1M read units | Custom / negotiated |
| Support | Community / docs only | Support ticket with SLA | Dedicated support + SLA |
| SOC 2 Type II | No | Available on request | Yes, included |
| SLA (uptime guarantee) | None | Standard SLA | Custom SLA |
| BYOC (Bring Your Own Cloud) | No | No | Yes |

Sources: pinecone.io/pricing, docs.pinecone.io/reference/quotas-and-limits. Prices in USD. Standard entry price ~$50/mo depending on consumption. All figures as of June 2026 — verify before procurement.

The Starter Tier Inactivity Pause: What It Is and How to Avoid It

The most common support question about Pinecone's free tier is not about capacity — it is about the seven-day inactivity pause. If no read or write operations are made against a Starter index for seven consecutive days, Pinecone automatically pauses the index. The index is not deleted; your vectors and metadata are preserved. However, all queries and upserts will return an error until you manually resume the index through the console or API.

The practical consequence for developers is that a demo application, a side project, or a staging environment that sits unused for a week or more can silently stop working. Monitoring that checks index health periodically will catch this, but many teams do not add that check until after they have already experienced an unexpected outage. The Pinecone console shows a 'Paused' badge on affected indexes, but if you are not actively watching the console, the first signal is usually a 503 or an 'Index not ready' error in your application logs.

There are two reliable prevention strategies. The first is a lightweight keep-alive script — a cron job that issues a trivial query (such as a nearest-neighbor search for a single dummy vector) every five or six days. This resets the inactivity timer without consuming meaningful quota. The second is simply upgrading to Standard before your application goes into production; Standard accounts have no inactivity policy. If your project is genuinely experimental and you want to stay on Starter, the keep-alive cron is the correct answer. There is no setting in the Pinecone console to disable the inactivity policy on a free account.
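The keep-alive idea can be sketched in a few lines of Python. This is a minimal illustration, not Pinecone's recommended implementation: `ping_is_due`, `keep_alive`, the two-day safety margin, and the `FakeIndex` stand-in are all hypothetical names and values introduced here. The only thing assumed about the real client is that the index object exposes `query(vector=..., top_k=...)`, as the Pinecone Python client's index object does.

```python
from datetime import datetime, timedelta, timezone

PAUSE_AFTER = timedelta(days=7)   # inactivity window described above
PING_MARGIN = timedelta(days=2)   # assumed safety margin: ping from day 5 onward


def ping_is_due(last_activity, now):
    """Return True once the index has been idle long enough to warrant a ping."""
    return now - last_activity >= PAUSE_AFTER - PING_MARGIN


def keep_alive(index, dimension):
    """Issue one trivial nearest-neighbor query to count as index activity.

    `index` is assumed to expose query(vector=..., top_k=...).
    """
    dummy = [0.1] * dimension     # non-zero placeholder vector
    return index.query(vector=dummy, top_k=1)


class FakeIndex:
    """Stand-in for a real index so the sketch runs without credentials."""

    def __init__(self):
        self.calls = []

    def query(self, vector, top_k):
        self.calls.append((len(vector), top_k))
        return {"matches": []}


fake = FakeIndex()
last_seen = datetime(2026, 6, 1, tzinfo=timezone.utc)
if ping_is_due(last_seen, last_seen + timedelta(days=5)):
    keep_alive(fake, dimension=768)
```

In a real deployment the `FakeIndex` would be replaced by an actual index handle, and the script would run from whatever scheduler you already use. If the scheduler itself fires every five days, the `ping_is_due` check is optional.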


Serverless Billing Anatomy: Read Units, Write Units, and Storage

Pinecone's serverless architecture, which became the default in 2024 and is the only recommended architecture for new projects as of 2026, bills on three dimensions: storage consumed (GB-months), write units (WU) consumed during upserts and updates, and read units (RU) consumed during queries. Understanding each dimension separately is important because the dominant cost driver varies significantly by workload type.

Storage is straightforward: you pay $0.33 per GB-month. A 768-dimensional float32 vector occupies 768 × 4 = 3,072 bytes, or roughly 3 KB of raw vector data. One million such vectors require approximately 3 GB of raw storage, which translates to about $1.00/month in storage costs alone. However, Pinecone stores metadata alongside vectors, and metadata storage is billed at the same GB-month rate. If your metadata averages 1 KB per vector, one million vectors add another 1 GB of storage, or about $0.33/month. The actual bill also reflects index overhead, which Pinecone does not fully document but which empirical measurements suggest adds 20-40% to raw vector size.
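The storage arithmetic above can be reproduced with a short calculator. This is a rough sketch under stated assumptions: a GB is taken as 10^9 bytes, the overhead fraction is applied to vector bytes only, and the function names are illustrative rather than anything Pinecone provides.

```python
BYTES_PER_FLOAT32 = 4
GB = 1_000_000_000                  # decimal GB; an assumption about billing units
STORAGE_PRICE_PER_GB_MONTH = 0.33   # USD, the rate quoted in this section


def storage_gb(num_vectors, dimension, metadata_bytes=0, overhead=0.0):
    """Estimate stored size in GB.

    `overhead` is a fraction of raw vector size (e.g. 0.3 for 30%) and is
    applied to the vector bytes only, not to metadata.
    """
    vector_bytes = dimension * BYTES_PER_FLOAT32 * (1 + overhead)
    return num_vectors * (vector_bytes + metadata_bytes) / GB


def monthly_storage_cost(gb):
    """Monthly storage cost in USD at the quoted per-GB rate."""
    return gb * STORAGE_PRICE_PER_GB_MONTH


vectors_only = storage_gb(1_000_000, 768)
with_metadata = storage_gb(1_000_000, 768, metadata_bytes=1_000)
with_overhead = storage_gb(1_000_000, 768, metadata_bytes=1_000, overhead=0.3)

for label, gb in [("vectors only", vectors_only),
                  ("plus 1 KB metadata", with_metadata),
                  ("plus 30% index overhead", with_overhead)]:
    print(f"{label}: {gb:.2f} GB, ${monthly_storage_cost(gb):.2f}/month")
```

Because the overhead figure is an empirical estimate rather than a documented number, it is worth running the calculation at both ends of the 20-40% range and comparing against the usage tab once real data is loaded.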

Write units are charged at $0.33 per one million units. A single upsert of a 768-dimensional vector with modest metadata consumes approximately one write unit, though the exact definition of a write unit is tied to vector dimensions and payload size in ways that Pinecone documents at a high level rather than with a precise formula. For teams doing bulk initial indexing, this cost is typically incurred once and is relatively small — indexing one million vectors at one WU each costs $0.33. Ongoing writes from a production pipeline that indexes a few thousand new documents per day add negligible cost. Read units are the more significant ongoing expense at $8.25 per one million units. A single top-k=10 query against a 768-dimensional index consumes roughly one read unit, meaning one million daily queries would cost $8.25/day or roughly $250/month. Teams with high query volume should model this carefully before assuming serverless is cheaper than the legacy pod model. The vector DB cost calculator provides a side-by-side estimate.
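The unit math in this section is simple enough to script. The sketch below assumes the per-million prices quoted here and the rough heuristic of one unit per upserted vector or per query; the function names and the 30-day month are illustrative choices, and real unit consumption should be read from the console's usage tab.

```python
WRITE_PRICE_PER_MILLION = 0.33   # USD per 1M write units, as quoted above
READ_PRICE_PER_MILLION = 8.25    # USD per 1M read units, as quoted above


def unit_cost(units, price_per_million):
    """Cost in USD for a number of units at a per-million price."""
    return units / 1_000_000 * price_per_million


def bulk_write_cost(num_vectors, wu_per_vector=1.0):
    """One-off cost of an initial bulk upsert."""
    return unit_cost(num_vectors * wu_per_vector, WRITE_PRICE_PER_MILLION)


def monthly_read_cost(queries_per_day, ru_per_query=1.0, days=30):
    """Recurring query cost over a month of `days` days."""
    return unit_cost(queries_per_day * ru_per_query * days, READ_PRICE_PER_MILLION)


print(f"index 1M vectors once: ${bulk_write_cost(1_000_000):.2f}")
print(f"1M queries/day: ${monthly_read_cost(1_000_000):.2f}/month")
print(f"10k queries/day: ${monthly_read_cost(10_000):.2f}/month")
```

Raising `ru_per_query` is a quick way to see how sensitive the monthly bill is to heavier queries, since the one-unit-per-query figure is only a rough estimate.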

One practical note: serverless read unit costs are sensitive to top-k values and metadata filter complexity. A query with a tight metadata filter that eliminates most of the index will generally consume fewer read units than a broad scan. Pinecone's documentation recommends using metadata filters aggressively to reduce both latency and cost, which is good architectural advice independent of budget concerns.


Namespace Limits: The 200-Namespace Cap on Starter and When It Matters

Pinecone namespaces are a partitioning mechanism within a single index. Each namespace operates as an isolated sub-index: upserts, queries, and deletes within a namespace do not cross namespace boundaries unless you explicitly query across namespaces. The common use cases for namespaces include multi-tenant SaaS applications (one namespace per tenant), document-type segregation (one namespace for product manuals, another for support tickets), and A/B testing of embedding models (old embeddings in namespace A, new embeddings in namespace B during a migration window).

The Starter tier caps namespaces at 200 per index. For single-tenant applications or small multi-tenant systems, this limit is irrelevant. However, SaaS products that provision one namespace per customer will hit this ceiling at 200 customers — a point that many early-stage founders do not anticipate. When a 201st namespace creation is attempted on a Starter account, the API returns an error rather than silently truncating or consolidating. This means the failure is detectable but disruptive if it reaches production without a guard.
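A guard of the kind mentioned above can live in the tenant-provisioning path, so the application refuses cleanly instead of surfacing an API error. The following is a hypothetical sketch: `reserve_namespace`, `NamespaceCapReached`, and the `tenant-<id>` naming scheme are invented for illustration, and the set of existing namespaces is assumed to come from your own tenant table or from the index statistics.

```python
STARTER_NAMESPACE_CAP = 200   # per-index cap on Starter, as stated above


class NamespaceCapReached(Exception):
    """Raised when adding a namespace would exceed the configured cap."""


def reserve_namespace(existing, tenant_id, cap=STARTER_NAMESPACE_CAP):
    """Return the namespace name for a tenant, refusing to exceed the cap.

    `existing` is a mutable set of namespace names already in the index.
    """
    name = f"tenant-{tenant_id}"
    if name in existing:
        return name                # tenant already provisioned
    if len(existing) >= cap:
        raise NamespaceCapReached(
            f"{len(existing)} namespaces in use; cap is {cap}"
        )
    existing.add(name)
    return name


namespaces = {f"tenant-{i}" for i in range(199)}
print(reserve_namespace(namespaces, 199))   # takes the 200th slot
try:
    reserve_namespace(namespaces, 200)      # would be the 201st
except NamespaceCapReached as err:
    print("refused:", err)
```

Alerting well before the cap, for example at 80% of it, gives time to upgrade or to switch to metadata-based tenant filtering before sign-ups start failing.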

Standard and Enterprise accounts have no documented namespace limit. Pinecone's internal architecture handles namespaces efficiently because they share the same underlying index structure; adding a namespace does not provision new compute resources. The practical ceiling on Standard is a function of your index's overall size and query patterns rather than a hard quota. Teams building multi-tenant systems should plan their namespace strategy with the RAG architecture decision tree in mind — namespaces solve tenant isolation cleanly, but they are not the right tool for completely separating embedding model versions across large-scale indexes where a separate index-per-model-version approach may be preferable.


Metadata Indexing Limits: The 40KB Per-Vector Ceiling and Indexed Field Caps

Every vector in Pinecone can carry an associated metadata payload — an arbitrary JSON object stored alongside the embedding. This metadata is used for both display (returning source text or document IDs in query results) and filtering (restricting queries to vectors that match specific metadata conditions). The hard limit is 40 KB of metadata per vector. This is generous for most RAG use cases where metadata contains document identifiers, short text snippets, and categorical labels, but teams that attempt to store full document text in metadata will hit this ceiling.

A more subtle limit applies to indexed metadata fields. Pinecone distinguishes between metadata fields that are indexed (available for filtering in queries) and fields that are stored but not indexed (returned in results but not filterable). Not all metadata fields are automatically indexed; Pinecone uses a selective indexing approach that affects query performance and storage overhead. The documentation recommends configuring the `metadata_config` on index creation to specify which fields will be indexed, and notes that indexing too many fields — particularly high-cardinality string fields — can degrade query performance and increase storage costs.

A common mistake is storing large text chunks in metadata under the assumption that this avoids the need for a separate document store. This works at small scale but creates two problems: metadata payloads approaching the 40KB limit will cause upsert failures for longer documents, and large metadata payloads increase the cost of both write and read operations. The standard production pattern for RAG systems is to store only identifiers and short filter-relevant attributes in Pinecone metadata, and retrieve full document content from a separate store (PostgreSQL, S3, or Redis) keyed on the document ID returned by Pinecone. This architecture is covered in detail in the RAG with Pinecone tutorial.
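The split between slim vector metadata and a separate document store can be sketched as follows. This is an illustration under assumptions: 40 KB is taken to mean 40 × 1024 bytes of compact UTF-8 JSON (Pinecone's exact byte accounting may differ), and `metadata_size`, `split_record`, and the 200-character snippet length are hypothetical choices made for this example.

```python
import json

METADATA_LIMIT_BYTES = 40 * 1024   # assumes 40 KB means 40,960 bytes


def metadata_size(metadata):
    """Size in bytes of the metadata serialized as compact UTF-8 JSON."""
    encoded = json.dumps(metadata, separators=(",", ":"), ensure_ascii=False)
    return len(encoded.encode("utf-8"))


def split_record(doc_id, text, attrs, snippet_chars=200):
    """Split a document into slim vector metadata and a full-text record.

    The metadata keeps only the identifier, a short snippet, and filterable
    attributes; the full text goes to a separate store keyed on doc_id.
    """
    metadata = {"doc_id": doc_id, "snippet": text[:snippet_chars], **attrs}
    size = metadata_size(metadata)
    if size > METADATA_LIMIT_BYTES:
        raise ValueError(f"metadata is {size} bytes; limit is {METADATA_LIMIT_BYTES}")
    document = {"doc_id": doc_id, "text": text}
    return metadata, document


long_text = "lorem ipsum " * 10_000   # about 120,000 characters
meta, doc = split_record("doc-42", long_text, {"type": "manual", "lang": "en"})
print(metadata_size(meta), "bytes of metadata;", len(doc["text"]), "chars stored elsewhere")
```

Checking the size client-side before the upsert turns a rejected request into an error you can handle at ingestion time, with the offending document ID in hand.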


Pod-Based vs Serverless: The Architecture Migration

Pinecone originally offered only pod-based indexes, where you paid for reserved compute capacity sized by pod type (s1, p1, p2) and pod count. Pod-based indexes offered predictable latency and cost but required upfront sizing decisions, and scaling required manual pod resizing or index recreation. As of 2026, Pinecone classifies pod-based indexes as a legacy architecture and does not recommend them for new projects. Serverless is the current standard for all new index creation.

The key differences between the two architectures matter for teams still running pod-based indexes or evaluating a migration. Pod-based indexes bill per hour regardless of query volume — a p1.x1 pod costs a fixed monthly amount whether you run one query or one million. Serverless bills purely on consumption. For low-query-volume applications that maintain large indexes (such as internal knowledge bases with infrequent but high-value queries), pod pricing could be lower than serverless. However, Pinecone has not published a sunset date for pod-based indexes, and teams on pods should track announcements from Pinecone carefully. Continuing to build new features on a deprecated architecture is a technical debt decision that should be deliberate, not accidental.

Migration from pod to serverless is not a one-click operation. It requires creating a new serverless index, re-upserting all vectors (which incurs write unit costs), re-pointing application queries to the new index endpoint, and validating query results for consistency. For indexes with tens of millions of vectors, this migration can take hours and requires careful coordination to avoid serving stale results during the cutover. Pinecone's documentation provides migration tooling, but there is no zero-downtime migration path available through the API alone — blue-green index strategies (writing to both old and new during migration) are the standard approach for production systems.
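The blue-green approach can be illustrated with a thin wrapper that writes to both indexes and reads from one until cutover. This is a sketch, not Pinecone's migration tooling: `DualWriter` and `InMemoryIndex` are hypothetical, and the wrapper only assumes that each index object exposes `upsert(vectors=..., namespace=...)` and `query(vector=..., top_k=..., namespace=...)`.

```python
class DualWriter:
    """Write to the old and new index during migration; read from one of them."""

    def __init__(self, old_index, new_index):
        self.old = old_index
        self.new = new_index
        self.read_from_new = False

    def upsert(self, vectors, namespace=""):
        # Both indexes receive every write so neither falls behind.
        self.old.upsert(vectors=vectors, namespace=namespace)
        self.new.upsert(vectors=vectors, namespace=namespace)

    def query(self, vector, top_k, namespace=""):
        target = self.new if self.read_from_new else self.old
        return target.query(vector=vector, top_k=top_k, namespace=namespace)

    def cut_over(self):
        """Switch reads to the new index once backfill and validation are done."""
        self.read_from_new = True


class InMemoryIndex:
    """Toy stand-in for a real index so the sketch runs locally."""

    def __init__(self, name):
        self.name = name
        self.store = {}

    def upsert(self, vectors, namespace=""):
        for v in vectors:
            self.store[(namespace, v["id"])] = v["values"]

    def query(self, vector, top_k, namespace=""):
        ids = sorted(i for (ns, i) in self.store if ns == namespace)[:top_k]
        return {"served_by": self.name, "matches": [{"id": i} for i in ids]}


writer = DualWriter(InMemoryIndex("pod"), InMemoryIndex("serverless"))
writer.upsert([{"id": "a", "values": [0.1, 0.2]}], namespace="docs")
before = writer.query([0.1, 0.2], top_k=1, namespace="docs")
writer.cut_over()
after = writer.query([0.1, 0.2], top_k=1, namespace="docs")
print(before["served_by"], "->", after["served_by"])
```

A production version would also need to decide what happens when one of the two writes fails, for instance by queuing the failed write for retry, which this sketch leaves out.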


Enterprise Plan: SLA, SOC 2, Dedicated Tier, and BYOC

Pinecone's Enterprise tier is a custom-priced contract that removes the constraints of Standard and adds compliance, security, and deployment flexibility features. The most operationally significant additions are: a documented uptime SLA (Standard provides an SLA, but Enterprise SLAs include stronger remedies and more granular commitments), SOC 2 Type II certification documentation available for vendor reviews, and BYOC (Bring Your Own Cloud) deployment.

BYOC is the feature that unlocks Pinecone for organizations with data residency requirements. Under BYOC, Pinecone's control plane manages the service orchestration, but the actual vector data and index storage run within the customer's own AWS, GCP, or Azure account. This means vector data never leaves the customer's cloud tenancy, which helps satisfy the data residency and data sovereignty requirements that appear in GDPR Article 46 DPA clauses, HIPAA BAAs, and FedRAMP-adjacent workloads. Note that BYOC is not the same as self-hosting: the control plane and management layer remain Pinecone-operated, and only the data plane runs in your cloud. Organizations that need fully air-gapped, fully self-managed deployments (government classified environments, for example) should evaluate Pinecone's on-premises offering or alternatives like Qdrant Private Cloud.

Enterprise pricing is negotiated directly with Pinecone's sales team and is not published. Based on community reports and sales materials, entry-level Enterprise contracts typically start well above the Standard consumption pricing for equivalent workloads. The value proposition is not cost reduction but rather compliance certification, dedicated support response times, and BYOC. If your organization does not have a legal or compliance requirement driving the decision, Standard with support tickets is usually adequate. If a security review or a data processing agreement requires SOC 2 documentation included in the contract, SLA remedies stronger than the Standard SLA, or BYOC, Enterprise is the tier that provides them.


Sourcing, Verification, and Live Data Caveats

All figures in this document are sourced from pinecone.io/pricing and docs.pinecone.io/reference/quotas-and-limits, last verified in June 2026. Pinecone has updated pricing and limits several times in the past two years — the move from pod-based to serverless billing changed the fundamental pricing model, and specific unit prices (per-GB storage, per-million read units, per-million write units) have been adjusted. The figures here were accurate at time of writing, but should be confirmed against the official pricing page before making procurement decisions.

The 40KB metadata limit, the 200-namespace Starter cap, and the seven-day inactivity pause are all documented in Pinecone's official limits reference and have been stable across multiple documentation versions. The serverless unit pricing ($0.33/GB-month, $0.33/M write units, $8.25/M read units) reflects the current published rates but is the most likely figure to have changed if you are reading this significantly after June 2026. The enterprise pricing section deliberately contains no specific dollar figures because Pinecone does not publish them and any third-party estimate would be unreliable.

Pinecone's changelog and status page (status.pinecone.io) are the authoritative sources for limit changes. For teams building systems with hard dependencies on specific Pinecone limits, subscribing to Pinecone's changelog or monitoring their documentation RSS feed is worthwhile. Limit increases are generally announced with advance notice; limit reductions or deprecations (like the pod-based architecture transition) have historically come with longer lead times.

Choosing the Right Pinecone Plan

  1. Size your index before choosing a plan

    Estimate your total vector count and average metadata size before touching the Pinecone console. Multiply your vector count by your embedding dimension (1536 for text-embedding-3-small, 3072 for text-embedding-3-large, 768 for many open-source models) and by 4 bytes per float32 value to get raw vector bytes. Add estimated metadata per vector. If your total footprint is under 100,000 vectors and 2 GB, Starter will hold you. If you project beyond that within the next 90 days, start on Standard; migrating from Starter to Standard mid-project is possible but requires downtime planning.

  2. Calculate your serverless read and write unit costs

    Estimate write units by counting your expected daily upserts. A rough heuristic: one write unit per vector upserted for typical 768-1536 dim vectors with modest metadata. Estimate read units by counting your expected daily queries and scaling by query complexity (top-k value, metadata filter selectivity). At $8.25 per million read units and roughly one read unit per query, 10,000 queries per day is about 300,000 read units per month, costing roughly $2.50/month. At 100,000 queries per day, the same math yields about $25/month. Use the vector DB cost calculator to model your specific workload before committing.

  3. Configure inactivity prevention for Starter accounts

    If you are staying on the Starter tier for development or staging, add a keep-alive mechanism immediately. A simple Node.js or Python cron job that runs every five days, issues a single dummy query to your index, and logs the response is sufficient. This can be a GitHub Actions workflow on a schedule trigger, a local cron on your development machine, or a lightweight serverless function in AWS Lambda or Vercel. Do not rely on your application's normal traffic to keep the index alive — if your staging environment goes idle over a holiday or sprint break, the index will pause.

  4. Design your namespace architecture before initial upsert

    Namespaces cannot be renamed or merged after creation. Design your namespace schema during the architecture phase, not after data is already in the index. For multi-tenant applications, decide whether you want per-customer namespaces (simple isolation, up to 200 tenants on Starter and no documented cap on Standard) or a shared namespace with metadata-based tenant filtering (scales farther on Starter but requires careful query-time filter enforcement). For applications with mixed content types (product descriptions, support tickets, documentation), namespacing by content type allows independent recall quality tuning without affecting query cost for unrelated content.

  5. Graduate from Starter to Standard with a planned migration window

    When you are ready to move from Starter to Standard, plan a brief maintenance window rather than doing it under traffic. The migration steps are: (1) create a new Standard index with the same dimension and distance metric as your Starter index; (2) re-upsert all vectors using a batched script that reads from your source documents rather than exporting from Pinecone (Pinecone does not provide a bulk export API); (3) validate query results by running a sample query set against both indexes and comparing top-k results for consistency; (4) update your application's index endpoint and API key; (5) delete the Starter index after confirming the Standard index is serving correctly. Budget 2-4 hours for this migration for a 100,000-vector index and more for larger collections.
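The validation step in the migration (comparing top-k results from both indexes for a sample query set) can be scripted along these lines. This is a minimal sketch: `topk_overlap`, `validate_migration`, and the 0.9 threshold are illustrative choices, and the result dictionaries are assumed to map each sample query label to the list of IDs returned by the respective index.

```python
def topk_overlap(ids_a, ids_b):
    """Fraction of IDs shared between two top-k result lists (order ignored)."""
    a, b = set(ids_a), set(ids_b)
    if not a and not b:
        return 1.0                  # both empty: treat as identical
    return len(a & b) / max(len(a), len(b))


def validate_migration(results_old, results_new, min_overlap=0.9):
    """Return {query: overlap} for queries whose overlap is below the threshold."""
    failures = {}
    for query, old_ids in results_old.items():
        overlap = topk_overlap(old_ids, results_new.get(query, []))
        if overlap < min_overlap:
            failures[query] = overlap
    return failures


old = {"refund policy": ["a", "b", "c", "d"], "reset password": ["e", "f"]}
new = {"refund policy": ["d", "c", "b", "a"], "reset password": ["e", "x"]}
print(validate_migration(old, new))
```

Approximate nearest-neighbor search can return slightly different results across indexes even when the data is identical, so a threshold below 1.0 is usually more practical than requiring exact equality.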

Frequently Asked Questions

What happens when a Pinecone Starter index is paused due to inactivity?

The index enters a paused state and all API calls to it return an error. Your data is not deleted — vectors, metadata, and namespace structure are preserved. You resume the index through the Pinecone console by clicking the Resume button on the index detail page, or via the API using the resume index endpoint. Resume typically takes 30-60 seconds. There is no data loss, and the inactivity timer resets to zero on resume. Subsequent inactivity of another seven days will pause it again.

What is the difference between a namespace and an index in Pinecone?

An index is the top-level resource in Pinecone: it defines the vector dimension and distance metric (cosine, euclidean, dot product) and is the unit that billing is tracked against. A namespace is a logical partition within an index. A query against one namespace does not return vectors from another, but all namespaces share the same index capacity and are billed together under the same index. Indexes are the right granularity for completely isolated workloads (separate products, separate organizations). Namespaces are the right granularity for partitions within a single logical workload (per-tenant data in a multi-tenant SaaS, per-content-type segregation, A/B embedding model testing).

My metadata payload is close to 40KB. What happens if I exceed it?

Upsert requests with metadata payloads exceeding 40KB will return a 400 error. The vector will not be written. The 40KB limit applies to the serialized JSON metadata object per vector, not to the total metadata stored in the index. If you are approaching this limit, the most common cause is storing long text content (full paragraphs, HTML, or JSON blobs) in metadata. The solution is to store only identifiers and short filterable attributes in Pinecone metadata, and retrieve full content from a separate document store using the IDs returned by Pinecone queries.

Should I use serverless or pod-based Pinecone in 2026?

Serverless for all new projects. Pod-based is a legacy architecture that Pinecone no longer recommends. If you have an existing pod-based index that is stable and cost-efficient, there is no immediate deadline to migrate — Pinecone has not announced a sunset date as of June 2026. However, new features, optimizations, and tooling are all being built around serverless. Teams starting new indexes should use serverless exclusively. The exception would be a very specific cost analysis showing that a reserved pod at fixed cost is cheaper than serverless consumption for your exact workload — this situation is less common than it was two years ago.

What is BYOC (Bring Your Own Cloud) and which plan includes it?

BYOC means the vector data and index storage run within your own cloud account (AWS, GCP, or Azure) rather than in Pinecone's shared infrastructure. Pinecone's control plane still manages the service, but your data never leaves your cloud tenancy. This satisfies data residency requirements common in regulated industries and GDPR-adjacent compliance situations. BYOC is available only on the Enterprise plan, which requires a custom contract. There is no self-service signup for BYOC.

How do write units and read units map to actual API calls?

Pinecone does not publish a precise formula that maps individual API calls to exact unit counts. The general guidance from their documentation is that write units are proportional to the number of vectors upserted and their dimensions, and read units are proportional to the number of query results returned (top-k) and the index size searched. Empirically, a typical 768-dim upsert of a few hundred vectors consumes on the order of hundreds of write units, and a single top-k=10 query consumes on the order of 1-2 read units. The Pinecone console shows unit consumption in the usage tab, which is the most accurate way to measure your specific workload's unit cost before projecting to production scale.

What is the maximum number of indexes I can have on the Starter plan?

The Starter plan allows one index. This means you cannot have separate indexes for development, staging, and production on a free account. A common workaround is to use namespaces within the single index to simulate environment separation (e.g., a 'dev' namespace and a 'staging' namespace), but this trades clean isolation for namespace slots. For teams that need true environment isolation, the Standard plan's multi-index support is necessary.

Does the 200-namespace limit on Starter apply per index or per account?

The 200-namespace limit applies per index, and since Starter accounts have only one index, it is effectively also the per-account limit. Standard and Enterprise accounts have no documented namespace limit per index, and since they support multiple indexes, the namespace ceiling for those accounts is not a practical constraint for most workloads.

Calculate Your Pinecone Costs Before You Commit

Use the vector DB cost calculator to model your actual storage, read unit, and write unit costs against Pinecone serverless pricing before provisioning. Takes about two minutes and will tell you whether Starter, Standard, or a self-hosted alternative is the right fit for your workload.
