By The DDH Team · Digital Dashboard Hub

Vector DB Cost per 1M Embeddings (2026)



The embedding model cost is a one-time or periodic batch spend. The vector database cost is permanent and recurring — you pay it every month you have a live RAG system. As of June 2026, the spread between the cheapest and most expensive vector DB option for a 1M-vector index runs from near-zero (pgvector bundled in existing Postgres) to $140+/month (Pinecone dedicated pod). Picking the wrong tier on day one is common and expensive.

Three cost shapes exist in this market. **Serverless / pay-per-operation** (Pinecone Serverless, Turbopuffer): you pay per write unit, per read unit, and per GB stored — no idle cost. **Cluster-based flat-rate** (Weaviate Cloud, Qdrant, Zilliz): you pay a monthly base for a cluster regardless of utilization, often with per-vector overage. **Bundled** (pgvector on Supabase, Neon, RDS): vector storage is indistinguishable from your Postgres bill; you get vector search as an extension at no incremental list price.

This page covers the storage and query cost side of the RAG stack. For the upstream embedding generation cost — what you pay to create those vectors in the first place — see our embeddings cost calculator. For the LLM call cost that dominates query-time spend, see our RAG cost-per-query breakdown. For a head-to-head feature comparison of the databases themselves, see Pinecone vs Weaviate vs Qdrant.


Vector DB storage + query pricing — June 2026

| Provider | Storage cost | Write cost | Read/query cost |
| --- | --- | --- | --- |
| Pinecone Serverless | $0.33/GB-month | $0.33/1M write units (~1 vec/unit at 384 dim) | $8.25/1M read units (~1 query/unit) |
| Pinecone Standard pod (p1.x1) | Included in pod | Included in pod | $70–140/mo flat |
| Weaviate Cloud Serverless Standard | $0.095/1M vectors/month (above 250k free tier) | $25/mo base included | $25/mo base + per-query metering |
| Qdrant Cloud Free | 1 GB free cluster | Free (within 1 GB) | Free (within 1 GB) |
| Qdrant Cloud Standard | Included in cluster | $30–60/mo entry cluster | Included in cluster |
| Zilliz Cloud (Milvus managed) Serverless | Included in CU | From $0.10/hr per Compute Unit | Included in CU |
| Chroma Cloud | $0.06/GB stored/month | Included in base | Pay-as-you-go query metering; see trychroma.com/pricing for current rates (product was in early access as of June 2026) |
| Turbopuffer | $0.10/GB-month | Included | $0.40/1M query operations |
| pgvector (Supabase/Neon/RDS) | Bundled in Postgres storage | No incremental cost | No incremental cost |

Sources as of June 2026: Pinecone pricing (pinecone.io/pricing — serverless write/read unit rates and pod SKUs); Weaviate Cloud pricing (weaviate.io/pricing — $25/mo base + per-vector storage on Standard tier); Qdrant Cloud pricing (qdrant.tech/pricing — free 1 GB cluster, paid starts ~$30-60/mo); Zilliz Cloud pricing (zilliz.com/pricing — serverless CU from $0.10/hr); Chroma Cloud pricing (trychroma.com/pricing — verify query rates before procurement as Chroma's pay-as-you-go pricing was still in early access as of this writing); Turbopuffer pricing (turbopuffer.com/pricing — $0.10/GB storage, $0.40/1M queries). pgvector pricing varies by Postgres host; see Supabase (supabase.com/pricing), Neon (neon.tech/pricing), AWS RDS (aws.amazon.com/rds/postgresql/pricing) for base instance rates. Prices subject to change — verify before finalizing any budget.

The vector DB cost formula

Vector DB cost has three independent components. On serverless providers all three are billed separately; on cluster-based providers storage and compute are bundled into the cluster rate:

```
monthly_cost = (vectors_stored × bytes_per_vector / 1_000_000_000) × storage_$/GB_month
             + (monthly_writes / 1_000_000) × write_$/M
             + (monthly_queries / 1_000_000) × read_$/M
```

Bytes per vector = dimension_count × 4 (float32). A 384-dim vector = 1,536 bytes. A 1,536-dim vector = 6,144 bytes. A 3,072-dim vector = 12,288 bytes. This is the number that surprises teams most — switching from a 384-dim embedding model to a 3,072-dim model increases raw storage bytes by 8x before any pricing difference.

Write operations are typically one-time or infrequent (initial index build, periodic incremental updates). Read/query operations are recurring — every user query is at least one read. In production, reads dominate the bill. On Pinecone Serverless, writes are 25x cheaper per unit than reads ($0.33 vs $8.25 per million). Plan your budget around query volume, not write volume.
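The formula can be sketched as a small Python function. This is a minimal illustration, not a billing tool: the function and parameter names are made up for this example, the rates are the list prices quoted in the table above, and real invoices may include minimums, tiering, or rounding that the sketch ignores.

```python
def bytes_per_vector(dim_count: int, bytes_per_component: int = 4) -> int:
    """Raw vector size; 4 bytes per component assumes float32 storage."""
    return dim_count * bytes_per_component


def monthly_cost(
    vectors_stored: int,
    dim_count: int,
    monthly_writes: int,
    monthly_queries: int,
    storage_per_gb_month: float,
    write_per_million: float,
    read_per_million: float,
) -> dict:
    """Apply the three-component formula; all rates are USD prices you supply."""
    storage_gb = vectors_stored * bytes_per_vector(dim_count) / 1_000_000_000
    storage = storage_gb * storage_per_gb_month
    writes = monthly_writes / 1_000_000 * write_per_million
    reads = monthly_queries / 1_000_000 * read_per_million
    return {
        "storage": storage,
        "writes": writes,
        "reads": reads,
        "total": storage + writes + reads,
    }


# Pinecone Serverless rates from the table, steady state (no new writes).
example = monthly_cost(1_000_000, 1536, 0, 10_000, 0.33, 0.33, 8.25)
print(f"total: ${example['total']:.2f}/month")
```

For cluster-based providers, the same function can be called with the write and read rates set to zero and the flat cluster fee added on top.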


Worked example 1: 1M vectors — small RAG index

1M vectors at 1,536 dimensions (OpenAI text-embedding-3-small default) = 6.1 GB raw float32 storage.

**Pinecone Serverless:** Storage = 6.1 × $0.33 = **$2.01/month**. Write cost (one-time): 1M × $0.33/M = $0.33. Query cost at 10,000 queries/month: 10,000 × $8.25/M = **$0.08/month**. Total: ~**$2.10/month** ongoing at low query volume.

**Turbopuffer:** Storage = 6.1 × $0.10 = **$0.61/month**. Queries at 10,000/month: 10,000 × $0.40/M = **$0.004/month**. Total: ~**$0.61/month** — cheapest hosted option at this scale.

**Weaviate Cloud Standard:** 1M vectors does not trigger per-vector overage in this page's cost model, so the $25/mo base covers the whole index. At low query volume: **$25/month**.

**Qdrant Cloud Standard:** Entry cluster ~$30–60/mo covers a 1M-vector index with room to spare.

**pgvector (Supabase Free/Pro):** Supabase Free includes 500 MB Postgres storage; 6.1 GB overflows to Pro ($25/mo base + $0.125/GB above 8 GB). At 1M vectors, pgvector fits on the $25/mo Pro plan with comfortable headroom. Near-zero incremental vector cost.

Takeaway at this scale: Turbopuffer is cheapest for storage-heavy, query-light workloads. Pinecone Serverless is competitive. Weaviate/Qdrant bundles make sense if you value managed operations over raw per-unit cost.


Worked example 2: 100M vectors — medium production RAG

100M vectors at 1,536 dimensions = 614 GB raw float32. This is a mid-market enterprise RAG — a SaaS knowledge base, multi-product documentation, a financial-data corpus.

**Pinecone Serverless:** Storage = 614 × $0.33 = **$202.62/month**. Writes (one-time 100M): $33 one-time. Queries at 1M/month: 1M × $8.25/M = **$8.25/month** query. Total: ~**$211/month**.

**Turbopuffer:** Storage = 614 × $0.10 = **$61.40/month**. Queries at 1M/month: $0.40. Total: ~**$61.80/month** — 3.4x cheaper than Pinecone Serverless at this storage scale.

**Pinecone Standard pod (p1.x2):** A p1.x2 pod holds ~250M vectors at 768 dims, roughly equivalent at 1,536 dims to a 125M-vector capacity. ~$140–280/month depending on provisioning. Flat-rate predictability vs serverless variability.

**Weaviate Cloud Standard:** $25/mo base + 99M vectors above the 1M base at $0.095/1M = **$9.41/month** overage. Total: **$34.41/month** if within the Standard tier's compute budget for your query volume — significantly cheaper than Pinecone at this vector count.

**pgvector on managed Postgres:** 614 GB Postgres storage on Supabase ($0.125/GB above 8 GB) = ~$76/month storage only, no query surcharge. Requires tuning HNSW index parameters for performance at this scale — verify `ivfflat` vs HNSW recall tradeoffs before production.

Takeaway at 100M vectors: Weaviate Cloud Standard and Turbopuffer are the most cost-competitive managed options. Pinecone is easiest to operate but costs 3-6x more per GB stored.


Worked example 3: 1B vectors — enterprise scale

1B vectors at 1,536 dimensions = 6,144 GB (6.1 TB) raw float32. Enterprise-scale RAG — a legal firm's full document archive, a global e-commerce product catalog with embeddings per SKU per market.

**Pinecone Serverless:** Storage = 6,144 × $0.33 = **$2,027/month**. Queries at 10M/month: 10M × $8.25/M = **$82.50/month**. Total: ~**$2,110/month**.

**Turbopuffer:** Storage = 6,144 × $0.10 = **$614/month**. Queries at 10M/month: 10M × $0.40/M = **$4/month**. Total: ~**$618/month** — roughly 3.4x cheaper than Pinecone at this scale.

**Zilliz Cloud dedicated cluster:** At 1B vectors, Zilliz dedicated tiers (Milvus managed) are designed for this workload. CU pricing at this scale requires a custom quote from the Zilliz sales team — expect enterprise contract pricing rather than self-serve rates. Verify at zilliz.com/pricing or contact sales for exact monthly figures.

**pgvector on AWS RDS:** 6.1 TB Postgres storage on RDS gp3 (~$0.115/GB-month) = **~$707/month** storage alone (6,144 GB × $0.115), before instance cost. A db.r6g.4xlarge for HNSW index at 1B vectors = ~$1,100/month instance. Total: ~$1,800/month but with full SQL/Postgres ecosystem integration.

At 1B vectors, reducing the dimension count matters enormously. Switching from 1,536 to 768 dims (available via OpenAI Matryoshka truncation or Voyage configurable dims) halves storage bytes: Turbopuffer drops from $614 to $307/month, and at 384 dims to $154/month. The retrieval quality tradeoff must be validated on your eval set, but for most corpora, 768-dim embeddings lose less than 3% recall@10 versus 1,536-dim on standard benchmarks. Verify on your own corpus before any production dim reduction.
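If the reduction is done client-side rather than through a provider parameter, the usual approach is to keep the leading components and re-normalize. The sketch below assumes the embeddings come from a model trained with Matryoshka representation learning (truncating other embeddings generally hurts retrieval much more); `truncate_embedding` is a hypothetical helper name, not part of any SDK.

```python
import math


def truncate_embedding(vec: list[float], target_dim: int) -> list[float]:
    """Keep the first target_dim components and re-normalize to unit length.

    Re-normalizing matters because cosine and dot-product search assume
    unit-length vectors, and a truncated vector is shorter than the original.
    """
    if not 0 < target_dim <= len(vec):
        raise ValueError("target_dim must be between 1 and len(vec)")
    head = vec[:target_dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return list(head)  # all-zero prefix: nothing to normalize
    return [x / norm for x in head]


# Toy 3-dim vector truncated to 2 dims.
short = truncate_embedding([3.0, 4.0, 12.0], 2)
print(short)
```

Queries must be truncated to the same dimension as the stored vectors, so the helper belongs in both the indexing and the query path.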


The dimensionality cost trap — the number that triples your bill

Most teams discover this the hard way. Embedding model documentation leads with quality metrics; storage cost is buried. The math is linear and unavoidable:

```
storage_bytes = vector_count × dim_count × 4   (float32)

Examples at 1M vectors:
  384 dim   =  1,536 MB =  1.54 GB
  768 dim   =  3,072 MB =  3.07 GB
  1,536 dim =  6,144 MB =  6.14 GB  (OpenAI text-embedding-3-small default)
  3,072 dim = 12,288 MB = 12.3 GB   (OpenAI text-embedding-3-large default)
```

On Turbopuffer at $0.10/GB-month: those four options cost $0.15, $0.31, $0.61, and $1.23/month for 1M vectors respectively — the difference is rounding error at small scale. At 1B vectors: $154, $307, $614, and $1,228/month — now you're comparing $154 vs $1,228 for the same vector count. The 3,072-dim option costs 8x more to store.
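The comparison above is easy to reproduce for any storage rate. A minimal sketch, assuming float32 vectors and a flat per-GB price with no tiering; `storage_cost` is an illustrative name and the default rate is the Turbopuffer figure quoted on this page.

```python
PRICE_PER_GB_MONTH = 0.10  # Turbopuffer storage rate quoted above (assumed flat)


def storage_cost(vector_count: int, dim_count: int,
                 price_per_gb_month: float = PRICE_PER_GB_MONTH) -> float:
    """Monthly storage cost in USD for raw float32 vectors (4 bytes/component)."""
    storage_gb = vector_count * dim_count * 4 / 1_000_000_000
    return storage_gb * price_per_gb_month


for dims in (384, 768, 1536, 3072):
    at_1m = storage_cost(1_000_000, dims)
    at_1b = storage_cost(1_000_000_000, dims)
    print(f"{dims:>5} dims: ${at_1m:.2f}/mo at 1M vectors, "
          f"${at_1b:,.0f}/mo at 1B vectors")
```

Because the cost is linear in `dim_count`, the ratio between any two rows is just the ratio of their dimensions, regardless of the per-GB price.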

**Mitigation levers:**

1. Use OpenAI text-embedding-3 `dimensions` parameter to request a lower-dim projection (e.g., `dimensions=768` instead of the default 3,072). OpenAI uses Matryoshka representation learning — the truncated vectors retain most retrieval quality. Their docs report minimal MTEB score drop at 1,536 dims vs 3,072 dims. Verify on your own corpus before committing to production.

2. Voyage voyage-3.5 supports configurable output dimensions similarly.

3. Some vector DBs support scalar quantization (int8) or binary quantization, which cut raw vector bytes to one quarter (int8) or 1/32 (binary) of the float32 size at the cost of some recall. Pinecone, Weaviate, and Qdrant all offer quantization options. Measure recall@10 before and after on a held-out query set.
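To see how the quantization lever changes the storage math, the sketch below computes raw bytes per vector for three encodings. It is an idealized calculation: the names are illustrative, index overhead is ignored, and some databases keep the original float32 vectors alongside the quantized copy for rescoring, in which case billed storage may not shrink at all.

```python
BITS_PER_COMPONENT = {"float32": 32, "int8": 8, "binary": 1}


def quantized_bytes_per_vector(dim_count: int, encoding: str) -> int:
    """Raw bytes for one vector under the given encoding, rounded up to a byte."""
    bits = dim_count * BITS_PER_COMPONENT[encoding]
    return (bits + 7) // 8


def quantized_storage_gb(vector_count: int, dim_count: int, encoding: str) -> float:
    """Raw storage in GB (10^9 bytes), ignoring index and metadata overhead."""
    return vector_count * quantized_bytes_per_vector(dim_count, encoding) / 1_000_000_000


for encoding in BITS_PER_COMPONENT:
    gb = quantized_storage_gb(1_000_000_000, 1536, encoding)
    print(f"{encoding:>8}: {gb:,.1f} GB for 1B vectors at 1,536 dims")
```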


Re-indexing cost: the lifecycle write bill

On serverless providers like Pinecone, writes cost money. On cluster-based providers like Weaviate or Qdrant, the write is bundled in the monthly cluster fee. The practical difference matters when you re-index.

Re-indexing events: a new embedding model ships (re-embed + re-write all vectors); chunking strategy changes (split differently, embed each new chunk, write the new index); metadata schema update (some metadata changes require deleting and re-inserting vectors); namespace reorganization.

**Pinecone Serverless write cost for 1B vectors:** 1B × $0.33/1M = $330 one-time. For 100M vectors: $33. These are one-time charges per re-index, not recurring — but they're invisible until you see the bill. Budget at least 1.5x the initial write cost per year for a production system that will evolve.

**Weaviate, Qdrant, Zilliz cluster-based:** re-indexing is a compute cost inside the cluster, not an incremental per-vector charge. The cluster may need to be temporarily scaled up if re-indexing runs concurrently with query serving, but there is no per-write-unit fee. This is a meaningful advantage for systems that re-index frequently.

Practical pattern: run the new index alongside the old one in production (dual-index, A/B traffic split), validate quality metrics, then hard-switch. The temporary double storage cost is the price of a safe migration.
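A rough way to budget a re-index on a serverless provider is to add the one-time write charge to the extra storage paid while both indexes are live. The sketch below assumes one write unit per vector and a flat storage rate; both function names and the two-week overlap are illustrative assumptions, not provider guidance.

```python
def reindex_write_cost(vector_count: int, write_price_per_million: float) -> float:
    """One-time write charge for re-inserting every vector (1 write unit each)."""
    return vector_count / 1_000_000 * write_price_per_million


def dual_index_storage_cost(storage_gb: float, price_per_gb_month: float,
                            overlap_months: float) -> float:
    """Extra storage paid while the old and new index coexist."""
    return storage_gb * price_per_gb_month * overlap_months


# 100M vectors at 1,536 dims on Pinecone Serverless rates, two-week overlap.
writes = reindex_write_cost(100_000_000, 0.33)
overlap = dual_index_storage_cost(614.4, 0.33, 0.5)
print(f"write: ${writes:.2f}, overlap storage: ${overlap:.2f}, "
      f"total: ${writes + overlap:.2f}")
```

The embedding-generation cost of re-embedding the corpus is separate and is not included here.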


pgvector: the zero-incremental-cost option

pgvector is a PostgreSQL extension that adds vector similarity search natively. If you already pay for a managed Postgres instance (Supabase, Neon, Tembo, AWS RDS, Google Cloud SQL), vector storage and search cost zero additional dollars — it is just Postgres rows.

**When pgvector is the right answer:** your corpus is under 50M vectors, you already operate Postgres, your query latency requirement is above ~50ms p95, and you want to minimize vendor surface area. For most early-stage and mid-market RAG systems, pgvector with an HNSW index performs within an acceptable latency band and costs nothing incremental.

**When pgvector is the wrong answer:** you are above 100M vectors and need sub-10ms query latency; you need distributed vector storage across regions; you need advanced metadata filtering at query time with high selectivity. At that point, purpose-built vector DBs (Pinecone, Weaviate, Qdrant) earn their cost premium through purpose-built indexing structures and distributed operation.

pgvector HNSW indexes (available since pgvector 0.5.0) support `m` and `ef_construction` build parameters that directly trade build time and index size against recall accuracy. Start with the defaults (`m=16`, `ef_construction=64`) and tune from there on your eval set. See the pgvector vs Pinecone tutorial for a worked benchmarking walkthrough.


Metadata filtering cost — the hidden multiplier

Most RAG queries include a metadata filter: retrieve vectors where `user_id = X` or `document_type = 'contract'` or `date > 2025-01-01`. On purpose-built vector DBs, the cost model for filtered vs unfiltered queries can differ significantly.

**Pinecone Serverless:** filtered queries may consume more read units than unfiltered queries on the same namespace, because the engine must scan more of the index to satisfy the filter. Pinecone's documentation describes the unit cost as scaling with the result set cardinality under filtering — verify at docs.pinecone.io before budgeting a high-filter-selectivity workload.

**Weaviate:** supports pre-filtering (filter first, then ANN search on the filtered subset) vs post-filtering. Pre-filtering is more accurate but can be slower on highly selective filters. Cloud pricing impact depends on query complexity — verify in the Weaviate Cloud billing dashboard.

**Qdrant:** uses payload indexing for metadata filters; filtered vector search is a first-class operation. Performance and cost characteristics at scale should be verified against Qdrant's benchmarks (qdrant.tech/benchmarks) for your specific filter cardinality.

Bottom line: if your RAG use case is heavily filtered (tenant isolation, per-user namespacing, date-range queries), benchmark the filtered query cost on your actual data distribution before committing to a provider. Filtered query performance varies more between providers than unfiltered performance does.


When to migrate between vector DBs

Migration is costly (re-insert all vectors, update all application code pointing to the old endpoint, validate query quality against the new index). Migrate only when a clear threshold is crossed.

**Migrate from serverless to cluster when:** your serverless read bill exceeds what a cluster would cost at your query volume. For Pinecone Serverless: at $8.25/1M reads, the p1.x1 pod ($70-140/mo) becomes cheaper above ~8-17M queries/month. Do the arithmetic at your actual query volume before assuming serverless is always cheaper.

**Migrate from cluster to serverless when:** your cluster is underutilized — you pay for a cluster sized for peak load, but average utilization is under 20%. The operational simplicity of serverless plus the pay-per-query model means idle capacity is free. Many teams overbuy dedicated clusters in year one.

**Migrate from pgvector to purpose-built when:** HNSW query latency at your vector count exceeds your SLA at p95, or you need multi-region replication, or your metadata filtering complexity outgrows what Postgres query planning handles efficiently. Typical trigger: 50-100M+ vectors with sub-15ms latency requirements.
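The serverless-to-cluster threshold above is a simple break-even calculation. The sketch below compares only the serverless read bill against a flat fee; it deliberately ignores serverless storage and write charges, which would push the real break-even somewhat lower. The function name is illustrative.

```python
def break_even_queries_per_month(flat_monthly_fee: float,
                                 read_price_per_million: float) -> float:
    """Monthly query volume at which per-read billing equals a flat fee."""
    return flat_monthly_fee / read_price_per_million * 1_000_000


# Pinecone p1.x1 pod range ($70-140/mo) vs Serverless reads at $8.25/1M.
for fee in (70, 140):
    queries = break_even_queries_per_month(fee, 8.25)
    print(f"${fee}/mo pod breaks even at {queries / 1_000_000:.1f}M queries/month")
```

Running the same function with your own cluster quote and read rate gives a threshold specific to your provider pair.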

For a detailed comparison of the databases and their architectural tradeoffs, see Pinecone vs Weaviate vs Qdrant and our RAG architecture decision tree.


The cost model you should build before picking a provider

Build this spreadsheet before committing to any vector DB. Four numbers drive 95% of the monthly bill:

```
1. vector_count:    current corpus size, not projected max
2. dim_count:       from your chosen embedding model
3. monthly_queries: from your actual or estimated query volume
4. monthly_writes:  vectors added/updated per month (incremental index updates)

storage_GB = vector_count × dim_count × 4 / 1_000_000_000

Pinecone Serverless:
  monthly = (storage_GB × 0.33)
          + (monthly_queries / 1_000_000 × 8.25)
          + (monthly_writes / 1_000_000 × 0.33)

Turbopuffer:
  monthly = (storage_GB × 0.10)
          + (monthly_queries / 1_000_000 × 0.40)

Weaviate Cloud Standard:
  monthly = 25 + max(0, (vector_count - 1_000_000) / 1_000_000 × 0.095)
```
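For teams that prefer a script to a spreadsheet, the same model can be sketched in Python. The rates are hard-coded from this page's June 2026 figures and the function names are illustrative; treat the output as a first-pass ranking to check against each provider's current pricing page.

```python
def storage_gb(vector_count: int, dim_count: int) -> float:
    """Raw float32 storage in GB (10^9 bytes)."""
    return vector_count * dim_count * 4 / 1_000_000_000


def pinecone_serverless(vector_count, dim_count, monthly_queries, monthly_writes):
    return (storage_gb(vector_count, dim_count) * 0.33
            + monthly_queries / 1_000_000 * 8.25
            + monthly_writes / 1_000_000 * 0.33)


def turbopuffer(vector_count, dim_count, monthly_queries, monthly_writes):
    # Writes are listed as included, so monthly_writes is unused here.
    return (storage_gb(vector_count, dim_count) * 0.10
            + monthly_queries / 1_000_000 * 0.40)


def weaviate_cloud_standard(vector_count, dim_count, monthly_queries, monthly_writes):
    # Per-vector pricing: dimension and query volume do not enter this formula.
    return 25 + max(0, (vector_count - 1_000_000) / 1_000_000 * 0.095)


PROVIDERS = {
    "Pinecone Serverless": pinecone_serverless,
    "Turbopuffer": turbopuffer,
    "Weaviate Cloud Standard": weaviate_cloud_standard,
}


def rank_providers(vector_count, dim_count, monthly_queries, monthly_writes):
    """Return (provider, monthly USD) pairs sorted from cheapest to most expensive."""
    costs = {
        name: model(vector_count, dim_count, monthly_queries, monthly_writes)
        for name, model in PROVIDERS.items()
    }
    return sorted(costs.items(), key=lambda item: item[1])


# Worked example 2: 100M vectors, 1,536 dims, 1M queries/month, no new writes.
for name, cost in rank_providers(100_000_000, 1536, 1_000_000, 0):
    print(f"{name:<24} ${cost:,.2f}/month")
```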

The formula makes one thing obvious: at high vector counts with low query volume, Turbopuffer and Weaviate beat Pinecone Serverless on storage cost. At low vector counts with low or bursty query volume, Pinecone Serverless is cost-competitive with flat-rate clusters because you only pay for what you query.

For the LLM call that happens after retrieval — which typically dominates the total RAG bill — see the RAG cost-per-query calculator.

How to estimate your vector DB bill in 5 steps

1. **Count your vectors and choose dimensionality.** Vector count comes from your corpus size and chunking strategy: a 1M-token corpus at 512-token chunks ≈ 2,000 vectors. Dimensionality comes from your embedding model. Lock these two numbers first, because they drive every other calculation.

2. **Calculate raw storage in GB.** storage_GB = vector_count × dim_count × 4 / 1,000,000,000. A 1M-vector 1,536-dim index = 6.14 GB. A 100M-vector 768-dim index = 307 GB. This number tells you immediately whether serverless or cluster-based pricing will dominate.

3. **Estimate monthly query volume.** Every user interaction that hits the vector DB is at least one read. A 10,000-user product at 5 queries/day = 50,000 queries/day = 1.5M/month. At Pinecone's $8.25/1M that's $12.38/month in read units alone, a number that grows linearly with usage.

4. **Price out three providers.** Use the formulas in the cost model section above. Price Pinecone Serverless, Turbopuffer, and either Weaviate Cloud Standard or Qdrant Standard. The cheapest option varies with your specific storage/query ratio, so do not assume serverless is always cheaper.

5. **Add a 1.5x re-index budget line.** Production RAG systems re-index at least once in their first 18 months. For serverless providers with per-write-unit pricing, the write cost of a full re-index is real. Budget annual spend at 1.5x your initial write cost to cover one full rebuild.
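The five steps can be chained into one estimate. This is a sketch under stated assumptions: chunks do not overlap, a month is 30 days, each user query is exactly one read, and pricing is flat per unit. `estimate_bill` and its parameters are hypothetical names for illustration.

```python
import math


def estimate_bill(corpus_tokens: int, chunk_tokens: int, dim_count: int,
                  users: int, queries_per_user_per_day: float,
                  storage_price: float, read_price: float, write_price: float,
                  days_per_month: int = 30, reindex_factor: float = 1.5) -> dict:
    """Walk the five steps for one provider's per-unit rates (USD)."""
    # Step 1: vector count from corpus size and chunking (no chunk overlap).
    vector_count = math.ceil(corpus_tokens / chunk_tokens)
    # Step 2: raw float32 storage in GB.
    storage_gb = vector_count * dim_count * 4 / 1_000_000_000
    # Step 3: monthly query volume.
    monthly_queries = int(users * queries_per_user_per_day * days_per_month)
    # Step 4: recurring monthly cost at this provider's rates.
    storage_cost = storage_gb * storage_price
    read_cost = monthly_queries / 1_000_000 * read_price
    # Step 5: annual re-index budget as a multiple of the initial write cost.
    initial_write_cost = vector_count / 1_000_000 * write_price
    return {
        "vector_count": vector_count,
        "storage_gb": storage_gb,
        "monthly_queries": monthly_queries,
        "monthly_cost": storage_cost + read_cost,
        "read_cost": read_cost,
        "annual_reindex_budget": reindex_factor * initial_write_cost,
    }


# 1M-token corpus, 512-token chunks, 10,000 users at 5 queries/day,
# priced with the Pinecone Serverless rates quoted on this page.
bill = estimate_bill(1_000_000, 512, 1536, 10_000, 5, 0.33, 8.25, 0.33)
for key, value in bill.items():
    print(f"{key}: {value}")
```

Repeating the call with each provider's rates covers step 4; for cluster-based providers, set the per-unit rates to zero and add the flat fee.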

Frequently Asked Questions

How much does it cost to store 1 million vectors in Pinecone in 2026?

On Pinecone Serverless with 1,536-dim vectors (6.1 GB raw): $0.33/GB × 6.1 = ~$2.01/month ongoing storage. The one-time write cost for 1M vectors is $0.33. At 10,000 queries/month the read cost adds $0.08/month. Total: roughly $2.10/month at low query volume. Source: pinecone.io/pricing.

What is the cheapest way to run vector search in production?

If you already run Postgres, pgvector is zero incremental cost — vector storage is just Postgres rows. Among hosted vector DB options, Turbopuffer is the cheapest at $0.10/GB-month storage + $0.40/1M queries. Weaviate Cloud Standard at $25/month base is competitive for small-to-mid corpora. Qdrant Cloud free tier handles up to 1 GB at no cost.

How does dimensionality affect vector DB storage cost?

Linearly and directly. Each additional dimension adds 4 bytes (float32) per vector. A 3,072-dim embedding uses 8x more storage bytes than a 384-dim embedding for the same number of vectors. At 1B vectors, this means $154/month vs $1,228/month on Turbopuffer — an 8x difference driven purely by dim count. Use OpenAI's `dimensions` parameter or Voyage's configurable output dims to reduce storage cost when your eval shows acceptable recall.

When do Pinecone dedicated pods become cheaper than Pinecone Serverless?

Roughly above 8-17M queries/month. A p1.x1 pod costs $70-140/month flat. Pinecone Serverless reads are $8.25/1M, so the serverless read bill alone exceeds $70 above ~8.5M reads/month. At that volume, dedicated pods offer predictable pricing. Verify at pinecone.io/pricing for current pod SKU rates, as these shift with new pod generations.

Is pgvector good enough for production RAG?

For most teams under 50M vectors with latency tolerance above 30ms p95: yes. pgvector with HNSW indexes (pgvector 0.5.0 or later) is production-grade. The tradeoff is operational familiarity (you manage Postgres tuning) versus purpose-built operational simplicity. Above 100M vectors with sub-15ms latency requirements, purpose-built vector DBs earn their cost premium.

What does Weaviate Cloud cost for 100M vectors?

Weaviate Cloud Serverless Standard: $25/month base + $0.095/1M vectors above the base. 100M vectors = 99M above base × $0.095/1M = $9.41/month overage. Total: $34.41/month — significantly cheaper than Pinecone Serverless at this vector count. Source: weaviate.io/pricing. Verify current overage rates before procurement.

How much does a re-index cost on Pinecone Serverless?

Writes cost $0.33/1M write units on Pinecone Serverless. A full re-index of 100M vectors = $33. A 1B-vector re-index = $330. These are one-time charges but occur every time you rebuild your index — which happens at least once in the first 18 months of any evolving production RAG system. Cluster-based providers (Weaviate, Qdrant) bundle write cost in the flat monthly rate.

What is Turbopuffer and how does it compare to Pinecone?

Turbopuffer is a serverless vector database priced at $0.10/GB-month storage and $0.40/1M query operations. Against Pinecone Serverless list prices ($0.33/GB-month, $8.25/1M reads) that is roughly 3.3x cheaper on storage and about 20x cheaper per million queries. Tradeoff: smaller ecosystem, fewer operational integrations, and less mature documentation. For cost-sensitive teams comfortable with a newer provider, it is worth evaluating. Source: turbopuffer.com/pricing.
