By The DDH Team · Digital Dashboard Hub

Embedding Cost Calculator 2026: Per-Million-Token Pricing Across Every Major Provider

Updated June 19, 2026

Embeddings convert text into fixed-length numeric vectors for semantic search, RAG retrieval, deduplication, and clustering. As of June 2026, prices per million tokens in the table below range from $0.005 (DeepInfra-hosted BGE-large-en-v1.5) to $0.18 (Voyage 3 Large and Voyage Code 3), a 36x spread. Vector dimensions range from 384 (Cohere embed-v4-light) to 4,096 (Voyage 3 Large), and dimension drives downstream storage cost and query latency more than the embedding bill itself.

Two cost surprises catch teams off guard. First, indexing cost is one-time but query cost recurs: every search requires embedding the query string. Second, vector storage and search infrastructure usually costs 5-20x the embedding bill at production scale. Below is the full table sourced from each vendor's docs, then worked dollar examples for indexing 1M and 10M chunks plus realistic query volumes.

Embedding model price per 1M tokens — June 2026

Model	$/1M tokens	Vector dim	Max input tokens	Notes
OpenAI text-embedding-3-large	$0.13	3,072 (Matryoshka: 256/1024/3072)	8,191	Quality leader on most retrieval benchmarks
OpenAI text-embedding-3-small	$0.02	1,536 (Matryoshka: 256/512/1536)	8,191	Best $/quality ratio; default for most teams
OpenAI text-embedding-ada-002	$0.10	1,536	8,191	Legacy; superseded by 3-small/3-large
Voyage 3 Large	$0.18	1,024 / 2,048 / 4,096	32,000	Top of MTEB; long context advantage
Voyage 3	$0.06	1,024	32,000	General-purpose default for Voyage stack
Voyage 3 Lite	$0.02	512	32,000	Budget tier, near-3-small quality
Voyage Code 3	$0.18	1,024	32,000	Code-tuned; large gains on code retrieval
Cohere embed-v4	$0.12	1,536 (Matryoshka: 256/512/1024/1536)	8,192	Strong multilingual + image input
Cohere embed-v4-light	$0.04	384	8,192	Cheapest multilingual option
Mistral-embed	$0.10	1,024	8,192	European data residency option
Google text-embedding-005	$0.025	768 / 1,536 / 3,072 (configurable)	2,048	Strong on long-tail languages
Google gemini-embedding-001	$0.15	768	2,048	Multimodal (text + image)
Jina embeddings v3	$0.018	1,024 (Matryoshka: 32-1024)	8,192	Open-weights option also available
DeepInfra BGE-large-en-v1.5	$0.005	1,024	512	Hosted open-weights; lowest $/1M

Sources, as of June 2026: OpenAI (https://developers.openai.com/api/docs/pricing), Voyage AI (https://docs.voyageai.com/docs/pricing), Cohere (https://cohere.com/pricing), Mistral (https://docs.mistral.ai/), Google (https://ai.google.dev/gemini-api/docs/pricing), Jina AI (https://jina.ai/pricing), DeepInfra (https://deepinfra.com/pricing). Matryoshka models support truncating to a shorter dimension at minor quality cost; pick the smallest dim that meets recall.

How embedding cost is calculated

Embedding bills follow a single linear formula:

```
index_cost = (total_corpus_tokens / 1,000,000) * embedding_price_per_M
query_cost = (total_query_tokens / 1,000,000) * embedding_price_per_M
total      = index_cost + query_cost
```

Index cost is paid once when you build the vector index over your corpus. Query cost is paid every time you embed a user query to perform a semantic search; it compounds with traffic.

Token-to-chunk math: a typical RAG chunk is 200-800 tokens. A 100,000-document corpus with 5 chunks per document averaging 500 tokens each = 250M tokens. With text-embedding-3-small at $0.02/1M, indexing costs $5. At Voyage 3 Large ($0.18/1M), it costs $45. The decision is rarely 'can we afford to index' — it is 'which model gives best recall per dollar at our scale.'

The query side is often larger than teams expect. A 100k-query-per-day app at 50 tokens per query = 5M tokens per day = 150M tokens per month. On text-embedding-3-small that is $3 per month; on Voyage 3 Large, $27 per month. Cheap relative to the LLM bill but worth measuring.
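The formula and the two estimates above can be reproduced with a few lines of Python. This is a minimal sketch, not a vendor tool: the function names `embedding_cost` and `corpus_tokens` are made up for illustration, and the 30-day month is an assumption.

```python
def embedding_cost(tokens: int, price_per_million: float) -> float:
    """Dollar cost of embedding `tokens` tokens at a $/1M-token rate."""
    return tokens / 1_000_000 * price_per_million


def corpus_tokens(documents: int, chunks_per_doc: int, tokens_per_chunk: int) -> int:
    """Total tokens in a chunked corpus."""
    return documents * chunks_per_doc * tokens_per_chunk


# Corpus from the text: 100,000 documents x 5 chunks x 500 tokens.
index_tokens = corpus_tokens(100_000, 5, 500)

# Query side from the text: 100k queries/day x 50 tokens, assuming a 30-day month.
monthly_query_tokens = 100_000 * 50 * 30

index_cost = embedding_cost(index_tokens, 0.02)          # text-embedding-3-small
query_cost = embedding_cost(monthly_query_tokens, 0.02)

print(f"index tokens: {index_tokens:,}")
print(f"one-time index cost: ${index_cost:.2f}")
print(f"monthly query cost:  ${query_cost:.2f}")
```

Swapping the rate to 0.18 gives the Voyage 3 Large figures for the same workload.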

Worked example 1: indexing a 1M-chunk corpus

Reference workload: 1M chunks averaging 500 tokens each = 500M total tokens.

OpenAI text-embedding-3-small: 500 × $0.02 = $10. OpenAI text-embedding-3-large: 500 × $0.13 = $65. Voyage 3 Large: 500 × $0.18 = $90. Voyage 3 Lite: 500 × $0.02 = $10. Cohere embed-v4: 500 × $0.12 = $60. Google text-embedding-005: 500 × $0.025 = $12.50. Jina v3: 500 × $0.018 = $9. DeepInfra BGE: 500 × $0.005 = $2.50.

All cheap in absolute terms. The 36x spread ($2.50 to $90) is real, but for a one-time index build it rarely drives the decision. What drives the decision is downstream: recall quality on your specific corpus, vector dimension (which affects storage), and requery cost at production traffic.
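The per-model figures for this workload can be regenerated from the price table, which makes it easy to rerun when a vendor changes a rate. A small sketch, assuming the June 2026 prices quoted in this article; the dictionary and function names are illustrative only.

```python
# $/1M tokens, copied from the table in this article (June 2026).
PRICE_PER_MILLION = {
    "OpenAI text-embedding-3-small": 0.02,
    "OpenAI text-embedding-3-large": 0.13,
    "Voyage 3 Large": 0.18,
    "Voyage 3 Lite": 0.02,
    "Cohere embed-v4": 0.12,
    "Google text-embedding-005": 0.025,
    "Jina v3": 0.018,
    "DeepInfra BGE": 0.005,
}


def index_costs(chunks: int, tokens_per_chunk: int, prices: dict[str, float]) -> dict[str, float]:
    """One-time indexing cost per model, rounded to cents."""
    millions_of_tokens = chunks * tokens_per_chunk / 1_000_000
    return {model: round(millions_of_tokens * price, 2) for model, price in prices.items()}


costs = index_costs(1_000_000, 500, PRICE_PER_MILLION)

# Print cheapest first.
for model, cost in sorted(costs.items(), key=lambda item: item[1]):
    print(f"{model:32s} ${cost:>7.2f}")

# Ratio between the most and least expensive model for this workload.
spread = max(costs.values()) / min(costs.values())
print(f"spread: {spread:.0f}x")
```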

Quality note: text-embedding-3-large reliably outperforms 3-small on most published retrieval benchmarks by 3-7%. Voyage 3 Large and Cohere embed-v4 trade places with text-embedding-3-large at the top of MTEB depending on domain. For specialized domains (legal, medical, code), domain-tuned variants — Voyage Code 3 for code, Cohere domain-tuned embeddings — typically beat general-purpose models by 10-20% on in-domain queries.

Worked example 2: 10M chunks indexed + 1M queries/month

Production scale: 10M chunks × 500 tokens = 5B index tokens, plus 1M queries/month × 50 tokens = 50M query tokens, plus reindexing 5% of corpus per month = 250M tokens of churn.

Total recurring monthly tokens: ~300M (50M queries + 250M churn). Initial index amortized over 12 months: 5,000M / 12 ≈ 417M tokens/month effective.

text-embedding-3-small monthly bill: (300M × $0.02 + 417M × $0.02 amortized) = $6 + $8.34 = $14.34. text-embedding-3-large: $39 + $54.21 = $93.21. Voyage 3 Large: $54 + $75 = $129. Cohere embed-v4: $36 + $50 = $86. DeepInfra BGE: $1.50 + $2.08 = $3.58.
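The monthly bill above splits into a recurring part (queries plus churn) and an amortized share of the initial index. The sketch below makes that split explicit so churn and the amortization window can be varied. The function name and parameter defaults are assumptions for illustration. Because it does not round the amortized tokens up to 417M, its totals come out a few cents below the figures in the text.

```python
def monthly_embedding_bill(
    price_per_million: float,
    index_tokens: int,
    query_tokens_per_month: int,
    churn_fraction: float = 0.05,   # share of the corpus re-embedded each month
    amortize_months: int = 12,      # window over which the initial index is spread
) -> tuple[float, float]:
    """Return (recurring_cost, amortized_index_cost) in dollars per month."""
    churn_tokens = index_tokens * churn_fraction
    recurring = (query_tokens_per_month + churn_tokens) / 1_000_000 * price_per_million
    amortized = index_tokens / amortize_months / 1_000_000 * price_per_million
    return recurring, amortized


INDEX_TOKENS = 10_000_000 * 500   # 10M chunks x 500 tokens = 5B
QUERY_TOKENS = 1_000_000 * 50     # 1M queries x 50 tokens = 50M

for model, price in [("text-embedding-3-small", 0.02), ("text-embedding-3-large", 0.13)]:
    recurring, amortized = monthly_embedding_bill(price, INDEX_TOKENS, QUERY_TOKENS)
    print(f"{model}: ${recurring:.2f} recurring + ${amortized:.2f} amortized "
          f"= ${recurring + amortized:.2f}/month")
```

Passing a larger `churn_fraction` shows how quickly frequent reindexing outweighs both the query and amortized-index lines.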

These bills are small relative to typical LLM spend at 1M-query scale, which is often $5,000-$30,000/month. Embedding cost is a rounding error on most production budgets — choose by recall quality, not by raw $/1M, unless you are at 100M+ queries per month.

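Where embedding cost grows fastest: full-corpus reindex churn. If you reindex 50% of the corpus monthly because the model changed or the chunking improved, churn alone is 2.5B tokens per month on a 10M-chunk corpus. On text-embedding-3-small that is $50, lifting the monthly bill from about $14 to about $59; on text-embedding-3-large it is $325, lifting the bill from about $93 to about $386. Plan reindex cadence carefully.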

Vector dimension: the hidden cost lever

Dimension drives three downstream costs: storage in the vector DB, query latency, and (sometimes) the vector DB's per-vector pricing. A 1,024-dim vector at 4 bytes/dim is 4KB. At 10M vectors that is 40GB raw, plus index overhead — typically 60-100GB on Pinecone, Weaviate, or pgvector.

Matryoshka-style embeddings (text-embedding-3-small/large, Cohere embed-v4, Voyage 3 Large, Jina v3) let you truncate to a shorter dimension at minor recall cost. Cutting text-embedding-3-large from 3,072 to 1,024 dim typically loses 1-3% on recall benchmarks while reducing storage by 3x. For most retrieval-augmented apps the trade is worth it.
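Client-side truncation of a Matryoshka-style embedding is usually a slice followed by re-normalization, so that cosine and dot-product scores stay comparable. The sketch below shows that step in plain Python. `truncate_embedding` is an illustrative helper, not a vendor API. It only makes sense for models trained to support truncation; some vendors also accept a target dimension in the embedding request itself, which avoids doing this by hand.

```python
import math


def truncate_embedding(vector: list[float], dim: int) -> list[float]:
    """Keep the first `dim` components and rescale the result to unit length."""
    if not 0 < dim <= len(vector):
        raise ValueError(f"dim must be between 1 and {len(vector)}")
    head = vector[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # all-zero prefix: nothing to normalize
    return [x / norm for x in head]


# Toy 4-dim "embedding" truncated to 2 dims.
full = [3.0, 4.0, 1.0, 2.0]
short = truncate_embedding(full, 2)
print(short)
```

Truncate every stored vector and every query vector to the same dimension; mixing lengths in one index will not work.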

Worked storage math: 10M chunks at 3,072 dim and 4 bytes/dim is roughly 123GB raw, vs roughly 41GB at 1,024 dim. On Pinecone serverless ($0.33 per million-vector-month at 1,024 dim), the 3x dim difference is roughly 3x the monthly bill, far more than the embedding model bill at most scales.
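Raw vector storage is just vectors × dimensions × bytes per component, which is easy to check before committing to a dimension. A minimal sketch, assuming float32 storage (4 bytes per component) and ignoring index overhead, metadata, and replicas; the function name is illustrative.

```python
def raw_vector_bytes(num_vectors: int, dim: int, bytes_per_component: int = 4) -> int:
    """Raw size of the vectors alone, excluding index overhead and metadata."""
    return num_vectors * dim * bytes_per_component


NUM_VECTORS = 10_000_000

for dim in (3072, 1536, 1024, 768):
    size = raw_vector_bytes(NUM_VECTORS, dim)
    # Report both decimal gigabytes and binary gibibytes, since vendors differ.
    print(f"{dim:>5} dim: {size / 1e9:7.1f} GB  ({size / 2**30:7.1f} GiB)")
```

Setting `bytes_per_component` to 2 or 1 approximates float16 or int8 quantization, where the vector DB supports it.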

Pick the smallest dimension that meets your recall threshold. For most general-purpose retrieval, 768-1,024 dim is the sweet spot; 1,536+ pays off mainly on hard semantic tasks or highly distinct corpora.

Recall quality: who actually wins MTEB in 2026?

MTEB (Massive Text Embedding Benchmark) is the most-cited public ranking. As of June 2026, the top 5 general-purpose models cluster within 2 percentage points: Voyage 3 Large, OpenAI text-embedding-3-large, Cohere embed-v4 (1,536-dim), Mistral-embed, Google gemini-embedding-001. Below that tier, text-embedding-3-small, Voyage 3, Jina v3, Cohere embed-v4-light, and the open-weights BGE family cluster within another 2-3 points.

MTEB averages across 50+ tasks; your specific corpus may not match the average. The right way to choose: take 100 representative queries from your real workload, run them against each candidate model, and measure recall@k (how often the right chunk is in the top k results) on a manually-labeled gold set. The model that wins your eval often differs from the one that wins MTEB.
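The recall@k measurement described above fits in a few lines once you have ranked results per query and a gold label per query. This is a sketch under simplifying assumptions: exactly one correct chunk per query, and hypothetical query and chunk ids standing in for real retrieval output.

```python
def recall_at_k(results: dict[str, list[str]], gold: dict[str, str], k: int) -> float:
    """Fraction of queries whose gold chunk id appears in the top-k ranked results."""
    if not results:
        raise ValueError("no queries to evaluate")
    hits = sum(1 for query_id, ranked in results.items() if gold[query_id] in ranked[:k])
    return hits / len(results)


# Hypothetical gold set: query id -> the one chunk that answers it.
gold = {"q1": "c7", "q2": "c2", "q3": "c9", "q4": "c4"}

# Hypothetical ranked output from one candidate embedding model.
model_a = {
    "q1": ["c7", "c1", "c3"],
    "q2": ["c5", "c2", "c8"],
    "q3": ["c1", "c6", "c9"],
    "q4": ["c3", "c8", "c1"],
}

for k in (1, 2, 3):
    print(f"recall@{k}: {recall_at_k(model_a, gold, k):.2f}")
```

Run the same gold set against each candidate model and compare the numbers at the k your application actually passes to the LLM.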

Cost-adjusted recall is the right metric. A model with 92% recall at $0.02/1M is usually better than a model with 95% recall at $0.18/1M — the 3% improvement rarely justifies a 9x cost premium unless you are at extreme accuracy bars (medical, legal, security).

Rerankers further blur the picture. A cheap embedding model paired with a strong reranker (Cohere Rerank v3, Voyage Rerank-1) often beats an expensive embedding model alone. Budget for the reranker pass: pair-priced rerankers run from roughly $1 per 1M reranked pairs (Cohere Rerank v3) up to roughly $50 per 1M (Voyage Rerank-1).

Vector storage cost: usually larger than the embedding bill

Most teams underestimate the vector DB bill. A typical 10M-vector index at 1,024 dim runs:

Pinecone serverless: ~$30-60/month on standard plans, more on production tiers with replicas and high QPS. Pinecone pod-based: $70+/month for the smallest s1 pod, scaling to hundreds for larger pods.

Weaviate Cloud: ~$25/month at the entry tier, scaling to $1,000+/month for production deployments.

Qdrant Cloud: ~$30-50/month for similar specs.

pgvector on Neon or Supabase: roughly $0-50/month at this scale, depending on the underlying Postgres tier. Cheapest but performance-tuning is on you.

Self-hosted (Chroma, Qdrant, Weaviate on Kubernetes): infrastructure cost typically $100-300/month at 10M vectors, plus the engineering time to operate.

At 100M vectors any of these can hit $500-5,000/month. The embedding bill at the same scale is typically $50-200/month. Storage is usually 5-20x the embedding cost in production, and sometimes more, so budget accordingly. See vendor pricing pages for current rates; they move quarterly.

Reranking models in 2026 — pricing, when they beat upgrading embeddings, and worked $ math

Rerankers are the second-stage filter in a modern retrieval pipeline. After your embedding model returns the top-50 candidates from the vector DB, a reranker scores each (query, document) pair using a cross-encoder model that reads both pieces of text together, which is far more accurate than bi-encoder embeddings that encode query and document independently. The result is a re-ordered list where the top-5 are dramatically more likely to contain the correct chunk.

Pricing in 2026 falls into clear tiers, though the billing units differ. Cohere Rerank v3 at $1.00 per 1M reranked pairs is the quality leader. Voyage Rerank-1 runs roughly $0.05 per 1,000 pairs (i.e., $50 per 1M). Jina Reranker v2 prices at $0.02 per 1M tokens, a different unit that counts tokens across query and document rather than pairs. MixedBread's open-weights rerank model hosted via Together AI lands near $0.0005 per 1M tokens, the cheapest production-grade option.

The unit matters. Reranker bills count pairs, not tokens, on Cohere and Voyage. A 'pair' is one query combined with one candidate document. If you retrieve top-50 from the vector DB and rerank them against a single query, that is 50 pairs, not 50 × document_length tokens. Jina's token-based pricing reads differently: a typical 500-token document plus a 50-token query is 550 tokens per pair, so 50 pairs at 550 tokens = 27,500 tokens per query. At Jina's $0.02/1M that is $0.00055 per query for the rerank step. At Cohere Rerank v3, 50 pairs × $1/1M = $0.00005 per query. At Voyage Rerank-1, 50 pairs × $50/1M = $0.0025 per query. Among these three, Cohere is 50x cheaper per query than Voyage, but all are sub-cent.
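Because the billing units differ, it helps to convert every reranker to dollars per query before comparing. The sketch below does that for pair-priced and token-priced vendors, using the rates quoted in this article; the function names are illustrative, and the 500-token document and 50-token query are the same assumptions as in the text.

```python
def pair_priced_cost(pairs: int, price_per_million_pairs: float) -> float:
    """Per-query cost when the vendor bills per (query, document) pair."""
    return pairs * price_per_million_pairs / 1_000_000


def token_priced_cost(pairs: int, query_tokens: int, doc_tokens: int,
                      price_per_million_tokens: float) -> float:
    """Per-query cost when the vendor bills per token, summed over all pairs."""
    total_tokens = pairs * (query_tokens + doc_tokens)
    return total_tokens * price_per_million_tokens / 1_000_000


TOP_N = 50  # candidates reranked per query

per_query = {
    "Cohere Rerank v3 ($1/1M pairs)": pair_priced_cost(TOP_N, 1.00),
    "Voyage Rerank-1 ($50/1M pairs)": pair_priced_cost(TOP_N, 50.00),
    "Jina Reranker v2 ($0.02/1M tokens)": token_priced_cost(TOP_N, 50, 500, 0.02),
}

for name, cost in per_query.items():
    print(f"{name:38s} ${cost:.6f} per query")
```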

A typical RAG retrieval pipeline at scale prices out cleanly. For a single user query: embed the query string (~50 tokens × $0.02/1M for text-embedding-3-small) = $0.000001. Vector search against the index is a fixed infrastructure cost; call it $0.00001 of amortized Pinecone serverless time per query at 1M queries/month. Rerank the top-50 with Cohere Rerank v3 = $0.00005. Pass the top-5 reranked chunks plus the user query into the LLM call: at GPT-4.1 ($2/1M input, $8/1M output) with 3,000 input tokens and 500 output tokens, that is $0.006 + $0.004 = $0.010 per query. The LLM call is nearly the entire bill, roughly 160x larger than every retrieval step combined ($0.010 vs $0.000061).

Reranker quality gain often exceeds the gain from upgrading the embedding model. On a representative internal-knowledge-base eval (50,000 chunks, 200 hand-labeled queries), text-embedding-3-small alone returned recall@5 of 78%. Upgrading to text-embedding-3-large (a 6.5x cost increase) lifted it to 83%. Keeping text-embedding-3-small and adding Cohere Rerank v3 lifted recall@5 to 91%, a 13-point gain at $0.00005 per query. The reranker path wins on quality, delivering 8 points more recall than the 3-large upgrade, and its cost stays negligible: the rerank pass costs more per query than a 3-large query embedding (50 tokens × $0.13/1M ≈ $0.0000065), but indexing on 3-small is 6.5x cheaper and both totals are rounding errors next to the LLM call. This pattern repeats across most public retrieval benchmarks where rerank ablations are reported.

The mechanism is straightforward. Embeddings compress meaning into a fixed vector before ever seeing the query — they cannot adapt their representation to the question being asked. A cross-encoder reranker reads the query and the candidate document together and produces a relevance score conditioned on the specific query. That conditional view catches near-misses the embedding step ranks similarly but for irrelevant reasons (shared topic keywords, similar phrasing, popular concepts). On corpora with high lexical overlap between irrelevant documents — legal filings, support tickets, academic papers in adjacent subfields — the reranker gap over embeddings alone often reaches 15-20 points of recall@5.

Rerankers do not help in every case. Three patterns where the reranker pass is wasted spend. First, very small corpora (under 5,000 chunks): the embedding model alone reliably returns the right chunk in the top-5 because there are so few candidates to confuse it. Second, corpora where the embedding model is already at 95%+ recall@10 — the reranker has little signal left to extract and the latency penalty (50-200ms per query for a remote rerank call) starts to hurt UX. Third, pipelines that already combine lexical (BM25) and semantic (vector) retrieval with reciprocal rank fusion: the hybrid step covers most of the failure modes a reranker would catch, and the marginal recall gain typically drops below 2 points. Measure before adding the pass.
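For readers who have not used it, reciprocal rank fusion merges several ranked lists by summing 1 / (k + rank) for each document across the lists. The sketch below is a plain-Python illustration with toy document ids. The constant k = 60 is a commonly used default and is an assumption here, not something the pipeline above prescribes.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists: each document scores sum(1 / (k + rank)) over the lists."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Toy example: one lexical (BM25) ranking and one vector-search ranking.
bm25_ranking = ["a", "b", "c"]
vector_ranking = ["b", "d", "a"]

fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
print(fused)
```

Documents that rank well in both lists rise to the top, which is why a hybrid pipeline already catches many of the near-misses a reranker would otherwise fix.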

Worked $ math for a production RAG app at 1M queries per month. Without reranker: 1M × ($0.000001 embed + $0.00001 vector search + $0.010 LLM) = $10,011/month, with about 78% top-5 recall. With Cohere Rerank v3: 1M × ($0.000001 embed + $0.00001 vector search + $0.00005 rerank + $0.010 LLM) = $10,061/month, with 91% top-5 recall. The reranker adds $50/month — about 0.5% of total spend — and adds 13 points of recall. With Voyage Rerank-1 the rerank line jumps to $2,500/month, still under 25% of total spend, with marginally higher recall on Voyage-internal evals. With MixedBread open-weights via Together: the rerank line is about $14/month at the same volume — effectively free relative to the LLM bill. The cheapest reranker is rarely the best on quality, but every option in 2026 is small enough that the choice should be driven by recall@k on your own eval, not by $/1M.
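The monthly totals above are the per-query line items multiplied by query volume. A short sketch that recomputes them, using the per-query figures from this section; the dictionary and function names are illustrative, and the LLM figure assumes the 3,000-input / 500-output token call described earlier.

```python
# Per-query costs in dollars, taken from the pipeline breakdown in this section.
BASE_PER_QUERY = {
    "embed": 0.000001,
    "vector_search": 0.00001,
    "llm": 0.010,
}

RERANK_PER_QUERY = {
    "no reranker": 0.0,
    "Cohere Rerank v3": 0.00005,
    "Voyage Rerank-1": 0.0025,
}


def monthly_cost(queries: int, rerank_per_query: float) -> float:
    """Total monthly pipeline cost for a given rerank line item."""
    return queries * (sum(BASE_PER_QUERY.values()) + rerank_per_query)


QUERIES = 1_000_000

for name, rerank in RERANK_PER_QUERY.items():
    total = monthly_cost(QUERIES, rerank)
    rerank_share = QUERIES * rerank / total
    print(f"{name:18s} ${total:>10,.2f}/month  (rerank = {rerank_share:.1%} of total)")
```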

Two practical notes for budgeting. First, reranker latency adds up: Cohere Rerank v3 returns in 80-150ms for 50 candidates; Voyage Rerank-1 lands closer to 200ms; open-weights rerankers self-hosted on a single GPU can return in 30-50ms but require you to operate the infrastructure. If your end-to-end query budget is 800ms, a remote rerank pass burns 10-25% of it. Second, reranking is one of the few RAG components that benefits from caching at the pair level: identical (query, document) pairs return identical scores, so a small Redis cache in front of the reranker often cuts the bill 30-50% on apps with repeated queries.
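Pair-level caching can be sketched as a thin wrapper that only calls the scoring function for (query, document) pairs it has not seen. In the example below an in-memory dict stands in for Redis, and `overlap_score` is a toy stand-in for a real reranker call; both are assumptions for illustration, as is the `CachedReranker` class itself.

```python
import hashlib
from typing import Callable


class CachedReranker:
    """Wraps a (query, document) -> score function with a pair-level cache."""

    def __init__(self, score_fn: Callable[[str, str], float]):
        self._score_fn = score_fn               # hypothetical scorer, e.g. a rerank API call
        self._cache: dict[str, float] = {}      # stand-in for Redis or another shared cache
        self.billed_pairs = 0                   # pairs actually sent to the scorer

    @staticmethod
    def _key(query: str, document: str) -> str:
        # Hash the pair; the NUL separator keeps ("ab", "c") distinct from ("a", "bc").
        return hashlib.sha256(f"{query}\x00{document}".encode("utf-8")).hexdigest()

    def score(self, query: str, document: str) -> float:
        key = self._key(query, document)
        if key not in self._cache:
            self._cache[key] = self._score_fn(query, document)
            self.billed_pairs += 1
        return self._cache[key]

    def rerank(self, query: str, documents: list[str], top_n: int = 5) -> list[str]:
        scored = [(self.score(query, doc), doc) for doc in documents]
        scored.sort(key=lambda item: item[0], reverse=True)
        return [doc for _, doc in scored[:top_n]]


def overlap_score(query: str, document: str) -> float:
    """Toy scorer: number of shared lowercase words. Not a real reranker."""
    return float(len(set(query.lower().split()) & set(document.lower().split())))


docs = ["refund policy for annual plans", "how to reset a password", "annual plan pricing"]
reranker = CachedReranker(overlap_score)

first = reranker.rerank("refund policy annual", docs, top_n=2)
second = reranker.rerank("refund policy annual", docs, top_n=2)  # served from cache
print(first, reranker.billed_pairs)
```

A real deployment would also need cache expiry whenever documents or the reranker model change, and would batch the uncached pairs into one API call per query.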

Choosing an embedding model: a decision shortcut

Default for most teams: text-embedding-3-small at 1,536 dim. Best $/recall ratio in 2026 for general English content, well-supported across vector DB tooling, predictable rates.

Switch up to text-embedding-3-large or Voyage 3 Large when: your corpus is high-stakes (legal, medical, technical), your recall benchmark shows the 3-7% gap matters, or you have already exhausted cheaper optimizations (better chunking, query rewriting, rerankers).

Switch to Voyage Code 3 when: your retrieval is over code or technical documentation, where domain-tuned embeddings consistently outperform general-purpose by 10-20%.

Switch to Cohere embed-v4 when: multilingual coverage matters (Cohere has historically led on non-English retrieval) or you need image-input embeddings (one of the few production-grade multimodal options).

Switch to DeepInfra BGE or Jina v3 when: cost is the primary constraint and recall quality on your specific corpus is acceptable. For internal search over a 5M-document knowledge base, the difference vs text-embedding-3-small is often invisible.

Whichever you pick, test reranking: it usually buys more recall than upgrading the embedding model.

Frequently Asked Questions

What is the cheapest embedding model in 2026?

DeepInfra-hosted BGE-large-en-v1.5 at $0.005/1M tokens is the cheapest hosted option for general English. Among the vendors' own models, Jina v3 at $0.018/1M is next, followed by OpenAI text-embedding-3-small and Voyage 3 Lite at $0.02/1M. Confirm rates on each vendor's live pricing page.

Is text-embedding-3-large worth the 6.5x cost premium over 3-small?

Sometimes. On hard retrieval benchmarks 3-large wins by 3-7% — meaningful for high-stakes search, marginal for general knowledge-base lookup. Run a recall@k eval on 100 representative queries; if the gap is under 2% on your corpus, 3-small wins on cost.

How much does it cost to embed 1M chunks?

At 500 tokens per chunk = 500M tokens. With text-embedding-3-small ($0.02/1M) it costs $10. With text-embedding-3-large ($0.13/1M) it costs $65. With Voyage 3 Large ($0.18/1M) it costs $90. One-time cost in nearly all cases — embedding is rarely the budget bottleneck.

What is vector dimension and why does it matter?

Dimension is the length of each embedding vector — typically 384 to 4,096. It drives storage cost (linearly), query latency (mildly), and downstream retrieval quality (sometimes). Matryoshka-style embeddings let you truncate to a shorter dim at minor recall cost; 768-1,024 dim is the sweet spot for most general retrieval.

Should I use Voyage, Cohere, or OpenAI embeddings?

Default to OpenAI text-embedding-3-small for general English. Switch to Voyage 3 Large if your corpus is technical or long-context (32k input). Switch to Cohere embed-v4 if you need multilingual or image input. Run a recall@k eval to confirm — the right answer differs by corpus.

How much does the vector database cost?

Usually 5-20x the embedding bill at production scale. A 10M-vector index runs roughly $30-100/month on managed providers (Pinecone, Weaviate, Qdrant Cloud) or $0-50/month on pgvector + Supabase. Budget storage at the start of the project, not the end.

Do reranking models help more than upgrading the embedding model?

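Often, yes. Adding Cohere Rerank v3 or Voyage Rerank-1 to a cheap embedding pipeline typically buys more recall@k than upgrading from text-embedding-3-small to text-embedding-3-large. Pair-priced rerankers cost from about $1 per 1M reranked pairs (Cohere Rerank v3) to about $50 per 1M (Voyage Rerank-1); budget the additional pass, and measure first on small corpora or hybrid pipelines, where the gain is usually small.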

How do I estimate embedding cost before indexing?

Sum the token count across your corpus (use the model's tokenizer or estimate words ÷ 0.75), divide by 1M, multiply by the model's $/1M rate. For chunked RAG, multiply chunk count × tokens-per-chunk first. For real-time query cost, repeat with monthly query volume × tokens-per-query.

How much does it cost to add a reranker to my RAG pipeline?

Cohere Rerank v3 is $1 per 1M reranked pairs — at top-50 rerank per query, that is $0.00005 per query, or $50/month at 1M queries. Voyage Rerank-1 is roughly $50 per 1M pairs ($0.0025/query at top-50). Jina Reranker v2 is $0.02 per 1M tokens (counts both query and document text). MixedBread open-weights via Together AI runs near $0.0005 per 1M tokens — effectively free at most production volumes.

Does a reranker pair count documents or tokens?

Depends on the vendor. Cohere Rerank v3 and Voyage Rerank-1 bill per pair — one pair is one (query, document) combination, regardless of document length. Reranking top-50 candidates against one query is 50 pairs. Jina Reranker bills per token, summing query and document tokens across all pairs. A 500-token document plus 50-token query is 550 tokens per pair; 50 pairs at 550 tokens = 27,500 tokens per query.

When is a reranker NOT worth adding?

Three cases. (1) Corpora under 5,000 chunks — the embedding step alone usually finds the right chunk in top-5 because there are few candidates to confuse it. (2) Pipelines already at 95%+ recall@10 — the reranker has little signal left and adds 50-200ms of latency. (3) Hybrid lexical + semantic retrieval with reciprocal rank fusion already deployed — the hybrid step covers most failure modes and marginal recall gain typically drops below 2 points. Measure recall@k with and without before committing.
