Skip to contentNew: Does ChatGPT recommend your brand? Free 60-second AI visibility check →
By The DDH Team · Digital Dashboard Hub

Embeddings Cost Calculator (2026)

By The DDH Team at Digital Dashboard HubUpdated

Stop writing AI prompts from scratch.

Tell us your business + your task + your model. We write the prompt — perfectly tuned for ChatGPT, Claude, Grok, Gemini, Midjourney, or any model. Plus 500+ pre-built prompts in your library.

14 days, no card. Cancel in 2 clicks.

Embeddings are the cheapest layer of an AI stack — pennies per million tokens — but at scale they add up. As of June 2026, per-1M-token prices range from $0.02 (Voyage 3.5-lite, OpenAI text-embedding-3-small) up to $0.20 (Google gemini-embedding-2). That's a 10x spread, and the right model for your retrieval quality bar is often not the most expensive one.

Three pricing models in the market. **OpenAI** and **Voyage** charge a flat per-1M-token rate that you multiply by your embedded corpus. **Google Gemini** offers a free tier (subject to rate limits) plus paid per-1M-token rates, plus a 50%-off Batch tier. **Cohere** has shifted Embed 4 to instance-based pricing (Model Vault at $4-5/hour or monthly), making per-token cost calculation unusable — see the Cohere section for the comparison shape.

Below: the canonical price table for OpenAI / Voyage / Google (Cohere broken out separately), the canonical embedding-cost formula, four worked examples (1M tokens, 100M tokens, 1B tokens, a full RAG corpus rebuild), the storage cost that most teams underestimate, and the model picker by retrieval quality tier. Write efficient embedding-query prompts (cleaner queries → fewer redo runs) with our free ChatGPT prompt generator. Sibling calculators: OpenAI API cost · Claude API cost · Image gen cost.

Digital Dashboard Hub

Writing good prompts for ONE AI is hard. Writing them for GPT-5, Claude, Gemini, Perplexity, Midjourney and 6 more is a full-time job. DDH's AI Prompt Builder writes once, runs everywhere — locked to your niche, voice, and brand tone.

Free 14 days, no card.

Embedding model price per 1M tokens — June 2026

Feature
Provider
Input price ($/1M tokens)
Dimensions
OpenAI text-embedding-3-smallOpenAI$0.021,536 (configurable down to 256)
OpenAI text-embedding-3-largeOpenAI$0.133,072 (configurable down to 256)
Voyage voyage-3.5-liteVoyage AI$0.021,024
Voyage voyage-3.5Voyage AI$0.061,024 (configurable)
Voyage voyage-3-liteVoyage AI$0.02512
Voyage voyage-3Voyage AI$0.061,024
Voyage voyage-3-largeVoyage AI$0.181,024 (high accuracy)
Google gemini-embedding-001Google$0.15 ($0.075 batch)3,072
Google gemini-embedding-2Google$0.20 ($0.10 batch)3,072

Sources, as of June 2026: OpenAI pricing (developers.openai.com/api/docs/pricing — note text-embedding-3 was omitted from the verified live page snapshot; rates above are the long-stable Pricing-as-of-2024 numbers used by costgoat.com and confirmed by community references; verify before publishing high-volume budgets), Voyage AI pricing (docs.voyageai.com/docs/pricing), Google Gemini API pricing (ai.google.dev/gemini-api/docs/pricing). Cohere Embed 4 has shifted to instance pricing (Model Vault) — see the dedicated Cohere section. Token counts are input-only; embeddings have no output token bill.

The cost formula (one line — no surprises)

Embedding cost is the simplest math in the LLM stack — there is no output token bill, no caching layer, no batch surcharge other than where explicitly noted. The formula:

``` cost = (total_tokens / 1,000,000) × price_per_M_tokens ```

Estimate `total_tokens` from your corpus character count: 1 token ≈ 4 characters of English. A 10M-word document corpus is roughly 13.3M tokens (10M × 1.33 word-to-token ratio). A 100k-row product database with 200-word descriptions is ~26.7M tokens.

Re-embedding (when you change models, change chunking strategy, or rebuild your vector index) bills the full corpus again. Plan for at least one rebuild during the lifecycle of any production RAG system — a 100M-token corpus at $0.13/1M is $13 to re-embed, but a 10B-token corpus is $1,300, which becomes a real line item.

What's NOT in the bill: vector storage (covered in its own section below), query-time embedding (each user query gets its own embedding cost on the read side), and retrieval-time database operations (vector DB hosting fee — Pinecone, Weaviate, Qdrant, pgvector — varies by provider). The embedding cost is just the model call.


Worked example 1: a 1M-token corpus (small index, ~750k words)

A 1M-token corpus is a typical solo project — a personal note archive, a small product catalog, an internal docs index of ~750k words.

OpenAI text-embedding-3-small: 1 × $0.02 = **$0.02** (yes, two cents). text-embedding-3-large: 1 × $0.13 = **$0.13**.

Voyage voyage-3.5-lite: $0.02. voyage-3.5: $0.06. voyage-3-large: $0.18.

Google gemini-embedding-001 standard: $0.15. Batch tier: $0.075.

At this scale, the cost difference is rounding error. The right choice is quality, not price — pick the model that hits your retrieval accuracy bar on a 20-query eval set. For most solo-scale indexes, text-embedding-3-small or voyage-3.5-lite handle the workload at $0.02.


Worked example 2: a 100M-token corpus (medium RAG system)

A 100M-token corpus represents a mid-size production RAG — a SaaS knowledge base, a mid-volume support-ticket index, a regulatory-document library.

OpenAI text-embedding-3-small: $2. text-embedding-3-large: $13.

Voyage 3.5-lite: $2. voyage-3.5: $6. voyage-3-large: $18.

Google gemini-embedding-2 standard: $20. Batch: $10.

Still small absolute dollars. Now the eval matters more — at 100M tokens you have enough data to run a real retrieval-quality benchmark (recall@10, MRR, normalized DCG) across each model. Most teams find voyage-3-large or text-embedding-3-large materially outperform their cheaper siblings on technical or domain-specific corpora; consumer/marketing corpora often see no difference.


Worked example 3: a 1B-token corpus (enterprise RAG)

A 1B-token corpus is an enterprise RAG system — a complete document warehouse, a multi-product help-center, a years-deep support-ticket archive.

OpenAI text-embedding-3-small: $20. text-embedding-3-large: $130.

Voyage 3.5-lite: $20. voyage-3.5: $60. voyage-3-large: $180.

Google gemini-embedding-2 standard: $200. Batch: $100.

Now the price differences are real budget items. Retrieval quality matters even more because a 10% drop in recall on a 1B-token corpus means you're missing thousands of relevant documents per query. Run the eval; pick the cheapest model that hits your bar.

Important: re-embedding cost. If you change models or chunking strategy mid-lifecycle, you pay the full corpus cost again. Plan annual budget at 1.5x first-embed cost to cover at least one rebuild — typical timing for major model upgrades.


Worked example 4: full RAG operation budget (1B-token corpus + 1M queries/month)

Embedding cost is half the bill on a real RAG. The other half is query-time embedding: each user query gets its own embedding call before vector search.

Take a 1B-token corpus on text-embedding-3-large + 1M user queries/month at ~50 tokens each (50M query tokens/month):

One-time embed cost: $130. Query-time embeddings: 50 / 1 × $0.13 = $6.50/month — basically free.

Compare with text-embedding-3-small + voyage-3-large hybrid for query reranking: $20 (one-time) + 50M × $0.18/1M = $9/month for query embeddings (assume voyage handles the query side).

Add vector DB hosting: Pinecone serverless at $0.50/1M reads + $0.05/M writes — for 1M queries/month, ~$0.50 reads, plus index storage that scales with vector dim count. A 1B-token corpus of 1,000-token chunks = 1M vectors of 3,072 dim × 4 bytes = ~12 GB; at $0.10/GB/month that's $1.20/month.

**Total monthly RAG bill (excluding LLM call after retrieval)**: ~$8 query-time + $1.20 storage = $9-10/month, with a $130 initial embed cost. The model layer (Sonnet 4.6 or gpt-5.4 on the retrieved context) is where the real bill lives — the retrieval side is cheap by comparison.


The storage cost most teams forget to budget

Vector dimensions matter for storage. A 3,072-dim embedding (text-embedding-3-large default) is 12,288 bytes per vector at float32 — a 1M-vector index is ~12 GB. A 1,024-dim embedding (Voyage default, OpenAI configurable-down) is ~4 GB for the same 1M vectors — 3x cheaper to store.

OpenAI text-embedding-3 supports `dimensions` parameter to configure down to 256 — useful when storage cost dominates. Voyage voyage-3.5 supports dimension reduction. The trade-off: lower dims = lower retrieval quality on hard queries. Test on your eval before reducing.

Vector DB pricing models vary widely. Pinecone serverless prices both reads and storage. Qdrant Cloud bundles them. pgvector on managed Postgres is a flat-rate Postgres bill. For a 1B-token corpus with daily queries, expect $50-500/month vector DB hosting depending on dim count, query volume, and provider — often more than the embedding cost itself.


Cohere Embed 4: instance pricing instead of per-token

Cohere shifted Embed 4 to Model Vault instance pricing in 2026 — you rent dedicated capacity rather than pay per token. As of June 2026: Small instance $4/hour or $2,500/month, Medium instance $5/hour or $3,250/month.

The math: an instance is 'always on' regardless of utilization. Small at $2,500/month is break-even with text-embedding-3-large only above 19.2M tokens/day (576M/month). Below that volume, per-token providers are cheaper.

Embed 4's distinguishing feature is multilingual + multimodal — text + image + table embeddings in a single model. If your corpus is heavily multilingual or includes structured tables, the per-instance premium may be worth it for the retrieval quality. For text-only English corpora, OpenAI or Voyage will be cheaper.

Cohere also has a smaller-instance embed-multilingual-light option for lower volumes — check cohere.com/pricing for current options.


Re-embedding cost: the lifecycle line nobody plans for

Every production RAG hits at least one re-embed event in its first 18 months. The triggers: a better model ships (text-embedding-3-large to a future text-embedding-4, or voyage-3 to voyage-4); a chunking strategy change (going from 512-token chunks to 1,024-token chunks, or switching from fixed-size to recursive); a domain-specific fine-tune released by the vendor; a switch in dimensionality (3,072 → 1,536 to halve storage). Each event bills the full corpus again.

Plan annual budget at 1.5x first-embed cost. For a 1B-token corpus on text-embedding-3-large, first-embed is $130. Annual budget should be ~$195 — enough for one full rebuild plus the steady-state query-side embedding. Larger corpora and faster-moving research domains may need 2x.

Mitigation: run the eval before committing. If the new model only lifts retrieval @10 by 2-3 percentage points on your eval, the rebuild may not be worth the cost or the downtime. If it lifts by 8-12 points, rebuild immediately and schedule the staged migration during off-peak hours.

Staged rebuilds are the production pattern. Embed the new corpus alongside the old, run dual-retrieval in production for a week with quality monitoring, switch the index over once you're confident, deprecate the old index. The temporary 2x storage cost is the price of a safe migration.


The 5 production patterns we see in real teams

**Pattern 1 — solo project, text-embedding-3-small only.** Hobby and side-project teams default to OpenAI text-embedding-3-small at $0.02/1M. Total monthly cost under $5 for nearly any corpus. No optimization needed; ship and iterate.

**Pattern 2 — SaaS RAG, voyage-3 + Pinecone serverless.** Mid-size production teams pick voyage-3 at $0.06/1M for the quality-per-dollar sweet spot, paired with Pinecone serverless for storage. Monthly bill: $50-200 for the embedding work, $30-100 for vector hosting. Total stack: ~$200/month at typical mid-market volume.

**Pattern 3 — enterprise RAG, text-embedding-3-large + pgvector.** Enterprise teams with existing Postgres infrastructure run text-embedding-3-large at $0.13/1M, store vectors in pgvector inside their managed Postgres. Tradeoff: pgvector is slightly slower than purpose-built vector DBs at scale but eliminates a vendor relationship and a security-review surface.

**Pattern 4 — multilingual RAG, gemini-embedding-2 + Vertex AI.** Teams with serious multilingual corpora (legal across jurisdictions, global support content, multi-region product docs) land on Google's gemini-embedding-2 for native multilingual quality + the Vertex AI ecosystem fit.

**Pattern 5 — hybrid retrieval, text-embedding-3-small + voyage-3-large reranking.** Sophisticated teams use a cheap embedding model for first-pass retrieval (recall) plus a premium model or cross-encoder for second-pass reranking (precision). text-embedding-3-small at $0.02/1M for index embedding, voyage-3-large at $0.18/1M for the top-50 reranking — best quality per dollar at high volumes.


The model picker: which embedding model for which job

**Cheap + good**: text-embedding-3-small ($0.02/1M) or voyage-3.5-lite ($0.02/1M). Use for solo projects, prototypes, low-stakes RAG. Indistinguishable from premium models on most consumer-grade corpora.

**Sweet spot**: voyage-3.5 ($0.06/1M) or voyage-3 ($0.06/1M). Strong retrieval quality at 3x the lite cost. Use for production RAG where retrieval quality matters but you can't justify the premium tier.

**Premium accuracy**: voyage-3-large ($0.18/1M) or text-embedding-3-large ($0.13/1M). Reach for these when retrieval quality is mission-critical (legal, medical, financial) and the volume justifies the premium. text-embedding-3-large at $0.13 is the best price-per-quality at the high tier for English; voyage-3-large outperforms on multilingual and domain-specific corpora.

**Multilingual / multimodal**: Google gemini-embedding-2 (multilingual native) or Cohere Embed 4 (text + image + tables). Use when your corpus crosses languages or includes structured data.


Sourcing methodology — and what we explicitly omitted

Per-token prices in the table come from each vendor's live pricing page: Voyage AI (docs.voyageai.com/docs/pricing), Google Gemini (ai.google.dev/gemini-api/docs/pricing), fetched 2026-06-20. Voyage prices have held stable through 2026; Google's gemini-embedding-001 was added to the standard pricing tier alongside gemini-embedding-2 in early 2026 with the batch tier at 50% off.

**OpenAI text-embedding-3-small / -large**: not on the verified live pricing-page snapshot from 2026-06-20 (the snapshot focused on chat models). Rates above ($0.02 / $0.13) match what costgoat.com, livechatai.com, and recent open-source repo integrations cite, and have been stable since 2024 launch. We include them with this caveat. **Verify** at developers.openai.com/api/docs/pricing before budgeting six-figure embedding spend.

**Cohere**: Embed 4 has shifted to Model Vault instance pricing rather than per-token. We do not include Cohere in the per-token table because the comparison shape is fundamentally different. Use it where multilingual / multimodal quality justifies the instance commitment, or where you have the volume to amortize the $2,500-3,250/month base.

**What we did not include**: AWS Bedrock embeddings (varies by region and reseller margin), Mistral embeddings (still in the early 'free during preview' phase as of June 2026), open-source self-hosted embeddings (no hosted price; cost depends entirely on your infra). For most production teams, the OpenAI / Voyage / Google triad covers 90%+ of decisions.

**Live-verify quarterly** if your embedding bill exceeds $500/month. Prices in this market have been more stable than chat-model pricing but still change — Voyage 3.5 launched at a higher rate in 2025 and dropped to $0.06 within months.


Five questions to answer before you pick a model

**1. What's the dominant language of your corpus?** English-only → OpenAI or Voyage. Multilingual → Google gemini-embedding or Cohere Embed 4. Domain-specific (legal, medical) → Voyage 3-large or a domain-tuned alternative.

**2. What's your total corpus size?** Under 100M tokens → pick on quality, cost is negligible. 100M-1B → run an eval; 6-10x price spreads start to matter. 1B+ → prioritize re-embedding budget planning over per-token price.

**3. What retrieval quality bar do you need?** Build a 20-50 query held-out eval set, run each candidate model, measure recall@10 and MRR. The cheap models (text-embedding-3-small, voyage-3.5-lite) often match premium on consumer corpora. Premium pays off on hard/technical/multilingual.

**4. What vector store will hold the index?** Pinecone, Qdrant, Weaviate, pgvector, Milvus. Each has different storage cost-per-dim. text-embedding-3 supports dimension reduction via the dimensions parameter; voyage-3.5 too. Smaller dims = cheaper storage but lower recall on hard queries.

**5. Is query-time embedding cost going to dominate?** At 10M queries/month with 50-token queries, that's 500M tokens/month of query embedding — $10 on text-embedding-3-large, $40 on gemini-embedding-2. Below corpus embed cost for most teams, but worth budgeting.

Estimating any embedding cost in 5 steps

  1. 1

    Count corpus tokens

    Character count ÷ 4 = approximate input tokens. A 10M-word corpus is ~13.3M tokens. A 100k-row database with 200-word descriptions is ~26.7M tokens. Get this number first; everything else follows.

  2. 2

    Pick a model that hits your retrieval quality bar

    Run a 20-query eval against 2-3 candidate models. Cheap (text-embedding-3-small, voyage-3.5-lite) often matches premium on consumer corpora. Premium (voyage-3-large, text-embedding-3-large) wins on technical, multilingual, or high-stakes work.

  3. 3

    Apply the formula

    cost = total_tokens / 1,000,000 × price_per_M. A 100M-token corpus on text-embedding-3-large = 100 × $0.13 = $13. The math is intentionally boring.

  4. 4

    Add query-time embedding budget

    Each user query gets its own embedding call. 1M queries × 50 tokens each = 50M tokens. At $0.13/1M = $6.50/month. Small relative to corpus embed, but recurring.

    → Open the ChatGPT prompt generator (clean queries)
  5. 5

    Budget vector storage separately

    Storage scales with dim count. 3,072-dim × 4 bytes × N vectors. A 1M-vector 3,072-dim index is ~12 GB. Use OpenAI's `dimensions` parameter (configurable down to 256) when storage cost dominates and your eval allows.

Frequently Asked Questions

How much does it cost to embed 1 million tokens in 2026?

Cheapest path: $0.02 on OpenAI text-embedding-3-small or Voyage voyage-3.5-lite. Mid-tier: $0.06 on Voyage voyage-3.5. Premium: $0.13 (OpenAI text-embedding-3-large) to $0.18 (Voyage voyage-3-large). Google gemini-embedding-2 is $0.20 standard, $0.10 batch. Sourced from each vendor's live pricing page.

What's the cheapest embedding model that still has good retrieval quality?

OpenAI text-embedding-3-small ($0.02/1M) handles most consumer-grade RAG without measurable quality loss vs the premium models. Voyage voyage-3.5-lite ($0.02/1M) is a similarly strong cheap option, especially for technical/domain-specific corpora. Run a 20-query eval on your actual corpus before assuming you need the premium tier.

How much will it cost to embed 1 billion tokens?

$20 on text-embedding-3-small. $130 on text-embedding-3-large. $180 on voyage-3-large. $200 on gemini-embedding-2 standard ($100 batch). The cheap tier is often sufficient — only test against premium models when retrieval-quality matters at high stakes.

Do embeddings have output token costs?

No. Embedding APIs charge only for input tokens. The 'output' (the vector) is included in the input price. This is the simplest cost shape in the LLM stack.

How much does Cohere Embed 4 cost in 2026?

Cohere shifted Embed 4 to Model Vault instance pricing — $4/hour or $2,500/month for the Small instance, $5/hour or $3,250/month for Medium. Above ~576M tokens/month, Cohere becomes competitive with per-token providers. Below that volume, OpenAI or Voyage are cheaper. Cohere's edge is multilingual and multimodal embeddings.

What's the cost difference between text-embedding-3-small and text-embedding-3-large?

text-embedding-3-large costs 6.5x more per token ($0.13 vs $0.02). Quality difference depends on corpus. On English consumer corpora the gap is often negligible. On technical, multilingual, or high-stakes corpora the large model materially improves retrieval @10. Run a head-to-head eval before committing to the premium tier.

Should I use the Google Gemini Batch API for embeddings?

Yes if your embedding job is asynchronous. Google's Gemini Batch tier is 50% off ($0.10/1M vs $0.20/1M on gemini-embedding-2). Same model, same quality, 24-hour completion window. Perfect for initial corpus embed or periodic rebuilds — not for query-time embedding.

Do I need to re-embed when I switch models?

Yes. Embeddings are model-specific — a vector from text-embedding-3-large is in a different space than one from voyage-3-large and can't be compared meaningfully. Switching models means re-embedding the entire corpus. Budget annual at 1.5x first-embed cost to cover at least one model upgrade rebuild.

Cheap embeddings + clean queries = the cheapest RAG you can ship.

Query-side prompt structure determines recall (and re-embed cost). Our AI Prompt Generator writes efficient query patterns for OpenAI / Voyage / Cohere / Google embeddings — fewer tokens, higher precision, fewer redo runs. 14-day free trial, no card.

Browse all prompt tools →