Real $/1M tokens: the embedding math that actually matters
Embedding cost is usually a one-time, per-document expense. You embed each document once at ingest, then query against the vector store (queries are embedded too, but they are short). So the headline $/1M price matters far less than it does for LLM inference, which you pay on every request. A 1M-document corpus averaging 500 tokens per doc is 500M tokens to embed once.
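The corpus-size arithmetic can be sketched in a few lines. This is a minimal sketch with hypothetical helper names. When you haven't run a real tokenizer, it falls back on the common rule of thumb of roughly 4 characters per token for English text. For billing-accurate counts, use the provider's own tokenizer.

```python
import math

# Rough rule of thumb for English text (assumption, not an exact tokenizer).
CHARS_PER_TOKEN = 4


def approx_tokens(text: str) -> int:
    """Estimate one document's token count from its character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def corpus_tokens(num_docs: int, avg_tokens_per_doc: int) -> int:
    """Total tokens to embed once for a corpus of num_docs documents."""
    return num_docs * avg_tokens_per_doc


# The example from the text: 1M documents at ~500 tokens each.
total = corpus_tokens(1_000_000, 500)
print(f"{total:,} tokens")

# A 2,000-character document is about 500 tokens under the heuristic.
print(approx_tokens("x" * 2_000))
```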
**OpenAI text-embedding-3-small** at $0.02/1M × 500M = **$10 one-time** for the full corpus, or **$5** through the Batch API at 50% off. text-embedding-3-large at $0.13/1M × 500M = **$65** (or $32.50 batched). For most non-trivial RAG systems, the embedding bill is a rounding error next to the LLM inference bill that follows.
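The same calculation as a reusable function is sketched below. The function name is hypothetical. The prices are the list prices quoted in this article, and the 50% batch discount is the Batch API discount described above. Check the current pricing pages before relying on either.

```python
def embedding_cost(tokens: int, price_per_million: float, batch: bool = False) -> float:
    """One-time cost in dollars to embed `tokens` at a $/1M-token list price."""
    cost = tokens / 1_000_000 * price_per_million
    if batch:
        cost *= 0.5  # Batch API discount as stated in the text (50% off)
    return round(cost, 2)


CORPUS_TOKENS = 500_000_000  # 1M docs x 500 tokens

for model, price in [("text-embedding-3-small", 0.02), ("text-embedding-3-large", 0.13)]:
    standard = embedding_cost(CORPUS_TOKENS, price)
    batched = embedding_cost(CORPUS_TOKENS, price, batch=True)
    print(f"{model}: ${standard:.2f} standard, ${batched:.2f} batched")
```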
**Voyage AI voyage-3-lite** at $0.02/1M × 500M = **$10 one-time**. voyage-3 at $0.06/1M = **$30**. voyage-3-large at $0.18/1M = **$90**. Only voyage-3-large costs more per token than OpenAI's flagship, about 38% more ($0.18 vs $0.13). voyage-3 costs less than half as much as text-embedding-3-large. In exchange, Voyage models typically beat OpenAI on retrieval benchmarks (see next section).
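The per-token premium over OpenAI's flagship can be computed directly. This is a minimal sketch with a hypothetical helper name, using the prices quoted in this article.

```python
def premium_pct(price: float, baseline: float) -> float:
    """Percent more (positive) or less (negative) than the baseline per-token price."""
    return (price / baseline - 1) * 100


OPENAI_FLAGSHIP = 0.13  # text-embedding-3-large, $/1M tokens (as quoted above)

voyage_prices = {"voyage-3-lite": 0.02, "voyage-3": 0.06, "voyage-3-large": 0.18}
for model, price in voyage_prices.items():
    print(f"{model}: {premium_pct(price, OPENAI_FLAGSHIP):+.0f}% vs text-embedding-3-large")
```

Only voyage-3-large comes out positive, at roughly +38%. The other two Voyage models are cheaper than OpenAI's flagship.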
**Cohere embed-v4.0** at $0.12/1M × 500M = **$60 one-time**, roughly the same as OpenAI text-embedding-3-large ($65). Cost per token isn't the differentiator. The 128k-token context window is.
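The sketch below shows why context length, rather than price, separates these models. It counts how many embedding inputs a long document needs under different input limits. The 128,000-token limit is an assumed reading of the "128k" figure above. The 8,191-token limit is the maximum input OpenAI documents for its embedding models, included here for contrast (verify both against current docs). The helper name and the 100k-token document are hypothetical, and the sketch ignores chunk overlap.

```python
import math


def inputs_needed(doc_tokens: int, max_input_tokens: int) -> int:
    """Minimum number of embedding inputs (chunks) to cover a document, no overlap."""
    if doc_tokens <= 0:
        return 0
    return math.ceil(doc_tokens / max_input_tokens)


# Hypothetical 100k-token document, e.g. a long contract or manual.
DOC_TOKENS = 100_000
limits = {"8,191-token limit": 8_191, "128k-token limit": 128_000}
for label, limit in limits.items():
    print(f"{label}: {inputs_needed(DOC_TOKENS, limit)} input(s)")
```

Per-token billing means the tokens paid for are the same either way. What changes is whether the document becomes one vector or a dozen, and that changes the chunking strategy.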
**Re-embedding cost**: if you upgrade models or re-chunk, you pay the embedding cost again. Vectors from different models live in different spaces, so old and new embeddings can't be mixed in one index. This argues for picking a model you'll stick with for 12+ months, not the cheapest model that shipped this week. Voyage and Cohere have been more stable on their embedding model lineups than OpenAI. text-embedding-3 launched in early 2024 and has held, but OpenAI's earlier first-generation embedding models (the `-001` similarity and search models) were deprecated and forced re-embeds.
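Each model switch or re-chunk repeats a full pass over the corpus, so lifetime embedding spend grows linearly with the number of re-embeds. This is a minimal sketch. The function name and the two-re-embed scenario are hypothetical, and the price is the text-embedding-3-large figure quoted above.

```python
def lifetime_embedding_spend(corpus_tokens: int, price_per_million: float, reembeds: int) -> float:
    """Initial embed plus `reembeds` full re-embeds (model upgrades or re-chunking)."""
    one_pass = corpus_tokens / 1_000_000 * price_per_million
    return round(one_pass * (1 + reembeds), 2)


CORPUS_TOKENS = 500_000_000  # 1M docs x 500 tokens

# Hypothetical scenario: text-embedding-3-large, then two full re-embeds.
print(lifetime_embedding_spend(CORPUS_TOKENS, 0.13, reembeds=2))
```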
**Verdict on cost**: at typical RAG scales (single-digit millions of documents), embedding is a one-time spend of roughly $5-90 per million 500-token documents with the models above. That stays under $1,000 even at 9M documents. Optimize on quality, not cost. Reserve cost optimization for the operational LLM-inference layer, which is often 100-1000x more expensive over the lifetime of the system.
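To check the embedding-versus-inference ratio for your own system, compare one embedding pass against a period of LLM traffic. This is a sketch under loudly hypothetical inputs. The query volume, tokens per query, time horizon, and the $1.00/1M blended LLM price are placeholders, not measurements, and the helper names are hypothetical.

```python
def embed_cost(corpus_tokens: int, price_per_million: float) -> float:
    """One full embedding pass over the corpus, in dollars."""
    return corpus_tokens / 1_000_000 * price_per_million


def llm_inference_cost(queries_per_month: int, tokens_per_query: int,
                       months: int, price_per_million: float) -> float:
    """LLM spend over `months`, treating all tokens at one blended price."""
    total_tokens = queries_per_month * tokens_per_query * months
    return total_tokens / 1_000_000 * price_per_million


# All inputs below are hypothetical placeholders; substitute your own numbers.
embedding = embed_cost(500_000_000, 0.13)  # text-embedding-3-large, one pass
inference = llm_inference_cost(
    queries_per_month=1_000_000,
    tokens_per_query=3_000,  # prompt + retrieved context + answer (assumed)
    months=12,
    price_per_million=1.00,  # hypothetical blended LLM price, $/1M tokens
)
print(f"embedding ${embedding:,.2f}, inference ${inference:,.2f}, "
      f"ratio {inference / embedding:,.0f}x")
```

The ratio scales linearly with query volume, tokens per query, and LLM price. The placeholder values only show how to run the comparison.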