The cost formula (one line — no surprises)
Embedding cost is the simplest math in the LLM stack: there is no output-token bill, no caching layer, and no batch surcharge except where explicitly noted. The formula:
```
cost = (total_tokens / 1,000,000) × price_per_M_tokens
```
Estimate `total_tokens` from your corpus size: 1 token ≈ 4 characters of English text, or roughly 1.33 tokens per word. A 10M-word document corpus is roughly 13.3M tokens (10M words × 1.33 tokens per word). A 100k-row product database with 200-word descriptions is 20M words, or ~26.7M tokens.
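The estimate and the formula can be sketched in a few lines of Python. This is a rough calculator, not a tokenizer: the function names are made up for illustration, the 4/3 tokens-per-word and 4 characters-per-token constants are the rules of thumb from the text, and real counts depend on the model's tokenizer and on how much of the corpus is English.

```python
# Rules of thumb from the text; assumptions that hold only roughly for English.
TOKENS_PER_WORD = 4 / 3   # about 1.33 tokens per word
CHARS_PER_TOKEN = 4       # about 4 characters per token


def tokens_from_words(word_count: float) -> float:
    """Estimate token count from a word count."""
    return word_count * TOKENS_PER_WORD


def tokens_from_chars(char_count: float) -> float:
    """Estimate token count from a character count."""
    return char_count / CHARS_PER_TOKEN


def embedding_cost(total_tokens: float, price_per_m_tokens: float) -> float:
    """cost = (total_tokens / 1,000,000) x price_per_M_tokens"""
    return total_tokens / 1_000_000 * price_per_m_tokens


if __name__ == "__main__":
    # 100k product rows with 200-word descriptions, priced at $0.13 per 1M tokens.
    tokens = tokens_from_words(100_000 * 200)
    print(f"{tokens / 1e6:.1f}M tokens, ${embedding_cost(tokens, 0.13):.2f}")
```

For a tighter number, count tokens on a sample of the corpus with the target model's own tokenizer and scale up; the ratios here are only good enough for budgeting.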
Re-embedding (when you change models, change chunking strategy, or rebuild your vector index) bills the full corpus again. Plan for at least one rebuild during the lifecycle of any production RAG system — a 100M-token corpus at $0.13/1M is $13 to re-embed, but a 10B-token corpus is $1,300, which becomes a real line item.
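To budget for rebuilds, multiply the one-time cost by the number of full passes over the corpus. The sketch below assumes every rebuild re-embeds the whole corpus at the same price; `lifecycle_cost` and its `rebuilds` parameter are hypothetical names used only for illustration.

```python
def embedding_cost(total_tokens: float, price_per_m_tokens: float) -> float:
    """Cost of one full embedding pass over the corpus."""
    return total_tokens / 1_000_000 * price_per_m_tokens


def lifecycle_cost(total_tokens: float, price_per_m_tokens: float,
                   rebuilds: int = 1) -> float:
    """Initial embedding plus `rebuilds` full re-embeds at the same price.

    Assumes the corpus size and the per-token price stay constant,
    which is a simplification.
    """
    if rebuilds < 0:
        raise ValueError("rebuilds must be non-negative")
    return embedding_cost(total_tokens, price_per_m_tokens) * (1 + rebuilds)


if __name__ == "__main__":
    price = 0.13  # dollars per 1M tokens, the example rate from the text
    for corpus_tokens in (100_000_000, 10_000_000_000):
        one_pass = embedding_cost(corpus_tokens, price)
        with_rebuild = lifecycle_cost(corpus_tokens, price, rebuilds=1)
        print(f"{corpus_tokens:,} tokens: ${one_pass:,.2f} per pass, "
              f"${with_rebuild:,.2f} with one rebuild")
```

With one planned rebuild, the 10B-token corpus costs twice its single-pass price over its lifetime, which is why the rebuild count belongs in the budget from the start.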
What's NOT in the bill: vector storage (covered in its own section below), query-time embedding (each user query is embedded and billed separately on the read side), and retrieval-time database operations (the vector DB hosting fee for Pinecone, Weaviate, Qdrant, or pgvector varies by provider). The embedding cost computed here is just the model calls that embed the corpus.