Why embedding choice matters more than vector DB choice
Engineering teams spend weeks debating Pinecone vs Weaviate vs Qdrant vs pgvector. Embedding model choice gets 30 minutes and a default to OpenAI. This is exactly backwards. Swapping a vector DB is mechanical: export vectors, import vectors, update the client SDK, point the app at the new endpoint. That is a weekend of work for a small corpus, a week for a big one. The data does not change; only the storage substrate does.
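To make "mechanical" concrete, here is a minimal sketch of the export/import loop. Everything in it is hypothetical: `Record`, `VectorStore`, `InMemoryStore`, and `migrate` are illustrative names, not the API of Pinecone, Weaviate, Qdrant, or pgvector. A real migration would wrap each vendor's own client behind something shaped like this. It assumes Python 3.9+.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Protocol


@dataclass
class Record:
    """One stored item: an id, its embedding, and optional metadata."""
    id: str
    vector: tuple
    metadata: dict = field(default_factory=dict)


class VectorStore(Protocol):
    """Hypothetical minimal interface; real clients expose richer APIs."""
    def scan(self) -> Iterator[Record]: ...
    def upsert(self, records: Iterable[Record]) -> None: ...


class InMemoryStore:
    """Stand-in for a real vector DB so the sketch runs on its own."""
    def __init__(self) -> None:
        self._rows = {}

    def scan(self) -> Iterator[Record]:
        yield from self._rows.values()

    def upsert(self, records: Iterable[Record]) -> None:
        for record in records:
            self._rows[record.id] = record


def migrate(source: VectorStore, target: VectorStore, batch_size: int = 1000) -> int:
    """Copy every record from source to target in batches; vectors are not modified."""
    batch: List[Record] = []
    copied = 0
    for record in source.scan():
        batch.append(record)
        if len(batch) >= batch_size:
            target.upsert(batch)
            copied += len(batch)
            batch = []
    if batch:  # flush the final partial batch
        target.upsert(batch)
        copied += len(batch)
    return copied


old_store, new_store = InMemoryStore(), InMemoryStore()
old_store.upsert([Record("doc-1", (0.1, 0.2)), Record("doc-2", (0.3, 0.4))])
copied = migrate(old_store, new_store, batch_size=1)
```

The point of the sketch is that the loop never calls an embedding model: the vectors pass through unchanged, so retrieval quality has no reason to move.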
Swapping an embedding model is a different beast entirely. You re-embed every document in your corpus, paying the per-token cost again. You rebuild every vector in your index. You re-evaluate retrieval quality because the new model's semantic space has different geometry. You may need to re-tune your chunking strategy because the new model has a different maximum input length. And you reset your production metrics baseline. For a 10M-doc corpus averaging 800 tokens/doc (8B tokens total), re-embedding at OpenAI text-embedding-3-large pricing ($0.13/M tokens) is $1,040; at Voyage 3 pricing ($0.18/M) it's $1,440; at full Voyage 3 large pricing ($0.30/M) it's $2,400. Add 1-2 days of engineering time for re-evaluation, plus the risk of regressing production quality during the cutover.
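The re-embedding figures above are simple arithmetic, and it is worth rerunning them with your own corpus size. A minimal sketch follows. The function name `reembedding_cost_usd` is made up for illustration, and the per-million-token prices are copied from the paragraph above rather than fetched from any provider, so check current price lists before relying on them.

```python
def reembedding_cost_usd(num_docs: int, avg_tokens_per_doc: int,
                         price_per_million_tokens: float) -> float:
    """Cost in USD of embedding the whole corpus once at a flat per-token price."""
    total_tokens = num_docs * avg_tokens_per_doc
    return total_tokens / 1_000_000 * price_per_million_tokens


# Prices in USD per 1M tokens, as quoted in the text (assumed, not live data).
PRICES_PER_MILLION = {
    "OpenAI text-embedding-3-large": 0.13,
    "Voyage 3": 0.18,
    "Voyage 3 large": 0.30,
}

# The example corpus from the text: 10M docs at 800 tokens each (8B tokens).
costs = {
    model: reembedding_cost_usd(10_000_000, 800, price)
    for model, price in PRICES_PER_MILLION.items()
}

for model, cost in costs.items():
    print(f"{model}: ${cost:,.0f}")
```

This covers only the API bill. The re-evaluation time and cutover risk are not captured by the calculation.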
By a rough estimate, embedding model choice is about 100x more load-bearing than vector DB choice over a 3-year horizon. Yet most teams optimize the wrong variable. Spend the engineering effort on getting embeddings right on day one, accept whatever pgvector or your existing Postgres setup gives you, and circle back to vector DB optimization only when you hit actual scale problems (more than 100M vectors, sub-50ms latency targets, complex hybrid-search requirements).