What vector RAG actually does (mechanics, not hype)
Vector RAG has three stages. During indexing, each document is split into chunks — typically 256 to 1024 tokens — and each chunk is passed through an embedding model to produce a dense vector, usually 768 to 3072 dimensions depending on the model. Those vectors are stored in a vector database alongside the raw chunk text. The embedding model leaderboard 2026 covers current model options in detail, but the workhorse choices in 2026 are OpenAI's text-embedding-3-small (1536 dimensions, $0.02/1M tokens) and text-embedding-3-large (3072 dimensions, $0.13/1M tokens), alongside open-source alternatives like Nomic-Embed and E5-Mistral.
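The indexing stage can be sketched in a few lines. This is a minimal illustration, not a production pipeline: `chunk_text`, `toy_embed`, and `build_index` are hypothetical names, chunks are counted in whitespace-separated words rather than real tokens, and `toy_embed` is a hashed bag-of-words stand-in for a real embedding model such as text-embedding-3-small. An in-memory list plays the role of the vector database.

```python
import hashlib
import math


def chunk_text(text, chunk_size=256):
    """Split text into chunks of at most chunk_size whitespace-separated words.

    Words are a rough proxy for tokens; a real pipeline would use the
    embedding model's tokenizer.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    words = text.split()
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]


def toy_embed(text, dim=64):
    """Stand-in for an embedding model (assumption, not a real model).

    Hashes each word into one of `dim` buckets and L2-normalizes the counts.
    """
    vec = [0.0] * dim
    for word in text.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def build_index(documents, chunk_size=256, dim=64):
    """Chunk every document and store each vector alongside its raw text."""
    index = []
    for doc_id, text in documents.items():
        for position, chunk in enumerate(chunk_text(text, chunk_size)):
            index.append({
                "doc_id": doc_id,
                "position": position,
                "text": chunk,
                "vector": toy_embed(chunk, dim),
            })
    return index


# Made-up document, with a tiny chunk size so the split is visible.
docs = {"faq": "refunds are processed within five business days of the request"}
index = build_index(docs, chunk_size=4)
```

Swapping `toy_embed` for a call to a real embedding model changes the vector quality and dimension, but the shape of the stage stays the same: chunk, embed, store vector and text together.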
During retrieval, the user's query is embedded using the same model and scored against the stored vectors using cosine similarity or dot product; at scale, the database does this with approximate nearest-neighbor (ANN) search rather than an exhaustive scan of every vector. The top-k most similar chunks, typically 3 to 10, are returned and passed to the LLM as context. In the final stage, generation, the LLM produces an answer grounded in those chunks. The retrieval step runs in milliseconds for corpora up to tens of millions of chunks, and the database vendors (compared in detail in Pinecone vs Weaviate vs Qdrant) have tuned their ANN indexes to keep end-to-end latency under 500ms in production at scale.
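The scoring and top-k selection look roughly like the sketch below. It uses an exact, brute-force scan over a handful of hand-written 3-dimensional vectors, which is an assumption made for readability; a production vector database would replace the scan with an approximate nearest-neighbor index. The function names and the sample chunks are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, index, k=3):
    """Score every stored vector against the query and keep the k best."""
    scored = [(cosine_similarity(query_vec, vec), text) for vec, text in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Illustrative (vector, chunk text) pairs; real vectors have hundreds
# or thousands of dimensions.
index = [
    ([1.0, 0.0, 0.0], "chunk about pricing"),
    ([0.9, 0.1, 0.0], "chunk about billing"),
    ([0.0, 1.0, 0.0], "chunk about onboarding"),
    ([0.0, 0.0, 1.0], "chunk about security"),
]

results = top_k([1.0, 0.05, 0.0], index, k=2)
context = "\n".join(text for _, text in results)  # what gets passed to the LLM
```

Here the query vector points almost entirely along the first axis, so the pricing and billing chunks rank first and second and become the LLM's context.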
The fundamental limitation is that cosine similarity measures semantic closeness between a query and individual chunks in isolation. If answering a question requires combining information from five documents about three different entities, the top-k retrieval may return the five most individually similar chunks without ever retrieving the connective tissue that ties them together. Multi-hop questions break vector RAG not because the model is weak but because the retrieval mechanism is architecturally blind to cross-document relationships. For most teams, this is not a problem — their query distribution is dominated by factual lookups where vector RAG performs well. For teams with genuinely analytical query patterns, it is the reason to consider GraphRAG.
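A toy example makes the multi-hop failure concrete. The sketch below swaps embedding similarity for Jaccard word overlap, which is a much cruder measure than a real embedding model, but it shares the property that matters here: each chunk is scored against the query in isolation. The corpus, the chunk IDs, and the `retrieve` helper are all made up for illustration.

```python
def jaccard(a, b):
    """Word-overlap similarity: a crude stand-in for embedding similarity."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def retrieve(query, chunks, k=2):
    """Rank chunks by their individual similarity to the query."""
    ranked = sorted(
        chunks.items(),
        key=lambda item: jaccard(query, item[1]),
        reverse=True,
    )
    return [chunk_id for chunk_id, _ in ranked[:k]]


# Hypothetical corpus. Answering the query needs the chain
# acme_overview -> bom -> fire, but only the first link mentions the
# query's own terms.
chunks = {
    "acme_overview": "acme flagship product is the widget pro",
    "acme_risk": "acme annual report lists supplier risks",
    "bom": "the widget pro uses the z9 chip",
    "fire": "z9 chip maker reports factory fire",
}

query = "which supplier risks affect acme flagship product"
retrieved = retrieve(query, chunks, k=2)
```

The two chunks that mention Acme are retrieved, while the chunks linking the product to its chip and the chip to the factory fire score zero and never reach the LLM. Raising k does not reliably help, because nothing in the scoring rewards a chunk for being connected to another relevant chunk.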