Why corpus size is the first branch
The most important variable in any RAG architecture decision is not which vector database has the best benchmark score — it is how many documents your system needs to search across. Corpus size determines whether retrieval is necessary at all, what kind of index structure is appropriate, and whether you need approximate or exact nearest-neighbor search. Getting this wrong early means either overengineering a simple use case or hitting hard performance ceilings as your corpus grows.
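To make the exact-versus-approximate distinction concrete, here is a minimal pure-Python sketch of exact (brute-force) nearest-neighbor search. It assumes embeddings are already computed and stored as plain lists of floats. The helper names `cosine` and `exact_top_k` are hypothetical and do not come from any library. The point is the cost model: every query scores every vector in the corpus.

```python
import heapq
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (hypothetical helper)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as unrelated to everything
    return dot / (norm_a * norm_b)


def exact_top_k(
    query: Sequence[float], corpus: list[Sequence[float]], k: int
) -> list[tuple[int, float]]:
    """Exact k-nearest-neighbor search by scoring every corpus vector.

    Cost is O(N * d) per query: trivial for thousands of vectors,
    prohibitive for hundreds of millions.
    """
    # Pair each similarity score with its corpus index.
    scored = ((cosine(query, vec), idx) for idx, vec in enumerate(corpus))
    # Keep the k highest-scoring pairs.
    best = heapq.nlargest(k, scored)
    return [(idx, score) for score, idx in best]
```

For example, `exact_top_k([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]], k=2)` returns indices 0 and 2. Approximate indexes such as HNSW or IVF exist precisely to avoid scoring every vector, trading a small amount of recall for much lower query latency on large corpora.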
The reason corpus size comes first is that the economics change by orders of magnitude at each threshold. At the small end, if the whole corpus fits inside a 128k-token context window, retrieval adds latency and complexity without adding accuracy. That window is smaller than it sounds, though: at a few hundred tokens per document it holds only a few hundred documents, so a corpus approaching ten thousand documents usually will not fit. The practical advantage below ten thousand documents is that exact, brute-force nearest-neighbor search is cheap enough to make an approximate index unnecessary. Between ten thousand and one million documents, a vanilla vector database with good chunking typically handles most factual lookup queries at acceptable cost. Above one million documents, sparse-dense hybrid search tends to outperform vanilla vector RAG, because BM25-style exact term matching catches rare proper nouns, product codes, and identifiers that dense embeddings frequently smooth over.
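One common way to combine the BM25 and dense result lists is reciprocal rank fusion (RRF). The sketch below assumes both retrievers have already returned ranked lists of document IDs; RRF is one fusion method among several (weighted score blending is another), not the only way to build hybrid search. The function name is hypothetical, and `k = 60` is simply a commonly used default for the smoothing constant.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs with reciprocal rank fusion.

    Each document scores sum(1 / (k + rank)) over every list it appears in,
    with rank starting at 1. Documents missing from a list get nothing from it.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

With `bm25 = ["sku-4471", "doc-a", "doc-b"]` and `dense = ["doc-a", "doc-c", "doc-b"]` (hypothetical IDs), the product code `sku-4471`, which the dense retriever missed entirely, still ranks third in the fused list, ahead of `doc-c`, which only the dense side returned. Because RRF works on ranks rather than raw scores, it sidesteps the problem that BM25 scores and cosine similarities live on incomparable scales.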
Above ten million documents, the architecture decision becomes more nuanced. Sharding a vector database across multiple nodes is operationally complex and expensive. GraphRAG, which builds a knowledge graph from the corpus during an offline processing step, can be more cost-effective for analytical workloads at this scale because query-time retrieval becomes largely a graph traversal rather than an approximate nearest-neighbor search across hundreds of millions of vectors. The tradeoff is a significant upfront build cost and a corpus that must be reprocessed when documents change substantially.
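The thresholds in this section can be summarized as a starting-point decision function. This is a hypothetical sketch: the tier boundaries come from the prose above, while the token-fit check, the `prompt_reserve` headroom for the question and answer, and the "exact vector search" tier for small corpora that do not fit in context are assumptions added for illustration. Real decisions also depend on query mix, update rate, and budget.

```python
from dataclasses import dataclass

# Tier boundaries taken from the prose; rough guides, not hard limits.
SMALL_CORPUS = 10_000
MEDIUM_CORPUS = 1_000_000
LARGE_CORPUS = 10_000_000


@dataclass
class Corpus:
    num_docs: int
    avg_tokens_per_doc: int


def choose_strategy(
    corpus: Corpus, context_window: int = 128_000, prompt_reserve: int = 8_000
) -> str:
    """Map corpus size to a starting architecture (hypothetical helper)."""
    total_tokens = corpus.num_docs * corpus.avg_tokens_per_doc
    # Skip retrieval only if the whole corpus fits with room left for the
    # question and the answer (prompt_reserve is an assumed headroom).
    if total_tokens <= context_window - prompt_reserve:
        return "long-context, no retrieval"
    if corpus.num_docs < SMALL_CORPUS:
        return "exact vector search"
    if corpus.num_docs < MEDIUM_CORPUS:
        return "vector database (ANN index)"
    if corpus.num_docs < LARGE_CORPUS:
        return "hybrid sparse-dense search"
    return "sharded hybrid search, or GraphRAG for analytical workloads"
```

For instance, 5,000 documents averaging 500 tokens total 2.5 million tokens, roughly twenty times a 128k window, so the function returns exact vector search rather than placing the whole corpus in the prompt.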