Architecture fundamentals: in-process vs server vs distributed
The single most important axis for choosing between ChromaDB, FAISS, and Milvus is architecture — specifically whether you want your vector index to live inside your application process, inside a separate server process on the same machine, or distributed across multiple machines. Each model has trade-offs that flow through everything else: operational complexity, fault tolerance, horizontal scalability, and the kinds of bugs you will debug at 2am.
**ChromaDB embedded mode** runs the entire vector index inside your Python or JavaScript process. There is no network hop, no server to spin up, no connection string to manage. You call `client = chromadb.Client()` for an in-memory index and you are done; for persistent mode you call `chromadb.PersistentClient(path=...)` instead, which writes to SQLite on disk. This is genuinely zero-infrastructure vector storage, appropriate for any project where the vector index needs to be 'just a library' rather than 'a service I operate'. The ceiling is roughly 10M vectors before embedded performance becomes a bottleneck, as of June 2026.
**ChromaDB client-server mode** (and Chroma Cloud) moves the index into a separate process reachable over HTTP. This gives you the standard client-server benefits (multiple application instances can share one index, the index persists independently of your app process) at the cost of a network hop and an infrastructure component to manage. Chroma Cloud is the managed version — you connect via API key, they handle the infrastructure.
**FAISS is always in-process**, with no server mode at all. This is a deliberate design choice: FAISS is a research-grade similarity search library, not a database service. You manage persistence yourself (calling `faiss.write_index()` / `faiss.read_index()`), you manage concurrency yourself, and you build any metadata filtering, multi-tenancy, or access control on top yourself. The payoff is that FAISS has the lowest latency of any option on this list, because there is no network hop and no server layer, only the cost of the math itself.
**Milvus is a distributed system** with real microservice components, including query nodes, data nodes, index nodes, etcd for cluster coordination, and MinIO or S3 for persistent storage. The minimum production Milvus deployment involves several containers. This is not a choice for a prototype; it is a choice for a production system where you need horizontal scaling, fault tolerance, and enterprise features. **Milvus Lite** is the escape hatch for development: an embedded mode with the same API surface as full Milvus, similar to how ChromaDB embedded works. You can promote to full Milvus without rewriting application code, typically by pointing the client at a server URI instead of a local file.
**The architectural decision tree in one sentence**: start with ChromaDB embedded, promote to Chroma Cloud or Milvus when you hit the 10M-vector ceiling or need multi-tenancy/RBAC, and consider FAISS only if you need maximum single-machine throughput and are willing to build the surrounding service layer yourself.
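The one-sentence decision tree can be written down as a small function. This is only a sketch of the rule of thumb above: the function name, parameter names, and return strings are invented for illustration, and checking the FAISS condition first is an assumption about how the criteria should be ordered.

```python
EMBEDDED_CEILING = 10_000_000  # approximate embedded ceiling cited in this section


def choose_vector_store(
    num_vectors,
    needs_multi_tenancy_or_rbac=False,
    needs_max_single_machine_throughput=False,
    will_build_service_layer=False,
):
    """Return a starting recommendation based on the rule of thumb above."""
    # FAISS only when raw throughput matters AND you accept building
    # persistence, concurrency, and filtering yourself.
    if needs_max_single_machine_throughput and will_build_service_layer:
        return "FAISS"
    # Promote once the embedded ceiling or access-control needs are hit.
    if num_vectors > EMBEDDED_CEILING or needs_multi_tenancy_or_rbac:
        return "Chroma Cloud or Milvus"
    # Default: start with the zero-infrastructure option.
    return "ChromaDB embedded"


print(choose_vector_store(num_vectors=200_000))
print(choose_vector_store(num_vectors=50_000_000))
print(
    choose_vector_store(
        num_vectors=5_000_000,
        needs_max_single_machine_throughput=True,
        will_build_service_layer=True,
    )
)
```

Real decisions involve more inputs than these four, so treat the function as a way to check your reasoning rather than a substitute for benchmarking.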