A vector database stores high-dimensional vectors (typically 384 to 3072 floating-point numbers, depending on the embedding model) and supports fast nearest-neighbor search across millions or billions of them. The fundamental operation is: given a query vector, find the k vectors in the database that are closest to it, measured by cosine similarity, dot product, or Euclidean distance. Brute-force search (comparing the query against every stored vector) is exact but far too slow at scale. So vector databases use approximate nearest-neighbor (ANN) algorithms that trade a tiny amount of accuracy for massive speed gains — typically finding 95–99% of the true nearest neighbors while searching only a small fraction of the index.
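The exact brute-force baseline that ANN algorithms approximate can be written in a few lines. This is an illustrative sketch, not any particular database's implementation: it normalizes vectors so a dot product equals cosine similarity, scores the query against every stored vector, and returns the top-k indices.

```python
import numpy as np

def brute_force_knn(query: np.ndarray, vectors: np.ndarray, k: int = 3) -> list:
    """Exact k-nearest-neighbor search by cosine similarity.

    Compares the query against every stored vector -- correct but O(n),
    which is exactly what makes it too slow at millions of vectors.
    """
    # Normalize so that a plain dot product equals cosine similarity.
    q = query / np.linalg.norm(query)
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = v @ q                          # one similarity score per stored vector
    return list(np.argsort(-sims)[:k])    # indices of the k most similar vectors

# Toy example: 1000 random 384-dimensional vectors (a common embedding size).
rng = np.random.default_rng(0)
db = rng.normal(size=(1000, 384))
query = db[42] + 0.01 * rng.normal(size=384)  # slightly perturbed copy of vector 42
print(brute_force_knn(query, db, k=3))        # vector 42 should rank first
```

An ANN index answers the same query while scoring only a small fraction of the vectors, which is where the 95–99% recall trade-off comes from.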
The most common ANN algorithm is HNSW (Hierarchical Navigable Small World), used by Qdrant, Weaviate, pgvector, and many others. HNSW builds a multi-layered graph where each vector is a node connected to its nearest neighbors. Searching starts at the top layer (sparse, long-range connections) and drills down to lower layers (dense, short-range connections), like zooming in on a map. It's fast, accurate, and works well for datasets up to a few hundred million vectors. The trade-off is memory: HNSW keeps the graph in RAM, so you need enough memory to hold your vectors plus the graph overhead. For a million 1536-dimensional vectors (OpenAI's ada-002 output), that's roughly 6–8 GB. Alternatives like IVF (inverted file index) and ScaNN use less memory but require more tuning. Pinecone and some Qdrant configurations use quantization — compressing vectors from float32 to int8 or binary — to fit more vectors in the same memory at the cost of slight accuracy loss.
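The memory figure above can be sanity-checked with a back-of-envelope calculation. This is a rough sketch, not an exact formula for any specific engine: it assumes float32 vectors (4 bytes per dimension), 4-byte neighbor ids, the common default of M=16 links per node, and folds the sparser upper layers into a crude 1.2x multiplier on the layer-0 graph.

```python
def hnsw_memory_estimate_gb(num_vectors: int, dim: int, m: int = 16) -> float:
    """Rough RAM estimate for an HNSW index (illustrative, not exact).

    vector_bytes: the raw float32 embeddings themselves.
    graph_bytes:  layer 0 stores up to 2*M neighbor ids per node; upper
                  layers and bookkeeping are folded into a 1.2x multiplier.
    """
    vector_bytes = num_vectors * dim * 4            # float32 = 4 bytes/dim
    graph_bytes = num_vectors * (2 * m) * 4 * 1.2   # neighbor lists + overhead
    return (vector_bytes + graph_bytes) / 1024**3

# 1M vectors at 1536 dimensions (ada-002 sized):
print(round(hnsw_memory_estimate_gb(1_000_000, 1536), 1))
```

The estimate lands just under 6 GB; allocator and runtime overhead in a real deployment push it toward the 6–8 GB cited above. Note the vectors dominate: the graph itself is only a few percent, which is why quantizing the vectors is the lever that matters.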
Choosing between the major vector databases depends on your constraints. Qdrant and Weaviate are open-source and self-hostable, which matters for data privacy and cost control — you run them on your own infrastructure and pay only for compute. Pinecone is fully managed (no infra to operate) but vendor-locked and priced per vector, which gets expensive at scale. ChromaDB is lightweight and embedded (runs in-process, stores to disk), great for prototyping and small datasets but not built for production workloads with millions of vectors. PostgreSQL with the pgvector extension is appealing if you already run Postgres, since you avoid adding a new database to your stack, but its query performance falls behind purpose-built vector databases at larger scales. For most production RAG systems, Qdrant or Weaviate give you the best balance of performance, features, and operational control.
Metadata filtering is a feature that separates serious vector databases from toy implementations. In practice, you almost never want to search your entire collection — you want to search "all documents uploaded by this user" or "only documents from the last 30 days" or "only chunks from this specific PDF." Vector databases let you store metadata alongside each vector and apply filters before or during the similarity search. This is called pre-filtering (filter first, then search the reduced set) or post-filtering (search everything, then discard results that don't match). Pre-filtering is more efficient but requires the index to support it; most production databases now do. Getting your metadata schema right at indexing time saves enormous pain later — retrofitting filters onto a collection that wasn't designed for them often means re-indexing everything.
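The pre- versus post-filtering distinction is easy to see with a toy in-memory index. The schema below (a `user_id` field per vector) is hypothetical; real databases like Qdrant or Weaviate expose the same idea through their filter APIs rather than Python lists.

```python
import numpy as np

# Toy "collection": 500 vectors, each with metadata (5 users, 100 docs each).
rng = np.random.default_rng(1)
vectors = rng.normal(size=(500, 64))
metadata = [{"user_id": i % 5, "doc": f"doc-{i}"} for i in range(500)]

def cosine_topk(query, vecs, ids, k):
    """Score a set of vectors and return the ids of the top-k matches."""
    q = query / np.linalg.norm(query)
    v = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
    order = np.argsort(-(v @ q))[:k]
    return [ids[i] for i in order]

def pre_filter_search(query, user_id, k=5):
    # Pre-filtering: narrow to matching vectors first, then search only those.
    ids = [i for i, m in enumerate(metadata) if m["user_id"] == user_id]
    return cosine_topk(query, vectors[ids], ids, k)

def post_filter_search(query, user_id, k=5, overfetch=50):
    # Post-filtering: search everything, then discard non-matching hits.
    # Requires over-fetching, and can still return fewer than k results
    # when matching vectors are rare in the top of the unfiltered ranking.
    hits = cosine_topk(query, vectors, list(range(len(vectors))), overfetch)
    return [i for i in hits if metadata[i]["user_id"] == user_id][:k]

query = rng.normal(size=64)
print(pre_filter_search(query, user_id=3))
print(post_filter_search(query, user_id=3))
```

The post-filtering version's `overfetch` parameter is the tell: if the filter is very selective (say, one user owns 0.1% of the collection), no reasonable over-fetch guarantees k results, which is why index-level pre-filtering support matters in production.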
Vector databases existed before the current AI wave — Spotify used approximate nearest-neighbor search for music recommendations years ago, and Facebook's Faiss library has been around since 2017. But the explosion of embedding models and RAG in 2023–2024 turned them from a niche technology into critical infrastructure. The space is still maturing fast: multi-tenancy (efficiently isolating data between customers in a shared deployment), hybrid search (combining vector and keyword search in a single query), and on-disk indexing (handling datasets larger than RAM) are all areas where the products differ significantly and are improving rapidly. If you're starting a project today, pick a database that handles your current scale, supports metadata filtering and hybrid search, and has an active maintenance trajectory. You can always migrate later — the embedding vectors are portable.