The pipeline: (1) encode your documents into embeddings using a model like BGE, E5, or Voyage, (2) store these embeddings in a vector database (Pinecone, Qdrant, Weaviate, pgvector), (3) when a query arrives, encode it with the same model, (4) find the nearest embeddings using a similarity metric such as cosine similarity or dot product. The query "how to fix a memory leak" matches a document titled "debugging RAM consumption in Node.js" because their embeddings are close in vector space, even though the two strings share almost no words.
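Step (4) of the pipeline can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three-dimensional vectors are hand-written stand-ins for real model output (actual embeddings have hundreds or thousands of dimensions), the dictionary `index` stands in for a vector database, and the names `cosine_similarity` and `nearest` are hypothetical helpers rather than any library's API.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query_vec, index, k=2):
    """Return the k (doc_id, score) pairs most similar to query_vec."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy "embeddings": invented values, NOT produced by a real model.
index = {
    "debugging RAM consumption in Node.js": [0.9, 0.1, 0.2],
    "configuring TLS certificates": [0.1, 0.9, 0.1],
    "intro to CSS grid": [0.0, 0.2, 0.9],
}

# Stand-in for the embedding of "how to fix a memory leak".
query_vec = [0.8, 0.2, 0.1]

results = nearest(query_vec, index, k=2)
for doc_id, score in results:
    print(f"{score:.3f}  {doc_id}")
```

A brute-force scan like this is fine for a few thousand documents. Vector databases typically replace it with an approximate nearest-neighbor index so that lookups stay fast as the collection grows.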
Pure semantic search has a weakness: it can miss exact matches that keyword search catches easily. If someone searches for the error code "ERR_SSL_PROTOCOL_ERROR", semantic search might return general SSL troubleshooting pages instead of the document that contains that exact string. Hybrid search combines both approaches: keyword matching (typically BM25) for precision on exact terms and semantic search for recall on paraphrased queries, with the two result lists then merged into a single ranking. Many production search systems use a hybrid approach.
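The text does not say how the two result lists are merged. One common choice is reciprocal rank fusion (RRF), which ignores raw scores and rewards documents that rank well in either list. The sketch below assumes RRF with the conventional constant `k=60`; the document IDs and the function name `reciprocal_rank_fusion` are hypothetical, and the two input lists stand in for what a BM25 engine and a vector index might return.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of doc IDs into one list.

    Each document earns 1 / (k + rank) from every list it appears in,
    where rank starts at 1. Higher total score ranks first.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

# Hypothetical results for the query "ERR_SSL_PROTOCOL_ERROR".
keyword_hits = [
    "err-ssl-protocol-error-fix",   # exact string match ranks first
    "ssl-handshake-basics",
]
semantic_hits = [
    "ssl-troubleshooting-guide",    # topically close, no exact match
    "err-ssl-protocol-error-fix",
    "ssl-handshake-basics",
]

fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
print(fused)
```

Because the exact-match page appears near the top of both lists, it ends up ahead of the general troubleshooting guide that only the semantic search surfaced. Weighted score blending is another option, but it requires normalizing BM25 and similarity scores onto a comparable scale first.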
The quality of semantic search depends heavily on the embedding model. General-purpose models (OpenAI's text-embedding-3 family, Cohere Embed) work well for most text. Domain-specific models (trained on medical, legal, or code data) often outperform general models within their domain. Multilingual models enable cross-language search, where a query in one language retrieves documents written in another. The MTEB leaderboard benchmarks embedding models across many tasks and is a good starting point for choosing one.