
Embedding

Also known as: Vector Embedding
A way to represent text (or images, or audio) as a list of numbers (a vector) that captures its meaning. Similar concepts end up close together in this number space — "cat" and "kitten" are nearby, while "cat" and "economics" are far apart.

Why it matters

Embeddings are the foundation of semantic search and RAG. They're how AI understands that a search for "fix login bug" should match a document about "authentication error resolution" even though no words overlap.

Deep Dive

An embedding model takes a piece of text — a sentence, a paragraph, a whole document — and compresses it into a fixed-length vector of floating-point numbers, typically somewhere between 384 and 4096 dimensions. The magic is in how those numbers are arranged: during training, the model learns to place semantically similar texts near each other in this high-dimensional space and push dissimilar texts apart. The standard training approach uses contrastive learning, where the model sees pairs of texts that are related (a question and its answer, a sentence and its paraphrase) and learns to minimize the distance between their vectors while maximizing the distance from unrelated pairs. Models like BAAI's bge-large-en, OpenAI's text-embedding-3, and Cohere's embed-v3 all use this general recipe, though they differ in architecture, training data, and the specific contrastive objectives they optimize.
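The contrastive objective described above can be sketched in a few lines. This is a toy InfoNCE-style loss on hand-made 3-dimensional vectors (real embeddings have hundreds or thousands of dimensions, and real training runs over large batches); the function names and toy vectors are illustrative, not from any particular library:

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def info_nce(anchor, positive, negatives, temperature=0.07):
    # InfoNCE-style contrastive loss: low when the positive sits close
    # to the anchor relative to the negatives, high when it does not.
    sims = [cosine(anchor, positive)] + [cosine(anchor, n) for n in negatives]
    logits = [s / temperature for s in sims]
    m = max(logits)                      # subtract max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    return -math.log(exps[0] / sum(exps))

# Toy "embeddings": the positive points nearly the same way as the anchor.
anchor   = [1.0, 0.0, 0.0]
positive = [0.9, 0.1, 0.0]
negative = [0.0, 1.0, 0.0]

loss_good = info_nce(anchor, positive, [negative])  # related pair
loss_bad  = info_nce(anchor, negative, [positive])  # unrelated pair
print(loss_good < loss_bad)  # → True
```

Training nudges the model's weights so that related pairs produce the low-loss arrangement; over many examples this is what places "cat" near "kitten".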

The Retrieval Pipeline

In practice, you use embeddings by first encoding your documents into vectors and storing them in a vector database like Qdrant, Pinecone, Milvus, or FAISS. At query time, you encode the user's question into a vector using the same model and perform a nearest-neighbor search to find the most similar document vectors. The distance metric matters — cosine similarity is the most common, but some models are trained for dot product or Euclidean distance. One thing that trips people up: you must use the same embedding model for both documents and queries. Vectors from different models live in completely different spaces and cannot be compared, even if they happen to have the same number of dimensions.
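The pipeline above reduces to: embed documents once, embed the query at search time with the same model, and rank by similarity. In this sketch a toy character-trigram hasher stands in for a real embedding model (a real system would call the same encoder or API for both sides); the documents and the `embed` function are made up for illustration:

```python
import hashlib
import math

DIMS = 64

def embed(text):
    # Toy stand-in for a real embedding model: hash character trigrams
    # into a fixed-length vector. A real deployment calls one actual
    # model for BOTH documents and queries — never mix models.
    vec = [0.0] * DIMS
    t = text.lower()
    for i in range(len(t) - 2):
        h = int(hashlib.md5(t[i:i + 3].encode()).hexdigest(), 16)
        vec[h % DIMS] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]       # unit-normalize: dot == cosine

# Indexing step: encode every document and store (text, vector) pairs.
docs = [
    "authentication error resolution guide",
    "quarterly economics report",
    "kitten adoption checklist",
]
index = [(d, embed(d)) for d in docs]

def search(query, k=1):
    # Query step: encode with the SAME model, then nearest-neighbor scan.
    qv = embed(query)
    scored = [(sum(a * b for a, b in zip(qv, dv)), d) for d, dv in index]
    return [d for _, d in sorted(scored, reverse=True)[:k]]

print(search("authentication error fix"))
```

A production system swaps the brute-force scan for an approximate nearest-neighbor index (HNSW or similar) inside the vector database, but the contract is the same: one model, two encode calls, one similarity ranking.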

Dimensions and Trade-offs

The dimensionality of the embedding vector represents a trade-off between expressiveness and cost. A 1536-dimensional vector can capture more nuance than a 384-dimensional one, but it also costs four times as much to store and search: for a million documents, a float32 index takes roughly 6 GB at 1536 dimensions versus about 1.5 GB at 384, before any index overhead. Some newer models support Matryoshka embeddings, where you can truncate the vector to fewer dimensions with graceful degradation — use the full 1024 dimensions for your most important collection and the first 256 for a less critical one. Quantization helps too: storing vectors as INT8 instead of float32 cuts memory by 4x with surprisingly little accuracy loss, which is why production systems increasingly use quantized embeddings.

The Limits of Similarity

A common misconception is that embedding models understand meaning the way humans do. They are very good at surface-level semantic similarity — synonyms, paraphrases, related concepts — but they can struggle with negation ("the restaurant was not good" and "the restaurant was good" often end up close together), with complex logical relationships, and with domain-specific jargon they were not trained on. This is why retrieval-augmented generation systems often combine vector search with keyword search (hybrid search) and use a reranker model as a second pass to improve precision. The embedding retrieves a broad set of candidates; the reranker, which is slower but more accurate, sorts them by actual relevance. Getting this pipeline right matters far more than picking the embedding model with the highest score on the MTEB leaderboard.
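One common way to combine the vector and keyword result lists in a hybrid setup — an assumption here, since the article does not name a fusion method — is reciprocal rank fusion, which rewards documents ranked highly by either retriever before the reranker sees them:

```python
def reciprocal_rank_fusion(rankings, k=60):
    # Reciprocal rank fusion: each ranking contributes 1 / (k + rank)
    # per document; k=60 is a conventional smoothing constant.
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical result lists from the two first-pass retrievers.
vector_hits  = ["doc_a", "doc_c", "doc_b"]   # nearest-neighbor search
keyword_hits = ["doc_b", "doc_a", "doc_d"]   # e.g. BM25 keyword search

candidates = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(candidates[0])  # → doc_a (ranked highly by both retrievers)
```

The fused candidate list then goes to the reranker, which scores each query-document pair directly and produces the final ordering.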
