Grounding is the practice of tethering a model's outputs to verifiable external information, and it exists because language models have a fundamental architectural limitation: they do not know what they know. A model's training data is baked into its weights as statistical patterns, not as a retrievable database of facts. It cannot look up whether a specific claim is in its training set or check a date against a reliable source. Grounding compensates for this by giving the model access to authoritative data at inference time, so it can base answers on provided evidence rather than pattern-matched recall.
The most common grounding technique in production today is retrieval-augmented generation (RAG). The basic pattern is straightforward: take the user's question, use it to search a knowledge base (usually a vector database with embedded document chunks), retrieve the most relevant passages, and include them in the model's context alongside the question. The model then generates an answer based on those retrieved passages. Google's Vertex AI, Amazon Bedrock, and most enterprise AI platforms offer RAG pipelines as managed services. The key insight is that you are shifting the model's job from "recall facts from training" to "synthesize an answer from provided documents" — a task models are much more reliable at.
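The retrieve-then-generate pattern can be sketched in a few lines. This is a toy illustration, not any platform's API: token-overlap scoring stands in for a real embedding model and vector database, and the prompt template is an assumption.

```python
# Minimal RAG sketch. A real pipeline would embed the query and chunks
# with an embedding model and search a vector index; token overlap is a
# crude stand-in so the example is self-contained.

def _tokens(text: str) -> set[str]:
    """Lowercase tokens with trailing punctuation stripped."""
    return {t.strip(".,?!").lower() for t in text.split()}

def score(query: str, chunk: str) -> float:
    """Crude relevance: fraction of query tokens present in the chunk."""
    q, c = _tokens(query), _tokens(chunk)
    return len(q & c) / len(q) if q else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the top-k chunks by relevance score."""
    return sorted(chunks, key=lambda ch: score(query, ch), reverse=True)[:k]

def build_prompt(query: str, passages: list[str]) -> str:
    """Place retrieved passages in the context ahead of the question."""
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using only the passages below. Cite passage numbers.\n\n"
        f"{context}\n\nQuestion: {query}\nAnswer:"
    )

chunks = [
    "The refund window is 30 days from the delivery date.",
    "Support hours are 9am to 5pm, Monday through Friday.",
    "Shipping to Canada takes 5 to 7 business days.",
]
query = "What is the refund window?"
prompt = build_prompt(query, retrieve(query, chunks))
```

The prompt that comes out is what actually gets sent to the model: the question plus the evidence it should synthesize from, which is the whole job shift described above.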
Web search grounding takes a different approach. Instead of searching a private knowledge base, the model queries the live web and incorporates results into its response. Perplexity built its entire product around this idea. Google's Gemini models can access Google Search directly. ChatGPT's browsing feature does something similar. The advantage over RAG is freshness — web search grounding can answer questions about events that happened yesterday, while a RAG system is only as current as its last index update. The downside is that the web itself contains misinformation, so you are trading one source of error for another.
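The search-then-generate loop looks much like RAG with the retriever swapped out. In this sketch, `web_search` is a hypothetical stand-in for a real search API call, and the result fields and prompt wording are assumptions; the illustrative part is carrying each result's date and URL into the prompt so the model can weigh freshness and the reader can check sources.

```python
# Sketch of web search grounding. `web_search` is a hypothetical stub;
# a production system would call an actual search API here.

from datetime import date

def web_search(query: str) -> list[dict]:
    """Stand-in for a search API: returns title, snippet, URL, and date."""
    return [
        {"title": "Q3 earnings report", "snippet": "Revenue rose 12%...",
         "url": "https://example.com/q3", "date": date(2024, 10, 25)},
    ]

def grounded_prompt(query: str) -> str:
    """Format live search results as evidence ahead of the question."""
    results = web_search(query)
    sources = "\n".join(
        f"- {r['title']} ({r['date'].isoformat()}): {r['snippet']} [{r['url']}]"
        for r in results
    )
    return (
        "Answer from the search results below. Include the source URL for "
        "each claim, and note each result's date, since web results can be "
        "stale or wrong.\n\n"
        f"{sources}\n\nQuestion: {query}\nAnswer:"
    )
```

Surfacing dates and URLs in the prompt is one small mitigation for the misinformation trade-off: it does not make the web trustworthy, but it makes the model's claims checkable.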
Citation requirements are a lighter-weight form of grounding that works at the prompt level. When you tell a model "Only make claims you can attribute to the provided documents, and cite your sources inline," you are not giving it new capabilities — you are constraining its behavior to stay closer to verifiable material. This works surprisingly well in practice, especially with capable models like Claude or GPT-4. The model will often decline to answer or explicitly flag uncertainty rather than fabricate a citation, because generating a fake citation that looks structurally correct is harder than just saying "I don't have that information." That said, citation grounding is not foolproof. Models can still hallucinate citations that look plausible but reference the wrong section or misrepresent what a source actually says.
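One way to partially close the fake-citation gap is to pair the prompt-level constraint with a mechanical check after generation. A sketch, with an assumed `[doc-N]` citation format (not a standard): the validator catches citations that point at documents you never provided, though not citations that misrepresent what a real document says.

```python
# Sketch: a citation-constraining instruction plus a post-hoc check that
# every inline citation references a document that was actually supplied.
# The [doc-N] format is an assumption for illustration.

import re

CITATION_PROMPT = (
    "Only make claims you can attribute to the provided documents. "
    "Cite each claim inline as [doc-N]. If the documents do not contain "
    "the answer, say 'I don't have that information.'"
)

def invalid_citations(answer: str, num_docs: int) -> list[str]:
    """Return citations that reference documents we never provided."""
    cited = re.findall(r"\[doc-(\d+)\]", answer)
    return [f"[doc-{n}]" for n in cited if not (1 <= int(n) <= num_docs)]

answer = "The policy allows 30-day refunds [doc-1], per the 2023 update [doc-4]."
bad = invalid_citations(answer, num_docs=2)  # → ['[doc-4]']
```

A flagged answer can be regenerated or routed to a fallback; the check is cheap, but it only verifies structure, which is exactly the residual risk the paragraph above describes.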
A practical trap with grounding is over-reliance on retrieval quality. If your RAG pipeline retrieves irrelevant chunks — because the embeddings did not capture the query's intent, or the chunking strategy split a critical passage across two chunks — the model will ground its answer on the wrong material and produce a confidently wrong response with citations. Grounding does not eliminate hallucination; it changes the failure mode. Instead of the model inventing facts from nothing, it can now misinterpret or over-extrapolate from real sources. Good grounding requires good retrieval, which means investing in embedding quality, chunk sizing, reranking, and evaluation — not just plugging a vector database into your pipeline and calling it done.
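Evaluating retrieval separately from generation is the concrete version of that investment. A common starting metric is recall@k over a small labeled set of query-to-relevant-chunk pairs; in this sketch, `retrieve_ids` is a hypothetical hook into your retriever and the toy index is made up.

```python
# Sketch of a retrieval evaluation: recall@k over labeled
# (query, relevant chunk id) pairs, independent of the generation step.

def recall_at_k(labeled: list[tuple[str, str]], retrieve_ids, k: int = 5) -> float:
    """Fraction of queries whose relevant chunk appears in the top k results."""
    hits = sum(1 for query, relevant_id in labeled
               if relevant_id in retrieve_ids(query, k))
    return hits / len(labeled)

# Toy retriever: fixed id lists per query, standing in for vector search.
fake_index = {
    "refund window": ["c1", "c7"],
    "shipping time": ["c9", "c3"],
}
def retrieve_ids(query: str, k: int) -> list[str]:
    return fake_index.get(query, [])[:k]

labeled = [("refund window", "c7"), ("shipping time", "c2")]
score = recall_at_k(labeled, retrieve_ids, k=2)  # 1 of 2 queries hit → 0.5
```

If recall@k is low, no amount of prompt engineering downstream will save the answer, so this number is worth tracking before and after any change to embeddings, chunking, or reranking.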