Basics

Retrieval

Information Retrieval, IR
The process of finding relevant documents, passages, or data in a large collection in response to a query. In AI, retrieval is the "R" in RAG: the step that fetches relevant context before handing it to a language model. Retrieval can use keyword matching (BM25), semantic similarity (embeddings), or a hybrid of the two.

Why It Matters

Retrieval is what makes LLMs useful in real applications. A model's internal knowledge is static, incomplete, and sometimes wrong. Retrieval gives it access to current, accurate, domain-specific information at inference time. The quality of your retrieval pipeline directly determines the quality of your RAG system: even the best LLM cannot produce good answers from bad context.

Deep Dive

Traditional retrieval (BM25, TF-IDF) matches query keywords against document keywords, weighted by frequency and importance. It's fast, interpretable, and excellent for exact matches. Semantic retrieval encodes queries and documents as embeddings and finds nearest neighbors in vector space. It handles paraphrase and conceptual similarity but can miss exact keyword matches. Hybrid retrieval combines both, typically using reciprocal rank fusion to merge results.
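Reciprocal rank fusion itself is simple to implement. A minimal sketch (the document IDs and ranked lists here are made up for illustration; k=60 is the constant commonly used in RRF):

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of doc IDs into one fused ranking.

    Each document scores sum(1 / (k + rank)) across the lists it
    appears in, so items ranked highly by multiple retrievers rise
    to the top. Ranks are 1-based.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical result lists from the two retrievers:
bm25_hits = ["d3", "d1", "d7"]     # keyword (BM25) ranking
vector_hits = ["d1", "d5", "d3"]   # embedding (semantic) ranking
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
```

Note that RRF only needs ranks, not raw scores, which is why it works well for merging retrievers whose score scales are incomparable.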

Chunking Strategy

For RAG, documents must be split into chunks before embedding. Chunk size is a critical design decision: too small and you lose context, too large and you dilute relevant information with noise. Common strategies include fixed-size chunks with overlap, sentence-level splitting, paragraph-level splitting, and recursive splitting that respects document structure (headers, sections). The optimal approach depends on your documents and queries.
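The simplest of these strategies, fixed-size chunks with overlap, can be sketched in a few lines (character-based splitting for illustration; production systems often split on tokens or sentence boundaries instead):

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into fixed-size character chunks with overlap.

    The overlap ensures a sentence that straddles a chunk boundary
    appears whole in at least one chunk.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = chunk_text("abcdefghij" * 50, chunk_size=200, overlap=50)
```

Each chunk's last 50 characters repeat as the next chunk's first 50, which trades some index size for boundary robustness.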

Reranking

A common pattern: retrieve a broad set of candidates (say 50) using fast retrieval, then rerank them using a more accurate (but slower) model. Cross-encoder rerankers (like Cohere Rerank or BGE-Reranker) process query-document pairs together, producing more accurate relevance scores than embedding similarity alone. This two-stage pipeline balances speed (fast initial retrieval) with accuracy (precise reranking of the top candidates).
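The two-stage pattern can be sketched generically. Here `fast_score` and `slow_score` are placeholder functions standing in for embedding similarity and a cross-encoder reranker; in a real system the first stage would query an index rather than scan the whole corpus:

```python
def two_stage_search(query, corpus, fast_score, slow_score,
                     n_candidates=50, top_k=5):
    """Stage 1: rank the corpus with a cheap scorer, keep candidates.
    Stage 2: rerank only those candidates with a slower, more
    accurate scorer and return the top results."""
    candidates = sorted(corpus, key=lambda d: fast_score(query, d),
                        reverse=True)[:n_candidates]
    return sorted(candidates, key=lambda d: slow_score(query, d),
                  reverse=True)[:top_k]

# Toy demo with word-overlap scoring standing in for both stages:
corpus = ["cats purr", "dogs bark", "cats and dogs", "fish swim"]
overlap = lambda q, d: len(set(q.split()) & set(d.split()))
result = two_stage_search("cats dogs", corpus, overlap, overlap,
                          n_candidates=3, top_k=1)
```

The design point is that the expensive scorer runs on only `n_candidates` documents, so its cost is bounded regardless of corpus size.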
