Zubnet AI Learning Wiki › Retrieval
Basics

Retrieval

Information Retrieval (IR)

The process of finding relevant documents, passages, or data in a large collection in response to a query. In AI, retrieval is the "R" in RAG: the step that fetches relevant context before handing it to a language model. Retrieval can use keyword matching (BM25), semantic similarity (embeddings), or hybrid methods that combine both.

Why It Matters

Retrieval is what makes LLMs useful for real applications. A model's internal knowledge is static, incomplete, and sometimes wrong. Retrieval gives it access to current, accurate, domain-specific information at inference time. The quality of your retrieval pipeline directly determines the quality of your RAG system: even the best LLM cannot produce good answers from bad context.

Deep Dive

Traditional retrieval (BM25, TF-IDF) matches query keywords against document keywords, weighted by frequency and importance. It's fast, interpretable, and excellent for exact matches. Semantic retrieval encodes queries and documents as embeddings and finds nearest neighbors in vector space. It handles paraphrase and conceptual similarity but can miss exact keyword matches. Hybrid retrieval combines both, typically using reciprocal rank fusion to merge results.
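The merge step of hybrid retrieval can be sketched as follows. This is a minimal illustration, not a specific library's API; `k=60` is the constant commonly used with reciprocal rank fusion, and the doc IDs are made up.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of doc IDs into one fused ranking.

    Each document's fused score is the sum of 1 / (k + rank) over every
    list it appears in, so documents ranked well by both retrievers
    rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# A BM25 ranking and an embedding ranking that partially disagree:
bm25_ranking = ["d1", "d2", "d3"]
dense_ranking = ["d3", "d1", "d4"]
fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
# → ['d1', 'd3', 'd2', 'd4']: d1 appears near the top of both lists,
# so it beats d3, which is first in one list but only third in the other.
```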

Chunking Strategy

For RAG, documents must be split into chunks before embedding. Chunk size is a critical design decision: too small and you lose context, too large and you dilute relevant information with noise. Common strategies include fixed-size chunks with overlap, sentence-level splitting, paragraph-level splitting, and recursive splitting that respects document structure (headers, sections). The optimal approach depends on your documents and queries.
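The simplest of these strategies, fixed-size chunks with overlap, can be sketched in a few lines. Character-based splitting is used here for clarity; production systems typically count tokens instead, and the size/overlap values are illustrative.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into fixed-size character chunks with overlap.

    The overlap keeps content that straddles a chunk boundary visible
    in both neighboring chunks, at the cost of some duplication.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# A 500-character document with chunk_size=200 and overlap=50 yields
# chunks starting at positions 0, 150, 300, and 450.
chunks = chunk_text("a" * 500)
```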

Reranking

A common pattern: retrieve a broad set of candidates (say 50) using fast retrieval, then rerank them using a more accurate (but slower) model. Cross-encoder rerankers (like Cohere Rerank or BGE-Reranker) process query-document pairs together, producing more accurate relevance scores than embedding similarity alone. This two-stage pipeline balances speed (fast initial retrieval) with accuracy (precise reranking of the top candidates).
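The two-stage pattern can be sketched as below. The `fast_score` and `rerank_score` callables are placeholders for a real first-stage retriever (BM25 or embedding similarity) and a real cross-encoder; the toy word-overlap scorer exists only to make the example runnable.

```python
def retrieve_then_rerank(query, corpus, fast_score, rerank_score,
                         n_candidates=50, top_k=5):
    # Stage 1: score every document with the cheap function and keep
    # a broad candidate set.
    candidates = sorted(corpus, key=lambda doc: fast_score(query, doc),
                        reverse=True)[:n_candidates]
    # Stage 2: re-score only the shortlist with the slower, more
    # accurate function and return the best few.
    return sorted(candidates, key=lambda doc: rerank_score(query, doc),
                  reverse=True)[:top_k]

# Toy scorer standing in for both stages (word overlap with the query):
overlap = lambda q, d: len(set(q.split()) & set(d.split()))
results = retrieve_then_rerank(
    "cat", ["the cat sat", "dogs bark loudly", "cat videos online"],
    fast_score=overlap, rerank_score=overlap, top_k=2)
```

The key property is cost asymmetry: `fast_score` runs over the whole corpus, while `rerank_score` runs only over `n_candidates` documents, so an expensive model stays affordable.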
