Retrieval: Definition and Meaning (AI Wiki)

The process of finding relevant documents, passages, or data in a large collection in response to a query. In AI, retrieval is the "R" in RAG (retrieval-augmented generation): the step where relevant context is fetched before it is passed to the language model. Retrieval can use keyword matching (BM25), semantic similarity (embeddings), or hybrid approaches that combine both.

Why It Matters

Retrieval is what makes LLMs practical for real-world applications. A model's internal knowledge is static, incomplete, and sometimes wrong. Retrieval gives it access to current, accurate, domain-specific information at inference time. The quality of your retrieval pipeline directly determines the quality of your RAG system: even the best LLM cannot produce good answers from poor context.

Deep Dive

Traditional retrieval (BM25, TF-IDF) matches query keywords against document keywords, weighted by how often a term appears and how important it is. It is fast, interpretable, and excellent for exact matches. Semantic retrieval encodes queries and documents as embeddings and finds nearest neighbors in vector space. It handles paraphrase and conceptual similarity but can miss exact keyword matches. Hybrid retrieval combines the two, usually merging the results with reciprocal rank fusion.
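
As an illustration of the merge step in hybrid retrieval, here is a minimal sketch of reciprocal rank fusion. The function name, the document IDs, and the two input rankings are hypothetical; the constant k=60 is a commonly used default, not something this article prescribes. Each document scores 1 / (k + rank) in every list it appears in, and the scores are summed.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one ranking.

    rankings: list of lists, each ordered best-first.
    k: smoothing constant (60 is a conventional choice, assumed here).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # A document earns more from a list the higher it is ranked there.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a keyword retriever and a semantic retriever.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
semantic_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
print(fused)
```

Because the fusion uses only ranks, it does not need the keyword and embedding scores to be on comparable scales, which is one reason it is a popular choice for this step.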

Chunking Strategy

For RAG, documents must be split into chunks before embedding. Chunk size is a critical design decision: too small and you lose context; too large and you dilute relevant information with noise. Common strategies include fixed-size chunks with overlap, sentence-level splitting, paragraph-level splitting, and recursive splitting that respects document structure (headers, sections). The best approach depends on your documents and queries.
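
The simplest of these strategies, fixed-size chunks with overlap, can be sketched as follows. This is an illustrative version that counts characters; real pipelines often count tokens instead, and the function name and default sizes are assumptions rather than recommendations.

```python
def chunk_with_overlap(text, chunk_size=200, overlap=50):
    """Split text into fixed-size character chunks that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk reaches the end, to avoid a trailing fragment
        # that is entirely contained in the previous chunk.
        if start + chunk_size >= len(text):
            break
    return chunks


# Tiny example so the overlap is easy to see.
print(chunk_with_overlap("abcdefghij", chunk_size=4, overlap=2))
```

The overlap means a sentence cut at one chunk boundary still appears whole in the neighboring chunk, at the cost of storing and embedding some text twice.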

Reranking

A common pattern: use fast retrieval to fetch a broad set of candidates (say, 50), then rerank them with a more accurate but slower model. Cross-encoder rerankers (such as Cohere Rerank or BGE-Reranker) process each query-document pair together, producing more accurate relevance scores than embedding similarity alone. This two-stage pipeline balances speed (fast initial retrieval) with accuracy (precise reranking of the top candidates).
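
The shape of this two-stage pipeline can be sketched without any particular library. In the example below, both scoring functions are toy stand-ins chosen so the code runs on its own: word overlap plays the role of the fast retriever, and Jaccard similarity plays the role of the slower reranker. They are not how Cohere Rerank or BGE-Reranker work, and all names are hypothetical.

```python
def retrieve_then_rerank(query, documents, fast_score, slow_score,
                         n_candidates=50, top_k=5):
    """Stage 1: cheap scoring over everything. Stage 2: costly scoring over few."""
    candidates = sorted(documents, key=lambda d: fast_score(query, d),
                        reverse=True)[:n_candidates]
    reranked = sorted(candidates, key=lambda d: slow_score(query, d),
                      reverse=True)
    return reranked[:top_k]


def word_overlap(query, doc):
    # Toy stand-in for a fast first-stage retriever: count shared words.
    return len(set(query.split()) & set(doc.split()))


def jaccard(query, doc):
    # Toy stand-in for a reranker: shared words relative to all words.
    q, d = set(query.split()), set(doc.split())
    return len(q & d) / len(q | d) if q | d else 0.0


docs = [
    "bm25 ranks documents by keyword overlap",
    "embeddings capture semantic similarity",
    "keyword search with bm25 is fast and interpretable and widely used in search engines",
    "cats sleep most of the day",
]

best = retrieve_then_rerank("bm25 keyword search", docs, word_overlap, jaccard,
                            n_candidates=2, top_k=1)
print(best)
```

In this example the first stage ranks the long third document highest because it shares the most words with the query, while the second stage prefers the shorter first document. The expensive scorer only ever sees `n_candidates` documents, which is what keeps the pipeline fast.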
