Core Concepts

Beam Search

A decoding strategy that maintains multiple candidate sequences (the "beam") simultaneously, expanding each by one token at every step and keeping only the top-scoring candidates. Unlike greedy decoding (always picking the single best next token) or sampling (picking randomly), beam search explores multiple paths in search of the highest-probability sequence overall. Commonly used for translation and summarization.

Why It Matters

Beam search demonstrates that the locally best choice is not always globally best. Greedy decoding might pick "The" as the first word when "In" would lead to a much better overall sentence. By keeping multiple candidates, beam search avoids committing too early. For open-ended generation (chat, creative writing), however, sampling produces more diverse and natural text than beam search.

Deep Dive

The algorithm: maintain a beam of width k (e.g., k=5). At each step, expand each candidate by all possible next tokens, score the resulting sequences, and keep the top k. Continue until all candidates have generated an end token or hit a length limit. Return the highest-scoring complete sequence. The beam width k trades off quality vs. compute: k=1 is greedy decoding, larger k explores more paths but costs k times more compute.
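The loop above can be sketched in a few lines of Python. The toy `next_token_probs` "model" below is a made-up distribution, chosen so that greedy decoding would start with "The" (probability 0.5) while the globally best sequence starts with "In" — the exact failure mode described earlier. Only the algorithm itself is faithful; the vocabulary and probabilities are assumptions for illustration.

```python
import math

# Hypothetical toy "model": given a partial sequence, return
# next-token probabilities. Illustration only, not a real LM.
def next_token_probs(seq):
    if not seq:
        return {"The": 0.5, "In": 0.4, "<eos>": 0.1}
    if seq[-1] == "The":
        return {"cat": 0.6, "dog": 0.3, "<eos>": 0.1}
    if seq[-1] == "In":
        return {"fact": 0.9, "<eos>": 0.1}
    return {"<eos>": 1.0}

def beam_search(k=2, max_len=5):
    # Each candidate is (tokens, cumulative log-probability).
    beam = [([], 0.0)]
    finished = []
    for _ in range(max_len):
        expanded = []
        for tokens, score in beam:
            for tok, p in next_token_probs(tokens).items():
                cand = (tokens + [tok], score + math.log(p))
                if tok == "<eos>":
                    finished.append(cand)  # candidate is complete
                else:
                    expanded.append(cand)
        if not expanded:
            break
        # Keep only the top-k partial sequences.
        beam = sorted(expanded, key=lambda c: c[1], reverse=True)[:k]
    # Return the highest-scoring complete sequence.
    return max(finished, key=lambda c: c[1])

best_tokens, best_logprob = beam_search(k=2)
print(best_tokens)  # ['In', 'fact', '<eos>'] — greedy would pick 'The' first
```

With k=1 the same code reduces to greedy decoding and returns the "The"-initial sequence (total probability 0.3); with k=2 it recovers "In fact" (total probability 0.36).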

The Length Penalty Problem

Raw beam search favors shorter sequences (fewer tokens = fewer probability multiplications = higher total probability). A length penalty (dividing the score by length^α) counteracts this bias, encouraging the model to generate complete, well-formed outputs rather than cutting short. The penalty factor α is a hyperparameter: α=0 is no penalty, α=1 normalizes fully by length. Typical values are 0.6–1.0.

When to Use Beam Search

Beam search works best for tasks with a "correct" answer (translation, summarization, structured generation) where you want the single most likely output. It works poorly for creative or conversational tasks where diversity matters — beam search tends to produce generic, repetitive text because high-probability sequences are often boring. Modern LLM interfaces use sampling (with temperature and top-p) for chat, and beam search is mainly used internally for specific tasks like tool-call generation.
