
Beam Search

A decoding strategy that maintains multiple candidate sequences (the "beam") in parallel, extending each by one token at every step and keeping only the top-scoring candidates. Unlike greedy decoding (always pick the single best next token) or sampling (pick randomly from the distribution), beam search explores multiple paths to approximate the globally highest-probability sequence. Commonly used for translation and summarization.

Why it matters

Beam search shows that the locally best choice isn't always globally best. Greedy decoding might pick "The" as the first word when "In" would lead to a much better overall sentence. By keeping multiple candidates, beam search avoids committing too early. However, for open-ended generation (chat, creative writing), sampling produces more diverse and natural text than beam search.

Deep Dive

The algorithm: maintain a beam of width k (e.g., k=5). At each step, expand each candidate by all possible next tokens, score the resulting sequences, and keep the top k. Continue until all candidates have generated an end token or hit a length limit. Return the highest-scoring complete sequence. The beam width k trades off quality vs. compute: k=1 is greedy decoding, larger k explores more paths but costs k times more compute.
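The loop above can be sketched over a hypothetical toy language model (the `TOY_LM` table, its token strings, and the probabilities are invented purely for illustration):

```python
import math

# Hypothetical toy model for illustration: maps a context tuple to a
# distribution over next tokens. Not a real LM.
TOY_LM = {
    (): {"The": 0.5, "In": 0.4, "<eos>": 0.1},
    ("The",): {"cat": 0.4, "<eos>": 0.6},
    ("In",): {"short": 0.9, "<eos>": 0.1},
    ("The", "cat"): {"<eos>": 1.0},
    ("In", "short"): {"<eos>": 1.0},
}

def beam_search(k=2, max_len=5):
    """Keep the k best partial sequences; stop at <eos> or max_len."""
    beam = [((), 0.0)]  # (token tuple, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beam:
            if seq and seq[-1] == "<eos>":
                finished.append((seq, score))  # complete; stop expanding
            else:
                # Expand by every possible next token, scoring in log space.
                for tok, p in TOY_LM[seq].items():
                    candidates.append((seq + (tok,), score + math.log(p)))
        if not candidates:
            break
        # Prune: keep only the k highest-scoring sequences.
        beam = sorted(candidates, key=lambda c: c[1], reverse=True)[:k]
    return max(finished or beam, key=lambda c: c[1])

best, score = beam_search(k=2)
print(best)  # → ('In', 'short', '<eos>')
```

Note how this toy model also demonstrates the earlier point about greedy decoding: greedy picks "The" first (probability 0.5) and ends with total probability 0.3, while the beam recovers "In short", whose overall probability is 0.36.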

The Length Penalty Problem

Raw beam search favors shorter sequences (fewer tokens = fewer probability multiplications = higher total probability). A length penalty (dividing the score by length^α) counteracts this bias, encouraging the model to generate complete, well-formed outputs rather than cutting short. The penalty factor α is a hyperparameter: α=0 is no penalty, α=1 normalizes fully by length. Typical values are 0.6–1.0.
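A minimal sketch of that normalization, using made-up log-probabilities to show the bias and its correction:

```python
import math

def length_normalized_score(logprobs, alpha=0.6):
    """Score = sum of token log-probs divided by (sequence length)^alpha."""
    return sum(logprobs) / (len(logprobs) ** alpha)

# Illustrative candidates: a short, truncated sequence vs. a longer one.
short = [math.log(0.5)] * 2   # raw sum ≈ -1.386
long_ = [math.log(0.6)] * 5   # raw sum ≈ -2.554

# Raw scores favor the short sequence...
assert sum(short) > sum(long_)
# ...but with alpha = 1.0 (full length normalization) the longer,
# higher-per-token-probability sequence wins.
assert length_normalized_score(long_, alpha=1.0) > \
       length_normalized_score(short, alpha=1.0)
```

Intermediate values of α interpolate between the two behaviors, which is why it is tuned as a hyperparameter per task.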

When to Use Beam Search

Beam search works best for tasks with a "correct" answer (translation, summarization, structured generation) where you want the single most likely output. It works poorly for creative or conversational tasks where diversity matters — beam search tends to produce generic, repetitive text because high-probability sequences are often boring. Modern LLM interfaces use sampling (with temperature and top-p) for chat, and beam search is mainly used internally for specific tasks like tool-call generation.
