Basics

Beam Search

A decoding strategy that maintains multiple candidate sequences ("beams") in parallel, extends each by one token at every step, and keeps only the highest-scoring candidates. Unlike greedy decoding (always pick the single best next token) or sampling (pick randomly), beam search explores multiple paths to find the sequence with the highest overall probability. It is commonly used for translation and summarization.

Why It Matters

Beam search shows that the locally best choice is not always globally best. Greedy decoding might pick "The" as the first word even though "In" would lead to a better sentence overall. By keeping multiple candidates alive, beam search avoids committing too early. However, for open-ended generation (chat, creative writing), sampling produces more diverse, natural-sounding text than beam search.

Deep Dive

The algorithm: maintain a beam of width k (e.g., k=5). At each step, expand each candidate by all possible next tokens, score the resulting sequences, and keep the top k. Continue until all candidates have generated an end token or hit a length limit. Return the highest-scoring complete sequence. The beam width k trades off quality vs. compute: k=1 is greedy decoding, larger k explores more paths but costs k times more compute.
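A minimal sketch of that loop. The scoring interface `next_logprobs` stands in for a model call, and `EOS` and the toy vocabulary are illustrative assumptions, not part of any particular library:

```python
import math
from typing import Callable, Dict, Tuple

EOS = "<eos>"  # end-of-sequence marker (an assumption for this sketch)

def beam_search(next_logprobs: Callable[[Tuple[str, ...]], Dict[str, float]],
                k: int = 5, max_len: int = 20) -> Tuple[Tuple[str, ...], float]:
    """Return the highest-scoring sequence and its score.

    `next_logprobs(tokens)` maps a partial sequence to
    {next_token: log_probability} for the next step.
    """
    beam = [((), 0.0)]   # each candidate: (tokens, cumulative log-prob)
    finished = []
    for _ in range(max_len):
        expanded = []
        for tokens, score in beam:
            # Expand every candidate by every possible next token.
            for tok, lp in next_logprobs(tokens).items():
                expanded.append((tokens + (tok,), score + lp))
        # Keep only the top-k highest-scoring sequences.
        expanded.sort(key=lambda c: c[1], reverse=True)
        beam = []
        for tokens, score in expanded[:k]:
            (finished if tokens[-1] == EOS else beam).append((tokens, score))
        if not beam:  # every surviving candidate has generated EOS
            break
    return max(finished + beam, key=lambda c: c[1])

# Toy "model": two steps of real choices, then force EOS.
def toy_logprobs(tokens):
    if len(tokens) >= 2:
        return {EOS: 0.0}
    return {"the": math.log(0.6), "in": math.log(0.3), EOS: math.log(0.1)}

print(beam_search(toy_logprobs, k=2))  # (('the', 'the', '<eos>'), ≈ -1.02)
```

Note that with k=1 the loop degenerates to greedy decoding, since only the single best expansion survives each step.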

The Length Penalty Problem

Raw beam search favors shorter sequences (fewer tokens = fewer probability multiplications = higher total probability). A length penalty (dividing the score by length^α) counteracts this bias, encouraging the model to generate complete, well-formed outputs rather than cutting short. The penalty factor α is a hyperparameter: α=0 is no penalty, α=1 normalizes fully by length. Typical values are 0.6–1.0.
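As a small numeric sketch of the simple normalization described above (score divided by length^α; the exact formula varies across libraries):

```python
# Length-normalized score: divide the cumulative log-probability by
# length**alpha. alpha=0 reproduces raw beam search; alpha=1 ranks by
# average per-token log-probability.
def normalized_score(logprob_sum: float, length: int, alpha: float = 0.6) -> float:
    return logprob_sum / (length ** alpha)

# Raw scores favor the short sequence (-4.0 > -6.5), but per token the
# long sequence is better (-0.65 vs. -1.0), and the penalty reflects that:
print(normalized_score(-4.0, 4))    # ≈ -1.74
print(normalized_score(-6.5, 10))   # ≈ -1.63, now ranked higher
```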

When to Use Beam Search

Beam search works best for tasks with a "correct" answer (translation, summarization, structured generation) where you want the single most likely output. It works poorly for creative or conversational tasks where diversity matters — beam search tends to produce generic, repetitive text because high-probability sequences are often boring. Modern LLM interfaces use sampling (with temperature and top-p) for chat, and beam search is mainly used internally for specific tasks like tool-call generation.
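As an illustration of the two modes side by side, Hugging Face transformers exposes both through `generate()` keyword arguments; the model and prompt below are placeholder choices for a runnable sketch, not a recommendation:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tok("The meaning of life is", return_tensors="pt")

# Beam search: deterministic, aims for the single most likely output.
beams = model.generate(**inputs, num_beams=5, length_penalty=0.6,
                       max_new_tokens=40)

# Sampling: diverse output, the usual choice for chat and creative tasks.
sampled = model.generate(**inputs, do_sample=True, temperature=0.8,
                         top_p=0.9, max_new_tokens=40)

print(tok.decode(beams[0], skip_special_tokens=True))
print(tok.decode(sampled[0], skip_special_tokens=True))
```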
