Basics

Beam Search

A decoding strategy that maintains multiple candidate sequences ("beams") in parallel, extends each by one token at every step, and keeps only the highest-scoring candidates. Unlike greedy decoding (which always picks the single best next token) or sampling (which picks randomly), beam search explores multiple paths to find the sequence with the highest overall probability. It is commonly used for translation and summarization.

Why It Matters

Beam search shows that locally optimal choices are not always globally optimal. Greedy decoding might pick "The" as the first word when "In" would lead to a better sentence overall. By keeping multiple candidates, beam search avoids committing too early. For open-ended generation (chat, creative writing), however, sampling produces more diverse, natural-sounding text than beam search.

Deep Dive

The algorithm: maintain a beam of width k (e.g., k=5). At each step, expand each candidate by all possible next tokens, score the resulting sequences, and keep the top k. Continue until all candidates have generated an end token or hit a length limit. Return the highest-scoring complete sequence. The beam width k trades off quality vs. compute: k=1 is greedy decoding, larger k explores more paths but costs k times more compute.
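The algorithm above can be sketched over a toy next-token model. This is a minimal illustration, not a production decoder: `next_token_scores` is an assumed stand-in for a real language model that returns a probability for each candidate next token.

```python
import math

def beam_search(next_token_scores, k=5, max_len=10, eos="<eos>"):
    """Toy beam search. `next_token_scores(seq)` is an assumed callable
    returning a dict of {next_token: probability} for a partial sequence."""
    # Each beam entry is (cumulative log-probability, token sequence).
    beams = [(0.0, [])]
    complete = []
    for _ in range(max_len):
        candidates = []
        for log_p, seq in beams:
            if seq and seq[-1] == eos:
                # Finished sequences leave the beam and compete at the end.
                complete.append((log_p, seq))
                continue
            # Expand this candidate by every possible next token.
            for tok, p in next_token_scores(seq).items():
                candidates.append((log_p + math.log(p), seq + [tok]))
        if not candidates:
            break
        # Keep only the top-k scoring sequences.
        beams = sorted(candidates, reverse=True)[:k]
    # Unfinished beams still compete if we hit the length limit.
    complete.extend(b for b in beams if not (b[1] and b[1][-1] == eos))
    return max(complete)
```

With `k=1` this reduces to greedy decoding; a larger `k` can recover a globally better sequence that greedy decoding would miss.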

The Length Penalty Problem

Raw beam search favors shorter sequences (fewer tokens = fewer probability multiplications = higher total probability). A length penalty (dividing the score by length^α) counteracts this bias, encouraging the model to generate complete, well-formed outputs rather than cutting short. The penalty factor α is a hyperparameter: α=0 is no penalty, α=1 normalizes fully by length. Typical values are 0.6–1.0.
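The penalty described above can be written directly: divide the cumulative log-probability by length^α. A small sketch with illustrative numbers (the probabilities and lengths below are assumptions, not real model outputs):

```python
import math

def length_normalized_score(log_prob, length, alpha=0.6):
    """Length-penalized beam score: log_prob / length**alpha.
    alpha=0 leaves the raw score; alpha=1 fully normalizes per token."""
    return log_prob / (length ** alpha)

# A short candidate with higher total probability vs. a longer one
# with lower total probability (illustrative values).
short_lp, short_len = math.log(0.20), 3
long_lp, long_len = math.log(0.10), 8

raw_prefers_short = short_lp > long_lp
penalized_prefers_long = (
    length_normalized_score(long_lp, long_len, alpha=1.0)
    > length_normalized_score(short_lp, short_len, alpha=1.0)
)
```

The raw score prefers the short sequence, while full normalization (α=1) rewards the longer sequence's better per-token probability, which is exactly the bias correction the penalty is for.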

When to Use Beam Search

Beam search works best for tasks with a "correct" answer (translation, summarization, structured generation) where you want the single most likely output. It works poorly for creative or conversational tasks where diversity matters — beam search tends to produce generic, repetitive text because high-probability sequences are often boring. Modern LLM interfaces use sampling (with temperature and top-p) for chat, and beam search is mainly used internally for specific tasks like tool-call generation.
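For contrast with beam search, the sampling approach mentioned above can be sketched as temperature plus top-p (nucleus) sampling over a token distribution. The function name, signature, and default values are assumptions for illustration:

```python
import random

def sample_next(probs, temperature=0.8, top_p=0.9, rng=random):
    """Minimal sketch of temperature + top-p sampling over a
    dict of {token: probability}."""
    # Temperature: p ** (1/T) is equivalent to dividing logits by T,
    # sharpening the distribution for T < 1 and flattening it for T > 1.
    scaled = {t: p ** (1.0 / temperature) for t, p in probs.items()}
    total = sum(scaled.values())
    scaled = {t: p / total for t, p in scaled.items()}
    # Nucleus: keep the smallest set of tokens whose cumulative mass
    # reaches top_p, dropping the unlikely tail.
    nucleus, cum = [], 0.0
    for tok, p in sorted(scaled.items(), key=lambda kv: -kv[1]):
        nucleus.append((tok, p))
        cum += p
        if cum >= top_p:
            break
    # Renormalize and draw one token from the nucleus.
    mass = sum(p for _, p in nucleus)
    r = rng.random() * mass
    for tok, p in nucleus:
        r -= p
        if r <= 0:
            return tok
    return nucleus[-1][0]
```

Unlike beam search, each call returns a (possibly different) token, which is where the diversity for chat and creative writing comes from.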
