Beam Search: Definition & Meaning — AI Wiki

Uma estratégia de decodificação que mantém múltiplas sequências candidatas (o “beam”) simultaneamente, expandindo cada uma por um token a cada passo e mantendo só os candidatos com as maiores pontuações. Diferente do greedy decoding (sempre escolher o melhor próximo token) ou sampling (escolher aleatoriamente), beam search explora múltiplos caminhos e encontra a sequência de maior probabilidade global. Comumente usado para tradução e sumarização.

Por que importa

Beam search mostra que a melhor escolha localmente nem sempre é melhor globalmente. Greedy decoding pode escolher “The” como primeira palavra quando “In” levaria a uma frase global muito melhor. Mantendo múltiplos candidatos, beam search evita se comprometer cedo demais. Porém, para geração aberta (chat, escrita criativa), sampling produz texto mais diverso e natural que beam search.

Deep Dive

The algorithm: maintain a beam of width k (e.g., k=5). At each step, expand each candidate by all possible next tokens, score the resulting sequences, and keep the top k. Continue until all candidates have generated an end token or hit a length limit. Return the highest-scoring complete sequence. The beam width k trades off quality vs. compute: k=1 is greedy decoding, larger k explores more paths but costs k times more compute.

The Length Penalty Problem

Raw beam search favors shorter sequences (fewer tokens = fewer probability multiplications = higher total probability). A length penalty (dividing the score by length^α) counteracts this bias, encouraging the model to generate complete, well-formed outputs rather than cutting short. The penalty factor α is a hyperparameter: α=0 is no penalty, α=1 normalizes fully by length. Typical values are 0.6–1.0.

When to Use Beam Search

Beam search works best for tasks with a "correct" answer (translation, summarization, structured generation) where you want the single most likely output. It works poorly for creative or conversational tasks where diversity matters — beam search tends to produce generic, repetitive text because high-probability sequences are often boring. Modern LLM interfaces use sampling (with temperature and top-p) for chat, and beam search is mainly used internally for specific tasks like tool-call generation.

Beam Search

Por que importa

Deep Dive

The Length Penalty Problem

When to Use Beam Search

Conceitos relacionados