Zubnet AI Learning Wiki › Sampling
Basics

Sampling

Decoding Strategy, Top-p, Top-k
The process of selecting the next token to generate from the model's predicted probability distribution. Greedy decoding always picks the most likely token; random sampling picks tokens in proportion to their probability. Temperature, top-p (nucleus), and top-k are controls that adjust the randomness and diversity of the selection. The sampling strategy strongly affects output quality, creativity, and consistency.

Why It Matters

Sampling parameters are the most accessible knobs for controlling LLM behavior. Temperature 0 is used for deterministic code generation; temperature 0.7 for creative writing; top-p 0.9 is a good balance. These are not magic numbers: they directly control which tokens the model considers at each step. Understanding sampling helps you tune output for your specific use case.
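To see what the temperature knob actually does, here is a toy softmax over three logits, using only the Python standard library (a minimal sketch, not any library's implementation):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities, dividing by temperature first.

    temperature < 1 sharpens the distribution (top token dominates);
    temperature > 1 flattens it (more diversity). Must be > 0.
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]                       # toy logits for three tokens
low = softmax_with_temperature(logits, 0.5)    # sharper distribution
high = softmax_with_temperature(logits, 2.0)   # flatter distribution
```

Running this shows the top token's probability rising as temperature falls, which is why low temperatures behave almost greedily.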

Deep Dive

The sampling pipeline: (1) the model produces logits for all vocabulary tokens, (2) temperature scaling divides the logits by T, (3) top-k filtering keeps only the k highest logits (setting the rest to −∞), (4) top-p filtering keeps the smallest set of highest-probability tokens whose cumulative probability exceeds p (the probabilities come from a softmax over the scaled logits), (5) softmax converts the surviving logits to a renormalized probability distribution, (6) a token is randomly sampled from this distribution. Steps 3 and 4 are optional and can be combined.
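The steps above can be sketched in plain Python (a minimal illustration over a list of logits; real decoders operate on tensors over the full vocabulary):

```python
import math
import random

def sample_token(logits, temperature=1.0, top_k=None, top_p=None, rng=random):
    """Sample one token index from raw logits.

    temperature must be > 0; use top_k=1 for greedy decoding.
    """
    # Step 2: temperature scaling.
    scaled = [l / temperature for l in logits]
    # Sort token indices by scaled logit, highest first.
    order = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    # Step 3: top-k filtering keeps only the k highest logits.
    if top_k is not None:
        order = order[:top_k]
    # Softmax over the surviving tokens (dropped tokens act as -inf).
    m = scaled[order[0]]
    exps = [math.exp(scaled[i] - m) for i in order]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Step 4: top-p keeps the smallest prefix whose cumulative probability
    # reaches p; renormalizing afterwards is equivalent to re-running
    # softmax on the kept logits.
    if top_p is not None:
        cum, cutoff = 0.0, len(probs)
        for j, p in enumerate(probs):
            cum += p
            if cum >= top_p:
                cutoff = j + 1
                break
        order, probs = order[:cutoff], probs[:cutoff]
        norm = sum(probs)
        probs = [p / norm for p in probs]
    # Step 6: sample from the filtered distribution.
    r = rng.random()
    cum = 0.0
    for i, p in zip(order, probs):
        cum += p
        if r <= cum:
            return i
    return order[-1]  # guard against floating-point rounding
```

With `top_k=1` this reduces to greedy decoding; with aggressive `top_p` the tail of unlikely tokens is cut off before sampling.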

Choosing Parameters

For factual/code tasks: temperature 0 (or very low), no top-p/top-k. You want the most likely tokens. For creative writing: temperature 0.7–1.0, top-p 0.9–0.95. You want diversity without incoherence. For brainstorming: temperature 1.0+, wider top-p. You want surprising, unexpected connections. The key insight: there's no universal best setting. Different tasks need different sampling strategies, and the optimal parameters also vary by model.
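These guidelines could be captured as a small table of starting points (the task names and exact values here are illustrative defaults for experimentation, not official settings of any API):

```python
# Illustrative starting points; tune per model and task.
SAMPLING_PRESETS = {
    "code":       {"temperature": 0.0},                 # deterministic
    "chat":       {"temperature": 0.7, "top_p": 0.9},   # balanced
    "creative":   {"temperature": 0.9, "top_p": 0.95},  # diverse but coherent
    "brainstorm": {"temperature": 1.2, "top_p": 0.98},  # exploratory
}
```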

Beyond Simple Sampling

Advanced strategies include: beam search (maintain multiple candidate sequences and pick the overall best; good for translation, less useful for open-ended generation), contrastive decoding (boost tokens where a large model outperforms a small model), and min-p sampling (a dynamic threshold that keeps tokens with probability above a fraction of the top token's probability). These techniques address specific failure modes of simple sampling, such as repetition loops or degenerate outputs.
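The min-p idea is simple enough to sketch directly (an illustration of the dynamic threshold, not a reference implementation):

```python
def min_p_filter(probs, min_p=0.1):
    """Keep tokens whose probability is at least min_p * max(probs).

    Unlike a fixed top-k, the kept set shrinks when the model is confident
    (one dominant token) and grows when the distribution is flat.
    Returns (index, renormalized probability) pairs.
    """
    threshold = min_p * max(probs)
    kept = [(i, p) for i, p in enumerate(probs) if p >= threshold]
    total = sum(p for _, p in kept)
    return [(i, p / total) for i, p in kept]
```

For example, with `probs = [0.7, 0.2, 0.05, 0.05]` and `min_p=0.1`, the threshold is 0.07, so only the first two tokens survive and are renormalized.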
