Basics

Sampling

Decoding Strategy, Top-p, Top-k
The process of selecting the next token to generate from the model's predicted probability distribution. Greedy decoding always picks the most likely token; random sampling picks tokens in proportion to their probabilities. Temperature, top-p (nucleus), and top-k are controls that adjust the randomness and diversity of that selection. The sampling strategy has a large effect on output quality, creativity, and consistency.
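The effect of temperature is easiest to see numerically. A minimal sketch (function name and example logits are illustrative): dividing logits by T before the softmax sharpens the distribution when T < 1 and flattens it when T > 1.

```python
import math

def softmax_with_temperature(logits, temperature):
    # Divide logits by T, then apply a numerically stable softmax.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
print(softmax_with_temperature(logits, 1.0))
print(softmax_with_temperature(logits, 0.5))  # sharper: mass concentrates on the top token
print(softmax_with_temperature(logits, 2.0))  # flatter: more diversity
```

Greedy decoding is the T → 0 limit: all probability mass collapses onto the single highest-logit token.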

Why It Matters

Sampling parameters are the most accessible knobs for controlling LLM behavior. Temperature 0 for deterministic code generation; temperature 0.7 for creative writing; top-p 0.9 as a reasonable balance. These are not magic numbers: they directly control which tokens the model considers at each step. Understanding sampling helps you tune outputs for your specific use case.

Deep Dive

The sampling pipeline: (1) the model produces logits for all vocabulary tokens, (2) temperature scaling divides the logits by T, (3) top-k filtering keeps only the k highest logits (setting the rest to −∞), (4) top-p filtering keeps the smallest set of tokens whose cumulative probability exceeds p, (5) softmax converts the filtered logits to probabilities, (6) a token is randomly sampled from this distribution. Steps 3 and 4 are optional and can be combined. (In practice, the top-p cutoff is computed on the softmaxed distribution, since it needs cumulative probabilities; the surviving set is then renormalized.)
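The steps above can be sketched as a single function. This is a minimal illustration, not any library's API; note that it applies softmax before the top-p cutoff (since nucleus filtering needs cumulative probabilities) and renormalizes afterward.

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_k=0, top_p=1.0, rng=None):
    """Sample a token index from raw logits using temperature, top-k, and top-p."""
    rng = rng or random.Random()
    # T = 0 degenerates to greedy decoding: take the argmax.
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    # (2) temperature scaling
    scaled = [l / temperature for l in logits]
    # (3) top-k: keep only the k highest logits
    order = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    if top_k > 0:
        order = order[:top_k]
    # (5) numerically stable softmax over the survivors
    m = max(scaled[i] for i in order)
    exps = [math.exp(scaled[i] - m) for i in order]
    total = sum(exps)
    probs = [e / total for e in exps]
    # (4) top-p: smallest prefix whose cumulative probability reaches p, then renormalize
    if top_p < 1.0:
        cum, cut = 0.0, len(order)
        for j, p in enumerate(probs):
            cum += p
            if cum >= top_p:
                cut = j + 1
                break
        order, probs = order[:cut], probs[:cut]
        s = sum(probs)
        probs = [p / s for p in probs]
    # (6) sample from the filtered, renormalized distribution
    r = rng.random()
    cum = 0.0
    for idx, p in zip(order, probs):
        cum += p
        if r <= cum:
            return idx
    return order[-1]
```

Setting top_k=1 or top_p close to 0 both collapse the choice to the single most likely token, matching greedy decoding.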

Choosing Parameters

For factual/code tasks: temperature 0 (or very low), no top-p/top-k. You want the most likely tokens. For creative writing: temperature 0.7–1.0, top-p 0.9–0.95. You want diversity without incoherence. For brainstorming: temperature 1.0+, wider top-p. You want surprising, unexpected connections. The key insight: there's no universal best setting. Different tasks need different sampling strategies, and the optimal parameters also vary by model.

Beyond Simple Sampling

Advanced strategies include: beam search (maintain multiple candidate sequences and pick the overall best — good for translation, less useful for open-ended generation), contrastive decoding (boost tokens where a large model outperforms a small model), and min-p sampling (a dynamic threshold that keeps tokens with probability above a fraction of the top token's probability). These techniques address specific failure modes of simple sampling, such as repetition loops or degenerate outputs.
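Min-p is the simplest of these to sketch. A minimal illustration (function name is hypothetical): the cutoff scales with the top token's probability, so the kept set shrinks when the model is confident and widens when it is uncertain.

```python
def min_p_filter(probs, min_p=0.1):
    """Keep tokens whose probability is at least min_p * max(probs), then renormalize."""
    threshold = min_p * max(probs)
    kept = [(i, p) for i, p in enumerate(probs) if p >= threshold]
    total = sum(p for _, p in kept)
    return [(i, p / total) for i, p in kept]
```

Unlike a fixed top-k or top-p, this adapts per step: a peaked distribution yields a near-greedy choice, while a flat one leaves many candidates in play.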
