Basics

Sampling

Decoding Strategy, Top-p, Top-k
The process of selecting the next token to generate from the model's predicted probability distribution. Greedy decoding always picks the most likely token; random sampling picks tokens in proportion to their probabilities. Temperature, top-p (nucleus), and top-k are controls that adjust the randomness and diversity of that selection. The sampling strategy has a large effect on output quality, creativity, and consistency.
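The effect of temperature is easiest to see numerically. A minimal sketch (function name and example logits are illustrative): dividing logits by T before the softmax sharpens the distribution when T < 1 and flattens it when T > 1.

```python
import math

def softmax_with_temperature(logits, temperature):
    # Divide logits by T, then apply a numerically stable softmax.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
print(softmax_with_temperature(logits, 1.0))
print(softmax_with_temperature(logits, 0.5))  # sharper: mass concentrates on the top token
print(softmax_with_temperature(logits, 2.0))  # flatter: more diversity
```

Greedy decoding is the T → 0 limit: all probability mass collapses onto the single highest-logit token.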

Why It Matters

Sampling parameters are the most accessible knobs for controlling LLM behavior. Temperature 0 for deterministic code generation; temperature 0.7 for creative writing; top-p 0.9 as a reasonable balance. These are not magic numbers: they directly control which tokens the model considers at each step. Understanding sampling helps you tune outputs for your specific use case.

Deep Dive

The sampling pipeline: (1) the model produces logits for all vocabulary tokens, (2) temperature scaling divides the logits by T, (3) top-k filtering keeps only the k highest logits (setting the rest to −∞), (4) top-p filtering keeps the smallest set of tokens whose cumulative probability exceeds p, (5) softmax converts the filtered logits to probabilities, (6) a token is randomly sampled from this distribution. Steps 3 and 4 are optional and can be combined. (In practice, the top-p cutoff is computed on the softmaxed distribution, since it needs cumulative probabilities; the surviving set is then renormalized.)
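The steps above can be sketched as a single function. This is a minimal illustration, not any library's API; note that it applies softmax before the top-p cutoff (since nucleus filtering needs cumulative probabilities) and renormalizes afterward.

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_k=0, top_p=1.0, rng=None):
    """Sample a token index from raw logits using temperature, top-k, and top-p."""
    rng = rng or random.Random()
    # T = 0 degenerates to greedy decoding: take the argmax.
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    # (2) temperature scaling
    scaled = [l / temperature for l in logits]
    # (3) top-k: keep only the k highest logits
    order = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    if top_k > 0:
        order = order[:top_k]
    # (5) numerically stable softmax over the survivors
    m = max(scaled[i] for i in order)
    exps = [math.exp(scaled[i] - m) for i in order]
    total = sum(exps)
    probs = [e / total for e in exps]
    # (4) top-p: smallest prefix whose cumulative probability reaches p, then renormalize
    if top_p < 1.0:
        cum, cut = 0.0, len(order)
        for j, p in enumerate(probs):
            cum += p
            if cum >= top_p:
                cut = j + 1
                break
        order, probs = order[:cut], probs[:cut]
        s = sum(probs)
        probs = [p / s for p in probs]
    # (6) sample from the filtered, renormalized distribution
    r = rng.random()
    cum = 0.0
    for idx, p in zip(order, probs):
        cum += p
        if r <= cum:
            return idx
    return order[-1]
```

Setting top_k=1 or top_p close to 0 both collapse the choice to the single most likely token, matching greedy decoding.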

Choosing Parameters

For factual/code tasks: temperature 0 (or very low), no top-p/top-k. You want the most likely tokens. For creative writing: temperature 0.7–1.0, top-p 0.9–0.95. You want diversity without incoherence. For brainstorming: temperature 1.0+, wider top-p. You want surprising, unexpected connections. The key insight: there's no universal best setting. Different tasks need different sampling strategies, and the optimal parameters also vary by model.

Beyond Simple Sampling

Advanced strategies include: beam search (maintain multiple candidate sequences and pick the overall best — good for translation, less useful for open-ended generation), contrastive decoding (boost tokens where a large model outperforms a small model), and min-p sampling (a dynamic threshold that keeps tokens with probability above a fraction of the top token's probability). These techniques address specific failure modes of simple sampling, such as repetition loops or degenerate outputs.
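Min-p is the simplest of these to sketch. A minimal illustration (function name is hypothetical): the cutoff scales with the top token's probability, so the kept set shrinks when the model is confident and widens when it is uncertain.

```python
def min_p_filter(probs, min_p=0.1):
    """Keep tokens whose probability is at least min_p * max(probs), then renormalize."""
    threshold = min_p * max(probs)
    kept = [(i, p) for i, p in enumerate(probs) if p >= threshold]
    total = sum(p for _, p in kept)
    return [(i, p / total) for i, p in kept]
```

Unlike a fixed top-k or top-p, this adapts per step: a peaked distribution yields a near-greedy choice, while a flat one leaves many candidates in play.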
