Fundamentals

Sampling

Decoding Strategy, Top-p, Top-k
The process of selecting which token to generate next from the model's predicted probability distribution. Greedy decoding always picks the most likely token. Random sampling picks proportionally to probabilities. Temperature, top-p (nucleus), and top-k are controls that adjust the randomness and diversity of the selection. The sampling strategy dramatically affects output quality, creativity, and consistency.
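The contrast between greedy decoding and random sampling can be sketched on a toy distribution (the probability values here are illustrative, not from any real model):

```python
import numpy as np

# Toy next-token distribution over a 4-token vocabulary (illustrative values).
probs = np.array([0.5, 0.3, 0.15, 0.05])

# Greedy decoding: always pick the single most likely token.
greedy_token = int(np.argmax(probs))  # token 0, every time

# Random (ancestral) sampling: draw in proportion to the probabilities,
# so token 0 comes up about half the time, token 1 about 30%, and so on.
rng = np.random.default_rng(0)
sampled_token = int(rng.choice(len(probs), p=probs))
```

Greedy decoding is deterministic for a fixed input; random sampling gives different continuations across runs, which is the source of the diversity the parameters below control.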

Why it matters

Sampling parameters are the most accessible knobs for controlling LLM behavior. Temperature 0 for deterministic code generation. Temperature 0.7 for creative writing. Top-p 0.9 for a good balance. These aren't magic numbers — they directly control which tokens the model considers at each step. Understanding sampling helps you tune outputs for your specific use case.

Deep Dive

The sampling pipeline: (1) the model produces logits for all vocabulary tokens, (2) temperature scaling divides the logits by T, (3) top-k filtering keeps only the k highest logits (setting the rest to −∞), (4) top-p filtering converts the surviving logits to probabilities and keeps the smallest set of tokens whose cumulative probability exceeds p, (5) softmax converts the filtered logits back into a probability distribution, (6) a token is randomly sampled from this distribution. Steps 3 and 4 are optional and can be combined.
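The six steps above can be sketched as a single function (a minimal NumPy sketch; the function name and signature are illustrative, and real inference stacks vectorize this over a batch):

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None, rng=None):
    """Sketch of the pipeline: temperature -> top-k -> top-p -> softmax -> draw.
    Requires temperature > 0; temperature 0 is greedy decoding (argmax) instead."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64)

    # (2) Temperature scaling: T < 1 sharpens the distribution, T > 1 flattens it.
    logits = logits / temperature

    # (3) Top-k: keep only the k highest logits, set the rest to -inf.
    if top_k is not None:
        kth_best = np.sort(logits)[-top_k]
        logits = np.where(logits < kth_best, -np.inf, logits)

    # (4) Top-p (nucleus): keep the smallest set of tokens whose cumulative
    #     probability exceeds p. This needs probabilities, so softmax first.
    if top_p is not None:
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        order = np.argsort(probs)[::-1]          # tokens by descending probability
        cumulative = np.cumsum(probs[order])
        cutoff = int(np.searchsorted(cumulative, top_p)) + 1  # nucleus size
        masked = np.full_like(logits, -np.inf)
        masked[order[:cutoff]] = logits[order[:cutoff]]
        logits = masked

    # (5) Softmax over the surviving logits (exp(-inf) = 0 drops filtered tokens),
    # (6) draw one token from the resulting distribution.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))
```

With `top_k=1` this collapses to greedy decoding, since only the argmax token survives the filter.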

Choosing Parameters

For factual/code tasks: temperature 0 (or very low), no top-p/top-k. You want the most likely tokens. For creative writing: temperature 0.7–1.0, top-p 0.9–0.95. You want diversity without incoherence. For brainstorming: temperature 1.0+, wider top-p. You want surprising, unexpected connections. The key insight: there's no universal best setting. Different tasks need different sampling strategies, and the optimal parameters also vary by model.
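Why low temperature suits factual tasks and high temperature suits brainstorming is easy to see numerically: dividing the logits by T before softmax concentrates or spreads the probability mass (the logit values here are arbitrary, chosen only to show the effect):

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    z = np.asarray(logits, dtype=np.float64) / temperature
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

logits = [4.0, 2.0, 1.0]
sharp = softmax_with_temperature(logits, temperature=0.5)  # low T: top token dominates
flat = softmax_with_temperature(logits, temperature=2.0)   # high T: mass spreads out
```

With these values, the top token gets roughly 98% of the mass at T = 0.5 but only about 63% at T = 2.0, which is exactly the consistency-versus-diversity trade-off described above.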

Beyond Simple Sampling

Advanced strategies include: beam search (maintain multiple candidate sequences and pick the overall best — good for translation, less useful for open-ended generation), contrastive decoding (boost tokens the large "expert" model prefers relative to a small "amateur" model), and min-p sampling (a dynamic threshold that keeps only tokens whose probability is at least a fraction of the top token's probability). These techniques address specific failure modes of simple sampling, such as repetition loops or degenerate outputs.
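Of these, min-p is the simplest to sketch (the function name and example values are illustrative):

```python
import numpy as np

def min_p_filter(probs, min_p=0.1):
    """Min-p filtering: keep tokens with probability >= min_p * max(probs),
    zero out the rest, and renormalize. The threshold scales with the model's
    confidence: a peaked distribution prunes aggressively, a flat one keeps
    many candidates."""
    probs = np.asarray(probs, dtype=np.float64)
    threshold = min_p * probs.max()
    kept = np.where(probs >= threshold, probs, 0.0)
    return kept / kept.sum()

# With a confident top token (0.6), the 0.1 * 0.6 = 0.06 threshold
# drops the two 0.05 tail tokens but keeps the 0.3 runner-up.
filtered = min_p_filter([0.6, 0.3, 0.05, 0.05], min_p=0.1)
```

Unlike a fixed top-k, this adapts per step: when the model is uncertain and the distribution is flat, the threshold drops and more candidates survive.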
