Core Concepts

Sampling

Decoding Strategy, Top-p, Top-k
The process of selecting which token to generate next from the model's predicted probability distribution. Greedy decoding always picks the most likely token. Random sampling picks in proportion to the probabilities. Temperature, top-p (nucleus), and top-k are controls that adjust the randomness and diversity of the selection. The sampling strategy dramatically affects the quality, creativity, and consistency of the output.
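In code, the difference between greedy decoding and random sampling is a single line. The toy probabilities below are made up purely for illustration:

```python
import numpy as np

probs = np.array([0.6, 0.3, 0.1])   # model's predicted next-token probabilities (toy values)

greedy_choice = int(np.argmax(probs))                         # greedy: always index 0 here
random_choice = np.random.default_rng().choice(3, p=probs)    # sampled in proportion to probs
```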

Why It Matters

Sampling parameters are the most accessible knobs for controlling LLM behavior. Temperature 0 for deterministic code generation. Temperature 0.7 for creative writing. Top-p 0.9 for a good balance. These aren't magic numbers; they directly control which tokens the model considers at each step. Understanding sampling helps you tune outputs for your specific use case.

Deep Dive

The sampling pipeline: (1) model produces logits for all vocabulary tokens, (2) temperature scaling divides logits by T, (3) top-k filtering keeps only the k highest logits (setting rest to −∞), (4) top-p filtering keeps the smallest set of tokens whose cumulative probability exceeds p, (5) softmax converts filtered logits to probabilities, (6) a token is randomly sampled from this distribution. Steps 3 and 4 are optional and can be combined.
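A minimal sketch of that pipeline in NumPy. The function name, defaults, and handling of edge cases are illustrative assumptions, not the API of any particular inference library:

```python
import numpy as np

def sample_token(logits, temperature=1.0, top_k=None, top_p=None, rng=None):
    """Sample one token id from raw logits (illustrative sketch, not a library API)."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64)

    # (2) temperature scaling: divide logits by T (lower T -> sharper distribution)
    logits = logits / max(temperature, 1e-8)

    # (3) top-k filtering: keep the k highest logits, set the rest to -inf
    if top_k is not None:
        kth_value = np.sort(logits)[-top_k]
        logits = np.where(logits < kth_value, -np.inf, logits)

    # (4) top-p (nucleus) filtering: keep the smallest set of tokens whose
    #     cumulative probability reaches p, drop everything else
    if top_p is not None:
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        order = np.argsort(probs)[::-1]
        cutoff = np.searchsorted(np.cumsum(probs[order]), top_p) + 1
        logits[order[cutoff:]] = -np.inf

    # (5) softmax over the filtered logits
    probs = np.exp(logits - logits[np.isfinite(logits)].max())
    probs /= probs.sum()

    # (6) draw one token index from the final distribution
    return rng.choice(len(probs), p=probs)
```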

Choosing Parameters

For factual/code tasks: temperature 0 (or very low), no top-p/top-k. You want the most likely tokens. For creative writing: temperature 0.7–1.0, top-p 0.9–0.95. You want diversity without incoherence. For brainstorming: temperature 1.0+, wider top-p. You want surprising, unexpected connections. The key insight: there's no universal best setting. Different tasks need different sampling strategies, and the optimal parameters also vary by model.
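These starting points can be kept as simple presets. The values and names below are assumptions to tune per model and task, not recommendations from any vendor:

```python
# Illustrative presets only; tune per model rather than copying verbatim.
PRESETS = {
    "code":       {"temperature": 0.0, "top_p": None, "top_k": None},  # near-greedy, deterministic
    "creative":   {"temperature": 0.8, "top_p": 0.92, "top_k": None},  # diverse but coherent
    "brainstorm": {"temperature": 1.1, "top_p": 0.98, "top_k": None},  # maximize variety
}

# Usage with the sample_token sketch above, once per decoding step:
# token_id = sample_token(logits, **PRESETS["creative"])
```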

Beyond Simple Sampling

Advanced strategies include: beam search (maintain multiple candidate sequences, pick the overall best — good for translation, less useful for open-ended generation), contrastive decoding (boost tokens where a large model outperforms a small model), and min-p sampling (dynamic threshold that keeps tokens with probability above a fraction of the top token's probability). These techniques address specific failure modes of simple sampling, like repetition loops or degenerate outputs.
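For concreteness, here is a rough sketch of min-p filtering, assuming the common convention that a token survives if its probability is at least min_p times the top token's probability:

```python
import numpy as np

def min_p_sample(probs, min_p=0.05, rng=None):
    """Sample a token id, keeping only tokens with prob >= min_p * max(prob)."""
    rng = rng or np.random.default_rng()
    probs = np.asarray(probs, dtype=np.float64)
    threshold = min_p * probs.max()            # dynamic threshold scales with model confidence
    kept = np.where(probs >= threshold, probs, 0.0)
    kept /= kept.sum()                         # renormalize the surviving tokens
    return rng.choice(len(kept), p=kept)
```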
