Fundamentals

Sampling

Decoding Strategy, Top-p, Top-k
The process of selecting which token to generate from the probability distribution predicted by the model. Greedy decoding always picks the most probable token. Random sampling picks proportionally to the probabilities. Temperature, top-p (nucleus), and top-k are controls that adjust the randomness and diversity of the selection. The sampling strategy dramatically affects the quality, creativity, and consistency of the output.

Why it matters

Sampling parameters are the most accessible knobs for controlling LLM behavior. Temperature 0 for deterministic code generation. Temperature 0.7 for creative writing. Top-p 0.9 for a good balance. These are not magic numbers; they directly control which tokens the model considers at each step. Understanding sampling helps you tune outputs for your specific use case.

Deep Dive

The sampling pipeline: (1) model produces logits for all vocabulary tokens, (2) temperature scaling divides logits by T, (3) top-k filtering keeps only the k highest logits (setting rest to −∞), (4) top-p filtering keeps the smallest set of tokens whose cumulative probability exceeds p, (5) softmax converts filtered logits to probabilities, (6) a token is randomly sampled from this distribution. Steps 3 and 4 are optional and can be combined.
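The sketch below walks through this pipeline in plain NumPy. It is illustrative rather than any particular library's implementation; note that the top-p cutoff is applied to the softmaxed probabilities (and the result renormalized), since the cumulative-probability threshold needs probabilities rather than raw logits.

```python
import numpy as np

def sample_token(logits, temperature=0.7, top_k=None, top_p=None, rng=None):
    """Sample one token id from raw logits. Parameter values are illustrative."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64)

    # Temperature scaling: divide logits by T (guarded against T = 0).
    logits = logits / max(temperature, 1e-8)

    # Top-k filtering: keep only the k highest logits, set the rest to -inf.
    if top_k is not None:
        kth = np.sort(logits)[-min(top_k, logits.size)]
        logits = np.where(logits < kth, -np.inf, logits)

    # Softmax converts the (possibly filtered) logits to probabilities.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    # Top-p (nucleus) filtering: keep the smallest set of tokens whose
    # cumulative probability exceeds p, then renormalize.
    if top_p is not None:
        order = np.argsort(probs)[::-1]
        cumulative = np.cumsum(probs[order])
        cutoff = np.searchsorted(cumulative, top_p) + 1
        mask = np.zeros_like(probs)
        mask[order[:cutoff]] = 1.0
        probs = probs * mask
        probs /= probs.sum()

    # Randomly sample a token id from the final distribution.
    return rng.choice(len(probs), p=probs)
```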

Choosing Parameters

For factual/code tasks: temperature 0 (or very low), no top-p/top-k. You want the most likely tokens. For creative writing: temperature 0.7–1.0, top-p 0.9–0.95. You want diversity without incoherence. For brainstorming: temperature 1.0+, wider top-p. You want surprising, unexpected connections. The key insight: there's no universal best setting. Different tasks need different sampling strategies, and the optimal parameters also vary by model.
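As a starting point, these guidelines might be captured as task presets like the ones below. The exact values are assumptions to tune per task and per model, not fixed recommendations.

```python
# Illustrative presets matching the guidance above; adjust per model and task.
SAMPLING_PRESETS = {
    "code":          {"temperature": 0.0, "top_p": 1.0},   # near-greedy, deterministic
    "creative":      {"temperature": 0.8, "top_p": 0.95},  # diverse but coherent
    "brainstorming": {"temperature": 1.1, "top_p": 0.98},  # favor surprising continuations
}
```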

Beyond Simple Sampling

Advanced strategies include: beam search (maintain multiple candidate sequences and pick the overall best; good for translation, less useful for open-ended generation), contrastive decoding (boost tokens where a large model outperforms a small model), and min-p sampling (a dynamic threshold that keeps tokens with probability above a fraction of the top token's probability). These techniques address specific failure modes of simple sampling, such as repetition loops or degenerate outputs.
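For instance, min-p filtering can be sketched in a few lines. The 0.05 threshold below is an illustrative assumption; real implementations apply this step inside the full sampling loop before drawing a token.

```python
import numpy as np

def min_p_filter(probs, min_p=0.05):
    """Keep only tokens whose probability is at least min_p times the top token's."""
    probs = np.asarray(probs, dtype=np.float64)
    threshold = min_p * probs.max()        # dynamic cutoff scales with the peak
    kept = np.where(probs >= threshold, probs, 0.0)
    return kept / kept.sum()               # renormalize surviving probabilities
```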
