Fundamentals

Sampling

Decoding Strategy, Top-p, Top-k
The process of selecting which token to generate from the probability distribution predicted by the model. Greedy decoding always picks the most probable token. Random sampling picks proportionally to the probabilities. Temperature, top-p (nucleus), and top-k are controls that adjust the randomness and diversity of the selection. The sampling strategy dramatically affects the quality, creativity, and consistency of the output.

Why it matters

Sampling parameters are the most accessible knobs for controlling LLM behavior. Temperature 0 for deterministic code generation. Temperature 0.7 for creative writing. Top-p 0.9 for a good balance. These are not magic numbers; they directly control which tokens the model considers at each step. Understanding sampling helps you tune outputs for your specific use case.

Deep Dive

The sampling pipeline: (1) model produces logits for all vocabulary tokens, (2) temperature scaling divides logits by T, (3) top-k filtering keeps only the k highest logits (setting rest to −∞), (4) top-p filtering keeps the smallest set of tokens whose cumulative probability exceeds p, (5) softmax converts filtered logits to probabilities, (6) a token is randomly sampled from this distribution. Steps 3 and 4 are optional and can be combined.
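The sketch below walks through this pipeline in plain NumPy. It is illustrative rather than any particular library's implementation; note that the top-p cutoff is applied to the softmaxed probabilities (and the result renormalized), since the cumulative-probability threshold needs probabilities rather than raw logits.

```python
import numpy as np

def sample_token(logits, temperature=0.7, top_k=None, top_p=None, rng=None):
    """Sample one token id from raw logits. Parameter values are illustrative."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64)

    # Temperature scaling: divide logits by T (guarded against T = 0).
    logits = logits / max(temperature, 1e-8)

    # Top-k filtering: keep only the k highest logits, set the rest to -inf.
    if top_k is not None:
        kth = np.sort(logits)[-min(top_k, logits.size)]
        logits = np.where(logits < kth, -np.inf, logits)

    # Softmax converts the (possibly filtered) logits to probabilities.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    # Top-p (nucleus) filtering: keep the smallest set of tokens whose
    # cumulative probability exceeds p, then renormalize.
    if top_p is not None:
        order = np.argsort(probs)[::-1]
        cumulative = np.cumsum(probs[order])
        cutoff = np.searchsorted(cumulative, top_p) + 1
        mask = np.zeros_like(probs)
        mask[order[:cutoff]] = 1.0
        probs = probs * mask
        probs /= probs.sum()

    # Randomly sample a token id from the final distribution.
    return rng.choice(len(probs), p=probs)
```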

Choosing Parameters

For factual/code tasks: temperature 0 (or very low), no top-p/top-k. You want the most likely tokens. For creative writing: temperature 0.7–1.0, top-p 0.9–0.95. You want diversity without incoherence. For brainstorming: temperature 1.0+, wider top-p. You want surprising, unexpected connections. The key insight: there's no universal best setting. Different tasks need different sampling strategies, and the optimal parameters also vary by model.
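As a starting point, these guidelines might be captured as task presets like the ones below. The exact values are assumptions to tune per task and per model, not fixed recommendations.

```python
# Illustrative presets matching the guidance above; adjust per model and task.
SAMPLING_PRESETS = {
    "code":          {"temperature": 0.0, "top_p": 1.0},   # near-greedy, deterministic
    "creative":      {"temperature": 0.8, "top_p": 0.95},  # diverse but coherent
    "brainstorming": {"temperature": 1.1, "top_p": 0.98},  # favor surprising continuations
}
```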

Beyond Simple Sampling

Advanced strategies include: beam search (maintain multiple candidate sequences and pick the overall best; good for translation, less useful for open-ended generation), contrastive decoding (boost tokens where a large model outperforms a small model), and min-p sampling (a dynamic threshold that keeps tokens with probability above a fraction of the top token's probability). These techniques address specific failure modes of simple sampling, such as repetition loops or degenerate outputs.
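For instance, min-p filtering can be sketched in a few lines. The 0.05 threshold below is an illustrative assumption; real implementations apply this step inside the full sampling loop before drawing a token.

```python
import numpy as np

def min_p_filter(probs, min_p=0.05):
    """Keep only tokens whose probability is at least min_p times the top token's."""
    probs = np.asarray(probs, dtype=np.float64)
    threshold = min_p * probs.max()        # dynamic cutoff scales with the peak
    kept = np.where(probs >= threshold, probs, 0.0)
    return kept / kept.sum()               # renormalize surviving probabilities
```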
