Basics

Logits

Raw Scores, Pre-Softmax Outputs
The raw, unnormalized scores a model outputs before the softmax function converts them into probabilities. For a language model, the logits form a vector with one value per token in the vocabulary: the higher the value, the more likely the model considers that token. Logits are the most information-rich output the model produces, carrying far more information than the single token ultimately sampled from them.

Why It Matters

Understanding logits helps you understand how a model "thinks." Temperature, top-p, and top-k sampling all operate on logits. Classifier-free guidance in image generation manipulates logits. Logit bias (adding an offset to specific tokens' logits) lets you steer model behavior. If you are building AI applications beyond basic chat, sooner or later you will work with logits directly.

Deep Dive

The model's final layer produces a vector of size V (vocabulary size, typically 32K–128K). Each element is a logit for that token. Softmax converts these to probabilities: P(token_i) = exp(logit_i) / ∑ exp(logit_j). Before softmax, the logits can be any real number — positive, negative, or zero. A logit of 10 vs. 5 means the model considers the first token about e^5 ≈ 150x more likely.
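A minimal NumPy sketch makes the arithmetic concrete (the logit values here are invented for illustration, and the vector is tiny rather than vocab-sized). It converts logits to probabilities and verifies the e^5 ratio claimed above:

```python
import numpy as np

# Toy logits for a 5-token vocabulary (hypothetical values).
logits = np.array([10.0, 5.0, 2.0, 0.0, -3.0])

def softmax(x):
    # Subtracting the max first avoids overflow; it does not change the
    # result because softmax is invariant to a constant shift.
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

probs = softmax(logits)
print(probs)

# The ratio of two probabilities depends only on the logit gap:
print(probs[0] / probs[1])  # exp(10 - 5) = e^5 ≈ 148.4
```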

Logit Manipulation

Several techniques work directly on logits. Temperature divides all logits by T before softmax (T<1 sharpens the distribution, T>1 flattens it). Top-k masks all logits except the k highest by setting them to −∞ (setting them to zero would not work, since a token with logit 0 still receives probability mass). Top-p (nucleus sampling) likewise masks the logits of tokens outside the smallest set whose cumulative probability exceeds p. Logit bias adds a fixed offset to specific tokens' logits; adding +10 to the logit for "JSON" makes the model strongly prefer starting with JSON. Repetition penalty reduces the logits of recently generated tokens.
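The sketch below implements temperature, top-k, top-p, and logit bias on a toy vector, again in NumPy with invented values. Real inference stacks apply the same transforms to vocab-sized tensors, and details (ordering of steps, the exact repetition-penalty formula) vary between implementations:

```python
import numpy as np

rng = np.random.default_rng(0)
logits = rng.normal(size=8)  # stand-in for a model's vocab-sized output

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

def apply_temperature(logits, T):
    # Divide by T before softmax: T<1 sharpens, T>1 flattens.
    return logits / T

def apply_top_k(logits, k):
    # Mask everything but the k highest logits with -inf, so masked
    # tokens get exactly zero probability after softmax.
    out = np.full_like(logits, -np.inf)
    top = np.argsort(logits)[-k:]
    out[top] = logits[top]
    return out

def apply_top_p(logits, p):
    # Keep the smallest set of tokens whose cumulative probability
    # exceeds p; mask the rest.
    order = np.argsort(logits)[::-1]          # tokens, most likely first
    cum = np.cumsum(softmax(logits)[order])
    cutoff = np.searchsorted(cum, p) + 1      # how many tokens to keep
    out = np.full_like(logits, -np.inf)
    out[order[:cutoff]] = logits[order[:cutoff]]
    return out

def apply_logit_bias(logits, bias):
    # bias maps token id -> fixed offset added to that token's logit.
    out = logits.copy()
    for tok, offset in bias.items():
        out[tok] += offset
    return out

# Typical order: temperature first, then truncation, then sample.
probs = softmax(apply_top_p(apply_temperature(logits, 0.7), 0.9))
next_token = rng.choice(len(logits), p=probs)
print(next_token, probs)
```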

Log-Probabilities

Most APIs can return log-probabilities (log of the softmax output) alongside generated tokens. These are useful for: measuring model confidence (low log-prob = uncertain), calibrating outputs (are 90%-confident predictions correct 90% of the time?), and building classifiers from LLMs (compare log-probs of different completions). Log-probs are more numerically stable than raw probabilities for extreme values.
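A short sketch of why log-probs are the stable representation, plus the sum-and-compare trick for building a classifier; all numeric values are hypothetical:

```python
import numpy as np

def log_softmax(logits):
    # Log-sum-exp trick: shifting by the max keeps every exp() argument
    # <= 0, so nothing overflows.
    z = logits - logits.max()
    return z - np.log(np.exp(z).sum())

logits = np.array([800.0, 0.0, -800.0])

# Naive route: probabilities first, then log. exp(800) overflows float64,
# so the result is garbage.
naive = np.log(np.exp(logits) / np.exp(logits).sum())
print(naive)                # [nan -inf -inf] plus overflow warnings

print(log_softmax(logits))  # [ 0. -800. -1600.], finite and exact

# Classifier from an LLM: a sequence's log-prob is the sum of its
# per-token log-probs, so candidate completions compare by sum.
# (Per-token values below are hypothetical.)
completion_a = [-0.1, -0.3, -0.2]
completion_b = [-0.5, -1.2, -0.4]
print(sum(completion_a) > sum(completion_b))  # True: A is the likelier completion
```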
