Zubnet AI Learning Wiki › Logits
Basics

Logits

Raw Scores, Pre-Softmax Outputs
The raw, unnormalized scores a model outputs, which are then converted to probabilities by the softmax function. For a language model, the logits form a vector with one value per token in the vocabulary: the higher the value, the more likely the model considers that token. Logits are the most informative output the model produces, carrying more information than the final probability distribution.

Why It Matters

Understanding logits helps you understand how a model "thinks." Temperature, top-p, and top-k sampling all operate on logits. Classifier-free guidance in image generation manipulates logits. Logit bias (adding an offset to specific tokens) lets you steer model behavior. If you are building AI applications beyond basic chat, sooner or later you will work with logits directly.

Deep Dive

The model's final layer produces a vector of size V (vocabulary size, typically 32K–128K). Each element is a logit for that token. Softmax converts these to probabilities: P(token_i) = exp(logit_i) / ∑ exp(logit_j). Before softmax, the logits can be any real number — positive, negative, or zero. A logit of 10 vs. 5 means the model considers the first token about e^5 ≈ 150x more likely.
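The softmax formula and the e^5 ratio above can be checked with a minimal sketch (the 4-token vocabulary and logit values are made up for illustration):

```python
import math

def softmax(logits):
    # Subtract the max logit for numerical stability before exponentiating;
    # this shift cancels out in the ratio and leaves the result unchanged.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy 4-token "vocabulary" with raw logits (any real numbers are valid).
logits = [10.0, 5.0, 2.0, -1.0]
probs = softmax(logits)

# A logit gap of 5 corresponds to an e^5 ~ 148x probability ratio.
ratio = probs[0] / probs[1]
```

Note that softmax is shift-invariant: adding the same constant to every logit leaves the probabilities unchanged, which is why only logit *differences* matter.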

Logit Manipulation

Several techniques work directly on logits. Temperature divides all logits by T before softmax (T<1 sharpens the distribution, T>1 flattens it). Top-k masks all logits except the k highest to −∞, so their post-softmax probability is exactly zero. Top-p (nucleus sampling) likewise masks the logits of tokens outside the smallest set whose cumulative probability exceeds p. Logit bias adds a fixed offset to specific tokens' logits; adding +10 to the logit for "JSON", for example, makes the model strongly prefer starting with JSON. Repetition penalty reduces the logits of recently generated tokens.
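A minimal sketch of three of these transforms, assuming a plain Python list of logits (real samplers operate on tensors, but the math is identical):

```python
import math

def apply_temperature(logits, t):
    # T < 1 sharpens the distribution, T > 1 flattens it.
    return [x / t for x in logits]

def apply_top_k(logits, k):
    # Mask everything outside the k highest logits with -inf;
    # softmax turns -inf into exactly zero probability.
    threshold = sorted(logits, reverse=True)[k - 1]
    return [x if x >= threshold else float("-inf") for x in logits]

def apply_logit_bias(logits, bias):
    # bias maps token index -> additive offset (e.g. {7: +10.0}).
    return [x + bias.get(i, 0.0) for i, x in enumerate(logits)]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5, -3.0]
sharpened = softmax(apply_temperature(logits, 0.5))   # top token gains mass
masked = softmax(apply_top_k(logits, 2))              # tokens 2 and 3 get p = 0
```

In practice these transforms compose: a typical sampler applies logit bias and repetition penalty first, then temperature, then top-k/top-p masking, and finally samples from the resulting distribution.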

Log-Probabilities

Most APIs can return log-probabilities (log of the softmax output) alongside generated tokens. These are useful for: measuring model confidence (low log-prob = uncertain), calibrating outputs (are 90%-confident predictions correct 90% of the time?), and building classifiers from LLMs (compare log-probs of different completions). Log-probs are more numerically stable than raw probabilities for extreme values.
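The classifier idea can be sketched as follows. The per-token log-prob values here are invented for illustration; in practice they would come from an API's log-probability output for each candidate completion:

```python
def sequence_logprob(token_logprobs):
    # Sum per-token log-probabilities to score a whole completion.
    # Summing logs is numerically safer than multiplying raw probabilities,
    # which underflow quickly for long sequences.
    return sum(token_logprobs)

# Hypothetical per-token log-probs for two candidate completions
# of a sentiment prompt (values are made up).
candidates = {
    "positive": [-0.1, -0.3, -0.2],
    "negative": [-1.2, -0.9, -1.5],
}

scores = {label: sequence_logprob(lps) for label, lps in candidates.items()}
label = max(scores, key=scores.get)   # pick the higher-scoring completion
```

The same scoring trick underlies confidence estimation: a completion whose summed log-prob is very low is one the model itself found unlikely.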
