The model's final layer produces a vector of size V (vocabulary size, typically 32K–128K). Each element is a logit for that token. Softmax converts these to probabilities: P(token_i) = exp(logit_i) / ∑ exp(logit_j). Before softmax, the logits can be any real number — positive, negative, or zero. A logit of 10 vs. 5 means the model considers the first token about e^5 ≈ 150x more likely.
Several techniques work directly on logits. Temperature divides all logits by T before softmax (T<1 sharpens, T>1 flattens). Top-k zeroes out all logits except the k highest. Top-p (nucleus sampling) zeroes out logits for tokens outside the smallest set whose cumulative probability exceeds p. Logit bias adds a fixed offset to specific tokens' logits — adding +10 to the logit for "JSON" makes the model strongly prefer starting with JSON. Repetition penalty reduces logits of recently generated tokens.
Most APIs can return log-probabilities (log of the softmax output) alongside generated tokens. These are useful for: measuring model confidence (low log-prob = uncertain), calibrating outputs (are 90%-confident predictions correct 90% of the time?), and building classifiers from LLMs (compare log-probs of different completions). Log-probs are more numerically stable than raw probabilities for extreme values.