Loss Function

Also known as: Objective Function, Cost Function
A mathematical function that measures how wrong a model's predictions are. The model's sole objective during training is to minimize this number. For LLMs, the loss function is usually cross-entropy: it measures how "surprised" the model is by the actual next token, relative to the probability distribution it predicted. The lower the loss, the closer the model's predictions are to reality.
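Written out as a formula (this is the standard next-token cross-entropy; the notation is mine, not the wiki's): for a token sequence $x_1, \dots, x_T$,

$$\mathcal{L} = -\frac{1}{T} \sum_{t=1}^{T} \log p_\theta\left(x_t \mid x_{<t}\right)$$

where $p_\theta(x_t \mid x_{<t})$ is the probability the model (with parameters $\theta$) assigns to the true token $x_t$ given the preceding tokens.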

Why It Matters

The loss function is training's compass. Everything a model learns, it learns in service of driving this one number down. Choosing the wrong loss function means the model optimizes for the wrong thing. Understanding loss helps you read training curves, diagnose problems (plateauing loss? divergence? overfitting?), and understand why a model behaves the way it does.

Deep Dive

Cross-entropy loss for language models works like this: at each position in the text, the model predicts a probability distribution over its entire vocabulary. The loss is the negative log probability assigned to the actual next token. If the model predicted the correct token with 90% probability, loss is low (−log(0.9) ≈ 0.1). If it predicted the correct token with 1% probability, loss is high (−log(0.01) ≈ 4.6). Summing across all positions gives the total loss; in practice, training curves usually plot the mean per-token loss.
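A minimal sketch of this computation, assuming PyTorch; the names (logits, targets, vocab_size) are illustrative, not from this wiki:

```python
import torch
import torch.nn.functional as F

vocab_size = 8   # toy vocabulary
seq_len = 4      # number of token positions

# Stand-in for model outputs: one row of logits per position.
logits = torch.randn(seq_len, vocab_size)
# The actual next token at each position.
targets = torch.randint(0, vocab_size, (seq_len,))

# Manual computation: softmax, pick the true token's probability, take -log.
probs = F.softmax(logits, dim=-1)
token_loss = -torch.log(probs[torch.arange(seq_len), targets])
print(token_loss)         # per-position loss
print(token_loss.mean())  # mean per-token loss, what training curves plot

# The built-in does the same thing (mean reduction by default).
print(F.cross_entropy(logits, targets))
```

Note that the manual version and F.cross_entropy agree: the built-in just fuses the softmax and the negative log into one numerically stable step.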

Perplexity: Loss Made Intuitive

Perplexity is just 2^(cross-entropy loss), or equivalently e^(loss) when using natural log. It represents "how many options the model is effectively choosing between at each token." A perplexity of 10 means the model is as uncertain as if it were picking randomly among 10 equally likely tokens. Lower perplexity = more confident and accurate predictions. It's the standard metric for comparing language models' raw text modeling ability.
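A tiny worked example, assuming the loss was computed with the natural log (as PyTorch's cross_entropy does); the value of loss here is made up for illustration:

```python
import math

# Mean cross-entropy loss in nats, e.g. read off a training log.
loss = 2.3026
perplexity = math.exp(loss)
print(perplexity)  # ≈ 10.0: as uncertain as picking among ~10 equally likely tokens
```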

Loss Isn't Everything

A lower loss doesn't always mean a better model for users. A model with slightly higher loss but better alignment (via RLHF/DPO) is usually more useful than a model with minimal loss but no alignment. Loss measures how well the model predicts text; alignment measures how well it follows instructions and avoids harm. The gap between "good at predicting text" and "good at being helpful" is what post-training addresses.
