
Loss Function

Objective Function, Cost Function
A mathematical function that measures how wrong a model's predictions are. The model's sole objective during training is to minimize this number. For LLMs, the loss function is usually cross-entropy, which measures how "surprised" the model is by the actual next token, relative to the probability distribution it predicted. The lower the loss, the closer the model's predictions are to reality.

Why It Matters

The loss function is training's compass. Everything a model learns, it learns in order to reduce this one number. Choosing the wrong loss function means the model is optimizing for the wrong thing. Understanding loss helps you read training curves, diagnose problems (is the loss plateauing? diverging? overfitting?), and make sense of why a model behaves the way it does.

Deep Dive

Cross-entropy loss for language models works like this: at each position in the text, the model predicts a probability distribution over its entire vocabulary. The loss is the negative log probability assigned to the actual next token. If the model predicted the correct token with 90% probability, loss is low (−log(0.9) ≈ 0.1). If it predicted the correct token with 1% probability, loss is high (−log(0.01) ≈ 4.6). Summing across all positions gives the total loss.
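The per-position calculation above can be sketched in a few lines of plain Python. This is a toy illustration with a made-up two-token vocabulary, not a real model's output; real implementations work on logits over tens of thousands of tokens and typically report the average loss per token.

```python
import math

def cross_entropy_loss(predicted_probs, targets):
    """Average negative log probability of the actual next tokens.

    predicted_probs: one dict per position mapping token -> predicted
    probability; targets: the actual next token at each position.
    """
    total = 0.0
    for probs, target in zip(predicted_probs, targets):
        total += -math.log(probs[target])  # low when the model was confident and right
    return total / len(targets)

# Position 0: model gave the correct token 90% -> small loss (~0.11).
# Position 1: model gave the correct token 1%  -> large loss (~4.61).
preds = [{"cat": 0.9, "dog": 0.1}, {"sat": 0.01, "ran": 0.99}]
targets = ["cat", "sat"]
print(round(cross_entropy_loss(preds, targets), 2))  # → 2.36
```

Note this averages over positions rather than summing; both conventions appear in practice, and the average is what training logs usually report.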

Perplexity: Loss Made Intuitive

Perplexity is just 2^(cross-entropy loss), or equivalently e^(loss) when using natural log. It represents "how many options the model is effectively choosing between at each token." A perplexity of 10 means the model is as uncertain as if it were picking randomly among 10 equally likely tokens. Lower perplexity = more confident and accurate predictions. It's the standard metric for comparing language models' raw text modeling ability.
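The "effectively choosing between N options" intuition follows directly from the formula: a model that always assigns the correct token probability 1/N has loss −ln(1/N) = ln(N), and e^(ln N) = N. A minimal sketch:

```python
import math

def perplexity(avg_loss_nats):
    """Perplexity is e^(average cross-entropy loss), when loss is in nats."""
    return math.exp(avg_loss_nats)

# A model that always gives the correct token probability 0.1 has
# loss -ln(0.1) ≈ 2.303 and perplexity 10: as uncertain as picking
# randomly among 10 equally likely tokens.
loss = -math.log(0.1)
print(round(perplexity(loss), 1))  # → 10.0
```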

Loss Isn't Everything

A lower loss doesn't always mean a better model for users. A model with slightly higher loss but better alignment (via RLHF/DPO) is usually more useful than a model with minimal loss but no alignment. Loss measures how well the model predicts text; alignment measures how well it follows instructions and avoids harm. The gap between "good at predicting text" and "good at being helpful" is what post-training addresses.
