Training

Early Stopping

Patience, Validation-Based Stopping
Stop training when performance on a held-out validation set stops improving, rather than training for a fixed number of steps. As training continues, training loss keeps falling, but validation loss eventually starts to rise: the model is overfitting the training data. Early stopping catches this inflection point and saves the best model before quality degrades.

Why It Matters

Early stopping is the simplest and most effective regularization technique for fine-tuning. Without it, you risk training too long and destroying the capabilities you want to preserve. With it, the model stops automatically at its best point. The "patience" parameter (the number of evaluations without improvement before stopping) is one of the most important hyperparameters in fine-tuning.

Deep Dive

The process: (1) split your data into training and validation sets, (2) evaluate on the validation set periodically during training, (3) track the best validation metric (loss, accuracy, F1), (4) if the metric hasn't improved for N evaluations (patience), stop training and revert to the checkpoint with the best validation score. This prevents the model from memorizing training data beyond the point where it helps generalization.
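The loop above can be sketched in plain Python. This is an illustrative sketch, not a specific library's API: `EarlyStopper`, its parameters, and the canned loss curve are all made up for the example.

```python
# Minimal early-stopping sketch. EarlyStopper and the surrounding loop
# are illustrative names, not a specific library's API.

class EarlyStopper:
    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience      # evals without improvement tolerated
        self.min_delta = min_delta    # minimum change that counts as improvement
        self.best_loss = float("inf")
        self.best_step = None
        self.bad_evals = 0

    def update(self, step, val_loss):
        """Record one evaluation; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss  # new best: the checkpoint would be saved here
            self.best_step = step
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience


# Usage: feed it a sequence of validation losses (a canned curve that
# improves, then starts to overfit).
stopper = EarlyStopper(patience=2)
losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.60]
for step, loss in enumerate(losses):
    if stopper.update(step, loss):
        break
print(stopper.best_step, stopper.best_loss)  # best checkpoint: step 3, loss 0.50
```

Training stops two evaluations after the best score, and the run reverts to the checkpoint from that best step rather than the last one.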

In LLM Fine-Tuning

For LLM fine-tuning, early stopping is especially important because catastrophic forgetting can destroy base model capabilities. A model fine-tuned for too long on customer support data might become great at support but lose its ability to do math or write code. Monitoring validation loss across multiple task types (not just the fine-tuning task) helps catch this. Typical fine-tuning runs are 1–5 epochs with patience of 2–3 evaluations.
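With the Hugging Face `transformers` Trainer, this pattern is wired up via `EarlyStoppingCallback`. The configuration sketch below assumes `model`, `train_dataset`, and `eval_dataset` already exist; the output directory and step counts are placeholder values.

```python
from transformers import Trainer, TrainingArguments, EarlyStoppingCallback

# Configuration sketch: model, datasets, output_dir, and step counts
# are placeholders for your own fine-tuning run.
args = TrainingArguments(
    output_dir="finetune-out",
    num_train_epochs=5,             # upper bound; early stopping usually halts sooner
    evaluation_strategy="steps",    # evaluate on the validation set periodically
    eval_steps=200,
    save_strategy="steps",          # checkpoint on the same schedule as eval
    save_steps=200,
    load_best_model_at_end=True,    # revert to the best checkpoint when done
    metric_for_best_model="eval_loss",
    greater_is_better=False,        # lower validation loss is better
)

trainer = Trainer(
    model=model,                    # placeholder: your fine-tuning model
    args=args,
    train_dataset=train_dataset,    # placeholder datasets
    eval_dataset=eval_dataset,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=2)],
)
trainer.train()
```

Note that `load_best_model_at_end=True` is required for the callback to work, and the save and evaluation schedules must match so the best-scoring evaluation has a corresponding checkpoint to revert to.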

Not Used in Pre-Training

Interestingly, LLM pre-training rarely uses early stopping. The training runs are so expensive and the datasets so large that models typically train for a predetermined number of tokens (based on scaling laws). Overfitting is rare during pre-training because the model usually never sees the same data twice. Early stopping is primarily a fine-tuning and classical ML technique.

Related Concepts
