Training

Early Stopping

Patience, Validation-Based Stopping
Stopping training when performance on a held-out validation set stops improving, rather than running for a fixed number of steps. As training continues, training loss keeps falling, but validation loss eventually starts to rise: the model is overfitting to the training data. Early stopping catches this turning point and keeps the checkpoint with the best validation score, from before quality began to degrade.

Why It Matters

Early stopping is among the simplest and most effective regularization techniques for fine-tuning. Without it, you risk training for too long and destroying the capabilities you wanted to preserve. With it, training stops automatically near the model's best point. The "patience" parameter (how many evaluations without improvement to allow before stopping) is one of the most important hyperparameters in fine-tuning.

Deep Dive

The process: (1) split your data into training and validation sets, (2) evaluate on the validation set periodically during training, (3) track the best validation metric (loss, accuracy, F1), (4) if the metric hasn't improved for N evaluations (patience), stop training and revert to the checkpoint with the best validation score. This prevents the model from memorizing training data beyond the point where it helps generalization.
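
The four steps above can be sketched as a small bookkeeping class. This is a minimal illustration rather than any particular library's API: the names `EarlyStopper`, `min_delta`, and `run` are hypothetical, the metric is assumed to be a validation loss where lower is better, and checkpoint saving is reduced to remembering the best step.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class EarlyStopper:
    """Tracks the best validation loss and counts evaluations without improvement."""
    patience: int = 3          # evaluations without improvement to tolerate
    min_delta: float = 0.0     # assumed extra knob: minimum drop that counts as improvement
    best_loss: float = float("inf")
    best_step: Optional[int] = None
    bad_evals: int = 0

    def update(self, step: int, val_loss: float) -> bool:
        """Record one evaluation; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            # New best: in a real run, this is where a checkpoint would be saved.
            self.best_loss = val_loss
            self.best_step = step
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience


def run(val_losses: Sequence[float], patience: int = 3) -> Tuple[int, Optional[int], float]:
    """Replay a sequence of validation losses.

    Returns (step at which training stopped, step of best checkpoint, best loss).
    """
    stopper = EarlyStopper(patience=patience)
    last_step = -1
    for step, loss in enumerate(val_losses):
        last_step = step
        if stopper.update(step, loss):
            break
    return last_step, stopper.best_step, stopper.best_loss


if __name__ == "__main__":
    # Validation loss bottoms out at step 2, then fails to improve three times.
    print(run([1.0, 0.8, 0.7, 0.72, 0.71, 0.75, 0.6], patience=3))
```

In this replay, training stops at step 5 and the run would revert to the checkpoint from step 2. The later loss of 0.6 is never seen, which illustrates the trade-off in choosing patience: too small a value can stop before a late improvement, too large a value wastes compute.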

In LLM Fine-Tuning

For LLM fine-tuning, early stopping is especially important because catastrophic forgetting can destroy base model capabilities. A model fine-tuned for too long on customer support data might become great at support but lose its ability to do math or write code. Monitoring validation loss across multiple task types (not just the fine-tuning task) helps catch this. Typical fine-tuning runs are 1–5 epochs with patience of 2–3 evaluations.
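
One way to monitor several task types is to compare each held-out task's validation loss against its value before fine-tuning began. The sketch below is an assumption-laden illustration: the function name `regressed_tasks`, the task names, the loss values, and the 5% relative tolerance are all hypothetical and not taken from any specific framework or recommended setting.

```python
from typing import Dict, List


def regressed_tasks(
    baseline: Dict[str, float],
    current: Dict[str, float],
    rel_tolerance: float = 0.05,
) -> List[str]:
    """Return tasks whose validation loss rose more than rel_tolerance above baseline.

    baseline: per-task validation loss of the base model before fine-tuning.
    current:  per-task validation loss at the latest evaluation.
    Assumes losses are positive and lower is better.
    """
    return sorted(
        task
        for task, base_loss in baseline.items()
        if current[task] > base_loss * (1.0 + rel_tolerance)
    )


if __name__ == "__main__":
    # Hypothetical numbers: math is within tolerance, code has degraded.
    baseline = {"math": 1.20, "code": 0.90}
    current = {"math": 1.22, "code": 1.05}
    print(regressed_tasks(baseline, current))
```

A non-empty result could be treated as an additional stopping signal alongside patience on the fine-tuning task itself, so that training halts when general capabilities start to slip even if the target metric is still improving.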

Not Used in Pre-Training

Interestingly, LLM pre-training rarely uses early stopping. The training runs are so expensive and the datasets so large that models typically train for a predetermined number of tokens (based on scaling laws). Overfitting is rare during pre-training because the model usually never sees the same data twice. Early stopping is primarily a fine-tuning and classical ML technique.
