Training

Validation Set

Dev Set, Hold-Out Set
A subset of data held back from training, used to evaluate model performance and tune hyperparameters during development. In a three-way split, the training set trains the model, the validation set guides decisions about the model (learning rate, architecture, when to stop training), and the test set provides the final, unbiased performance estimate. The validation set is your mirror during development.

Why It Matters

Without a validation set, you are flying blind. Training loss tells you how well the model fits the training data, but not how well it generalizes. The validation set answers the question that actually matters: "How will this model perform on data it has never seen?" Every decision made during model development (hyperparameters, architecture choices, training duration) should be evaluated on the validation set.
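One such decision, training duration, is often made by early stopping: halt when validation loss stops improving. A minimal sketch with hypothetical loss values (the `patience` parameter and the losses themselves are illustrative, not from the article):

```python
# Early-stopping sketch: stop when validation loss has not improved
# for `patience` consecutive epochs (a sign of overfitting).
def early_stop_epoch(val_losses, patience=2):
    """Return the epoch index at which to stop, or None if never triggered."""
    best = float("inf")
    epochs_since_best = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            epochs_since_best = 0
        else:
            epochs_since_best += 1
            if epochs_since_best >= patience:
                return epoch
    return None

# Validation loss bottoms out at epoch 2, then rises: stop at epoch 4.
print(early_stop_epoch([0.9, 0.7, 0.6, 0.65, 0.7]))  # → 4
```

Note that the decision uses validation loss, never test loss: the test set stays untouched until the very end.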

Deep Dive

Typical splits: 80% training, 10% validation, 10% test. For large datasets, smaller percentages for validation and test suffice (even 1% of a million examples is 10,000 — plenty for reliable evaluation). For small datasets, cross-validation is preferred (see: Cross-Validation). The key rule: never use the test set for any decision during development. It's only for the final evaluation. If you peek at the test set during development, your performance estimate becomes biased.
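The 80/10/10 split above can be sketched in a few lines of plain Python (function name and fractions are illustrative; in practice a library utility such as scikit-learn's `train_test_split` is typical):

```python
import random

def three_way_split(data, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle indices and split into train/validation/test (80/10/10 by default)."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)  # fixed seed makes the split reproducible
    n_test = int(len(data) * test_frac)
    n_val = int(len(data) * val_frac)
    test = [data[i] for i in idx[:n_test]]
    val = [data[i] for i in idx[n_test:n_test + n_val]]
    train = [data[i] for i in idx[n_test + n_val:]]
    return train, val, test

train, val, test = three_way_split(list(range(1000)))
print(len(train), len(val), len(test))  # → 800 100 100
```

Splitting by shuffled indices guarantees the three sets are disjoint, which is the whole point: an example the model was trained on must never leak into validation or test.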

Stratification

When splitting data, ensure each split has a representative distribution of classes, domains, and other important characteristics. If your dataset is 90% English and 10% French, a random split might put all French examples in the training set, leaving you unable to evaluate French performance. Stratified splitting ensures proportional representation in each split. For time-series data, use temporal splits (train on past, validate on future) rather than random splits.
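A minimal sketch of stratified splitting for the English/French scenario above (names and fractions are illustrative; scikit-learn's `train_test_split(..., stratify=labels)` does this in practice):

```python
import random
from collections import defaultdict

def stratified_split(labels, val_frac=0.2, seed=0):
    """Split indices so each label keeps the same proportion in both splits."""
    by_label = defaultdict(list)
    for i, y in enumerate(labels):
        by_label[y].append(i)
    rng = random.Random(seed)
    train_idx, val_idx = [], []
    for y, idxs in by_label.items():
        rng.shuffle(idxs)
        n_val = round(len(idxs) * val_frac)  # take val_frac from *each* class
        val_idx += idxs[:n_val]
        train_idx += idxs[n_val:]
    return train_idx, val_idx

labels = ["en"] * 90 + ["fr"] * 10   # 90% English, 10% French
train_idx, val_idx = stratified_split(labels, val_frac=0.2)
# Both splits keep the 90/10 language ratio: 2 of the 20 val examples are French.
print(sum(labels[i] == "fr" for i in val_idx))  # → 2
```

A plain random split could easily have left zero French examples in validation; stratifying per label makes that impossible.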

Validation in LLM Development

For LLM pre-training, the validation set is a held-out portion of the training corpus, used to compute perplexity during training. For fine-tuning, it's a held-out portion of the fine-tuning dataset. For alignment (RLHF/DPO), validation is more complex: automated metrics (reward model scores) plus human evaluation on held-out prompts. The validation strategy should match how the model will actually be used — if users will ask diverse questions, the validation set should be diverse.
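The validation perplexity mentioned above is just the exponential of the mean per-token negative log-likelihood on the held-out slice. A minimal sketch (the NLL values are hypothetical; in a real run they come from the model's loss on validation batches):

```python
import math

def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    return math.exp(sum(token_nlls) / len(token_nlls))

# Hypothetical per-token NLLs (in nats) from a held-out validation slice.
held_out_nlls = [2.1, 1.8, 2.4, 2.0]
print(perplexity(held_out_nlls))
```

Tracking this number during pre-training is what lets you see generalization plateau or degrade while training loss keeps falling.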

