Zubnet AI學習Wiki › Validation Set

Validation Set

Dev Set, Hold-Out Set
A subset of the data held out from training, used during development to evaluate model performance and tune hyperparameters. In a three-way split, the training set trains the model, the validation set guides decisions about the model (learning rate, architecture, when to stop), and the test set provides the final unbiased performance estimate. The validation set is your mirror during development.

Why It Matters

Without a validation set you are flying blind. Training loss tells you how well the model fits the training data, but not how well it generalizes. The validation set answers the question that actually matters: "How will this model perform on data it has never seen?" Every decision made during model development (hyperparameters, architecture choices, training duration) should be evaluated on the validation set.
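One of the most common validation-guided decisions is when to stop training. A minimal sketch of early stopping, using a hypothetical sequence of per-epoch validation losses (the loss values and the `patience` parameter are illustrative assumptions, not from the original text):

```python
def train_with_early_stopping(val_losses, patience=2):
    """Return (best_epoch, best_loss), stopping once validation loss
    has not improved for `patience` consecutive epochs."""
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            break  # validation loss stalled: stop training here
    return best_epoch, best_loss

# Training loss may keep falling, but validation loss turns up at epoch 3,
# so the best checkpoint is epoch 2 with loss 0.6.
epoch, loss = train_with_early_stopping([0.9, 0.7, 0.6, 0.65, 0.7, 0.8])
```

In a real training loop you would checkpoint the model at each improvement and restore the weights from `best_epoch` after stopping.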

Deep Dive

Typical splits: 80% training, 10% validation, 10% test. For large datasets, smaller percentages for validation and test suffice (even 1% of a million examples is 10,000 — plenty for reliable evaluation). For small datasets, cross-validation is preferred (see: Cross-Validation). The key rule: never use the test set for any decision during development. It's only for the final evaluation. If you peek at the test set during development, your performance estimate becomes biased.
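The 80/10/10 split above can be sketched in a few lines of plain Python. The fixed seed is an assumption added for reproducibility; the exact cut points would differ in any real pipeline:

```python
import random

def train_val_test_split(items, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle and cut a dataset into train/validation/test splits
    (defaults give the typical 80/10/10)."""
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed => reproducible split
    n = len(items)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]
    return train, val, test

# 1,000 examples -> 800 train, 100 validation, 100 test
train, val, test = train_val_test_split(range(1000))
```

The test slice is carved out first and then never touched again during development, matching the "only for final evaluation" rule.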

Stratification

When splitting data, ensure each split has a representative distribution of classes, domains, and other important characteristics. If your dataset is 90% English and 10% French, a random split might put all French examples in the training set, leaving you unable to evaluate French performance. Stratified splitting ensures proportional representation in each split. For time-series data, use temporal splits (train on past, validate on future) rather than random splits.
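A minimal sketch of stratified splitting for the English/French scenario above: group by label, then sample each group proportionally so no class can vanish from a split. The 90/10 toy data and the `label_fn` helper are illustrative assumptions:

```python
import random
from collections import defaultdict

def stratified_split(examples, label_fn, val_frac=0.1, seed=0):
    """Split so every label appears in validation in proportion
    to its share of the full dataset (at least one example each)."""
    by_label = defaultdict(list)
    for ex in examples:
        by_label[label_fn(ex)].append(ex)
    rng = random.Random(seed)
    train, val = [], []
    for label, group in by_label.items():
        rng.shuffle(group)
        n_val = max(1, round(len(group) * val_frac))  # keep >= 1 per class
        val.extend(group[:n_val])
        train.extend(group[n_val:])
    return train, val

# 90% "en", 10% "fr": stratification guarantees French examples
# land in both the training and validation splits.
data = [("en", i) for i in range(90)] + [("fr", i) for i in range(10)]
train, val = stratified_split(data, label_fn=lambda ex: ex[0])
```

A purely random split on this data could, by chance, leave zero French examples in validation; the per-group sampling rules that out.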

Validation in LLM Development

For LLM pre-training, the validation set is a held-out portion of the training corpus, used to compute perplexity during training. For fine-tuning, it's a held-out portion of the fine-tuning dataset. For alignment (RLHF/DPO), validation is more complex: automated metrics (reward model scores) plus human evaluation on held-out prompts. The validation strategy should match how the model will actually be used — if users will ask diverse questions, the validation set should be diverse.
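The validation perplexity mentioned above is just the exponential of the mean negative log-likelihood per held-out token. A small sketch, with the token log-probabilities as assumed inputs:

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)

# A model that assigns every held-out token probability 0.25
# has perplexity exactly 4: on average it is "choosing" among
# 4 equally likely tokens.
pp = perplexity([math.log(0.25)] * 8)
```

Tracking this number on the held-out corpus during pre-training is what lets you see generalization improve (or stall) independently of training loss.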

Related Concepts
