Training

Cross-Validation

K-Fold CV, Leave-One-Out
A technique for estimating model performance when there isn't enough data for a separate test set. K-fold cross-validation splits the data into K equal parts, trains on K−1 parts and evaluates on the remaining one, rotating through all K folds so that every data point is used for both training and evaluation. The average of the K fold scores gives a more reliable performance estimate than a single train/test split.
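A minimal sketch of this idea using scikit-learn's cross_val_score; the synthetic dataset and LogisticRegression model are stand-ins for illustration, not part of the original entry:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a small dataset (~500 examples).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

model = LogisticRegression(max_iter=1000)

# 5-fold CV: each example is held out for evaluation exactly once.
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print(scores)                       # one accuracy score per fold
print(scores.mean(), scores.std())  # the estimate and its spread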

Why It Matters

Cross-validation is essential when data is scarce: if you only have 500 examples, holding out 100 for testing means training on 20% less data. Cross-validation lets all of the data serve both training and evaluation. It also gives you a confidence interval (the variance across folds) rather than a single number, telling you how stable the model's performance is.

Deep Dive

5-fold CV: split data into 5 parts. Train on parts 1-4, evaluate on part 5. Then train on parts 1-3+5, evaluate on part 4. Repeat for all 5 folds. Average the 5 evaluation scores. The result is more reliable than a single 80/20 split because it's robust to the particular split — a "lucky" or "unlucky" test set can't skew the result. The standard deviation across folds indicates reliability.
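A sketch of that rotation written out by hand (the dataset and model are the same illustrative assumptions as above), showing how the per-fold scores are collected and how their standard deviation is read as a reliability signal:

import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
model = LogisticRegression(max_iter=1000)

kf = KFold(n_splits=5, shuffle=True, random_state=42)
scores = []
for train_idx, test_idx in kf.split(X):
    # Train on 4 folds, evaluate on the held-out fold.
    fold_model = clone(model)
    fold_model.fit(X[train_idx], y[train_idx])
    scores.append(fold_model.score(X[test_idx], y[test_idx]))

scores = np.array(scores)
print(f"mean accuracy:     {scores.mean():.3f}")
print(f"std across folds:  {scores.std():.3f}")  # low std = stable performance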

Stratified K-Fold

For classification with imbalanced classes (rare disease: 5% positive, 95% negative), random splitting might put all positives in one fold. Stratified K-fold ensures each fold has the same class distribution as the full dataset. This prevents folds with no positive examples (useless for evaluation) and gives more reliable performance estimates for minority classes. Always use stratified K-fold for classification.
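A hedged comparison of plain KFold and StratifiedKFold on an imbalanced toy problem (the ~5% positive rate mirrors the rare-disease example; the dataset itself is assumed for illustration). Stratified splitting keeps the number of positives per test fold roughly constant:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, StratifiedKFold

# Imbalanced problem: roughly 5% positives, 95% negatives.
X, y = make_classification(n_samples=1000, n_features=20,
                           weights=[0.95, 0.05], random_state=0)

splitters = [
    ("KFold", KFold(n_splits=5, shuffle=True, random_state=0)),
    ("StratifiedKFold", StratifiedKFold(n_splits=5, shuffle=True, random_state=0)),
]
for name, splitter in splitters:
    # Count positives in each test fold; stratification keeps the ratio stable.
    positives = [int(y[test_idx].sum()) for _, test_idx in splitter.split(X, y)]
    print(name, positives)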

When Not to Use It

Cross-validation is computationally expensive (K times the training cost) and rarely used for large models. Fine-tuning a 7B model 5 times for 5-fold CV is impractical. For LLMs, a single held-out validation set is standard because: the datasets are large enough for reliable single-split evaluation, training is expensive, and the model's pre-trained representations make it less sensitive to the specific training split. Cross-validation is most valuable for small datasets with classical ML models.
