Cross-Validation: Definition & Meaning — AI Wiki

A technique for evaluating model performance when you don't have enough data for a separate test set. K-fold cross-validation splits data into K equal parts, trains on K−1 parts and evaluates on the remaining part, rotating K times so every data point is used for both training and evaluation. The average score across all K folds gives a more reliable performance estimate than a single train/test split.

Why it matters

Cross-validation is essential when data is scarce — if you only have 500 examples, setting aside 100 for testing means training on 20% less data. Cross-validation uses all data for both training and evaluation. It also gives you a confidence interval (variance across folds) rather than a single number, telling you how stable your model's performance is.

Deep Dive

5-fold CV: split data into 5 parts. Train on parts 1-4, evaluate on part 5. Then train on parts 1-3+5, evaluate on part 4. Repeat for all 5 folds. Average the 5 evaluation scores. The result is more reliable than a single 80/20 split because it's robust to the particular split — a "lucky" or "unlucky" test set can't skew the result. The standard deviation across folds indicates reliability.

Stratified K-Fold

For classification with imbalanced classes (rare disease: 5% positive, 95% negative), random splitting might put all positives in one fold. Stratified K-fold ensures each fold has the same class distribution as the full dataset. This prevents folds with no positive examples (useless for evaluation) and gives more reliable performance estimates for minority classes. Always use stratified K-fold for classification.

When Not to Use It

Cross-validation is computationally expensive (K times the training cost) and rarely used for large models. Fine-tuning a 7B model 5 times for 5-fold CV is impractical. For LLMs, a single held-out validation set is standard because: the datasets are large enough for reliable single-split evaluation, training is expensive, and the model's pre-trained representations make it less sensitive to the specific training split. Cross-validation is most valuable for small datasets with classical ML models.

Cross-Validation

Why it matters

Deep Dive

Stratified K-Fold

When Not to Use It

Related Concepts