
Cross-Validation

K-Fold CV, Leave-One-Out
A technique for estimating model performance when you don't have enough data for a separate test set. K-fold cross-validation splits the data into K equal parts, trains on K−1 parts, and evaluates on the remaining part, rotating K times so that every data point is used for both training and evaluation. The average score across the K folds gives a more reliable performance estimate than a single train/test split.
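The rotation described above can be sketched in a few lines. This is a minimal from-scratch version (the function name `kfold_indices` is illustrative; in practice you would use a library utility such as scikit-learn's `KFold`):

```python
import numpy as np

def kfold_indices(n_samples, k, seed=0):
    """Split sample indices into k roughly equal folds after shuffling."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    # Each fold serves as the held-out evaluation set exactly once.
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx

for train_idx, test_idx in kfold_indices(10, k=5):
    print(len(train_idx), len(test_idx))  # 8 2 on every iteration
```

Note that every point appears in exactly one test fold, so the K evaluation scores together cover the whole dataset.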

Why It Matters

Cross-validation is essential when data is scarce: if you only have 500 examples, reserving 100 for testing means training on 20% less data. Cross-validation uses every example for both training and evaluation. It also gives you a spread (the variance across folds) instead of a single number, telling you how stable your model's performance actually is.

Deep Dive

5-fold CV: split data into 5 parts. Train on parts 1-4, evaluate on part 5. Then train on parts 1-3+5, evaluate on part 4. Repeat for all 5 folds. Average the 5 evaluation scores. The result is more reliable than a single 80/20 split because it's robust to the particular split — a "lucky" or "unlucky" test set can't skew the result. The standard deviation across folds indicates reliability.

Stratified K-Fold

For classification with imbalanced classes (rare disease: 5% positive, 95% negative), random splitting might put all positives in one fold. Stratified K-fold ensures each fold has the same class distribution as the full dataset. This prevents folds with no positive examples (useless for evaluation) and gives more reliable performance estimates for minority classes. Always use stratified K-fold for classification.
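A minimal sketch of the stratification idea, assuming the 5%/95% rare-disease split from the paragraph above (real projects would use scikit-learn's `StratifiedKFold`; the function name `stratified_kfold` here is illustrative):

```python
import numpy as np

def stratified_kfold(y, k, seed=0):
    """Yield (train, test) indices where each fold preserves class proportions."""
    rng = np.random.default_rng(seed)
    # Shuffle and split each class's indices into k chunks independently,
    # then assemble fold i from chunk i of every class.
    per_class = [np.array_split(rng.permutation(np.where(y == c)[0]), k)
                 for c in np.unique(y)]
    for i in range(k):
        test = np.concatenate([chunks[i] for chunks in per_class])
        train = np.concatenate([chunks[j] for chunks in per_class
                                for j in range(k) if j != i])
        yield train, test

# Imbalanced labels: 10 positives out of 200 (5%). A naive random split could
# leave some folds with zero positives; stratification guarantees each has 2.
y = np.array([1] * 10 + [0] * 190)
for train, test in stratified_kfold(y, 5):
    print(int(y[test].sum()), len(test))  # 2 40 on every iteration
```

Because each class is partitioned separately, every fold inherits the full dataset's 5%/95% ratio, so evaluation on minority classes is possible in every rotation.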

When Not to Use It

Cross-validation is computationally expensive (K times the training cost) and rarely used for large models. Fine-tuning a 7B model 5 times for 5-fold CV is impractical. For LLMs, a single held-out validation set is standard because: the datasets are large enough for reliable single-split evaluation, training is expensive, and the model's pre-trained representations make it less sensitive to the specific training split. Cross-validation is most valuable for small datasets with classical ML models.

Related Concepts
