
Hyperparameter Tuning

HPO, Hyperparameter Optimization, Grid Search
Systematically searching for the best hyperparameters: the configuration choices that are not learned during training but must be set before it begins. Learning rate, batch size, number of layers, dropout rate, and LoRA rank are all hyperparameters. Tuning methods include grid search (try every combination), random search (try random combinations), and Bayesian optimization (use past results to guide the search).

Why It Matters

The difference between a good and a bad set of hyperparameters can be enormous: the wrong learning rate can make training diverge, or converge to a poor solution. Hyperparameter tuning is how you get the most out of your model architecture and your data. For LLM fine-tuning, the learning rate and the number of epochs are typically the highest-impact hyperparameters to tune.

Deep Dive

Grid search evaluates every combination of specified hyperparameter values: learning rates [1e-3, 1e-4, 1e-5] × batch sizes [16, 32, 64] = 9 experiments. It's exhaustive but exponentially expensive as more hyperparameters are added. Random search samples random combinations from specified ranges — surprisingly, it often finds better configurations than grid search because it explores the space more evenly (Bergstra & Bengio, 2012).
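Both strategies can be sketched in a few lines. Here `val_loss` is a hypothetical stand-in for a full training run (in practice each call would train a model and return its validation metric); the function and its shape are illustrative assumptions, not part of any library.

```python
import itertools
import random

# Hypothetical objective standing in for an expensive training run.
# Assumption: loss is minimized near lr=1e-4, batch_size=32.
def val_loss(lr, batch_size):
    return (lr - 1e-4) ** 2 * 1e6 + abs(batch_size - 32) / 320

learning_rates = [1e-3, 1e-4, 1e-5]
batch_sizes = [16, 32, 64]

# Grid search: evaluate every combination (3 x 3 = 9 experiments).
grid_results = {
    (lr, bs): val_loss(lr, bs)
    for lr, bs in itertools.product(learning_rates, batch_sizes)
}
best_grid = min(grid_results, key=grid_results.get)

# Random search: same budget, but sample lr log-uniformly from a
# continuous range instead of a fixed grid.
random.seed(0)
random_results = {}
for _ in range(9):
    lr = 10 ** random.uniform(-5, -3)  # log-uniform over [1e-5, 1e-3]
    bs = random.choice([16, 32, 64])
    random_results[(lr, bs)] = val_loss(lr, bs)
best_random = min(random_results, key=random_results.get)
```

Note that random search tries nine *distinct* learning rates where the grid only tries three, which is exactly why it explores the space more evenly when only a few hyperparameters actually matter.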

Bayesian Optimization

Bayesian optimization uses a probabilistic model (typically a Gaussian process or tree-based model) to predict which hyperparameters are likely to perform well based on past experiments, then prioritizes those regions. Libraries like Optuna, Ray Tune, and W&B Sweeps implement this. For expensive experiments (training a model takes hours), Bayesian optimization's efficiency advantage over random search is significant — it typically finds good configurations in 3–5x fewer experiments.
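A real Bayesian optimizer fits a probabilistic surrogate model; as a rough sketch of the exploit-past-results idea only (not Optuna's or Ray Tune's actual algorithm), the loop below spends most trials near the best configuration seen so far and a few on uniform exploration. `val_loss` is again a toy stand-in for an expensive training run, and all names here are illustrative.

```python
import math
import random

# Toy objective (assumption): loss is minimized around lr = 10**-3.5.
def val_loss(lr):
    return (math.log10(lr) + 3.5) ** 2

random.seed(0)

# Warm-up with a few random trials, as model-based optimizers do.
history = []  # list of (loss, lr) pairs
for _ in range(4):
    lr = 10 ** random.uniform(-5, -3)
    history.append((val_loss(lr), lr))
warmup_best = min(history)[0]

# Crude surrogate-guided loop: mostly exploit the neighborhood of the
# best result so far; occasionally explore the full range. Libraries
# like Optuna replace this heuristic with a fitted probabilistic model.
for _ in range(16):
    _, best_lr = min(history)
    if random.random() < 0.3:  # explore
        lr = 10 ** random.uniform(-5, -3)
    else:                      # exploit near the current best
        lr = 10 ** (math.log10(best_lr) + random.gauss(0, 0.3))
        lr = min(max(lr, 1e-5), 1e-3)  # clamp to the search range
    history.append((val_loss(lr), lr))

best_loss, best_lr = min(history)
```

Because new trials concentrate where past results were good, the best loss can only improve on the random warm-up, which is the efficiency advantage described above.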

Practical Tips

Start with established defaults for your architecture (published learning rates, batch sizes, etc.), then tune the most impactful parameters first. For LLM fine-tuning, learning rate is almost always the most important (try 1e-5 to 5e-4). For LoRA, rank (4–64) and alpha (typically 2× rank) matter most. Use early stopping to cut unpromising experiments short. Log everything to W&B or similar — you'll want to compare runs and understand what worked.
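The early-stopping tip can be sketched as a simple median-pruning rule, the same idea behind pruners in tools like Optuna. This helper and its names are ours for illustration, not a library API.

```python
import statistics

def should_prune(current_loss, peer_losses_at_same_epoch):
    """Stop a run early if its validation loss is worse than the
    median of previously completed runs at the same epoch."""
    if len(peer_losses_at_same_epoch) < 2:
        return False  # not enough history to judge fairly
    return current_loss > statistics.median(peer_losses_at_same_epoch)

# Usage: peers reached losses [0.5, 0.6, 0.7] at epoch 2 (median 0.6),
# so a run sitting at 0.9 is cut short, while one at 0.4 continues.
prune_bad = should_prune(0.9, [0.5, 0.6, 0.7])   # True
prune_good = should_prune(0.4, [0.5, 0.6, 0.7])  # False
```

Pruning like this trades a small risk of killing a slow starter for a large saving in compute across the sweep.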
