Zubnet AILearnWiki › Hyperparameter Tuning
Training

Hyperparameter Tuning

HPO, Hyperparameter Optimization, Grid Search
Systematically searching for the best hyperparameters — the configuration choices that aren't learned during training but must be set before it starts. Learning rate, batch size, number of layers, dropout rate, and LoRA rank are all hyperparameters. Tuning methods include grid search (try all combinations), random search (try random combinations), and Bayesian optimization (use past results to guide the search).

Why it matters

The difference between a good and bad set of hyperparameters can be enormous — a wrong learning rate can make training diverge or converge to a poor solution. Hyperparameter tuning is how you get the most out of your model architecture and data. For fine-tuning LLMs, learning rate and number of epochs are typically the most impactful hyperparameters to tune.

Deep Dive

Grid search evaluates every combination of specified hyperparameter values: three learning rates [1e-3, 1e-4, 1e-5] × three batch sizes [16, 32, 64] = 9 experiments. It's exhaustive, but its cost grows exponentially as more hyperparameters are added. Random search samples random combinations from specified ranges. Surprisingly, it often finds better configurations than grid search: it tries more distinct values of each individual hyperparameter, which pays off when only a few of them actually drive performance (Bergstra & Bengio, 2012).
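To make the difference concrete, here is a minimal pure-Python sketch of both strategies on the same 9-experiment budget. The `val_loss` function is a hypothetical stand-in for actually training and evaluating a model:

```python
import itertools
import random

# Hypothetical objective: validation loss as a function of two
# hyperparameters. A real run would train and evaluate a model here.
def val_loss(lr, batch_size):
    return (lr - 1e-4) ** 2 + 0.001 * abs(batch_size - 32)

# Grid search: evaluate every combination of the listed values (3 x 3 = 9 runs).
grid_lrs = [1e-3, 1e-4, 1e-5]
grid_batches = [16, 32, 64]
grid_best = min(itertools.product(grid_lrs, grid_batches),
                key=lambda cfg: val_loss(*cfg))  # grid optimum here: (1e-4, 32)

# Random search: spend the same budget of 9 runs on samples drawn
# from ranges instead of a fixed grid (log-uniform for learning rate).
random.seed(0)
random_best = min(
    ((10 ** random.uniform(-5, -3), random.choice([16, 32, 64]))
     for _ in range(9)),
    key=lambda cfg: val_loss(*cfg),
)

print(grid_best)
print(random_best)
```

Note that random search can land on learning rates between the grid points, which is exactly where its advantage comes from when the optimum doesn't sit on the grid.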

Bayesian Optimization

Bayesian optimization uses a probabilistic model (typically a Gaussian process or a tree-structured estimator) to predict which hyperparameters are likely to perform well based on past experiments, then prioritizes those regions. Libraries like Optuna, Ray Tune, and W&B Sweeps implement this. When each experiment is expensive (training a model takes hours), this efficiency matters: Bayesian optimization typically reaches a good configuration in substantially fewer trials than random search.
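Purely to illustrate the "use past results to guide the search" loop, here is a toy pure-Python sketch. A nearest-neighbour lookup stands in for the probabilistic model (real libraries fit a Gaussian process or tree-structured estimator), and `val_loss` is a hypothetical objective:

```python
import math
import random

def val_loss(lr):
    # Hypothetical objective: pretend the best learning rate is 3e-4.
    return (math.log10(lr) - math.log10(3e-4)) ** 2

def surrogate(lr, history):
    # Toy stand-in for the probabilistic model: predict the loss of
    # the nearest past trial in log-learning-rate space.
    nearest = min(history, key=lambda t: abs(math.log10(t[0]) - math.log10(lr)))
    return nearest[1]

random.seed(0)
history = []  # (lr, observed loss) pairs from completed trials
for trial in range(20):
    if len(history) < 3 or random.random() < 0.2:
        # Explore: log-uniform random sample, as in random search.
        lr = 10 ** random.uniform(-5, -1)
    else:
        # Exploit: among random candidates, run the one the
        # surrogate predicts will perform best.
        candidates = [10 ** random.uniform(-5, -1) for _ in range(50)]
        lr = min(candidates, key=lambda c: surrogate(c, history))
    history.append((lr, val_loss(lr)))

best_lr, best_loss = min(history, key=lambda t: t[1])
print(best_lr, best_loss)
```

The exploration/exploitation mix (the 0.2 here is arbitrary) is the core tension in any model-guided search: exploit too hard and you get stuck near an early local optimum, explore too much and you're back to random search.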

Practical Tips

Start with established defaults for your architecture (published learning rates, batch sizes, etc.), then tune the most impactful parameters first. For LLM fine-tuning, learning rate is almost always the most important (try 1e-5 to 5e-4). For LoRA, rank (4–64) and alpha (typically 2× rank) matter most. Use early stopping to cut unpromising experiments short. Log everything to W&B or similar — you'll want to compare runs and understand what worked.
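The early-stopping tip above can be sketched as a small helper. The name `should_stop` and the patience-based rule are illustrative, not from any particular library:

```python
def should_stop(val_losses, patience=3):
    """Cut a run short once the best validation loss is `patience`
    evaluations old, i.e. no improvement for `patience` checks."""
    if len(val_losses) <= patience:
        return False
    best_idx = min(range(len(val_losses)), key=val_losses.__getitem__)
    return len(val_losses) - 1 - best_idx >= patience

print(should_stop([1.0, 0.8, 0.7, 0.71, 0.72, 0.73]))  # True: stuck for 3 evals
print(should_stop([1.0, 0.9, 0.8, 0.7, 0.6]))          # False: still improving
```

Tuning frameworks ship more aggressive versions of this idea (e.g. pruners that compare a run against other trials at the same step), but the principle is the same: stop spending compute on configurations that are clearly losing.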
