
Fine-tuning

Taking a pre-trained model and training it further on a smaller, specific dataset to specialize its behavior. Like taking a general practitioner and putting them through a surgical residency — same foundational knowledge, new expertise.

Why It Matters

Fine-tuning is how generic models become useful for specific tasks. A fine-tuned model can learn your company's tone, your domain's terminology, or a specific output format without starting from scratch.

Deep Dive

Fine-tuning works by continuing the training process on a new, typically much smaller dataset while starting from the pre-trained model's weights rather than random initialization. The mechanics are straightforward: you prepare your data as input-output pairs (or instruction-response pairs), set a low learning rate (usually 10x to 100x lower than pre-training), and train for a few epochs. The low learning rate is critical — too high and you destroy the knowledge the model learned during pre-training, a phenomenon called catastrophic forgetting. Too low and the model barely adapts to your new data. Finding the right balance is more art than science, and it often takes several runs to get right.
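The loop described above can be sketched with a toy linear model. This is a minimal illustration, not a real fine-tuning recipe: the "pre-trained" weights, the dataset, and the learning rate are all illustrative stand-ins.

```python
import numpy as np

# Toy sketch of continued training from pre-trained weights.
rng = np.random.default_rng(0)
w = rng.normal(size=3)                     # start from "pre-trained" weights, not random re-init

# Small task-specific dataset prepared as input-output pairs.
X = rng.normal(size=(8, 3))
y = X @ np.array([1.0, -2.0, 0.5])

def loss(w):
    return float(np.mean((X @ w - y) ** 2))

lr = 1e-2                                  # deliberately low learning rate
losses = [loss(w)]
for epoch in range(5):                     # train for only a few epochs
    grad = 2 * X.T @ (X @ w - y) / len(X)  # gradient of the mean squared error
    w -= lr * grad                         # small steps adapt without erasing prior knowledge
    losses.append(loss(w))
```

The key moves mirror real fine-tuning: initialize from existing weights, take small gradient steps on a small dataset, and stop after a few passes. Cranking `lr` up in this sketch would make the loss diverge, which is the toy analogue of catastrophic forgetting.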

The Flavors

There are several distinct flavors of fine-tuning, and the terminology gets muddled. Full fine-tuning updates every parameter in the model — this is the most expressive but also the most expensive and the most prone to overfitting on small datasets. Supervised fine-tuning (SFT) refers specifically to training on labeled instruction-response pairs, which is how base models get turned into chat assistants. This is what OpenAI does when you use their fine-tuning API, and what projects like Axolotl and LLaMA-Factory make easy to do locally. Then there are parameter-efficient methods like LoRA and QLoRA, which only update a small fraction of the parameters and have largely replaced full fine-tuning for most practical use cases. The distinction matters because each approach has different data requirements, compute costs, and risks.
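The parameter-efficient idea behind LoRA can be shown in a few lines: instead of updating the full weight matrix W, you freeze it and train a low-rank correction B @ A. The shapes and names below are illustrative, assuming a single linear layer; real implementations live in libraries such as peft.

```python
import numpy as np

d, r = 16, 2                          # hidden size d, LoRA rank r (with r << d)
rng = np.random.default_rng(1)

W = rng.normal(size=(d, d))           # frozen pre-trained weight (never updated)
A = rng.normal(size=(r, d)) * 0.01    # trainable down-projection
B = np.zeros((d, r))                  # trainable up-projection, zero-initialized

def lora_forward(x):
    # Base path plus low-rank adapter path; only A and B are trained.
    return x @ W.T + x @ (B @ A).T

x = rng.normal(size=(1, d))
# Because B starts at zero, the adapted model initially matches the base model.
assert np.allclose(lora_forward(x), x @ W.T)

# Full fine-tuning would update d*d parameters; LoRA trains only 2*r*d.
trainable = A.size + B.size
```

Here full fine-tuning would touch 256 parameters while LoRA trains 64, and the ratio improves dramatically at real model sizes, which is why these methods dominate in practice.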

Data Quality Over Quantity

The quality and format of your fine-tuning dataset matters enormously — often more than its size. A few hundred high-quality, carefully constructed examples in the right format can produce better results than tens of thousands of noisy ones. The standard format for instruction tuning is a structured conversation: system message, user message, assistant response. Consistency in formatting, tone, and quality within your dataset is more important than volume. One common pitfall is training on data that contradicts what the model learned in pre-training — if your dataset says the sky is green, the model will learn to say the sky is green, but only in contexts similar to your training examples. Elsewhere, it will revert to its pre-training knowledge, creating inconsistent behavior that is hard to debug.
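The structured conversation format mentioned above is commonly stored as one JSON object per line (JSONL). This example follows the OpenAI-style "messages" schema; other frameworks use similar but not identical field names.

```python
import json

# One instruction-tuning example: system message, user message, assistant response.
example = {
    "messages": [
        {"role": "system", "content": "You are a concise support assistant."},
        {"role": "user", "content": "How do I reset my password?"},
        {"role": "assistant", "content": "Open Settings > Security and choose Reset Password."},
    ]
}

line = json.dumps(example)  # one such line per training example in the .jsonl file
```

Keeping every example in exactly this shape, with a consistent tone in the assistant turns, is what "consistency over volume" means in practice.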

When to Use It

Knowing when to fine-tune versus when to use other approaches is one of the most important practical decisions in applied AI. Fine-tuning is the right tool when you need the model to consistently adopt a specific format, tone, or behavior pattern that cannot be reliably achieved through prompting alone. It is probably overkill — and possibly counterproductive — if you just need the model to know about your company's products (use RAG instead) or follow specific instructions on a per-request basis (use system prompts). A good rule of thumb: if you can write a prompt that gets the behavior you want 90% of the time, fine-tuning can push that to 99%. If your prompt only works 20% of the time, fine-tuning alone is unlikely to fix the problem — you probably need to rethink the approach entirely.
