
Fine-tuning

Taking a pre-trained model and continuing to train it on a smaller, more specific dataset so that its behavior specializes. Like sending a general practitioner off to a surgical residency: the underlying knowledge stays, and new specialist skills grow on top of it.

Why It Matters

Fine-tuning is how a general-purpose model becomes a specialized tool. A fine-tuned model can learn your company's tone, your domain's terminology, or a particular output format, all without training from scratch.

Deep Dive

Fine-tuning works by continuing the training process on a new, typically much smaller dataset while starting from the pre-trained model's weights rather than random initialization. The mechanics are straightforward: you prepare your data as input-output pairs (or instruction-response pairs), set a low learning rate (usually 10x to 100x lower than pre-training), and train for a few epochs. The low learning rate is critical — too high and you destroy the knowledge the model learned during pre-training, a phenomenon called catastrophic forgetting. Too low and the model barely adapts to your new data. Finding the right balance is more art than science, and it often takes several runs to get right.
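The mechanics above can be sketched in a few lines of PyTorch. This is a toy stand-in, not a real language model: the `nn.Linear` layer plays the role of pre-trained weights, and the data is synthetic. The point it illustrates is the one in the text — starting from existing weights and using a deliberately low learning rate so the update barely moves them.

```python
import torch
import torch.nn as nn

# Toy stand-in for a pre-trained model. In practice you would load a
# checkpoint (e.g. via a from_pretrained call) rather than build one here.
torch.manual_seed(0)
model = nn.Linear(8, 8)
pretrained = {k: v.clone() for k, v in model.state_dict().items()}

# Hypothetical fine-tuning data: input-output pairs.
x = torch.randn(32, 8)
y = x + 0.1 * torch.randn(32, 8)

# The key detail from the text: a learning rate 10x-100x lower than
# pre-training, so the model adapts without catastrophic forgetting.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
loss_fn = nn.MSELoss()

for epoch in range(3):  # fine-tuning typically runs only a few epochs
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

# With such a small step size the weights drift only slightly from the
# pre-trained values -- which is the point: adapt, don't overwrite.
drift = sum((model.state_dict()[k] - pretrained[k]).abs().sum().item()
            for k in pretrained)
print(f"total weight drift from pre-trained values: {drift:.5f}")
```

Raising `lr` here by a few orders of magnitude and rerunning is an easy way to see the trade-off the text describes: the loss drops faster, but the weights move far from their starting point.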

The Flavors

There are several distinct flavors of fine-tuning, and the terminology gets muddled. Full fine-tuning updates every parameter in the model — this is the most expressive but also the most expensive and the most prone to overfitting on small datasets. Supervised fine-tuning (SFT) refers specifically to training on labeled instruction-response pairs, which is how base models get turned into chat assistants. This is what OpenAI does when you use their fine-tuning API, and what projects like Axolotl and LLaMA-Factory make easy to do locally. Then there are parameter-efficient methods like LoRA and QLoRA, which only update a small fraction of the parameters and have largely replaced full fine-tuning for most practical use cases. The distinction matters because each approach has different data requirements, compute costs, and risks.
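To make the parameter-efficient idea concrete, here is a minimal LoRA-style layer written from scratch (real projects would use a library such as PEFT rather than hand-rolling this). The base weight is frozen and only a low-rank update B·A is trained, which is why the trainable fraction is so small.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA sketch: freeze the base weight W and learn a low-rank
    update B @ A, so only r*(in+out) parameters train instead of in*out."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # pre-trained weights stay frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        # B starts at zero, so the layer initially behaves exactly like
        # the frozen base -- training only gradually adds the adaptation.
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

layer = LoRALinear(nn.Linear(768, 768), r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable: {trainable} / {total} ({100 * trainable / total:.1f}%)")
```

At rank 8 on a 768x768 layer, roughly 2% of the parameters are trainable. QLoRA pushes this further by also quantizing the frozen base weights, which is what makes fine-tuning large models feasible on a single consumer GPU.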

Data Quality Over Quantity

The quality and format of your fine-tuning dataset matter enormously, often more than its size. A few hundred high-quality, carefully constructed examples in the right format can produce better results than tens of thousands of noisy ones. The standard format for instruction tuning is a structured conversation: system message, user message, assistant response. Consistency in formatting, tone, and quality within your dataset is more important than volume. One common pitfall is training on data that contradicts what the model learned in pre-training: if your dataset says the sky is green, the model will learn to say the sky is green, but only in contexts similar to your training examples. Elsewhere, it will revert to its pre-training knowledge, creating inconsistent behavior that is hard to debug.
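The conversation format described above is commonly serialized as JSONL, one example per line. The sketch below builds and validates two hypothetical examples (the company and contents are invented); the validation rules mirror the consistency points in the text, and real pipelines would add more checks in the same spirit.

```python
import json

# Hypothetical instruction-tuning examples in the standard chat format:
# system message, user message, assistant response.
examples = [
    {"messages": [
        {"role": "system", "content": "You are a support agent for Acme. Be concise."},
        {"role": "user", "content": "How do I reset my password?"},
        {"role": "assistant", "content": "Go to Settings > Security > Reset Password."},
    ]},
    {"messages": [
        {"role": "system", "content": "You are a support agent for Acme. Be concise."},
        {"role": "user", "content": "Can I change my billing email?"},
        {"role": "assistant", "content": "Yes: Settings > Billing > Contact Email."},
    ]},
]

def validate(example):
    """Reject malformed examples before they silently degrade training."""
    roles = [m["role"] for m in example["messages"]]
    if roles[-1] != "assistant":
        raise ValueError("each example must end with the assistant response to imitate")
    if any(not m["content"].strip() for m in example["messages"]):
        raise ValueError("empty message content")
    return example

# Serialize to JSONL: one JSON object per line, the usual on-disk format.
jsonl = "\n".join(json.dumps(validate(ex), ensure_ascii=False) for ex in examples)
print(f"{len(jsonl.splitlines())} examples ready")
```

Note that both examples share the same system message and the same terse answer style; that uniformity, not the example count, is what the model actually picks up.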

When to Use It

Knowing when to fine-tune versus when to use other approaches is one of the most important practical decisions in applied AI. Fine-tuning is the right tool when you need the model to consistently adopt a specific format, tone, or behavior pattern that cannot be reliably achieved through prompting alone. It is probably overkill — and possibly counterproductive — if you just need the model to know about your company's products (use RAG instead) or follow specific instructions on a per-request basis (use system prompts). A good rule of thumb: if you can write a prompt that gets the behavior you want 90% of the time, fine-tuning can push that to 99%. If your prompt only works 20% of the time, fine-tuning alone is unlikely to fix the problem — you probably need to rethink the approach entirely.
