Model Merging: Definition & Meaning — AI Wiki

Combining the weights of multiple fine-tuned models into a single model without any additional training. If model A is great at coding and model B is great at creative writing, merging them can produce a model that is good at both. Popular merging methods include SLERP (spherical linear interpolation), TIES (resolving sign conflicts), and DARE (randomly dropping parameters before merging).

Why It Matters

Model merging is the open-source community's secret weapon. It requires no training compute (it is only arithmetic on weight tensors) and can produce models that outperform their components. Many top models on the Open LLM Leaderboard are merges. It is also how practitioners combine multiple LoRA fine-tunes into a single versatile model. Understanding merging unlocks a powerful, free capability for anyone working with open models.

Deep Dive

The simplest merge is linear interpolation: New_weight = α · A_weight + (1−α) · B_weight, where α controls the balance (α = 1 returns model A, α = 0 returns model B). This works surprisingly well when the models share the same base model (e.g., two different Llama fine-tunes). The merged model interpolates between the behaviors of both sources. SLERP (Spherical Linear Interpolation) instead interpolates along the arc between the two weight vectors on a hypersphere, rather than along the straight line between them, and often produces slightly better results.
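A minimal sketch of both interpolation schemes, assuming each model's weights have been flattened into a plain Python list of floats so that the example runs without dependencies. The function names `lerp` and `slerp` and the `eps` threshold are illustrative choices, not part of any particular merging tool; real tools work tensor by tensor with a numerical library.

```python
import math

def lerp(a, b, alpha):
    """Linear merge: alpha * a + (1 - alpha) * b, element-wise."""
    return [alpha * x + (1.0 - alpha) * y for x, y in zip(a, b)]

def slerp(a, b, alpha, eps=1e-8):
    """Spherical interpolation with the same convention as lerp:
    alpha = 1 returns a, alpha = 0 returns b."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a < eps or norm_b < eps:
        # The angle is undefined for a zero vector; fall back to lerp.
        return lerp(a, b, alpha)

    # Angle between the two weight vectors.
    cos_omega = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    cos_omega = max(-1.0, min(1.0, cos_omega))  # guard against rounding
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)
    if abs(sin_omega) < eps:
        # Vectors are (anti)parallel, so the arc is degenerate; use lerp.
        return lerp(a, b, alpha)

    weight_a = math.sin(alpha * omega) / sin_omega
    weight_b = math.sin((1.0 - alpha) * omega) / sin_omega
    return [weight_a * x + weight_b * y for x, y in zip(a, b)]

# Two orthogonal unit vectors stand in for two fine-tunes.
model_a = [1.0, 0.0]
model_b = [0.0, 1.0]
print(lerp(model_a, model_b, 0.5))   # [0.5, 0.5]
print(slerp(model_a, model_b, 0.5))  # stays on the unit circle
```

In this toy case the linear midpoint has a smaller norm than either input, while the SLERP midpoint keeps unit norm, which is the geometric difference between the two schemes.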

Task Arithmetic

A more principled approach: compute "task vectors" (the difference between a fine-tuned model and the base model), then add task vectors to the base model. This lets you compose capabilities: base + coding_vector + writing_vector = a model with both skills. TIES improves on this by resolving sign conflicts between task vectors (when two tasks want to move the same weight in opposite directions). DARE improves it by randomly dropping most of the task vector entries and rescaling the survivors, which reduces interference.
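The following is a simplified sketch of task arithmetic, DARE, and the sign-resolution part of TIES, again on flattened weight lists. All function names are hypothetical. The TIES sketch omits the magnitude-trimming step of the full method and only shows sign election followed by averaging the entries that agree with the elected sign.

```python
import random

def task_vector(finetuned, base):
    """Task vector: what fine-tuning changed relative to the base."""
    return [f - b for f, b in zip(finetuned, base)]

def apply_task_vectors(base, vectors, scale=1.0):
    """Task arithmetic: base + scale * (sum of task vectors)."""
    merged = list(base)
    for vec in vectors:
        merged = [m + scale * d for m, d in zip(merged, vec)]
    return merged

def dare(vector, drop_prob, rng):
    """DARE: zero each entry with probability drop_prob and rescale the
    survivors by 1 / (1 - drop_prob) to keep the expected value unchanged."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    keep = 1.0 - drop_prob
    return [d / keep if rng.random() >= drop_prob else 0.0 for d in vector]

def ties_merge(vectors):
    """Simplified TIES (trim step omitted): per entry, elect the sign with
    the larger total mass, then average only the entries with that sign."""
    merged = []
    for entries in zip(*vectors):
        elected = 1.0 if sum(entries) >= 0 else -1.0
        agreeing = [e for e in entries if e * elected > 0]
        merged.append(sum(agreeing) / len(agreeing) if agreeing else 0.0)
    return merged

base = [1.0, 1.0, 1.0]
coding = [1.5, 1.0, 0.5]     # toy "coding" fine-tune
writing = [1.0, 1.25, 1.25]  # toy "writing" fine-tune

coding_vector = task_vector(coding, base)    # [0.5, 0.0, -0.5]
writing_vector = task_vector(writing, base)  # [0.0, 0.25, 0.25]

# Plain task arithmetic: the conflicting third entry partially cancels.
print(apply_task_vectors(base, [coding_vector, writing_vector]))

# TIES-style merge: the third entry keeps only the winning direction.
print(apply_task_vectors(base, [ties_merge([coding_vector, writing_vector])]))

# DARE on one task vector, with a seeded generator for reproducibility.
print(dare(coding_vector, drop_prob=0.5, rng=random.Random(0)))
```

The third weight is where the two toy fine-tunes disagree: plain addition averages the disagreement away, while the sign election keeps the direction with more total magnitude.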

Why It Works (and When It Doesn't)

Merging works because fine-tuning typically modifies a small subset of the model's behavior while preserving most of its general capabilities. The modifications from different fine-tunes often occupy different "regions" of parameter space with minimal conflict. It fails when fine-tunes conflict directly (two models trained to behave oppositely), when the base models are too different (can't merge a Llama with a Mistral), or when one component's modifications are so large that they dominate the merge.
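Two of these failure modes can be checked roughly before merging. The sketch below is an illustrative heuristic rather than an established metric: `sign_conflict_rate` and `norm_ratio` are hypothetical helpers that operate on flattened task vectors, and any threshold for "too much conflict" or "too dominant" would have to be chosen empirically.

```python
import math

def sign_conflict_rate(vec_a, vec_b):
    """Fraction of positions where both task vectors are non-zero
    and push the weight in opposite directions."""
    both = [(a, b) for a, b in zip(vec_a, vec_b) if a != 0.0 and b != 0.0]
    if not both:
        return 0.0
    return sum(1 for a, b in both if a * b < 0) / len(both)

def norm_ratio(vec_a, vec_b):
    """L2 norm of the larger task vector divided by the smaller one.
    A large ratio hints that one fine-tune may dominate the merge."""
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    small, large = sorted((norm_a, norm_b))
    return math.inf if small == 0.0 else large / small

coding_vector = [0.5, 0.0, -0.5]
writing_vector = [0.0, 0.25, 0.25]
print(sign_conflict_rate(coding_vector, writing_vector))
print(norm_ratio(coding_vector, writing_vector))
```

Neither number detects the third failure mode. Models built on different bases generally cannot be compared weight by weight at all, so a shared base model remains a precondition for these checks to mean anything.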
