
Model Merging

TIES, DARE, SLERP, Frankenmerge
Combining the weights of multiple fine-tuned models into a single model without any additional training. If model A is great at code and model B is great at creative writing, merging them can produce a model that is good at both. Popular merge methods include SLERP (spherical interpolation), TIES (resolving sign conflicts), and DARE (randomly dropping parameters before merging).

Why It Matters

Model merging is the open-source community's secret weapon. It costs zero compute (just arithmetic on weight tensors) and can produce models that outperform their components. Many top models on the Open LLM Leaderboard are merges. It is also how practitioners combine multiple LoRA fine-tunes into a single versatile model. Understanding merging unlocks a powerful, free capability for anyone working with open models.

Deep Dive

The simplest merge: linear interpolation. New_weight = α · A_weight + (1−α) · B_weight, where α controls the balance. This works surprisingly well when the models share the same base model (e.g., two different Llama fine-tunes). The merged model interpolates between the behaviors of both sources. SLERP (Spherical Linear Interpolation) interpolates along the hypersphere surface rather than linearly, often producing slightly better results.
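Both interpolation schemes can be sketched in a few lines of numpy. This is a minimal illustration operating on toy tensors standing in for one layer's weights; the function names and the fallback-to-linear behavior for nearly parallel vectors are choices made here, not part of any particular merging library.

```python
import numpy as np

def linear_merge(a: np.ndarray, b: np.ndarray, alpha: float) -> np.ndarray:
    """Linear interpolation: alpha * A + (1 - alpha) * B."""
    return alpha * a + (1.0 - alpha) * b

def slerp_merge(a: np.ndarray, b: np.ndarray, alpha: float,
                eps: float = 1e-8) -> np.ndarray:
    """Spherical interpolation: follow the arc between the two
    (flattened) weight tensors instead of the straight line."""
    a_flat, b_flat = a.ravel(), b.ravel()
    a_unit = a_flat / (np.linalg.norm(a_flat) + eps)
    b_unit = b_flat / (np.linalg.norm(b_flat) + eps)
    # Angle between the two weight vectors.
    dot = np.clip(np.dot(a_unit, b_unit), -1.0, 1.0)
    theta = np.arccos(dot)
    if theta < eps:  # Nearly parallel: fall back to linear.
        return linear_merge(a, b, alpha)
    sin_theta = np.sin(theta)
    merged = (np.sin(alpha * theta) / sin_theta) * a_flat \
           + (np.sin((1.0 - alpha) * theta) / sin_theta) * b_flat
    return merged.reshape(a.shape)

# Toy tensors standing in for one layer's weights of two fine-tunes.
w_a = np.array([[1.0, 0.0], [0.5, 0.5]])
w_b = np.array([[0.0, 1.0], [0.5, -0.5]])
print(linear_merge(w_a, w_b, 0.5))  # halfway between the two layers
```

In a real merge the same operation would be applied layer by layer across two checkpoints with identical architectures.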

Task Arithmetic

A more principled approach: compute "task vectors" (the difference between a fine-tuned model and the base model), then add task vectors to the base model. This lets you compose capabilities: base + coding_vector + writing_vector = a model with both skills. TIES improves on this by resolving sign conflicts between task vectors (when two tasks want to move the same weight in opposite directions). DARE improves it by randomly dropping most of the task vector entries, reducing interference.
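The task-vector pipeline can be sketched as follows. This is a simplified illustration, not the exact TIES or DARE algorithms: the sign-resolution step keeps only the per-weight contributions that agree with the dominant sign (the core TIES idea), and the DARE-style function drops random entries and rescales the survivors to preserve the expected update magnitude. All function names and the toy weights are hypothetical.

```python
import numpy as np

def task_vector(finetuned: np.ndarray, base: np.ndarray) -> np.ndarray:
    """A task vector is the delta a fine-tune applied to the base."""
    return finetuned - base

def dare_sparsify(tv: np.ndarray, drop_rate: float,
                  rng: np.random.Generator) -> np.ndarray:
    """DARE-style: randomly drop entries, rescale survivors so the
    expected magnitude of the update is unchanged."""
    mask = rng.random(tv.shape) >= drop_rate
    return tv * mask / (1.0 - drop_rate)

def ties_sign_resolve(tvs: list) -> np.ndarray:
    """TIES-style sketch: per weight, keep only contributions that
    agree with the dominant sign, then average the survivors."""
    stacked = np.stack(tvs)
    # Dominant sign per parameter = sign of the summed deltas.
    sign = np.sign(stacked.sum(axis=0))
    agree = np.sign(stacked) == sign
    kept = np.where(agree, stacked, 0.0)
    counts = np.maximum(agree.sum(axis=0), 1)
    return kept.sum(axis=0) / counts

# Toy 4-parameter "models": base + coding_vector + writing_vector.
base = np.zeros(4)
code_model = np.array([0.4, -0.2, 0.0, 0.1])
write_model = np.array([0.2, 0.3, -0.1, 0.1])
tvs = [task_vector(code_model, base), task_vector(write_model, base)]
merged = base + ties_sign_resolve(tvs)
print(merged)  # → [ 0.3  0.3 -0.1  0.1]
```

Note how the second parameter, where the two fine-tunes pull in opposite directions, keeps only the contribution that agrees with the dominant sign instead of averaging the conflict away.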

Why It Works (and When It Doesn't)

Merging works because fine-tuning typically modifies a small subset of the model's behavior while preserving most of its general capabilities. The modifications from different fine-tunes often occupy different "regions" of parameter space with minimal conflict. It fails when fine-tunes conflict directly (two models trained to behave oppositely), when the base models are too different (can't merge a Llama with a Mistral), or when one component's modifications are so large that they dominate the merge.
