
Model Merging

TIES, DARE, SLERP, Frankenmerge
Combining the weights of multiple fine-tuned models into a single model without any additional training. If model A is great at code and model B is great at creative writing, merging them can produce a model that is good at both. Popular merging methods include SLERP (spherical interpolation), TIES (sign-conflict resolution), and DARE (randomly dropping parameters before merging).

Why It Matters

Model merging is the open-source community's secret weapon. It costs zero compute (just math on weight tensors) and can produce models that outperform their components. Many of the top models on the Open LLM Leaderboard are merges. It is also how practitioners combine multiple LoRA fine-tunes into a single versatile model. Understanding merging unlocks a powerful, free capability for anyone working with open models.

Deep Dive

The simplest merge: linear interpolation. New_weight = α · A_weight + (1−α) · B_weight, where α controls the balance. This works surprisingly well when the models share the same base model (e.g., two different Llama fine-tunes). The merged model interpolates between the behaviors of both sources. SLERP (Spherical Linear Interpolation) interpolates along the hypersphere surface rather than linearly, often producing slightly better results.
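Both interpolations above can be sketched in a few lines of numpy. This is a simplified illustration, not any library's actual API: the function names are made up, the tensors stand in for one weight matrix of each model, and this SLERP variant interpolates the raw flattened tensors (real toolchains typically normalize directions first and fall back to linear interpolation when the vectors are nearly parallel, as done here):

```python
import numpy as np

def linear_merge(a, b, alpha=0.5):
    """Linear merge: new_weight = alpha * A + (1 - alpha) * B."""
    return alpha * a + (1 - alpha) * b

def slerp_merge(a, b, alpha=0.5, eps=1e-8):
    """Spherical linear interpolation between two weight tensors.

    Interpolates along the arc between the flattened tensors instead
    of the straight line, preserving the interpolant's norm better.
    alpha=1 returns a, alpha=0 returns b (same convention as above).
    """
    a_flat, b_flat = a.ravel(), b.ravel()
    # Angle between the two weight vectors.
    a_dir = a_flat / (np.linalg.norm(a_flat) + eps)
    b_dir = b_flat / (np.linalg.norm(b_flat) + eps)
    omega = np.arccos(np.clip(a_dir @ b_dir, -1.0, 1.0))
    if omega < eps:
        # Nearly parallel weights: SLERP degenerates, use linear merge.
        return linear_merge(a, b, alpha)
    so = np.sin(omega)
    merged = (np.sin(alpha * omega) / so) * a_flat \
           + (np.sin((1 - alpha) * omega) / so) * b_flat
    return merged.reshape(a.shape)
```

In practice the merge is applied tensor by tensor over the two state dicts, which only works when both models share the same architecture and base checkpoint.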

Task Arithmetic

A more principled approach: compute "task vectors" (the difference between a fine-tuned model and the base model), then add task vectors to the base model. This lets you compose capabilities: base + coding_vector + writing_vector = a model with both skills. TIES improves on this by resolving sign conflicts between task vectors (when two tasks want to move the same weight in opposite directions). DARE improves it by randomly dropping most of the task vector entries, reducing interference.
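The task-vector pipeline can be sketched as below. This is a hedged simplification of TIES and DARE on flattened numpy arrays (the published algorithms operate per-tensor and elect the winning sign by total magnitude per parameter; the trimming fraction, drop rate, and function names here are illustrative assumptions):

```python
import numpy as np

def task_vector(finetuned, base):
    """Task vector: what fine-tuning changed relative to the base model."""
    return finetuned - base

def dare_drop(tv, drop_rate=0.9, rng=None):
    """DARE: randomly zero most task-vector entries, rescale survivors
    by 1/(1 - drop_rate) so the expected update is unchanged."""
    rng = rng or np.random.default_rng(0)
    mask = rng.random(tv.shape) >= drop_rate
    return tv * mask / (1.0 - drop_rate)

def ties_merge(base, task_vectors, keep=0.2):
    """TIES-style merge: trim, elect signs, average agreeing entries."""
    # 1) Trim: keep only the top `keep` fraction of entries by magnitude.
    trimmed = []
    for tv in task_vectors:
        thresh = np.quantile(np.abs(tv), 1.0 - keep)
        trimmed.append(np.where(np.abs(tv) >= thresh, tv, 0.0))
    stacked = np.stack(trimmed)
    # 2) Elect a sign per parameter (here: sign of the summed updates).
    elected = np.sign(stacked.sum(axis=0))
    # 3) Average only the entries that agree with the elected sign.
    agree = (np.sign(stacked) == elected) & (stacked != 0)
    summed = np.where(agree, stacked, 0.0).sum(axis=0)
    counts = agree.sum(axis=0)
    merged_tv = np.where(counts > 0, summed / np.maximum(counts, 1), 0.0)
    return base + merged_tv
```

Plain task arithmetic is then just `base + task_vector(coder, base) + task_vector(writer, base)`; TIES and DARE are refinements that reduce the interference this naive sum creates.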

Why It Works (and When It Doesn't)

Merging works because fine-tuning typically modifies a small subset of the model's behavior while preserving most of its general capabilities. The modifications from different fine-tunes often occupy different "regions" of parameter space with minimal conflict. It fails when fine-tunes conflict directly (two models trained to behave oppositely), when the base models are too different (can't merge a Llama with a Mistral), or when one component's modifications are so large that they dominate the merge.
