
Model Merging

TIES, DARE, SLERP, Frankenmerge
Combining the weights of multiple fine-tuned models into a single model without any additional training. If model A is great at coding and model B is great at creative writing, merging them can produce a model that is good at both. Popular merging methods include SLERP (spherical interpolation), TIES (resolving sign conflicts), and DARE (randomly dropping parameters before merging).

Why It Matters

Model merging is the open-source community's secret weapon. It costs zero compute (just math on weight tensors) and can produce models that outperform their components. Many top models on the Open LLM Leaderboard are merges. It is also how practitioners combine multiple LoRA fine-tunes into a single versatile model. Understanding merging unlocks a powerful, free capability for anyone working with open models.

Deep Dive

The simplest merge: linear interpolation. New_weight = α · A_weight + (1−α) · B_weight, where α controls the balance. This works surprisingly well when the models share the same base model (e.g., two different Llama fine-tunes). The merged model interpolates between the behaviors of both sources. SLERP (Spherical Linear Interpolation) interpolates along the hypersphere surface rather than linearly, often producing slightly better results.
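Both interpolation schemes can be sketched in a few lines of NumPy. This is a minimal illustration operating on raw weight arrays, not a full merging pipeline; the function names are ours, and in practice you would apply these per-tensor across a model's state dict.

```python
import numpy as np

def linear_merge(a, b, alpha=0.5):
    """Linear interpolation: new_weight = alpha * A + (1 - alpha) * B."""
    return alpha * a + (1.0 - alpha) * b

def slerp(a, b, t=0.5, eps=1e-8):
    """Spherical linear interpolation: interpolate along the arc between
    the two weight tensors (treated as flattened vectors) rather than
    along the straight line connecting them."""
    a_flat, b_flat = a.ravel(), b.ravel()
    a_unit = a_flat / (np.linalg.norm(a_flat) + eps)
    b_unit = b_flat / (np.linalg.norm(b_flat) + eps)
    dot = np.clip(np.dot(a_unit, b_unit), -1.0, 1.0)
    omega = np.arccos(dot)          # angle between the two weight vectors
    if omega < eps:                 # nearly parallel: fall back to linear
        return linear_merge(a, b, 1.0 - t)
    so = np.sin(omega)
    merged = (np.sin((1.0 - t) * omega) / so) * a_flat \
           + (np.sin(t * omega) / so) * b_flat
    return merged.reshape(a.shape)
```

At `t=0` SLERP returns model A's weights exactly and at `t=1` model B's; in between it follows the hypersphere surface, which preserves the norm of the interpolated weights better than the straight-line path.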

Task Arithmetic

A more principled approach: compute "task vectors" (the difference between a fine-tuned model and the base model), then add task vectors to the base model. This lets you compose capabilities: base + coding_vector + writing_vector = a model with both skills. TIES improves on this by resolving sign conflicts between task vectors (when two tasks want to move the same weight in opposite directions). DARE improves it by randomly dropping most of the task vector entries, reducing interference.
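The ideas above can be sketched as follows. This is a simplified, assumption-laden rendering of TIES-style sign election and DARE-style random dropping on single arrays, not the published algorithms in full (both papers include additional steps such as trimming by magnitude); the function names are ours.

```python
import numpy as np

def task_vector(finetuned, base):
    """Task vector = fine-tuned weights minus base weights."""
    return finetuned - base

def dare_drop(tv, drop_prob=0.9, rng=None):
    """DARE-style dropping: zero out most task-vector entries at random,
    rescaling survivors by 1/(1 - drop_prob) to preserve the expectation."""
    rng = rng or np.random.default_rng(0)
    keep = rng.random(tv.shape) >= drop_prob
    return tv * keep / (1.0 - drop_prob)

def ties_merge(base, task_vectors, scale=1.0):
    """TIES-style merge: elect a majority sign per parameter, then average
    only the task-vector entries that agree with the elected sign."""
    stacked = np.stack(task_vectors)
    elected = np.sign(stacked.sum(axis=0))      # per-parameter majority sign
    agree = np.sign(stacked) == elected         # mask of agreeing entries
    counts = np.maximum(agree.sum(axis=0), 1)   # avoid division by zero
    merged_tv = (stacked * agree).sum(axis=0) / counts
    return base + scale * merged_tv
```

Where two task vectors pull a parameter in opposite directions, the sign election keeps only the contributions that match the dominant direction instead of letting them cancel into a meaningless average.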

Why It Works (and When It Doesn't)

Merging works because fine-tuning typically modifies a small subset of the model's behavior while preserving most of its general capabilities. The modifications from different fine-tunes often occupy different "regions" of parameter space with minimal conflict. It fails when fine-tunes conflict directly (two models trained to behave oppositely), when the base models are too different (can't merge a Llama with a Mistral), or when one component's modifications are so large that they dominate the merge.
