
Model Merging

TIES, DARE, SLERP, Frankenmerge
Merging the weights of multiple fine-tuned models into a single model, with no additional training. If model A excels at coding and model B at creative writing, merging them can produce a model that is good at both. Popular merging methods include SLERP (spherical interpolation), TIES (which resolves sign conflicts), and DARE (which randomly drops parameters before merging).

Why It Matters

Model merging is the open-source community's secret weapon. At zero compute cost (it is just math on weight tensors), it can produce models that outperform their components. Many top models on the Open LLM Leaderboard are merges. It is also how practitioners combine multiple LoRA fine-tunes into one versatile model. Understanding merging unlocks a powerful, free capability for anyone working with open models.

Deep Dive

The simplest merge: linear interpolation. New_weight = α · A_weight + (1−α) · B_weight, where α controls the balance. This works surprisingly well when the models share the same base model (e.g., two different Llama fine-tunes). The merged model interpolates between the behaviors of both sources. SLERP (Spherical Linear Interpolation) interpolates along the hypersphere surface rather than linearly, often producing slightly better results.
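The two interpolation schemes above can be sketched with NumPy. This is a minimal illustration, not any particular merging library's implementation; the helper names and the flattening of each weight tensor before normalizing are assumptions for the sketch.

```python
import numpy as np

def lerp(a: np.ndarray, b: np.ndarray, alpha: float) -> np.ndarray:
    """Linear interpolation: new_weight = alpha * A + (1 - alpha) * B."""
    return alpha * a + (1 - alpha) * b

def slerp(a: np.ndarray, b: np.ndarray, alpha: float, eps: float = 1e-8) -> np.ndarray:
    """Spherical linear interpolation: move along the arc between the two
    (flattened) weight tensors instead of the straight chord between them."""
    a_flat, b_flat = a.ravel(), b.ravel()
    a_unit = a_flat / (np.linalg.norm(a_flat) + eps)
    b_unit = b_flat / (np.linalg.norm(b_flat) + eps)
    dot = np.clip(np.dot(a_unit, b_unit), -1.0, 1.0)
    theta = np.arccos(dot)          # angle between the two tensors
    if theta < eps:                  # nearly parallel: SLERP reduces to LERP
        return lerp(a, b, alpha)
    s = np.sin(theta)
    # alpha = 1 recovers A, alpha = 0 recovers B, matching the LERP convention.
    out = (np.sin(alpha * theta) / s) * a_flat + (np.sin((1 - alpha) * theta) / s) * b_flat
    return out.reshape(a.shape)
```

In practice this would be applied per weight tensor across two checkpoints that share the same base model, with the same `alpha` (or a per-layer schedule) for every tensor.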

Task Arithmetic

A more principled approach: compute "task vectors" (the difference between a fine-tuned model and the base model), then add task vectors to the base model. This lets you compose capabilities: base + coding_vector + writing_vector = a model with both skills. TIES improves on this by resolving sign conflicts between task vectors (when two tasks want to move the same weight in opposite directions). DARE improves it by randomly dropping most of the task vector entries, reducing interference.
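The task-arithmetic pipeline above can be sketched as follows. This is a simplified, hypothetical rendering of the TIES trim/elect-sign/merge steps and the DARE drop-and-rescale step, assuming raw NumPy arrays stand in for weight tensors; the real methods operate per-tensor over full checkpoints with tuned hyperparameters.

```python
import numpy as np

def task_vector(finetuned: np.ndarray, base: np.ndarray) -> np.ndarray:
    """Task vector = fine-tuned weights minus base weights."""
    return finetuned - base

def ties_merge(base: np.ndarray, task_vectors: list, trim_frac: float = 0.8) -> np.ndarray:
    """TIES-style merge sketch: trim small entries, elect a per-weight sign
    by total magnitude, then average only the entries that agree with it."""
    trimmed = []
    for tv in task_vectors:
        k = int(trim_frac * tv.size)  # drop the smallest-magnitude fraction
        if k > 0:
            thresh = np.sort(np.abs(tv).ravel())[k - 1]
            tv = np.where(np.abs(tv) > thresh, tv, 0.0)
        trimmed.append(tv)
    stacked = np.stack(trimmed)
    # Elect sign: the direction with more total mass wins for each weight.
    sign = np.sign(stacked.sum(axis=0))
    # Average only the entries whose sign matches the elected one.
    agree = (np.sign(stacked) == sign) & (stacked != 0)
    summed = np.where(agree, stacked, 0.0).sum(axis=0)
    counts = np.maximum(agree.sum(axis=0), 1)
    return base + summed / counts

def dare(task_vec: np.ndarray, drop_prob: float = 0.9, rng=None) -> np.ndarray:
    """DARE: randomly zero out task-vector entries, then rescale survivors
    by 1/(1-p) so the expected update stays the same."""
    rng = rng or np.random.default_rng(0)
    mask = rng.random(task_vec.shape) >= drop_prob
    return task_vec * mask / (1.0 - drop_prob)
```

Note how the sign election handles the conflict case directly: when two task vectors pull a weight in opposite directions, only the majority direction contributes, instead of the two updates canceling toward zero.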

Why It Works (and When It Doesn't)

Merging works because fine-tuning typically modifies a small subset of the model's behavior while preserving most of its general capabilities. The modifications from different fine-tunes often occupy different "regions" of parameter space with minimal conflict. It fails when fine-tunes conflict directly (two models trained to behave oppositely), when the base models are too different (can't merge a Llama with a Mistral), or when one component's modifications are so large that they dominate the merge.
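One cheap way to diagnose the direct-conflict failure mode before merging is to measure how often two task vectors push the same weight in opposite directions. The function name and thresholding here are illustrative assumptions, not a standard API.

```python
import numpy as np

def sign_conflict_rate(tv_a: np.ndarray, tv_b: np.ndarray) -> float:
    """Fraction of jointly-modified weights that two task vectors push in
    opposite directions. A high rate suggests naive averaging will cancel
    the updates toward zero and the merge will underperform."""
    both_active = (tv_a != 0) & (tv_b != 0)
    conflict = both_active & (np.sign(tv_a) != np.sign(tv_b))
    return float(conflict.sum() / max(both_active.sum(), 1))
```

A low conflict rate is consistent with the "different regions of parameter space" picture above; a high rate signals that sign-resolving methods like TIES are needed, or that the two fine-tunes may simply be incompatible.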

Related Concepts

← All Terms
← Model Collapse · Model Serving →