Training

Model Merging

TIES, DARE, SLERP, Frankenmerge
Model merging combines the weights of several fine-tuned models into a single model, with no additional training. If model A is good at coding and model B is good at creative writing, merging them can produce a model that is good at both. Popular merging methods include SLERP (spherical interpolation), TIES (which resolves sign conflicts), and DARE (which randomly drops parameters before merging).

Why It Matters

Model merging is the open-source community's secret weapon. It costs zero compute (it is just math on weight tensors) and can produce models that outperform their components. Many top models on the Open LLM Leaderboard are merges. It is also how practitioners combine multiple LoRA fine-tunes into one versatile model. Understanding merging unlocks a powerful, free capability for anyone working with open models.

Deep Dive

The simplest merge: linear interpolation. New_weight = α · A_weight + (1−α) · B_weight, where α controls the balance. This works surprisingly well when the models share the same base model (e.g., two different Llama fine-tunes). The merged model interpolates between the behaviors of both sources. SLERP (Spherical Linear Interpolation) interpolates along the hypersphere surface rather than linearly, often producing slightly better results.
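The two interpolation schemes above can be sketched on raw weight tensors. This is a minimal illustration with NumPy, not any particular merging library; the fallback threshold `eps` and the flattening of tensors into vectors are simplifying assumptions:

```python
import numpy as np

def linear_merge(a, b, alpha=0.5):
    """Linear interpolation: alpha * A_weight + (1 - alpha) * B_weight."""
    return alpha * a + (1 - alpha) * b

def slerp(a, b, alpha=0.5, eps=1e-8):
    """Spherical linear interpolation between two weight tensors,
    interpolating along the hypersphere surface rather than the chord.
    Falls back to linear interpolation when the vectors are nearly parallel."""
    a_flat, b_flat = a.ravel(), b.ravel()
    a_unit = a_flat / (np.linalg.norm(a_flat) + eps)
    b_unit = b_flat / (np.linalg.norm(b_flat) + eps)
    dot = np.clip(a_unit @ b_unit, -1.0, 1.0)
    omega = np.arccos(dot)                 # angle between the two weight vectors
    if omega < eps:                        # nearly parallel: SLERP ill-conditioned
        return linear_merge(a, b, alpha)
    so = np.sin(omega)
    merged = (np.sin(alpha * omega) / so) * a_flat \
           + (np.sin((1 - alpha) * omega) / so) * b_flat
    return merged.reshape(a.shape)
```

In a real merge these functions would be applied tensor-by-tensor across two checkpoints that share the same base model. Note that for orthogonal inputs SLERP preserves the weight norm, while linear interpolation shrinks it, which is one intuition for why SLERP often does slightly better.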

Task Arithmetic

A more principled approach: compute "task vectors" (the difference between a fine-tuned model and the base model), then add task vectors to the base model. This lets you compose capabilities: base + coding_vector + writing_vector = a model with both skills. TIES improves on this by resolving sign conflicts between task vectors (when two tasks want to move the same weight in opposite directions). DARE improves it by randomly dropping most of the task vector entries, reducing interference.
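Task arithmetic and the DARE drop step can be sketched in a few lines. This is a toy illustration with made-up deltas, not the full published algorithms; the drop rate of 0.9 and the rescaling factor follow the "drop then rescale by 1/(1 − p)" idea described above:

```python
import numpy as np

def task_vector(finetuned, base):
    """Task vector: what the fine-tune changed relative to the base weights."""
    return finetuned - base

def dare_drop(tv, drop_rate=0.9, rng=None):
    """DARE-style sparsification: zero out most task-vector entries at random,
    then rescale survivors by 1 / (1 - drop_rate) to preserve the expected update."""
    rng = rng if rng is not None else np.random.default_rng(0)
    mask = rng.random(tv.shape) >= drop_rate
    return tv * mask / (1.0 - drop_rate)

# Compose capabilities: base + coding_vector + writing_vector (toy deltas)
base = np.zeros(6)
coding = base + np.array([0.4, 0.0, 0.0, 0.2, 0.0, 0.0])
writing = base + np.array([0.0, 0.3, 0.0, 0.0, 0.1, 0.0])
merged = base \
    + dare_drop(task_vector(coding, base)) \
    + dare_drop(task_vector(writing, base))
```

Because each task vector is sparse after dropping, the two updates are less likely to touch the same weights, which is exactly the interference reduction DARE aims for.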

Why It Works (and When It Doesn't)

Merging works because fine-tuning typically modifies a small subset of the model's behavior while preserving most of its general capabilities. The modifications from different fine-tunes often occupy different "regions" of parameter space with minimal conflict. It fails when fine-tunes conflict directly (two models trained to behave oppositely), when the base models are too different (can't merge a Llama with a Mistral), or when one component's modifications are so large that they dominate the merge.
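The direct-conflict failure mode can be made concrete with a toy numeric sketch: when two task vectors push the same weight in opposite directions, naive averaging nearly cancels the update, while a simplified TIES-style sign election keeps the dominant direction. This is a deliberately stripped-down version of the sign-resolution idea, not the full TIES algorithm:

```python
import numpy as np

def naive_average(tvs):
    """Plain average of task vectors: conflicting signs cancel each other."""
    return np.mean(np.stack(tvs), axis=0)

def ties_sign_elect(tvs):
    """Simplified TIES-style merge: per weight, elect the sign with the larger
    total magnitude, then average only the entries agreeing with that sign."""
    tvs = np.stack(tvs)
    pos = np.where(tvs > 0, tvs, 0).sum(axis=0)
    neg = -np.where(tvs < 0, tvs, 0).sum(axis=0)
    elected = np.where(pos >= neg, 1.0, -1.0)        # dominant sign per weight
    agree = (np.sign(tvs) == elected) & (tvs != 0)   # entries matching that sign
    counts = np.maximum(agree.sum(axis=0), 1)
    return (tvs * agree).sum(axis=0) / counts

tv_a = np.array([ 0.8, 0.1])   # task A pushes weight 0 up
tv_b = np.array([-0.6, 0.1])   # task B pushes weight 0 down: a sign conflict
```

Here naive averaging of weight 0 yields 0.1 (both updates mostly cancel), while the sign election keeps task A's 0.8; on weight 1, where the tasks agree, both methods give the same 0.1.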
