
Knowledge Editing

Also known as: Model Editing, Fact Editing
A technique for modifying specific facts in a trained model without retraining it. If a model incorrectly claims "the President of France is Macron" after a new election, knowledge editing can update that one fact by modifying targeted weights, without disturbing the model's other knowledge or capabilities. The goal is surgical precision: change one fact and leave everything else untouched.

Why It Matters

Knowledge editing addresses a practical problem: models go stale, and retraining is expensive. If you could cheaply update specific facts, a model could stay current between major training runs. It also has safety implications: can you edit out dangerous knowledge? The field is promising but immature — edits often have unintended side effects on related knowledge.

Deep Dive

The dominant approach (ROME/MEMIT): identify which feedforward network weights encode a specific fact by tracing the causal effect of neurons on the model's prediction, then modify those weights to change the stored association. For example, to update "The Eiffel Tower is in Paris" to "The Eiffel Tower is in London," you find the weights that map "Eiffel Tower" → "Paris" in the FFN layers and redirect them to "London."
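The core weight update can be sketched on a toy linear associative memory. This is an illustration of the rank-one edit idea, not the actual ROME implementation: the FFN weight matrix is treated as a key-value store where `W @ k ≈ v`, and all key/value vectors below are hypothetical random stand-ins for the real subject and object representations.

```python
import numpy as np

# Toy sketch of a ROME-style rank-one edit (illustrative, not the real method).
# Treat an FFN weight matrix W as a linear key-value store: W @ k ≈ v,
# where k encodes the subject ("Eiffel Tower") and v the object ("Paris").

rng = np.random.default_rng(0)
d = 8

# Hypothetical key/value vectors (assumed for illustration).
k_subject = rng.normal(size=d)   # activation for "Eiffel Tower"
v_paris = rng.normal(size=d)     # representation of "Paris"
v_london = rng.normal(size=d)    # representation of "London"

# A weight matrix that currently stores "Eiffel Tower" -> "Paris".
W = np.outer(v_paris, k_subject) / (k_subject @ k_subject)

# Rank-one edit: add an update so W @ k_subject maps to v_london instead.
residual = v_london - W @ k_subject
W_edited = W + np.outer(residual, k_subject) / (k_subject @ k_subject)

print(np.allclose(W_edited @ k_subject, v_london))  # True: fact redirected
```

The real methods add constraints to this update so that other keys stored in the same matrix are disturbed as little as possible; the toy version above ignores that, which is exactly where the problems in the next sections come from.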

The Ripple Effect Problem

Editing "The Eiffel Tower is in London" should also change answers to "What country is the Eiffel Tower in?" (UK, not France) and "What landmarks are in Paris?" (no longer the Eiffel Tower). Current editing methods often fail at this: they change the direct fact but leave related inferences inconsistent. This "ripple effect" problem suggests that knowledge in LLMs is more interconnected than the surgical editing metaphor implies.
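A ripple-effect check can be sketched with a toy fact store standing in for the model. The store, queries, and edit function below are all illustrative: the point is that a "surgical" edit touches only the direct triple, leaving logically derived facts stale.

```python
# Minimal sketch of a ripple-effect check (fact store and queries are
# illustrative stand-ins for probing an edited model).

# A toy "model" as a fact store; an edit changes only the direct triple.
facts = {
    ("Eiffel Tower", "city"): "Paris",
    ("Eiffel Tower", "country"): "France",  # derived fact, NOT edited
}

def edit(store, subj, rel, new_obj):
    store[(subj, rel)] = new_obj  # surgical edit: one key only

edit(facts, "Eiffel Tower", "city", "London")

# Direct query is updated, but the derived fact is now inconsistent:
print(facts[("Eiffel Tower", "city")])     # London
print(facts[("Eiffel Tower", "country")])  # France (stale -> ripple failure)
```

Benchmarks for ripple effects evaluate exactly this pattern at scale: after each edit, they probe entailed, paraphrased, and multi-hop questions rather than only the edited prompt.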

Scaling Challenges

A few edits work reasonably well. Hundreds of edits start to degrade model quality — the edited weights accumulate changes that interfere with each other and with unedited knowledge. This limits knowledge editing's practical use: it's fine for a few corrections but can't serve as a general model update mechanism. For staying current, RAG (providing updated information at inference time) remains more practical than editing the model's weights.
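The interference effect can be demonstrated on the same toy linear memory: store several facts, then apply a sequence of naive rank-one edits and watch the recall error on an untouched fact grow. All dimensions and vectors below are arbitrary choices for the sketch.

```python
import numpy as np

# Sketch: accumulated rank-one edits interfere with unedited facts,
# because keys are not orthogonal. Sizes here are arbitrary.
rng = np.random.default_rng(1)
d, n_facts = 64, 10

keys = rng.normal(size=(n_facts, d))
vals = rng.normal(size=(n_facts, d))

# Store all facts in a linear memory via least squares: W @ keys[i] ≈ vals[i].
X, *_ = np.linalg.lstsq(keys, vals, rcond=None)
W = X.T

def rank_one_edit(W, k, v_new):
    residual = v_new - W @ k
    return W + np.outer(residual, k) / (k @ k)

# Recall error on one fact we never edit.
def err(W):
    return np.linalg.norm(W @ keys[0] - vals[0])

before = err(W)                      # ~0: fact stored exactly
for i in range(1, n_facts):          # sequentially edit all other facts
    W = rank_one_edit(W, keys[i], rng.normal(size=d))
after = err(W)

print(after > before)  # True: edits bled into the unedited fact
```

Each edit projects onto its own key but has nonzero overlap with every other key, so the perturbations accumulate — a toy analogue of the quality degradation seen after hundreds of real edits.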
