
Knowledge Editing

Also known as: Model Editing, Fact Editing
A technique for modifying specific facts in a trained model without retraining it. If a model incorrectly claims "the President of France is Macron" after a new election, knowledge editing can update that specific fact by modifying targeted weights, without affecting the model's other knowledge or capabilities. The goal is surgical precision: change one fact, leave everything else intact.

Why It Matters

Knowledge editing addresses a practical problem: models go stale, and retraining is expensive. If you can update specific facts cheaply, a model can stay current between major training runs. It also has safety implications: can you edit out dangerous knowledge? The field is promising but immature, and edits often have unintended side effects on related knowledge.

Deep Dive

The dominant approach (ROME/MEMIT): identify which feedforward network weights encode a specific fact by tracing the causal effect of neurons on the model's prediction, then modify those weights to change the stored association. For example, to update "The Eiffel Tower is in Paris" to "The Eiffel Tower is in London," you find the weights that map "Eiffel Tower" → "Paris" in the FFN layers and redirect them to "London."
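The weight modification at the heart of this approach can be sketched as a rank-one update to one FFN matrix. The sketch below uses toy NumPy arrays rather than real model weights, and it applies the unweighted least-change solution; ROME proper additionally weights the update by an estimate of the key covariance. The vectors `k` and `v_new` are stand-ins for the subject's key activation and the new target value found by causal tracing.

```python
import numpy as np

# Toy stand-in for one FFN down-projection matrix W (d_out x d_in).
rng = np.random.default_rng(0)
d_in, d_out = 8, 8
W = rng.normal(size=(d_out, d_in))

# k: FFN key activation for the subject ("Eiffel Tower"), located via
# causal tracing. v_new: value vector that makes the model emit the new
# object ("London") instead of the old one ("Paris").
k = rng.normal(size=d_in)
v_new = rng.normal(size=d_out)

# Rank-one update: choose delta so that (W + delta) @ k == v_new while
# leaving directions orthogonal to k untouched.
residual = v_new - W @ k
delta = np.outer(residual, k) / (k @ k)
W_edited = W + delta

# The edited matrix now maps the subject's key to the new value.
assert np.allclose(W_edited @ k, v_new)
```

Because `delta` is an outer product with `k`, any activation orthogonal to `k` passes through unchanged, which is the formal sense in which the edit is "surgical."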

The Ripple Effect Problem

Editing in "The Eiffel Tower is in London" should also change answers to "What country is the Eiffel Tower in?" (UK, not France) and "What landmarks are in Paris?" (no longer the Eiffel Tower). Current editing methods often fail at this: they change the direct fact but leave related inferences inconsistent. This "ripple effect" problem suggests that knowledge in LLMs is more interconnected than the surgical-editing metaphor implies.
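One way to make the ripple-effect failure concrete is a consistency probe over logically related questions. The sketch below is purely illustrative: `model_answer` is a hypothetical stub standing in for a real edited model, hard-coded to show the typical failure mode where the direct fact updates but a related inference does not.

```python
def model_answer(question: str) -> str:
    # Stub for an edited model: the direct fact changed, but the
    # country-level inference is still the pre-edit answer.
    answers = {
        "Where is the Eiffel Tower?": "London",
        "What country is the Eiffel Tower in?": "France",  # stale!
    }
    return answers[question]

# Probe set: the edited fact plus inferences that should ripple from it.
ripple_checks = [
    ("Where is the Eiffel Tower?", "London"),
    ("What country is the Eiffel Tower in?", "UK"),
]

failures = [(q, want, model_answer(q))
            for q, want in ripple_checks
            if model_answer(q) != want]
# failures holds the inconsistent related inference.
```

Benchmarks for ripple effects follow this pattern at scale: an edit only counts as successful if the related probes change consistently, not just the directly edited prompt.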

Scaling Challenges

A few edits work reasonably well. Hundreds of edits start to degrade model quality — the edited weights accumulate changes that interfere with each other and with unedited knowledge. This limits knowledge editing's practical use: it's fine for a few corrections but can't serve as a general model update mechanism. For staying current, RAG (providing updated information at inference time) remains more practical than editing the model's weights.
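The RAG alternative mentioned above can be sketched in a few lines: updated facts live in an external store, and the model simply reads them from the prompt at inference time, so no weights change at all. The names `fact_store`, `retrieve`, and `build_prompt` are illustrative, not a real API, and the word-overlap retrieval is a toy stand-in for a real retriever.

```python
# Updated facts live outside the model; editing means updating this store.
fact_store = {
    "eiffel tower location": "The Eiffel Tower is in Paris, France.",
    "france president": "The President of France is Emmanuel Macron.",
}

def retrieve(question: str) -> str:
    # Toy retrieval: pick the stored fact whose key shares the most
    # words with the question.
    words = set(question.lower().split())
    best = max(fact_store, key=lambda k: len(set(k.split()) & words))
    return fact_store[best]

def build_prompt(question: str) -> str:
    # Prepend the retrieved fact so the model answers from context.
    return f"Context: {retrieve(question)}\nQuestion: {question}\nAnswer:"

prompt = build_prompt("Where is the Eiffel Tower?")
```

Updating a fact here is a dictionary write rather than a weight surgery, which is why RAG scales to many updates where weight editing degrades.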
