
Catastrophic Forgetting

Also known as Catastrophic Interference
When a neural network is trained on a new task, it loses the ability to perform tasks it learned previously. Fine-tuning a model on customer-support data can make it great at support but terrible at coding: the new learning overwrites the weights that encoded the old coding ability, "forgetting" it.

Why It Matters

Catastrophic forgetting is the central challenge of fine-tuning and continual learning. It is why you can't fine-tune on tasks one after another and expect the model to do everything well. It is also why LoRA (which modifies only a small subset of parameters) and careful learning-rate selection are essential for preserving base-model capabilities.

Deep Dive

The root cause is weight sharing: the same parameters encode multiple capabilities, and updating them for a new task disrupts the existing encodings. In a large neural network, knowledge isn't stored in dedicated neurons — it's distributed across the weights in complex, overlapping patterns (superposition). Modifying those weights for new knowledge inevitably disturbs old knowledge.
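The weight-sharing mechanism can be seen in a minimal toy sketch (a single shared weight, hypothetical tasks and values, not any production setup): two tasks must both be encoded in the same parameter, so gradient descent on the second task destroys the solution to the first.

```python
import numpy as np

# Toy demonstration of catastrophic forgetting with one shared weight w.
# Task A: y = 2x.  Task B: y = -2x.  Both must live in the same parameter,
# so training on B overwrites what was learned for A.

def train(w, x, y, lr=0.1, steps=200):
    for _ in range(steps):
        grad = 2 * np.mean((w * x - y) * x)  # d/dw of mean squared error
        w -= lr * grad
    return w

def mse(w, x, y):
    return float(np.mean((w * x - y) ** 2))

x = np.linspace(-1, 1, 50)
y_a, y_b = 2 * x, -2 * x

w = train(0.0, x, y_a)             # learn task A: w converges to ~2
loss_a_before = mse(w, x, y_a)     # near zero

w = train(w, x, y_b)               # now learn task B: w moves to ~-2
loss_a_after = mse(w, x, y_a)      # task A performance collapses

print(loss_a_before, loss_a_after)
```

In a real network the same effect is distributed across millions of overlapping weights rather than one, which is exactly why the damage is hard to localize or undo.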

Mitigation Strategies

Several techniques reduce forgetting. Low learning rates during fine-tuning minimize weight changes. LoRA adds new trainable parameters while keeping the original weights frozen. Elastic Weight Consolidation (EWC) identifies which weights are important for old tasks and penalizes changes to them. Replay methods mix old task data into new task training. None fully solve the problem — there's always a trade-off between plasticity (learning new things) and stability (retaining old things).
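Continuing the single-weight toy model above, here is a minimal sketch of the EWC idea (the penalty strength and Fisher estimate are illustrative assumptions, not values from any paper): a quadratic penalty anchors the weight near its post-task-A value while task B is trained.

```python
import numpy as np

def mse_grad(w, x, y):
    return 2 * np.mean((w * x - y) * x)

x = np.linspace(-1, 1, 50)
y_a, y_b = 2 * x, -2 * x

# Learn task A first.
w = 0.0
for _ in range(200):
    w -= 0.1 * mse_grad(w, x, y_a)
w_a_star = w                      # anchor: the weight after task A (~2)

# Importance of w for task A: here just the loss curvature in w,
# standing in for the Fisher information used by EWC.
fisher = 2 * np.mean(x ** 2)

# Train on task B with the EWC quadratic penalty pulling w toward w_a_star.
lam = 5.0                         # hypothetical penalty strength
for _ in range(200):
    grad = mse_grad(w, x, y_b) + lam * fisher * (w - w_a_star)
    w -= 0.1 * grad

# w settles at a compromise between the two task optima (4/3 for these
# values) instead of fully jumping to task B's optimum at -2.
print(w)
```

The stability/plasticity trade-off shows up directly in `lam`: a larger penalty keeps `w` closer to the task A solution but fits task B worse, and no setting recovers both optima at once.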

The Continual Learning Dream

Continual learning (also called lifelong learning) is the research goal of building models that can keep learning from new data without forgetting old capabilities — the way humans do. Current LLMs sidestep this by training once on a massive dataset and then fine-tuning carefully. True continual learning remains an open problem and would be transformative: imagine a model that keeps learning from every conversation without degrading.
