
Catastrophic Forgetting

Also known as catastrophic interference: when a neural network trained on a new task loses the ability to perform previously learned tasks. Fine-tuning a model on customer support data can make it great at support but terrible at coding. The new learning overwrites the weights that encoded the old capabilities, effectively "forgetting" them.

Why It Matters

Catastrophic forgetting is the central challenge of fine-tuning and continual learning. It is why you can't keep fine-tuning a model on task after task and expect it to stay good at everything, and why techniques like LoRA (which modifies only a small subset of parameters) and careful learning-rate selection are critical for preserving base model capabilities.
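The LoRA idea can be sketched minimally: the pretrained weight matrix stays frozen, and only two small low-rank factors are trained. This is a toy illustration with made-up dimensions, not any particular library's API:

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, r = 16, 16, 4               # hypothetical layer sizes; rank r << d
W = rng.normal(size=(d_out, d_in))       # frozen pretrained weight (never updated)

# Only the low-rank factors A and B receive gradient updates.
A = rng.normal(scale=0.01, size=(r, d_in))  # small random init
B = np.zeros((d_out, r))                    # zero init => no change at start

def lora_forward(x, alpha=8.0):
    """y = W x + (alpha/r) * B A x : frozen base output plus low-rank update."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
# Before any training, B is all zeros, so the output equals the frozen layer's:
assert np.allclose(lora_forward(x), W @ x)
```

Because `W` is never touched, the base model's behavior is fully recoverable by dropping the `B @ A` term, which is what limits forgetting.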

Deep Dive

The root cause is weight sharing: the same parameters encode multiple capabilities, and updating them for a new task disrupts the existing encodings. In a large neural network, knowledge isn't stored in dedicated neurons — it's distributed across the weights in complex, overlapping patterns (superposition). Modifying those weights for new knowledge inevitably disturbs old knowledge.

Mitigation Strategies

Several techniques reduce forgetting. Low learning rates during fine-tuning minimize weight changes. LoRA adds new trainable parameters while keeping the original weights frozen. Elastic Weight Consolidation (EWC) identifies which weights are important for old tasks and penalizes changes to them. Replay methods mix old task data into new task training. None fully solve the problem — there's always a trade-off between plasticity (learning new things) and stability (retaining old things).
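The EWC penalty above has a simple form: a quadratic cost on moving each weight away from its old-task value, scaled by a Fisher-information estimate of that weight's importance. A minimal sketch with invented toy numbers:

```python
import numpy as np

def ewc_penalty(theta, theta_old, fisher, lam=1.0):
    """(lam/2) * sum_i F_i * (theta_i - theta_old_i)^2.
    fisher approximates how important each weight is to the old task."""
    return 0.5 * lam * np.sum(fisher * (theta - theta_old) ** 2)

theta_old = np.array([1.0, -2.0, 0.5])   # weights after training on the old task
fisher    = np.array([10.0, 0.1, 10.0])  # high value = important to the old task
theta     = np.array([1.0, 3.0, 0.5])    # only the unimportant weight has moved

# Moving the low-Fisher weight by 5 costs just 0.5 * 1.0 * 0.1 * 25 = 1.25;
# the same move on a high-Fisher weight would cost 125.
print(ewc_penalty(theta, theta_old, fisher))  # 1.25
```

In training, this penalty is added to the new task's loss, so gradient descent prefers to adapt weights the old task barely uses.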

The Continual Learning Dream

Continual learning (also called lifelong learning) is the research goal of building models that can keep learning from new data without forgetting old capabilities — the way humans do. Current LLMs sidestep this by training once on a massive dataset and then fine-tuning carefully. True continual learning remains an open problem and would be transformative: imagine a model that keeps learning from every conversation without degrading.
