Core Concepts

Induction Head

A specific two-attention-head circuit discovered in Transformers that implements in-context learning through pattern matching. If the model has seen the pattern "A B" earlier in the context and now sees "A" again, the induction head predicts that "B" will follow. This simple mechanism is believed to be a fundamental building block of how LLMs learn from examples in their context.

Why It Matters

Induction heads are among the best-understood circuits in mechanistic interpretability: a concrete example of how Transformers implement a useful algorithm through learned weights. They explain why few-shot prompting works: when you provide examples, induction heads detect the pattern and apply it. Understanding induction heads provides a foundation for understanding more complex learned behaviors.

Deep Dive

The circuit uses two heads across two layers. The first head (a "previous token head" in an earlier layer) copies information about which token preceded the current one. The second head (the actual "induction head" in a later layer) uses this information to complete patterns: if token B was preceded by A earlier in the context, and A appears again, the induction head boosts the prediction of B. This is a simple but powerful form of in-context learning.
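The pattern-completion rule described above can be sketched in a few lines of Python. This is a toy illustration of the rule the circuit computes, not an actual attention implementation; the function name and token representation are hypothetical choices for the example.

```python
def induction_predict(tokens):
    """Toy sketch of the induction rule: if the current token appeared
    earlier in the context followed by some token B, predict B.
    The most recent earlier occurrence wins."""
    current = tokens[-1]
    # Scan backwards for a previous occurrence of the current token,
    # mimicking what the previous-token head + induction head compute.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            return tokens[i + 1]  # copy the token that followed it before
    return None  # no earlier match: the induction rule makes no prediction

# "A B ... A" completes to "B"
print(induction_predict(["A", "B", "C", "A"]))  # -> B
```

A real induction head does this softly via attention scores over all positions rather than a hard lookup, but the input-output behavior on clean repeated patterns matches this sketch.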

Discovery and Verification

Olsson et al. (2022, Anthropic) identified induction heads through careful analysis of attention patterns in Transformers of various sizes. They observed a phase change during training: induction heads form suddenly, and their formation coincides with a dramatic improvement in the model's ability to do in-context learning. This suggests that induction heads are not just one of many circuits but a foundational capability that enables higher-level in-context learning.

Beyond Simple Patterns

Real-world in-context learning is more complex than "A B ... A → B." Models learn to generalize patterns: "capital of France is Paris, capital of Germany is Berlin, capital of Japan is..." requires understanding the abstract pattern, not just copying. Research suggests that more complex induction-like circuits build on the basic induction head mechanism, composing it with other circuits to handle abstraction and generalization.

Related Concepts
