Basics

Layer

Hidden Layer, Neural Network Layer
A group of neurons in a neural network that processes data at a particular level of abstraction. The input layer receives raw data. Hidden layers (the ones in between) learn increasingly abstract representations. The output layer produces the final result. "Deep" learning means many hidden layers; modern LLMs have 32 to 128+ layers.

Why It Matters

Layers create the hierarchy that makes deep learning powerful. Early layers learn simple patterns (edges in images, word fragments in text). Middle layers combine those into concepts (faces, phrases). Deep layers combine concepts into high-level understanding (scene recognition, reasoning). A network's depth determines the complexity of the patterns it can learn.

Deep Dive

In a Transformer, each layer (called a "block") consists of two sub-layers: a multi-head attention layer (which mixes information across tokens) and a feedforward network (which processes each token independently). Each sub-layer has a residual connection (the input is added back to the output) and normalization. A 32-layer Transformer applies this attention+FFN pattern 32 times, each time refining the representation.
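As a concrete illustration, here is a minimal sketch of one such block in PyTorch. The sizes (d_model=512, n_heads=8, d_ff=2048) and the pre-norm placement of LayerNorm are illustrative assumptions, not any specific model's configuration:

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """One Transformer layer: an attention sub-layer and a feedforward
    sub-layer, each with a residual connection and normalization."""

    def __init__(self, d_model: int = 512, n_heads: int = 8, d_ff: int = 2048):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.GELU(),
            nn.Linear(d_ff, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Attention sub-layer: mixes information across token positions.
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h)
        x = x + attn_out  # residual: the input is added back to the output
        # Feedforward sub-layer: transforms each token independently.
        x = x + self.ffn(self.norm2(x))
        return x

# A 32-layer Transformer applies this block 32 times in sequence.
model = nn.Sequential(*[TransformerBlock() for _ in range(32)])
out = model(torch.randn(1, 16, 512))  # (batch, tokens, d_model)
```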

What Happens Across Layers

Research has revealed a rough pattern in LLMs: early layers handle syntax and surface patterns, middle layers handle semantic meaning and entity recognition, and late layers handle task-specific reasoning and output formatting. This isn't a hard boundary — information flows through all layers via residual connections — but it explains why some fine-tuning techniques only modify certain layers and why pruning middle layers often hurts more than pruning early or late ones.
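One practical consequence is layer-wise fine-tuning: freeze most of the stack and train only the late layers. The sketch below uses a stack of nn.Linear layers as a hypothetical stand-in for a real model, and keeping the last 4 layers trainable is an arbitrary illustrative choice:

```python
import torch.nn as nn

# Hypothetical stand-in for a deep network: 32 identical layers.
model = nn.ModuleList([nn.Linear(512, 512) for _ in range(32)])

# Freeze everything except the last 4 layers (an arbitrary choice here).
for i, layer in enumerate(model):
    for p in layer.parameters():
        p.requires_grad = i >= len(model) - 4

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable:,} of {total:,} parameters")
```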

Width vs. Depth

A network's "width" is the number of neurons per layer (the model dimension). Its "depth" is the number of layers. Both matter, but they contribute differently: wider layers can represent more features simultaneously, while deeper networks can learn more complex, compositional patterns. Modern LLMs tend to be both wide (dimensions of 4096–8192) and deep (32–128 layers). Scaling laws suggest that width and depth should be scaled together for optimal performance.
