Fundamentals

Backpropagation

Backprop, Backward Pass
The algorithm that computes how much each parameter in a neural network contributed to the error, so that gradient descent can update the parameters efficiently. Backpropagation applies the chain rule of calculus to the network in reverse: starting from the loss at the output, it propagates gradients backward through each layer to determine how much "responsibility" each weight bears.

Why It Matters

Backpropagation is the algorithm that makes neural network training possible. Without an efficient way to compute gradients for billions of parameters, gradient descent would be computationally infeasible. Every model you use, from a small classifier to a 400B-parameter LLM, was trained with backpropagation. It is the single most important algorithm in deep learning.

Deep Dive

The forward pass: input flows through the network, each layer applies its transformation, and the final layer produces a prediction. The loss function computes how wrong the prediction is.

The backward pass: starting from the loss, backpropagation computes ∂loss/∂weight for every weight in the network using the chain rule: ∂loss/∂w = ∂loss/∂output · ∂output/∂hidden · ∂hidden/∂w. Each layer receives the gradient from the layer above and passes its own gradient to the layer below.
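
The chain-rule factorization above can be traced by hand. A minimal sketch in plain Python with NumPy (the tiny two-weight network and all variable names are illustrative, not taken from any framework):

import numpy as np

# Tiny network: x -> hidden = tanh(w1 * x) -> output = w2 * hidden
# Loss: squared error against a target y.
x, y = 1.5, 0.8          # single training example
w1, w2 = 0.4, -0.6       # the two weights we want gradients for

# Forward pass: each intermediate result is stored for reuse in the backward pass.
hidden = np.tanh(w1 * x)
output = w2 * hidden
loss = 0.5 * (output - y) ** 2

# Backward pass: apply the chain rule from the loss back toward the input.
dloss_doutput = output - y                       # ∂loss/∂output
dloss_dw2 = dloss_doutput * hidden               # ∂loss/∂w2 = ∂loss/∂output · ∂output/∂w2
dloss_dhidden = dloss_doutput * w2               # gradient handed down to the layer below
dloss_dw1 = dloss_dhidden * (1 - hidden**2) * x  # ∂loss/∂w1 via ∂hidden/∂w1 (tanh' = 1 - tanh²)

# A gradient descent step would then be: w -= learning_rate * gradient.

Note that hidden and output from the forward pass are reused in the backward pass; frameworks cache these activations for exactly this reason.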

Computational Efficiency

Naively computing the gradient for each weight independently would require a separate forward pass per weight — impossibly expensive for billions of parameters. Backpropagation reuses intermediate results: the gradient at each layer is computed once and shared with all weights in that layer. The backward pass costs roughly 2x the forward pass in compute, meaning the total cost of one training step (forward + backward + update) is about 3x a single forward pass.
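
A sketch of that reuse for a single linear layer (NumPy, illustrative shapes; dloss_dh stands for the upstream gradient the layer receives): the upstream gradient arrives once, and one outer product yields the gradients of every weight in the layer.

import numpy as np

rng = np.random.default_rng(0)

# One linear layer: h = W @ x, with W of shape (out_dim, in_dim).
x = rng.normal(size=4)        # layer input, saved during the forward pass
W = rng.normal(size=(3, 4))   # this layer's weights
h = W @ x                     # forward pass through the layer

# The layer above hands us ∂loss/∂h once (shape: out_dim).
dloss_dh = rng.normal(size=3)

# That single upstream gradient is shared by every weight in W:
# ∂loss/∂W[i, j] = ∂loss/∂h[i] * x[j], i.e. one outer product.
dloss_dW = np.outer(dloss_dh, x)   # gradients for all 12 weights at once

# The same upstream gradient also produces what gets passed to the layer below.
dloss_dx = W.T @ dloss_dh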

Automatic Differentiation

Modern deep learning frameworks (PyTorch, JAX) implement backpropagation through automatic differentiation (autograd). You define the forward computation, and the framework automatically constructs the backward computation graph and computes gradients. This means you never manually derive gradients — you define the model architecture and loss, call loss.backward(), and the framework handles the rest. This automation is what makes rapid architecture experimentation practical.
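
A minimal PyTorch sketch of that workflow (layer sizes and shapes are arbitrary, chosen for illustration only):

import torch
import torch.nn as nn

# Define only the forward computation; autograd builds the backward graph.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
loss_fn = nn.MSELoss()

x = torch.randn(8, 10)           # a small batch of inputs
y = torch.randn(8, 1)            # matching targets

prediction = model(x)            # forward pass
loss = loss_fn(prediction, y)    # scalar loss
loss.backward()                  # backward pass: gradients land in each parameter's .grad

# Every parameter now carries its gradient, e.g. model[0].weight.grad has shape (32, 10).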
