Core Concepts

Backpropagation

Backprop, Backward Pass
The algorithm that computes how much each parameter in a neural network contributed to the error, so that gradient descent can update the parameters efficiently. Backpropagation applies the chain rule of calculus in reverse through the network: starting from the loss at the output, it propagates gradients backward through each layer to determine each weight's share of the blame.

Why It Matters

Backpropagation is the algorithm that makes neural network training possible. Without a way to compute gradients efficiently for billions of parameters, gradient descent would be computationally infeasible. Every model you use, from a small classifier to a 400B-parameter LLM, was trained with backpropagation. It is the single most important algorithm in deep learning.

Deep Dive

The forward pass: input flows through the network, each layer applies its transformation, and the final layer produces a prediction. The loss function computes how wrong the prediction is. The backward pass: starting from the loss, backpropagation computes ∂loss/∂weight for every weight in the network using the chain rule: ∂loss/∂w = ∂loss/∂output · ∂output/∂hidden · ∂hidden/∂w. Each layer receives the gradient from the layer above and passes its own gradient to the layer below.
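The forward and backward passes above can be sketched for a tiny one-hidden-layer network. This is a minimal illustration with made-up shapes and variable names, not any framework's API; the numerical-difference check at the end confirms the chain-rule gradient against a brute-force estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 1))        # input
y = np.array([[1.0]])              # target
W1 = rng.normal(size=(4, 3))       # hidden-layer weights
W2 = rng.normal(size=(1, 4))       # output-layer weights

# Forward pass: each layer applies its transformation,
# and the loss measures how wrong the prediction is.
h = np.tanh(W1 @ x)                # hidden activations
y_hat = W2 @ h                     # prediction
loss = 0.5 * (y_hat - y) ** 2      # squared-error loss

# Backward pass: apply the chain rule layer by layer, from loss to weights.
d_yhat = y_hat - y                 # dloss/dy_hat
dW2 = d_yhat @ h.T                 # dloss/dW2 = dloss/dy_hat · dy_hat/dW2
d_h = W2.T @ d_yhat                # gradient passed down to the hidden layer
d_pre = d_h * (1 - h ** 2)         # through tanh: d tanh(z)/dz = 1 - tanh(z)^2
dW1 = d_pre @ x.T                  # dloss/dW1

# Sanity check one weight's gradient against a finite difference.
eps = 1e-6
W1p = W1.copy()
W1p[0, 0] += eps
loss_p = 0.5 * (W2 @ np.tanh(W1p @ x) - y) ** 2
numeric = float((loss_p - loss) / eps)
assert np.allclose(dW1[0, 0], numeric, atol=1e-4)
```

Note how `d_h` is exactly "the gradient received from the layer above": the output layer computes it once and hands it down, and the hidden layer never needs to know anything about the loss.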

Computational Efficiency

Naively computing the gradient for each weight independently would require a separate forward pass per weight — impossibly expensive for billions of parameters. Backpropagation reuses intermediate results: the gradient at each layer is computed once and shared with all weights in that layer. The backward pass costs roughly 2x the forward pass in compute, meaning the total cost of one training step (forward + backward + update) is about 3x a single forward pass.
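The reuse argument can be made concrete for a single linear layer. In this sketch (names and shapes are illustrative), the naive approach pays one extra forward pass per weight, while backprop computes the layer's error signal once and obtains every weight's gradient from a single outer product:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(3, 1))        # input
t = rng.normal(size=(2, 1))        # target
W = rng.normal(size=(2, 3))        # 6 weights

def loss_of(W):
    return float(0.5 * np.sum((W @ x - t) ** 2))

# Naive: one extra forward pass per weight -> 6 passes for 6 weights.
eps = 1e-6
naive_grad = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        Wp = W.copy()
        Wp[i, j] += eps
        naive_grad[i, j] = (loss_of(Wp) - loss_of(W)) / eps

# Backprop: the layer's error signal is computed once and shared
# by all 6 weights via a single matrix product.
delta = W @ x - t                  # dloss/d(W @ x), computed once
bp_grad = delta @ x.T              # gradients for every weight at once

assert np.allclose(naive_grad, bp_grad, atol=1e-4)
```

The naive cost scales with the number of weights; backprop's cost scales with the number of layers, which is why it stays a small constant multiple of the forward pass.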

Automatic Differentiation

Modern deep learning frameworks (PyTorch, JAX) implement backpropagation through automatic differentiation (autograd). You define the forward computation, and the framework automatically constructs the backward computation graph and computes gradients. This means you never manually derive gradients — you define the model architecture and loss, call loss.backward(), and the framework handles the rest. This automation is what makes rapid architecture experimentation practical.
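A toy version of this idea can be written in a few dozen lines. The `Value` class below is an illustrative sketch in the spirit of autograd, not a real library: the forward pass records the computation graph as values are combined, and `backward()` walks that graph in reverse, applying the chain rule at each node.

```python
class Value:
    """A scalar that remembers how it was computed, for reverse-mode autodiff."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward_fn = lambda: None

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def backward_fn():
            self.grad += out.grad      # d(a+b)/da = 1
            other.grad += out.grad     # d(a+b)/db = 1
        out._backward_fn = backward_fn
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def backward_fn():
            self.grad += other.data * out.grad   # d(a*b)/da = b
            other.grad += self.data * out.grad   # d(a*b)/db = a
        out._backward_fn = backward_fn
        return out

    def backward(self):
        # Topologically order the recorded graph, then apply the
        # chain rule in reverse, exactly as backpropagation does.
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward_fn()

# loss = w * x + b, so dloss/dw = x, dloss/dx = w, dloss/db = 1.
w, x, b = Value(2.0), Value(3.0), Value(1.0)
loss = w * x + b
loss.backward()
assert w.grad == 3.0 and x.grad == 2.0 and b.grad == 1.0
```

Real frameworks do the same thing at tensor granularity with far more operator types and heavy optimization, but the contract is identical: define the forward computation, call `backward()`, read the gradients.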

Related Concepts
