LSTM

Long Short-Term Memory
A type of recurrent neural network (RNN) designed to learn long-range dependencies in sequential data. The LSTM introduces a "cell state", a memory highway that can carry information unchanged across many time steps, controlled by three gates: an input gate (what to add), a forget gate (what to remove), and an output gate (what to expose). Invented in 1997, the LSTM dominated sequence modeling until Transformers emerged.
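For reference, these are the standard LSTM update equations in one common notation (x_t is the input, h_{t-1} the previous hidden state, σ the sigmoid, ⊙ the elementwise product, and the W, U, b terms are learned parameters):

$$
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)}\\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)}\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)}\\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate values)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state update)}\\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)}
\end{aligned}
$$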

Why It Matters

LSTMs were the backbone of NLP for a decade (the 2010s): machine translation, speech recognition, text generation, and sentiment analysis all ran on LSTMs. Understanding LSTMs helps you see why Transformers replaced them (parallelism and long-range attention vs. sequential processing and a compressed state) and why SSMs like Mamba are interesting (they revisit the gated-state idea with modern improvements).

Deep Dive

LSTM's three gates are all small neural networks that output values between 0 (completely block) and 1 (completely pass through). The forget gate decides which cell state information to discard. The input gate decides which new information to add. The output gate decides which cell state information to expose as the hidden state. This gating mechanism lets the network learn what to remember and what to forget over long sequences — something vanilla RNNs couldn't do.
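As a concrete illustration of this mechanism, here is a minimal single-step LSTM cell sketched in NumPy. It is not any particular library's implementation; the parameter names (W_f, U_f, and so on) and the params dictionary are illustrative assumptions.

```python
# Minimal sketch of one LSTM time step; parameters are assumed to be supplied by the caller.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step: returns the new hidden state h_t and cell state c_t."""
    W_f, U_f, b_f = params["forget"]   # forget gate: what to erase from the cell state
    W_i, U_i, b_i = params["input"]    # input gate: what new information to write
    W_o, U_o, b_o = params["output"]   # output gate: what part of the cell state to expose
    W_c, U_c, b_c = params["cand"]     # candidate values proposed for the cell state

    f_t = sigmoid(W_f @ x_t + U_f @ h_prev + b_f)      # 0 = forget, 1 = keep
    i_t = sigmoid(W_i @ x_t + U_i @ h_prev + b_i)      # 0 = ignore, 1 = write
    o_t = sigmoid(W_o @ x_t + U_o @ h_prev + b_o)      # 0 = hide, 1 = expose
    c_tilde = np.tanh(W_c @ x_t + U_c @ h_prev + b_c)  # candidate cell update

    c_t = f_t * c_prev + i_t * c_tilde                 # gated memory update
    h_t = o_t * np.tanh(c_t)                           # gated exposure as hidden state
    return h_t, c_t
```

Running this step over a sequence, feeding h_t and c_t back in each time, gives the full recurrent loop; library implementations such as PyTorch's nn.LSTM wrap this same recurrence with optimized kernels.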

Why LSTMs Were Revolutionary

Before LSTM, RNNs suffered from vanishing gradients: information from early in a sequence couldn't influence processing of later parts because gradients decayed exponentially through time. LSTM's cell state acts as a gradient highway — it can carry gradients unchanged through hundreds of steps. This is what enabled sequence-to-sequence learning: machine translation (encode source sentence, decode target sentence), text summarization, and question answering all became practical with LSTMs.
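A hedged sketch of this effect, assuming PyTorch is available: backpropagate a loss that depends only on the final time step and compare how much gradient survives back to the first input, relative to the gradient at the last input. Exact numbers vary with seed and sequence length; the qualitative gap is the point.

```python
# Gradient-highway sketch: vanilla RNN vs. LSTM over a long sequence.
import torch
import torch.nn as nn

torch.manual_seed(0)
seq_len, batch, dim = 100, 1, 32
x = torch.randn(seq_len, batch, dim, requires_grad=True)

rnn = nn.RNN(dim, dim)
lstm = nn.LSTM(dim, dim)
with torch.no_grad():
    # PyTorch packs LSTM gate parameters in (input, forget, cell, output) order,
    # so the forget-gate bias is the second block of size `dim`. Pushing it toward 1
    # is a common initialization trick that keeps the cell-state highway open.
    lstm.bias_ih_l0[dim:2 * dim] += 4.0

for name, model in [("vanilla RNN", rnn), ("LSTM", lstm)]:
    out, _ = model(x)                         # out: (seq_len, batch, dim)
    out[-1].sum().backward()                  # loss uses only the final time step
    survival = x.grad[0].norm() / x.grad[-1].norm()
    print(f"{name}: fraction of gradient surviving back to t=0 = {survival.item():.2e}")
    x.grad = None                             # reset between models
```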

LSTM to Transformer to SSM

LSTMs process tokens sequentially (can't parallelize during training) and compress all history into a fixed-size hidden state (information bottleneck). Transformers solve both: parallel training and direct attention to any token. But Transformers pay for these gains with attention cost that grows quadratically with sequence length. SSMs like Mamba revisit the gated-state idea: they maintain a compressed state (like an LSTM) but make the gates input-dependent (selective) and hardware-efficient, getting LSTM's constant-memory advantage with Transformer-level quality.
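To make the memory contrast concrete, here is a back-of-the-envelope sketch. The layer, head, and width numbers are made-up assumptions for illustration, not from any specific model: an LSTM's recurrent state is a fixed number of floats no matter how long the input is, while a Transformer's KV cache grows linearly with the tokens it attends to (and full attention compute grows quadratically).

```python
# Back-of-the-envelope inference-memory comparison with hypothetical model sizes.
n_layers, d_model = 24, 1024          # hypothetical model size
n_heads, d_head = 16, 64              # hypothetical attention shape (n_heads * d_head = d_model)

def lstm_state_floats(layers=n_layers, d=d_model):
    # one hidden vector h and one cell vector c per layer, independent of sequence length
    return layers * 2 * d

def transformer_kv_cache_floats(seq_len, layers=n_layers, heads=n_heads, d=d_head):
    # one key and one value vector per head, per layer, per past token
    return layers * heads * 2 * d * seq_len

for t in (1_000, 10_000, 100_000):
    print(f"tokens={t:>7}  LSTM state={lstm_state_floats():,}  "
          f"Transformer KV cache={transformer_kv_cache_floats(t):,}")
```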
