
LSTM

Long Short-Term Memory
A type of recurrent neural network (RNN) designed to learn long-range dependencies in sequential data. LSTM introduces a "cell state", a memory highway that can carry information unchanged across many time steps, controlled by three gates: the input gate (what to add), the forget gate (what to remove), and the output gate (what to expose). Invented in 1997, LSTM dominated sequence modeling until the Transformer arrived.

Why It Matters

LSTM dominated NLP for a decade (the 2010s): machine translation, speech recognition, text generation, and sentiment analysis all ran on LSTMs. Understanding LSTM helps you see why the Transformer replaced it (parallel training and long-range attention vs. sequential processing and a compressed state), and why SSMs like Mamba are interesting (they revisit the gated-state idea with modern improvements).

Deep Dive

LSTM's three gates are all small neural networks that output values between 0 (completely block) and 1 (completely pass through). The forget gate decides which cell state information to discard. The input gate decides which new information to add. The output gate decides which cell state information to expose as the hidden state. This gating mechanism lets the network learn what to remember and what to forget over long sequences — something vanilla RNNs couldn't do.
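The gating mechanism described above can be sketched as a single NumPy time step. This is a minimal illustration, not a specific library's API; the stacked weight layout `W`, the bias `b`, and the function names are assumptions made for compactness.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step. W: (4*hidden, input+hidden), b: (4*hidden,)."""
    hs = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    f = sigmoid(z[0*hs:1*hs])   # forget gate: which cell-state info to discard
    i = sigmoid(z[1*hs:2*hs])   # input gate: which new info to add
    g = np.tanh(z[2*hs:3*hs])   # candidate values to write into the cell
    o = sigmoid(z[3*hs:4*hs])   # output gate: which cell-state info to expose
    c = f * c_prev + i * g      # cell state: gated additive update
    h = o * np.tanh(c)          # hidden state exposed to the next layer/step
    return h, c
```

Each gate is a sigmoid over the same concatenated `[x, h_prev]` input, so "between 0 and 1" holds element-wise: a forget-gate value near 1 keeps that cell-state component, near 0 erases it.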

Why LSTMs Were Revolutionary

Before LSTM, RNNs suffered from vanishing gradients: information from early in a sequence couldn't influence processing of later parts because gradients decayed exponentially through time. LSTM's cell state acts as a gradient highway — it can carry gradients unchanged through hundreds of steps. This is what enabled sequence-to-sequence learning: machine translation (encode source sentence, decode target sentence), text summarization, and question answering all became practical with LSTMs.
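The gradient-highway claim can be made precise with the standard cell-state update (symbols follow the common notation: forget gate f, input gate i, candidate values c-tilde, element-wise product ⊙):

```latex
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t,
\qquad
\frac{\partial c_t}{\partial c_{t-1}} = \mathrm{diag}(f_t)
```

Because the update is additive, the Jacobian along the cell-state path is just the forget gate. When the network learns to hold f_t near 1, that Jacobian is near the identity, so gradients pass through many steps without the exponential decay that repeated weight-matrix multiplication causes in a vanilla RNN.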

LSTM to Transformer to SSM

LSTMs process tokens sequentially (can't parallelize during training) and compress all history into a fixed-size hidden state (information bottleneck). Transformers solve both: parallel training and direct attention to any token. But Transformers trade these gains for quadratic memory cost in sequence length. SSMs like Mamba revisit the gated-state idea: they maintain a compressed state (like LSTM) but make the gates input-dependent (selective) and hardware-efficient, getting LSTM's constant-memory advantage with Transformer-level quality.
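The memory trade-off above can be sketched with rough per-layer scaling formulas. This is an illustrative back-of-the-envelope comparison, not a measurement; the function names and the simplified cost terms are assumptions.

```python
def attention_memory(T, d):
    """Rough attention-layer memory: T x T score matrix plus T x d activations."""
    return T * T + T * d

def recurrent_memory(T, d):
    """Rough recurrent/SSM inference memory: a fixed-size state, independent of T."""
    return d
```

The quadratic T * T term is what makes long-context Transformers expensive, while the recurrent state stays constant no matter how long the sequence grows, which is the property Mamba-style SSMs aim to keep.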
