
LSTM

Long Short-Term Memory
A type of recurrent neural network (RNN) designed to learn long-range dependencies in sequential data. The LSTM introduces a "cell state", a memory highway that can carry information unchanged across many time steps, controlled by three gates: the input gate (what to add), the forget gate (what to discard), and the output gate (what to expose). Invented in 1997, LSTMs dominated sequence modeling until the arrival of the Transformer.

Why It Matters

LSTMs dominated NLP for a decade (the 2010s): machine translation, speech recognition, text generation, and sentiment analysis all ran on LSTMs. Understanding LSTMs helps you see why Transformers replaced them (parallelism and long-range attention vs. sequential processing and a compressed state), and why SSMs like Mamba are interesting (they revisit the gated-state idea with modern improvements).

Deep Dive

LSTM's three gates are all small neural networks that output values between 0 (completely block) and 1 (completely pass through). The forget gate decides which cell state information to discard. The input gate decides which new information to add. The output gate decides which cell state information to expose as the hidden state. This gating mechanism lets the network learn what to remember and what to forget over long sequences — something vanilla RNNs couldn't do.
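The gating mechanism above can be written as a single time step in NumPy. This is a minimal sketch, not any library's implementation; the weight layout (one stacked matrix) and the gate ordering are assumptions made for compactness:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step.
    x: (D,) input; h_prev, c_prev: (H,) previous hidden and cell state.
    W: (4H, D+H) stacked gate weights; b: (4H,) biases.
    Assumed gate order: input, forget, output, candidate."""
    H = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    i = sigmoid(z[0:H])         # input gate: how much new info to add
    f = sigmoid(z[H:2*H])       # forget gate: how much old state to keep
    o = sigmoid(z[2*H:3*H])     # output gate: how much state to expose
    g = np.tanh(z[3*H:4*H])     # candidate values to write
    c = f * c_prev + i * g      # cell state: gated blend of old and new
    h = o * np.tanh(c)          # hidden state: gated view of the cell
    return h, c

# Usage: run a few steps over random inputs (hypothetical sizes).
D, H = 3, 4
rng = np.random.default_rng(0)
W = rng.standard_normal((4 * H, D + H)) * 0.1
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for t in range(5):
    h, c = lstm_step(rng.standard_normal(D), h, c, W, b)
```

Note that all three gates read the same concatenated `[x, h_prev]` vector; they differ only in their learned weights, which is how each one specializes in remembering, forgetting, or exposing.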

Why LSTMs Were Revolutionary

Before LSTM, RNNs suffered from vanishing gradients: information from early in a sequence couldn't influence processing of later parts because gradients decayed exponentially through time. LSTM's cell state acts as a gradient highway — it can carry gradients unchanged through hundreds of steps. This is what enabled sequence-to-sequence learning: machine translation (encode source sentence, decode target sentence), text summarization, and question answering all became practical with LSTMs.
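A back-of-the-envelope calculation (illustrative numbers only, assuming a constant recurrent weight for the vanilla RNN and a constant forget-gate value for the LSTM) shows the exponential decay versus the gradient highway:

```python
# Signal surviving from step 0 to step T, as a product of per-step factors.
T = 200

# Vanilla RNN: the gradient is multiplied by the recurrent weight each step.
w = 0.9              # a typical sub-unit weight (hypothetical)
rnn_signal = w ** T  # decays exponentially: ~7e-10

# LSTM: the cell-state Jacobian per step is the forget gate f_t.
# When the network learns f close to 1, the product stays near 1.
f = 0.999
lstm_signal = f ** T  # ~0.82: information survives 200 steps

print(rnn_signal, lstm_signal)
```

The same arithmetic also explains failure modes: if the forget gate saturates low, the LSTM forgets just as fast as a vanilla RNN, which is why forget-gate biases are often initialized positive in practice.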

LSTM to Transformer to SSM

LSTMs process tokens sequentially (can't parallelize during training) and compress all history into a fixed-size hidden state (information bottleneck). Transformers solve both: parallel training and direct attention to any token. But Transformers trade these gains for quadratic memory cost in sequence length. SSMs like Mamba revisit the gated-state idea: they maintain a compressed state (like LSTM) but make the gates input-dependent (selective) and hardware-efficient, getting LSTM's constant-memory advantage with Transformer-level quality.
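The "input-dependent gate over a fixed-size state" idea can be sketched in a few lines. This is a toy illustration only: real SSMs such as Mamba use a structured state-space parameterization and a hardware-aware parallel scan, and the gate/weight names here are invented for the example:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def selective_scan(xs, W_gate, W_in):
    """Toy selective gated recurrence over a sequence.
    xs: (T, D) inputs; W_gate, W_in: (H, D) projections (hypothetical).
    The gate is computed FROM the input, so the model can choose
    per token what to keep and what to overwrite."""
    H = W_in.shape[0]
    s = np.zeros(H)                         # fixed-size state, like LSTM's cell
    outs = []
    for x in xs:
        g = sigmoid(W_gate @ x)             # input-dependent ("selective") gate
        s = g * s + (1.0 - g) * (W_in @ x)  # blend old state with new input
        outs.append(s.copy())
    return np.stack(outs)

# Usage with random data (hypothetical sizes).
rng = np.random.default_rng(1)
out = selective_scan(rng.standard_normal((10, 3)),
                     rng.standard_normal((4, 3)),
                     rng.standard_normal((4, 3)))
```

The key contrast with attention: memory here is O(H) per position regardless of sequence length, whereas a naive attention matrix grows as O(T²).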
