Models

RNN

Recurrent Neural Network, LSTM, GRU
A neural network that processes sequences while maintaining a hidden state updated at each step: it "remembers" what it has seen so far. LSTMs and GRUs are improved variants that fix the original RNN's tendency to forget long-range dependencies. RNNs dominated NLP and speech recognition before Transformers replaced them around 2018–2020.

Why it matters

RNNs are the ancestors of modern language models. Understanding why they fell short (slow sequential processing, trouble with long-range dependencies) explains why Transformers succeeded (parallel processing, attention over all positions). The SSM/Mamba architecture is, in a sense, a return to the RNN idea with modern fixes.

Deep Dive

An RNN processes a sequence token by token, updating its hidden state at each step: h_t = f(h_{t-1}, x_t). The hidden state is a compressed representation of everything seen so far. The problem: as sequences get longer, the hidden state must compress more and more information into a fixed-size vector, and gradient signals for early tokens vanish during backpropagation (the "vanishing gradient problem").
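To make the update rule concrete, here is a minimal sketch of a vanilla RNN cell in NumPy. The tanh nonlinearity, the weight layout, and the toy dimensions are illustrative assumptions, not details specified above; the point is that the hidden state stays the same fixed size no matter how long the sequence grows.

```python
import numpy as np

# Minimal vanilla RNN cell: h_t = tanh(W_xh @ x_t + W_hh @ h_{t-1} + b).
# All sizes below are toy values chosen for illustration.
rng = np.random.default_rng(0)
input_dim, hidden_dim, seq_len = 4, 8, 10

W_xh = rng.normal(scale=0.1, size=(hidden_dim, input_dim))   # input-to-hidden weights
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))  # hidden-to-hidden weights
b = np.zeros(hidden_dim)

xs = rng.normal(size=(seq_len, input_dim))  # a toy input sequence
h = np.zeros(hidden_dim)                    # fixed-size hidden state

for x_t in xs:
    # The entire history so far is compressed into this one vector at every step.
    h = np.tanh(W_xh @ x_t + W_hh @ h + b)

print(h.shape)  # (8,) -- same size regardless of sequence length
```

Because gradients flow backward through that same recurrence at every step, repeated multiplication by W_hh is what shrinks the signal for early tokens and produces the vanishing-gradient behavior described above.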

LSTM and GRU

Long Short-Term Memory (LSTM, 1997) and Gated Recurrent Units (GRU, 2014) addressed vanishing gradients by introducing gates — learned mechanisms that control what information to keep, update, or forget. LSTMs have a separate cell state that can carry information unchanged across many steps, with gates controlling access. GRUs simplify LSTMs by merging the cell and hidden states while maintaining similar performance.
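As a sketch of what the gates do, here is a single LSTM step written out in NumPy. The stacked-parameter layout, variable names, and toy dimensions are assumptions made for this example; it follows the standard forget/input/output-gate formulation rather than any particular library's API.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W, U, b stack the parameters for the forget (f),
    input (i), and output (o) gates plus the candidate update (g)."""
    z = W @ x_t + U @ h_prev + b
    f, i, o, g = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)  # gates squashed into (0, 1)
    g = np.tanh(g)                                # candidate cell update
    c = f * c_prev + i * g   # cell state: gated copy of the past plus gated new info
    h = o * np.tanh(c)       # hidden state exposed to the rest of the network
    return h, c

# Toy dimensions, purely illustrative.
rng = np.random.default_rng(0)
input_dim, hidden_dim = 4, 8
W = rng.normal(scale=0.1, size=(4 * hidden_dim, input_dim))
U = rng.normal(scale=0.1, size=(4 * hidden_dim, hidden_dim))
b = np.zeros(4 * hidden_dim)

h = c = np.zeros(hidden_dim)
for x_t in rng.normal(size=(10, input_dim)):
    h, c = lstm_step(x_t, h, c, W, U, b)
```

The key line is the cell update `c = f * c_prev + i * g`: when the forget gate stays near 1, information can ride along the cell state largely unchanged for many steps, which is what lets gradients survive over longer ranges.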

Why Transformers Won

RNNs process tokens sequentially — token 5 can't be processed until tokens 1–4 are done. This makes them inherently slow on parallel hardware (GPUs). Transformers process all tokens simultaneously using attention, making them dramatically faster to train. Attention also directly connects every token to every other token, solving the long-range dependency problem without relying on a compressed hidden state. The trade-off: Transformers use quadratic memory in sequence length, while RNNs use constant memory. This is why SSMs (Mamba) are interesting — they offer RNN-like efficiency with Transformer-like performance.
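A quick back-of-the-envelope calculation makes the quadratic-versus-constant contrast concrete. The sequence length, hidden size, and fp16 precision below are assumed values chosen for illustration, not figures from the text.

```python
# Assumed sizes: 8192-token sequence, hidden size 4096, fp16 values.
seq_len = 8192
hidden_dim = 4096
bytes_per_value = 2  # fp16

# Attention scores form a seq_len x seq_len matrix (per head, per layer).
attention_scores = seq_len * seq_len * bytes_per_value
# An RNN carries only its fixed-size hidden state across all steps.
rnn_state = hidden_dim * bytes_per_value

print(f"attention score matrix: {attention_scores / 2**20:.0f} MiB")  # ~128 MiB
print(f"RNN hidden state:       {rnn_state / 2**10:.0f} KiB")         # ~8 KiB
```

Doubling the sequence length quadruples the attention-score memory but leaves the RNN state untouched, which is the efficiency gap SSMs like Mamba aim to close.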
