Sigmoid: Definition & Meaning — AI Wiki

Una función matemática que comprime cualquier número real al rango (0, 1): σ(x) = 1 / (1 + e^(−x)). Históricamente la función de activación por defecto en redes neuronales, ahora ampliamente reemplazada por ReLU y GELU para capas ocultas, pero aún usada para salidas de clasificación binaria, mecanismos de gating (en LSTMs y GLU) y operaciones tipo atención donde necesitas valores entre 0 y 1.

Por qué importa

Sigmoid aparece en todas partes en IA aunque ya no sea la activación oculta por defecto. Las gates de LSTM usan sigmoid. La activación SiLU/Swish es x · sigmoid(x). Los clasificadores binarios usan sigmoid como activación de salida. Entender sigmoid — y por qué fue reemplazada por ReLU para capas ocultas — es conocimiento fundamental para entender decisiones de diseño en redes neuronales.

Deep Dive

Sigmoid's shape: it's an S-curve centered at 0. For large positive inputs, it saturates near 1. For large negative inputs, it saturates near 0. Around 0, it transitions smoothly. This shape made it a natural choice for early neural networks: it mimics a biological neuron's firing rate (off to on) and naturally produces bounded outputs.

Why It Was Replaced

Sigmoid has two problems for deep networks. First, vanishing gradients: in the saturated regions (very positive or very negative inputs), the gradient is near zero, meaning learning effectively stops for those neurons. Second, non-zero-centered outputs: sigmoid always outputs positive values, which causes gradients to be either all positive or all negative, slowing convergence. ReLU solves both: it has a constant gradient of 1 for positive inputs and is zero-centered (for positive inputs).

Where Sigmoid Survives

Sigmoid remains the right choice when you specifically need a (0, 1) output: binary classification (probability of the positive class), gating (how much to let through, as in LSTMs), and any operation where you need a smooth, bounded activation. The SiLU activation function (x · sigmoid(x)) brings sigmoid back into modern architectures in a gating role, combining sigmoid's smoothness with the identity function's gradient properties.

Sigmoid

Por qué importa

Deep Dive

Why It Was Replaced

Where Sigmoid Survives

Conceptos relacionados