Sigmoid: Definition & Meaning — AI Wiki

Uma função matemática que espreme qualquer número real para o intervalo (0, 1): σ(x) = 1 / (1 + e^(−x)). Historicamente a função de ativação padrão em redes neurais, agora amplamente substituída por ReLU e GELU para camadas ocultas, mas ainda usada para saídas de classificação binária, mecanismos de gating (em LSTMs e GLU) e operações tipo atenção onde você precisa de valores entre 0 e 1.

Por que importa

Sigmoid aparece em todo lugar em IA mesmo não sendo mais a ativação oculta padrão. Gates de LSTM usam sigmoid. A ativação SiLU/Swish é x · sigmoid(x). Classificadores binários usam sigmoid como ativação de saída. Entender sigmoid — e por que foi substituída por ReLU em camadas ocultas — é conhecimento fundamental para entender escolhas de design em redes neurais.

Deep Dive

Sigmoid's shape: it's an S-curve centered at 0. For large positive inputs, it saturates near 1. For large negative inputs, it saturates near 0. Around 0, it transitions smoothly. This shape made it a natural choice for early neural networks: it mimics a biological neuron's firing rate (off to on) and naturally produces bounded outputs.

Why It Was Replaced

Sigmoid has two problems for deep networks. First, vanishing gradients: in the saturated regions (very positive or very negative inputs), the gradient is near zero, meaning learning effectively stops for those neurons. Second, non-zero-centered outputs: sigmoid always outputs positive values, which causes gradients to be either all positive or all negative, slowing convergence. ReLU solves both: it has a constant gradient of 1 for positive inputs and is zero-centered (for positive inputs).

Where Sigmoid Survives

Sigmoid remains the right choice when you specifically need a (0, 1) output: binary classification (probability of the positive class), gating (how much to let through, as in LSTMs), and any operation where you need a smooth, bounded activation. The SiLU activation function (x · sigmoid(x)) brings sigmoid back into modern architectures in a gating role, combining sigmoid's smoothness with the identity function's gradient properties.

Sigmoid

Por que importa

Deep Dive

Why It Was Replaced

Where Sigmoid Survives

Conceitos relacionados