Sigmoid: Definition & Meaning — AI Wiki

Une fonction mathématique qui écrase n'importe quel nombre réel dans l'intervalle (0, 1) : σ(x) = 1 / (1 + e^(−x)). Historiquement la fonction d'activation par défaut dans les réseaux de neurones, maintenant largement remplacée par ReLU et GELU pour les couches cachées, mais toujours utilisée pour les sorties de classification binaire, les mécanismes de gating (dans les LSTM et GLU) et les opérations de type attention où tu as besoin de valeurs entre 0 et 1.

Pourquoi c'est important

Sigmoid apparaît partout en IA même si ce n'est plus l'activation cachée par défaut. Les gates des LSTM utilisent sigmoid. L'activation SiLU/Swish est x · sigmoid(x). Les classifieurs binaires utilisent sigmoid comme activation de sortie. Comprendre sigmoid — et pourquoi elle a été remplacée par ReLU pour les couches cachées — est un savoir fondamental pour comprendre les choix de design des réseaux de neurones.

Deep Dive

Sigmoid's shape: it's an S-curve centered at 0. For large positive inputs, it saturates near 1. For large negative inputs, it saturates near 0. Around 0, it transitions smoothly. This shape made it a natural choice for early neural networks: it mimics a biological neuron's firing rate (off to on) and naturally produces bounded outputs.

Why It Was Replaced

Sigmoid has two problems for deep networks. First, vanishing gradients: in the saturated regions (very positive or very negative inputs), the gradient is near zero, meaning learning effectively stops for those neurons. Second, non-zero-centered outputs: sigmoid always outputs positive values, which causes gradients to be either all positive or all negative, slowing convergence. ReLU solves both: it has a constant gradient of 1 for positive inputs and is zero-centered (for positive inputs).

Where Sigmoid Survives

Sigmoid remains the right choice when you specifically need a (0, 1) output: binary classification (probability of the positive class), gating (how much to let through, as in LSTMs), and any operation where you need a smooth, bounded activation. The SiLU activation function (x · sigmoid(x)) brings sigmoid back into modern architectures in a gating role, combining sigmoid's smoothness with the identity function's gradient properties.

Sigmoid

Pourquoi c'est important

Deep Dive

Why It Was Replaced

Where Sigmoid Survives

Concepts liés