Fundamentals

Sigmoid

Logistic Function
A mathematical function that squashes any real number into the (0, 1) range: σ(x) = 1 / (1 + e^(−x)). Historically it was the default activation function in neural networks; for hidden layers it has largely been replaced by ReLU and GELU, but it is still used in binary classification outputs, in gating mechanisms (in LSTMs and GLUs), and in attention-like operations where you need values between 0 and 1.
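
A minimal sketch of the formula in plain Python (the function name and test values are illustrative):

import math

def sigmoid(x):
    # Logistic function: squashes any real number into (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

print([round(sigmoid(x), 4) for x in (-5.0, 0.0, 5.0)])  # [0.0067, 0.5, 0.9933]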

Why It Matters

Sigmoid appears everywhere in AI even though it is no longer the default hidden activation. LSTM gates use sigmoid. The SiLU/Swish activation is x · sigmoid(x). Binary classifiers use sigmoid as the output activation. Understanding sigmoid, and why ReLU replaced it in hidden layers, is foundational knowledge for understanding neural network design choices.

Deep Dive

Sigmoid's shape: it's an S-curve centered at 0. For large positive inputs, it saturates near 1. For large negative inputs, it saturates near 0. Around 0, it transitions smoothly. This shape made it a natural choice for early neural networks: it mimics a biological neuron's firing rate (off to on) and naturally produces bounded outputs.
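
Evaluating the function at a few points makes the S-shape and the saturation concrete (a small sketch in plain Python; the sample points are arbitrary):

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

for x in (-10.0, -2.0, 0.0, 2.0, 10.0):
    print(f"sigma({x:+.0f}) = {sigmoid(x):.5f}")
# sigma(-10) = 0.00005   <- saturated near 0
# sigma(-2)  = 0.11920
# sigma(+0)  = 0.50000   <- middle of the smooth transition
# sigma(+2)  = 0.88080
# sigma(+10) = 0.99995   <- saturated near 1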

Why It Was Replaced

Sigmoid has two problems for deep networks. First, vanishing gradients: in the saturated regions (very positive or very negative inputs), the gradient is near zero, so learning effectively stops for those neurons. Second, non-zero-centered outputs: sigmoid always outputs positive values, which pushes the gradients of a layer's weights to be all positive or all negative, slowing convergence. ReLU fixes the first problem: its gradient is a constant 1 for positive inputs, so it never saturates in that regime. Its outputs are still non-negative, so it does not fix zero-centering, but in practice the stable gradient matters far more.
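
The vanishing-gradient problem is visible directly from the derivative, σ′(x) = σ(x)(1 − σ(x)), which peaks at 0.25 and collapses toward zero in the saturated regions. A small sketch (plain Python, illustrative inputs):

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    # d(sigma)/dx = sigma(x) * (1 - sigma(x)); peaks at 0.25 when x = 0
    s = sigmoid(x)
    return s * (1.0 - s)

for x in (-10.0, 0.0, 10.0):
    print(f"x = {x:+.0f}   gradient = {sigmoid_grad(x):.6f}")
# x = -10   gradient = 0.000045   <- saturated: almost no learning signal
# x = +0    gradient = 0.250000   <- the maximum, and still well below 1
# x = +10   gradient = 0.000045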

Where Sigmoid Survives

Sigmoid remains the right choice when you specifically need a (0, 1) output: binary classification (probability of the positive class), gating (how much to let through, as in LSTMs), and any operation where you need a smooth, bounded activation. The SiLU activation function (x · sigmoid(x)) brings sigmoid back into modern architectures in a gating role, combining sigmoid's smoothness with the identity function's gradient properties.
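
A short sketch of these surviving uses (plain Python; the logits and gate values are illustrative):

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def silu(x):
    # SiLU / Swish: x * sigmoid(x) -- sigmoid acts as a soft gate on x
    return x * sigmoid(x)

# Binary classification head: a raw logit becomes P(positive class)
print(round(sigmoid(2.0), 3))          # 0.881

# Gating (as in an LSTM gate): sigmoid decides how much of a value to pass
gate, candidate = sigmoid(-1.0), 3.0
print(round(gate * candidate, 3))      # 0.807

print([round(silu(x), 3) for x in (-2.0, 0.0, 2.0)])  # [-0.238, 0.0, 1.762]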
