Zubnet AILearnWiki — Fundamentals

Neural Network

A computing system loosely inspired by biological brains, made of layers of interconnected "neurons" (mathematical functions) that learn patterns from data. Information flows through the layers, getting progressively transformed until the network produces an output. Virtually every modern AI model is a neural network of some kind.

Why it matters

Neural networks are the "how" behind all of AI. Understanding that they're math (not magic, not brains) helps demystify what AI can and can't do. They're pattern matchers — extraordinarily powerful ones, but pattern matchers nonetheless.

Deep Dive

A neural network is, at bottom, a chain of matrix multiplications interspersed with nonlinear functions. Each "neuron" takes a weighted sum of its inputs, adds a bias term, and passes the result through an activation function (ReLU, GELU, sigmoid, and others). Stack thousands of these neurons into layers, stack dozens of layers deep, and you get a network capable of learning astonishingly complex functions — from recognizing faces to generating prose to folding proteins. The magic is not in any individual neuron (which is trivially simple math) but in the composition: layers build on layers, each learning progressively more abstract representations of the input data.
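The "weighted sum, plus bias, through an activation" recipe is small enough to write out directly. Below is a minimal NumPy sketch of a two-layer network with ReLU activations; the layer sizes and random weights are illustrative, not from any trained model.

```python
import numpy as np

def relu(x):
    # ReLU activation: zero out negative values, pass positives through
    return np.maximum(0.0, x)

def layer(x, W, b):
    # One "layer of neurons": weighted sum of inputs, plus bias, through ReLU
    return relu(W @ x + b)

# A tiny network: 3 inputs -> 4 hidden neurons -> 2 outputs
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)

x = np.array([1.0, -0.5, 2.0])
hidden = layer(x, W1, b1)     # first transformation of the input
output = W2 @ hidden + b2     # final layer (left linear here)
print(output.shape)           # (2,)
```

Each line of math is trivial on its own; the learned behavior comes entirely from composing many such layers, exactly as the paragraph above describes.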

How Training Works

Training a neural network means finding the right values for all those weights and biases — often billions of them. This happens through backpropagation and gradient descent. You feed the network an input, compare its output to the desired answer, compute how wrong it was (the loss), then work backward through every layer computing how each weight contributed to that error. Each weight gets nudged slightly in the direction that reduces the loss. Repeat this billions of times across your entire dataset, and the network converges on weights that produce useful outputs. The process is conceptually straightforward, but making it work at scale requires careful engineering: learning rate schedules, batch normalization, weight initialization strategies, and a lot of GPU memory.
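The forward pass, loss, gradient, and weight nudge can be seen end to end in a toy case. This sketch trains a single weight to fit the (hypothetical) noiseless task y = 2x by hand-computed gradient descent — no framework, and far simpler than real training, but the loop has the same shape.

```python
import numpy as np

# Toy task (illustrative): learn y = 2x with one trainable weight w.
rng = np.random.default_rng(1)
xs = rng.normal(size=100)
ys = 2.0 * xs

w = 0.0      # the single weight we are training
lr = 0.1     # learning rate: how big each nudge is

for step in range(200):
    pred = w * xs                         # forward pass
    loss = np.mean((pred - ys) ** 2)      # how wrong was the network?
    grad = np.mean(2 * (pred - ys) * xs)  # backward pass: d(loss)/d(w)
    w -= lr * grad                        # nudge w downhill on the loss

print(round(w, 3))  # converges to 2.0
```

A real network repeats exactly this loop, except the gradient is computed for billions of weights at once by backpropagating through every layer.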

The Road to 2012

The history matters for understanding where we are today. Neural networks were first proposed in the 1940s and had a heyday in the 1960s (perceptrons), followed by a long "AI winter" when they fell out of favor. The modern resurgence started around 2012, when a deep convolutional neural network called AlexNet crushed the ImageNet competition by a margin that shocked the field. What changed was not the theory — backpropagation had been around since the 1980s — but the hardware (GPUs made massive parallelism affordable) and the data (the internet provided training sets orders of magnitude larger than anything before). Every major AI breakthrough since then, from AlphaGo to GPT-4 to Sora, has been a neural network of some variety.

The Architecture Zoo

Today, the term "neural network" covers a sprawling family of architectures, each suited to different problems. Convolutional neural networks (CNNs) dominate image tasks by exploiting spatial structure. Recurrent neural networks (RNNs) and their LSTM variants were the go-to for sequential data before Transformers replaced them. Transformers, built on self-attention, power virtually all modern LLMs. State-space models (SSMs) like Mamba offer an alternative for long sequences with linear-time complexity instead of the Transformer's quadratic cost. Graph neural networks handle molecular structures and social networks. Diffusion models (a type of neural network trained to reverse a noising process) generate images and video. The architecture you choose shapes what your model can learn efficiently, and picking the wrong one for your problem can matter more than having more data or compute.
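The Transformer's quadratic cost mentioned above comes from self-attention: every token computes a score against every other token. A minimal single-head sketch in NumPy (random weights, no trained model implied) makes the (seq × seq) score matrix explicit:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    # Single-head self-attention. The (seq, seq) score matrix is why
    # Transformers scale quadratically with sequence length.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # pairwise token scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over each row
    return weights @ V                               # weighted mix of values

rng = np.random.default_rng(0)
seq_len, d = 5, 8
X = rng.normal(size=(seq_len, d))                    # 5 tokens, 8 dims each
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (5, 8)
```

Architectures like Mamba avoid materializing that full score matrix, which is the source of their linear-time advantage on long sequences.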

Not Actually a Brain

A persistent misconception is that neural networks work "like the brain." They really do not. Biological neurons communicate with timed electrical spikes, form recurrent loops, rewire physically, and operate on timescales and energy budgets utterly unlike silicon. Artificial neural networks borrowed the metaphor of connected nodes and then diverged almost completely. Hardly anyone doing serious AI research today looks to neuroscience to design better Transformers. The brain analogy is useful for a five-second intuition ("it learns from examples") but misleading for anything deeper. What neural networks actually are — differentiable function approximators trained by gradient descent — is both less romantic and more precisely useful to understand.
