Zubnet AI Learning Wiki › Neural Network
Basics

Neural Network

A computing system loosely inspired by the biological brain, made up of layers of interconnected "neurons" (mathematical functions) that learn patterns from data. Information flows through the layers, progressively transformed, until the network produces an output. Every modern AI model is a neural network of some form.

Why It Matters

Neural networks are the "how" behind all of AI. Understanding that they are just math (not magic, and not a brain) demystifies what AI can and cannot do. They are pattern matchers: extraordinarily powerful ones, but pattern matchers all the same.

Deep Dive

A neural network is, at bottom, a chain of matrix multiplications interspersed with nonlinear functions. Each "neuron" takes a weighted sum of its inputs, adds a bias term, and passes the result through an activation function (ReLU, GELU, sigmoid, and others). Stack thousands of these neurons into layers, stack dozens of layers deep, and you get a network capable of learning astonishingly complex functions — from recognizing faces to generating prose to folding proteins. The magic is not in any individual neuron (which is trivially simple math) but in the composition: layers build on layers, each learning progressively more abstract representations of the input data.
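The weighted-sum-plus-activation structure described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a real model: the two-layer shape, the random weights, and the layer sizes are all made up for the example.

```python
import numpy as np

def relu(x):
    # ReLU activation: zero out negative values
    return np.maximum(0.0, x)

rng = np.random.default_rng(0)

# A tiny two-layer network: 4 inputs -> 8 hidden units -> 2 outputs.
# In a trained model these weights would be learned, not random.
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)
W2, b2 = rng.normal(size=(2, 8)), np.zeros(2)

def forward(x):
    # Each layer: matrix multiply, add bias, apply nonlinearity.
    # Without the nonlinearity, stacked layers would collapse
    # into a single linear map.
    h = relu(W1 @ x + b1)
    return W2 @ h + b2

x = rng.normal(size=4)
y = forward(x)
print(y.shape)  # (2,)
```

Stacking more such layers changes nothing conceptually; each layer is still just a matrix multiply followed by a pointwise nonlinearity.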

How Training Works

Training a neural network means finding the right values for all those weights and biases — often billions of them. This happens through backpropagation and gradient descent. You feed the network an input, compare its output to the desired answer, compute how wrong it was (the loss), then work backward through every layer computing how each weight contributed to that error. Each weight gets nudged slightly in the direction that reduces the loss. Repeat this billions of times across your entire dataset, and the network converges on weights that produce useful outputs. The process is conceptually straightforward, but making it work at scale requires careful engineering: learning rate schedules, batch normalization, weight initialization strategies, and a lot of GPU memory.
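The loop described above (predict, measure the loss, compute gradients, nudge the weights) can be shown end to end on the smallest possible "network": a single neuron fitting a line. The data, learning rate, and step count are arbitrary choices for the sketch; real training replaces the hand-derived gradients with automatic backpropagation through every layer.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: y = 3x + 2 plus noise. The "network" is one neuron, w*x + b.
X = rng.normal(size=100)
Y = 3.0 * X + 2.0 + 0.1 * rng.normal(size=100)

w, b = 0.0, 0.0
lr = 0.1  # learning rate

for step in range(500):
    pred = w * X + b
    err = pred - Y
    loss = np.mean(err ** 2)       # mean squared error
    grad_w = 2 * np.mean(err * X)  # dLoss/dw, by the chain rule
    grad_b = 2 * np.mean(err)      # dLoss/db
    w -= lr * grad_w               # step against the gradient
    b -= lr * grad_b               # to reduce the loss

print(f"w={w:.2f}, b={b:.2f}")  # converges near w=3, b=2
```

Backpropagation is the same idea applied through many layers: the chain rule assigns each of the billions of weights its share of the error, and gradient descent nudges them all at once.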

The Road to 2012

The history matters for understanding where we are today. Neural networks were first proposed in the 1940s and had a heyday in the 1960s (perceptrons), followed by a long "AI winter" when they fell out of favor. The modern resurgence started around 2012, when a deep convolutional neural network called AlexNet crushed the ImageNet competition by a margin that shocked the field. What changed was not the theory — backpropagation had been around since the 1980s — but the hardware (GPUs made massive parallelism affordable) and the data (the internet provided training sets orders of magnitude larger than anything before). Every major AI breakthrough since then, from AlphaGo to GPT-4 to Sora, has been a neural network of some variety.

The Architecture Zoo

Today, the term "neural network" covers a sprawling family of architectures, each suited to different problems. Convolutional neural networks (CNNs) dominate image tasks by exploiting spatial structure. Recurrent neural networks (RNNs) and their LSTM variants were the go-to for sequential data before Transformers replaced them. Transformers, built on self-attention, power virtually all modern LLMs. State-space models (SSMs) like Mamba offer an alternative for long sequences with linear-time complexity instead of the Transformer's quadratic cost. Graph neural networks handle molecular structures and social networks. Diffusion models (a type of neural network trained to reverse a noising process) generate images and video. The architecture you choose shapes what your model can learn efficiently, and picking the wrong one for your problem can matter more than having more data or compute.
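To make the Transformer's quadratic cost concrete, here is a bare-bones sketch of single-head self-attention. For brevity it uses the input directly as queries, keys, and values; a real Transformer learns separate projection matrices for each, plus multiple heads.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X):
    # X: (n tokens, d dims). Identity Q/K/V projections for brevity.
    n, d = X.shape
    scores = X @ X.T / np.sqrt(d)  # (n, n) matrix: every token attends
                                   # to every other, hence quadratic cost
    return softmax(scores) @ X     # weighted mix of token representations

X = np.random.default_rng(2).normal(size=(5, 16))  # 5 tokens, 16 dims
out = self_attention(X)
print(out.shape)  # (5, 16)
```

The (n, n) score matrix is exactly what architectures like Mamba avoid: they propagate a fixed-size state along the sequence instead of comparing all token pairs.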

Not Actually a Brain

A persistent misconception is that neural networks work "like the brain." They really do not. Biological neurons communicate with timed electrical spikes, form recurrent loops, rewire physically, and operate on timescales and energy budgets utterly unlike silicon. Artificial neural networks borrowed the metaphor of connected nodes and then diverged almost completely. Nobody doing serious AI research today looks at neuroscience papers to design better Transformers. The brain analogy is useful for a five-second intuition ("it learns from examples") but misleading for anything deeper. What neural networks actually are — differentiable function approximators trained by gradient descent — is both less romantic and more precisely useful to understand.

Related Concepts

Natural Language Processing · Neuron