The artificial neuron is loosely inspired by biological neurons, but the analogy shouldn't be taken literally. A biological neuron receives electrical signals through its dendrites, integrates them in the cell body, and fires (or doesn't) along its axon. An artificial neuron computes: output = activation(w1·x1 + w2·x2 + ... + wn·xn + bias). The weights (w) determine how much each input matters. The bias shifts the activation threshold. The activation function (for example ReLU or GELU) introduces non-linearity; without it, any stack of such neurons would collapse into a single linear function.
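To make the formula concrete, here is a minimal sketch of a single artificial neuron in plain Python. The function names (`relu`, `gelu`, `neuron`) and the example numbers are illustrative choices, not taken from any particular library.

```python
import math

def relu(z: float) -> float:
    # ReLU: pass positive values through, clamp negatives to zero.
    return max(0.0, z)

def gelu(z: float) -> float:
    # Exact GELU: z times the standard normal CDF evaluated at z.
    return 0.5 * z * (1.0 + math.erf(z / math.sqrt(2.0)))

def neuron(inputs, weights, bias, activation=relu):
    # Weighted sum of the inputs plus the bias, then the activation.
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

# Illustrative values only.
x = [1.0, 2.0, 3.0]
w = [0.5, -1.0, 0.25]

print(neuron(x, w, bias=0.5))  # weighted sum is -0.25, so ReLU outputs 0.0
print(neuron(x, w, bias=2.0))  # weighted sum is 1.25, so ReLU outputs 1.25
```

With these inputs, raising the bias from 0.5 to 2.0 moves the weighted sum from below zero to above it, which is what "shifting the activation threshold" means in practice.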
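The perceptron (Rosenblatt, 1958) was the first artificial neuron that could learn its weights from data: a single unit that could learn to classify linearly separable data. (The earlier McCulloch-Pitts neuron of 1943 had fixed weights.) Minsky and Papert showed in 1969 that a single perceptron cannot represent XOR (a simple function that is not linearly separable), which contributed to the first AI winter. The solution: stack multiple layers of neurons (multi-layer perceptrons, or MLPs). The universal approximation theorem says that an MLP with a single hidden layer and a suitable non-linear activation can approximate any continuous function on a bounded input range to arbitrary accuracy, given enough hidden neurons. The theorem guarantees that suitable weights exist, not that training will find them, but it is a key theoretical justification for deep learning.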
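The XOR limitation can be illustrated with a short sketch. It assumes the classic perceptron learning rule with a step activation, and the two-layer network uses hand-picked weights (an OR unit and an AND unit feeding an output unit) rather than learned ones. All names here are hypothetical.

```python
def step(z):
    # Threshold activation: fire (1) only if the weighted sum is positive.
    return 1 if z > 0 else 0

def train_perceptron(data, epochs=50, lr=1):
    # Classic perceptron rule: nudge weights by the prediction error.
    w, b = [0, 0], 0
    for _ in range(epochs):
        for (x1, x2), target in data:
            err = target - step(w[0] * x1 + w[1] * x2 + b)
            w[0] += lr * err * x1
            w[1] += lr * err * x2
            b += lr * err
    return w, b

def count_correct(data, w, b):
    # Number of examples the single perceptron classifies correctly.
    return sum(step(w[0] * x1 + w[1] * x2 + b) == t for (x1, x2), t in data)

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def xor_mlp(x1, x2):
    # Hidden layer: one unit computes OR, the other computes AND.
    h_or = step(x1 + x2 - 0.5)
    h_and = step(x1 + x2 - 1.5)
    # Output fires when OR is true but AND is not, which is XOR.
    return step(h_or - h_and - 0.5)

and_correct = count_correct(AND, *train_perceptron(AND))
xor_correct = count_correct(XOR, *train_perceptron(XOR))
print("AND correct:", and_correct, "of 4")  # linearly separable, so all 4
print("XOR correct:", xor_correct, "of 4")  # never all 4 with one unit
print("MLP XOR:", [xor_mlp(x1, x2) for (x1, x2), _ in XOR])
```

The single unit learns AND because one straight line separates its two classes. No such line exists for XOR, so no setting of the two weights and the bias can classify all four points, while the hidden layer reshapes the inputs into something the output unit can separate.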
A model like Llama-70B has roughly 70 billion parameters (the learned weights, plus biases where the architecture uses them). Each feedforward layer has thousands to tens of thousands of neurons. But interpretability research shows that individual neurons often don't correspond to single concepts. Instead, concepts are encoded as directions in activation space spread across many neurons, and a network can represent more features than it has neurons (superposition). A single neuron might therefore participate in encoding dozens of different features, which makes neuron-by-neuron interpretation challenging.
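A toy sketch can make the idea of features as directions more tangible. It assumes a made-up setup in which three features share two neurons, with each feature assigned a unit direction 120 degrees apart in the two-dimensional activation space. This is an illustration only, not how any real model is known to arrange its features.

```python
import math

N_FEATURES = 3  # three features squeezed into two neurons (hypothetical setup)

# One unit-length direction per feature, spaced evenly around the circle.
directions = [
    (math.cos(2 * math.pi * k / N_FEATURES), math.sin(2 * math.pi * k / N_FEATURES))
    for k in range(N_FEATURES)
]

def encode(feature_values):
    # Neuron activations are the sum of each feature's value times its direction.
    a0 = sum(v * d[0] for v, d in zip(feature_values, directions))
    a1 = sum(v * d[1] for v, d in zip(feature_values, directions))
    return (a0, a1)

def decode(activations):
    # Read each feature back by projecting onto its direction; ReLU drops
    # the negative interference from the other, non-orthogonal directions.
    return [
        max(0.0, activations[0] * d[0] + activations[1] * d[1])
        for d in directions
    ]

acts = encode([0.0, 1.0, 0.0])  # only feature 1 is active
decoded = decode(acts)
print("neuron activations:", acts)
print("decoded features:", decoded)
print("neuron 0 weight in each feature:", [d[0] for d in directions])
```

Here neuron 0 has a non-zero component in all three feature directions, so its activation alone does not say which feature is present; only the direction across both neurons does. The clean readout depends on features being sparse: when several are active at once in this toy, their directions interfere and the decoded values degrade.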