The process of running a trained model to generate outputs. Training is learning; inference is using what was learned. Every time you send a prompt to Claude or generate an image with Stable Diffusion, that's inference. It's what costs providers GPU hours and what you pay for per token.
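The distinction is easiest to see in code. Below is a minimal, illustrative sketch (not any real model's API): the weights stand in for what training already produced, and inference is just a forward pass that reads them without ever updating them.

```python
import numpy as np

# Stand-in for a "trained" model: in practice these weights come from
# training; here they are fixed random values for illustration only.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 2))   # "learned" weights, never modified below
b = np.zeros(2)

def infer(x):
    """Inference: one forward pass. No gradients, no weight updates."""
    logits = x @ W + b
    e = np.exp(logits - logits.max())   # softmax for probabilities
    return e / e.sum()

x = rng.standard_normal(4)   # one input, e.g. an embedded token
probs = infer(x)
print(probs)                 # the model's output for this input
```

Every prompt a provider serves is one or more of these forward passes at enormous scale, which is where the GPU hours go.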
Why it matters
Inference cost and speed determine the economics of AI products. Faster inference = lower latency = better UX. Cheaper inference = lower prices = wider adoption. An entire industry of optimization techniques, quantization chief among them, exists to make inference more efficient.
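To make the quantization point concrete, here is a toy sketch of one common scheme (per-tensor absmax quantization to int8). This is a simplification of what real libraries do, but it shows the core trade: a 4x smaller weight tensor in exchange for a small rounding error at inference time.

```python
import numpy as np

# Stand-in weights; a real model would have billions of these.
weights = np.random.default_rng(0).standard_normal(1000).astype(np.float32)

# Absmax quantization: map the largest-magnitude weight to the int8 range.
scale = np.abs(weights).max() / 127
q = np.round(weights / scale).astype(np.int8)      # stored form: 4x smaller

# At inference time, dequantize back to an approximation of the original.
dequantized = q.astype(np.float32) * scale

print(weights.nbytes, q.nbytes)              # 4000 vs 1000 bytes
print(np.abs(weights - dequantized).max())   # worst-case rounding error
```

Smaller weights mean less memory bandwidth per forward pass, which is usually the bottleneck in serving, so the savings show up directly as cheaper, faster inference.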