Mixed Precision Training: Definition & Meaning — AI Wiki

Training neural networks using lower-precision number formats (16-bit instead of 32-bit) for most computations while keeping critical operations in full precision. This doubles the effective memory capacity and computation speed of GPUs with minimal impact on model quality. BF16 (bfloat16) is the standard for LLM training; FP16 is used for inference.

Why it matters

Mixed precision is why we can train models as large as we do. A 70B parameter model in FP32 would need 280 GB just for weights — impossible on any single GPU. In BF16, it needs 140 GB, which fits across a few GPUs. Mixed precision effectively doubled the AI industry's compute capacity for free, just by using a smarter number format.

Deep Dive

The key insight: most neural network computations don't need 32 bits of precision. The weights, activations, and gradients can be represented in 16 bits without meaningful quality loss. But some operations (loss computation, weight updates) need higher precision to avoid numerical instability. Mixed precision keeps a master copy of weights in FP32 for updates, while using FP16/BF16 for the forward and backward passes.

BF16 vs. FP16

FP16 (IEEE half-precision) has 5 exponent bits and 10 mantissa bits. BF16 (Brain Float 16) has 8 exponent bits and 7 mantissa bits. BF16's wider exponent range means it can represent the same range of values as FP32 (avoiding overflow/underflow), while FP16's narrower range requires loss scaling to prevent gradients from underflowing to zero. For training, BF16 is simpler and more stable. For inference, FP16 sometimes offers slightly better precision for the same memory cost.

FP8 and Beyond

The latest GPUs (NVIDIA H100, H200) support FP8 (8-bit floating point) for even faster computation. FP8 halves memory and doubles throughput compared to FP16, but requires careful handling to avoid quality degradation. Current practice: train in BF16, serve in FP16 or FP8, and quantize to INT4/INT8 for edge deployment. Each step down in precision trades a tiny amount of quality for significant speed and memory gains.

Mixed Precision Training

Why it matters

Deep Dive

BF16 vs. FP16

FP8 and Beyond

Related Concepts