The LPU (Language Processing Unit) is built around a deterministic execution model. Unlike GPUs, which schedule work dynamically and suffer from memory bandwidth bottlenecks, LPUs have a fixed dataflow architecture where computation and data movement are orchestrated at compile time. This eliminates scheduling overhead and allows the chip to sustain near-peak throughput for the sequential, memory-bound operations that dominate LLM inference (especially token generation, which is limited by how fast you can read model weights from memory).
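The bandwidth limit on token generation can be made concrete with a back-of-envelope roofline calculation. The numbers below are illustrative assumptions, not Groq or GPU specs: a 70B-parameter model stored in 8-bit weights on an accelerator with 3 TB/s of memory bandwidth. At batch size 1, each generated token requires streaming every weight from memory once, so bandwidth alone caps throughput:

```python
# Roofline sketch for memory-bound token generation (batch size 1).
# All figures are assumptions for illustration, not vendor specs.

model_params = 70e9        # assumed: 70B-parameter model
bytes_per_param = 1        # assumed: int8 (8-bit) weights
mem_bandwidth = 3e12       # assumed: 3 TB/s memory bandwidth

# One decode step must read all weights once, so the fastest possible
# step time is weight_bytes / bandwidth, regardless of compute speed.
weight_bytes = model_params * bytes_per_param
max_tokens_per_sec = mem_bandwidth / weight_bytes

print(f"Bandwidth-limited ceiling: ~{max_tokens_per_sec:.0f} tokens/sec")
```

Whatever the arithmetic units can do, single-stream generation cannot exceed this ceiling, which is why sustained memory throughput (rather than peak FLOPs) dominates inference performance.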
Groq's speed advantage comes with constraints. The deterministic architecture works best for models that fit a known execution pattern — standard Transformer inference. Custom architectures, training workloads, and highly dynamic computation graphs are harder to map to the LPU. Groq is also an inference-only solution; you still need GPUs (or TPUs) for training. And the cost-per-token, while decreasing, isn't always lower than GPU inference for high-throughput batch workloads, where GPUs can amortize each weight read across a large batch of concurrent requests.
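The batch-amortization point can be sketched with the same illustrative numbers as before (70B int8 model, 3 TB/s bandwidth — assumptions, not measured figures). In one decode step the weights are streamed from memory once regardless of how many sequences are in the batch, so until compute becomes the bottleneck, throughput grows roughly linearly with batch size:

```python
# Why batching amortizes weight reads (illustrative sketch, not a
# benchmark): one decode step streams the weights once for the whole
# batch, so per-token memory cost falls roughly as 1/batch_size.
# All numbers are assumptions carried over from the roofline example.

weight_bytes = 70e9        # assumed: 70B params x 1 byte (int8)
mem_bandwidth = 3e12       # assumed: 3 TB/s

for batch_size in (1, 8, 64):
    step_time = weight_bytes / mem_bandwidth   # sec per decode step
    tokens_per_sec = batch_size / step_time    # every sequence emits a token
    print(f"batch={batch_size:3d}: ~{tokens_per_sec:,.0f} tokens/sec")
```

This ignores KV-cache traffic and the point where arithmetic throughput saturates, but it captures why batch serving favors hardware that can keep many requests in flight, while single-stream latency favors raw sustained bandwidth.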