Company

Groq

Groq LPU
A chip company building custom AI inference processors called LPUs (Language Processing Units). Unlike NVIDIA GPUs, which are general-purpose parallel processors adapted for AI, Groq's LPUs are purpose-built for the sequential token generation that LLM inference requires. The result: extremely fast inference, often around 10x faster than GPU-based alternatives for LLM generation.

Why It Matters

Groq demonstrates that LLM inference doesn't have to be slow. Their cloud API serves open-source models (Llama, Mixtral) at 500–800 tokens per second, fast enough that responses appear almost instantly. This speed advantage comes from the hardware architecture, not software optimization, which suggests the current GPU-centric approach to AI inference may not be the long-term winner.
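To see what that speed looks like in practice, here is a minimal sketch of calling Groq's cloud API through its OpenAI-compatible endpoint and measuring tokens per second. The model name and the GROQ_API_KEY environment variable are assumptions; check Groq's current documentation for available models.

```python
# Minimal sketch: calling Groq's OpenAI-compatible chat completions endpoint.
# Assumptions: the `openai` Python package is installed, GROQ_API_KEY is set,
# and "llama-3.1-8b-instant" is an available model (model names change over time).
import os
import time

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["GROQ_API_KEY"],
    base_url="https://api.groq.com/openai/v1",  # Groq's OpenAI-compatible endpoint
)

start = time.time()
response = client.chat.completions.create(
    model="llama-3.1-8b-instant",
    messages=[{"role": "user", "content": "Explain speculative decoding in two sentences."}],
)
elapsed = time.time() - start

text = response.choices[0].message.content
tokens = response.usage.completion_tokens
print(text)
print(f"{tokens} tokens in {elapsed:.2f}s ≈ {tokens / elapsed:.0f} tok/s")
```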

Deep Dive

The LPU (Language Processing Unit) is built around a deterministic execution model. Unlike GPUs, which schedule work dynamically and suffer from memory bandwidth bottlenecks, LPUs have a fixed dataflow architecture where computation and data movement are orchestrated at compile time. This eliminates scheduling overhead and allows the chip to sustain near-peak throughput for the sequential, memory-bound operations that dominate LLM inference (especially token generation, which is limited by how fast you can read model weights from memory).
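A rough way to see why token generation is memory-bound: each new token requires reading essentially all of the model's weights once, so an upper bound on single-stream decode speed is memory bandwidth divided by model size. The sketch below shows that arithmetic; the bandwidth and model-size figures are illustrative placeholders, not Groq or NVIDIA specifications.

```python
# Back-of-envelope bound on single-stream decode speed for a memory-bound model:
# each generated token reads (roughly) all weight bytes once, so
#   tokens/sec <= memory_bandwidth / weight_bytes.
# All figures below are illustrative assumptions, not vendor specs.

def max_tokens_per_sec(params_billion: float, bytes_per_param: float, bandwidth_gb_s: float) -> float:
    """Upper bound on decode throughput, ignoring KV-cache reads and compute."""
    weight_bytes = params_billion * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 / weight_bytes

# Example: a 70B-parameter model in 8-bit weights (1 byte per parameter).
scenarios = [
    ("hypothetical GPU with ~2 TB/s HBM", 2_000),
    ("hypothetical multi-chip SRAM system with ~10 TB/s aggregate", 10_000),
]
for label, bw in scenarios:
    print(f"{label}: <= {max_tokens_per_sec(70, 1.0, bw):.0f} tok/s")
```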

The Trade-offs

Groq's speed advantage comes with constraints. The deterministic architecture works best for models that fit a known execution pattern — standard Transformer inference. Custom architectures, training workloads, and highly dynamic computation graphs are harder to map to the LPU. Groq is also an inference-only solution; you still need GPUs (or TPUs) for training. And the cost-per-token, while decreasing, isn't always cheaper than GPU inference for high-throughput batch workloads where GPUs can amortize their flexibility.
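The cost comparison reduces to simple arithmetic: dollars per million tokens is hourly hardware cost divided by sustained throughput over that hour. The numbers in the sketch below are hypothetical placeholders chosen only to show the calculation, not vendor pricing, but they illustrate how a slower, heavily batched GPU deployment can still win on cost per token.

```python
# Cost-per-token arithmetic: $/1M tokens = hourly cost / (tokens/sec * 3600) * 1e6.
# All figures below are hypothetical placeholders, not vendor pricing.

def dollars_per_million_tokens(hourly_cost_usd: float, tokens_per_sec: float) -> float:
    tokens_per_hour = tokens_per_sec * 3600
    return hourly_cost_usd / tokens_per_hour * 1e6

# A low-latency deployment vs. a slower but heavily batched one (hypothetical numbers).
print(f"${dollars_per_million_tokens(20.0, 500):.2f} / 1M tokens (fast, low-batch serving)")
print(f"${dollars_per_million_tokens(30.0, 5000):.2f} / 1M tokens (batched GPU throughput)")
```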
