Company

Groq

Groq LPU
A chip company that builds custom AI inference processors called LPUs (Language Processing Units). Unlike NVIDIA GPUs, which are general-purpose parallel processors adapted for AI, Groq's LPU is purpose-built for the sequential token generation that LLM inference requires. The result: extremely fast inference, often 10x faster than GPU-based alternatives for LLM generation.

Why It Matters

Groq demonstrated that LLM inference does not have to be slow. Its cloud API serves open-source models (Llama, Mixtral) at 500-800 tokens per second, fast enough that responses appear nearly instantly. This speed advantage comes from the hardware architecture, not software optimization, which suggests that the current GPU-centric approach to AI inference may not be the long-term winner.
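To make the speed difference concrete, here is a minimal sketch comparing wall-clock time to stream a 500-token response. The 500-800 tok/s range comes from the figures above; the ~50 tok/s GPU baseline is an illustrative assumption, not a measured benchmark.

```python
# Rough latency comparison for streaming a 500-token response.
# 500-800 tok/s is the range quoted for Groq's cloud API above;
# 50 tok/s is an ASSUMED figure for a typical single-stream GPU
# deployment, used only for illustration.
RESPONSE_TOKENS = 500

def generation_time(tokens: int, tokens_per_sec: float) -> float:
    """Seconds to stream `tokens` at a steady decode rate."""
    return tokens / tokens_per_sec

print(f"Groq (800 tok/s): {generation_time(RESPONSE_TOKENS, 800):.2f} s")
print(f"Groq (500 tok/s): {generation_time(RESPONSE_TOKENS, 500):.2f} s")
print(f"GPU  (50 tok/s):  {generation_time(RESPONSE_TOKENS, 50):.2f} s")
```

At 500-800 tok/s a full paragraph arrives in under a second, versus roughly ten seconds at the assumed GPU rate, which is why Groq responses feel instantaneous.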

Deep Dive

The LPU (Language Processing Unit) is built around a deterministic execution model. Unlike GPUs, which schedule work dynamically and suffer from memory bandwidth bottlenecks, LPUs have a fixed dataflow architecture where computation and data movement are orchestrated at compile time. This eliminates scheduling overhead and allows the chip to sustain near-peak throughput for the sequential, memory-bound operations that dominate LLM inference (especially token generation, which is limited by how fast you can read model weights from memory).
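The claim that token generation is limited by weight-read speed can be sketched as a back-of-the-envelope roofline. The hardware numbers below are illustrative assumptions, not published Groq or NVIDIA specifications.

```python
# Roofline sketch for memory-bound token generation: each decoded
# token must read (roughly) every model weight from memory once,
# so single-stream throughput <= memory_bandwidth / weight_bytes.
# All hardware figures here are ASSUMED for illustration.

def max_tokens_per_sec(params_billions: float,
                       bytes_per_param: float,
                       bandwidth_gb_per_sec: float) -> float:
    """Upper bound on decode speed for one sequence (no batching)."""
    weight_gb = params_billions * bytes_per_param  # GB read per token
    return bandwidth_gb_per_sec / weight_gb

# A 70B-parameter model in fp16 (2 bytes/param) on a device with
# an assumed ~3,000 GB/s of memory bandwidth:
print(round(max_tokens_per_sec(70, 2, 3000), 1))  # ~21.4 tok/s
```

Under these assumptions a single stream tops out around 21 tok/s regardless of compute power, which is why sustaining near-peak memory throughput (as the LPU's compile-time dataflow aims to do) matters more for decode speed than raw FLOPS.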

The Trade-offs

Groq's speed advantage comes with constraints. The deterministic architecture works best for models that fit a known execution pattern — standard Transformer inference. Custom architectures, training workloads, and highly dynamic computation graphs are harder to map to the LPU. Groq is also an inference-only solution; you still need GPUs (or TPUs) for training. And the cost-per-token, while decreasing, isn't always cheaper than GPU inference for high-throughput batch workloads where GPUs can amortize their flexibility.
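The batch-amortization point can be sketched the same way: in a batched decode step, the weights read from memory are shared by every sequence in the batch, so aggregate throughput scales with batch size until compute or KV-cache traffic becomes the new limit. Figures are the same illustrative assumptions as above; KV-cache reads are ignored for simplicity.

```python
# Why batching favors GPUs for high-throughput workloads: one
# decode step reads the weights once for the WHOLE batch, so the
# memory cost is amortized across sequences. KV-cache traffic is
# ignored; all figures are ASSUMED for illustration.

def batched_tokens_per_sec(batch: int,
                           weight_gb: float,
                           bandwidth_gb_per_sec: float) -> float:
    """Aggregate decode throughput across a batch (weights-only model)."""
    step_time = weight_gb / bandwidth_gb_per_sec  # seconds per decode step
    return batch / step_time

# 140 GB of weights, assumed 3,000 GB/s of bandwidth:
for b in (1, 8, 64):
    print(b, round(batched_tokens_per_sec(b, 140, 3000), 1))
```

In this simplified model, aggregate tokens per second (and thus cost per token) improves linearly with batch size, which is the sense in which GPUs can amortize their flexibility on high-throughput batch workloads.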

Related Concepts
