The WSE-3 has 44 GB of on-chip SRAM — not HBM or DRAM, but SRAM directly on the compute die. This provides ~21 PB/s of memory bandwidth, roughly three orders of magnitude more than the HBM bandwidth of a modern GPU. For memory-bandwidth-bound operations (like LLM inference, where decode speed is limited by how fast you can read the model weights), this is a fundamental advantage. The trade-off: 44 GB of on-chip memory can't hold the largest models — a 70B-parameter model in FP16 alone needs ~140 GB — so the largest models require model-parallel strategies across multiple CS-3 systems.
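The bandwidth argument can be made concrete with a back-of-envelope bound: during single-stream decoding, every generated token requires reading all model weights once, so memory bandwidth divided by model size in bytes caps the token rate. The sketch below is illustrative arithmetic under stated assumptions (batch size 1, FP16 weights, KV-cache traffic ignored), not a vendor benchmark; the ~3 TB/s HBM figure is an assumed GPU-class number.

```python
def max_tokens_per_sec(params_billion: float, bytes_per_param: float,
                       mem_bandwidth_tb_s: float) -> float:
    """Upper bound on single-stream decode speed for a
    memory-bandwidth-bound workload: each token must stream
    every weight from memory once."""
    bytes_per_token = params_billion * 1e9 * bytes_per_param
    return mem_bandwidth_tb_s * 1e12 / bytes_per_token

# A 70B-parameter model in FP16 (2 bytes/param):
# GPU-class HBM at an assumed ~3 TB/s vs. wafer-scale SRAM
# at ~21,000 TB/s (21 PB/s).
hbm = max_tokens_per_sec(70, 2, 3)        # ~21 tok/s ceiling
sram = max_tokens_per_sec(70, 2, 21_000)  # ~150,000 tok/s ceiling
print(f"HBM-bound ceiling:  {hbm:,.0f} tok/s")
print(f"SRAM-bound ceiling: {sram:,.0f} tok/s")
```

These are ceilings, not achieved throughput — real systems fall well short due to compute, interconnect, and scheduling overheads — but they show why the memory hierarchy, not FLOPs, dominates this workload.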
Cerebras has demonstrated impressive inference speeds — serving Llama-70B at over 2,000 tokens/second, competitive with or exceeding Groq's LPU. The approach is different (a single wafer-scale chip vs. racks of deterministic ASICs) but the result is similar: purpose-built hardware that dramatically outperforms GPUs on the specific workload of LLM token generation.