Company

Cerebras

Cerebras Systems, WSE
A chip company building wafer-scale AI processors: single chips the size of an entire silicon wafer, roughly 50 times larger than the largest GPU die. The Cerebras WSE-3 (Wafer Scale Engine) contains 4 trillion transistors and 900,000 cores. Their CS-3 system is designed for both training and inference, offering an alternative to clusters of thousands of individual GPUs.

Why It Matters

Cerebras represents the most radical rethinking of AI hardware: instead of connecting thousands of small chips over bandwidth-limited interconnects, put everything on one enormous chip with massive on-chip memory bandwidth. The potential payoff is eliminating the communication bottlenecks that constrain multi-GPU training. Whether wafer-scale computing can compete with NVIDIA's vast ecosystem is a billion-dollar question.

Deep Dive

The WSE-3 has 44 GB of on-chip SRAM — not HBM or DRAM, but SRAM directly on the compute die. This provides ~21 PB/s of memory bandwidth, orders of magnitude more than GPU HBM bandwidth. For memory-bandwidth-bound operations (like LLM inference, which is limited by how fast you can read model weights), this is a fundamental advantage. The trade-off: 44 GB of on-chip memory can't hold the largest models, requiring model-parallel strategies across multiple CS-3 systems.
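The bandwidth argument above can be sketched with simple roofline arithmetic: in single-stream LLM decoding, every generated token requires streaming all model weights through the compute units once, so throughput is roughly bandwidth divided by weight bytes. The figures below (an H100-class ~3.35 TB/s HBM number and the ~21 PB/s SRAM number from the text) are illustrative assumptions, and real systems fall well below these ceilings.

```python
# Roofline upper bound for memory-bandwidth-bound LLM decoding:
# tokens/sec <= memory bandwidth / bytes of model weights streamed per token.

def max_decode_tokens_per_sec(params_billions: float,
                              bytes_per_param: float,
                              bandwidth_gb_per_sec: float) -> float:
    """Bandwidth-limited upper bound on single-stream decode throughput."""
    weight_gb = params_billions * bytes_per_param  # total weight bytes, in GB
    return bandwidth_gb_per_sec / weight_gb

# 70B-parameter model in fp16 (2 bytes/param) = 140 GB of weights.
# Note: 140 GB exceeds the 44 GB of on-chip SRAM, so in practice the
# model would be split across multiple CS-3 systems, as the text notes.
gpu_bound = max_decode_tokens_per_sec(70, 2, 3_350)        # ~3.35 TB/s HBM
wse_bound = max_decode_tokens_per_sec(70, 2, 21_000_000)   # ~21 PB/s SRAM

print(f"HBM-class bound: ~{gpu_bound:.0f} tokens/s")
print(f"SRAM-class bound: ~{wse_bound:.0f} tokens/s")
```

The ratio between the two bounds is just the bandwidth ratio, which is why the text calls on-chip SRAM an "orders of magnitude" advantage for this specific workload.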

Inference Speed

Cerebras has demonstrated impressive inference speeds — serving Llama-70B at over 2,000 tokens/second, competitive with or exceeding Groq's LPU. The approach is different (wafer-scale chip vs. deterministic ASICs) but the result is similar: purpose-built hardware that dramatically outperforms GPUs for the specific workload of LLM token generation.
