
Cerebras

Cerebras Systems, WSE
A chip company that builds wafer-scale AI processors — chips the size of an entire silicon wafer, roughly 57x larger than the largest GPU. The Cerebras WSE-3 (Wafer Scale Engine) contains 4 trillion transistors and 900,000 cores. Their CS-3 systems are designed for both training and inference, offering an alternative to clusters of thousands of individual GPUs.

Why it matters

Cerebras represents the most radical rethinking of AI hardware. Instead of connecting thousands of small chips with limited bandwidth, they put everything on one massive chip with enormous on-chip memory bandwidth. The potential advantage is eliminating the communication bottleneck that limits multi-GPU training. Whether wafer-scale computing can compete with NVIDIA's massive ecosystem is the billion-dollar question.

Deep Dive

The WSE-3 has 44 GB of on-chip SRAM — not HBM or DRAM, but SRAM directly on the compute die. This provides ~21 PB/s of memory bandwidth, thousands of times the ~3 TB/s HBM bandwidth of a flagship GPU. For memory-bandwidth-bound operations (like LLM inference, which is limited by how fast you can read model weights), this is a fundamental advantage. The trade-off: 44 GB of on-chip memory can't hold the largest models, requiring model-parallel strategies across multiple CS-3 systems.
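The bandwidth argument can be made concrete with back-of-envelope arithmetic. A sketch, using illustrative round numbers (a 70B-parameter model in 16-bit precision, ~3.35 TB/s for a flagship GPU's HBM, the WSE-3's ~21 PB/s aggregate SRAM bandwidth) rather than measured figures:

```python
# Back-of-envelope ceiling on single-stream decode speed.
# Generating one token requires reading every model weight once, so
# tokens/sec <= memory bandwidth / total weight bytes.
# (Ignores KV-cache traffic, activation I/O, and batching.)

def max_tokens_per_sec(params_billion: float, bytes_per_param: float,
                       bandwidth_tb_s: float) -> float:
    """Upper bound on decode throughput for a bandwidth-bound workload."""
    weight_bytes = params_billion * 1e9 * bytes_per_param
    return bandwidth_tb_s * 1e12 / weight_bytes

# 70B model, 16-bit weights, one GPU with ~3.35 TB/s HBM:
gpu_ceiling = max_tokens_per_sec(70, 2, 3.35)      # ~24 tokens/sec

# Same model against the WSE-3's ~21 PB/s (21,000 TB/s) on-chip SRAM:
# (in practice the 140 GB of weights exceed 44 GB of SRAM, so a real
# deployment splits the model across systems, as noted above)
wse_ceiling = max_tokens_per_sec(70, 2, 21_000)    # ~150,000 tokens/sec

print(f"GPU ceiling: {gpu_ceiling:.0f} tok/s, WSE ceiling: {wse_ceiling:.0f} tok/s")
```

The point of the sketch is the ratio, not the absolute numbers: when every token must stream all weights through the memory system, throughput scales directly with bandwidth.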

Inference Speed

Cerebras has demonstrated impressive inference speeds — serving Llama-70B at over 2,000 tokens/second, competitive with or exceeding Groq's LPU. The approach differs (a single wafer-scale chip vs. racks of deterministic ASICs) but the result is similar: purpose-built hardware that dramatically outperforms GPUs for the specific workload of LLM token generation.
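A quick sanity check shows why this speed is out of reach for a single GPU. Assuming 16-bit weights and that each token reads all weights once (the same simplification as above), the bandwidth implied by the quoted rate is:

```python
# What memory bandwidth does 2,000 tokens/sec on a 70B model imply,
# if every weight is read once per generated token?

def required_bandwidth_tb_s(params_billion: float, bytes_per_param: float,
                            tokens_per_sec: float) -> float:
    """Memory bandwidth (TB/s) needed to stream all weights once per token."""
    return params_billion * 1e9 * bytes_per_param * tokens_per_sec / 1e12

needed = required_bandwidth_tb_s(70, 2, 2000)
print(f"{needed:.0f} TB/s")  # 280 TB/s
```

280 TB/s is nearly two orders of magnitude beyond a single GPU's HBM (~3 TB/s), but a small fraction of the WSE-3's ~21 PB/s on-chip SRAM bandwidth — which is why single-stream speeds like this favor wafer-scale (or heavily SRAM-based) designs.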
