AI21 Labs: Definition & Meaning — AI Wiki

An Israeli AI company known for Jamba, the first production-grade hybrid architecture that combines Transformer attention layers with Mamba SSM layers. AI21 was founded by AI researchers (including Yoav Shoham) and has been building language models since 2017, predating ChatGPT. Their models are available via API and through cloud providers.

Why it matters

AI21 Labs matters because Jamba proved that hybrid Transformer-SSM architectures work in practice, not just in research papers. By interleaving attention and Mamba layers, Jamba achieves a 256K context window with lower memory usage than pure Transformer models of similar quality. This hybrid approach may be the future of LLM architecture.

Deep Dive

Jamba's architecture interleaves Transformer blocks (with standard attention) and Mamba blocks (with selective state spaces) in a ratio of roughly 1:7 — one attention layer for every seven Mamba layers. This captures the best of both: Mamba layers handle the bulk of sequence processing efficiently (linear in sequence length), while attention layers provide the global token interaction that pure SSMs sometimes lack. The result: a model that fits in a single 80GB GPU at 256K context while matching Transformer-only models on quality.

The MoE Component

Jamba also uses Mixture of Experts (MoE), with 52B total parameters but only ~12B active per token. This combination of SSM + Attention + MoE is the most complex hybrid architecture in production and demonstrates that these techniques compose well. The 3x reduction in KV cache memory compared to a pure Transformer of equivalent quality is practically significant for serving long-context workloads.

AI21 Labs

Why it matters

Deep Dive

The MoE Component

Related Concepts