Company

AI21 Labs

Jamba
An Israeli AI company best known for Jamba, the first production-grade hybrid architecture combining Transformer attention layers with Mamba SSM layers. AI21 was founded by AI researchers, including Yoav Shoham, and has been building language models since 2017, predating ChatGPT. Its models are available through an API and via cloud providers.

Why It Matters

AI21 Labs matters because Jamba proved that hybrid Transformer-SSM architectures work in practice, not just in research papers. By interleaving attention and Mamba layers, Jamba achieves a 256K-token context window at quality comparable to pure Transformer models, with much lower memory usage. This hybrid approach may be the future of LLM architecture.

Deep Dive

Jamba's architecture interleaves Transformer blocks (with standard attention) and Mamba blocks (with selective state spaces) at a ratio of 1:7, one attention layer for every seven Mamba layers. This captures the best of both: Mamba layers handle the bulk of sequence processing efficiently (linear in sequence length), while the attention layers provide the global token interaction that pure SSMs sometimes lack. The result is a model that supports a 256K-token context window, fits long contexts (AI21 reports up to roughly 140K tokens) on a single 80GB GPU, and matches Transformer-only models on quality.
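
To make the interleaving concrete, here is a minimal sketch of how such a layer schedule could be generated. This is not AI21's code; the block size and the position of the attention layer within each block are assumptions built around the published 1:7 ratio.

```python
ATTENTION_PERIOD = 8  # one 8-layer block: 1 attention layer + 7 Mamba layers (1:7)

def layer_schedule(num_layers: int) -> list[str]:
    """Return the layer type ('attention' or 'mamba') at each depth.

    Placing the attention layer mid-block is an assumption; the key point
    is that attention layers recur periodically through the stack.
    """
    return [
        "attention" if i % ATTENTION_PERIOD == ATTENTION_PERIOD // 2 else "mamba"
        for i in range(num_layers)
    ]

if __name__ == "__main__":
    schedule = layer_schedule(32)  # 32 layers -> 4 attention + 28 Mamba
    print(schedule.count("attention"), schedule.count("mamba"))  # -> 4 28
```

Because attention layers recur evenly through the stack, no region of the network is far from a layer with global token interaction, while the cost of most layers stays linear in sequence length.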

The MoE Component

Jamba also uses Mixture of Experts (MoE): 52B total parameters but only ~12B active per token, with 16 experts per MoE layer and the top 2 selected for each token. This combination of SSM + attention + MoE made Jamba one of the most complex hybrid architectures in production at release, and it demonstrates that these techniques compose well. The far smaller KV cache (AI21 reports roughly 8x smaller than comparable Transformers at 256K context, translating into about 3x higher long-context throughput) is practically significant for serving long-context workloads.
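
The sparse-activation idea behind those numbers is easy to see in a toy top-2 router. The sketch below uses hypothetical names and NumPy for brevity (real MoE layers use learned gating inside FFN blocks, trained with load-balancing losses); it shows why only a fraction of the total parameters is touched per token.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_EXPERTS, TOP_K, D = 16, 2, 64  # 16 experts, top-2 routing, toy hidden size

# Randomly initialized stand-ins for a learned router and expert FFNs.
router_w = rng.standard_normal((D, NUM_EXPERTS)) / np.sqrt(D)
experts = [rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(NUM_EXPERTS)]

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route each token through its top-2 experts, weighted by softmax scores."""
    logits = x @ router_w                            # (tokens, experts)
    top = np.argsort(logits, axis=-1)[:, -TOP_K:]    # indices of the 2 best experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        scores = np.exp(logits[t, top[t]])
        scores /= scores.sum()                       # softmax over selected experts
        for w, e in zip(scores, top[t]):
            out[t] += w * (x[t] @ experts[e])        # only 2 of 16 experts run
    return out

print(moe_forward(rng.standard_normal((4, D))).shape)  # (4, 64)
```

Each token multiplies through only 2 of the 16 expert matrices, so the compute and active parameters per token are a small fraction of the total parameter count, which is exactly the 52B-total / ~12B-active pattern at Jamba's scale.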
