Cohere Command A+: 218B sparse MoE (25B active), 2x H100 W4A4, Apache 2.0 open

Cohere has released Command A+ as an Apache 2.0 open-weight model: a decoder-only sparse Mixture-of-Experts transformer with 218 billion total parameters, 25 billion of them active per token. The topology is 128 experts with 8 active per token, plus 1 shared expert. Context is 128K input with 64K maximum generation. For builders, the deployment story is the headline: with W4A4 quantization (NVFP4 applied only to the MoE experts, attention paths kept at full precision), the model runs on as few as 2 H100 GPUs. Alternative configurations: 1x B200, 4x H100 at FP8, or 8x H100 at BF16. The weights are available on HuggingFace and supported by vLLM 0.21.0+ and Transformers. Quantization-Aware Distillation in post-training recovers quality at W4A4. Cohere positions Command A+ as a unified multimodal Command A (text, image, and tool inputs; text, reasoning, and tool-use outputs).
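The GPU counts quoted above can be sanity-checked with weights-only arithmetic. The sketch below is a rough estimate, not Cohere's sizing method. It assumes every parameter is stored at the stated width, which understates the real W4A4 footprint because attention stays at higher precision and NVFP4 carries scale metadata, and it ignores KV cache and activations. The per-GPU memory figures (80 GB for H100, 192 GB for B200) are assumptions that do not come from the article.

```python
# Weights-only memory estimate for a 218B-parameter model.
# Simplification: all parameters stored at one width; KV cache, activations,
# and quantization scale overhead are ignored.

TOTAL_PARAMS = 218e9   # from the article
ACTIVE_PARAMS = 25e9   # from the article

# Bytes per parameter at each storage width.
BYTES_PER_PARAM = {"4-bit": 0.5, "FP8": 1.0, "BF16": 2.0}

# Assumed HBM capacity per GPU in GB (not stated in the article).
GPU_MEMORY_GB = {"H100": 80, "B200": 192}


def weight_gb(params: float, bytes_per_param: float) -> float:
    """Return the weight footprint in GB (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


def fits(params: float, bytes_per_param: float, gpu: str, count: int) -> bool:
    """True if the weights alone fit in the combined memory of `count` GPUs."""
    return weight_gb(params, bytes_per_param) <= GPU_MEMORY_GB[gpu] * count


if __name__ == "__main__":
    for name, width in BYTES_PER_PARAM.items():
        print(f"{name}: {weight_gb(TOTAL_PARAMS, width):.0f} GB of weights")
    print(f"active fraction per token: {ACTIVE_PARAMS / TOTAL_PARAMS:.1%}")
```

Under these assumptions the quoted configurations are at least plausible on weights alone. The headroom left for KV cache at long context is a separate question that this estimate does not answer.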

The agentic benchmark deltas against Cohere's previous Command A Reasoning are the substantive signal. τ²-Bench Telecom rose from 37% to 85%. Terminal-Bench Hard, an agentic coding benchmark, went from 3% to 25%. Agentic QA accuracy improved by 20 percentage points. The Terminal-Bench Hard delta is the most telling: that benchmark tests multi-step command-line agentic problem-solving, and a jump from 3% to 25% on the Hard tier is a step change in agent reliability for systems work. Cohere is targeting the same agentic capability claim as Anthropic's Code With Claude Capability Curve framing (SWE-bench 62%→87% in twelve months) and Google's Gemini 3.5 Flash agent-first framing, but with open weights rather than a closed API. The W4A4 deployment story is what differentiates it: running a 218B-class frontier MoE on 2 H100s is the accessible-to-mid-market scenario that closed-weight frontier models from Anthropic, Google, and OpenAI cannot match on TCO.

Ecosystem context. NVFP4 (the 4-bit format we covered in the May 18 NVIDIA pre-training piece) is the quantization standard here: Cohere uses it on the MoE expert paths while keeping attention at full precision. This is the practical shape of NVFP4 adoption: not full-model 4-bit, but selective application to the high-parameter-count layers that tolerate low precision. The MoE design (218B total, 25B active) follows the DeepSeek-V3 and Llama 4 Behemoth lineage: sparse activation lets the model carry frontier-scale knowledge without frontier-scale inference cost. Apache 2.0 is the strategic differentiator: Cohere is positioning itself as the open-weights frontier-class option against the closed-weight verticals of Anthropic and Google (Code With Claude, Antigravity) and Mistral's industrial vertical (Emmi acquisition). Five labs, five different bets visible this week. Cohere's bet is an open-weights agentic frontier on accessible hardware.

Monday: if you run agentic workloads on closed-API frontier models (Claude Opus, GPT-4-class, Gemini Pro), benchmark Command A+ on your own evals. Apache 2.0 means you can fine-tune, redistribute, and modify without commercial-use restrictions. Specific tests: (1) Run your terminal-style agentic tasks against Command A+ W4A4 on 2 H100s, and compare wall-clock time and quality against your current closed-API spend. The Terminal-Bench Hard 3%→25% claim is concrete enough to verify on your own task distribution. (2) Evaluate the 128K input / 64K generation budget against your agentic context needs: most long-horizon agents are bounded by output generation, not input context, so the 64K max generation is the relevant constraint. (3) If you have held off on agentic deployment because of closed-API cost or data-egress concerns, the W4A4 / 2-H100 deployment story may close that gap. On the broader trend: open-weights frontier-class agentic models are now a real category, not a future hope. Cohere has just made it concrete. Watch for DeepSeek, Llama, and Qwen to follow with their own NVFP4-quantized, agentic-tuned releases in the next quarter.
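For test (1), one way to put self-hosted and closed-API runs on the same footing is cost per successful task. This is a minimal sketch under stated assumptions: the `EvalRun` record, the `self_hosted_cost` helper, and every number in the usage section are hypothetical placeholders, not measurements or published prices. Substitute your own eval results, GPU rental rate, and API invoice.

```python
from dataclasses import dataclass


@dataclass
class EvalRun:
    """Outcome of running one task suite against one backend (hypothetical record)."""
    label: str
    tasks: int
    successes: int
    wall_clock_s: float
    cost_usd: float

    @property
    def success_rate(self) -> float:
        return self.successes / self.tasks if self.tasks else 0.0

    @property
    def cost_per_success(self) -> float:
        # Infinite when nothing succeeded, so such a run never looks cheap.
        return self.cost_usd / self.successes if self.successes else float("inf")


def self_hosted_cost(gpus: int, usd_per_gpu_hour: float, wall_clock_s: float) -> float:
    """GPU rental cost for the duration of the run; excludes ops and idle time."""
    return gpus * usd_per_gpu_hour * wall_clock_s / 3600.0


if __name__ == "__main__":
    # All values below are placeholders for illustration only.
    seconds = 5400.0
    local = EvalRun("Command A+ W4A4, 2x H100", tasks=100, successes=0,
                    wall_clock_s=seconds,
                    cost_usd=self_hosted_cost(2, 3.0, seconds))
    api = EvalRun("closed API", tasks=100, successes=0,
                  wall_clock_s=0.0, cost_usd=0.0)
    for run in (local, api):
        print(run.label, run.success_rate, run.cost_per_success)
```

Cost per success is a deliberate choice over cost per task: a cheaper backend that fails more often can still lose once retries and human review are counted.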
