DeepSeek released V4-Pro and V4-Flash on April 24, both shipped as open weights under an MIT license and immediately available via the DeepSeek API. The headline numbers are pointed enough that they belong in any builder's evaluation pile this week. V4-Pro is 1.6 trillion total parameters with 49 billion activated per token, an MoE configuration with roughly 3% activation density that makes it cheap to serve relative to its capability ceiling. V4-Flash is the smaller variant at 284B total / 13B active. Both models support a 1 million token context window with a 384K-token maximum output, and both are listed under the deepseek-ai organization on Hugging Face. V4-Pro scores 80.6% on SWE-bench Verified, within 0.2 points of Claude Opus 4.6, and the API price is around $1.74 input / $3.48 output per million tokens, which The Rundown's coverage estimates as roughly 7x cheaper per output token than the closed frontier alternatives.
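To make the arithmetic concrete, here is a quick back-of-envelope sketch. The parameter counts and prices are the announced figures above; the implied closed-frontier price is just The Rundown's 7x multiple applied in reverse, not a quoted price.

```python
# Back-of-envelope numbers from the release notes above.
V4_PRO_TOTAL_PARAMS = 1.6e12    # total parameters
V4_PRO_ACTIVE_PARAMS = 49e9     # activated per token
V4_FLASH_TOTAL_PARAMS = 284e9
V4_FLASH_ACTIVE_PARAMS = 13e9

pro_density = V4_PRO_ACTIVE_PARAMS / V4_PRO_TOTAL_PARAMS
flash_density = V4_FLASH_ACTIVE_PARAMS / V4_FLASH_TOTAL_PARAMS
print(f"V4-Pro activation density:   {pro_density:.1%}")    # ~3.1%
print(f"V4-Flash activation density: {flash_density:.1%}")  # ~4.6%

# The 7x output-price multiple is The Rundown's estimate, taken as given here.
OUTPUT_PRICE_PER_MTOK = 3.48  # USD, V4-Pro output
implied_frontier_price = OUTPUT_PRICE_PER_MTOK * 7
print(f"Implied closed-frontier output price: ~${implied_frontier_price:.2f}/Mtok")
```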
The architecture detail that should attract more attention than the benchmark numbers is the new hybrid attention mechanism. V4 combines what DeepSeek calls Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) to handle the 1M context efficiently. The reported impact: at 1M tokens, V4-Pro uses only 27% of the single-token inference FLOPs and 10% of the KV cache compared to DeepSeek V3.2 on the same context length. That is a much bigger structural improvement than another point of MMLU. KV cache size is the binding constraint on serving long-context inference at any reasonable concurrency, and a 10x reduction is the difference between "we can offer 1M context as a marketing bullet" and "we can offer 1M context as a real production option." Other labs will copy this fast.
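To see why the 10x KV reduction is the number that matters, here is a rough sizing sketch for vanilla attention. The layer count, head count, head dim, and dtype below are hypothetical placeholders, not V4's published architecture; the point is the scaling, and the 0.10 factor is the reported V4-vs-V3.2 ratio.

```python
def kv_cache_bytes(context_len: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """Per-sequence KV cache for vanilla attention: one K and one V tensor
    per layer, one entry per token."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * context_len

# Hypothetical GQA-style config, NOT V4's actual architecture.
baseline = kv_cache_bytes(context_len=1_000_000, n_layers=60,
                          n_kv_heads=8, head_dim=128)
print(f"Baseline KV @ 1M tokens: {baseline / 2**30:.0f} GiB per sequence")  # ~229 GiB

# Apply the reported 10% ratio from the V4 release notes.
v4_style = baseline * 0.10
print(f"With a 10x reduction:    {v4_style / 2**30:.1f} GiB per sequence")  # ~23 GiB

# At any real batch size, the baseline blows past a node's HBM budget;
# the reduced cache is what makes 1M-context serving at concurrency plausible.
```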
For builders, the practical shift is in the price-capability frontier on coding workloads. SWE-bench Verified at 80.6% is essentially within noise of Claude Opus 4.6's 80.8%, and at one-seventh the output cost that changes the calculus for any high-volume agent product that does not need the absolute top score. Coding agents that run dozens of inference steps per task (Cursor-style refactor agents, autonomous PR-review systems, automated migration tools) were budget-constrained by per-token cost on closed frontier models. With V4-Pro the same workload runs at a price point closer to commodity compute. The corollary is that closed-frontier providers cannot keep charging the same multiples: the floor on production-grade agent inference just moved.
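A sketch of the per-task cost math for a multi-step agent. The step count and token volumes are illustrative assumptions, not measurements; only the V4-Pro prices come from the announcement, and the closed-frontier prices are derived from the ~7x output multiple quoted above.

```python
# Illustrative agent workload: the traffic shape is an assumption.
STEPS_PER_TASK = 30           # tool-use / reasoning round trips
INPUT_TOK_PER_STEP = 8_000    # context re-fed each step
OUTPUT_TOK_PER_STEP = 1_200   # generated per step

def cost_per_task(in_price: float, out_price: float) -> float:
    """Prices in USD per million tokens."""
    tin = STEPS_PER_TASK * INPUT_TOK_PER_STEP
    tout = STEPS_PER_TASK * OUTPUT_TOK_PER_STEP
    return (tin * in_price + tout * out_price) / 1e6

v4_pro = cost_per_task(1.74, 3.48)
# Hypothetical closed-frontier prices, scaled off the ~7x multiple.
frontier = cost_per_task(1.74 * 7, 3.48 * 7)

print(f"V4-Pro:   ${v4_pro:.3f} per task")    # ~$0.54
print(f"Frontier: ${frontier:.3f} per task")  # ~$3.80
print(f"At 100k tasks/day: ${(frontier - v4_pro) * 100_000:,.0f}/day saved")
```

Under these assumptions the gap is roughly $3.25 per task; at serious agent volume that is the difference between a viable unit economics story and a subsidized one.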
The strategic context is also worth naming. DeepSeek shipped Huawei Ascend support alongside V4, which means the entire training-and-serving stack, not just the trained model, runs on Chinese-domestic silicon. That makes V4 the strongest single argument so far that US export controls have shaped, not stopped, the Chinese AI buildout: the gap between frontier closed models from Anthropic and OpenAI and the open-weights alternatives from DeepSeek is now small enough that, workload by workload, the choice depends on price and licensing rather than capability ceilings. The honest caveats: DeepSeek's own evaluation methodology should be checked against independent runs, Artificial Analysis's Intelligence Index puts V4-Pro in the fourth tier rather than the top, and benchmark scores at this point in the cycle are increasingly contaminated by training-data overlap with the eval sets. Run your own internal evals before betting product roadmaps on the headline numbers. But the open-weights frontier just took another step toward the closed-weights frontier, and that has real implications for which models the builder ecosystem standardizes on next.
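In the spirit of "run your own evals," here is a minimal harness sketch against DeepSeek's OpenAI-compatible endpoint. The model id "deepseek-v4-pro" is a placeholder (check the API docs for the real id), and the crude string check stands in for whatever grader your actual task set needs.

```python
# Minimal internal-eval sketch. Assumes DeepSeek's OpenAI-compatible API;
# the model id below is a placeholder, not a confirmed identifier.
from openai import OpenAI

client = OpenAI(base_url="https://api.deepseek.com", api_key="YOUR_KEY")

# Use your own held-out tasks, never anything that might be in training data.
TASKS = [
    {"prompt": "Write a Python function that merges overlapping intervals.",
     # Crude string check as a stand-in for a real grader/test harness.
     "check": lambda out: "def " in out and "sort" in out.lower()},
]

def run_eval(model: str) -> float:
    passed = 0
    for task in TASKS:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": task["prompt"]}],
            temperature=0,  # deterministic-ish, for run-to-run comparability
        )
        out = resp.choices[0].message.content or ""
        passed += task["check"](out)
    return passed / len(TASKS)

print(f"pass rate: {run_eval('deepseek-v4-pro'):.0%}")
```

Even a few dozen tasks that mirror your real product traffic will tell you more than the headline SWE-bench number, precisely because of the contamination risk noted above.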
