DeepSeek made its V4 Pro pricing permanent at a 75% discount as of May 22, 2026 — cache hit input at $0.003625 per million tokens, cache miss input at $0.435/Mtok, output at $0.87/Mtok. The model is DeepSeek's flagship with 1M context, reasoning, coding and math performance. The article doesn't publish the pre-cut numbers, so independent verification of the "75%" framing requires checking DeepSeek's own pricing history — but the absolute prices themselves are the builder-relevant data point. For comparison context: a typical agent loop running 50K input + 5K output tokens per call now costs roughly $0.026 per call on V4 Pro (cache-miss) versus essentially nothing if the prefix hits cache. That's the pricing tier where production agents become unit-economics-positive without aggressive cost engineering.

The rationale DeepSeek cites is the architectural news under the price cut: "constraints in high-end compute capacity" drove V4 Pro's original 12× Flash-variant pricing, and the cut aligns with anticipated large-scale deployment of Huawei's Ascend 950 AI chips in H2 2026. This is the inference-side counterpart to the Chinese-domestic accelerator story builders have been watching: as Ascend capacity comes online, Chinese frontier-model serving costs fall to where they can compete on price even without TSMC-fabbed Nvidia silicon. The geopolitical-infra layer (Ascend deployment) shapes the model-pricing layer (V4 Pro cut), shapes the builder-economics layer (agents become cheaper to run). The whole stack moves when one tier moves.

Ecosystem read: the pricing pressure story is now bilateral. Last week, Microsoft's Experiences + Devices division dropped Claude Code licenses internally for cost reasons — that's the demand-side response. This week DeepSeek prices a frontier-class 1M-context model at $0.87/Mtok output — that's the supply-side response. The cost gradient is dominating model selection conversations inside large eng orgs in a way it wasn't six months ago. Builders evaluating "which model do we standardize on" should re-run the per-developer-monthly numbers with this DeepSeek line in the spreadsheet, especially for code-completion and high-volume agentic workloads where the cache-hit pricing essentially zeroes out the prefix-heavy portion of cost.

Monday morning: if your stack already has a DeepSeek API path (most enterprise model gateways do), the V4 Pro cost line just became the cheapest 1M-context reasoning option in the market by a meaningful margin. Re-evaluate workloads where you've been routing to GPT-5 or Claude 4.x purely because they were the only 1M-context options that hit your benchmark bar. Honest caveats: weights status not addressed in the release (DeepSeek has historically open-weighted, builders should verify V4 Pro's specific license), parameter count and architecture not disclosed, benchmarks vs Western frontier models not provided in this article. If you're shipping commercial product on top of DeepSeek inference, the data-residency and export-control questions belong on your legal team's desk separately from the pricing math.