Mistral Medium 3.5: 128B dense open weights, 77.6% on SWE-Bench Verified

This week Mistral pushed Medium 3.5 to Hugging Face as open weights under its own license: 128B dense parameters, 256k context, 77.6% on SWE-Bench Verified, and 91.4 on τ³-Telecom. For builders running self-hosted agents, the combination is what matters: a coding-capable backbone you can pull, fine-tune on your own codebase, and serve on your own GPUs. The closed frontier is still ahead, but on long-tail real-issue resolution the gap has compressed enough that hosting choices are worth re-weighing.

Two details are worth flagging. First, the model is dense, not mixture-of-experts: Medium 3.5 beat Qwen3.5 397B-A17B (MoE, ~17B active) on SWE-Bench despite having fewer total weights. Mistral's "merged model" language means it collapsed the earlier split between Mistral instruct and the Devstral coding specialist into a single set of weights covering instruct, reasoning, and coding. For builders who disliked juggling two endpoints, ops get simpler. Second, 77.6% is a single-pass score on the 500-task Verified subset, while Sonnet 4.5's 82% was achieved with parallel test-time compute, so the real comparison is closer than the headline suggests. Mistral has not disclosed its contamination story, or whether the Vibe harness post-processes outputs; that is the next question to ask before porting Medium 3.5 into a production loop.
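To make the headline gap concrete, this short Python sketch converts both quoted percentages into solved-task counts. It assumes both scores are on the same 500-task Verified subset; it says nothing about which tasks each model solves, since that would need per-task results.

```python
# Convert SWE-Bench Verified pass rates into solved-task counts.
# Assumption: both scores are measured on the same 500-task Verified subset.

VERIFIED_TASKS = 500


def solved(pct: float, n_tasks: int = VERIFIED_TASKS) -> int:
    """Number of tasks that a pass-rate percentage corresponds to."""
    return round(pct / 100 * n_tasks)


medium_35 = solved(77.6)      # Medium 3.5, single pass
sonnet_45_ttc = solved(82.0)  # Sonnet 4.5, with parallel test-time compute
print(medium_35, sonnet_45_ttc, sonnet_45_ttc - medium_35)
```

This prints `388 410 22`: the headline gap is 22 tasks out of 500, and part of it comes from test-time compute that the Medium 3.5 number did not use.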

The Vibe surface is the other half of this release. Vibe was already Mistral's CLI coding agent, in the same category as Claude Code, Cursor's Composer, and Aider. Remote Agents turn it into a proper Cursor/Devin competitor: long-running tasks run in sandboxed cloud execution while you work elsewhere, and sessions can be launched from the CLI or from Le Chat. The ecosystem reading is that open-weights labs are no longer shipping only the model and leaving the agent surface to wrappers. Mistral is closing the loop itself, the way Anthropic shipped Claude Code alongside Sonnet 4.5. For builders, this means the open stack is now credible end to end: weights worth hosting, an agent surface worth using directly, or pieces worth peeling off and integrating separately. The closed labs' moat shrinks to test-time compute, deeper tool integration, and whatever the CAISI pre-release eval pipeline provides.

The practical move: if you run Devstral 2 or a non-Mistral coding specialist behind your agent, Medium 3.5 is worth a benchmark swap on your own eval set this week. A single weights set simplifies deployment, the 256k context handles real-codebase windows, and Vibe Remote Agents are usable out of the box if you don't want to build sandboxing yourself. If you are already on a closed-frontier API and watching per-token economics, a 128B dense model is small enough that the self-hosting math fits on a single 8xH100 node. That calculation is what the open-weights agent pitch was missing.
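The self-hosting claim comes down to simple arithmetic. This Python sketch checks only the weight footprint and rests on stated assumptions: a nominal 128e9 parameters, 80 GB of HBM per H100, decimal gigabytes, and weights sharded evenly across all eight GPUs. BF16 and FP8 are illustrative precisions only; the release coverage does not say which precision Mistral recommends for serving.

```python
# Back-of-envelope check: do 128B dense weights fit on one 8xH100 node?
# Assumptions (not from Mistral's release): nominal 128e9 parameters,
# 80 GB HBM per GPU, 1 GB = 1e9 bytes. KV cache, activations and framework
# overhead are NOT modeled; they must fit in the headroom that is left.

PARAMS = 128e9
GPU_MEM_GB = 80
NUM_GPUS = 8

BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}


def weight_footprint_gb(params: float, dtype: str) -> float:
    """Memory needed just to hold the weights, in decimal GB."""
    return params * BYTES_PER_PARAM[dtype] / 1e9


def node_headroom_gb(params: float, dtype: str,
                     gpu_mem_gb: float = GPU_MEM_GB,
                     num_gpus: int = NUM_GPUS) -> float:
    """Aggregate HBM left on the node after loading the weights."""
    return gpu_mem_gb * num_gpus - weight_footprint_gb(params, dtype)


total = GPU_MEM_GB * NUM_GPUS
for dtype in ("bf16", "fp8"):
    w = weight_footprint_gb(PARAMS, dtype)
    print(f"{dtype}: weights {w:.0f} GB of {total} GB ({w / total:.0%}), "
          f"headroom {node_headroom_gb(PARAMS, dtype):.0f} GB")
```

At BF16 the weights take 256 of 640 GB (32 GB per GPU with 8-way sharding), leaving 384 GB; FP8 halves the weight footprint. Whether that headroom covers full 256k-token contexts depends on the KV-cache size, which needs the layer count, KV-head count and head dimension; the sketch deliberately does not guess those.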
