Google, Microsoft, xAI join Anthropic and OpenAI in CAISI pre-release evals

The Commerce Department's Center for AI Standards and Innovation (CAISI) said Tuesday that Google, Microsoft, and xAI have signed on to provide pre-release access to their frontier models for security and capability evaluation. They join OpenAI and Anthropic, which renegotiated their existing CAISI partnerships to align with the Trump administration's AI Action Plan. Five closed labs now feed unreleased state-of-the-art models into a single federal eval pipeline before public deployment: the soft-regulation counterpart to what the EU tried to encode in legislation, arriving here as a voluntary pact under a different administration.

The substantive details are thinner than the announcement. CAISI says it has completed more than 40 evaluations, including on unreleased frontier models, but what gets evaluated, who sees the results, and whether anything actually gates deployment all remain undisclosed. The eval scope is described in capability and security terms (the standard CBRN, cyber, and autonomous-action axes), but harness specifics, contamination protocols, and red-team access models are not public. The "renegotiated" wording on the existing OpenAI and Anthropic partnerships is the point to watch: terms shifted under the new administration, and no lab has disclosed what changed. For builders, this means the eval pipeline is real, but the criteria it enforces are essentially black-boxed.

The open-weights labs (Mistral, Meta, DeepSeek, Qwen, Zyphra) sit entirely outside this loop. They publish weights, so there is no "pre-release" gate to negotiate. The result is a regulatory bifurcation that is starting to matter: closed-frontier models inside the government eval pipeline, open-weights models outside it. Mistral shipping Medium 3.5 this same week (128B dense, 77.6% on SWE-Bench Verified, weights on Hugging Face) is the live demonstration: a coding-capable backbone deployed without any federal pre-release review and hostable on a builder's own infrastructure. For agent stacks pointed at regulated customers, this differential will narrow procurement choices: gov-adjacent buyers will start asking whether a model has a CAISI eval status, and "no" or "open-weights, n/a" will read differently from "yes." For commercial builders the differential cuts the other way: open-weights models avoid that regulatory friction, which makes the math of self-hosting more attractive.

Concretely, for Monday morning: if you ship to government, defense, finance, or healthcare, ask your model vendor about its CAISI status, because it will soon be a procurement bullet. If you are weighing open versus closed for commercial use, the regulatory differential is now a real cost line: closed-frontier models carry pre-release eval friction (potentially longer release cycles if evals find issues), while open-weights models carry the inverse risk (no federal seal, but no federal gate either). The middle case is messy: Llama, Gemma, and other open releases from large labs are not themselves pre-evaluated as releases, even where the parent lab is in the program. Builders working on those weights inherit an ambiguity that has not yet been resolved on paper.

