The Commerce Department's Center for AI Standards and Innovation (CAISI) announced Tuesday that Google, Microsoft, and xAI signed on to give pre-release access to their frontier models for security and capability evaluation. They join OpenAI and Anthropic, which renegotiated existing CAISI partnerships to align with the Trump administration's AI Action Plan. Five closed labs now feed unreleased state-of-the-art models into a federal eval pipeline before public deployment: the soft-regulation answer the EU tried to encode into legislation, arriving instead as a voluntary pact under a different administration.

The substantive details are thinner than the announcement suggests. CAISI says it has completed more than 40 evaluations, including on unreleased frontier models, but what gets evaluated, who sees results, and whether anything actually gates a deployment remain undisclosed. The eval scope is described in capability and security terms (the standard CBRN, cyber, and autonomous-action axes), but the harness specifics, contamination protocols, and red-team access models aren't public. The "renegotiated" wording on OpenAI and Anthropic's existing partnerships is the part to watch: the terms shifted under the new administration, and what shifted hasn't been disclosed by either lab. For builders, that means the eval pipeline is real but the criteria it enforces are essentially black-boxed.

Open-weights labs (Mistral, Meta, DeepSeek, Qwen, Zyphra) sit outside this loop entirely. They publish weights, so there's no "pre-release" gate to negotiate. The result is a regulatory bifurcation that's starting to matter: closed-frontier sits inside the gov-eval pipeline, open-weights sits outside. Mistral shipping Medium 3.5 this same week (128B dense, 77.6% SWE-Bench Verified, weights on Hugging Face) is the live demonstration: a coding-capable backbone deployed without any pre-release federal review, hostable on builder infrastructure. For agent stacks pointed at regulated customers, this differential is going to compress procurement choices: gov-adjacent buyers will start asking whether a model has CAISI eval status, and a "no" or "open-weights, n/a" will read differently than a "yes." For commercial builders, the differential cuts the other way: open-weights avoid the regulatory friction entirely, which makes the self-hosting math more attractive.

The Monday-morning concrete: if you ship to government, defense, finance, or healthcare, ask your model vendor about CAISI status; it will soon be a procurement bullet. If you're weighing open versus closed for commercial work, the regulatory differential is now a real cost line: closed-frontier carries pre-release eval friction (potentially longer release cycles if evals find issues), while open-weights carries the inverse risk (no federal seal, but no federal gate either). The middle case is the messy one: Llama, Gemma, and other open releases from CAISI-pipeline labs aren't themselves pre-evaluated as releases, even though their parent labs are in the program. Teams building on those weights inherit an ambiguity that hasn't been resolved on paper yet.
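The three-way split above (closed-frontier in the pipeline, open-weights outside it, open releases from pipeline labs in between) can be written down as a crude decision helper. This is an illustrative sketch only: the model classes, sector list, and returned notes are my own labels for the article's argument, not any real CAISI registry or API.

```python
# Hypothetical procurement-friction helper. Categories and wording are
# illustrative labels for the regulatory split described in the text,
# not an official classification scheme.

REGULATED_SECTORS = {"government", "defense", "finance", "healthcare"}


def eval_status_note(model_class: str, sector: str) -> str:
    """Return a rough friction note for a model choice in a given sector.

    model_class is one of:
      "closed-caisi"        - closed lab inside the CAISI eval pipeline
      "open-weights"        - published weights, no pre-release gate
      "open-from-caisi-lab" - open release from a lab that is in the program
    """
    regulated = sector.lower() in REGULATED_SECTORS

    if model_class == "closed-caisi":
        # Pre-release eval friction either way; the status answers "yes"
        # when a regulated buyer asks.
        return ("pre-release eval friction; CAISI status reads 'yes'"
                if regulated
                else "pre-release eval friction; status rarely asked")

    if model_class == "open-weights":
        # No federal gate, but also no federal seal to point at.
        return ("no federal gate; expect 'open-weights, n/a' on checklists"
                if regulated
                else "no regulatory friction; self-hosting math improves")

    # Open release from a CAISI-pipeline lab: the release itself was not
    # pre-evaluated, even though the parent lab participates.
    return "ambiguous: parent lab in program, release not pre-evaluated"
```

Used as a thinking aid, not a compliance tool: `eval_status_note("open-from-caisi-lab", "finance")` lands in the "ambiguous" bucket the last paragraph describes, which is exactly the case buyers will have to resolve by policy rather than lookup.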