The White House Office of Science and Technology Policy released a memo Thursday accusing China-based entities of running industrial-scale campaigns to steal US frontier AI model capabilities. OSTP director Michael Kratsios named distillation as the primary technique, alleging that tens of thousands of proxy accounts and jailbreak prompts were used to extract proprietary outputs from US labs. The Chinese Embassy in Washington dismissed the claims as baseless slander, reiterating its commitment to IP protection and opposing what it termed the unjustified suppression of Chinese companies. Timing matters: the memo lands three weeks before a rescheduled Trump-Xi summit on May 14.
Distillation is not exotic. You train a small student model on outputs generated by a larger teacher, matching its logits or response distributions. Done well, the student captures a surprising fraction of the teacher's capability at a fraction of the parameter count. What makes it contentious is the API terms of service: OpenAI and Anthropic both prohibit using outputs to train competing models, and jailbreak prompts are explicitly banned. Two allegations are tangled together here. One is contractual: firms like DeepSeek, Moonshot AI, and MiniMax allegedly violated ToS by training on scraped outputs. The other is operational: tens of thousands of proxy accounts and jailbreak prompts allegedly used to evade rate limits and safety filters.
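For the mechanics, here is a minimal sketch of the classic soft-label distillation loss, the temperature-softened KL objective from Hinton et al. (2015), assuming direct access to teacher logits; the function and variable names are illustrative, not from the memo or any lab's codebase.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Soften both distributions; a higher temperature spreads probability
    # mass, exposing the teacher's relative preferences among wrong answers.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence from student to teacher; the T^2 factor keeps gradient
    # magnitudes comparable across temperature settings.
    return F.kl_div(log_soft_student, soft_teacher,
                    reduction="batchmean") * temperature ** 2

# Toy usage: a batch of 4 examples over a 10-token vocabulary slice.
teacher_logits = torch.randn(4, 10)                      # frozen teacher
student_logits = torch.randn(4, 10, requires_grad=True)  # trainable student
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()  # gradients flow to the student only
```

Against an API that returns only text, the memo's scenario, you never see logits at all; the workaround is sequence-level distillation, fine-tuning the student on sampled teacher responses, which matches the response distribution rather than the logits.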
What changed Thursday is that a ToS dispute became a national-security framing. Kratsios's memo converts contractual enforcement, normally handled through API bans, rate limits, and civil suits, into a sovereign IP question. That invites export controls, entity list additions, and the kind of bilateral escalation the chip restrictions already produced. The memo's vagueness is its feature: the phrase "industrial-scale" attaches no specific dollar figure and cites no specific incident, but it establishes the diplomatic posture. Anthropic and OpenAI have raised distillation concerns publicly for months; this memo ratifies those concerns at the executive level.
If you are training a model on outputs from a frontier model's API, the legal exposure just multiplied. What used to be a ToS violation, bad but bounded, is now being framed as theft of national-security-relevant IP. That matters even for builders outside China. Mixing synthetic data from GPT or Claude into your training pipeline carried contract risk before; now it carries political risk, especially if you are distributing the resulting model. The honest path is clear: if you cannot explain where every training signal came from and whose ToS governs it, you have a supply-chain problem that will not stay quiet.
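The practical fix is to make that explanation mechanical rather than a matter of recollection: attach a provenance record to every training example. A hedged sketch follows; the schema and field names are hypothetical, not any standard, but they capture the two questions that matter (where the signal came from, and whose ToS governs it).

```python
from dataclasses import dataclass, asdict
import hashlib
import json

@dataclass
class ProvenanceRecord:
    # Hypothetical schema; field names are illustrative.
    example_id: str     # content hash of the training example
    source: str         # e.g. "human-written", "licensed-corpus", "gpt-api"
    governing_tos: str  # which terms of service, if any, cover this output
    license: str        # redistribution terms for the example itself
    collected_at: str   # ISO-8601 acquisition timestamp

def provenance_for(text: str, source: str, governing_tos: str,
                   license: str, collected_at: str) -> ProvenanceRecord:
    return ProvenanceRecord(
        example_id=hashlib.sha256(text.encode("utf-8")).hexdigest()[:16],
        source=source,
        governing_tos=governing_tos,
        license=license,
        collected_at=collected_at,
    )

# One JSONL line per training example makes an audit a grep, not a dig.
rec = provenance_for(
    "Example synthetic completion...",
    source="gpt-api",
    governing_tos="OpenAI ToS (prohibits training competing models)",
    license="no-redistribution",
    collected_at="2025-04-24T12:00:00Z",
)
print(json.dumps(asdict(rec)))
```

Whether you publish the manifest or keep it internal, the point stands: once the question shifts from contract enforcement to sovereign IP, "we think the data was clean" is not an answer.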
