A 22-author paper led by Daniel Schroeder at SINTEF, published in Science last week (DOI 10.1126/science.adz1697), defines what its authors call a "malicious AI swarm" as a set of AI-controlled agents with four properties that older bot-detection methods were never built to handle: persistent identities and memory across sessions, coordination toward shared objectives while varying tone and content per account, real-time adaptation to engagement signals, and operation across multiple platforms. The framework matters because the dominant defensive heuristic of the 2010s (find a cluster of accounts posting identical text and ban them) assumes the attacker is using simple template-and-broadcast tooling. Modern LLMs let each agent in the swarm produce distinct, context-aware text while still pursuing the same objective.
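To make the obsolescence concrete, here is a toy sketch of that legacy heuristic; the function name, threshold, and sample posts are all invented for illustration and appear nowhere in the paper:

```python
from collections import defaultdict

def flag_copy_paste_clusters(posts, min_cluster=3):
    """posts: iterable of (account_id, text) pairs."""
    clusters = defaultdict(set)
    for account_id, text in posts:
        # Template-and-broadcast bots reuse payloads verbatim, so a
        # whitespace- and case-normalized string is a sufficient cluster key.
        key = " ".join(text.lower().split())
        clusters[key].add(account_id)
    # Flag any payload shared by many accounts.
    return [accounts for accounts in clusters.values() if len(accounts) >= min_cluster]

posts = [
    ("a1", "Position X is common sense."),
    ("a2", "Position X  is common sense."),
    ("a3", "position x is common sense."),
    # An LLM-backed swarm emits a distinct, context-aware paraphrase per
    # account, so no key ever repeats and nothing below is flagged:
    ("b1", "Honestly, I used to doubt X, but the evidence changed my mind."),
    ("b2", "A friend who works in the field says X is obvious to insiders."),
]
print(flag_copy_paste_clusters(posts))  # [{'a1', 'a2', 'a3'}]; the b* accounts pass unflagged
```

The first three accounts cluster on an identical payload and get caught; the two swarm accounts never share a key, so the heuristic sees nothing at all.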
The paper's most novel contribution is naming a second-order threat that has been discussed informally for two years but lacked a clean term: "LLM Grooming." The idea is that a swarm flooding the open web with content shaped to push a particular position is not just trying to influence current human readers; it is trying to influence the training corpora of the next generation of language models. If the next round of crawlers ingests several gigabytes of pro-Position-X commentary spread across thousands of seemingly independent sites, the resulting model will have learned that Position X is the consensus view, and will reproduce that view when asked. The attack does not require compromising a model directly; it requires nothing more than sustained writing volume on the open web. Schroeder et al. argue that this makes the AI training pipeline itself a national security surface.
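A back-of-envelope sketch shows why volume alone is the attack. Assume, purely hypothetically, that a model's stance on a narrow topic tracks the mix of positions in its training data; every number below is invented:

```python
def synthetic_share(organic_docs: int, swarm_docs_per_day: int, days: int) -> float:
    """Fraction of topic-relevant crawl documents that are swarm-authored."""
    synthetic = swarm_docs_per_day * days
    return synthetic / (organic_docs + synthetic)

organic = 50_000   # hypothetical organic documents on a niche topic in a crawl
per_day = 500      # hypothetical swarm output; trivial at current LLM prices
for days in (30, 180, 365):
    print(f"{days:>3} days: {synthetic_share(organic, per_day, days):.0%} swarm-authored")
# 30 days: 23%, 180 days: 64%, 365 days: 78%
```

Under these made-up numbers, a year of steady output at a few hundred documents per day leaves the swarm authoring more than three quarters of the topic's coverage, without ever touching a model directly.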
The paper's named real-world example is the Pravda network, a pro-Kremlin operation that researchers at NewsGuard and elsewhere have documented since 2024: thousands of articles per month across hundreds of look-alike sites, deliberately optimized for AI ingestion rather than human readership. The paper notes that early measurements of frontier models show non-trivial reproduction of Pravda-aligned framing on certain queries about Ukraine, Russia, and NATO. The mechanism is exactly what LLM Grooming predicts: the model has read more pro-Kremlin content on those topics during training than the underlying empirical record would warrant, and it weights its outputs accordingly. The Pravda case is the proof of concept; the paper argues that many smaller-scale equivalents are running now.
For builders working on AI products, the practical implications are not subtle. Detecting a single bot account by stylometric or behavioral signals is becoming harder than the 2017-era detection literature assumes. Defending the training corpus is now its own problem, distinct from defending the model: provenance tooling, source diversity audits, and hard caps on the influence any single domain or cluster can exert are all genuine engineering work, and mostly it is not being done. The paper does not propose detailed defenses, which is fair; identifying a threat clearly is its own contribution. The honest takeaway is that the legacy "AI safety" frame, focused on filtering model outputs, is increasingly inadequate against attackers whose goal is to alter what the next model learns, not to jailbreak the current one. The economics favor the attacker: bot text is cheap, and crawlers cannot easily tell synthetic from authentic at scale.
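As one concrete example of the corpus-defense work named above, here is a minimal sketch of a hard per-domain cap applied while streaming a crawl into a corpus; this illustrates the idea, it is not anything the paper proposes, and the names and numbers are invented:

```python
from collections import Counter
from urllib.parse import urlparse

def cap_per_domain(documents, max_docs_per_domain=100):
    """documents: iterable of (url, text) pairs, e.g. a crawl stream.

    Yields documents until each domain hits its cap, bounding the influence
    any single site can have on the resulting corpus.
    """
    counts = Counter()
    for url, text in documents:
        # netloc is a crude stand-in for the registrable domain; a real
        # pipeline would normalize against the public suffix list.
        domain = urlparse(url).netloc
        if counts[domain] < max_docs_per_domain:
            counts[domain] += 1
            yield url, text
        # Documents past the cap are silently dropped.
```

Note that a per-domain cap alone does not stop the Pravda pattern, where hundreds of look-alike sites each stay comfortably under any per-site limit. Cluster-level caps would first require grouping domains by hosting infrastructure, registration data, or content similarity, which is precisely the harder, mostly undone engineering the paragraph above refers to.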
