The cleanest version of agent automation was supposed to come from enterprise SaaS: CRM gets an AI copilot, IDE gets an AI pair programmer, nobody has to think too hard about what's being modeled. The messier version is already in production inside Chinese tech companies, and it runs on GitHub. MIT Technology Review has a piece this week on Colleague Skill, a tool by Tianwing Zhou of Shanghai AI Lab that does exactly what its name suggests. Feed it a coworker's name, and it scrapes your workplace chat apps and hands you back a workflow manual an agent can replay.
The mechanics are direct. Colleague Skill pulls chat histories and files from Lark and DingTalk (the Chinese enterprise equivalents of Slack and Teams), runs the data through a distillation pipeline, and emits a document describing job duties plus "unique quirks" to replicate. The MITTR piece doesn't pin down which models power the distillation step, and Zhou hasn't published architecture details. What's documented is the data substrate: workplace chat logs, file metadata, and enough behavioral signal to encode personality. Separately, another engineer, Koki Xu, shipped an anti-distillation tool on April 4 with light, medium, and heavy sabotage modes that rewrite the input workflow documents into "generic, non-actionable language" before they get absorbed. The sabotage-tool demo video got more than five million likes.
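The documented shape of that pipeline (scrape chat histories and files, distill, emit a workflow document with duties and quirks) can be sketched roughly as below. This is an illustration only: Zhou hasn't published architecture details, the distillation model is undisclosed, and every name and the keyword-based classifier here are my assumptions, standing in for whatever model call does the real work.

```python
from dataclasses import dataclass, field

@dataclass
class WorkflowManual:
    """Hypothetical output format: duties plus 'unique quirks' to replicate."""
    coworker: str
    duties: list[str] = field(default_factory=list)
    quirks: list[str] = field(default_factory=list)

def distill(coworker: str, messages: list[str]) -> WorkflowManual:
    """Stand-in for the distillation step; the real tool presumably
    uses an LLM here, not keyword matching."""
    manual = WorkflowManual(coworker)
    for msg in messages:
        # Placeholder heuristic: task-like messages become duties,
        # everything else is treated as behavioral signal.
        if msg.startswith("TODO") or "deadline" in msg:
            manual.duties.append(msg)
        else:
            manual.quirks.append(msg)
    return manual
```

The point of the sketch is the data flow, not the classifier: the input is raw workplace chat, and the output is a replayable document keyed to one person.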
This is the real shape of agent deployment, not the pitch-deck version. The deployment surface isn't a clean API or a polished enterprise feature; it's whatever data your employer can reach, specifically the chat histories and shared files where most knowledge work actually happens. The countermove isn't policy; it's engineering. Xu's tool is essentially adversarial data shaping, executed by the workers whose output is the training signal. If you build agent tools, this is a signal worth paying attention to: the people whose knowledge you want to capture have a shipping culture, and they are already treating distillation pipelines as hostile input to be perturbed.
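"Adversarial data shaping" here means rewriting your own documents so specifics are blurred before a pipeline absorbs them. A minimal sketch of the idea, with tiered modes in the spirit of Xu's light/medium/heavy design; the patterns and replacements are invented for illustration and are not Xu's actual rules.

```python
import re

# Invented rewrite rules per mode: light blurs times, medium also blurs
# named documents, heavy scrubs nearly every token.
GENERIC = {
    "light":  [(r"\b\d{1,2}:\d{2}\b", "at some point")],
    "medium": [(r"\b\d{1,2}:\d{2}\b", "at some point"),
               (r"\b[A-Z][a-z]+ (?:Report|Review)\b", "a document")],
    "heavy":  [(r"[^.\s]+", "something")],
}

def sabotage(doc: str, mode: str = "light") -> str:
    """Rewrite a workflow document into generic, non-actionable language."""
    for pattern, repl in GENERIC[mode]:
        doc = re.sub(pattern, repl, doc)
    return doc
```

Even this toy version shows why the tiering matters: light modes keep the document readable to humans while degrading its value as training signal, and heavy modes destroy both.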
Two things to chew on. First, the chat-app scrape is the real training substrate — Lark, DingTalk, Slack, Teams, and Notion are where the signal lives, and agent tooling that depends on clean hand-written SOPs is missing the actual data distribution. Second, the sabotage mode is going to generalize. Once a light/medium/heavy pattern for adversarial workflow rewrites catches on, you will see it deployed against every "your AI coworker" tool whose input is user-facing workplace content. If your agent pipeline treats the distillation input as trustworthy, plan for the input not being trustworthy.
