A small ecosystem of brokers has emerged to monetize a previously overlooked asset class: the years of internal communications accumulated by startups before they shut down. SimpleClosure, a wind-down specialist, has completed nearly 100 such transactions in the past year, paying out more than 1 million dollars to founders, with typical deals ranging from 10,000 to 100,000 dollars per company. The data being sold is not generic logs. It is Slack message archives, email correspondence, Jira tickets, and multi-terabyte Google Drive directories representing the daily work product of every employee for the company's lifetime. Protege, an AI training-data broker run by CEO Bobby Samuels, vets and resells the data to model developers. Cielo24, a 13-year-old transcription company that closed under CEO Shanna Johnson, is one of the documented examples. The legal basis is mundane: employees signed IP agreements covering work materials. The ethical basis is contested.

The privacy mechanics deserve a careful look. Standard IP assignment clauses in employment contracts grant the employer rights over work product but do not contemplate the post-shutdown sale of personal-but-work-adjacent communications. Slack DMs, candid email exchanges, and the running text of a company's internal life are technically work product but practically a record of human relationships. Marc Rotenberg, founder of the Center for AI and Digital Policy, has flagged this gap explicitly. Anonymization is the obvious mitigation, but Bobby Samuels of Protege has acknowledged that imperfectly anonymized data can leak into model output. The pattern of risk is similar to the medical-records-anonymization debate of the early 2010s, where research showed that supposedly de-identified data often contained enough signals to re-identify specific individuals. The same vulnerability applies here, with the additional twist that the dataset includes the kinds of personal disclosures employees make to colleagues but would not make publicly.
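To make the leakage risk concrete, here is a minimal, hypothetical sketch of the kind of regex-based redaction that often passes for anonymization, and of what it leaves behind. It is not any broker's actual pipeline; the message, names, and project are invented for illustration.

```python
import re

# Naive redaction: strip direct identifiers, leave everything else untouched.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b")
HANDLE = re.compile(r"@[A-Za-z0-9_]+")

def naive_redact(message: str) -> str:
    """Replace obvious direct identifiers with placeholder tokens."""
    message = EMAIL.sub("[EMAIL]", message)
    message = PHONE.sub("[PHONE]", message)
    message = HANDLE.sub("[USER]", message)
    return message

msg = ("@dana I'm telling you before HR does: my oncologist moved the surgery "
       "to March 3rd, so I'll miss the Atlas launch. Reach me at "
       "dana.k@example.com or 555-867-5309.")
print(naive_redact(msg))
# The handle, email, and phone number are gone, but the health disclosure,
# the date, and the project name survive -- inside a small company, that is
# often enough to re-identify the author, which is the vulnerability at issue.
```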

The macro picture is that high-quality conversational training data is a scarce resource and pricing is rising. Reddit's 2024 deal licensing its conversational data to Google was reported at 60 million dollars per year; Stack Overflow's deal with OpenAI was at a similar order of magnitude. As public-internet data gets exhausted and contested, AI developers are actively pursuing closed-conversation corpora that capture how professionals actually talk to each other in working contexts. The Slack archives of defunct startups fit that profile precisely. They contain technical discussions, customer-service dialogues, internal debates, and the kind of context-rich back-and-forth that pretraining datasets struggle to replicate from public sources. The economic logic for AI labs is clear. The economic logic for founders winding down their companies, who otherwise have to pay for data destruction services, is also clear. The misalignment is between those two parties and the third party, the employees, whose communications are the actual asset.

For builders, the practical takeaway is twofold. First, if you are building or licensing AI models, the provenance question on training data is getting more pointed. Whether your training set includes data your end users would consider private is increasingly a procurement-due-diligence question, not a footnote; a sketch of what that check might look like follows below. Second, if you are an employee or have been one, your reasonable expectation about the lifespan and use of your work communications no longer matches reality. A defensive practice is to audit what you have said in employer-controlled channels under the assumption that some non-zero fraction of those messages will end up in a training dataset, possibly resurfacing in model output, attributed to you, years from now. That is a depressing framing, but it is the operating one. Industry lobbying or legislative action could change it. As of today, what is happening is happening, and the legal infrastructure is permissive.
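On the first takeaway, one way to treat provenance as procurement data rather than a footnote is to record it per source and flag the risky cases before sign-off. The sketch below is an assumption-laden illustration: the field names, origin categories, and flags are mine, not an established standard or any vendor's schema.

```python
from dataclasses import dataclass

@dataclass
class DataSourceRecord:
    # Hypothetical provenance manifest for one candidate training corpus.
    name: str                     # e.g. "acme-slack-export-2019-2024"
    origin: str                   # "public_web", "licensed", "defunct_company_sale", ...
    contains_private_comms: bool  # internal chat, DMs, email
    individuals_consented: bool   # did the people in the data consent to this use?
    anonymization_audited: bool   # independently checked, not just claimed

def provenance_flags(src: DataSourceRecord) -> list[str]:
    """Return human-review flags for a candidate training data source."""
    flags = []
    if src.contains_private_comms and not src.individuals_consented:
        flags.append("private communications without individual consent")
    if src.contains_private_comms and not src.anonymization_audited:
        flags.append("anonymization never independently audited")
    if src.origin == "defunct_company_sale":
        flags.append("acquired via wind-down sale; check employee IP agreements")
    return flags

corpus = DataSourceRecord(
    name="acme-slack-export-2019-2024",
    origin="defunct_company_sale",
    contains_private_comms=True,
    individuals_consented=False,
    anonymization_audited=False,
)
for flag in provenance_flags(corpus):
    print("REVIEW:", flag)
```

The point of the structure is not the specific fields; it is that a sale like the ones described above should surface as an explicit review item in procurement rather than being discovered after a model ships.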