GitHub announced it will begin using interaction data from Copilot Free, Pro, and Pro+ users to train and improve its AI models starting April 24, unless users explicitly opt out. The change affects millions of developers but notably excludes Copilot Business and Enterprise customers, whose data is not covered by the new training policy. GitHub will collect prompts, code suggestions, accepted outputs, file names, repository structure, and user feedback to refine model performance.

This move puts GitHub squarely in line with the broader AI industry's data-hungry approach, in which user interactions become training fuel for better models. GitHub CPO Mario Rodriguez frames it as essential for AI development, stating the company needs "real-world interaction data from developers like you." The timing is telling: as AI coding assistants mature beyond their initial training on public code, companies need interaction data to understand how developers actually work, not just how code looks in repositories.

The policy creates a clear two-tier system: individual developers and small teams become data sources, while enterprise customers maintain data privacy protections. GitHub promises not to share training data with third-party AI providers, keeping it within the Microsoft ecosystem. The company also states that private repository content "at rest" won't be used for training, though the distinction between stored repository content and the interaction data generated while using Copilot may confuse some users.

Developers should review their privacy settings before April 24 if they want to avoid contributing to GitHub's model training. Those who have already opted out remain protected, but because data collection is enabled by default, users who take no action will become part of GitHub's training dataset, many of them without realizing it.