
GitHub will train Copilot models on user interaction data starting April 2026

TL;DR

GitHub will use Copilot interaction data from Free, Pro, and Pro+ plan users to train AI models starting April 24, 2026, unless users actively opt out. The policy does not affect Copilot Business and Enterprise customers. The data covered includes prompts, outputs, code snippets, filenames, and repository structures.



GitHub has announced a change to its Copilot data policy that takes effect on April 24, 2026. From that date, interaction data from users on the Free, Pro, and Pro+ plans will be used to train AI models unless those users explicitly opt out.

What Data Will Be Collected

The data collection will include:

  • User prompts and model outputs
  • Code snippets
  • Filenames
  • Repository structures
  • User feedback on model suggestions

Users who have previously opted out of data collection will retain their existing settings and will not be automatically enrolled.

Scope and Limitations

Copilot Business and Enterprise customers are exempt from this policy change. GitHub clarified that collected data can be shared with Microsoft but will not be shared with third-party AI model providers.

GitHub Chief Product Officer Mario Rodriguez stated that real-world usage data improves model quality. Internal testing with interaction data from Microsoft employees has already shown higher acceptance rates for code suggestions, a sign that the approach yields measurable improvements.

Opt-Out Process

Users who wish to prevent their data from being used for training can opt out through Copilot settings under the "Privacy" section. GitHub indicated that more details are available on the GitHub blog.

What This Means

This policy represents a shift toward extracting training value from Copilot's large base of individual developers. Limiting it to the Free, Pro, and Pro+ plans while exempting Business and Enterprise customers suggests GitHub is balancing its competitive interest in training data against enterprise customers' expectations about how their data is used. Because the policy is opt-out rather than opt-in, participation will likely be far higher than under an opt-in scheme: data is used by default unless developers proactively change their settings. The exclusion of third-party AI model providers suggests Microsoft intends to keep this training data in-house rather than license it broadly, so that the resulting Copilot improvements remain exclusive to Microsoft.
