MARCH 27, 2025
Rumors online have suggested for months that Microsoft could be using your Word data to train its artificial intelligence (AI) and large language models (LLMs).
Microsoft, however, says it doesn’t use Word—or any of its Microsoft 365 products—to train its AI.
So what data does the tech company use?
De-Identified Data
Microsoft says it trains its AI models using de-identified data from Bing searches, MSN activity, Copilot conversations and ad interactions—unless users are excluded or have opted out.
“De-identified” means any personal information that could link the data back to an individual—like names, email addresses or account IDs—has been removed or obscured.
Microsoft says it also takes additional steps, such as removing metadata from images or blurring faces, to prevent the data from being traced back to specific users.
A spokesperson for Microsoft told Newsweek in an email that the company uses "a variety of data sources, including publicly available information, in a manner consistent with copyright and [intellectual property] laws."
The spokesperson added: “Microsoft is committed to responsibly scaling artificial intelligence and to listen, learn and improve our tools.”
How to Opt Out
To prevent your interactions with Microsoft’s Copilot from being used to train its AI models, you can opt out by adjusting your privacy settings.
Here’s how:
On Windows:
- Open the Copilot application.
- Click on your profile name or ‘Account’ in the settings menu.
- Navigate to ‘Privacy’ > ‘Model training’.
- Toggle off the ‘Model training’ option.
On Microsoft Edge:
- Open Microsoft Edge.
- Click on the menu (three dots) and select ‘Settings’.
- Go to ‘Sidebar’ > ‘Copilot’ > ‘Copilot Settings’.
- Disable ‘Model Training on Text’ to opt out of text-based data training.
After opting out, Microsoft says your past, present and future conversations with Copilot will be excluded from AI model training.
Changes may take up to 30 days to be fully implemented across Microsoft’s systems.
The Data Microsoft Does Not Use
Microsoft says it does not use the following types of data to train its AI models:
- Data from commercial customers, or anyone logged into a Microsoft 365 (M365) organizational, personal or family subscription.
- Data from users who are not logged into Copilot with a Microsoft account or a supported third-party authentication method.
- Data from authenticated users under the age of 18.
- Data from users who have opted out of training.
- Data from users located in nearly 40 countries, including Brazil, China (excluding Hong Kong), Israel, Nigeria, South Korea and Vietnam. These users can still access AI features, but their data is not used for training.
Additionally, the spokesperson said Microsoft does not train AI on:
- Microsoft account profile data
- Email contents
- Contents of files uploaded to Copilot
While conversations about uploaded files may be used, Microsoft says any associated images are de-identified—such as by removing metadata or blurring faces—to protect data privacy.
“At Microsoft, we take our commitments to responsible AI seriously,” the spokesperson said.
“Providing data protection to consumers and the enterprise is at the core of what we do. As part of our AI principles, we believe that AI systems should be secure and respect privacy.”
Courtesy/Source: Newsweek