Tuesday, November 26th 2024
Microsoft Office Tools Reportedly Collect Data for AI Training, Requiring Manual Opt-Out
Microsoft's Office suite is a staple among productivity tools, with millions of users entering sensitive personal and company data into Excel and Word. According to @nixCraft, an author from Cyberciti.biz, Microsoft left its "Connected Experiences" feature enabled by default, reportedly using user-generated content to train the company's AI models. Because the feature is on by default, data from Word and Excel files may be used in AI development unless users manually opt out. The default setting raises security concerns, especially for businesses and government workers who rely on Microsoft Office for proprietary work. According to the report, the feature allows documents such as articles, government data, and other confidential files to be included in AI training, creating ethical and legal challenges regarding consent and intellectual property.
Disabling the feature requires going to: File > Options > Trust Center > Trust Center Settings > Privacy Options > Privacy Settings > Optional Connected Experiences, and unchecking the box. Beyond the unnecessarily long opt-out path, the European Union's GDPR, which Microsoft complies with, requires such data-collection settings to be opt-in rather than opt-out by default. If the reports are accurate, this would conflict with the GDPR and could prompt an investigation from the EU. Microsoft has yet to confirm whether user content is actively being used to train its AI models. However, its Services Agreement includes a clause granting the company a "worldwide and royalty-free intellectual property license" to use user-generated content for purposes such as improving Microsoft products. This kind of controversy is not new, as more companies leverage user data for AI development, often without explicit consent.
For current LLMs, the data on which they are trained is the key to distinguishing them from competitors. Quality data is the prize, and a model trained on a unique dataset like the one Microsoft has access to could outperform the competition by a mile in tasks like writing and basic reasoning. With sensitive data that is not available to the public, Microsoft could extend its AI lead. However, LLMs are not immune to leaking parts of their training data, so a skilled attacker could potentially extract some of it. For now, users who wish to protect their intellectual property are advised to review their settings carefully.
Source:
via Tom's Hardware
Update Nov 26th 08:00 UTC: Microsoft reached out to us via email and confirmed:
Statement from Microsoft
Microsoft does not use customer data from Microsoft 365 consumer and commercial applications to train large language models. Additionally, the Connected Services setting has no connection to how Microsoft trains large language models.
Connected Experiences allows users to search and download online content to enhance their documents. This includes templates, images, 3D models, videos, and reference materials. Examples include Microsoft Office templates and PowerPoint QuickStarter presentations. Microsoft has also provided a table of what Connected Experiences downloads, which you can see below:
56 Comments on Microsoft Office Tools Reportedly Collect Data for AI Training, Requiring Manual Opt-Out
If you can get a hold of an old version of Outlook (up to Office 2010) or Thunderbird (up to version 52) that has an importer for Outlook Express
Or... run a virtual machine just for Outlook Express, if you really can't stand the crappy webmail UXs or any current local email program. :pimp:
www.thunderbird.net/
I wouldn’t touch new outlook at home with a 20 meter science pole.
I've been using LibreOffice for years now.
They went back to google docs, but they're still pushing for full cloud intune enrollment and AD, having not learned their lesson. In my experience normal operation for onedrive is total disobedience and desynced folders. There's really 0 reason for 99% of people to be using office. Google docs is free, with cloud backups and network administration. Libreoffice is free, standalone, with no spying or other nonsense. There are other options too. Paying hundreds of dollars "because office" is one of the most boomer things I have to put up with.
M$ has yet to take this lesson.
Thanks Dr. Dro & Microsoft for the advertisement of Libre Office!
Microsoft trusts the ecosystem they helped create. They trust that the end user (who pays for the computer, licenses and electricity and incidentally keeps that computer at their home/office) won't mess with it much. That's what it is.
I've never been on board with this "pay us indefinitely to get all the enhancements and improvements" concept. My experience with that is Office365 at work, where it often feels like MS rearranging the deck chairs. Maybe I didn't want the latest feature you put out, especially in its half-baked condition. SaaS takes away the ability to vote with your wallet without making a big sacrifice. Instead of just sticking to an older, proven version while waiting for the software vendor to prove their worth, you have to find a new alternative altogether. They include some cloud storage in their bundles for a few reasons. One, it's to get your data and get you hooked, but two, it's to make you think you're getting something of lasting value in your subscription.
support.google.com/a/answer/60762?hl=en
Claims they are compliant with ISO/IEC 27018:2014. For our org that's good enough for compliance.