Monday, July 3rd 2023

Google Will Use Your Data to Train Their AI According to Updated Privacy Policy

Google made a small but important change to their privacy policy over the weekend that effectively lays claim to anything you post publicly online for use in training their AI models. The original wording of this section of the privacy policy stated that public data would be used for business purposes, research, and improving Google Translate services. Now, however, the section has been updated to read as follows:
Google uses information to improve our services and to develop new products, features and technologies that benefit our users and the public. For example, we use publicly available information to help train Google's AI models and build products and features like Google Translate, Bard, and Cloud AI capabilities.
Further down in the policy text, Google has another section that gives examples of the kinds of "publicly available" information they seek to scrape:
For example, we may collect information that's publicly available online or from other public sources to help train Google's AI models and build products and features like Google Translate, Bard, and Cloud AI capabilities. Or, if your business's information appears on a website, we may index and display it on Google services.

The change went into effect on July 1st, 2023. Given the scope and longevity of Google accounts (think how long some people have had Gmail and YouTube accounts), this change now formally covers an incredibly vast amount of public interaction data stretching over decades. What is still uncertain is whether individuals who have committed to "de-Googling" their online lives could be caught up in the dragnet of Google's data scraping regardless of whether they've agreed to this policy change, or whether simply having had any contact with Google over the years is enough. Large-scale public scraping has already been happening, regardless of individual consent, for other large language models, such as those behind OpenAI's ChatGPT. Ideally, though, this change affects only those who have active accounts with various Google services. One important point is that Google does not mention anything about using private data, so such data shared with Google is apparently safe from being ingested into the AI machine. For now.
Sources: Gizmodo, Google

34 Comments on Google Will Use Your Data to Train Their AI According to Updated Privacy Policy

#1
FierceRed
Can't be too surprised. The only reason Google is so powerful is their population of users using their services over decades. Deeper datasets = deeper insights usually.

To not leverage that advantage would be asinine.

Always awesome to see these changes happen before a long weekend when everyone is too busy living life to put it in a news cycle. :shadedshu:
#2
R0H1T
You can always choose to delete your data with them, like browsing history, YouTube, Maps, search queries, etc. Though I'm not sure when it's completely purged from their servers, if at all.
#3
Ferrum Master
If it trains on online trolls, what a bright future we will have here
#4
Jism
Future fully automated - any human intervention gone at some point.

Google is built on other people's data. It's that simple.
#5
Bomby569
That's their business model: use other people's data to make money. The disclaimer is just that, like saying tobacco kills.
I avoid their products like the plague.
#6
bug
They've always used users' data in their products (and been upfront about it), I'm surprised using it for AI wasn't already covered.
#7
Bomby569
bug: They've always used users' data in their products (and been upfront about it), I'm surprised using it for AI wasn't already covered.
AI will be heavily regulated in some places like the EU, and I think Japan too, so this is probably to avoid any legal problems from not mentioning it specifically.
#8
kondamin
I hope we can see improvements to translations quickly.

It’s OK for Romance and Germanic languages, but there is a lot left to be desired when going between, let’s say, English and Korean.
#9
Turmeric
"AI", they still use that misleading marketing term. Are there no laws to stop this shit.....
Do not answer this, laws won't fix anything, I am just venting.
And Google, you can go to hell! Assimilate this comment, you twat.
#10
trsttte
Wait, weren't they already doing so!?
#12
kapone32
Since Google has decided to stop carrying Canadian news rather than pay for news content, they can go somewhere.
#13
bushlin
Many things are publicly available but copyright, ownership and rights to reproduce are not as free and easy as Google appear to be expecting.
Conflating information not blocked for collection so that it appears in search results, with information used to train AI seems suspiciously like Google abusing their market position.
#14
bug
bushlin: Many things are publicly available but copyright, ownership and rights to reproduce are not as free and easy as Google appear to be expecting.
Conflating information not blocked for collection so that it appears in search results, with information used to train AI seems suspiciously like Google abusing their market position.
Google is not abusing anything; companies are applying a double standard when they complain about Google using their content.
developers.google.com/search/docs/crawling-indexing/block-indexing

This simple mechanism has existed forever. But companies don't use it. Why? Because they need traffic Google generates for them, many would go under without it. At the same time, they feel that because Google will slap a (contextual) ad next to their content and charge for that, they are entitled to a piece of that, too.
I mean, let's turn things around and imagine if Google asked for a part of the companies' revenue because they sent some traffic their way. Crazy, right?
#15
R-T-B
Ferrum Master: If it trains on online trolls, what a bright future we will have here
This is largely why guardrails have been implemented: that has already become an issue.
#16
bug
Ferrum Master: If it trains on online trolls, what a bright future we will have here
Most people don't understand how these models work (I got it wrong at first, too): they are trained on a set of curated inputs, the resulting model is tested and released only if it passes validation. It doesn't learn anything after that. It's quite a big limitation, but at the same time, it's the only way to guarantee models won't go off the farm.

At the same time, I have no doubt somewhere in Russia and China someone is training models on their troll farms specifically.
#17
bushlin
bug: Google is not abusing anything; companies are applying a double standard when they complain about Google using their content.
developers.google.com/search/docs/crawling-indexing/block-indexing

This simple mechanism has existed forever. But companies don't use it. Why? Because they need traffic Google generates for them, many would go under without it. At the same time, they feel that because Google will slap a (contextual) ad next to their content and charge for that, they are entitled to a piece of that, too.
I mean, let's turn things around and imagine if Google asked for a part of the companies' revenue because they sent some traffic their way. Crazy, right?
By your logic, if you don't want to compromise the ownership and copyright of your online content for AI training you must cut off by far the most used, effectively a monopoly, search engine and tank your discoverability... Pick one.
I don't see how that isn't abusing a monopolistic position.
#18
claes
Don’t know what copyright has to do with any of this. I consume copyrighted properties all day, analyze them, and then summarize and make observations about them. Is this illegal?

It’s not a reproduction of material, it’s an interpretation of it.
#19
bug
bushlin: By your logic, if you don't want to compromise the ownership and copyright of your online content for AI training you must cut off by far the most used, effectively a monopoly, search engine and tank your discoverability... Pick one.
I don't see how that isn't abusing a monopolistic position.
By your own logic, Google is obligated to index everything under the Sun and pay for things they index?

Before Google there were many persons/shows that would summarize printed press for the masses. Obviously they made money off of that, they would include advertising and stuff in their 15-30 minute slots. Yet nobody thought about asking for a piece of that pie.
#20
bushlin
bug: By your own logic, Google is obligated to index everything under the Sun and pay for things they index?

Before Google there were many persons/shows that would summarize printed press for the masses. Obviously they made money off of that, they would include advertising and stuff in their 15-30 minute slots. Yet nobody thought about asking for a piece of that pie.
What Google are obliged to do is not break antitrust law. Google profit greatly from the work of others by indexing it and serving ads alongside search results. They're not performing an altruistic act, it's business.
What they can't do is leverage the effective monopoly they have on search, as a means of preventing others from protecting their work from being further profited from to train an AI model.

In certain sectors AI threatens livelihoods, if that was you, you'd be pretty annoyed if you're being undercut by a derivative of your own work, taken without your consent... while person-on-the-internet's solution is to prevent indexing of your portfolio as if that's a perfectly valid option.
claes: Don’t know what copyright has to do with any of this. I consume copyrighted properties all day, analyze them, and then summarize and make observations about them. Is this illegal?

It’s not a reproduction of material, it’s an interpretation of it.
We're entering a new murky world where the arguments about what a human is capable of don't really translate to an AI... which can 'learn' more in a few minutes than would be possible in the lifetime of a human.
This is Stable Diffusion's implementation of an image, I wonder if they scraped Getty Images?

#21
claes
You mean because I am stupider than a machine I can’t photoshop memes anymore? Damn.
#22
AsRock
TPU addict
FierceRed: Can't be too surprised. The only reason Google is so powerful is their population of users using their services over decades. Deeper datasets = deeper insights usually.

To not leverage that advantage would be asinine.

Always awesome to see these changes happen before a long weekend when everyone is too busy living life to put it in a news cycle. :shadedshu:
And you cannot avoid using them, as a lot of games even require you to connect to Google these days.
#23
chrcoluk
A bigger issue is that they seem to be rolling out a removal of pagination on their text search page. That's a big enough issue for me that it will stop me using the search engine.

On a new install of Windows with me not logged into Google I have no pagination, just an option to either auto generate new results on scroll or to manually click.

Luckily at the moment on this PC whilst signed in I still have pagination.
#25
bug
chrcoluk: A bigger issue is that they seem to be rolling out a removal of pagination on their text search page. That's a big enough issue for me that it will stop me using the search engine.

On a new install of Windows with me not logged into Google I have no pagination, just an option to either auto generate new results on scroll or to manually click.

Luckily at the moment on this PC whilst signed in I still have pagination.
Out of curiosity, if everything can be accomplished by simply scrolling, why would you require the pagination controls, too?