Meta admits it copied all Australian Facebook posts since 2007 to train its AI

Meta has admitted that it used public Facebook and Instagram posts from Australian users to train its AI models, drawing on data collected as far back as 2007.

An Australian parliamentary committee heard that European users can opt out under the EU's General Data Protection Regulation (GDPR), but Australian users have no such option.

Meta denied using data from users under 18, but confirmed that it had used data dating back more than a decade. The company could not say whether it had collected photos of people who created their accounts as children but have since turned 18.

A changing tide

Scraping is essential to AI development. It involves collecting data from websites, extracting the relevant information and feeding it to a large language model (LLM), which learns from that data. As a result, GDPR rules are proving problematic for a growing number of LLMs, such as ChatGPT, that collect data from across the internet without the consent of the original sources.

Meta's global privacy director, Melinda Claybaugh, told the committee that the company had been forced to pause the launch of AI products in Europe because of regulatory uncertainty, and had to give European users an opt-out because of the region's stronger privacy laws. Senator David Shoebridge pressed her on the point:

"The truth of the matter is that unless you have consciously set those posts to private, since 2007, Meta has simply decided that it would scrape all photos and all text from every public post on Instagram or Facebook that Australians have shared since 2007, unless there has been a conscious decision to set them to private. But that's the reality, isn't it?"

Claybaugh replied: "Correct." She added that users can now set their posts to private to stop them from being collected in the future, but that doing so would have no effect on data that has already been collected.

Both the public and technology companies appear to be realizing that training AI models requires so much data that it is "impossible" to do so without using copyrighted material. Given that millions of user posts have been used without consent, tech giants could face much stricter regulation in the future.

Via The Guardian
