British parenting site Mumsnet has filed a lawsuit against OpenAI, alleging that it violated copyright law by using Mumsnet's data to train its AI models, including those that power ChatGPT. It is the first such legal action taken against OpenAI in the UK, but it joins a growing number of cases internationally accusing OpenAI of mining data for its models without permission. Mumsnet says its forums host more than six billion words, and that OpenAI used those words to teach its AI models about parenting and related topics.
“This type of permissionless scraping is an explicit violation of our terms of use, which clearly state that no part of the site may be distributed, scraped or copied for any purpose without our express approval,” Mumsnet co-founder Justine Roberts explained in a post on the website. “LLMs are building models like ChatGPT to provide the answers to any and all potential questions that will mean we will no longer need to look elsewhere for solutions. And they are building those models with content scraped from the websites they are poised to replace.”
The lawsuit also raises the timing of the data collection as a point of contention, since most of it occurred before websites were paying attention to whether AI companies were mining their data. Mumsnet alleges that, initially, third-party research institutions carried out most of this data mining.
Roberts wrote that Mumsnet reached out to OpenAI to license its content, noting that the platform has a concentrated collection of women’s writing that is unlike most internet content. But OpenAI turned them down, citing interest in “datasets that are not easily accessible online,” according to Roberts.
Scrape off debris
Mumsnet is not alone in its complaints about OpenAI's use of data; it is now part of a growing cohort taking OpenAI to court over the issue. For example, the Authors Guild has sued OpenAI, alleging that copyrighted books were used to train AI models, as has a group of academics who claim that OpenAI also appropriated their articles. Reuters and The New York Times have both sued OpenAI, alleging not only that it took their data but also that ChatGPT generates responses too similar to their copyrighted articles. Even Creative Commons has filed a lawsuit against the AI developer, alleging that the company used Creative Commons-licensed content to train its AI models in ways that violated the terms of the licenses.
OpenAI has defended its practices by claiming they fall under the doctrine of fair use. In the UK, the company responded to a House of Lords inquiry acknowledging the need to use copyrighted materials to train its AI models and that it should do more to support content creators, but it still maintains that what it does is legal. While this is OpenAI’s first case in the UK on the issue, Getty Images has a similar case underway in the country’s courts against Stability AI over its image-generating AI.
The outcome of the Mumsnet lawsuit and other cases may set precedents for how AI companies handle copyrighted content and could influence future regulations and licensing practices. The effort to balance AI innovation and intellectual property rights is far from settled and likely won’t be for a long time.
To be fair, Mumsnet isn’t against LLMs and AI as a concept. In fact, Mumsnet used OpenAI’s models last year to create an AI chatbot called MumsGPT. When it was announced, MumsGPT was available only to Mumsnet executives, and it hasn’t been mentioned since, so it may no longer be available. The idea was to offer it as a research tool and even as something policymakers could use to develop parenting-related regulations. Roberts didn’t mention MumsGPT, but she did emphasize that AI has positive potential uses in her explanation of the lawsuit.
“But if LLMs are allowed to simply steal content from publishers and communities like Mumsnet, they risk destroying them,” Roberts wrote. “We know that taking on a multinational giant like OpenAI, with its $3 billion in revenue, is no easy feat given the huge resources they will throw at us, but this is too important an issue to ignore. Not just for Mumsnet, but for every website you’ve come to for news, advice, or just to ask if you’re being unreasonable.”