OpenAI has developed new tools to detect content generated by ChatGPT and its other AI models, but it is not rolling them out yet. The company has come up with a way to overlay a type of watermark onto AI-produced text. This built-in indicator could help identify when AI has written a piece of content. However, OpenAI is hesitant to offer the feature because it could harm people who use its models for benign purposes.
OpenAI’s new method would employ algorithms capable of embedding subtle markers in the text generated by ChatGPT. Though the markers would be invisible to a human reader, the tool would rely on specific patterns of words and phrases that signal that the text originated with ChatGPT. There are obvious reasons why this could be a boon for generative AI as an industry, as OpenAI notes. Watermarking could play a critical role in combating misinformation, ensuring transparency in content creation, and preserving the integrity of digital communications. It’s also similar to a tactic OpenAI already employs for its AI-generated images. The DALL-E 3 text-to-image model produces visuals with metadata explaining their AI origin, including invisible digital watermarks designed to survive attempts to remove them through editing.
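OpenAI has not published the details of its method, but a common approach in the research literature, the "green list" scheme of Kirchenbauer et al. (2023), gives a sense of how statistical text watermarking can work: the model's sampling is nudged toward a pseudorandomly chosen subset of the vocabulary at each step, and a detector later checks whether a text contains an improbably high share of those favored tokens. The Python sketch below is a toy illustration of that idea, not OpenAI's actual algorithm; the vocabulary, hashing scheme, and green-list fraction are all invented for the example.

```python
import hashlib
import random

# Toy vocabulary; a real model has tens of thousands of tokens.
VOCAB = ["the", "a", "model", "text", "can", "will", "write", "reads",
         "quickly", "slowly", "clear", "hidden", "words", "marks"]

def green_list(prev_token: str, fraction: float = 0.5) -> set[str]:
    """Pseudorandomly partition the vocabulary based on the previous token.
    Tokens in the returned 'green' set are favored during generation."""
    seed = int(hashlib.sha256(prev_token.encode()).hexdigest(), 16)
    rng = random.Random(seed)
    return set(rng.sample(VOCAB, int(len(VOCAB) * fraction)))

def generate_watermarked(length: int, start: str = "the") -> list[str]:
    """Toy 'generation': at each step, sample only from the green list.
    A real model would instead add a small bias to green-token logits."""
    tokens = [start]
    for _ in range(length):
        tokens.append(random.choice(sorted(green_list(tokens[-1]))))
    return tokens

def detect(tokens: list[str]) -> float:
    """Fraction of tokens that fall in the green list implied by their
    predecessor. Unwatermarked text should score near the green fraction."""
    hits = sum(1 for prev, tok in zip(tokens, tokens[1:])
               if tok in green_list(prev))
    return hits / max(len(tokens) - 1, 1)

watermarked = generate_watermarked(50)
unmarked = [random.choice(VOCAB) for _ in range(50)]
print(f"watermarked score:   {detect(watermarked):.2f}")  # close to 1.0
print(f"unwatermarked score: {detect(unmarked):.2f}")     # close to 0.5
```

In a real system the bias is applied to the model's logits during sampling and detection uses a statistical test over longer passages, but the core property is the same: the watermark lives in word choice itself, which is why rewording the text can erase it.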
But words aren’t the same as images. Even under the best of circumstances, OpenAI admitted, all it would take is a third-party tool to reword the AI-generated text and effectively make the watermark disappear. And while OpenAI’s new approach could work in many cases, the company was quick to highlight its limits, and to explain why deploying even a successful watermark might not always be desirable.
“While it has been highly accurate and even effective against localized manipulation, such as paraphrasing, it is less robust against globalized manipulation — such as using translation systems, rephrasing with another generative model, or asking the model to insert a special character between each word and then removing that character — making it easy to bypass by malicious actors,” OpenAI explained in a blog post. “Another important risk we are weighing is that our research suggests the text watermarking method has the potential to disproportionately impact some groups.”
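The special-character trick OpenAI mentions is easy to picture. Detection depends on the exact token sequence the model emitted, so if a user asks the model to insert a marker between every word and then strips that marker out, the surviving text never passed through the watermarked sampler in that form. A minimal sketch, where the "@" marker and the sample sentence are our own illustration:

```python
# Hypothetical bypass: the model was asked to emit "@" between words,
# so any watermark was computed over tokens like "the@quick@brown...".
model_output = "the@quick@brown@fox@jumps@over@the@lazy@dog"

# Stripping the marker yields clean prose whose token sequence no longer
# matches what the model generated, so a detector sees ordinary text.
clean_text = model_output.replace("@", " ")
print(clean_text)  # "the quick brown fox jumps over the lazy dog"
```

Run against a detector like the toy one sketched above, the stripped text would score like ordinary unwatermarked prose, which is exactly the robustness gap OpenAI is describing.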
AI Authorship Seal
OpenAI fears that the negative consequences of launching such AI watermarks would outweigh any positive impact. The company specifically cited people who use ChatGPT for productivity tasks, but watermarking could also invite stigmatization or outright criticism of anyone who relies on generative AI tools, regardless of who they are or how they use them.
This could disproportionately affect non-English-speaking ChatGPT users who rely on translation to create content in another language. The presence of watermarks could create barriers for these users, reducing the effectiveness and acceptance of AI-generated content in multilingual contexts. And users who know their content can be easily identified as AI-generated might abandon the tool altogether.
It’s worth noting that this isn’t OpenAI’s first foray into AI text detection. The company shut down its previous detector after just six months and later said that such tools are ineffective in general, which explains why there’s no such option in a teacher’s guide to using ChatGPT. Still, the update suggests that the search for a reliable way to detect AI text, without creating problems that drive people away from AI text generators, is far from over.