AI Watermarking Won't Curb Disinformation
Generative AI allows people to produce piles upon piles of images and words very quickly. It would be nice if there were some way to reliably distinguish AI-generated content from human-generated content. It would help people avoid endlessly arguing with bots online, or believing what a fake image purports to show. One common proposal is that big companies should incorporate watermarks into the outputs of their AIs. For instance, this could involve taking an image and subtly changing many pixels in a way that’s undetectable to the eye but detectable to a computer program. Or it could involve swapping words for synonyms in a predictable way so that the meaning is unchanged, but a program could readily determine the text was generated by an AI.
Unfortunately, watermarking schemes are unlikely to work. So far most have proven easy to remove, and it’s likely that future schemes will have similar problems.
One kind of watermark is already common for digital images. Stock image sites often overlay text on an image that renders it mostly useless for publication. This kind of watermark is visible and is slightly challenging to remove since it requires some photo editing skills.
Images can also have metadata attached by a camera or image processing program, including information like the date, time, and location a photograph was taken, the camera settings, or the creator of an image. This metadata is unobtrusive but can be readily viewed with common programs. It’s also easily removed from a file. For instance, social media sites often automatically remove metadata when people upload images, both to prevent people from accidentally revealing their location and simply to save storage space.
A useful watermark for AI images would need two properties:
- It would need to continue to be detectable after an image is cropped, rotated, or edited in various ways (robustness).
- It couldn’t be conspicuous like the watermark on stock image samples, because the resulting images wouldn’t be of much use to anybody.
One simple technique is to manipulate the least perceptible bits of an image. For instance, to a human viewer these two squares are the same shade:
But to a computer it’s obvious that they are different by a single bit: #93c47d vs 93c57d. Each pixel of an image is represented by a certain number of bits, and some of them make more of a perceptual difference than others. By manipulating those least-important bits, a watermarking program can create a pattern that viewers won’t see, but a watermarking-detecting program will. If that pattern repeats across the whole image, the watermark is even robust to cropping. However, this method has one clear flaw: rotating or resizing the image is likely to accidentally destroy the watermark.
There are more sophisticated watermarking proposals that are robust to a wider variety of common edits. However, proposals for AI watermarking must pass a tougher challenge. They must be robust against someone who knows about the watermark and wants to eliminate it. The person who wants to remove a watermark isn’t limited to common edits, but can directly manipulate the image file. For instance, if a watermark is encoded in the least important bits of an image, someone could remove it by simply setting all the least important bits to 0, or to a random value (1 or 0), or to a value automatically predicted based on neighboring pixels. Just like adding a watermark, removing a watermark this way gives an image that looks basically identical to the original, at least to a human eye.
Coming at the problem from the opposite direction, some companies are working on ways to prove that an image came from a camera (“content authenticity”). Rather than marking AI generated images, they add metadata to camera-generated images, and use cryptographic signatures to prove the metadata is genuine. This approach is more workable than watermarking AI generated images, since there’s no incentive to remove the mark. In fact, there’s the opposite incentive: publishers would want to keep this metadata around because it helps establish that their images are “real.” But it’s still a fiendishly complicated scheme, since the chain of verifiability has to be preserved through all software used to edit photos. And most cameras will never produce this metadata, meaning that its absence can’t be used to prove a photograph is fake.
Comparing watermarking vs content authenticity, watermarking aims to identify or mark (some) fake images; content authenticity aims to identify or mark (some) real images. Neither approach is comprehensive, since most of the images on the Internet will have neither a watermark nor content authenticity metadata.
Watermarking | Content authenticity | |
AI images | Marked | Unmarked |
(Some) camera images | Unmarked | Marked |
Everything else | Unmarked | Unmarked |
Text-based Watermarks
The watermarking problem is even harder for text-based generative AI. Similar techniques can be devised. For instance, an AI could boost the probability of certain words, giving itself a subtle textual style that would go unnoticed most of the time, but could be recognized by a program with access to the list of words. This would effectively be a computer version of determining the authorship of the twelve disputed essays in The Federalist Papers by analyzing Madison’s and Hamilton’s habitual word choices.
But creating an indelible textual watermark is a much harder task than telling Hamilton from Madison, since the watermark must be robust to someone modifying the text trying to remove it. Any watermark based on word choice is likely to be defeated by some amount of rewording. That rewording could even be performed by an alternate AI, perhaps one that is less sophisticated than the one that generated the original text, but not subject to a watermarking requirement.
There’s also a problem of whether the tools to detect watermarked text are publicly available or are secret. Making detection tools publicly available gives an advantage to those who want to remove watermarking, because they can repeatedly edit their text or image until the detection tool gives an all clear. But keeping them a secret makes them dramatically less useful, because every detection request must be sent to whatever company produced the watermarking. That would potentially require people to share private communication if they wanted to check for a watermark. And it would hinder attempts by social media companies to automatically label AI-generated content at scale, since they’d have to run every post past the big AI companies.
Since text output from current AIs isn’t watermarked, services like GPTZero and TurnItIn have popped up, claiming to be able to detect AI-generated content anyhow. These detection tools are so inaccurate as to be dangerous, and have already led to false charges of plagiarism.
Lastly, if AI watermarking is to prevent disinformation campaigns sponsored by states, it’s important to keep in mind that those states can readily develop modern generative AI, and probably will in the near future. A state-sponsored disinformation campaign is unlikely to be so polite as to watermark its output.
Watermarking of AI generated content is an easy-sounding fix for the thorny problem of disinformation. And watermarks may be useful in understanding reshared content where there is no deceptive intent. But research into adversarial watermarking for AI is just beginning, and while there’s no strong reason to believe it will succeed, there are some good reasons to believe it will ultimately fail.