AI Watermarking Won't Curb Disinformation

Generative AI allows people to produce piles upon piles of images and words very quickly. It would be nice if there were some way to reliably distinguish AI-generated content from human-generated content. It would help people avoid endlessly arguing with bots online, or believing what a fake image purports to show. One common proposal is that big companies should incorporate watermarks into the outputs of their AIs. For instance, this could involve taking an image and subtly changing many pixels in a way that’s undetectable to the eye but detectable to a computer program. Or it could involve swapping words for synonyms in a predictable way so that the meaning is unchanged, but a program could readily determine the text was generated by an AI.

Unfortunately, watermarking schemes are unlikely to work. So far most have proven easy to remove, and it’s likely that future schemes will have similar problems.

One kind of watermark is already common for digital images. Stock image sites often overlay text on an image that renders it mostly useless for publication. This kind of watermark is visible and is slightly challenging to remove since it requires some photo editing skills.

Images can also have metadata attached by a camera or image processing program, including information like the date, time, and location a photograph was taken, the camera settings, or the creator of an image. This metadata is unobtrusive but can be readily viewed with common programs. It’s also easily removed from a file. For instance, social media sites often automatically remove metadata when people upload images, both to prevent people from accidentally revealing their location and simply to save storage space.
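As a rough illustration of how easily metadata disappears, here is a minimal Python sketch using the Pillow library (the file names are placeholders): it reads a photo’s EXIF data, then re-encodes the pixels into a fresh file without copying the metadata, which is essentially what upload pipelines do:

```python
# Read a photo's EXIF metadata, then write a new file containing only the
# pixels. The EXIF block is simply never copied over.
from PIL import Image

img = Image.open("photo.jpg")
print(dict(img.getexif()))           # date, camera settings, possibly GPS tags

clean = Image.new(img.mode, img.size)
clean.putdata(list(img.getdata()))   # copy pixel data only
clean.save("photo_clean.jpg")        # saved without the original metadata
```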

A useful watermark for AI images would need two properties: 

  • It would need to continue to be detectable after an image is cropped, rotated, or edited in various ways (robustness). 
  • It couldn’t be conspicuous like the watermark on stock image samples, because the resulting images wouldn’t be of much use to anybody (unobtrusiveness).

One simple technique is to manipulate the least perceptible bits of an image. For instance, to a human viewer, two squares filled with the colors #93c47d and #93c57d look like the exact same shade of green.

But to a computer it’s obvious that they differ by a single bit. Each pixel of an image is represented by a certain number of bits, and some of them make more of a perceptual difference than others. By manipulating those least-important bits, a watermarking program can create a pattern that viewers won’t see, but a watermark-detecting program will. If that pattern repeats across the whole image, the watermark is even robust to cropping. However, this method has one clear flaw: rotating or resizing the image is likely to accidentally destroy the watermark.
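To make this concrete, here is a toy Python sketch of LSB watermarking. The repeating bit pattern and the choice to mark only the red channel are illustrative assumptions, not any real product’s scheme:

```python
# Toy least-significant-bit (LSB) watermark: hide a repeating bit pattern in
# the red channel's lowest bit. Pixels are plain (R, G, B) tuples here; a real
# implementation would read and write image files.
PATTERN = [1, 0, 1, 1, 0, 1, 0, 0]  # hypothetical secret watermark bits

def embed(pixels):
    """Overwrite each red channel's LSB with the repeating pattern."""
    out = []
    for i, (r, g, b) in enumerate(pixels):
        bit = PATTERN[i % len(PATTERN)]
        out.append(((r & ~1) | bit, g, b))  # clear the LSB, then set it to `bit`
    return out

def detect(pixels):
    """Return the fraction of red-channel LSBs that match the pattern."""
    hits = sum((r & 1) == PATTERN[i % len(PATTERN)]
               for i, (r, _, _) in enumerate(pixels))
    return hits / len(pixels)

original = [(0x93, 0xC4, 0x7D)] * 64     # 64 identical green pixels
marked = embed(original)
print(detect(original), detect(marked))  # ~0.5 (chance) vs. 1.0 (watermarked)
```

The marked pixels differ from the originals by at most one bit per channel, which is exactly why the change is invisible to a human viewer.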

There are more sophisticated watermarking proposals that are robust to a wider variety of common edits. However, proposals for AI watermarking must pass a tougher challenge. They must be robust against someone who knows about the watermark and wants to eliminate it. The person who wants to remove a watermark isn’t limited to common edits, but can directly manipulate the image file. For instance, if a watermark is encoded in the least important bits of an image, someone could remove it by simply setting all the least important bits to 0, or to a random value (1 or 0), or to a value automatically predicted based on neighboring pixels. Just like adding a watermark, removing a watermark this way gives an image that looks basically identical to the original, at least to a human eye.
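Continuing the toy example above, the removal attack is even simpler than the embedding (this reuses the detect function and marked pixels from the previous sketch):

```python
# Removal attack on the sketch above: force every least significant bit to
# zero (or a random value). Any LSB watermark is destroyed, yet the image
# remains visually identical to a human eye.
import random

def strip_lsbs(pixels, randomize=False):
    out = []
    for pixel in pixels:
        scrubbed = tuple((c & ~1) | (random.getrandbits(1) if randomize else 0)
                         for c in pixel)
        out.append(scrubbed)
    return out

print(detect(strip_lsbs(marked)))  # back to ~0.5: indistinguishable from chance
```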

Coming at the problem from the opposite direction, some companies are working on ways to prove that an image came from a camera (“content authenticity”). Rather than marking AI-generated images, they add metadata to camera-generated images, and use cryptographic signatures to prove the metadata is genuine. This approach is more workable than watermarking AI-generated images, since there’s no incentive to remove the mark. In fact, the incentive runs the other way: publishers want to keep this metadata around because it helps establish that their images are “real.” But it’s still a fiendishly complicated scheme, since the chain of verifiability has to be preserved through all software used to edit photos. And most cameras will never produce this metadata, meaning that its absence can’t be used to prove a photograph is fake.
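Here is a minimal sketch of the cryptographic core, using Python’s cryptography library; the metadata fields and key handling are simplified assumptions, and real systems such as C2PA define much richer manifests and certificate chains:

```python
# Simplified sketch: a key embedded in the camera signs a digest of the image
# plus its metadata, so anyone with the matching public key can check that
# neither was altered after capture. All names and fields here are made up.
import hashlib
import json
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

camera_key = Ed25519PrivateKey.generate()  # would live in tamper-resistant hardware

def sign_capture(image_bytes: bytes, metadata: dict):
    payload = json.dumps(
        {"sha256": hashlib.sha256(image_bytes).hexdigest(), **metadata},
        sort_keys=True,
    ).encode()
    return payload, camera_key.sign(payload)

payload, sig = sign_capture(b"...raw sensor data...",
                            {"taken": "2023-11-01T12:00Z", "model": "ExampleCam"})

# Verification succeeds silently; it raises InvalidSignature if the payload
# or signature was tampered with.
camera_key.public_key().verify(sig, payload)
```

Note the asymmetry the article describes: a publisher has every reason to keep this signature intact, whereas a watermark’s adversary has every reason to strip it.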

Watermarking and content authenticity are mirror images of each other: watermarking aims to identify or mark (some) fake images, while content authenticity aims to identify or mark (some) real images. Neither approach is comprehensive, since most of the images on the Internet will have neither a watermark nor content authenticity metadata.

                        Watermarking   Content authenticity
AI images               Marked         Unmarked
(Some) camera images    Unmarked       Marked
Everything else         Unmarked       Unmarked

Text-based Watermarks

The watermarking problem is even harder for text-based generative AI. Similar techniques can be devised. For instance, an AI could boost the probability of certain words, giving itself a subtle textual style that would go unnoticed most of the time, but could be recognized by a program with access to the list of words. This would effectively be a computer version of determining the authorship of the twelve disputed essays in The Federalist Papers by analyzing Madison’s and Hamilton’s habitual word choices.
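A toy version of the detection side, assuming a hypothetical secret word list (published proposals instead bias token probabilities during generation, but the statistical test is similar in spirit):

```python
# Toy detector: with access to the secret "green list," check whether those
# words appear far more often than they would in ordinary human writing.
# The list, base rate, and threshold are made-up illustrations.
GREEN_LIST = {"moreover", "consequently", "utilize", "demonstrate"}

def green_fraction(text: str) -> float:
    words = text.lower().split()
    return sum(w in GREEN_LIST for w in words) / max(len(words), 1)

def looks_watermarked(text: str, base_rate: float = 0.01) -> bool:
    # Flag text whose green-word rate is well above the human base rate.
    return green_fraction(text) > 10 * base_rate
```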

But creating an indelible textual watermark is a much harder task than telling Hamilton from Madison, since the watermark must be robust to someone modifying the text in an attempt to remove it. Any watermark based on word choice is likely to be defeated by some amount of rewording. That rewording could even be performed by another AI, perhaps one less sophisticated than the one that generated the original text, but not subject to a watermarking requirement.

There’s also a problem of whether the tools to detect watermarked text are publicly available or are secret. Making detection tools publicly available gives an advantage to those who want to remove watermarking, because they can repeatedly edit their text or image until the detection tool gives an all clear. But keeping them a secret makes them dramatically less useful, because every detection request must be sent to whatever company produced the watermarking. That would potentially require people to share private communication if they wanted to check for a watermark. And it would hinder attempts by social media companies to automatically label AI-generated content at scale, since they’d have to run every post past the big AI companies.

Since text output from current AIs isn’t watermarked, services like GPTZero and TurnItIn have popped up, claiming to be able to detect AI-generated content anyhow. These detection tools are so inaccurate as to be dangerous, and have already led to false charges of plagiarism.

Lastly, if AI watermarking is to prevent disinformation campaigns sponsored by states, it’s important to keep in mind that those states can readily develop modern generative AI, and probably will in the near future. A state-sponsored disinformation campaign is unlikely to be so polite as to watermark its output.

Watermarking of AI-generated content is an easy-sounding fix for the thorny problem of disinformation. And watermarks may be useful in understanding reshared content where there is no deceptive intent. But research into adversarial watermarking for AI is just beginning, and while there’s no strong reason to believe it will succeed, there are some good reasons to believe it will ultimately fail.

Article 45 Will Roll Back Web Security by 12 Years

The EU is poised to pass a sweeping new regulation, eIDAS 2.0. Buried deep in the text is Article 45, which returns us to the dark ages of 2011, when certificate authorities (CAs) could collaborate with governments to spy on encrypted traffic—and get away with it. Article 45 forbids browsers from enforcing modern security requirements on certain CAs without the approval of an EU member government. Which CAs? Specifically, the CAs appointed by EU member governments, which in some cases will be owned or operated by those selfsame governments. That means cryptographic keys under a government’s control could be used to intercept HTTPS communication throughout the EU and beyond.

This is a catastrophe for the privacy of everyone who uses the internet, but particularly for those who use the internet in the EU. Browser makers have not announced their plans yet, but it seems inevitable that they will have to create two versions of their software: one for the EU, with security checks removed, and another for the rest of the world, with security checks intact. We’ve been down this road before, when export controls on cryptography meant browsers were released in two versions: strong cryptography for US users, and weak cryptography for everyone else. It was a fundamentally inequitable situation and the knock-on effects set back web security by decades.

The current text of Article 45 requires that browsers trust CAs appointed by governments, and prohibits browsers from enforcing any security requirements on those CAs beyond what is approved by ETSI. In other words, it sets an upper bar on how much security browsers can require of CAs, rather than setting a lower bar. That in turn limits how vigorously browsers can compete with each other on improving security for their users.

This upper bar on security may even ban browsers from enforcing Certificate Transparency, an IETF technical standard that ensures a CA’s issuing history can be examined by the public in order to detect malfeasance. Banning CT enforcement makes it much more likely for government spying to go undetected.

Why is this such a big deal? The role of a CA is to bootstrap encrypted HTTPS communication with websites by issuing certificates. The CA’s core responsibility is to match website names with customers, so that the operator of a website can get a valid certificate for that website, but no one else can. If someone else gets a certificate for that website, they can use it to intercept encrypted communications, meaning they can read private information like emails.
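To see that trust decision in action, here is a short Python sketch, with the ssl module standing in for a browser; the handshake below succeeds only if the site presents a certificate for that exact hostname, issued by a CA in the trusted root set:

```python
# The handshake verifies the certificate chain and hostname automatically.
import socket
import ssl

ctx = ssl.create_default_context()  # loads the system's trusted CA roots
with socket.create_connection(("example.com", 443)) as sock:
    with ctx.wrap_socket(sock, server_hostname="example.com") as tls:
        cert = tls.getpeercert()
        print("subject:", cert["subject"])
        print("issuer: ", cert["issuer"])

# A certificate for the wrong name, or one that doesn't chain to a trusted CA,
# raises ssl.SSLCertVerificationError instead. Article 45's effect would be to
# force government-appointed CAs into that trusted set.
```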

We know HTTPS encryption is a barrier to government spying because of the NSA’s famous “SSL added and removed here” note. We also know that misissued certificates have been used to spy on traffic in the past. For instance, in 2011 DigiNotar was hacked, and the resulting fraudulent certificates were used to intercept emails for people in Iran. In 2015, CNNIC issued an intermediate certificate that was used to intercept traffic to a variety of websites. Each CA was subsequently distrusted.

Distrusting a CA is just one end of a spectrum of technical interventions browsers can take to improve the security of their users. Browsers operate “root programs” to monitor the security and trustworthiness of the CAs they trust. Those root programs impose a number of requirements, varying from “how must key material be secured” to “how must validation of domain name control be performed” to “what algorithms must be used for certificate signing.” As one example, certificate security rests critically on the security of the hash algorithm used. The SHA-1 hash algorithm, published in 1995, was shown to be insecure by 2005, and NIST disallowed its use for digital signatures in 2013. However, CAs didn’t stop using it until 2017, and that only happened because one browser made SHA-1 removal a requirement of its root program. After that, the other browsers followed suit, along with the CA/Browser Forum.

The removal of SHA-1 illustrates the backwards security incentives for CAs. A CA serves two audiences: its customers, who get certificates from it, and the rest of the internet, which trusts it to provide security. When it comes time to raise the bar on security, a CA will often hear from customers that upgrading is difficult and expensive, as it sometimes is. That motivates the CA to drag its feet and keep offering the insecure technology. But the CA’s other audience, the population of global internet users, needs it to continually improve security. That’s why browser root programs need to (and do) require a steadily increasing level of security from CAs. The root programs advocate for the needs of their users so that browsers can provide a more secure product. The security of a browser’s root program is, in a very real way, a determining factor in the security of the browser itself.

That’s why it’s so disturbing that eIDAS 2.0 is poised to prevent browsers from holding CAs accountable. By all means, raise the bar for CA security, but permanently lowering the bar means less accountability for CAs and less security for internet users everywhere.

The text isn't final yet, but is subject to approval behind closed doors in Brussels on November 8.

Passkeys and Privacy

This is part 2 of our series on passkeys. See part 1, “What the !#@% is a Passkey?”, below.

In our previous article we described what a passkey is: a few hundred bytes of data stored in your password manager, security key, or elsewhere, which allows you to log in to a specific website without a password. The good news is that passkeys are quite well designed from a privacy point of view, even though they give a little more information to websites than a plain old password.

Cross-site Tracking

One of the most important attributes for passkeys is that they shouldn’t enable cross-site tracking. In other words, if you create a passkey on site A, and create a different passkey on site B using a different name, email address, and IP address, the two sites shouldn’t be able to correlate the separate identities, even if they’re sharing information behind the scenes.

Passkeys satisfy this requirement. Each passkey you create is unique, though there are some small caveats to be aware of.

If you store your passkey in a security key or TPM, websites can request the make and model of your device (depending on whether the browser allows it). Usually this only identifies a broad category of common devices. For instance, Chrome’s policy on security keys “expects” each distinct make and model to represent at least 100,000 devices. In the past, some manufacturers shipped security keys where each one had a uniquely identifying make and model, which was a major privacy flaw. It’s possible other manufacturers will make the same mistake, but it’s likely that browsers would block such flawed devices. In general, consumer-facing websites should avoid requesting make and model information, since this feature is intended primarily for companies managing their internal login infrastructure.

If you store your passkey in a password manager, websites can learn which password manager you are using.

Similarly, some security keys may implement a “signature counter” for the passkeys stored on them. A good implementation maintains a separate counter for each site, but some security keys keep a single signature counter shared across all passkeys. Unrelated sites that collude can compare those counter values to try to correlate your identity. You can ask the manufacturer of your security key how it handles signature counters.
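To see why a shared counter is a tracking vector, consider this toy model (all names are hypothetical):

```python
# A flawed security key with one global signature counter shared by all
# passkeys. Every site it logs into sees a slice of the same sequence.
class NaiveSecurityKey:
    def __init__(self):
        self.counter = 0  # shared across all passkeys: the privacy flaw

    def sign_login(self, site: str) -> dict:
        self.counter += 1
        return {"site": site, "counter": self.counter}

key = NaiveSecurityKey()
log_a = [key.sign_login("site-a.example") for _ in range(3)]
log_b = [key.sign_login("site-b.example") for _ in range(3)]

# If the two sites compare notes, the interleaved values (1, 2, 3 then 4, 5, 6)
# suggest a single physical device behind both "unrelated" accounts.
print([e["counter"] for e in log_a], [e["counter"] for e in log_b])
```

A per-site counter removes the correlation, since each site then sees a sequence that starts at 1 and reveals nothing about activity elsewhere.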

Biometrics

When using a passkey your phone or computer might prompt you to use a fingerprint or facial recognition. This step is to demonstrate to your device that it’s really you. Your fingerprint, face, or unlock code isn’t sent to the website. Instead, your browser tells the site that “user verification” was successful. This will generally only happen if you already use a fingerprint or facial recognition to unlock your device. If you prefer not to use biometrics at all, you can use your screen unlock PIN or pattern instead.

Shared accounts

For accounts that you share with someone else, passkeys change the privacy situation slightly. With passwords, a website doesn’t know whether it’s you or your friend typing in the password. With passkeys, you’ll most likely need to generate two passkeys for the account: one for you and one for your friend. Each of you can log in using your own passkey, but the site will know which passkey is logging in.

Lost or stolen device

If you store passkeys on a security key, someone who has physical access to your security key can list all the passkeys, including which sites they belong to. Some security keys have a setting to require a PIN before listing the passkeys, in addition to the normal requirement to enter a PIN before logging in with a passkey.

If you store passkeys in a password manager, someone who has physical access to your device and can unlock your password manager will get a list of all the sites for which you have passkeys and passwords, not to mention the ability to log in to those sites! If you have a secret account and need to protect against someone with physical access to your devices, passwords may be a better option; just be sure to also use incognito / private browsing mode, and be aware that phishing is still a risk.

Cloud accounts

For most people, the most convenient password manager will be the one built into their operating system: Windows Hello, Google Password Manager (on Android and ChromeOS), or iCloud Keychain. To use them, you’ll have to be logged in with your Microsoft, Google, or Apple account. If you’re not already logged into one of those cloud accounts, logging in may prompt you to share a pile of additional data, like your browsing history and bookmarks. In general you can turn off those extra “sync” features but it requires a little extra attention.

You can also use a third-party password manager, which won’t try to sync all your extra data in addition to your passwords.

Conclusion

For most purposes, passkeys will represent a significant improvement in security at nearly zero cost to privacy. As described in the previous post, there are still significant growing pains in the passkey ecosystem, but they will likely be resolved in the near future.

What the !#@% is a Passkey?

This is part 1 of our series on passkeys. Part 2, “Passkeys and Privacy,” appears above.

A new login technique is becoming available in 2023: the passkey. The passkey promises to solve phishing and prevent password reuse. But lots of smart and security-oriented folks are confused about what exactly a passkey is. There’s a good reason for that. A passkey is in some sense one of two (or three) different things, depending on how it’s stored.

First off: is a passkey one of those little plastic things you stick in your USB port for two-factor authentication? No, that’s a security key. More on security keys in a minute. A passkey is also not something you can type in; it’s not a password, passcode, passphrase, or a PIN.

A passkey is approximately 100-1400 bytes of random data [1], generated on your device (like your phone, laptop, or security key) for the purpose of logging in on a specific website. Once the passkey is generated, your browser registers it with the website and it gets stored somewhere safe (for instance, your password manager). From then on, you can use that passkey to log in to that website without entering a password. When you go to a website’s login page, you’ll have the option to “Sign in with a passkey.” If you choose that option, you’ll get a confirmation prompt from your password manager, and will be logged in after confirming. For all this to work, there needs to be passkey support in the website, your browser, your password manager, and usually also your operating system.
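Under the hood, that random data is a public/private key pair (see footnote [1]). Stripped of the WebAuthn protocol details, the core registration-and-login exchange looks roughly like this sketch, which uses Python’s cryptography library:

```python
# Stripped-down passkey sketch: a per-site key pair. The website stores only
# the public key; logging in means signing a fresh challenge from the site.
import os
from cryptography.hazmat.primitives.asymmetric.ec import (
    ECDSA, SECP256R1, generate_private_key)
from cryptography.hazmat.primitives.hashes import SHA256

# Registration: your device generates a key pair just for this website and
# sends the site the public half.
device_key = generate_private_key(SECP256R1())
stored_by_website = device_key.public_key()

# Login: the site sends a random challenge, your device signs it, and the
# site verifies the signature against the stored public key.
challenge = os.urandom(32)
signature = device_key.sign(challenge, ECDSA(SHA256()))
stored_by_website.verify(signature, challenge, ECDSA(SHA256()))  # no exception: success
```

Because the website never holds the private key, a breach of its database leaks nothing that lets an attacker log in as you.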

You can create many passkeys: each passkey unlocks a single account on a single website. For multiple accounts on a single website, you can have multiple passkeys for that website. For instance, if you have a social media account for personal use and one for business, you would have different passkeys for each account.

You can usually have both a password and a passkey on your account [2], and can log in with either. Logging in with a passkey is generally faster, since your password manager will offer to do it in a single click, instead of the multiple clicks that logging in with a password usually takes. Also, logging in with a passkey typically lets you skip traditional two-factor authentication (SMS, authenticator app, or security key).

Why is it safe for passkeys to skip traditional two-factor authentication? Passkeys build in a second factor. Each time you use the passkey to log in, your browser or operating system may ask you to re-enter your device unlock PIN. If you use a fingerprint or facial recognition to unlock your device, your browser might instead request you re-enter your fingerprint or show your face, to confirm that it’s really you asking to log in. That gives two factors of authentication: the device that stores your passkey is something you have, and it’s accompanied by something you know (the PIN) or something you are (a fingerprint or a face).

Storage and Backup

A passkey stored on just one computer or phone isn’t that useful. What if you want to log in from a different device? What if your device falls in the toilet? There are at least three solutions here and they’re very different, which is part of why passkeys are in practice three very different things.

  • Solution 1: Passkeys are stored in the password manager, which encrypts them, backs them up to the cloud, and helps you copy them onto all of your devices.
  • Solution 2a: Passkeys are created and stored in a physical security key that you plug in via USB [3]. To log in on a different device, you plug in the security key when prompted. Passkeys created this way can’t be copied. Only recently made security keys support this.
  • Solution 2b: Passkeys are created and stored on a high-security chip built into your computer or phone (for instance, a TPM or Secure Enclave, available on most devices made in the last few years). Like solution 2a, these passkeys can’t be copied.

Solutions 2a and 2b are less convenient (and solution 2a costs a little bit of money, to buy a security key). But they offer a higher level of security against someone stealing your devices. With solution 1, someone who steals your computer might be able to copy the passkeys if your password manager is unlocked.

Also, solutions 2a and 2b don’t really solve the “device falls in toilet” problem. If you’re using one of those solutions, you should have multiple passkeys stored on different devices as backup. Alternatively you may wind up relying on email-based account recovery.

If you’re using solution 1, you trust your password manager to keep your passkeys secure. Also note that password managers generally won’t let you export a copy of your passkeys for offline backup.

How do passkeys prevent phishing?

Each passkey contains a record of which domain name the passkey was created for. If someone sends you a link to a login page on a lookalike domain name, you may be fooled but your browser will not, since browsers can easily check for an exact match. So your browser will not send the passkey to the lookalike domain name and you’ll be safe.
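The check itself is trivial for a browser, which is exactly why it beats human vigilance. A sketch (the function name is made up; real browsers perform this match as part of the WebAuthn protocol):

```python
# The domain recorded in the passkey must exactly match the requesting domain.
def may_use_passkey(passkey_domain: str, requesting_domain: str) -> bool:
    return passkey_domain == requesting_domain

print(may_use_passkey("example.com", "example.com"))  # True
print(may_use_passkey("example.com", "examp1e.com"))  # False: lookalike rejected
```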

However, so long as you still have a memorized password in addition to your passkey, a lookalike site could tell you your passkey isn't working and you need to enter the password instead. If you do enter the password, the phishing attack will succeed. So phishing is still possible, but someone who typically logs in on a given site with a passkey is more likely to get suspicious when asked to enter a password instead, which provides some protection even if it’s not complete protection.

Should I use passkeys?

Like all security and privacy topics, the answer is “it depends.” But for most people, passkeys are a good idea. If you’re already using a password manager, generating long unique passwords for each website, and always using the autofill features to log in (i.e. not copy-pasting passwords), passkeys will provide a slightly higher level of security with significantly more convenience.

If you’re not already using a password manager, passkeys will be a tremendous increase in security (and will also require you to start using a password manager).

For sites where you are using two factor authentication (2FA), passkeys will be much more convenient, and may be more secure. SMS or authenticator app 2FA methods are vulnerable to phishing attacks, since a fake site can ask you for the one-time code and pass it along to the real site along with your phished password. Passkeys are more secure than SMS or authenticator app 2FA because they aren’t vulnerable to phishing; your browser knows exactly which site goes with which passkey, and isn’t tricked by fake websites.

Security key 2FA also isn’t vulnerable to phishing, so switching from security key 2FA to a passkey is mainly a matter of convenience; it means one less step during login, and one less password to remember. If you store your passkeys on a security key (protected with a PIN or biometric), you’ll achieve similar results as security key 2FA. If you store your passkeys in a password manager instead, that’s slightly less safe, because anyone who gains access to your password manager can use your passkeys, without needing physical access to your security key.

As of late 2023, passkey support is very uneven, particularly for syncing. For instance, Adam Langley says “Windows Hello doesn’t sync at all, Google Password Manager can only sync between Android devices, and iCloud Keychain only works on Apple devices.” Even once those problems are solved, cross-ecosystem syncing (for instance between iOS and Windows) will remain a big problem. Third-party password managers 1Password, Bitwarden, and Dashlane have passkey support and can sync across ecosystems. But they don’t necessarily support all platforms yet (for instance, 1Password doesn’t fully support passkeys on Android as of October 2023). If you want to try out passkeys on a throwaway account, you can create one on passkeys.io or webauthn.io.

If you like being an early adopter, go ahead and give passkeys a try. You may run into stumbling blocks along the way and have to fall back to that embattled ancient tool, the password.

More about passkeys in part 2, Passkeys and Privacy.

  • [1] A cryptographic public/private key pair.
  • [2] This is true in 2023, though if passkeys see wide adoption, some websites might let you sign up by generating a passkey, and never have a password at all.
  • [3] Or, less commonly, one that connects via NFC or Bluetooth Low Energy (BLE).
