Understanding Watermarking and Provenance for Synthetic Media: A Closer Look at AI-Generated Content Detection

Category: Artificial Intelligence

tldr #

AI-generated images, while they can be used for art or to improve accessibility, can easily be abused to create misleading or defamatory content. The White House has called on AI companies to implement "robust technical measures," such as watermarking, to make it easier to determine when content is AI-generated. However, the term "watermark" is often used to refer to other disclosure methods, such as provenance. Clarifying these terms is an important step toward creating responsible synthetic media standards, but questions remain about who has the authority to identify AI-generated content, along with other risks associated with watermarking.


content #

In late May, the Pentagon appeared to be on fire. It wasn’t. It was AI-generated. Yet government officials, journalists, and tech companies were unable to take action before the image had real impact. It not only caused confusion but led to a dip in financial markets. Manipulated and misleading content is not a new phenomenon. But AI enables increasingly accessible, sophisticated, and hyperrealistic content creation that, while it can be used for good in artistic expression or accessibility improvements, can also be abused to cast doubt on political events or to defame, harass, and exploit.

The White House recently announced that seven of the most prominent AI companies have committed to developing measures to detect when content is AI-generated

There’s no question that we need more transparency if we’re going to be able to differentiate between what is real and what is synthetic. Last month, the White House weighed in on how to do this, announcing that seven of the most prominent AI companies have committed to "develop robust technical measures to ensure that users know when content is AI-generated, such as watermarking." But what do these measures actually involve, and will they work? To begin to answer these questions, we need to clarify what we mean by watermarking and other types of disclosure methods.

As of 2023, synthetically generated media is estimated to be worth $6.4 billion US dollars

It needs to be clear what these disclosure methods are, what we can reasonably expect them to do, and what problems remain even after they’re introduced. Although definitional debates can seem pedantic, the broad use of the term "watermark" is currently contributing to confusion and a lack of coordination across the AI sector. Defining what we mean by these different methods is a crucial prerequisite for the AI field to work together and agree on standards for disclosure. Otherwise, people are talking at cross-purposes.

AI technology is capable of generating synthetic images that are indistinguishable from authentic photos

I’ve observed this problem firsthand while leading the nonprofit Partnership on AI (PAI) in its multi-sector work to develop guidelines for responsible synthetic media, with commitments from organizations like OpenAI, Adobe, Witness, Microsoft, the BBC, and others. Further complicating matters, watermarking is often used as a "catch-all" term for the general act of providing content disclosures, even though there are many methods.

Many scientists and AI researchers consider watermarking techniques to be too easily manipulated and less secure than other forms of authentication
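To make that fragility concrete, the sketch below shows one classic (and deliberately naive) invisible-watermarking idea: hiding a short bit pattern in the least significant bits of an image’s pixels. This is a hypothetical Python illustration, not the scheme any AI company actually uses; the payload, function names, and stand-in image are assumptions.

```python
# A minimal, illustrative sketch of one classic "invisible watermark" idea:
# hiding a short bit pattern in the least significant bits of image pixels.
# This is NOT how production AI watermarks work; it simply shows why such
# signals can be fragile -- re-encoding or resizing the image destroys them.
import numpy as np

WATERMARK_BITS = [1, 0, 1, 1, 0, 0, 1, 0]  # hypothetical 8-bit payload

def embed_lsb_watermark(pixels: np.ndarray, bits=WATERMARK_BITS) -> np.ndarray:
    """Overwrite the least significant bit of the first len(bits) pixels."""
    marked = pixels.copy().ravel()
    for i, bit in enumerate(bits):
        marked[i] = (marked[i] & 0xFE) | bit  # clear the LSB, then set it to the payload bit
    return marked.reshape(pixels.shape)

def extract_lsb_watermark(pixels: np.ndarray, length=len(WATERMARK_BITS)) -> list:
    """Read the payload back out of the least significant bits."""
    return [int(p & 1) for p in pixels.ravel()[:length]]

image = np.random.randint(0, 256, size=(64, 64), dtype=np.uint8)  # stand-in image
marked = embed_lsb_watermark(image)
assert extract_lsb_watermark(marked) == WATERMARK_BITS
```

Because the mark lives in the lowest bits of the pixel values, routine operations such as compression or resizing can erase it, which is part of why researchers question how robust watermarks can be made.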

A closer reading of the White House commitments reveals that they also describe another method for disclosure known as provenance, which relies on cryptographic signatures rather than invisible signals. However, this is often described as watermarking in the popular press. If you find this mish-mash of terms confusing, rest assured you’re not the only one. But clarity matters: the AI sector cannot implement consistent and robust transparency measures if there is not even agreement on how we refer to the different techniques.

Provenance uses cryptographic signatures to authenticate the creator of a given synthetic media file
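As a rough, hedged illustration of how provenance differs from an embedded signal, the Python sketch below has a creator (or a generating tool) sign a hash of the file plus some metadata with a private key; anyone holding the matching public key can then verify who produced the file and that it has not been altered. Real provenance frameworks are far richer than this, and every field and name here is an assumption for illustration (the example uses the third-party cryptography package).

```python
# A simplified sketch of the provenance idea: the creator (or the generating tool)
# signs a content hash plus metadata with a private key, and anyone with the
# matching public key can verify who produced the file and that it hasn't changed.
# Names and fields here are illustrative only.
import json, hashlib
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

creator_key = Ed25519PrivateKey.generate()   # held by the creator or tool
public_key = creator_key.public_key()        # distributed for verification

def make_provenance_record(content: bytes, creator: str, tool: str) -> dict:
    claim = {
        "sha256": hashlib.sha256(content).hexdigest(),  # binds the claim to the exact bytes
        "creator": creator,
        "tool": tool,
    }
    payload = json.dumps(claim, sort_keys=True).encode()
    return {"claim": claim, "signature": creator_key.sign(payload).hex()}

def verify_provenance_record(content: bytes, record: dict) -> bool:
    payload = json.dumps(record["claim"], sort_keys=True).encode()
    try:
        public_key.verify(bytes.fromhex(record["signature"]), payload)
    except InvalidSignature:
        return False
    return record["claim"]["sha256"] == hashlib.sha256(content).hexdigest()

media = b"...synthetic image bytes..."
record = make_provenance_record(media, creator="example-studio", tool="example-image-model")
assert verify_provenance_record(media, record)
assert not verify_provenance_record(media + b"tampered", record)
```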

Even if the AI sector agrees to implement invisible watermarks, deeper questions are inevitably going to emerge around who has the capacity to detect these signals and eventually make authoritative claims based on them. Who gets to decide whether content is AI-generated, and perhaps by extension, whether it is misleading? If everyone can detect watermarks, that might render them susceptible to misuse by bad actors.

Establishing clear definitions of terms like watermarking and provenance is a crucial step for the AI field in creating robust disclosure methods

On the other hand, controlled access to the detection of invisible watermarks, especially if it is dictated by large AI companies, might degrade openness and entrench technical gatekeeping. Implementing responsible, transparent standards for synthetic media can never be a static process. It demands close tracking of technological, economic, and cultural shifts, and continual course correction if needed.
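One way to see the tradeoff between open and controlled detection is the toy sketch below: a faint pattern derived from a secret key is added to generated content, so only parties holding the key can run the detector. This is a simplified, assumption-laden illustration, not any company’s actual scheme; the key, strength, and threshold values are hypothetical.

```python
# A toy sketch of the access tradeoff discussed above: if the watermark pattern is
# derived from a secret key, only key-holders can detect it (controlled detection);
# publishing the key makes detection open but also makes the signal easier to
# strip or forge. All names and parameters here are hypothetical.
import hmac, hashlib
import numpy as np

def keyed_pattern(key: bytes, length: int) -> np.ndarray:
    """Derive a pseudo-random +/-1 pattern from a secret key."""
    digest = hmac.new(key, b"watermark-pattern", hashlib.sha256).digest()
    rng = np.random.default_rng(int.from_bytes(digest[:8], "big"))
    return rng.choice([-1.0, 1.0], size=length)

def embed(signal: np.ndarray, key: bytes, strength: float = 0.05) -> np.ndarray:
    """Add a faint key-derived pattern to the signal (e.g. image pixels as floats)."""
    return signal + strength * keyed_pattern(key, signal.size).reshape(signal.shape)

def detect(signal: np.ndarray, key: bytes, threshold: float = 0.01) -> bool:
    """Correlate against the key-derived pattern; only key-holders can run this."""
    pattern = keyed_pattern(key, signal.size).reshape(signal.shape)
    return float(np.mean(signal * pattern)) > threshold

secret_key = b"held-by-the-model-provider"   # controlled-access scenario
content = np.zeros((32, 32))                 # stand-in for generated pixel data
marked = embed(content, secret_key)
assert detect(marked, secret_key)            # key-holder detects the mark
assert not detect(content, secret_key)       # unmarked content is not flagged
```

Publishing the key would let anyone run the detector, but it would also make the signal easier to strip or forge, which is exactly the tension between openness and gatekeeping described above.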

