Detecting Deepfake Audio Created with AI-Powered Voice Cloning

Category Machine Learning

tldr #

A team of researchers from UC Berkeley has developed accurate methods for detecting deepfake audio, a growing concern on the internet. Their research involved analyzing perceptual and spectral features of audio clips and using a deep-learning model to extract multi-dimensional representations. By making their work open source, the team hopes to combat the malicious use of voice cloning and deepfake audio.


content #

With the rise of deepfakes and doctored audio, it has become increasingly difficult to trust what we see and hear on the internet. In response to this growing concern, a team of students and alumni from the School of Information at the University of California, Berkeley has conducted groundbreaking research in the field of voice cloning and deepfake audio detection.

Romit Barua, Gautham Koorma, and Sarah Barrington (all MIMS '23) first presented their work as their final project for the Master of Information Management and Systems degree program. Their research, supervised by Professor Hany Farid, aimed to develop methods for differentiating between real and cloned voices, particularly voices cloned to impersonate a specific person.

AI-powered voice cloning technology has evolved rapidly in recent years, allowing for highly convincing deepfake audio to be created.

Initially, Professor Farid had advised the team not to worry about deepfake audio, as voice cloning technology was not yet advanced enough. Just a few months later, however, AI-powered voice cloning had become highly convincing, underscoring how quickly the technology is evolving. The team's research has now become crucial in addressing this new threat of deepfake audio.

To begin their research, the team analyzed audio clips of real and fake voices, looking for perceptual features: patterns that can be identified visually. Through this approach, they identified factors such as pauses and amplitude (the consistency and variation of loudness) as important indicators of voice authenticity. However, they also found that this approach, while easy to understand, tends to produce less accurate results.
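As a rough illustration of this first approach, the sketch below computes two such perceptual cues from a raw waveform: the fraction of paused (near-silent) frames and the variation in amplitude. The function name, frame size, and silence threshold are illustrative assumptions, not the team's actual measurements.

```python
import numpy as np

def perceptual_features(wave, sr=16000, frame_ms=25, pause_db=-40.0):
    """Summarize two perceptual cues: the fraction of paused frames and
    the variation in amplitude. Thresholds here are illustrative."""
    frame_len = int(sr * frame_ms / 1000)
    n_frames = len(wave) // frame_len
    frames = wave[: n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt((frames ** 2).mean(axis=1)) + 1e-12   # per-frame loudness
    db = 20 * np.log10(rms / rms.max())                 # loudness in dB vs. peak
    return {
        "pause_ratio": float((db < pause_db).mean()),    # fraction of near-silent frames
        "amp_variation": float(rms.std() / rms.mean()),  # coefficient of variation
    }

# Toy input: one second of a 220 Hz tone followed by one second of silence.
sr = 16000
t = np.arange(sr) / sr
wave = np.concatenate([np.sin(2 * np.pi * 220 * t), np.zeros(sr)])
feats = perceptual_features(wave, sr)
# Half the frames are silent, so feats["pause_ratio"] comes out to 0.5.
```

In a real detector, statistics like these would be compared across many clips of a known speaker; a cloned voice might pause too regularly or hold an unnaturally steady amplitude.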

Voice cloning can be used for malicious purposes such as impersonation and fraud.

In their next approach, the team delved deeper, using off-the-shelf audio wave analysis packages to examine general spectral features. This analysis involved extracting over 6,000 features from the audio clips, including summary statistics and regression coefficients, before selecting the 20 most important features for comparison. In this way, Barrington, Barua, and Koorma were able to develop a more accurate method for determining the authenticity of audio clips.
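The flavor of that second approach can be sketched as follows: compute spectral summary statistics per clip (a handful here, standing in for the roughly 6,000 features the team extracted), then rank features by how well they separate real from cloned clips and keep only the top few. The specific statistics and the ranking score are illustrative assumptions, not the team's actual pipeline.

```python
import numpy as np

def spectral_summary(wave, sr=16000):
    """A few spectral summary statistics for one clip (illustrative)."""
    spec = np.abs(np.fft.rfft(wave))
    freqs = np.fft.rfftfreq(len(wave), 1 / sr)
    p = spec / spec.sum()                                  # normalized spectrum
    centroid = float((freqs * p).sum())                    # spectral centroid (Hz)
    spread = float(np.sqrt(((freqs - centroid) ** 2 * p).sum()))
    rolloff = float(freqs[np.searchsorted(np.cumsum(p), 0.85)])
    flatness = float(np.exp(np.log(spec + 1e-12).mean()) / (spec.mean() + 1e-12))
    return np.array([centroid, spread, rolloff, flatness])

def top_k_features(X, y, k=2):
    """Rank features by absolute standardized mean difference between the
    two classes (0 = real, 1 = cloned) and keep the k most discriminative."""
    m0, m1 = X[y == 0].mean(0), X[y == 1].mean(0)
    score = np.abs(m0 - m1) / (X.std(0) + 1e-12)
    return np.argsort(score)[::-1][:k]

# Toy demo: random feature matrix where only column 2 separates the classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 4))
y = np.repeat([0, 1], 20)
X[y == 1, 2] += 5.0
selected = top_k_features(X, y, k=1)   # picks column 2

sr = 16000
tone = np.sin(2 * np.pi * 220 * np.arange(sr) / sr)
stats = spectral_summary(tone, sr)     # centroid lands near 220 Hz
```

Off-the-shelf audio analysis packages automate the extraction step at scale; the selection step then trims thousands of candidate features down to the most informative few, as the team did with their top 20.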

The team's research used a combination of visual and spectral analysis techniques to determine the authenticity of audio clips.

However, their most successful results came from their learned-features approach, which utilized a deep-learning model. This model takes raw audio and processes it, extracting multi-dimensional representations called embeddings. These embeddings then serve as the basis for the model to distinguish between real and synthetic audio, achieving error rates as low as 0% in laboratory settings. However, the team acknowledges that this approach may be difficult for the general public to understand without proper context.
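A minimal sketch of the embedding-based idea appears below. The team's system uses a trained deep network as the encoder; here a random projection stands in for that encoder and a nearest-centroid rule stands in for the learned classifier, purely to show how embeddings feed a real-versus-synthetic decision.

```python
import numpy as np

def fake_encoder(waves, dim=32, seed=0):
    """Stand-in encoder: a random projection of raw audio into fixed-size
    embeddings. A real system would use a trained deep network here."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(waves.shape[1], dim)) / np.sqrt(waves.shape[1])
    return waves @ W

class NearestCentroid:
    """Stand-in classifier over embeddings: assign each clip to the
    closest class centroid (real vs. cloned)."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(0) for c in self.classes_])
        return self
    def predict(self, X):
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None], axis=2)
        return self.classes_[d.argmin(1)]

# Toy demo: synthetic "real" and "cloned" waveforms that differ in their mean.
rng = np.random.default_rng(1)
real = rng.normal(0.0, 1.0, size=(50, 400))
cloned = rng.normal(1.0, 1.0, size=(50, 400))
X = fake_encoder(np.vstack([real, cloned]))
y = np.repeat([0, 1], 50)
clf = NearestCentroid().fit(X, y)
acc = float((clf.predict(X) == y).mean())
```

The key design point is the division of labor: the encoder compresses each clip into a vector that preserves voice-relevant structure, and a comparatively simple classifier then separates real from synthetic in that embedding space.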

Their most accurate method involved training a deep-learning model to process and extract multi-dimensional representations of audio called embeddings.

The team's research has significant implications for addressing concerns about the use of voice cloning and deepfake audio for malicious purposes. By developing accurate and accessible methods for detecting these fabricated audio clips, the team's research may play a crucial role in curbing impersonation and fraud through deepfakes. Moreover, by making their research open source, they hope to encourage further exploration and development in this important field.

The team's research was presented as a final project for the Master of Information Management and Systems degree program at the School of Information.

With technology advancing at a rapid pace, it has become more important than ever to question the authenticity of the media we consume. Thanks to the groundbreaking research by this team from UC Berkeley, we may now have a reliable method for determining the authenticity of deepfake audio—a technology that has threatened to further destabilize trust on the internet.

