
A new idea to fight voice deepfakes from Ruhr University Bochum researchers


Researchers at Ruhr-University Bochum in Germany have released a new report that proposes a novel dataset to help tackle voice deepfakes.

According to the researchers, deepfake research has so far focused mainly on the “image domain,” while studies exploring generated audio signals have been largely neglected. To help “narrow this gap,” Joel Frank and Lea Schönherr examined three different aspects of the audio deepfake challenge.

The first is an introduction to common signal processing techniques used to analyze audio, including how to read spectrograms, and to Text-To-Speech (TTS) models.

“While there has been some research into end-to-end models, typical TTS models consist of a two-stage approach,” write the researchers.

“First, we enter the text sequence which we want to generate. This sequence gets mapped by some model (or feature extraction method) to a low-dimensional intermediate representation, often linguistic features or Mel spectrograms. Second, we use an additional model (often called vocoder) to map this intermediate representation to raw audio.”

Specifically, the researchers focus on vocoder literature, since it directly connects to their work on audio deepfakes.
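The two-stage pipeline described above hinges on the Mel spectrogram, the intermediate representation that a vocoder turns back into raw audio, and the same kind of spectrogram is what analysts "read" when inspecting a clip. The NumPy sketch below computes a log-Mel spectrogram from a waveform. The function names are hypothetical, and the parameter values (22,050 Hz sampling rate, 1,024-point FFT, hop of 256 samples, 80 Mel bands) are assumed typical settings, not values taken from the paper.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style Mel scale (an assumption; some libraries use the Slaney variant instead)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters mapping the n_fft // 2 + 1 linear FFT bins onto n_mels Mel bands."""
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sr / 2.0, n_bins)
    # n_mels + 2 evenly spaced Mel points give each filter its left edge, center and right edge
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(signal, sr=22050, n_fft=1024, hop=256, n_mels=80):
    """Return a (n_mels, n_frames) log-Mel spectrogram of a 1-D waveform."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) < n_fft:
        raise ValueError("signal is shorter than one FFT frame")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Slice the waveform into overlapping windowed frames (short-time Fourier transform)
    frames = np.stack([signal[t * hop : t * hop + n_fft] * window for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2          # (n_frames, n_fft // 2 + 1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T          # (n_frames, n_mels)
    return np.log(mel + 1e-10).T                               # frequency on axis 0, time on axis 1
```

For one second of a 440 Hz sine, the result has 80 rows (Mel bands, low to high frequency) and one column per 256-sample frame, with the brightest row sitting near 440 Hz. Plotting such an array as an image, with time on the x-axis and frequency on the y-axis, is what reading a spectrogram amounts to; a vocoder learns the inverse mapping from arrays like this back to a waveform.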

Second, the researchers present a novel dataset, built from nine sample sets generated by five different network architectures and spanning two languages.

The new dataset, hosted on Zenodo, consists of approximately 196 hours of generated audio files and is mostly based on the LJSPEECH and JSUT datasets. Its samples come from a range of architectures, including MelGAN, Parallel WaveGAN (PWG), and WaveGlow, among others.

Finally, Frank and Schönherr supplied practitioners with two baseline models adopted from the signal processing community and designed to facilitate further research in the area.

“To provide a baseline for future practitioners, we trained several baseline models. We evaluated their performance across the different data sets and multiple settings. Specifically, we trained Gaussian Mixture Model (GMM) and neural network-based solutions.”

While they found the neural networks performed better overall, the GMM classifiers proved to be more robust, which might give them an advantage in real-life settings.
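Spoofing-detection baselines of the GMM kind commonly fit one Gaussian mixture to features of genuine speech and another to features of generated speech, then score a clip by the log-likelihood ratio between the two. The sketch below assumes that setup; it is a minimal NumPy illustration, not the authors' configuration, and the names `DiagGMM`, `fit_detector` and `llr` are hypothetical. In practice the input rows would be per-frame spectral features (for example cepstral coefficients derived from a spectrogram).

```python
import numpy as np

class DiagGMM:
    """Gaussian mixture with diagonal covariances, fitted by expectation-maximization."""

    def __init__(self, n_components=4, n_iter=50, seed=0):
        self.k, self.n_iter = n_components, n_iter
        self.rng = np.random.default_rng(seed)

    def _log_joint(self, X):
        # log w_j + log N(x | mu_j, diag(var_j)) for every (sample, component) pair -> (n, k)
        diff = X[:, None, :] - self.means[None, :, :]
        log_norm = -0.5 * np.log(2 * np.pi * self.vars).sum(axis=1)          # (k,)
        mahal = -0.5 * (diff ** 2 / self.vars[None, :, :]).sum(axis=2)       # (n, k)
        return np.log(self.weights)[None, :] + log_norm[None, :] + mahal

    def fit(self, X):
        n, d = X.shape
        # Initialize means from random samples and variances from the global spread
        self.means = X[self.rng.choice(n, self.k, replace=False)].copy()
        self.vars = np.tile(X.var(axis=0) + 1e-6, (self.k, 1))
        self.weights = np.full(self.k, 1.0 / self.k)
        for _ in range(self.n_iter):
            # E-step: responsibilities via a numerically stable softmax over components
            lj = self._log_joint(X)
            lj -= lj.max(axis=1, keepdims=True)
            resp = np.exp(lj)
            resp /= resp.sum(axis=1, keepdims=True)
            # M-step: re-estimate weights, means and variances (with a small variance floor)
            nk = resp.sum(axis=0) + 1e-10
            self.weights = nk / n
            self.means = (resp.T @ X) / nk[:, None]
            second_moment = (resp.T @ X ** 2) / nk[:, None]
            self.vars = np.maximum(second_moment - self.means ** 2, 0.0) + 1e-6
        return self

    def score_samples(self, X):
        """Per-sample log-likelihood log p(x), using log-sum-exp over components."""
        lj = self._log_joint(X)
        m = lj.max(axis=1, keepdims=True)
        return (m + np.log(np.exp(lj - m).sum(axis=1, keepdims=True))).ravel()

def fit_detector(real_feats, fake_feats, n_components=4):
    """Fit one mixture on genuine-speech features and one on generated-speech features."""
    return DiagGMM(n_components).fit(real_feats), DiagGMM(n_components).fit(fake_feats)

def llr(models, feats):
    """Log-likelihood ratio per row: positive means 'closer to the genuine-speech model'."""
    real_gmm, fake_gmm = models
    return real_gmm.score_samples(feats) - fake_gmm.score_samples(feats)
```

A clip-level decision would typically average the frame scores, e.g. `llr(models, clip_frames).mean() > threshold`, with the threshold tuned on held-out data. Because each mixture only models the broad distribution of its class, this kind of detector has far fewer parameters than a neural network, which is one plausible reason such models can hold up better under shifted conditions.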

“Finally, we inspected the different classifiers using an attribution method. We found that lower frequencies cannot be neglected while high-frequency information proved indispensable.”

However, the researchers warn that the difficulty of obtaining realistic datasets has long been a problem in the security community, and that it may limit how broadly their results apply.

“Often benign data is readily available, but data used in malicious contexts is hard to come by. That leaves us with estimating real-world performance on proxy data.”

Frank and Schönherr argue that, in their case, there is a good chance the results will transfer to the kind of data used in real attacks.

“Currently, images generated by off-the-shelf neural networks are used in malicious attempts. We expect the number of audio Deepfakes to increase as well.”

The full Ruhr-University Bochum paper is available to read in its entirety.
