
A new idea to fight voice deepfakes from Ruhr University Bochum researchers


Researchers from the Ruhr-University Bochum in Germany have released a new report with suggestions on how to tackle voice deepfakes through the use of a novel dataset.

Deepfake research has so far focused mainly on the “image domain,” the researchers say, while studies exploring generated audio signals have been neglected. To this end, Joel Frank and Lea Schönherr researched three different aspects of the audio deepfake challenge to “narrow this gap.”

The first is an introduction to common signal processing techniques for analyzing audio, including how to read spectrograms, and to Text-To-Speech (TTS) models.
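As a rough illustration of what a spectrogram encodes, the sketch below slices a signal into short overlapping frames and measures how much energy each frame holds at each frequency. It is a toy example, not code from the paper: the sample rate, frame length, hop size and test tone are all assumed values, chosen so the result is easy to check by hand.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def spectrogram(signal, frame_len=64, hop=32):
    """Return one list of magnitudes (bins 0..frame_len // 2) per frame."""
    window = hann(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + i] * window[i] for i in range(frame_len)]
        mags = []
        for k in range(frame_len // 2 + 1):
            # Direct discrete Fourier transform of the windowed frame at bin k.
            acc = sum(
                chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            mags.append(abs(acc))
        frames.append(mags)
    return frames


# Assumed test input: a pure 1,000 Hz tone sampled at 8,000 Hz.
sample_rate = 8000
tone_hz = 1000
signal = [math.sin(2 * math.pi * tone_hz * t / sample_rate) for t in range(512)]

spec = spectrogram(signal)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64  # each bin is sample_rate / frame_len wide
```

With these values the bins are 125 Hz apart, so the strongest bin in the first frame lands on 1,000 Hz. Plotting the frames side by side, with time on one axis and frequency on the other, gives the familiar spectrogram picture. Real tooling would use a fast Fourier transform from a numerical library rather than this direct sum.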

“While there has been some research into end-to-end models, typical TTS models consist of a two-stage approach,” write the researchers.

“First, we enter the text sequence which we want to generate. This sequence gets mapped by some model (or feature extraction method) to a low-dimensional intermediate representation, often linguistic features or Mel spectrograms. Second, we use an additional model (often called vocoder) to map this intermediate representation to raw audio.”
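For readers unfamiliar with the Mel spectrograms mentioned in the quote, the sketch below shows why they make a compact intermediate representation: frequency bands are spaced evenly on the perceptual Mel scale, so a handful of bands covers the whole range, with fine resolution at low frequencies and coarse resolution higher up. The conversion formula is one widely used variant of the Mel scale and is an assumption here; the article does not say which variant the researchers or the vocoders use, and the band count and frequency range are illustrative.

```python
import math


def hz_to_mel(hz):
    """Convert frequency in Hz to Mels (a common 2595 * log10 variant)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centers(n_bands, f_min, f_max):
    """Centre frequencies in Hz of n_bands bands spaced evenly on the Mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_bands)]


# Illustrative values only: 8 bands between 0 Hz and 8,000 Hz.
centers = mel_band_centers(n_bands=8, f_min=0.0, f_max=8000.0)
```

The gaps between neighbouring centres grow with frequency, which is the property that lets a Mel spectrogram summarize audio in far fewer values than a linear-frequency spectrogram.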

Specifically, the researchers focus on vocoder literature, since it directly connects to their work on audio deepfakes.

Second, the researchers present a novel dataset, built on nine sample sets from five different network architectures and spanning two languages.

The new dataset, hosted on Zenodo, consists of approximately 196 hours of generated audio files and is mostly based on the LJSPEECH and JSUT datasets. It also includes a range of architectures, including MelGAN, Parallel WaveGAN (PWG), and WaveGlow, among others.

Finally, Frank and Schönherr supplied practitioners with two baseline models adopted from the signal processing community and designed to facilitate further research in the area.

“To provide a baseline for future practitioners, we trained several baseline models. We evaluated their performance across the different data sets and multiple settings. Specifically, we trained Gaussian Mixture Model (GMM) and neural network-based solutions.”

While they found the neural networks performed better overall, the GMM classifiers proved to be more robust, which might give them an advantage in real-life settings.
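The article does not spell out how a GMM-based detector reaches a decision. One common arrangement in spoofing detection, shown here as a minimal sketch and not as the authors' implementation, is to fit one mixture to features from real speech and another to features from generated speech, then compare how well each explains a new sample. Everything concrete below is an assumption for illustration: the single scalar "feature" per clip, the made-up distributions it is drawn from, and the two-component mixtures. Real systems work on multi-dimensional spectral features.

```python
import math
import random


def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def log_likelihood(x, gmm):
    """Log-likelihood of x under a mixture given as (weight, mean, var) tuples."""
    terms = [math.log(w) + log_gauss(x, m, v) for w, m, v in gmm]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def fit_gmm(data, k=2, iters=50, seed=0):
    """Fit a 1-D Gaussian mixture with k components by expectation-maximisation."""
    rng = random.Random(seed)
    mean_all = sum(data) / len(data)
    var_all = sum((x - mean_all) ** 2 for x in data) / len(data)
    gmm = [(1.0 / k, m, var_all) for m in rng.sample(data, k)]
    for _ in range(iters):
        # E-step: how responsible is each component for each data point?
        resp = []
        for x in data:
            logs = [math.log(w) + log_gauss(x, m, v) for w, m, v in gmm]
            top = max(logs)
            probs = [math.exp(l - top) for l in logs]
            total = sum(probs)
            resp.append([p / total for p in probs])
        # M-step: re-estimate weight, mean and variance of each component.
        updated = []
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-9)  # floor avoids division by zero
            mean = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - mean) ** 2 for r, x in zip(resp, data)) / nj
            updated.append((nj / len(data), mean, max(var, 1e-6)))
        gmm = updated
    return gmm


def score(x, real_gmm, fake_gmm):
    """Log-likelihood ratio: positive leans real, negative leans generated."""
    return log_likelihood(x, real_gmm) - log_likelihood(x, fake_gmm)


# Hypothetical one-number feature per clip, drawn from made-up distributions.
rng = random.Random(42)
real_features = [rng.gauss(0.0, 1.0) for _ in range(300)]
fake_features = [rng.gauss(3.0, 1.0) for _ in range(300)]

real_gmm = fit_gmm(real_features)
fake_gmm = fit_gmm(fake_features)
```

Calling `score(x, real_gmm, fake_gmm)` on a new feature value then gives a single number to threshold. Because each model is just a small set of weights, means and variances, this kind of classifier is simple to inspect, though whether that accounts for the robustness the researchers observed is not something the article addresses.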

“Finally, we inspected the different classifiers using an attribution method. We found that lower frequencies cannot be neglected while high-frequency information proved indispensable.”

However, the researchers warn that the difficulty of obtaining realistic datasets is a longstanding problem in the security community, and it may mean the results are not universally applicable.

“Often benign data is readily available, but data used in malicious contexts is hard to come by. That leaves us with estimating real-world performance on proxy data.”

Frank and Schönherr argue that in their case, the odds are good that the results will transfer to the kinds of data used in actual attacks.

“Currently, images generated by off-the-shelf neural networks are used in malicious attempts. We expect the number of audio Deepfakes to increase as well.”

For more information about the Ruhr-University Bochum paper, you can follow this link to read it in its entirety.
