Meta declines to make voice tool public as BixeLab highlights voice fraud concerns

AI voice technology can quite literally bring a voice to the voiceless and help us transcend language barriers. Even with such impactful use cases, heightened security risks follow the rise of AI-generated voice technology, particularly for systems using biometric voice authentication and in social engineering attacks, as highlighted in the second issue of BixeLab’s I.D. Risk Alerts newsletter.

BixeLab notes the account of an Australian journalist who used an AI-generated clone of his own voice to gain unauthorized access to his Centrelink account. In the UK, a cybersecurity researcher used an AI-generated version of his own voice to access a bank account. The testing and consulting firm rates the criticality of the fraud risk as “high.”

Aware of the security risks, Meta recently announced – but did not release – its newest generative AI system, Voicebox. The technology can generate spoken dialogue from speech samples and text, and its capabilities include speech denoising and editing, text-to-speech synthesis, and diverse speech sampling. Still, the tech giant is “not making the Voicebox model or code publicly available at this time” due to “the potential risks of misuse.”

Voicebox can create outputs from scratch or based on a supplied audio sample. With a word error rate of 1.9 percent, the system outperforms VALL-E, which has a word error rate of 5.9 percent. Voicebox also outperforms YourTTS on cross-lingual style transfer, with an average word error rate of 5.2 percent compared to 10.9 percent for YourTTS. In addition, Voicebox beats both VALL-E and YourTTS on audio style similarity.
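For readers unfamiliar with the metric, word error rate is commonly computed as the word-level edit distance (substitutions, insertions and deletions) between a reference transcript and the words recognized in the audio, divided by the number of reference words. The Python sketch below illustrates that general definition only. The function name and the lowercase, whitespace-based tokenization are assumptions made for this example, and it is not the evaluation pipeline used to produce the figures quoted above.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumption for this sketch: lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "the quick brown fox jumps"
    hypothesis = "the quick brown box jumps"
    # One substituted word out of five reference words.
    print(f"{word_error_rate(reference, hypothesis):.1%}")
```

On this definition, a word error rate of 1.9 percent corresponds to roughly two word-level errors per 100 reference words.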

Voicebox is built on Flow Matching, a non-autoregressive generative modeling approach that can learn a non-deterministic mapping between text and speech. This lets the system learn from varied speech data without those variations having to be labeled. As a result, Voicebox can train on more diverse data at a much larger scale.
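As a rough illustration of the idea, the Python sketch below shows the training target commonly used in the flow matching literature with a straight-line path between a noise sample and a data sample: a network is trained to predict the velocity that carries the noise toward the data. This is a generic, simplified sketch and not Meta's implementation. The function names, the `sigma_min` parameter and the use of plain lists of floats are assumptions made for this example.

```python
def flow_matching_pair(noise, target, t, sigma_min=0.0):
    """Return (x_t, velocity) for one training example.

    noise     : sample from a simple prior (list of floats)
    target    : data sample, e.g. one frame of speech features
    t         : time in [0, 1]; t=0 is pure noise, t=1 is close to the data
    sigma_min : small residual noise scale kept at t=1 (hypothetical default)
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    if len(noise) != len(target):
        raise ValueError("noise and target must have the same length")

    shrink = 1.0 - (1.0 - sigma_min) * t
    # Point on the straight-line path between the noise and the data.
    x_t = [shrink * n + t * d for n, d in zip(noise, target)]
    # Velocity along that path, which the network is trained to predict.
    velocity = [d - (1.0 - sigma_min) * n for n, d in zip(noise, target)]
    return x_t, velocity


def squared_error(predicted, velocity):
    """Regression loss between a predicted and a target velocity."""
    return sum((p - v) ** 2 for p, v in zip(predicted, velocity))


if __name__ == "__main__":
    x_t, velocity = flow_matching_pair([2.0, 2.0], [4.0, 0.0], t=0.25)
    print(x_t, velocity)
```

Because every element of the sample is moved along its path at the same time, generation does not have to proceed one step of the sequence after another, which is what "non-autoregressive" refers to.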

Meta trained Voicebox with “more than 50,000 hours of recorded speech and transcripts from public domain audiobooks in English, French, Spanish, German, Polish, and Portuguese.” It can infill speech from context and generate the middle of an audio recording without having to re-create the input entirely.

Voicebox can use a two-second audio sample to match an audio style and then apply that style to text-to-speech generation, which can give a voice to someone unable to speak. Cross-lingual style transfer allows users to turn text from one language into audio in another language, creating a new avenue to overcome language barriers. It can also resynthesize speech to remove background noise, simplifying the audio editing process.

Voice authentication and security threats continue

Voicebox could reportedly enable malicious AI-generated voice cloning capable of defeating voice authentication.

The technology can also be used to strengthen social engineering attacks. At the 2023 Regional Anti-Scam Conference in Singapore, Sun Xueling, the Minister of State for Home Affairs, expressed concerns that this technology could be used to impersonate public figures and spread disinformation.

In January, an Arizona mother was the target of a ransom scam that used deepfake voice generation technology to trick her into thinking her daughter had been kidnapped. “I will never be able to shake that voice and the desperate cries for help out of my mind,” she said in testimony to the Senate Judiciary Committee.
