Meta declines to make voice tool public as BixeLab highlights voice fraud concerns

AI voice technology can quite literally give a voice to the voiceless and help us transcend language barriers. Yet the rise of AI-generated voice technology also brings heightened security risks, particularly for systems that rely on biometric voice authentication and for targets of social engineering attacks, as highlighted in the second issue of BixeLab’s I.D. Risk Alerts newsletter.

BixeLab notes the account of an Australian journalist who used an AI-generated clone of his own voice to gain unauthorized access to his Centrelink account. In the UK, a cybersecurity researcher used an AI-generated version of his own voice to access a bank account. The testing and consulting firm rates the criticality of the fraud risk as “high.”

Aware of the security risks, Meta recently announced – but did not release – its newest generative AI system, Voicebox. The technology can generate spoken dialogue through speech samples and text and has capabilities like speech denoising and editing, text-to-speech synthesis, and diverse speech sampling. Still, the tech giant is “not making the Voicebox model or code publicly available at this time” due to “the potential risks of misuse.”

Voicebox can create outputs from scratch or based on a given sample. Its word error rate of 1.9 percent beats VALL-E’s 5.9 percent. On cross-lingual style transfer, Voicebox outperforms YourTTS, with an average word error rate of 5.2 percent against 10.9 percent. It also scores higher than both VALL-E and YourTTS on audio style similarity.
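For readers unfamiliar with the metric, word error rate is typically the word-level edit distance (substitutions, deletions and insertions) between a reference text and a transcript, divided by the number of words in the reference. The sketch below illustrates that general definition only; it is not Meta's evaluation code, and the function name `word_error_rate` and the lowercase, whitespace-based tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: words are split on whitespace and compared
    case-insensitively; real evaluations may normalize text differently.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into nothing costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of three reference words.
    print(word_error_rate("the cat sat", "the dog sat"))
```

Under this definition, a reported rate of 1.9 percent corresponds to roughly two word errors per hundred reference words.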

The technology is built on Flow Matching, a non-autoregressive generative modeling approach that can learn a non-deterministic mapping between text and speech. This lets Voicebox learn from varied speech data without requiring those variations to be labeled. As a result, Voicebox can train on more diverse data at a much larger scale.

Meta trained Voicebox with “more than 50,000 hours of recorded speech and transcripts from public domain audiobooks in English, French, Spanish, German, Polish, and Portuguese.” It can infill speech from context and generate the middle of an audio recording without having to re-create the input entirely.

Voicebox can use a two-second audio sample to match an audio style and then apply that style to text-to-speech generation, which can give a voice to someone unable to speak. Cross-lingual style transfer allows users to turn text from one language into audio in another language, creating a new avenue to overcome language barriers. It can also resynthesize speech to remove background noise, simplifying the audio editing process.

Voice authentication and security threats continue

Voicebox could reportedly enable malicious AI-generated voice cloning capable of defeating voice authentication.

The technology can also be used to strengthen social engineering attacks. At the 2023 Regional Anti-Scam Conference in Singapore, Sun Xueling, the Minister of State for Home Affairs, expressed concerns that this technology could be used to impersonate public figures and spread disinformation.

In January, an Arizona mother was the target of a ransom scam that used deepfake voice generation technology to trick her into thinking her own daughter had been kidnapped and was being held for ransom. “I will never be able to shake that voice and the desperate cries for help out of my mind,” she said in testimony to the Senate Judiciary Committee.
