Facebook touts unsupervised voice biometrics, AI infrastructure could speed things up
Voice biometrics was already poised to move past the underwhelming roles it performs today, and a new announcement from Facebook AI could prove to be an even bigger growth accelerator.
Executives at Facebook's AI lab say in a marketing post that they have developed speech tools that are trained without supervision. The advance is billed not only as making the creation of speech recognition systems faster and less expensive, but as doing so for all languages and dialects.
The announcement comes as edge computing is being deployed to process voice recognition and other compute-intensive tasks. Taking most of that work out of the cloud increases operating speeds, something that will be critical to broader and deeper roles for voice biometrics in vehicles, for example.
In a paper posted by the lab, company scientists write that wav2vec-U (the U is for unsupervised) trains speech recognition models without labeled data. That is, no human-transcribed audio is required — wav2vec-U works with unlabeled speech recordings and unlabeled text.
This is the third iteration of wav2vec but the first that requires no supervision, according to Facebook AI, and it is open to all developers via GitHub.
The algorithm learns the structure of speech from voice recordings. The recordings are segmented at the phonetic level and fed to the self-supervised model, and a generative adversarial network — a generator paired with a discriminator — is trained to recognize the recorded words.
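The adversarial setup described above can be sketched in a toy form. This is not Facebook's actual wav2vec-U code — all shapes, variable names, the logistic-regression discriminator, and the single-layer generator are illustrative assumptions. The generator maps fixed speech-segment features to phoneme distributions, while the discriminator tries to tell those generated distributions apart from one-hot phonemes drawn from unlabeled text:

```python
# Toy sketch of an unsupervised GAN pairing speech segments with phonemes.
# Assumed, simplified setup: a linear generator and a logistic-regression
# discriminator; real wav2vec-U uses learned speech representations.
import numpy as np

rng = np.random.default_rng(0)
N_PHONES, FEAT_DIM, N_SEG = 4, 8, 64

# "Unlabeled text" side: one-hot phoneme vectors (the real samples).
real = np.eye(N_PHONES)[rng.integers(0, N_PHONES, N_SEG)]
# "Unlabeled audio" side: fixed per-segment features for the generator.
feats = rng.normal(size=(N_SEG, FEAT_DIM))

G = rng.normal(scale=0.1, size=(FEAT_DIM, N_PHONES))  # generator weights
D = rng.normal(scale=0.1, size=N_PHONES)              # discriminator weights

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for _ in range(200):
    fake = softmax(feats @ G)  # generated phoneme distributions
    # Discriminator ascent step: push D(real) toward 1, D(fake) toward 0.
    d_real, d_fake = sigmoid(real @ D), sigmoid(fake @ D)
    D += lr / N_SEG * (real.T @ (1 - d_real) - fake.T @ d_fake)
    # Generator ascent step: make fake samples look real (D(fake) -> 1),
    # backpropagating through the softmax.
    d_fake = sigmoid(fake @ D)
    up = (1 - d_fake)[:, None] * D
    gs = fake * (up - (fake * up).sum(axis=1, keepdims=True))
    G += lr / N_SEG * feats.T @ gs

# The discriminator's mean score on generated distributions (0 = fake, 1 = real).
score = float(sigmoid(softmax(feats @ G) @ D).mean())
print(score)
```

The key property the sketch shares with the method described in the article is that neither network ever sees a paired (audio, transcript) example: the generator only sees segment features, and the discriminator only sees phoneme sequences.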
According to Facebook AI, wav2vec-U was tested on the TIMIT benchmark, and it cut the error rate by 57 percent compared with “the next best unsupervised method.”
Market research firm ReportLinker predicts that the global market for conventional voice biometrics, worth about $1 billion in 2020, will more than triple to top $3.4 billion by 2025.