Pindrop presents three research papers on voice biometrics, speech recognition at ICASSP

Three research papers from Pindrop have been presented at the 2022 International Conference on Acoustics, Speech, & Signal Processing (ICASSP), and indicate the direction of the company’s attempts to further innovate with voice biometrics and speech recognition technologies.

The first paper is titled, ‘Distribution Learning for Age Estimation from Speech.’ It explores a different approach to age estimation based on voice biometrics, framing the task as a distribution learning problem rather than the traditional classification or regression problem. The first obstacle that Pindrop’s researchers found with distribution learning is that audio research lacks datasets tagged with “apparent” age.

However, the researchers also found that distribution learning, already validated for facial age estimation, is still viable for audio, meaning a general age range can be estimated at a given confidence level. The paper concludes that while distribution learning for speech is more constrained than it is for facial age estimation, it can outperform regression and classification algorithms in both matched and mismatched conditions.
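To illustrate the general idea of distribution learning (not Pindrop’s actual model), the minimal Python sketch below turns a single age label into a probability distribution over age bins, then reads back a point estimate and an age range holding a chosen share of the probability. The Gaussian shape, the `sigma` value, the 0 to 100 age range, and all function names are assumptions made for illustration; none of them come from the paper.

```python
import math


def age_to_distribution(age, min_age=0, max_age=100, sigma=3.0):
    """Encode one age label as a discretized Gaussian over integer age bins.

    The Gaussian target and sigma are illustrative assumptions.
    """
    ages = list(range(min_age, max_age + 1))
    weights = [math.exp(-0.5 * ((a - age) / sigma) ** 2) for a in ages]
    total = sum(weights)
    return ages, [w / total for w in weights]


def expected_age(ages, probs):
    """Point estimate: the mean of the distribution."""
    return sum(a * p for a, p in zip(ages, probs))


def age_range(ages, probs, mass=0.9):
    """Return (youngest, oldest) of the most probable bins covering `mass`."""
    order = sorted(range(len(ages)), key=lambda i: probs[i], reverse=True)
    picked, covered = [], 0.0
    for i in order:
        picked.append(ages[i])
        covered += probs[i]
        if covered >= mass:
            break
    return min(picked), max(picked)


# In a real system the distribution would be predicted by a model from audio;
# here the encoded label stands in for such a prediction.
ages, probs = age_to_distribution(35)
print(round(expected_age(ages, probs), 2))
print(age_range(ages, probs, mass=0.9))
```

A model trained this way outputs a full distribution rather than a single number or class, which is what makes it possible to report an age range alongside the estimate.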

The second paper is titled, ‘Speaker Embedding Conversion for Backward and Cross-Channel Compatibility.’ It examines solutions for compatibility issues between voice biometric authentication technology providers that have been migrating their models to newer deep learning techniques. Pindrop’s researchers suggest a deep neural network (DNN)-based method to allow for backward compatibility. The experimental results found that the DNN is able to deliver feature-embedding compatibility between two automatic speaker verification (ASV) systems with improved performance over a baseline converter system, though the converted feature embeddings performed worse than the traditional ASV systems in the low false acceptance rate (FAR) range. The researchers say that an extension of their work could explore score calibration to improve performance in that low FAR range.

The third paper is ‘Unsupervised Model Adaptation for End-to-End ASR,’ and looks into a way to improve automatic speech recognition (ASR) transcription systems, which often struggle with mismatched train-test conditions. Call centers are one example, since they have to account for factors like accents and voice audio quality.

To address this, the research team has proposed a cost-effective way to improve the accuracy of ASR systems using in-domain data without the need for costly human annotations. The approach draws on two relationships: one between the word error rate (WER) and the connectionist temporal classification (CTC) loss, a loss used to train sequence models without frame-level alignments, and one between the WER and the probability ratio-based confidence (PRC). The researchers found that WER could be reduced by 8 percent in absolute terms without supervision, allowing the system to adapt to suboptimal conditions.
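For readers unfamiliar with the metric, WER is the word-level edit distance between a reference transcript and the system’s output, divided by the number of reference words. The short Python sketch below computes it and shows the difference between an absolute and a relative reduction. The transcripts and the 30 percent and 22 percent figures are made-up illustrations, not results from the paper, and the function names are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (standard Levenshtein recurrence).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def wer_reduction(before, after):
    """Return (absolute, relative) reduction between two WER values."""
    absolute = before - after
    return absolute, absolute / before


# Made-up transcripts: one substitution plus one deletion over six words.
wer = word_error_rate("please verify my account balance today",
                      "please verify my count balance")
print(round(wer, 3))

# Made-up WER values: an 8-point absolute drop is a larger relative drop.
absolute, relative = wer_reduction(0.30, 0.22)
print(round(absolute, 2), round(relative, 3))
```

The distinction matters when reading the result above: an 8 percent absolute reduction means the WER itself fell by 8 percentage points, which is a larger change than an 8 percent relative reduction would be.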

However, Pindrop says that the research is experimental and does not reflect the performance of its products.

Other recent research in the field of voice biometrics includes suggestions on how to tackle voice deepfakes and a method for continuous liveness detection on smart devices.

The online paper presentation portion of ICASSP closes this week, with the in-person event running in Singapore from May 22 to 27.
