ArkX partners with Sensory on voice biometrics integration
The biometrics collaboration will now enable ArkX customers to access ultra-low power “always listening” and natural-language touchless control features, including support for more than 20 languages, a preset library of wake words, and the ability to create custom ones.
“Brands increasingly want to create their own branded voice experiences for their customers,” explains ArkX CEO Eric Bauswell.
“Working with Sensory, we have integrated a powerful set of advanced capabilities that enable OEMs to disrupt the status quo of ‘good enough’ to deliver cutting edge speech recognition performance to their customer,” he adds.
From a technical standpoint, the EveryWord Ultra portfolio consists of an Audio Front End (AFE) Voice Processing Module, an Integrated Voice Module (SOM + Audio Board w/AFE), and an Amazon Alexa Voice Service (AVS) Development Kit.
The advanced audio and voice technology supports both human-to-human and human-to-machine speech recognition.
“Voice adoption continues to grow rapidly, and brands are always exploring ways to streamline the process of integrating a convenient voice UX into their products,” comments Todd Mozer, CEO of biometrics firm Sensory.
Typical deployments include consumer electronics, home appliances, robotics, automotive, wearables, toys, and IoT (Internet of Things).
“Working with ArkX Labs provides the industry a turn-key solution for integrating advanced speech recognition capabilities into their products that can enable OEM-brand-specific wake words and other attractive advanced feature sets,” Mozer concludes.
The partnership comes weeks after Sensory released the beta version of its new AI-as-a-service platform.
STC reports strong NIST SRE21 results
Speech Technology Center has announced a strong performance in biometric speaker recognition testing by the U.S. National Institute of Standards and Technology.
For the NIST 2021 Speaker Recognition Evaluation (SRE21), voice recognition algorithms were assessed on three types of input: audio from conversational telephone speech, audio from video, and video itself. In the last case, STC combined face and voice biometrics to identify speakers.
The January 26 update of the leaderboard lists STC second, with a 2.48 percent equal error rate (EER), a minimum detection cost (MIN_C) of 0.074, and an actual detection cost (ACT_C) of 0.079.
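For readers unfamiliar with the metric, the equal error rate is the operating point at which the false-accept rate (impostors wrongly accepted) equals the false-reject rate (genuine speakers wrongly rejected). The following is a minimal Python sketch of how an EER can be estimated from trial scores. The function name and the toy scores are illustrative assumptions, not NIST's scoring tooling or STC's data.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Estimate the EER from two lists of similarity scores.

    target_scores: scores for trials where the speaker really matches.
    nontarget_scores: scores for impostor trials.
    Returns (eer, threshold) at the point where the false-accept and
    false-reject rates are closest to each other.
    """
    # Candidate thresholds: every distinct score that was observed.
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best = None
    for t in thresholds:
        # A trial is accepted when its score is at least the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # Report the midpoint of the two rates at the closest crossing.
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy scores, invented purely for illustration.
targets = [0.9, 0.8, 0.7, 0.4]
nontargets = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(targets, nontargets)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

A lower EER means fewer errors of both kinds at the balanced threshold, which is why it is widely used as a single-number summary of speaker recognition accuracy.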
STC says it is among the first biometrics providers to successfully merge transformer and wav2vec machine learning models. Transformer models are typically used in computer vision and natural language processing, while wav2vec is a common speech recognition model type. Their combination minimizes errors in speaker recognition, according to the announcement.
“Speech analytics provides insights into customer satisfaction and conversation quality to continuously improve customer experience,” states Speech Technology Center CEO Dmitriy Dyrmovskiy. “Moreover, high-quality speaker recognition is essential for nationwide biometric systems. NIST SRE21 is the fifth competition in 2021 where Speech Technology Center solutions have been given a high score by a jury of international experts. For Speech Technology Center being recognised in international contests is not just a personal achievement, it is a landmark for the entire industry. The strongest teams from around the world work on speaker recognition solutions, and we’re excited to take it to the next level by properly showcasing our core competencies on the global market.”
Speechmatics releases speech recognition report
The Voice Report 2022 covers a variety of topics related to voice biometrics, including the history of voice technology from the 1950s up until the pandemic.
The document includes insights from industry experts, product specialists, and machine learning engineers, with a particular focus on AI biases and the future of voice technology.
According to the report, an estimated 8.4 billion devices will use voice assistants by 2024, and the speech-to-text API market may grow at a compound annual growth rate (CAGR) of 19.2 percent between 2021 and 2026.
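To put the growth figure in perspective, a compound annual growth rate is applied multiplicatively each year, so the total growth over the period is (1 + rate) raised to the number of years. The short Python sketch below works through that arithmetic. It assumes the 2021 to 2026 range spans five compounding years, and the function names are illustrative. The report's underlying market sizes are not given here, so only the growth multiple is computed.

```python
def growth_multiple(rate, years):
    """Total growth factor after compounding `rate` for `years` years."""
    return (1 + rate) ** years


def cagr(start_value, end_value, years):
    """Compound annual growth rate implied by a start and end value."""
    return (end_value / start_value) ** (1 / years) - 1


# Assumption: 2021 to 2026 is treated as five compounding years.
multiple = growth_multiple(0.192, 5)
print(f"Growth multiple over the period: {multiple:.2f}x")

# Round trip: recover the annual rate from a hypothetical start value of 100.
print(f"Implied CAGR: {cagr(100, 100 * multiple, 5):.1%}")
```

Under that five-year assumption, a 19.2 percent CAGR implies the market would be roughly 2.4 times its 2021 size by 2026.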
Among the key findings, Speechmatics said that more accurate speaker diarization (the process of partitioning an input audio stream into homogeneous segments according to speaker identity) is the most in-demand feature for the next three years, and that nearly 80 percent of those surveyed see data privacy and security as a very high priority.
On AI bias, the report identifies dialects and accents as the main issue, which according to those surveyed account for more than 50 percent of cases.
“If we give the training models exposure to a different variety of voices, it should become familiar with them. While it isn’t a cure-all fix, exposure is critical for reducing AI Bias,” the report reads.
Speechmatics also said that, while data is not the only way to address AI bias, it is a significant factor.
“Which is why self-supervised learning (already proving a success with the amount of data it can train on) is such a big factor in improving accuracy in automatic speech recognition (ASR).”
According to the document, self-supervised learning is not only the solution to AI biases, but also the future of speech recognition.
“As we look to the future, we can clearly see where there’s more data to train on, there’s a much greater chance of meaningful change across the industry – and more and more voices and languages can be heard,” John Hughes, head of Accuracy at Speechmatics, writes.
The Voice Report 2022 is publicly available on the Speechmatics site.