Speaker ID is the sleeper biometric function speech apps need to nail
Speaker identification is the kind of biometric block-and-tackle tech feature that gets overlooked by vendors and even buyers in favor of marquee capabilities. Myriad fonts are great in business word processing software, but who would use the app without competent spell-checking?
A new report by a speech-recognition company hints at how biometric speaker identification, if done competently, will help deliver the most value in speech apps in the age of COVID-19.
As with most coverage of advanced B-to-B and B-to-C autonomous speech apps, the report, by Speechmatics, focuses on the features and roles that will have the biggest commercial impact, as well it should. The top voice apps expected to have the largest commercial impact in 2021 are web-conferencing transcription and customer-experience analytics.
But down in the single-digit survey results is a speaker identification thread that demands attention.
The disastrous effects of the coronavirus pandemic are turning the technology-dependent world into watchers of live-streaming meetings. Citing industry estimates, the report states that the global speech recognition market will be worth $16 billion by 2023.
Identifying who is speaking is important during events, especially those with many active participants, when conversations can get chaotic. It also is important for transcriptions of those events, another overlooked aspect of life with COVID.
Speaker diarization is an increasingly important function. It is the autonomous method of partitioning an audio input stream into homogeneous segments by speaker, typically using voice biometrics. And it is a difficult chore.
It only works if a speech algorithm can pick out individuals speaking in multiple languages with differing accents and dialects (a significant angle parsed in the report). It also needs to hear through environmental noise, cross talk, impassioned deliveries and various speaking styles.
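At its simplest, diarization means detecting speaker change points in a stream of voice-biometric features. The following is a toy sketch of that idea, not any vendor's actual method: it assumes per-frame speaker embeddings are already available (real systems derive these with neural networks) and greedily starts a new segment whenever a frame stops resembling the running average of the current one.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def diarize(frames, threshold=0.8):
    """Greedy change-point diarization over per-frame speaker embeddings.

    Starts a new segment whenever the incoming frame's similarity to the
    running centroid of the current segment drops below `threshold`.
    Returns a list of (start_frame, end_frame) pairs, end exclusive.
    """
    segments = []
    start = 0
    centroid = np.asarray(frames[0], dtype=float)
    count = 1
    for i in range(1, len(frames)):
        f = np.asarray(frames[i], dtype=float)
        if cosine(f, centroid) < threshold:
            # Similarity dropped: assume a speaker change and close the segment.
            segments.append((start, i))
            start, centroid, count = i, f.copy(), 1
        else:
            # Same speaker: fold the frame into the running centroid.
            centroid = (centroid * count + f) / (count + 1)
            count += 1
    segments.append((start, len(frames)))
    return segments
```

With two synthetic "speakers" represented by orthogonal vectors, alternating blocks of frames come back as three segments. The noise, cross talk, and accent variation the report describes are exactly what makes the real problem so much harder than this clean-signal sketch.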
According to the report’s authors, speaker diarization is “a key challenge that speech providers have not yet mastered.”
Indeed, 49 percent of those surveyed in the report pointed to better diarization accuracy within three years as an important product improvement. (Three-quarters of respondents said accuracy in general is the biggest barrier to their firms adopting voice technology.)
And accuracy has a far broader meaning than mere word error rate, according to the report. “People also look at speaker change indication, intent recognition, punctuation, number recognition and quick turnaround time for transcription when evaluating providers.”
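For context, word error rate, the baseline metric the report says buyers look past, is conventionally defined as the word-level edit distance between a reference transcript and the system's hypothesis, divided by the reference length: WER = (substitutions + deletions + insertions) / N. A minimal illustration:

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed via word-level Levenshtein edit distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all remaining reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all remaining hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution / match
    return dp[len(ref)][len(hyp)] / len(ref)
```

Dropping one word from a six-word reference yields a WER of 1/6, regardless of whether the transcript got the speaker labels, punctuation or numbers right, which is the report's point about accuracy meaning more than this single number.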
Users soldier on regardless. Looking at two slices of the speech recognition market, the report states that 64 percent of e-learning and market research professionals used speech-to-text automated transcription last year.