Voice biometrics and how far we’ve come
When I first sat down to write about voice biometrics, I couldn’t shake the images of old TV commercials for voice-activated toys, or of Capt. Jean-Luc Picard ordering tea on the Enterprise.
The idea of commanding a machine using only your voice was, at one time, a futuristic fantasy. Now we have products like Siri, and technology that recognizes not only what we say, but who we are.
Maybe Skynet is real, and all of the machines are just quietly gathering intel. Judgment day is upon us. I need to find Sarah Connor.
Jokes aside, speech recognition and voice biometric technology have come a long way, and products using voice as a biometric modality are gaining a ton of traction in the market – with particular success in the consumer device and customer authentication spaces.
For some background, speaker recognition, also called voice recognition or, more broadly, ‘voice biometrics’, is a biometric modality that uses an individual’s voice for recognition purposes. It is a different technology from “speech recognition”, which recognizes words as they are articulated and is not a biometric. The speaker recognition process relies on features influenced by both the physical structure of an individual’s vocal tract and the behavioral characteristics of the individual.
It is a popular choice for remote authentication due to the availability of devices for collecting speech. Speaker recognition is different from some other biometric methods in that speech samples are captured dynamically or over a period of time, such as a few seconds. Analysis occurs on a model in which changes over time are often monitored. For more information on how speaker recognition and identification work, as well as how they differ from speech recognition, see our explainer here.
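For readers who want to see the shape of that process, here is a minimal sketch in Python. It assumes some front end (not shown, and not described in this article) has already turned each short speech sample into a sequence of per-frame feature vectors. The function names, the cosine-similarity score and the 0.95 threshold are illustrative assumptions rather than any vendor’s actual method, and the numbers are made up.

```python
import math

def mean_vector(frames):
    """Average per-frame feature vectors over time into one fixed-length vector."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]

def cosine_similarity(a, b):
    """Similarity of two vectors, from -1 (opposite) to 1 (same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def enroll(utterances):
    """Build a voiceprint (template) by averaging several enrollment utterances."""
    return mean_vector([mean_vector(u) for u in utterances])

def verify(template, utterance, threshold=0.95):
    """Accept the claimed identity only if the score clears the threshold.

    The threshold is a hypothetical value; a real system would tune it
    to balance false accepts against false rejects.
    """
    score = cosine_similarity(template, mean_vector(utterance))
    return score >= threshold, score

# Made-up feature frames: two enrollment utterances for one speaker,
# then one attempt by the same speaker and one by somebody else.
alice_template = enroll([
    [[1.0, 0.2, 0.1], [0.9, 0.3, 0.1]],
    [[1.1, 0.2, 0.0], [1.0, 0.25, 0.1]],
])
genuine_ok, genuine_score = verify(alice_template, [[1.0, 0.25, 0.1], [0.95, 0.2, 0.05]])
impostor_ok, impostor_score = verify(alice_template, [[0.1, 1.0, 0.5], [0.2, 0.9, 0.6]])
print(genuine_ok, impostor_ok)
```

Note that nothing in the sketch looks at which words were spoken. It compares only how the voice sounds, which is the distinction between speaker recognition and speech recognition.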
According to a survey conducted by Nuance Communications, smartphone users have grown frustrated with current authentication methods and 90% of them are eager to use voice biometrics in place of existing methods. That statistic is hard to ignore.
The overall market for voice and face biometrics is expected to reach nearly $3 billion by the end of 2018. Currently, the United States accounts for the largest share of these markets, though according to a recently published report, most of the anticipated growth will come from emerging economies, with Asia-Pacific getting special mention in that regard.
In addition, the Biometrics Research Group has noted that mobile devices are expected to drive bank adoption of voice biometrics, and that Asia is expected to lead the growth in biometric banking applications. The group estimates that revenues for biometrics supplied to the global banking sector will reach US$900 million by the end of 2012.
Another survey, this one focused on consumers in the UK, found that more than half would use voice biometric authentication for some phone banking tasks.
Siri certainly can’t be credited as the first in the consumer device space, and it isn’t biometric, but it has ushered in a swath of services and products that are.
In the last two weeks alone, Authentify launched a mobile security app with support for voice biometric authentication, Agnitio is boasting new algorithms for more accurate calculations of likelihood ratios for its speaker verification, and three major companies launched a voice biometric home automation system together.
As I’ve mentioned in previous articles and on social media, a space that really seems to be begging for richer biometric integrations is automobile automation and cabin customization, but there are still many hurdles voice biometrics needs to pass before it can fill this gap.
In Ontario (Canada), it’s illegal to drive while talking on a cellphone and it’s the same in New York. The concern for distracted driving is growing and it’s not just law enforcement in Ontario and New York that understand the risk fiddling with electronics poses to safe roads.
If you can’t use your hands, and in a car your feet are clearly occupied, voice is the obvious choice. But why are so many systems emerging today that feature only speech recognition, when speaker recognition and voice biometrics would be so convenient?
Just think of the possibilities: Auto-adjusted seats, mirrors, climate, navigation and even keyless start options for registered drivers!
According to Bernie Brafman, VP of Business Development for Sensory, there is one major problem and it’s voice biometric technology’s known Kryptonite: Background noise.
No matter how quiet the driver cabin in your car is, there is always a chance for unwanted noise, and nothing is more frustrating than a malfunctioning system that should recognize you but doesn’t. Talk about a distraction on the road.
“The enemy of speech recognition is noise – even more so for voice biometrics,” Brafman said.
Sensory, which focuses on embedded solutions and has been working with speech recognition and speaker identification for 20 years, has integrated speech technology in a new QNX concept car, though speaker recognition isn’t an option. The system lets users plot a route, pick music or perform pre-programmed actions.
Nuance Communications and BYD, a domestic auto manufacturer in China, have recently teamed up and introduced a new sedan that includes Nuance’s voice recognition technology, though it also does not include support for speaker identification.
According to Brafman, speaker identification in cars is something that’s still actively being researched.
“We are working on it – we have a really novel approach to operating in noise,” Brafman said. “It’s possible by CES that you might see some interesting demonstrations of what can be done.”
Speaker recognition is also becoming an increasingly viable security and customization option for mobile devices because, when it runs on embedded systems, power consumption is quite low – almost negligible.
“If [recognition] is running on the applications processor at the OS level, [power consumption] can be significant,” Brafman said. “If it’s running deeply embedded […] it can run at only 2 milliamps, which is quite low.”
“Always-on and always-listening is definitely a possibility. You’ll be seeing a lot more of it.”
This is what makes the company’s most recent mobile integration – in the new Samsung Galaxy S4 – possible, Brafman said.
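To put the 2 milliamp figure Brafman cites in perspective, here is some back-of-the-envelope arithmetic in Python. The 2,600 mAh battery capacity is an assumed, illustrative number and not one given in this article, and the calculation ignores everything else drawing power on the phone.

```python
def hours_to_drain(capacity_mah, draw_ma):
    """Hours a battery would last if this draw were the only load."""
    return capacity_mah / draw_ma

def daily_share(capacity_mah, draw_ma, hours=24):
    """Fraction of a full charge consumed by this draw over the given hours."""
    return (draw_ma * hours) / capacity_mah

ASSUMED_CAPACITY_MAH = 2600  # hypothetical smartphone battery, for illustration only
ALWAYS_ON_DRAW_MA = 2        # the deeply embedded figure quoted above

lifetime_hours = hours_to_drain(ASSUMED_CAPACITY_MAH, ALWAYS_ON_DRAW_MA)
share_per_day = daily_share(ASSUMED_CAPACITY_MAH, ALWAYS_ON_DRAW_MA)
print(f"{lifetime_hours:.0f} hours to drain, {share_per_day:.1%} of a charge per day")
```

Under those assumptions, an always-listening recognizer would use up only a small slice of a charge each day, which is the point Brafman is making about running deeply embedded.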
All of this said, and despite some of the hurdles facing voice biometrics and speaker recognition today, it’s safe to say we’ve come a long way since Julie.