Biometric speaker recognition expanding along with voice applications
Speaker recognition is a biometric modality that uses input from an individual’s voice to identify that person or verify their identity claims. Since research in the 1960s identified distinct physiological traits of the vocal tract and behavioral characteristics in each person’s speech, speaker recognition has caught on for its ease of use and the widespread availability of capture devices, such as phones and computer microphones.
As each person speaks, their vocal tract produces sound in conjunction with the jaw, tongue, larynx and nasal passage to create a distinctive acoustic pattern that is influenced by accent, rhythm, intonation and even the choice of vocabulary.
The characteristics of the voice are captured for enrollment and matching through either a text-dependent or a text-independent method. A text-dependent method requires the speaker to recite a specific password or phrase, typically one that is common to all users. A text-independent method has no fixed phrasing and can be used to enroll users covertly, without their knowledge or when they are uncooperative. Text-dependent systems tend to perform matching more quickly, while text-independent systems are seen as more user-friendly.
Once the voice is captured, voice recognition systems transform each individual’s unique speaking pattern into data using a variety of approaches, including Hidden Markov Models, Gaussian Mixture Models, and pattern matching algorithms, among others. The data forms a unique profile for the individual that is compared to future inputs for verification (examining whether the person is who they say they are, based on previous samples) or identification (determining a person’s identity from a large database of voices).
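The difference between a one-to-one check against a claimed identity and a one-to-many search across enrolled profiles can be illustrated with a minimal sketch. It assumes each voice has already been reduced to a short numeric feature vector; the vectors, the names `verify` and `identify`, the cosine-similarity score and the 0.8 threshold are all hypothetical choices for illustration, not the method of any particular system, and a real system would derive its profiles from models such as those named above.

```python
import math


def cosine_similarity(a, b):
    """Similarity between two equal-length feature vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def verify(sample, claimed_profile, threshold=0.8):
    """One-to-one check: does the sample match the profile the speaker claims?"""
    return cosine_similarity(sample, claimed_profile) >= threshold


def identify(sample, profiles, threshold=0.8):
    """One-to-many search: return the best-matching enrolled name, or None."""
    best_name, best_score = None, threshold
    for name, profile in profiles.items():
        score = cosine_similarity(sample, profile)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Hypothetical enrolled profiles and a new input sample (illustrative values only).
profiles = {"alice": [0.9, 0.1, 0.3], "bob": [0.1, 0.8, 0.5]}
sample = [0.85, 0.15, 0.35]

print(verify(sample, profiles["alice"]))  # claim "I am alice"
print(verify(sample, profiles["bob"]))    # claim "I am bob"
print(identify(sample, profiles))         # no claim: search every profile
```

The threshold trades false accepts against false rejects, so in practice it would be tuned on real enrollment data rather than fixed in advance.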
Voice recognition applications are commonly found in phone-based customer service, particularly for financial services, logical access control for call center staff, criminal forensics, and Internet of Things (IoT) applications, including connected car systems.
Voice biometrics are often combined with similar but separate technologies like speech recognition, such as in meeting transcription software.
A notable risk of speaker recognition is that a voice sample can be distorted by background noise, a weak microphone or a poor signal during the enrollment stage. A degraded enrollment sample yields a less reliable profile, to the detriment of future recognition.
Alternatively, a person’s voice can be recorded, or stolen from a database, and replayed to spoof the system, and some highly sophisticated attacks can even replicate or construct a voice sample with software that imitates a person’s vocal patterns. Challenge-based security measures against spoofing have been developed, such as randomizing a passphrase for every entry or creating a vocal sample with a text-independent method. Voice liveness or presentation attack detection algorithms have also been developed to secure voice interactions against imposters.
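The idea of randomizing a passphrase for every entry can be sketched as follows. This is a minimal illustration under stated assumptions: the word list, `make_challenge` and `transcript_matches` are hypothetical, and the transcript is assumed to come from a separate speech recognition step that is not shown. The sketch covers only the challenge itself; the spoken audio would still need to be matched against the speaker’s enrolled profile.

```python
import secrets

# Hypothetical word list; a real deployment would use a much larger vocabulary.
WORDS = ["river", "candle", "orange", "window", "planet", "silver", "garden", "thunder"]


def make_challenge(num_words=4):
    """Build a fresh random passphrase so that an old recording cannot be replayed."""
    return " ".join(secrets.choice(WORDS) for _ in range(num_words))


def transcript_matches(challenge, transcript):
    """Compare the prompted phrase with what was heard, ignoring case and extra spaces."""
    def normalize(text):
        return " ".join(text.lower().split())
    return normalize(challenge) == normalize(transcript)


challenge = make_challenge()
print("Please say:", challenge)
# Assumption: `heard` stands in for the output of a speech recognizer.
heard = challenge.upper()
print(transcript_matches(challenge, heard))
```

Because the phrase changes on every attempt, a recording captured during one session is unlikely to contain the words requested in the next.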
The modality was forecast in early 2022 to make up a $20.9 billion market by 2026.