Alexa can’t understand what kids today are saying, either
Much work needs to be done if biometric systems are to accurately recognize the voices of children, according to Clarkson University research.
Although adult voices vary from year to year and even from hour to hour, their acoustic properties do not change as fundamentally as children's voices do as they grow from school age into adolescence.
Clarkson researchers spent 2.5 years evaluating the speaker verification performance of available biometric software and hardware. Their study analyzed speaker verification over six sessions involving 30 children aged four to 14.
The results were noteworthy but modest. The scientists report that the MFCC 20 verification system (MFCC stands for mel-frequency cepstral coefficients) combined with a GMM (Gaussian mixture model) algorithm delivered the best results. Even that best performance, however, falls short of what is expected of biometric recognition, according to the paper.
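For readers curious how a GMM-based verifier reaches a yes-or-no decision, here is a minimal sketch. It is not the Clarkson team's code. It assumes that "MFCC 20" means 20 cepstral coefficients per audio frame, and it skips MFCC extraction entirely: random vectors stand in for those frames. The claimed speaker's model is compared against a background model using a log-likelihood ratio, which is one common GMM verification scheme and an assumption here. The names `DiagGMM`, `verification_score`, `spk_a` through `spk_d` and `THRESHOLD` are all hypothetical.

```python
import math
import random


def log_gauss_diag(x, mean, var):
    """Log-density of x under a Gaussian with diagonal covariance."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def logsumexp(values):
    """Numerically stable log(sum(exp(values)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


class DiagGMM:
    """Gaussian mixture model with diagonal covariances, fit by EM (hypothetical helper)."""

    def __init__(self, n_components=4, n_iter=10, var_floor=1e-3, seed=0):
        self.k = n_components
        self.n_iter = n_iter
        self.var_floor = var_floor
        self.seed = seed

    def fit(self, frames):
        rng = random.Random(self.seed)
        n, dim = len(frames), len(frames[0])
        # Initialise means from random frames and variances from the global variance.
        self.means = [list(f) for f in rng.sample(frames, self.k)]
        mu = [sum(f[d] for f in frames) / n for d in range(dim)]
        gvar = [max(sum((f[d] - mu[d]) ** 2 for f in frames) / n, self.var_floor)
                for d in range(dim)]
        self.vars = [list(gvar) for _ in range(self.k)]
        self.weights = [1.0 / self.k] * self.k
        for _ in range(self.n_iter):
            # E-step: posterior probability of each component for each frame.
            resp = []
            for f in frames:
                logs = [math.log(w) + log_gauss_diag(f, m, v)
                        for w, m, v in zip(self.weights, self.means, self.vars)]
                total = logsumexp(logs)
                resp.append([math.exp(l - total) for l in logs])
            # M-step: re-estimate weights, means and floored variances.
            for j in range(self.k):
                nj = sum(r[j] for r in resp) + 1e-10
                self.weights[j] = nj / n
                mean = [sum(r[j] * f[d] for r, f in zip(resp, frames)) / nj
                        for d in range(dim)]
                self.means[j] = mean
                self.vars[j] = [
                    max(sum(r[j] * (f[d] - mean[d]) ** 2 for r, f in zip(resp, frames)) / nj,
                        self.var_floor)
                    for d in range(dim)
                ]
        return self

    def avg_log_likelihood(self, frames):
        """Mean per-frame log-likelihood of an utterance under this model."""
        return sum(
            logsumexp([math.log(w) + log_gauss_diag(f, m, v)
                       for w, m, v in zip(self.weights, self.means, self.vars)])
            for f in frames
        ) / len(frames)


def verification_score(speaker_model, background_model, frames):
    """Log-likelihood ratio: a positive score favours the claimed speaker."""
    return (speaker_model.avg_log_likelihood(frames)
            - background_model.avg_log_likelihood(frames))


# Synthetic stand-ins for 20-dimensional MFCC frames (made-up data, not from the study).
DIM = 20
rng = random.Random(42)
centers = {name: [rng.gauss(0.0, 2.0) for _ in range(DIM)]
           for name in ("spk_a", "spk_b", "spk_c", "spk_d")}


def frames_for(name, n):
    return [[c + rng.gauss(0.0, 1.0) for c in centers[name]] for _ in range(n)]


enrolled = DiagGMM().fit(frames_for("spk_a", 150))
background = DiagGMM(seed=1).fit(
    frames_for("spk_b", 150) + frames_for("spk_c", 150) + frames_for("spk_d", 150))

genuine = verification_score(enrolled, background, frames_for("spk_a", 50))
impostor = verification_score(enrolled, background, frames_for("spk_b", 50))
THRESHOLD = 0.0  # hypothetical decision threshold
print(f"genuine={genuine:.1f} accept={genuine > THRESHOLD}")
print(f"impostor={impostor:.1f} accept={impostor > THRESHOLD}")
```

On clean, well-separated synthetic data like this, the genuine attempt scores well above the threshold and the impostor well below it. The study's point is that real children's voices are far less cooperative: as a child's voice changes between sessions, the enrolled model drifts away from the speaker it was built for.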
The bigger picture is that the published research is thin for a technology that is changing how people, and particularly children, interact with their world.
Virtually all voice biometrics research and development has been conducted on adults. The result has been Siri, Alexa and other digital assistants that can identify speakers by their voices.
Speech recognition work with children’s voices is very complex, and it is also fraught with concerns over privacy, consent and the appearance of undue commercial profit from data collected from minors.
It will be difficult to know which voice verification apps, if any, are appropriate for children unless more research is done, Clarkson professor and researcher Stephanie Schuckers pointed out in a December Biometric Update interview.
The Clarkson paper notes that voice research on children has focused on physiological changes and on gender and speech recognition, not speaker recognition. The way children’s voices change adds to other challenges for recognition, including background and channel noise, low-quality microphones, illness and vocal stress.
Deep learning speaker recognition systems used in the study achieved “strong speaker recognition performance,” outperforming classifiers that require “hand-crafted features,” according to the study.
But the algorithms have practical drawbacks for commercial use. The researchers found that they are more complex, demand massive amounts of labeled data and carry high computation and storage costs.