Gender equality in speech recognition inherently challenging

Mar 14, 2018, 7:17 pm EDT | Chris Burt

Voice recognition technology is less accurate for women than for men, partly because of how speech systems are designed and partly because of inherent physiological differences, according to a blog post by Delip Rao, co-founder of AI speech recognition startup R7 Speech Sciences.

Differing error rates for speech samples from male and female speakers make it difficult to train AI systems to recognize both equally well, Rao writes, and the problem is often exacerbated by commonly used technologies such as MFCCs (Mel-frequency cepstral coefficients).
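For readers unfamiliar with MFCCs, the following is a minimal sketch of the textbook pipeline (windowed frames, power spectrum, mel-spaced triangular filters, log, discrete cosine transform). It is not taken from Rao's post or from any production system; the function names and the parameter values (512-point FFT, 26 filters, 13 coefficients) are assumptions chosen for illustration, and the example only shows how the features are computed, not the error-rate gap itself.

```python
import numpy as np

def hz_to_mel(hz):
    # Common mel-scale formula; other variants exist.
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose centers are evenly spaced on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):      # rising edge
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank

def mfcc(signal, sample_rate, n_fft=512, hop=160, n_filters=26, n_coeffs=13):
    """Return an array of shape (n_frames, n_coeffs). Illustrative parameters only."""
    window = np.hamming(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop: i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    # Small constant avoids log(0) for silent bands.
    log_energy = np.log(power @ mel_filterbank(n_filters, n_fft, sample_rate).T + 1e-10)
    # Unnormalized DCT-II basis, written out to avoid extra dependencies.
    basis = np.cos(np.pi * np.outer(np.arange(n_coeffs),
                                    np.arange(n_filters) + 0.5) / n_filters)
    return log_energy @ basis.T

# Usage: one second of a 200 Hz tone sampled at 16 kHz.
t = np.arange(16000) / 16000
features = mfcc(np.sin(2 * np.pi * 200.0 * t), 16000)
```

Each row of `features` summarizes the spectral shape of one short frame of audio, which is the representation an acoustic model would then consume.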

Mean fundamental frequency, or mean F0, which is related to the perception of pitch, is usually around 120 Hz for men and closer to 200 Hz for women, and can also depend on ethnicity, smoking, sickness, and other factors. Rao also notes that the notion of gender in mean F0 is limited to biological gender at puberty.
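To make the F0 figures concrete, here is a rough sketch of one common way to estimate fundamental frequency: finding the lag at which a signal best correlates with a shifted copy of itself. The helper names, the 60 to 400 Hz search range, and the synthetic test tones are assumptions for illustration; real speech is far noisier, and production pitch trackers are considerably more elaborate.

```python
import numpy as np

def estimate_f0(signal, sample_rate, fmin=60.0, fmax=400.0):
    """Estimate fundamental frequency in Hz from the autocorrelation peak."""
    signal = signal - np.mean(signal)
    # Full autocorrelation; keep only the non-negative lags.
    acf = np.correlate(signal, signal, mode="full")[len(signal) - 1:]
    # Restrict the search to lags that correspond to plausible voice pitch.
    lag_min = int(sample_rate / fmax)
    lag_max = int(sample_rate / fmin)
    best_lag = lag_min + np.argmax(acf[lag_min:lag_max + 1])
    return sample_rate / best_lag

def synthetic_voice(f0, sample_rate=16000, duration=0.1, harmonics=5):
    """Hypothetical stand-in for voiced speech: a few decaying harmonics of f0."""
    t = np.arange(int(sample_rate * duration)) / sample_rate
    return sum(np.sin(2 * np.pi * f0 * k * t) / k for k in range(1, harmonics + 1))

# Usage: tones at the typical mean F0 values cited in the article.
low = estimate_f0(synthetic_voice(120.0), 16000)
high = estimate_f0(synthetic_voice(200.0), 16000)
```

Because the estimate is limited to whole-sample lags, the result for the 120 Hz tone lands close to, but not exactly on, the true value.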

“Speech systems designed without mindfulness to the extent of this problem can make an already hard problem worse,” he writes. “Fortunately, with recent deep models for speech, we can build models that directly learn from raw waveforms, throw a lot of data and compute at it, and hope the models have enough capacity to reliably encode class-specific variation. This is appealing but also sort of favors large companies than smaller startups that push out new technologies all the time. But with sufficient thought, many of these over-provisioned deep models may be replaced with simpler deep models.”

Kaggle Data Preparation Analyst Rachael Tatman told The Register that while MFCCs are not inherently less effective for modeling women’s speech, “there’s a slightly less robust acoustic signal for women, it’s more easily masked by noise, like a fan or traffic in the background, which makes it harder for speech recognition systems. That will affect whatever you use for your acoustic modelling, which is what MFCCs are used for.”

Rao suggests that with the increasing popularity of voice-activated digital assistants like Apple’s Siri, the opinions of women speech researchers should be sought about the speech models in production, and how to improve them.

Facial recognition systems have been shown to perform less accurately both for women and for darker-skinned people, leading to consideration of the problem by a congressional subcommittee seeking to guide government application of AI.

Article Topics

artificial intelligence | research and development | voice recognition
