Gender equality in speech recognition inherently challenging

Voice recognition technology is less accurate for women than for men, due in part to the design of speech systems but also to inherent physiological differences, according to a blog post by Delip Rao, co-founder of AI speech recognition startup R7 Speech Sciences.

The different error rates for speech samples from male and female speakers make it difficult to train AI systems to recognize both equally well, Rao writes, and the problem is often exacerbated by commonly used technologies such as MFCCs (Mel-frequency cepstral coefficients).

Mean fundamental frequency, or mean F0, which is related to the perception of pitch, is usually around 120 Hz for men and closer to 200 Hz for women. It can also depend on ethnicity, smoking, sickness, and other factors. Rao also notes that the notion of gender in mean F0 is limited to biological gender at puberty.
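To make the F0 figures concrete, here is a minimal sketch of how a fundamental frequency can be estimated from a short audio frame using plain autocorrelation. It is an illustration only, not the method used by Rao or by any production speech system. The function names `estimate_f0` and `synth_tone`, the 60 to 400 Hz search range, and the 8 kHz sample rate are all assumptions made for this example, and the inputs are synthetic sine tones rather than real speech.

```python
import math


def estimate_f0(samples, sample_rate, min_f0=60.0, max_f0=400.0):
    """Estimate the fundamental frequency (Hz) of one frame by autocorrelation.

    Hypothetical helper: searches lags corresponding to min_f0..max_f0 and
    returns the frequency whose lag gives the strongest self-similarity.
    """
    lag_min = int(sample_rate / max_f0)  # shortest period considered
    lag_max = int(sample_rate / min_f0)  # longest period considered
    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Correlate the frame with a copy of itself shifted by `lag` samples.
        score = sum(samples[n] * samples[n + lag]
                    for n in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


def synth_tone(f0, sample_rate=8000, duration=0.1):
    """Generate a pure sine tone as a stand-in for a voiced speech frame."""
    count = round(sample_rate * duration)
    return [math.sin(2 * math.pi * f0 * i / sample_rate) for i in range(count)]


if __name__ == "__main__":
    for label, f0 in (("typical male mean F0", 120.0),
                      ("typical female mean F0", 200.0)):
        estimate = estimate_f0(synth_tone(f0), 8000)
        print(f"{label}: true {f0:.0f} Hz, estimated {estimate:.1f} Hz")
```

Because the lag is a whole number of samples, the estimate is quantized: at 8 kHz the 120 Hz tone can only be resolved to roughly the nearest 2 Hz. Real speech is noisier and richer in harmonics than these tones, so practical pitch trackers add windowing, normalization, and voicing detection on top of this basic idea.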

“Speech systems designed without mindfulness to the extent of this problem can make an already hard problem worse,” he writes. “Fortunately, with recent deep models for speech, we can build models that directly learn from raw waveforms, throw a lot of data and compute at it, and hope the models have enough capacity to reliably encode class-specific variation. This is appealing but also sort of favors large companies than smaller startups that push out new technologies all the time. But with sufficient thought, many of these over-provisioned deep models may be replaced with simpler deep models.”

Kaggle Data Preparation Analyst Rachael Tatman told The Register that while MFCCs are not inherently less effective for modeling women’s speech, “there’s a slightly less robust acoustic signal for women, it’s more easily masked by noise, like a fan or traffic in the background, which makes it harder for speech recognition systems. That will affect whatever you use for your acoustic modelling, which is what MFCCs are used for.”

Rao suggests that with the increasing popularity of voice-activated digital assistants like Apple’s Siri, the opinions of women speech researchers should be sought about the speech models in production, and how to improve them.

Facial recognition systems have also been shown to perform less accurately for both women and darker-skinned people, leading to consideration of the problem by a congressional subcommittee seeking to guide government application of AI.
