Researchers find major demographic differences in speech recognition accuracy
Research indicates that speech recognition technology from the world's leading consumer technology brands performs with markedly different accuracy for different demographic groups, leading some to call the technology "biased" against black people.
A team of academics from Stanford University tested automated speech recognition (ASR) systems from Amazon, Apple, Google, IBM, and Microsoft for the paper "Racial disparities in automated speech recognition," published in the Proceedings of the National Academy of Sciences. They found that the systems misidentified roughly 19 percent of words uttered by white speakers, but had an average word error rate (WER) of 35 percent for black speakers. Audio snippets from white speakers were considered incomprehensible 2 percent of the time, while for black speakers the systems could not produce a usable transcription for 20 percent of snippets.
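WER, the metric at the center of the study, is conventionally computed as the word-level edit distance between a reference transcript and the system's output, divided by the number of reference words. A minimal sketch (assuming simple whitespace tokenization, not the paper's exact scoring pipeline) might look like this:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming edit distance over words
    # (substitutions, insertions, deletions each cost 1).
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One dropped word out of a six-word reference gives a WER of about 0.167.
print(wer("the cat sat on the mat", "the cat sat on mat"))
```

A WER of 0.35, as measured for black speakers, thus means roughly one in three reference words required a correction.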
To analyze WER across linguistic groups, the researchers drew on the Corpus of Regional African American Language (CORAAL), compiled in three U.S. communities, and on samples from the Voices of California (VOC) dataset. Human experts transcribed interview snippets 5 to 50 seconds long, and their transcriptions were compared with the output of the above-mentioned tech giants' machine-learning systems.
The researchers propose increasing the diversity of training datasets, and including African American Vernacular English, to reduce performance differences.
Apple had the highest error rates on both datasets, with a WER gap between black and white speakers of more than 20 percentage points. Google and Microsoft had the smallest gaps, though both were still over 10 points. Amazon's WER for black speakers matched Google's, but its system was slightly more accurate for white speakers. Microsoft's was the only system with a WER below 30 percent for black speakers.
The findings also offer some geographic insight: speech collected from black speakers in a rural community (Princeville, North Carolina) and a heavily urban one (Washington, D.C.) had higher error rates than speech collected in Rochester, New York.
The researchers explored two possible explanations for the disparity: a shortfall in the lexicon and grammar of the language models, such as black speakers using words absent from the ASR systems' vocabularies, and a performance gap in the systems' acoustic models.
The lexical explanation largely fell short, however: words spoken by white and black speakers were found in the vocabulary of Google's ASR 98.6 percent and 98.7 percent of the time, respectively. And when phrases with identical text were analyzed, the ASR technology still made more errors on samples spoken by black speakers, indicating that differences in pronunciation and prosody, such as rhythm, pitch, syllable stress, vowel duration, and lenition, may be behind the performance gap.
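A vocabulary-coverage check of the kind described above can be sketched as follows; the toy lexicon and transcript here are hypothetical illustrations, not data from the study:

```python
def in_vocabulary_rate(transcript: str, lexicon: set) -> float:
    """Fraction of transcript words present in an ASR system's lexicon."""
    words = transcript.lower().split()
    if not words:
        return 0.0
    return sum(w in lexicon for w in words) / len(words)

# Hypothetical toy lexicon and utterance for illustration only.
toy_lexicon = {"i'm", "finna", "go", "to", "the", "store"}
print(in_vocabulary_rate("I'm finna go to the store", toy_lexicon))  # 1.0
```

A near-identical coverage rate for both groups, as the researchers found for Google's system, rules out missing vocabulary as the main driver and points instead at the acoustic model.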
Bias has been a significant issue in facial biometrics as well, where NIST testing has shown that demographic differences in accuracy vary widely from vendor to vendor.
R7 Speech Sciences co-founder Delip Rao explained in a 2018 blog post that physiological differences between male and female voices make it difficult to train AI speech recognition systems to perform as accurately on speech from women as on speech from men.
Voice and speech recognition are expected to make up a $26.8 billion market by 2025.