New speech datasets, software target greater inclusion

The University of Illinois Urbana-Champaign (UIUC) has unveiled the Speech Accessibility Project, an initiative to make voice biometrics and speech analysis systems more inclusive of the diverse speech patterns of people with disabilities.

According to a blog post on the UIUC website, the project will be supported by tech giants Amazon, Apple, Google, Meta, and Microsoft, alongside various nonprofits.

The Speech Accessibility Project will focus on developing speech recognition and biometric systems capable of interpreting speech patterns associated with disabilities like Lou Gehrig’s disease (ALS), Parkinson’s disease, cerebral palsy, and Down syndrome.

To this end, the project will collect speech samples from paid volunteers representing a diversity of speech patterns.

The samples will then be compiled into a private, de-identified dataset that can be used to train machine learning models to understand various speech patterns better.

The Speech Accessibility Project will initially focus on American English. It will be led by Mark Hasegawa-Johnson, a UIUC professor of electrical and computer engineering, with the support of Heejin Kim, a research professor in linguistics, and Clarion Mendes, a clinical professor in speech and hearing science and a speech-language pathologist.

Several staff members from UIUC’s Beckman Institute for Advanced Science and Technology will also take part, as will the community-based organizations Davis Phinney Foundation and Team Gleason, which will assist with participant recruitment, user testing and feedback.

OpenAI releases multilingual speech recognition system

OpenAI has made its speech recognition software Whisper available as open source models and inference code.

Trained on 680,000 hours of multilingual and multitask supervised data collected from the web, Whisper “approaches human level robustness and accuracy” on English speech recognition, according to OpenAI.

“We show that the use of such a large and diverse dataset leads to improved robustness to accents, background noise and technical language,” the company wrote on a web page dedicated to Whisper.

“Moreover, it enables transcription in multiple languages, as well as translation from those languages into English.”

According to the company, other existing approaches frequently use smaller, more closely paired audio-text training datasets or broad but unsupervised audio pretraining.

“Because Whisper was trained on a large and diverse dataset and was not fine-tuned to any specific one, it does not beat models that specialize in LibriSpeech performance, a famously competitive benchmark in speech recognition,” OpenAI explains.

“However, when we measure Whisper’s zero-shot performance across many diverse data sets, we find it is much more robust and makes 50 percent fewer errors than those models.”
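The quote does not name the metric behind “50 percent fewer errors.” Speech recognition errors are commonly reported as word error rate (WER): the word-level edit distance between a reference transcript and the system output, divided by the number of reference words. The sketch below assumes WER is the metric in question. The function name and the sample transcripts are hypothetical, made up for illustration, and are not Whisper output.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: this is the standard WER definition, used here only to
    illustrate how an error-rate comparison could be computed.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (minus the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts, invented for this example.
reference = "the quick brown fox jumps over the lazy dog"
robust_output = "the quick brown fox jumped over a lazy dog"
baseline_output = "a quick brow fox jumped over a lazy dog"

robust_wer = word_error_rate(reference, robust_output)
baseline_wer = word_error_rate(reference, baseline_output)
relative_reduction = 1 - robust_wer / baseline_wer

print(f"robust WER:   {robust_wer:.3f}")
print(f"baseline WER: {baseline_wer:.3f}")
print(f"relative error reduction: {relative_reduction:.0%}")
```

In this toy case the first output has two word errors and the second has four against a nine-word reference, so the first makes half as many errors. A real comparison would average over many utterances and datasets.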

Additionally, the company said roughly a third of Whisper’s audio dataset is non-English. The program is either given the task of transcribing in the original language or translating to English.

“We find this approach is particularly effective at learning speech-to-text translation and outperforms the supervised SOTA on CoVoST2 to English translation zero-shot.”

Because the system was trained on such a large and diverse dataset, however, Whisper’s text predictions are not always accurate: the output sometimes includes words that were never spoken but that the model learned during training.

Just like any other AI system, the software also has limitations when it comes to speakers of languages that are not well-represented in the training data.

Despite these limitations, a recent analysis of Whisper by VentureBeat suggests the speech analysis software represents a potential ‘return to openness’ for OpenAI, after the company was harshly criticized by the community for not open-sourcing its GPT-3 and DALL-E models.

In particular, Whisper can be run on various devices, from laptops and desktop workstations to mobile devices and cloud servers. It comes in several model sizes that trade accuracy against speed, so users can pick one suited to the device it is running on.

The open source community already uses the voice tool, with journalist Peter Sterne and GitHub engineer Christina Warren recently unveiling a joint project aimed at creating a transcription app for journalists.
