Training dataset tower of babel collected for voice AI development

A Chinese AI data services vendor claims to have built speech training datasets in at least 30 languages, a resource that would make rolling out a multilanguage voice biometrics product more efficient.

Datatang executives say their speech recognition datasets are created with native speakers and surpass data quality standards. The company says it gathered signed authorization agreements to collect the data.

Failure to obtain consent from subjects for inclusion in datasets used to train biometrics and other algorithms has long been seen as a point of ethical failure within the AI community.

Among the languages covered are German, Spanish, Korean, French, Hindi and Japanese.

The Japanese set holds just under 1,000 hours of spoken language useful for in-vehicle and smart home devices.

The Spanish set holds 3,000 hours spoken by natives of Spain, Mexico, Colombia, Venezuela and other nations. It is also pitched at vehicle and home use.

The Korean dataset, with about 2,000 hours, on the other hand, has speech relevant to economics, news and entertainment.

Last fall, Microsoft and Nvidia said they had trained the Megatron-Turing Natural Language Generation system, which performs speech recognition tasks including natural language inferences.


