Training dataset tower of babel collected for voice AI development

A Chinese AI data services vendor claims to have built speech training datasets in at least 30 languages, a task that would make rolling out a multilanguage voice biometrics product more efficient.

Datatang executives say their speech recognition datasets are created with native language speakers and that they surpass data quality standards. The company says it gathered signed authorization agreements to collect the data.

Failure to obtain consent from subjects for inclusion in datasets used to train biometrics and other algorithms has long been seen as a point of ethical failure within the AI community.

Among the languages covered are German, Spanish, Korean, French, Hindi and Japanese.

The Japanese set holds just under 1,000 hours of spoken language useful for in-vehicle and smart home devices.

The Spanish set holds 3,000 hours spoken by natives of Spain, Mexico, Colombia, Venezuela and other nations. It also is pitched at vehicle and home use.

The Korean dataset, by contrast, holds about 2,000 hours of speech relevant to economics, news and entertainment.

Last fall, Microsoft and Nvidia said they had trained the Megatron-Turing Natural Language Generation system, which performs language tasks including natural language inference.
