
MIT AI training dataset pulled down for racist, sexist, vulgar labels as industry grapples with bias


A database used in training systems for tasks like facial biometrics and object recognition has been taken down by the Massachusetts Institute of Technology (MIT) after The Register reported it includes racist, misogynistic and vulgar images and labels.

The 80 Million Tiny Images training dataset was created in 2008 to help advance object detection technology, but contains images of women and of Black and Asian people labeled in derogatory language, as well as close-up pictures of sexual organs labeled with offensive slang terminology.

A paper on the dataset from startup UnifyID AI Labs Chief Scientist Vinay Prabhu and University College Dublin PhD candidate Abeba Birhane has been submitted to a computer vision conference for presentation next year. The researchers found that each of nine derogatory terms was used to label more than a thousand images. Training neural networks on such a database would build prejudice directly into the resulting systems, a different kind of bias than the demographic performance gaps more commonly discussed.

MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) Professor of Electrical Engineering and Computer Science Antonio Torralba told The Register that in retrospect the school should have manually screened the labels used. He apologized on behalf of the lab and said the dataset has been taken down so the troubling content can be removed.

The school noted that given the size of the database and its "tiny" images, which were sized to run on the computing resources available when it was created, manual inspection would not be feasible or effective at removing all the offensive images.

“We therefore have decided to formally withdraw the dataset. It has been taken offline and it will not be put back online. We ask the community to refrain from using it in future and also delete any existing copies of the dataset that may have been downloaded,” the statement reads.

The dataset was scraped from Google Images, with images divided into roughly 75,000 categories. Torralba said the scraping was performed by feeding more than 53,000 different nouns from WordNet into image searches. WordNet was built at Princeton's Cognitive Science Laboratory to map the relationships between words, not specifically for association with images.
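The construction approach described above, combined with the label screening Torralba says should have been done, can be sketched roughly as follows. This is an illustrative reconstruction, not MIT's actual pipeline; the function names and the blocklist are hypothetical.

```python
# Illustrative sketch (not MIT's code): take WordNet-style noun labels,
# screen them against a curated blocklist, then derive image-search queries.

BLOCKLIST = {"some_offensive_term"}  # hypothetical; a real list would be curated

def screen_labels(nouns, blocklist):
    """Drop any label that appears on the blocklist before scraping begins."""
    return [n for n in nouns if n.lower() not in blocklist]

def build_queries(nouns):
    """Turn underscored WordNet-style labels into search query strings."""
    return [n.replace("_", " ") for n in nouns]

labels = ["bicycle", "fire_truck", "some_offensive_term"]
safe = screen_labels(labels, BLOCKLIST)
print(build_queries(safe))  # ['bicycle', 'fire truck']
```

The point of the sketch is the ordering: screening happens on the label vocabulary before any images are fetched, which is far cheaper than inspecting 80 million downloaded images afterward.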

Even datasets purpose-built for training facial recognition systems have faced criticism for collecting images without consent, and even an IBM dataset created specifically to root out bias in AI has been targeted by litigation.

The debate over the role of imbalanced datasets in causing biased AI boiled over in a Twitter debate between Facebook Chief AI Scientist Yann LeCun and Google Ethical Artificial Intelligence Team Technical Co-Lead Timnit Gebru, summarized by Synced. The original point of contention was LeCun’s assertion that “ML systems are biased when data is biased,” to which Gebru responded that the problem extends beyond that to social and structural problems.

The University of Notre Dame has launched a new Tech Ethics Lab with IBM’s support to research issues like police use of facial recognition, The Washington Post reports.

IBM will invest $20 million over the next decade in the initiative, which seeks to apply ethics earlier in the development of new technologies.
