New facial biometric bias study from Buolamwini fans flames of controversy
The accuracy of facial recognition algorithms in recognizing women with darker skin has improved significantly in recent months, but several leading technologies still misclassify their gender in 20 percent or more of cases, according to research published by MIT researcher Joy Buolamwini and reported by the New York Times.
Buolamwini tested facial recognition technologies from several providers with a dataset she created, called the Pilot Parliaments Benchmark, which is nearly evenly balanced between men and women, and between darker and lighter skin tones. The new study found significant improvement compared to the results of her previous test, published in early 2018, and also added benchmarks for facial biometric technology from Amazon and Kairos.
Buolamwini also presented some of the research at the World Economic Forum’s annual meeting in Davos, discussing how “the coded gaze” (a reference to “the male gaze” identified in film criticism) transmits biases from programmers through data sets and into artificial intelligence systems.
The initial round of tests with the new data set yielded 87.9 percent accuracy for IBM’s facial recognition technology, 90 percent for Megvii’s Face++, and 93.7 percent for Microsoft, and Buolamwini notes “the numbers seem ok.” When analyzed by gender, skin type, and both variables together, however, the systems were significantly less accurate for certain groups. Females with darker skin were accurately matched in less than 80 percent of cases by Microsoft, while Face++, which performed slightly better than Microsoft for males with dark skin, scored only 65.5 percent accuracy for dark-skinned females. IBM’s technology matched females with dark skin at only a 65.3 percent success rate. When asked to classify subjects by gender, multiple providers’ systems produced error rates above 40 percent for the women with the darkest skin.
Buolamwini informed the companies of the results, and some significant improvements followed, including IBM’s accuracy in identifying women with dark skin jumping to 83.5 percent. Buolamwini pointed out that the improvements debunk any suggestion that the core of the problem is a matter of physics.
Amazon’s Rekognition misidentified the gender of 7.5 percent of subjects in the data set overall, including more than 16 percent of women and more than 13 percent of people with darker skin. Females with darker skin were misclassified 31 percent of the time by Amazon, and 22.5 percent of the time by Kairos. The Times reports that Kairos CEO Melissa Doval, who took over the company amid a dispute that may be significantly related to Buolamwini’s concerns, said the company released a new algorithm in October, inspired by the research.
In a letter (PDF) to Amazon CEO Jeff Bezos, Buolamwini called on Amazon to stop providing its facial biometric products to law enforcement. The test did not, however, use the latest version of Rekognition, Amazon’s facial recognition service currently being tested by Orlando police, and it evaluated the service for facial analysis, rather than facial recognition. The company did not respond directly, and the technology’s accuracy had not improved when it was retested a couple of months later.
“It’s not possible to draw a conclusion on the accuracy of facial recognition for any use case – including law enforcement – based on results obtained using facial analysis,” Amazon Web Services General Manager of Artificial Intelligence Dr. Matt Wood said in an emailed statement to Biometric Update. “The results in the paper also do not use the latest version of Rekognition and do not represent how a customer would use the service today. Using an up-to-date version of Amazon Rekognition with similar data downloaded from parliamentary websites and the Megaface dataset of 1M images, we found exactly zero false positive matches with the recommended 99% confidence threshold.”
“We continue to seek input and feedback to constantly improve this technology, and support the creation of third party evaluations, datasets, and benchmarks,” added Dr. Wood. “We have provided funding for academic research in this area, have made significant investment on our own teams, and will continue to do so. Many of these efforts have focused on improving facial recognition, facial analysis, the importance of high confidence levels in interpreting these results, the role of manual review, and standardized testing. Improvements made in these areas are routinely made available to customers using Amazon Rekognition; most recently through a significant update to the service in November 2018. Accuracy, bias, and the appropriate use of confidence levels are areas of focus for AWS, and we’re grateful to customers and academics who contribute to improving these technologies.”
“Technology like Amazon’s Rekognition should be used if and only if it is imbued with American values like the right to privacy and equal protection,” Massachusetts Democratic Senator Edward J. Markey told the Times. “I do not think that standard is currently being met.”
Buolamwini is also the founder of the Algorithmic Justice League, which in December launched the Safe Face Pledge, a campaign calling on companies to commit to ethical standards for facial recognition use.