Facial biometrics study suggests Amazon performance unchanged as Microsoft launches antibias tool
Comparitech has released a report on demographic bias in Amazon’s facial recognition offering, similar to previous ACLU reports that searched photos of congressional representatives against a mugshot database and reported on the potential matches the system offered up. The report comes just days after Microsoft announced it will bring an antibias tool for machine learning algorithms to Azure.
The report shows demographic differences in Amazon Rekognition’s performance have not changed much, according to Comparitech, as 32 Congresspersons out of 530 (6 percent) were matched with mug shots when the confidence threshold was set at 80 percent. Out of 1,429 UK lawmakers searched against the same database, 73 were matched (5 percent). Perhaps more significantly, of 12 false matches at a confidence threshold of 90 percent, six were people of color, even though people of color make up roughly a fifth of Congress and half that fraction in UK parliament.
The “confidence threshold” setting was one of the key reasons Amazon dismissed the original ACLU report, and Amazon now recommends a confidence threshold of 99 percent be used by law enforcement customers, though it is up to the individual agency to set its threshold. In a blog post responding to the original ACLU report, Amazon ran the same experiment with the confidence threshold at 99 percent and a larger reference dataset to match against, and found no false matches.
As Comparitech notes, the company had previously provided a confidence threshold recommendation of 95 percent for law enforcement, and some police forces decline to use the setting at all.
With the confidence threshold set at 95 percent, however, Comparitech found no false matches, which seems to make clear that law enforcement agencies should use confidence thresholds of 95 percent or higher, rather than 80 percent, if they are going to use the AI service.
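The role the threshold plays can be sketched in a few lines of Python. The scores, names, and `filter_matches` helper below are hypothetical illustrations, not Rekognition's actual API or the Comparitech data:

```python
# Hypothetical candidate matches (identifier, similarity score) returned
# by a face search; the scores are invented for illustration.
matches = [
    ("A", 99.2), ("B", 96.5), ("C", 91.0), ("D", 84.3), ("E", 80.7),
]

def filter_matches(results, threshold):
    """Keep only candidates at or above the confidence threshold."""
    return [(name, score) for name, score in results if score >= threshold]

print(len(filter_matches(matches, 80)))  # 5 candidates survive
print(len(filter_matches(matches, 95)))  # 2 candidates survive
print(len(filter_matches(matches, 99)))  # 1 candidate survives
```

The point is that the same underlying search produces very different numbers of potential false matches depending solely on where the operator sets the cutoff, which is why the threshold setting was central to Amazon's response to the ACLU.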
Accuracy was the other point of contention, particularly after a University of Essex study found that 81 percent of the system’s positive matches were false positives. This finding is often presented as an 81 percent overall error rate, though the vast majority of attempted matches were non-matches, none of which are known to be false.
Comparitech claims that including true negatives inflates the accuracy of the system, but declines to explain why they should be excluded.
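The disagreement comes down to which cells of the confusion matrix are counted. A worked example (the counts below are invented for illustration, chosen so that roughly 81 percent of alerts are wrong, echoing the headline figure):

```python
# Hypothetical confusion-matrix counts for a face-recognition deployment.
true_positives = 8       # correct alerts
false_positives = 34     # wrong alerts (~81% of the 42 alerts raised)
false_negatives = 2      # persons of interest the system missed
true_negatives = 9956    # passers-by correctly not flagged

alerts = true_positives + false_positives
total = true_positives + false_positives + false_negatives + true_negatives

# Share of raised alerts that are wrong (the "81 percent" framing).
false_discovery_rate = false_positives / alerts

# Share of all decisions that are right (includes true negatives).
accuracy = (true_positives + true_negatives) / total

print(round(false_discovery_rate, 2))  # 0.81
print(round(accuracy, 3))              # 0.996
```

Both numbers are computed correctly from the same data; they simply answer different questions. Critics argue the second figure is misleading because correctly ignoring thousands of passers-by is trivial, while defenders argue the first figure ignores how rarely the system errs overall.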
Industry stakeholders have repeatedly attempted to explain how accuracy in biometric matching is properly measured, as Allevate did late last year in a refutation of a University of Essex report.
Comparitech concludes, as Amazon does in its blog post, that facial recognition technology is not ready for use in public for identification without human oversight.
Microsoft, meanwhile, is bringing a tool to help reduce bias in machine learning to its Azure AI platform. The new Fairlearn toolkit will launch on Azure Machine Learning in June, the company announced during its virtual Build developers conference.
It was introduced at the Ignite event in November, and has been tested by EY on a system for automating loan decisions. The firm had found a 15.3 percent difference in its loan approvals between men and women using an algorithm based on information like transaction, payment, and credit history. After training new machine learning models with Fairlearn, EY found the disparity was reduced to 0.43 percent.
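The disparity EY measured is an approval-rate gap between demographic groups, the kind of quantity fairness toolkits report as a demographic parity difference. A minimal sketch of the metric itself, with invented data (this is not Fairlearn's own code or EY's model):

```python
# Hypothetical loan decisions (1 = approved, 0 = denied) by group.
decisions = {
    "men":   [1, 1, 0, 1, 1, 0, 1, 1, 1, 0],   # 70% approved
    "women": [1, 0, 0, 1, 0, 1, 0, 0, 1, 0],   # 40% approved
}

def demographic_parity_difference(groups):
    """Largest gap in approval rate between any two groups."""
    rates = [sum(d) / len(d) for d in groups.values()]
    return max(rates) - min(rates)

gap = demographic_parity_difference(decisions)
print(round(gap, 2))  # 0.3 -- a 30-point approval-rate gap
```

Shrinking this gap is the goal of the mitigation step: EY's reported move from a 15.3 percent to a 0.43 percent disparity corresponds to driving this value from 0.153 down to 0.0043 after retraining with Fairlearn.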
“Increasingly we’re seeing regulators looking closely at these models,” said Eric Boyd, Microsoft CVP of Azure AI, in a statement. “Being able to document and demonstrate that they followed the leading practices and have worked very hard to improve the fairness of the datasets are essential to being able to continue to operate.”
The company notes the importance of ensuring bias is not automated as machine learning is applied to applications like facial recognition for law enforcement.
NIST testing has found that demographic differentials are present in the majority of facial recognition algorithms, but not consistently enough to generalize across the biometric modality.