NIST plans for further biometrics bias evaluation unveiled in EAB webinar
Despite major gains in facial recognition matching accuracy across all demographic groups over the past several years, even measuring the extent of the problem of bias remains a challenge, a series of experts told the audience of the European Association for Biometrics’ (EAB’s) ‘Demographic Fairness in Biometric Systems’ virtual event.
In the mid-March session of the event, Margherita Natali of the United Nations Office of Counter-Terrorism spoke about the role of biometrics in border security management, the UN’s Compendium on Recommended Best Practices on the Use and Sharing of Biometrics, and the challenges to border systems, including facial recognition, emerging from COVID-19. She emphasized a human rights-based approach to the use of biometrics in border controls.
Aythami Morales presented a method for machine learning from privacy-preserving representations. He also showed the possibility of improving accuracy for all groups by implementing responsible AI practices that reduce bias.
Vincent Despiegel of Idemia discussed ways of mitigating bias, including balancing training databases, but also finding the correct loss function.
Patrick Grother of NIST spoke about the organization’s work on demographic differentials, the next report from which is expected in May 2021.
Summary data on demographic performance differences may be included in a NIST FRVT leaderboard tab in the future, Grother says. Using the example of an Idemia algorithm that reduced false rejections from 13 percent in 2017 to 0.4 percent in 2021, he illustrated the dramatic gains in facial recognition accuracy over the past four years.
Describing the genesis of NIST’s investigations into bias in face biometrics, Grother reviewed the beginning of the controversy with the ‘Perpetual Line-up’ and ‘Gender Shades’ studies, and the Institute’s approach to quantifying the problem.
“What we wanted to do is put some specificity in the metrics that are involved,” he says. “If somebody asserted that face recognition was biased, what did that mean? Was it something to do with false positives or false negatives or failure to enroll? Was it about one-to-one systems? One-to-many systems?”
The common methodology for such tests, comparing false non-match rates at a particular, policy-determined false match rate, is wrong, Grother told event attendees. Rather than reporting at a fixed false match rate, “we should report at a fixed threshold for an algorithm.” Differences in both false match rate and false non-match rate occur between demographic groups, but they are not reflected in statistics taken at a particular FMR.
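The distinction Grother draws can be sketched in a few lines: instead of tuning a separate operating point per group to hit a target FMR, apply one global threshold and report each group’s error rates at that threshold. The function and score distributions below are illustrative assumptions, not NIST code or real data.

```python
import numpy as np

def error_rates_at_threshold(genuine_scores, impostor_scores, threshold):
    """Return (FNMR, FMR) for one demographic group at a fixed threshold."""
    # FNMR: fraction of genuine (mated) comparisons rejected at the threshold
    fnmr = np.mean(np.asarray(genuine_scores) < threshold)
    # FMR: fraction of impostor (non-mated) comparisons accepted at the threshold
    fmr = np.mean(np.asarray(impostor_scores) >= threshold)
    return fnmr, fmr

# Toy similarity scores for two hypothetical demographic groups (not real data)
rng = np.random.default_rng(0)
groups = {
    "group_a": (rng.normal(0.8, 0.10, 10_000), rng.normal(0.2, 0.10, 10_000)),
    "group_b": (rng.normal(0.7, 0.15, 10_000), rng.normal(0.3, 0.15, 10_000)),
}

# One shared threshold for all groups, as Grother recommends reporting
tau = 0.55
for name, (genuine, impostor) in groups.items():
    fnmr, fmr = error_rates_at_threshold(genuine, impostor, tau)
    print(f"{name}: FNMR={fnmr:.4f}  FMR={fmr:.6f}")
```

Run this way, any gap between the two groups’ FMR and FNMR values becomes visible, whereas per-group thresholds calibrated to a common FMR would hide the false-match disparity by construction.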
The threshold-setting procedure for face biometrics appears to be inherited from fingerprint-matching practice, Grother observes, but NIST studies into the effects of age on face biometrics back in 2017 showed the limitations of this approach, which is essentially blind to demographic differences.
The recent demographic differentials test showed that people are not typically mistaken for others from different parts of the world, but are more likely to be falsely matched with someone from the same region. Further, people from Europe are much less likely to be mistaken by algorithms for others from the same area than people from other parts of the world are. A leading algorithm from NTechLab shows FMRs of around 1 in 33,000 for Europeans, but 1 in 1,000 for Nigerians and 1 in 500 for Koreans. Other algorithms showed lower false positive rates, suggesting the value of different training data, but some showed even greater magnitudes of bias.
FNMR differentials are smaller, but Grother concludes that all algorithms have demographic differentials, and at this point, they consistently perform less accurately for women.
Idiap’s work on a biometric fairness measure based on worst-case differences in FMR and FNMR advances the field, Grother suggests, with the “fairness discrepancy rate” calculated on a scale from 0 to 1, where 1 represents perfect balance in accuracy between demographics. NIST has developed an ‘Inequity Measure’ which generates a ratio, but more data is needed to generate effective measures for FNMR due to the high degree of uncertainty in those error rates, and further work is needed on reporting bias in 1:N algorithms.
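A worst-case measure of this kind can be sketched as follows. This is a simplified illustration in the spirit of Idiap’s fairness discrepancy rate, not its exact published formula: it penalizes the largest FMR gap and the largest FNMR gap found between any pair of groups at one shared threshold, with the `alpha` weight and all error-rate values below being assumptions for the example.

```python
from itertools import combinations

def fairness_discrepancy_rate(fmrs, fnmrs, alpha=0.5):
    """Worst-case fairness sketch: 1.0 means identical error rates across groups.

    fmrs, fnmrs: dicts mapping demographic group -> error rate at one
    shared threshold. alpha weights false-match vs. false-non-match gaps.
    """
    pairs = list(combinations(fmrs, 2))
    # Largest (worst-case) gap in each error rate across any pair of groups
    max_fmr_gap = max(abs(fmrs[a] - fmrs[b]) for a, b in pairs)
    max_fnmr_gap = max(abs(fnmrs[a] - fnmrs[b]) for a, b in pairs)
    return 1.0 - (alpha * max_fmr_gap + (1.0 - alpha) * max_fnmr_gap)

# Hypothetical per-group error rates at one threshold (illustrative only)
fmr = {"group_a": 0.0001, "group_b": 0.0010, "group_c": 0.0020}
fnmr = {"group_a": 0.004, "group_b": 0.006, "group_c": 0.005}
print(fairness_discrepancy_rate(fmr, fnmr))
```

Because it keys on the worst pair of groups, the score cannot be improved by averaging away a disparity that affects only one demographic, which is the point of a worst-case formulation.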
Several developers have begun to grapple with the issue as NIST intended when it put out its report, Grother says, though many have not.
The EAB’s examination of demographic fairness in biometrics continued this week with presentations by Yevgeniy Sirotin, Jacob Hasselgren and John Howard of the Maryland Test Facility, among other international experts, so watch this space for Biometric Update’s continued coverage.