NIST’s Patrick Grother on bias, biometrics evolution and plans to test face recognition with masks
As the world’s foremost authority on the biometric performance of face recognition, the National Institute of Standards and Technology (NIST) is receiving increased attention for its work in the field as the technology’s use grows, and civil society becomes increasingly concerned with its power.
The agency seeks to serve stakeholders among the algorithm developers and their customers, so issues like alleged bias, or performance differences among different demographics, take greater prominence in testing and reports to reflect those stakeholders’ concerns, NIST Biometric Standards and Testing Lead Patrick Grother tells Biometric Update in an interview.
Differences in performance between demographics have been known about for years, and were first examined by NIST back in 2002. Reports from academic institutions, however, brought the issue to greater attention among government customers of face recognition technology and the associated policy makers, and motivated NIST to address it directly in a report released at the end of 2019.
“The U.S. government stakeholders were doing the responsible thing of trying to understand what the technical information was on this topic,” Grother observes. “Were the points raised by Georgetown, and less so by MIT, were they valid, were they substantive, were the effects large, were they small? They wanted insight, so they did the right thing by asking us to look at this, I think.”
The prospective end-users group also has a high understanding of the criteria and factors for effective biometric algorithms, if not quite as deep as that of the developer stakeholder group, Grother says, and they are well-equipped to understand NIST reports. Some in the media, however, and the public by extension, had difficulty properly framing the issue NIST was examining.
“We did try to include some text in the report to sort of guide reporting on that by both the academic community who sometimes evaluate algorithms, in addition to developing algorithms, and to the press community. During the development of the report that we wrote I’d given a number of interviews,” Grother recounts. “Often they would begin with a statement like ‘well, face recognition is biased.’ I would ask them for a qualification. ‘Well, what do you mean? Which algorithm? Or which system?’”
False positives, false negatives, or failures to detect a person could have significantly different impacts on the effectiveness of an application, and outcomes for its users.
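To make the distinction between error types concrete, here is a minimal sketch of how false match and false non-match rates could be tallied separately for each demographic group at a single decision threshold. The function name, the group labels, the scores and the threshold are all hypothetical illustrations, not NIST's test harness or data.

```python
from collections import defaultdict

def error_rates_by_group(comparisons, threshold):
    """Return {group: (false_match_rate, false_non_match_rate)}.

    Each comparison is a tuple (group, score, same_person). A score at or
    above the threshold is treated as a declared match. All names here are
    hypothetical and chosen for illustration only.
    """
    counts = defaultdict(lambda: {"fm": 0, "impostor": 0, "fnm": 0, "genuine": 0})
    for group, score, same_person in comparisons:
        c = counts[group]
        if same_person:
            c["genuine"] += 1
            if score < threshold:   # same person rejected: false non-match
                c["fnm"] += 1
        else:
            c["impostor"] += 1
            if score >= threshold:  # different people accepted: false match
                c["fm"] += 1
    rates = {}
    for group, c in counts.items():
        fmr = c["fm"] / c["impostor"] if c["impostor"] else None
        fnmr = c["fnm"] / c["genuine"] if c["genuine"] else None
        rates[group] = (fmr, fnmr)
    return rates

# Made-up comparison scores for two hypothetical groups, "A" and "B".
sample = [
    ("A", 0.91, True), ("A", 0.40, True), ("A", 0.10, False), ("A", 0.20, False),
    ("B", 0.95, True), ("B", 0.88, True), ("B", 0.75, False), ("B", 0.15, False),
]
rates = error_rates_by_group(sample, threshold=0.6)
```

In this toy data, group A sees only false non-matches and group B only false matches, which is why a bare claim that a system "is biased" says little until the error type, the algorithm and the threshold are named.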
“There was all this specificity that was missing in the discussion before we put out the report, so we tried to expose the landscape of things that could happen, even if we didn’t actually test them,” Grother adds.
Many media members and rights advocates seem to have a flawed understanding of what the report really indicates, but Grother emphasizes that NIST’s role is to serve its two stakeholder groups. Companies are working on the problem, and Grother notes efforts by Onfido to reduce performance disparities between demographics of users in its system by repeatedly sampling minor categories of the available training data, an approach which was presented at a computer vision conference in March. Grother believes that other developers are probably thinking along similar lines.
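As a rough illustration of what repeatedly sampling underrepresented categories can look like, the sketch below redraws items from smaller groups, with replacement, until every group matches the size of the largest one. This is a generic oversampling technique under assumed inputs, not Onfido's published method; the function name and the data are hypothetical.

```python
import random
from collections import defaultdict

def oversample_to_balance(samples, seed=0):
    """Balance group sizes by redrawing from the smaller groups.

    samples is a list of (item, group) pairs. Every original pair is kept,
    and smaller groups are topped up with random repeats (drawn with
    replacement) until each group is as large as the largest one.
    """
    if not samples:
        return []
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for item, group in samples:
        by_group[group].append((item, group))
    target = max(len(members) for members in by_group.values())
    balanced = []
    for members in by_group.values():
        balanced.extend(members)
        # Top up with repeats; k is 0 for the largest group.
        balanced.extend(rng.choices(members, k=target - len(members)))
    rng.shuffle(balanced)
    return balanced

# Hypothetical training set: six images in one group, two in another.
data = [(f"img{i}", "majority") for i in range(6)]
data += [("img_x", "minority"), ("img_y", "minority")]
balanced = oversample_to_balance(data)
group_sizes = {
    g: sum(1 for _, grp in balanced if grp == g) for g in ("majority", "minority")
}
```

Repeating examples does not add new information about the smaller group; it only changes how often the training process sees them, so any real system would still need per-group testing afterward.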
The potential customer group is also doing its own work. “I’m aware of efforts within the U.S. government to do more testing,” Grother notes.
Supporting those kinds of efforts is also why the 82 pages of the report are followed by numerous annexes, some running to hundreds of pages.
“The reason for doing that was to push demographic performance data back to the developers, and for them to reason about the consequences.”
In addition to the data, the report also includes the suggestion to customers to “Know your algorithm,” which means that organizations implementing the technology should know the sensitivity of their chosen algorithm to demographic effects, and to other effects like poor image quality.
That’s where the importance of testing by customers like government agencies comes in.
Generalizations about the state of the industry, on the other hand, do not advance either the technology or understanding of it; this is why NIST does not aggregate or average performance data.
“The average is not a meaningful number,” Grother points out, as the performance of the algorithm used for a given implementation is the only one that matters. While this approach does not supply the kind of pithy general summary of the results some in the consumer media seem to be searching for, there is a community of experts and consultants who are very capable of explaining them. Further, NIST has made efforts to make the reports easier to understand.
“The reports that we write sometimes get criticized for being too technical and inaccessible, and we’ve tried to solve that with material that helps people interpret charts and graphs, and the things we’ve put in the reports,” Grother says. “We’re not going to dilute the technical content, but we can supplement the technical comment with informative material for interpretation.”
Still, even if everyone understood how to interpret the data, “different communities out there have different applications in mind,” Grother observes. For rights advocates concerned about mass surveillance, as has been written about in China, the relevant metrics will differ from those for, say, a typical access control application.
For the government agencies looking to NIST for guidance, airport applications continue to be one of the main areas of interest.
As processes are shifted towards frictionless experiences with less physical interaction, the customer community has become increasingly interested in one-to-many facial recognition with cooperative subjects for passing airport checks without presenting documents. “Some of our data, and an increasing amount of our data going forward will represent that application,” states Grother.
NIST’s plans for future biometrics testing have not only been delayed by the COVID-19 pandemic, along with most of the world’s work, they have also been influenced by it. Facial recognition provides contactless identification, and Grother notes that for iris recognition, which NIST tests with its IREX series, developers have been working on stand-off and on-the-move identification for a decade or more. The effectiveness of facial biometrics with subjects wearing masks, however, is an area in which there have been a lot more claims of late than tests.
“What happens when you occlude the mouth region with a face mask? And is face recognition undermined by that to any great extent?” Grother asks. “That’s one example of what we’re trying to do to support informed usage of face recognition with more quantitative data.”
As with demographic differences, that cold hard data is what will allow technology providers and customers alike to make informed decisions and continue to improve biometrics processes.