Bias and morphing remain challenges in face biometrics, but measures are improving
A pair of the most pressing risks to the effectiveness of face biometric systems were under the microscope during the final day of last week’s International Face Performance Conference 2022.
The morning session on demographics and security was moderated by Patrick Grother. He also delivered the first presentation, on NIST IR 8429, which is currently in draft form but near official completion. The document sets out a method for summarizing demographic differentials, which in biometrics are often referred to as bias. With regulations increasingly incorporating requirements to assess bias, the method could soon be widely cited and used.
Interestingly, NIST research shows that facial recognition systems in some cases fail to match women more often than men (as measured by false non-match rate), but the reverse is true in the 12-to-15 and 16-to-18-year-old age groups.
“You can see these numbers often trending in the right direction,” Grother says. “But not always; some developers don’t seem to have paid any attention to that.”
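For readers unfamiliar with the metric, the false non-match rate (FNMR) for a group is the share of genuine (same-person) comparisons that fall below the match threshold. The following is a minimal sketch of a per-group tally; the function name, the scores and the 0.5 threshold are all hypothetical, and it is not taken from NIST's evaluation code.

```python
from collections import defaultdict


def fnmr_by_group(genuine_scores, threshold):
    """Return the false non-match rate per demographic group.

    genuine_scores: iterable of (group_label, similarity_score) pairs,
    one per genuine (same-person) comparison.
    A comparison counts as a false non-match when its score is below threshold.
    """
    totals = defaultdict(int)
    misses = defaultdict(int)
    for group, score in genuine_scores:
        totals[group] += 1
        if score < threshold:
            misses[group] += 1
    return {group: misses[group] / totals[group] for group in totals}


# Hypothetical similarity scores, invented purely for illustration.
samples = [
    ("women", 0.91), ("women", 0.42), ("women", 0.77), ("women", 0.88),
    ("men", 0.93), ("men", 0.81), ("men", 0.79), ("men", 0.85),
]
rates = fnmr_by_group(samples, threshold=0.5)
print(rates)
```

A demographic differential is then some comparison of these per-group rates, for example their ratio or their spread.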
This method is being standardized in ISO/IEC 19795-10, which John Howard of SAIC discussed in the following presentation. Howard compared the Fairness Discrepancy Rate and Inequity Rate models of fairness developed by the Idiap Institute and NIST, respectively. These observations informed the creation of the Gini Coefficient model.
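To give a sense of what a Gini-based summary looks like, here is a minimal sketch of the textbook Gini coefficient applied to a list of per-group error rates. The article does not spell out the exact formulation used in the NIST report or the draft standard (for instance, whether a sample-size correction or a weighting of false match against false non-match rates is applied), so treat the function below as an assumption, and the input numbers as hypothetical.

```python
def gini(values):
    """Textbook Gini coefficient of a list of non-negative numbers.

    Computed as the mean absolute difference between all ordered pairs,
    divided by twice the mean. 0 means every group has the same error
    rate; larger values mean the errors are concentrated in fewer groups.
    """
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    mean = sum(values) / n
    if mean == 0:
        return 0.0  # all rates are zero, so there is no disparity to measure
    abs_diff_sum = sum(abs(a - b) for a in values for b in values)
    return abs_diff_sum / (2 * n * n * mean)


# Hypothetical per-group false non-match rates.
equal_rates = [0.02, 0.02, 0.02]
skewed_rates = [0.0, 0.0, 0.0, 1.0]
print(gini(equal_rates), gini(skewed_rates))
```

One appeal of a single number like this is that it allows algorithms to be ranked by how evenly their errors are spread across groups, independent of how many errors they make overall.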
He proposes Pareto Optimization as a way of optimizing both fairness and overall performance.
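The article does not describe how Howard applies the idea, but the general notion of Pareto optimality can be sketched briefly: an algorithm is on the Pareto front if no other algorithm is at least as good on both overall error and unfairness and strictly better on one. The algorithm names and scores below are hypothetical.

```python
def pareto_front(candidates):
    """Return the names of candidates not dominated by any other.

    candidates: dict mapping name -> (error_rate, unfairness),
    where lower is better on both measures.
    """
    front = []
    for name, (err, unfair) in candidates.items():
        dominated = any(
            (e2 <= err and u2 <= unfair) and (e2 < err or u2 < unfair)
            for other, (e2, u2) in candidates.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return sorted(front)


# Hypothetical (overall error rate, unfairness score) pairs.
algos = {
    "A": (0.010, 0.30),
    "B": (0.020, 0.10),
    "C": (0.025, 0.35),
    "D": (0.015, 0.20),
}
front = pareto_front(algos)
print(front)
```

In this toy example algorithm C is worse than A on both measures and drops out, while A, B and D each represent a different trade-off between accuracy and fairness.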
More data, however, is needed, especially from operational situations, such as images collected from CCTV feeds. More models would help too.
Yevgeniy Sirotin and Howard presented research on feature vector clustering as a method of addressing broad homogeneity effects. This is an effect found in face biometrics much more than in other modalities, which makes intuitive sense, but also means work on assessing the issue is still in its early stages.
This is important, as Sirotin points out, because criminals attempting to beat a face biometrics system know to spoof a given person’s face with one that is demographically similar.
Michael King from the Florida Institute of Technology discussed the role of the labels used in datasets, and of false biometric classifications.
Stephanie Schuckers and Keivan Bahmani presented progress on a CITeR longitudinal study on matching the faces of children as they age. The Young Face Aging dataset was collected from 231 subjects over the course of 3 years, with images taken every 6 months. Tests of common matchers on this dataset indicated that the best of them reach viable accuracy rates when analyzing high-quality samples.
The afternoon session on day three of IFPC 2022 was moderated by Mei Ngan of NIST, and focussed on face morphing.
Matjaž Torkar of Slovenia’s Ministry of the Interior presented real-world examples of face morphing. The phenomenon appears to have begun in 2020, when multiple passengers on the same flight were caught. They had successfully passed through e-gates, but one was caught by an alert Polish border guard. Those holding morphed passports were seeking refugee status in Canada, and paid between 15,000 and 30,000 euros.
Ngan shared NIST’s FRVT MORPH results, which are not encouraging. More accurate algorithms tend to be particularly vulnerable. Morph detection systems are showing progress, however.
Matteo Ferrara of the University of Bologna noted that ABC gates use multiple frames to match the face presented to the passport. Different strategies are used by different vendors to decide which frame to use.
A team of academics discussed morph generation techniques, and then Kiran Raja of NTNU reviewed the state of the art of morph attack detection.
The event concluded with a presentation from Frøy Løvåsda of Norway’s National Police Directorate on the ability of humans to detect face morph attacks. The research was part of the iMARS project, and shows that performance varies very widely. So does how long the human assessment takes, with little or no correlation between the time spent and success rate. Encouragingly, people seemed to get better with practice.
More research is planned, and potentially certification for high-performing examiners.