Fairness in facial recognition hinges on a mix of factors, including cultural norms

A new research paper looks at the contentious issue of demographic fairness – or bias – in facial recognition systems. Authored by researchers from Idiap and published in IEEE Transactions on Biometrics, Behavior, and Identity Science, the paper (“Review of Demographic Fairness in Face Recognition”) systematically examines the “primary causes, datasets, assessment metrics and mitigation approaches associated with performance differences in facial recognition across demographic groups.”
In doing so, it “aims to provide researchers with a unified perspective on the state-of-the-art while emphasizing the critical need for equitable and trustworthy FR systems.”
“As FR technologies are increasingly deployed globally, disparities in performance across demographic groups – such as race, ethnicity, and gender – have garnered significant attention,” the paper says. It cites several real-world incidents that “underscore the societal risks associated with such disparities.”
Most of the incidents of false identification by FRT involve Black people. As such, the authors’ major focus is on race and ethnicity, though they “also include gender-related studies within the broader context.” Age is, for the most part, not in scope.
Having acknowledged the problem of potential bias in facial biometric systems, the paper notes that the “issue has been formally incorporated into the evaluation frameworks of prominent initiatives” – chief among them the National Institute of Standards and Technology (NIST)’s Face Recognition Vendor Test (FRVT) benchmarks.
NIST a ‘key reference’ for FRT fairness; data sets, image quality matter
“Since 2019, FRVT reports have incorporated analyses of demographic disparities, making them a key reference for assessing fairness in FR.” NIST has found that the majority of facial recognition algorithms are more likely to misidentify people with darker skin, women and the elderly – although the most accurate algorithms show very low differentials in the latest testing.
Other initiatives touching on bias include the Maryland Test Facility (MdTF) and the European Association for Biometrics (EAB), “though at a smaller scale and scope compared to NIST.”
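The kind of differential analysis these benchmarks perform can be illustrated with a toy sketch. This is not NIST’s actual methodology; the group names, comparison scores, and threshold below are hypothetical, and real evaluations use far larger samples:

```python
# Illustrative sketch of a per-group false non-match rate (FNMR)
# comparison. A genuine comparison is one between two images of the
# same person; an FNMR is the fraction of those that score below the
# decision threshold. All numbers here are made up for illustration.

def fnmr(genuine_scores, threshold):
    """Fraction of genuine (same-person) comparisons below the threshold."""
    misses = sum(1 for s in genuine_scores if s < threshold)
    return misses / len(genuine_scores)

# Hypothetical genuine-comparison scores for two demographic groups
scores_by_group = {
    "group_a": [0.91, 0.88, 0.95, 0.72, 0.89],
    "group_b": [0.84, 0.69, 0.90, 0.66, 0.81],
}

threshold = 0.75
rates = {g: fnmr(s, threshold) for g, s in scores_by_group.items()}

# A simple max/min ratio: values well above 1 signal a disparity
ratio = max(rates.values()) / min(rates.values())
```

A ratio near 1 would indicate the system misses genuine matches at similar rates across groups; the larger the ratio, the larger the performance gap.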
In the section analyzing the causes of varied performance in facial recognition systems, the paper’s categorization covers “factors such as imbalances in training datasets, variability in skin-tones, algorithmic sensitivity, image quality, related covariates, combined or intersectional factors, and soft attributes.” It stresses that biased decisions often result from several of these factors acting in concert.
Skin tone is imprecise; as a metric, ‘skin reflectance’ is better
On skin tone, it references a 2019 report from the Biometric Technology Rally organized by MdTF, which notes how skin reflectance – the measurable amount of light reflected from the skin surface – is a better metric than “skin tone,” which “refers to perceived skin color.”
“Using systematic linear modeling, their study demonstrated that darker skin-tones were associated with longer transaction (overall pipeline processing) times and lower accuracy in biometric systems.” Longer transaction times were primarily attributed to “difficulties in the face detection or image acquisition stage under suboptimal lighting conditions.”
“Lower skin reflectance can reduce contrast, making it harder for detection algorithms to localize the face, thus increasing processing time. This dependency was found to vary substantially across systems, highlighting the important role of acquisition methods in determining the extent of performance differences.”
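The contrast point above can be made concrete with a minimal sketch. RMS contrast (the standard deviation of normalized pixel intensities) is one common proxy for how much intensity variation a detector has to work with; the grayscale “patches” below are hypothetical values chosen purely for illustration:

```python
# Minimal sketch of the contrast intuition: lower reflectance
# compresses the intensity range of a face region, which lowers its
# RMS contrast. The pixel values below are hypothetical (0-255).

def rms_contrast(pixels):
    """RMS contrast: standard deviation of intensities scaled to [0, 1]."""
    vals = [p / 255.0 for p in pixels]
    mean = sum(vals) / len(vals)
    return (sum((v - mean) ** 2 for v in vals) / len(vals)) ** 0.5

high_reflectance_patch = [40, 120, 200, 230, 60, 180]  # wide intensity spread
low_reflectance_patch = [30, 45, 50, 60, 35, 55]       # compressed spread

# The compressed-range patch yields lower contrast, matching the
# paper's point that low reflectance makes faces harder to localize
assert rms_contrast(low_reflectance_patch) < rms_contrast(high_reflectance_patch)
```

Real detectors use far richer features than raw contrast, but the same principle applies: less intensity variation gives the localization stage less signal to work with.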
Regardless, having interrogated the evidence, the authors add the important note that, “while many studies report that individuals with lighter skin tones tend to be recognized more accurately than those with darker skin-tones, there is no consistent consensus that skin tone is the primary driver of differences in FR performance across demographic groups.”
It was the mustache all along: ‘soft’ attributes contribute to bias
The report lists datasets used for the study of demographic accuracy differences, and gets highly technical in its explanation of assessment methods and metrics. It looks at bias mitigation systems across the biometric processing lifecycle, and casts a glance at future directions to be explored if remaining challenges are to be overcome.
In conclusion, it identifies training data imbalance, skin-tone variations and image quality as key factors in facial recognition, “as well as the growing recognition of non-demographic attributes,” such as facial hair, hairstyle and makeup, in shaping recognition outcomes. “These factors, though not inherently demographic, are deeply intertwined with social and cultural norms that vary across gender and ethnicity.”
“In the context of FR, these soft attributes function as partial occlusions or lead to shifts in the underlying data distribution.” And when these occlusions correspond with specific demographic groups, they “contribute to unequal recognition outcomes and further exacerbate existing disparities, mimicking demographic bias.”
In short, more so than skin tone or gender in themselves, specific features like beards or hairdos may be causing facial recognition systems to treat certain demographics less fairly.
“Recent studies have demonstrated that many of the observed demographic fairness [disparities] in FR may in fact be driven by these correlated non-demographic traits,” the report says. “These collective insights underscore the significant influence of non-demographic but demographically correlated appearance factors in shaping recognition performance.”