Research indicates improved skin tone calibration can raise face biometrics accuracy
Skin tone is not just Black and white. Nor is it quite what camera calibration systems represent it to be, which appears to contribute to demographic differentials, or bias, in face biometric systems.
Yevgeniy Sirotin, technical director of the Identity and Data Sciences Lab at the Maryland Test Facility (MdTF), presented these findings at last week’s IFPC 2022 event.
The presentation, ‘Assessing variation in human skin tone to inform face recognition system design,’ is based on research conducted in collaboration with Arun Vemury, who leads the Biometric and Identity Technology Center within the Department of Homeland Security’s Science and Technology Directorate.
At the 2021 Biometric Technology Rally, the top performing systems identified more than 98 percent of people in each demographic group. The median system, however, failed to reach the targeted 95 percent threshold for accuracy in identifying volunteers with darker skin. The median system correctly identified 97 percent of people with lighter skin, but only 93 percent of those with darker skin.
The worst-performing system correctly identified people with darker skin 10 percent less often than people with lighter skin.
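The arithmetic behind the median-system result can be sketched in a few lines. This is only an illustration built on the figures quoted above (97 percent, 93 percent and the 95 percent target); the function and variable names are hypothetical and are not taken from the Rally's own evaluation code.

```python
# Illustrative only: names are hypothetical, figures are those quoted in the article.
THRESHOLD_PCT = 95.0  # targeted identification rate, in percent


def differential(lighter_pct: float, darker_pct: float) -> float:
    """Gap in identification rate between the two groups, in percentage points."""
    return lighter_pct - darker_pct


def meets_threshold(rates: dict, threshold: float = THRESHOLD_PCT) -> dict:
    """Flag, per group, whether the identification rate reaches the target."""
    return {group: rate >= threshold for group, rate in rates.items()}


median_system = {"lighter": 97.0, "darker": 93.0}

print(differential(median_system["lighter"], median_system["darker"]))  # 4.0
print(meets_threshold(median_system))  # {'lighter': True, 'darker': False}
```

A system can clear the target for one group and miss it for another, which is why the results are reported per demographic group rather than as a single overall rate.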
MdTF began measuring facial skin tone about three years ago using a calibrated device, a DSM III Colormeter from Cortex Technology.
Sirotin explained how those measurements work, and how they are normalized for human perception. Most people’s skin is measured with lightness values between about 23 and 66.
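To give a sense of what a perceptually normalized lightness scale looks like, the sketch below computes CIELAB L* from an 8-bit sRGB color. That the reported lightness values are L* is an assumption, since the article does not name the scale. The function names are hypothetical, the conversion assumes sRGB with a D65 white point, and pixel values from a camera are not equivalent to readings from a calibrated colorimeter.

```python
# Minimal sketch, assuming the lightness values are CIELAB L* (not stated in the article).

def srgb_to_linear(channel_8bit: int) -> float:
    """Undo the sRGB transfer curve for one 8-bit channel (0-255)."""
    c = channel_8bit / 255.0
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4


def lightness_from_srgb(r: int, g: int, b: int) -> float:
    """Approximate CIELAB L* (0 = black, 100 = white) for an sRGB color."""
    # Relative luminance Y, using the sRGB/Rec. 709 weights (white has Y = 1).
    y = (0.2126 * srgb_to_linear(r)
         + 0.7152 * srgb_to_linear(g)
         + 0.0722 * srgb_to_linear(b))
    # CIELAB compresses luminance so equal steps look roughly equal to the eye.
    delta = 6.0 / 29.0
    if y > delta ** 3:
        f = y ** (1.0 / 3.0)
    else:
        f = y / (3.0 * delta ** 2) + 4.0 / 29.0
    return 116.0 * f - 16.0


print(round(lightness_from_srgb(255, 255, 255)))  # 100
print(round(lightness_from_srgb(119, 119, 119)))  # 50
```

The cube-root step is what makes the scale perceptual: a mid-gray that reflects under a fifth of the light still lands near the middle of the 0 to 100 range.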
Race is often self-reported, so the researchers examined the relationship between lightness of skin tone and self-reported race. While people identifying as white tend to have skin lightness of 50 or higher, people identifying as Black or African-American ranged broadly, with many over 50.
Sirotin and Vemury found that rank-one mated scores tend to be lower for people with darker skin, but the difference is not seen among people who self-report as white. Sirotin notes that the difference may come from algorithms, but could also be due to poor image quality.
He delves into how cameras are calibrated for color. One popular tool, for instance, provides 24 or 48 sample colors, two or five of which (respectively) are labeled as being for human skin tones. In the latter case, two of the five do not fit into the cluster of lightness values represented among the thousands of volunteers in the Biometric Technology Rally. Even beyond this strange alignment, the coverage of the calibration tool is uneven.
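A check of this kind can be sketched as follows. The observed range of roughly 23 to 66 comes from the measurements described above, but the patch lightness values are made up for illustration and do not describe any real calibration target; the function names are likewise hypothetical.

```python
# Illustrative only: patch values below are invented, not measured from a real target.

def patches_outside_range(patch_lightness, low=23.0, high=66.0):
    """Return the patch lightness values that fall outside the observed skin range."""
    return [v for v in patch_lightness if not (low <= v <= high)]


def largest_gap(patch_lightness, low=23.0, high=66.0):
    """Widest stretch of the observed range with no calibration patch in it."""
    inside = [v for v in patch_lightness if low <= v <= high]
    points = sorted([low, high] + inside)
    return max(b - a for a, b in zip(points, points[1:]))


# Hypothetical lightness values for five "skin tone" patches.
skin_patches = [20.0, 35.0, 50.0, 62.0, 75.0]

print(patches_outside_range(skin_patches))  # [20.0, 75.0]
print(largest_gap(skin_patches))            # 15.0
```

The first function flags patches that represent no measured person, and the second gives a crude measure of uneven coverage. A data-driven target would aim for no patches outside the range and small, even gaps within it.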
Google’s Monk Scale was similarly found to include several skin tones that are too light, and another that is too dark, to represent real people.
Sirotin suggests that a data-driven approach may yield better calibration tools.
Perhaps unsurprisingly, given the above, a series of images of a single person taken on the same day in the same lighting, but with different biometric sensors, shows lightness measured anywhere between 30 and 71. That 41-point spread for one person is nearly as wide as the range of about 23 to 66 measured across most people.
Engineering facial recognition systems to capture quality images of people across the full range of skin tones, perhaps with the help of better color targets, will improve their effectiveness, Sirotin concludes.
Crayola, it turns out, has done a solid job representing actual skin tones in its crayons. The engineering task, then, is hardly insurmountable.