Facial expression recognition’s bias problem is glaring
No doubt having watched the attention being paid to racial bias in biometric facial recognition, a group of researchers looked at facial expression recognition and found problems similar to those of the worst-performing facial recognition algorithms.
The scientists found there are too few large data sets that have both diverse subjects and adequate expression labels. Expression recognition algorithms cannot be made fair and accurate when trained on this stock.
Facial expression recognition systems seek to identify emotions displayed on people’s faces, and are less bankable today than the more common facial recognition systems. Face scanning has broader inherent value, for identification alone, than does understanding moods.
Scanning people to record their apparent emotions is likely to be more controversial than other biometrics, and even a hint of unfairness in the algorithms is likely to unnerve even the most indifferent people in a given public place.
Three researchers from the University of Cambridge in the UK and a fourth from Middle East Technical University in Turkey published their non-peer-reviewed paper in July.
Facial recognition is a well-researched topic, according to the researchers, but not so expression recognition, research on which they describe as “scarce.” Yet there are “a large number” of data sets of faces with expression labels.
But “virtually none of these datasets have been acquired with consideration of fair distribution across the human population,” the team writes. Put another way, there has been almost no effort to collect images and videos in expression-related data sets that label ethnicity, age and gender.
AI cannot recognize a world it has never really seen.
In its proof of concept, the team used the RAF-DB and CelebA databases, which are described as well-known. Both label emotions or affect as well as gender, age and ethnicity. The researchers also found both data sets are “large enough to enable the training and evaluation of state-of-the-art deep learning models.”
The RAF-DB was described as a real-world set, having a spectrum of photographs of faces collected from the Internet. It is overwhelmingly populated by white people between 20 and 39 years old. From it were drawn 14,388 images, of which 11,512 were sampled for training.
This database was used to recognize seven categories of expressions: neutral, surprise, sadness, happiness, fear, disgust and anger.
CelebA, with 202,599 images of 10,177 people, had a smaller range of labeled expressions, and so was only used to train for smiling/not smiling, gender and age.
The researchers used three approaches to sort expressions: baseline, attribute-aware and disentangled.
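The difference between the baseline and attribute-aware approaches can be illustrated at the level of feature handling. The sketch below is a generic, hypothetical illustration, not the paper’s actual models: the baseline classifier ignores demographic attributes, while an attribute-aware one conditions on them, here by simple concatenation. A disentangled approach instead tries to strip attribute information out of the face representation, typically with an adversarial branch, which is beyond this sketch.

```python
import numpy as np

# Hedged sketch of two ways to wire demographic attributes into an
# expression classifier's input features (illustrative only).

def baseline_features(face_embedding, attributes):
    # Baseline: the classifier never sees demographic attributes.
    return face_embedding

def attribute_aware_features(face_embedding, attributes):
    # Attribute-aware: condition the classifier on the attributes
    # by concatenating them to the face representation.
    return np.concatenate([face_embedding, attributes])

emb = np.ones(4)                 # stand-in for a learned face embedding
attrs = np.array([1.0, 0.0])     # stand-in one-hot demographic attribute
print(baseline_features(emb, attrs).shape)         # (4,)
print(attribute_aware_features(emb, attrs).shape)  # (6,)
```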
As part of their results, they found that using data augmentation (which increases the data diversity available for training models through techniques such as image cropping and flipping) improved the baseline model’s accuracy. It could not mitigate the bias, though.
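Crop-and-flip augmentation of the kind mentioned above can be sketched in a few lines. This is a generic example using NumPy array slicing, with an assumed 90% crop size and 50% flip probability; it is not the paper’s exact pipeline.

```python
import numpy as np

def augment(image, rng):
    """Return a randomly cropped, possibly flipped copy of an image.

    image: H x W x C uint8 array; the crop keeps 90% of each spatial
    dimension (an assumed setting for illustration).
    """
    h, w = image.shape[:2]
    ch, cw = int(h * 0.9), int(w * 0.9)
    top = rng.integers(0, h - ch + 1)     # random crop offsets
    left = rng.integers(0, w - cw + 1)
    crop = image[top:top + ch, left:left + cw]
    if rng.random() < 0.5:                # horizontal flip, prob 0.5
        crop = crop[:, ::-1]
    return crop

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(100, 100, 3), dtype=np.uint8)
aug = augment(img, rng)
print(aug.shape)  # (90, 90, 3)
```

Each call yields a slightly different view of the same face, multiplying the effective training data without collecting new images.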
Attribute-aware and disentanglement models were more accurate and fair than baseline when they were “fortified” by data augmentation. The disentangled approach was best for “mitigating demographic bias.”
Overall, bias mitigation is more “suitable” when there is an uneven attribute distribution or an imbalance in the quantity of subgroup data.