Biometric computer vision algorithms can’t be fair until they accept racial complexity — paper
A new research paper finds that those compiling and using facial recognition datasets take a convenient but error-prone, broad-brush approach to representing racial diversity.
Not only that, but attempts to build computer vision datasets that might be considered racially “fair” are similarly ham-fisted. The result is that biometric datasets continue to rely on stereotypes and to sweep away ethnic outliers that cannot be easily categorized.
The paper, the work of a pair of Northeastern University scientists, has been posted to the open-access arXiv repository.
How race is categorized affects, for example, who gets arrested on the strength of AI-based identification systems and who sees which targeted ads online.
But while these algorithms are created in developed nations, the work and the principles behind it are being exported to other regions where race is perceived fundamentally differently than it is in the West. The categories themselves, never mind racial nuances, do not translate across cultures.
Indeed, the paper’s authors find that racial labels used in computer vision data sets are “ill-defined, unstable temporally and geographically, and have a problematic history of scientific use.”
As racial categories are used in data sets, the authors say, they assume that humanity can be divided into undifferentiated blocks, or stereotypes, without meaningful negative consequences.
Merely including race categories in varying proportions, for instance to match the demographic makeup of an area, is not enough to erase stereotypes, the authors write.
Apportioning racial categories still falls short of equity because each category remains a fallible construct.
At the same time, ethnic groups that do not conform to the categories occupy the long tail of data sets, which scientists, programmers and algorithms typically ignore as “noise” in the data.
Although algorithms for digital systems are obviously a recent creation, this dynamic (focusing on what is easiest to quantify and analyze) has dogged scientists for centuries.
But as computers were introduced, scientific fields have abandoned simplistic categorization one after another. Accepting the complexity of the world has led to a deeper practical understanding of it.
Counterintuitively, facial recognition algorithms, themselves products of the digital era, have yet to recognize the value of complexity, something that meteorology, biology, geology and other disciplines recognized last century.