Report questions ethics of image collection for AI facial biometric training datasets
Facial images used in datasets to train AI systems such as facial recognition are often collected without permission from the person who owns the image or the person who appears in it, according to a report by NBC News.
“This is the dirty little secret of AI training sets. Researchers often just grab whatever images are available in the wild,” NYU School of Law professor Jason Schultz told NBC News.
A new dataset recently released by IBM to help reduce bias in facial recognition and other algorithms includes images that were obtained and used without permission from the individuals in the photographs or the photographers who posted them online, according to the report. When the “Diversity in Faces” dataset was released, IBM Fellow and Manager of AI Tech Dr. John Smith told Biometric Update that it represents one of the first concrete actions towards improving the fairness and accuracy of facial recognition algorithms. The images were collected from Flickr before being annotated. Smith told NBC News that the company is committed to protecting individual privacy and will cooperate with requests to remove URLs from the dataset.
NBC News reports, however, that getting photos removed is almost impossible. IBM has not shared a public list of photo sources, so NBC created a search tool to help Flickr users determine whether their images are included.
AI Now Institute Co-director Meredith Whittaker says the internet ecosystem was different when people consented to share the photos. “Now they are being unwillingly or unknowingly cast in the training of systems that could potentially be used in oppressive ways against their communities.”
NBC points out that historically, researchers paid individuals to have their data collected, after obtaining signed consent. P. Jonathon Phillips, a dataset collector with NIST, says that the development of the internet allowed researchers to collect images far more efficiently, starting with commonly photographed individuals such as celebrities and sports stars. Social media then enabled the collection of facial images on an even greater scale. Many Flickr images are published under Creative Commons licenses, and because their work is non-commercial, academic researchers have limited responsibilities in how they source images.
According to the company, Diversity in Faces is also intended for academic use rather than for improving IBM’s commercial offerings. NBC says that assertion contradicts IBM’s acknowledgement, when the dataset was released, that it was a response to research by MIT’s Joy Buolamwini, which found bias in commercial facial analysis systems including IBM’s. However, Smith told Biometric Update that the purpose of the dataset is to provide insights on the quality of datasets and on questions like “[h]ow can we ensure that face image data is sufficiently diverse?” This may suggest that IBM is using academic insights to improve other, private datasets.
Kairos Founder and former CEO Brian Brackeen called the practice of academic researchers developing algorithms that are later used in commercial offerings “the money laundering of facial recognition. You are laundering the IP and privacy rights out of the faces,” he says. NBC reports that IBM says Diversity in Faces will not be used in this way.
NBC reports that some photographers with images in the dataset are happy to see their work used to improve AI accuracy. It also reports that when one individual asked IBM to delete all of their more than a thousand images from the dataset, the company deleted only four.
The EU’s General Data Protection Regulation (GDPR) and Illinois’ Biometric Information Privacy Act (BIPA) provide data protections that could make companies sharing photos or biometric data liable to penalties, but such claims have yet to be tested legally.
The report comes as the ethical implications of AI and facial recognition are increasingly a subject of public debate, one that biometrics providers must heed and participate in, or risk the long-term viability of the industry.
“You’ve really got a rock-and-a-hard-place situation happening here,” Northeastern University Law and Computer Science Professor Woody Hartzog tells NBC. “Facial recognition can be incredibly harmful when it’s inaccurate and incredibly oppressive the more accurate it gets.”