ImageNet database blurs images used for facial recognition training in privacy effort
The team behind ImageNet has announced that it has blurred the faces in its 1.5 million-image database, which has been used to train many facial recognition algorithms, writes Wired. The move comes amid ongoing concerns about bias and user privacy as these datasets are being used by a growing number of private and government entities to build their deep learning face biometric systems.
The blurring project, conducted with help from Amazon’s Rekognition AI solution, also served as an experiment to determine whether faces can be obscured automatically without changing the recognizability of objects. ImageNet then contracted workers through Mechanical Turk to make any additional adjustments to image selections. Of the vast database, 243,198 images required facial blurring.
“We were concerned about the issue of privacy,” says Olga Russakovsky, ImageNet team member and Princeton University assistant professor.
ImageNet found that the blurring did not affect object recognition capabilities. The result is promising, as it suggests that future programs and databases can maintain efficiency and accuracy without breaching individual privacy. The project follows ImageNet’s 2019 effort called ExcavatingAI, which helped purge derogatory terminology from its database to make facial recognition less biased. “We hope this proof-of-concept paves the way for more privacy-aware visual data collection practices in the field,” said Russakovsky.
Yet despite the project’s initial success, concerns remain about future models that are trained on blurred images.
“One important problem to consider is what happens when you deploy a model that was trained on a face-blurred data set,” Russakovsky says.
MIT Research Scientist Aleksander Madry, who has studied the limitations of ImageNet, has similar concerns. “Biases in data can be very subtle while having significant consequences,” he says. “That’s what makes thinking about robustness and fairness in the context of machine learning so tricky.”