MS Celeb and other facial biometrics datasets taken down
Several public facial recognition data sets have been deleted, including a Microsoft database of 10 million faces which is reported to have been the largest dataset in the world for biometric research and training in the world, the Financial Times reports.
The MS Celeb database was published in 2016, and has been used by a wide range of facial recognition researchers, including from militaries and high-profile biometrics companies like SenseTime and Megvii. Images of nearly 100,000 individuals scraped from the internet using search engines and videos under Creative Commons license terms, but consent was not sought from the individuals pictured.
“The site was intended for academic purposes,” Microsoft said in a statement. “It was run by an employee that is no longer with Microsoft and has since been removed.”
Data sets hosted by Stanford and Duke Universities have also been taken down, according to FT, which reported on them and the Microsoft dataset in April. The Duke MTMC surveillance data set, and Stanford’s Brainwash dataset, taken from a livestreaming camera in a San Francisco café, have both been taken offline. Duke did not respond to FT’s request for comment, while Stanford said one of the authors of a study Brainwash was used for requested the dataset’s removal.
The Megapixels project by researcher Adam Harvey documented all three datasets, along with the UnConstrained College Students (UCCS) dataset taken at the University of Colorado, and Oxford Town Centre Dataset. The UCCS dataset has been temporarily taken down because metadata was exposed in the FT article, while the Town Centre dataset remains active, according to the site. Harvey says Microsoft exploited the notion of celebrity, and included people who were vocal opponents of the technology’s development in its dataset.
The professor who made the UCCS dataset available says that he waited five years from when the images were collected to protect the privacy of those pictured, but has faced criticism from a University of Denver law professor, the Denver Post reports.
Use of the MS Celeb dataset has been cited in research papers by numerous facial recognition companies, including Microsoft itself.
“It’s indicative of Microsoft’s inability to hold their own researchers to integrity and probity that this was not torpedoed before it left the building,” technology writer Adam Greenfield, who was included in the MS Celeb dataset, told FT. “To me, it is indicative of a profound misunderstanding of what privacy is.”
Microsoft may also have violated GDPR by leaving the dataset up after the privacy regulation went into effect, FT reports.