IBM plans massive data set to improve facial recognition and reduce AI bias
IBM is planning to make the largest facial attribute and identity training set in the world, with more than a million images, available this fall to help improve the training of artificial intelligence facial recognition systems and reduce bias in algorithms.
The dataset, which IBM says in a blog post is five times the size of the largest one currently available, will be annotated with attribute and identity information, with images drawn from different countries using Flickr geo-tags, and sample selection bias mitigated with active learning tools. While currently available datasets include attributes, such as hair color, or tags identifying that multiple images are of the same person, the new set from IBM will include both. A dataset with 36,000 facial images evenly distributed among ethnicities, genders, and ages will also be released specifically to help identify and address bias.
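To illustrate what "evenly distributed" means in practice, here is a minimal sketch of balanced sampling. It is not IBM's method, which the post does not detail. The function name `balanced_sample` and the record fields are hypothetical, and each image is assumed to be a dictionary of labeled attributes.

```python
import random
from collections import defaultdict


def balanced_sample(records, keys, per_group, seed=0):
    """Draw exactly `per_group` records from every combination of `keys`.

    `records` is a list of dicts; `keys` names the attributes to balance on
    (hypothetical field names such as "gender" or "age").
    """
    rng = random.Random(seed)

    # Bucket the records by their combination of attribute values.
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[k] for k in keys)].append(rec)

    sample = []
    for group_key in sorted(groups):
        members = groups[group_key]
        if len(members) < per_group:
            # A group that is too small cannot be balanced without duplicates.
            raise ValueError(
                f"group {group_key} has only {len(members)} records"
            )
        sample.extend(rng.sample(members, per_group))
    return sample
```

Sampling the same number of images from each group means an over-represented group in the source pool does not dominate the resulting set.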
“As the adoption of AI increases, the issue of preventing bias from entering into AI systems is rising to the forefront. [...] judgement, intuition and expertise. The power of advanced innovations, like AI, lies in their ability to augment, not replace, human decision-making. It is therefore critical that any organization using AI, including visual recognition or video analysis capabilities, train the teams working with it to understand bias, including implicit and unconscious bias, monitor for it, and know how to address it.”
IBM showed earlier this year that it had reduced the error rate of its Watson Visual Recognition service for facial analysis nearly ten-fold, according to the post.
IBM is one of the facial recognition providers whose algorithms were tested earlier this year by M.I.T. Media Lab researcher Joy Buolamwini, who found major differences in error rates between demographic groups.
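A comparison of this kind comes down to computing an error rate separately for each group. The sketch below shows the arithmetic only. It is not the methodology used in the M.I.T. test, and the function name `error_rates_by_group` and the input format are assumptions made for illustration.

```python
from collections import defaultdict


def error_rates_by_group(results):
    """Return the fraction of wrong predictions for each group.

    `results` is an iterable of (group, predicted, actual) tuples,
    a hypothetical format chosen for this sketch.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, predicted, actual in results:
        totals[group] += 1
        if predicted != actual:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}
```

A large gap between the per-group rates returned here is the kind of disparity such a test looks for.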
Microsoft, another of the leading facial recognition providers with algorithms demonstrating bias in the same test, just announced a dramatic improvement in its facial recognition algorithm’s ability to recognize the gender of people with darker skin tones, as it attempts to deal with the same issue.