Bias found in new OpenAI computer vision model
An audit conducted with OpenAI’s former policy director Jack Clark has found gender and age bias in CLIP, the firm’s latest computer vision model, VentureBeat reports, suggesting that the model may not be appropriate for facial recognition and other tasks.
CLIP, short for Contrastive Language-Image Pre-training, was released by the non-profit research lab OpenAI last January.
The artificial intelligence model is trained to recognize a number of visual concepts in images and then associate them with their names.
Categorizations start out generic (e.g., person, animal) and then, if the algorithms find the right data, become more specific (e.g., eye, finger, face).
Because CLIP is trained through a supervised learning process, its training repeatedly measures the model’s outputs and then adjusts the system to bring it closer to the target accuracy.
This is one of the strengths of CLIP, but also potentially one of its weaknesses.
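As the name suggests, CLIP’s training is built around a contrastive objective: within a batch of images and their captions, each image should score higher with its own caption than with anyone else’s, and vice versa. A minimal Python sketch of that kind of loss follows. It assumes images and captions have already been turned into normalized embedding vectors by separate encoders (omitted here) and uses a fixed, illustrative temperature; the function names and toy vectors are hypothetical, and this is a simplified illustration rather than OpenAI’s code.

```python
import math


def cross_entropy(logits, target):
    """Negative log of the softmax probability assigned to index `target`."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss for a batch of (image, caption) pairs.

    Assumes image_embs[i] and text_embs[i] describe the same example and that
    every vector is already L2-normalized (hypothetical inputs).
    """
    n = len(image_embs)
    # logits[i][j] = scaled cosine similarity of image i and caption j
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in text_embs]
        for img in image_embs
    ]
    # Image-to-text: each row should single out its own caption (the diagonal).
    loss_images = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    # Text-to-image: each column should single out its own image.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_texts = sum(cross_entropy(columns[j], j) for j in range(n)) / n
    return (loss_images + loss_texts) / 2


images = [[1.0, 0.0], [0.0, 1.0]]
matched_captions = [[1.0, 0.0], [0.0, 1.0]]
swapped_captions = [[0.0, 1.0], [1.0, 0.0]]
print(contrastive_loss(images, matched_captions))  # close to zero
print(contrastive_loss(images, swapped_captions))  # much larger
```

Correctly paired captions give a loss near zero, while swapped ones are penalized heavily; that gap is the signal training uses to adjust the encoders.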
The auditors recently had the AI system classify 10,000 images from the FairFace dataset, a collection of face photos of people from different ethnicities that is sometimes used to evaluate bias in biometric systems.
To look for demographic biases in CLIP, the auditors added several categories to the labels the system could choose from: ‘animal,’ ‘gorilla,’ ‘chimpanzee,’ ‘orangutan,’ ‘thief,’ ‘criminal,’ and ‘suspicious person.’
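Adding categories matters because, in CLIP-style zero-shot classification, the image is scored against every candidate label and the best-scoring one wins; in this setup there is no ‘none of the above’ option, so a newly added label can capture images that previously went elsewhere. The sketch below illustrates that mechanism under stated assumptions: hand-made two-dimensional vectors stand in for the embeddings CLIP’s image and text encoders would produce, and the function names, vectors, and temperature value are all hypothetical.

```python
import math


def zero_shot_probabilities(image_emb, label_embs, temperature=0.01):
    """Score one image against every candidate label and softmax the scores.

    label_embs maps a label to a vector standing in for the text encoder's
    output (hypothetical hand-made vectors here).
    """
    def unit(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    img = unit(image_emb)
    # Cosine similarity between the image and each label, scaled by temperature.
    scores = {
        label: sum(a * b for a, b in zip(img, unit(emb))) / temperature
        for label, emb in label_embs.items()
    }
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(s - m) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def predict(image_emb, label_embs):
    """Return the highest-probability label; some candidate always wins."""
    probs = zero_shot_probabilities(image_emb, label_embs)
    return max(probs, key=probs.get)


labels = {"person": [1.0, 0.2], "landscape": [0.0, 1.0]}
photo = [0.7, 0.6]
print(predict(photo, labels))  # person

# Adding a new candidate label changes the competition for the same photo.
labels["added_label"] = [0.6, 0.6]
print(predict(photo, labels))  # added_label
```

Nothing about the photo changed between the two calls; only the list of candidates did.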
The tests reportedly revealed that CLIP misclassified 4.9 percent of the images into one of the non-human categories.
Images of Black people had the highest misclassification rate, at roughly 14 percent, followed by images of people aged 20 or younger of all races.
In addition, 16.5 percent of images of men and 9.8 percent of images of women (and an even higher share of images of people under 20) were misclassified into crime-related categories.
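Figures like these can be computed either over all images tested or within each demographic group, and the two can differ sharply: a small overall rate can hide a much larger rate for one group. The sketch below computes both from a list of (group, predicted label) pairs. The category sets come from the audit as described above; the function name, group names, and toy records are invented for illustration and are not data from the audit.

```python
from collections import Counter

NON_HUMAN = {"animal", "gorilla", "chimpanzee", "orangutan"}
CRIME = {"thief", "criminal", "suspicious person"}


def misclassification_rates(records, flagged_labels):
    """Overall and per-group share of images whose predicted label is flagged.

    records is an iterable of (group, predicted_label) pairs.
    """
    totals, hits = Counter(), Counter()
    for group, label in records:
        totals[group] += 1
        if label in flagged_labels:
            hits[group] += 1
    # Overall rate: flagged images divided by every image tested.
    overall = sum(hits.values()) / sum(totals.values())
    # Per-group rate: flagged images in a group divided by that group's total.
    per_group = {group: hits[group] / totals[group] for group in totals}
    return overall, per_group


# Hypothetical toy predictions, not data from the audit.
toy_records = [
    ("group_a", "person"), ("group_a", "animal"),
    ("group_a", "person"), ("group_a", "person"),
    ("group_b", "person"), ("group_b", "person"),
    ("group_b", "thief"), ("group_b", "person"),
]
print(misclassification_rates(toy_records, NON_HUMAN))
print(misclassification_rates(toy_records, CRIME))
```

In the toy data, each flagged category touches only one image in eight overall, yet a quarter of one group’s images.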
In a separate test, the auditors had CLIP analyze a sample of photos of female and male members of the U.S. Congress.
At higher confidence thresholds, CLIP labeled people ‘lawmaker’ and ‘legislator’ regardless of gender, but at lower thresholds, terms like ‘prisoner’ and ‘mobster’ started appearing for men and ‘nanny’ and ‘housekeeper’ for women.
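The role of the confidence threshold can be shown simply: a system reports only the labels whose probability clears the cut-off, so lowering the cut-off lets weaker, lower-scoring labels through alongside the top ones. This minimal sketch assumes the model has already produced one probability per label; the function name and the probabilities are made up for one hypothetical photo and are not CLIP output.

```python
def labels_above(probabilities, threshold):
    """Labels whose probability is at least `threshold`, most probable first."""
    kept = [(label, p) for label, p in probabilities.items() if p >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [label for label, _ in kept]


# Made-up probabilities for one hypothetical photo, not CLIP output.
probs = {"lawmaker": 0.55, "legislator": 0.30, "housekeeper": 0.10, "nanny": 0.05}
print(labels_above(probs, 0.5))   # ['lawmaker']
print(labels_above(probs, 0.05))  # ['lawmaker', 'legislator', 'housekeeper', 'nanny']
```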
“These results add evidence to the growing body of work calling for a change in the notion of a ‘better’ model,” the researchers said in the report.
This would mean moving beyond simply looking for higher accuracy on task-oriented capability evaluations and toward a broader ‘better’ that takes into account deployment-critical features such as “different use contexts and people who interact with the model, when thinking about model deployment.”