Facebook wants to build trust in AI with its Casual Conversations dataset
Taking an active rather than reactive position on building trustworthy AI, Facebook has opened a new dataset to algorithm developers globally.
Casual Conversations is a collection of 45,000 videos of people chatting. The subjects span a range of ages and skin tones, and each chose among three gender options. Lighting conditions also vary markedly.
The dataset, which is reused from a Facebook deepfake research project, is intended to be a reality check for developers who want to root out age, race and gender bias from their computer vision and audio products.
Including voices is expected to help minimize biases in audio applications, too.
The release comes at a time when democratically elected governments and a growing number of large businesses are trying to figure out how best to win over popular opinion on a topic whose details cause most eyes to glaze over. Other eyes open wide with concern that AI will be used in unethical or dangerous ways.
Too often, proponents and vendors pay lip service to the most important factor in AI’s future: its trustworthiness.
Facebook, the company and app that people love to hate to use every available moment, perhaps knows intimately that trust in technology that touches people’s personal lives cannot be taken for granted.
All 3,011 people who appear in Casual Conversations were asked their age and gender rather than having researchers or software guess. That certainty makes the dataset especially valuable to developers.
For gender, participants could choose only male, female or other, a limitation that Facebook almost apologizes for. The company explicitly acknowledges that this is “insufficient.” It calls the dataset a “good, bold first step forward” and says it will be expanded over time to include other gender identities.
The company said it believes Casual Conversations is unique in that it is open sourced, includes paid actors who chose to participate and gets the gender and age information from participants.
Apparent skin tone for each participant was assigned by trained annotators, according to Facebook, based on the Fitzpatrick classification tool. The variable ambient lighting was tagged as well, to measure how skin tones look under less-clinical conditions.
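For developers wondering what such a reality check might look like in practice, one common approach is to break a model's accuracy down by each labeled subgroup and compare the results. The Python sketch below is illustrative only: the field names (`skin_tone`, `lighting`, `prediction`, `truth`), the group labels and the sample rows are all hypothetical and do not reflect the dataset's actual file format or any Facebook tooling.

```python
from collections import defaultdict


def accuracy_by_group(records, group_key):
    """Return {group value: fraction of correct predictions} for one label."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        group = rec[group_key]
        total[group] += 1
        if rec["prediction"] == rec["truth"]:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


# Made-up rows for illustration; NOT drawn from Casual Conversations.
sample = [
    {"skin_tone": "I-II", "lighting": "bright", "prediction": 1, "truth": 1},
    {"skin_tone": "I-II", "lighting": "dark", "prediction": 1, "truth": 1},
    {"skin_tone": "V-VI", "lighting": "bright", "prediction": 1, "truth": 1},
    {"skin_tone": "V-VI", "lighting": "dark", "prediction": 0, "truth": 1},
]

by_tone = accuracy_by_group(sample, "skin_tone")
by_lighting = accuracy_by_group(sample, "lighting")

# A large gap between the best and worst subgroup is a signal worth investigating.
tone_gap = max(by_tone.values()) - min(by_tone.values())
print(by_tone, by_lighting, tone_gap)
```

A wide gap between subgroups does not by itself prove bias, but it shows where a model deserves a closer look.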