Meta updates AI fairness dataset with international content
Meta has released a new open-source dataset, Casual Conversations v2, designed to broaden the range of biometric identifiers and to test fairness and inclusivity in AI models.
In a release, Facebook’s parent touts the dataset as “a consent-driven, publicly available resource that enables researchers to better evaluate the fairness and robustness of certain types of AI models.” It consists of more than 25,000 videos of more than 5,000 paid participants in seven countries. Each participant performs scripted and unscripted monologues, and the videos reflect specific characteristics such as age, gender and physical disability in 11 self-provided and annotated categories.
For a dataset aimed at inclusion, the diversity of source locations is of particular importance. Meta’s earlier version of the dataset, from 2021, did not include videos of people from outside the United States. Version 2 includes videos recorded in Brazil, India, Indonesia, Mexico, Vietnam, the Philippines and the U.S. Subjects speak in their first and second languages. Meta reportedly plans to continue expanding the project’s geographical scope in subsequent versions.
Another key principle for Meta in collecting the Casual Conversations datasets is consent. In December, a Seattle judge rejected a motion by Amazon to dismiss a lawsuit accusing it of misusing biometric identifiers in the form of images it vacuumed from a Flickr dataset. Originally collected by IBM, that dataset was also intended to address bias in biometric systems. However, in his ruling, the judge said the scraped identifiers were so intrinsic to Amazon’s products that it was effectively “commercially disseminating the biometric data.”
A parallel case against Microsoft found it had also exploited biometric identifiers from the Flickr dataset without sharing that information in its products.
Casual Conversations v2, which will be available to Facebook teams and external users, is part of Meta’s stated push to make continued civil rights progress with regard to AI construction.