Texas university launching large synthetic face biometrics training dataset
Using synthetic data instead of real people’s faces to train facial recognition systems has been gaining ground among biometrics companies around the world. A university in Dallas, Texas now wants to create one of the largest balanced synthetic databases for facial recognition.
Southern Methodist University (SMU) plans to generate a database of facial images from text descriptions using the Nvidia DGX SuperPOD, a high-performance computing platform designed specifically for AI. The goal is to tackle the bias issues and other ethical conundrums that have plagued the facial recognition field by creating diverse images for training artificial intelligence models.
The project is led by researcher Corey Clark and his team at SMU’s Intelligent Systems and Bias Examination Lab (ISaBEL). Aside from influencing how facial recognition algorithms recognize race and gender, the synthetic database aims to address the question of how to ethically collect and use biometric data from real people, the university says.
“There are constraints in trying to create a real-world based dataset to train any artificial intelligence model,” says Clark, an assistant professor of computer science in the Lyle School of Engineering and deputy director for Research at SMU Guildhall. “To ethically source it you must solve challenges like consent, fairness, and legal compliance. Synthetic data, generated by the SuperPOD, removes those obstacles.”
The university is also planning to launch a bias certification program that would evaluate companies’ AI systems and could also be used to develop future models tailored to specific needs.
SMU has been collaborating with Nvidia since 2021, when the company helped expand the university’s supercomputer memory capacity, leading to a 25-fold increase in the speed and efficiency of AI and machine learning. The university established its ISaBEL laboratory in September 2021 with Pangiam as its first industry partner.
Clark says that generating the massive number of images in the team’s datasets would not be possible without the SuperPOD.
“Facial recognition is here and not going away,” Clark says. “The demand for these larger training datasets is crucial for improving [facial recognition] systems so they provide equitable results. Through our methodology and use of the SuperPOD, we’re generating datasets not previously easy to obtain, and doing so quickly and ethically.”
Companies from Amazon to Innovatrics and IDVerse have been employing synthetic data as a solution to bias and privacy issues in biometric algorithm training. Experts warn, however, that it must be used carefully in facial recognition training, as synthetic data can itself carry biases.