Synthetic data model shows promise for biometric bias mitigation

The limitations of real-world biometric training datasets, including the introduction of bias through unbalanced demographic representation, are well established. Synthetic training data offers promise, but has its own limitations. A novel method of avoiding those limitations was presented at the Norwegian Biometrics Laboratory Annual Workshop 2024, hosted by the EAB earlier this month.

Pietro Melzi of the Autonomous University of Madrid presented the GANDiffFace model, which generates synthetic faces for the purpose of mitigating demographic bias in training data. The research project was a collaboration between UAM, secunet and Hochschule Darmstadt University of Applied Sciences.

Using synthetically generated data lets researchers control the attributes of samples in the dataset, and brings advantages for privacy, availability, and regulatory compliance. Generative Adversarial Networks (GANs), however, deliver synthetic datasets that incorporate biases found in the training data, and can fail to provide enough intra-class variation to train effective facial recognition.

Diffusion models generate a wider variety of images, so Melzi and his colleagues proposed the GANDiffFace model, which combines both kinds of models. It uses a latent space manipulation method previously proposed by researchers at Idiap. Melzi and company used DreamBooth to bind new words with specific subjects to fine-tune text-to-image models.

Melzi described the details of the model’s development, and how it lowers the mean of the mated score distribution compared to a dataset composed only of GAN-generated images, bringing it closer to that of datasets composed of real photos.

In the datasets traditionally used for training facial recognition, demographic distribution is skewed towards Caucasians, and image quality also differs from one demographic to another, Melzi points out.

By using a dataset created with GANDiffFace, Melzi and his team were able to fine-tune the ArcFace model for significantly lower false match rates (FMRs) for different demographic groups.
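For readers unfamiliar with the metric, the sketch below shows one common way to compute a false match rate per demographic group: the share of non-mated (different-person) comparisons whose similarity score reaches the decision threshold. The group names, scores and threshold are hypothetical values chosen for illustration, not figures from the presentation.

```python
def false_match_rate(non_mated_scores, threshold):
    """Fraction of non-mated comparison scores at or above the threshold."""
    if not non_mated_scores:
        raise ValueError("at least one non-mated score is required")
    matches = sum(score >= threshold for score in non_mated_scores)
    return matches / len(non_mated_scores)


# Hypothetical similarity scores for different-person pairs, by group.
non_mated_by_group = {
    "group_a": [0.10, 0.20, 0.60, 0.70],
    "group_b": [0.10, 0.15, 0.20, 0.55],
}
THRESHOLD = 0.5  # assumed operating point, for illustration only

fmr_by_group = {
    group: false_match_rate(scores, THRESHOLD)
    for group, scores in non_mated_by_group.items()
}
```

A large gap between the per-group values at a single threshold is one sign of the demographic bias the researchers set out to reduce.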

Inaugural FRCSyn Challenge results

The FRCSyn Challenge was launched at WACV 2024 to interrogate whether synthetic data can replace real data for facial recognition training, whether it can mitigate known limitations in face biometrics, and what its limits are.

GANDiffFace was one of four databases made available to the 15 teams that entered the challenge. Most teams involve academic institutions, either on their own or in collaboration, but Facephi also appears among the top eight.

The teams were set several sub-tasks, and the trade-off between accuracy and fairness was measured by subtracting the standard deviation from the average accuracy.
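The trade-off measure described above can be sketched in a few lines. This is a minimal illustration, assuming the average and standard deviation are taken over per-group accuracies and that the population standard deviation is used; the article specifies neither, and the accuracy values are hypothetical.

```python
from statistics import mean, pstdev


def trade_off(accuracy_by_group):
    """Average accuracy minus the standard deviation across groups."""
    values = list(accuracy_by_group.values())
    # pstdev (population standard deviation) is an assumption of this sketch.
    return mean(values) - pstdev(values)


# Hypothetical per-group verification accuracies, in percent.
balanced = {"group_a": 90.0, "group_b": 90.0, "group_c": 90.0}
skewed = {"group_a": 98.0, "group_b": 90.0, "group_c": 82.0}

balanced_score = trade_off(balanced)
skewed_score = trade_off(skewed)
```

Both hypothetical systems have the same average accuracy, but the one with uneven performance across groups gets a lower score, which is how a measure of this kind rewards fairness alongside accuracy.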

The winning teams were able to reduce bias with synthetic data, but even more participants were able to mitigate bias with a combination of real and synthetic data. Likewise, the combination of real and synthetic data produced higher overall accuracy scores.

This shows the effectiveness of synthetic data for mitigating the limitations of face biometrics algorithms, when combined with real data, Melzi says.

A second edition of the FRCSyn Challenge will run later this year.
