Researchers navigate facial recognition algorithm training with synthetic data
A new paper authored by a team from the Biometrics Security and Privacy group at the Idiap Research Institute explores “synthetic face dataset generation for responsible face recognition,” according to a LinkedIn post from Professor Sébastien Marcel, a senior researcher at Idiap.
The paper argues that facial recognition (FR) models trained on large-scale datasets come with privacy and ethical concerns, not to mention time constraints. “While data collection campaigns performed in laboratories can be made representative of the general demographics and performed with subjects’ consents, they are typically quite limited due to the large amount of effort they require,” it says. “Lately, the use of synthetic data to complement or replace genuine data for the training of FR models has been proposed. While promising results have been obtained, it still remains unclear if generative models can yield diverse enough data for such tasks.”
The team’s research tests a new method that aims to tackle some of these issues by “developing physics-inspired algorithms that allow precise control on the sampling of synthetic identities and variations thereof.” In technical terms, the method is “inspired by the physical motion of soft particles subjected to stochastic Brownian forces, allowing us to sample identities distributions in a latent space under various constraints.” Named the Langevin algorithm after its core equation, it allows for “a dense packing of the spherical identities while keeping latent space-spread minimal.”
In simpler terms, the algorithm imagines face biometrics as particles suspended in a medium, and applies force to ensure that these identifying units are optimally spread out.
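For readers who want a feel for the idea, the following is a minimal toy sketch of overdamped Langevin dynamics for soft, mutually repelling particles. It is not the Idiap team's implementation: the two-dimensional setting, the weak pull toward the origin (standing in for "keeping spread minimal"), and every function name and parameter value below are assumptions chosen purely for illustration.

```python
import math
import random


def min_pairwise_distance(points):
    """Smallest Euclidean distance between any two points."""
    return min(
        math.dist(p, q)
        for i, p in enumerate(points)
        for q in points[i + 1:]
    )


def langevin_spread(points, diameter=1.0, steps=2000, dt=0.01,
                    repulsion=5.0, confinement=0.2, temperature=1e-3, seed=0):
    """Toy overdamped Langevin update for soft repelling particles.

    All names and default values are illustrative assumptions,
    not taken from the paper.
    """
    rng = random.Random(seed)
    pts = [list(p) for p in points]  # copy so the input is not modified
    dim = len(pts[0])
    noise_scale = math.sqrt(2.0 * temperature * dt)  # Brownian kick per step

    for _ in range(steps):
        # Weak pull toward the origin keeps the overall spread small.
        forces = [[-confinement * x for x in p] for p in pts]

        # Soft repulsion: pairs closer than `diameter` push each other apart.
        for i in range(len(pts)):
            for j in range(i + 1, len(pts)):
                delta = [a - b for a, b in zip(pts[i], pts[j])]
                dist = math.hypot(*delta) or 1e-12
                overlap = diameter - dist
                if overlap > 0:
                    for k in range(dim):
                        push = repulsion * overlap * delta[k] / dist
                        forces[i][k] += push
                        forces[j][k] -= push

        # Euler-Maruyama step: deterministic drift plus random noise.
        for p, f in zip(pts, forces):
            for k in range(dim):
                p[k] += dt * f[k] + noise_scale * rng.gauss(0.0, 1.0)
    return pts


if __name__ == "__main__":
    rng = random.Random(1)
    crowded = [[rng.uniform(-0.1, 0.1), rng.uniform(-0.1, 0.1)] for _ in range(20)]
    spread = langevin_spread(crowded)
    print("min distance before:", min_pairwise_distance(crowded))
    print("min distance after: ", min_pairwise_distance(spread))
```

Starting from a tight cluster, the particles push each other apart until they barely overlap, while the confining pull stops them from drifting far from the centre. In the paper the analogous step is described as happening in a face generator's latent space rather than in a flat two-dimensional plane.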
“With this in hand,” say the researchers, “we generate several face datasets and benchmark them by training FR models, showing that data generated with our method exceeds the performance of previously GAN-based datasets and achieves competitive performance with state-of-the-art diffusion-based synthetic datasets.”
Additional support for the Langevin project comes from the Center for Identification Technology Research (CITeR) and its affiliates, the European TReSPAsS-ETN, the Hasler foundation through the Responsible Face Recognition (SAFER) project and the Swiss Center for Biometrics Research and Testing.
The full paper is available here.
First synthetic child face database for face recognition
A separate paper from the Da/sec Biometrics and Security Research Group at Germany’s Hochschule Darmstadt investigates child face recognition at scale. The authors also want to address issues of bias and privacy, which they say have not been given enough attention in the specific context of children, despite numerous potential applications for facial recognition systems for children. One example given is an automated process for recognizing victims in seized child sexual abuse material (CSAM), which is a growing problem, with more than 70 million CSAM videos and images obtained in 2019.
“Due to this immense amount of data, it is necessary to have automated systems which can identify the children in such material, necessitating effective face recognition systems,” the paper says. The authors present “a novel pipeline for creating a synthetic face database containing the same subjects both at adult age and also different child ages.” The paper says that by combining GANs with face age progression (FAP) models, the team was able to create what it claims is “the first large-scale synthetic child face image database: HDA-SynChildFaces.”
The proposed processing pipeline enables what it calls a “controlled unbiased generation of child face images.” In simple terms, it shows a series of images predicting progressive changes to a face over time. The synthetically generated HDA-SynChildFaces database will be made available to researchers in the field of child face recognition, for whom it is “expected to provide a good basis for algorithm evaluation and training.”
The paper can be accessed via Frontiers in Signal Processing.