Dataset generating model boosts biometric performance and privacy, researchers say
It is possible to create synthetic samples for facial recognition algorithms that maintain recognition performance while cutting privacy leakage, according to an international team of researchers.
The scientists created a two-stage framework they named FaceMAE, after the masked autoencoders (MAEs) at its core. The framework comprises a biometric training stage and a deployment stage. Their work has been published as a preprint but has not yet been peer-reviewed.
The headline here is that as it generates faces for use as training data, FaceMAE considers face privacy and recognition performance simultaneously, report researchers from the National University of Singapore; University of Edinburgh, Scotland; Tsinghua University, China; and InsightFace, an open-source two- and three-dimensional face analysis library.
The results are encouraging. FaceMAE cuts privacy leakage by about 20 percent, and when trained on images reconstructed from three-quarter-masked faces from the CASIA-WebFace dataset, it halved the error rate of the next-best-performing method.
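To make the "three-quarter masked" idea concrete, here is a minimal illustrative sketch (not the authors' code) of MAE-style patch masking: a face image is split into a grid of patches, 75 percent of the patches are hidden at random, and only the visible remainder would be fed to an encoder that learns to reconstruct the full face. The image size, patch size, and random data below are assumptions chosen for illustration.

```python
import numpy as np

# Illustrative sketch of MAE-style masking (assumed sizes, random data).
rng = np.random.default_rng(0)
img = rng.random((112, 112, 3))   # stand-in for a 112x112 RGB face image
P = 8                             # assumed patch size: 14x14 grid of 8x8 patches

# Split the image into 196 flattened patches of 8*8*3 = 192 values each.
patches = img.reshape(14, P, 14, P, 3).swapaxes(1, 2).reshape(196, P * P * 3)

mask_ratio = 0.75                 # the paper's three-quarter masking
n_keep = int(len(patches) * (1 - mask_ratio))
keep_idx = rng.permutation(len(patches))[:n_keep]
visible = patches[keep_idx]       # encoder input: only the 49 visible patches

print(visible.shape)              # (49, 192)
```

In an actual MAE, a decoder is trained to reconstruct the hidden 75 percent from the visible 25 percent; FaceMAE's reconstructions from such heavily masked inputs are what serve as the privacy-preserving training images.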
Synthetic face biometric data generated for algorithm training is already drawing millions in investment, so a method achieving similar privacy protection with better accuracy would be in demand.
FaceMAE’s leakage risk was measured via face retrieval between the reconstructed and original datasets. The team’s experiments indicate that the identities behind the reconstructed images are difficult to retrieve.
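A retrieval-based leakage check can be sketched as follows. This is an assumed metric for illustration, not the authors' exact protocol: embed both datasets, match each original image's embedding against all reconstructed-image embeddings by cosine similarity, and count a "leak" whenever the nearest reconstruction belongs to the same identity. The random stand-in embeddings below substitute for a real face-recognition model.

```python
import numpy as np

# Illustrative sketch: privacy leakage as a top-1 face retrieval rate.
rng = np.random.default_rng(1)
n, d = 100, 128
orig = rng.normal(size=(n, d))                     # stand-in face embeddings
recon = orig + rng.normal(scale=5.0, size=(n, d))  # heavily distorted copies

def normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

sims = normalize(orig) @ normalize(recon).T  # cosine similarity matrix
top1 = sims.argmax(axis=1)                   # nearest reconstruction per original
leak_rate = (top1 == np.arange(n)).mean()    # fraction matching own identity
print(f"top-1 retrieval (leakage) rate: {leak_rate:.2f}")
```

A low retrieval rate under a scheme like this is what would support the claim that identities in the reconstructed dataset are hard to recover.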
A technical summary of the paper can be found on the machine-learning news site Marktechpost.
biometrics | biometrics research | data privacy | dataset | facial recognition | synthetic data | synthetic faces