Promise of knowledge distillation for debiasing biometric PAD discussed at EAB event
The innovative technique for training machine learning models known as ‘knowledge distillation’ is showing promise for mitigating bias in biometric anti-spoofing systems.
More than 130 attendees registered for the European Association for Biometrics (EAB) virtual launch talk on ‘Bias mitigation in anti-spoofing through knowledge distillation,’ presented by Unissey Computer Vision Engineer Idriss Mghabbar.
Knowledge distillation is a broadly applicable technique, Mghabbar noted at the beginning of the presentation, in which models built as “teachers” train other models, their “students.” Student models, as Unissey’s experiments have shown, can be trained to perform biometric presentation attack detection (PAD) not only with lower performance variation across demographic groups, but also with higher overall accuracy.
Unissey has developed its own passive liveness detection technique, and has been experimenting with knowledge distillation to mitigate gender and racial bias. The company’s biometric PAD technology was recently tested for compliance with ISO/IEC standards by French lab CLR.
Teacher models are trained with an unconstrained architecture to achieve the best possible performance; as Mghabbar noted, “the distillation can only be as good as the teacher it’s using.” The student model incorporates constraints, and is therefore typically lighter. During training, the student model is evaluated not only against the ground truth, but also against the “soft targets” provided by the teacher model, Mghabbar explained.
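The combined objective described above can be sketched as follows. This is a minimal illustration of the standard distillation loss (hard-label cross-entropy plus a temperature-softened soft-target term, as in Hinton et al.), not Unissey’s actual implementation; the temperature and weighting values are arbitrary assumptions.

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature-scaled softmax; higher T yields softer distributions.
    z = logits / T
    z = z - z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, true_label, T=4.0, alpha=0.5):
    """Hard-label cross-entropy combined with a soft-target term.

    The soft term is the KL divergence between the teacher's and the
    student's temperature-softened outputs, scaled by T^2 so its gradient
    magnitude stays comparable to the hard term.
    """
    hard_probs = softmax(student_logits)            # T = 1 for the hard loss
    hard_loss = -np.log(hard_probs[true_label])     # cross-entropy vs. ground truth

    p_teacher = softmax(teacher_logits, T)          # the teacher's "soft targets"
    p_student = softmax(student_logits, T)
    soft_loss = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)))

    return alpha * hard_loss + (1 - alpha) * (T ** 2) * soft_loss
```

When the student reproduces the teacher’s logits exactly, the soft term vanishes and only the weighted ground-truth loss remains.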
He described static and dynamic distillation, and recommended the dynamic approach (meaning with augmented samples).
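One way to read that distinction is sketched below: in static distillation the soft targets are precomputed once on the clean samples, while in dynamic distillation the teacher is queried on each augmented sample on the fly. The `teacher_predict` and `augment` functions here are hypothetical stand-ins, not anything from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

def teacher_predict(x):
    # Stand-in for the teacher network's soft output (hypothetical model).
    return 1.0 / (1.0 + np.exp(-x.sum()))

def augment(x):
    # Hypothetical augmentation: a small random perturbation of the input.
    return x + rng.normal(scale=0.1, size=x.shape)

dataset = [np.array([0.5, -0.2]), np.array([-1.0, 0.3])]

# Static distillation: soft targets are computed once on the clean samples,
# so an augmented input would be paired with a stale, precomputed target.
static_targets = [teacher_predict(x) for x in dataset]

# Dynamic distillation: the teacher is queried on each augmented sample, so
# the soft target always matches the input the student actually sees.
dynamic_pairs = [(x_aug := augment(x), teacher_predict(x_aug)) for x in dataset]
```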
One use case for this type of approach is bias mitigation.
For this use case, the teachers are specialist models “expert in their respective domains,” and the student models are “multi-domain” students, where each domain is a category, such as a particular ethnicity.
A generic model is used as the basis for fine-tuning each teacher model, to make up for the low number of samples in a given domain. For each training sample, only the teacher model associated with that sample’s domain is activated. This enables both higher accuracy and faster training, Mghabbar says.
Having a balanced dataset therefore remains the most important element in reducing bias through knowledge distillation, because of the domain-specific data used in the teacher training process.
Mghabbar also talked about how to appropriately measure the progress of training to ensure both bias reduction and overall high biometric PAD accuracy.
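The talk summary does not specify which metrics Unissey uses; one minimal way to track both goals during training is to monitor the overall error rate alongside the worst-case gap between per-group error rates, as sketched here.

```python
from collections import defaultdict

def group_metrics(preds, labels, groups):
    """Overall error rate, per-group error rates, and the max inter-group gap.

    Watching the overall rate and the gap together is one simple way to
    check that bias is shrinking without overall accuracy degrading.
    """
    totals, errors = defaultdict(int), defaultdict(int)
    for p, y, g in zip(preds, labels, groups):
        totals[g] += 1
        errors[g] += int(p != y)
    rates = {g: errors[g] / totals[g] for g in totals}
    overall = sum(errors.values()) / sum(totals.values())
    gap = max(rates.values()) - min(rates.values())
    return overall, rates, gap
```

For example, a batch with one error in group “a” and none in group “b” yields a nonzero gap even though overall accuracy looks reasonable.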
Following the presentation, Mghabbar fielded questions about teacher training and evaluation, and discussed the need to limit the number of domains used so that distillation does not reduce overall performance.