Specificity crucial for useful biometrics tests, training with synthetic data: EAB panel
Synthetic data may help address privacy concerns around biometric algorithm training, but it must still be used carefully in facial recognition training, Youverse Co-founder and CEO Pedro Torres said during a panel of influential stakeholders convened by the EAB.
The EAB’s Research Projects Conference 2023 concluded with a talk on “Advancing Biometric Technology and ID Management in the EU: Exploring the Need for Testbeds and Sandboxes.” The event took place over three days last week and featured a series of presentations on advanced biometrics research projects from around Europe.
The discussion on testbeds and sandboxes was chaired by Javier Galbally of eu-LISA and Fernando Alonso-Fernandez of Halmstad University. Panelists were William Graves of the U.S. Department of Homeland Security, Patrick Grother of the National Institute of Standards and Technology (NIST) and Torres.
Yoonik rebranded as Youverse earlier this year.
Testbeds and sandboxes will be important for the development of biometric technology in the EU, Galbally said, as the EU advances its intersecting agendas for digital identity, AI and digital public administration. Technologies for these projects must be developed and tested in ways compliant with data privacy and other regulations.
In their introductions, Graves and Grother contrasted the kinds of biometrics testing they carry out, with Graves conducting tests in secret and not publishing the results.
Grother recounted NIST’s experiences with changing how often it runs tests and how often developers can submit algorithms, as well as the uncertainty over how many participants there will be in a given test. He also discussed the relative merits of ongoing testing, including its greater flexibility, compared with occasional tests.
Graves spoke about the role of NIST testing in U.S. government biometrics procurement decisions, but also the need to “run your own data against these algorithms.” He recounted how the U.S. Department of Defense uses a biometric algorithm that has not ranked among the leaders in NIST testing but has proven highly effective at matching the data the DoD holds.
Because NIST uses data from the Department of Homeland Security, that agency finds a stronger correlation between NIST test results and the algorithms that perform best in its own internal testing.
Torres spoke about the advantages NIST testing gives participating biometrics vendors, as well as the importance of tailoring tests to specific use cases.
The discussion also covered how and why NIST decided to evaluate presentation attack detection and age estimation, and the kind of biometrics testing carried out by Frontex. Those tests, which were described as “complementary,” evaluate biometric capture equipment rather than algorithms.
Obtaining the data needed for testing is a challenge for many companies, Torres said. Privacy regulations and a scarcity of consented data are obstacles, and together they drive up costs. Youverse augments the data it is able to acquire so it can test for occluded faces and other variations.
Collaboration on making datasets available could benefit the entire biometrics industry, Torres suggested.
Another option, which Youverse uses, is synthetic data, which avoids some of the difficulties of obtaining good, ethically sourced images. Because the data is generated by an algorithm, however, it is also subject to potential biases, and it is not a replacement for real data.
Grother broke down the kinds of tasks synthetic data is relatively well suited to and those it is relatively poorly suited to. NIST is unlikely to use synthetic data any time soon, he said, but he noted its potential value for developers running internal speed tests.