DHS selfie biometrics evaluation for remote IDV shows range in performance

Biometric face verification can be an effective tool for remote identity verification, but many systems still have some issues to work out, according to research from the U.S. Department of Homeland Security.
A webinar from DHS’ Science & Technology Directorate (S&T) shared results from the second stage of the Remote Identity Validation Technology Demonstration (RIVTD). RIVTD Track 2 evaluated the capability to match a selfie to the face image on an ID document.
The testing, carried out at the Maryland Test Facility (MdTF), shows that many systems have trouble extracting biometric templates from ID documents, commonly struggling with rotated face images. Among the more effective systems, face biometrics performance was consistent across smartphone types, degrees of control over the selfie and demographics, but setting thresholds based on FMR targets poses a challenge in most cases.
S&T launched its evaluations for Track 3, assessing the presentation attack detection (PAD), or liveness detection, capabilities of selfie biometrics, earlier this year.
Addressing an assessment gap
The analysis was performed to address a dearth of “independent, objective data on how systems performed,” Arun Vemury said in an introductory overview. The evaluations that have been performed and publicized make it “hard to do an apples-to-apples comparison across different systems and different implementations because there weren’t really a lot of great testing processes out there where everything was being validated against the same benchmark.”
Testing the effectiveness and fairness of the systems is difficult for the industry, he observes, in part because of the scale tests must reach to be effective.
John Howard reviewed the selfie biometric matching track, the data used and the system properties.
“One of the reasons this had a sort of elongated timeline as opposed to some of the other tests we’ve done is that in the summer of 2023 a dataset to evaluate match-to-document systems didn’t really exist,” he says. “No-one had a curated set of both individual ID documents and selfies from the same people.”
S&T tested remote identity validation with 1,633 volunteers, who each provided one controlled and one uncontrolled selfie image with each of three different late-model smartphones, over a pair of data collection sessions last May and September. The volunteers represent different genders, ages and ethnicities. An additional, longitudinal dataset of non-mated comparisons was evaluated to calculate false match rate (FMR).
Vendors sent in Linux-based Docker containers with a server application, which were run offline to protect the privacy of the participants. Eighteen vendors applied to participate, 16 were accepted, and 10 were viable for all of the metrics assessed. S&T believes those vendors, who are given aliases in the results, represent the state of the art in biometric matching.
Variable results and common problems
All 16 vendors were able to extract biometric templates from the selfies, and 13 were able to extract templates from at least half of the ID documents, which proved a much more difficult task. A dozen were able to successfully compare the templates from the live photos and the ID documents, but two of them had false non-match rates (FNMRs) above 90 percent for genuine pairs.
Failure-to-extract rates (FTXR) for selfies were generally low, below 1 percent for 14 of 16 systems, though some variability was observed with the uncontrolled selfies. The phone used (Apple iPhone 14, Samsung Galaxy S22, Google Pixel 7) had minimal impact on selfie extraction rates. For documents, FTXR was high for 3 outlier systems, while 6 of 16 had FTXR below 1 percent. The iPhones tended to deliver lower document FTXR than the Samsung smartphones, and documents from different states showed some variation.
When it came to matching, the median FNMR was around 0.2 percent.
With controlled selfies, FTXR was generally low across all demographics, Yevgeniy Sirotin explained. Encouragingly, all but four of the 48 combinations of biometric systems and smartphones met the 1 percent benchmark for all demographic groups, and median error rates were 0 percent. Certain combinations showed notable error rate trends among some demographic groups.
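The per-group check described above can be sketched in a few lines of Python. This is only an illustration: the group labels and counts are hypothetical, the function names are invented here, and treating "met the benchmark" as an error rate at or below 1 percent in every group is an assumption about how S&T applied it.

```python
BENCHMARK = 0.01  # the 1 percent benchmark cited in the article


def failure_rate(failures, attempts):
    """Fraction of attempts that failed (0.0 when there were no attempts)."""
    return failures / attempts if attempts else 0.0


def meets_benchmark(counts_by_group, benchmark=BENCHMARK):
    """True only if every demographic group is at or below the benchmark.

    counts_by_group maps a group label to a (failures, attempts) pair.
    """
    return all(
        failure_rate(failures, attempts) <= benchmark
        for failures, attempts in counts_by_group.values()
    )


# Hypothetical counts for one system-and-smartphone combination.
example = {"group_a": (1, 400), "group_b": (0, 350), "group_c": (6, 300)}
print(meets_benchmark(example))  # group_c is at 2 percent, so the check fails
```

A combination passes under this sketch only when its worst group passes, which is why a median of 0 percent can coexist with combinations that miss the benchmark.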
Uncontrolled selfies were a little more challenging, with 41 of 48 combinations meeting the 1 percent threshold. Median error rate was still 0 percent, but more errors were observed with some combinations for males, people over 60 years old, and those with the darkest skin tone.
At the biometric matching stage, 23 of the 30 combinations met the 1 percent FNMR benchmark for controlled images, but the median FNMR ranged from 0 percent to 0.35 percent for Black participants. For uncontrolled images, 25 of 30 combinations met the benchmark, though volunteers in the 31-45 age group had the highest median FNMR, at 0.19 percent.
The target false match rate (FMR) was set at 1 in 10,000, in line with the incoming requirement for authentication in the updated NIST SP 800-63B.
S&T confirmed FMR with the longitudinal dataset, based on both random imposters and demographically matched imposters. Some leeway was given, based on the variability between datasets, but despite that, few of the systems were in the expected range. For random imposters, 4 of 14 had FMRs three times higher than expected or more, and 5 more had FMRs three times lower than expected. For demographically matched imposters, 10 were more permissive and 2 were more conservative, while only 2 were in the expected range.
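As a rough sketch of that comparison, the Python below labels an observed FMR against the 1-in-10,000 target. The threefold tolerance is inferred from the "three times higher" and "three times lower" wording in the results and is an assumption, as are the function name and the example values, which are not S&T data.

```python
TARGET_FMR = 1e-4  # 1 in 10,000, the target cited in the article
LEEWAY = 3.0       # assumed tolerance factor, inferred from the "three times" wording


def classify_fmr(observed_fmr, target=TARGET_FMR, leeway=LEEWAY):
    """Label an observed FMR as permissive, conservative or expected."""
    if observed_fmr >= target * leeway:
        return "permissive"    # far more false matches than configured
    if observed_fmr <= target / leeway:
        return "conservative"  # far fewer false matches than configured
    return "expected"


# Hypothetical observed values, for illustration only.
for fmr in (5e-4, 1.2e-4, 1e-5):
    print(fmr, classify_fmr(fmr))
```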
“As a rule of thumb, we see that imposters of the same gender, race and similar age, actually increase false match rate by about a factor of 10,” Sirotin says. That means setting a target FMR ten times lower than the one used with random imposters may be a good strategy when demographic data isn’t available.
Only one biometric system out of 16 met all benchmarks, though two more missed only because their FMR was more conservative than expected. Another had a more permissive FMR, but otherwise met the targets.
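A minimal sketch of how that rule of thumb might be applied when choosing a threshold from random-imposter scores follows. The function names, the convention that a comparison is accepted when its score is strictly above the threshold, and the toy scores are all assumptions made for illustration; none of this is described as S&T's or any vendor's method.

```python
def random_imposter_target(matched_target_fmr, factor=10.0):
    """Target FMR to configure against random imposters so that the FMR
    against demographically matched imposters lands near matched_target_fmr.
    The factor of 10 is the rule of thumb quoted in the article."""
    return matched_target_fmr / factor


def threshold_for_fmr(nonmated_scores, target_fmr):
    """Pick a score threshold whose empirical FMR does not exceed target_fmr.

    Assumes higher scores mean greater similarity and that a comparison is
    accepted only when its score is strictly greater than the threshold.
    """
    ranked = sorted(nonmated_scores, reverse=True)
    allowed = int(target_fmr * len(ranked))  # false matches we can tolerate
    if allowed >= len(ranked):
        return float("-inf")  # every non-mated comparison may be accepted
    return ranked[allowed]


# Toy non-mated scores, for illustration only.
scores = list(range(1, 101))
print(threshold_for_fmr(scores, 0.25))
print(threshold_for_fmr(scores, random_imposter_target(0.25)))
```

Lowering the target raises the threshold, which trades a lower false match rate for a potentially higher false non-match rate.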