New biometrics standards for face-aware capture and image quality should help operational accuracy
Facial recognition performance in the field could soon be significantly improved by a pair of standards in development, as Patrick Grother, NIST biometric standards and testing lead, explained to a webinar audience of several hundred this week.
The ongoing development of the ISO/IEC 24358 face-aware capture specifications and the ISO/IEC 29794-5 standard for face image quality was explained by Grother during day two of the International Face Performance Conference (IFPC) 2020.
There are a range of reasons why and situations in which these standards could be useful.
Improving image quality could help make presentation attack detection more widely applicable, Grother says, and a class of face-aware capture devices could also be programmed to upload images directly to issuing authorities to avoid potential risk of morphing spoofs.
Higher resolution and lower compression are also desirable for addressing demographic issues or human review.
The standards do not prescribe solutions, but draw on existing work, such as on pose estimation.
Ensuring high-quality images could be achieved by specifying a requirement for machine-vision cameras with high pixel depth or sensitivity, or through closed-loop exposure control, in which parameters like gain are adjusted on the sensor.
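The closed-loop exposure control mentioned above can be pictured as a simple feedback loop. The sketch below is illustrative only; the target value, loop gain, and toy sensor model are invented for the example and are not drawn from the draft standard.

```python
# Illustrative sketch of closed-loop exposure control: a proportional
# controller nudges sensor gain so the face region's mean brightness
# converges on a target value. All parameters here are hypothetical.

TARGET_MEAN = 128.0   # desired mean pixel intensity of the face region
KP = 0.005            # proportional gain of the control loop
GAIN_MIN, GAIN_MAX = 1.0, 16.0

def update_gain(current_gain: float, face_region_mean: float) -> float:
    """One iteration of the control loop: adjust sensor gain toward target."""
    error = TARGET_MEAN - face_region_mean
    new_gain = current_gain * (1.0 + KP * error)
    return max(GAIN_MIN, min(GAIN_MAX, new_gain))

# Simulate a dim scene in which observed brightness scales with gain.
gain = 1.0
scene_brightness_at_unit_gain = 40.0
for _ in range(50):
    observed_mean = min(255.0, scene_brightness_at_unit_gain * gain)
    gain = update_gain(gain, observed_mean)

print(min(255.0, scene_brightness_at_unit_gain * gain))  # settles near 128
```

In a real capture device the same adjustment would run against measured face-region statistics rather than a simulated scene, and could drive exposure time and aperture as well as gain.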
“In its entirety this kind of project is aimed at giving to face capture at least as much maturity as fingerprint capture has (dedicated optics, dedicated signal processing) and iris recognition.”
Both of those biometric modalities have purpose-built sensors, but as Grother points out, most facial recognition is still performed “with cameras that know nothing about what they’re looking at.”
Multiple faces or image distortion can result.
The face-aware capture standard would impose static image requirements and specify certain capabilities for the camera subsystem, including face detection, range estimation, pose estimation and illumination control. Smartphones already include computer vision capabilities to help their users take better photos, which Grother says proves that imaging far beyond the capabilities of a cheap, off-the-shelf webcam is possible despite cost and form factor constraints.
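The subsystem capabilities listed above amount to a set of pre-capture checks a face-aware device could run before accepting a frame. A minimal sketch of such gating logic follows; the `FrameAnalysis` type and every threshold are hypothetical, for illustration, and are not values from the ISO/IEC 24358 draft.

```python
# Hypothetical pre-capture gating for a face-aware camera. The checks
# mirror the capabilities named in the article (face detection, range
# estimation, pose estimation, illumination); thresholds are invented.
from dataclasses import dataclass

@dataclass
class FrameAnalysis:
    faces_detected: int     # a single subject is expected in frame
    subject_range_m: float  # estimated camera-to-subject distance
    yaw_deg: float          # head pose: rotation left/right
    pitch_deg: float        # head pose: rotation up/down
    mean_face_luma: float   # illumination proxy for the face region, 0-255

def frame_acceptable(a: FrameAnalysis) -> list[str]:
    """Return reasons for rejection; an empty list means capture the frame."""
    problems = []
    if a.faces_detected != 1:
        problems.append("expect exactly one face in frame")
    if not 0.4 <= a.subject_range_m <= 1.2:
        problems.append("subject out of working range")
    if abs(a.yaw_deg) > 5 or abs(a.pitch_deg) > 5:
        problems.append("pose too far from frontal")
    if not 80 <= a.mean_face_luma <= 180:
        problems.append("face under- or over-exposed")
    return problems

print(frame_acceptable(FrameAnalysis(1, 0.6, 2.0, -1.5, 120.0)))  # []
```

A device built this way only commits a frame once every check passes, which is what distinguishes it from a camera that “knows nothing about what it is looking at.”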
Higher-definition imaging is also desirable to cut down on false positive matches, without trading that improvement for more false negative results. Close familial relations, and twins in particular, are beyond the capacity of most algorithms to differentiate, and comparisons within large datasets drawn from a single region likewise show persistent false positives, in line with the “broad homogeneity” effect discussed earlier in the conference.
High enough resolution can reveal skin texture, which does not appear to be as strongly genetically determined as many other facial features, along with other fine characteristics like tiny scars.
Other modalities have requirements, like illumination for iris biometrics, equivalents of which could significantly benefit face biometrics, Grother states.
When standards for passport photos were being established, Grother recounts that there was a push to include high enough resolution to compare skin texture, but ultimately the characteristic is not normally used at all.
“Face image quality assessment is not a trivial topic,” however, Grother cautions.
To have value in all enrollment situations, the standard must support blind assessment, such as when an individual first applies for a driver’s license or passport and there is no prior image to compare against. The standard’s working group has evaluated the use of canonical portraits as a baseline for desirable characteristics, and has defined two error metrics: “incorrect sample rejection rate,” which measures the rejection of photographs that could in fact be matched, and “incorrect sample acceptance rate,” which measures the acceptance of samples that appear to have high image quality but cannot be matched.
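The two error rates can be illustrated with a toy computation. The quality scores, ground-truth matchability labels, and threshold below are invented for the example; the draft standard defines the metrics, not these numbers.

```python
# Toy illustration of the two quality-assessment error rates. A quality
# threshold splits samples into accepted and rejected; ground truth says
# whether each sample could actually be matched. Data is invented.

def sample_error_rates(samples, threshold):
    """samples: list of (quality_score, matchable) pairs."""
    rejected_matchable = sum(1 for q, m in samples if q < threshold and m)
    accepted_unmatchable = sum(1 for q, m in samples if q >= threshold and not m)
    n_matchable = sum(1 for _, m in samples if m)
    n_unmatchable = len(samples) - n_matchable
    isrr = rejected_matchable / n_matchable      # incorrect sample rejection rate
    isar = accepted_unmatchable / n_unmatchable  # incorrect sample acceptance rate
    return isrr, isar

data = [(0.9, True), (0.8, True), (0.3, True),   # matchable samples
        (0.7, False), (0.2, False)]              # unmatchable samples
isrr, isar = sample_error_rates(data, threshold=0.5)
print(isrr, isar)  # one of three matchable rejected; one of two unmatchable accepted
```

Raising the threshold trades one rate against the other, which is why the evaluation described below measures both.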
Quality assessment algorithms have been submitted by Rank One, Paravision, and several other organizations. The group then used facial recognition algorithms from the same companies, along with AnyVision, Imperial College and Innovatrics, and found that vendors predict their own genuine scores better than algorithms from other developers do. The most effective combinations rejected 1 percent of samples and reduced the incorrect sample acceptance rate by roughly a factor of six, suggesting that the technique can work, though at present it requires a very specific set of conditions.