What are NIST evaluation tests for facial recognition algorithms?
The U.S. National Institute of Standards and Technology divided the Face Recognition Vendor Test (FRVT) into two separate sets of biometric evaluations in 2023 – the Face Recognition Technology Evaluation (FRTE) and the Face Analysis Technology Evaluation (FATE). This article will explore the different tracks for assessing facial recognition algorithms.
What is Face Recognition Technology Evaluation (FRTE)?
The Face Recognition Technology Evaluation (FRTE) program focuses on evaluating the identification and verification capabilities of face recognition technology. It is designed to assess the performance of these technologies in identifying individuals from images, ranging from 1:1 verification to 1:N identification scenarios.
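The difference between 1:1 verification and 1:N identification can be sketched in a few lines. This is an illustration only: representing each face as a small feature vector and scoring with cosine similarity are assumptions made for the example, and the names `verify`, `identify`, and `gallery` are hypothetical rather than part of any NIST interface.

```python
import math

def cosine_similarity(a, b):
    """Similarity score between two feature vectors (assumed representation)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def verify(probe, reference, threshold):
    """1:1 verification: compare the probe against one claimed identity."""
    return cosine_similarity(probe, reference) >= threshold

def identify(probe, gallery, threshold):
    """1:N identification: search every enrolled identity for the best match."""
    best_id, best_score = None, -1.0
    for identity, template in gallery.items():
        score = cosine_similarity(probe, template)
        if score > best_score:
            best_id, best_score = identity, score
    # Report a match only if the best candidate clears the threshold.
    return best_id if best_score >= threshold else None

# Made-up two-dimensional templates, purely for illustration.
gallery = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}
probe = [0.9, 0.1]
print(verify(probe, gallery["alice"], 0.8))
print(identify(probe, gallery, 0.8))
```

Verification makes a single comparison, while identification makes one comparison per enrolled identity, which is why the two scenarios are evaluated separately.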
Moreover, the FRTE program features specialized assessments, including Multimodal (Face + Iris) and Twins Demonstration, which are aimed at offering a comprehensive understanding of face recognition technology’s performance across various user groups.
The multimodal approach involves testing algorithms for their ability to match a person’s face and iris data against a large database of multiple modalities. The FRTE Twins Demonstration is a specialized track that addresses the challenge of distinguishing between identical twins using facial recognition technology.
The key metrics of the evaluation are the False Non-Match Rate (FNMR) and False Match Rate (FMR). FNMR measures the probability that the face recognition system will fail to match two images of the same person, while FMR is the measure of the likelihood that the system will incorrectly match an image with one from a different individual.
The FNMR at a particular threshold is calculated as:
$$\mathrm{FNMR}(T) = 1 - \frac{1}{N} \sum_{i=1}^{N} H(u_i - T)$$
where $T$ is the decision threshold, $u_i$ is the similarity score for the i-th genuine comparison, $N$ is the total number of genuine comparison trials, and $H$ is the step function.
The step function returns 1 if its argument $(u_i - T)$ is greater than or equal to zero and returns 0 otherwise. The summation therefore counts the genuine pairs that the system correctly matches, meaning those whose scores are at or above the threshold. Dividing that count by $N$ and subtracting the result from 1 gives the proportion of genuine pairs that did not match, which is the FNMR at threshold $T$.
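As a rough sketch of the calculation described above, FNMR at threshold T is one minus the fraction of genuine scores at or above T. The function names and the genuine scores below are made up for illustration and do not come from any NIST dataset.

```python
def step(x):
    """Step function H: 1 if x >= 0, otherwise 0."""
    return 1 if x >= 0 else 0

def fnmr(genuine_scores, threshold):
    """Proportion of genuine comparisons that score below the threshold."""
    n = len(genuine_scores)
    if n == 0:
        raise ValueError("at least one genuine score is required")
    matched = sum(step(u - threshold) for u in genuine_scores)
    return 1 - matched / n

# Hypothetical similarity scores from same-person image pairs.
genuine = [0.91, 0.85, 0.40, 0.77, 0.62]
print(fnmr(genuine, 0.70))  # two of the five pairs fall below 0.70
```

Raising the threshold can only increase the FNMR, since fewer genuine pairs clear it.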
The False Match Rate (FMR) is computed similarly, but from a vector of scores produced by comparing images of different persons. The equation is:
$$\mathrm{FMR}(T) = \frac{1}{N} \sum_{i=1}^{N} H(v_i - T)$$
where $T$ is the threshold, $v_i$ are the imposter scores, $N$ is the total number of imposter comparison trials, and $H$ is the step function.
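The FMR calculation can be sketched the same way: it is the fraction of imposter scores that reach or exceed the threshold. The scores below are invented for the example, and `fmr` is a hypothetical helper name.

```python
def fmr(imposter_scores, threshold):
    """Proportion of different-person comparisons scoring at or above the threshold."""
    n = len(imposter_scores)
    if n == 0:
        raise ValueError("at least one imposter score is required")
    # H(v - T) is 1 exactly when v - T >= 0.
    false_matches = sum(1 for v in imposter_scores if v - threshold >= 0)
    return false_matches / n

# Hypothetical similarity scores from different-person image pairs.
imposter = [0.12, 0.33, 0.71, 0.05, 0.48, 0.27, 0.19, 0.64]
print(fmr(imposter, 0.70))  # one of the eight pairs reaches 0.70
```

Unlike the FNMR, the FMR falls as the threshold rises, which is the trade-off that threshold selection has to balance.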
The calculation of the threshold is crucial because it determines the sensitivity of the system. $T$ is determined using a quantile function of the imposter scores:
$$T_k = Q_v(1 - \mathrm{FMR}_k)$$
where $Q_v$ is the quantile function of the imposter scores and the $\mathrm{FMR}_k$ values are set across a desired range on a log scale.
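A minimal sketch of threshold selection, assuming a simple empirical quantile over the sorted imposter scores evaluated at one minus each target FMR. The exact quantile definition used in the evaluation is not given here, so `empirical_quantile` is only one reasonable choice, and the scores are synthetic.

```python
import math

def empirical_quantile(sorted_scores, p):
    """Smallest score with at least a fraction p of the scores at or below it."""
    n = len(sorted_scores)
    # The small tolerance guards against floating-point noise in p * n.
    rank = math.ceil(p * n - 1e-9)
    rank = min(max(rank, 1), n)
    return sorted_scores[rank - 1]

def thresholds_for_targets(imposter_scores, fmr_targets):
    """Pair each target FMR with the quantile of the imposter scores at 1 - FMR."""
    if not imposter_scores:
        raise ValueError("at least one imposter score is required")
    ordered = sorted(imposter_scores)
    return [(t, empirical_quantile(ordered, 1 - t)) for t in fmr_targets]

# Synthetic imposter scores and log-spaced FMR targets (0.1 and 0.01).
imposter = [i / 100 for i in range(100)]
targets = [10 ** -k for k in (1, 2)]
for target, threshold in thresholds_for_targets(imposter, targets):
    print(f"target FMR {target}: threshold {threshold}")
```

On a finite sample the FMR actually achieved at each threshold only approximates the target, and lower targets need far more imposter comparisons to estimate reliably.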
What is Face Analysis Technology Evaluation (FATE)?
The Face Analysis Technology Evaluation (FATE) program covers facial analysis tasks that extend beyond identification. It assesses software for morph detection, image quality analysis, presentation attack detection, and age estimation.
The FATE MORPH track specifically targets the detection of morphed facial images, which are images that have been digitally manipulated to merge features from different faces into a single image. Morphing attacks can compromise identity verification systems, and the evaluation measures how effectively algorithms detect them.
FATE image defect detection is part of quality assessment: it identifies and measures defects that may impair the recognition of facial images. During image quality analysis, the system evaluates the different factors that affect image usability.
To determine whether a face presented in an image is genuine or a representation intended to deceive the system, presentation attack detection is utilized. Age estimation leverages biometric technology to determine a person’s age based on their facial features.
These performance assessments use two key metrics: the Attack Presentation Classification Error Rate (APCER) and the Bona Fide Presentation Classification Error Rate (BPCER).
The APCER measures how often the system fails to detect a presentation attack, such as one involving manipulated or morphed images. It is defined by the following equation:
$$\mathrm{APCER} = \frac{M}{N_m}$$
where $M$ is the number of morphed images incorrectly classified as non-morphed, and $N_m$ is the total number of morphed images presented to the system.
The Bona Fide Presentation Classification Error Rate (BPCER), on the other hand, measures how well the system avoids false alarms. It is defined by the following equation:
$$\mathrm{BPCER} = \frac{B}{N_b}$$
where $B$ is the number of bona fide (genuine, non-morphed) images incorrectly classified as morphed, and $N_b$ is the total number of bona fide images presented to the system.
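Both error rates are simple ratios, as the following sketch shows. It assumes each image carries a ground-truth label and a detector decision, both encoded as booleans where `True` means "morph"; that encoding, the function names, and the sample data are illustrative assumptions.

```python
def apcer(is_morph, flagged_as_morph):
    """Share of morphed images that the detector classified as non-morphed."""
    decisions = [pred for truth, pred in zip(is_morph, flagged_as_morph) if truth]
    if not decisions:
        raise ValueError("no morphed images in the sample")
    missed = sum(1 for pred in decisions if not pred)  # M
    return missed / len(decisions)                     # M / Nm

def bpcer(is_morph, flagged_as_morph):
    """Share of bona fide images that the detector classified as morphed."""
    decisions = [pred for truth, pred in zip(is_morph, flagged_as_morph) if not truth]
    if not decisions:
        raise ValueError("no bona fide images in the sample")
    false_alarms = sum(1 for pred in decisions if pred)  # B
    return false_alarms / len(decisions)                 # B / Nb

# Hypothetical sample: four morphs followed by five bona fide images.
labels = [True, True, True, True, False, False, False, False, False]
decisions = [True, False, True, True, False, True, False, False, False]
print(apcer(labels, decisions))  # 1 missed morph out of 4
print(bpcer(labels, decisions))  # 1 false alarm out of 5
```

The two rates trade off against each other: a detector tuned to flag more images as morphs lowers APCER at the cost of a higher BPCER.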
Who participates?
The FRTE and FATE tests are open to developers of facial recognition algorithms who wish to have them assessed for different aspects of the technology. To participate, companies must integrate their software with the publicly available C++ API and run it through a validation package before submission.
Evaluation by NIST is often seen as table stakes, if not made an explicit requirement, for many contracts in both the public and private sector.