Researchers develop EdgeFace, a face recognition model for resource-constrained edge devices
Scientists have developed EdgeFace, a novel face recognition model optimized for use on devices with limited processing power and storage. This lightweight face recognition network is inspired by the hybrid architecture of EdgeNeXt, which combines the strengths of convolutional neural networks (CNNs) and transformers to perform accurate face recognition while conserving computational resources.
Typical facial recognition systems rely on deep neural networks that, despite their accuracy, demand extensive memory and processing capabilities, rendering them impractical for use on edge devices. In response to these challenges, there has been a shift towards designing more efficient neural networks for vision-related tasks, aiming to balance accuracy with the computational resources available.
Newer models such as vision transformers (ViTs) offer a promising way to improve face recognition because they capture long-range interactions effectively. The research paper, titled “EdgeFace: Efficient Face Recognition Model for Edge Devices,” introduces techniques such as the Low Rank Linear Module (LoRaLin) to achieve high performance on edge devices while reducing the need for significant onboard memory.
Introducing the EdgeFace face recognition model
Researchers extended the existing EdgeNeXt architecture and tailored it for face recognition on next-generation edge devices. They made several modifications to reduce the number of parameters and the computational cost (FLOPs), making the model more efficient and lightweight.
The team chose EdgeNeXt because it outperformed other popular models like MobileViT and EdgeFormer in image recognition accuracy. EdgeNeXt also reduces the cost of transformer self-attention on edge devices by using “transposed query and key attention feature maps,” which compute attention across channels instead of spatial dimensions.
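To make the difference between spatial and channel-wise attention concrete, here is a minimal NumPy sketch. It is not EdgeNeXt's actual implementation: it assumes a single head, omits the learned projections and the query/key normalization, and the function names and tensor sizes are hypothetical, chosen only for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def spatial_attention(q, k, v):
    # Standard self-attention: the attention map has shape (N, N),
    # where N is the number of tokens (spatial positions).
    n, d = q.shape
    attn = softmax(q @ k.T / np.sqrt(d), axis=-1)
    return attn @ v, attn

def channel_attention(q, k, v):
    # Transposed attention: the attention map has shape (d, d),
    # where d is the number of channels. Scaling and normalization
    # are omitted here (an assumption made to keep the sketch short).
    attn = softmax(q.T @ k, axis=-1)
    return v @ attn, attn

# Illustrative sizes only: 196 tokens (a 14x14 feature map), 32 channels.
rng = np.random.default_rng(0)
n_tokens, channels = 196, 32
q, k, v = rng.standard_normal((3, n_tokens, channels))

out_s, map_s = spatial_attention(q, k, v)
out_c, map_c = channel_attention(q, k, v)
print(map_s.shape, map_c.shape)  # (196, 196) (32, 32)
```

Both variants return an output of the same shape, but the channel-wise attention map grows with the number of channels rather than with the number of spatial positions, which is where the savings at higher resolutions come from.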
However, despite EdgeNeXt’s efficiency, its linear layers still contribute significantly to both computational cost and parameter size. The authors propose replacing traditional linear layers with the newly developed LoRaLin layers, which use small matrices to reduce the number of parameters and computations.
PyTorch class implementing a Low-Rank Linear layer (LoRaLin)
import torch.nn as nn

class LoRaLin(nn.Module):
    def __init__(self, in_feat, out_feat, gamma, bias):
        super().__init__()
        # Rank is a fraction (gamma) of the smaller dimension, never below 2
        rank = max(2, int(min(in_feat, out_feat) * gamma))
        self.lin1 = nn.Linear(in_feat, rank, bias=False)
        self.lin2 = nn.Linear(rank, out_feat, bias=bias)

    def forward(self, input):
        x = self.lin1(input)
        x = self.lin2(x)
        return x
The PyTorch class for implementing a LoRaLin is designed to create a more parameter-efficient linear transformation by decomposing a traditional dense layer into two lower-rank matrices. This approach reduces the computational cost and the number of parameters required, making it suitable for edge devices.
The constructor of the LoRaLin class takes four arguments: the number of input features (in_feat), the number of output features (out_feat), a hyperparameter controlling the rank of the low-rank matrices (gamma), and a boolean flag (bias) that determines whether the layer has a bias term. The constructor then calculates the rank of the low-rank matrices: it multiplies the smaller of the input and output feature counts by gamma, truncates the result to an integer, and ensures the rank is at least two.
The code then defines two linear layers: the first maps in_feat input features to rank output features with no bias term (bias=False), and the second maps rank input features to out_feat output features, with a bias term only if the bias flag is set.
The forward pass performs the actual computation when the layer is used in a neural network. The code applies the first linear layer to the input data, and the second linear layer to the output of the first layer to return the final output of the LoRaLin layer.
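The parameter savings can be checked with simple arithmetic. The sketch below mirrors the rank formula from the listing and compares parameter counts for a dense layer and its low-rank replacement. The helper names and the layer sizes (256 inputs, 1024 outputs) are hypothetical, picked only for illustration, and are not taken from the paper.

```python
def lora_rank(in_feat, out_feat, gamma):
    # Same rank rule as the LoRaLin listing: a fraction of the
    # smaller dimension, truncated to an integer, never below 2.
    return max(2, int(min(in_feat, out_feat) * gamma))

def dense_params(in_feat, out_feat, bias=True):
    # One weight matrix (in_feat x out_feat) plus an optional bias vector.
    return in_feat * out_feat + (out_feat if bias else 0)

def loralin_params(in_feat, out_feat, gamma, bias=True):
    # Two weight matrices (in_feat x rank, rank x out_feat);
    # only the second layer carries the optional bias.
    rank = lora_rank(in_feat, out_feat, gamma)
    return in_feat * rank + rank * out_feat + (out_feat if bias else 0)

# Illustrative layer sizes (an assumption, not values from the paper).
in_feat, out_feat = 256, 1024
print("dense:", dense_params(in_feat, out_feat))
for gamma in (0.25, 0.5, 0.75):
    rank = lora_rank(in_feat, out_feat, gamma)
    count = loralin_params(in_feat, out_feat, gamma)
    print(f"gamma={gamma}: rank={rank}, params={count}")
```

With these sizes and gamma set to 0.5, the rank is 128 and the layer needs 164,864 parameters instead of 263,168. The decomposition only saves parameters when the rank is smaller than in_feat × out_feat / (in_feat + out_feat), so for a square layer gamma has to stay below 0.5.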
EdgeFace performance and accuracy on benchmark
The newly developed EdgeFace face recognition model achieves state-of-the-art accuracy on benchmark datasets like LFW, IJB-B, and IJB-C, even compared to larger and more complex models. According to the paper, the model requires only 1.77 million parameters, making it very compact and suitable for edge devices.
“Our experiments show that our model is very efficient and also achieves competitive recognition accuracy compared to SOTA lightweight models. Among seven benchmarking datasets used in our evaluation, EdgeFace achieves the best recognition performance for four different datasets in each of the categories of models with 2-5 M parameters and < 2M parameters,” researchers highlight.
The code will be publicly available, allowing adopters to test and verify the model performance.