Face biometrics’ limitations with profile images could help deepfake detection
Limitations in the software used to build deepfakes mean that many of them are not very good at recreating profile views, according to a new analysis by avatar creator Metaphysic.
The company shared its findings in an article exploring some of the vulnerabilities of face biometrics in relation to video deepfakes.
During its testing, Metaphysic artists worked with Bob Doyle, host of a YouTube show on face swapping and deepfake technology, who used DeepFaceLive (live-streaming DeepFaceLab software) to change his appearance to that of some celebrities.
Most of the test results indicate the recreations were quite convincing, even at fairly acute angles. When the facial angle hit a full 90 degrees, however, the image became distorted, revealing the algorithm’s vulnerability.
“It’s evident also that Bob’s real profile lineaments entirely survive the deepfake process for all these models,” Metaphysic writes.
According to AI insiders, these limitations stem from the fact that the Facial Alignment Network, the software typically used to estimate facial poses in images for deepfakes, does not work reliably at acute angles.
“In fact, most 2D-based facial alignments algorithms assign only 50-60% of the number of landmarks from a front-on face view to a profile view,” according to the article.
Further, research suggests that 2D-alignment packages used in deepfake generation consider a profile view to be 50 percent hidden. This angle not only hinders recognition but also negatively affects accurate training and subsequent face synthesis.
“For this reason, in spite of manual intervention and various clean-up processes that can be used, sheer profile shots are likely to be less temporally consistent than most frames in extracted videos.”
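The roughly 50 percent occlusion claim can be illustrated with a toy geometric model (this is a sketch for intuition only, not Metaphysic’s or any alignment library’s actual method): treat the landmarks as points spread across a half-cylinder around the head, and count how many still face the camera after the head rotates by a given yaw angle.

```python
def visible_fraction(yaw_deg: float, n_landmarks: int = 68) -> float:
    """Toy occlusion model: landmarks sit on a half-cylinder spanning
    -90..+90 degrees around the head's vertical axis. A landmark counts
    as visible when its outward surface normal still faces the camera
    after the head is rotated by `yaw_deg`."""
    visible = 0
    for i in range(n_landmarks):
        # Angular position of this landmark on the face surface.
        theta = -90.0 + 180.0 * i / (n_landmarks - 1)
        # The normal faces the camera while the rotated angle stays
        # within +/-90 degrees of the optical axis.
        if abs(theta + yaw_deg) < 90.0:
            visible += 1
    return visible / n_landmarks
```

Under this simplification, a frontal view (`yaw_deg=0`) leaves nearly all 68 landmarks visible, while a full profile (`yaw_deg=90`) leaves exactly half, matching the 50–60 percent landmark drop the article describes.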
The Metaphysic analysis also mentions a general lack of availability of profile shots to train deepfakes.
“Photographers will fight a crowd to escape them; picture editors can’t sell them (or can’t sell them as well as a ‘real’ photo), and they are in general charmless and prosaic representations of us that many of us literally would barely even recognize as ourselves.”
Case in point: The most convincing deepfakes generated to date are those of movie stars, who have hundreds of hours of footage and innumerable profile shots available for AI training.
Liveness detection and lateral views
Despite this, Metaphysic argued, many biometric software firms do not ask users to make 90-degree turns from the camera as part of their liveness detection verification process.
One such company is Sensity, which Metaphysic interviewed as part of its research.
“Lateral views of people’s faces, when used as a form of identity verification, may indeed provide some additional protection against deepfakes,” Sensity’s CEO Giorgio Patrini told Metaphysic.
“As pointed out, the lack of widely available profile view data makes the training of deepfake detectors very challenging.”
At the same time, Patrini agreed that prompting profile views during video conferencing calls may work as an anti-spoofing measure.
“Indeed, one tip for performing deepfake detection ‘by eye’ today is to check whether one can spot face artifacts or flickering while a person is turning completely to their side — where it’s more likely that a face landmarks detector would have failed.”
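The “turn to the side” check Patrini describes could be automated along these lines. The sketch below is a hypothetical liveness challenge, assuming an upstream head-pose estimator supplies per-frame yaw angles: it verifies that the caller actually reached a near-profile angle, and flags abrupt yaw discontinuities of the kind that occur when a tracker, or a deepfake model, loses the face.

```python
def profile_turn_completed(yaw_series_deg: list[float],
                           target_deg: float = 85.0,
                           max_jump_deg: float = 15.0) -> bool:
    """Check a per-frame head-yaw series for a completed profile turn.

    Returns True only if the subject reaches `target_deg` of yaw without
    any frame-to-frame jump larger than `max_jump_deg`; a sudden jump
    suggests the face tracker (or the face synthesis) broke down mid-turn.
    """
    prev = yaw_series_deg[0]
    reached = abs(prev) >= target_deg
    for yaw in yaw_series_deg[1:]:
        if abs(yaw - prev) > max_jump_deg:
            return False  # discontinuity: tracking/synthesis likely failed
        if abs(yaw) >= target_deg:
            reached = True
        prev = yaw
    return reached
```

For example, a smooth turn sampled at 10-degree increments passes, while a series that never exceeds 30 degrees, or one that jumps straight from 0 to 50 degrees between frames, fails. Real systems would combine such a pose check with the artifact and flicker inspection described above.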
According to the Metaphysic article, another way of creating a critical situation for a deepfake model would be to ask a video caller to wave their hands in front of their face.
“[The model] is likely to demonstrate poor latency and quality of superimposition over the deepfaked face.”
Synthesizing profile data
In the absence of enough real-world data for training deepfakes, researchers have turned to synthetic data.
Metaphysic mentioned a research paper called Dual-Generator Face Reenactment, released by the University of Taipei earlier this year.
Noticeably, the majority of the paper’s accompanying examples stop at around 80 degrees, with very few depicting profile images at 90 degrees.
“It’s just a few degrees, but it seems to make all the difference, and getting there reliably and with authenticity would be no minor milestone for a live deepfake streaming system, or the models that power it,” Metaphysic writes.
An improvement on this technology is Nvidia’s InstantNeRF, but according to Metaphysic, “resolution, expression accuracy and mobility remain major challenges to high-resolution inference.”
The Metaphysic analysis concludes by mentioning the recent FBI warning about potential live deepfake fraud, and how the research community’s efforts are hindered by the fact that investigators cannot interact with the material they are examining.
“There are a growing number of solutions emerging that could be applied as a security layer in video calls,” the article reads.
These include measuring monitor illumination, embedding facets of a known and trusted video into a detection system, and comparing potential deepfaked video content against known biometric traits, among other approaches.