Deepfake detection approaches from heart rate to lightweight models considered in EAB workshop
Deepfake detection was one of the main topics investigated in the recent workshop from the European Association for Biometrics on face manipulation. Face morphing and synthetic identities were the other main threat types discussed during the event.
The ‘Digital Face Manipulation and Detection’ handbook was used as a guideline by the EAB in organizing the workshop.
In addition to discussion about the social and ethical implications of deepfakes, a series of presentations suggested ways to identify them. The suggestions show several promising research paths, but also some common challenges and limitations.
Deepfakes threat review
The workshop began with a pair of presentations on face manipulation and its effects on biometric systems.
The deepfake focus began in earnest with Pavel Korshunov of Idiap reviewing the threat deepfakes pose to both people and automated systems.
Korshunov gave examples of recent deepfakes and reviewed the databases of deepfakes available to scientists working on their detection.
He examined how people perceive deepfakes, using a range of deepfake videos from obvious “melted faces” to very difficult-to-spot fakes. The percentage of people able to correctly identify deepfakes in the ‘very easy’ category was close to 100 percent, but even for the ‘easy’ category, more than 20 percent of those asked whether the videos were fake were confident in the wrong answer.
For the ‘very difficult’ category, nearly a quarter gave the right answer with confidence, while roughly seven out of ten respondents were both confident and wrong.
Real videos were mostly identified as such, but not as consistently as with the very low-quality deepfakes.
Korshunov then reviewed available algorithms, and found that the training dataset matters more than the particular model. Two models trained on Celeb-DF utterly failed to detect deepfake videos. The same models were more successful when trained on Google’s dataset, but still failed to detect many deepfakes in the ‘very easy’ category.
He then moved on to how generalization, a well-established problem in deepfake detection, can be improved. Data farming has shown some promise in this regard, as have ‘few-shot training’ and training for attribution.
Julian Fierrez of UAM suggested deepfake detection based on heart rate estimation. Remote heart rate estimation has advanced significantly in recent years, he says, due to research for several different applications.
The DeepFakesON-Phys model was developed by adapting and retraining existing models, combining analysis of the subject’s appearance and motion. A 99.9 percent area under curve and a frame-level accuracy rate of 98.7 percent suggest this method outperforms even recent state-of-the-art approaches.
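The two-branch idea can be sketched in miniature. The snippet below is an illustrative stand-in, not the DeepFakesON-Phys implementation: mean frame intensity stands in for the appearance CNN branch, mean inter-frame change stands in for the motion branch, and a fixed weighted sum with a sigmoid stands in for the learned fusion head. The function names and weights are hypothetical.

```python
import numpy as np

def motion_frames(video):
    """Normalized differences between consecutive frames (motion branch input)."""
    diffs = np.diff(video.astype(np.float64), axis=0)
    return diffs / (np.abs(diffs).max() + 1e-8)

def two_branch_score(video, w_app=0.5, w_mot=0.5):
    """Toy appearance+motion fusion producing a pseudo 'fake probability'.

    Stand-ins for the two CNN branches: mean frame intensity (appearance)
    and mean absolute inter-frame change (motion), combined by a weighted
    sum passed through a sigmoid. The weights here are arbitrary.
    """
    frames = video.astype(np.float64)
    app = frames.mean()                          # appearance statistic
    mot = np.abs(np.diff(frames, axis=0)).mean() # motion statistic
    z = w_app * app + w_mot * mot
    return 1.0 / (1.0 + np.exp(-z))

# 16 tiny 8x8 grayscale frames as a synthetic 'video'
video = np.random.default_rng(0).random((16, 8, 8))
score = two_branch_score(video)
```

In the real system, both branches are convolutional networks trained end to end on pulse-related appearance and motion cues; the point here is only the structure of combining two complementary feature streams.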
Of course, if deepfake production techniques start to take into account physiological data related to heart rate or blood flow, the technique will become less effective.
Abhijit Das of BITS Pilani presented 3D convolutional neural network architectures and attention mechanisms for detection.
The focus of most state-of-the-art fake detection techniques on spatial information leaves out a valuable clue, Das says. Considering different attention mechanisms, Das’ team found that adding “non-local blocks” increased effectiveness.
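A “non-local block” is a self-attention operation that lets every position in a feature map aggregate information from every other position, rather than only from a local neighborhood. A minimal numpy sketch of the embedded-Gaussian form follows; the random matrices stand in for the learned 1x1-convolution embeddings, so this illustrates the structure, not a trained detector.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def non_local_block(x, seed=0):
    """Minimal embedded-Gaussian non-local (self-attention) block.

    x: (N, C) array of N spatial/temporal positions with C channels.
    theta, phi and g are random linear embeddings standing in for the
    learned projections; the residual connection y = x + attention(x)
    mirrors the standard non-local block design.
    """
    rng = np.random.default_rng(seed)
    n, c = x.shape
    theta = x @ rng.standard_normal((c, c))
    phi = x @ rng.standard_normal((c, c))
    g = x @ rng.standard_normal((c, c))
    attn = softmax(theta @ phi.T / np.sqrt(c))  # pairwise position affinities
    return x + attn @ g                         # residual aggregation

y = non_local_block(np.ones((4, 3)))
```

Because the affinity matrix spans all positions, such a block can relate distant regions of a face (or distant frames), which is the long-range information Das argues purely spatial detectors miss.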
Overall, the method is very promising, Das says, but more work is needed on understanding and integrating attention mechanisms, as well as detecting deepfakes made with cross-manipulation techniques.
Huy Nguyen of NII showed the possibility of using capsule-forensics networks for detecting deepfakes. He began by pointing out the dramatic increase in resources needed as CNN performance is improved by expanding networks’ depth, width, input size, or the number in use.
In capsule networks, each capsule is a CNN which learns a particular representation, with agreement among the capsules indicating the authenticity of the input image.
The original capsule network design was not effective for deepfake detection, but Nguyen and his fellow researchers designed a capsule network for forensic applications with dynamic routing, enabling it to identify manipulated regions in videos.
Testing of ‘light’ and ‘full’ capsule models, with 3 and 10 modules, respectively, shows that statistical pooling improves detection accuracy while reducing the number of parameters used.
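The parameter saving from statistical pooling is easy to see: instead of flattening an H×W×C feature map into H·W·C inputs for the classifier, only a per-channel mean and variance (2·C values) are kept. A minimal sketch, with an arbitrary feature-map size for illustration:

```python
import numpy as np

def statistical_pooling(feature_map):
    """Replace flattening with per-channel mean and variance.

    feature_map: (H, W, C). The output is a fixed-size 2*C vector
    regardless of spatial resolution, so the layer that follows needs
    far fewer weights than one fed H*W*C flattened activations.
    """
    mean = feature_map.mean(axis=(0, 1))
    var = feature_map.var(axis=(0, 1))
    return np.concatenate([mean, var])

fmap = np.random.default_rng(0).random((32, 32, 8))
pooled = statistical_pooling(fmap)
# A following 64-unit layer would need 32*32*8*64 weights on the
# flattened map, but only 2*8*64 on the pooled statistics.
```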
Generalization remains a challenge, and dealing with low-quality images or explaining the results is difficult, but the accuracy achieved suggests that lightweight models can still be effective for detecting deepfakes.
Liming Jiang of NTU delivered a talk on the DeeperForensics dataset, which comprises 60,000 videos, one fake for every five real samples.
The dataset holds several advantages beyond its size: the subjects have all consented, capture is controlled, and the perturbations applied are more varied and more numerous.
Jiang also presented the results of the DeeperForensics Challenge 2020, which he helped organize. Several submissions showed promise in detecting previously unseen deepfakes.
The use of multiple data modalities for deepfake detection was presented by Edward J. Delp of Purdue.
His method involves face detection and cropping with a multi-task cascaded convolutional neural network, followed by feature extraction and automatic face weighting. A gated recurrent unit (GRU) processes the extracted features, producing logits that are aggregated with the learned weights. An auxiliary network identical to the main one looks over its shoulder to estimate errors.
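The “automatic face weighting” step can be illustrated on its own: each detected face crop yields a fake/real logit and an importance score, and a softmax over the scores decides how much each face contributes to the video-level decision. This is a hedged sketch of that aggregation only; the MTCNN detector, feature extractor, GRU and auxiliary network are omitted, and the inputs here are made up.

```python
import numpy as np

def weighted_video_logit(face_logits, weight_scores):
    """Aggregate per-face logits with learned importance scores.

    face_logits: (T,) fake/real logit per detected face crop.
    weight_scores: (T,) unnormalized importance score per face.
    Softmax turns scores into weights, so faces the weighting network
    considers reliable dominate the video-level logit.
    """
    w = np.exp(weight_scores - weight_scores.max())
    w = w / w.sum()
    return float((w * face_logits).sum())

# One high-confidence 'fake' face crop and one low-confidence crop:
logits = np.array([2.0, -1.0])
scores = np.array([5.0, 0.0])
video_logit = weighted_video_logit(logits, scores)
```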
Like several of the other methods above, this approach reached accuracy above 90 percent, much higher than the compared approaches.
Delp also presented a proposed method for detecting synthesized audio by using spectrograms to visualize the frequency magnitudes of the speech signal over time for analysis with a CNN.
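A magnitude spectrogram of the kind such a CNN would consume can be computed with a short-time Fourier transform: frame the signal, window each frame, and take the FFT magnitudes. A minimal numpy version, applied to a synthetic 440 Hz tone (the frame and hop sizes are arbitrary choices for illustration):

```python
import numpy as np

def magnitude_spectrogram(signal, frame_len=256, hop=128):
    """Short-time magnitude spectrogram: the 2-D 'image' a CNN analyzes.

    Frames the signal with 50% overlap, applies a Hann window, and takes
    the real FFT of each frame. Rows are time frames, columns are
    frequency bins.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

# One second of a 440 Hz sine at an 8 kHz sample rate
t = np.linspace(0, 1, 8000, endpoint=False)
spec = magnitude_spectrogram(np.sin(2 * np.pi * 440 * t))
```

Synthesized speech tends to leave artifacts in such time-frequency representations that a CNN can learn to pick out, even when the waveform sounds natural.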
Finally, methods for matching the phonemes (sound units) and visemes (lip movements) in a video, and for comparing the emotion shown in the visual and audio portions of a video were considered.
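The phoneme-viseme idea reduces to a consistency check: the phoneme heard in the audio implies a mouth shape, and a mismatch with the viseme actually seen in the video is evidence of manipulation. The sketch below uses a tiny, hypothetical phoneme-to-viseme mapping for illustration; real systems use standard viseme inventories and per-frame classifiers.

```python
# Hypothetical phoneme -> viseme-class mapping, for illustration only.
PHONEME_TO_VISEME = {
    "p": "bilabial", "b": "bilabial", "m": "bilabial",
    "f": "labiodental", "v": "labiodental",
    "aa": "open", "iy": "spread",
}

def mismatch_rate(phonemes, observed_visemes):
    """Fraction of time steps where the mouth shape implied by the audio
    phoneme disagrees with the viseme seen in the video. Unmapped
    phonemes are skipped rather than counted as mismatches."""
    bad = sum(
        1 for p, v in zip(phonemes, observed_visemes)
        if PHONEME_TO_VISEME.get(p) not in (None, v)
    )
    return bad / len(phonemes)

consistent = mismatch_rate(["p", "aa"], ["bilabial", "open"])
suspicious = mismatch_rate(["p", "aa"], ["spread", "open"])
```

A genuine video should score near zero; a lip-synced or face-swapped one tends to accumulate mismatches, particularly on bilabial sounds where the lips must visibly close.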
The problem of generalization seems to loom largest in the community, particularly given the likelihood that the most threatening deepfakes will be novel. With several teams of international researchers continuing to pursue diverse approaches, there does seem to be room for optimism that deepfakes can be exposed as quickly as they are produced.