Unleashing the power of liveness detection: A game-changer in the battle against deepfakes
By Vijay Balasubramaniyan, co-founder & CEO of Pindrop
The generative AI revolution is here. With its ability to augment human intelligence and effort in a wide variety of fields including healthcare, robotics and creative endeavors, it is no wonder that the global market for generative AI is predicted to reach $110 billion by 2030. However, this rapidly advancing technology also raises serious questions about whether the image, audio or video that you just experienced is real or an AI-generated deepfake. Deepfakes have the potential to cause significant harm by spreading misinformation, manipulating public opinion, and eroding trust.
One recent example is Senator Blumenthal’s opening remarks at the Senate hearing on AI, in which he began by speaking in his own voice and eventually switched to a deepfake impersonation of his voice. The demonstration highlighted how deepfakes are becoming more sophisticated and harder to detect. Deepfakes can also upend remote identity verification, both when someone opens an account and when someone accesses one. In both cases, biometrics have been used successfully to safeguard accounts from fraudulent financial transactions, fraudulent access to healthcare records and identity theft. Biometrics answer, conveniently and accurately, the question of whether the right human is opening or accessing the account. With deepfakes, biometrics need to be augmented to answer a precursor question: is it a human or a machine that is opening or accessing the account?
One of the most promising technologies for augmenting biometrics and combating deepfakes is liveness detection, a technique leveraging attributes that come naturally to humans but are hard for machines to replicate at scale over sustained periods.
Why is liveness detection so effective against deepfakes?
To understand liveness detection, we must first understand the spectrum of attacks that try to impersonate a human. The simplest is a replay attack, in which an attacker takes someone’s existing image, voice or video, makes alterations and additions to it, and replays it as a new image, voice or video. The video of Nancy Pelosi that was slowed down to make her speech sound slurred is an example of this and is often called a “cheapfake”. Next is a synthetic identity attack, in which a machine creates a human-like image, voice or video for which no corresponding real human exists. These are now routinely used in both Frankenstein frauds and romance scams. Finally, we have true deepfakes, in which a particular targeted individual’s image, voice or video is completely machine-generated. In some cases, the words they are saying are also machine-generated using a Large Language Model (LLM) like ChatGPT. In the most advanced cases, all of this happens in real time. “Heart on My Sleeve”, the AI-generated song that imitated the voices of Drake and The Weeknd, is an example of a true deepfake, and so is Joanna Stern getting into her bank account using a voice clone.
Liveness detection is not new. Technology that determines whether an interaction is being performed by a human or a bot is already used in a variety of applications, such as ATMs, mobile banking, and online voting. Liveness detection to prevent deepfakes, however, is a new and rapidly advancing area. It works on the basic premise that any deepfake generator creates artifacts and patterns that are distinctly different from natural human interactions. These patterns may not be detectable by humans but can be identified when analyzed by specially designed artificial intelligence tools. To give an example from audio, when a human says “Hello Paul”, their mouth is wide open at the end of the ‘o’ and closes when they say the ‘P’. The speed with which they can do this is bounded by human limitations. A machine-generated deepfake does not care about these limitations. As an example of this in action, one of the fraudsters we have identified is called Giraffe Man, because a vocal tract analysis suggests that the person producing this speech could only do so with a 7-foot-long neck. Even the lowest fidelity audio channel has 8,000 samples of human speech every second. This allows you to look for a multitude of anomalies every single second, and the more seconds you have, the larger the treasure trove of anomalies. Liveness detection can handle the full range of deepfake attacks because it focuses on the uniqueness of speech produced by the perfection and imperfection of human vocal anatomy, while looking for anomalies added by machines that are either replaying your voice or generating your voice without the backing of 10,000 years of human evolution.
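To make the “8,000 samples every second” point concrete, here is a minimal sketch of how many overlapping analysis windows a detector could examine in a clip. The 25 ms window and 10 ms hop are common speech-processing defaults and are assumptions made for illustration; the function name is hypothetical and this is not a description of any particular vendor’s system.

```python
def frames_per_clip(duration_s, sample_rate=8000, window_ms=25, hop_ms=10):
    """Count the full analysis windows that fit in a clip.

    Assumptions (not from the article): 25 ms windows taken every 10 ms,
    a common framing choice in speech processing.
    """
    total_samples = round(duration_s * sample_rate)
    window = sample_rate * window_ms // 1000  # samples per window
    hop = sample_rate * hop_ms // 1000        # samples between window starts
    if total_samples < window:
        return 0  # clip is shorter than a single window
    return 1 + (total_samples - window) // hop


for seconds in (1, 10, 60):
    print(seconds, "s ->", frames_per_clip(seconds), "windows to inspect")
```

Under these assumptions, each additional second of audio adds about a hundred more windows in which an artifact could surface, which is why longer interactions give a detector more to work with.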
Won’t deepfake engines get better and invalidate liveness detection?
In this AI security arms race, how does liveness detection catch new deepfake engines that it has never seen before? For a real case study, see the ability of liveness detection to detect Meta’s brand new Voicebox engine with a 90% detection rate. This requires a particular brand of liveness architecture that takes advantage of the fact that deepfake engines are not one monolithic system but are built from a large set of components. For a new deepfake engine, typically one or two of these components change, which means you can still identify the acoustic artifacts left behind by the other components. Another concern is deepfake engines that actively create audio perturbations to evade detection. The University of Waterloo paper is great work that highlighted the ability of deepfake engines to do this. While the popular press has inferred a 99% success rate for all deepfakes against all voice authentication systems, the paper in fact shows a wide range of success rates for different systems. The lowest success rate for an attack with six consecutive attempts was 9.55%. In fact, a sophisticated liveness detection system will be able to handle these perturbations, because such systems are typically trained on a diverse range of spoofed audio along with extensive data augmentation covering a wide range of spectral modifications.
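A multi-attempt success rate can look larger than the underlying per-attempt risk. The sketch below is a back-of-the-envelope conversion, not a result from the paper. It assumes the 9.55% figure means “at least one success within six attempts” and that attempts are independent with an equal chance of success; both are simplifying assumptions, and the function names are hypothetical.

```python
def per_attempt_rate(cumulative_rate, attempts):
    """Per-attempt success rate implied by a cumulative rate.

    Assumes independent, identically likely attempts (an assumption,
    not something the article or the paper states).
    """
    if not 0.0 <= cumulative_rate <= 1.0 or attempts < 1:
        raise ValueError("need a rate in [0, 1] and at least one attempt")
    return 1.0 - (1.0 - cumulative_rate) ** (1.0 / attempts)


def cumulative_rate(per_attempt, attempts):
    """Chance of at least one success across the given number of attempts."""
    return 1.0 - (1.0 - per_attempt) ** attempts


p = per_attempt_rate(0.0955, 6)
print(f"implied per-attempt success rate: {p:.2%}")
print(f"back to six attempts: {cumulative_rate(p, 6):.2%}")
```

Under those assumptions, a 9.55% success rate over six attempts corresponds to roughly 1.7% per attempt, which is one reason limiting the number of retries matters alongside detection itself.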
What are the applications of liveness detection?
There are many potential applications of liveness detection. Media organizations and fact-checking agencies can leverage this technology to enhance verification processes. This can have a profound impact on the trustworthiness of news and media, bolstering the integrity of information dissemination in an era where trust is easily eroded. By integrating liveness detection into popular social media platforms, users can be given the tools to verify the authenticity of content before sharing it with their networks. This not only protects individuals from inadvertently spreading misinformation but also promotes a culture of digital responsibility and accountability. Organizations should also take advantage of liveness detection within their account opening and transactional workflows. This would ensure malicious actors won’t be able to use a deepfake to gain access to a user’s account.
All in all, liveness detection represents a powerful tool in the ongoing battle against deepfakes. Its ability to detect fake content in real time and its potential for widespread integration make it a game-changer in the fight against misinformation and manipulation. By empowering individuals, media, and organizations with the ability to verify the authenticity of audio and video, liveness detection can help restore trust in the digital landscape.
About the author
Vijay Balasubramaniyan is the co-founder & CEO of Pindrop.
DISCLAIMER: Biometric Update’s Industry Insights are submitted content. The views expressed in this post are those of the author, and don’t necessarily reflect the views of Biometric Update.