Business needs biometric protection from voice deepfakes, lawmakers want watermarks
Policy-makers may still be catching up to deepfakes, but biometric liveness detection developers are much further ahead, according to a recent online presentation from Pindrop.
VP of Research Elie Khoury explains that the company looks for both perceivable and imperceivable artifacts. An example of the former is the struggle some text-to-speech AI models have with fricatives. The text analysis, acoustic model and vocoder components that make up text-to-speech systems also tend to leave telltale signs behind which cannot be heard by the human ear, but give away their products' source as generative AI. Khoury and Pindrop VP of Product, Research and Engineering Amit Gupta discussed the advantages of this kind of algorithmic analysis over relying on watermarks.
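Pindrop has not published the internals of its detector, but the idea of machine-visible, inaudible artifacts is easy to illustrate. The Python sketch below measures how much of a recording's spectral energy sits in a high-frequency band where some vocoders leave smoothing or aliasing traces; the band edges and file name are illustrative assumptions, not Pindrop's method.

```python
# Illustrative sketch only: a production detector would learn its
# features from data rather than hard-code a band of interest.
import numpy as np
from scipy.io import wavfile
from scipy.signal import stft

def high_band_energy_ratio(path, band_hz=(7000, 8000)):
    """Fraction of total spectral energy in a high band where some
    TTS vocoders leave artifacts the human ear does not register."""
    rate, audio = wavfile.read(path)
    if audio.ndim > 1:                      # mix stereo down to mono
        audio = audio.mean(axis=1)
    audio = audio.astype(np.float64)
    freqs, _, Z = stft(audio, fs=rate, nperseg=1024)
    power = np.abs(Z) ** 2
    band = (freqs >= band_hz[0]) & (freqs < band_hz[1])
    return power[band].sum() / power.sum()

# "caller_audio.wav" is a hypothetical input; comparing this ratio
# against known-human recordings is one crude cue among many.
print(high_band_energy_ratio("caller_audio.wav"))
```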
This is particularly important as the EU’s AI Act, the world’s most comprehensive regulation of algorithms and their products, does not ban deepfakes. Instead, it simply requires that they include a watermark, as explained by Sumsub AI Policy and Compliance Specialist Natalia Fritzen in a company blog post.
Fritzen notes that observers have questioned the effectiveness of the transparency methods stipulated in the AI Act, and that fraudsters are, in any case, unlikely to comply voluntarily.
The particular artifacts may change over time, but because Pindrop's model is engineered specifically to look for signs that the voice presented has been spoofed, while attack models are engineered to fool humans, evidence of algorithmic involvement should remain detectable. As Director of Product and Engineering Sarosh Shahbuddin puts it, the models used to create deepfakes are optimized for how humans hear speech, not “how humans view speech in a spectrogram.”
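Shahbuddin's point can be made concrete with a toy example: a detector operates on spectrogram-domain features rather than perceptual quality, so it can separate clips that sound identical to a listener. The sketch below trains a simple linear classifier on average log-power spectra using scikit-learn; the file lists are hypothetical, and the feature and model choices are stand-ins for illustration, not Pindrop's architecture.

```python
# Toy illustration of detection in the spectrogram domain, not
# Pindrop's method: a linear classifier over log-spectrogram features.
import numpy as np
from scipy.io import wavfile
from scipy.signal import stft
from sklearn.linear_model import LogisticRegression

def log_spec_features(path, nperseg=512):
    """Average log-power spectrum: a crude fixed-length feature
    vector in the domain detectors 'see' but listeners do not."""
    rate, audio = wavfile.read(path)
    if audio.ndim > 1:
        audio = audio.mean(axis=1)
    _, _, Z = stft(audio.astype(np.float64), fs=rate, nperseg=nperseg)
    return np.log(np.abs(Z) ** 2 + 1e-10).mean(axis=1)

# Hypothetical file lists; a real system trains on large corpora.
real_files = ["human_01.wav", "human_02.wav"]
fake_files = ["tts_01.wav", "tts_02.wav"]

X = np.array([log_spec_features(f) for f in real_files + fake_files])
y = np.array([0] * len(real_files) + [1] * len(fake_files))

clf = LogisticRegression(max_iter=1000).fit(X, y)
# clf.predict_proba(log_spec_features("suspect.wav")[None, :]) then
# yields a spoof likelihood for an unseen clip.
```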
In the future, Khoury notes, attackers may shift to adversarial attack models that specifically target detection models like Pindrop's, but the company is already working on methods of countering adversarial attacks as well.
A poll on deepfake detection priorities shows that the overwhelming majority of webinar attendees are concerned primarily about voice deepfake attacks against call centers, rather than impersonation of executives, help desk employees or high net worth individuals.
Pindrop is also working with unnamed partners to thwart political disinformation, Shahbuddin says.
Some people use text-to-speech and similar tools to communicate with call centers despite disabilities. Senior Manager of Software Engineering Aarti Dongargaonkar explains Pindrop’s granular feature for excluding these callers from voice authentication and liveness detection to avoid falsely flagging them as fraud risks.
Khoury says Pindrop’s model has proved over 99 percent accurate at detecting deepfakes from known generative AI tools in internal benchmarking. For unseen systems, he says, accuracy is typically above 90 percent, citing a pair of prominent recent third-party challenges. As for false positives, he says customers have seen rates as low as 0.1 percent, and even those were triggered by phenomena like speech from a television playing at relatively high volume in the background.