Is deepfake detection software ready for voice-enabled AI agents?

Paper finds voice agents generated with GPT-4o able to perform most common scams
OpenAI’s release of its real-time voice API has raised questions about how AI biometric voice technology might be used to supercharge phone scams.

Writing on Medium, computer scientist Daniel Kang notes that while AI voice technology has useful applications such as voice-enabled autonomous customer service, “as with many AI capabilities, voice-enabled agents have the potential for dual-use.”

Anyone with a phone knows how common phone scams are these days. Kang notes that, every year, they target up to 17.6 million Americans and cause up to $40 billion in damage.

Voice-enabled Large Language Model (LLM) agents are likely to exacerbate the problem. A paper submitted to arXiv and credited to Kang, Dylan Bowman and Richard Fang says it shows how “voice-enabled AI agents can perform the actions necessary to perform common scams.”

The researchers chose common scams collected by the government and created voice-enabled agents with directions to perform these scams. The agents were built using GPT-4o, a set of browser access tools via Playwright, and scam-specific instructions. The resulting AI voice agents were able to do what was necessary to conduct every common scam they tested. The paper describes them as “highly capable,” with the ability to “react to changes in the environment, and retry based on faulty information from the victim.”
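In heavily simplified form, the scaffolding described above amounts to a tool-calling loop: the model repeatedly picks the next browser action, the agent executes it, and the result is fed back to the model. The sketch below is purely illustrative — the scripted model stands in for GPT-4o, and the `goto`/`fill` tools stand in for Playwright browser actions; none of these names come from the paper.

```python
from dataclasses import dataclass

@dataclass
class ToolCall:
    name: str   # which tool the model wants to invoke
    args: dict  # arguments for that tool

def run_agent(model_step, tools, max_steps=30):
    """Ask the model for the next action, execute it, and repeat until
    the model signals completion (returns None) or max_steps is hit."""
    transcript = []
    for _ in range(max_steps):
        call = model_step(transcript)           # model chooses next action
        if call is None:                        # model says the task is done
            break
        result = tools[call.name](**call.args)  # execute the browser action
        transcript.append((call, result))       # feed the result back
    return transcript

# Stand-in for the LLM: a fixed two-step script, then "done".
def scripted_model(transcript):
    script = [ToolCall("goto", {"url": "https://example.com/login"}),
              ToolCall("fill", {"selector": "#user", "text": "alice"})]
    return script[len(transcript)] if len(transcript) < len(script) else None

# Stand-ins for Playwright browser tools.
tools = {"goto": lambda url: f"opened {url}",
         "fill": lambda selector, text: f"filled {selector}"}

log = run_agent(scripted_model, tools)
print(len(log))  # 2 actions executed before the model signals completion
```

The loop structure is what makes the agents “able to react to changes in the environment”: because each model step sees the full transcript, a failed or unexpected result can prompt a retry on the next iteration.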

“To determine success, we manually confirmed if the end state was achieved on real applications/websites. For example, we used Bank of America for bank transfer scams and confirmed that money was actually transferred.”

The overall success rate across all scams was 36 percent. Rates for individual scams ranged from 20 to 60 percent. Scams required “a substantial number of actions, with the bank transfer scam taking 26 actions to complete.” Complex scams took “up to 3 minutes to execute.”

“Our results,” the researchers say, “raise questions around the widespread deployment of voice-enabled AI agents.”

The researchers believe that the capabilities demonstrated by their AI agents are “a lower bound for future voice-assisted AI agents,” which are likely to improve as, among other things, less granular and “more ergonomic methods of interacting with web browsers” develop. Put differently, “better models, agent scaffolding, and prompts are likely to lead to even more capable and convincing scam agents in the future.”

As such, “results highlight the urgent need for future research in protecting potential victims from AI-powered scams.”

There are, however, potential solutions to the problem to be found in the biometrics and digital identity sector. Real-time AI voice detection is a feature of Pindrop’s Pulse Inspect product, which it says “can detect AI-generated speech in any digital audio file with 99 percent accuracy.” Its audio deepfake detection systems have figured in high-profile cases of political deepfake content.

Critics say the current set of deepfake detection tools is not reliable enough. University of California, Berkeley computer science professor Hany Farid has said that with AI voice deepfakes, “the bar is always moving higher. I can count on one hand the number of labs in the world that can do this in a reliable way.” Of publicly available deepfake detection tools, Farid says “I wouldn’t use them. The stakes are too high not only for individual people’s livelihoods and reputations but also for the precedent that each case sets.”

Which is to say: most deepfake detection software is probably not ready for voice-enabled AI agents.

Yet research and development continues. Another recent paper on arXiv acknowledges that, “as the Deepfake Speech Detection task has emerged in recent years, there are not many survey papers proposed for this task. Additionally, existing surveys for the Deepfake Speech Detection task tend to summarize techniques used to construct a Deepfake Speech Detection system rather than providing a thorough analysis.”

The need prompted the researchers from Austria, Japan and Vietnam to conduct a survey and propose new solutions. The fight against audio deepfakes is not yet lost.
