‘Voice clones can sound as real as human voices,’ says new research

The American science fiction writer Harlan Ellison’s award-winning story “I Have No Mouth, and I Must Scream” is about a malevolent supercomputer that wipes out humankind. The voice deepfake crisis isn’t quite at that level yet, but the title evokes the deluge of synthetic voices that has engulfed the world: audio streaming platform Spotify alone has removed 75 million spam tracks from its library over the past year.
As AI tools for generating music and voice become ever easier to use, deepfakes and synthetic voices are also becoming harder to detect. In fact, according to new research from Queen Mary University of London, the average listener can no longer reliably distinguish between deepfake voices and those of real human beings.
“Recently, an intriguing effect was reported in AI-generated faces, where such face images were perceived as more human than images of real humans,” says the abstract. The research team decided to test whether a similar “hyperrealism effect” also exists for AI-generated voices.
It did so by comparing real human voices with two different types of synthetic voices generated with AI: cloned voices based on real recordings, and voices synthesized from ElevenLabs’ “Voice Design” large voice model, with no specific human counterpart. (Of the latter, the researchers note, “generic AI-generated voices of this particular style are currently used to generate novel vocal identities, and used for example as voice-overs for advertisements and online content videos, or to narrate audiobooks or podcasts.”)
Subjects were asked which voices sounded most realistic, dominant and trustworthy.
The study’s conclusion is that “voice clones can sound as real as human voices, making it difficult for listeners to distinguish between them.”
“Both types of AI-generated voices were evaluated as more dominant than human voices, with some AI-generated voices also being perceived as more trustworthy,” the paper adds. That said, the results did not support the hyperrealism hypothesis, suggesting a possible difference between how fake faces and fake voices are perceived.
Voice AI ubiquitous in deepfake Tower of Babel
A news release quotes Dr. Nadine Lavan, a senior lecturer in psychology at Queen Mary University of London, who co-led the study. “AI-generated voices are all around us now,” she says. “We’ve all spoken to Alexa or Siri, or had our calls taken by automated customer service systems. Those things don’t quite sound like real human voices, but it was only a matter of time until AI technology began to produce naturalistic, human-sounding speech. Our study shows that this time has come, and we urgently need to understand how people perceive these realistic voices.”
Part of the urgency is the explosion of deepfake voice fraud, a form of voice phishing also called “vishing.” New research from Group-IB looks at “the anatomy of a deepfake voice phishing attack.”
“Drawing on Group-IB’s experience in real-world incidents and threat-intelligence telemetry, this research highlights the sectors most at risk: finance, executive services, and remote-work help desks,” says the report, noting that researchers used detection techniques such as acoustic fingerprinting and multimodal authentication to provide cybersecurity professionals with “a layered defense strategy that blends AI-powered anomaly analysis with robust employee awareness training.”
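Group-IB does not publish its detection code, but the basic idea behind acoustic fingerprinting can be conveyed with a minimal sketch: summarize a recording as a vector of spectral statistics and compare it against an enrolled reference. The sketch below is purely illustrative, not Group-IB’s method; it assumes the open-source librosa library, MFCC features as the fingerprint, and an arbitrary similarity threshold, and the file names are hypothetical.

```python
# Illustrative sketch of a toy acoustic "fingerprint" built from MFCC
# statistics. NOT Group-IB's pipeline; thresholds and files are hypothetical.
import numpy as np
import librosa

def voice_fingerprint(path: str, sr: int = 16000) -> np.ndarray:
    """Summarize a recording as the mean and std of its MFCCs over time."""
    y, _ = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)  # shape: (20, n_frames)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical usage: compare an incoming caller against an enrolled profile.
enrolled = voice_fingerprint("ceo_enrolled.wav")
incoming = voice_fingerprint("incoming_call.wav")
score = cosine_similarity(enrolled, incoming)

# The 0.9 threshold is arbitrary; a real system would calibrate it on
# labeled genuine and cloned samples.
if score < 0.9:
    print(f"Voice mismatch (similarity={score:.3f}); escalate to manual review.")
else:
    print(f"Voice matches enrolled profile (similarity={score:.3f}).")
```

A production system would combine a score like this with liveness checks, behavioral signals, and human verification steps, which is the spirit of the report’s “layered defense strategy.”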
The research forecasts that deepfake-enabled fraud losses will reach $40 billion by 2027. Asia-Pacific is the current hotspot, with regional deepfake-related fraud attempts surging 194 percent in 2024. More than 10 percent of surveyed financial institutions have suffered deepfake vishing attacks exceeding $1 million, and the average loss per case is approximately $600,000. And because stolen funds are rapidly laundered, they are almost never recovered.
These attacks, says the report, “are not only financially damaging but emotionally manipulative – exploiting trust, authority, and familiarity to bypass human defenses.”
Humans, sadly, are easily manipulated, and have already shown a willingness to embrace vocal fakery on a large scale. While Spotify has removed millions of spam tracks from its platform, it has not removed the catalog of The Velvet Sundown, an AI-generated, country-fried 1970s rock band created as a hoax, which brands itself as a “synthetic music project guided by human creative direction.” Its track “Dust on the Wind” has amassed over 3 million plays to date.
Article Topics
deepfake detection | deepfakes | ElevenLabs | generative AI | synthetic voice | voice biometrics