
Security vendor says fast action is needed now on deepfake voice

A new research report from security analyst vendor Recorded Future says voice cloning is capable of defeating voice multifactor authentication in the wild. Authors of the report say a cross-industry approach is needed to keep deepfake voice in check.

The report, “I Have No Mouth, and I Must Do Crime,” is a nod to science-fiction author Harlan Ellison’s dark visions, and the findings it contains justify the grim allusion.

“Voice cloning technology is currently being abused by threat actors in the wild,” the report states. It is “enabling the spread of misinformation and disinformation and increasing the effectiveness of social engineering.” The barrier to entry continues to get lower, with platforms such as ElevenLabs’s popular Prime Voice AI offering low cost, browser-based options for text-to-speech (TTS) conversion.

“Voice cloning samples – such as those of celebrities, politicians, and internet personalities (‘influencers’) – are intended to create either comedic or malicious content, which is often racist, discriminatory, or violent in nature,” the report says. Threat actors are demonstrating effective voice-based fraud attacks, including voice phishing, or vishing.

Some platforms need very little source material: Microsoft’s TTS AI model, VALL-E, requires only three seconds of audio to generate a cloned voice – of, for example, a loved one asking for bail money.

For now, technical limitations mean that voice cloning is primarily used for small-scale fraud, leveraging one-time samples for extortion or disinformation. Still, the results can be disastrous for individuals. This month, Canadian broadcaster the CBC reported on voice cloning used to defraud eight seniors in Newfoundland out of CAD$200,000 (US$148,000). Victims received calls during which the cloned voices of their grandchildren asked them for money to cover emergency costs.

In other instances, cloned voices have been used in kidnapping and hostage scams.

The report surveyed dark web chatter, and found that some threat actors are not convinced current voice cloning tech is equipped to deal with certain security hurdles, particularly when it comes to cloning non-English speaking voices. But they are already finding ways to modify it. One such workaround is voice cloning as a service, or VCaaS. This is “a new form of commodified cybercrime in which voice cloning ‘specialists’ provide tailored voice cloning samples, often advertising their services via Telegram,” according to the report.

Furthermore, the general rise in public awareness of AI has led to a spike in the number of free, anonymous third-party voice cloning services. Open-source voice cloning software is appearing on social media and in code repositories. And cybercriminals are trying to find ways to circumvent content restrictions imposed by platforms like ElevenLabs, which drew the ire of posters on Reddit when it updated its community standards to deter voice cloning for nefarious purposes.

The report advises organizations to act early in addressing the risks associated with voice cloning, which are growing. “An industry-wide approach is required immediately in order to pre-empt further threats from future advances in voice cloning technology.”
