Voice deepfakes from single facial image reveal fine-tuning detection trade-off

Researchers find deepfake detection blind spot

Nov 7, 2025, 10:33 am EST | Chris Burt

A technique for generating a spoof of a person’s voice from only a single facial image, demonstrated at the USENIX Security 2024 conference, is among the more alarming deepfake creation methods uncovered so far. Worse, voice deepfake detection tools on the market tend to struggle with these audio deepfakes, according to a team of Australian researchers.

Fortunately, as the team from Australian digital research network Data61 at CSIRO shows in a recently published paper, it is possible to tune those tools to more accurately detect deepfakes created with face-to-voice synthesis, also known as “FOICE.”

In the paper “Can Current Detectors Catch Face-to-Voice Deepfakes?”, the researchers tested FOICE outputs against biometric voice authentication software including WeChat Voiceprint and Microsoft Azure. The spoof attempts were frequently successful, and the success rate approached 100 percent when multiple attempts were made.
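The effect of repeated attempts can be illustrated with a short sketch. It assumes each attempt succeeds independently with the same probability, which is a simplification and not necessarily how the researchers measured it; the function name and the per-attempt rate in the usage note are hypothetical and are not figures from the paper.

```python
def cumulative_success(per_attempt_rate: float, attempts: int) -> float:
    """Chance that at least one of `attempts` spoof attempts succeeds.

    Assumption: attempts are independent and share the same success rate.
    """
    if not 0.0 <= per_attempt_rate <= 1.0:
        raise ValueError("per_attempt_rate must be between 0 and 1")
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    # Probability that every attempt fails, subtracted from 1.
    return 1.0 - (1.0 - per_attempt_rate) ** attempts
```

Under these assumptions, even a modest hypothetical per-attempt rate compounds quickly: calling `cumulative_success(0.3, 10)` gives a value above 0.97.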

The researchers point out that this is troubling because facial images are far more widely available than voice samples.

Four deepfake detectors the researchers characterize as state-of-the-art models “that span distinct architectural families and design goals” performed poorly when tested with deepfakes produced from four datasets. The best-performing, AASIST, had an equal error rate (EER) of 0.163. All models improved when fine-tuned, with AASIST’s EER dropping to 0.003.
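For readers unfamiliar with the metric, the equal error rate is the error level at the decision threshold where the false acceptance rate and the false rejection rate are equal, so lower is better. The following is a minimal sketch of how it can be approximated from detector scores. The function name, the score convention (a higher score means more likely genuine) and the simple threshold sweep are assumptions for illustration, not the evaluation code used in the paper.

```python
def equal_error_rate(genuine_scores, spoof_scores):
    """Approximate the EER by sweeping every observed score as a threshold.

    Assumption: a higher score means the detector believes the sample is genuine.
    """
    thresholds = sorted(set(genuine_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False rejection: a genuine sample scores below the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        # False acceptance: a spoof scores at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the point where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer
```

With well-separated scores the function returns 0.0; an EER of 0.163 means roughly one in six samples is misclassified at the balanced threshold, while 0.003 corresponds to about three in a thousand.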

Three of the four fine-tuned voice deepfake detectors became less accurate at identifying other kinds of spoofs, however. The drop in AASIST’s accuracy was modest, and the accuracy of the Ren et al. model actually improved, but TCM’s fell by 10 percent and the Sun et al. model was rendered almost completely ineffective.

“Only domain-invariant approaches maintained relatively stable cross-vocoder behavior; noise robustness varied widely, and denoising can unintentionally remove forensic cues,” the researchers conclude. “Lasting defenses therefore require (i) larger, more diverse corpora (including FOICE variants and modern vocoders) and (ii) architectures and training regimes that target vocoder-independent, cross-modal representations.”

Voice deepfake checks are forecast to surpass 4.8 billion and generate over $2.4 billion in revenue by 2027, according to the 2025 Deepfake Detection Market Report and Buyers Guide from Biometric Update and Goode Intelligence.
