Microsoft teases easy video deepfake tool, declines to release it

Generative AI video engine brings potential applications and risks
Microsoft is the latest tech giant to tease an AI product so good at producing deepfake humans that it poses a threat to real ones. In a striking demonstration of how quickly generative AI is advancing, VASA-1 can generate “hyper-realistic talking face video” from nothing but a single static image, an audio clip and a text script. A research paper from Microsoft says VASA-1 produces “lip movements that are exquisitely synchronized with the audio,” plus “a large spectrum of facial nuances and natural head motions that contribute to the perception of authenticity and liveliness.”

Dozens of accompanying video samples illustrate this capability, applied to both real humans and artificial faces (in one particularly jarring instance, Da Vinci’s Mona Lisa convincingly raps a verse by Anne Hathaway). Other demos showcase the AI’s ability to make faces sing, speak in different languages, and otherwise handle photo and audio inputs from outside the training set. Many of the videos are so realistic that most casual viewers would never think to question their authenticity.

If released to the public, VASA-1 would give just about anyone the ability to create deepfake videos from a single photo and a minimal amount of audio input. Microsoft appears to be well aware of this. Its release says its research “focuses on generating visual affective skills for virtual AI avatars, aiming for positive applications. It is not intended to create content that is used to mislead or deceive. However,” Microsoft concedes, “like other related content generation techniques, it could still potentially be misused for impersonating humans.”

“Given such context, we have no plans to release an online demo, API, product, additional implementation details, or any related offerings until we are certain that the technology will be used responsibly and in accordance with proper regulations.”

Microsoft’s caution belies enthusiasm for generative AI’s potential

As some observers have pointed out, creating a powerful video cloning tool and saying it should not be used to create deepfakes is a bit like inventing dynamite and saying it could be misused to blow things up. But Microsoft's announcement of VASA-1 and its capabilities is hardly apologetic in tone. Its own language makes clear how the company weighs AI’s risks against its benefits: “while acknowledging the possibility of misuse, it’s imperative to recognize the substantial positive potential of our technique,” it says. “The benefits – such as enhancing educational equity, improving accessibility for individuals with communication challenges, offering companionship or therapeutic support to those in need, among many others – underscore the importance of our research and other related explorations.”

Kevin Surace, chair of biometric authentication firm Token, agrees – to a point. “The implications for personalizing emails and other business mass communication is fabulous,” he says in an article in The Register. “Even animating older pictures as well. To some extent this is just fun and to another it has solid business applications we will all use in the coming months and years.”

Yet for the biometrics industry and its associated regulatory circles, the technology and the speed at which it is evolving also pose serious questions about the reliability of existing systems. Deepfakes generated using VASA-1 and other AI spoofing tools could be used to trick facial recognition systems.

One of VASA-1’s major advances is its ability to create faces with “appealing visual affective skills.” Visual affective skills (VAS) are what let us perceive and interpret emotions through visual stimuli, such as facial expressions and body language. For VASA-1, those skills are reversed to describe a fake video avatar’s ability to evoke emotion in a viewer. Per Microsoft, “the core innovations include a diffusion-based holistic facial dynamics and head movement generation model that works in a face latent space, and the development of such an expressive and disentangled face latent space using videos.”

In other words, the model starts from noise and iteratively refines it into detail, treating the motion of the whole face and head as a single unit rather than as disparate elements. It does this in a learned latent space whose expressive attributes are disentangled, meaning individual qualities of the motion can be represented and controlled independently.
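The diffusion idea at the heart of that description can be illustrated with a toy sketch: start from random noise in a small latent vector and repeatedly nudge it toward a conditioned prediction. Everything below (the latent size, the step count, the stand-in “denoiser” and audio feature) is an illustrative assumption, not Microsoft’s actual VASA-1 model.

```python
import math
import random

# Toy sketch of diffusion-style denoising in a latent space.
# All names and shapes here are hypothetical stand-ins.

random.seed(0)
LATENT_DIM = 8   # hypothetical size of the face-and-head motion latent
STEPS = 50       # number of denoising steps

def denoise_step(z, target):
    # One step: move the noisy latent a small way toward the model's
    # prediction (here a fixed stand-in vector derived from "audio").
    return [zi + (ti - zi) / STEPS for zi, ti in zip(z, target)]

# Stand-ins for an audio-conditioned prediction and an initial noise latent.
target = [math.tanh(random.gauss(0, 1)) for _ in range(LATENT_DIM)]
z = [random.gauss(0, 1) for _ in range(LATENT_DIM)]

dist0 = math.dist(z, target)
for _ in range(STEPS):
    z = denoise_step(z, target)

# After denoising, z encodes one frame of holistic head-and-face motion;
# a separate decoder (not shown) would render it to pixels.
print(math.dist(z, target) < dist0)  # the latent has moved toward the prediction
```

In a real system the fixed `target` would be replaced by a learned network's per-step prediction, conditioned on the input photo and audio clip, and the loop would run once per video frame.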

Regulating generative AI models could be very difficult

Writing for The Register, Thomas Claburn says VASA-1 is the kind of threat that has governments scrambling to enact regulations. “These AI-generated videos, in which people can be convincingly animated to speak scripted words in a cloned voice, are just the sort of thing the U.S. Federal Trade Commission warned about last month, after previously proposing a rule to prevent AI technology from being used for impersonation fraud,” writes Claburn.

For his part, Surace believes that, despite the wave of AI-focused laws popping up around the globe, regulatory measures may end up being merely decorative.

“Microsoft and others have held back for now until they work out the privacy and usage issues,” he says. “How will anyone regulate who uses this for the right reasons? Because of the open source nature of the space, regulating it will be impossible in any case.”
