Microsoft teases easy video deepfake tool, declines to release it

Generative AI video engine brings potential applications and risks
Microsoft is the latest tech giant to tease an AI product so good at producing deepfake humans that it poses a threat to real ones. In a striking demonstration of how quickly generative AI is advancing, VASA-1 can generate “hyper-realistic talking face video” from nothing but a single static image, an audio clip and a text script. A research paper from Microsoft says VASA-1 produces “lip movements that are exquisitely synchronized with the audio,” plus “a large spectrum of facial nuances and natural head motions that contribute to the perception of authenticity and liveliness.”

Dozens of accompanying video samples illustrate this capability, applied to both real humans and artificial faces (in one particularly jarring instance, Da Vinci’s Mona Lisa convincingly raps a verse by Anne Hathaway). Other demos showcase the AI’s ability to make faces sing, speak in different languages, and otherwise handle photo and audio inputs from outside the training set. Many of the videos are so realistic that most casual viewers would never think to question their authenticity.

If released to the public, VASA-1 would give just about anyone the ability to create deepfake videos with a single photo and a minimal amount of audio input. Microsoft acknowledges as much. Its release says its research “focuses on generating visual affective skills for virtual AI avatars, aiming for positive applications. It is not intended to create content that is used to mislead or deceive. However,” Microsoft concedes, “like other related content generation techniques, it could still potentially be misused for impersonating humans.”

“Given such context, we have no plans to release an online demo, API, product, additional implementation details, or any related offerings until we are certain that the technology will be used responsibly and in accordance with proper regulations.”

Microsoft’s caution belies enthusiasm for generative AI’s potential

As some observers have pointed out, building a powerful video cloning tool and then warning against deepfakes is a bit like inventing dynamite and cautioning that it could be misused to blow things up. But Microsoft is not announcing VASA-1 apologetically. Its own language makes clear how the company weighs AI’s risks against its benefits: “while acknowledging the possibility of misuse, it’s imperative to recognize the substantial positive potential of our technique,” it says. “The benefits – such as enhancing educational equity, improving accessibility for individuals with communication challenges, offering companionship or therapeutic support to those in need, among many others – underscore the importance of our research and other related explorations.”

Kevin Surace, chair of biometric authentication firm Token, agrees – to a point. “The implications for personalizing emails and other business mass communication is fabulous,” he says in an article in The Register. “Even animating older pictures as well. To some extent this is just fun and to another it has solid business applications we will all use in the coming months and years.”

Yet for the biometrics industry and its associated regulatory circles, the technology and the speed at which it is evolving also pose serious questions about the reliability of existing systems. Deepfakes generated using VASA-1 and other AI spoofing tools could be used to trick facial recognition systems.

One of VASA-1’s major advances is its ability to create faces with “appealing visual affective skills.” Visual affective skills (VAS) are what let us perceive and interpret emotions through visual stimuli, such as facial expressions and body language. For VASA-1, those skills are inverted: they describe a synthetic avatar’s ability to evoke emotion in a viewer. Per Microsoft, “the core innovations include a diffusion-based holistic facial dynamics and head movement generation model that works in a face latent space, and the development of such an expressive and disentangled face latent space using videos.”

In other words, the model generates motion by starting from noise and iteratively refining it, and it treats facial dynamics and head movement as a single whole rather than as separate elements. The “disentangled” latent space means that attributes such as identity, expression and head pose are encoded separately enough that each can be controlled independently.
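VASA-1’s model and code are not public, so the mechanics cannot be shown directly. The following toy sketch only illustrates the general idea of diffusion-style generation in a latent space: begin with pure noise and step-by-step denoise it toward a latent code representing one frame’s facial dynamics. Every name here (`denoise_step`, `target_latent`, the dimensions) is hypothetical; a real model would predict the noise with a neural network conditioned on the audio and image inputs.

```python
import numpy as np

rng = np.random.default_rng(0)

LATENT_DIM = 8   # toy size; real face latents are far larger
STEPS = 50       # number of denoising iterations

# Hypothetical latent code encoding one frame's whole-face dynamics
# (expression and head pose together, per the "holistic" design).
target_latent = rng.normal(size=LATENT_DIM)

def denoise_step(x, target, t, total):
    """One toy denoising step: nudge the noisy latent toward the target.

    A real diffusion model would use a learned network to estimate and
    subtract noise; here we simply interpolate, for illustration only.
    """
    alpha = (t + 1) / total
    return (1 - alpha) * x + alpha * target

x = rng.normal(size=LATENT_DIM)  # start from pure noise
for t in range(STEPS):
    x = denoise_step(x, target_latent, t, STEPS)

# After the final step the latent has converged to the target code,
# which a decoder would then render as a video frame.
print(np.allclose(x, target_latent))
```

The point of the sketch is the structure, not the math: generation happens in a compact latent space, and because that space is disentangled, the same loop could in principle be steered toward a different expression or head pose without changing the subject’s identity.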

Regulating generative AI models could be very difficult

Writing for The Register, Thomas Claburn says VASA-1 is the kind of threat that has governments scrambling to enact regulations. “These AI-generated videos, in which people can be convincingly animated to speak scripted words in a cloned voice, are just the sort of thing the U.S. Federal Trade Commission warned about last month, after previously proposing a rule to prevent AI technology from being used for impersonation fraud,” writes Claburn.

For his part, Surace believes that, despite the wave of AI-focused laws popping up around the globe, regulatory measures may end up being merely decorative.

“Microsoft and others have held back for now until they work out the privacy and usage issues,” he says. “How will anyone regulate who uses this for the right reasons? Because of the open source nature of the space, regulating it will be impossible in any case.”
