Microsoft teases easy video deepfake tool, declines to release it

Generative AI video engine brings potential applications and risks

Microsoft is the latest tech giant to tease an AI product so good at producing deepfake humans that it poses a threat to real ones. In a striking demonstration of how quickly generative AI is advancing, VASA-1 can generate “hyper-realistic talking face video” from nothing but a single static image, an audio clip and a text script. A research paper from Microsoft says VASA-1 produces “lip movements that are exquisitely synchronized with the audio,” plus “a large spectrum of facial nuances and natural head motions that contribute to the perception of authenticity and liveliness.”

Dozens of accompanying video samples illustrate this capability, applied to both real humans and artificial faces (in one particularly jarring instance, Da Vinci’s Mona Lisa convincingly raps a verse by Anne Hathaway). Other demos showcase the AI’s ability to make faces sing, speak in different languages, and otherwise handle photo and audio inputs from outside the training set. Many of the videos are so realistic that most casual viewers would never think to question their authenticity.

If released to the public, VASA-1 would give just about anyone the ability to create deepfake videos from a single photo and a minimal amount of audio. Microsoft acknowledges as much. Its release says its research “focuses on generating visual affective skills for virtual AI avatars, aiming for positive applications. It is not intended to create content that is used to mislead or deceive. However,” Microsoft concedes, “like other related content generation techniques, it could still potentially be misused for impersonating humans.”

“Given such context, we have no plans to release an online demo, API, product, additional implementation details, or any related offerings until we are certain that the technology will be used responsibly and in accordance with proper regulations.”

Microsoft’s caution belies enthusiasm for generative AI’s potential

As pointed out by some observers, creating a powerful video cloning tool and saying it should not be used to create deepfakes is a bit like inventing dynamite and saying it could be misused to blow things up. But Microsoft is surely not announcing VASA-1 and outlining its capabilities in order to apologize for it. Its own language makes clear how the company weighs AI’s risks against its benefits: “while acknowledging the possibility of misuse, it’s imperative to recognize the substantial positive potential of our technique,” it says. “The benefits – such as enhancing educational equity, improving accessibility for individuals with communication challenges, offering companionship or therapeutic support to those in need, among many others – underscore the importance of our research and other related explorations.”

Kevin Surace, chair of biometric authentication firm Token, agrees – to a point. “The implications for personalizing emails and other business mass communication is fabulous,” he says in an article in The Register. “Even animating older pictures as well. To some extent this is just fun and to another it has solid business applications we will all use in the coming months and years.”

Yet for the biometrics industry and its associated regulatory circles, the technology and the speed at which it is evolving also pose serious questions about the reliability of existing systems. Deepfakes generated using VASA-1 and other AI spoofing tools could be used to trick facial recognition systems.
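
To illustrate the concern, a remote verification pipeline that wants to resist this kind of spoof typically pairs face matching with a separate presentation attack detection (PAD, or liveness) check and rejects attempts that fail either test. The sketch below is a minimal, hypothetical decision rule, not any particular vendor’s system; all scores, thresholds and names are illustrative.

```python
# Hedged sketch of a decision rule combining face matching with presentation
# attack detection (PAD). Scores, thresholds and helper names are illustrative.
from dataclasses import dataclass


@dataclass
class VerificationResult:
    match_score: float   # similarity between probe face and enrolled face, 0..1
    pad_score: float     # likelihood the probe is a live, genuine capture, 0..1
    accepted: bool
    reason: str


def verify(match_score: float, pad_score: float,
           match_threshold: float = 0.80,
           pad_threshold: float = 0.90) -> VerificationResult:
    # Evaluate PAD independently of the match score: a convincing deepfake
    # video can match the enrolled face well while failing liveness checks.
    if pad_score < pad_threshold:
        return VerificationResult(match_score, pad_score, False,
                                  "possible presentation attack")
    if match_score < match_threshold:
        return VerificationResult(match_score, pad_score, False,
                                  "face does not match enrollment")
    return VerificationResult(match_score, pad_score, True, "accepted")


# Example: a deepfake might match the enrolled face well (0.93) but be
# flagged by PAD (0.40), so the attempt is rejected.
print(verify(match_score=0.93, pad_score=0.40))
```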

One of VASA-1’s major advances is its ability to create faces with “appealing visual affective skills”. Visual affective skills (VAS) are what let us perceive and interpret emotions through visual stimuli, such as facial expressions and body language. For VASA-1, those skills are reversed to describe a fake video avatar’s ability to evoke emotion in a viewer. Per Microsoft, “the core innovations include a diffusion-based holistic facial dynamics and head movement generation model that works in a face latent space, and the development of such an expressive and disentangled face latent space using videos.”

In other words, a diffusion model (one that iteratively refines random noise into structured output) generates facial expressions and head motion together as a single, holistic signal rather than animating individual features separately, and it does so in a learned “latent space,” a compact numerical representation in which the face’s identity and appearance are kept apart, or disentangled, from its movement.
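
As a rough sketch of how the pieces described in the paper fit together (this is not Microsoft’s implementation; every class, dimension and function name here is a hypothetical placeholder), a diffusion-based talking-face pipeline could be organized along these lines:

```python
# Conceptual sketch only: a single photo is encoded into appearance/identity
# codes, a diffusion model samples a joint face-and-head motion sequence
# conditioned on audio, and a decoder renders the video frames.
# All names, dimensions and steps are hypothetical placeholders.
import numpy as np


class FaceEncoder:
    """Maps one static portrait to separate appearance and identity codes."""
    def encode(self, image: np.ndarray) -> dict:
        # A real system would use a trained network; zeros stand in here.
        return {"appearance": np.zeros(512), "identity": np.zeros(128)}


class MotionDiffusionModel:
    """Generates holistic facial dynamics plus head motion, conditioned on audio."""
    def sample(self, audio_features: np.ndarray, steps: int = 50) -> np.ndarray:
        # Start from pure noise (one latent motion vector per frame) and
        # iteratively denoise it, guided by the audio features.
        motion = np.random.randn(len(audio_features), 256)
        for _ in range(steps):
            motion = self._denoise_step(motion, audio_features)
        return motion

    def _denoise_step(self, motion: np.ndarray,
                      audio_features: np.ndarray) -> np.ndarray:
        # Placeholder for one reverse-diffusion step; a real model would
        # predict and remove noise with a learned network.
        return motion * 0.98


class FaceDecoder:
    """Renders video frames from the static codes plus the motion latents."""
    def render(self, codes: dict, motion: np.ndarray) -> list:
        return [f"frame_{i}" for i in range(len(motion))]


def generate_talking_face(image: np.ndarray, audio_features: np.ndarray) -> list:
    codes = FaceEncoder().encode(image)                     # one photo in
    motion = MotionDiffusionModel().sample(audio_features)  # audio-driven motion
    return FaceDecoder().render(codes, motion)              # video frames out
```

The design choice reflected here, and in Microsoft’s description, is that motion is sampled as one sequence for the whole face and head, which is what keeps lip sync, expression and head movement consistent with one another.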

Regulating generative AI models could be very difficult

Writing for The Register, Thomas Claburn says VASA-1 is the kind of threat that has governments scrambling to enact regulations. “These AI-generated videos, in which people can be convincingly animated to speak scripted words in a cloned voice, are just the sort of thing the U.S. Federal Trade Commission warned about last month, after previously proposing a rule to prevent AI technology from being used for impersonation fraud,” writes Claburn.

For his part, Surace believes that, despite the wave of AI-focused laws popping up around the globe, regulatory measures may end up being merely decorative.

“Microsoft and others have held back for now until they work out the privacy and usage issues,” he says. “How will anyone regulate who uses this for the right reasons? Because of the open source nature of the space, regulating it will be impossible in any case.”
