Microsoft’s Horvitz says it will take more than code to deal with the deepfake threat
In a development that should surprise few in the facial recognition community, there is more unsettling news about how deepfakes are evolving, and how fast.
Algorithms capable of interacting with people and creating entire narratives could soon be part of propagandists’ “persuasion toolkits,” according to Microsoft Chief Science Officer Eric Horvitz.
Horvitz has a list of steps humanity should take to meet the challenge it faces. For better or worse, several steps depend not on technology but on people becoming more analytical when it comes to the information they consume.
He sees two immediate threats — interactive deepfakes and compositional deepfakes. The first will interact with humans in ways that make it all but impossible for an ordinary person to know they are talking to code.
The second will be deepfakes with backstories: fabricated scenes in which the deepfake lives. It could be the scene of an accident, or a collection of social media clips appearing to show a person at birthday parties over the years.
Recent research is disquieting. Horvitz cites 2020 research into neural voice puppetry that creates a “compelling” spectacle of real-time face movements matching words being uttered behind the scenes by an actor.
Mapping audio to an expression and generating a realistic rendering, he writes, took 5 milliseconds on an Nvidia GTX 1080 Ti GPU. Two years before that, researchers demonstrated deep neural models that generate realistic-sounding speech from text.
And work on avatars designed to mimic a human's idiosyncratic communication style is leading to avatars that say "um" realistically, fidget facially and so on.
Hybrid avatars are also in the wings. These hand control of a conversation to a deepfake avatar but keep the real human it mimics standing by, ready to be slyly swapped in when needed.
Those developments and others are laying the foundation for realistically interactive deepfakes, writes Horvitz.
Compositional deepfakes are a “concerning, feasible direction” as well, he writes. They create narratives, not just an image or an avatar made to do a task.
This nightmarish scenario includes the possibility of completely fabricated incidents spliced, so to speak, between real events. The goal would be to make a fabricated event meld seamlessly with actual events, lending credibility to the fakery.
Part of the answer is creating an environment that promotes good local and international journalism. Identifying trusted people with trusted information has always been a crucial step in combating disinformation. Along with that, Horvitz prescribes media literacy projects.
Content provenance, protocols for authenticity, watermarks and other digital "fingerprints" can also make it easier to avoid fraudulent personalities and events.
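The provenance idea reduces to something simple: publish a cryptographic fingerprint of the media, signed by the publisher, so any later edit is detectable. The sketch below is a deliberately minimal illustration, not any specific standard; real provenance schemes such as C2PA use asymmetric keys and certificate chains, whereas this example stands in an HMAC with a hypothetical shared key to keep it self-contained.

```python
import hashlib
import hmac

# Hypothetical signing key for illustration only; a real provenance
# protocol would use a publisher's private key and a certificate chain.
SIGNING_KEY = b"publisher-secret-key"

def fingerprint(media_bytes: bytes) -> str:
    """Content hash: any change to the media changes this value."""
    return hashlib.sha256(media_bytes).hexdigest()

def sign(media_bytes: bytes) -> str:
    """Bind the content hash to the key holder (HMAC as a signature stand-in)."""
    return hmac.new(SIGNING_KEY, fingerprint(media_bytes).encode(),
                    hashlib.sha256).hexdigest()

def verify(media_bytes: bytes, signature: str) -> bool:
    """True only if the media is unaltered and was signed by the key holder."""
    expected = hmac.new(SIGNING_KEY, fingerprint(media_bytes).encode(),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)

original = b"raw video frames ..."
tag = sign(original)
assert verify(original, tag)                 # authentic, untouched
assert not verify(original + b"edit", tag)   # any tampering breaks verification
```

The point of the design is that verification fails closed: a consumer who checks the signature learns nothing about *what* was changed, only that the bytes no longer match what the publisher attested to.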
Of course, detection through new algorithms as well as self-regulation and government regulation are mentioned. But Horvitz appears to be saying none of it will work unless it is accompanied by personal responsibility.