
SenseTime researcher presents progress on AI syncing faces with audio


The digital manipulation of faces can be used to spoof biometric systems or sow misinformation, but technology that coordinates face movements with the speaker’s voice is also in demand in several areas, as SenseTime Researcher Yuxin Wang explained during a talk for the European Association for Biometrics (EAB).

His presentation on ‘Talking Faces: Audio-to-Video Face Generation’ was part of the EAB’s Workshop on Digital Face Manipulation & Detection, held this week for members of the organization.

Digital technologies have been used since the 1990s to generate synthetic video of people talking, for applications like virtual assistants, teleconferencing, movie and video game dubbing and digital twins.

The output in talking head generation should have “much more head movement” than the source material in audio-driven face re-enactment, Wang says.

Wang reviewed the modelling techniques that allow for relationships between head movement and vocalization to be measured for talking face generation. Data pulled from an audio representation is used to make the mouth movement and expressions of the speaker in the video accurately and consistently match the sound.

He described two pipelines for doing so: one in which audio and image encoders produce representations that are run through a single decoder, and another that applies a regression model to audio features, combines the result with an intermediate feature such as a facial landmark, and renders the output from that intermediate feature. Wang also explained image refinement and background compositing in post-processing.
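As a rough illustration of how the two pipelines differ in structure, the toy Python sketch below wires up both with random linear maps in place of trained networks. It is not Wang's implementation: every name, dimension and the dot-painting "renderer" are assumptions made up for this example, and the only point is the data flow (two encoders feeding one decoder, versus audio regression to landmarks followed by rendering).

```python
import random

rng = random.Random(0)  # fixed seed so the toy weights are reproducible

# All sizes below are hypothetical and chosen only to keep the example small.
AUDIO_DIM, IMAGE_SIDE, LATENT_DIM, N_LANDMARKS = 13, 4, 8, 5
N_PIXELS = IMAGE_SIDE * IMAGE_SIDE


def rand_matrix(rows, cols):
    """Random weights standing in for a trained model (assumption)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def linear(weights, vector):
    """Plain matrix-vector product."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


# Pipeline 1: audio encoder + image encoder -> single decoder.
W_AUDIO = rand_matrix(LATENT_DIM, AUDIO_DIM)
W_IMAGE = rand_matrix(LATENT_DIM, N_PIXELS)
W_DECODER = rand_matrix(N_PIXELS, 2 * LATENT_DIM)


def encoder_decoder_pipeline(audio_features, reference_image):
    z_audio = linear(W_AUDIO, audio_features)
    z_image = linear(W_IMAGE, reference_image)
    # Concatenate both representations and decode them into one frame.
    return linear(W_DECODER, z_audio + z_image)


# Pipeline 2: regress audio features to landmarks, then render from them.
W_REGRESSOR = rand_matrix(2 * N_LANDMARKS, AUDIO_DIM)


def audio_to_landmarks(audio_features):
    flat = linear(W_REGRESSOR, audio_features)
    return [(flat[2 * i], flat[2 * i + 1]) for i in range(N_LANDMARKS)]


def to_pixel_index(coordinate):
    """Map a coordinate (nominally in [-1, 1]) to a clamped pixel index."""
    scaled = int((coordinate + 1.0) / 2.0 * IMAGE_SIDE)
    return min(IMAGE_SIDE - 1, max(0, scaled))


def render_from_landmarks(landmarks, reference_image):
    # Toy renderer: copy the reference image and mark each landmark pixel.
    frame = list(reference_image)
    for x, y in landmarks:
        frame[to_pixel_index(y) * IMAGE_SIDE + to_pixel_index(x)] = 1.0
    return frame


audio_features = [rng.uniform(-1.0, 1.0) for _ in range(AUDIO_DIM)]
reference_image = [0.0] * N_PIXELS

frame_a = encoder_decoder_pipeline(audio_features, reference_image)
landmarks = audio_to_landmarks(audio_features)
frame_b = render_from_landmarks(landmarks, reference_image)
```

In a real system each linear map would be replaced by a trained network and the renderer by a learned image generator; the refinement and compositing steps mentioned in the talk are left out here.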

The talk then touched on the methods and datasets used in 2D and 3D face generation.

Wang also outlined the metrics that have been developed to assess image quality, synchronization between the audio signal and the speaker’s lips, identity preservation and blinking.
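The article does not say which specific metrics were covered. As an assumption for illustration only, the Python sketch below shows two widely used measures that fit the first and third categories: peak signal-to-noise ratio (PSNR) for image quality, and cosine similarity between face embeddings as a stand-in for identity preservation. The function names and the idea of comparing embedding vectors are choices made for this example, not details from the talk.

```python
import math


def psnr(reference, generated, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    mse = sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def cosine_similarity(embedding_a, embedding_b):
    """Cosine similarity between two identity embeddings (hypothetical vectors)."""
    dot = sum(a * b for a, b in zip(embedding_a, embedding_b))
    norm_a = math.sqrt(sum(a * a for a in embedding_a))
    norm_b = math.sqrt(sum(b * b for b in embedding_b))
    return dot / (norm_a * norm_b)


# Higher PSNR means the generated frame is closer to the reference frame.
quality_db = psnr([0.0, 0.5, 1.0], [0.0, 0.4, 1.0])

# A similarity near 1.0 suggests the generated face keeps the source identity.
identity_score = cosine_similarity([0.2, 0.9, 0.1], [0.25, 0.85, 0.1])
```

Lip-sync and blink metrics generally need a temporal model over audio and video and are not sketched here.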

Challenges remaining in talking face generation range from exercising fine-grained control over facial features like eyes and teeth, head movement and emotion, to generalization of the identity and body. Then there are considerations around forgery detection and social responsibility.

In an example of the first challenge, Wang notes that blinking is related to speech mechanics and thought processes, but the relations are not yet well understood. Eye-blinking can be generated from target frames or Gaussian noise. Some models connect eye movement to overall facial expression, but this method is also still in its early stages of development.

Larger and more diverse datasets could help with generation, according to the SenseTime researcher.

Detection of manipulated video was briefly considered, and deepfake detection was the focus of several other presentations during the event.

Wang sees talking face generation technology improving in the near future, and practical applications expanding as it does.
