Big jump in public face biometric dataset size

May 11, 2022, 5:48 pm EDT | Jim Nash

A large team of researchers, overwhelmingly from China, says it has created a new million-scale facial recognition benchmark. In a new paper, they claim to have built an automatically cleaned biometric dataset of 2 million identities among 42 million facial images.

The uncurated dataset holds 4 million celebrity identities among 260 million images. The new proposed benchmark is called WebFace260M, and it is being described as the largest public face biometric dataset.

That is a significant differentiator. Public researchers have decried their disadvantage in dataset resources compared with private companies – particularly Facebook and Google. For all intents and purposes, both have unlimited image datasets.

The research paper says Google taps 200 million images of 8 million identities when training FaceNet, and that Facebook has 500 million faces among 10 million identities.
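As a rough way to compare these figures, the sketch below works out the average number of images per identity for each dataset. It uses only the counts quoted in this article; the dataset labels and the `images_per_identity` helper are illustrative names, not anything taken from the paper.

```python
# Image and identity counts, in millions, as quoted in the article.
# The labels are illustrative, not official dataset names.
datasets = {
    "WebFace260M (uncurated)": (260, 4),
    "WebFace260M (cleaned)": (42, 2),
    "Google (FaceNet training)": (200, 8),
    "Facebook": (500, 10),
}


def images_per_identity(images_millions, identities_millions):
    """Average number of images available for each identity."""
    return images_millions / identities_millions


ratios = {
    name: images_per_identity(images, identities)
    for name, (images, identities) in datasets.items()
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f} images per identity")
```

These are simple averages; they say nothing about how evenly images are spread across identities in any of the datasets.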

Dataset size is a potent accelerator of biometrics innovation, and public researchers are worried about being shut out of the race.

The WebFace260M researchers, from Tsinghua University, Imperial College London and a Chinese startup, XForwardAI, claim that their dataset “shows enormous potential on standard, masked and unbiased face recognition scenarios.” It was cleaned with an AI tool they developed, Cleaning Automatically by Self-Training.

Jack Clark, co-founder of AI safety and research firm Anthropic, writing in his blog Import AI, says, “Models trained on the resulting dataset are pretty good.”

Clark also makes the point that facial recognition – especially masked facial recognition – is important to government surveillance agencies. Results like those of WebFace260M influence decisions about “how to surveil a population and how much budget to set aside for said surveillance.”

A dataset this size has more proximate dangers, of course. With such volume could come privacy-restricted images, long a problem for datasets created by academics and businesses alike.

A site has been posted with project history and updated details.

Article Topics

biometric dataset | biometrics | biometrics research | facial recognition
