Big jump in public face biometric dataset size

A large team of researchers, overwhelmingly from China, says it has created a new million-scale facial recognition benchmark. They claim in a new paper to have built an automatically cleaned biometric dataset of 42 million facial images covering 2 million identities.

The uncurated dataset holds 4 million celebrity identities among 260 million images. The new proposed benchmark is called WebFace260M, and it is being described as the largest public face biometric dataset.

That is a significant differentiator. Public researchers have decried their disadvantage in dataset resources compared to private companies, particularly Facebook and Google. For all intents and purposes, both have unlimited image datasets.

The research paper says Google taps 200 million images of 8 million identities when training FaceNet, and that Facebook has 500 million face images of 10 million identities.

Dataset size is a potent accelerator of biometrics innovation, and public researchers are worried about being shut out of the race.

The WebFace260M researchers, from Tsinghua University, Imperial College London and a Chinese startup, XForwardAI, claim that their dataset “shows enormous potential on standard, masked and unbiased face recognition scenarios.” It was cleaned with an AI tool they developed, Cleaning Automatically by Self-Training.

Jack Clark, co-founder of AI safety and research firm Anthropic, writing in his blog Import AI, says, “Models trained on the resulting dataset are pretty good.”

Clark also makes the point that facial recognition – especially masked facial recognition – is important to government surveillance agencies. Results like those of WebFace260M influence decisions about “how to surveil a population and how much budget to set aside for said surveillance.”

A dataset this size has more proximate dangers, of course. With great volumes could come privacy-restricted images, long a problem for datasets created by academics and businesses alike.

A site has been posted with project history and updated details.
