Use of synthetic data for training biometric systems on the rise
Images generated by artificial intelligence have been spreading fast across the internet, sparking concerns about identity fraud, disinformation and more. But in the biometrics industry, synthetic data is hailed as a way to address the shortage of AI training data, get around data privacy restrictions, and combat bias.
Amazon One is using generative AI to create a “palm factory,” producing millions of synthetic images of palms to train its AI model.
The palm biometrics payment service relies on synthetic data to increase the size and the variety of its datasets so that it can boost the system’s accuracy, says Gerard Medioni, vice president and distinguished scientist at Amazon.
“The main challenge was, there’s no proper data. How do we train our deep network which needs lots and lots of data? We did it by creating synthetic data,” Medioni explains in a video recently published by the e-commerce giant.
The AI-generated hands can include many subtle variations, such as different lighting conditions, hand poses, and the presence of Band-Aids, wedding bands or scars. The images can also be automatically “annotated,” cutting short the laborious process of manually labeling pictures so that computers can recognize what they are looking at. Amazon One also trained its system to detect fake hands, including detailed replicas.
The challenge for Amazon’s scientists was preserving each palm’s identity while creating many variations of it, a technique known as controllable generative AI.
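The idea of identity-preserving, automatically annotated variation can be sketched in a few lines of Python. Everything below is illustrative, not Amazon’s actual pipeline: the field names, condition ranges and accessory list are assumptions chosen to mirror the variations described above. The key point is that the identity is held fixed while only nuisance conditions vary, and every sample carries its own labels, so no manual annotation is needed.

```python
import random
from dataclasses import dataclass

@dataclass
class SyntheticPalm:
    """One generated training sample; the fields double as its annotations."""
    identity_id: int   # which (synthetic) person this palm belongs to
    lighting: float    # brightness condition, 0.0 = dark, 1.0 = bright
    pose_deg: float    # hand rotation in degrees
    accessory: str     # e.g. "none", "band_aid", "wedding_band", "scar"

def generate_variations(identity_id: int, n: int, rng: random.Random) -> list[SyntheticPalm]:
    """Hold the identity fixed; vary only the imaging conditions."""
    accessories = ["none", "band_aid", "wedding_band", "scar"]
    return [
        SyntheticPalm(
            identity_id=identity_id,
            lighting=rng.uniform(0.2, 1.0),
            pose_deg=rng.uniform(-30.0, 30.0),
            accessory=rng.choice(accessories),
        )
        for _ in range(n)
    ]

rng = random.Random(42)
batch = generate_variations(identity_id=7, n=5, rng=rng)
# Every variation shares one identity but differs in conditions,
# and each comes with its labels "for free".
assert all(p.identity_id == 7 for p in batch)
```

A real controllable generator would map the fixed identity code plus the sampled conditions through a trained image model; the structure of the output (image plus ready-made labels) is what makes the approach attractive for training.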
“We use that to train our system and we got an accuracy which is 1000 times higher than face recognition and 100 times more accurate than two irises,” Medioni says.
Grocery store chain Whole Foods offers Amazon One in 200 locations across the U.S. and plans to equip over 500 stores with palm biometrics payment by the end of 2023. The company also intends to expand Amazon One beyond payments to loyalty programs and age verification, with customers like Panera and Coors Field.
Synthetic data – a solution to data privacy restrictions?
The usual arguments for using synthetic data are having too little real data, increasing the diversity of data, or avoiding data collection that is impractical or expensive. But there are new reasons to apply AI-generated biometric data.
Innovatrics believes that the European Union’s upcoming AI Act may make the gathering and processing of biometric data even more complicated. Using synthetic data for training might solve this challenge.
“For us, as machine learning practitioners, it is going to be more and more difficult to actually work with real data,” says Innovatrics Image Synthesis Team Leader Igor Janos. “With the adoption of synthetic data, we hope to find ways around these restrictions.”
Innovatrics’ Janos spoke at the Eastern European Machine Learning Summer School organized by Google DeepMind in Kosice, Slovakia, where the EU-based biometric solution provider demonstrated its use of generated data for improving algorithms.
The EU AI Act plans to classify biometric AI systems as high-risk, which will require companies to meet certain requirements for data handling and protection. The AI Act may also introduce bans on certain techniques for collecting data, such as scraping images from social media, YouTube, and other online resources.
Synthetic data allows companies to get around privacy concerns: there is no need to control who has access to real data, and no worries about leaks and exposure, notes Janos.
Innovatrics has also been using synthetic fingerprint fragments to improve algorithms used for detecting fingerprints at a crime scene, the company said in a release last week.