Synthetic biometric data will solve AI’s ills (jk)
There is an aspect of AI that is simple and straightforward.
It will take years of development and testing, regulatory approvals, top talent, mammoth funding rounds, marketing hype and ethics reviews, but eventually an advanced algorithm will be written to find it.
Today, the industry, including biometric recognition players, is kicking around the idea of creating synthetic data to get around privacy concerns and bias in datasets.
While the concept is supposed to work just as well with any category of data that can identify a person or skew commercial operations (in the financial sector, for instance), this piece looks at biometrics.
A new article in MIT Technology Review tackles the counter-intuitive and convoluted topic, and comes down squarely on the fence about how useful the growing technique can be.
In one example of how it is being used now, the story describes how the company Datagen hires other vendors to digitally scan volunteers in great detail to train computer vision algorithms.
Datagen raised $18.5 million in venture funding in March. The Israeli startup ranked well in a recent market analysis by data science firm StartUs Insights.
With the raw data in hand, Datagen uses multiple algorithms to create three-dimensional figures. They are not avatars, even though each digital bust looks exactly like a common type of clayey or rubbery avatar. (Follow the link above to see.)
It is not an anonymization of real data, either. Each synthetic person is built on real biometric data from a real person: face geometry, irises, body, gait and, presumably, fingerprints, too.
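Datagen has not published its pipeline, but the general pattern behind such products is to reduce each scan to a parametric representation and then sample variations of those parameters. A minimal sketch of the idea (every name and number here is hypothetical, not Datagen's actual method):

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical: a real scan reduced to a parameter vector covering
# face geometry, texture, pose and so on.
real_scan_params = rng.normal(size=64)

def synthesize_variants(seed_params, n, jitter=0.1):
    """Sample synthetic identities by perturbing a real seed's parameters."""
    noise = rng.normal(scale=jitter, size=(n, seed_params.size))
    return seed_params + noise

variants = synthesize_variants(real_scan_params, n=5)
print(variants.shape)  # (5, 64): five synthetic faces from one real person
```

The smaller the jitter, the closer each "synthetic" face sits to its real donor, which is why such data is not automatically anonymous.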
Datagen reportedly is producing facial-expression data to train algorithms that need to spot drowsy or otherwise inattentive driving. In this case, people have consented to being digitized, so there are few or no privacy concerns.
Other uses could result in privacy violations when generation techniques mirror a subject so closely that anonymity becomes unlikely. And, as a University of Pennsylvania IT professor pointed out in the article, training data can be attacked just like any other database to reveal actual identities.
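The article does not spell out the attack, but a standard example of the risk is membership inference: a trained model tends to fit its training records better than unseen ones, so an unusually low loss on a record suggests it was in the training set. A minimal sketch with simulated losses (all numbers made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical model that has partly memorized its training set:
# loss is lower on training members than on unseen records.
train_losses = rng.normal(loc=0.2, scale=0.1, size=1000)  # members
test_losses = rng.normal(loc=0.9, scale=0.3, size=1000)   # non-members

def infer_membership(loss, threshold=0.5):
    """Guess 'member' when the model's loss on a record is suspiciously low."""
    return loss < threshold

tpr = infer_membership(train_losses).mean()  # members correctly flagged
fpr = infer_membership(test_losses).mean()   # non-members wrongly flagged
print(f"flagged {tpr:.0%} of members, {fpr:.0%} of non-members")
```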
As for bias, datasets populated with synthetic records can be skewed just as easily as conventional datasets. If the pool of seed scans over-represents one demographic, every synthetic person derived from it inherits that skew.
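A toy illustration of how the imbalance propagates (groups and proportions here are made up):

```python
import random

random.seed(42)

# Hypothetical seed pool: 80% of scanned volunteers come from group A.
seed_scans = ["group_A"] * 80 + ["group_B"] * 20

# Generating synthetic identities from real seeds reproduces the skew.
synthetic_dataset = [random.choice(seed_scans) for _ in range(10_000)]

share_a = synthetic_dataset.count("group_A") / len(synthetic_dataset)
print(f"group A share of synthetic data: {share_a:.1%}")  # ~80%, same skew
```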
As with so many other parts of real life today, there is an app for this problem. It is just a matter of time and money before it appears on a phone's home screen.