More real money for synthetic biometric data
Add $50 million to the venture funding raised by Datagen, which specializes in simulating humans for use in training AI algorithms.
The series B infusion brings the total the Israeli firm has attracted, including a seed round, to $72 million, according to TechCrunch.
Datagen is one of dozens of companies worldwide developing synthetic data, which has been billed as something of an elixir for AI and biometrics researchers, developers, vendors and buyers trying to dig the bias and privacy landmines out of algorithms trained on real data.
Scale Venture Partners, a new investor in the company, led the round and now has a seat on its board. Nvidia AI Director Gal Chechik participated, as did computer scientists Trevor Darrell and Michael Black.
Also throwing in were Viola Ventures’ growth practice, Spider Capital and TLV Partners. A year ago, Datagen raised an $18.5 million series A round.
The four-year-old company (considered part of the computer vision sector) has recorded less than $10 million in revenue, according to reporting by Ctech. Its specialty is photorealistic visual simulations and recreations, including objects. Human motion is a particular focus.
Typically, an AI developer would train an algorithm and tinker with it to get the desired results. Datagen executives have said that approach is not a best practice, particularly when new synthetic data can be created.
If the training data consists of biometric and other identifiers for actual individuals in a population, developers are constrained by the biometric and demographic makeup of the real data.
But if the training data is wholly created or based only on individuals within a desired human population, the components of that data set can be rejiggered with a keyboard. The hope is that this will reduce unintentional bias, at least.
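As a rough illustration of what rejiggering a data set's makeup could look like in practice, here is a minimal Python sketch. It is not Datagen's method or API. The function names, the group labels and the placeholder records are all hypothetical, and the "generator" only emits stand-in records rather than images. The point is simply that, with synthetic data, the share of each group becomes a parameter the developer sets rather than a property inherited from whoever happened to be in the real data.

```python
import random
from collections import Counter


def allocate_counts(total, target_shares):
    """Split `total` samples across groups to match `target_shares`.

    Uses the largest-remainder method so the counts always sum to `total`.
    `target_shares` maps a (hypothetical) group label to a share in [0, 1].
    """
    if abs(sum(target_shares.values()) - 1.0) > 1e-9:
        raise ValueError("target shares must sum to 1")
    raw = {group: total * share for group, share in target_shares.items()}
    counts = {group: int(value) for group, value in raw.items()}
    leftover = total - sum(counts.values())
    # Give the remaining samples to the groups with the largest fractional parts.
    by_remainder = sorted(raw, key=lambda g: raw[g] - counts[g], reverse=True)
    for group in by_remainder[:leftover]:
        counts[group] += 1
    return counts


def generate_synthetic(total, target_shares, seed=0):
    """Emit placeholder records; a real pipeline would render a sample per record."""
    rng = random.Random(seed)
    counts = allocate_counts(total, target_shares)
    records = [
        {"group": group, "sample_id": i}
        for group, n in counts.items()
        for i in range(n)
    ]
    rng.shuffle(records)
    return records


# Hypothetical group labels; the developer, not the source population, sets the mix.
shares = {"group_a": 0.25, "group_b": 0.25, "group_c": 0.25, "group_d": 0.25}
data = generate_synthetic(1000, shares)
print(Counter(record["group"] for record in data))
```

Changing the `shares` dictionary changes the composition of the output, which is the "keyboard" adjustment described above. As the critics quoted later in this article point out, a balanced mix of inputs does not by itself guarantee a fair algorithm.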
Plus, everyone using synthetic data enjoys life without fear that sub-optimally anonymized data will cause headaches in the future.
Not everyone sees the same rosy picture, of course. An article about synthetic data and Datagen last year in MIT Technology Review quoted AI-industry insiders saying hazards exist.
In fact, the story called out the possibility that Datagen’s raw data “contains proportionally fewer ethnic minorities.”
And, addressing the young industry generally, cold water was poured on the idea that perfectly balanced data sets necessarily create pristinely fair algorithms.
Clearly there are money people who are not deterred by icy baths.
Investment publisher Nanalyze has written that Datagen is focusing right now on retail, robotics, car automation, Internet of Things and virtual reality. And it has published a graphic listing the many startups treading these waters.