What’s the difference between managing staff and making AI better? Surprisingly little
Who would have guessed that the three biggest problems when it comes to workforces are also primary and continuous headaches for AI and machine learning?
OK, there are people who have said that training, performance improvement and benchmarking are unwisely and embarrassingly ignored by the AI supply chain, from biased human annotators to developers who “fallaciously” elevate benchmarks.
VentureBeat, however, has rounded up a respectable passel of studies and articles that makes these problems impossible for anyone to ignore.
(The assumption here is that the goal of AI and machine learning development is ethical as well as effective code. That is not always the case and may ultimately turn out to be a nice thought.)
The highly recommended article breaks the topic into three problems familiar to biometrics observers: training, labeling issues and benchmarks. As a bonus, the author offers some solutions.
It is a given that real-world pressures force companies to wrinkle corners, if not cut them outright. But one example of truly myopic development is settling for good-enough when selecting datasets.
For example, can the same training dataset be the foundation for many AI missions? Really? Because VentureBeat cited a study that found significant disadvantages to the prevalent wrench-for-a-hammer school of dataset use.
That insight should get the attention of large in-house AI and machine learning development teams, but the same is true for everyone else using the most common datasets — their names show up again and again and again in research papers.
Candid practitioners might be surprised to learn that, as VentureBeat points out, datasets created by just 12 companies and schools are "used more than 50% of the time in machine learning." It has become a wag-the-dog environment.
Then there is the labeling problem.
Who has not heard about the annotations that embedded incredibly racist, and specifically anti-Black, slurs in the 80 Million Tiny Images dataset? The Massachusetts Institute of Technology, the dataset's co-creator, issued an apology for it.
This illustrates a common problem in many advanced economies. Many developers pay annotators the bare minimum, and show them even less professional respect, to build the foundation for complex algorithms that will impact human lives. They then express shock that the sausage contains material scraped from the floor.
No one shops at Walmart for supercomputers, yet far worse labor markets are scoured for annotators.
Finally, there is the benchmarking problem.
The article cites pre-print research by the Institute for Artificial Intelligence and Decision Support, whose analysis of 3,867 AI papers found widespread inconsistency in benchmarking.
Another study, by Facebook and University College London, found that 60 to 70 percent of answers given by natural language models were simply memorized from training data rather than reasoned out.
It is extraordinary to read that “benchmarks like ImageNet are often ‘fallaciously elevated’ to justify claims that extend beyond” original design specs.
One partial solution is simple enough: Developers have to write licenses precisely, laying out how datasets can be used and expressly prohibiting "questionable uses."
Another: reward developers “for creating new, diverse datasets contextualized for the task at hand.”
More suggestions follow, and none of them are impossible. Some involve investing more money, but considering how powerful biometrics and other algorithms have become in markets and daily lives, few monied buyers are balking at the price tags.