AI bias creep is a problem that’s hard to fix
On the heels of a National Institute of Standards and Technology (NIST) study on demographic differentials of biometric facial recognition accuracy, Karen Hao, an artificial intelligence authority and reporter for MIT Technology Review, recently explained that “bias can creep in at many stages of the [AI] deep-learning process” because “the standard practices in computer science aren’t designed to detect it.”
“Fixing discrimination in algorithmic systems is not something that can be solved easily,” explained Andrew Selbst, a postdoctoral scholar at the Data & Society Research Institute and lead author of the recent paper, Fairness and Abstraction in Sociotechnical Systems.
“A key goal of the fair-ML community is to develop machine-learning based systems that, once introduced into a social context, can achieve social and legal outcomes such as fairness, justice, and due process,” noted the paper’s authors, who include Danah Boyd, Sorelle A. Friedler, Suresh Venkatasubramanian, and Janet Vertesi, adding that “(b)edrock concepts in computer science – such as abstraction and modular design – are used to define notions of fairness and discrimination, to produce fairness-aware learning algorithms, and to intervene at different stages of a decision-making pipeline to produce ‘fair’ outcomes.”
“However,” they pointed out, “we contend that these concepts render technical interventions ineffective, inaccurate, and sometimes dangerously misguided when they enter the societal context that surrounds decision-making systems. We outline this mismatch with five ‘traps’ that fair-ML work can fall into, even as it attempts to be more context-aware in comparison to traditional data science.”
The paper’s researchers stated that they drew upon “studies of sociotechnical systems in science and technology studies to explain why such traps occur and how to avoid them. Finally, we suggest ways in which technical designers can mitigate the traps through a refocusing of design in terms of process rather than solutions, and by drawing abstraction boundaries to include social actors rather than purely technical ones.”
Hao explained that “(o)ver the past few months, we’ve documented how the vast majority of AI’s applications today are based on the category of algorithms known as deep learning, and how deep-learning algorithms find patterns in data. We’ve also covered how these technologies affect people’s lives … Machine-learning algorithms use statistics to find patterns in data. So if you feed it historical crime data, it will pick out the patterns associated with crime,” Hao said, noting that “(b)ecause most risk assessment algorithms are proprietary, it’s also impossible to interrogate their decisions or hold them accountable.”
Consequently, a broad coalition of more than 100 civil rights, digital justice, and community-based organizations recently issued a joint statement of civil rights concerns highlighting problems with the adoption of algorithmic decision-making tools.
Explaining why “AI bias is hard to fix,” Hao cited as an example “unknown unknowns. The introduction of bias isn’t always obvious during a model’s construction because you may not realize the downstream impacts of your data and choices until much later. Once you do, it’s hard to retroactively identify where that bias came from and then figure out how to get rid of it.”
Imperfect processes are another problem, she wrote: “many of the standard practices in deep learning are not designed with bias detection in mind. Deep-learning models are tested for performance before they are deployed, creating what would seem to be a perfect opportunity for catching bias. But in practice, testing usually looks like this: computer scientists randomly split their data before training into one group that’s used for training and another that’s reserved for validation once training is done. That means the data you use to test the performance of your model has the same biases as the data you used to train it. Thus, it will fail to flag skewed or prejudiced results.”
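The standard practice Hao describes can be shown with a minimal sketch. The dataset below is invented purely for illustration: a hypothetical skew is baked into the labels, and because the validation set is a random slice of the same data, its metrics simply reproduce the skew rather than flagging it.

```python
import random

# Toy "historical" dataset of (features, label) pairs. The skew is an
# invented assumption: every group-"A" record carries a positive label,
# mimicking a sampling bias inherited from past decisions.
random.seed(0)
data = [
    ({"group": "A" if i % 3 else "B"}, 1 if i % 3 else 0)
    for i in range(300)
]

# Standard practice: shuffle, then split randomly before training.
random.shuffle(data)
split = int(0.8 * len(data))
train, validation = data[:split], data[split:]

def positive_rate(rows, group):
    """Fraction of positive labels among members of one group."""
    labels = [label for features, label in rows if features["group"] == group]
    return sum(labels) / len(labels)

# Both halves are drawn from the same biased distribution, so the
# validation set mirrors the training skew instead of exposing it.
print(positive_rate(train, "A"), positive_rate(validation, "A"))
```

A held-out split of this kind measures how well the model fits the data it was given, not whether the data itself was fair to begin with.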
Hao also blames “lack of social context,” meaning “the way in which computer scientists are taught to frame problems often isn’t compatible with the best way to think about social problems.”
Then there are the definitions of fairness, where it’s not at all “clear what the absence of bias should look like,” Hao argued, noting, “this isn’t true just in computer science – this question has a long history of debate in philosophy, social science, and law. What’s different about computer science is that the concept of fairness has to be defined in mathematical terms, like balancing the false positive and false negative rates of a prediction system. But as researchers have discovered, there are many different mathematical definitions of fairness that are also mutually exclusive.”
For example, she asks, “(d)oes fairness mean … that the same proportion of black and white individuals should get high-risk assessment scores? Or that the same level of risk should result in the same score regardless of race? It’s impossible to fulfill both definitions at the same time, so at some point, you have to pick one. But whereas in other fields this decision is understood to be something that can change over time, the computer science field has a notion that it should be fixed.”
“By fixing the answer, you’re solving a problem that looks very different than how society tends to think about these issues,” Selbst said.
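The incompatibility Hao describes can be checked with toy arithmetic. All numbers below are invented assumptions: two hypothetical groups with different underlying base rates, scored by a perfectly accurate predictor that satisfies her second definition (“the same level of risk should result in the same score”).

```python
# Invented base rates of the outcome being predicted, one per group.
base_rate = {"group1": 0.6, "group2": 0.3}
n = {"group1": 1000, "group2": 1000}

# Definition 2 (same risk, same score): a perfectly accurate scorer
# flags exactly the truly high-risk members of each group.
flagged = {g: base_rate[g] * n[g] for g in n}

# Definition 1 (same proportion flagged per group): check the shares.
share = {g: flagged[g] / n[g] for g in n}
print(share)

# The shares inherit the base-rate gap, so even a perfect predictor
# cannot satisfy both definitions unless the base rates are equal.
assert share["group1"] != share["group2"]
```

Whichever definition the system's designers pick, the other is violated by arithmetic alone, which is why the choice is a policy decision rather than a technical one.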
“Fortunately,” Hao said, there’s “a strong contingent of AI researchers [who] are working hard to address the problem. They’ve taken a variety of approaches: algorithms that help detect and mitigate hidden biases within training data or that mitigate the biases learned by the model regardless of the data quality; processes that hold companies accountable to fairer outcomes; and discussions that hash out the different definitions of fairness.”
“A new wave of decision-support systems are being built today using AI services that draw insights from data (like text and video) and incorporate them in human-in-the-loop assistance. However, just as we expect humans to be ethical, the same expectation needs to be met by automated systems that increasingly get delegated to act on their behalf,” noted researchers Biplav Srivastava and Francesca Rossi in their paper, Towards Composable Bias Rating of AI Services.
“A very important aspect of ethical behavior is to avoid (intended, perceived, or accidental) bias,” which they said “occurs when the data distribution is not representative enough of the natural phenomenon one wants to model and reason about. The possibly biased behavior of a service is hard to detect and handle if the AI service is merely being used and not developed from scratch since the training data set is not available.”
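When a third-party AI service is merely used, as Srivastava and Rossi note, its training data cannot be inspected, but its behavior can still be probed from the outside. A minimal sketch of such black-box testing is to send matched inputs that differ only in a sensitive attribute and compare outputs. Everything here is a stand-in: `score_service` is a hypothetical biased service used only for illustration, not any real API.

```python
def score_service(applicant):
    # Hypothetical black-box service; its bias is assumed for the demo.
    return 0.8 if applicant["group"] == "A" else 0.5

def paired_probe(template, attribute, values):
    """Query the service with counterfactual copies of one input that
    differ only in a single sensitive attribute."""
    results = {}
    for value in values:
        probe = dict(template, **{attribute: value})
        results[value] = score_service(probe)
    return results

# Identical applicant, two counterfactual group memberships.
outcomes = paired_probe({"income": 50000, "group": None}, "group", ["A", "B"])
disparity = abs(outcomes["A"] - outcomes["B"])
print(outcomes, disparity)
```

A nonzero gap on otherwise-identical inputs is evidence of exactly the "possibly biased behavior" the authors say is hard to detect without the training set, though a real audit would need many probes and realistic input distributions.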