Joint Panel Session between the Standing Panel on Social Equity in Governance and the Standing Panel on Technology Leadership: Can AI Be Used to Increase Fairness and Equity in Government Decisions?

Following are some insights gleaned from a joint meeting of the Standing Panel on Social Equity in Governance and the Standing Panel on Technology Leadership on this topic, held December 18, 2020 (link to video recording).

Background. National news is full of vivid and disconcerting stories about explicit bias in law enforcement. There have been similar, if less dramatic, stories of implicit or unconscious bias affecting whether government treats similarly situated individuals alike in other arenas, including disability benefits determination, access to health care services, and more.

There are efforts to overcome some of these biases. For example, last year, the Los Angeles Times reported . . . “The San Francisco district attorney’s office said Wednesday it plans to launch a program that would allow prosecutors to make charging decisions in some cases without knowing the race or background of the suspects and victims, a move aimed at reducing the potential for implicit bias in prosecutions.”

Interestingly, a 2019 study found: “A range of interventions for reducing bias has been developed and tried, but they have been largely unsuccessful. While training and other similar approaches have led to short term improvements in implicit bias, ‘nearly all interventions have limited success.’” So, if implicit bias is rife and training doesn’t work well, are there other ways to mitigate its effects?

Can Artificial Intelligence Help Increase Fairness and Equity? The joint panel session hosted a presentation by Joseph Avery, with the Princeton Project in Computational Law, of a working paper on the use of artificial intelligence to increase fairness and equity in governmental decision-making. The working paper focused its case example on how AI was used to inform selected aspects of criminal justice decision making. Avery’s research partners, Akbar Agha, Eric Glynn, and Joel Cooper, are also with the Princeton Project in Computational Law.

They developed computer models to see if artificial intelligence can detect bias and inform decisions by prosecutors in local district attorney offices. They focused their efforts on a specific instance of potential bias – whether initial criminal charges should be reduced based on the facts of a case – and constructed models that could help inform these decisions. The design principles underlying their approach could be used in other policy arenas, as well.

The first model was predictive, reflecting how a defendant historically would have been treated. The second model was “race-neutral” and reflected one definition of fairness (“demographic parity”). The third model showed how the case would have been resolved if the defendant were treated as if he or she were white.
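
For readers unfamiliar with the term, demographic parity is commonly stated as the condition below. The notation is an assumption introduced here for illustration and is not taken from the working paper: \hat{Y} = 1 means the model recommends a charge reduction, and A is the defendant's demographic group.

```latex
P(\hat{Y} = 1 \mid A = a) \;=\; P(\hat{Y} = 1 \mid A = b)
\quad \text{for all groups } a, b
```

In words, the model recommends charge reductions at the same overall rate for every group.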

The Princeton researchers recommend using such models to train prosecutors on how to be sensitive to implicit bias. But, after further use, and with the precautions outlined in the report in place, they might be used to help inform decisions on actual cases. For example, one test case where the models were applied by the researchers involved a charging decision following an arrest for theft of under $300 by a young black woman – this charge would typically be considered a misdemeanor, but the woman’s prior criminal history raised the charge to a felony.

Should the charge be reduced? Based on the outputs from the three models developed by the Princeton researchers, the predictive and race-neutral models showed low likelihoods that the charge would be reduced. However, the third model, which would show how the defendant historically would be treated if White, “showed a nearly certain likelihood of a charge reduction, and thus a charge reduction was recommended,” according to the researchers.
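
The contrast between the predictive and the "treated as White" outputs can be sketched with a toy frequency estimate. This is only an illustration under loud assumptions: the records are made up, the field names are hypothetical, and simple rate matching stands in for whatever models the researchers actually built.

```python
# Made-up illustrative records, NOT real case data:
# (race, offense, has_prior_record, charge_was_reduced)
HISTORY = [
    ("white", "theft_under_300", True, True),
    ("white", "theft_under_300", True, True),
    ("white", "theft_under_300", True, True),
    ("white", "theft_under_300", True, False),
    ("black", "theft_under_300", True, False),
    ("black", "theft_under_300", True, False),
    ("black", "theft_under_300", True, True),
    ("black", "theft_under_300", True, False),
]


def reduction_rate(history, race, offense, has_prior):
    """Share of matching historical cases in which the charge was reduced."""
    outcomes = [
        reduced
        for r, o, p, reduced in history
        if r == race and o == offense and p == has_prior
    ]
    if not outcomes:
        return None  # no comparable cases, so no estimate
    return sum(outcomes) / len(outcomes)


def predictive_and_counterfactual(history, race, offense, has_prior,
                                  reference="white"):
    """Return (rate for the defendant's own group, rate if treated as `reference`)."""
    own = reduction_rate(history, race, offense, has_prior)
    counterfactual = reduction_rate(history, reference, offense, has_prior)
    return own, counterfactual


own, cf = predictive_and_counterfactual(
    HISTORY, "black", "theft_under_300", True
)
print(f"historical: {own:.2f}, if treated as white: {cf:.2f}")
```

A large difference between the two numbers is the kind of signal that, in the test case above, led to a recommended charge reduction.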

Can AI be Trusted to Reduce Bias? The Princeton researchers recognize that many may distrust AI to support decisions that affect civil liberties, based on documented instances showing that biased data were used in making decisions with AI. This distrust is not unfounded. Several prominent AI projects – including those used in hiring at some companies and in facial recognition software – have been shown to be biased against women and people of color. Therefore, advanced analytics involving AI might reinforce bias in decisions rather than reduce it. The researchers are sensitive to these arguments and embrace many of them themselves.

At the same time, they wanted to find a way to correct for deep-seated human bias, which has been affecting legal decision making for centuries, and thus they set out to explore how AI could reduce, rather than reinforce, bias. They created a framework, based on a set of guiding design principles, to do just this.

Key Elements for Designing AI Decision Models. The researchers affirmed two guiding design principles when designing AI decision-support models to account for bias: transparency and accountability. They caution that: “Unless AI systems are designed and implemented correctly, they may perpetuate or even exacerbate the problems they were designed to solve.”

Transparency. Legal due process protections for life, liberty, or property interests require that government shows that its actions are “justifiable as a rational means of achieving a legitimate government purpose.” Therefore, any decisions augmented using advanced analytics must be transparent and explainable. Both the data and algorithmic code should be made publicly available to the extent allowed by privacy laws.

The Princeton researchers note that how this is done is important. They say: “instead of demanding interpretability and then building a model, organizations such as district attorney offices should first build models and then work on interpretability – prior to using the model in actual cases and matters, of course.” They say that there is a significant amount of “art” to building a model and that the first objective should be to build a model that works and then proceed backwards to develop the explanation for what the model is doing.

Accountability. Agency leaders who use advanced analytics to augment decisions that affect an individual’s civil rights need to define clear objectives and metrics for success up front and regularly assess progress toward those metrics. For example, if the goal is to rid a district attorney office of racially biased case outcomes, one metric could be: are similarly situated defendants being treated similarly?
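
One way such a metric might be computed is sketched below. It is a minimal illustration, not the researchers' method: it assumes cases have already been sorted into strata of "similarly situated" defendants (a hypothetical label such as offense plus prior record), and it reports, for each stratum, the spread in charge-reduction rates across demographic groups.

```python
from collections import defaultdict


def similar_treatment_gaps(cases):
    """Spread in reduction rates across groups, per stratum of similar cases.

    `cases` is an iterable of (stratum, group, reduced) tuples. How cases
    are assigned to a stratum is an assumption left to subject matter experts.
    """
    # stratum -> group -> [number reduced, total cases]
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for stratum, group, reduced in cases:
        tally = counts[stratum][group]
        tally[0] += int(reduced)
        tally[1] += 1

    gaps = {}
    for stratum, by_group in counts.items():
        if len(by_group) < 2:
            continue  # need at least two groups to compare
        rates = [reduced / total for reduced, total in by_group.values()]
        gaps[stratum] = max(rates) - min(rates)
    return gaps


# Made-up example data with anonymous group labels.
cases = [
    ("theft/prior", "A", True),
    ("theft/prior", "A", True),
    ("theft/prior", "B", True),
    ("theft/prior", "B", False),
    ("theft/no_prior", "A", True),
    ("theft/no_prior", "B", True),
]
gaps = similar_treatment_gaps(cases)
print(gaps)
```

A gap near zero in every stratum would suggest similar treatment; tracking the gaps over time is one way to assess progress against the stated objective.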

Accountability would also include creating a right of appeal that provides the same degree of protection as the appeal process for a similar decision made solely by a human. The researchers say: “At a minimum, complainants should be allowed to contest inaccuracies in machine inputs and to present mitigating information.”

Designing the AI Framework. The researchers developed an AI decision support framework based on three steps that incorporate the principles of transparency and accountability. A key element in designing the framework, however, is engaging diverse subject matter experts to provide context and uncover potential areas of implicit bias.

Collect the Data. Widespread and standardized data need to be collected and made publicly available. Only with large amounts of data can implicit bias and group-level bias be observed at the aggregate level. The criminal justice system collects large amounts of data, but their uneven quality and accuracy, and their highly distributed nature, often prevent broader use. This obstacle is not insurmountable; similar challenges of quality, accuracy, and lack of centralization face the health care system, and these issues are being overcome.
Build the Computational Models. Building the analytic models to augment decisions requires that project leaders engage both technical and subject matter experts to ensure that key values that reflect intended results – such as fair outcomes – be incorporated at each stage of the design and construction process. Who performs and informs the data analysis and constructs the model matters.
Manage the Human-Computer Interaction. Finally, once the models are built and tested, the users – such as prosecutors in a district attorney’s office – need to incorporate them into their work. Users need to articulate in writing how they make decisions without computational models and then compare their decision-making rules to those offered by the computer models. This would typically take place in interactive training sessions, and if an attorney’s decision deviates from the model, this would lead to a discussion as to whether the human or the machine needs to alter course. Ultimately, users of the computer models “will need to alter how they work in order to incorporate algorithmic-based decision support tools into their day-to-day routines.”
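
The comparison step in the third stage could be supported by something as simple as the sketch below. All names here are hypothetical, and the 0.5 cutoff for turning a model probability into a recommendation is an assumption made for illustration; the source does not specify one.

```python
from dataclasses import dataclass


@dataclass
class Review:
    case_id: str
    attorney_reduces: bool    # attorney's own decision, recorded in writing first
    model_probability: float  # model's estimated likelihood of a charge reduction


def flag_for_discussion(reviews, threshold=0.5):
    """Return ids of cases where the attorney and the model disagree."""
    flagged = []
    for review in reviews:
        model_reduces = review.model_probability >= threshold
        if model_reduces != review.attorney_reduces:
            flagged.append(review.case_id)
    return flagged


# Made-up training-session records.
reviews = [
    Review("c1", attorney_reduces=False, model_probability=0.95),
    Review("c2", attorney_reduces=True, model_probability=0.80),
    Review("c3", attorney_reduces=False, model_probability=0.10),
]
flagged = flag_for_discussion(reviews)
print(flagged)
```

Flagged cases are not treated as errors by either side; they are the starting point for the discussion about whether the human or the machine needs to alter course.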

The Importance of Engaging Subject Matter Experts. The researchers note that, at all three stages of the framework, agency leaders must engage both technical experts and subject matter experts to work together iteratively to develop the approach. In addition, the approach must be transparent to, and engage, key stakeholders to ensure the framework is seen as, and is, legitimate and fair. The researchers say that some may object and claim the use of AI could never model human decisions, but they say: “even in the absence of machine decision making, these hard decisions are already being made. When humans make decisions, they are precipitated by underlying beliefs regarding various factors, including fairness. . . . The primary difference between human decision making and the process we mention in here . . . is that the algorithmic process, with its emphasis on reaching a definition and instantiating it in code, is more transparent.”

In conclusion, the Princeton research team focused attention on the use of AI to reduce bias in certain decisions within the criminal justice system. They demonstrate that AI models, trained on data with advanced computational and statistical techniques, can better anchor prosecutorial decisions on racially fair outcomes. Moreover, the principles they outline and the framework they developed are applicable to the wider expanse of government decision making and “will increase that chance of success in decreasing disparities in their agency’s decisions and outcomes.”

Panel Discussion. After Avery’s presentation, the panel opened to discussion. Following are several highlights:

One speaker observed that, beyond the 100 or so largest, most local governments don’t have the infrastructure to design, or the fiscal or technical capacity to implement, such a sophisticated system. Most localities don’t know what data they have and lack the ability to think through the issues involved in designing a system. They tend to just turn to private sector vendors for an off-the-shelf application.
- Avery responded that it is important for local governments to be involved in the design phase, and articulate the intended uses of AI. For example, AI has been used to improve the efficiency of policing in Los Angeles via predictive models, but the focus on efficiency may have reinforced existing biases. However, if the locality’s intended use is to improve fairness, then that becomes an important design element. He suggests enlisting academics in the design and modeling phase, in order to help identify potential bias at each step in the decision-modeling process.
Other cautionary comments and queries also focused on the role of the private sector:
- The localities won’t own the algorithm. Who owns the data?
- Can they rely on a vendor-owned proprietary model, where the locality isn’t involved in defining the decision elements?
- How do localities protect themselves?
- Avery noted that one approach to address these kinds of issues might be to have a contract clause that requires a vendor to bring in an independent third party to validate the model being used in the locality.
It is important to involve a wide range of stakeholders in the design process. For example, in the case of modeling the reduction of criminal charges, the researchers involved prosecutors, police officers, and members of the local community in defining the decision process. The technical team then uses this input to define the relative weights of different elements in the decision process and to explain what the algorithm does versus what the intended decision process expects it to do.
There was enthusiasm about the model’s potential. For example, one attendee observed that this approach could be used to help inform juvenile justice decisions and student disciplinary actions.