Why the Responsible Computer Science Challenge Matters to You

A few weeks ago, a group of organizations (Omidyar Network, Mozilla, Schmidt Futures, and Craig Newmark Philanthropies) announced the winners of the “Responsible Computer Science Challenge.”   What is this challenge? It’s an initiative to integrate content about ethics and responsibility into undergraduate computer science curricula and pedagogy in U.S. colleges and universities – clearly a timely and important topic.

But in most cases, you as a reader of this CSTA blog are involved with K-12, not university, computer science education.   So why should the Responsible Computer Science Challenge matter to you? Three reasons:

  1. The materials produced will be designed for incorporation into technical computer science courses, including at the introductory level. As ethics and social responsibility in computing have become more prevalent in university computer science education, much of that content has initially appeared in standalone courses. There’s nothing wrong with that – I sure hope not, I’m introducing such a course at my university next semester! But ethics will become far more fundamental to the mindset of computer scientists if it is an integral part of core, technical computer science classes, and this is exactly the approach that the Responsible Computer Science Challenge takes. Many of the successful proposals address introductory classes. As such, they should produce materials and approaches that are relevant and helpful in K-12 computer science education as well.
  2. The outputs from the funded projects will be openly available. A fundamental feature of the Responsible Computer Science Challenge has been the production of openly available materials, such as syllabi or class activities. This will be done either by making these materials available online without restrictions or, where a license is involved, through use of a Creative Commons license. Thus, a rich library of materials for K-12 and university educators to consider will be produced within the next 1-2 years.
  3. The quality is likely to be very high. The competition in this challenge was stiff, and the 17 award winners are a broad set with very high-quality experience and plans. The breadth cuts across many dimensions: types of universities (community colleges, undergraduate colleges, research universities, public and private, small and large); types of curricular and classroom approaches (e.g. ethics exercises and assignments, role playing games, case studies); and the courses to be targeted (including introductory programming, algorithm design, AI, data science, cybersecurity and more).

For more information on the Responsible Computer Science Challenge and the 17 award winners, see https://foundation.mozilla.org/en/initiatives/responsible-cs/challenge/ and https://blog.mozilla.org/blog/2019/04/30/2-4-million-in-prizes-for-schools-teaching-ethics-alongside-computer-science/.  

By Bobby Schnabel, Partner Representative

AI is automated decision-making, and it accelerates century-old algorithmic methods

Abstract: Artificial intelligence (AI) is automated decision-making, and it builds on quantitative methods which have been pervasive in our society for at least a hundred years. This essay reviews the historical record of quantitative and automated decision-making in three areas of our lives: access to consumer financial credit, sentencing and parole guidelines, and college admissions. In all cases, so-called “scientific” or “empirical” approaches have been in use for decades or longer. Only in recent years have we as a society recognized that these “objective” approaches reinforce and perpetuate injustices from the past into the future. Use of AI poses new challenges, but we now have new cultural and technical tools to combat old ways of thinking.

Introduction

Recently, concerns about the use of Artificial Intelligence (AI) have taken center stage. Many are worried about the impact of AI on our society.

AI is the subject of much science fiction and fantasy, but simply put, AI is automated decision-making. A bunch of inputs go into an AI system, and the AI algorithm declares an answer, judgment, or result.

This seems new, but quantitative and automated decision-making has been part of our culture for a long time: 100 years or more. While it may seem surprising now, the original intent in many cases was to eliminate human bias and create opportunities for disenfranchised groups. Only recently have we begun to recognize that these “objective” and “scientific” methods reinforce the structural barriers that underrepresented groups face.

This essay reviews our history in three areas in which automated decision-making has been pervasive for many years: decisions for awarding consumer credit, recommendations for sentencing or parole in criminal cases, and college admissions decisions.

Consumer credit

The Equal Credit Opportunity Act, passed by the U.S. Congress in 1974, made it unlawful for any creditor to discriminate against any applicant on the basis of “race, color, religion, national origin, sex, marital status, or age” (ECOA 1974).

As described by Capon (1982), “The federal legislation was directed largely at abuses in judgmental methods of granting credit. However, at that time judgmental methods that involved the exercise of individual judgment by a credit officer on a case-by-case basis were increasingly being replaced by a new methodology, credit scoring.”

As recounted by Capon, credit scoring systems were first introduced in the 1930s to extend credit to customers as part of the burgeoning mail order industry. With the availability of computers in the 1960s, these quantitative approaches accelerated. The “credit scoring systems” used anywhere from 50 to 300 “predictor characteristics,” including features such as the applicant’s zip code of residence, status as a homeowner or renter, length of time at present address, occupation, and duration of employment. The features were processed using state-of-the-art statistical techniques to optimize their predictive power, and make go/no-go decisions on offering credit.
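To make the mechanics concrete, here is a minimal sketch of a points-based scorecard that turns a few predictor characteristics into a go/no-go decision. The point values, the cutoff, and the names `Applicant`, `credit_score`, and `decide` are all hypothetical, chosen only for illustration; the systems Capon describes used 50 to 300 characteristics with statistically derived weights.

```python
from dataclasses import dataclass


@dataclass
class Applicant:
    homeowner: bool
    years_at_address: float
    years_employed: float


CUTOFF = 40  # hypothetical approval threshold


def credit_score(a: Applicant) -> int:
    """Add up hypothetical points for each predictor characteristic."""
    score = 25 if a.homeowner else 10
    score += min(int(a.years_at_address), 10) * 2  # capped at 10 years
    score += min(int(a.years_employed), 10) * 3    # capped at 10 years
    return score


def decide(a: Applicant) -> str:
    """Turn the score into a go/no-go credit decision."""
    return "go" if credit_score(a) >= CUTOFF else "no-go"


established = Applicant(homeowner=True, years_at_address=8, years_employed=5)
newcomer = Applicant(homeowner=False, years_at_address=1, years_employed=2)

print(credit_score(established), decide(established))  # 56 go
print(credit_score(newcomer), decide(newcomer))        # 18 no-go
```

Nothing in this sketch looks at the applicant as a person. The decision is entirely a function of which characteristics were chosen and how many points each was assigned.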

As Capon explains, in the years immediately after passage of the ECOA, creditors successfully argued to Congress that “adherence to the law would be improved” if these credit scoring systems were used. They contended that “credit decisions in judgmental systems were subject to arbitrary and capricious decisions” whereas decisions made with a credit scoring system were “objective and free from such problems.”

As a result, Congress amended the law with “Regulation B,” which allowed the use of credit scoring systems on the condition that they were “statistically sound and empirically derived.”

This endorsed companies’ existing use of actuarial practices to indicate which predictor characteristics had predictive power in determining credit risk. Per Capon: “For example, although age is a proscribed characteristic under the Act, if the system is statistically sound and empirically derived, it can be used as a predictive characteristic.” Similarly, zip code, a strong proxy for race and ethnicity, could also be used in credit scoring systems.

In essence, the law of the United States ratified the use of credit scoring algorithms that discriminated, so long as the algorithms were “empirically derived and statistically sound,” subverting the original intent of the 1974 ECOA law. You can read the details yourself; it does actually say this (ECOA Regulation B, Part 1002, 1977).

Of course, denying credit, or offering only expensive credit, to groups that historically have had trouble obtaining credit is a sure way to propagate the past into the future.
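The proxy effect can be sketched in a few lines. The example below uses made-up data: a tiny “model” learns the historical approval rate for each zip code and approves future applicants from zip codes where that rate was at least 50%. Group membership is never an input, yet because the hypothetical groups X and Y live mostly in different zip codes, the learned rule reproduces the old disparity. The zip codes, groups, threshold, and function names are all assumptions for illustration, not taken from any real scoring system.

```python
from collections import defaultdict

# Made-up history of past decisions: (zip_code, group, approved)
history = [
    ("11111", "X", True), ("11111", "X", True),
    ("11111", "X", True), ("11111", "Y", True),
    ("22222", "Y", False), ("22222", "Y", False),
    ("22222", "Y", True), ("22222", "X", False),
]


def train(history):
    """Learn the historical approval rate for each zip code."""
    approved = defaultdict(int)
    seen = defaultdict(int)
    for zip_code, _group, was_approved in history:
        seen[zip_code] += 1
        approved[zip_code] += was_approved
    return {z: approved[z] / seen[z] for z in seen}


def predict(model, zip_code):
    """Approve when the zip code's past approval rate was at least 50%."""
    return model[zip_code] >= 0.5


def approval_rate(model, group):
    """Share of a group's members that the learned rule would approve."""
    zips = [z for z, g, _ in history if g == group]
    return sum(predict(model, z) for z in zips) / len(zips)


model = train(history)
print(approval_rate(model, "X"))  # 0.75
print(approval_rate(model, "Y"))  # 0.25
```

Removing the protected attribute from the inputs does not remove its influence when another input stands in for it.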

Recommendations for sentencing and parole

In a deeply troubling, in-depth analysis, ProPublica, an investigative research organization, showed that a commercial, proprietary software system used to make parole recommendations to judges for persons who have been arrested is biased (Angwin et al., 2016).

As ProPublica reported, even though a person’s race/ethnicity is not part of the inputs provided to the software, the commercial software (called COMPAS, part of the Northpointe suite) is more likely to predict a high risk of recidivism for black people. In a less well-publicized finding, their work also found that COMPAS was more likely to over-predict recidivism for women than men.
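One common way to quantify “over-prediction” is the false positive rate: among people who did not reoffend, what fraction were nonetheless labeled high risk? The sketch below computes that rate per group on made-up records. The data, group labels, and function name are hypothetical; this is not ProPublica’s data and is not meant to reproduce their methodology.

```python
from collections import defaultdict

# Made-up records: (group, predicted_high_risk, reoffended)
records = [
    ("A", True, False), ("A", True, True),
    ("A", False, False), ("A", True, False),
    ("B", False, False), ("B", True, True),
    ("B", False, False), ("B", True, False),
]


def false_positive_rates(records):
    """Per group: share of non-reoffenders who were labeled high risk."""
    flagged = defaultdict(int)
    total = defaultdict(int)
    for group, predicted_high_risk, reoffended in records:
        if not reoffended:
            total[group] += 1
            if predicted_high_risk:
                flagged[group] += 1
    return {g: flagged[g] / total[g] for g in total}


rates = false_positive_rates(records)
print(rates)  # group A is about 0.67, group B is about 0.33
```

In this toy data, a person in group A who did not reoffend was twice as likely to have been labeled high risk as a comparable person in group B.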

What was not evident in the press coverage surrounding ProPublica’s work is that the US has been using standardized algorithms to make predictions on recidivism for nearly a century. According to Frank (1970), an early and classic work is a 1931 study by G. B. Vold, which “isolated those factors whose presence or absence defined a group of releasees with a high (or low) recidivism rate.”

Contemporary instruments include the Post Conviction Risk Assessment, which is “a scientifically based instrument developed by the Administrative Office of the U.S. Courts to improve the effectiveness and efficiency of post-conviction supervision” (PCRA, 2018); the Level of Service (LS) scales, which “have become the most frequently used risk assessment tools on the planet” (Olver et al., 2013); and Static-99, “the most commonly used risk tool with adult sexual offenders” (Hanson and Morton-Bourgon, 2009).

These instruments have undergone substantial and ongoing research and development, with their efficacy and limitations studied and reported upon in the research literature. By contrast, it is profoundly disturbing that commercial software that is closed, proprietary, and not based on peer-reviewed studies is now in widespread use.

It is important to note that Equivant, the company behind COMPAS, published a technical rebuttal of ProPublica’s findings, raising issues with their assumptions and methodology. According to their report, “We strongly reject the conclusion that the COMPAS risk scales are racially biased against blacks” (Dieterich et al., 2016).

Wherever the truth may lie, the fact that the COMPAS software is closed source prevents an unbiased review, and this is a problem.

College admissions decisions

At nearly one hundred years old, the SAT exam (originally known as the “Scholastic Aptitude Test”) is a de facto national exam in the United States used for college admission decisions. In short, it “automates” some (or much) of the college admissions process.

What is less well-known is that the original developers of the exam intended it to “level the playing field”:

When the test was introduced in 1926, proponents maintained that requiring the exam would level the playing field and reduce the importance of social origins for access to college. Its creators saw it as a tool for elite colleges such as Harvard to use in selecting deserving students, regardless of ascribed characteristics and family background (Buchmann et al., 2010).

Of course, we all know what happened. Families with access to financial resources hired tutors to prep their children for the SAT, and a whole industry of test prep centers was born. The College Board (publisher of the SAT) responded in 1990 by renaming the test the Scholastic Assessment Test, reflecting the growing consensus that “aptitude” is not innate, but something that can be developed with practice. Now, the test is simply called the SAT, a change which the New York Times reported on with the headline “Insisting it’s nothing” (Applebome, 1997).

Meanwhile, contemporary research continues to demonstrate that children’s SAT scores correlate tightly with their parents’ socioeconomic status and education levels (“These four charts show how the SAT favors rich, educated families,” Goldfarb, 2014).

The good news is that many universities now allow students to apply for admission as “test-optional”; that is, without needing to submit SAT scores or those from similar standardized tests. Students are evaluated using other metrics, like high school GPA and a portfolio of their accomplishments. This approach allows universities to admit a more diverse set of students while still verifying that they are academically qualified and college-ready.

What are the takeaways?

There are three main lessons here:

1. Automated decision-making has been part of our society for a long time, under the guise of it being a “scientific” and “empirical” method that produces “rational” decisions.

It’s only recently that we are recognizing that this approach does not produce fair outcomes. Quite to the contrary: these approaches perpetuate historical inequities.

2. Thus today’s use of AI is a natural evolution of our cultural proclivities to believe that actuarial systems are inherently fair. But there are differences: (a) AI systems are becoming pervasive in all aspects of decision-making; (b) AI systems use machine learning to evolve their models (decision-making algorithms), and if those decision-making systems are seeded with historical data, the result will necessarily be to reinforce the structural inequities of the past; and (c) many or most AI models are opaque—we can’t see the logic inside of them used to generate decisions.

It’s not that people are intentionally designing AI algorithms to be biased. Instead, it’s a predictable outcome of any model that’s trained on historical data.

3. Now that we are realizing this, we can have an intentional conversation about the impact of automated decision-making. We can create explicit definitions of fairness—ones that don’t blindly extend past injustices into the future.
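To illustrate what an “explicit definition of fairness” can look like in practice, the sketch below checks two definitions from the fairness literature on made-up decisions: demographic parity (groups are selected at equal rates) and equal opportunity (qualified members of each group are selected at equal rates). Neither definition is named in this essay; they are offered only as examples, and the data and function names are hypothetical.

```python
# Made-up decisions: (group, selected, actually_qualified)
decisions = [
    ("X", True, True), ("X", True, False),
    ("X", False, True), ("X", False, False),
    ("Y", True, True), ("Y", False, True),
    ("Y", False, False), ("Y", False, False),
]


def selection_rate(decisions, group):
    """Share of the group that was selected (demographic parity compares these)."""
    rows = [d for d in decisions if d[0] == group]
    return sum(selected for _, selected, _ in rows) / len(rows)


def true_positive_rate(decisions, group):
    """Share of the group's qualified members who were selected
    (equal opportunity compares these)."""
    rows = [d for d in decisions if d[0] == group and d[2]]
    return sum(selected for _, selected, _ in rows) / len(rows)


parity_gap = selection_rate(decisions, "X") - selection_rate(decisions, "Y")
opportunity_gap = (true_positive_rate(decisions, "X")
                   - true_positive_rate(decisions, "Y"))

print(parity_gap)       # 0.25
print(opportunity_gap)  # 0.0
```

On this toy data the same set of decisions satisfies equal opportunity but not demographic parity. Choosing which definition to enforce is a question of values rather than of computation, which is why the conversation needs to be intentional.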

In general, I am an optimist. Broadly, technology has vastly improved our world and lifted many millions of people out of poverty. Artificial Intelligence is presently being used in many ways that create profound social good. Real-world AI systems perform early, non-invasive detection of cancer, improve crop yields, achieve substantial savings of energy, and many other wonderful things.

There are many initiatives underway to address fairness in AI systems. With continued social pressure, we will develop technologies and a social contract that together create the world we want to live in.

Acknowledgments: I am part of the AI4K12 Initiative (ai4k12.org), a joint project of the Association for the Advancement of Artificial Intelligence (AAAI) and the Computer Science Teachers Association (CSTA), and funded by National Science Foundation award DRL-1846073. We are developing guidelines for teaching artificial intelligence in K-12. With my collaborators, I have had many conversations that have contributed to my understanding of this field. I most especially thank David Touretzky, Christina Gardner-McCune, Deborah Seehorn, Irene Lee, and Hal Abelson, and all members of our team. Thank you to Irene and Hal for feedback on a draft of this essay. Any errors in this essay are mine alone.

Fred Martin, Chair of Board of Directors

References

Angwin, J., Larson, J., Mattu, S., & Kirchner, L. (2016). Machine bias. ProPublica, May 23. Retrieved from https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing.

Applebome, P. (1997). Insisting it’s nothing, creator says SAT, not S.A.T. The New York Times, April 2. Retrieved from https://www.nytimes.com/1997/04/02/us/insisting-it-s-nothing-creator-says-sat-not-sat.html.

Buchmann, C., Condron, D. J., & Roscigno, V. J. (2010). Shadow education, American style: Test preparation, the SAT and college enrollment. Social forces, 89(2), 435–461.

Capon, N. (1982). Credit scoring systems: A critical analysis. Journal of Marketing, 46(2), 82–91.

Datta, A., Tschantz, M. C., & Datta, A. (2015). Automated experiments on ad privacy settings. Proceedings on privacy enhancing technologies, 2015(1), 92–112.

Dieterich, W., Mendoza, C., & Brennan, T. (2016). COMPAS risk scales: Demonstrating accuracy equity and predictive parity. Northpointe Inc. Retrieved from http://go.volarisgroup.com/rs/430-MBX-989/images/ProPublica_Commentary_Final_070616.pdf.

ECOA (1974). Equal Credit Opportunity Act, 15 U.S. Code § 1691. Retrieved from https://www.law.cornell.edu/uscode/text/15/1691.

Frank, C. H. (1970). The prediction of recidivism among young adult offenders by the recidivism-rehabilitation scale and index (Doctoral dissertation, The University of Oklahoma).

Goldfarb, Z. A. (2014). These four charts show how the SAT favors rich, educated families. The Washington Post, March 5. Retrieved from https://www.washingtonpost.com/news/wonk/wp/2014/03/05/these-four-charts-show-how-the-sat-favors-the-rich-educated-families/.

Hanson, R. K., & Morton-Bourgon, K. E. (2009). The accuracy of recidivism risk assessments for sexual offenders: a meta-analysis of 118 prediction studies. Psychological assessment, 21(1), 1.

PCRA (2018). Post Conviction Risk Assessment. Retrieved from https://www.uscourts.gov/services-forms/probation-and-pretrial-services/supervision/post-conviction-risk-assessment.

Call for Input: K-12 Content on Computing, Ethics and Social Responsibility

I’ll start with the punch line: I’m starting to get involved with understanding what innovative approaches are appearing in higher education throughout the world in educating students about the intersection of computing with ethics and social responsibility. I’m sure there are some equally innovative things going on at the K-12/pre-university level. If you’re involved with education in this area, or if you know of interesting work that others are doing, I’d love to hear from you – just email me at bobby@colorado.edu. In subsequent blog posts I will share things that I’ve learned, from you and from the higher education community.

I doubt one needs to say why this topic is important. Once upon a time, computer science was far removed from societal implications. We worked on writing operating systems and compilers – the things that go on inside the computer – or applications in business data processing and scientific computing. When computers impacted society, that impact was fairly far removed from what the computer scientist had worked on directly.

How times have changed! Computing professionals often now work on applications that directly impact the basic fabric of our society. This can be social network software that for many of us has become a dominant form of human interaction; or robotic systems that are or will be used as substitutes for human interaction in eldercare and maybe even childcare; or artificial intelligence systems that are used as the basis for making judgments in situations ranging from loan applications to judicial sentencing; and dozens more that you can readily add to this list.

The implication is that not only do computing professionals need to be taught, as a topic just as fundamental as programming or machine learning, to think in terms of the ethics and social implications of what they do – but that every citizen needs to have this perspective as well, as they deal with computing systems that are ubiquitous in our society. Creators of computing systems need to apply high moral and ethical standards to their work and learn to think about the consequences, intended or not, of the systems they create; users need to realize that computing tools may have biases or harmful consequences, and aren’t necessarily perfectly trustworthy just because they come from a “machine”. This means that all students need to be exposed to these perspectives, beginning when they start learning computing in schools. I look forward to learning what you may be doing in this regard or just hearing your thoughts on this topic!

Bobby Schnabel, Partner Representative