Workshop focuses on adding human contexts to data science classes

At the National Data Science Education Workshop hosted virtually June 14-18 by UC Berkeley, many sessions focused on getting data science classes into universities and getting students into those classes. But one discussion within the larger program looked at helping teachers introduce human contexts into class instruction, with an eye toward using data science to advance social justice.

The June 15 workshop on “Towards Social Justice in the Data Science Classroom” was organized by Berkeley’s Human Contexts and Ethics Program (HCE), which is part of Berkeley’s Division of Computing, Data Science, and Society (CDSS). The session drew 75 registrants from colleges and universities across the U.S., as well as from Canada, Germany, Austria, India, and China.

Embedding human contexts and the social good is at the heart of Berkeley’s Data Science major, said Margo Boenig-Liptsin, director of the HCE Program.

“There’s a sense that by studying data science and by practicing data science each in their own way, students should be able to learn about and contribute to a more just world,” Boenig-Liptsin said. “In order for that to happen, what should we be teaching and how should we be teaching it?”

Making a better world

One of the first lessons HCE wants to instill in Berkeley students is that when they are developing an algorithm, building a technical device, or creating an argument with data analysis, they are engaged in remaking and reshaping the broader social world and not just making a single tool for one purpose. This perspective recognizes that a just society is not only one in which benefits and harms are distributed in an equitable way, but also one in which all people are represented, feel that they belong, and can shape it according to their visions.

“A world-making perspective invites us to ask about what kind of world is built. What sort of entities exist in it? What kinds of relationships, what kinds of practices are there?” Boenig-Liptsin said. “This perspective is in contrast to a more cost-benefit kind of evaluation of technology.”

It’s important, she added, to think about the world shaped by technology from the perspective of people who do not fit in but who can’t simply walk away. What can be done to make sure they are not only included but able to help shape the world according to their designs?

Pivoting to pedagogy

To frame that discussion, Hani Gomez, who recently earned her Ph.D. in electrical engineering from Berkeley, spoke about the tenets of socially just pedagogy. Although her Ph.D. work focused on building microrobots and working in clean rooms, Gomez said she is now pivoting toward education and social justice. Along with Boenig-Liptsin, Gomez is on the advisory board for a new anti-racism class offered by Berkeley’s Department of Electrical Engineering and Computer Science (EECS).

Although technology is often seen as a solution, Gomez advised attendees to avoid the idea of “tech solutionism” and remember that technology can perpetuate social issues and has a long history of being used for oppression. During World War II, the Nazis used census data cards and card readers made by IBM to identify Jews throughout Europe. In the United States, census data was also used to facilitate the forced displacement and incarceration of Japanese-Americans.

Nor is technology morally neutral, as many think, Gomez said. Decisions made in the design of new technologies can overlook biases or transfer them. As an illustration, Gomez showed a video of an automatic soap dispenser that used light to detect a hand before squirting out a dollop of soap. The machine worked for white users, but it wasn’t calibrated to detect a Black person’s hand and didn’t dispense soap until the person took a white paper towel. Even if the designer did not realize this would happen, the user still experienced racism, Gomez said.

“This is an example of how internal biases can be embedded in technology,” Gomez said.

Anti-racist pedagogy is an ongoing process of learning that evolves beyond the classroom, she said, calling it a “mutual learning process between all partners” who can share their knowledge and personal experiences. It’s not just about covering content related to race and racism but about implementing an anti-racist approach to learning, Gomez said, adding that faculty have the power and the position to make positive changes in the classroom.

Faculty also have to be careful not to assume that all students have access to the same resources and should be willing to learn from their students. Gomez said the most valuable lesson she has learned is the ability to say, “I’m sorry, I didn’t know better” when a student calls her out on a comment.

Four lenses

Boenig-Liptsin offered teachers four different “lenses,” or core social science concepts, to help students understand the social and ethical contexts that are important for their work, as well as to help teachers shape their curricula.

The first is positionality, being aware of how one’s identity, expertise and power affect one’s perspectives, aspirations and actions. This can also affect how a person approaches problems, develops solutions and implements them. It’s important to think about who we are, as well as how others see us.

Power is the second lens: the asymmetric power of a person or technology to alter the behavior of others. As the saying goes, knowledge is power, which could be in the form of insights gleaned through data science. But so is the act of producing knowledge, and it’s important to teach students to think critically about power. “We really come to be who we are through power relationships,” Boenig-Liptsin said.

The third lens is sociotechnical systems, or organizations in which the actions of people and technologies are intertwined. These systems distribute risks, responsibilities, and opportunities widely and unevenly, which poses unique challenges for governance. The consideration of systems gives students a larger view of layered structures and can encourage them to think about their position as data scientists in that structure.

The final lens is narratives; data science can shape broader narratives in society, which people use to explain who they are and how the world works, as well as what needs to be done and what futures are possible. Having students wrestle with data science’s relationship with narrative is a good way to teach about social justice in classrooms, Boenig-Liptsin said.

Using the lenses to focus on injustice

To show how these lenses could be put into action, Maria Smith, a fourth-year Ph.D. student in sociology who is a National Science Foundation Graduate Research Fellow and a graduate student researcher for HCE, presented a case study from Berkeley’s Data 100 class. The case study focuses on the role of data science in uncovering -- and responding to -- corruption and racial discrimination in assessing the value of homes in Cook County, Illinois. An investigation by the Chicago Tribune found that the county assessor’s office failed to accurately assess home values for many years, which led to a regressive tax system that harmed the poor and helped the rich. The investigation also found that the process for appealing tax decisions favored wealthier homeowners.

Once the problem was identified -- homeowners in predominantly Black neighborhoods were responsible for a greater share of the county’s tax burden -- a solution became apparent. The new Cook County assessor replaced the old model with a new machine learning-based model supplemented with additional data and created an open data initiative making the data freely accessible on GitHub. The new model significantly increased the accuracy of predicted home values, as compared to market values.
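
For readers who want a concrete picture of what “regressive” means for an assessment model, the following is a minimal sketch rather than the method used by the Tribune, the assessor’s office, or Data 100. It compares each home’s assessed value with its sale price and computes the price-related differential, a statistic commonly used in assessment ratio studies (the mean ratio divided by the value-weighted ratio). All figures and variable names are hypothetical and do not come from Cook County data.

```python
from statistics import mean


def sales_ratios(assessed, sold):
    """Return the assessed-value-to-sale-price ratio for each home."""
    return [a / s for a, s in zip(assessed, sold)]


def price_related_differential(assessed, sold):
    """Mean ratio divided by the value-weighted mean ratio.

    Values noticeably above 1 suggest that lower-priced homes are
    assessed at a higher share of their market value than
    higher-priced homes, i.e. that the assessments are regressive.
    """
    weighted_ratio = sum(assessed) / sum(sold)
    return mean(sales_ratios(assessed, sold)) / weighted_ratio


# Hypothetical sale prices and two sets of hypothetical assessments.
sale_prices = [100_000, 150_000, 400_000, 800_000]
old_assessed = [120_000, 165_000, 360_000, 640_000]  # overvalues cheaper homes
new_assessed = [102_000, 147_000, 404_000, 792_000]  # close to sale prices

old_prd = price_related_differential(old_assessed, sale_prices)
new_prd = price_related_differential(new_assessed, sale_prices)

print(f"Old model PRD: {old_prd:.3f}")
print(f"New model PRD: {new_prd:.3f}")
```

In this made-up example, the first set of assessments overvalues the two cheaper homes and undervalues the two more expensive ones, so its differential sits well above 1, while the second set tracks sale prices closely and stays near 1. A single number like this cannot settle whether a system is fair, which is the point of the questions the case study raises.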

For the exercise, Data 100 students consider the relationship between a model, the sociotechnical system -- including legal and racial components -- in which the model was developed, and the historical and institutional contexts in which it works. Students examine the relationships between the accuracy of the model and objectives of fairness and reflect upon the opportunities and limits of data science expertise in making a fair assessment system. Smith offered the following questions that students learn to ask and answer by working through the case, matched to the four lenses:

Positionality: Who are the stakeholders? Who does the work benefit? Do the people working on the solution have the relevant expertise? Are others who could contribute being overlooked?
Power: Who decides what is fair? What do people need to see or know to verify that the system is fair? Who has access to the data? Who benefits from restricting access to the data?
Sociotechnical systems: How do history and the opportunity to use technology as part of the solution frame the problem that data science is brought on to solve? What role did “redlining” and risk assessment play in creating the problem and developing the solution? Who trained the machine learning models?
Narratives: What concepts of justice informed the data collection and model? What classification categories are used in the data? How does your interpretation or bias influence the story being told?

These questions can be applied to other real-world cases students may encounter in their coursework and beyond. The HCE theory of change posits that if students are prepared to think proactively about questions of positionality, power, sociotechnical systems, and narratives, and are equipped with a historical understanding of their work, the next generation of data scientists will not just identify issues of social justice in data and computing but also be able to use these tools responsibly in making a more just society.

In concluding, Smith offered a checklist for instructors looking to introduce social justice into their classes:

Consider a case of inequity or injustice.
Invite perspectives from marginalized groups or social science domains to help.
Think through the stakeholders, historical context, and harmful implications.
Create problem sets that enable students to practice data cleaning, statistical modeling and visualization, and communicating their results to their audiences.
Encourage constant engagement with positionality, power, sociotechnical systems, and narratives throughout the data science workflow, class discussions, and homework.