Data scientists don’t just work in the field of data science. As big data tools permeate more and more of our social structure, they can also play a valuable role in communicating about how technological systems work and what they mean for the public. The UC Berkeley course Digital Accountability: Exploring Section 230 provided an opportunity for data science students and others to do just that.
This spring, students from Berkeley’s School of Journalism, School of Information and other professional graduate programs reported on Section 230, a federal law enacted in 1996 ensuring social media platforms are not legally responsible for content published by others on their platforms. They created podcasts on how this statute has impacted the modern-day spread of misinformation and potential ways to solve that crisis.
We spoke with Ian Castro, a second-year UC Berkeley School of Information graduate student who participated in this class. Castro spoke about the narrative he reported, how his data science background affected his approach to storytelling and what he learned from the experience about data science’s role in journalism. This interview has been edited for length and clarity.
Q: You’re a graduate student in the I School’s Information Management and Systems program. How did you decide you wanted to earn a master’s degree in data science?
A: When I first came to Berkeley, data science was just becoming a thing. By the time the major actually existed, I was probably in my sophomore or junior year. I took Data 8, CS 61A, and Data 100, but still felt like it was too late to major in it. But I always thought data science was an important skill to learn, so I figured better late than never. So I decided to get a School of Information master’s degree.
Q: We’re here to talk about your participation in one class. Can you tell us briefly what that class is?
A: The class is Journalism 276. Essentially, the goal of the class was to help address issues of misinformation and disinformation on the internet. What the professors wanted to do was take journalists who are interested in technology policy issues and pair them with experts from the other schools, like public policy, law and information. Our job was to use our expertise to help these journalists understand and tell stories about Section 230. It's an incredibly fundamental yet misunderstood part of our internet.
The story we spent the semester researching was about content moderation for young people and California’s related “Age Appropriate Design Code” bill, which was signed into law on Sept. 15. Other people looked at, for example, Donald Trump's social media platform, Truth Social, and all of the failed bills that aimed to reform Section 230. We covered a lot of different topics. We had a lot of support from the deans of each of the schools, as well as the professors. We worked with Queena Sook Kim from KQED and Aaron Glantz from Reveal. I got to speak to a lot of experts, too.
Q: Why did you decide to take this class?
A: I'm interested in journalism as a career. And because I’m interested in free speech online, I wanted to take this opportunity to explain this very simple, but also highly complex, law to people who don't really know about it. It also allowed me to see a lot of issues and miscommunication happening around this topic.
Q: How did you find your story?
A: I come from a data science background and my reporting partner, Andrew Lopez, had some experience reviewing online content. We realized that there are a lot of people who just don't know how content moderation works. What's the human’s role in relation to algorithms when looking at these videos that appear on your feed? We wanted to combine both of our experiences and expertise to try to give a good narrative explainer story on this issue.
Q: How did your data science background change how you approached reporting and telling that story?
A: The thing that I found to be difficult is that a lot of the people we talked to who are data scientists or work in the field don't know how to explain questions like, ‘How does an algorithmic recommender system work?’ to people outside of the field.
And then, on the other side, many journalists have no idea about some of the technical side, like how you define an algorithm or what a whitelist versus a blacklist is. That's where I ended up coming in – being that person who can kind of bridge between both worlds.
The benefit of the class was coming to realize that there is actually a big gap between a lot of journalists and the people working on these systems. It’s a problem that exists because a lot of data science training is highly specialized and inaccessible to most people. So, when problems arise, like election disinformation and vaccine misinformation, there’s a lot of confusion because people don’t understand how these systems work or the limitations that exist.
Q: So your data science expertise helped fill in those knowledge gaps for this project?
A: Yeah, exactly. Even though the key provision of Section 230 is only 26 words long, it has such a huge impact on how the internet works and what it actually allows companies to do. It was fundamental to creating the online industry we see today. And even though there are a lot of bad things online now, reforming the law might unintentionally stifle free speech and innovation. I think the role of a lot of the policy, data science and law people was to help explain these more complex topics.
Q: Why is it important to explain the systems and how they work, as opposed to just calling everything an algorithm?
A: One of the benefits of journalism is that you're bringing power back to the reader – to people who don't have the training or information that we, as people with a technical college education, have. We’re helping people understand how these big social media platforms work and how they contribute to larger societal issues like political polarization. Having that information lets people make better decisions and, hopefully, change things for the better.
Q: What do you believe is the role of data science in storytelling?
A: I think it depends on the role of the data scientists. For this class, as one of the few people who have a data science background, the goal was just to explain the more technical sides that people don't really see at all and make sure that we communicate them in a way that is fair, accurate and easy to understand.
The other side, though, is that data scientists can also work as data journalists, taking these large datasets and large sets of information and communicating them in easy-to-understand ways through visualizations or maps. For example, with housing, being able to prove that there are disparities or issues in a community with numbers helps to make an investigative story a lot stronger.
Q: Has participating in this class affected how you would approach a data science project that might not be journalism?
A: In graduate school, a lot of our professors push the fact that as a data scientist, you can do all the math. You can make all the models. You can do all this analysis. You can have a great finding. But if you can't convince someone that this is important, your findings won’t go anywhere.
Having majored in media studies in undergrad and working as a teacher, I've always been interested in communication. But I think this class really highlighted for me the disparity between non-technologists and technologists, and it emphasized the importance of being able to explain these issues to people who don't have a technical background.
Q: Is there anything you would like to clarify or emphasize or add?
A: One thing: In the class when we were first introducing ourselves, I realized I was the only technical person in the room. Everyone was either a journalist, lawyer or policy expert. It showed me there is a large need for data scientists who are interested in social good and willing to do this work in bridging gaps to provide technical expertise.
For people who are data science majors – undergraduates or people considering the major – you should still do data science. But you should think about other ways you can apply it, especially to fields where it could have a better impact.