Berkeley Unboxing Data Science (BUDS): Engaging High Schoolers in Data Science Research


October 9, 2020

Berkeley Unboxing Data Science (BUDS) is a new Computing, Data Science, and Society (CDSS) summer program that immerses high school students in the world of data science and research. It aims to teach students not only how to perform data analysis, but also how to critically examine the data and technology in their daily lives. “It created a space for students of underrepresented minorities to learn about research and data science,” Ashley Quiterio, the program coordinator of BUDS, explains. “Data is so important in these times, and we helped them develop a critical eye for the data that is being collected around them.”

The 5-week program ran from June 22 to July 23 and met four days a week from 12 pm to 5 pm. The students learned how to use the Python data science library, worked on projects that used real-world data, and attended presentations by data science undergraduates and professors. By the end of the five weeks, BUDS had refined the students’ understanding of the computational approaches, applications, and possibilities of data science.

Daily Routine

The 12 high school students in the program were split into teams of four. Each group was paired with a data science undergraduate, who acted as the team lead, and a CalTeach student, who specializes in effective teaching strategies.

“Interacting with my interns was my favorite part of the day; seeing those light bulbs light up and their excitement trying a new Python skill was just amazing,” says Maryo Felfel, a CalTeach Co-Teacher.

The high school students spent a large portion of the day within their teams, though there were also ample inter-team activities and program-wide events. At the end of each day, students documented their reflections (what they learned, their progress, and any questions they had) in a scientific journal.

One major focus of BUDS was the context and consequences of data science. Every Monday and Wednesday, students discussed a current event, such as COVID-19, and the ethics of how data science was being applied to it. On Tuesdays and Thursdays, BUDS invited undergraduates to talk about their college experience and professors to explain their current research.

Python and Projects in Real-World Contexts 

During the first two weeks, students were introduced to the computational basics of data science. They learned Python through Jupyter notebooks and were given quizzes and worksheets to solidify the coding concepts. “The focus was on using code as a tool to answer questions,” Quiterio says. “A large part of this is teaching the students about data visualizations: how to read, understand, and create them.” The students also learned how to manipulate tables (e.g., filtering, grouping). 
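As a rough illustration of what filtering and grouping a table can look like in code, here is a minimal sketch in plain Python. It is not taken from the BUDS curriculum: the table of books, the column names, and the helper functions `filter_rows` and `group_mean` are all made up for this example.

```python
from collections import defaultdict

# Made-up table for illustration only; not data used in the program.
rows = [
    {"genre": "fiction", "pages": 320},
    {"genre": "science", "pages": 210},
    {"genre": "fiction", "pages": 180},
    {"genre": "history", "pages": 450},
    {"genre": "science", "pages": 390},
]


def filter_rows(table, predicate):
    """Filtering: keep only the rows for which predicate(row) is true."""
    return [row for row in table if predicate(row)]


def group_mean(table, key, value):
    """Grouping: collect rows by the `key` column, then average `value` per group."""
    groups = defaultdict(list)
    for row in table:
        groups[row[key]].append(row[value])
    return {name: sum(vals) / len(vals) for name, vals in groups.items()}


# Filter: which books have at least 300 pages?
long_books = filter_rows(rows, lambda row: row["pages"] >= 300)

# Group: what is the average page count within each genre?
mean_pages = group_mean(rows, "genre", "pages")

print(long_books)
print(mean_pages)
```

Filtering narrows a table down to the rows that matter for a question, while grouping summarizes the rows that share a label. Together they are often the first two steps in turning a raw table into an answer.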

In the third week, the students applied what they learned to their first project. They implemented the data science lifecycle (formulating a question, data acquisition, exploratory data analysis, and drawing inferences) as they conducted background research on confirmed COVID-19 cases, used Jupyter Notebook for data analysis, created visualizations comparing growth rates across states, and presented their results and conclusions at the end of the week. The project helped tie together the concepts taught in the previous weeks, as noted by a BUDS student: “When I worked on the first project, I saw what we were learning come together. I could see from our results that I really understand what we were doing and the story we were trying to tell so I felt confident to present and share my ideas.”  

The fourth week was data justice week, in which each day featured an environmental or social justice topic. The instructors gave context to the issue and then walked students through a modified Jupyter notebook to show, for example, how to use the data to find the correlation between asthma and pollution. Other topics included mapping toxic releases and disadvantaged communities, the use of force in Minneapolis policing, and mass incarceration in California prisons. The environmental justice content personally resonated with a BUDS student, who says, “When we were discussing how big companies set up in poorer neighborhoods, I realized that we were talking about my neighborhood and it opened my eyes to neighborhood issues.”

The data justice week encouraged students to consider the implications of data science for different communities. This analytical ability came into play in the fifth and final week, when the student teams worked on their own projects. One group looked at how the top songs on Spotify changed over time, another group investigated patterns in President Trump’s tweets, and the third group studied racial bias in soccer. The final project represented the culmination of what they learned: how to analyze data, create visualizations, draw conclusions, and present their project, all while considering the context and implications of their work.


By the end of the program, students gained proficiency in coding and learned how to conduct and present data science research. Additionally, the current event discussions, guest presentations, and data justice week helped students understand the importance of context in data analysis and how some uses of data science can negatively affect marginalized communities. Karla Palos Castellanos, a BUDS team lead, was impressed by the students’ performance.

“For many of them, it was their first time learning concepts of data science or coding, yet every single day they showed up ready to learn,” she commented. “By the end of the program, I was amazed by their progress and ability to grasp onto big ideas even in a very fast-paced environment.” 

BUDS also built a community where high school students could easily connect with others and collaboratively analyze data. According to Palos Castellanos, the community was the most exciting part of BUDS: “This program was much more than data science, it was a place where we could talk about topics including ethics and data justice, career and professional development, impostor syndrome and sense of belonging. Together, we discovered, we taught, we learned, we grew, we cried, and we laughed.”

Most of all, BUDS empowered high school students to see themselves as data scientists. “We worked on redefining what it means to be a researcher or a data scientist, and encouraged the students to reflect on their perceptions of identity,” Quiterio says. “They are all data scientists, but it was difficult for them to believe in themselves.” Now, after completing BUDS, the students are armed with the tools to critically analyze data in context, the very foundation of what it means to be a data scientist.