UC Berkeley’s Data Science Undergraduate Studies program continues to cultivate the growth of data science education at colleges and universities across the country. Led by a dedicated adoption team at the Division of Computing, Data Science, and Society, many institutions have been able to emulate and adapt Berkeley's pedagogy in establishing local programs.

One of the key parts of this effort is hosting an annual workshop for instructors. From June 14 to 18, more than 400 data science instructors met virtually for Berkeley’s fourth National Workshop on Data Science Education to share success stories, offer advice to each other, and discuss how to make the programs even more relevant for students. The online event was coordinated by the Division of Computing, Data Science, and Society and sponsored by Microsoft and the West Big Data Innovation Hub.

The first two days showcased curricular materials and both the theoretical and technical underpinnings of offerings at UC Berkeley. The following three days consisted of panel discussions composed of professors and teachers from across the country on a variety of topics.

This year’s program also included panel discussions on several new topics: growing momentum to introduce data science at the high school level, and reframing the teaching of math for students interested in data science. Another panel session that was new this year looked at how schools are partnering with local government and non-profit agencies to give students real-world, hands-on experience in using data science to tackle local problems. 

Bringing data science to K-12

Zarek Drozda, a data science fellow at the U.S. Department of Education, opened the session on high school instruction by noting that the data science momentum in higher education has been building for five years, due in large part to the efforts at Berkeley, but it’s just starting to ratchet up in the K-12 grades.

Jo Boaler, a professor of education at Stanford University and former Marie Curie Professor of Mathematics Education in England, is currently co-leading a K-12 data science initiative at Stanford called Youcubed.

“We have been teaching the content of high school courses since 1892 and the mathematics in high school has really not changed since then, until now. We’re actually seeing changes in high school mathematics,” said Boaler, who is one of five writers of a new math framework for the state of California. “We’re centralizing data science across the grades, not only in high school.”

This effort was hugely helped by the UC and Cal State University systems agreeing that data science and statistics would be accepted as alternatives to algebra 2 for entrance to the schools, Boaler said. She credited fellow panelist Rob Gould with helping drive that change.

“This is amazing news partly because we’ve really only had one mathematics pathway for generations (calculus) that has dominated admissions, and that pathway is deeply inequitable,” Boaler said, adding that in order to meet the current high school calculus requirement, students need to be on the advanced math track by middle school, a decision schools make by sixth grade based on fourth-grade test scores. 

“Data science can be rolled out as an equitable option for students and they don’t have to be advanced in middle school or do well in algebra or geometry,” Boaler said. “It lends itself to a project-based approach, which we know engages many more students.”

She cited the 2021 COMAP Mathematical Contest in Modeling/Interdisciplinary Contest in Modeling, in which undergraduate students in teams of three apply mathematics to model, develop, and communicate a solution to a real-world problem. In the Feb. 4-6 contest, 47 percent of the participants and winners were young women, compared to about 5 percent in most math competitions. Importantly, 67 percent of the participants said the experience changed their thinking about future pathways.

At the North Carolina School of Science and Mathematics, high school math teacher Taylor Gibson said that more students are trying to get into data science and statistics, saying they don’t want to take calculus. Gibson attended the first Berkeley workshop, where he learned about Data 8 and thought “I bet we could pull that off.”

The school is publicly funded through the University of North Carolina for students in grades 11 and 12. In Durham, 680 students live on the campus and another 400 attend online courses. In the fall, 300 more students will live at a Morganton campus emphasizing data science.

“Data 8 was the right choice for us,” Gibson said, because the data science program presented a clean slate. “We get to define the culture,” he said. “We give away as much as we can,” referring to the publicly available tools, books, assessments, and more. Best of all, he said, it doesn’t take a lot of money to get started. Sixty-eight students took the inaugural Foundations of Data Science class in the past year and 21 incoming students have requested it for next year. The school also offers a more advanced course based on Berkeley’s Data 100 class.

The main challenge, which was echoed across several workshop panels, is finding teachers who can teach data science, Gibson said. 

Gould, who helped convince the UC and CSU systems to change their math requirements, described the National Science Foundation-funded Introduction to Data Science program. The program was developed by the UCLA Department of Statistics in 2012, with contributions from the school’s Department of Computer Science and School of Education and Information Science, in collaboration with the Los Angeles Unified School District.

The program’s freely available materials were released in 2014, about the same time California implemented the Common Core curriculum, which gave schools more flexibility. Gould said that the goal was not to prep students for college or to turn them into data scientists, but to help them develop statistical and computational thinking, as both a civic responsibility and a right. “We need stronger data acumen for everyone,” Gould said. Currently, 15 school districts are using the program, as are a few state departments of education, reaching about 15,000 students.

Gould offered some lessons learned:

  • Build on known research about how students learn about data, which goes back about 30 years.

  • Emphasize the process, not the tools. It’s about problem-solving, thinking about thinking.

  • Students should become fundamentally aware of distortion when viewing any data.

  • Data collection should use different methods and data types, with students acting as human sensors to gather data about processes around them.

  • Develop coding skills.

  • Collaborative, student-centered learning works better than lectures; let them investigate issues important to them.

  • Provide professional development for teachers who don’t know the content. In particular, student-driven discussions can quickly grow personal and/or political, unlike in most math classes, and teachers need to be prepared to handle these situations without alienating students. 

Watch the High School Data Science panel discussion.

A vision for math in Data Science

A workshop panel on June 19 dove more deeply into how math courses can better align with data science instruction. The rise of data science as a field of study has led to a re-evaluation of math pathways, stemming from a need for students to have practical knowledge of linear algebra. The panelists agreed that data science offers an opportunity to replace proof-based teaching of math with practical project-based examples that motivate students to learn and understand the underlying math.

Tim Chartier, a professor of mathematics at Davidson College in North Carolina, is well known for his work in Bracketology, the art and science of predicting which team will win the men’s and women’s NCAA March Madness tournaments. He uses the tournaments to help students understand analytics, showing them that understanding data science can lead to more accurate predictions. When he first gave this as a homework assignment in 2011, his students did better than 99.9 percent of the 4 million predictions. Most recently, a student with little interest in basketball best predicted the outcome, winning ice cream from the local Ben & Jerry’s as her reward.

“Heavy basketball fans actually don’t do as well because they try to overfit the data to their opinions, which is usually not a good idea,” Chartier said.

Students also work on projects relevant to their community, such as mapping the effects of gentrification on marginalized neighborhoods along a main road in Charlotte or helping distribute food grown on the college’s farm.

Chartier said he encourages students to think about data in and out of the classroom as it helps them develop a good mindset toward data. That way when they are asked to study something, even if they don’t know how to go about it, they have the tools and confidence to start thinking about the solution.

At Grand Valley State University in western Michigan, math professor David Austin said the school offers a data science minor that requires a computer science or statistics course rather than a math class. Very few students are using calculus, favoring linear algebra instead. The school offers a two-course linear algebra sequence, with more emphasis on data science in the second course. The university has also reached out to a local high school and a number of students are dual-enrolled in both schools and “really thriving,” Austin said. “It’s a second pathway into university math.”

The minor requires a final capstone project, in which student teams work with community partners using real data, cleaning it and visualizing the findings. Projects include housing affordability and lead abatement in historically disadvantaged areas in nearby Grand Rapids.

“They really see that data is about people,” Austin said. “They need to communicate their findings to a non-technical audience in a way that gives them confidence in the findings.”

Austin said the goal is to help students see the value of math in the story, then use their training to help create models and better understand the data.

“We really want students to understand that behind everything we’re doing there is some mathematics, whether it’s a predictive algorithm or a data visualization,” said statistics professor Jo Hardin from Pomona College in southern California. “Data science without math doesn’t exist.”

A firm grounding in math will help students understand what’s going on in the machine learning algorithms, that they are “not some kind of magical thing,” Hardin said. This is important to help them build models and produce reproducible results. Data scientists should be able to think “fluently, if not deeply” about the math behind a model, said Hardin, who co-authored the paper “Ensuring that Mathematics Is Relevant In a World of Data Science.”

At Berkeley, Gireeja Ranade, an assistant teaching professor in the Department of Electrical Engineering and Computer Science, redesigned the curriculum to introduce linear algebra in the first year. She introduced key concepts by asking students what makes the Shazam app and the GPS on their phones work. What about building a touch screen? Other prompts include getting Netflix recommendations, ranking websites, and training deep learning neural networks.

“Math isn’t just some abstract equation, but is connected to all of these things they can tangibly think about and interact with,” Ranade said. “We try to show them through homework problems how they can do all of these things using the math concepts we’re teaching them.”

In fact, the two-semester sequence Ranade developed -- EECS 16A and 16B: Designing Information Devices and Systems I and II -- uses hands-on labs to give students a “tangible feel of abstract concepts.” In the first course, they learn about systems by building a single-pixel camera and then learn about design by building their own touch screen and seeing if it can detect touches. As an introduction to machine learning, they create their own acoustic GPS from scratch, which leads to “a lot of cool ‘Aha!’ moments,” Ranade said.

“What we’re trying to show them is the full pipeline of how a data scientist might think, that first, you have to collect data from the real world, then you make a model and process it,” she said. “And then you close the loop by taking that information and taking an action in the real world.”

Watch the Math for Data Science session.