Modules: Bringing Data to Every Classroom

Data Science Modules in MCB-32 Human Biology.


February 15, 2021

Although Data Science became an official Berkeley major only in Fall 2018, its versatility and applicability in almost every domain have driven its rapid expansion across campus. At Berkeley, data science can be found even in traditionally non-technical liberal arts classes, in the form of data science modules.

Data science modules are created by the module development team, with input from the course instructor, to guide students through the exploration and analysis of a dataset relevant to their course. The modules are built on Jupyter Notebooks, which are well suited to designing user-friendly walkthroughs of the material. Because they are hosted on the Data Hub, the notebooks require no setup. They support Python code and let students run blocks of code one at a time, which helps even the least experienced students follow along.

The first modules, which date back to Spring 2017, actually preceded the data science major. They began with early efforts in text analysis for humanities classes and have since maintained a long-standing collaboration with the American Cultures and Social Justice curriculum. The Modules team is now expanding into the social and life sciences: there is a pilot data hub with biology tools, as well as a recent connector course on Statistical Genomics. The team also has ongoing projects with quantitative social science classes in fields such as Sociology and Economics.

This semester, Berkeley classes across a variety of fields (biology, econometrics, cognitive science, political science, public health, and more) have incorporated data science modules into their curricula. In a module for Introductory Applied Econometrics, students carry out a series of calculations to estimate the linear relationship between GDP per capita and agricultural labor productivity. Introduction to Empirical Analysis and Quantitative Methods, a required course for political science students, includes modules that use data from the 2018 midterm election to teach sampling and confidence intervals.

In Biology 1B (General Biology), the class "used Jupyter Notebook to gather information on Berkeley's Strawberry Creek and to run randomization tests on large sample sizes," says Juliana Hartley, a student currently enrolled in the course. "The Strawberry Creek module was useful because I could analyze data that was larger than the sample I collected… [which] helped me understand the true diversity of Strawberry Creek." According to Hartley, the code was already written, so students simply had to run the cells to analyze their data.

William McEachen, one of the student leads of the module development team and a double major in Data Science and Political Science, hopes the modules will help students from non-technical backgrounds realize that anyone can grasp quantitative material. The team continuously maintains and improves existing modules to optimize students' educational experience, adjusting the length of the notebooks and balancing their technical complexity so that they challenge students without discouraging them. McEachen is currently working with the quantitative political science classes to update their assignments, and he is also helping to build a political science connector course.

"We are currently trying a new collaboration with the Collegium, a grant program where faculty can propose ways to bring research methods into their courses," says Eric Van Dusen, Interim Director of Data Science Undergraduate Studies. The first Collegium project pairs teams of data science students with public health students to work on infodemiology, an emerging field of research that studies health-related information on the Internet.

The Modules team is excited about the growth of the modules, both across Berkeley majors and across other universities. So far, 55 Berkeley courses have incorporated data science modules, exposing thousands of students to data science concepts and showing them how data analysis can reveal important insights in their course's field. Many more fields could integrate data science, and modules offer a self-contained, ready-to-use way to bridge fields for students at every level of data expertise.