Berkeley’s Data Science education program aims to offer a comprehensive curriculum, built from the entry level upward, that meets students’ varied needs for data fluency. It includes a diverse constellation of connector courses that allow students to explore real-world issues related to their areas of interest, and it continues with intermediate and advanced courses that enable them to apply more complex concepts and approaches. Berkeley’s data science course offerings are broad and deep, spanning a variety of departments across campus.

Foundations of Data Science (Data 8)

Foundations of Data Science, or Data 8, is Berkeley’s pathbreaking lower-division course that teaches core computational and statistics concepts while enabling students to work hands-on with real data. It is designed to be accessible to undergraduates of any intended major and does not require prior experience in the field.

Connectors

Connector courses enable students to apply core concepts from the Foundations course to real-world issues in their own areas of interest. Offered by faculty across many departments and fields of study, connectors are optional but highly encouraged and are designed to be taken at the same time as or after the Foundations course. They typically (but not always) carry two units of course credit.

Examples:

Extenders

Extenders are courses that take Foundations of Data Science or connector courses as a starting point for further work in data science. Instructors of these courses explicitly build upon the concepts and skills developed in the entry-level data science curriculum. Some extenders require only Foundations of Data Science or a specific connector as a prerequisite, while others may require calculus or other prerequisites.

Examples include:

Closely Related Courses

These courses are not technically part of the Data Science Education Program (DSEP), but their instructors are “not oblivious” to DSEP (mainly Data 8) in designing them, and the courses are being developed with attention to students coming from Data 8.

NWMEDIA 190: Making Sense of Cultural Data

Near Eastern Studies 190: Introduction to Digital Humanities: from Analog to Digital

History 100S: Text Analysis for Digital Humanists and Social Scientists

Advanced Integrative Opportunities

Advanced integrative opportunities enable more advanced students to work hands-on with data in an interdisciplinary, project-based manner. For instance, Terrestrial Hydrology (Geog C136/ESPM C130) is a new course focused on the role that hydrology plays in malaria transmission in sub-Saharan Africa (prerequisites are Math 1A-1B and Physics 7A).

Data science modules in existing courses

Faculty have begun to introduce data science modules into their existing courses as a way to enable students to tap the power of data in a range of disciplines and fields. Using the approach and concepts taught in the Foundations and connector courses, faculty are adding short data science modules to existing courses ranging from entry-level to advanced. As an example, a Rhetoric instructor in Fall 2016 developed a three-class module on polling data in the context of that year’s election.

Other core courses for students who want to pursue data science

In addition to the courses listed above, a variety of advanced and mid-level courses are offered as part of existing programs. Here are just a few examples offered in Fall 2016:

    • CS 186: Introduction to Database Systems

    • CS 189: Introduction to Machine Learning

    • STAT 133: Concepts in Computing with Data

    • STAT 154: Modern Statistical Prediction and Machine Learning

    • STAT 159: Reproducible and Collaborative Data Science

    • CS 61A, CS 61B, CS 61C

    • CS 70, CS 170, CS 186, CS 188, CS 189

    • Math 53 and Math 54, Math 1A, Math 1B

Faculty from other institutions who are interested in developing a data science program are invited to visit our implementation page.

Background on the Data Science Education Program at Berkeley

Berkeley faculty across many disciplines have collaboratively created a model for a comprehensive undergraduate data science curriculum. Starting from the blueprint in the January 2015 report of the Data Sciences Education Rapid Action Team (DSERAT), the curriculum is built around a modular core-and-connections structure that can serve as a platform on which many academic programs can build.

The Data Science curriculum was launched at the entry level in 2015-16 with an innovative introductory course and a suite of connector courses that relate to students’ areas of interest, now ranging from neuroscience to civil engineering to demography to ethics. The entry-level courses are designed to provide the base for later classes in a broad range of departments that will be able to leverage and extend what students have learned. The upper tiers of the program, which are now being developed, will provide additional depth and connect across the campus with major and integrated minor offerings. As previewed in the DSERAT report, the program engages with societal and ethical issues around data science not only in course content, but also throughout the program design, incorporating best practices around diversity, equity, and inclusion so that the curriculum is welcoming to students of many backgrounds and interests.

The curriculum that is now being created aims at an integrated program. It responds to two developments: faculty have seen their own fields of research and teaching transformed by the cross-cutting possibilities of data science, and student demand for courses in computing, inference, and hands-on work with real data is growing fast, as reflected in the very large numbers of students enrolling in preexisting courses that cover parts of this material separately. The curriculum aims to integrate a full appreciation of the lifecycle of working with data with the computational and mathematical knowledge that underlies it. It follows a modular design that allows it both to leverage common teaching of exceptional quality and shared infrastructure in a highly cost-effective manner, and to create tailored offerings designed and “owned” by departments. To stay strongly coupled to student interests and the needs of diverse programs, it must operate flexibly and responsively even as it scales up fast.

Leadership and staffing

Program leadership has been provided by the Dean of Undergraduate Studies and two additional faculty members involved in the leadership of the Data Science Planning Initiative. A proposal for its institutional regularization in relation to a new decanal division for computation and data is now being considered by campus. Through its start-up phase, the data science curriculum has been very lightly staffed. In addition to the dean’s resources and a start-up funding allocation, it has drawn heavily on individual faculty investment, staff commitment, and provision of additional resources by multiple departments and support units.