December 6, 2016

The pilot offering of Berkeley’s new intermediate-level data science course, “Principles and Techniques of Data Science,” is now open for students to sign up. Students interested in taking Data Science 100 (cross-listed between Computer Science and Statistics) can register now on CalCentral for CS C100 or Stat C100 (TuTh 12:30-2:00 lecture and a 2-hour lab section). More information, as well as a form to sign up for updates, is available on the course website.

Students in Data Science 100 will explore the data science lifecycle, including question formulation, data collection and cleaning, exploratory data analysis and visualization, statistical inference and prediction, and decision-making. The class focuses on quantitative critical thinking and key principles and techniques needed to carry out this cycle. These include languages for transforming, querying and analyzing data; algorithms for machine learning methods including regression, classification and clustering; principles behind creating informative data visualizations; statistical concepts of measurement error and prediction; and techniques for scalable data processing.

The class is open to students of all majors and levels who meet the prerequisites. It bridges between Foundations of Data Science (Data 8) and upper division computer science and statistics courses as well as methods courses in other fields. It is intended to serve as an upper division core class for Berkeley’s envisioned data science major and minors, once these are designed by faculty and approved by the university.

In the pilot offering of the class, enrollment will be limited to 98 students; later offerings are expected to be substantially larger. Enrollment in Spring 2017 will be via the waitlist on CalCentral. The instructors are aiming to get a diverse group of students so they can understand how the course will perform when it is opened up broadly in Fall 2017.


For this pilot, the instructors plan to require the following (or equivalent):

  1. Foundations of Data Science: Data 8 covers much of the material in DS 100 but at an introductory level. Data 8 provides basic exposure to Python programming and working with tabular data, as well as visualization, statistics, and machine learning.

  2. Computing: The Structure and Interpretation of Computer Programs (CS 61A) or Computational Structures in Data Science (CS 88). These courses provide additional background in Python programming (e.g., for loops, lambdas, debugging, and complexity) that will enable DS 100 to focus more on data science concepts and less on the details of programming in Python.

  3. Math: Linear Algebra (Math 54 or EE 16A). We will need some basic concepts like linear operators, eigenvectors, derivatives, and integrals to enable statistical inference and derive new prediction algorithms. This requirement may be satisfied concurrently with DS 100.

Students who are interested in additional data science classes after Data 8, but don’t yet meet the DS 100 requirements, should take the prerequisites now so they can take DS 100 in future semesters. Other Data Science classes in Spring 2017 are listed at

Data Science 100 will be taught in Spring 2017 by Professors Joseph Gonzalez and Joseph Hellerstein of Computer Science, and Professors Deborah Nolan and Bin Yu of Statistics. To learn more, please visit the course website at