Data 100

Data 100: Principles and Techniques of Data Science is an intermediate-level class that serves as a bridge to upper-division computer science and statistics courses, as well as to methods courses in other fields. The class explores key areas of data science, including question formulation, data collection and cleaning, visualization, statistical inference, predictive modeling, and decision making. Through a strong emphasis on data-centric computing, quantitative critical thinking, and exploratory data analysis, the class covers key principles and techniques of data science. These include languages for transforming, querying, and analyzing data; algorithms for machine learning methods, including regression, classification, and clustering; principles behind creating informative data visualizations; statistical concepts of measurement error and prediction; and techniques for scalable data processing.

Data 100 Course Goals:

  1. Prepare students for advanced Berkeley courses in data management, machine learning, and statistics by providing the necessary foundation and context

  2. Enable students to start careers as data scientists by providing experience working with real-world data, tools, and techniques

  3. Empower students to apply computational and inferential thinking to address real-world problems

Course Website

The course website can be found at http://www.ds100.org/.

Textbook

Principles and Techniques of Data Science is the textbook for Data 100 at UC Berkeley. It is a free online textbook written in Jupyter Notebooks and compiled using Jupyter Book. The textbook source is maintained as an open-source project under the CC BY-NC-ND 4.0 License.

Public Course Materials

A majority of Berkeley’s resources for Data 100 are open source. The Spring 2020 public repository contains resources for lectures, discussions, and labs.

Technology Adoption Guide

This Jupyter Book is a technology guide for others who wish to adopt a data science classroom environment, and is based on the Data Science Education Program’s (DSEP) experiences from running Data 8 and other data science courses. It mainly covers JupyterHub and autograding setup.
