Students in Data 100 explore the data science lifecycle, including question formulation, data collection and cleaning, exploratory data analysis and visualization, statistical inference and prediction, and decision-making. The class focuses on quantitative critical thinking and key principles and techniques needed to carry out this cycle. These include languages for transforming, querying and analyzing data; algorithms for machine learning methods including regression, classification and clustering; principles behind creating informative data visualizations; statistical concepts of measurement error and prediction; and techniques for scalable data processing.

The class is open to students of all majors and levels who meet the prerequisites. It bridges Foundations of Data Science (Data 8) and upper division computer science and statistics courses, as well as methods courses in other fields. It also serves as an upper division core class for the data science major.

In the pilot offering of the class, enrollment was limited to 98 students. The course was opened up broadly in Fall 2017.

Prerequisites - instructors require the following (or equivalent):

  1. Foundations of Data Science: Data 8 covers much of the material in Data 100 but at an introductory level. Data 8 provides basic exposure to Python programming and working with tabular data as well as visualization, statistics, and machine learning.

  2. Computing: The Structure and Interpretation of Computer Programs (CS61A), Computational Structures in Data Science (CS88), or Introduction to Computer Programming for Scientists and Engineers (ENGIN 7). These courses provide additional background in programming (e.g., for-loops, lambdas, debugging, and complexity) that will enable Data 100 to focus more on the concepts in data science and less on the details of programming in Python.

  3. Math: Linear Algebra (MATH 54, STAT 89A, or EE 16A): We will need some basic concepts like linear operators, eigenvectors, derivatives, and integrals to enable statistical inference and derive new prediction algorithms. This requirement may be satisfied concurrently with Data 100, though course staff highly recommend completing linear algebra before enrolling in Data 100.

Students who are interested in additional data science classes after Data 8 but don’t yet meet the Data 100 requirements should take the prerequisites now so they can take Data 100 in future semesters. Other Data Science classes are listed here: https://data.berkeley.edu/education/courses.

To learn more about Data 100, please visit the course website at http://www.ds100.org.