Backbone Courses

Title

Course Number

Times & Locations

Description 

Instructor

Units

Foundations of Data Science (Data 8)

STAT/COMPSCI C8

Class #: 32888

MWF

11-12pm

Wheeler 150

Foundations of data science from three perspectives: inferential thinking, computational thinking, and real-world relevance. Given data arising from some real-world phenomenon, how does one analyze that data so as to understand that phenomenon? The course teaches critical concepts and skills in computer programming and statistical inference, in conjunction with hands-on analysis of real-world datasets, including economic data, document collections, geographical data, and social networks. It delves into social and legal issues surrounding data analysis, including issues of privacy and data ownership.

Ramesh Sridharan

4

Principles & Techniques of Data Science (Data 100)

STAT/COMPSCI C100

Class #: 24961

TTh

9:30-11am

Wheeler 150

In this course, students will explore the data science lifecycle, including question formulation, data collection and cleaning, exploratory data analysis and visualization, statistical inference and prediction, and decision-making. This class will focus on quantitative critical thinking and key principles and techniques needed to carry out this cycle. These include languages for transforming, querying and analyzing data; algorithms for machine learning methods including regression, classification and clustering; principles behind creating informative data visualizations; statistical concepts of measurement error and prediction; and techniques for scalable data processing.

Joshua A. Hug

4

Data, Inference, and Decisions (Data 102)

STAT 102

Class #: 33141

TTh

2-3:30pm

Soda 306

This course develops the probabilistic foundations of inference in data science, and builds a comprehensive view of the modeling and decision-making life cycle in data science including its human, social, and ethical implications. Topics include: frequentist and Bayesian decision-making, permutation testing, false discovery rate, probabilistic interpretations of models, Bayesian hierarchical models, basics of experimental design, confidence intervals, causal inference, Thompson sampling, optimal control, Q-learning, differential privacy, clustering algorithms, recommendation systems and an introduction to machine learning tools including decision trees, neural networks and ensemble methods.

Fernando Perez, Michael Jordan

4

Probability for Data Science

STAT 140

Class #: 26032

TTh

6:30-8pm

Valley Life Sciences 2050

An introduction to probability, emphasizing the combined use of mathematics and programming to solve problems. Random variables, discrete and continuous families of distributions. Bounds and approximations. Dependence, conditioning, Bayes methods. Convergence, Markov chains. Least squares prediction. Random permutations, symmetry, order statistics. Use of numerical computation, graphics, simulation, and computer algebra.

Ani Adhikari

4

Connectors

Title

Course Number

Times & Locations

Description

Instructor

Units

Computational Structures in Data Science

COMPSCI 88

Class #: 28629

M

2-3pm

Stanley 105

Development of Computer Science topics appearing in Foundations of Data Science (C8); expands computational concepts and techniques of abstraction. Understanding the structures that underlie the programs, algorithms, and languages used in data science and elsewhere. Mastery of a particular programming language while studying general techniques for managing program complexity, e.g., functional, object-oriented, and declarative programming. Provides practical experience with composing larger systems through several significant programming projects.

 

2

Economic Models

DATA 88-1

Class #: 33469

T

2-4pm

Cory 105

This Data Science connector course will motivate and illustrate key concepts in Economics with examples in Python Jupyter notebooks. The course will give data science students a pathway to apply Python programming and data science concepts within the discipline of economics. The course will also give economics students a pathway to apply programming to reinforce fundamental concepts and to advance the level of study in upper division coursework and possible thesis work.

Eric Van Dusen

2

Data Science in Genetics and Genomics

DATA 88-2

Class #: 33483

T

12-2pm

Cory 105

Recent years have witnessed a rapid expansion in the creation and utilization of genetic and genomic data across diverse domains such as business, biological research, and medicine. In this Data 8 connector course we will survey relevant questions of interest and employ the methods frequently relied upon by analysts to derive insights from genetic and genomic data. Topics will include the comparison of DNA sequences, dimension reduction, the characterization of transcriptomes, and genome-wide association studies, among others. In addition to hands-on work with data, we will also consider the history of the genetic and genomic sciences and their intersection with current events, ethics, and modern medicine. Students should exit with an understanding of the central role played by data in the fields and an appreciation for the remaining challenges in light of ever-increasing degrees of personalization of, and access to, these sciences. No biological background is required.

Jonathan Fischer

2

Immigration: What do the data tell us?

DEMOG 88

Class #: 25347

M

2-4pm

2232 Piedmont 100

This course will cover the small but important part of the rich history of human migration that deals with the population of the United States -- focusing on the 20th and 21st centuries. We will use the tools of DS8 to answer specific questions that relate to the themes of this course:

(1) Why do people migrate?

(2) Is immigration good or bad for receiving (and sending) countries?

(3) How do immigrants adapt and how do societies change in response?

In addition to scientific questions, this course will also address the demographic and political history of immigration in the US -- an understanding of which is crucial for understanding both the broad contours of US history and the particular situation in which we find ourselves today.

 

2

PyEarth: A Python Introduction to Earth Science

EPS 88

Class #: 26406

T

12-2pm

McCone 325

Earthquakes and El Niños are examples of natural hazards in California. The course uses Python/Jupyter Notebook and real-world observations to introduce students to these and other Earth phenomena and their underlying physics. Students will learn how to access and visualize the data, extract signals, and make probability forecasts. The final module is a project that synthesizes the course material to make a probabilistic forecast. The course will be co-taught by a team of EPS faculty, and the focus of each semester will depend on the expertise of the faculty in charge.

Nicolas Swanson-Hysell

2

Data Science Applications in Physics

PHYSICS 88

Class #: 26399

M

2-4pm

Barrows 166

Introduction to data science with applications to physics. Topics include: statistics and probability in physics, modeling of physical systems and data, numerical integration and differentiation, function approximation. Connector course for Data Science 8, room-shared with Physics 77. Recommended for freshmen intending to major in physics or engineering with emphasis on data science.

 

2

Data and Decisions

UGBA 88-1

Class #: 33263

2-4pm

Cheit C210

The goal of this connector course is to provide an understanding of how data and statistical analysis can improve managerial decision-making. We will explore statistical methods for gleaning insights from economic and social data, with an emphasis on approaches to identifying causal relationships. We will discuss how to design and analyze randomized experiments and introduce econometric methods for estimating causal effects in non-experimental data. The course draws on a variety of business and social science applications, including advertising, management, online marketplaces, labor markets, and education. This course, in combination with the Data 8 Foundations course, satisfies the statistics prerequisite for admission to Haas.

Conrad Miller

2

Data and Decisions

UGBA 88-2

Class #: 33264

M

4-6pm

Cheit C210

The goal of this connector course is to provide an understanding of how data and statistical analysis can improve managerial decision-making. We will explore statistical methods for gleaning insights from economic and social data, with an emphasis on approaches to identifying causal relationships. We will discuss how to design and analyze randomized experiments and introduce econometric methods for estimating causal effects in non-experimental data. The course draws on a variety of business and social science applications, including advertising, management, online marketplaces, labor markets, and education. This course, in combination with the Data 8 Foundations course, satisfies the statistics prerequisite for admission to Haas.

Conrad Miller

2

How does History Count?

HIST 88

Class #: 26125

T

4-4:30pm

Cory 105

In this connector course, we will explore how historical data becomes historical evidence and how recent technological advances affect long-established practices, such as close attention to historical context and contingency. Will the advent of fast computing and big data make history “count” more or lead to unprecedented insights into the study of change over time? During our weekly discussions, we will apply what we learn in lectures and labs to the analysis of selected historical sources and get an understanding of constructing historical datasets. We will also consider scholarly debates over quantitative evidence and historical argument.

 

2

Human Contexts & Ethics

Title

Course Number

Times & Locations

Description

Instructor

Units

Ethics in Science and Engineering

BIOENG 100

Class #: 27356

MWF

12-1pm

Evans 10

The goal of this semester course is to present the issues of professional conduct in the practice of engineering, research, publication, public and private disclosures, and in managing professional and financial conflicts. The method is through historical didactic presentations, case studies, presentations of methods for problem solving in ethical matters, and classroom debates on contemporary ethical issues. The faculty will be drawn from national experts and faculty from religious studies, journalism, and law from the UC Berkeley campus. 

Dorian Liepmann

3

Human Contexts and Ethics of Data

HISTORY C184D / STS C104

Class #: 31650

MWF

4-5pm

Li Ka Shing 245

This course teaches you to use the tools of applied historical thinking and Science, Technology, and Society (STS) to recognize, analyze, and shape the human contexts and ethics of data. It addresses key topics such as doing ethical data science amid shifting definitions of human subjects, consent, and privacy; the changing relationship between data, democracy, and law; the role of data analytics in how corporations and governments provide public goods such as health and security to citizens; sensors, machine learning and artificial intelligence and changing landscapes of labor, industry, and city life. It prepares you to engage as a knowledgeable and responsible citizen and professional in the varied arenas of our datafied world.

Cathryn Carson

4

Behind the Data: Humans and Values

INFO 188

Class #: 29335

TTh

12:30-2pm

Etcheverry 3108

This course blends social and historical perspectives on data with ethics, law, policy, and case examples to help students understand current ethical and legal issues in data science and machine learning. Legal, ethical, and policy-related concepts addressed include: research ethics; privacy and surveillance; bias and discrimination; and oversight and accountability. These issues will be addressed throughout the lifecycle of data--from collection to storage to analysis and application. The course emphasizes strategies, processes, and tools for attending to ethical and legal issues in data science work. Course assignments emphasize researcher and practitioner reflexivity, allowing students to explore their own social and ethical commitments.

Deirdre Kathleen Mulligan

3

Data Enabled Courses

These courses are taught in a way that permits students to build on Data 8. Please review the prerequisites.

Title

Course Number

Times & Locations

Description

Instructor

Units

Engineering Data Analysis

CIVENG 93

Class #: 27400

MW

9-10am

Etcheverry 3106

Application of the concepts and methods of probability theory and statistical inference to CEE problems and data; graphical data analysis and sampling; elements of set theory; elements of probability theory; random variables and expectation; simulation; statistical inference. Use of computer programming languages for analysis of CEE-related data and problems. The course also introduces the student to various domains of uncertainty analysis in CEE. 

Joan L. Walker

3

Introduction to Machine Learning

COMPSCI 189

Class #: 27462

TTh

5-6:30pm

Pimentel 1

Theoretical foundations, algorithms, methodologies, and applications for machine learning. Topics may include supervised methods for regression and classification (linear models, trees, neural networks, ensemble methods, instance-based methods); generative and discriminative probabilistic models; Bayesian parametric learning; density estimation and clustering; Bayesian networks; time series models; dimensionality reduction; programming projects covering a variety of real-world applications. 

Jennifer Listgarten, Stella Xingxing Yu

4

Computational Models of Cognition

COGSCI 131

Class #: 32577

MWF

6-7pm

Latimer 120

This course will provide advanced students in cognitive science and computer science with the skills to develop computational models of human cognition, giving insight into how people solve challenging computational problems, as well as how to bring computers closer to human performance. The course will explore three ways in which researchers have attempted to formalize cognition -- symbolic approaches, neural networks, and probability and statistics -- considering the strengths and weaknesses of each. 

 

4

Introduction to Population Analysis

DEMOG 110

Class #: 21585

TTh

3:30-5pm

Barrows 20

Measures and methods of Demography. Life tables, fertility and nuptiality measures, age pyramids, population projection, measures of fertility control. 

 

3

Data, Environment and Society

ENERES 131

Class #: 33105

TTh

9:30-11am

Wheeler 202

Critical, data-driven analysis of specific issues or general problems of how people interact with environmental and resource systems. This course will teach students to build, estimate and interpret models that describe phenomena in the broad area of energy and environmental decision-making. More than one section may be given each semester on different topics depending on faculty and student interest. 

Duncan Calloway

4

Basic Modeling and Simulation Tools for Industrial Research Applications

ENGIN 150

Class #: 31486

TTh

2-3:30pm

Jacobs Hall 310

The course emphasizes elementary modeling, numerical methods & their implementation on physical problems motivated by phenomena that students are likely to encounter in their careers, involving biomechanics, heat-transfer, structural analysis, control theory, fluid-flow, electrical conduction, diffusion, etc. This will help students develop intuition about modeling physical systems and about the strengths and weaknesses of a variety of modeling & numerical methods, including: discretization of differential equations, methods for solving nonlinear systems, gradient-based methods and machine learning algorithms for optimization, stats & quantification

Tarek Zohdi

3

Introductory Applied Econometrics

ENVECON/IAS C118

Class #: 26748

TTh

9:30-11am

Mulford 159

Formulation of a research hypothesis and definition of an empirical strategy. Regression analysis with cross-sectional and time-series data; econometric methods for the analysis of qualitative information; hypothesis testing. The techniques of statistical and econometric analysis are developed through applications to a set of case studies and real data in the fields of environmental, resource, and international development economics. Students learn the use of statistical software for economic data analysis.

Jeremy R. Magruder

4

Data Science in Global Change Ecology

ESPM 157

Class #: 27205

WF

10-12pm

Barrows 110

Many of the greatest challenges we face today come from understanding and interacting with the natural world: from global climate change to the sudden collapse of fisheries and forests, from the spread of disease and invasive species to the unknown wealth of medical, cultural, and technological value we derive from nature. Advances in satellites and micro-sensors, computation, informatics and the Internet have made available unprecedented amounts of data about the natural world, and with it, new challenges of sifting, processing and synthesizing large and diverse sources of information. In this course, students will learn and apply fundamental computing, statistics and modeling concepts to a series of real-world ecological and environment

Carl Boettiger

4

Introduction to Machine Learning and Data Analytics

INDENG 142

Class #: 28312

TTh

3:30-5pm

Morgan 101

This course introduces students to key techniques in machine learning and data analytics through a diverse set of examples using real datasets from domains such as e-commerce, healthcare, social media, sports, the Internet, and more. Through these examples, exercises in R, and a comprehensive team project, students will gain experience understanding and applying techniques such as linear regression, logistic regression, classification and regression trees, random forests, boosting, text mining, data cleaning and manipulation, data visualization, network analysis, time series modeling, clustering, principal component analysis, regularization, and large-scale learning.

Paul Grigas

3

Introduction to Data Visualization

INFO 190-1

Class #: 19471

MW 10:30-12pm

W 12-1pm

South Hall 202

This course introduces students to data visualization: the use of the visual channel for gaining insight with data, exploring data, and as a way to communicate insights, observations, and results with other people.

The field of information visualization is flourishing today, with beautiful designs and applications ranging from journalism to marketing to data science. This course will introduce foundational principles and relevant perceptual properties to help students become discerning judges of data displayed visually. The course will also introduce key practical techniques and include extensive hands-on exercises to enable students to become skilled at telling stories with data using modern information visualization tools.

Marti A. Hearst

4

Artificial Intelligence in Medicine and Health Policy

PBHLTH 196-8

Class #: 33297

W

3-6pm

Hearst Field Annex B5

Over the coming decades, data and algorithms will transform medicine and our health care system. Whether you plan to be a doctor, an algorithm developer, or work elsewhere in the health sector, this course will help you understand the tremendous upside of artificial intelligence for health: what the tools of machine learning can do in this important sector, and where they can do harm. The course will focus on teaching concepts, not the mechanics of specific algorithms. But genuine conceptual understanding will require engagement with technical content (e.g., readings from computer science and statistics, problem sets requiring analysis of real datasets with statistical software). As a result, it is designed for students who are already comfortable with basic data analysis, thanks to coursework in data science/computer science, biostatistics/statistics, or economics (e.g., you should already know how to load and manipulate datasets in statistical software).

Ziad Obermeyer

1-4

Introduction to Computational Techniques in Physics

PHYSICS 77

Class #: 23245

M

2-4pm

Barrows 166

Introductory scientific programming in Python with examples from physics. Topics include: visualization, statistics and probability, regression, numerical integration, simulation, data modeling, function approximation, and algebraic systems. Recommended for freshman physics majors.

 

3

Research and Data Analysis in Psychology

PSYCH 101

Class #: 23558

W

4-7pm

Evans 10

The course will concentrate on hypothesis formulation and testing, tests of significance, analysis of variance (one-way analysis), simple correlation, simple regression, and nonparametric statistics such as chi-square and Mann-Whitney U tests. Majors intending to be in the honors program must complete 101 by the end of their junior year. 

 

4

Data Science for Research Psychology

PSYCH 101D

Class #: 33033

MW

5-6:30pm

Mulford 240

This Python-based course builds upon the inferential and computational thinking skills developed in the Foundations of Data Science course by tying them to the classical statistical and research approaches used in Psychology. Topics include experimental design, control variables, reproducibility in science, probability distributions, parametric vs. non-parametric statistics, hypothesis tests (t-tests, one- and two-way ANOVA, chi-squared and odds-ratio), linear regression and correlation.

 

4

Concepts in Computing with Data

STAT 133

Class #: 23999

MWF

9-10pm

Li Ka Shing 245

An introduction to computationally intensive applied statistics. Topics will include organization and use of databases, visualization and graphics, statistical learning and data mining, model validation procedures, and the presentation of results. 

Gaston Sanchez

3

Modern Statistical Prediction and Machine Learning

STAT 154

Class #: 23882

TTh

3:30-5pm

Hearst Mining 390

Theory and practice of statistical prediction. Contemporary methods as extensions of classical methods. Topics: optimal prediction rules, the curse of dimensionality, empirical risk, linear regression and classification, basis expansions, regularization, splines, the bootstrap, model selection, classification and regression trees, boosting, support vector machines. Computational efficiency versus predictive performance. Emphasis on experience with real data and assessing statistical assumptions. 

Gaston Sanchez

4