Backbone
Title 
Course Number 
Times & Locations 
Description 
Instructor 
Units 
Foundations of Data Science (Data 8) 
STAT/COMPSCI C8 CCN: 27696 
MWF 9-10am Wheeler 150 
Foundations of data science from three perspectives: inferential thinking, computational thinking, and real-world relevance. Given data arising from some real-world phenomenon, how does one analyze that data so as to understand that phenomenon? The course teaches critical concepts and skills in computer programming and statistical inference, in conjunction with hands-on analysis of real-world datasets, including economic data, document collections, geographical data, and social networks. It delves into social and legal issues surrounding data analysis, including issues of privacy and data ownership.  David Wagner & Ani Adhikari  4 
Principles & Techniques of Data Science (Data 100) 
STAT/COMPSCI C100 CCN: 25289 
TTh 6:30-8pm Wheeler 150 
In this course, students will explore the data science lifecycle, including question formulation, data collection and cleaning, exploratory data analysis and visualization, statistical inference and prediction, and decision-making. This class will focus on quantitative critical thinking and key principles and techniques needed to carry out this cycle. These include languages for transforming, querying and analyzing data; algorithms for machine learning methods including regression, classification and clustering; principles behind creating informative data visualizations; statistical concepts of measurement error and prediction; and techniques for scalable data processing.  Joshua A. Hug, Fernando Perez  4 
Probability for Data Science 
STAT 140 CCN: 31468 
MW 5-6:30pm Valley Life Sciences 2050 
An introduction to probability, emphasizing the combined use of mathematics and programming to solve problems. Random variables, discrete and continuous families of distributions. Bounds and approximations. Dependence, conditioning, Bayes methods. Convergence, Markov chains. Least squares prediction. Random permutations, symmetry, order statistics. Use of numerical computation, graphics, simulation, and computer algebra.  Ani Adhikari  4 
Connectors
Title 
Course Number 
Times & Location 
Description 
Instructor 
Units 
Python for Earth Science 
EPS 88 CCN: 32879 
F 10-12 Evans 458  Earthquakes and El Ninos are examples of natural hazards in California. The course uses Python/Jupyter Notebook and real-world observations to introduce students to these and other Earth phenomena and their underlying physics. The students will learn how to access and visualize the data, extract signals, and make probability forecasts. The final module is a project that synthesizes the course material to make a probabilistic forecast. The course will be co-taught by a team of EPS faculty, and the focus of each semester will depend on the expertise of the faculty in charge. 
Doug Dreger and Maggie Avery 
2 
Data Science for Smart Cities 
CIVENG 88 CCN: 27752 
M 12-2 Davis 406  Cities are becoming more dependent on the data flows that connect infrastructures to one another, and users to infrastructures. Design and operation of smart, efficient, and resilient cities nowadays require data science skills. This course provides an introduction to working with data generated within transportation systems, power grids, and communication networks, as well as data collected via crowdsensing and remote sensing technologies, to build demand- and supply-side urban services based on data analytics.  Alexey Pozdnukhov  2 
Computational Structures in Data Science 
COMPSCI 88 CCN: 30996 
M 2-4 LeConte 4  Development of Computer Science topics appearing in Foundations of Data Science (C8); expands computational concepts and techniques of abstraction. Understanding the structures that underlie the programs, algorithms, and languages used in data science and elsewhere. Mastery of a particular programming language while studying general techniques for managing program complexity, e.g., functional, object-oriented, and declarative programming. Provides practical experience with composing larger systems through several significant programming projects.  David E. Culler  2 
Immigration: What Do the Data Tell Us? 
DEMOG 88 CCN: 25929 
M 2-4 2232 Piedmont 100  Humans are a migratory species like no other. As hunter-gatherers, humans migrated from East Africa to every currently inhabited place on earth, except for a few Pacific islands and that research station in Antarctica. During modern times humans continue to migrate in astounding numbers from poor countries to rich countries; from rural to urban areas; as refugees and as laborers both with and without the consent of receiving countries. This course will cover the small but important part of the rich history of human migration that deals with the population of the United States, focusing on the period between 1850 and the present. Since its founding, conflict over immigration policies has periodically risen to the top of the American political agenda, often masking or exacerbating other sources of conflict. Understanding past immigration policies thus provides a lens through which we can view both the broad contours of US history and the particular situation in which we find ourselves today.  Carl Mason  2 
Making History Count: The Anthropocene and Data Science 
HISTORY 88 CCN: 32177 
M 10-12 Cory 105 
Some geologists have proposed that we live in a new stratigraphic age, the Anthropocene, the epoch of humanity. In this data science connector course, we will explore the history of this new geological era through a combination of traditional history and data science. We will look at how the Industrial Revolution, global trade and empire, and the unprecedented takeoff of mid-20th-century material prosperity changed the planet. You will learn the skills of a historian: how to tell solid stories about complicated things, how to read efficiently, how to write clearly and convincingly, and how to understand current events as part of a historical process. Note: We will play around with big data sets and descriptive statistics, but it’s unlikely we will delve deep into programming. 
Brendan Mackie  2 
Aesthetics and Data 
L&S 88-1 CCN: 25859 
Tu 5-7 Cory 105  When we visualize data, we give it an aesthetic shape that did not previously exist. In this course, we will develop the basic aesthetic literacy needed to critically consider the ways we present information. First, we will study the concepts that art critics use to describe the sensory, emotional, and political qualities of art. Then, we will consider how these concepts might be useful in the field of data analysis, both by making us aware of the ways in which data is shaped and manipulated and by helping us to perceive meaning where patterns are not readily perceptible. Through several writing assignments, students will develop their appreciation of visual form as well as the writing skills needed to communicate their ideas broadly.  Rebecca Gaydos  2 
Broken down by age and sex: data science and demography 
L&S 88-2 CCN: 25860 
M 12-2 Barrows 122  Demography is the science of populations and how they change, including death, sex, migration, marriage, and more. Today, demography is a critical part of answering the most pressing questions that face populations all over the world: why do some countries become rich, while some remain poor? Which forces guide the shifting landscape of politics and voting? Who is most affected by the opioid epidemic? In this connector, we will take a tour of cutting-edge problems in demography and how data science can be used to help address them.  Dennis Feehan  2 
Sports Analytics 
L&S 88-3 CCN: 25861 
M 2-4 Barrows 122 
The principles of data science meet sports analytics. What makes a good hitter in baseball? How do you measure that? What are the flaws of plus/minus in basketball? Do Steph Curry or Klay Thompson ever get a hot hand? When should a coach go for it on 4th down? This course covers a wide range of topics on the analytical thinking behind the data revolution in sports and explores data science through the lens of sports analytics. 
Alex Papanicolaou  2 
Data Science Applications in Physics 
PHYSICS 88 CCN: 32824 
M 2-4pm Evans 60 
Introduction to data science with applications to physics. Topics include: statistics and probability in physics, modeling of physical systems and data, numerical integration and differentiation, function approximation. Connector course for Data Science 8, room-shared with Physics 77. Recommended for freshmen intending to major in physics or engineering with emphasis on data science. 
Yury Kolomensky  2 
Probability and Mathematical Statistics in Data Science 
STAT 88 CCN: 24294 
TTh 1-2 LeConte 4  In this connector course we will state precisely and prove results discovered in the foundational data science course through working with data. Topics include: total variation distance between discrete distributions; the mean, standard deviation, and tail bounds; correlation, and the derivation of the regression equation; probabilities, random variables, and the Central Limit Theorem; probabilistic models; symmetries in random permutations; prior and posterior distributions, and Bayes’ rule.  Shobhana Murali Stoyanov  2 
Data and Decisions 
UGBA 96-2 CCN: 16898 
M 2-4  The objective of the course is to provide an understanding of how data and statistical analysis can improve managerial decision-making. Students learn how to ask the right questions, find or collect relevant data, and apply appropriate statistical methods to solve problems and make better business decisions. We will explore statistical methods for gleaning insights from economic and social data, with an emphasis on approaches to identifying causal relationships. We will discuss how to design and analyze randomized experiments and introduce econometric methods for estimating causal effects in nonexperimental data. This course, in combination with the Foundations course, satisfies the statistics prerequisite for admission to Haas.  Conrad Miller  2 
Data and Decisions 
UGBA 96-3 CCN: 32610 
M 4-6  The objective of the course is to provide an understanding of how data and statistical analysis can improve managerial decision-making. Students learn how to ask the right questions, find or collect relevant data, and apply appropriate statistical methods to solve problems and make better business decisions. We will explore statistical methods for gleaning insights from economic and social data, with an emphasis on approaches to identifying causal relationships. We will discuss how to design and analyze randomized experiments and introduce econometric methods for estimating causal effects in nonexperimental data. This course, in combination with the Foundations course, satisfies the statistics prerequisite for admission to Haas.  Conrad Miller  2 
Human Contexts & Ethics
Title 
Course Number 
Times & Location 
Description 
Instructor 
Units 
Introduction to Science, Technology, and Society 
HISTORY C182C CCN: 31147 
MWF 1-2pm Dwinelle 145 
This course explores how data science is entangled with diverse human contexts (histories, institutions, and material bases) and ethics (domains of value-laden choice). We will bring historically grounded perspectives as well as frameworks and methods from Science, Technology, and Society (STS) (such as cross-national comparison, coproduction, and controversy studies) to bear on topics that include: Doing ethical data science amid shifting definitions of human subjects, consent, and privacy; the changing relationship between data, democracy, and law; the role of data analytics in how corporations and governments provide public goods such as health and security to citizens; sensors, machine learning and artificial intelligence and changing landscapes of labor, industry, and city life; and the implications of data for how publics and varied scientific disciplines know the world. 
Cathryn Carson  4 
Information Technology and Society 
AFRICAM 134 CCN: 32637 
M 3-6pm Wheeler 200 
This course assesses the role of information technology in the digitalization of society by focusing on the deployment of e-government, e-commerce, e-learning, the digital city, telecommuting, virtual communities, Internet time, the virtual office, and the geography of cyberspace. The course will also discuss the role of information technology in the governance and economic development of society.  Michel S Laguerre  4 
Ethics in Science and Engineering 
BIOENG 100 CCN: 26884 
WF 5-6:30 Evans 10 
The goal of this semester course is to present the issues of professional conduct in the practice of engineering, research, publication, public and private disclosures, and in managing professional and financial conflicts. The method is through historical didactic presentations, case studies, presentations of methods for problem solving in ethical matters, and classroom debates on contemporary ethical issues. The faculty will be drawn from national experts and faculty from religious studies, journalism, and law from the UC Berkeley campus.  TBD  3 
The Social Life of Computing 
ISF 100J CCN: 25474 
TTh 3:30-5 Barrows 20  In this class, we will look at computing as a social phenomenon: to see it not just as a technology that transforms but to see it as a technology that has evolved, and is being put to use, in very particular ways, by particular groups of people. We will be doing this by employing a variety of methods, primarily historical and ethnographic, oriented around a study of practices. We will pay attention to technical details but ground these technical details in social organization (a term whose meaning should become clearer and clearer as the class progresses). We will study the social organization of computing around different kinds of hardware, software, ideologies, and ideas.  Shreeharsh Kelkar  4 
Behind the Data: Humans and Values 
INFO 188 CCN: 34266 
TTh 12:30-2pm 
This course blends social and historical perspectives on data with ethics, law, policy, and case examples to help students understand current ethical and legal issues in data science and machine learning. Legal, ethical, and policy-related concepts addressed include: research ethics; privacy and surveillance; bias and discrimination; and oversight and accountability. These issues will be addressed throughout the lifecycle of data, from collection to storage to analysis and application. The course emphasizes strategies, processes, and tools for attending to ethical and legal issues in data science work. Course assignments emphasize researcher and practitioner reflexivity, allowing students to explore their own social and ethical commitments.
Deirdre Kathleen Mulligan  3 
Data Enabled Courses
These courses are taught in a way that permits students to build on Data 8. Please review the prerequisites.
Title 
Course Number 
Time & Location 
Description 
Instructor 
Units 
Urban Informatics and Visualization 
CP 255 CCN: 20224 
MW 12:30-2 pm 
This is a hands-on course that trains students to analyze urban data using statistical and machine learning tools, develop indicators, and create visualizations and maps using the Python programming language, open source libraries, and public data to address urban challenges such as transportation accessibility and housing affordability. The course will first introduce the fundamentals of programming in Python before moving on to a survey of data analysis and visualization methods. Classroom sessions will include lectures and workshops. A series of exercises will reinforce the skills and topics being presented, and a final project will provide an opportunity for students to develop a more complete project from harvesting data from Open Data portals to synthesizing and analyzing those data to explore a question or problem, to communicating their results in a web map and blog, as well as a final presentation. 
Paul Waddell 
3 
Introduction to Machine Learning 
COMPSCI 189 CCN: 
TuTh 9:30-11 am 
Theoretical foundations, algorithms, methodologies, and applications for machine learning. Topics may include supervised methods for regression and classification (linear models, trees, neural networks, ensemble methods, instance-based methods); generative and discriminative probabilistic models; Bayesian parametric learning; density estimation and clustering; Bayesian networks; time series models; dimensionality reduction; programming projects covering a variety of real-world applications.
Moritz Hardt, Benjamin Recht, Stella Xingxing Yu 
4 
Technological and social networks: Theory and analysis 
COMPSCI 194 031 CCN: 
M 5-6:30pm Etcheverry 3109 
This course will take a computational approach to the study of technological (like the Internet or WWW) and social (like Facebook and Twitter) networks. It will follow the textbook Networks, Crowds and Markets but take a more computational approach. Prerequisites: CS 61A (or strong Python programming experience) and Math 1B.  Eric J. Friedman  4 
Demographic Methods: Introduction to Population Analysis 
DEMOG 110 CCN: 
TuTh 3:30-5 pm Moffitt 102 
Measures and methods of Demography. Life tables, fertility and nuptiality measures, age pyramids, population projection, measures of fertility control.  TBD  3 
Machine learning for sequential decision making under uncertainty 
EE 194 CCN: 34087 
TuTh 2-3:30 pm 306 Soda 
This course is about learning to make decisions that are embedded in time and in an uncertain environment. What does it even mean to do well in such settings and how can we evaluate performance? What if we do not fully trust a probabilistic model? What if there are game-theoretic or adversarial aspects? How can we intelligently navigate the tension between exploration (figuring out what is going on), exploitation (reaping the rewards of what we have learned), and defense (preventing a potentially adversarial environment from exploiting us!)? How does this change if our feedback from the environment is delayed or sparse? 
Anant Sahai, Vidya Muthukumar  4 
Statistical Learning for Energy and Environment 
ENERES 190C CCN: 
TuTh 9:30-11 am 102 Wheeler Hall 
This course will teach students to build, estimate and interpret models that describe phenomena in the broad area of energy and environmental decision-making. The effort will be divided between (i) learning a suite of data-driven modeling approaches, (ii) building the programming and computing tools to use those models and (iii) developing the expertise to formulate questions that are appropriate for available data and models. Students will leave the course as both critical consumers and responsible producers of data-driven analysis. 
Duncan Callaway  4 
Data Science in Global Change Ecology 
ESPM 157 CCN: 
MF 12-2pm Barrows 110 
Many of the greatest challenges we face today come from understanding and interacting with the natural world: from global climate change to the sudden collapse of fisheries and forests, from the spread of disease and invasive species to the unknown wealth of medical, cultural, and technological value we derive from nature. Advances in satellites and microsensors, computation, informatics and the Internet have made available unprecedented amounts of data about the natural world, and with it, new challenges of sifting, processing and synthesizing large and diverse sources of information. In this course, students will learn and apply fundamental computing, statistics and modeling concepts to a series of real-world ecological and environmental problems.  Carl Boettiger  4 
Applied Data Science with Venture Applications 
IND ENG 135 CCN: 
TTh 12:30-2pm Evans 10 
This highly applied course surveys a variety of key concepts and tools that are useful for designing and building applications that process data signals of information. The course introduces modern open source, computer programming tools, libraries, and code samples that can be used to implement data applications. The mathematical concepts highlighted in this course include filtering, prediction, classification, decision-making, Markov chains, LTI systems, spectral analysis, and frameworks for learning from data. Each math concept is linked to implementation in Python using libraries for math array functions (NumPy), manipulation of tables (Pandas), long term storage (SQL, JSON, CSV files), natural language (NLTK), and ML frameworks. 
Ikhlaq Sidhu, Alexander S. Fred Ojala 
3 
Introduction to Machine Learning and Data Analytics 
IND ENG 142 CCN: 28029 
TTh 3:30-5pm LeConte 3 
This course introduces students to key techniques in machine learning and data analytics through a diverse set of examples using real datasets from domains such as e-commerce, healthcare, social media, sports, the Internet, and more. Through these examples, exercises in R, and a comprehensive team project, students will gain experience understanding and applying techniques such as linear regression, logistic regression, classification and regression trees, random forests, boosting, text mining, data cleaning and manipulation, data visualization, network analysis, time series modeling, clustering, principal component analysis, regularization, and large-scale learning.  Paul Grigas  3 
Natural Language Processing 
INFO 159 CCN: 
TuTh 3:30-5pm LeConte 4 
This course introduces students to natural language processing and exposes them to the variety of methods available for reasoning about text in computational systems. NLP is deeply interdisciplinary, drawing on both linguistics and computer science, and helps drive much contemporary work in text analysis (as used in computational social science, the digital humanities, and computational journalism). We will focus on major algorithms used in NLP for various applications (part-of-speech tagging, parsing, coreference resolution, machine translation) and on the linguistic phenomena those algorithms attempt to model. Students will implement algorithms and create linguistically annotated data on which those algorithms depend.  David Bamman  3 
Introduction to Computational Techniques in Physics 
PHYSICS 77 CCN: 
M 2-4pm Evans 60 
Introductory scientific programming in Python with examples from physics. Topics include: visualization, statistics and probability, regression, numerical integration, simulation, data modeling, function approximation, and algebraic systems. Recommended for freshman physics majors.  Yury Kolomensky  3 
Data Science and Bayesian Statistics for Physical Sciences 
PHYSICS 151 CCN: 
MW 11-12:30pm 251 LeConte 
Get acquainted with modern computational methods used in physical sciences, including numerical analysis methods, data science and Bayesian statistics. 
Uroš Seljak 
3 
Data Science for Research Psychology 
PSYCH 101D CCN: 
TuTh 3:30-5 pm 
This Python-based course builds upon the inferential and computational thinking skills developed in the Foundations of Data Science course by tying them to the classical statistical and research approaches used in Psychology. Topics include experimental design, control variables, reproducibility in science, probability distributions, parametric vs. nonparametric statistics, hypothesis tests (t-tests, one- and two-way ANOVA, chi-squared and odds ratio), linear regression and correlation. 
TBD  4 
Concepts in Computing with Data 
STAT 133 CCN: 
MWF 8-9am Dwinelle 155 
An introduction to computationally intensive applied statistics. Topics will include organization and use of databases, visualization and graphics, statistical learning and data mining, model validation procedures, and the presentation of results. 
Gaston Sanchez Trujillo 
3 
Reproducible and Collaborative Statistical Data Science 
STAT 159 CCN: 
MW 10am-12 pm Barrows 126 
A project-based introduction to statistical data analysis. Through case studies, computer laboratories, and a term project, students will learn practical techniques and tools for producing statistically sound and appropriate, reproducible, and verifiable computational answers to scientific questions. Course emphasizes version control, testing, process automation, code review, and collaborative programming. Software tools may include Bash, Git, Python, and LaTeX.  Philip Stark  4 