Fall 2018 Courses


Course Number
Times & Locations
Foundations of Data Science (Data 8)


CCN: 27696 



Wheeler 150

Foundations of data science from three perspectives: inferential thinking, computational thinking, and real-world relevance. Given data arising from some real-world phenomenon, how does one analyze that data so as to understand that phenomenon? The course teaches critical concepts and skills in computer programming and statistical inference, in conjunction with hands-on analysis of real-world datasets, including economic data, document collections, geographical data, and social networks. It delves into social and legal issues surrounding data analysis, including issues of privacy and data ownership. David Wagner & Ani Adhikari 4
Principles & Techniques of Data Science (Data 100)


CCN: 25289



Wheeler 150

In this course, students will explore the data science lifecycle, including question formulation, data collection and cleaning, exploratory data analysis and visualization, statistical inference and prediction, and decision-making. This class will focus on quantitative critical thinking and key principles and techniques needed to carry out this cycle. These include languages for transforming, querying and analyzing data; algorithms for machine learning methods including regression, classification and clustering; principles behind creating informative data visualizations; statistical concepts of measurement error and prediction; and techniques for scalable data processing. Joshua A. Hug, Fernando Perez 4
Probability for Data Science

STAT 140

CCN: 31468



Valley Life Sciences 2050

An introduction to probability, emphasizing the combined use of mathematics and programming to solve problems. Random variables, discrete and continuous families of distributions. Bounds and approximations. Dependence, conditioning, Bayes methods. Convergence, Markov chains. Least squares prediction. Random permutations, symmetry, order statistics. Use of numerical computation, graphics, simulation, and computer algebra. Ani Adhikari 4


Course Number
Times & Location

Python for Earth Science

EPS 88

CCN: 32879

F 10-12 Evans 458 Earthquakes and El Ninos are examples of natural hazards in California. The course uses Python/Jupyter Notebook and real-world observations to introduce students to these and other Earth phenomena and their underlying physics. The students will learn how to access and visualize the data, extract signals, and make probability forecasts. The final module is a project that synthesizes the course material to make a probabilistic forecast. The course will be co-taught by a team of EPS faculty, and the focus of each semester will depend on the expertise of the faculty in charge.

Doug Dreger and Maggie Avery


Data Science for Smart Cities


CCN: 27752

M 12-2 Davis 406 Cities are becoming more dependent on the data flows that connect infrastructures to one another and users to infrastructures. Designing and operating smart, efficient, and resilient cities now requires data science skills. This course provides an introduction to working with data generated within transportation systems, power grids, and communication networks, as well as data collected via crowd-sensing and remote sensing technologies, to build demand- and supply-side urban services based on data analytics. Alexey Pozdnukhov 2
Computational Structures in Data Science


CCN: 30996

M 2-4 LeConte 4 Development of Computer Science topics appearing in Foundations of Data Science (C8); expands computational concepts and techniques of abstraction. Understanding the structures that underlie the programs, algorithms, and languages used in data science and elsewhere. Mastery of a particular programming language while studying general techniques for managing program complexity, e.g., functional, object-oriented, and declarative programming. Provides practical experience with composing larger systems through several significant programming projects. David E. Culler 2
Immigration: What Do the Data Tell Us?


CCN: 25929

M 2-4 2232 Piedmont 100 Humans are a migratory species like no other. As hunter-gatherers, humans migrated from East Africa to every currently inhabited place on earth--except for a few Pacific islands and that research station in Antarctica. During modern times humans continue to migrate in astounding numbers from poor countries to rich countries; from rural to urban areas; as refugees and as laborers both with and without the consent of receiving countries. This course will cover the small but important part of the rich history of human migration that deals with the population of the United States--focusing on the period between 1850 and the present. Since its founding, conflict over immigration policies has periodically risen to the top of the American political agenda, often masking or exacerbating other sources of conflict. Understanding past immigration policies thus provides a lens through which we can view both the broad contours of US history and the particular situation in which we find ourselves today. Carl Mason 2

Making History Count: The Anthropocene and Data Science


CCN: 32177



Cory 105

Some geologists have proposed that we live in a new stratigraphic age, the Anthropocene--the epoch of humanity. In this data science connector course, we will explore the history of this new geological era through a combination of traditional history and data science. We will look at how the Industrial Revolution, global trade and empire, and the unprecedented take-off of mid-20th-century material prosperity changed the planet. You will learn the skills of a historian: how to tell solid stories about complicated things, how to read efficiently, how to write clearly and convincingly, and how to understand current events as part of a historical process. Note: We will play around with big data sets and descriptive statistics, but it’s unlikely we will delve deep into programming.

Brendan Mackie 2
Aesthetics and Data

L&S 88-1

CCN: 25859

Tu 5-7 Cory 105 When we visualize data, we give it an aesthetic shape that did not previously exist. In this course, we will develop the basic aesthetic literacy needed to critically consider the ways we present information. First, we will study the concepts that art critics use to describe the sensory, emotional, and political qualities of art. Then, we will consider how these concepts might be useful in the field of data analysis, both by making us aware of the ways in which data is shaped and manipulated and by helping us to perceive meaning where patterns are not readily perceptible. Through several writing assignments, students will develop their appreciation of visual form as well as the writing skills needed to communicate their ideas broadly. Rebecca Gaydos 2
Broken down by age and sex: data science and demography

L&S 88-2

CCN: 25860

M 12-2 Barrows 122 Demography is the science of populations and how they change — including death, sex, migration, marriage, and more. Today, demography is a critical part of answering the most pressing questions that face populations all over the world: why do some countries become rich, while some remain poor? Which forces guide the shifting landscape of politics and voting? Who is most affected by the opioid epidemic? In this connector, we will take a tour of cutting-edge problems in demography and how data science can be used to help address them. Dennis Feehan 2
Sports Analytics

L&S 88-3

CCN: 25861

M 2-4 Barrows 122

The principles of data science meet sports analytics. What makes a good hitter in baseball? How do you measure that? What are the flaws of plus/minus in basketball? Do Steph Curry or Klay Thompson ever get a hot hand? When should a coach go for it on 4th down? This course covers a wide range of topics on the analytical thinking behind the data revolution in sports and explores data science through the lens of sports analytics.

Alex Papanicolaou 2
Data Science Applications in Physics


CCN: 32824



Evans 60

Introduction to data science with applications to physics. Topics include: statistics and probability in physics, modeling of physical systems and data, numerical integration and differentiation, function approximation. Connector course for Data Science 8, room-shared with Physics 77. Recommended for freshmen intending to major in physics or engineering with emphasis on data science.

Yury Kolomensky 2
Probability and Mathematical Statistics in Data Science


CCN: 24294

TTh 1-2 LeConte 4 In this connector course we will state precisely and prove results discovered in the foundational data science course through working with data. Topics include: total variation distance between discrete distributions; the mean, standard deviation, and tail bounds; correlation, and the derivation of the regression equation; probabilities, random variables, and the Central Limit Theorem; probabilistic models; symmetries in random permutations; prior and posterior distributions, and Bayes’ rule. Shobhana Murali Stoyanov 2
Data and Decisions

UGBA 96-2

CCN: 16898

M 2-4 The objective of the course is to provide an understanding of how data and statistical analysis can improve managerial decision-making. Students learn how to ask the right questions, find or collect relevant data, and apply appropriate statistical methods to solve problems and make better business decisions. We will explore statistical methods for gleaning insights from economic and social data, with an emphasis on approaches to identifying causal relationships. We will discuss how to design and analyze randomized experiments and introduce econometric methods for estimating causal effects in non-experimental data. This course, in combination with the Foundations course, satisfies the statistics prerequisite for admissions to Haas. Conrad Miller 2
Data and Decisions

UGBA 96-3

CCN: 32610

M 4-6 The objective of the course is to provide an understanding of how data and statistical analysis can improve managerial decision-making. Students learn how to ask the right questions, find or collect relevant data, and apply appropriate statistical methods to solve problems and make better business decisions. We will explore statistical methods for gleaning insights from economic and social data, with an emphasis on approaches to identifying causal relationships. We will discuss how to design and analyze randomized experiments and introduce econometric methods for estimating causal effects in non-experimental data. This course, in combination with the Foundations course, satisfies the statistics prerequisite for admissions to Haas. Conrad Miller 2

Human Contexts & Ethics

Course Number
Times & Location
Introduction to Science, Technology, and Society


CCN: 31147

Dwinelle 145

This course explores how data science is entangled with diverse human contexts (histories, institutions, and material bases) and ethics (domains of value-laden choice). We will bring historically grounded perspectives as well as frameworks and methods from Science, Technology, and Society (STS) (such as cross-national comparison, co-production, and controversy studies) to bear on topics that include: doing ethical data science amid shifting definitions of human subjects, consent, and privacy; the changing relationship between data, democracy, and law; the role of data analytics in how corporations and governments provide public goods such as health and security to citizens; sensors, machine learning, and artificial intelligence and changing landscapes of labor, industry, and city life; and the implications of data for how publics and varied scientific disciplines know the world.

Cathryn Carson 4
Information Technology and Society


CCN: 32637

Wheeler 200
This course assesses the role of information technology in the digitalization of society by focusing on the deployment of e-government, e-commerce, e-learning, the digital city, telecommuting, virtual communities, Internet time, the virtual office, and the geography of cyberspace. Course will also discuss the role of information technology in the governance and economic development of society.  Michel S Laguerre 4
Ethics in Science and Engineering


CCN: 26884

5-6:30 Evans 10
The goal of this semester course is to present the issues of professional conduct in the practice of engineering, research, publication, public and private disclosures, and in managing professional and financial conflicts. The method is through historical didactic presentations, case studies, presentations of methods for problem solving in ethical matters, and classroom debates on contemporary ethical issues. The faculty will be drawn from national experts and faculty from religious studies, journalism, and law from the UC Berkeley campus.  TBD 3
The Social Life of Computing

ISF 100J

CCN: 25474

TTh 3:30-5 Barrows 20 In this class, we will look at computing as a social phenomenon: to see it not just as a technology that transforms but to see it as a technology that has evolved, and is being put to use, in very particular ways, by particular groups of people. We will be doing this by employing a variety of methods, primarily historical and ethnographic, oriented around a study of practices. We will pay attention to technical details but ground these technical details in social organization (a term whose meaning should become clearer and clearer as the class progresses). We will study the social organization of computing around different kinds of hardware, software, ideologies, and ideas.  Shreeharsh Kelkar 4

Behind the Data: Humans and Values

INFO 188

CCN: 34266

12:30-2 pm
This course blends social and historical perspectives on data with ethics, law, policy, and case examples to help students understand current ethical and legal issues in data science and machine learning. Legal, ethical, and policy-related concepts addressed include: research ethics; privacy and surveillance; bias and discrimination; and oversight and accountability. These issues will be addressed throughout the lifecycle of data--from collection to storage to analysis and application. The course emphasizes strategies, processes, and tools for attending to ethical and legal issues in data science work. Course assignments emphasize researcher and practitioner reflexivity, allowing students to explore their own social and ethical commitments.
Deirdre Kathleen Mulligan 3

Data Enabled Courses

These courses are taught in a way that permits students to build on Data 8. Please review the prerequisites.

Course Number
Time & Location

Urban Informatics and Visualization

CP 255 

CCN: 20224


12:30-2 pm

Wurster 214

This is a hands-on course that trains students to analyze urban data using statistical and machine learning tools, develop indicators, and create visualizations and maps using the Python programming language, open source libraries, and public data to address urban challenges such as transportation accessibility and housing affordability.  The course will first introduce the fundamentals of programming in Python before moving on to a survey of data analysis and visualization methods. Classroom sessions will include lectures and workshops. A series of exercises will reinforce the skills and topics being presented, and a final project will provide an opportunity for students to develop a more complete project from harvesting data from Open Data portals to synthesizing and analyzing those data to explore a question or problem, to communicating their results in a web map and blog, as well as a final presentation.

Paul Waddell


Introduction to Machine Learning





9:30 - 11 am

Li Ka Shing 245

Theoretical foundations, algorithms, methodologies, and applications for machine learning. Topics may include supervised methods for regression and classification (linear models, trees, neural networks, ensemble methods, instance-based methods); generative and discriminative probabilistic models; Bayesian parametric learning; density estimation and clustering; Bayesian networks; time series models; dimensionality reduction; programming projects covering a variety of real-world applications.

Moritz Hardt, Benjamin Recht, Stella Xingxing Yu


Technological and social networks: Theory and analysis

COMPSCI 194 031





Etcheverry 3109

This course will take a computational approach to the study of technological (like the Internet or WWW) and social (like Facebook and Twitter) networks. It will follow the textbook Networks, Crowds and Markets, with more emphasis on computation than the text itself. Prerequisites: CS 61A (or strong Python programming experience) and Math 1B. Eric J. Friedman 4
Demographic Methods: Introduction to Population Analysis





3:30-5 pm

Moffitt 102

Measures and methods of Demography. Life tables, fertility and nuptiality measures, age pyramids, population projection, measures of fertility control. TBD 3
Machine learning for sequential decision making under uncertainty

EE 194




2-3:30 pm 306 Soda

This course is about learning to make decisions that are embedded in time and in an uncertain environment. What does it even mean to do well in such settings and how can we evaluate performance? What if we do not fully trust a probabilistic model? What if there are game-theoretic or adversarial aspects? How can we intelligently navigate the tension between exploration (figuring out what is going on), exploitation (reaping the rewards of what we have learned), and defense (preventing a potentially adversarial environment from exploiting us!)? How does this change if our feedback from the environment is delayed or sparse?

Anant Sahai, Vidya Muthukumar 4
Statistical Learning for Energy and Environment




TuTh 9:30-11 am

102 Wheeler Hall

This course will teach students to build, estimate and interpret models that describe phenomena in the broad area of energy and environmental decision-making. The effort will be divided between (i) learning a suite of data-driven modeling approaches, (ii) building the programming and computing tools to use those models and (iii) developing the expertise to formulate questions that are appropriate for available data and models. Students will leave the course as both critical consumers and responsible producers of data-driven analysis.

Duncan Callaway 4
Data Science in Global Change Ecology

ESPM 157





Barrows 110

Many of the greatest challenges we face today come from understanding and interacting with the natural world: from global climate change to the sudden collapse of fisheries and forests, from the spread of disease and invasive species to the unknown wealth of medical, cultural, and technological value we derive from nature. Advances in satellites and micro-sensors, computation, informatics and the Internet have made available unprecedented amounts of data about the natural world, and with it, new challenges of sifting, processing and synthesizing large and diverse sources of information. In this course, students will learn and apply fundamental computing, statistics and modeling concepts to a series of real-world ecological and environmental problems. Carl Boettiger 4
Applied Data Science with Venture Applications






Evans 10

This highly applied course surveys a variety of key concepts and tools that are useful for designing and building applications that process data signals of information. The course introduces modern open-source computer programming tools, libraries, and code samples that can be used to implement data applications. The mathematical concepts highlighted in this course include filtering, prediction, classification, decision-making, Markov chains, LTI systems, spectral analysis, and frameworks for learning from data. Each math concept is linked to an implementation in Python using libraries for math array functions (NumPy), manipulation of tables (Pandas), long-term storage (SQL, JSON, CSV files), natural language (NLTK), and ML frameworks.

Ikhlaq Sidhu, Alexander S. Fred Ojala

Introduction to Machine Learning and Data Analytics


CCN: 28029



LeConte 3

This course introduces students to key techniques in machine learning and data analytics through a diverse set of examples using real datasets from domains such as e-commerce, healthcare, social media, sports, the Internet, and more. Through these examples, exercises in R, and a comprehensive team project, students will gain experience understanding and applying techniques such as linear regression, logistic regression, classification and regression trees, random forests, boosting, text mining, data cleaning and manipulation, data visualization, network analysis, time series modeling, clustering, principal component analysis, regularization, and large-scale learning. Paul Grigas 3
Natural Language Processing

INFO 159





LeConte 4

This course introduces students to natural language processing and exposes them to the variety of methods available for reasoning about text in computational systems. NLP is deeply interdisciplinary, drawing on both linguistics and computer science, and helps drive much contemporary work in text analysis (as used in computational social science, the digital humanities, and computational journalism). We will focus on major algorithms used in NLP for various applications (part-of-speech tagging, parsing, coreference resolution, machine translation) and on the linguistic phenomena those algorithms attempt to model. Students will implement algorithms and create linguistically annotated data on which those algorithms depend. David Bamman 3
Introduction to Computational Techniques in Physics






Evans 60

Introductory scientific programming in Python with examples from physics. Topics include: visualization, statistics and probability, regression, numerical integration, simulation, data modeling, function approximation, and algebraic systems. Recommended for freshman physics majors. Yury Kolomensky 3
Data Science and Bayesian Statistics for Physical Sciences






251 LeConte

Get acquainted with modern computational methods used in physical sciences, including numerical analysis methods, data science and Bayesian statistics.

Uroš Seljak


Data Science for Research Psychology




3:30-5 pm

This Python-based course builds upon the inferential and computational thinking skills developed in the Foundations of Data Science course by tying them to the classical statistical and research approaches used in Psychology. Topics include experimental design, control variables, reproducibility in science, probability distributions, parametric vs. non-parametric statistics, hypothesis tests (t-tests, one- and two-way ANOVA, chi-squared and odds-ratio), linear regression and correlation.

Concepts in Computing with Data

STAT 133





Dwinelle 155

An introduction to computationally intensive applied statistics. Topics will include organization and use of databases, visualization and graphics, statistical learning and data mining, model validation procedures, and the presentation of results.

Gaston Sanchez Trujillo

Reproducible and Collaborative Statistical Data Science

STAT 159




10 am-12 pm

Barrows 126

A project-based introduction to statistical data analysis. Through case studies, computer laboratories, and a term project, students will learn practical techniques and tools for producing statistically sound and appropriate, reproducible, and verifiable computational answers to scientific questions. Course emphasizes version control, testing, process automation, code review, and collaborative programming. Software tools may include Bash, Git, Python, and LaTeX. Philip Stark 4