Summer 2018
Title 
Course Number 
Times & Locations 
Description 
Instructor 
Units 

Foundations of Data Science (Data 8) 
STAT C8/COMPSCI C8 CCN: 14344 
Summer C: MTWTF 9-10 Dwinelle 155 
Foundations of data science from three perspectives: inferential thinking, computational thinking, and real-world relevance. Given data arising from some real-world phenomenon, how does one analyze that data so as to understand that phenomenon? The course teaches critical concepts and skills in computer programming and statistical inference, in conjunction with hands-on analysis of real-world datasets, including economic data, document collections, geographical data, and social networks. It delves into social and legal issues surrounding data analysis, including issues of privacy and data ownership. 
Vinitra Swamy, Fahad Kamran, Deborah Nolan 
4 
Environmental Health and Development 
ESPM C167/PBHLTH C160 CCN: 14966 
Summer D: MF 9-11, W 9-11:30 Haviland 12 
The health effects of environmental alterations caused by development programs and other human activities in both developing and developed areas. Case studies will contextualize methodological information and incorporate a global perspective on environmentally mediated diseases in diverse populations. Topics include water management; population change; toxics; energy development; air pollution; climate change; chemical use, etc. 
TBD 
4 
Demographic Methods: Introduction to Population Analysis 
DEMOG 110 CCN: 14956 
Summer A: MTWTh 12-2 Barrows 170 
Measures and methods of Demography. Life tables, fertility and nuptiality measures, age pyramids, population projection, measures of fertility control. 
TBD 
3 
Social Networks 
DEMOG 180 CCN: 14961 
Summer A: 
The science of social networks focuses on measuring, modeling, and understanding the different ways that people are connected to one another. We will use a broad toolkit of theories and methods drawn from the social, natural, and mathematical sciences to learn what a social network is, to understand how to work with social network data, and to illustrate some of the ways that social networks can be useful in theory and in practice. We will see that network ideas are powerful enough to be used everywhere from UNAIDS, where network models help epidemiologists prevent the spread of HIV, to Silicon Valley, where data scientists use network ideas to build products that enable people all across the globe to connect with one another. 
TBD 
3 
Fall 2018 Courses
Backbone
Title 
Course Number 
Times & Locations 
Description 
Instructor 
Units 
Foundations of Data Science (Data 8) 
STAT/COMPSCI C8 CCN: 27696 
MWF 9-10am Wheeler 150 
Foundations of data science from three perspectives: inferential thinking, computational thinking, and real-world relevance. Given data arising from some real-world phenomenon, how does one analyze that data so as to understand that phenomenon? The course teaches critical concepts and skills in computer programming and statistical inference, in conjunction with hands-on analysis of real-world datasets, including economic data, document collections, geographical data, and social networks. It delves into social and legal issues surrounding data analysis, including issues of privacy and data ownership. 
David Wagner & Ani Adhikari 
4 
Principles & Techniques of Data Science (Data 100) 
STAT/COMPSCI C100 CCN: 25289 
TTh 6:30-8pm Wheeler 150 
In this course, students will explore the data science lifecycle, including question formulation, data collection and cleaning, exploratory data analysis and visualization, statistical inference and prediction, and decision-making. This class will focus on quantitative critical thinking and key principles and techniques needed to carry out this cycle. These include languages for transforming, querying and analyzing data; algorithms for machine learning methods including regression, classification and clustering; principles behind creating informative data visualizations; statistical concepts of measurement error and prediction; and techniques for scalable data processing. 
Joshua A. Hug, Fernando Perez 
4 
Probability for Data Science 
STAT 140 CCN: 31468 
MW 5-6:30pm Valley Life Sciences 2050 
An introduction to probability, emphasizing the combined use of mathematics and programming to solve problems. Random variables, discrete and continuous families of distributions. Bounds and approximations. Dependence, conditioning, Bayes methods. Convergence, Markov chains. Least squares prediction. Random permutations, symmetry, order statistics. Use of numerical computation, graphics, simulation, and computer algebra. 
Ani Adhikari 
4 
Connectors
Title 
Course Number 
Times & Location 
Description 
Instructor 
Units 
Python for Earth Science 
EPS 88 CCN: 32879 
F 10-12 Evans 458

Earthquakes and El Niños are examples of natural hazards in California. The course uses Python/Jupyter Notebook and real-world observations to introduce students to these and other Earth phenomena and their underlying physics. The students will learn how to access and visualize the data, extract signals, and make probability forecasts. The final module is a project that synthesizes the course material to make a probabilistic forecast. The course will be co-taught by a team of EPS faculty, and the focus of each semester will depend on the expertise of the faculty in charge. 
Doug Dreger and Maggie Avery 
2 
Data Science for Smart Cities 
CIVENG 88 CCN: 27752 
M 12-2 Davis 406 
Cities are becoming more dependent on the data flows that connect infrastructures to one another and users to infrastructures. Design and operation of smart, efficient, and resilient cities nowadays require data science skills. This course provides an introduction to working with data generated within transportation systems, power grids, and communication networks, as well as data collected via crowdsensing and remote sensing technologies, to build demand- and supply-side urban services based on data analytics.

2  
Computational Structures in Data Science 
COMPSCI 88 CCN: 30996 
M 2-4 LeConte 4 
Development of Computer Science topics appearing in Foundations of Data Science (C8); expands computational concepts and techniques of abstraction. Understanding the structures that underlie the programs, algorithms, and languages used in data science and elsewhere. Mastery of a particular programming language while studying general techniques for managing program complexity, e.g., functional, object-oriented, and declarative programming. Provides practical experience with composing larger systems through several significant programming projects. 
David E. Culler 
2 
Immigration: What Do the Data Tell Us? 
DEMOG 88 CCN: 25929 
M 2-4 2232 Piedmont 100 
Humans are a migratory species like no other. As hunter-gatherers, humans migrated from East Africa to every currently inhabited place on earth, except for a few Pacific islands and that research station in Antarctica. During modern times humans continue to migrate in astounding numbers from poor countries to rich countries; from rural to urban areas; as refugees and as laborers, both with and without the consent of receiving countries. This course will cover the small but important part of the rich history of human migration that deals with the population of the United States, focusing on the period between 1850 and the present. Since the nation's founding, conflict over immigration policies has periodically risen to the top of the American political agenda, often masking or exacerbating other sources of conflict. Understanding past immigration policies thus provides a lens through which we can view both the broad contours of US history and the particular situation in which we find ourselves today.

2  
How Does History Count? 
HISTORY 88 CCN: 32177 
F 10am-12pm Dwinelle 3205 
In this connector course, we will explore how historical data becomes historical evidence and how recent technological advances affect long-established practices, such as close attention to historical context and contingency. Will the advent of fast computing and big data make history “count” more or lead to unprecedented insights into the study of change over time? During our weekly discussions, we will apply what we learn in lectures and labs to the analysis of selected historical sources and get an understanding of constructing historical datasets. We will also consider scholarly debates over quantitative evidence and historical argument.

2  
Data Science Applications in Physics 
PHYSICS 88 CCN: 32824 
M 2-4pm Evans 60 

Yury Kolomensky 
2 
Probability and Mathematical Statistics in Data Science 
STAT 88 CCN: 24294 
TTh 12 LeConte 4 
In this connector course we will state precisely and prove results discovered in the foundational data science course through working with data. Topics include: total variation distance between discrete distributions; the mean, standard deviation, and tail bounds; correlation, and the derivation of the regression equation; probabilities, random variables, and the Central Limit Theorem; probabilistic models; symmetries in random permutations; prior and posterior distributions, and Bayes’ rule. 
Shobhana Murali Stoyanov 
2 
Data and Decisions 
UGBA 96-2 CCN: 16898 
M 2-4 
The objective of the course is to provide an understanding of how data and statistical analysis can improve managerial decision-making. Students learn how to ask the right questions, find or collect relevant data, and apply appropriate statistical methods to solve problems and make better business decisions. We will explore statistical methods for gleaning insights from economic and social data, with an emphasis on approaches to identifying causal relationships. We will discuss how to design and analyze randomized experiments and introduce econometric methods for estimating causal effects in non-experimental data. This course, in combination with the Foundations course, satisfies the statistics prerequisite for admissions to Haas. 
Conrad Miller 
2 
Data and Decisions 
UGBA 96-3 CCN: 32610 
M 4-6 
The objective of the course is to provide an understanding of how data and statistical analysis can improve managerial decision-making. Students learn how to ask the right questions, find or collect relevant data, and apply appropriate statistical methods to solve problems and make better business decisions. We will explore statistical methods for gleaning insights from economic and social data, with an emphasis on approaches to identifying causal relationships. We will discuss how to design and analyze randomized experiments and introduce econometric methods for estimating causal effects in non-experimental data. This course, in combination with the Foundations course, satisfies the statistics prerequisite for admissions to Haas. 
Conrad Miller 
2 
Human Contexts & Ethics
Title 
Course Number 
Times & Location 
Description 
Instructor 
Units 
Introduction to Science, Technology, and Society 
HISTORY C182C CCN: 31147 
MWF 12pm Dwinelle 145

This course explores how data science is entangled with diverse human contexts (histories, institutions, and material bases) and ethics (domains of value-laden choice). We will bring historically grounded perspectives as well as frameworks and methods from Science, Technology, and Society (STS) (such as cross-national comparison, co-production, and controversy studies) to bear on topics that include: doing ethical data science amid shifting definitions of human subjects, consent, and privacy; the changing relationship between data, democracy, and law; the role of data analytics in how corporations and governments provide public goods such as health and security to citizens; sensors, machine learning and artificial intelligence and changing landscapes of labor, industry, and city life; and the implications of data for how publics and varied scientific disciplines know the world. 
Cathryn Carson 
4 
Information Technology and Society 
AFRICAM 134 CCN: 32637 
M 3-6pm Wheeler 200

This course assesses the role of information technology in the digitalization of society by focusing on the deployment of e-government, e-commerce, e-learning, the digital city, telecommuting, virtual communities, Internet time, the virtual office, and the geography of cyberspace. The course will also discuss the role of information technology in the governance and economic development of society. 
Michel S Laguerre 
4 
Ethics in Science and Engineering 
BIOENG 100 CCN: 26884 
WF 5-6:30 Evans 10

The goal of this semester course is to present the issues of professional conduct in the practice of engineering, research, publication, public and private disclosures, and in managing professional and financial conflicts. The method is through historical didactic presentations, case studies, presentations of methods for problem solving in ethical matters, and classroom debates on contemporary ethical issues. The faculty will be drawn from national experts and faculty from religious studies, journalism, and law from the UC Berkeley campus. 
TBD 
3 
The Social Life of Computing 
ISF 100J CCN: 25474 
TTh 3:30-5 Barrows 20 
In this class, we will look at computing as a social phenomenon: to see it not just as a technology that transforms but to see it as a technology that has evolved, and is being put to use, in very particular ways, by particular groups of people. We will be doing this by employing a variety of methods, primarily historical and ethnographic, oriented around a study of practices. We will pay attention to technical details but ground these technical details in social organization (a term whose meaning should become clearer and clearer as the class progresses). We will study the social organization of computing around different kinds of hardware, software, ideologies, and ideas. 
Shreeharsh Kelkar 
4 
Data Enabled Courses
These courses are taught in a way that permits students to build on Data 8. Please review prerequisites.
Title 
Course Number 
Time & Location 
Description 
Instructor 
Units 
Technological and social networks: Theory and analysis 
COMPSCI 194 031 CCN: 
MW 5-6:30 Etcheverry 3109 
This course will take a computational approach to the study of technological (like the Internet or WWW) and social (like Facebook and Twitter) networks. It will follow the textbook Networks, Crowds and Markets but take a more computational approach. Prerequisites: CS 61A (or strong Python programming experience) and Math 1B.

Eric J. Friedman 
4 
Demographic Methods: Introduction to Population Analysis 
DEMOG 110 CCN: 
TuTh 3:30-5pm Moffitt 102 
Measures and methods of Demography. Life tables, fertility and nuptiality measures, age pyramids, population projection, measures of fertility control.

TBD 
3 
Statistical Learning for Energy and Environment 
ENERES 190C CCN: 
102 Wheeler Hall, TTh 9:30-11 
This course will teach students to build, estimate and interpret models that describe phenomena in the broad area of energy and environmental decision-making. The effort will be divided between (i) learning a suite of data-driven modeling approaches, (ii) building the programming and computing tools to use those models and (iii) developing the expertise to formulate questions that are appropriate for available data and models. Students will leave the course as both critical consumers and responsible producers of data-driven analysis. 
Duncan Callaway 
4 
Data Science in Global Change Ecology 
ESPM 157 CCN: 
MF 12-2pm Barrows 110 
Many of the greatest challenges we face today come from understanding and interacting with the natural world: from global climate change to the sudden collapse of fisheries and forests, from the spread of disease and invasive species to the unknown wealth of medical, cultural, and technological value we derive from nature. Advances in satellites and microsensors, computation, informatics and the Internet have made available unprecedented amounts of data about the natural world, and with it, new challenges of sifting, processing and synthesizing large and diverse sources of information. In this course, students will learn and apply fundamental computing, statistics and modeling concepts to a series of real-world ecological and environment. 
Carl Boettiger 
4 
Applied Data Science with Venture Applications 
IND ENG 135 CCN: 
TTh 12:30-2pm Evans 10 
This highly applied course surveys a variety of key concepts and tools that are useful for designing and building applications that process data signals of information. The course introduces modern open-source computer programming tools, libraries, and code samples that can be used to implement data applications. The mathematical concepts highlighted in this course include filtering, prediction, classification, decision-making, Markov chains, LTI systems, spectral analysis, and frameworks for learning from data. Each math concept is linked to implementation in Python using libraries for math array functions (NumPy), manipulation of tables (Pandas), long-term storage (SQL, JSON, CSV files), natural language (NLTK), and ML frameworks. 
Ikhlaq Sidhu, Alexander S. Fred-Ojala 
3 
Introduction to Machine Learning and Data Analytics 
IND ENG 142 CCN: 28029 
TTh 3:30-5pm LeConte 3 
This course introduces students to key techniques in machine learning and data analytics through a diverse set of examples using real datasets from domains such as e-commerce, healthcare, social media, sports, the Internet, and more. Through these examples, exercises in R, and a comprehensive team project, students will gain experience understanding and applying techniques such as linear regression, logistic regression, classification and regression trees, random forests, boosting, text mining, data cleaning and manipulation, data visualization, network analysis, time series modeling, clustering, principal component analysis, regularization, and large-scale learning. 
Paul Grigas 
3 
Natural Language Processing 
INFO 159 CCN: 
TuTh 3:30-5pm LeConte 4 
This course introduces students to natural language processing and exposes them to the variety of methods available for reasoning about text in computational systems. NLP is deeply interdisciplinary, drawing on both linguistics and computer science, and helps drive much contemporary work in text analysis (as used in computational social science, the digital humanities, and computational journalism). We will focus on major algorithms used in NLP for various applications (part-of-speech tagging, parsing, coreference resolution, machine translation) and on the linguistic phenomena those algorithms attempt to model. Students will implement algorithms and create linguistically annotated data on which those algorithms depend. 
David Bamman 
3 
Introduction to Computational Techniques in Physics 
PHYSICS 77 CCN: 
M 2-4pm Evans 60 
Introductory scientific programming in Python with examples from physics. Topics include: visualization, statistics and probability, regression, numerical integration, simulation, data modeling, function approximation, and algebraic systems. Recommended for freshman physics majors. 
Yury Kolomensky 
3 
Data Science and Bayesian Statistics for Physical Sciences 
PHYSICS 151 CCN: 
MW 11am-12:30pm 251 LeConte 
Get acquainted with modern computational methods used in physical sciences, including numerical analysis methods, data science and Bayesian statistics. 
Uroš Seljak 
3 
Data Science for Research Psychology 
PSYCH 101D CCN: 
TuTh 3:30-5 pm

This Python-based course builds upon the inferential and computational thinking skills developed in the Foundations of Data Science course by tying them to the classical statistical and research approaches used in Psychology. Topics include experimental design, control variables, reproducibility in science, probability distributions, parametric vs. nonparametric statistics, hypothesis tests (t-tests, one- and two-way ANOVA, chi-squared and odds ratio), linear regression and correlation. 
TBD 
4 
Concepts in Computing with Data 
STAT 133 CCN: 
MWF 8-9am Dwinelle 155 
An introduction to computationally intensive applied statistics. Topics will include organization and use of databases, visualization and graphics, statistical learning and data mining, model validation procedures, and the presentation of results. 
Gaston Sanchez Trujillo 
3 
Reproducible and Collaborative Statistical Data Science 
STAT 159 CCN: 
MW 10am-12pm Barrows 126 
A project-based introduction to statistical data analysis. Through case studies, computer laboratories, and a term project, students will learn practical techniques and tools for producing statistically sound and appropriate, reproducible, and verifiable computational answers to scientific questions. Course emphasizes version control, testing, process automation, code review, and collaborative programming. Software tools may include Bash, Git, Python, and LaTeX. 
Philip Stark 
4 