Spring 2018 Courses
Backbone
Title 
Course Number 
Times & Location 
Description 
Instructor 
Units 
Foundations of Data Science (Data 8) 
CS / INFO / STAT C8 CCN: 31678
MWF 10 - 11 am Wheeler 150
Foundations of data science from three perspectives: inferential thinking, computational thinking, and real-world relevance. Given data arising from some real-world phenomenon, how does one analyze that data so as to understand that phenomenon? The course teaches critical concepts and skills in computer programming and statistical inference, in conjunction with hands-on analysis of real-world datasets.
Anindita Adhikari  4 
Principles and Techniques of Data Science (Data 100) 
COMPSCI / STAT C100 CCN: 37227 
TuTh 11 - 12:30 pm Wheeler 150
In this course, students will explore the data science lifecycle, including question formulation, data collection and cleaning, exploratory data analysis and visualization, statistical inference and prediction, and decision-making. This class will focus on quantitative critical thinking and key principles and techniques needed to carry out this cycle. These include languages for transforming, querying and analyzing data; algorithms for machine learning methods including regression, classification, and clustering; principles behind creating informative data visualizations; statistical concepts of measurement error and prediction; and techniques for scalable data processing.
Joseph Edgar Gonzalez, Fernando Perez  4 
Probability for Data Science 
STAT 140 CCN: 32926 
MW 5 - 6:30 pm Latimer 120
An introduction to probability, emphasizing the combined use of mathematics and programming to solve problems. Random variables, discrete and continuous families of distributions. Bounds and approximations. Dependence, conditioning, Bayes methods. Convergence, Markov chains. Least squares prediction. Random permutations, symmetry, order statistics. Use of numerical computation, graphics, simulation, and computer algebra. 
Anindita Adhikari  4 
Connectors
Title 
Course Number 
Times & Location 
Description 
Instructor 
Units 

Data Science and the Mind 
COG SCI 88 CCN: 23161 
Tu 10 - 12
How does the human mind work? We explore this question by analyzing a range of data concerning such topics as human rationality and irrationality, human memory, how objects and events are represented in the mind, and the relation of language and cognition. This class provides young scientists with critical thinking and computing skills that will allow them to work with data in cognitive science and related disciplines. 
Dmetri Hayes  2 
Computational Structures in Data Science
COMPSCI 88 CCN: 41565
F 12 - 1 pm Soda 306
Development of Computer Science topics appearing in Foundations of Data Science (C8); expands computational concepts and techniques of abstraction. Understanding the structures that underlie the programs, algorithms, and languages used in data science and elsewhere. Mastery of a particular programming language while studying general techniques for managing program complexity, e.g., functional, object-oriented, and declarative programming. Provides practical experience with composing larger systems through several significant programming projects.
Gerald Friedland  2 
Data Science Applications in Geography 
GEOG 88 CCN: 33084 
MW 5 - 7 pm McCone 145
Data science methods are increasingly important in geography and earth science. This course introduces some of the particular challenges of working with spatial data arising from characteristics specific to such data. These issues will be explored in a series of modules deploying data science methods to investigate contemporary topics in geography and earth science, relating to climate science, hydrology, population census and remote sensing of the environment. No prior knowledge is assumed or expected. This class runs for the equivalent of 7 weeks only, with two 2-hour lecture/lab sessions each week. The first class meeting (an introductory session) will be 5 - 7 pm Mon 1/22/18 in McCone 535. Regular meetings will then be MW 5 - 7 pm from 2/21/18 until 4/11/18 in either McCone 535 or McCone 145 per the class syllabus.
David Bernard O'Sullivan  2 
Crime and Punishment: Taking the Measure of the US Justice System 
LEGALST 88 CCN: 32891 
Tu 8 - 10 am Kroeber 238
We will explore how data are used in the criminal justice system by exploring the debates surrounding mass incarceration and evaluating a number of different data sources that bear on police practices, incarceration, and criminal justice reform. Students will be required to think critically about the debates regarding criminal justice in the US and to work with various public data sets to assess the extent to which these data confirm or deny specific policy narratives. Building on skills from Foundations of Data Science, students will be required to use basic data management skills working in Python: data cleaning, aggregation, merging and appending data sets, collapsing variables, summarizing findings, and presenting data visualizations. 
Dag P. Macleod  2 
Children in the Developing World 
L&S 88-1 CCN: 42347
Tu 6 - 9 pm Cory 105
Child nutrition and education are important for adult success, but children in developing countries fall behind with nutritional consumption and have fewer educational opportunities. Household surveys are an instrumental tool for understanding factors associated with investments in children. We will use household data sets to explore relationships between nutrition and education outcomes and a variety of socioeconomic variables to establish an understanding of contextual elements which can hinder or promote child growth and learning. 
TBD  3 
Sports Analytics 
L&S 88-2 CCN: 42348
Tu 12 - 2 pm Cory 105
The principles of data science meet sports analytics. What makes a good hitter in baseball? How do you measure that? What are the flaws of plus/minus in basketball? Do Steph Curry or Klay Thompson ever get a hot hand? When should a coach go for it on 4th down? This course covers a wide range of topics on the analytical thinking behind the data revolution in sports and explores data science through the lens of sports analytics.
Alex Papanicolaou  2 
Immunotherapy of Cancer: Success and Failures 
MCELLBI 88 CCN: 32863 
M

We will work with a variety of datasets that describe a molecular view of cells and how they divide. We will learn about the processes that cause cells to become specialized (differentiate) and to give rise to cancer (transform). We will analyze data on genetic mutations in cancer that distinguish tumor cells from normal cells. We will learn how mutations are detected by the immune system and the basis of cancer immunotherapy. Finally, we will analyze data on clinical trials of cancer immunotherapy to define the correlates of success in curing the disease. Students are expected to gain an understanding of data that reveal the basics of cell physiology and cancer, how immunotherapies of cancer work, and their current limitations.
Nilabh Shastri 
2 
Data Science for Cognitive Neuroscience 
PSYCH 88 CCN: 42654 
Tu 3 - 6

Learn how the brain works and how you can use cutting-edge brain imaging and data analysis tools to study it. Students will learn how to formulate and test hypotheses about how the brain represents information, and perform analyses to derive conclusions.
Samy Abdel-Ghaffar, Michael Eickenberg  3
Probability and Mathematical Statistics in Data Science 
CCN: 30832 
TuTh 4 - 5 pm
In this connector course we will state precisely and prove results discovered in the foundational data science course through working with data. Topics include: total variation distance between discrete distributions; the mean, standard deviation, and tail bounds; correlation, and the derivation of the regression equation; probabilities, random variables, and the Central Limit Theorem; probabilistic models; symmetries in random permutations; prior and posterior distributions, and Bayes' rule. 
Rasmus Nielsen  2 
Linear Algebra for Data Science 
STAT 89A CCN: 30833 
TuTh 9:30 - 11 am
This connector will cover introductory topics in the mathematics of data science, focusing on discrete probability and linear algebra and the connections between them that are useful in modern theory and practice. We will focus on matrices and graphs as popular mathematical structures with which to model data, for example as models for term-document corpora, high-dimensional regression problems, ranking/classification of web data, adjacency properties of social network data, etc. Stat 89A (in its new, 4-unit Spring 2018 form) can be used as an alternate linear algebra corequisite for Data 100 this semester, along with Math 54 and EE 16A.
Michael William Mahoney  4 
Data & Decisions 
UGBA 96-4 CCN: 41284
Mon 2 - 4 pm
The objective of the course is to provide an understanding of how data and statistical analysis can improve managerial decision-making. Students learn how to ask the right questions, find or collect relevant data, and apply appropriate statistical methods to solve problems and make better business decisions. We will explore statistical methods for gleaning insights from economic and social data, with an emphasis on approaches to identifying causal relationships. We will discuss how to design and analyze randomized experiments and introduce econometric methods for estimating causal effects in nonexperimental data. This course, in combination with the Foundations course, satisfies the statistics prerequisite for admissions to Haas.
Conrad Miller  2 
Data & Decisions 
UGBA 96-5 CCN: 41285
Mon 4 - 6 pm
The objective of the course is to provide an understanding of how data and statistical analysis can improve managerial decision-making. Students learn how to ask the right questions, find or collect relevant data, and apply appropriate statistical methods to solve problems and make better business decisions. We will explore statistical methods for gleaning insights from economic and social data, with an emphasis on approaches to identifying causal relationships. We will discuss how to design and analyze randomized experiments and introduce econometric methods for estimating causal effects in nonexperimental data. This course, in combination with the Foundations course, satisfies the statistics prerequisite for admissions to Haas.
Conrad Miller  2 
Human Contexts and Ethics
Behind the Data: Humans and Values 
INFO 188 CCN: 42374 
TuTh 12:30 - 2 pm South Hall 202
This course provides an introduction to ethical and legal issues surrounding data and society, as well as hands-on experience with frameworks, processes, and tools for addressing them in practice. It blends social and historical perspectives on data with ethics, law, policy, and case examples (from Facebook’s “Emotional Contagion” experiment to controversies around search engine and social media algorithms, to self-driving cars) to help students develop a workable understanding of current ethical and legal issues in data science and machine learning. Legal, ethical, and policy-related concepts addressed include: research ethics; privacy and surveillance; bias and discrimination; and oversight and accountability. These issues will be addressed throughout the lifecycle of data, from collection to storage to analysis and application. The course emphasizes strategies, processes, and tools for attending to ethical and legal issues in data science work. Course assignments will emphasize researcher and practitioner reflexivity, allowing students to explore their own social and ethical commitments.
Deirdre Mulligan 
3 
Introduction to Science, Technology, and Society: Human Contexts and Ethics of Data 
Hist C182C / STS C100 / ISF C100G CCN: 42032 
MWF 1 - 2 pm Lewis 100
This course explores how data science is entangled with diverse human contexts (histories, institutions, and material bases) and ethics (domains of value-laden choice). We will bring historically grounded perspectives as well as frameworks and methods from Science, Technology, and Society (STS) (such as cross-national comparison, co-production, and controversy studies) to bear on topics that include: doing ethical data science amid shifting definitions of human subjects, consent, and privacy; the changing relationship between data, democracy, and law; the role of data analytics in how corporations and governments provide public goods such as health and security to citizens; sensors, machine learning and artificial intelligence and changing landscapes of labor, industry, and city life; and the implications of data for how publics and varied scientific disciplines know the world.

Cathryn Carson & Margot Boenig-Liptsin
4 
Additional Data-Enabled Courses
These courses are taught in a way that permits students to build on Data 8. Please review prerequisites. To add a proposed course to this list, please contact DSEP.
Title 
Course Number 
Times & Location 
Description 
Instructor 
Units 

Statistical Methods for Data Science 
STAT 28 CCN: 32673 
TuTh 12:30 - 2 pm Evans 60
This is a lower-division course that is a follow-up to STAT8/CS8 (Foundations of Data Science). The course will teach a broad range of statistical methods that are used to solve data problems. Topics will include group comparisons and ANOVA, standard parametric statistical models, multivariate data visualization, multiple linear regression and classification, classification and regression trees and random forests. An important focus of the course will be on statistical computing and reproducible statistical analysis. The students will be introduced to the widely used R statistical language and they will obtain hands-on experience in implementing a range of commonly used statistical methods on numerous real-world datasets.
William Fithian  4 
Engineering Data Analysis
CIVENG 93 CCN: 35260
TuTh 9 - 10 am O'Brien 212
Application of the concepts and methods of probability theory and statistical inference to CEE problems and data; graphical data analysis and sampling; elements of set theory; elements of probability theory; random variables and expectation; simulation; statistical inference. Applications to various CEE problems and real data will be developed by use of MATLAB and existing codes. The course also introduces the student to various domains of uncertainty analysis in CEE. 
Mark Hansen  3 
Computational Models of Cognition
COGSCI 131 CCN: 39043
TuTh 2 - 3:30 pm Haas F295
This course will provide advanced students in cognitive science and computer science with the skills to develop computational models of human cognition, giving insight into how people solve challenging computational problems, as well as how to bring computers closer to human performance. The course will explore three ways in which researchers have attempted to formalize cognition (symbolic approaches, neural networks, and probability and statistics), considering the strengths and weaknesses of each.
Anne Ge Collins  4 
Sensemaking and Organizing
COGSCI 190 CCN: 41641
MW 9 - 10 am Dwinelle 283
When something "makes sense" or "is organized," we are imposing or discovering order in the arrangement of concepts, events, or resources of some kind. Sensemaking and organizing are fundamental human activities that raise many multi- or trans-disciplinary questions about perception, knowledge, decision making, interaction with things and with other people, values and value creation. We can analyze sensemaking and organizing from four interrelated perspectives: as an individual, as a member of a social, cultural, or language community, in institutional contexts, or in data-intensive or scientific contexts. At the end of the course, students will be more aware of their existing mechanisms and methods for sensemaking and organizing and will have learned a variety of new ones that they can apply as appropriate in the four contexts.
Robert J. Glushko  3 
Introduction to Machine Learning 
COMPSCI 189 CCN: 35661 
TuTh 3:30 - 5 pm Dwinelle 155
Theoretical foundations, algorithms, methodologies, and applications for machine learning. Topics may include supervised methods for regression and classification (linear models, trees, neural networks, ensemble methods, instance-based methods); generative and discriminative probabilistic models; Bayesian parametric learning; density estimation and clustering; Bayesian networks; time series models; dimensionality reduction; programming projects covering a variety of real-world applications.
TBD  4 
Designing, Visualizing and Understanding Deep Neural Networks 
COMPSCI 194-129 CCN: 41752
MW 5 - 6:30 pm North Gate 105
Topics will vary semester to semester. See the Computer Science Division announcements.
John F. Canny  4
Social Networks
DEMOG 180 CCN: 41045 
TuTh 11 am - 12:30 pm McCone 141
The science of social networks focuses on measuring, modeling, and understanding the different ways that people are connected to one another. We will use a broad toolkit of theories and methods drawn from the social, natural, and mathematical sciences to learn what a social network is, to understand how to work with social network data, and to illustrate some of the ways that social networks can be useful in theory and in practice. We will see that network ideas are powerful enough to be used everywhere from UNAIDS, where network models help epidemiologists prevent the spread of HIV, to Silicon Valley, where data scientists use network ideas to build products that enable people all across the globe to connect with one another. 
Dennis M. Feehan  3 
Seminar on Topics in Law and Society
LEGALST 190 CCN: 17157
TuTh 10 am - 12 pm Barrows 122
Data, Prediction, and Law is a new Legal Studies seminar that allows students to explore different data sources that scholars and government officials use to make generalizations and predictions in the realm of law. The course will also introduce critiques of predictive techniques in law. Students will apply the statistical and Python programming skills from Foundations of Data Science to examine a traditional social science dataset, “big data” related to law, and legal text data. 
Jonathan D. Marshall  4 
Introduction to Computational Techniques in Physics
PHYSICS 77 CCN: 28817
M 2 - 4 pm Barrows 20
Introductory scientific programming in Python with examples from physics. Topics include: visualization, statistics and probability, regression, numerical integration, simulation, data modeling, function approximation, and algebraic systems. Recommended for freshman physics majors. 
Yuri Kolomensky 
3 
Research and Data Analysis in Psychology (10 / 101)
PSYCH 10/101
M 2 - 5 pm Lewis 100
The class covers research design, statistical reasoning, and statistical methods appropriate for psychological research. Topics covered in research design include the scientific method, experimental versus correlational designs, controls and placebos, within- and between-subject designs, and temporal or sequence effects. Topics covered in statistics include descriptive versus inferential statistics, linear regression and correlation, and univariate statistical tests: t-test, one-way and two-way ANOVA, chi-square test. The class also introduces nonparametric tests and modeling. Prospective Psychology majors need to take this course to be admitted to the major.
Arman D. Catterson 
4 
STAT 133 CCN: 30844 
MWF 8 - 9 am Dwinelle 155
An introduction to computationally intensive applied statistics. Topics will include organization and use of databases, visualization and graphics, statistical learning and data mining, model validation procedures, and the presentation of results. 
Gaston Sanchez Trujillo 
3  
Applied Data Science with Venture Applications
IEOR 135 CCN: 41878
TuTh 3:30 - 5 pm Cory 277
This highly applied course surveys a variety of key concepts and tools that are useful for designing and building applications that process data signals of information. The course introduces modern open-source computer programming tools, libraries, and code samples that can be used to implement data applications. The mathematical concepts highlighted in this course include filtering, prediction, classification, decision-making, Markov chains, LTI systems, spectral analysis, and frameworks for learning from data. Each math concept is linked to an implementation in Python using libraries for math array functions (NumPy), manipulation of tables (Pandas), long-term storage (SQL, JSON, CSV files), natural language (NLTK), and ML frameworks.
Ikhlaq Sidhu 
3 
INFO 190-1 CCN: 37950
TuTh 2 - 3:30 pm South Hall 202
This course introduces students to practical fundamentals of data mining and machine learning with just enough theory to aid intuition building. The course is project-oriented, with a project beginning in class every week and to be completed outside of class by the following week, or two weeks for longer assignments. The in-class portion of the project is meant to be collaborative, with the instructor working closely with groups to understand the learning objectives and help them work through any logistics that may be slowing them down. Weekly lectures introduce the concepts and algorithms which will be used in the upcoming project. Students leave the class with hands-on data mining and data engineering skills they can confidently apply.
Zachary A. Pardos 
3  
GEOG 187 CCN: 24555 
MW 9:30 - 10:30 am McCone 575
A spatial analytic approach to digital mapping and GIS. Given that recording the geolocation of scientific, business and social data is now routine, the question of what we can learn from the spatial aspect of data arises. This class looks at challenges in analyzing spatial data, particularly scale and spatial dependence. Various methods are considered such as hotspot detection, interpolation, and map overlay. The emphasis throughout is hands-on and practical rather than theoretical.
David Bernard O'Sullivan 
4  
Introductory Applied Econometrics (ENVECON)
ENVECON/IAS C118
TuTh 9:30 - 11 am VLSB 2060
Formulation of a research hypothesis and definition of an empirical strategy. Regression analysis with cross-sectional and time-series data; econometric methods for the analysis of qualitative information; hypothesis testing. The techniques of statistical and econometric analysis are developed through applications to a set of case studies and real data in the fields of environmental, resource, and international development economics. Students learn the use of statistical software for economic data analysis.
Sofia B. Villas-Boas
4 
SOCIOL 106 CCN: 30285 
Th 8 - 10 am Barrows 475
This course will cover more technical issues in quantitative research methods, and will include, according to discretion of instructor, a practicum in data collection and/or analysis. Recommended for students interested in graduate work in sociology or research careers.
Mao-Mei Liu
4  
Modern Statistical Prediction and Machine Learning
STAT 154 CCN: 30887
MWF 11 am - 12 pm Tan Hall 180
Theory and practice of statistical prediction. Contemporary methods as extensions of classical methods. Topics: optimal prediction rules, the curse of dimensionality, empirical risk, linear regression and classification, basis expansions, regularization, splines, the bootstrap, model selection, classification and regression trees, boosting, support vector machines. Computational efficiency versus predictive performance. Emphasis on experience with real data and assessing statistical assumptions. 
Gaston Sanchez Trujillo 
4 