Spring 2018 Courses


Course Number
Times & Location
Foundations of Data Science (Data 8)


CCN: 31678


10 - 11 am 

Wheeler 150

Foundations of data science from three perspectives: inferential thinking, computational thinking, and real-world relevance. Given data arising from some real-world phenomenon, how does one analyze that data so as to understand that phenomenon? The course teaches critical concepts and skills in computer programming and statistical inference, in conjunction with hands-on analysis of real-world datasets.

Anindita Adhikari 4
Principles and Techniques of Data Science (Data 100)


CCN: 37227


11 - 12:30 pm

Wheeler 150

In this course, students will explore the data science lifecycle, including question formulation, data collection and cleaning, exploratory data analysis and visualization, statistical inference and prediction, and decision-making. This class will focus on quantitative critical thinking and key principles and techniques needed to carry out this cycle. These include languages for transforming, querying and analyzing data; algorithms for machine learning methods including regression, classification, and clustering; principles behind creating informative data visualizations; statistical concepts of measurement error and prediction; and techniques for scalable data processing.

Joseph Edgar Gonzalez, Fernando Perez 4
Probability for Data Science

STAT 140

CCN: 32926


5 - 6:30 PM

Latimer 120

An introduction to probability, emphasizing the combined use of mathematics and programming to solve problems. Random variables, discrete and continuous families of distributions. Bounds and approximations. Dependence, conditioning, Bayes methods. Convergence, Markov chains. Least squares prediction. Random permutations, symmetry, order statistics. Use of numerical computation, graphics, simulation, and computer algebra.

Anindita Adhikari 4


Course Number
Times & Location
Data Science and the Mind


CCN: 23161



How does the human mind work? We explore this question by analyzing a range of data concerning such topics as human rationality and irrationality, human memory, how objects and events are represented in the mind, and the relation of language and cognition. This class provides young scientists with critical thinking and computing skills that will allow them to work with data in cognitive science and related disciplines.

Dmetri Hayes 2
Computational Structures in Data Science

COMPSCI 88

CCN: 41565


12 - 1 pm

Soda 306

Development of Computer Science topics appearing in Foundations of Data Science (C8); expands computational concepts and techniques of abstraction. Understanding the structures that underlie the programs, algorithms, and languages used in data science and elsewhere. Mastery of a particular programming language while studying general techniques for managing program complexity, e.g., functional, object-oriented, and declarative programming. Provides practical experience with composing larger systems through several significant programming projects.

Gerald Friedland 2
Data Science Applications in Geography


CCN: 33084


5 - 7 pm

McCone 145  

Data science methods are increasingly important in geography and earth science. This course introduces some of the particular challenges of working with spatial data arising from characteristics specific to such data. These issues will be explored in a series of modules deploying data science methods to investigate contemporary topics in geography and earth science, relating to climate science, hydrology, population census and remote sensing of the environment. No prior knowledge is assumed or expected. This class runs for the equivalent of 7 weeks only, with two 2-hour lecture/lab sessions each week. The first class meeting (an introductory session) will be 5-7PM Mon 1/22/18 in McCone 535. Regular meetings will then be MW 5-7PM from 2/21/18 until 4/11/18 in either McCone 535 or McCone 145 per the class syllabus.

David Bernard O'Sullivan 2
Crime and Punishment: Taking the Measure of the US Justice System


CCN: 32891


8 - 10 am

Kroeber 238  

We will explore how data are used in the criminal justice system by exploring the debates surrounding mass incarceration and evaluating a number of different data sources that bear on police practices, incarceration, and criminal justice reform. Students will be required to think critically about the debates regarding criminal justice in the US and to work with various public data sets to assess the extent to which these data confirm or deny specific policy narratives. Building on skills from Foundations of Data Science, students will be required to use basic data management skills working in Python: data cleaning, aggregation, merging and appending data sets, collapsing variables, summarizing findings, and presenting data visualizations.

Dag P. Macleod 2
Children in the Developing World

L&S 88-1

CCN: 42347


6 - 9 pm

Cory 105

Child nutrition and education are important for adult success, but children in developing countries fall behind with nutritional consumption and have fewer educational opportunities. Household surveys are an instrumental tool for understanding factors associated with investments in children. We will use household data sets to explore relationships between nutrition and education outcomes and a variety of socio-economic variables to establish an understanding of contextual elements which can hinder or promote child growth and learning.

Sports Analytics

L&S 88-2

CCN: 42348


12 - 2 pm

Cory 105

The principles of data science meet sports analytics. What makes a good hitter in baseball? How do you measure that? What are the flaws of plus/minus in basketball? Do Steph Curry or Klay Thompson ever get a hot hand? When should a coach go for it on 4th down? This course covers a wide range of topics on the analytical thinking behind the data revolution in sports and explores data science through the lens of sports analytics.

Alex Papanicolaou 2
Immunotherapy of Cancer: Success and Failures


CCN: 32863


2 - 4 pm

We will work with a variety of datasets that describe a molecular view of cells and how they divide. We will learn about the processes that cause cells to become specialized (differentiate) and to give rise to cancer (transform). We will analyze data on genetic mutations in cancer that distinguish tumor cells from normal cells. We will learn how mutations are detected by the immune system and the basis of cancer immunotherapy. Finally we will analyze data on clinical trials of cancer immunotherapy to define the correlates of success in curing the disease. The students are expected to gain an understanding of data that reveals the basics of cell physiology and cancer, how immunotherapies of cancer work and their current limitations.

Nilabh Shastri
Data Science for Cognitive Neuroscience


CCN: 42654

Tu 3-6

105 Cory

Learn how the brain works and how you can use cutting-edge brain imaging and data analysis tools to study it. Students will learn how to formulate and test hypotheses about how the brain represents information, and perform analyses to derive conclusions.

Samy Abdel-Ghaffar, Michael Eickenberg 3
Probability and Mathematical Statistics in Data Science


CCN: 30832


4 - 5 pm

Birge 50  

In this connector course we will state precisely and prove results discovered in the foundational data science course through working with data. Topics include: total variation distance between discrete distributions; the mean, standard deviation, and tail bounds; correlation, and the derivation of the regression equation; probabilities, random variables, and the Central Limit Theorem; probabilistic models; symmetries in random permutations; prior and posterior distributions, and Bayes' rule.

Rasmus Nielsen 2
Linear Algebra for Data Science


CCN: 30833

TuTh 9:30-11 am
Moffitt 103

This connector will cover introductory topics in the mathematics of data science, focusing on discrete probability and linear algebra and the connections between them that are useful in modern theory and practice. We will focus on matrices and graphs as popular mathematical structures with which to model data. They serve, for example, as models for term-document corpora, high-dimensional regression problems, ranking/classification of web data, and adjacency properties of social network data. Stat 89A (in its new, 4-unit Spring 2018 form) can be used as an alternate linear algebra co-requisite for Data 100 this semester, along with Math 54 and EE 16A.

Michael William Mahoney 4
Data & Decisions

UGBA 96 - 4

CCN: 41284


2 - 4 pm

The objective of the course is to provide an understanding of how data and statistical analysis can improve managerial decision-making. Students learn how to ask the right questions, find or collect relevant data, and apply appropriate statistical methods to solve problems and make better business decisions. We will explore statistical methods for gleaning insights from economic and social data, with an emphasis on approaches to identifying causal relationships. We will discuss how to design and analyze randomized experiments and introduce econometric methods for estimating causal effects in non-experimental data. This course, in combination with the Foundations course, satisfies the statistics prerequisite for admissions to Haas.

Conrad Miller 2
Data & Decisions

UGBA 96 - 5 

CCN: 41285


4 - 6 pm

The objective of the course is to provide an understanding of how data and statistical analysis can improve managerial decision-making. Students learn how to ask the right questions, find or collect relevant data, and apply appropriate statistical methods to solve problems and make better business decisions. We will explore statistical methods for gleaning insights from economic and social data, with an emphasis on approaches to identifying causal relationships. We will discuss how to design and analyze randomized experiments and introduce econometric methods for estimating causal effects in non-experimental data. This course, in combination with the Foundations course, satisfies the statistics prerequisite for admissions to Haas.

Conrad Miller 2

Human Contexts and Ethics

Behind the Data: Humans and Values

INFO 188

CCN: 42374


12:30 - 2 pm

South Hall 202

This course provides an introduction to ethical and legal issues surrounding data and society, as well as hands-on experience with frameworks, processes, and tools for addressing them in practice. It blends social and historical perspectives on data with ethics, law, policy, and case examples — from Facebook’s “Emotional Contagion” experiment to controversies around search engine and social media algorithms, to self-driving cars — to help students develop a workable understanding of current ethical and legal issues in data science and machine learning. Legal, ethical, and policy-related concepts addressed include: research ethics; privacy and surveillance; bias and discrimination; and oversight and accountability. These issues will be addressed throughout the lifecycle of data — from collection to storage to analysis and application. The course emphasizes strategies, processes, and tools for attending to ethical and legal issues in data science work. Course assignments will emphasize researcher and practitioner reflexivity, allowing students to explore their own social and ethical commitments.

Deirdre Mulligan

Introduction to Science, Technology, and Society: Human Contexts and Ethics of Data

Hist C182C / STS C100 / ISF C100G

CCN: 42032


1 - 2 pm

Lewis 100

This course explores how data science is entangled with diverse human contexts (histories, institutions, and material bases) and ethics (domains of value-laden choice). We will bring historically-grounded perspectives as well as frameworks and methods from Science, Technology, and Society (STS) (such as cross-national comparison, co-production, and controversy studies) to bear on topics that include: Doing ethical data science amid shifting definitions of human subjects, consent, and privacy; the changing relationship between data, democracy, and law; the role of data analytics in how corporations and governments provide public goods such as health and security to citizens; sensors, machine learning and artificial intelligence and changing landscapes of labor, industry, and city life; and the implications of data for how publics and varied scientific disciplines know the world.

Cathryn Carson & Margot Boenig-Liptsin


Additional Data-Enabled Courses

These courses are taught in a way that permits students to build on Data 8. Please review prerequisites. To add a proposed course to this list, please contact DSEP.

Course Number
Times & Location
Statistical Methods for Data Science

STAT 28 

CCN: 32673 

Tu, Th 12:30 - 2 pm

Evans 60

This is a lower-division course that is a follow-up to STAT8/CS8 (Foundations of Data Science). The course will teach a broad range of statistical methods that are used to solve data problems. Topics will include group comparisons and ANOVA, standard parametric statistical models, multivariate data visualization, multiple linear regression and classification, classification and regression trees and random forests. An important focus of the course will be on statistical computing and reproducible statistical analysis. The students will be introduced to the widely used R statistical language and they will obtain hands-on experience in implementing a range of commonly used statistical methods on numerous real world datasets.

William Fithian 4
Engineering Data Analysis


CCN: 35260


9 - 10 am

O'Brien 212

Application of the concepts and methods of probability theory and statistical inference to CEE problems and data; graphical data analysis and sampling; elements of set theory; elements of probability theory; random variables and expectation; simulation; statistical inference. Applications to various CEE problems and real data will be developed by use of MATLAB and existing codes. The course also introduces the student to various domains of uncertainty analysis in CEE.

Mark Hansen 3
Computational Models of Cognition


CCN: 39043


2 - 3:30 pm

Haas F295

This course will provide advanced students in cognitive science and computer science with the skills to develop computational models of human cognition, giving insight into how people solve challenging computational problems, as well as how to bring computers closer to human performance. The course will explore three ways in which researchers have attempted to formalize cognition -- symbolic approaches, neural networks, and probability and statistics -- considering the strengths and weaknesses of each. 

Anne Ge Collins 4
Sensemaking and Organizing


CCN: 41641


9 - 10 am

Dwinelle 283

When something “makes sense” or “is organized” we are imposing or discovering order in the arrangement of concepts, events, or resources of some kind. Sensemaking and organizing are fundamental human activities that raise many multi‐ or trans‐disciplinary questions about perception, knowledge, decision making, interaction with things and with other people, values and value creation. We can analyze sensemaking and organizing from four interrelated perspectives: As an individual, as a member of a social, cultural, or language community, in institutional contexts, or in data‐intensive or scientific contexts. At the end of the course, students will be more aware of their existing mechanisms and methods for sensemaking and organizing and will have learned a variety of new ones that they can apply as appropriate in the four contexts.

Robert J. Glushko 3
Introduction to Machine Learning


CCN: 35661


3:30 - 5 pm

Dwinelle 155

Theoretical foundations, algorithms, methodologies, and applications for machine learning. Topics may include supervised methods for regression and classification (linear models, trees, neural networks, ensemble methods, instance-based methods); generative and discriminative probabilistic models; Bayesian parametric learning; density estimation and clustering; Bayesian networks; time series models; dimensionality reduction; programming projects covering a variety of real-world applications.

Designing, Visualizing and Understanding Deep Neural Networks

COMPSCI 194-129

CCN: 41752


5 - 6:30 pm

North Gate 105

Topics will vary semester to semester. See the Computer Science Division announcements.

John F. Canny 4
Social Networks


CCN: 41045


11 am - 12:30 pm

McCone 141

The science of social networks focuses on measuring, modeling, and understanding the different ways that people are connected to one another. We will use a broad toolkit of theories and methods drawn from the social, natural, and mathematical sciences to learn what a social network is, to understand how to work with social network data, and to illustrate some of the ways that social networks can be useful in theory and in practice. We will see that network ideas are powerful enough to be used everywhere from UNAIDS, where network models help epidemiologists prevent the spread of HIV, to Silicon Valley, where data scientists use network ideas to build products that enable people all across the globe to connect with one another.

Dennis M. Feehan 3
Seminar on Topics in Law and Society


CCN: 17157


10 am - 12 pm

Barrows 122

Data, Prediction, and Law is a new Legal Studies seminar that allows students to explore different data sources that scholars and government officials use to make generalizations and predictions in the realm of law. The course will also introduce critiques of predictive techniques in law. Students will apply the statistical and Python programming skills from Foundations of Data Science to examine a traditional social science dataset, “big data” related to law, and legal text data.

Jonathan D. Marshall 4
Introduction to Computational Techniques in Physics


CCN: 28817


2 - 4 pm

Barrows 20

Introductory scientific programming in Python with examples from physics. Topics include: visualization, statistics and probability, regression, numerical integration, simulation, data modeling, function approximation, and algebraic systems. Recommended for freshman physics majors.

Yuri Kolomensky


Research and Data Analysis in Psychology (10)

Research and Data Analysis in Psychology (101)

PSYCH 10/101

CCN: 29500, 29550


2 - 5 pm

Lewis 100

The class covers research design, statistical reasoning, and statistical methods appropriate for psychological research. Topics covered in research design include the scientific method, experimental versus correlational designs, controls and placebos, within and between subject designs and temporal or sequence effects. Topics covered in statistics include descriptive versus inferential statistics, linear regression and correlation and univariate statistical tests: t-test, one way and two-way ANOVA, chi-square test. The class also introduces non-parametric tests and modeling. Prospective Psychology majors need to take this course to be admitted to the major.

Arman D. Catterson


Concepts in Computing with Data

STAT 133

CCN: 30844


8 - 9 am

Dwinelle 155

An introduction to computationally intensive applied statistics. Topics will include organization and use of databases, visualization and graphics, statistical learning and data mining, model validation procedures, and the presentation of results.

Gaston Sanchez Trujillo


Applied Data Science with Venture Applications

IEOR 135

CCN: 41878


3:30 - 5 pm

Cory 277

This highly applied course surveys a variety of key concepts and tools that are useful for designing and building applications that process data signals of information. The course introduces modern open-source computer programming tools, libraries, and code samples that can be used to implement data applications. The mathematical concepts highlighted in this course include filtering, prediction, classification, decision-making, Markov chains, LTI systems, spectral analysis, and frameworks for learning from data. Each math concept is linked to an implementation in Python using libraries for math array functions (NumPy), manipulation of tables (Pandas), long term storage (SQL, JSON, CSV files), natural language (NLTK), and ML frameworks.

Ikhlaq Sidhu


Data Mining and Analytics

INFO 190 - 1

CCN: 37950


2 - 3:30 pm

South Hall 202

This course introduces students to practical fundamentals of data mining and machine learning with just enough theory to aid intuition building. The course is project-oriented, with a project beginning in class every week and to be completed outside of class by the following week, or two weeks for longer assignments. The in-class portion of the project is meant to be collaborative, with the instructor working closely with groups to understand the learning objectives and help them work through any logistics that may be slowing them down. Weekly lectures introduce the concepts and algorithms which will be used in the upcoming project. Students leave the class with hands-on data mining and data engineering skills they can confidently apply.

Zachary A. Pardos


Geographic Information Analysis

GEOG 187

CCN: 24555


9:30 - 10:30 am

McCone 575

A spatial analytic approach to digital mapping and GIS. Given that recording the geolocation of scientific, business and social data is now routine, the question of what we can learn from the spatial aspect of data arises. This class looks at challenges in analyzing spatial data, particularly scale and spatial dependence. Various methods are considered such as hotspot detection, interpolation, and map overlay. The emphasis throughout is hands on and practical rather than theoretical.

David Bernard O'Sullivan


Introductory Applied Econometrics (ENVECON)

Introductory Applied Econometrics (IAS)


CCN: 34720, 25087


9:30 - 11 am

VLSB 2060

Formulation of a research hypothesis and definition of an empirical strategy. Regression analysis with cross-sectional and time-series data; econometric methods for the analysis of qualitative information; hypothesis testing. The techniques of statistical and econometric analysis are developed through applications to a set of case studies and real data in the fields of environmental, resource, and international development economics. Students learn to use statistical software for economic data analysis.

Sofia B. Villas-Boas


Quantitative Sociological Methods


CCN: 30285


8 - 10 am

Barrows 475

This course will cover more technical issues in quantitative research methods and will include, at the discretion of the instructor, a practicum in data collection and/or analysis. Recommended for students interested in graduate work in sociology or research careers.

Mao-Mei Liu


Modern Statistical Prediction and Machine Learning

STAT 154

CCN: 30887


11 am - 12 pm

Tan Hall 180

Theory and practice of statistical prediction. Contemporary methods as extensions of classical methods. Topics: optimal prediction rules, the curse of dimensionality, empirical risk, linear regression and classification, basis expansions, regularization, splines, the bootstrap, model selection, classification and regression trees, boosting, support vector machines. Computational efficiency versus predictive performance. Emphasis on experience with real data and assessing statistical assumptions. 

Gaston Sanchez Trujillo