### Foundations

Title | Course Number | Times & Location | Description | Instructor | Units |
---|---|---|---|---|---|

Foundations of Data Science (Data 8) |
CS / INFO / STAT C8 CCN: 31678 |
MWF 10 - 11 am Wheeler 150 |
Foundations of data science from three perspectives: inferential thinking, computational thinking, and real-world relevance. Given data arising from some real-world phenomenon, how does one analyze that data so as to understand that phenomenon? The course teaches critical concepts and skills in computer programming and statistical inference, in conjunction with hands-on analysis of real-world datasets. |
Anindita Adhikari | 4 |

### Connectors

This is a partial list of connectors. More will be added as additional courses are developed and scheduled.

Title | Course Number | Times & Location | Description | Instructor | Units |
---|---|---|---|---|---|

Data Science and the Mind |
COG SCI 88 CCN: 23161 |
Tu 10-12 |
How does the human mind work? We explore this question by analyzing a range of data concerning such topics as human rationality and irrationality, human memory, how objects and events are represented in the mind, and the relation of language and cognition. This class provides young scientists with critical thinking and computing skills that will allow them to work with data in cognitive science and related disciplines. |
Dmetri Hayes | 2 |

Computational Structures in Data Science | COMPSCI 88 CCN: 41565 |
F 12 - 1 pm Soda 306 |
Development of Computer Science topics appearing in Foundations of Data Science (C8); expands computational concepts and techniques of abstraction. Understanding the structures that underlie the programs, algorithms, and languages used in data science and elsewhere. Mastery of a particular programming language while studying general techniques for managing program complexity, e.g., functional, object-oriented, and declarative programming. Provides practical experience with composing larger systems through several significant programming projects. |
Gerald Friedland | 2 |

Data Science Applications in Geography |
GEOG 88 CCN: 33084 |
MW 5 - 7 pm McCone 145 |
Data science methods are increasingly important in geography and earth science. This course introduces some of the particular challenges of working with spatial data arising from characteristics specific to such data. These issues will be explored in a series of modules deploying data science methods to investigate contemporary topics in geography and earth science, relating to climate science, hydrology, population census, and remote sensing of the environment. No prior knowledge is assumed or expected. This class runs for the equivalent of 7 weeks only, with two 2-hour lecture/lab sessions each week. The first class meeting (an introductory session) will be 5-7 pm Mon 1/22/18 in McCone 535. Regular meetings will then be MW 5-7 pm from 2/21/18 until 4/11/18 in either McCone 535 or McCone 145 per the class syllabus. |
David Bernard O'Sullivan | 2 |

Crime and Punishment: Taking the Measure of the US Justice System |
LEGALST 88 CCN: 32891 |
Tu 8 - 10 am Kroeber 238 |
We will explore how data are used in the criminal justice system by examining the debates surrounding mass incarceration and evaluating a number of different data sources that bear on police practices, incarceration, and criminal justice reform. Students will be required to think critically about the debates regarding criminal justice in the US and to work with various public data sets to assess the extent to which these data confirm or deny specific policy narratives. Building on skills from Foundations of Data Science, students will be required to use basic data management skills working in Python: data cleaning, aggregation, merging and appending data sets, collapsing variables, summarizing findings, and presenting data visualizations. |
TBD | 2 |

Children in the Developing World |
L&S 88-1 CCN: 42347 |
Tu 6 - 9 pm Cory 105 |
Child nutrition and education are important for adult success, but children in developing countries fall behind in nutritional intake and have fewer educational opportunities. Household surveys are an instrumental tool for understanding factors associated with investments in children. We will use household data sets to explore relationships between nutrition and education outcomes and a variety of socio-economic variables to establish an understanding of contextual elements which can hinder or promote child growth and learning. |
TBD | 3 |

Sports Analytics |
L&S 88-2 CCN: 42348 |
Tu 12 - 2 pm Cory 105 |
The principles of data science meet sports analytics. What makes a good hitter in baseball? How do you measure that? What are the flaws of plus/minus in basketball? Do Steph Curry or Klay Thompson ever get a hot hand? When should a coach go for it on 4th down? This course covers a wide range of topics on the analytical thinking behind the data revolution in sports and explores data science through the lens of sports analytics. |
TBD | 2 |

Immunotherapy of Cancer: Success and Failures |
MCELLBI 88 CCN: 32863 |
M 2 - 4 pm |
We will work with a variety of datasets that describe a molecular view of cells and how they divide. We will learn about the processes that cause cells to become specialized (differentiate) and to give rise to cancer (transform). We will analyze data on genetic mutations in cancer that distinguish tumor cells from normal cells. We will learn how mutations are detected by the immune system and the basis of cancer immunotherapy. Finally, we will analyze data on clinical trials of cancer immunotherapy to define the correlates of success in curing the disease. Students are expected to gain an understanding of data that reveals the basics of cell physiology and cancer, how cancer immunotherapies work, and their current limitations. |
Nilabh Shastri |
2 |

Probability and Mathematical Statistics in Data Science |
CCN: 30832 |
TuTh 4 - 5 pm |
In this connector course we will state precisely and prove results discovered in the foundational data science course through working with data. Topics include: total variation distance between discrete distributions; the mean, standard deviation, and tail bounds; correlation, and the derivation of the regression equation; probabilities, random variables, and the Central Limit Theorem; probabilistic models; symmetries in random permutations; prior and posterior distributions, and Bayes' rule. |
TBD | 2 |

Data & Decisions |
UGBA 96 - 4 CCN: 41284 |
Mon 2 - 4 pm |
The objective of the course is to provide an understanding of how data and statistical analysis can improve managerial decision-making. Students learn how to ask the right questions, find or collect relevant data, and apply appropriate statistical methods to solve problems and make better business decisions. We will explore statistical methods for gleaning insights from economic and social data, with an emphasis on approaches to identifying causal relationships. We will discuss how to design and analyze randomized experiments and introduce econometric methods for estimating causal effects in non-experimental data. This course, in combination with the Foundations course, satisfies the statistics prerequisite for admissions to Haas. |
Conrad Miller | 2 |

Data & Decisions |
UGBA 96 - 5 CCN: 41285 |
Mon 4 - 6 pm |
The objective of the course is to provide an understanding of how data and statistical analysis can improve managerial decision-making. Students learn how to ask the right questions, find or collect relevant data, and apply appropriate statistical methods to solve problems and make better business decisions. We will explore statistical methods for gleaning insights from economic and social data, with an emphasis on approaches to identifying causal relationships. We will discuss how to design and analyze randomized experiments and introduce econometric methods for estimating causal effects in non-experimental data. This course, in combination with the Foundations course, satisfies the statistics prerequisite for admissions to Haas. |
Conrad Miller | 2 |

### Courses with Data 8 as prerequisite

Title | Course Number | Times & Location | Description | Instructor | Units |
---|---|---|---|---|---|

Statistical Methods for Data Science |
STAT 28 CCN: 32673 |
TuTh 12:30 - 2 pm Evans 60 |
This is a lower-division course that is a follow-up to STAT8/CS8 (Foundations of Data Science). The course will teach a broad range of statistical methods that are used to solve data problems. Topics will include group comparisons and ANOVA, standard parametric statistical models, multivariate data visualization, multiple linear regression and classification, classification and regression trees and random forests. An important focus of the course will be on statistical computing and reproducible statistical analysis. The students will be introduced to the widely used R statistical language and they will obtain hands-on experience in implementing a range of commonly used statistical methods on numerous real-world datasets. |
William Fithian | 4 |

Principles and Techniques of Data Science (Data 100) |
COMPSCI / STAT C100 CCN: 37227 |
TuTh 11 - 12:30 pm Wheeler 150 |
In this course, students will explore the data science lifecycle, including question formulation, data collection and cleaning, exploratory data analysis and visualization, statistical inference and prediction, and decision-making. This class will focus on quantitative critical thinking and key principles and techniques needed to carry out this cycle. These include languages for transforming, querying and analyzing data; algorithms for machine learning methods including regression, classification, and clustering; principles behind creating informative data visualizations; statistical concepts of measurement error and prediction; and techniques for scalable data processing. |
Joseph Edgar Gonzalez, Fernando Perez |
4 |

Probability for Data Science |
STAT 140 CCN: 32926 |
MW 5 - 6:30 PM Latimer 120 |
An introduction to probability, emphasizing the combined use of mathematics and programming to solve problems. Random variables, discrete and continuous families of distributions. Bounds and approximations. Dependence, conditioning, Bayes methods. Convergence, Markov chains. Least squares prediction. Random permutations, symmetry, order statistics. Use of numerical computation, graphics, simulation, and computer algebra. |
Anindita Adhikari | 4 |

### Human Contexts and Ethics

Behind the Data: Humans and Values |
INFO 188 CCN: 42374 |
TuTh 12:30 - 2 pm South Hall 202 |
This course provides an introduction to ethical and legal issues surrounding data and society, as well as hands-on experience with frameworks, processes, and tools for addressing them in practice. It blends social and historical perspectives on data with ethics, law, policy, and case examples — from Facebook’s “Emotional Contagion” experiment to controversies around search engine and social media algorithms, to self-driving cars — to help students develop a workable understanding of current ethical and legal issues in data science and machine learning. Legal, ethical, and policy-related concepts addressed include: research ethics; privacy and surveillance; bias and discrimination; and oversight and accountability. These issues will be addressed throughout the lifecycle of data — from collection to storage to analysis and application. The course emphasizes strategies, processes, and tools for attending to ethical and legal issues in data science work. Course assignments will emphasize researcher and practitioner reflexivity, allowing students to explore their own social and ethical commitments. |
Deirdre Mulligan |
3 |

Introduction to Science, Technology, and Society: Human Contexts and Ethics of Data |
Hist C182C / STS C100 / ISF C100G CCN: 42032 |
MWF 1 - 2 pm Lewis 100 |
In Spring 2018 this class will offer a special focus on data analytics and information technologies in the contemporary world, as an exemplary case of science, technology, and society. The course provides an introduction to the field of Science and Technology Studies (STS) as a way to study how our knowledge and technology shape and are shaped by social, political, historical, economic, and other factors. We will learn key concepts of the field (e.g., how technologies are understood and used differently in different communities) and explore how human values and technology can interact (e.g., how values are embedded in technical systems, shape the choices of their users, and pose ethical questions for their creators). This class has been proposed to meet the Human Contexts and Ethics requirement of the proposed Data Science major. |
Cathryn Carson |
4 |

### Additional Data-Enabled Courses

These courses are taught in a way that permits students to build on Data 8. Please review prerequisites. To add a proposed course to this list, please contact DSEP.

Title | Course Number | Times & Location | Description | Instructor | Units |
---|---|---|---|---|---|

Engineering Data Analysis |
CIVENG 93 CCN: 35260 |
TuTh 9 - 10 am O'Brien 212 |
Application of the concepts and methods of probability theory and statistical inference to CEE problems and data; graphical data analysis and sampling; elements of set theory; elements of probability theory; random variables and expectation; simulation; statistical inference. Applications to various CEE problems and real data will be developed by use of MATLAB and existing codes. The course also introduces the student to various domains of uncertainty analysis in CEE. |
Mark Hansen | 3 |

Computational Models of Cognition |
COGSCI 131 CCN: 39043 |
TuTh 2 - 3:30 pm Haas F295 |
This course will provide advanced students in cognitive science and computer science with the skills to develop computational models of human cognition, giving insight into how people solve challenging computational problems, as well as how to bring computers closer to human performance. The course will explore three ways in which researchers have attempted to formalize cognition -- symbolic approaches, neural networks, and probability and statistics -- considering the strengths and weaknesses of each. |
Anne Ge Collins | 4 |

Sensemaking and Organizing |
COGSCI 190 CCN: 41641 |
MW 9 - 10 am Dwinelle 283 |
When something “makes sense” or “is organized,” we are imposing or discovering order in the arrangement of concepts, events, or resources of some kind. Sensemaking and organizing are fundamental human activities that raise many multi‐ or trans‐disciplinary questions about perception, knowledge, decision making, interaction with things and with other people, values and value creation. We can analyze sensemaking and organizing from four interrelated perspectives: as an individual, as a member of a social, cultural, or language community, in institutional contexts, or in data‐intensive or scientific contexts. At the end of the course, students will be more aware of their existing mechanisms and methods for sensemaking and organizing and will have learned a variety of new ones that they can apply as appropriate in the four contexts. |
Robert J. Glushko | 3 |

Introduction to Machine Learning |
COMPSCI 189 CCN: 35661 |
TuTh 3:30 - 5 pm Dwinelle 155 |
Theoretical foundations, algorithms, methodologies, and applications for machine learning. Topics may include supervised methods for regression and classification (linear models, trees, neural networks, ensemble methods, instance-based methods); generative and discriminative probabilistic models; Bayesian parametric learning; density estimation and clustering; Bayesian networks; time series models; dimensionality reduction; programming projects covering a variety of real-world applications. |
TBD | 4 |

Designing, Visualizing and Understanding Deep Neural Networks |
COMPSCI 194-129 CCN: 41752 |
MW 5 - 6:30 pm North Gate 105 |
Topics will vary semester to semester. See the Computer Science Division announcements. | John F. Canny | 4 |

Social Networks |
DEMOG 180 CCN: 41045 |
TuTh 11 am - 12:30 pm McCone 141 |
The science of social networks focuses on measuring, modeling, and understanding the different ways that people are connected to one another. We will use a broad toolkit of theories and methods drawn from the social, natural, and mathematical sciences to learn what a social network is, to understand how to work with social network data, and to illustrate some of the ways that social networks can be useful in theory and in practice. We will see that network ideas are powerful enough to be used everywhere from UNAIDS, where network models help epidemiologists prevent the spread of HIV, to Silicon Valley, where data scientists use network ideas to build products that enable people all across the globe to connect with one another. |
Dennis M. Feehan | 3 |

Seminar on Topics in Law and Society |
LEGALST 190 CCN: 17157 |
TuTh 10 am - 12 pm Barrows 122 |
Data, Prediction, and Law is a new Legal Studies seminar that allows students to explore different data sources that scholars and government officials use to make generalizations and predictions in the realm of law. The course will also introduce critiques of predictive techniques in law. Students will apply the statistical and Python programming skills from Foundations of Data Science to examine a traditional social science dataset, “big data” related to law, and legal text data. |
Jonathan D. Marshall | 4 |

Introduction to Computational Techniques in Physics |
PHYSICS 77 CCN: 28817 |
M 2 - 4 pm Barrows 20 |
Introductory scientific programming in Python with examples from physics. Topics include: visualization, statistics and probability, regression, numerical integration, simulation, data modeling, function approximation, and algebraic systems. Recommended for freshman physics majors. |
Yuri Kolomensky |
3 |

Research and Data Analysis in Psychology (10, 101) |
PSYCH 10/101 |
M 2 - 5 pm Lewis 100 |
The class covers research design, statistical reasoning, and statistical methods appropriate for psychological research. Topics covered in research design include the scientific method, experimental versus correlational designs, controls and placebos, within- and between-subject designs, and temporal or sequence effects. Topics covered in statistics include descriptive versus inferential statistics, linear regression and correlation, and univariate statistical tests: t-test, one-way and two-way ANOVA, chi-square test. The class also introduces non-parametric tests and modeling. Prospective Psychology majors need to take this course to be admitted to the major. |
Arman D. Catterson |
4 |

STAT 133 CCN: 30844 |
MWF 8 - 9 am Dwinelle 155 |
An introduction to computationally intensive applied statistics. Topics will include organization and use of databases, visualization and graphics, statistical learning and data mining, model validation procedures, and the presentation of results. |
Gaston Sanchez Trujillo |
3 |

Applied Data Science with Venture Applications |
IEOR 135 CCN: 41878 |
TuTh 3:30 - 5 pm Cory 277 |
This highly applied course surveys a variety of key concepts and tools that are useful for designing and building applications that process data signals of information. The course introduces modern open-source computer programming tools, libraries, and code samples that can be used to implement data applications. The mathematical concepts highlighted in this course include filtering, prediction, classification, decision-making, Markov chains, LTI systems, spectral analysis, and frameworks for learning from data. Each math concept is linked to an implementation in Python using libraries for math array functions (NumPy), manipulation of tables (Pandas), long-term storage (SQL, JSON, CSV files), natural language (NLTK), and ML frameworks. |
Ikhlaq Sidhu |
3 |

INFO 190 - 1 CCN: 37950 |
TuTh 2 - 3:30 pm South Hall 202 |
This course introduces students to practical fundamentals of data mining and machine learning with just enough theory to aid intuition building. The course is project-oriented, with a project beginning in class every week and to be completed outside of class by the following week, or two weeks for longer assignments. The in-class portion of the project is meant to be collaborative, with the instructor working closely with groups to understand the learning objectives and help them work through any logistics that may be slowing them down. Weekly lectures introduce the concepts and algorithms which will be used in the upcoming project. Students leave the class with hands-on data mining and data engineering skills they can confidently apply. |
Zachary A. Pardos |
3 |

GEOG 187 CCN: 24555 |
MW 9:30 - 10:30 am McCone 575 |
A spatial analytic approach to digital mapping and GIS. Given that recording the geolocation of scientific, business and social data is now routine, the question of what we can learn from the spatial aspect of data arises. This class looks at challenges in analyzing spatial data, particularly scale and spatial dependence. Various methods are considered, such as hotspot detection, interpolation, and map overlay. The emphasis throughout is hands-on and practical rather than theoretical. |
David Bernard O'Sullivan |
4 |

Introductory Applied Econometrics |
ENVECON/IAS C118 |
TuTh 9:30 - 11 am VLSB 2060 |
Formulation of a research hypothesis and definition of an empirical strategy. Regression analysis with cross-sectional and time-series data; econometric methods for the analysis of qualitative information; hypothesis testing. The techniques of statistical and econometric analysis are developed through applications to a set of case studies and real data in the fields of environmental, resource, and international development economics. Students learn the use of statistical software for economic data analysis. |
Sofia B. Villas-Boas |
4 |

SOCIOL 106 CCN: 30285 |
Th 8 - 10 am Barrows 475 |
This course will cover more technical issues in quantitative research methods, and will include, at the discretion of the instructor, a practicum in data collection and/or analysis. Recommended for students interested in graduate work in sociology or research careers. |
Mao-Mei Liu |
4 |

Modern Statistical Prediction and Machine Learning |
STAT 154 CCN: 30887 |
MWF 11 am - 12 pm Tan Hall 180 |
Theory and practice of statistical prediction. Contemporary methods as extensions of classical methods. Topics: optimal prediction rules, the curse of dimensionality, empirical risk, linear regression and classification, basis expansions, regularization, splines, the bootstrap, model selection, classification and regression trees, boosting, support vector machines. Computational efficiency versus predictive performance. Emphasis on experience with real data and assessing statistical assumptions. |
Gaston Sanchez Trujillo |
4 |