Probability for Data Science

This new course introduces students to probability theory using both mathematics and computation, the two main tools of the subject. The contents have been selected to be useful for data science, and include discrete and continuous families of distributions, bounds and approximations, dependence, conditioning, Bayesian methods, random permutations, convergence, Markov chains and reversibility, maximum likelihood, and least squares prediction. Labs will cover a variety of topics including matches in random sampling, distance between distributions, PageRank, and Markov chain Monte Carlo methods. The prerequisites are Foundations of Data Science (Data 8) and one year of calculus. Data 8 gives students a practical understanding of randomness and sampling variability. Stat 140 will build on this foundation, with abstraction and computation complementing each other throughout. Students will develop multiple approaches to problem solving, understand the difference between theory and simulation, and appreciate the power of both.

For more information, please visit the course website at