We will explore how historical data becomes historical evidence and how recent technological advances affect long-established practices, such as close attention to historical context and contingency. Will the advent of fast computing and big data make history "count" more, or lead to unprecedented insights into the study of change over time? In our weekly discussions, we will apply what we learn in lectures and labs to the analysis of selected historical sources and learn how historical datasets are constructed. We will also consider scholarly debates over quantitative evidence and historical argument.

The main course, Foundations of Data Science, provides a baseline of computing skills, statistical concepts, and data visualization. This connector applies and reinforces the main course's inferential and computing ideas using relevant historical materials, and it introduces students to key debates, found across many areas of historical study, about the usefulness and limits of such approaches.

Concretely, the course addresses how workable historical data are constructed out of incomplete, selective, and messy primary materials, such as archival collections, textual materials, or administrative records. It explores in greater depth the questions of marshaling evidence that the main course touches on, including the use of distributions, sampling, regression, time series data, causation, networks, mapping, and visualization. It also opens up methodological questions about quantitative and other approaches to historical argument, helping students situate the methods introduced in the main course within historical thinking more broadly.

Topics include:

  • What counts as quantitative history, and what doesn’t?
  • Debate 1: Narrative and quantification
  • Organizing historical data: turning raw data into a database
  • Debate 2: Sampling and its significance (Time on the Cross and the critics and supporters of cliometrics)
  • Visualizing income inequality over time: why are we so confident about the Piketty-Saez “U-shaped” curve?
  • Recap 1: The many uses of descriptive statistics in historical contexts
  • Extension 1: Indices, growth rates, and other tools used by historians
  • The promise, and peril, of moving beyond descriptive statistics: thinking historically about explaining variance
  • Extension 2: Thinking about non-linear relationships: a richer and more realistic explanation of variance in the real world?
  • Debate 3: History in the era of Big Data: a repeat of the cliometrics debate?
  • Recap 2: How many uses are there for quantitative approaches in historical contexts?