Course page:

From our off-hand Tweets to the well-wrought urn of poetry, text functions both as a device for communication and as a way of examining the world around us. We use text to lay out our thoughts in argumentative essays, speeches, and novels that have the power to influence at the grand scale of politics and at the personal scale of our selves. However, vast reams of this text lie apparently beyond our reach: it would be as difficult to sit down and read every blog post from a given day as it would be to read every novel in the library. Data science opens new avenues to “read” at a previously untold scale, but if we did read every novel, would that change which ones we thought were important? Would we have to learn a different kind of reading altogether? In this course, we will apply methods learned in Foundations of Data Science to sets of literary texts in order to expand our reading practices. This humanities-oriented approach will require us to think about the limits of both new and traditional reading methods and how we make arguments based on data.

This course will have three overlapping phases. In the first, we will read theoretical texts and examples of traditional reading practices, especially “close reading,” in order to think about how interpretive arguments are made and what evidence is used to back them up. In the second and largest phase, we will experiment with popular statistical methods that have recently gained visibility in literary study, and consider them as forms of “distant reading.” This phase will emphasize hands-on interaction with texts, programming workflow, and collaboration. The final phase will return to the theoretical question of whether these data science methods offer us new interpretations of literature, and what problems remain unsettled.

The final project in this course will require students to write a paper in which interpretive literary arguments rely on both data and traditional evidence.