The number of journals and articles published has increased enormously over the past 20 years, making it ever harder for scholars to keep up with the literature. Research in many academic fields requires reviewing the literature to document what we know about a phenomenon and what gaps exist in our knowledge. We are developing a flexible and reproducible method to review academic literature that takes advantage of massive online collections containing nearly all articles published in academic journals (e.g., JSTOR, MathSciNet, Web of Science, MEDLINE). The goal is to harness computers to review the entire corpus of published literature by charting engagement with specific theories or topics over time and across subfields. This computational method stands in sharp contrast to the time-honored practice of human reading, which can cover only a small fraction of the published corpus.

Professor Heather Haveman (UC Berkeley) and Dr. Jaren Haber (Georgetown University) are analyzing hundreds of thousands of academic articles gathered from JSTOR, the leading online repository of journal articles for the social sciences. Specifically, we are developing a method to construct, validate, and apply dictionaries, which are lists of terms (unigrams, bigrams, and trigrams) that express concepts related to a specific theory or topic. Our method relies on inductive computational text analysis, specifically word-embedding models (Word2Vec) and hierarchical clustering. Will you join us?
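To give a sense of the dictionary-building idea, here is a minimal sketch in plain Python. It is not the project's actual pipeline. The term list, the three-dimensional vectors, the similarity threshold, and the function names (`expand_dictionary`, `cluster_terms`) are all hypothetical and chosen only for illustration. In practice the vectors would come from a Word2Vec model trained on the article corpus. Joining multi-word terms with underscores is likewise an assumed convention. The sketch shows two steps: growing a seed dictionary by adding terms whose embeddings are close to a seed term, and grouping terms with average-linkage hierarchical clustering on cosine distance.

```python
import math

# Toy 3-dimensional "embeddings" with made-up values, for illustration only.
# Real vectors would come from a word-embedding model trained on the corpus.
VECTORS = {
    "institution": (0.9, 0.1, 0.0),
    "institutional_logic": (0.8, 0.2, 0.1),
    "legitimacy": (0.7, 0.3, 0.0),
    "population_ecology": (0.1, 0.9, 0.1),
    "organizational_niche": (0.2, 0.8, 0.0),
    "regression": (0.0, 0.1, 0.9),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def expand_dictionary(seeds, vectors, threshold=0.9):
    """Return the seeds plus every term at least `threshold`-similar to a seed."""
    expanded = set(seeds)
    for term, vec in vectors.items():
        if any(cosine(vec, vectors[s]) >= threshold for s in seeds):
            expanded.add(term)
    return sorted(expanded)


def cluster_terms(vectors, n_clusters):
    """Average-linkage agglomerative clustering on cosine distance."""
    clusters = [[term] for term in sorted(vectors)]

    def distance(a, b):
        # Mean pairwise cosine distance between the members of two clusters.
        pairs = [(x, y) for x in a for y in b]
        return sum(1 - cosine(vectors[x], vectors[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: distance(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]
    return sorted(clusters)


if __name__ == "__main__":
    print(expand_dictionary(["institution"], VECTORS))
    print(cluster_terms(VECTORS, n_clusters=3))
```

In a real workflow, a researcher would still review the expanded list by hand, since proximity in the embedding space suggests candidate terms but does not by itself validate them.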

Term: Fall 2022
Topic: Humanities