Do charter schools' marketing strategies attract parents of specific race and class backgrounds, contributing to educational segregation by race and class? To answer such questions, in May 2018 our team used our supercomputer allocation to crawl the website of every U.S. charter school open at the time, yielding a snapshot of over 6,000 cases. That crawler is no longer available, which creates an opportunity to build on our previous success by developing a state-of-the-art, universal, reproducible framework for capturing text data from the web. Specifically, this semester we aim to capture both new data and historical data by drawing on multiple approaches driven from Python (scrapy, selenium, and wget) and on the Internet Archive. We are looking for research apprentices to develop this complex web-scraping pipeline (in Python) and/or to build statistical models (in R) that predict Structural Topic Model loadings from school- and district-level race and poverty data and to visualize the results.

Spring 2020
Social Sciences