Reddit is the 7th most visited website in the world. Unlike many other social media sites, Reddit makes its data available for extraction via the Reddit API. Our research examines whether features of the language used on Reddit can predict characteristics of Reddit posters (users), such as personality and liking of subreddits. All of the data has been cleaned; our objective for the spring is to use NLP and ML tools to build a predictive model that addresses two questions: (1) can we predict social and personality characteristics from language, and (2) which linguistic cues are associated with different personality characteristics?
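
To make the two questions concrete, here is a minimal sketch of one possible baseline, not the project's actual method. It scores each word by a smoothed log-odds ratio between two groups of posts (a naive Bayes-style score with equal priors), which gives both a crude predictor and a ranked list of linguistic cues. The posts, the binary label (1 = a hypothetical "extraverted" group, 0 = "introverted"), and the function names `tokenize`, `log_odds_cues`, and `predict` are all assumptions made up for illustration; no real Reddit data is used.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def log_odds_cues(posts, labels, smoothing=1.0):
    """Map each word to its smoothed log-odds of appearing in label-1 vs label-0 posts."""
    counts = {0: Counter(), 1: Counter()}
    for text, label in zip(posts, labels):
        counts[label].update(tokenize(text))
    vocab = set(counts[0]) | set(counts[1])
    # Add-one style smoothing so unseen words do not produce log(0).
    totals = {k: sum(c.values()) + smoothing * len(vocab) for k, c in counts.items()}
    scores = {}
    for word in vocab:
        p1 = (counts[1][word] + smoothing) / totals[1]
        p0 = (counts[0][word] + smoothing) / totals[0]
        scores[word] = math.log(p1 / p0)
    return scores


def predict(text, scores):
    """Predict label 1 if the summed word scores are positive, else label 0."""
    total = sum(scores.get(word, 0.0) for word in tokenize(text))
    return 1 if total > 0 else 0


# Toy, invented posts (hypothetical labels: 1 = "extraverted", 0 = "introverted").
posts = [
    "I love parties and meeting new people",
    "We had a great party with friends",
    "I prefer reading alone at home",
    "Quiet evenings alone with a book",
]
labels = [1, 1, 0, 0]

scores = log_odds_cues(posts, labels)

# Question (2): which words lean most strongly toward each group?
ranked = sorted(scores, key=scores.get)
print("cues for label 0:", ranked[:3])
print("cues for label 1:", ranked[-3:])

# Question (1): predict the label of unseen text.
print(predict("meeting friends at a party", scores))
print(predict("reading a book alone", scores))
```

A real model would need far more data, held-out evaluation, and validated personality measures; the point here is only that the same word-level scores can serve both prediction and interpretation.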

Spring 2021
Social Sciences
Technical Area(s)
Natural language processing (NLP)