Group Dynamics on Reddit

I downloaded 10 years of Reddit data (metadata and content). I am looking to clean the data and run statistical models to examine what predicts group commitment (i.e., commitment to a subreddit). The team would help me: 1) efficiently clean the massive dataset, 2) fit statistical models to test theoretically derived predictions, and 3) visualize the results. Expertise in natural language processing (NLP) is a plus, because the dataset includes the text content itself, not just metadata.