Under CMS rules published last year, all hospitals have to post their prices in a machine-readable format. The data isn't unified in a single database, though (WSJ has put a partial one together). The project would be to pull the price files from the individual hospital websites and build a database of hospital prices by billing code, together with other variables like location, hospital financing structure, reported profit metrics, CEO compensation, etc.
The challenge is going to be getting the data into the database. Hospitals HATE that they have to post pricing, and they are of course trying to make it hard to compare. It takes a fair bit of navigating on each site to find the file. Maybe a web crawler could do this, but this is really where I need help: I'm a scientist who doesn't code, and I don't know how to get this done in an automated fashion.
The dataset doesn't have to be complete, but it needs to cover a significant amount of ground.
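To give a rough idea of the loading step once a hospital's file has been found and downloaded, here is a minimal Python sketch. It is not a crawler. It assumes each hospital publishes a CSV with its own column names and maps those onto one shared SQLite table. The hospital names, column names, table layout, and all prices in the samples are made up for illustration; real files come in several formats and will need more cleanup than this.

```python
import csv
import io
import sqlite3

# ASSUMPTION: hypothetical per-hospital column names mapped to one common schema.
COLUMN_MAPS = {
    "hospital_a": {"code": "CPT", "description": "Description", "price": "Gross Charge"},
    "hospital_b": {"code": "billing_code", "description": "service", "price": "standard_charge"},
}


def parse_price(text):
    """Turn '$1,250.00' or '1250' into a float; return None if the cell is blank."""
    cleaned = text.replace("$", "").replace(",", "").strip()
    return float(cleaned) if cleaned else None


def make_database(path=":memory:"):
    """Open (or create) the database and make sure the shared table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS prices "
        "(hospital TEXT, code TEXT, description TEXT, price REAL)"
    )
    return conn


def load_hospital_file(conn, hospital, csv_text):
    """Read one hospital's CSV text and append its priced rows to the shared table."""
    mapping = COLUMN_MAPS[hospital]
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        price = parse_price(record[mapping["price"]])
        if price is None:
            continue  # skip items the hospital listed without a price
        rows.append((
            hospital,
            record[mapping["code"]].strip(),
            record[mapping["description"]].strip(),
            price,
        ))
    conn.executemany("INSERT INTO prices VALUES (?, ?, ?, ?)", rows)
    return len(rows)


# Made-up sample files standing in for two downloaded hospital price lists.
SAMPLE_A = 'CPT,Description,Gross Charge\n36415,Blood draw,"$500.00"\n80053,Metabolic panel,$120.50\n'
SAMPLE_B = "billing_code,service,standard_charge\n36415,Venipuncture,25\n99999,Unpriced item,\n"

if __name__ == "__main__":
    conn = make_database()
    load_hospital_file(conn, "hospital_a", SAMPLE_A)
    load_hospital_file(conn, "hospital_b", SAMPLE_B)
    query = "SELECT hospital, price FROM prices WHERE code = '36415' ORDER BY price"
    for row in conn.execute(query):
        print(row)
```

The point of the per-hospital mapping is that the messy part (every hospital naming its columns differently) is confined to one small dictionary, so adding a hospital means adding one entry rather than rewriting the loader.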
With the dataset in hand, I'm planning to use Spotfire (available to me at CEND) to do the large-scale data analysis and stats, to identify correlations and answer particular questions.
This project should yield interesting results, since prices are expected to vary significantly, and a large part of the analysis would be to investigate why certain hospitals charge $500 for a blood draw and others don't.
It could also provide crucial information for self-payers in order to avoid getting huge bills they weren't expecting.
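As a first pass at the price-variation question, before any of the heavier analysis in Spotfire, one could simply rank billing codes by how far apart the cheapest and most expensive hospitals are. The Python sketch below assumes the data is already available as (hospital, billing code, price) rows; the function name `price_spread` and the sample rows are hypothetical and the prices are invented.

```python
from collections import defaultdict


def price_spread(rows):
    """Rank billing codes by the ratio of highest to lowest price across hospitals.

    rows: iterable of (hospital, code, price) tuples.
    Codes seen at fewer than two hospitals, or with a non-positive
    minimum price, are left out because no meaningful ratio exists.
    """
    by_code = defaultdict(list)
    for hospital, code, price in rows:
        by_code[code].append((price, hospital))

    result = []
    for code, entries in by_code.items():
        if len(entries) < 2:
            continue
        low_price, low_hospital = min(entries)
        high_price, high_hospital = max(entries)
        if low_price <= 0:
            continue
        result.append({
            "code": code,
            "cheapest": low_hospital,
            "priciest": high_hospital,
            "min": low_price,
            "max": high_price,
            "ratio": high_price / low_price,
        })
    # Largest spread first, so the most striking codes come out on top.
    result.sort(key=lambda item: item["ratio"], reverse=True)
    return result


# Made-up rows for illustration only.
SAMPLE_ROWS = [
    ("hospital_a", "36415", 500.0),
    ("hospital_b", "36415", 25.0),
    ("hospital_a", "80053", 120.5),
    ("hospital_b", "80053", 100.0),
    ("hospital_a", "99999", 10.0),
]

if __name__ == "__main__":
    for item in price_spread(SAMPLE_ROWS):
        print(item["code"], item["cheapest"], item["priciest"], round(item["ratio"], 2))
```

A ranked list like this only shows where the big gaps are; explaining them would still need the other variables (location, financing structure, and so on) joined in.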