In the fall of 2013, the University of Pittsburgh published a large store of public health data. The data covers reported cases of, and deaths from, diseases in the United States as far back as 1888.
- Events: cases and deaths
- Diseases: 47
- Locations: 50 states and 1287 cities
- Covered Years: 1888 to 2013
I am interested in how public health information can be used to better manage outbreaks and vaccinations. This seemed like a great resource to dive into.
The website lets you query various aspects of the data set, such as cases by disease and state, or deaths from a disease by state or city. I wanted to look at the data from a spatial point of view, and for that I needed the data in a different format.
The graph database Neo4j immediately came to mind. Graphs are about relationships, and this data fits that model very well.
- State “has” City
- City “has” Event
- Event “is” caseOf or deathFrom
Both caseOf and deathFrom carry specific information: the year, the week, and the number of events.
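To make that shape concrete, here is a minimal sketch of the model in plain Python, before anything goes into Neo4j. The class and function names (`State`, `City`, `Event`, `add_record`, `total_by_state`) are made up for illustration, and treating caseOf/deathFrom as a `kind` field on the event is only one way to read the model; they could just as well become relationship types in the graph.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical names throughout; this mirrors the State -> City -> Event
# shape described above and is not taken from any real schema.
EVENT_KINDS = ("caseOf", "deathFrom")


@dataclass
class Event:
    kind: str      # "caseOf" or "deathFrom"
    disease: str
    year: int
    week: int
    count: int     # number of events reported for that week


@dataclass
class City:
    name: str
    events: list[Event] = field(default_factory=list)


@dataclass
class State:
    name: str
    cities: dict[str, City] = field(default_factory=dict)


def add_record(states: dict[str, State], state: str, city: str,
               kind: str, disease: str, year: int, week: int,
               count: int) -> None:
    """Attach one weekly report to its city, creating state/city as needed."""
    if kind not in EVENT_KINDS:
        raise ValueError(f"unknown event kind: {kind!r}")
    s = states.setdefault(state, State(state))
    c = s.cities.setdefault(city, City(city))
    c.events.append(Event(kind, disease, year, week, count))


def total_by_state(states: dict[str, State], state: str,
                   kind: str, disease: str) -> int:
    """Sum the counts of one kind of event for one disease across a state."""
    return sum(
        e.count
        for c in states[state].cities.values()
        for e in c.events
        if e.kind == kind and e.disease == disease
    )
```

Walking from a state through its cities to their events is the same traversal a graph query would do, which is why the data maps onto nodes and relationships so naturally.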
The first step is to retrieve the data and create a graph database. Next comes geocoding each city, and then developing queries to see what can be learned.