Graph Database – Project Tycho

In the fall of 2013 the University of Pittsburgh published a large collection of public health data. The data includes reported cases (and deaths) of diseases in the United States going back as far as 1888.

  • Events: cases and deaths
  • Diseases: 47
  • Locations: 50 states and 1,287 cities
  • Covered Years: 1888 to 2013

I am interested in how public health information can be used to better manage outbreaks and vaccinations. This seemed like a great resource to dive into.

The website lets you query various aspects of the data set, such as cases by disease and state, or deaths from a disease by state or city. I wanted to look at the data in terms of spatial relationships, and to do that I needed it in a different format.

The graph database Neo4j immediately came to mind. Graphs are about relationships, and this data fits that model very well.


The data model is:

  • State “has” City
  • City “has” Event
  • Event “is” caseOf or deathFrom

Both caseOf and deathFrom carry specific information: the year, the week, and the number of events.
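As a rough illustration of this model, here is a minimal Python sketch that turns one row of the data into a parameterized Cypher statement, plus one read query in the spirit of "cases by disease and state". The row field names (state, city, disease, event, year, week, number), the Disease node, the HAS relationship name, and the reading of caseOf/deathFrom as relationships from an Event to a Disease are all assumptions for illustration, not the actual Tycho schema or the import used in this project.

```python
# Hypothetical sketch of the State -has-> City -has-> Event model.
# ASSUMPTIONS: row field names, the Disease node, the HAS relationship name,
# and caseOf/deathFrom as Event -> Disease relationships are illustrative only.

# Relationship types cannot be passed as Cypher parameters, so the event kind
# is mapped onto a fixed whitelist instead of being formatted in from input.
EVENT_RELATIONSHIPS = {"case": "caseOf", "death": "deathFrom"}

PARAM_KEYS = ("state", "city", "disease", "year", "week", "number")


def build_event_statement(row: dict) -> tuple[str, dict]:
    """Return (cypher, params) that records one weekly count for one city."""
    rel = EVENT_RELATIONSHIPS[row["event"]]  # KeyError for unknown event kinds
    cypher = (
        "MERGE (s:State {name: $state}) "
        "MERGE (c:City {name: $city, state: $state}) "
        "MERGE (s)-[:HAS]->(c) "
        "MERGE (d:Disease {name: $disease}) "
        "CREATE (e:Event) "
        "CREATE (c)-[:HAS]->(e) "
        # Doubled braces survive the f-string as literal Cypher map braces.
        f"CREATE (e)-[:{rel} {{year: $year, week: $week, number: $number}}]->(d)"
    )
    params = {key: row[key] for key in PARAM_KEYS}
    return cypher, params


# Hypothetical read query against the same assumed model:
# total reported cases of one disease per city within one state.
CASES_BY_DISEASE_AND_STATE = (
    "MATCH (s:State {name: $state})-[:HAS]->(c:City)-[:HAS]->(:Event)"
    "-[r:caseOf]->(:Disease {name: $disease}) "
    "RETURN c.name AS city, sum(r.number) AS cases "
    "ORDER BY cases DESC"
)
```

The returned pair could be handed to the official Neo4j Python driver as `session.run(cypher, params)`. MERGE keeps states, cities, and diseases unique across rows, while CREATE adds a fresh Event for every weekly count.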

The first step is to retrieve the data and create a graph database. Next, geocode each city, and then develop queries to see what can be learned.
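Once each city has a latitude and longitude, one simple way to ask spatial questions is great-circle distance. The sketch below uses the haversine formula, a swapped-in technique rather than necessarily what this project will use, and assumes coordinates are decimal degrees. The `cities` mapping and its keys are hypothetical stand-ins for geocoded city nodes.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def cities_within(center: tuple[float, float],
                  cities: dict[str, tuple[float, float]],
                  radius_km: float) -> list[str]:
    """Return the keys of cities within radius_km of center, nearest first.

    ASSUMPTION: cities maps a hypothetical city key to its geocoded (lat, lon).
    """
    hits = []
    for name, (lat, lon) in cities.items():
        distance = haversine_km(center[0], center[1], lat, lon)
        if distance <= radius_km:
            hits.append((distance, name))
    return [name for _, name in sorted(hits)]
```

A linear scan like this is fine for about 1,300 cities; with the coordinates stored on the city nodes, the same question could instead be pushed into the database query.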

Retrieve the data

About gricker

Living and learning