Graph Database – project Tycho part 2

Retrieve the data.

The Tycho project supplies a REST interface for querying the data. An API key is required for access. One of the first queries to run returns the list of diseases in the data set:

http://www.tycho.pitt.edu/api/diseases?apikey=APIKEY

The result looks something like:

<result>
<count>47</count>
<row>
<disease>ANTHRAX</disease>
</row>
<row>
<disease>BOTULISM</disease>
</row>
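
A minimal sketch of how a response like this could be turned into a Python list, using only the standard library. The function name parse_diseases is hypothetical, and the sample string just mirrors the excerpt above with a closing tag added; the real response may carry more fields.

```python
import xml.etree.ElementTree as ET


def parse_diseases(xml_text):
    """Return the disease names found in a Tycho-style <result> document."""
    root = ET.fromstring(xml_text)
    # Assumption: each <row> holds exactly one <disease> element.
    return [row.findtext("disease") for row in root.findall("row")]


# Sample text modelled on the excerpt above, not a real API response.
sample = """<result>
<count>2</count>
<row><disease>ANTHRAX</disease></row>
<row><disease>BOTULISM</disease></row>
</result>"""

diseases = parse_diseases(sample)
print(diseases)
```

In practice the XML text would come from the HTTP response body rather than from a string literal.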

For now I have restricted the diseases to:

  • INFLUENZA
  • RUBELLA
  • CHOLERA
  • MALARIA
  • MUMPS
  • MEASLES
  • SMALLPOX
  • YELLOW FEVER

I chose to limit the data until I am confident the model is correct and I have figured out a faster way to import it.

I decided to fetch the data I wanted and store it by state in simple CSV files. This way I could experiment with the data without having to go back to the website each time.

Using Python, it's easy to construct a process to query the API and save the files. The first step is to get the disease list. Next, get the list of states. Then loop through each state and get its list of cities. For each city, get the events (cases or deaths) for each disease.

This is my first attempt with Python and the code is pretty rough. I created classes (State, City, Case, Death) to hold information such as year, week, state and city. Each file is named after its state (stateName + ".data" in the code below). Because I have had to rerun the process, I only pull data for states that do not yet have a file.
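
The post does not show the classes themselves, so the following is only a guess at their shape, written with dataclasses. The field names year, week, number and state and the getName() method follow the loop below; the Event base class, the city field and the sample values are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class State:
    name: str

    def getName(self):
        return self.name


@dataclass
class City:
    name: str
    state: State


@dataclass
class Event:
    # Kept as strings so they can be concatenated straight into a CSV line.
    year: str
    week: str
    number: str
    state: State
    city: City


class Case(Event):
    """A reported case count for one city and week."""


class Death(Event):
    """A reported death count for one city and week."""


# Made-up values, only to show how the pieces fit together.
pa = State("PENNSYLVANIA")
case = Case("1920", "7", "12", pa, City("PITTSBURGH", pa))
print(case.state.getName(), case.year, case.week, case.number)
```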

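 import os

 # get_disease, getStates, getCities, getCases and getDeaths wrap the
 # REST queries; getCases and getDeaths are assumed to return lists.
 listOfdisease = get_disease()
 listOfStates = getStates()
 for stateName in listOfStates:
     # Skip states that already have a file from an earlier run.
     if os.path.isfile(stateName + ".data"):
         continue
     with open(stateName + ".data", "w") as myfile:
         listOfCities = getCities(stateName)
         for cityName in listOfCities:
             for disease in listOfdisease:
                 listOfCases = getCases(stateName, cityName, disease)
                 listOfDeaths = getDeaths(stateName, cityName, disease)
                 # One CSV line per event: type, disease, year, week, count, state.
                 for case in listOfCases:
                     myfile.write("Case," + disease + "," + case.year + "," +
                                  case.week + "," + case.number + "," +
                                  case.state.getName() + "\n")
                 for death in listOfDeaths:
                     myfile.write("Death," + disease + "," + death.year + "," +
                                  death.week + "," + death.number + "," +
                                  death.state.getName() + "\n")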
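
Once the files exist, they can be read back without touching the website. The sketch below sums event counts per event type and disease with the csv module. The function name total_by_disease is hypothetical and the rows are invented; only the column order is taken from the loop above.

```python
import csv
import io
from collections import defaultdict


def total_by_disease(lines):
    """Sum event counts per (event type, disease) from state-file rows."""
    totals = defaultdict(int)
    for kind, disease, year, week, number, state in csv.reader(lines):
        totals[(kind, disease)] += int(number)
    return dict(totals)


# Made-up rows in the same column order the loop above writes.
sample = io.StringIO(
    "Case,MEASLES,1920,7,12,PENNSYLVANIA\n"
    "Case,MEASLES,1920,8,5,PENNSYLVANIA\n"
    "Death,MEASLES,1920,8,1,PENNSYLVANIA\n"
)
totals = total_by_disease(sample)
print(totals)
```

An open file handle for one of the state files can be passed in place of the StringIO object.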
 

Load the database.

Posted in Uncategorized

Graph Database – project Tycho

In the fall of 2013 the University of Pittsburgh published a great store of public health data. The data includes cases (and deaths) of reported diseases for the United States as far back as 1888.

  • Events: cases and deaths
  • Diseases: 47
  • Locations: 50 states, and 1287 cities
  • Covered Years: 1888 to 2013

I am interested in how public health information can be used to better manage outbreaks and vaccinations. This seemed like a great resource to dive into.

The website offers the ability to query various aspects of the data set, such as cases by disease and state, or deaths from a disease by state or city. I wanted to be able to look at the data from a spatial relationship point of view. To do this I needed the data in a different format.

The graph database Neo4j immediately came to mind. Graphs are about relationships, and this data fits that model very well.

Consider:

State “has” city

City “has” Event

Event “is” caseOf or deathFrom

Both caseOf and deathFrom have specific information about year, week, number of events.
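
One possible reading of this model, sketched as a Python function that assembles a parameterized Cypher statement. Everything beyond the State, City and Event names is an assumption: the Disease node, the relationship names HAS, CASE_OF and DEATH_FROM, and the choice to carry year, week and number as relationship properties. The statement is only built as a string here; running it would need a Neo4j driver session, which is not shown.

```python
def event_to_cypher(kind, state, city, disease, year, week, number):
    """Build a Cypher statement and parameters for one case or death record."""
    # Hypothetical relationship names for the caseOf / deathFrom idea.
    rel = {"Case": "CASE_OF", "Death": "DEATH_FROM"}[kind]
    query = (
        "MERGE (s:State {name: $state}) "
        "MERGE (s)-[:HAS]->(c:City {name: $city}) "
        "MERGE (d:Disease {name: $disease}) "
        "CREATE (c)-[:HAS]->(e:Event) "
        f"CREATE (e)-[:{rel} {{year: $year, week: $week, number: $number}}]->(d)"
    )
    params = {
        "state": state,
        "city": city,
        "disease": disease,
        "year": year,
        "week": week,
        "number": number,
    }
    return query, params


# Made-up record, only to show the generated statement.
query, params = event_to_cypher(
    "Case", "PENNSYLVANIA", "PITTSBURGH", "MEASLES", 1920, 7, 12
)
print(query)
```

Using MERGE for states, cities and diseases keeps repeated records from creating duplicate nodes, while each event is created fresh.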

The first step is to retrieve the data and create a graph database. The next is to geocode each city, and then to develop queries to see what one can learn.

Retrieve the data

https://www.tycho.pitt.edu/
