Retrieve the data.
The Tycho project provides a REST interface for querying its data, and an API key is required to access it. One of the first queries returns the list of diseases in the data set:
http://www.tycho.pitt.edu/api/diseases?apikey=APIKEY
The result looks something like this:
```xml
<result>
  <count>47</count>
  <row> <disease>ANTHRAX</disease> </row>
  <row> <disease>BOTULISM</disease> </row>
  ...
```
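A minimal sketch of this first query in Python, using only the standard library. It assumes the endpoint still answers at the URL above and that the full reply is a well-formed `<result>` document. The snippet above is cut short, so the sample below closes the tag by hand. `parse_diseases` and `get_disease_list` are hypothetical names, not part of any Tycho client library.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://www.tycho.pitt.edu/api/diseases"

def parse_diseases(xml_text):
    """Return the disease names from a <result><row><disease>... document."""
    root = ET.fromstring(xml_text)
    # Each <row> holds a single <disease> element; <count> is ignored here.
    return [row.findtext("disease") for row in root.findall("row")]

def get_disease_list(api_key):
    """Query the diseases endpoint and parse the XML reply (needs network access).

    Assumes the endpoint still behaves as described above."""
    url = BASE_URL + "?" + urlencode({"apikey": api_key})
    with urlopen(url) as response:
        return parse_diseases(response.read())

# Trimmed sample of the reply, with the closing tag added by hand.
sample = """<result> <count>47</count>
<row> <disease>ANTHRAX</disease> </row>
<row> <disease>BOTULISM</disease> </row>
</result>"""
print(parse_diseases(sample))  # ['ANTHRAX', 'BOTULISM']
```

Keeping the parsing separate from the HTTP call makes it easy to test against saved replies without spending API requests.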
For now, I have restricted the diseases to:
- YELLOW FEVER
I chose to limit the data until I am confident that the model is correct and have found a faster way to import it.
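One way to apply that restriction is to filter the list returned by the disease query against a small set of names. This is only a sketch; `DISEASES_TO_FETCH` and `restrict` are hypothetical names.

```python
# Diseases to download while the model is being checked; widen this set later.
DISEASES_TO_FETCH = {"YELLOW FEVER"}

def restrict(diseases, wanted=DISEASES_TO_FETCH):
    """Keep only the wanted diseases, preserving the order the API returned."""
    return [d for d in diseases if d in wanted]

print(restrict(["ANTHRAX", "BOTULISM", "YELLOW FEVER"]))  # ['YELLOW FEVER']
```

Growing the data set later then means editing one set rather than the download loop.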
I decided to fetch only the data I wanted and store it by state in simple CSV files, so I could experiment with it without going back to the website each time.
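Using the files as a local cache means a run can skip anything already downloaded. One risk is that an interrupted download leaves a half-written file, which later runs would then treat as complete. A minimal sketch of a guard against that, assuming one state's download can be wrapped in a function that returns the file's full text; `fetch_once` is a hypothetical helper.

```python
import os
import tempfile

def fetch_once(path, fetch):
    """Return the contents of `path`, calling `fetch()` only if the file is missing.

    New data is written to a temporary file in the same directory and then
    renamed into place, so an interrupted run never leaves a partial file.
    """
    if not os.path.isfile(path):
        text = fetch()
        directory = os.path.dirname(os.path.abspath(path))
        fd, tmp_path = tempfile.mkstemp(suffix=".tmp", dir=directory)
        try:
            with os.fdopen(fd, "w") as tmp:
                tmp.write(text)
            # Atomic rename on POSIX file systems; also overwrites on Windows.
            os.replace(tmp_path, path)
        except BaseException:
            os.remove(tmp_path)
            raise
    with open(path) as f:
        return f.read()
```

`os.replace` is used rather than `os.rename` because `os.rename` refuses to overwrite an existing destination on Windows.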
Using Python, it's easy to build a process that queries the API and saves the files. First, get the disease list. Next, get the list of states. Then loop through the states, getting each state's list of cities, and for each city get the events (cases or deaths) for each disease.
This is my first attempt at Python, and the code is pretty rough. I created classes (State, City, Case, Death) to hold information such as the year, week, state and city. Each state's data goes in its own file, named after the state with a .data extension. Because I have had to rerun the process, it only pulls data for states that don't have a file yet.
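The classes themselves aren't shown here. As a rough sketch only, they could be written as Python dataclasses. The fields are taken from the attributes the download loop reads (`year`, `week`, `number`, `state.getName()`) plus the city mentioned above; storing them as strings is an assumption, based on the loop concatenating them into text.

```python
from dataclasses import dataclass

@dataclass
class State:
    name: str

    def getName(self):
        # Accessor kept to match the call used in the download loop
        return self.name

@dataclass
class City:
    name: str
    state: State

@dataclass
class Case:
    year: str
    week: str
    number: str
    state: State
    city: City

@dataclass
class Death(Case):
    """Same fields as Case; a separate type keeps the two event kinds apart."""

# Placeholder values, for illustration only.
ny = State("NEW YORK")
case = Case("1900", "1", "2", ny, City("NEW YORK", ny))
print(case.state.getName())  # NEW YORK
```

Dataclasses generate `__init__`, `__repr__` and `__eq__`, which removes most of the boilerplate a hand-written holder class needs.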
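```python
import csv
import os

# get_disease, getStates, getCities, getCases and getDeaths are my helper
# functions (not shown); getCases and getDeaths are assumed to return lists.
listOfdisease = get_disease()
listOfStates = getStates()

for stateName in listOfStates:
    fileName = stateName + ".data"
    # Skip states already downloaded on an earlier run
    if os.path.isfile(fileName):
        continue
    with open(fileName, "w", newline="") as myfile:
        writer = csv.writer(myfile, lineterminator="\n")
        for cityName in getCities(stateName):
            for disease in listOfdisease:
                listOfCases = getCases(stateName, cityName, disease)
                listOfDeaths = getDeaths(stateName, cityName, disease)
                # One row per event: type, disease, year, week, count, state
                for case in listOfCases:
                    writer.writerow(["Case", disease, case.year, case.week,
                                     case.number, case.state.getName()])
                for death in listOfDeaths:
                    writer.writerow(["Death", disease, death.year, death.week,
                                     death.number, death.state.getName()])
```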
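Once a state file exists, it can be read back with the `csv` module for experimenting. A minimal sketch, assuming rows in the Type,Disease,Year,Week,Number,State order written by the loop above and that Number is an integer count; `totals_by_year` is a hypothetical helper. Stripping each field also copes with the stray space after the year in the original death rows.

```python
import csv
import io
from collections import defaultdict

def totals_by_year(lines):
    """Sum event counts per (type, disease, year) from Type,Disease,Year,Week,Number,State rows."""
    totals = defaultdict(int)
    for row in csv.reader(lines):
        if not row:
            continue  # ignore blank lines
        kind, disease, year, _week, number, _state = (field.strip() for field in row)
        totals[(kind, disease, year)] += int(number)
    return dict(totals)

# Made-up rows for illustration only; these are not real Tycho counts.
sample = io.StringIO(
    "Case,YELLOW FEVER,1900,1,2,LOUISIANA\n"
    "Case,YELLOW FEVER,1900,2,3,LOUISIANA\n"
    "Death,YELLOW FEVER,1900, 2,1,LOUISIANA\n"
)
print(totals_by_year(sample))
# {('Case', 'YELLOW FEVER', '1900'): 5, ('Death', 'YELLOW FEVER', '1900'): 1}
```

With a real file, pass the open file object instead: `with open(fileName, newline="") as f: totals_by_year(f)`.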