With the data split into files by state, it is time to create and fill the database.
Again I went with Python since it was simple to connect using REST. The first step was to determine the nodes and relationships. As far as I can tell there is no established method for this. With a relational database one would start with tables and then add foreign keys. With Neo4j I decided to create nodes based on the key items in the data:
State, City, Disease, Case, Death
Relationships are the glue that connects nodes. Unlike foreign keys, they can be descriptive. Consider State and City. The relationship I first considered was “HAS_A”: a State “HAS_A” City. I chose the more descriptive “CITY_IN_STATE” instead, running from the State node to the City node.
State -> CITY_IN_STATE -> City
Case -> CASE_OF -> Disease
City -> HAS_CASE -> Case
Death -> DEATH_FROM -> Disease
City -> HAS_DEATH -> Death
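The model above can also be written down as plain data, which makes it easy to check the direction of each relationship before loading anything. This is only a sketch in plain Python; the names SCHEMA and outgoing are illustrative and are not part of the loader.

```python
# Each entry is (start label, relationship type, end label), as listed above.
SCHEMA = [
    ("State", "CITY_IN_STATE", "City"),
    ("Case", "CASE_OF", "Disease"),
    ("City", "HAS_CASE", "Case"),
    ("Death", "DEATH_FROM", "Disease"),
    ("City", "HAS_DEATH", "Death"),
]


def outgoing(label):
    """Return the (relationship type, end label) pairs that start at a label."""
    return [(rel, end) for start, rel, end in SCHEMA if start == label]


print(outgoing("City"))
```

Disease has no outgoing relationships in this model, so outgoing("Disease") returns an empty list: cases and deaths point at a disease, not the other way around.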
The database connection, indexes, and labels are set up in Python:
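# Assumption: these calls match the neo4jrestclient package.
from neo4jrestclient.client import GraphDatabase
# The client expects a full URL, including the scheme.
gdb = GraphDatabase("http://localhost:7474/db/data/")
# One index per node type, used to look nodes up by name.
state_idx = gdb.node.indexes.create('states')
city_idx = gdb.node.indexes.create('cities')
disease_idx = gdb.node.indexes.create('diseases')
# Labels used to tag each node with its type.
stateLabel = gdb.labels.create("State")
cityLabel = gdb.labels.create("City")
diseaseLabel = gdb.labels.create("Disease")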
I then wrote functions that create nodes from the data files.
def create_city(name):
    cityNode = gdb.node(name=name, description=name)
    cityLabel.add(cityNode)
    city_idx['name'][name] = cityNode
    return cityNode
There are similar functions for states, cases, diseases, and deaths.
Create a city node and a state node, then create the relationship:
cityNode = create_city(cityName)
stateNode.relationships.create("CITY_IN_STATE", cityNode)
Below is how cases are created and the relationships built.
caseNode = create_case_event(caseName, "CASE", year, week, number, stateName)
caseNode.relationships.create("CASE_OF", diseaseNode)
cityNode.relationships.create("HAS_CASE", caseNode)
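To make the wiring concrete without a running Neo4j server, here is a small in-memory sketch of the same pattern: look up or create the state, city, and disease, create a new case node, then add the relationships. Everything in it (the Graph class, add_case, and the sample values) is hypothetical and only mimics the shape of the real calls; it is not the loader itself.

```python
class Graph:
    """Tiny in-memory stand-in for the database (hypothetical, for illustration)."""

    def __init__(self):
        self.nodes = []          # one dict of properties per node
        self.relationships = []  # (start id, relationship type, end id)
        self.index = {}          # (label, name) -> node id

    def create_node(self, label, **props):
        node_id = len(self.nodes)
        self.nodes.append({"label": label, **props})
        return node_id

    def get_or_create(self, label, name):
        # Reuse an existing node so a state, city, or disease is stored once.
        key = (label, name)
        if key not in self.index:
            self.index[key] = self.create_node(label, name=name)
        return self.index[key]

    def relate(self, start, rel_type, end):
        self.relationships.append((start, rel_type, end))


def add_case(graph, state, city, disease, year, week, number):
    state_id = graph.get_or_create("State", state)
    city_id = graph.get_or_create("City", city)
    disease_id = graph.get_or_create("Disease", disease)
    # Link the city to its state only the first time the pair is seen.
    if (state_id, "CITY_IN_STATE", city_id) not in graph.relationships:
        graph.relate(state_id, "CITY_IN_STATE", city_id)
    # Every record becomes its own case node.
    case_id = graph.create_node("Case", year=year, week=week, number=number)
    graph.relate(case_id, "CASE_OF", disease_id)
    graph.relate(city_id, "HAS_CASE", case_id)
    return case_id


# Made-up sample records, not taken from the data files.
g = Graph()
add_case(g, "Ohio", "Akron", "Measles", 1930, 1, 12)
add_case(g, "Ohio", "Akron", "Measles", 1930, 2, 7)
print(len(g.nodes), len(g.relationships))
```

The point of the lookup step is that two records for the same city and disease share the State, City, and Disease nodes and differ only in their Case nodes.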