With the data now split into files by state, it is time to create and fill the database.
Again I went with Python, since it made connecting over REST simple. The first step was to determine the nodes and relationships. As far as I can tell, there is no established method for this. With a relational database one would start with tables and then add foreign keys. With Neo4j, I decided to create nodes based on the key items in the data.
Nodes:
State, City, Disease, Case, Death
Relationships are the glue that connects nodes. Unlike foreign keys, they carry a name that describes the connection. Consider State and City. The relationship I first considered was “HAS_A”: a State “HAS_A” City. I chose “CITY_IN_STATE” instead: City “CITY_IN_STATE” State.
Relationships:
"CITY_IN_STATE", "CASE_OF", "HAS_CASE", "DEATH_FROM", "HAS_DEATH"
State -> CITY_IN_STATE -> City
Case -> CASE_OF -> Disease
City -> HAS_CASE -> Case
Death -> DEATH_FROM -> Disease
City -> HAS_DEATH -> Death
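One way to keep the model in view while writing the loader is to record it as plain data. The sketch below is only an illustration and not part of the original loader: the names SCHEMA and outgoing are made up here, and the triples simply restate the list above.

```python
# Each triple is (start node type, relationship type, end node type),
# copied from the list above. SCHEMA and outgoing are illustrative names.
SCHEMA = [
    ("State", "CITY_IN_STATE", "City"),
    ("Case", "CASE_OF", "Disease"),
    ("City", "HAS_CASE", "Case"),
    ("Death", "DEATH_FROM", "Disease"),
    ("City", "HAS_DEATH", "Death"),
]


def outgoing(node_type):
    """Return the (relationship, end node type) pairs that start at node_type."""
    return [(rel, end) for start, rel, end in SCHEMA if start == node_type]


if __name__ == "__main__":
    # Print the model in an arrow notation similar to the list above.
    for start, rel, end in SCHEMA:
        print("(%s)-[:%s]->(%s)" % (start, rel, end))
```

Calling outgoing("City") returns the HAS_CASE and HAS_DEATH pairs, which makes it easy to see that City is the node tying cases and deaths back to a State.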
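Using Python to create the database:
# Assumes the neo4jrestclient package, which matches the calls used here
from neo4jrestclient.client import GraphDatabase

# Connect to the local Neo4j server over REST
gdb = GraphDatabase("http://localhost:7474/db/data/")
# Indexes for looking nodes up by name
state_idx = gdb.node.indexes.create('states')
city_idx = gdb.node.indexes.create('cities')
disease_idx = gdb.node.indexes.create('diseases')
# Labels for the main node types
stateLabel = gdb.labels.create("State")
cityLabel = gdb.labels.create("City")
diseaseLabel = gdb.labels.create("Disease")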
I then created functions to create nodes driven by the data files.
def create_city(name):
    cityNode = gdb.node(name=name, description=name)
    cityLabel.add(cityNode)
    city_idx['name'][name] = cityNode
    return cityNode
There are similar functions for states, cases, diseases, and deaths.
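Because the name-only functions follow the same pattern as create_city, they could be produced by a single helper. What follows is a sketch, not the code used for this load: make_creator and its cache dictionary are hypothetical names, and it assumes a node type needs nothing but a name, which does not hold for cases and deaths (those also carry a year, week, and count). The cache is there so that a city appearing on many rows of a data file ends up as a single node.

```python
def make_creator(gdb, label, index):
    """Build a create_<thing>(name) function for one node type.

    gdb, label and index are expected to behave like the objects created
    earlier (gdb, cityLabel, city_idx). make_creator is a hypothetical
    helper, not part of any library.
    """
    cache = {}  # name -> node already created during this run

    def create(name):
        # Reuse the node if this name was already seen in this run.
        if name in cache:
            return cache[name]
        node = gdb.node(name=name, description=name)
        label.add(node)
        index['name'][name] = node
        cache[name] = node
        return node

    return create


# Possible usage, given the objects defined earlier:
#   create_city = make_creator(gdb, cityLabel, city_idx)
#   create_state = make_creator(gdb, stateLabel, state_idx)
```

The cache only lives for one run of the script. A loader that is run several times would need to look the name up in the index first, which this sketch does not attempt.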
Create a city and state node. Then create the relationship:
cityNode = create_city(cityName)
stateNode.relationships.create("CITY_IN_STATE", cityNode)
Below is how cases are created and the relationships built.
caseNode = create_case_event(caseName, "CASE", year, week, number, stateName)
caseNode.relationships.create("CASE_OF", diseaseNode)
cityNode.relationships.create("HAS_CASE", caseNode)