Outline
Assumptions
What is a graph and what are they good for?
What is a graph database?
What is Neo4J and how does one use it?
Case: Subway Model
Results Compilation
Questions
Audience Assumptions
Working knowledge of:
– Java
– Relational databases
What is a graph?
Collection of nodes and edges
Edges can be directed (or not)
Edges can represent many things
[Figure: example graph of people (Chuck, Jim, Jay, Gary, Coco) connected by "Knows" edges and one "Annoys" edge]
Transportation Example
[Figure: graph of cities (Denver, Los Angeles, Chicago, New York, Dallas) with edges labeled by times: 12:00, 13:00, 15:00, 16:00, 17:00, 18:00, 20:00]
What are graphs good for?
Often map more directly to the structure of some object-oriented problems.
Work best for storing “richly connected” data
Many algorithms exist to extract useful information
– Dijkstra’s shortest path
– Minimum spanning tree (Kruskal and others)
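As an illustration of the first algorithm named above, here is a minimal sketch of Dijkstra's shortest path in plain Java. It does not use any graph database. The class and method names (`ShortestPathSketch`, `addEdge`, `shortestDistances`) are hypothetical, and it assumes all edge weights are non-negative.

```java
import java.util.*;

class ShortestPathSketch {
    // Adjacency list: node -> (neighbor -> non-negative edge weight).
    private final Map<String, Map<String, Integer>> adjacency = new HashMap<>();

    // Adds a directed, weighted edge; call twice to model an undirected edge.
    void addEdge(String from, String to, int weight) {
        adjacency.computeIfAbsent(from, k -> new HashMap<>()).put(to, weight);
    }

    // Dijkstra's algorithm: shortest distance from source to every reachable node.
    Map<String, Integer> shortestDistances(String source) {
        Map<String, Integer> settled = new HashMap<>();
        // Queue entries are (node, tentative distance), smallest distance first.
        PriorityQueue<Map.Entry<String, Integer>> queue =
            new PriorityQueue<Map.Entry<String, Integer>>(
                (a, b) -> Integer.compare(a.getValue(), b.getValue()));
        queue.add(new AbstractMap.SimpleEntry<String, Integer>(source, 0));

        while (!queue.isEmpty()) {
            Map.Entry<String, Integer> current = queue.poll();
            String node = current.getKey();
            if (settled.containsKey(node)) {
                continue; // a shorter path to this node was already found
            }
            int distance = current.getValue();
            settled.put(node, distance);

            Map<String, Integer> neighbors =
                adjacency.getOrDefault(node, Collections.<String, Integer>emptyMap());
            for (Map.Entry<String, Integer> edge : neighbors.entrySet()) {
                if (!settled.containsKey(edge.getKey())) {
                    queue.add(new AbstractMap.SimpleEntry<String, Integer>(
                        edge.getKey(), distance + edge.getValue()));
                }
            }
        }
        return settled;
    }
}
```

With made-up edges A→B (4), A→C (1), and C→B (2), calling `shortestDistances("A")` gives a distance of 3 to B (via C) rather than 4 along the direct edge.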
What is a graph database?
A NoSQL database that stores nodes and edges, and provides a mechanism to easily query information from it.
Can contain nodes of different types
Can have free-form attributes within nodes
Can have edges (relationships) of different types
Can have attributes attached to edges (distance, cost, relationship)
Query mechanism
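To make the list above concrete, here is a minimal in-memory sketch of the same idea in plain Java: typed nodes with free-form attributes, typed edges that carry their own attributes, and a very small query method. It only illustrates the data model and says nothing about how Neo4J stores data internally. All names (`PropertyGraphSketch`, `GraphNode`, `GraphEdge`, `edgesOfType`) are hypothetical.

```java
import java.util.*;

class PropertyGraphSketch {
    // A node has a type and free-form key/value attributes.
    static class GraphNode {
        final String type;
        final Map<String, Object> props = new HashMap<>();

        GraphNode(String type) {
            this.type = type;
        }
    }

    // An edge has a type, a direction (from -> to), and its own attributes.
    static class GraphEdge {
        final String type;
        final GraphNode from;
        final GraphNode to;
        final Map<String, Object> props = new HashMap<>();

        GraphEdge(String type, GraphNode from, GraphNode to) {
            this.type = type;
            this.from = from;
            this.to = to;
        }
    }

    final List<GraphNode> nodes = new ArrayList<>();
    final List<GraphEdge> edges = new ArrayList<>();

    GraphNode addNode(String type) {
        GraphNode node = new GraphNode(type);
        nodes.add(node);
        return node;
    }

    GraphEdge addEdge(String type, GraphNode from, GraphNode to) {
        GraphEdge edge = new GraphEdge(type, from, to);
        edges.add(edge);
        return edge;
    }

    // A tiny stand-in for a query mechanism: all edges of one type.
    List<GraphEdge> edgesOfType(String type) {
        List<GraphEdge> result = new ArrayList<>();
        for (GraphEdge edge : edges) {
            if (edge.type.equals(type)) {
                result.add(edge);
            }
        }
        return result;
    }
}
```

A real graph database adds persistence, indexing, transactions, and a query language on top of this model; the sketch only shows how nodes, edges, and attributes relate.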
Why would I ever use one?
Certain problems are easier to solve when the data is framed as a graph.
“The right tool for the job”
Neo4J
Open source (GPLv3 for Community Edition)
V1.0 released in 2010
Written in Java and Scala
Managed by Neo Technology
Uses the Property Graph Model
Embedded or server
Fully transactional
Set of jar files ~30MB
Query language: Cypher
How do you use Neo4J?
Creating a database
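import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.factory.GraphDatabaseBuilder;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;

// Location of database
String dbPath = "/Users/chuck/myneodb";

// Build an embedded database stored at dbPath
GraphDatabaseFactory factory = new GraphDatabaseFactory();
GraphDatabaseBuilder builder = factory.newEmbeddedDatabaseBuilder(dbPath);
GraphDatabaseService dbService = builder.newGraphDatabase();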
Creating fixed node and edge types
import org.neo4j.graphdb.Label;
import org.neo4j.graphdb.RelationshipType;

// Node types
public enum NodeLabel implements Label { Station }

// Relationship types
public enum RelType implements RelationshipType { TRACKS_TO, ROUTE_TO, AIRWAY_TO }
Adding nodes to a database
// Writes must run inside a transaction, e.g. one opened with dbService.beginTx().

// Create Station node
Node node1 = dbService.createNode(NodeLabel.Station);

// Set properties on the Station
node1.setProperty("number", "100");
node1.setProperty("name", "State St");

// Add another
Node node2 = dbService.createNode(NodeLabel.Station);
node2.setProperty("number", "101");
node2.setProperty("name", "Lake St");
Adding edges to a database
// Create edge from node1 to node2
Relationship edge = node1.createRelationshipTo(node2, RelType.ROUTE_TO);

// Set props on the edge
edge.setProperty("route", "State St Subway");
edge.setProperty("line", "Red");

// Create another edge of a different type.
edge = node1.createRelationshipTo(node2, RelType.TRACKS_TO);
Querying the database
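import java.util.Iterator;
import org.neo4j.cypher.javacompat.ExecutionEngine;
import org.neo4j.cypher.javacompat.ExecutionResult;

// Returns station numbers of all stations in graph.
String queryText = "MATCH (stn:Station) RETURN stn.number";

// Run the Cypher query and read back the single result column
ExecutionEngine engine = new ExecutionEngine(dbService);
ExecutionResult result = engine.execute(queryText);
Iterator<String> stnIt = result.columnAs("stn.number");

// Print results
while (stnIt.hasNext()) {
    System.out.println(stnIt.next());
}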
Case: Studying Subways
Questions we might want to ask:
“Find all the stations that have air connectivity paths to station X that are less than K km”
“Find all the train routes that go through all stations that are N stops from station X”
Relational attempt…
Stations
– Number (id)
– Name
– Lat
– Lon
Segments
– SegmentId (id)
– StationFromNumber
– StationToNumber
– Length
– SegmentType
SegmentTypes
– SegmentTypeId (id)
– TypeName
LineSegments
– LineId (id)
– SegmentId
– SegmentIndex
Lines
– LineId (id)
– LineName
– LineDirection
Graph attempt…
[Figure legend, in the order the labels appear:]
Node: Station Name, Station Number
ROUTE_TO Edge: Route Name
TRACKS_TO Edge: Distance
AIRWAY_TO Edge: Distance
[Figure: example subway graph. Stations: St. Paul’s (100), Bank (101), Cannon Street (200), Monument (201), Tower Hill (202), Tower Gateway (300). Edge labels: lines Red, Green, Yellow, White/Blue; distances 1 km, 0.7 km, 0.2 km, 1.8 km, 0.1 km]
Answering the question: returns all stations 2 track segments from station 200 (Cannon Street)
MATCH p=(fromStn:Station)-[edge:TRACKS_TO*2..2]-(toStn:Station {number:'200'})
WHERE fromStn.number <> toStn.number
RETURN DISTINCT fromStn, toStn, fromStn.number
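For readers more comfortable with Java than Cypher, the following plain-Java sketch shows roughly what the pattern above asks for: stations reached by following exactly two distinct track segments in either direction, excluding the start station itself. It does not use Neo4J, and the names `TwoHopSketch`, `addTrack`, and `stationsTwoSegmentsFrom` are hypothetical.

```java
import java.util.*;

class TwoHopSketch {
    // Each track segment is a pair of station numbers. Direction is ignored,
    // mirroring the undirected pattern -[:TRACKS_TO*2..2]- in the query.
    private final List<String[]> tracks = new ArrayList<>();

    void addTrack(String stationA, String stationB) {
        tracks.add(new String[] {stationA, stationB});
    }

    // Returns the other end of segment i if it touches the station, else null.
    private String otherEnd(int i, String station) {
        String[] track = tracks.get(i);
        if (track[0].equals(station)) {
            return track[1];
        }
        if (track[1].equals(station)) {
            return track[0];
        }
        return null;
    }

    // Stations exactly two distinct segments away from start, excluding start.
    Set<String> stationsTwoSegmentsFrom(String start) {
        Set<String> result = new TreeSet<>();
        for (int i = 0; i < tracks.size(); i++) {
            String middle = otherEnd(i, start);
            if (middle == null) {
                continue; // segment i does not touch the start station
            }
            for (int j = 0; j < tracks.size(); j++) {
                if (j == i) {
                    continue; // a path may not reuse the same segment
                }
                String end = otherEnd(j, middle);
                if (end != null && !end.equals(start)) {
                    result.add(end);
                }
            }
        }
        return result;
    }
}
```

As a usage example with invented connections (the actual layout of the example network is not spelled out in the text): given tracks 100–101, 101–200, 200–201, and 201–202, `stationsTwoSegmentsFrom("200")` returns the set [100, 202].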
Drawbacks
No standard query language like SQL.
Vendor-specific.
Query language learning curve.
Lack of built-in visualization tools.
For Further Reading…
Ian Robinson, Jim Webber, Emil Eifrem. Graph Databases, 2nd Edition. O’Reilly and Associates.
Rik Van Bruggen. Learning Neo4J. Packt Publishing.
http://www.neo4j.com
http://www.analytics-driven.com
Questions