Post on 11-May-2015
An Introduction to Neo4j @doryokujin
GraphDB Meet-Up Japan #1
・Takahiro Inoue (age 26)
・twitter: doryokujin
・Majored in Math (Statistics & Graph Algorithms)
・Data Scientist
・Leader of MongoDB JP
・Interest: Data Processing, GraphDB
About Me
(1) Introduction
(2) Code Examples
(3) Cypher
(4) Other Features
Agenda
(1) Introduction
・24/7 production deployment since 2003
・2011/09: $10.6 Million Series A Funding
・Always “Java first”
・ACID transactions
・Property Graph Model
・Uses a Lucene index for graph properties
Neo4j Product Overview
・No Object/Relational mismatch:
  - Every Object-Oriented Model is a “Graph”
・Easy schema evolution:
  - Data first, bottom-up approach to schemas
・Efficient storage of semi-structured information:
  - Neo4j's key-value properties can efficiently represent semi-structured data
・High performance on deep traversals
・Disk-based, native graph storage manager
Neo4j Key Benefits
Neo4j Product Overview
・ACID transactions:
  - custom JTA/JTS-compliant transaction manager
  - distributed transactions
  - two-phase commit (2PC)
  - transaction recovery
  - deadlock detection
  - enterprise-strength database
・Massive scalability:
  - supports billions of nodes/relationships/properties on a single machine
Neo4j Key Benefits
Neo4j Product Overview
・“Today, NOSQL is by and large a Web phenomenon, not an enterprise success story”
・“Building The Enterprise NOSQL Company”
・Support for transactions
・Support for durability
・Support for Java
NOSQL, The Web And The Enterprise
Neo4j License and Price List
Edition | License | Description | Price
Community | Open source (GPLv3) | Fully ACID transactional graph DB | Free
Advanced | Commercial and AGPL | + SNMP & JMX monitoring | 500 USD
Enterprise | Commercial and AGPL | + High load & high availability | 2000 USD
Neo4j PriceList
(2) Code Examples
・Define the relationship types we want to use:
    private static enum ExampleRelationshipTypes implements RelationshipType
    {
        EXAMPLE
    }
・Next, start the database server:
    GraphDatabaseService graphDb = new EmbeddedGraphDatabase( DB_PATH );
    registerShutdownHook( graphDb );
・Finally, shut down the database server:
    graphDb.shutdown();
Hello World
    // Start transaction
    Transaction tx = graphDb.beginTx();
    try
    {
        // Creates 2 nodes
        Node firstNode = graphDb.createNode();
        firstNode.setProperty( NAME_KEY, "Hello" );
        Node secondNode = graphDb.createNode();
        secondNode.setProperty( NAME_KEY, "World" );

        // Creates a relationship
        firstNode.createRelationshipTo( secondNode,
            ExampleRelationshipTypes.EXAMPLE );

        String greeting = firstNode.getProperty( NAME_KEY ) + " "
            + secondNode.getProperty( NAME_KEY );
        System.out.println( greeting );
        tx.success();
    }
    finally
    {
        tx.finish();
    }
Node Class
Class | Method | Description
Relationship | createRelationshipTo(...) | Creates a relationship between this node and another node.
void | delete() | Deletes this node if it has no relationships attached to it.
Iterable<Relationship> | getRelationships() | Returns all the relationships attached to this node.
boolean | hasRelationship() | Returns true if there are any relationships attached to this node.
Traverser | traverse(...) | Instantiates a traverser.
Path Class
Class | Method | Description
Node | endNode() | Returns the end node of this path.
Iterator<PropertyContainer> | iterator() | Iterates through both the Nodes and Relationships of this path in order.
Relationship | lastRelationship() | Returns the last Relationship in this path.
int | length() | Returns the length of this path (i.e. the number of relationships).
Iterable<Node> | nodes() | Returns all the nodes in this path.
Iterable<Relationship> | relationships() | Returns all the relationships in between the nodes which this path consists of.
Node | startNode() | Returns the start node of this path.
String | toString() | Returns a natural string representation of this path.
Relationship Class
Class | Method | Description
void | delete() | Deletes this relationship.
Node | getEndNode() | Returns the end node of this relationship.
long | getId() | Returns the unique id of this relationship.
Node[] | getNodes() | Returns the two nodes that are attached to this relationship.
Node | getOtherNode(Node node) | A convenience operation that, given a node that is attached to this relationship, returns the other node.
Node | getStartNode() | Returns the start node of this relationship.
RelationshipType | getType() | Returns the type of this relationship.
boolean | isType(RelationshipType type) | Indicates whether this relationship is of the given type.
PropertyContainer Class (for nodes and relationships)
Class | Method | Description
GraphDatabaseService | getGraphDatabase() | Gets the GraphDatabaseService that this Node or Relationship belongs to.
Object | getProperty(String key) | Returns the property value associated with the given key.
boolean | hasProperty(String key) | Returns true if this property container has a property accessible through the given key.
Object | removeProperty(String key) | Removes the property associated with the given key and returns the old value.
void | setProperty(String key, Object value) | Sets the property value for the given key to value.
    private void printFriends( Node person )
    {
        Traverser traverser = person.traverse(
            Order.BREADTH_FIRST,                     // breadth-first search
            StopEvaluator.END_OF_GRAPH,              // traverse the whole graph
            ReturnableEvaluator.ALL_BUT_START_NODE,  // return every node except the start node
            MyRelationshipTypes.KNOWS,               // follow edges with the "KNOWS" relationship
            Direction.OUTGOING );                    // follow outgoing edges
        for ( Node friend : traverser )
        {
            // print the "name" property of each returned node
            System.out.println( friend.getProperty( "name" ) );
        }
    }
Traversals
Neo4j Wiki
[Figure: traversal result with the depth at which each friend is found: Trinity (1), Morpheus (1), Cypher (2), Agent Smith (3)]
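To make the breadth-first behaviour of printFriends concrete, here is a minimal sketch in plain Java that does not use the Neo4j API at all. The class FriendTraversalSketch, the adjacency map, and the exact KNOWS edges are assumptions made for illustration, loosely modeled on the names in the figure. It walks outgoing edges breadth-first and returns every reachable node except the start node, which is roughly what the combination BREADTH_FIRST, END_OF_GRAPH, ALL_BUT_START_NODE and OUTGOING asks for.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

final class FriendTraversalSketch {

    // Hypothetical in-memory stand-in for outgoing KNOWS relationships (not Neo4j data).
    static Map<String, List<String>> matrixGraph() {
        Map<String, List<String>> knows = new LinkedHashMap<>();
        knows.put("Thomas Anderson", Arrays.asList("Trinity", "Morpheus"));
        knows.put("Morpheus", Arrays.asList("Trinity", "Cypher"));
        knows.put("Cypher", Arrays.asList("Agent Smith"));
        return knows;
    }

    // Breadth-first walk over outgoing edges; returns all reachable nodes except the start node.
    static List<String> friendsOf(Map<String, List<String>> knows, String start) {
        List<String> result = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        Queue<String> queue = new ArrayDeque<>();
        visited.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            for (String next : knows.getOrDefault(current, Collections.<String>emptyList())) {
                // Set.add returns false for nodes that were already visited.
                if (visited.add(next)) {
                    result.add(next);
                    queue.add(next);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (String friend : friendsOf(matrixGraph(), "Thomas Anderson")) {
            System.out.println(friend);
        }
    }
}
```

The visited set plays the role that the traverser's uniqueness handling plays in the real framework: each node is returned once even when several paths lead to it.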
Evaluators Class
Method | Description
all() | Returns an Evaluator which includes everything it encounters and never prunes anything.
atDepth(int depth) | Returns an Evaluator which only includes positions at depth and prunes everything deeper than that.
excludeStartPosition() | Returns an Evaluator which includes everything except the start position and never prunes anything.
fromDepth(int depth) | Returns an Evaluator which only includes positions from depth and deeper and never prunes anything.
includingDepths(int minDepth, int maxDepth) | Returns an Evaluator which only includes positions between depths minDepth and maxDepth.
toDepth(int depth) | Returns an Evaluator which includes positions down to depth and prunes everything deeper than that.
    private static Traverser findHackers( final Node startNode )
    {
        // Traverse among 2 relation types
        TraversalDescription td = Traversal.description()
            .breadthFirst()
            .relationships( RelTypes.CODED_BY, Direction.OUTGOING )
            .relationships( RelTypes.KNOWS, Direction.OUTGOING )
            .evaluator( Evaluators.returnWhereLastRelationshipTypeIs( RelTypes.CODED_BY ) );
        return td.traverse( startNode );
    }

    Traverser traverser = findHackers( getNeoNode() );
    int numberOfHackers = 0;
    for ( Path hackerPath : traverser )
    {
        System.out.println( "At depth " + hackerPath.length() + " => "
            + hackerPath.endNode().getProperty( "name" ) );
    }
12.4. Traversal
New Traversal Framework
・Order
  - BREADTH_FIRST
  - DEPTH_FIRST
・Relationship
  - BOTH, INCOMING, OUTGOING
・ReturnType
  - node
  - relationship
  - path: contains full representations of start and end node, the rest are URIs
  - fullpath: contains full representations of all nodes and relationships
    Traverser traverser = person.traverse(
        Order.BREADTH_FIRST,
        StopEvaluator.END_OF_GRAPH,
        ReturnableEvaluator.ALL_BUT_START_NODE,
        MyRelationshipTypes.KNOWS,
        Direction.OUTGOING );
    for ( Node friend : traverser ){...}
Traversals
・StopEvaluator
  - END_OF_GRAPH, DEPTH_ONE
・ReturnableEvaluator
  - ALL, ALL_BUT_START_NODE
/*
1. Begin a transaction.
2. Operate on the graph.
3. Mark the transaction as successful (or not).
4. Finish the transaction.
*/
Transaction tx = graphDb.beginTx();
try
{
... // any operation that works with the node space
tx.success();
}
finally
{
tx.finish();
}
Transaction
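The four steps above imply that finish() only commits work that was marked with success() first. The following is a minimal sketch of that contract in plain Java. Store and Tx are hypothetical in-memory classes invented for this illustration and are not part of the Neo4j API; the point is only to show why tx.finish() belongs in a finally block.

```java
import java.util.HashMap;
import java.util.Map;

final class TransactionPatternSketch {

    /** Hypothetical in-memory store; a stand-in for the graph database, not the Neo4j API. */
    static final class Store {
        private final Map<String, String> committed = new HashMap<>();

        Tx beginTx() { return new Tx(this); }

        String get(String key) { return committed.get(key); }
    }

    /** Hypothetical transaction: buffers writes and applies them only if success() was called. */
    static final class Tx {
        private final Store store;
        private final Map<String, String> pending = new HashMap<>();
        private boolean successful = false;

        Tx(Store store) { this.store = store; }

        void put(String key, String value) { pending.put(key, value); }

        void success() { successful = true; }

        void finish() {
            if (successful) {
                store.committed.putAll(pending); // commit
            }
            pending.clear();                     // otherwise the buffered writes are dropped
        }
    }

    static Store demo() {
        Store store = new Store();

        // Normal case: operate, mark successful, finish.
        Tx tx = store.beginTx();
        try {
            tx.put("greeting", "Hello");
            tx.success();
        } finally {
            tx.finish();
        }

        // Failure case: the exception skips success(), so finish() rolls back.
        Tx failing = store.beginTx();
        try {
            failing.put("lost", "value");
            throw new IllegalStateException("simulated failure");
        } catch (IllegalStateException expected) {
            // swallowed only to keep the demo running
        } finally {
            failing.finish();
        }
        return store;
    }
}
```

After demo() runs, the store contains the "greeting" entry but not the "lost" one, because the second transaction never reached success().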
・Indexes either nodes or relationships
・Indexes their properties
※ Each node has direct references to its adjacent vertices
・Default: neo4j-lucene-index component
・Full Text Search, Sorting, Caching, Range Query
・Can index with the GraphDB “itself”: B-Trees, R-Trees, QuadTrees
Indexing
[Figure: a property graph whose vertices carry properties (name=twarko, age=30; name=ahzf; name=graph_blog, views=1000; name=tenderlove, gender=male; name=neo4j, views=56781, page_rank=0.023; name=peterneubauer; date=2007/10) and whose edges are labeled created, follows, and cites. Outside the graph sit a name property index, a views property index, and a gender property index.]
・Lucene Index: the name, views, and gender property indices
・Each element has a direct pointer to its neighbours
Indexing their properties
Graph Databases and Endogenous Indices
The Graph Traversal Programming Pattern
GraphDatabaseService graphDb = new EmbeddedGraphDatabase( "path/to/neo4j-db" );
IndexService index = new LuceneIndexService( graphDb );
Node andy = graphDb.createNode();
Node larry = graphDb.createNode();
andy.setProperty( "name", "Andy Wachowski" );
andy.setProperty( "title", "Director" );
larry.setProperty( "name", "Larry Wachowski" );
larry.setProperty( "title", "Director" );
index.index( andy, "name", andy.getProperty( "name" ) );
index.index( andy, "title", andy.getProperty( "title" ) );
index.index( larry, "name", larry.getProperty( "name" ) );
index.index( larry, "title", larry.getProperty( "title" ) );
Indexing
http://wiki.neo4j.org/content/Indexing_with_IndexService
// Return the andy node.
index.getSingleNode( "name", "Andy Wachowski" );
// Containing only the larry node
for ( Node hit : index.getNodes( "name", "Larry Wachowski" ) )
{
// do something
}
// Containing both andy and larry
for ( Node hit : index.getNodes( "title", "Director" ) )
{
// do something
}
Indexing
http://wiki.neo4j.org/content/Indexing_with_IndexService
IndexService index = // your LuceneFulltextIndexService
index.getNodes( "name", "wachowski" ); // --> andy and larry
index.getNodes( "name", "andy" ); // --> andy
index.getNodes( "name", "Andy" ); // --> andy
index.getNodes( "name", "larry Wachowski" ); // --> larry
index.getNodes( "name", "wachowski larry" ); // --> larry
index.getNodes( "name", "wachow* andy" ); // --> andy and larry
index.getNodes( "name", "Andy" ); // --> andy
index.getNodes( "name", "andy" ); // --> andy
index.getNodes( "name", "wachowski" ); // --> andy and larry
index.getNodes( "name", "+wachow* +larry" ); // --> larry
index.getNodes( "name", "andy AND larry" ); // -->
index.getNodes( "name", "andy OR larry" ); // --> andy and larry
index.getNodes( "name", "Wachowski AND larry" ); // --> larry
Full text Search
http://wiki.neo4j.org/content/Indexing_with_IndexService
GraphDB as an External Indexing System
The Graph Traversal Pattern
3.2 Traversing Endogenous Indices
A graph is a general-purpose data structure. A graph can be used to model lists, maps, trees, etc. As such, a graph can model an index. It was assumed, in §2.2, that a graph database makes use of an external indexing system to index the properties of its vertices and edges. The reason stated was that specialized indexing systems are better suited for special-purpose queries such as those involving full-text search. However, in many cases, there is nothing that prevents the representation of an index within the graph itself: vertices and edges can be indexed by other vertices and edges [footnote 24]. In fact, given the nature of how vertices and edges directly reference each other in a graph database, index look-up speeds are comparable. Endogenous indices afford graph databases a great flexibility in modeling a domain. Not only can objects and their relationships be modeled (e.g. people and their friendships), but also the indices that partition the objects into meaningful subsets (e.g. people within a 2D region of space) [footnote 25]. The remainder of this subsection will discuss the representation and traversal of a spatial, 2D-index that is explicitly modeled within a property graph.
The domain of spatial analysis makes use of advanced indexing structures such as the quadtree [4, 17]. Quadtrees partition a two-dimensional plane into rectangular boxes based upon the spatial density of the points being indexed. Figure 7 diagrams how space is partitioned as the density of points increases within a region of the index.
Fig. 7. A quadtree partition of a plane. This figure is an adaptation of a public domain image provided courtesy of David Eppstein.
24 One of the primary motivations behind this article is to stress the importance of thinking of a graph as simply an index of itself, where the primary purpose is to traverse the various defined indices in ways that elicit problem-solving within the domain being modeled.
25 Those indices that have a graph-like structure are suited for representing as a graph. It is noted that not all indices meet this criteria.
Marko A. Rodriguez and Peter Neubauer
In order to demonstrate how a quadtree index can be represented and traversed, a toy graph data set is presented. This data set is diagrammed in Figure 8. The top half of Figure 8 represents a quadtree index (vertices 1-9). This quadtree index is partitioning “points of interest” (vertices a-i) located within the diagrammed plane [footnote 26]. All vertices maintain three properties: bottom-left (bl), top-right (tr), and type.
[Figure 8: top half, quadtree index vertices 1-9 connected by sub edges, each with type=quad and bl/tr corner properties (the root, vertex 1, has bl=[0,0] and tr=[100,100]; the smallest quad, vertex 9, has bl=[50,25] and tr=[62,37]); bottom half, the plane with corners [0,0], [100,0], [0,100], [100,100] containing points of interest a-i and a box with bl=[25,20], tr=[90,45].]
Fig. 8. A quadtree index of a space that contains points of interest. The index is composed of the vertices 1-9 and the points of interest are the vertices a-i. While not diagrammed for the sake of clarity, all edges are labeled sub (meaning subsumes) and each point of interest vertex has an associated bottom-left (bl) property, top-right (tr) property, and a type property which is equal to “poi.”
26 The plane depicted does not actually exist as a data structure, but is represented here to denote how the different vertices lying on that plane are spatially located (i.e. spatial information is represented explicitly in the properties of the vertices). Thus, vertices closer to each other on the plane are closer together.
The Graph Traversal Pattern
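As a rough illustration of the quadtree idea described in the excerpt, the sketch below builds a small quadtree in plain Java and runs a box query against it. It does not use Neo4j: where the excerpt models each quad as a vertex connected by sub edges, this sketch uses ordinary object references. The class names, the capacity of 2 points per quad, and the point coordinates are all assumptions chosen for the demo; only the plane corners [0,0] and [100,100] and the query box bl=[25,20], tr=[90,45] are taken from the figure.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class QuadtreeSketch {

    static final class Point {
        final String name;
        final double x, y;
        Point(String name, double x, double y) { this.name = name; this.x = x; this.y = y; }
    }

    /** One quad: a box given by its bottom-left (bl) and top-right (tr) corners. */
    static final class Quad {
        static final int CAPACITY = 2;   // assumed split threshold
        final double blX, blY, trX, trY;
        final List<Point> points = new ArrayList<>();
        Quad[] sub;                      // the four sub-quads; null until this quad splits

        Quad(double blX, double blY, double trX, double trY) {
            this.blX = blX; this.blY = blY; this.trX = trX; this.trY = trY;
        }

        boolean contains(double x, double y) {
            return x >= blX && x <= trX && y >= blY && y <= trY;
        }

        // Note: many identical points would split forever; the demo data avoids that case.
        boolean insert(Point p) {
            if (!contains(p.x, p.y)) return false;
            if (sub == null) {
                if (points.size() < CAPACITY) { points.add(p); return true; }
                split();
            }
            for (Quad q : sub) {
                if (q.insert(p)) return true;
            }
            return false;
        }

        private void split() {
            double midX = (blX + trX) / 2, midY = (blY + trY) / 2;
            sub = new Quad[] {
                new Quad(blX, blY, midX, midY), new Quad(midX, blY, trX, midY),
                new Quad(blX, midY, midX, trY), new Quad(midX, midY, trX, trY) };
            List<Point> old = new ArrayList<>(points);
            points.clear();
            for (Point p : old) {
                for (Quad q : sub) {
                    if (q.insert(p)) break;  // each point moves into exactly one sub-quad
                }
            }
        }

        // Collects names of points inside the query box, skipping quads that do not overlap it.
        void search(double qBlX, double qBlY, double qTrX, double qTrY, List<String> hits) {
            if (qTrX < blX || qBlX > trX || qTrY < blY || qBlY > trY) return;
            for (Point p : points) {
                if (p.x >= qBlX && p.x <= qTrX && p.y >= qBlY && p.y <= qTrY) hits.add(p.name);
            }
            if (sub != null) {
                for (Quad q : sub) q.search(qBlX, qBlY, qTrX, qTrY, hits);
            }
        }
    }

    static List<String> demoHits() {
        Quad root = new Quad(0, 0, 100, 100);
        root.insert(new Point("a", 10, 80));
        root.insert(new Point("b", 30, 30));
        root.insert(new Point("c", 60, 40));
        root.insert(new Point("d", 80, 10));
        root.insert(new Point("e", 55, 30));
        root.insert(new Point("f", 95, 95));
        List<String> hits = new ArrayList<>();
        root.search(25, 20, 90, 45, hits);
        Collections.sort(hits);
        return hits;
    }
}
```

The early return at the top of search is what makes the index useful: whole subtrees whose boxes miss the query box are never visited, which corresponds to not following a sub edge in the graph version.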
(3) Cypher
Cypher
・Designed to be a humane query language
・Most of the keywords like WHERE and ORDER BY are inspired by SQL
・Pattern matching borrows expression approaches from SPARQL
・Regular expression matching is implemented using the Scala programming language
TraversalDescription description = Traversal.description()
    .breadthFirst()
    .relationships( Relationships.PAIRED, Direction.OUTGOING )
    .evaluator( Evaluators.excludeStartPosition() );
description.traverse( startNode ); // Retrieves the traverser
Cypher
Cypher is very simple:
    start programmer=(3) match (programmer)-[:PAIRED]->(pair) return pair
More complex conditions:
    start programmer=(3) match (programmer)-[:PAIRED]->(pair)
    where pair.age > 30 return pair, count(*) order by age
    skip 5 limit 10
How Neo4j uses Scala’s Parser Combinator: Cypher’s internals ‒ Part 1
[InComing/Outgoing relationships]
# All nodes that A has outgoing relationships to.
> start n=node(3) match (n)-->(x) return x
==> Node[4]{name->"Bossman"}
==> Node[5]{name->"Cesar"}
> start n=node(3) match (n)<--(x) return x
==> Node[1]{name->"David"}
[Match by relationship type]
# All nodes that are Blocked by A.
> start n=node(3) match (n)-[:BLOCKS]->(x) return x
==> Node[5]{name->"Cesar"}
[Multiple relationships]
# The three nodes in the path.
> start a=node(3) match (a)-[:KNOWS]->(b)-[:KNOWS]->(c) return a,b,c
==> a: Node[3]{name->"Anders"}
==> b: Node[4]{name->"Bossman"}
==> c: Node[2]{name->"Emil"}
15.4. Match
Match
[Shortest path]
# Find the shortest path between two nodes, as long as the path is at most 15
relationships long. Inside of the parenthesis you can write ...
> start d=node(1), e=node(2) match p = shortestPath( d-[*..15]->e ) return p
==> p: (1)--[KNOWS,2]-->(3)--[KNOWS,0]-->(4)--[KNOWS,3]-->(2)
15.4. Match
Graph Algorithm
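What a bounded shortest-path search does can be sketched in plain Java as a breadth-first search that stops expanding once a hop limit is reached. This is only an illustration of the idea, not a description of how Neo4j implements shortestPath. The class ShortestPathSketch and the adjacency map are assumptions; the edges 1 -> 3 -> 4 -> 2 are chosen to match the path printed above.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Queue;

final class ShortestPathSketch {

    // Hypothetical outgoing KNOWS edges: 1 -> 3 -> 4 -> 2, as in the result shown on the slide.
    static Map<Integer, List<Integer>> demoGraph() {
        Map<Integer, List<Integer>> outgoing = new HashMap<>();
        outgoing.put(1, Arrays.asList(3));
        outgoing.put(3, Arrays.asList(4));
        outgoing.put(4, Arrays.asList(2));
        return outgoing;
    }

    // Breadth-first search along outgoing edges, limited to maxHops relationships.
    // Returns the node ids on a shortest path, or an empty list if none exists within the limit.
    static List<Integer> shortestPath(Map<Integer, List<Integer>> outgoing,
                                      int start, int end, int maxHops) {
        Map<Integer, Integer> parent = new HashMap<>();
        Map<Integer, Integer> depth = new HashMap<>();
        Queue<Integer> queue = new ArrayDeque<>();
        depth.put(start, 0);
        queue.add(start);
        while (!queue.isEmpty()) {
            int current = queue.remove();
            if (current == end) {
                // Walk the parent links back to the start node to rebuild the path.
                LinkedList<Integer> path = new LinkedList<>();
                for (Integer n = end; n != null; n = parent.get(n)) {
                    path.addFirst(n);
                }
                return path;
            }
            if (depth.get(current) == maxHops) {
                continue; // hop limit reached: do not expand further from here
            }
            for (int next : outgoing.getOrDefault(current, Collections.<Integer>emptyList())) {
                if (!depth.containsKey(next)) {
                    depth.put(next, depth.get(current) + 1);
                    parent.put(next, current);
                    queue.add(next);
                }
            }
        }
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        System.out.println(shortestPath(demoGraph(), 1, 2, 15));
    }
}
```

With a limit of 15 the three-hop path is found; lowering the limit to 2 yields an empty result, which mirrors the role of the *..15 bound in the Cypher pattern.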
[Count/ Group Count]
> start n=node(2) match (n)-->(x) return n, count(*)
# The start node and the count of related nodes.
==>
n: Node[2]{name->"A",property->13}
count(*): 3
> start n=node(2) match (n)-[r]->() return type(r), count(*)
# The relationship types and their group count.
==>
TYPE(r): KNOWS
count(*): 3
15.7.Aggregation
Aggregation
[SUM/AVG/MAX/COLLECT]
> start n=node(2,3,4) return sum(n.property)
==> 90
> start n=node(2,3,4) return avg(n.property)
==> 30.0
> start n=node(2,3,4) return max(n.property)
==> 44
> start n=node(2,3,4) return collect(n.property)
==> List(13, 33, 44)
15.7.Aggregation
Aggregation
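For readers more used to Java than to Cypher, the aggregate functions above can be mirrored with the standard streams API. This is a small sketch, not Neo4j code: AggregationSketch is a made-up class, and the list of values simply copies the three property values returned by collect(n.property) on the slide.

```java
import java.util.Arrays;
import java.util.IntSummaryStatistics;
import java.util.List;

final class AggregationSketch {

    // The property values of nodes 2, 3 and 4, as listed by collect(n.property) above.
    static final List<Integer> PROPERTY_VALUES = Arrays.asList(13, 33, 44);

    // Computes count, sum, min, average and max in one pass over the values.
    static IntSummaryStatistics summarize(List<Integer> values) {
        return values.stream().mapToInt(Integer::intValue).summaryStatistics();
    }

    public static void main(String[] args) {
        IntSummaryStatistics stats = summarize(PROPERTY_VALUES);
        System.out.println("sum = " + stats.getSum());      // sum = 90
        System.out.println("avg = " + stats.getAverage());  // avg = 30.0
        System.out.println("max = " + stats.getMax());      // max = 44
    }
}
```

The difference in practice is where the work happens: the Cypher version aggregates inside the database, while this sketch assumes the values were already fetched into memory.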
(4) Other Features
High Availability
・Always a single master and zero or more slaves
・Neo4j HA can handle writes on a slave so there is no need to redirect writes to the master
・A slave will handle writes by synchronizing with the master to preserve consistency
・Updates propagate from the master to other slaves eventually
・If the master goes down, any running write transaction will be rolled back, and no write can take place during master election
High Availability
High Availability
[Zookeeper as a distributed coordination service]
・Master Election
・Propagation of Cluster and Machine Status Information
・Fault Detection
Slaves can handle write transactions.
Updates to slaves are eventually consistent.
Write operations automatically synchronize with the master.
Automatic Failover
Only available in Neo4j Enterprise
[1] Full BackUp
・Full backup copies the database files
・Without acquiring any locks
・Transactions will continue and the store will change
[2] Incremental BackUp
・Incremental backup does not copy store files
・Instead it copies the logs of the transactions
BackUp
Only available in Neo4j Enterprise
・Shell - Can create graphs, traverse, and index from the shell
・neo4j-server - Neo4j REST API
・Neoclipse - Graph Data Visualization Tool
・Batch Insert
・SQL Importer
Tools
[Languages]
・Clojure
・Erlang bindings to Neo4j: nerlo, cali
・Gremlin Graph programming language
・Groovy
・Java object mapping
・PHP
・Python
・Ruby (including RESTful API)
・Scala (including RESTful API)
Languages
[Frameworks]
・Grails
・Griffon
・Qi4j: Domain Driven Development in Java, with great persistence architecture
・Roo
[Neo4j REST clients]
・Neo4RestNet .Net REST client
・Neo4jRestSharp .Net REST client
・Common Lisp REST client project page
・PHP REST client getting started
・Python REST client
Heroku Add-on
http://addons.heroku.com/neo4j
GraphDB: Comparison (2010/12)
GraphDB | License | Language | Protocol | Data Model | Gremlin | Binding | SQL Like Query
Neo4j | GPL | Java | REST/JSON | Property Graph | Yes | Ruby, Python, Scala,... | -
sones | AGPLv3 | C# | REST/JSON(XML) | Property Graph (+Extend) | Yes | - | Yes
OrientDB | Apache2.0 | Java | REST/JSON | Property Graph | Yes | PHP, JRuby, Python, JS,... | Yes
Info Grid | AGPLv3 | Java | REST/JSON | Property Graph? (MeshObj) | - | - | -
Infinite Graph | Product | C++ | - | Property Graph | - | - | -