NoSQL with Graphs

NoSQL with Graphs: mining graphs for fun & profit. Claudio Martella, NoSQLDay 2011, Saturday, March 26, 2011

description

A talk about graph databases and their usage.

Transcript of NoSQL with Graphs

Page 1: NoSQL with Graphs

NoSQL with Graphs: mining graphs for fun & profit

Claudio Martella, NoSQLDay 2011

Saturday, March 26, 2011

Page 2: NoSQL with Graphs

Outline

• Graphs

• Why

• Tools

• Apps

Topics: NoSQL, RDBMS, O(1), Semantic Web, Tinkerpop, Recommendation, Query, Table, Documents, GraphDBs

Page 3: NoSQL with Graphs

Who am I?

• PhD in Distributed Graphs @ UniBZ

• Analyst @ TIS Innovation Park

• Topics: Data / Text Mining with Graphs

• Technology: Hadoop, NoSQL, GraphDBs

• Writing Graffiti

Page 4: NoSQL with Graphs

Surrounded by graphs

• the Web Graph

• Semantic Web

• Social Networks

• Natural Sciences

• GIS

Page 5: NoSQL with Graphs

Property Graph

• A Graph is composed of Vertices and Edges

• Vertices are connected by Edges

• An Edge has a Label and a Direction

• Edges and Vertices have Properties
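
The property graph model maps onto a few small types. This is a minimal Python illustration, not the API of any particular graph database; the class and method names (`PropertyGraph`, `add_vertex`, `add_edge`, `out`) are hypothetical. Each vertex keeps direct references to its own incoming and outgoing edges, so moving to a neighbor does not go through a global index.

```python
class Vertex:
    """A vertex with a property map and direct references to its edges."""
    def __init__(self, vid, properties):
        self.id = vid
        self.properties = properties
        self.out_edges = []   # edges whose tail is this vertex
        self.in_edges = []    # edges whose head is this vertex


class Edge:
    """A directed, labeled edge with its own property map."""
    def __init__(self, label, tail, head, properties):
        self.label = label
        self.tail = tail      # source vertex
        self.head = head      # target vertex
        self.properties = properties


class PropertyGraph:
    def __init__(self):
        self.vertices = {}    # id -> Vertex

    def add_vertex(self, vid, **properties):
        vertex = Vertex(vid, properties)
        self.vertices[vid] = vertex
        return vertex

    def add_edge(self, tail, label, head, **properties):
        edge = Edge(label, tail, head, properties)
        tail.out_edges.append(edge)
        head.in_edges.append(edge)
        return edge

    @staticmethod
    def out(vertex, label=None):
        """Heads of the vertex's outgoing edges, optionally restricted to one label."""
        return [e.head for e in vertex.out_edges if label is None or e.label == label]


g = PropertyGraph()
me = g.add_vertex(1, name="claudio", surname="martella")
tis = g.add_vertex(2, name="TIS")
unibz = g.add_vertex(3, name="UniBZ")
g.add_edge(me, "works at", tis)
g.add_edge(me, "studies at", unibz)

employers = [v.properties["name"] for v in g.out(me, "works at")]
```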

Page 6: NoSQL with Graphs

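Who am I?

[Figure: the speaker as a property graph. A vertex "Me" with properties name: claudio, surname: martella, email: [email protected] has edges labeled works at (to TIS), studies at (to UniBZ), likes, works with and author, pointing to NoSQL, Hadoop and Graffiti; three "belongs to" edges involve NoSQL and GraphDB.]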

Page 7: NoSQL with Graphs

A graph in RDBMS

Edge table:

Follower | Followee
1 | 2
1 | 3
1 | 4
2 | 5
... | ...

Vertex table:

ID | Name
1 | Claudio
2 | Cirpo
3 | Okram
4 | Spinoza
... | ...

Page 8: NoSQL with Graphs

BTree Index 101

• A lookup costs O(log N)

• N is the size of the whole index, not of the part you need

• Updating the index is not free either: each insert or delete also costs O(log N)

[Figure: B-tree index over the names Cirpo, Claudio, Okram, Spinoza]
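
The O(log N) cost comes from halving the search space at every step. A real B-tree stores many keys per node, so its depth is the logarithm of N in base (fan-out) rather than base 2, but the shape of the cost is the same. As a simplification, this Python sketch uses binary search over a sorted list instead of a B-tree and counts the probes; `lookup` is a hypothetical helper.

```python
def lookup(sorted_keys, key):
    """Binary search over a sorted list.

    Returns (position or -1, number of probes performed).
    """
    lo, hi = 0, len(sorted_keys) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_keys[mid] == key:
            return mid, probes
        if sorted_keys[mid] < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, probes


names = ["Cirpo", "Claudio", "Okram", "Spinoza"]
position, steps = lookup(names, "Okram")

# With a million keys no lookup needs more than floor(log2(1_000_000)) + 1 = 20 probes.
big = [f"user{i:07d}" for i in range(1_000_000)]
worst = max(lookup(big, k)[1] for k in ("user0000000", "user0765432", "user0999999", "missing"))
```

Doubling N adds only one probe, which is why the index pays off; the catch shown on the next slides is that this cost is paid again for every hop.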

Page 9: NoSQL with Graphs

A lookup (RDBMS)

• Look up Claudio’s ID [ O(log N) ]

• Look up his K followees [ O(log N) ]

• Get their names [ O(K log N) ]

Follower | Followee
1 | 2
1 | 3
1 | 4
2 | 5
... | ...

ID | Name
1 | Claudio
2 | Cirpo
3 | Okram
4 | Spinoza
... | ...
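
The three steps correspond to a join over the two tables. A minimal sketch with Python's built-in `sqlite3` module; the table and index names (`people`, `follows`, `people_by_name`, `follows_by_follower`) are hypothetical, and the query planner, not the SQL text, decides the actual access path, so the step comments are indicative only.

```python
import sqlite3

# Hypothetical schema mirroring the two tables above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE follows (follower INTEGER, followee INTEGER);
    CREATE INDEX people_by_name ON people(name);
    CREATE INDEX follows_by_follower ON follows(follower);
""")
conn.executemany("INSERT INTO people VALUES (?, ?)",
                 [(1, "Claudio"), (2, "Cirpo"), (3, "Okram"), (4, "Spinoza")])
conn.executemany("INSERT INTO follows VALUES (?, ?)",
                 [(1, 2), (1, 3), (1, 4), (2, 5)])


def followee_names(conn, name):
    """Names of the people `name` follows, sorted alphabetically."""
    rows = conn.execute("""
        SELECT followee.name
        FROM people AS person
        JOIN follows AS f ON f.follower = person.id          -- step 2: followees of that ID
        JOIN people AS followee ON followee.id = f.followee  -- step 3: one lookup per followee
        WHERE person.name = ?                                -- step 1: ID of the person
        ORDER BY followee.name
    """, (name,))
    return [row[0] for row in rows]
```

Every join goes back through an index whose cost grows with the whole table, even though only a handful of rows are needed.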

Page 10: NoSQL with Graphs

A graph in NoSQL

ID | F1 | F2 | F3 | ...
Cirpo | ... | ... | ... | ...
Claudio | Cirpo | Okram | Spinoza | ...
Okram | ... | ... | ... | ...
Spinoza | ... | ... | ... | ...
... | ... | ... | ... | ...

Page 11: NoSQL with Graphs

A lookup (NoSQL)

• Look up Claudio’s row by its key [ O(log N) ]

• Read his K followees from that same row [ O(K) ]

ID | F1 | F2 | F3 | ...
Cirpo | ... | ... | ... | ...
Claudio | ... | ... | ... | ...
Okram | ... | ... | ... | ...
Spinoza | ... | ... | ... | ...
... | ... | ... | ... | ...
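
In a column-family store the followees live in Claudio's own row, so after the O(log N) row lookup they are read in one pass. A toy Python model of such a store, assuming a sorted list of row keys stands in for the store's row index; `WideRowStore`, `put` and `get_row` are hypothetical names, not a real client API.

```python
import bisect


class WideRowStore:
    """Toy column-family store: rows sorted by key, each row an ordered map of columns."""

    def __init__(self):
        self._keys = []   # sorted row keys, standing in for the store's row index
        self._rows = []   # column maps, parallel to _keys

    def put(self, row_key, column, value):
        i = bisect.bisect_left(self._keys, row_key)
        if i == len(self._keys) or self._keys[i] != row_key:
            self._keys.insert(i, row_key)   # O(N) list insert; acceptable for a toy
            self._rows.insert(i, {})
        self._rows[i][column] = value

    def get_row(self, row_key):
        i = bisect.bisect_left(self._keys, row_key)   # O(log N) row lookup
        if i < len(self._keys) and self._keys[i] == row_key:
            return self._rows[i]
        return {}


store = WideRowStore()
for column, followee in [("F1", "Cirpo"), ("F2", "Okram"), ("F3", "Spinoza")]:
    store.put("Claudio", column, followee)

# One row lookup, then an O(K) scan of that row's columns.
followees = list(store.get_row("Claudio").values())
```

The names of the followees still require one row lookup each, which is where the graph model on the next slides differs.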

Page 12: NoSQL with Graphs

A graph in GraphDB

[Figure: vertex 1 (name: Claudio) with "follows" edges to vertex 2 (name: Cirpo), vertex 3 (name: Okram) and vertex 4 (name: Spinoza)]

Page 13: NoSQL with Graphs

A lookup (Graph)

• Look up Claudio’s vertex in the index [ O(log N) ]

• Follow his K outgoing "follows" edges [ O(K) ]

[Figure: the same graph as the previous slide]

Page 14: NoSQL with Graphs

What about Friends (of Friends)*?
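
In the relational model, (friends of friends)* becomes a recursive query in which every extra hop is one more self-join of the edge table, each paying its own index lookups. A minimal sketch using a recursive common table expression in SQLite through Python's `sqlite3`; the `follows` table, its index name and the sample edges (including the cycle back to 1) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE follows (follower INTEGER, followee INTEGER)")
conn.execute("CREATE INDEX follows_by_follower ON follows(follower)")
conn.executemany("INSERT INTO follows VALUES (?, ?)",
                 [(1, 2), (1, 3), (1, 4), (2, 5), (5, 6), (6, 1)])


def reachable_within(conn, start, max_depth):
    """Vertices reachable from start in 1..max_depth hops, with their minimum hop count.
    Each recursion step joins the edge table with the rows found so far."""
    rows = conn.execute("""
        WITH RECURSIVE reach(id, depth) AS (
            SELECT followee, 1 FROM follows WHERE follower = :start
            UNION
            SELECT f.followee, r.depth + 1
            FROM reach AS r JOIN follows AS f ON f.follower = r.id
            WHERE r.depth < :max_depth
        )
        SELECT id, MIN(depth) FROM reach WHERE id != :start GROUP BY id ORDER BY id
    """, {"start": start, "max_depth": max_depth})
    return dict(rows.fetchall())
```

The depth bound also guarantees termination on cyclic data such as the 6 → 1 edge above.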

Page 15: NoSQL with Graphs

A benchmark

• 1 Million Vertices

• 4 Million Edges

• Scale-Free Topology

• MySQL vs Neo4J

• RDBMS tested with both Hash and BTree indexes

Depth | RDBMS | Graph
1 | 100 ms | 30 ms
2 | 1000 ms | 500 ms
3 | 10000 ms | 3000 ms
4 | 100000 ms | 50000 ms
5 | N/A | 100000 ms

Ref: http://markorodriguez.com/2011/02/18/mysql-vs-neo4j-on-a-large-scale-graph-traversal/

Page 16: NoSQL with Graphs

A benchmark

• 50 friends on average

• Query: is there a path connecting two people?

Ref: http://www.slideshare.net/thobe/nosqleu-graph-databases-and-neo4j

DB | # People | Time
RDBMS | 1K | 2000 ms
Graph | 1K | 2 ms
Graph | 1M | 2 ms
RDBMS | 1M | N/A
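
On the graph side, the path query is a breadth-first search whose cost depends on the neighborhoods it expands rather than on the total number of people. A minimal Python sketch, assuming the graph is held as an adjacency dictionary; `find_path` and the sample edges are hypothetical.

```python
from collections import deque


def find_path(adj, source, target, max_depth=None):
    """Breadth-first search from source; returns a shortest path to target
    as a list of vertices, or None if target is not reachable within max_depth hops."""
    if source == target:
        return [source]
    parent = {source: None}   # also serves as the visited set
    depth = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if max_depth is not None and depth[v] >= max_depth:
            continue          # do not expand beyond the depth limit
        for w in adj.get(v, ()):
            if w in parent:
                continue
            parent[w] = v
            depth[w] = depth[v] + 1
            if w == target:
                path = [w]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append(w)
    return None


adj = {
    "Claudio": ["Cirpo", "Okram"],
    "Cirpo": ["Spinoza"],
    "Spinoza": ["Caprazzi"],
}
```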

Page 17: NoSQL with Graphs

A Graph Database allows O(1) access to adjacent Vertices

Ref: "The Graph Traversal Pattern", Marko A. Rodriguez and Peter Neubauer

Page 18: NoSQL with Graphs

Example: Queries

[Figure: movie graph. Brad Pitt has "actor" edges to Ocean 11, Ocean 12, Ocean 13 and Se7en and a "producer" edge to The Departed; Steven Soderbergh has three "director" edges into the films; "genre" edges connect the films to Action, Crime, Thriller and Drama.]

Page 19: NoSQL with Graphs

Example: Queries

[Figure: the same movie graph as the previous slide]

Page 20: NoSQL with Graphs

Example: Queries

[Figure: the same movie graph as the previous slide]

Page 21: NoSQL with Graphs

Example: Queries

[Figure: the same movie graph as the previous slide]
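
Queries on this graph are traversals along labeled edges. A minimal Python sketch, assuming the graph is stored as an adjacency map keyed by (vertex, edge label); the query asks which films have both an "actor" edge from Brad Pitt and a "director" edge from Steven Soderbergh. The edge list is a hypothetical reconstruction of the figure: the genre edges are left out and the director edges are assumed to point at the three Ocean films.

```python
from collections import defaultdict

# Hypothetical reconstruction of the figure's edges (genre edges omitted).
EDGES = [
    ("Brad Pitt", "actor", "Ocean 11"),
    ("Brad Pitt", "actor", "Ocean 12"),
    ("Brad Pitt", "actor", "Ocean 13"),
    ("Brad Pitt", "actor", "Se7en"),
    ("Brad Pitt", "producer", "The Departed"),
    ("Steven Soderbergh", "director", "Ocean 11"),
    ("Steven Soderbergh", "director", "Ocean 12"),
    ("Steven Soderbergh", "director", "Ocean 13"),
]

out_edges = defaultdict(list)   # (vertex, label) -> adjacent vertices
for tail, label, head in EDGES:
    out_edges[(tail, label)].append(head)


def out(vertex, label):
    """Vertices reached from vertex over outgoing edges with this label."""
    return out_edges.get((vertex, label), [])


acted = set(out("Brad Pitt", "actor"))
both = [film for film in out("Steven Soderbergh", "director") if film in acted]
```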

Page 22: NoSQL with Graphs

Example: Recommendations

[Figure: recommendation graph. Claudio, Cirpo and Caprazzi have "likes" edges to the movies Graph Runner, The Lord of the Graphs, PHP I love You and Javatar; "tagged" edges connect the movies to the tags Adventure, Sci-Fi, Trilogy, Geeky and Boring.]

Page 23: NoSQL with Graphs

Example: Recommendations

[Figure: the same recommendation graph as the previous slide]

Page 24: NoSQL with Graphs

Example: Recommendations

[Figure: the same recommendation graph as the previous slide]

Page 25: NoSQL with Graphs

Example: Recommendations

[Figure: the same recommendation graph as the previous slide]

Page 26: NoSQL with Graphs

Example: Recommendations

[Figure: the same recommendation graph as the previous slide]

Page 27: NoSQL with Graphs

Example: Recommendations

[Figure: the same recommendation graph as the previous slide]

Page 28: NoSQL with Graphs

Example: Recommendations

[Figure: the same recommendation graph as the previous slide]

Page 29: NoSQL with Graphs

Example: Recommendations

[Figure: the same recommendation graph as the previous slide]

Page 30: NoSQL with Graphs

Example: Recommendations

[Figure: the same recommendation graph as the previous slide]
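
One recommendation strategy this likes/tagged graph supports is content-based: walk from a user to the movies they like, from those movies to their tags, and from the tags to other movies, then rank movies by how many tag paths reach them. A minimal Python sketch; the figure's exact edges are not recoverable, so the likes and tags below are invented for illustration, and `recommend_by_tags` is a hypothetical name.

```python
from collections import Counter

# Illustrative edges only: NOT the figure's actual likes/tagged edges.
LIKES = {
    "Claudio": {"Graph Runner", "The Lord of the Graphs"},
    "Cirpo": {"PHP I love You"},
    "Caprazzi": {"Javatar", "PHP I love You"},
}
TAGGED = {
    "Graph Runner": {"Sci-Fi", "Geeky"},
    "The Lord of the Graphs": {"Adventure", "Trilogy"},
    "PHP I love You": {"Geeky", "Boring"},
    "Javatar": {"Sci-Fi", "Adventure"},
}


def recommend_by_tags(user):
    """Rank movies the user does not like yet by the number of
    user -> liked movie -> tag -> movie paths that reach them."""
    liked = LIKES.get(user, set())
    liked_tags = Counter(tag for movie in liked for tag in TAGGED.get(movie, ()))
    scores = Counter()
    for movie, tags in TAGGED.items():
        if movie in liked:
            continue
        score = sum(liked_tags[tag] for tag in tags)
        if score:
            scores[movie] = score
    return scores.most_common()
```

The same walk written against a graph database would follow the likes and tagged edges directly instead of scanning `TAGGED`.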

Page 31: NoSQL with Graphs

Graph Mining

Ref: Programming the Semantic Web - O’Reilly

How are they connected?

Page 32: NoSQL with Graphs

Graph Mining

Ref: Programming the Semantic Web - O’Reilly

Page 33: NoSQL with Graphs

Graph Mining

Page 34: NoSQL with Graphs

Other Applications

• Community Analysis

• Fraud Detection

• Planning

• Text Processing

• Reasoning

Page 35: NoSQL with Graphs

as you can’t get rid of logicians

Page 36: NoSQL with Graphs

there’s also an SQL for Graphs

Page 37: NoSQL with Graphs

Triplestores

[Figure: a Tom Cruise vertex with edges actor → Top Gun, married → Katie Holmes, advocate → Scientology, lives → Hollywood, born → July 3, 1962]

Page 38: NoSQL with Graphs

Triplestores

Subject | Predicate | Object
Tom Cruise | actor | Top Gun
Tom Cruise | married | Katie Holmes
Tom Cruise | advocate | Scientology
Tom Cruise | lives | Hollywood
Tom Cruise | born | July 3, 1962
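
A triplestore answers queries by matching (subject, predicate, object) patterns in which any position may be left open. A toy Python sketch, assuming one index per position and `None` as the wildcard; `TripleStore` and `match` are hypothetical names, not the API of any real triplestore.

```python
from collections import defaultdict


class TripleStore:
    """Toy triplestore: each fact is a (subject, predicate, object) tuple,
    indexed by each of its three positions."""

    def __init__(self):
        self.triples = set()
        self.by_subject = defaultdict(set)
        self.by_predicate = defaultdict(set)
        self.by_object = defaultdict(set)

    def add(self, subject, predicate, obj):
        triple = (subject, predicate, obj)
        self.triples.add(triple)
        self.by_subject[subject].add(triple)
        self.by_predicate[predicate].add(triple)
        self.by_object[obj].add(triple)

    def match(self, subject=None, predicate=None, obj=None):
        """Triples matching the pattern; None acts as a wildcard."""
        lookups = [(self.by_subject, subject), (self.by_predicate, predicate),
                   (self.by_object, obj)]
        candidates = [index.get(key, set()) for index, key in lookups if key is not None]
        if not candidates:
            return set(self.triples)
        return set.intersection(*candidates)


store = TripleStore()
for predicate, obj in [("actor", "Top Gun"), ("married", "Katie Holmes"),
                       ("advocate", "Scientology"), ("lives", "Hollywood"),
                       ("born", "July 3, 1962")]:
    store.add("Tom Cruise", predicate, obj)
```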

Page 39: NoSQL with Graphs

SPARQL

PREFIX ged: <http://www.daml.org/2001/01/gedcom/gedcom#>
SELECT ?name ?marriedOn
FROM <http://www.daml.org/2001/01/gedcom/royal92.daml>
WHERE {
  ?royal ged:title "Princess" .
  ?royal ged:name ?name .
  ?royal ged:spouseIn ?family .
  ?family ged:marriage ?marriage .
  ?marriage ged:date ?marriedOn .
}
ORDER BY ASC(?name)
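
The WHERE clause of this query is a basic graph pattern: a set of triple patterns that must all match, joined on their shared variables. The Python sketch evaluates it by extending variable bindings one pattern at a time. It is a naive nested-loop evaluator, not how a production triplestore plans queries, and the genealogy triples are invented, not taken from royal92.daml.

```python
def is_var(term):
    """SPARQL-style variables are written with a leading '?'."""
    return term.startswith("?")


def unify(pattern, triple, binding):
    """Match one triple pattern against one triple under an existing binding.
    Returns the extended binding, or None if they are incompatible."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if is_var(term):
            if extended.get(term, value) != value:
                return None
            extended[term] = value
        elif term != value:
            return None
    return extended


def evaluate_bgp(triples, patterns):
    """Every pattern must match, and patterns sharing a variable must agree on it (a join)."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = unify(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Invented sample data in the shape of the ged vocabulary.
TRIPLES = [
    ("r1", "ged:title", "Princess"), ("r1", "ged:name", "Ada"),
    ("r1", "ged:spouseIn", "f1"), ("f1", "ged:marriage", "m1"), ("m1", "ged:date", "1901-05-01"),
    ("r2", "ged:title", "Prince"), ("r2", "ged:name", "Carl"),
    ("r2", "ged:spouseIn", "f2"), ("f2", "ged:marriage", "m2"), ("m2", "ged:date", "1920-03-10"),
    ("r3", "ged:title", "Princess"), ("r3", "ged:name", "Bea"),
    ("r3", "ged:spouseIn", "f3"), ("f3", "ged:marriage", "m3"), ("m3", "ged:date", "1899-12-24"),
    ("r4", "ged:title", "Princess"), ("r4", "ged:name", "Cleo"),   # never married
]
QUERY = [
    ("?royal", "ged:title", "Princess"),
    ("?royal", "ged:name", "?name"),
    ("?royal", "ged:spouseIn", "?family"),
    ("?family", "ged:marriage", "?marriage"),
    ("?marriage", "ged:date", "?marriedOn"),
]

# SELECT ?name ?marriedOn ... ORDER BY ASC(?name)
results = sorted((b["?name"], b["?marriedOn"]) for b in evaluate_bgp(TRIPLES, QUERY))
```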

Page 40: NoSQL with Graphs

what if the Internet were your GraphDB?

Page 41: NoSQL with Graphs

Page 42: NoSQL with Graphs

what about a NoSPARQL?

Page 43: NoSQL with Graphs

Tinkerpop

Page 44: NoSQL with Graphs

• Blueprints is like the JDBC of the graph database community.

• Provides a Java-based interface API for the property graph data model: Graph, Vertex, Edge, Index.

• Provides implementations of the interfaces for TinkerGraph, Neo4j, OrientDB, Sails (e.g. AllegroSail, Neo4jSail), and soon (hopefully) others such as InfiniteGraph, InfoGrid, Sones, and HyperGraphDB

Page 45: NoSQL with Graphs

• A dataflow framework with support for Blueprints-based graph processing.

• Provides a collection of “pipes” (implement Iterable and Iterator)

✴ Filters: ComparisonFilterPipe, RandomFilterPipe, etc.

✴ Traversal: VertexEdgePipe, EdgeVertexPipe, PropertyPipe, etc.

✴ Splitting/Merging: CopySplitPipe, RobinMergePipe, etc.

✴ Logic: OrPipe, AndPipe, etc.
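
Pipes are lazy iterators chained so that each stage pulls objects from the previous one. The pattern can be sketched in Python with generators; this is an analogy to, not a port of, the Java Pipes classes listed above, and `out_pipe`, `filter_pipe`, `compose` and the edge list are hypothetical.

```python
from collections.abc import Iterable, Iterator

# Hypothetical edge list: (tail, label, head).
EDGES = [
    ("claudio", "follows", "cirpo"),
    ("claudio", "follows", "okram"),
    ("cirpo", "follows", "spinoza"),
    ("okram", "follows", "claudio"),
]


def out_pipe(label):
    """Traversal pipe: emits the heads of each incoming vertex's edges with this label."""
    def pipe(vertices: Iterable[str]) -> Iterator[str]:
        for v in vertices:
            for tail, edge_label, head in EDGES:   # linear scan, for brevity only
                if tail == v and edge_label == label:
                    yield head
    return pipe


def filter_pipe(predicate):
    """Filter pipe: lets through only the objects for which predicate is true."""
    def pipe(objects: Iterable[str]) -> Iterator[str]:
        for obj in objects:
            if predicate(obj):
                yield obj
    return pipe


def compose(*pipes):
    """Chain pipes so that the output iterator of one is the input of the next."""
    def pipeline(start: Iterable[str]) -> Iterator[str]:
        stream = iter(start)
        for pipe in pipes:
            stream = pipe(stream)
        return stream
    return pipeline


friends_of_friends = compose(out_pipe("follows"), out_pipe("follows"),
                             filter_pipe(lambda v: v != "claudio"))
```

Calling `friends_of_friends(["claudio"])` does no work until the result is iterated, which is what lets long pipelines stream through large graphs.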

Page 46: NoSQL with Graphs

• A Turing-complete, graph-based programming language that compiles Gremlin syntax down to Pipes (implements JSR 223).

• Builds on top of Groovy

• Supports various language constructs: :=, foreach, while, repeat, if/else, function and path definitions, etc.

An example of “Amazon’s” recommender:

   m = [:]
   g.v(1).outE('purchased').inV.inE('purchased').outV.groupCount(m)
   m.sort{ a,b -> a.value <=> b.value }
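
The traversal in the Gremlin one-liner can be mirrored in Python with two adjacency maps and a counter. This is a hedged sketch, not Gremlin's implementation; the purchase data and the function name `co_purchase_counts` are hypothetical. The walk goes back through the start vertex's own edges, so v(1) is counted too; a production recommender would usually drop it and take one more hop to items.

```python
from collections import Counter

# Hypothetical purchase edges: user vertex id -> items that user purchased.
PURCHASED = {
    1: {"book", "lamp"},
    2: {"book", "pen"},
    3: {"book", "lamp", "mug"},
    4: {"mug"},
}


def co_purchase_counts(purchased, user):
    """Python analogue of
    g.v(user).outE('purchased').inV.inE('purchased').outV.groupCount(m):
    walk to each item the user bought, then back to every buyer of that item,
    counting how often each buyer is reached (the start user included)."""
    buyers_of = {}   # reverse adjacency (item -> buyers), i.e. the incoming 'purchased' edges
    for buyer, items in purchased.items():
        for item in items:
            buyers_of.setdefault(item, set()).add(buyer)
    m = Counter()
    for item in purchased.get(user, ()):
        for buyer in buyers_of[item]:
            m[buyer] += 1
    return m


m = co_purchase_counts(PURCHASED, 1)
# Like m.sort{ a,b -> a.value <=> b.value }: ascending by count.
ascending = sorted(m.items(), key=lambda item: item[1])
```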

Page 47: NoSQL with Graphs

• Allows Blueprints graphs to be exposed through a RESTful API (HTTP)

• Supports stored traversals written in raw Pipes or Gremlin.

• Supports ad hoc traversals represented in Gremlin.

• Provides “helper classes” for performing search-, score-, and rank-based traversal algorithms that, in concert, support recommendation.

Page 48: NoSQL with Graphs

Sample Stack

• HTTP Request arrives

• Converts REST to Gremlin

• Gremlin “compiles” to Pipes

• Pipes makes Blueprints calls

• Store provides the data

Page 49: NoSQL with Graphs

Neo4J

• Engine: Graph

• License: AGPLv3

• Language: Java

• Transactions: ACID

• Distributed: HA, Master-Slave Cache Sharding, Domain-Specific

• Features: Embeddable, REST, many plugins

Page 50: NoSQL with Graphs

OrientDB

• Engine: Document-Graph

• License: Apache 2.0

• Language: Java

• Transactions: ACID

• Distributed: HA through Replication

• Features: Embeddable, REST, SQL-like

Page 51: NoSQL with Graphs

HypergraphDB

• Engine: HyperGraph

• License: LGPL

• Language: Java

• Transactions: ACID

• Distributed: P2P distribution and replication

• Features: Hyperedges, Java OODB, storage on BerkeleyDB

Page 52: NoSQL with Graphs

InfiniteGraph

• Engine: Graph

• License: Commercial

• Language: Java

• Transactions: ACID

• Distributed: Graph Partitioning, Federation on Objectivity

• Features: Distributed lock management, scales to Exabytes

Page 53: NoSQL with Graphs

Where do I go now?

• Tinkerpop: http://www.tinkerpop.com

• Neo4J: http://neo4j.org

• OrientDB: http://www.orientechnologies.com/orient-db.htm

• InfoGrid: http://infogrid.org

• InfiniteGraph: http://www.infinitegraph.com

• Sones: http://developers.sones.de

• AllegroGraph: http://www.franz.com/agraph/allegrograph

• HypergraphDB: http://www.kobrix.com/hgdb.jsp

Page 54: NoSQL with Graphs

[email protected]

http://blog.acaro.org

http://github.com/claudiomartella/

@claudiomartella

http://joind.in/2946