Graph databases in PHP @ PHPCon Poland 10-22-2011

Post on 06-Dec-2014


Presentation given at the national PHP conference in Poland, in Kielce, October 2011, dealing with the introduction of graph databases in PHP, taking a practical look at OrientDB.


David Funaro, Alessandro Nadalin

GraphDB in PHP

Sunday, 23 October 2011

Agenda

• Theory
• When to use a graph?
• Why graphDB?
• The graphDB community
• OrientDB
• OrientDB in PHP
• Demo

Essential (Theory)

Graph: G = (V, E)

V: the vertices
E: the edges

[Diagram: a vertex labelled A]

Binary Relation

A (Itchy) -- Hates -- B (Scratchy)

Vertex -- Edge -- Vertex

Graph

[Diagram: a graph with the vertices A, B, D, E, F and G]

Undirected Graph

Example: Friendship

[Diagram: the vertices A, B, D, E and F joined by undirected edges]

Directed Edge

[Diagram: vertex A and vertex B joined by a directed edge]

Directed Graph

Example: Followee

[Diagram: the vertices A, B, D and F joined by directed edges]
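The difference between a directed and an undirected edge can be sketched with a plain adjacency list. This is only an illustration: the helper functions and the edges are assumptions, they are not taken from the diagrams.

```php
<?php
// Minimal sketch: a graph stored as an adjacency list (vertex => neighbours).
// The vertex names and edges are illustrative assumptions, not the slide's diagram.

/** Adds a directed edge $from -> $to (e.g. "$from follows $to"). */
function addDirectedEdge(array &$graph, string $from, string $to): void
{
    $graph[$from][] = $to;
    // Make sure the target vertex exists even if it has no outgoing edges.
    if (!isset($graph[$to])) {
        $graph[$to] = [];
    }
}

/** Adds an undirected edge (e.g. a friendship) as two directed edges. */
function addUndirectedEdge(array &$graph, string $a, string $b): void
{
    addDirectedEdge($graph, $a, $b);
    addDirectedEdge($graph, $b, $a);
}

$follows = [];
addDirectedEdge($follows, 'A', 'B');   // A follows B
addDirectedEdge($follows, 'D', 'A');   // D follows A

$friends = [];
addUndirectedEdge($friends, 'A', 'B'); // A and B are friends

var_dump(in_array('B', $follows['A'], true)); // true: A follows B
var_dump(in_array('A', $follows['B'], true)); // false: B does not follow A
var_dump(in_array('A', $friends['B'], true)); // true: friendship goes both ways
```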

Path

[Diagram: a path highlighted in the graph with the vertices A, B, D, E, F and G]
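Finding a path can be sketched with a breadth-first search. The edges below are assumptions chosen for the example, and findPath() is a hypothetical helper.

```php
<?php
// Minimal sketch: breadth-first search for a path between two vertices.
// The edges below are assumptions chosen for illustration.

/**
 * Returns the vertices on a shortest (fewest edges) path from $start
 * to $goal, or null when no path exists.
 */
function findPath(array $graph, string $start, string $goal): ?array
{
    $previous = [$start => null]; // vertex => vertex we reached it from
    $queue    = [$start];

    while ($queue) {
        $current = array_shift($queue);
        if ($current === $goal) {
            // Walk the "previous" links back to the start.
            $path = [];
            for ($v = $goal; $v !== null; $v = $previous[$v]) {
                array_unshift($path, $v);
            }
            return $path;
        }
        foreach ($graph[$current] ?? [] as $next) {
            if (!array_key_exists($next, $previous)) {
                $previous[$next] = $current;
                $queue[] = $next;
            }
        }
    }
    return null;
}

$graph = [
    'A' => ['B', 'D'],
    'B' => ['E'],
    'D' => ['E'],
    'E' => ['G'],
    'F' => [],
    'G' => [],
];

print_r(findPath($graph, 'A', 'G'));  // A, B, E, G
var_dump(findPath($graph, 'A', 'F')); // NULL: F is unreachable from A
```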

Graph -> GraphDB

A GraphDB is a database that uses the graph as its primary data structure

... when to use a graph?

Web in ’99

Web in 2005

The social web

Your data is a graph

a tree is a graph

parent_id is a graph

Recommendations

[Diagram: a graph used to recommend a film to John]

Vertices: John; Rome, Milan; Cinema A, Cinema B, Cinema C; Se7en, Mr Bean; Thriller, Fun

Edges: lives in; likes; location (x3); type (x2); shows (x3)

[Build-up over nine slides: the edges are marked one by one, x for a discarded branch and ✓ for a matching one]
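The walk over this graph can be sketched as follows. The slide text does not say which vertex each edge connects, so every edge in the example is an assumption, and the helper functions are hypothetical.

```php
<?php
// Minimal sketch of the recommendation walk. Which vertices each edge
// connects is NOT recoverable from the slide: every edge below is an assumption.

$edges = [
    // [from, label, to]
    ['John',     'lives in', 'Rome'],
    ['John',     'likes',    'Thriller'],
    ['Cinema A', 'location', 'Rome'],
    ['Cinema B', 'location', 'Rome'],
    ['Cinema C', 'location', 'Milan'],
    ['Se7en',    'type',     'Thriller'],
    ['Mr Bean',  'type',     'Fun'],
    ['Cinema A', 'shows',    'Se7en'],
    ['Cinema B', 'shows',    'Mr Bean'],
    ['Cinema C', 'shows',    'Se7en'],
];

/** Vertices reached from $from through edges labelled $label. */
function outgoing(array $edges, string $from, string $label): array
{
    $targets = [];
    foreach ($edges as [$f, $l, $t]) {
        if ($f === $from && $l === $label) {
            $targets[] = $t;
        }
    }
    return $targets;
}

/** Vertices that reach $to through edges labelled $label. */
function incoming(array $edges, string $to, string $label): array
{
    $sources = [];
    foreach ($edges as [$f, $l, $t]) {
        if ($t === $to && $l === $label) {
            $sources[] = $f;
        }
    }
    return $sources;
}

/** Returns [cinema, film] pairs in the person's city that match their taste. */
function recommend(array $edges, string $person): array
{
    $tastes = outgoing($edges, $person, 'likes');
    $result = [];
    foreach (outgoing($edges, $person, 'lives in') as $city) {
        foreach (incoming($edges, $city, 'location') as $cinema) {
            foreach (outgoing($edges, $cinema, 'shows') as $film) {
                // Keep the film only if its type is one the person likes.
                if (array_intersect(outgoing($edges, $film, 'type'), $tastes)) {
                    $result[] = [$cinema, $film];
                }
            }
        }
    }
    return $result;
}

print_r(recommend($edges, 'John')); // Cinema A showing Se7en
```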

Solve decision problems

Maximum flow

Given a dataset, calculate how to best organize it

Travelling salesman problem

The pizza guy needs to deliver to A, B and C.

The decision is based on distance, traffic, time and so on.
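For three deliveries the problem is small enough to try every order. The sketch below does exactly that; the travel times are invented for the example and the function names are hypothetical.

```php
<?php
// Minimal sketch: brute-force travelling salesman for three deliveries.
// The travel times (in minutes) are invented for illustration; a real
// weight could combine distance, traffic, time and so on.

$cost = [
    'Pizzeria' => ['A' => 10, 'B' => 15, 'C' => 20],
    'A'        => ['Pizzeria' => 10, 'B' => 35, 'C' => 25],
    'B'        => ['Pizzeria' => 15, 'A' => 35, 'C' => 30],
    'C'        => ['Pizzeria' => 20, 'A' => 25, 'B' => 30],
];

/** Returns every ordering of $items. */
function permutations(array $items): array
{
    if (count($items) <= 1) {
        return [$items];
    }
    $result = [];
    foreach ($items as $i => $item) {
        $rest = $items;
        unset($rest[$i]);
        foreach (permutations(array_values($rest)) as $tail) {
            $result[] = array_merge([$item], $tail);
        }
    }
    return $result;
}

/** Cheapest round trip from $start through all $stops and back. */
function bestRoute(array $cost, string $start, array $stops): array
{
    $best = null;
    foreach (permutations($stops) as $order) {
        $route = array_merge([$start], $order, [$start]);
        $total = 0;
        for ($i = 0; $i < count($route) - 1; $i++) {
            $total += $cost[$route[$i]][$route[$i + 1]];
        }
        if ($best === null || $total < $best['cost']) {
            $best = ['route' => $route, 'cost' => $total];
        }
    }
    return $best;
}

print_r(bestRoute($cost, 'Pizzeria', ['A', 'B', 'C']));
```

Trying every order costs n! evaluations, so this only works for a handful of stops.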

Shortest path

Identify "special" nodes of the graph

Given your dataset, organize some clusters

Are there some nodes which cannot belong to a cluster?

They probably have some properties different from the average

ATTENTION! TERRORISTS!

but ... why graphDB?

where is the difference?

GraphDB

A graph database is any storage system that provides index-free adjacency.

http://www.slideshare.net/slidarko/problemsolving-using-graph-traversals-searching-scoring-ranking-and-recommendation

Step by step example

Given a list of people, find their homepages

Tree-based DB WAY

1. David Funaro
2. Put in the search engine
3. Find http://davidfunaro.com

The cost of finding a single friend's homepage grows as the table of friends' homepages grows

GraphDB WAY

It is as if the GraphDB held an additional piece of information (the anchor <a>)

1. Get the embedded information (index): www.odino.org

<a href="http://odino.org">Alessandro Nadalin</a>

The anchor works as a local index to reach the document = index-free adjacency

Local cost

The local cost is O(k) = constant

Thus, as the graph grows in size, the cost of a local step remains the same
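The contrast between an index lookup and a local step can be sketched in plain PHP. Everything here is an assumption made for illustration: the table is a sorted array searched by bisection (a real RDBMS would use a B-tree or hash index), and the Vertex class is hypothetical.

```php
<?php
// Minimal sketch contrasting a lookup through a global index with
// index-free adjacency. Names, URLs and the step counter are assumptions.

/** Binary search over rows sorted by name; $steps counts the comparisons. */
function lookupInTable(array $rows, string $name, int &$steps): ?string
{
    $lo = 0;
    $hi = count($rows) - 1;
    while ($lo <= $hi) {
        $steps++;
        $mid = intdiv($lo + $hi, 2);
        $cmp = strcmp($rows[$mid][0], $name);
        if ($cmp === 0) {
            return $rows[$mid][1];
        }
        if ($cmp < 0) {
            $lo = $mid + 1;
        } else {
            $hi = $mid - 1;
        }
    }
    return null;
}

/** Builds a sorted table of $size rows and counts the steps to find one row. */
function stepsForTableOfSize(int $size): int
{
    $rows = [];
    for ($i = 0; $i < $size; $i++) {
        $rows[] = [sprintf('person%06d', $i), "http://example.org/$i"];
    }
    $steps = 0;
    lookupInTable($rows, 'person000000', $steps);
    return $steps;
}

/** A vertex keeps direct references to its neighbours: no global index needed. */
final class Vertex
{
    public $value;
    public $links = []; // label => Vertex

    public function __construct(string $value)
    {
        $this->value = $value;
    }
}

$david = new Vertex('David Funaro');
$david->links['homepage'] = new Vertex('http://davidfunaro.com');

echo stepsForTableOfSize(1000), "\n";        // comparisons in a small table
echo stepsForTableOfSize(64000), "\n";       // more comparisons in a larger one
echo $david->links['homepage']->value, "\n"; // one hop, whatever the graph size
```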

Any database can implicitly represent a graph, BUT only a graph database makes the graph structure explicit

Benchmark

• 1 million vertices
• 4 million edges
• Scale-free topology
• Postgres vs Neo4J
• Both hash and B-tree

Depth  RDBMS     Graph
1      100ms     30ms
2      1000ms    500ms
3      10000ms   3000ms
4      100000ms  50000ms
5      N/A       100000ms

http://markorodriguez.com/2011/02/18/mysql-vs-neo4j-on-a-large-scale-graph-traversal/

GraphDB community

TinkerPop: the community that is building and feeding the GraphDB ecosystem

[Diagram: the TinkerPop stack on top of the databases]

Blueprints: data model and their implementation

Blueprints is a collection of interfaces, implementations, ouplementations, and test suites for the property graph data model. Blueprints is analogous to JDBC, but for graph databases.

https://github.com/tinkerpop/blueprints/wiki/

Pipes: a data flow framework using process graphs

Provides a collection of "pipes" that are connected together to form processing pipelines

Gremlin: a graph-based programming language

A Turing-complete, graph-based programming language that compiles Gremlin syntax down to Pipes

Rexster: a RESTful graph shell

Allows Blueprints graphs to be exposed through a RESTful API (HTTP)

What's hot

OrientDB

Glossary

RID <10:05>: cluster 10, position 05

CLASS
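Splitting a RID into its two parts can be sketched as below. parseRid() is a hypothetical helper written for this example, not a function of any OrientDB client.

```php
<?php
// Minimal sketch: splitting a record ID into cluster and position.
// parseRid() is a hypothetical helper, not part of any OrientDB client.

function parseRid(string $rid): array
{
    // Accepts "10:5" as well as the "#10:5" form used in JSON responses.
    if (!preg_match('/^#?(\d+):(\d+)$/', $rid, $m)) {
        throw new InvalidArgumentException("Not a valid RID: $rid");
    }
    return ['cluster' => (int) $m[1], 'position' => (int) $m[2]];
}

print_r(parseRid('10:05')); // cluster 10, position 5
print_r(parseRid('#13:0')); // cluster 13, position 0
```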

Main features

Inheritance

class Vehicle
class Bike
class Car

SELECT FROM Vehicle WHERE owner = 1:1

can return records of class Bike or Car
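The same idea can be sketched in plain PHP: selecting on the parent class also returns instances of its subclasses. The in-memory record list and the selectFrom() helper are assumptions for illustration only; they do not talk to OrientDB.

```php
<?php
// Minimal sketch: querying the parent class also returns its subclasses.
// The in-memory $records list and selectFrom() are illustrative assumptions.

class Vehicle
{
    public $owner;

    public function __construct(string $owner)
    {
        $this->owner = $owner;
    }
}

class Bike extends Vehicle {}
class Car extends Vehicle {}

/** Rough equivalent of: SELECT FROM <class> WHERE owner = <rid> */
function selectFrom(array $records, string $class, string $owner): array
{
    return array_values(array_filter(
        $records,
        function ($record) use ($class, $owner) {
            return $record instanceof $class && $record->owner === $owner;
        }
    ));
}

$records = [new Bike('1:1'), new Car('1:1'), new Car('1:2')];

foreach (selectFrom($records, 'Vehicle', '1:1') as $vehicle) {
    echo get_class($vehicle), "\n"; // Bike, then Car
}
```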

Traversal

SELECT FROM fellas WHERE any() traverse(0,-1) ( @rid = [Michelle @rid] )

SELECT FROM fellas WHERE any() traverse(0,2) ( @rid = [Michelle @rid] )

SQL syntax

beyond SQL

SELECT FROM authors WHERE book.title = ...

ACID

speaks JSON

{
  "schema": { "name": "Address" },
  "result": [{
    "@type": "d",
    "@rid": "#13:0",
    "@version": 6,
    "@class": "Address",
    "type": "Residence",
    "street": "Piazza Navona, 1",
    "city": "#14:0",
    "nick": "Luca2"
  }, {
    ...
  ...

Double Protocol

HTTP
• Universal
• Easy to interact with

binary
• Blazing fast

on-record SELECTs

SELECT FROM cats

SELECT FROM 11:0

SELECT FROM [11:0,11:1]

SELECT FROM [11:0,12:0]

stress-free setup

2 Mb

./orient/bin/server.sh

in-memory DB

or disk-persisted

Supports standards

OrientDB

• Inheritance
• Traversal
• SQL-like syntax
• ACID
• Speaks JSON
• Double protocol
• On-record SELECT
• TinkerPop compliant

Oh, it's Java.

PHP?

somebody started writing the binary-protocol binding

https://github.com/AntonTerekhov/OrientDB-PHP (beta 0.4.1, 28 April 2010)

$db = new OrientDB($host, $port);

$record = $db->recordLoad('1:1', '*:-1');

// $record is an instance of OrientDBRecord

and others ... are writing a complete library

Orient Library

https://github.com/congow/Orient

Orient = PHP library to work with OrientDB

• Data Mapper
• Query Builder
• HTTP Binding

HTTP Binding

use Congow\Orient;
use Congow\Orient\Foundation\Binding;

$driver   = new Orient\Http\Client\Curl();
$orient   = new Binding($driver, '127.0.0.1', '2480', 'admin', 'admin', 'demo');

$response = $orient->query("SELECT FROM Address");

$output   = json_decode($response->getBody());

foreach ($output->result as $address) {
    var_dump($address->street);
}

{
  "schema": { "name": "Address" },
  "result": [{
    "@type": "d",
    "@rid": "#13:0",
    "@version": 6,
    "@class": "Address",
    "type": "Residence",
    "street": "Piazza Navona, 1",
    "city": "#14:0",
    "nick": "Luca2"
  }, {
    ...
  ...
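The decoding step can be tried without a running server. In the sketch below the JSON is the first record from the slide, closed by hand; the rest of the result set is elided on the slide and is not reproduced. Decoding to associative arrays is a choice made for this example, because keys such as "@rid" are awkward to read as object properties.

```php
<?php
// Minimal sketch: decoding a response like the one above without a server.
// Only the first record is used; the JSON was closed by hand for the example.

$body = <<<'JSON'
{
  "schema": { "name": "Address" },
  "result": [{
    "@type": "d", "@rid": "#13:0", "@version": 6, "@class": "Address",
    "type": "Residence", "street": "Piazza Navona, 1",
    "city": "#14:0", "nick": "Luca2"
  }]
}
JSON;

// Decode to associative arrays so that "@rid" and friends are plain keys.
$output = json_decode($body, true);
if ($output === null) {
    throw new RuntimeException('Invalid JSON: ' . json_last_error_msg());
}

foreach ($output['result'] as $address) {
    echo $address['@rid'], ' ', $address['street'], "\n"; // #13:0 Piazza Navona, 1
}
```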

apart from ->query($SQL)

->get|delete|postClass($class)

->post|delete|put|getDocument($rid)

...and much more!

(connect, disconnect, ...)

Query Builder

use Congow\Orient\Query;

$query = new Query();
$query->from(array('users'))->where('username = ?', "admin");

echo $query->getRaw(); // SELECT FROM users WHERE username = "admin"

$query->select(array('name', 'username', 'email'), false)
      ->from(array('12:0', '12:1'), false)
      ->where('any() traverse ( any() like "%danger%" )')
      ->orWhere("1 = ?", 1)
      ->andWhere("links = ?", 1)
      ->limit(20)
      ->orderBy('username')
      ->orderBy('name', true, true)
      ->range("12:0", "12:1");

SELECT name, username, email
FROM [12:0, 12:1]
WHERE any() traverse ( any() like "%danger%" )
OR 1 = "1" AND links = "1"
ORDER BY name, username
LIMIT 20
RANGE 12:0 12:1
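How a fluent builder can assemble such a query is sketched below. TinyQuery is a hypothetical class written for this example; it is not the Congow\Orient\Query implementation, it supports far less, and its quoting is naive.

```php
<?php
// Minimal sketch of a fluent query builder. TinyQuery is hypothetical and
// is NOT the library's Query class; the quoting is naive, for illustration only.

final class TinyQuery
{
    private $targets = [];
    private $conditions = [];

    public function from(array $targets): self
    {
        $this->targets = $targets;
        return $this; // returning $this is what allows chaining
    }

    public function where(string $condition, $value = null): self
    {
        if ($value !== null) {
            // Replace the first "?" placeholder with a double-quoted value.
            $quoted   = '"' . addslashes((string) $value) . '"';
            $position = strpos($condition, '?');
            if ($position !== false) {
                $condition = substr_replace($condition, $quoted, $position, 1);
            }
        }
        $this->conditions[] = $condition;
        return $this;
    }

    public function getRaw(): string
    {
        // A single target is written bare, several targets as a bracketed list.
        $target = count($this->targets) === 1
            ? $this->targets[0]
            : '[' . implode(', ', $this->targets) . ']';
        $sql = 'SELECT FROM ' . $target;
        if ($this->conditions) {
            $sql .= ' WHERE ' . implode(' AND ', $this->conditions);
        }
        return $sql;
    }
}

$query = new TinyQuery();
$query->from(['users'])->where('username = ?', 'admin');

echo $query->getRaw(), "\n"; // SELECT FROM users WHERE username = "admin"
```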

Data Mapper

A strange Doctrine2 ODM

namespace Poland\PHPCon\Entity;

use Congow\Orient\ODM\Mapper\Annotations as ODM;

/**
 * @ODM\Document(class="Person")
 */
class Speaker
{
    /**
     * @ODM\Property(type="string")
     */
    protected $name;

    public function setName($name)
    {
        $this->name = $name;
    }
}

Domain Driven Design

{
  "schema": { "name": "Speaker" },
  "result": [{
    "@type": "d",
    "@rid": "#1:0",
    "@version": 6,
    "@class": "Speaker",
    "name": "David Coallier"
  }, {
    ...
  ...

$david = $mapper->hydrate(json_decode($speaker));

$david instanceOf Poland\PHPCon\Entity\Speaker
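What hydration does can be sketched in a few lines. The hydrate() function, its $classMap argument and the getName() accessor are hypothetical and exist only in this example; the library's mapper reads the class mapping from annotations instead, and the entity on the slides lives in the Poland\PHPCon\Entity namespace.

```php
<?php
// Minimal sketch of hydration: turning a decoded record into an object.
// hydrate(), $classMap and getName() are assumptions for this example.

class Speaker
{
    protected $name;

    public function setName($name)
    {
        $this->name = $name;
    }

    public function getName()
    {
        return $this->name;
    }
}

function hydrate(array $record, array $classMap)
{
    // Pick the PHP class mapped to the record's OrientDB class.
    $class  = $classMap[$record['@class']];
    $object = new $class();
    foreach ($record as $key => $value) {
        // Skip metadata such as @rid; call the matching setter when present.
        $setter = 'set' . ucfirst($key);
        if ($key[0] !== '@' && method_exists($object, $setter)) {
            $object->$setter($value);
        }
    }
    return $object;
}

$json   = '{"@type":"d","@rid":"#1:0","@version":6,"@class":"Speaker","name":"David Coallier"}';
$record = json_decode($json, true);
$david  = hydrate($record, ['Speaker' => Speaker::class]);

var_dump($david instanceof Speaker); // true
echo $david->getName(), "\n";        // David Coallier
```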

Repository Pattern

$repo = $manager->getRepository('Speaker');

$speakers = $repo->findAll();

$speaker = $repo->find($rid);

$criteria = array('Name' => 'Lorna');

$lornas = $repo->findBy($criteria);

$criteria = array('Name' => 'Lorna', 'last_name' => 'Jane');

$lornaJ = $repo->findOneBy($criteria);

Know your boundaries

https://github.com/doctrine/common/tree/master/lib/Doctrine/Common/Persistence

Theory sucks.

Demo

Menu items in RDBMS

id  type      page  url
1   external  NULL  http://www.google.com
2   page      1     NULL

Menu items in OrientDB

Link, with subclasses PageLink and ExternalLink

ExternalLink
rid  title   url
8:2  google  google.com

PageLink
rid  title  page
9:1  home   1

That’s all, folks!

David Funaro
@ingdavidino
http://davidfunaro.com

Alessandro Nadalin
@_odino_
http://odino.org
