Give sense to your Big Data w/ Apache TinkerPop™ & property graph databases

Posted on 22-Jan-2018

Give sense to your Big Data with Apache TinkerPop™ and property-graph databases

DuyHai DOAN

Apache Cassandra™ evangelist

@doanduyhai

Who am I?

• Technical advocate for Apache Cassandra™ at Datastax

• Committer for Apache Zeppelin™ and maintainer of the Zeppelin/Cassandra interpreter

• duy_hai.doan@datastax.com

• @doanduyhai

Who is Datastax?

• A company offering Datastax Enterprise, a commercial distribution of Apache Cassandra™

• Datastax Enterprise == Apache Cassandra™ ++ features

Why graph databases?

As of 2017

Who is not using any of those apps?

Needle in a haystack

Finding patterns

Root-cause analysis

Impact propagation

Everything is connected

Graph databases are trending

Graph vs Relational

Relational databases

Users (personId, firstname, lastname, …)
Movies (movieId, title, country, …)
View (personId, movieId, view_time, …)

Relational databases

• Define the relationships between entities

• Store the entities and relationships in a normalized fashion (normal forms)
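
A minimal Python sketch of the normalized layout, assuming toy in-memory versions of the Users, Movies and View tables above (every row is hypothetical). Because the viewing relationship lives in its own View table, even a simple question such as "which movies did this user watch?" requires joining View back to Users and Movies on their keys.

```python
# Hypothetical rows for the three normalized tables.
users = [
    {"personId": 1, "firstname": "DuyHai", "lastname": "Doan"},
    {"personId": 2, "firstname": "Alice", "lastname": "Martin"},
]
movies = [
    {"movieId": 10, "title": "The Jedi Return", "country": "US"},
    {"movieId": 11, "title": "Blade Runner", "country": "US"},
]
view = [  # relationship table: one row per (person, movie) viewing
    {"personId": 1, "movieId": 10, "view_time": "2017-06-01"},
    {"personId": 2, "movieId": 10, "view_time": "2017-06-02"},
    {"personId": 2, "movieId": 11, "view_time": "2017-06-03"},
]


def titles_viewed_by(firstname):
    """Equivalent to: SELECT DISTINCT m.title FROM Users u
    JOIN View v ON v.personId = u.personId
    JOIN Movies m ON m.movieId = v.movieId
    WHERE u.firstname = ? ORDER BY m.title"""
    person_ids = {u["personId"] for u in users if u["firstname"] == firstname}
    movie_ids = {v["movieId"] for v in view if v["personId"] in person_ids}
    return sorted(m["title"] for m in movies if m["movieId"] in movie_ids)


print(titles_viewed_by("Alice"))  # ['Blade Runner', 'The Jedi Return']
```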

Graph databases

[Diagram: a User vertex linked to a Movie vertex by a view edge]

Graph databases

• Define the relationships between entities

• Store the entities and relationships

• Allow end-users to explore the relationships

• Allow end-users to discover unexpected relations between entities
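
For contrast, a minimal sketch of the graph approach in plain Python, assuming a hypothetical in-memory adjacency index rather than any real graph database API. Each view edge is indexed from both of its endpoints, so exploring a relationship means following edges instead of joining tables. A two-hop walk (User to Movie and back to User) then surfaces a relation that was never stored explicitly: other users who viewed the same movies.

```python
from collections import defaultdict

# Hypothetical "view" edges as (user, movie) pairs.
view_edges = [
    ("DuyHai", "The Jedi Return"),
    ("Alice", "The Jedi Return"),
    ("Alice", "Blade Runner"),
    ("Bob", "Blade Runner"),
]

out_adj = defaultdict(set)  # user -> movies the user viewed
in_adj = defaultdict(set)   # movie -> users who viewed it
for user, movie in view_edges:
    out_adj[user].add(movie)
    in_adj[movie].add(user)


def co_viewers(user):
    """Two hops: user -view-> movie <-view- other user, no join table needed."""
    return {other
            for movie in out_adj[user]
            for other in in_adj[movie]
            if other != user}


print(sorted(co_viewers("Alice")))  # ['Bob', 'DuyHai']
```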

The value of data is proportional to the number of meaningful relationships

When to use graph databases?

Apache TinkerPop™ introduction

What is TinkerPop?

Apache TinkerPop™

• Open-source graph computing framework

• Started in 2009 by Marko A. Rodriguez, Josh Shinavier, and Peter Neubauer

• Joined the Apache Software Foundation (ASF) in January 2015

• Current version: 3.2.4

[Diagram: the earlier TinkerPop projects: Frames, Furnace, Pipes, Blueprints, Rexster, Gremlin]

TinkerPop stack

[Diagram: the TinkerPop stack, covering real-time and batch graph processing]

Graph database families

• RDF (Resource Description Framework): AllegroGraph, Blazegraph, Ontotext, OpenLink Virtuoso …

• Property graph: Neo4j, Titan, Datastax Enterprise (DSE) Graph, OrientDB …

Property Graph

A graph is

• A set of vertices (nodes) and edges (arcs)

• Formal definition: G = (V, E)
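
The formal definition translates directly into two sets. A minimal Python sketch, with hypothetical vertex names taken from the User/Movie example; here edges are assumed to be ordered pairs (directed arcs), so the only structural rule is that both endpoints of every edge belong to V, i.e. E ⊆ V × V.

```python
# G = (V, E): V is a set of vertices, E a set of (source, target) pairs.
V = {"user", "movie"}
E = {("user", "movie")}


def is_valid_graph(vertices, edges):
    """E must be a subset of V x V: every edge endpoint is a known vertex."""
    return all(src in vertices and dst in vertices for src, dst in edges)


def neighbors(vertex, edges):
    """Vertices reachable from `vertex` through one outgoing edge."""
    return {dst for src, dst in edges if src == vertex}


print(is_valid_graph(V, E))                    # True
print(neighbors("user", E))                    # {'movie'}
print(is_valid_graph(V, {("user", "actor")}))  # False: 'actor' is not in V
```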

[Diagram: a User vertex and a Movie vertex (the vertices) connected by an edge]

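A property-graph is

• A directed, binary, attributed multi-graph

  • directed: each edge goes from a source vertex to a target vertex
  • binary: each edge connects exactly two vertices
  • attributed: vertices and edges carry properties (key/value pairs)
  • multi-graph: several edges may connect the same pair of vertices

[Diagram: a User vertex (name: DuyHai, age: 35) linked by a view edge (view_time: xxx) to a Movie vertex (title: The Jedi Return, categories: [SF, action, space]), plus a knows edge]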
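
Each adjective in this definition maps onto a data structure. A minimal sketch in plain Python, assuming a hypothetical in-memory model (it is not TinkerPop's API): an edge stores exactly one source and one target vertex (directed, binary), vertices and edges carry a label plus a property map (attributed), and edges are kept in a list rather than keyed by their endpoints, so two view edges between the same user and movie can coexist (multi-graph). The slide elides the view_time values, so the ones below are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    id: int
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    id: int
    label: str
    out_v: Vertex  # source vertex: edges are directed
    in_v: Vertex   # target vertex: exactly two endpoints, so binary
    properties: dict = field(default_factory=dict)


user = Vertex(1, "User", {"name": "DuyHai", "age": 35})
movie = Vertex(2, "Movie", {"title": "The Jedi Return",
                            "categories": ["SF", "action", "space"]})

# A plain list (not a dict keyed by endpoints) allows parallel edges.
edges = [
    Edge(100, "view", user, movie, {"view_time": "2017-01-01T20:00"}),  # hypothetical
    Edge(101, "view", user, movie, {"view_time": "2017-02-14T21:30"}),  # hypothetical
]

views = [e for e in edges
         if e.label == "view" and e.out_v is user and e.in_v is movie]
print(len(views))  # 2
print([e.properties["view_time"] for e in views])
```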

Some definitions

• Vertex properties: key/value pairs attached to a vertex, e.g. name: DuyHai and age: 35 on the User vertex; title: The Jedi Return and categories: [SF, action, space] on the Movie vertex

• Edges and edge labels: an edge links two vertices, and its label names the relationship, e.g. view (User to Movie) or knows

• Edge properties: key/value pairs attached to an edge, e.g. view_time: xxx on the view edge

Gremlin graph traversal

Graph vs Hardware allegory

Graph Traversal

Example of graph traversal

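Give me all movies in which "Harrison Ford" has played as actor

[Diagram: graph schema. User vertices are linked to each other by friendWith edges and to Movie vertices by like edges; Person vertices are linked to Movie vertices by actor and director edges. Properties shown: name, gender, title, year, id, name and rating.]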

Start from every vertex in the graph:

g.V()

Keep only the vertices labeled Person:

g.V().hasLabel("Person")

Keep the Person vertex whose name property is "Harrison Ford":

g.V().hasLabel("Person")
  .has("name","Harrison Ford")

Follow its outgoing actor edges to the Movie vertices:

g.V().hasLabel("Person")
  .has("name","Harrison Ford")
  .out("actor")
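
To see what each step contributes, here is a toy re-implementation of g.V(), hasLabel(), has() and out() in plain Python. It is an analogy under stated assumptions, not gremlinpython or any TinkerPop API: the graph model is hypothetical, the camelCase method names only mirror Gremlin's, and the small sample graph is built for illustration. Each step lazily consumes the vertices emitted by the previous step and emits a new stream, which is why the query reads left to right.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    label: str
    props: dict
    # Outgoing adjacency: edge label -> list of target vertices (hypothetical model).
    out_edges: dict = field(default_factory=dict)


class Traversal:
    """Toy, lazy, Gremlin-like pipeline; each step wraps the previous stream."""

    def __init__(self, stream):
        self._stream = stream

    def hasLabel(self, label):
        return Traversal(v for v in self._stream if v.label == label)

    def has(self, key, value):
        return Traversal(v for v in self._stream if v.props.get(key) == value)

    def out(self, edge_label):
        return Traversal(t for v in self._stream
                         for t in v.out_edges.get(edge_label, []))

    def toList(self):
        return list(self._stream)


class Graph:
    def __init__(self, vertices):
        self.vertices = vertices

    def V(self):
        return Traversal(iter(self.vertices))


blade_runner = Vertex("Movie", {"title": "Blade Runner"})
witness = Vertex("Movie", {"title": "Witness"})
ford = Vertex("Person", {"name": "Harrison Ford"},
              {"actor": [blade_runner, witness]})
scott = Vertex("Person", {"name": "Ridley Scott"}, {"director": [blade_runner]})
alice = Vertex("User", {"name": "Alice"}, {"like": [blade_runner]})
g = Graph([blade_runner, witness, ford, scott, alice])

movies = (g.V().hasLabel("Person")
          .has("name", "Harrison Ford")
          .out("actor")
          .toList())
print([m.props["title"] for m in movies])  # ['Blade Runner', 'Witness']
```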

Give me all movies in which "Harrison Ford" has played as actor, with a mean rating > 7:

g.V().hasLabel("Person")
  .has("name","Harrison Ford")
  .out("actor")
  .where(inE("like").values("rating").mean().is(gt(7)))
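
The where() step keeps a movie only when the nested traversal evaluated from that movie satisfies the predicate: collect the rating property of its incoming like edges, average them, and test the mean with gt(7). A plain-Python sketch of that filter, assuming hypothetical like edges stored as (user, movie, rating) tuples with made-up ratings; it mirrors the logic of the query, not how a graph engine executes it. In this sketch, a movie without any like edge has no mean and is dropped.

```python
from statistics import mean

# Hypothetical like edges (user, movie, rating); the ratings are made up.
likes = [
    ("Alice", "Blade Runner", 9),
    ("Bob", "Blade Runner", 8),
    ("Alice", "Witness", 6),
    ("Carol", "Witness", 7),
]

# Stand-in for the movies reached by ...out("actor"); the last title is fictitious.
actor_movies = ["Blade Runner", "Witness", "Unrated Movie"]


def mean_like_rating(movie):
    """Mirror of inE("like").values("rating").mean() for one movie (None if unrated)."""
    ratings = [rating for _user, target, rating in likes if target == movie]
    return mean(ratings) if ratings else None


def passes_where(movie, threshold=7):
    """Mirror of where(... .is(gt(threshold))): the mean must exist and exceed it."""
    score = mean_like_rating(movie)
    return score is not None and score > threshold


well_rated = [m for m in actor_movies if passes_where(m)]
print(well_rated)  # ['Blade Runner']
```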

More Examples

Demo

Q & A

@doanduyhai

duy_hai.doan@datastax.com

https://academy.datastax.com/

Thank You