Extended Property Graphs and Cypher on Gradoop
1st openCypher Implementers Meeting
8 February 2017
Walldorf, Germany
Martin Junghanns
University of Leipzig – Database Research Group




Gradoop

"An open-source graph dataflow system for declarative analytics of heterogeneous graph data."


Gradoop

Distributed Graph Storage (Apache HDFS)

Apache Flink Operator Implementation

Distributed Operator Execution (Apache Flink)

Extended Property Graph Model (EPGM)

Graph Dataflow Operators

I/O


Extended Property Graphs

• Vertices and directed Edges

• Logical Graphs

• Identifiers

• Type Labels

• Properties

[Figure: example extended property graph]

Vertices:
Hobbit {name: Samwise}
Orc {name: Azog}
Clan {name: Tribes of Moria, founded: 1981}
Orc {name: Bolg}
Hobbit {name: Frodo, yob: 2968}

Edges:
leaderOf {since: 2790}
memberOf {since: 2013}
hates {since: 2301}
hates
knows {since: 2990}

Logical graphs:
Area {title: Mordor}
Area {title: Shire}
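The model elements listed on this slide (identifiers, type labels, properties, logical graphs) can be sketched as plain Java classes. This is a minimal illustration, not Gradoop's actual classes: all class and method names are hypothetical, and the assignment of vertices to the Shire and Mordor graphs is an assumption, since the figure layout is not recoverable from the transcript.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of EPGM elements; names are illustrative only. */
final class EpgmSketch {

  /** Every element carries an identifier, a type label, and properties. */
  static class Element {
    final long id;
    final String label;
    final Map<String, Object> properties;

    Element(long id, String label, Map<String, Object> properties) {
      this.id = id;
      this.label = label;
      this.properties = properties;
    }
  }

  /** A logical graph is itself an element, e.g. Area {title: Shire}. */
  static final class GraphHead extends Element {
    GraphHead(long id, String label, Map<String, Object> properties) {
      super(id, label, properties);
    }
  }

  /** A vertex additionally records the logical graphs it belongs to. */
  static final class Vertex extends Element {
    final Set<Long> graphIds;

    Vertex(long id, String label, Map<String, Object> properties, Set<Long> graphIds) {
      super(id, label, properties);
      this.graphIds = graphIds;
    }
  }

  /** A directed edge additionally records its source and target vertex. */
  static final class Edge extends Element {
    final long sourceId;
    final long targetId;
    final Set<Long> graphIds;

    Edge(long id, String label, Map<String, Object> properties,
         long sourceId, long targetId, Set<Long> graphIds) {
      super(id, label, properties);
      this.sourceId = sourceId;
      this.targetId = targetId;
      this.graphIds = graphIds;
    }
  }

  /** Example vertices; graph membership (1 = Mordor, 2 = Shire) is assumed. */
  static List<Vertex> exampleVertices() {
    return List.of(
        new Vertex(1L, "Hobbit", Map.of("name", "Frodo", "yob", 2968), Set.of(2L)),
        new Vertex(2L, "Hobbit", Map.of("name", "Samwise"), Set.of(2L)),
        new Vertex(3L, "Orc", Map.of("name", "Azog"), Set.of(1L)));
  }

  /** Returns the vertices contained in the given logical graph. */
  static List<Vertex> verticesOf(long graphId, List<Vertex> vertices) {
    List<Vertex> result = new ArrayList<>();
    for (Vertex v : vertices) {
      if (v.graphIds.contains(graphId)) {
        result.add(v);
      }
    }
    return result;
  }
}
```

Storing graph membership as a set of graph identifiers on each vertex and edge lets one element belong to several, possibly overlapping, logical graphs.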


Extended Property Graphs

Graph Operators/Transformations

Graph Collection
• Unary: Limit, Selection, Pattern Matching, Distinct, Apply, Reduce, Call
• Binary: Equality, Union, Intersection, Difference

Logical Graph
• Unary: Aggregation, Pattern Matching, Transformation, Grouping, Call, Subgraph
• Binary: Equality, Combination, Overlap, Exclusion, Fusion


Extended Property Graphs

[Figure: a logical graph before and after the subgraph operator; the vertex and edge predicates are UDFs]

LogicalGraph graph3 = readFromHDFS();
// Keep vertices labeled "Green" and edges labeled "orange"; both predicates are UDFs.
LogicalGraph graph4 = graph3.subgraph(
  vertex -> vertex.getLabel().equals("Green"),
  edge -> edge.getLabel().equals("orange"));

Subgraph
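To show what the two predicates do, here is a self-contained sketch of a subgraph operator on in-memory lists. The classes are hypothetical stand-ins rather than Gradoop's API, and dropping edges whose endpoints were filtered out is an assumption of this sketch, not something the slide states.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical stand-ins for illustration; these are not Gradoop classes. */
final class SubgraphSketch {

  static final class Vertex {
    final long id;
    final String label;

    Vertex(long id, String label) {
      this.id = id;
      this.label = label;
    }
  }

  static final class Edge {
    final long id;
    final String label;
    final long sourceId;
    final long targetId;

    Edge(long id, String label, long sourceId, long targetId) {
      this.id = id;
      this.label = label;
      this.sourceId = sourceId;
      this.targetId = targetId;
    }
  }

  static final class Graph {
    final List<Vertex> vertices;
    final List<Edge> edges;

    Graph(List<Vertex> vertices, List<Edge> edges) {
      this.vertices = vertices;
      this.edges = edges;
    }
  }

  /** Applies both predicates; edges with a removed endpoint are dropped (assumption). */
  static Graph subgraph(Graph graph, Predicate<Vertex> vertexFilter, Predicate<Edge> edgeFilter) {
    List<Vertex> vertices = new ArrayList<>();
    Set<Long> keptIds = new HashSet<>();
    for (Vertex v : graph.vertices) {
      if (vertexFilter.test(v)) {
        vertices.add(v);
        keptIds.add(v.id);
      }
    }
    List<Edge> edges = new ArrayList<>();
    for (Edge e : graph.edges) {
      if (edgeFilter.test(e) && keptIds.contains(e.sourceId) && keptIds.contains(e.targetId)) {
        edges.add(e);
      }
    }
    return new Graph(vertices, edges);
  }

  /** Small made-up input graph using the labels from the slide. */
  static Graph example() {
    return new Graph(
        List.of(new Vertex(1, "Green"), new Vertex(2, "Green"), new Vertex(3, "Orange")),
        List.of(new Edge(10, "orange", 1, 2),
                new Edge(11, "orange", 1, 3),
                new Edge(12, "blue", 2, 1)));
  }
}
```

Calling `SubgraphSketch.subgraph(SubgraphSketch.example(), v -> v.label.equals("Green"), e -> e.label.equals("orange"))` keeps vertices 1 and 2 and only edge 10: edge 11 points to a removed vertex and edge 12 fails the edge predicate.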


Extended Property Graphs

[Figure: input logical graph, pattern, and resulting graph collection]

GraphCollection collection = graph3.match("(:Green)-[:orange]->(:Orange)");

Pattern Matching (Single Graph Input)


Cypher on Gradoop

"Which two clan leaders hate each other and one of them knows Frodo over one to ten hops?"


Cypher on Gradoop

PlanTableEntry | type: GRAPH | all-vars: [...] | proc-vars: [...] | attr-vars: [] | est-card: 23 | prediates: () | Plan :
|-FilterEmbeddingsNode{filterPredicate=((c1 != c2) AND (o1 != o2))}
|.|-JoinEmbeddingsNode{joinVariables=[o2], vertexMorphism=H, edgeMorphism=I}
|.|.|-JoinEmbeddingsNode{joinVariables=[o1], vertexMorphism=H, edgeMorphism=I}
|.|.|.|-JoinEmbeddingsNode{joinVariables=[c1], vertexMorphism=H, edgeMorphism=I}
|.|.|.|.|-FilterAndProjectVerticesNode{vertexVar=c1, filterPredicate=((c1.label = Clan)), projectionKeys=[]}
|.|.|.|.|-FilterAndProjectEdgesNode{sourceVar='o1', edgeVar='_e0', targetVar='c1', filterPredicate=((_e0.label = leaderOf)), projectionKeys=[]}
|.|.|.|-JoinEmbeddingsNode{joinVariables=[o1], vertexMorphism=H, edgeMorphism=I}
|.|.|.|.|-FilterAndProjectVerticesNode{vertexVar=o1, filterPredicate=((o1.label = Orc)), projectionKeys=[]}
|.|.|.|.|-FilterAndProjectEdgesNode{sourceVar='o1', edgeVar='_e1', targetVar='o2', filterPredicate=((_e1.label = hates)), projectionKeys=[]}
|.|.|-JoinEmbeddingsNode{joinVariables=[o2], vertexMorphism=H, edgeMorphism=I}
|.|.|.|-JoinEmbeddingsNode{joinVariables=[h], vertexMorphism=H, edgeMorphism=I}
|.|.|.|.|-FilterAndProjectVerticesNode{vertexVar=h, filterPredicate=((h.label = Hobbit) AND (h.name = Frodo Baggins)), projectionKeys=[]}
|.|.|.|.|-ExpandEmbeddingsNode={startVar='o2', pathVar='_e3', endVar='h', lb=1, ub=10, direction=OUT, vertexMorphism=H, edgeMorphism=I}
|.|.|.|.|.|-FilterAndProjectVerticesNode{vertexVar=o2, filterPredicate=((o2.label = Orc)), projectionKeys=[]}
|.|.|.|.|.|-FilterAndProjectEdgesNode{sourceVar='o2', edgeVar='_e3', targetVar='h', filterPredicate=((_e3.label = knows)), projectionKeys=[]}
|.|.|.|-JoinEmbeddingsNode{joinVariables=[c2], vertexMorphism=H, edgeMorphism=I}
|.|.|.|.|-FilterAndProjectVerticesNode{vertexVar=c2, filterPredicate=((c2.label = Clan)), projectionKeys=[]}
|.|.|.|.|-FilterAndProjectEdgesNode{sourceVar='o2', edgeVar='_e2', targetVar='c2', filterPredicate=((_e2.label = leaderOf)), projectionKeys=[]}


Cypher on Gradoop

Embedding: data structure used for intermediate results

Identifiers
column: 0 1 2 3 4 5 6 7 8 9
value: 1 37 5 3 7 8 45 99 12 3

Properties
column: 0 1 2
value: Frodo Baggins, 1.22, Saruman

Paths
column: 0
value: 45: [4,1,33]

EmbeddingMetaData: stores information about the embedding content

Mapping: Variable -> ID Column {h: 0, e1: 1, o2: 5, ...}

Mapping: Variable.Property -> Property Column {h.name: 0, h.height: 1, c1.name: 2, ...}
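The split between an embedding (positional data) and its metadata (variable-to-column mappings) can be illustrated with a small sketch. It uses plain Java arrays and maps and makes no attempt to reproduce Gradoop's actual storage layout; all class and method names are hypothetical, and paths are omitted for brevity. The sample values are the ones shown on the slide.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: positional embedding plus a metadata lookup. */
final class EmbeddingSketch {

  /** Maps query variables and variable.property keys to column indexes. */
  static final class EmbeddingMetaData {
    private final Map<String, Integer> idColumns = new HashMap<>();
    private final Map<String, Integer> propertyColumns = new HashMap<>();

    void setIdColumn(String variable, int column) {
      idColumns.put(variable, column);
    }

    void setPropertyColumn(String variable, String key, int column) {
      propertyColumns.put(variable + "." + key, column);
    }

    int idColumn(String variable) {
      return idColumns.get(variable);
    }

    int propertyColumn(String variable, String key) {
      return propertyColumns.get(variable + "." + key);
    }
  }

  /** Holds only values; their meaning comes from the metadata. */
  static final class Embedding {
    private final long[] ids;
    private final Object[] properties;

    Embedding(long[] ids, Object[] properties) {
      this.ids = ids;
      this.properties = properties;
    }

    long id(int column) {
      return ids[column];
    }

    Object property(int column) {
      return properties[column];
    }
  }

  /** The identifier and property rows from the slide. */
  static Embedding exampleEmbedding() {
    return new Embedding(
        new long[] {1, 37, 5, 3, 7, 8, 45, 99, 12, 3},
        new Object[] {"Frodo Baggins", 1.22, "Saruman"});
  }

  /** The mappings from the slide (entries elided there are left out). */
  static EmbeddingMetaData exampleMetaData() {
    EmbeddingMetaData meta = new EmbeddingMetaData();
    meta.setIdColumn("h", 0);
    meta.setIdColumn("e1", 1);
    meta.setIdColumn("o2", 5);
    meta.setPropertyColumn("h", "name", 0);
    meta.setPropertyColumn("h", "height", 1);
    meta.setPropertyColumn("c1", "name", 2);
    return meta;
  }
}
```

Keeping the mappings outside the embedding means every row of an intermediate result shares one metadata object, so the rows themselves carry no variable names.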


Cypher on Gradoop

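Filter: Hobbit(name=Frodo Baggins), i.e. $\sigma_{Label = Hobbit \wedge name = Frodo}(V)$
Project: [ ], i.e. $\pi_{h.Id}(V')$

DataSet<Vertex> -> FlatMap(Vertex -> Embedding) -> DataSet<Embedding>

Vertices (DataSet<Vertex>):
id | Properties
1 | {…}
2 | {…}
3 | {…}
… | …

Example vertex properties: name: Frodo Baggins, height: 1.22m, gender: male, city: Bag End

Embeddings (DataSet<Embedding>):
h.id | h.name | h.height | …
31 | Frodo | 1.22 | …

h.id
32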
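The FlatMap(Vertex -> Embedding) step combines selection and projection in one pass: a vertex that fails the predicate produces no output, and a matching vertex produces one embedding holding its identifier plus the projected property values. The sketch below shows this on an in-memory list instead of a Flink DataSet; class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical sketch of a combined filter-and-project step over vertices. */
final class FilterAndProjectSketch {

  static final class Vertex {
    final long id;
    final String label;
    final Map<String, Object> properties;

    Vertex(long id, String label, Map<String, Object> properties) {
      this.id = id;
      this.label = label;
      this.properties = properties;
    }
  }

  /** One-column embedding: the vertex id plus projected property values. */
  static final class Embedding {
    final long id;
    final List<Object> projected;

    Embedding(long id, List<Object> projected) {
      this.id = id;
      this.projected = projected;
    }
  }

  /** Emits zero or one embedding per vertex, like a flat map. */
  static List<Embedding> filterAndProject(
      List<Vertex> vertices, Predicate<Vertex> filter, List<String> projectionKeys) {
    List<Embedding> out = new ArrayList<>();
    for (Vertex v : vertices) {
      if (!filter.test(v)) {
        continue; // selection: non-matching vertices emit nothing
      }
      List<Object> values = new ArrayList<>();
      for (String key : projectionKeys) {
        values.add(v.properties.get(key)); // projection: keep requested keys only
      }
      out.add(new Embedding(v.id, values));
    }
    return out;
  }

  /** Made-up input: the Frodo vertex from the slide plus one non-matching vertex. */
  static List<Vertex> exampleVertices() {
    return List.of(
        new Vertex(31, "Hobbit", Map.of(
            "name", "Frodo Baggins", "height", 1.22, "gender", "male", "city", "Bag End")),
        new Vertex(1, "Orc", Map.of("name", "Azog")));
  }

  /** The predicate from the slide: label = Hobbit AND name = Frodo Baggins. */
  static Predicate<Vertex> isFrodo() {
    return v -> v.label.equals("Hobbit") && "Frodo Baggins".equals(v.properties.get("name"));
  }
}
```

With an empty key list, as in Project [ ], each resulting embedding holds only the vertex identifier; passing `List.of("name", "height")` would additionally carry those two property values.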


Cypher on Gradoop

JoinEmbeddings
Left: (c1:Clan)<-[:leaderOf]-(o1:Orc)
Right: (o1:Orc)-[:hates]->(o2:Orc)

$L \bowtie_{o1.id} R$

Left (DataSet<Embedding>):
c.id | _e1.id | o1.id
51 | 11 | 2
52 | 12 | 3
… | … | …

Right (DataSet<Embedding>):
o1.id | _e2.id | o2.id
2 | 13 | 5
3 | 14 | 3
… | … | …

FlatJoin(lhs, rhs -> combine(lhs, rhs))

Combine: check for vertex/edge isomorphism, remove duplicate entries

Result (DataSet<Embedding>):
c.id | _e1.id | o1.id | _e2.id | o2.id
51 | 11 | 2 | 13 | 5
52 | 12 | 3 | 14 | 3
… | … | … | … | …
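The join can be sketched as a hash join over rows of identifiers, with the combine step appending the right row minus its duplicate join column. This is an illustration on in-memory lists rather than Flink DataSets, and all names are hypothetical. The `distinctIds` flag is a simplification assumed for this sketch: when set, a combined row is kept only if all of its identifiers differ, which treats vertices and edges alike instead of configuring their morphism separately.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of joining two sets of embeddings on one ID column. */
final class JoinEmbeddingsSketch {

  /** Hash join; rows are arrays of element identifiers. */
  static List<long[]> join(
      List<long[]> left, int leftCol, List<long[]> right, int rightCol, boolean distinctIds) {
    // Build phase: index the right side by its join column.
    Map<Long, List<long[]>> index = new HashMap<>();
    for (long[] r : right) {
      index.computeIfAbsent(r[rightCol], k -> new ArrayList<>()).add(r);
    }
    // Probe phase: emit zero or more combined rows per left row, like a flat join.
    List<long[]> out = new ArrayList<>();
    for (long[] l : left) {
      for (long[] r : index.getOrDefault(l[leftCol], List.of())) {
        long[] combined = combine(l, r, rightCol);
        if (!distinctIds || allDistinct(combined)) {
          out.add(combined);
        }
      }
    }
    return out;
  }

  /** Appends the right row to the left row, dropping the duplicate join column. */
  static long[] combine(long[] l, long[] r, int rightCol) {
    long[] result = new long[l.length + r.length - 1];
    System.arraycopy(l, 0, result, 0, l.length);
    int pos = l.length;
    for (int i = 0; i < r.length; i++) {
      if (i != rightCol) {
        result[pos++] = r[i];
      }
    }
    return result;
  }

  /** True if no identifier occurs twice in the row. */
  static boolean allDistinct(long[] ids) {
    Set<Long> seen = new HashSet<>();
    for (long id : ids) {
      if (!seen.add(id)) {
        return false;
      }
    }
    return true;
  }
}
```

Joining the slide's left rows (51, 11, 2) and (52, 12, 3) with the right rows (2, 13, 5) and (3, 14, 3) on o1.id yields both combined rows when `distinctIds` is false. With `distinctIds` set to true, the second row is rejected because identifier 3 appears for both o1 and o2.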


Cypher on Gradoop

ExpandEmbeddings
Left: (o2:Orc)
Edge: (o2)-[:knows*1..10]->(h)

First step: $L \bowtie_{o2.id = \_e3.sid} E$
Iteration step (BulkIteration): $E' \bowtie_{e.tid = \_e3.sid} E$

Left (DataSet<Embedding>):
o2.id
5

Edges (DataSet<Embedding>):
_e3.sid | _e3.id | _e3.tid
5 | 26 | 31
31 | 27 | 32
32 | 28 | 33

FlatJoin(lhs, rhs -> combine(lhs, rhs))

Combine: check for vertex/edge isomorphism

Result (DataSet<Embedding>):
o2.id | _e3.id | h.id
5 | [26] | 31
5 | [26,31,27] | 32
5 | [26,31,27,32,28] | 33
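The variable-length expansion can be sketched as a loop that repeatedly joins the current frontier of paths with the edge set, where the loop plays the role of the bulk iteration. The sketch below works on in-memory lists, uses hypothetical names, and assumes an upper bound of at least 1. Never reusing an edge within a path is the only morphism check it performs; that is a simplification.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of expanding a start vertex along 1..n edges. */
final class ExpandEmbeddingsSketch {

  static final class Edge {
    final long sourceId;
    final long id;
    final long targetId;

    Edge(long sourceId, long id, long targetId) {
      this.sourceId = sourceId;
      this.id = id;
      this.targetId = targetId;
    }
  }

  /** Start vertex, path (edge id, vertex id, edge id, ...), end vertex. */
  static final class Expansion {
    final long startId;
    final List<Long> path;
    final long endId;

    Expansion(long startId, List<Long> path, long endId) {
      this.startId = startId;
      this.path = path;
      this.endId = endId;
    }
  }

  static List<Expansion> expand(
      List<Long> startIds, List<Edge> edges, int lowerBound, int upperBound) {
    // First step: join start vertices with edges on startId = sourceId.
    List<Expansion> frontier = new ArrayList<>();
    for (long start : startIds) {
      for (Edge e : edges) {
        if (e.sourceId == start) {
          frontier.add(new Expansion(start, List.of(e.id), e.targetId));
        }
      }
    }
    List<Expansion> results = new ArrayList<>();
    for (int hops = 1; hops <= upperBound && !frontier.isEmpty(); hops++) {
      if (hops >= lowerBound) {
        results.addAll(frontier);
      }
      if (hops == upperBound) {
        break;
      }
      // Iteration step: join the frontier with edges on endId = sourceId.
      List<Expansion> next = new ArrayList<>();
      for (Expansion x : frontier) {
        for (Edge e : edges) {
          if (e.sourceId == x.endId && !usesEdge(x.path, e.id)) {
            List<Long> path = new ArrayList<>(x.path);
            path.add(x.endId); // former end vertex becomes an inner vertex
            path.add(e.id);
            next.add(new Expansion(x.startId, path, e.targetId));
          }
        }
      }
      frontier = next;
    }
    return results;
  }

  /** Edge ids sit at even positions of the path. */
  static boolean usesEdge(List<Long> path, long edgeId) {
    for (int i = 0; i < path.size(); i += 2) {
      if (path.get(i) == edgeId) {
        return true;
      }
    }
    return false;
  }
}
```

Starting from vertex 5 with the three edges listed on the slide and bounds 1..10, the sketch produces paths [26], [26,31,27] and [26,31,27,32,28], ending at vertices 31, 32 and 33. The loop stops after three hops because the frontier becomes empty.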


Conclusion

• Implement Cypher Technology Compatibility Kit (TCK) integration tests
• Benchmarking
  • Implement and evaluate LDBC benchmarking queries
• Optimizations
  • DP-Planner
  • Improve cost model (more statistics, Flink optimizer hints)
  • Reuse of intermediate results
  • Consider graph partitioning
• Support more Cypher features
  • e.g. Aggregation and Functions
• Introduce new Cypher features
  • e.g. regular path queries


Conclusion

• Gradoop on Apache Flink
  • Extended Property Graph abstraction on Apache Flink
  • Schema flexible: Type Labels and Properties
  • Logical Graphs / Graph Collections
  • Graph Transformations for Graphs and Graph Collections
• Cypher on Gradoop
  • Covering many Cypher features (variable length paths, predicates)
  • Query execution engine incl. greedy cost-based optimizer
  • Physical operators mapped to Flink transformations


www.gradoop.com
