Extended Property Graphs and Cypher on Gradoop · Extended Property Graphs and Cypher on Gradoop...

Post on 12-Aug-2020

4 views 0 download

Transcript of Extended Property Graphs and Cypher on Gradoop · Extended Property Graphs and Cypher on Gradoop...

Extended Property Graphs and Cypher on Gradoop

1st openCypher Implementers Meeting

8 February 2017

Walldorf, Germany

Martin Junghanns University of Leipzig – Database Research Group

Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 2

Gradoop Extended Property Graphs Conclusion Cypher on Gradoop

2

Gradoop

„An open-source graph dataflow system for declarative analytics of heterogeneous graph data.“

Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 3

Gradoop Extended Property Graphs Conclusion Cypher on Gradoop

3

Gradoop

Distributed Graph Storage (Apache HDFS)

Apache Flink Operator Implementation

Distributed Operator Execution (Apache Flink)

Extended Property Graph Model (EPGM)

Graph Dataflow Operators

I/O

Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 4

Gradoop Extended Property Graphs Conclusion Cypher on Gradoop

4

Extended Property Graphs

• Vertices and directed Edges

• Logical Graphs

• Identifiers

• Type Labels

• Properties

1 2

3

4

5 1 2 3

4

5

2

1

Hobbit name : Samwise

Orc name : Azog

Clan name : Tribes of Moria founded : 1981

Orc name : Bolg

Hobbit name : Frodo yob : 2968

leaderOf since : 2790

memberOf since : 2013

hates since : 2301

hates

knows since : 2990

|Area|title:Mordor

|Area|title:Shire

Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 5

Gradoop Extended Property Graphs Conclusion Cypher on Gradoop

5

Extended Property Graphs

Graph Operators/Transformations

Unary Binary

Gra

ph

Co

llect

ion

Lo

gica

l Gra

ph

Equality

Union

Intersection

Difference

Limit

Selection

Pattern Matching

Distinct

Apply

Reduce

Call

Aggregation

Pattern Matching

Transformation

Grouping

Call

Subgraph

Equality

Combination

Overlap

Exclusion

Fusion

Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 6

Gradoop Extended Property Graphs Conclusion Cypher on Gradoop

6

Extended Property Graphs

3

1 3

4

5 2 3

4

1 2 UDF

LogicalGraph graph3 = readFromHDFS(); LogicalGraph graph4 = graph3.subgraph( (vertex => vertex.getLabel().equals(‘Green’)), (edge => edge.getLabel().equals(‘orange’)));

Subgraph

Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 7

Gradoop Extended Property Graphs Conclusion Cypher on Gradoop

7

Extended Property Graphs

3

1 3

4

5 2

Pattern

4 5

1 3

4

2

Graph Collection

GraphCollection collection = graph3.match(‘(:Green)-[:orange]->(:Orange)’);

Pattern Matching (Single Graph Input)

Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 8

Gradoop Extended Property Graphs Conclusion Cypher on Gradoop

8

Cypher on Gradoop

„Which two clan leaders hate each other and one of them knows Frodo over one to ten hops?“

Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 9

Gradoop Extended Property Graphs Conclusion Cypher on Gradoop

9

Cypher on Gradoop

Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 10

Gradoop Extended Property Graphs Conclusion Cypher on Gradoop

10

Cypher on Gradoop

PlanTableEntry | type: GRAPH | all-vars: [...] | proc-vars: [...] | attr-vars: [] | est-card: 23 | prediates: () | Plan : |-FilterEmbeddingsNode{filterPredicate=((c1 != c2) AND (o1 != o2))} |.|-JoinEmbeddingsNode{joinVariables=[o2], vertexMorphism=H, edgeMorphism=I} |.|.|-JoinEmbeddingsNode{joinVariables=[o1], vertexMorphism=H, edgeMorphism=I} |.|.|.|-JoinEmbeddingsNode{joinVariables=[c1], vertexMorphism=H, edgeMorphism=I} |.|.|.|.|-FilterAndProjectVerticesNode{vertexVar=c1, filterPredicate=((c1.label = Clan)), projectionKeys=[]} |.|.|.|.|-FilterAndProjectEdgesNode{sourceVar='o1', edgeVar='_e0', targetVar='c1', filterPredicate=((_e0.label = leaderOf)), projectionKeys=[]} |.|.|.|-JoinEmbeddingsNode{joinVariables=[o1], vertexMorphism=H, edgeMorphism=I} |.|.|.|.|-FilterAndProjectVerticesNode{vertexVar=o1, filterPredicate=((o1.label = Orc)), projectionKeys=[]} |.|.|.|.|-FilterAndProjectEdgesNode{sourceVar='o1', edgeVar='_e1', targetVar='o2', filterPredicate=((_e1.label = hates)), projectionKeys=[]} |.|.|-JoinEmbeddingsNode{joinVariables=[o2], vertexMorphism=H, edgeMorphism=I} |.|.|.|-JoinEmbeddingsNode{joinVariables=[h], vertexMorphism=H, edgeMorphism=I} |.|.|.|.|-FilterAndProjectVerticesNode{vertexVar=h, filterPredicate=((h.label = Hobbit) AND (h.name = Frodo Baggins)), projectionKeys=[]} |.|.|.|.|-ExpandEmbeddingsNode={startVar='o2', pathVar='_e3', endVar='h', lb=1, ub=10, direction=OUT, vertexMorphism=H, edgeMorphism=I} |.|.|.|.|.|-FilterAndProjectVerticesNode{vertexVar=o2, filterPredicate=((o2.label = Orc)), projectionKeys=[]} |.|.|.|.|.|-FilterAndProjectEdgesNode{sourceVar='o2', edgeVar='_e3', targetVar='h', filterPredicate=((_e3.label = knows)), projectionKeys=[]} |.|.|.|-JoinEmbeddingsNode{joinVariables=[c2], vertexMorphism=H, edgeMorphism=I} |.|.|.|.|-FilterAndProjectVerticesNode{vertexVar=c2, filterPredicate=((c2.label = Clan)), projectionKeys=[]} |.|.|.|.|-FilterAndProjectEdgesNode{sourceVar='o2', edgeVar='_e2', targetVar='c2', filterPredicate=((_e2.label = leaderOf)), projectionKeys=[]}

Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 11

Gradoop Extended Property Graphs Conclusion Cypher on Gradoop

11

Cypher on Gradoop

0 1 2 3 4 5 6 7 8 9

1 37 5 3 7 8 45 99 12 3

0 1 2

Frodo Baggins 1.22 Saruman

0

45: [4,1,33]

EmbeddingMetaData – Stores information about the embedding content

Mapping : Variable -> ID Column {h: 0, e1: 1, o2: 5, ...}

Mapping : Variable.Property -> Property Column {h.name: 0, h.height: 1, c1.name: 2, ...}

Embedding - Data structure used for intermediate results

Identifiers

Properties

Paths

Embedding

Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 12

Gradoop Extended Property Graphs Conclusion Cypher on Gradoop

12

Cypher on Gradoop

Filter Hobbit(name=Frodo Baggins)

name: Frodo Baggins height: 1.22m gender: male city: Bag End

Project [ ]

h.id h.name h.height …

31 Frodo 1.22 …

h.id

32

id Properties

1 {…}

2 {…}

3 {…}

… …

DataSet<Vertex> DataSet<Embedding>

FlatMap(Vertex -> Embedding)

𝜋ℎ.𝐼𝑑(𝑉′) 𝜎 𝐿𝑎𝑏𝑒𝑙=𝐻𝑜𝑏𝑏𝑖𝑡

∧𝑛𝑎𝑚𝑒=𝐹𝑟𝑜𝑑𝑜(𝑉)

Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 13

Gradoop Extended Property Graphs Conclusion Cypher on Gradoop

13

Cypher on Gradoop

c.id _e1.id o1.id

51 11 2

52 12 3

… … …

DataSet<Embedding> DataSet<Embedding>

FlatJoin(lhs, rhs -> combine(lhs, rhs)) DataSet<Embedding>

o1.id _e2.id o2.id

2 13 5

3 14 3

… … …

c.id _e1.id o1.id _e2.id o2.id

51 11 2 13 5

52 12 3 14 3

… … …

Combine Check for vertex/edge isomorphism,

Remove duplicate entries

JoinEmbeddings Left: (c1:Clan)<-[:hasLeader]-(o1:Orc) Right: (o1:Orc)-[:hates]->(o2.Orc)

𝐿 ⋈𝑜1.𝑖𝑑 𝑅

Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 14

Gradoop Extended Property Graphs Conclusion Cypher on Gradoop

14

Cypher on Gradoop

ExpandEmbeddings

o2.id

5

DataSet<Embedding> DataSet<Embedding>

DataSet<Embedding>

_e3.sid _e3.id _e3.tid

5 26 31

31 27 32

32 28 33

o2.id _e3.id h.id

3 [26] 31

3 [26,31,27] 32

3 [26,31,27,32,28] 33

FlatJoin(lhs, rhs -> combine(lhs, rhs))

BulkIteration

𝐿 ⋈𝑜2.𝑖𝑑=_𝑒3.𝑠𝑖𝑑 𝐸

Left: (o2:Orc) Edge: (o2)-[:knows*1..10]->(h)

𝐸′ ⋈𝑒.𝑡𝑖𝑑=_𝑒3.𝑠𝑖𝑑 𝐸

Combine Check for vertex/edge isomorphism

Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 15

Gradoop Extended Property Graphs Conclusion Cypher on Gradoop

15

Cypher on Gradoop

Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 16

Gradoop Extended Property Graphs Conclusion Cypher on Gradoop

16

Cypher on Gradoop

Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 17

Gradoop Extended Property Graphs Conclusion Cypher on Gradoop

17

Cypher on Gradoop

Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 18

Gradoop Extended Property Graphs Conclusion Cypher on Gradoop

18

Conclusion

• Implement Cypher Technology Compatibility KIT (TCK) integration tests • Benchmarking

• Implement and evaluate LDBC benchmarking queries

• Optimizations • DP-Planner • Improve cost model (more statistics, Flink optimizer hints) • Reuse of intermediate results • Consider graph partitioning

• Support more Cypher features • e.g. Aggregation and Functions

• Introduce new Cypher features • e.g. regular path queries

Extended Property Graphs and Cypher on Gradoop – 1st openCypher Implementers Meeting – Martin Junghanns 19

Gradoop Extended Property Graphs Conclusion Cypher on Gradoop

19

Conclusion

• Gradoop on Apache Flink • Extended Property Graph abstraction on Apache Flink • Schema flexible: Type Labels and Properties • Logical Graphs / Graphs Collection • Graph Transformations for Graphs and Graph collections

• Cypher on Gradoop • Covering many Cypher features (variable length paths, predicates) • Query execution engine incl. Greedy cost-based optimizer • Physical operators mapped to Flink transformations

www.gradoop.com

[1] Junghanns, M.; Petermann, A.; K.; Rahm, E., „Distributed Grouping of Property Graphs with Gradoop“, Proc. BTW Conf. , 2017.

[2] Petermann, A.; Junghanns, M.; Kemper, S.; Gomez, K.; Teichmann, N.; Rahm, E., „Graph Mining for Complex Data Analytics “,

Proc. ICDM Conf. (Demo), 2016.

[3] Junghanns, M.; Petermann, A.; Teichmann, N.; Gomez, K.; Rahm, E., „Analyzing Extended Property Graphs with Apache Flink“,

Int. Workshop on Network Data Analytics (NDA), SIGMOD, 2016.

[4] Petermann, A.; Junghanns, M., „Scalable Business Intelligence with Graph Collections“,

it – Special Issue on Big Data Analytics, 2016.

[5] Petermann, A.; Junghanns, M.; Müller, M.; Rahm, E., „Graph-based Data Integration and Business Intelligence with BIIIG“,

Proc. VLDB Conf. (Demo), 2014.