RDFPath: Path Query Processing on Large RDF Graphs with …zablocki/talks/RDFPath_Path_Query... ·...

39
RDFPath Path Query Processing on Large RDF Graphs with MapReduce 1 st Workshop on High-Performance Computing for the Semantic Web (HPCSW 2011) Martin Przyjaciel-Zablocki Alexander Schätzle Thomas Hornung Georg Lausen University of Freiburg Databases & Information Systems 29 May 2011

Transcript of RDFPath: Path Query Processing on Large RDF Graphs with …zablocki/talks/RDFPath_Path_Query... ·...

Page 1: RDFPath: Path Query Processing on Large RDF Graphs with …zablocki/talks/RDFPath_Path_Query... · RDFPath Path Query Processing on Large RDF Graphs with MapReduce 1st Workshop on

RDFPath Path Query Processing on Large RDF Graphs

with MapReduce

1st Workshop on High-Performance Computing

for the Semantic Web (HPCSW 2011)

Martin Przyjaciel-Zablocki Alexander Schätzle Thomas Hornung Georg Lausen

University of Freiburg Databases & Information Systems

29 May 2011

Page 2: RDFPath: Path Query Processing on Large RDF Graphs with …zablocki/talks/RDFPath_Path_Query... · RDFPath Path Query Processing on Large RDF Graphs with MapReduce 1st Workshop on

1. Motivation

2. MapReduce

3. RDFPath

4. Experimental Results

5. Summary

RDFPath: Path Query Processing on Large RDF Graphs 2 0. Overview

Overview

Page 3: RDFPath: Path Query Processing on Large RDF Graphs with …zablocki/talks/RDFPath_Path_Query... · RDFPath Path Query Processing on Large RDF Graphs with MapReduce 1st Workshop on

Semantic Web ◦ Amount of Semantic Data increases steadily ◦ RDF is W3C standard for representing Semantic Data

Social Networks

◦ > 500 million active users in Facebook ◦ Interacting with >900 million objects (pages, groups, events) ◦ Social Graphs can also be represented in RDF

How to explore, analyze such large RDF Graphs?

Our Approach: Distributed analysis of large RDF Graphs by mapping RDFPath to MapReduce

RDFPath: Path Query Processing on Large RDF Graphs 3 1. Motivation

Large RDF Graphs

Large RDF Graphs

Page 4: RDFPath: Path Query Processing on Large RDF Graphs with …zablocki/talks/RDFPath_Path_Query... · RDFPath Path Query Processing on Large RDF Graphs with MapReduce 1st Workshop on

MapReduce ◦ Popularized by Google & widely used ◦ Automatic parallelization of computations ◦ Fixed two-stage process: Map Reduce Map …

Distributed File System ◦ Clusters of commodity hardware Fault tolerance by replication

◦ Very large files / write-once, read-many pattern

Apache Hadoop ◦ Well-known Open-Source implementation ◦ Used by Yahoo, Facebook, Amazon, IBM, Last.fm, …

RDFPath: Path Query Processing on Large RDF Graphs 4 2. MapReduce

MapReduce

Page 5: RDFPath: Path Query Processing on Large RDF Graphs with …zablocki/talks/RDFPath_Path_Query... · RDFPath Path Query Processing on Large RDF Graphs with MapReduce 1st Workshop on

RDFPath: Path Query Processing on Large RDF Graphs 5 2. MapReduce

MapReduce (2)

Steps of a MapReduce execution

Signatures ◦ Map: (in_key, in_value) list (out_key, intermediate_value)

◦ Reduce: (out_key, list (intermediate_value)) list (out_value)

split 1

split 0 Map

Map

Map

Reduce

Reduce

output 0

output 1

Map phase Shuffle & Sort Reduce phase

split 2

split 3

split 4

split 5

Input

(DFS)

Intermediate Results

(Local)

Output

(DFS)

Page 6: RDFPath: Path Query Processing on Large RDF Graphs with …zablocki/talks/RDFPath_Path_Query... · RDFPath Path Query Processing on Large RDF Graphs with MapReduce 1st Workshop on

3. RDFPath Path queries on RDF Graphs

RDFPath: Path Query Processing on Large RDF Graphs 6 3. RDFPath

Page 7: RDFPath: Path Query Processing on Large RDF Graphs with …zablocki/talks/RDFPath_Path_Query... · RDFPath Path Query Processing on Large RDF Graphs with MapReduce 1st Workshop on

Idea ◦ Navigational queries on RDF Graphs

◦ Declarative path specification inspired by XPath

◦ Particularly designed with regard to MapReduce Every location step is mapped to one MapReduce job

◦ Extendibility

Features ◦ Fixed-length paths, shortest paths

◦ Aggregate functions, degree of a node

◦ Adjacent nodes and edges

◦ …

RDFPath: Path Query Processing on Large RDF Graphs 7 3. RDFPath

Path Language

RDFPath

Page 8: RDFPath: Path Query Processing on Large RDF Graphs with …zablocki/talks/RDFPath_Path_Query... · RDFPath Path Query Processing on Large RDF Graphs with MapReduce 1st Workshop on

Allen :: knows [country=equals("DE")] > age

Results ◦ Allen (knows) Jacob [country="DE"] (age) 42

◦ Allen (knows) Sarah [country="CH"]

◦ Allen (knows) Chris [country="CH"]

RDFPath: Path Query Processing on Large RDF Graphs 8 3. RDFPath

Example Queries

Example (1)

knows

country

age

Allen

Jacob

Emily

Sarah

Chris

CH

26

DE 42

CH CH

Page 9: RDFPath: Path Query Processing on Large RDF Graphs with …zablocki/talks/RDFPath_Path_Query... · RDFPath Path Query Processing on Large RDF Graphs with MapReduce 1st Workshop on

Allen :: knows(*3)

Results ◦ Allen (knows) Jacob

◦ Allen (knows) Jacob (knows) Emily

◦ Allen (knows) Chris

◦ Allen (knows) Sarah

RDFPath: Path Query Processing on Large RDF Graphs 9 3. RDFPath

Example Queries

Example (2)

knows

country

age

Allen

Jacob

Emily

Sarah

Chris

CH

26

DE 42

CH CH

examples

features

Page 10: RDFPath: Path Query Processing on Large RDF Graphs with …zablocki/talks/RDFPath_Path_Query... · RDFPath Path Query Processing on Large RDF Graphs with MapReduce 1st Workshop on

System Architecture

Storing RDF Graphs ◦ RDF Graph is loaded in advance

◦ Vertical Partitioning by predicate

◦ Sequence Files stored in HDFS contain PathObjects

RDFPath: Path Query Processing on Large RDF Graphs 10 3. RDFPath

Implementation

Implementation

Query Processor

Query Engine

Hadoop Cluster

HDFS MapReduce Framework

Triple Loader

RDFPath Query

RDF Graph

Page 11: RDFPath: Path Query Processing on Large RDF Graphs with …zablocki/talks/RDFPath_Path_Query... · RDFPath Path Query Processing on Large RDF Graphs with MapReduce 1st Workshop on

Query Processing

RDFPath: Path Query Processing on Large RDF Graphs 11 3. RDFPath

Implementation

Implementation (2)

Query

Query Parser

Instruction Sequence

Instruction 1

Instruction n

...

Query Processor

MapReduce Plan

Assignment 1

Assignment n

...

Query Engine

Instruction

Interpreter

Job 1 Config 1

Job n Config n

MapReduce

Framework HDFS

Hadoop Cluster

Results

Page 12: RDFPath: Path Query Processing on Large RDF Graphs with …zablocki/talks/RDFPath_Path_Query... · RDFPath Path Query Processing on Large RDF Graphs with MapReduce 1st Workshop on

Mapping to MapReduce Jobs

◦ Allen :: knows(*2) > country [equals("DE")]

Map task: ◦ Tagging intermediate paths and country partition for join

◦ Applying filter condition

Reduce task: ◦ Perform join and store resulting paths back to HDFS

RDFPath: Path Query Processing on Large RDF Graphs 12 3. RDFPath

Implementation

Implementation (3)

Allen (knows) Jacob (knows) Emily Allen (knows) Jacob Allen (knows) Sarah

intermediate paths

Emily CH Jacob DE Sarah CH

country

Join

mapping

Page 13: RDFPath: Path Query Processing on Large RDF Graphs with …zablocki/talks/RDFPath_Path_Query... · RDFPath Path Query Processing on Large RDF Graphs with MapReduce 1st Workshop on

4. Experimental Results Properties of RDFPath

RDFPath: Path Query Processing on Large RDF Graphs 13 4. Evaluation

Page 14: RDFPath: Path Query Processing on Large RDF Graphs with …zablocki/talks/RDFPath_Path_Query... · RDFPath Path Query Processing on Large RDF Graphs with MapReduce 1st Workshop on

Environment Setup ◦ Cluster of 10 machines (Dual Core CPU, 4GB RAM, 1GB HD)

◦ Cloudera‘s Distribution for Hadoop 3 Beta 1

◦ Default configuration with 9 reducers (one per HD)

Real-World Last.fm dataset ◦ 225 million RDF triples

Different input-sizes instead of varying the number of machines

RDFPath: Path Query Processing on Large RDF Graphs 14 4. Evaluation

Setup

Experiments

Page 15: RDFPath: Path Query Processing on Large RDF Graphs with …zablocki/talks/RDFPath_Path_Query... · RDFPath Path Query Processing on Large RDF Graphs with MapReduce 1st Workshop on

The Disco Boys - I Surrender :: trackSimilar(*4) > trackAlbum

Linear scaling behavior for execution times & amount of data

Execution times mainly influenced by number of intermediate results stored locally as well as transferred data

RDFPath: Path Query Processing on Large RDF Graphs 15 4. Evaluation

Query 1

Query 1

1:00

2:00

3:00

4:00

5:00

6:00

0 50 100 150 200

Tim

e (m

m:s

s)

RDF Triples (million)

0

10

20

30

40

50

0 50 100 150 200

Sto

rage (G

B)

RDF Triples (million)

SHUFFLE

LOCAL

HDFS

Page 16: RDFPath: Path Query Processing on Large RDF Graphs with …zablocki/talks/RDFPath_Path_Query... · RDFPath Path Query Processing on Large RDF Graphs with MapReduce 1st Workshop on

Chris :: knows(*X) , 1 ≤ X ≤ 10

98% of all reachable persons can be reached by at most 7 edges Corresponds to the well-known six degrees of separation paradigm

Linear scaling behavior, mainly determined by number of joins

RDFPath: Path Query Processing on Large RDF Graphs 16 4. Evaluation

Query 3

Query 3

0

20

40

60

80

100

1 2 3 4 5 6 7 8 9 10

% o

f re

achable

people

Path length

50

100

200

Triples (million)

0:00

1:00

2:00

3:00

4:00

5:00

6:00

7:00

1 2 3 4 5 6 7 8 9 10

Tim

e (m

m:s

s)

Path length

50

100

200

Triples (million)

queries

Page 17: RDFPath: Path Query Processing on Large RDF Graphs with …zablocki/talks/RDFPath_Path_Query... · RDFPath Path Query Processing on Large RDF Graphs with MapReduce 1st Workshop on

Chris :: knows(*X) , 1 ≤ X ≤ 10

98% of all reachable persons can be reached by at most 7 edges Corresponds to the well-known six degrees of separation paradigm

Linear scaling behavior, mainly determined by number of joins

RDFPath: Path Query Processing on Large RDF Graphs 17 4. Evaluation

Query 3

Query 3

0

20

40

60

80

100

1 2 3 4 5 6 7 8 9 10

% o

f re

achable

people

Path length

50

100

200

Triples (million)

0:00

1:00

2:00

3:00

4:00

5:00

6:00

7:00

1 2 3 4 5 6 7 8 9 10

Tim

e (m

m:s

s)

Path length

50

100

200

Triples (million)

queries

Page 18: RDFPath: Path Query Processing on Large RDF Graphs with …zablocki/talks/RDFPath_Path_Query... · RDFPath Path Query Processing on Large RDF Graphs with MapReduce 1st Workshop on

Amount of Semantic Web data is growing constantly!

RDFPath ◦ Intuitive syntax for path queries ◦ Expression of interesting graph issues such as Friend of a Friend queries ◦ Investigating small world properties like six degrees of separation ◦ Designed with MapReduce in mind

Execution ◦ Direct mapping to MapReduce ◦ Scaling properties of MapReduce enable analysis of large RDF Graphs

Evaluation

◦ Based on real-world data from Last.fm ◦ Shows linear scaling behavior in the size of the graph ◦ Confirms the effectiveness of our approach

RDFPath: Path Query Processing on Large RDF Graphs 18 5. Summary

Summary

Page 19: RDFPath: Path Query Processing on Large RDF Graphs with …zablocki/talks/RDFPath_Path_Query... · RDFPath Path Query Processing on Large RDF Graphs with MapReduce 1st Workshop on

RDFPath: Path Query Processing on Large RDF Graphs 19 Thanks

Thanks for your attention.

Page 20: RDFPath: Path Query Processing on Large RDF Graphs with …zablocki/talks/RDFPath_Path_Query... · RDFPath Path Query Processing on Large RDF Graphs with MapReduce 1st Workshop on

Backup Slides a) Last.fm

b) Examples cont.

c) Evaluation cont.

d) RDFPath Features

e) Mapping

f) Comparison

g) Reduce-Side-Join

RDFPath: Path Query Processing on Large RDF Graphs 20

Thanks

Page 21: RDFPath: Path Query Processing on Large RDF Graphs with …zablocki/talks/RDFPath_Path_Query... · RDFPath Path Query Processing on Large RDF Graphs with MapReduce 1st Workshop on

RDFPath: Path Query Processing on Large RDF Graphs 21 4. Evaluation

Last.fm

a) Last.fm 26,4

9

22,2

0

19,4

1

5,9

1

5,8

5

5,8

4

4,0

0

3,4

4

3,3

1

1,7

0

0,4

2

0,1

2

0,1

2

0,1

2

0,1

2

0,1

2

in %

User Track

Artist Album

Gender

Realname

Age

Country Duration

Playcount

Playcount

knows

listenedTo+ topTracks

topFans trackSimilar

artistSimilar topArtists

↴ ↶

Page 22: RDFPath: Path Query Processing on Large RDF Graphs with …zablocki/talks/RDFPath_Path_Query... · RDFPath Path Query Processing on Large RDF Graphs with MapReduce 1st Workshop on

Allen :: knows > knows [country=prefix("C")] > age [min(18)][max(80)] .avg()

Result ◦ 34

RDFPath: Path Query Processing on Large RDF Graphs 22 3. RDFPath

Example Queries

b) Example (3)

knows

country

age

Allen

Jacob

Emily

Sarah

Chris

CH

26

DE 42

CH CH

↴ ↶

Page 23: RDFPath: Path Query Processing on Large RDF Graphs with …zablocki/talks/RDFPath_Path_Query... · RDFPath Path Query Processing on Large RDF Graphs with MapReduce 1st Workshop on

* :: knows [equals("Sarah")]

Results ◦ Allen (knows) Sarah

◦ Chris (knows) Sarah

◦ Jacob (knows) Emily

◦ Allen (knows) Jacob

◦ Allen (knows) Chris

RDFPath: Path Query Processing on Large RDF Graphs 23 3. RDFPath

Example Queries

b) Example (4)

knows

country

age

Allen

Jacob

Emily

Sarah

Chris

CH

26

DE 42

CH CH

↴ ↶

Page 24: RDFPath: Path Query Processing on Large RDF Graphs with …zablocki/talks/RDFPath_Path_Query... · RDFPath Path Query Processing on Large RDF Graphs with MapReduce 1st Workshop on

* :: knows > knows [equals("Sarah")]

Results ◦ Allen (knows) Chris (knows) Sarah

◦ Allen (knows) Jacob (knows) Emily

RDFPath: Path Query Processing on Large RDF Graphs 24 3. RDFPath

Example Queries

b) Example (5)

knows

country

age

Allen

Jacob

Emily

Sarah

Chris

CH

26

DE 42

CH CH

↴ ↶

Page 25: RDFPath: Path Query Processing on Large RDF Graphs with …zablocki/talks/RDFPath_Path_Query... · RDFPath Path Query Processing on Large RDF Graphs with MapReduce 1st Workshop on

* :: * > * [equals("Sarah")]

Results ◦ Allen (knows) Chris (knows) Sarah

◦ Allen (knows) Jacob (knows) Emily

◦ ...

RDFPath: Path Query Processing on Large RDF Graphs 25 3. RDFPath

Example Queries

b) Example (6)

knows

country

age

Allen

Jacob

Emily

Sarah

Chris

CH

26

DE 42

CH CH

↴ ↶

Page 26: RDFPath: Path Query Processing on Large RDF Graphs with …zablocki/talks/RDFPath_Path_Query... · RDFPath Path Query Processing on Large RDF Graphs with MapReduce 1st Workshop on

Allen :: * .count()

Result ◦ 3

RDFPath: Path Query Processing on Large RDF Graphs 26 3. RDFPath

Example Queries

b) Example (7)

knows

country

age

Allen

Jacob

Emily

Sarah

Chris

CH

26

DE 42

CH CH

↴ ↶

Page 27: RDFPath: Path Query Processing on Large RDF Graphs with …zablocki/talks/RDFPath_Path_Query... · RDFPath Path Query Processing on Large RDF Graphs with MapReduce 1st Workshop on

Michael_Jackson :: artistTracks

[trackAlbum = equals(Michael_Jackson_-_Thriller)]

> trackSimilar [trackDuration = min(50000)]

> trackTopFans [userCountry = equals(DE)].

Results ◦ Michael_Jackson (artistTracks) Michael_Jackson_-_Beat_It (trackSimilar) Michael_Jackson_-_Billie_Jean (trackTopFans) Mark

◦ Michael_Jackson (artistTracks) Michael_Jackson_-_Someone_in_the_Dark (trackSimilar) Rihanna_-_Russian_Roulette (trackTopFans) Megan

RDFPath: Path Query Processing on Large RDF Graphs 27

b) Example (Last.fm)

3. RDFPath

Example Last.fm

↴ ↶

Page 28: RDFPath: Path Query Processing on Large RDF Graphs with …zablocki/talks/RDFPath_Path_Query... · RDFPath Path Query Processing on Large RDF Graphs with MapReduce 1st Workshop on

Michael_Jackson :: artistTracks [ trackAlbum = equals(‘Michael_Jackson_-_Thriller’) ] > trackSimilar [ trackDuration = min(50000) ] > trackTopFans [ country = equals(‘DE’) ]

RDFPath: Path Query Processing on Large RDF Graphs 28 4. Evaluation

Query 2

c) Query 2 (Backup)

2:45

2:55

3:05

3:15

3:25

0 50 100 150 200

Tim

e (m

m:s

s)

RDF Triples (million)

0

5

10

15

20

0 50 100 150 200Sto

rage (G

B)

RDF Triples (million)

SHUFFLE

LOCAL

HDFS

↴ ↶

Page 29: RDFPath: Path Query Processing on Large RDF Graphs with …zablocki/talks/RDFPath_Path_Query... · RDFPath Path Query Processing on Large RDF Graphs with MapReduce 1st Workshop on

Startnode ◦ Chris :: knows

◦ * :: knows

Location Step ◦ Chris :: knows > knows > age

◦ Chris :: knows (2) > age

◦ Chris :: *

Filters: equals(), prefix(), suffix(), min(), max() ◦ Chris :: knows > age [min(18)] [max(67)] (6)

◦ Chris :: * > * [equals('Peter')] (7)

◦ Chris :: knows [age = min(30)] [country = prefix('D')] > name

Bounded Search ◦ Chris :: knows [country = equals('DE')] (*3)

Bounded Shortest Path ◦ Chris :: knows (*3).distance('Peter')

Aggregation Functions: count(), sum(), avg(), min() and max() ◦ Chris :: *.count()

◦ Chris :: knows > age.avg()

RDFPath: Path Query Processing on Large RDF Graphs 29 3. RDFPath

Comparison

d) RDFPath Features by Example ↴ ↶

Page 30: RDFPath: Path Query Processing on Large RDF Graphs with …zablocki/talks/RDFPath_Path_Query... · RDFPath Path Query Processing on Large RDF Graphs with MapReduce 1st Workshop on

Composition ◦ Queries in RDFPath are composed of location steps

◦ One location step is mapped to a join in MapReduce

Map tasks: ◦ Tagging for RSJs

◦ Applying filter conditions

Reduce tasks: ◦ Performing RSJs

◦ Aggregating values

◦ Computing Shortest paths

◦ Checking for Cycles

RDFPath: Path Query Processing on Large RDF Graphs 30 3. RDFPath

Implementation

e) Mapping to MapReduce Jobs ↴ ↶

Page 31: RDFPath: Path Query Processing on Large RDF Graphs with …zablocki/talks/RDFPath_Path_Query... · RDFPath Path Query Processing on Large RDF Graphs with MapReduce 1st Workshop on

RDFPath: Path Query Processing on Large RDF Graphs 31 3. RDFPath

Comparison

f) RDF Query Languages ↴ ↶

Page 32: RDFPath: Path Query Processing on Large RDF Graphs with …zablocki/talks/RDFPath_Path_Query... · RDFPath Path Query Processing on Large RDF Graphs with MapReduce 1st Workshop on

Adjacent nodes: ◦ All nodes adjacent to a startnode

Adjacent edges: ◦ All predicates of statements involving a given startnode

Degree of a node: ◦ Number of predicates involving a startnode

Path: ◦ Find paths between a fixed startnode and an endnote of arbitrary length

Fixed-length paths: ◦ Find all paths of length 2 between startnote and endnote

Distance between two resources: ◦ (length of shortest path without a max length!)

Diameter of a graph: ◦ Diameter of a given RDF Graph.

RDFPath: Path Query Processing on Large RDF Graphs 32 3. RDFPath

Comparison

f) RDF Query Languages (2) ↴ ↶

Page 33: RDFPath: Path Query Processing on Large RDF Graphs with …zablocki/talks/RDFPath_Path_Query... · RDFPath Path Query Processing on Large RDF Graphs with MapReduce 1st Workshop on

SPARQL

f) Comparison to SPARQL 1.0

RDFPath

Fixed-length paths

SELECT ?tmp1, ?tmp2, ?tmp3,

?tmp4, ?tmp5

WHERE {

Allen knows ?tmp1 .

?tmp1 knows ?tmp2 .

?tmp2 knows ?tmp3 .

?tmp3 knows ?tmp4 .

?tmp5 knows ?tmp5

}

Allen :: knows(5)

RDFPath: Path Query Processing on Large RDF Graphs 33 3. RDFPath

Comparison

↴ ↶

Page 34: RDFPath: Path Query Processing on Large RDF Graphs with …zablocki/talks/RDFPath_Path_Query... · RDFPath Path Query Processing on Large RDF Graphs with MapReduce 1st Workshop on

SPARQL

f) Comparison to SPARQL 1.0

RDFPath

Filters

SELECT ?tmp1, ?tmp2, ?tmp3, ?tmp4

WHERE {

Allen knows ?tmp1 .

?tmp1 age ?age1 .

Filter (?age1 >= 20)

?tmp1 knows ?tmp2 .

?tmp2 country ‚DE‛ .

}

Allen :: knows [age=min(20)] > knows [country=equals(‚DE‛)]

RDFPath: Path Query Processing on Large RDF Graphs 34 3. RDFPath

Comparison

↴ ↶

Page 35: RDFPath: Path Query Processing on Large RDF Graphs with …zablocki/talks/RDFPath_Path_Query... · RDFPath Path Query Processing on Large RDF Graphs with MapReduce 1st Workshop on

Drawbacks of SPARQL ◦ Limited navigational capabilities

◦ No Shortest Path Queries

◦ No Aggregate functions (count, min, max , …)

◦ No Abbreviations for following same edges

Advantages of SPARQL ◦ Expressive general purpose query language for RDF

◦ Intuitive Syntax, related to SQL

◦ SPARQL 1.1 adds support for some path queries

RDFPath: Path Query Processing on Large RDF Graphs 35 3. RDFPath

Comparison

f) Comparison to SPARQL 1.0 ↴ ↶

Page 36: RDFPath: Path Query Processing on Large RDF Graphs with …zablocki/talks/RDFPath_Path_Query... · RDFPath Path Query Processing on Large RDF Graphs with MapReduce 1st Workshop on

Example:

* :: knows(*2) > knows

RDFPath: Path Query Processing on Large RDF Graphs 36 3. RDFPath

Reduce-Side-Join

g) Reduce-Side Join

Peter (knows) Frank (knows) Chris Peter (knows) Simon (knows) Johan Klaus (knows) Simon (knows) Johan Frank (knows) Chris Peter (knows) Klaus

intermediate paths

Chris Peter Johan Frank Johan Lukas Frank Chris

knows Reduce-Sid

e Jo

in

↴ ↶

Page 37: RDFPath: Path Query Processing on Large RDF Graphs with …zablocki/talks/RDFPath_Path_Query... · RDFPath Path Query Processing on Large RDF Graphs with MapReduce 1st Workshop on

Mapper Input Mapper Output

RDFPath: Path Query Processing on Large RDF Graphs 37 3. RDFPath

Reduce-Side-Join

g) Reduce-Side Join (2)

Peter (knows) Frank (knows) Chris Peter (knows) Simon (knows) Johan Klaus (knows) Simon (knows) Johan Frank (knows) Chris Peter (knows) Klaus

intermediate paths

Chris Peter Johan Frank Johan Lukas Frank Chris

knows

(Chris, 1) Peter (knows) Frank (knows) Chris (Johan, 1) Peter (knows) Simon (knows) Johan (Johan, 1) Klaus (knows) Simon (knows) Johan (Chris, 1) Frank (knows) Chris (Klaus, 1) Peter (knows) Klaus

(Chris, 0) Peter (Johan, 0) Frank (Johan, 0) Lukas (Frank, 0) Chris …

Key Value

↴ ↶

Page 38: RDFPath: Path Query Processing on Large RDF Graphs with …zablocki/talks/RDFPath_Path_Query... · RDFPath Path Query Processing on Large RDF Graphs with MapReduce 1st Workshop on

Sorting phase: (1) Partition according to the first keypair % #reducer (2) Sort within a partition according to the whole keypair

Consequences ◦ A reducer gets all values with the same first keypair

◦ The values within a partition contain at first all new nodes and thereafter all intermediate paths

RDFPath: Path Query Processing on Large RDF Graphs 38 3. RDFPath

Reduce-Side-Join

g) Reduce-Side Join (3) ↴ ↶

Page 39: RDFPath: Path Query Processing on Large RDF Graphs with …zablocki/talks/RDFPath_Path_Query... · RDFPath Path Query Processing on Large RDF Graphs with MapReduce 1st Workshop on

Reducer Input Reducer Output

RDFPath: Path Query Processing on Large RDF Graphs 39 3. RDFPath

Reduce-Side-Join

g) Reduce-Side Join (4)

Johan Frank Lukas Peter (knows) Simon (knows) Johan Klaus (knows) Simon (knows) Johan

Chris Peter Manu Peter (knows) Frank (knows) Chris Frank (knows) Chris

Peter (knows) Frank (knows) Chris (knows) Peter Peter (knows) Frank (knows) Chris (knows) Manu Frank (knows) Chris (knows) Peter Frank (knows) Chris (knows) Manu

… Peter (knows) Simon (knows) Johan (knows) Frank Peter (knows) Simon (knows) Johan (knows) Lukas Klaus (knows) Simon (knows) Johan (knows) Frank Klaus (knows) Simon (knows) Johan (knows) Lukas

↴ ↶