
An Introduction to Graph Analytics Platforms (Very Short Version)

M. Tamer Ozsu

University of Waterloo
David R. Cheriton School of Computer Science

© M. Tamer Ozsu TD-LSG (2018/08/31) 1 / 46

Outline

1 Introduction – Graph Types
2 Property Graph Processing
    Classification
    Online querying
    Offline analytics
3 Graph Analytics Approaches
    MapReduce & Variants
    Classification of Native Approaches
4 Graph Analytics Systems
5 OLAP-Style Analytics
    Graph Summarization
    Snapshot-based Aggregation
    Graph Cube
    Pagrol
    Gagg Model


Graph Types

Property graph

[Figure: property graph of a movie dataset. Vertices and their properties:
    film 2014: (initial release date, “1980-05-23”), (label, “The Shining”), (music contributor, music contributor/4110), (language, (iso639 3/eng) with (label, “English”), (usedIn, iso3166/CA), (usesScript, script/latn))
    books 0743424425: (rating, 4.7)
    StephenKing
    offers 0743424425amazonOffer
    geo 2635167: (name, “United Kingdom”), (population, 62348447)
    UnitedKingdom
    actor 29704: (actor name, “Jack Nicholson”)
    film 3418: (label, “The Passenger”)
    film 1267: (label, “The Last Tycoon”)
    director 8476: (director name, “Stanley Kubrick”)
    film 2685: (label, “A Clockwork Orange”)
    film 424: (label, “Spartacus”)
    actor 30013: (actor name, “Shelley Duvall”)
Edge labels: creator, wikipediaArticle, relatedBook, hasOffer, based near, actor, director.]


RDF graph

[Figure: the same movie data as an RDF graph.
Resources: mdb:film/2014, mdb:film/3418, mdb:film/1267, mdb:film/2685, mdb:film/424, mdb:actor/29704, mdb:actor/30013, mdb:director/8476, bm:books/0743424425, bm:persons/StephenKing, bm:offers/0743424425amazonOffer, geo:2635167, wp:UnitedKingdom, lexvo:iso639_3/eng, lexvo:iso3166/CA, lexvo:script/latin.
Literals: “1980-05-23”, “The Shining”, 4.7, “United Kingdom”, 62348447, “Jack Nicholson”, “Shelley Duvall”, “Stanley Kubrick”, “The Passenger”, “The Last Tycoon”, “A Clockwork Orange”, “Spartacus”, “English”.
Predicates: movie:initial_release_date, rdfs:label, movie:music_contributor, movie:language, rev:rating, dc:creator, gn:name, gn:population, gn:wikipediaArticle, movie:actor_name, movie:director_name, lvont:usedIn, lvont:usesScript, movie:relatedBook, scam:hasOffer, foaf:based_near, movie:actor, movie:director.]

Property graph
    Workload: Online queries and analytic workloads
    Query execution: Varies

RDF graph
    Workload: SPARQL queries
    Query execution: subgraph matching by homomorphism


Classification Summary

Graph Dynamism
    Static Graphs
    Dynamic Graphs
    Streaming Graphs
    Evolving Graphs

Algorithm Types
    Offline
    Online
    Streaming
    Incremental
    Dynamic
    Batch Dynamic

Workload Types
    Online Queries
    Analytics Workloads

Example Design Points

The same classification (graph dynamism, algorithm types, workload types) is used to pick out three example design points:

    Compute the query result/perform analytic computation over the graph as it exists.
    Compute the query result/perform analytic computation on each snapshot from scratch.
    Continuously compute the query result/perform analytic computation as the input changes.

Example Design Points – Not all alternatives make sense

Graph Type
    RDF Graphs
    Property Graphs

Graph Dynamism
    Static Graphs
    Dynamic Graphs
    Streaming Graphs

Algorithm Types
    Offline
    Online
    Dynamic
    Batch Dynamic

Workload Types
    Online Queries
    Analytics Workloads

Dynamic (or batch-dynamic) algorithms do not make sense for static graphs.


Scale-up or Scale-out? [Lin, 2018]

Scale-up: Single machine execution
    Graph datasets are small and can fit in a single machine – even in main memory
    Single machine avoids parallel execution complexities

Scale-out: Parallel execution
    Graph data sets grow when they are expanded to their storage formats
    Workstations big enough to handle even smaller datasets are still expensive
    Some graphs are very large: Alibaba: several billion vertices, > 100 million edges
    Dataset size may not be the determinant ⇒ parallelizing computation is important

Dataset            |V|              |E|               Regular size   Single Machine*
Live Journal       4,847,571        68,993,773        1.08GB         6.3GB
USA Road           23,947,347       58,333,344        951MB          9.09GB
Twitter            41,652,230       1,468,365,182     26GB           128GB
UK0705             82,240,700       2,829,101,180     48GB           247GB
World Road         682,496,072      717,016,716       15GB           194GB
CommonCrawl2014    1,727,000,000    64,422,000,000    1.3TB          Out of memory

* Using PowerLyra

We focus on parallel graph analytics systems


Graph Partitioning

Edge-cut (vertex-disjoint)
    Achieve disjoint partitions by allocating each vertex to a partition
    Objective 1: Partitions should be balanced
    Objective 2: Minimize edge-cuts (to reduce communication)
    Good for graphs with low-degree vertices, not for power-law graphs
    Examples: Hashing, METIS [Karypis and Kumar, 1995], label propagation algorithms [Ugander and Backstrom, 2013]

Vertex-cut (edge-disjoint)
    Achieve disjoint partitions by allocating each edge to a partition and replicating vertices as necessary
    Objective 1: Partitions should be balanced
    Objective 2: Minimize vertex-cuts (to reduce replica cost)
    Performs better on power-law graphs
    Examples: Hashing, greedy algorithms [Gonzalez et al., 2012]

Hybrid
    Edge-cut for low-degree vertices/vertex-cut for high-degree ones
    PowerLyra [Chen et al., 2015]
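The difference between the two partitioning objectives can be sketched with hash-based partitioners on a small star graph. This is only an illustration: the function names, the integer vertex IDs, and the hash functions (`v % k` and `(31 * u + v) % k`) are assumptions made for the example, not part of METIS, PowerGraph, or PowerLyra.

```python
from collections import defaultdict

def edge_cut(edges, k):
    """Vertex-disjoint partitioning: hash each vertex to one of k partitions."""
    part = {}
    for u, v in edges:
        part[u] = u % k          # assumed hash: vertex id modulo k
        part[v] = v % k
    # An edge is "cut" when its endpoints live in different partitions.
    cut_edges = sum(1 for u, v in edges if part[u] != part[v])
    return part, cut_edges

def vertex_cut(edges, k):
    """Edge-disjoint partitioning: hash each edge, replicate vertices as needed."""
    edge_part = {}
    replicas = defaultdict(set)  # vertex -> partitions holding a copy of it
    for u, v in edges:
        p = (31 * u + v) % k     # assumed hash on the edge
        edge_part[(u, v)] = p
        replicas[u].add(p)
        replicas[v].add(p)
    # Replication factor: average number of copies per vertex.
    replication_factor = sum(len(ps) for ps in replicas.values()) / len(replicas)
    return edge_part, replicas, replication_factor

# A star graph: hub 0 with eight leaves, the extreme case of a high-degree vertex.
star = [(0, leaf) for leaf in range(1, 9)]
_, cut = edge_cut(star, 2)
_, replicas, rf = vertex_cut(star, 2)
print(cut, sorted(replicas[0]), round(rf, 3))
```

On this star graph, hashing vertices cuts half of the hub's edges, while hashing edges only replicates the hub itself, which is the intuition behind preferring vertex-cuts for power-law graphs.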


Graph Workloads

Online graph querying
    Reachability
    Single source shortest-path
    Subgraph matching
    SPARQL queries

Offline graph analytics
    PageRank
    Clustering
    Connected components
    Diameter finding
    Graph colouring
    All pairs shortest path
    Graph pattern mining
    Machine learning algorithms (belief propagation, Gaussian non-negative matrix factorization)


Reachability Queries

[Figure: the property graph from the Graph Types slide, shown again.]

Can you reach film 1267 from film 2014?

Is there a book whose rating is > 4.0 associated with a film that was directed by Stanley Kubrick?

Think of the Facebook graph and finding friends of friends.
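A reachability query such as "can you reach film 1267 from film 2014?" can be answered with a breadth-first search. The sketch below is a minimal illustration; the edge list is an assumption (the figure's edge endpoints are not recoverable from the transcript), and edges are treated as undirected for simplicity.

```python
from collections import deque

def reachable(adj, source, target):
    """Breadth-first search: True if target can be reached from source."""
    seen = {source}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            return True
        for w in adj.get(u, ()):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return False

def friends_of_friends(adj, v):
    """Vertices exactly two hops away from v (the friends-of-friends case)."""
    two_hop = {w for u in adj[v] for w in adj[u]}
    return two_hop - adj[v] - {v}

# Hypothetical edges between vertices named in the example graph.
edges = [
    ("film 2014", "actor 29704"),
    ("film 1267", "actor 29704"),
    ("film 2014", "director 8476"),
    ("film 424", "director 8476"),
    ("geo 2635167", "UnitedKingdom"),
]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)   # store both directions:
    adj.setdefault(v, set()).add(u)   # the graph is treated as undirected

print(reachable(adj, "film 2014", "film 1267"))
print(sorted(friends_of_friends(adj, "film 2014")))
```

Online queries like these touch only the part of the graph around the source vertex, in contrast to analytics workloads that visit every vertex.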

Subgraph Matching

Query graph:
    ?m  movie:director       ?d
    ?m  rdfs:label           ?name
    ?m  movie:relatedBook    ?b
    ?d  movie:director_name  “Stanley Kubrick”
    ?b  rev:rating           ?r
    FILTER(?r > 4.0)

[Figure: the query graph is matched against the RDF data graph from the Graph Types slide by subgraph matching.]
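Subgraph matching by homomorphism can be sketched as a backtracking search that binds the query variables (terms starting with `?`) to values in the data triples. This is a toy matcher, not how a SPARQL engine is implemented; the `match` function is hypothetical, and the triples linking the film to its director and book are assumptions, since the figure's edges are not recoverable from the transcript.

```python
def match(patterns, triples, binding=None):
    """Yield every variable binding that maps all patterns onto data triples."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        new = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if isinstance(term, str) and term.startswith("?"):
                # Variable: bind it, or check consistency with an earlier binding.
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                # Constant: must equal the data value.
                ok = False
                break
        if ok:
            yield from match(rest, triples, new)

# Assumed data triples (a small subset of the example graph).
triples = [
    ("mdb:film/2014", "movie:director", "mdb:director/8476"),
    ("mdb:film/2014", "rdfs:label", "The Shining"),
    ("mdb:film/2014", "movie:relatedBook", "bm:books/0743424425"),
    ("mdb:film/2685", "movie:director", "mdb:director/8476"),
    ("mdb:film/2685", "rdfs:label", "A Clockwork Orange"),
    ("mdb:director/8476", "movie:director_name", "Stanley Kubrick"),
    ("bm:books/0743424425", "rev:rating", 4.7),
]

query = [
    ("?m", "movie:director", "?d"),
    ("?m", "rdfs:label", "?name"),
    ("?m", "movie:relatedBook", "?b"),
    ("?d", "movie:director_name", "Stanley Kubrick"),
    ("?b", "rev:rating", "?r"),
]

# The FILTER is applied to complete bindings.
names = [b["?name"] for b in match(query, triples) if b["?r"] > 4.0]
print(names)
```

Because the search is a homomorphism, two different variables are allowed to bind to the same data value; an isomorphism-based matcher would forbid that.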


PageRank Computation

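A web page is important if it is pointed to by other important pages.

[Figure: example web graph with six pages P1–P6.]

$$r(P_i) = (1 - d) + d \sum_{P_j \in B_{P_i}} \frac{r(P_j)}{|F_{P_j}|}$$

(let $d = 1$)

$$r(P_2) = \frac{r(P_1)}{2} + \frac{r(P_3)}{3}$$

$$r_{k+1}(P_i) = \sum_{P_j \in B_{P_i}} \frac{r_k(P_j)}{|F_{P_j}|}$$

$B_{P_i}$: in-neighbours of $P_i$

$F_{P_i}$: out-neighbours of $P_i$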


Page    Iteration 0       Iteration 1        Iteration 2         Rank at Iter. 2
P1      r0(P1) = 1/6      r1(P1) = 1/18      r2(P1) = 1/36       5
P2      r0(P2) = 1/6      r1(P2) = 5/36      r2(P2) = 1/18       4
P3      r0(P3) = 1/6      r1(P3) = 1/12      r2(P3) = 1/36       5
P4      r0(P4) = 1/6      r1(P4) = 1/4       r2(P4) = 17/72      1
P5      r0(P5) = 1/6      r1(P5) = 5/36      r2(P5) = 11/72      3
P6      r0(P6) = 1/6      r1(P6) = 1/6       r2(P6) = 14/72      2

Iterative processing
Touch each vertex
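The iteration in the table can be reproduced with a few lines of code. The edge list below is an assumption: the figure is lost in this transcript, and the links were chosen so that the update rule with d = 1 yields exactly the tabulated values. Exact fractions are used so the results can be compared with the table directly.

```python
from fractions import Fraction

# Assumed out-links of the six-page example (P2 has no out-links).
out_links = {1: [2, 3], 2: [], 3: [1, 2, 5], 4: [5, 6], 5: [4, 6], 6: [4]}

def pagerank_step(rank, out_links):
    """One iteration: each page splits its rank evenly over its out-neighbours."""
    new_rank = {p: Fraction(0) for p in out_links}
    for pj, targets in out_links.items():
        for pi in targets:
            new_rank[pi] += rank[pj] / len(targets)
    return new_rank

r0 = {p: Fraction(1, 6) for p in out_links}   # uniform start, 1/6 each
r1 = pagerank_step(r0, out_links)
r2 = pagerank_step(r1, out_links)

for p in sorted(out_links):
    print(f"P{p}: {r0[p]}  {r1[p]}  {r2[p]}")
```

Every iteration visits every vertex and every edge, which is why this workload is treated as offline analytics and not as an online query.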


Can MapReduce be Used for Graph Analytics?

Yes; map and reduce functions can be written for graph analytics workloads
    Scalable Graph processing Class SGC [Qin et al., 2014]
    Connected component computation [Kiveris et al., 2014; Rastogi et al., 2013]

Not suitable for iterative processing due to data movement at each stage
    No guarantee that computation will be assigned to the same worker nodes in the next round

High I/O cost
    Need to save in storage system (HDFS) intermediate results of each iteration

There are systems that address these concerns
    HaLoop [Bu et al., 2010, 2012]
    GraphX over Spark [Gonzalez et al., 2014]
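To make the data-movement concern concrete, here is a single-process sketch of one PageRank round written as map and reduce functions. It only simulates the shuffle in memory; the function names and the record layout `(rank, out_links)` are assumptions for illustration and do not correspond to the Hadoop API.

```python
from collections import defaultdict

def map_phase(vertex, state):
    """Emit the adjacency list plus one rank contribution per out-link."""
    rank, out = state
    yield vertex, ("links", out)            # graph structure is re-shipped every round
    for target in out:
        yield target, ("rank", rank / len(out))

def reduce_phase(vertex, values):
    """Rebuild the vertex record: sum contributions, recover the adjacency list."""
    out, total = [], 0.0
    for kind, payload in values:
        if kind == "links":
            out = payload
        else:
            total += payload
    return vertex, (total, out)

def run_iteration(graph):
    """One MapReduce job = one PageRank iteration (shuffle simulated in memory)."""
    shuffled = defaultdict(list)
    for vertex, state in graph.items():
        for key, value in map_phase(vertex, state):
            shuffled[key].append(value)
    return dict(reduce_phase(v, vals) for v, vals in shuffled.items())

graph = {"a": (1 / 3, ["b", "c"]), "b": (1 / 3, ["c"]), "c": (1 / 3, ["a"])}
graph = run_iteration(graph)
print({v: round(rank, 3) for v, (rank, _) in graph.items()})
```

Note that the adjacency lists travel through the shuffle in every round even though they never change; in a real deployment each round would also write its output to the storage system before the next job starts.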


Spark System

Spark objectives
    Better support for iterative programs
    Provide a complete ecosystem
    Similar abstraction (to MapReduce) for programming
    Maintain MapReduce fault-tolerance and scalability

Fundamental concepts
    RDD: Resilient Distributed Datasets
    Caching of working set
    Maintaining lineage for fault-tolerance


GraphX [Gonzalez et al., 2014]

Built on top of Spark

Objective is to combine data analytics with graph processing

Unify computation on tables and graphs

Carefully convert graph to tabular representation

Native GraphX API or can accommodate vertex-centric computation

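[Figure: the Apache Spark stack. Native Spark apps, Spark SQL, Spark Streaming, MLlib (machine learning), and GraphX (graph processing) sit on top of Apache Spark; applications use GraphX either directly or through a vertex-centric API.]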


GraphX: Representation of Graphs as Tables

[Figure: a graph with vertices A–J is split into Partition 1 and Partition 2 by edge-disjoint partitioning and stored across Machine 1 and Machine 2.]

Vertex Table (RDD), v-prop: vertex prop.
    A v-prop, B v-prop, ..., I v-prop
    D v-prop, E v-prop, F v-prop, J v-prop

Edge Table (RDD), e-prop: edge prop.
    Partition 1: A e-prop B, A e-prop C, ..., F e-prop G
    Partition 2: A e-prop D, A e-prop E, ..., E e-prop F

Routing Table (RDD)
    A: 1, 2
    B: 1
    ...
    I: 1
    F: 1, 2
    D: 2
    E: 2
    J: 2

Joining vertices and edges: move vertices to edges

© M. Tamer Ozsu TD-LSG (2018/08/31) 23 / 46
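The three tables on this slide can be mimicked with plain Python dictionaries. The following is a minimal sketch, not GraphX code: the dictionaries stand in for RDDs, only the edges visible on the slide are included, and the helper names (`build_routing_table`, `ship_vertices`, `triplets`) and the placeholder property strings are assumptions made for illustration.

```python
# Vertex table: vertex id -> vertex property (placeholder strings, assumed).
vertex_table = {v: f"v-prop({v})" for v in "ABCDEFG"}

# Edge table after edge-disjoint partitioning:
# partition id -> list of (source, edge property, destination).
edge_partitions = {
    1: [("A", "e-prop", "B"), ("A", "e-prop", "C"), ("F", "e-prop", "G")],
    2: [("A", "e-prop", "D"), ("A", "e-prop", "E"), ("E", "e-prop", "F")],
}


def build_routing_table(edge_partitions):
    """Map each vertex to the sorted list of edge partitions that reference it."""
    routing = {}
    for pid, edges in edge_partitions.items():
        for src, _prop, dst in edges:
            routing.setdefault(src, set()).add(pid)
            routing.setdefault(dst, set()).add(pid)
    return {v: sorted(pids) for v, pids in routing.items()}


def ship_vertices(vertex_table, routing_table):
    """'Move vertices to edges': copy each vertex property to every partition that needs it."""
    shipped = {}
    for v, pids in routing_table.items():
        for pid in pids:
            shipped.setdefault(pid, {})[v] = vertex_table[v]
    return shipped


def triplets(edge_partitions, shipped):
    """Join every edge with the properties of its two endpoints, partition by partition."""
    joined = []
    for pid, edges in edge_partitions.items():
        local = shipped[pid]  # only locally available vertex copies are used
        for src, prop, dst in edges:
            joined.append((src, local[src], prop, dst, local[dst]))
    return joined


routing_table = build_routing_table(edge_partitions)
shipped = ship_vertices(vertex_table, routing_table)
for row in triplets(edge_partitions, shipped):
    print(row)
```

In this sketch vertex A is shipped to both partitions because it has edges in both, which matches the "A 1 2" row of the routing table on the slide.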

Classification of Graph Processing Systems [Han, 2015]

Two dimensions: programming model (columns) and computation model (rows).

Computation model \ Programming model   Vertex-centric                Partition-centric       Edge-centric
Bulk Synchronous Parallel (BSP)         Vertex-centric BSP            Partition-centric BSP   Edge-centric BSP
Asynchronous                            Vertex-centric Asynchronous   ???                     ???
Gather-Apply-Scatter (GAS)              Vertex-centric GAS            ???                     ???

Programming Models

Vertex-centric
  Computation on a vertex is the focus
  “Think like a vertex”
  Vertex computation depends on its own state + states of its neighbors
  Compute(vertex v)
  GetValue(), WriteValue()

Partition-centric (Block-centric)
  Computation on an entire partition is specified
  “Think like a block” or “Think like a graph”
  Aim is to reduce the communication cost among vertices

Edge-centric
  Computation is specified on each edge rather than on each vertex or block
  Compute(edge e)

Computational Models

Bulk Synchronous Parallel (BSP) [Valiant, 1990]
  Each machine performs computation on its graph partition
  At the end of each superstep, results are pushed to other workers
  [Figure: Machines 1 to 3 compute during Superstep 1, Superstep 2 and Superstep 3; consecutive supersteps are separated by a communication barrier.]

Asynchronous Parallel
  No communication barriers ✓
  Uses the most recent values ✓
  Implemented via distributed locking
  Consider vertex-centric program
  [Figure: vertices v0 to v4 placed on Machines 1 to 3; legend: Compute, State.]

Gather-Apply-Scatter (GAS)
  Similar to BSP, but pull-based
  Gather: pull state
  Apply: compute function
  Scatter: update state
  Updates of states separated from scheduling

Real-World Graph Characteristics

Real-world graphs have skewed vertex degree distribution
  Common in power-law graphs
  Problem: imbalanced communication workloads

Real-world graphs have large diameters
  Common in road networks, web graphs, terrain meshes
  Problem: one superstep per hop ⇒ too many supersteps

Real-world graphs have high average vertex degree
  Common in social networks, mobile communication networks
  Problem: heavy average communication workloads

Vertex-Centric BSP Systems

“Think like a vertex”
Compute(vertex v)
BSP computation: push state to neighbor vertices at the end of each superstep
Continue until all vertices are inactive
Vertex state machine: an active vertex becomes inactive when it votes to halt; an inactive vertex becomes active again when it receives a message
Example systems: Pregel [Malewicz et al., 2010], Apache Giraph, GPS [Salihoglu and Widom, 2013], Mizan [Khayyat et al., 2013], Trinity [?]

[Figure: Machines 1 to 3 across Superstep 1, Superstep 2 and Superstep 3, separated by communication barriers.]
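The superstep loop and the vote-to-halt state machine can be sketched in a few lines of single-process Python. This is an illustrative sketch only, not the API of Pregel or Giraph: the function name `pregel_max`, the adjacency-list input format and the choice of "propagate the maximum value" as the vertex program are all assumptions.

```python
def pregel_max(graph, values, max_supersteps=50):
    """Propagate the maximum vertex value with a BSP, vertex-centric loop.

    graph:  vertex -> list of neighbours that receive its messages
    values: vertex -> initial numeric value
    Returns (final values, number of supersteps executed).
    """
    values = dict(values)
    active = set(graph)                 # every vertex starts active
    inbox = {v: [] for v in graph}      # messages delivered at the last barrier
    superstep = 0
    while active and superstep < max_supersteps:
        outbox = {v: [] for v in graph}
        for v in active:
            # Compute(vertex v): sees only messages from the previous superstep.
            new_value = max([values[v]] + inbox[v])
            changed = new_value > values[v]
            values[v] = new_value
            if superstep == 0 or changed:
                for n in graph[v]:
                    outbox[n].append(new_value)   # pushed at the end of the superstep
            # v now votes to halt; only a received message reactivates it.
        # Communication barrier: all messages become visible together.
        inbox = outbox
        active = {v for v in graph if inbox[v]}
        superstep += 1
    return values, superstep


path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
final, steps = pregel_max(path, {"a": 3, "b": 6, "c": 2, "d": 1})
print(final, steps)
```

Because a value can travel only one hop per superstep, the number of supersteps in this sketch grows with the distance the maximum has to cover, which is the large-diameter problem noted earlier in the deck.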

Vertex-Centric Asynchronous Systems

“Think like a vertex”
Compute(vertex v)
Supersteps exist along with synchronization barriers, but ...
Compute(vertex v) can see messages sent to it in the same superstep as well as those that arrived at the end of the previous superstep
Consistency of vertex states: distributed locking
Consistency issues: no guarantee about input to Compute()
Example systems: GRACE [Wang et al., 2013], GiraphUC [Han and Daudjee, 2015]

[Figure: vertices on Machines 1 to 3 exchanging messages within a superstep.]
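To make the difference from BSP concrete, here is a hedged single-threaded sketch in which a message is placed in the recipient's inbox immediately, so a vertex that runs later in the same superstep already sees it. It is not the implementation of GRACE or GiraphUC: the name `async_max`, the fixed processing order (dictionary order) and the maximum-propagation vertex program are assumptions, and distributed locking is not modeled because there is only one thread.

```python
def async_max(graph, values, max_supersteps=50):
    """Maximum propagation where messages are visible within the same superstep.

    graph:  vertex -> list of neighbours that receive its messages
    values: vertex -> initial numeric value
    Returns (final values, number of supersteps executed).
    """
    values = dict(values)
    inbox = {v: [] for v in graph}
    superstep = 0
    while superstep < max_supersteps:
        if superstep > 0 and not any(inbox.values()):
            break                              # no pending messages anywhere
        for v in graph:                        # assumed fixed order inside a superstep
            if superstep > 0 and not inbox[v]:
                continue                       # inactive: nothing to react to
            messages, inbox[v] = inbox[v], []
            new_value = max([values[v]] + messages)
            changed = new_value > values[v]
            values[v] = new_value
            if superstep == 0 or changed:
                for n in graph[v]:
                    inbox[n].append(new_value)  # delivered at once, not at the barrier
        superstep += 1                         # the barrier between supersteps remains
    return values, superstep


path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
final, steps = async_max(path, {"a": 3, "b": 6, "c": 2, "d": 1})
print(final, steps)
```

The input that a given Compute() call sees depends on the processing order, which illustrates the "no guarantee about input to Compute()" point; the result is still well defined here only because taking a maximum is insensitive to message order.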

Vertex-Centric GAS Systems

“Think like a vertex”
Gather phase
  Gather local computation from neighbours of vertex v: called scope S_v
Apply phase
  Compute(v, S_v)
Scatter phase
  Compute(v, S_v) produces S'_v (scattering state)
Example: GraphLab [Low et al., 2012]
Synchronous version
  Similar to vertex-centric BSP, except pulling S_v rather than pushing
Asynchronous version different

[Figure: a vertex v and its scope S_v.]
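The synchronous, pull-based flavour of GAS can be illustrated with PageRank. This is a sketch under assumptions, not GraphLab code: the function name `gas_pagerank`, the damping factor, the iteration count and the choice of in-neighbours as the scope are illustrative, and every vertex is assumed to have at least one outgoing edge.

```python
def gas_pagerank(out_edges, iterations=20, damping=0.85):
    """Synchronous gather-apply-scatter PageRank on a small in-memory graph.

    out_edges: vertex -> list of vertices it links to (every target must be a key).
    """
    vertices = list(out_edges)
    in_edges = {v: [] for v in vertices}
    for u, targets in out_edges.items():
        for v in targets:
            in_edges[v].append(u)

    rank = {v: 1.0 / len(vertices) for v in vertices}
    for _ in range(iterations):
        new_rank = {}
        for v in vertices:
            # Gather: pull state from the scope of v (here, its in-neighbours).
            gathered = sum(rank[u] / len(out_edges[u]) for u in in_edges[v])
            # Apply: compute the new value of v from the gathered state.
            new_rank[v] = (1 - damping) / len(vertices) + damping * gathered
        # Scatter: publish the new states so the next gather can pull them.
        rank = new_rank
    return rank


cycle = {"a": ["b"], "b": ["c"], "c": ["a"]}
print(gas_pagerank(cycle))
```

No vertex sends a message in this sketch; each one reads what it needs from its neighbours, which is the pull-versus-push distinction drawn on the slide.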

Vertex-Centric GAS Systems – Asynchronous

procedure GraphLab Async(G = (V, E, D), T)
    while T is not empty do
        v ← RemoveNext(T)
        Compute(v, S_v) → (T', S'_v)
        T ← T ∪ T'
    end while
    return Modified G = (V, E, D')
end procedure

Graph mutation restricted to vertex states
Computing S'_v updates (scatters) states of the vertices in scope
Computation of new states S'_v separated from computation of new T'
RemoveNext() can remove any vertex → scheduling separated from state scatter

[Figure: the scope S_v of a vertex v.]
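The scheduler loop above translates almost line by line into Python. The sketch below is an assumption-laden illustration rather than GraphLab's API: `graphlab_async` uses a FIFO queue as one possible RemoveNext() policy, the scope is taken to be the vertex plus its neighbours, and `min_label` (connected components by smallest label) is a hypothetical update function chosen for the example.

```python
from collections import deque


def graphlab_async(neighbours, data, compute, initial_tasks):
    """Run compute(v, scope) until the task set T is empty.

    neighbours:    vertex -> list of adjacent vertices (V and E stay fixed)
    data:          vertex -> state (D); only these states are mutated
    compute:       function (v, scope) -> (new tasks T', new states S'_v)
    initial_tasks: iterable of vertices forming the initial T
    """
    data = dict(data)
    tasks = deque(initial_tasks)
    queued = set(tasks)
    while tasks:
        v = tasks.popleft()              # RemoveNext(T); FIFO is just one policy
        queued.discard(v)
        scope = {u: data[u] for u in [v] + list(neighbours[v])}   # S_v
        new_tasks, new_states = compute(v, scope)
        data.update(new_states)          # scatter S'_v into the vertex states
        for u in new_tasks:              # T <- T U T'
            if u not in queued:
                queued.add(u)
                tasks.append(u)
    return data


def min_label(v, scope):
    """Adopt the smallest label in scope; reschedule neighbours only on change."""
    smallest = min(scope.values())
    if smallest < scope[v]:
        return [u for u in scope if u != v], {v: smallest}
    return [], {}


neighbours = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4]}
labels = graphlab_async(neighbours, {v: v for v in neighbours}, min_label, neighbours)
print(labels)
```

Swapping the deque for a priority queue would change the schedule without touching `min_label`, which is the separation of scheduling from state scatter that the slide points out.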

Partition- (Block-)Centric BSP Systems

Blogel [Yan et al., 2014]: “Think like a block”; also “think like a graph” [Tian et al., 2013]
Better handles the characteristics of real-world graphs by reducing communication
Exploit the partitioning of the graph
Message exchanges only among blocks
Within a block, run a serial in-memory algorithm; BSP between partitions

Benefits of Partition- (Block-)Centric BSP

High-degree vertices inside a block send no messages
Fewer supersteps
Fewer blocks than vertices
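A small sketch can show why block-centric execution needs fewer supersteps: each block first converges locally with a serial algorithm, and only cut edges carry messages across the barrier. This is not Blogel's implementation; the names `local_fixpoint` and `block_centric_cc`, the connected-components task and the given vertex-to-block assignment are assumptions for illustration.

```python
def local_fixpoint(label, internal_edges):
    """Serial in-memory pass inside one block: spread minimum labels until stable."""
    changed = True
    while changed:
        changed = False
        for u, v in internal_edges:
            low = min(label[u], label[v])
            if label[u] != low or label[v] != low:
                label[u] = label[v] = low
                changed = True


def block_centric_cc(edges, block_of):
    """Connected components with BSP between blocks.

    edges:    list of undirected edges (u, v)
    block_of: vertex -> block id (the partitioning is assumed to be given)
    Returns (labels, supersteps, messages sent over cut edges).
    """
    label = {v: v for v in block_of}
    internal, cut = {}, []
    for u, v in edges:
        if block_of[u] == block_of[v]:
            internal.setdefault(block_of[u], []).append((u, v))
        else:
            cut.append((u, v))

    supersteps = messages = 0
    while True:
        supersteps += 1
        # Compute phase: every block runs its serial algorithm, no messages needed.
        for block_edges in internal.values():
            local_fixpoint(label, block_edges)
        # Communication phase: only cut edges carry messages between blocks.
        updates = {}
        for u, v in cut:
            for src, dst in ((u, v), (v, u)):
                messages += 1
                if label[src] < updates.get(dst, label[dst]):
                    updates[dst] = label[src]
        if not updates:
            return label, supersteps, messages
        label.update(updates)   # barrier: all updates become visible together


edges = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
block_of = {1: "X", 2: "X", 3: "X", 4: "Y", 5: "Y", 6: "Y"}
print(block_centric_cc(edges, block_of))
```

On this six-vertex path only the single cut edge (3, 4) ever produces messages, while a vertex-centric BSP run would push a label across one hop per superstep.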

Edge-Centric BSP Systems

“Think like an edge”
Compute(edge e)
Number of edges ≫ number of vertices
  More computation but perhaps fewer messages
  Operate on unsorted sequence of edges ⇒ no random access
X-Stream [Roy et al., 2013]
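An edge-centric computation can be sketched as repeated sequential passes over an unsorted edge list. The code below is an illustration in the spirit of scatter and gather phases, not X-Stream's implementation: the name `edge_centric_cc`, the connected-components task and the in-memory update list are assumptions.

```python
def edge_centric_cc(num_vertices, edges):
    """Connected components by streaming an unsorted edge list once per superstep.

    num_vertices: vertices are numbered 0 .. num_vertices - 1
    edges:        list of undirected edges (u, v) in arbitrary order
    Returns (labels, number of supersteps).
    """
    label = list(range(num_vertices))
    supersteps = 0
    while True:
        supersteps += 1
        updates = []
        # Scatter: Compute(edge e) is applied to each edge in stream order,
        # so the edge list is read sequentially and never indexed at random.
        for u, v in edges:
            if label[u] < label[v]:
                updates.append((v, label[u]))
            elif label[v] < label[u]:
                updates.append((u, label[v]))
        if not updates:
            return label, supersteps
        # Gather: apply the collected updates to the vertex states.
        for target, value in updates:
            if value < label[target]:
                label[target] = value


labels, steps = edge_centric_cc(6, [(3, 2), (0, 1), (1, 2), (4, 5)])
print(labels, steps)
```

Every superstep touches every edge, which reflects the "more computation" side of the trade-off on the slide.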

OLAP Over Graphs

OLAP in RDBMS
  Usage: Data Warehousing + Business Intelligence
  Model: Multidimensional cube
  Operations: Roll-up, drill-down, and slice and dice

The analytics that we discussed over graphs are quite different

Can we do OLAP-style analytics over graphs?
  There is some work
    Graph summarization [Tian et al., 2008]
    Snapshot-based Aggregation [Chen et al., 2008]
    Graph Cube [Zhao et al., 2011]
    Pagrol [?]
    Gagg Model [Maali et al., 2015]
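One way to picture a roll-up over a graph is to group vertices by an attribute and count the vertices and edges that fall into each group. The sketch below is only in the spirit of attribute-based aggregation and does not reproduce the algorithm of any work cited above: the function `roll_up`, the attribute names and the sample data are hypothetical.

```python
from collections import Counter


def roll_up(vertex_attrs, edges, dimension):
    """Aggregate a graph along one vertex attribute.

    vertex_attrs: vertex -> dict of attribute values
    edges:        list of directed edges (u, v)
    dimension:    attribute name to group by
    Returns (vertices per group, edges per ordered pair of groups).
    """
    group_of = {v: attrs[dimension] for v, attrs in vertex_attrs.items()}
    node_counts = Counter(group_of.values())
    edge_counts = Counter((group_of[u], group_of[v]) for u, v in edges)
    return node_counts, edge_counts


# Hypothetical sample data, made up for the example.
vertex_attrs = {
    "u1": {"role": "student", "dept": "CS"},
    "u2": {"role": "student", "dept": "Math"},
    "u3": {"role": "faculty", "dept": "CS"},
    "u4": {"role": "faculty", "dept": "CS"},
}
edges = [("u1", "u3"), ("u2", "u3"), ("u3", "u4"), ("u1", "u2")]

print(roll_up(vertex_attrs, edges, "role"))
print(roll_up(vertex_attrs, edges, "dept"))
```

Calling `roll_up` with a different `dimension` gives another view of the same graph, loosely analogous to choosing a different dimension of a cube.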

Acknowledgements

This presentation draws upon collaborative research and discussions with the following colleagues:

Khaled Ammar, U. Waterloo
Khuzaima Daudjee, U. Waterloo
Xiaofei Zhang, U. Waterloo (U. Memphis)
Young Han, U. Waterloo (Google)

Thank you!

References I

Bu, Y., Howe, B., Balazinska, M., and Ernst, M. D. (2010). HaLoop: efficient iterative data processing on large clusters. Proc. VLDB Endowment, 3(1):285–296. Available from: http://dl.acm.org/citation.cfm?id=1920841.1920881.

Bu, Y., Howe, B., Balazinska, M., and Ernst, M. D. (2012). The HaLoop approach to large-scale iterative data analysis. VLDB J., 21(2):169–190.

Chen, C., Yan, X., Zhu, F., Han, J., and Yu, P. S. (2008). Graph OLAP: Towards online analytical processing on graphs. In Proc. 8th IEEE Int. Conf. on Data Mining, pages 103–112.

Chen, R., Shi, J., Chen, Y., and Chen, H. (2015). PowerLyra: Differentiated graph computation and partitioning on skewed graphs. In Proc. 10th ACM SIGOPS/EuroSys European Conf. on Comp. Syst., pages 1:1–1:15. Available from: http://doi.acm.org/10.1145/2741948.2741970.

Gonzalez, J. E., Low, Y., Gu, H., Bickson, D., and Guestrin, C. (2012). PowerGraph: Distributed graph-parallel computation on natural graphs. In Proc. 10th USENIX Symp. on Operating System Design and Implementation, pages 17–30. Available from: http://dl.acm.org/citation.cfm?id=2387880.2387883.


References II

Gonzalez, J. E., Xin, R. S., Dave, A., Crankshaw, D., Franklin, M. J., and Stoica, I. (2014). GraphX: graph processing in a distributed dataflow framework. In Proc. 11th USENIX Symp. on Operating System Design and Implementation, pages 599–613. Available from: https://www.usenix.org/conference/osdi14/technical-sessions/presentation/gonzalez.

Han, M. (2015). On improving distributed Pregel-like graph processing systems. Master’s thesis, University of Waterloo, David R. Cheriton School of Computer Science.

Han, M. and Daudjee, K. (2015). Giraph unchained: Barrierless asynchronous parallel execution in Pregel-like graph processing systems. Proc. VLDB Endowment, 8(9):950–961. Available from: http://www.vldb.org/pvldb/vol8/p950-han.pdf.

Karypis, G. and Kumar, V. (1995). Multilevel graph partitioning schemes. In Proc. 1995 Int. Conf. on Parallel Processing, pages 113–122.

Khayyat, Z., Awara, K., Alonazi, A., Jamjoom, H., Williams, D., and Kalnis, P. (2013). Mizan: A system for dynamic load balancing in large-scale graph processing. In Proc. 8th ACM SIGOPS/EuroSys European Conf. on Comp. Syst., pages 169–182. Available from: http://doi.acm.org/10.1145/2465351.2465369.


References III

Kiveris, R., Lattanzi, S., Mirrokni, V., Rastogi, V., and Vassilvitskii, S. (2014). Connected components in MapReduce and beyond. In Proc. 5th ACM Symp. on Cloud Computing, pages 18:1–18:13.

Lin, J. (2018). Scale up or scale out for graph processing? IEEE Internet Comput., 22(3):72–78.

Low, Y., Gonzalez, J., Kyrola, A., Bickson, D., Guestrin, C., and Hellerstein, J. M. (2012). Distributed graphlab: A framework for machine learning in the cloud. Proc. VLDB Endowment, 5(8):716–727.

Maali, F., Campinas, S., and Decker, S. (2015). Gagg: A graph aggregation operator. In Proc. 14th Int. Semantic Web Conf., pages 491–504.

Malewicz, G., Austern, M. H., Bik, A. J. C., Dehnert, J. C., Horn, I., Leiser, N., and Czajkowski, G. (2010). Pregel: a system for large-scale graph processing. In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 135–146.

Qin, L., Yu, J. X., Chang, L., Cheng, H., Zhang, C., and Lin, X. (2014). Scalable big graph processing in mapreduce. In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 827–838. Available from: http://doi.acm.org/10.1145/2588555.2593661.


References IV

Rastogi, V., Machanavajjhala, A., Chitnis, L., and Sarma, A. D. (2013). Finding connected components in map-reduce in logarithmic rounds. In Proc. 29th Int. Conf. on Data Engineering, pages 50–61.

Roy, A., Mihailovic, I., and Zwaenepoel, W. (2013). X-stream: edge-centric graph processing using streaming partitions. In Proc. 24th ACM Symp. on Operating System Principles, pages 472–488. Available from: http://doi.acm.org/10.1145/2517349.2522740.

Salihoglu, S. and Widom, J. (2013). GPS: a graph processing system. In Proc. 25th Int. Conf. on Scientific and Statistical Database Management, pages 22:1–22:12. Available from: http://doi.acm.org/10.1145/2484838.2484843.

Tian, Y., Balmin, A., Corsten, S. A., Tatikonda, S., and McPherson, J. (2013). From “think like a vertex” to “think like a graph”. Proc. VLDB Endowment, 7(3):193–204. Available from: http://www.vldb.org/pvldb/vol7/p193-tian.pdf.

Tian, Y., Hankins, R. A., and Patel, J. M. (2008). Efficient aggregation for graph summarization. In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 567–580.


References V

Ugander, J. and Backstrom, L. (2013). Balanced label propagation for partitioning massive graphs. In Proc. 6th ACM Int. Conf. Web Search and Data Mining, pages 507–516. Available from: http://doi.acm.org/10.1145/2433396.2433461.

Valiant, L. G. (1990). A bridging model for parallel computation. Commun. ACM, 33(8):103–111.

Wang, G., Xie, W., Demers, A. J., and Gehrke, J. (2013). Asynchronous large-scale graph processing made easy. In Proc. 6th Biennial Conf. on Innovative Data Systems Research.

Yan, D., Cheng, J., Lu, Y., and Ng, W. (2014). Blogel: A block-centric framework for distributed computation on real-world graphs. Proc. VLDB Endowment, 7(14):1981–1992. Available from: http://www.vldb.org/pvldb/vol7/p1981-yan.pdf.

Zhao, P., Li, X., Xin, D., and Han, J. (2011). Graph cube: on warehousing and OLAP multidimensional networks. In Proc. ACM International Conference on Management of Data, pages 853–864.
