Pregel and GraphX - Stanford Universityrezab/classes/cme323/S16/notes/Lecture16/Pregel... ·...

Reza Zadeh

Pregel and GraphX

@Reza_Zadeh | http://reza-zadeh.com

Overview Graph Computations and Pregel" Introduction to Matrix Computations

Graph Computations and Pregel

Data Flow Models Restrict the programming interface so that the system can do more automatically Express jobs as graphs of high-level operators » System picks how to split each operator into tasks

and where to run each task » Run parts twice fault recovery

New example: Pregel (parallel graph google)

Pregel

Expose specialized APIs to simplify graph programming.

“Think like a vertex”

Graph-Parallel Pattern

Model / Alg. State

Computation depends only on the neighbors

PregelDataFlowInputgraph Vertexstate1 Messages1

Superstep1

Vertexstate2 Messages2

Superstep2

GroupbyvertexID

SimplePregelinSparkSeparateRDDsforimmutablegraphstateandforvertexstatesandmessagesateachiteration

UsegroupByKeytoperformeachstep

CachetheresultingvertexandmessageRDDs

Optimization:co-partitioninputgraphandvertexstateRDDstoreducecommunication

Update ranks in parallel Iterate until convergence

Rank of user i Weighted sum of

neighbors’ ranks

Example: PageRank R[i] = 0.15 +

j2Nbrs(i)

wjiR[j]

PageRankinPregelInputgraph Vertexranks1 Contributions1

Superstep1(addcontribs)

Vertexranks2 Contributions2

Superstep2(addcontribs)

Group&addbyvertex

GraphX

ProvidesPregelmessage-passingandotheroperatorsontopofRDDs

GraphX: Properties

GraphX: Triplets The triplets operator joins vertices and edges:

Map Reduce Triplets Map-Reduce for each vertex

mapF( )A B

mapF( )A C

reduceF( , )A1 A2 A

Example: Oldest Follower

CWhat is the age of the oldest follower for each user? val oldestFollowerAge = graph .mrTriplets( e=> (e.dst.id, e.src.age),//Map (a,b)=> max(a, b) //Reduce ) .vertices

Summary of Operators All operations: https://spark.apache.org/docs/latest/graphx-programming-guide.html#summary-list-of-operators

Pregel API: https://spark.apache.org/docs/latest/graphx-programming-guide.html#pregel-api

The GraphX Stack"(Lines of Code)

GraphX (3575)

Pregel (28) + GraphLab (50)

PageRank (5)

Connected Comp. (10)

Shortest Path (10)

ALS(40) LDA

K-core(51) Triangle

Count(45)

SVD(40)

Optimizations Overloaded vertices have their work distributed

Optimizations

More examples In your HW: Single-Source-Shortest Paths using Pregel

Distributing Matrix Computations

Distributing Matrices How to distribute a matrix across machines? »  By Entries (CoordinateMatrix) »  By Rows (RowMatrix)

»  By Blocks (BlockMatrix) All of Linear Algebra to be rebuilt using these partitioning schemes

Asofversion1.3

Distributing Matrices Even the simplest operations require thinking about communication e.g. multiplication

How many different matrix multiplies needed? »  At least one per pair of {Coordinate, Row,

Block, LocalDense, LocalSparse} = 10 »  More because multiplies not commutative

Block Matrix Multiplication Let’s look at Block Matrix Multiplication

(on the board and on GitHub)

Pregel and GraphX - Stanford Universityrezab/classes/cme323/S16/notes/Lecture16/Pregel... ·...

Documents

Transcript of Pregel and GraphX - Stanford Universityrezab/classes/cme323/S16/notes/Lecture16/Pregel... ·...

Cost Model for Pregel on GraphX

PREGEL - Apache Software Foundationpeople.apache.org/~edwardyoon/documents/pregel.pdf · Pregel 2 . Characteristics of the algorithms ... •Single-computer graph algorithm libraries

Pregel reading circle

Pregel and giraph

087 Pagerank Mapreduce Pregel

CATALOG - PreGel · Manufacturing plant – ITALY, Reggio Emilia Manufacturing plant – USA, Charlotte THE COMPANY Founded in the province of Reggio Emilia, Italy in 1967, PreGel

On Improving Distributed Pregel-like Graph Processing Systems

Pregel : A System for Large-Scale Graph Processing

Processing large-scale graphs with Google Pregel

Pregel: A System for Large-Scale Graph Processingtozsu/courses/CS848/W15/presentations/... · Pregel: A System for Large-Scale Graph Processing Grzegorz Malewicz, Matthew H. Austern,

Pregel Algorithms for Graph Connectivity Problems with ... · Pregel Algorithms for Graph Connectivity Problems with Performance Guarantees Da Yan #1, James Cheng #2, Kai Xing 3,

CATALOG - PreGel Family

PREGEL - home.apache.org

PAstRy bAses - PreGel America › pga_collateral › PreGel...PAstRy GelAtIN bAse Code: 311008 PreGel 5-Star Chef Pastry Select™ Pastry Gelatin Base 8 bags x 3.3 lbs A mousse base

An Experimental Comparison of Pregel-like Graph ... - … · An Experimental Comparison of Pregel-like Graph Processing Systems Minyang Han, Khuzaima Daudjee, Khaled Ammar, M. Tamer

Intro to Apache Spark - Stanford Universityrezab/classes/cme323/S15/slides/... · 2016-02-28 · 2002 2002 MapReduce @ Google 2004 MapReduce paper 2006 Hadoop @ Yahoo! 2004 2006 2008

Large Scale Graph Processing - Pregel and GraphLab · Large Scale Graph Processing - Pregel and GraphLab Amir H. Payberah payberah@kth.se 01/10/2018

Edmonds’ Blossom Algorithm - Stanford Universityrezab/classes/cme323/S16/projects_reports/shoemaker_vare.pdfEdmonds’ Blossom algorithm is a polynomial time algorithm for ﬁnding

Pregel Algorithms for Graph Connectivity Problems with ...

Distributed Computing with Spark - Stanford Universityrezab/classes/cme323/S15/slides/lec1.pdf · Problem Data growing faster than processing speeds Only solution is to parallelize