
Scalable Big Graph Processing in Map Reduce

Lu Qin, Jeffrey Xu Yu, Lijun Chang, Hong Cheng, Chengqi Zhang, Xuemin Lin

Presented by Megan Bryant

College of William and Mary

February 11, 2015


Overview

In this presentation, we introduce methods for scalable big graph processing in MapReduce.

Specifically, we introduce a new class, SGC, which has the potential to guide the development of scalable graph processing algorithms in MapReduce.

Two new graph join operators are also introduced, which greatly enhance the capabilities of the SGC class.

Finally, we compare SGC with the existing MRC and MMC classes on several scalable graph algorithms.


Computational Complexity

Computational complexity theory provides a framework and a set of analysis tools for gauging the work performed by an algorithm, as measured by the elementary (i.e., basic) operations it performs.

The basic steps (operations) that an algorithm typically takes are:

Assignment (e.g. assigning some value to a variable)

Arithmetic (e.g. addition, subtraction, multiplication, and division)

Logical (e.g. comparison of two numbers)


Big-O Notation

We utilize Big-O notation to define the complexity of an algorithm.

Definition

An algorithm is said to run in O(f(n)) time if there exist constants c > 0 and n0 such that the time taken by the algorithm is at most c·f(n) for all n ≥ n0.

This is an example of worst-case analysis, which is independent of the computing environment, is relatively easy to perform, and provides an upper bound on the number of steps (the running time) an algorithm can take.


Big-O Complexity


Common Complexities

The following table contains the complexities of common algorithms.

Algorithm                                   Data Structure                   Time Complexity   Space Complexity
Depth First Search                          Graph with n nodes and m edges   O(n+m)            O(m)
Breadth First Search                        Graph with n nodes and m edges   O(n+m)            O(m)
Binary Search                               Sorted array                     O(log(n))         O(1)
Dijkstra’s Shortest Path (unsorted array)   Graph with n nodes and m edges   O(n^2)            O(n)


Algorithm Classes in Map Reduce

There are currently two main algorithm classes in the MapReduce paradigm:

The MapReduce Class (MRC).

The Minimal MapReduce Class (MMC).

These classes are defined in terms of disk usage, memory usage, communication cost, CPU cost, and number of MapReduce rounds.

There is also the popular Parallel Random-Access Machine (PRAM) model, against which performance studies were run.


Map Reduce Class

Let S be the set of objects in the problem and let t be the number of machines in the system.

Fix an ε > 0. A MapReduce algorithm in MRC should have the following properties:

                   Each Machine             Total
Disk:              O(|S|^{1−ε})             O(|S|^{2−2ε})
Memory:            O(|S|^{1−ε})             O(|S|^{2−2ε})
Communication:     O(|S|^{1−ε}) per round   O(|S|^{2−2ε})
CPU:               O(poly(|S|)) per round
Number of Rounds:  O(log^i |S|), i ≥ 0


Minimal Map Reduce Class

Let S be the set of objects in the problem and let t be the number of machines in the system.

A MapReduce algorithm in MMC should have the following properties:

                   Each Machine         Total
Disk:              O(|S|/t)             O(|S|)
Memory:            O(|S|/t)             O(|S|)
Communication:     O(|S|/t) per round   O(|S|)
CPU:               O(Tseq/t)∗
Number of Rounds:  O(1)

∗Tseq is the time to solve the same problem on a single sequential machine


Parallel Random Access Machine

The Parallel Random Access Machine (PRAM) is a model of parallel computation. It is an extension of the RAM model of sequential computation.

In this model, there are p processors connected to a single shared memory, and each processor has a unique index 1 ≤ i ≤ p called the processor id. A single program is executed in single-instruction stream, multiple-data stream fashion, meaning that each instruction is carried out by all processors simultaneously and requires unit time, regardless of the number of processors. Finally, each processor has a private flag that controls whether it is active in the execution of an instruction. Inactive processors do not participate in the execution of instructions, except for instructions to reset the flag.

We will later compare the performance of algorithms in this model to MRC, MMC, and SGC.


MRC VS MMC

MRC defines the basic requirements for an algorithm to execute in MapReduce, whereas MMC requires several aspects to achieve optimality simultaneously in a MapReduce algorithm.

We will begin by analyzing the problems involved in MRC and MMC in graph processing.


Defining a Graph

Let’s consider a graph G = (V,E), where V represents the set of vertices (nodes) and E represents the set of edges (arcs). Further, let n = |V| be the number of nodes and m = |E| be the number of edges.

A graph can be either directed or undirected, cyclic or acyclic, connected or disconnected.

We can represent a graph in either of the following forms:

Adjacency Matrix

Adjacency List
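As a small illustration of the two representations, the sketch below builds both for a hypothetical undirected graph with 4 nodes. The node numbering and the edge list are made up for the example and do not come from the slides.

```python
# Hypothetical undirected graph with n = 4 nodes and m = 4 edges.
n = 4
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]

# Adjacency matrix: an n x n table whose entry [u][v] is 1 when (u, v) is an edge.
adj_matrix = [[0] * n for _ in range(n)]
for u, v in edges:
    adj_matrix[u][v] = 1
    adj_matrix[v][u] = 1  # undirected: record both directions

# Adjacency list: for each node, the list of its neighbours.
adj_list = {u: [] for u in range(n)}
for u, v in edges:
    adj_list[u].append(v)
    adj_list[v].append(u)

print(adj_matrix)
print(adj_list)
```

The matrix always takes O(n^2) space, while the list takes O(n+m), which is why list-like node and edge tables are the usual choice for large sparse graphs.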


Adjacency Matrix


Adjacency List


Scalable Graph Processing in MMC

For a graph G(V,E), a common graph operation is to exchange data among all adjacent nodes (nodes that share a common edge) in the graph.

The memory constraint in MMC requires that all edges/nodes are distributed evenly among all machines in the system.

To formalize the resulting communication cost, let E_{i,j} be the set of edges (u, v) in G such that u is in machine i and v is in machine j.



The communication constraint in MMC can be formalized as follows:

$$\max_{1 \le i \le t} \Big( \sum_{1 \le j \le t,\, j \ne i} |E_{i,j}| \Big) \le O\Big(\frac{n+m}{t}\Big)$$

where once again E_{i,j} is the set of edges (u, v) in G such that u is in machine i and v is in machine j.

In order to satisfy this inequality, we must minimize the maximum, i.e.,

$$\min \max_{1 \le i \le t} \Big( \sum_{1 \le j \le t,\, j \ne i} |E_{i,j}| \Big)$$

However, this problem is NP-hard, meaning that it is at least as hard as the hardest problems in NP.
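Evaluating the objective for one given placement of nodes is easy; it is the search for the best placement that is hard. The sketch below only does the easy part. The function name, the `machine_of` mapping, and the example graph are assumptions made for illustration.

```python
from collections import Counter

def max_cross_machine_edges(edges, machine_of, t):
    """Return the maximum, over machines i, of the number of edges (u, v)
    with u stored on machine i and v stored on a different machine j."""
    outgoing = Counter()
    for u, v in edges:
        i, j = machine_of[u], machine_of[v]
        if i != j:
            outgoing[i] += 1
    # Counter returns 0 for machines that send nothing.
    return max(outgoing[i] for i in range(t))

# Hypothetical star graph: node 0 is connected to nodes 1..4.
# Each undirected edge is stored in both directions.
edges = [(0, k) for k in range(1, 5)] + [(k, 0) for k in range(1, 5)]
machine_of = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1}  # assumed placement on t = 2 machines
print(max_cross_machine_edges(edges, machine_of, t=2))
```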



In addition to the problem being NP-hard, even if the optimal solution to

$$\min \max_{1 \le i \le t} \Big( \sum_{1 \le j \le t,\, j \ne i} |E_{i,j}| \Big)$$

is successfully computed, we cannot guarantee that it is bounded by O((n+m)/t), since it might be as large as O(n+m).

Therefore, MMC is not a suitable class for scalable graph processing.


Scalable Graph Processing in MRC

MRC has fewer constraints than MMC, as it simply defines the basic conditions that a MapReduce algorithm should satisfy, and a graph algorithm in MapReduce is no exception. As with MMC, however, we can examine a class tailored to scalable graph processing.

Given a graph G(V,E) with n nodes and m edges, and assuming that m ≥ n^{1+c}, a class based on MRC has been defined for graph processing in MapReduce, in which a MapReduce algorithm has the following properties:

                   Each Machine             Total
Disk:              O(n^{1+c/2})             O(m^{1+c/2})
Memory:            O(n^{1+c/2})             O(m^{1+c/2})
Communication:     O(n^{1+c/2}) per round   O(m^{1+c/2})
CPU:               O(poly(m)) per round
Number of Rounds:  O(1)


Scalable Graph Processing in MRC

This class has a good property in that the algorithm runs in a constant number of rounds. However, the memory constraint can cause difficulty, as it is large even for a dense graph.

(Note: Dense graphs are generally easier to solve than sparse graphs.)

Furthermore, if the memory of each machine cannot hold O(n^{1+c/2}) data, then the algorithm will always fail. Thus, the class is not scalable and can’t handle large n.


Scalable Graph Processing Class

We will now formulate a new algorithm class which counters this deficiency. First, we weaken the bound on the communication cost per machine from O((m+n)/t) to O(m/t, D(G, t)).

This is done to account for the fact that graphs, especially large graphs, can have a skewed degree distribution. This is seen in graphs such as social networks, which often have several nodes with a very high degree (many subscribers, followers, etc.) alongside lower-level users with only a few connections.


Skewed Degree Distribution



Suppose the nodes are uniformly distributed among all machines. Denote by V_i the set of nodes stored in machine i for 1 ≤ i ≤ t, and let d_j be the degree of node v_j in the input graph. Then O(m/t, D(G, t)) is defined as:

$$O\Big(\frac{m}{t}, D(G,t)\Big) = O\Big(\max_{1 \le i \le t} \sum_{v_j \in V_i} d_j\Big)$$

$$D(G,t) = \frac{t-1}{t^2} \sum_{v_j \in V} d_j^2$$



This leads us to the following lemma, the proof of which has been omitted.

Lemma

Lemma 3.1: Let x_i (1 ≤ i ≤ t) be the communication cost upper bound for machine i, i.e., x_i = ∑_{v_j ∈ V_i} d_j. Then the expected value of x_i is E(x_i) = 2m/t, and the variance of x_i is Var(x_i) = D(G, t).

The important thing to note here is that the variance of the degree distribution of G, denoted Var(G), is

$$\mathrm{Var}(G) = \frac{1}{n} \sum_{v_j \in V} \Big(d_j - \frac{2m}{n}\Big)^2 = \frac{n \sum_{v_j \in V} d_j^2 - 4m^2}{n^2}$$

For fixed t, n, and m values, minimizing D(G, t) is equivalent to minimizing Var(G). In other words, the variance of the communication cost for each machine is minimized if all nodes in the graph have the same degree.

This solves the problem experienced by the previous scalable graph processing algorithm by reducing communication costs.
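To make the link between D(G, t) and Var(G) concrete, the following sketch evaluates both formulas for two hypothetical degree sequences with the same n and m: a 6-node cycle (every degree is 2) and a skewed graph (a 5-leaf star plus one extra edge between two leaves). The function name and both example graphs are assumptions for illustration only.

```python
def degree_stats(degrees, t):
    """Return (E(x_i), D(G, t), Var(G)) for a list of node degrees."""
    n = len(degrees)
    two_m = sum(degrees)                    # the degrees sum to 2m
    sum_sq = sum(d * d for d in degrees)    # sum of d_j^2
    expected_cost = two_m / t               # E(x_i) = 2m / t
    d_g_t = (t - 1) / t**2 * sum_sq         # D(G, t) = (t-1)/t^2 * sum d_j^2
    var_g = (n * sum_sq - two_m**2) / n**2  # Var(G); note 4m^2 = (2m)^2
    return expected_cost, d_g_t, var_g

regular = [2, 2, 2, 2, 2, 2]  # 6-node cycle: n = 6, m = 6
skewed = [5, 2, 2, 1, 1, 1]   # star with 5 leaves plus one leaf-leaf edge: n = 6, m = 6

print(degree_stats(regular, t=2))
print(degree_stats(skewed, t=2))
```

Both graphs have the same expected per-machine cost 2m/t, but the skewed one has the larger D(G, t) and the larger Var(G), in line with the remark that equal degrees minimize the variance.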



Thus, we define the Scalable Graph Processing Class (SGC) as follows.

                   Each Machine                  Total
Disk:              O((m+n)/t)                    O(m+n)
Memory:            O(1)                          O(t)
Communication:     O(m/t, D(G, t))∗ per round    O(m+n)
CPU:               O(m/t, D(G, t))∗ per round
Number of Rounds:  O(log(n))


Comparison Between Classes

We examine the upper bounds of the three classes to see how SGC compares.

                        MRC            MMC          SGC
Disk/machine            O(n^{1+c/2})   O((n+m)/t)   O((n+m)/t)
Disk/total              O(m^{1+c/2})   O(n+m)       O(n+m)
Memory/machine          O(n^{1+c/2})   O((n+m)/t)   O(1)
Memory/total            O(m^{1+c/2})   O(n+m)       O(t)
Communication/machine   O(n^{1+c/2})   O((n+m)/t)   O(m/t, D(G, t))
Communication/total     O(m^{1+c/2})   O(n+m)       O(n+m)
CPU/machine             O(poly(m))     O(Tseq/t)    O(m/t, D(G, t))
CPU/total               O(poly(m))     O(Tseq)      O(n+m)
Number of rounds        O(1)           O(1)         O(log(n))


Comparison Between Classes

We see that SGC requires each machine to use only constant memory. This means that even if the total memory of the system is smaller than the input data, the algorithm can still be processed successfully. This is an even stronger constraint than that defined in MMC.

Given the constraints on memory, communication, and CPU, it is nearly impossible for a wide range of graph algorithms to be processed in constant rounds in MapReduce.

Thus, we relax the O(1) rounds defined in MMC to O(log(n)) rounds.

Since Ω(log(n)) is the processing time lower bound for a large number of parallel graph algorithms in the parallel random-access machine model, O(log(n)) rounds is practical for the MapReduce framework, as evidenced by our experiments.


Big-O Complexity


Graph Operators in SGC

In addition to the normal set of graph operators, such as union, intersection, etc., we introduce two graph operators in SGC, namely NE join and EN join, using which algorithms for a large range of graph problems can be designed.



We assume that a graph G(V,E) is stored in a distributed file system as a node table V and an edge table E.

Each node in the table has a unique id and some other information such as label and keywords.

Each edge in the table has id1 and id2, defining the source and target node ids of the edge, and some other information such as weight and label.

We use the node id to represent the node where this is unambiguous. G can be either directed or undirected.

For an undirected graph, each edge is stored as two edges (id1, id2) and (id2, id1).



Before we go any further, let’s examine the natural join operation, ⋈, acting on two sets of data.

Here we see a graphical representation of Employee ⋈ Dept.
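Since the figure is not reproduced here, a minimal sketch of a natural join may help: rows from the two tables are combined whenever they agree on every column name the tables share. The table contents, column names, and the function name are made up for illustration.

```python
def natural_join(left, right):
    """Join two lists of dicts on all column names they have in common."""
    if not left or not right:
        return []
    shared = set(left[0]) & set(right[0])
    joined = []
    for l_row in left:
        for r_row in right:
            if all(l_row[c] == r_row[c] for c in shared):
                joined.append({**l_row, **r_row})
    return joined

# Hypothetical Employee and Dept tables sharing the column "dept".
employee = [
    {"name": "Ann", "dept": "Sales"},
    {"name": "Bob", "dept": "IT"},
    {"name": "Cid", "dept": "HR"},
]
dept = [
    {"dept": "Sales", "manager": "Dee"},
    {"dept": "IT", "manager": "Eve"},
]
result = natural_join(employee, dept)
print(result)
```

Cid does not appear in the result because no Dept row has dept = "HR".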


NE Join

An NE join aims to propagate the information on nodes into edges.

For each edge (v_i, v_j) ∈ E, an NE join outputs an edge (v_i, v_j, F(v_i)) (or (v_i, v_j, F(v_j))), where F(v_i) (or F(v_j)) is a set of functions operated on v_i (or v_j) in the node table V.


NE Join

Given a node table Vi and an edge table Ej, an NE join of Vi and Ej is represented in SQL as:

select id1, id2, f1(c1) as p1, f2(c2) as p2, ···
from Vi as V NE join Ej as E on V.id = E.id′
where cond(c)
count cond′(c′) as cnt

With the following definitions:

c, c′, ···    subsets of fields in the two tables Vi and Ej
c1, c2, ···   subsets of fields in the two tables Vi and Ej
fk            a function operated on the fields ck
cond          a function, defined on the fields in c, that returns true or false
cond′         a function, defined on the fields in c′, that returns true or false
id′           either id1 or id2
count         counts the number of trues in cond′(c′) and assigns it to cnt
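The following in-memory sketch mimics what an NE join computes. It is not the MapReduce implementation; the function name `ne_join`, its parameters, and the reading of `count` as "number of joined rows for which cond′ holds" are assumptions made for illustration.

```python
def ne_join(nodes, edges, on, funcs,
            cond=lambda v, e: True, count_cond=lambda v, e: False):
    """Propagate node information onto edges.

    nodes: dict mapping node id -> dict of node fields (node table V)
    edges: list of dicts with keys "id1" and "id2" (edge table E)
    on:    "id1" or "id2", the edge endpoint matched against V.id
    funcs: dict mapping output field name -> function of (node, edge)
    Returns (rows, cnt): the joined edge rows and the number of joined
    rows for which count_cond is true.
    """
    rows, cnt = [], 0
    for e in edges:
        v = nodes.get(e[on])
        if v is None or not cond(v, e):
            continue
        row = {"id1": e["id1"], "id2": e["id2"]}
        for name, f in funcs.items():
            row[name] = f(v, e)
        rows.append(row)
        if count_cond(v, e):
            cnt += 1
    return rows, cnt

# Hypothetical node table: rank r and out-degree d per node.
nodes = {1: {"r": 0.5, "d": 2}, 2: {"r": 0.25, "d": 1}, 3: {"r": 0.25, "d": 1}}
edges = [{"id1": 1, "id2": 2}, {"id1": 1, "id2": 3},
         {"id1": 2, "id2": 3}, {"id1": 3, "id2": 1}]

# Put each source node's share r/d on its outgoing edges.
rows, cnt = ne_join(nodes, edges, on="id1",
                    funcs={"p": lambda v, e: v["r"] / v["d"]})
print(rows, cnt)
```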


EN Join

An EN join aims to aggregate the information on edges into nodes.

For each node v_i ∈ V, an EN join outputs a node (v_i, G(adj(v_i))), where adj(v_i) = {(v_i, v_j) ∈ E}, and G is a set of decomposable aggregate functions on the edge set adj(v_i).

An aggregate function g_k is decomposable if, for any dataset s and any two subsets s_1 and s_2 of s with s_1 ∩ s_2 = ∅ and s_1 ∪ s_2 = s, g_k(s) can be computed using g_k(s_1) and g_k(s_2).


EN Join

EN join can be defined in SQL form as:

select id, g1(c1) as p1, g2(c2) as p2, ···
from Vi as V EN join Ej as E on V.id = E.id′
where cond(c)
group by id
count cond′(c′) as cnt

With the following definitions:

c, c′, ···    subsets of fields in the two tables Vi and Ej
c1, c2, ···   subsets of fields in the two tables Vi and Ej
id′           either id1 or id2
count         counts the number of trues in cond′(c′) and assigns it to cnt
gk            a decomposable aggregate function operated on the fields in ck, by grouping the results using node id
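As a rough in-memory analogue of an EN join, the sketch below groups edges by one endpoint and folds each group with a binary combine step. A binary combine such as addition or min is what makes the aggregate decomposable: partial results from disjoint subsets can be merged. The function name `en_join`, its parameters, and the example data are assumptions; nodes with no matching edge are simply not emitted in this sketch.

```python
def en_join(nodes, edges, on, aggs, cond=lambda v, e: True):
    """Aggregate edge information onto nodes.

    nodes: dict mapping node id -> dict of node fields (node table V)
    edges: list of dicts with keys "id1", "id2", ... (edge table E)
    on:    "id1" or "id2", the edge endpoint matched against V.id
    aggs:  dict mapping output field -> (extract, combine), where extract
           maps (node, edge) to a value and combine merges two partial results
    """
    out = {}
    for e in edges:
        vid = e[on]
        v = nodes.get(vid)
        if v is None or not cond(v, e):
            continue
        row = out.setdefault(vid, {"id": vid})
        for name, (extract, combine) in aggs.items():
            value = extract(v, e)
            row[name] = combine(row[name], value) if name in row else value
    return list(out.values())

# Hypothetical edge table carrying a value p on every edge.
nodes = {1: {}, 2: {}, 3: {}}
edges = [{"id1": 1, "id2": 2, "p": 0.25}, {"id1": 1, "id2": 3, "p": 0.25},
         {"id1": 2, "id2": 3, "p": 0.25}, {"id1": 3, "id2": 1, "p": 0.25}]

# For each target node: sum of incoming p, and the smallest source id.
result = en_join(nodes, edges, on="id2", aggs={
    "s": (lambda v, e: e["p"], lambda a, b: a + b),
    "lo": (lambda v, e: e["id1"], min),
})
print(result)
```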


Basic Graph Algorithms

The combination of NE join and EN join can solve a wide range of graph problems in SGC.

In this section, we introduce some basic graph algorithms:

PageRank

Breadth First Search

Graph Keyword Search

We will use MRC, MMC, and SGC versions of these algorithms for performance testing, which will be covered later.


Page Rank

PageRank is a key graph operation which computes the rank of each node based on the links (directed edges) among them.

Given a directed graph G(V,E), and a page x with inlinks t_1, ..., t_n, the PageRank of x can be calculated iteratively as follows:

$$PR(x) = \alpha \Big(\frac{1}{|V|}\Big) + (1-\alpha) \sum_{i=1}^{n} \frac{PR(t_i)}{C(t_i)}$$

with the following definitions:

C(t)   out-degree of t
α      probability of random jump
|V|    total number of nodes
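The iteration can be sketched directly from the formula. This is a plain sequential version, not the MapReduce algorithm from the paper; the values alpha = 0.15 and 20 iterations are assumptions, and nodes without outgoing edges are not given any special treatment.

```python
def pagerank(nodes, edges, alpha=0.15, iterations=20):
    """Iterate PR(x) = alpha/|V| + (1 - alpha) * sum(PR(t)/C(t)) over inlinks t."""
    out_degree = {v: 0 for v in nodes}
    for u, _ in edges:
        out_degree[u] += 1
    rank = {v: 1.0 / len(nodes) for v in nodes}  # uniform starting ranks
    for _ in range(iterations):
        incoming = {v: 0.0 for v in nodes}
        for u, v in edges:
            # Each page passes an equal share of its rank along every outlink.
            incoming[v] += rank[u] / out_degree[u]
        rank = {v: alpha / len(nodes) + (1 - alpha) * incoming[v] for v in nodes}
    return rank

# Hypothetical 3-page web: a links to b and c, b links to c, c links to a.
ranks = pagerank(["a", "b", "c"], [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")])
print(ranks)
```

In join terms, spreading rank[u] / out_degree[u] over the edges corresponds to an NE join, and summing the shares at each target node corresponds to an EN join.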


Page Rank Algorithm

Graphical overview of the Page Rank algorithm.


Page Rank in MapReduce

Graphical overview of the Page Rank in MapReduce.



Breadth First Search (BFS) is a fundamental graph operation. Given an undirected graph G(V,E) and a source node s, a BFS computes for every node v ∈ V the shortest distance (i.e., the minimum number of hops) from s to v in G.

Define: b is reachable from a if b is on the adjacency list of a.

DistanceTo(s) = 0
For all nodes p reachable from s, DistanceTo(p) = 1
For all nodes n reachable from some other set of nodes M, DistanceTo(n) = 1 + min(DistanceTo(m), m ∈ M)
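The recurrence above can be run round by round, which is how an iterative MapReduce job would proceed: every round relaxes each edge once using the distances from the previous round. The sketch below simulates those rounds sequentially; the function name and the example graph are assumptions for illustration.

```python
import math

def bfs_distances(adj, source):
    """Round-based BFS: each round applies
    DistanceTo(n) = 1 + min(DistanceTo(m)) over neighbours m,
    and the loop stops once a round changes nothing."""
    dist = {v: math.inf for v in adj}
    dist[source] = 0
    rounds = 0
    changed = True
    while changed:
        changed = False
        new_dist = dict(dist)  # read old distances, write new ones
        for m in adj:
            for n in adj[m]:
                if dist[m] + 1 < new_dist[n]:
                    new_dist[n] = dist[m] + 1
                    changed = True
        dist = new_dist
        if changed:
            rounds += 1
    return dist, rounds

# Hypothetical path 0 - 1 - 2 - 3 plus an isolated node 4.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2], 4: []}
dist, rounds = bfs_distances(adj, source=0)
print(dist, rounds)
```

Unreachable nodes keep an infinite distance, and the number of productive rounds equals the largest finite distance from the source.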



Graphical overview of the Breadth First Search algorithm.


Graph Key Word Search

We now investigate a more complex algorithm, namely, keyword search in an undirected graph G(V,E). Suppose that for each v ∈ V, t(v) is the text information included in v. A keyword query and its answers are defined as follows:

Q = {k_1, k_2, ..., k_l}                                           the query, a set of l keywords
(r, (p_1, d(r, p_1)), (p_2, d(r, p_2)), ..., (p_l, d(r, p_l)))     a rooted tree; the answer is a set of rooted trees
r                                                                  the root node
p_i                                                                node that contains keyword k_i in t(p_i)
d(r, p_i)                                                          shortest distance from r to p_i in G, for 1 ≤ i ≤ l

Each answer is uniquely determined by its root node r, and rmax is the maximum distance allowed from the root r to a keyword node in an answer, i.e., d(r, p_i) ≤ rmax for 1 ≤ i ≤ l.
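One simple way to compute such answers on a single machine is to run, for each keyword, a BFS that starts from all nodes containing that keyword and stops after rmax hops; a node is a valid root if every keyword's search reaches it. The sketch below follows that idea. It is not the SGC algorithm from the paper, and the function name, the representation of t(v) as a set of words, and the example graph are assumptions.

```python
from collections import deque

def keyword_search(adj, text, query, rmax):
    """Return {root: [(p_1, d_1), ..., (p_l, d_l)]} for every root that has,
    for each keyword k_i, a node p_i containing k_i within rmax hops."""
    nearest = []  # one dict per keyword: node -> (closest keyword node, distance)
    for k in query:
        found = {v: (v, 0) for v in adj if k in text[v]}
        queue = deque(found)
        while queue:
            u = queue.popleft()
            p, d = found[u]
            if d == rmax:
                continue  # do not expand beyond the allowed radius
            for w in adj[u]:
                if w not in found:
                    found[w] = (p, d + 1)
                    queue.append(w)
        nearest.append(found)
    return {r: [found[r] for found in nearest]
            for r in adj if all(r in found for found in nearest)}

# Hypothetical path a - b - c - d with keywords on the two end nodes.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
text = {"a": {"graph"}, "b": set(), "c": set(), "d": {"join"}}
answers = keyword_search(adj, text, ["graph", "join"], rmax=2)
print(answers)
```

Because the graph is undirected, the distance found from a keyword node to r is also the distance from r to that keyword node.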


Connected Component

Given an undirected graph G(V,E) with n nodes and m edges, a Connected Component (CC) is a maximal set of nodes that can reach each other through paths in G.

Computing all CCs of G is a fundamental graph problem and can be solved efficiently on a sequential machine using O(n+m) time. However, it is non-trivial to solve the problem in MapReduce.


Existing Algorithms

We present three existing algorithms for Connected Components computation in MapReduce, against which the CC algorithm in SGC is compared.

HashToMin

HashGToMin

PRAM-Simulation


HashToMin

HashToMin and HashGToMin are two MapReduce algorithms built on a similar idea: use the smallest node in each CC as the representative of the CC, assuming that there is a total order among all nodes in G.

The HashToMin algorithm finishes in O(log(n)) rounds, with O(log(n)(m+n)) total communication cost in each round.

The algorithm can be optimized to use O(1) memory on each machine using secondary sort in MapReduce.
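The slides do not spell out the steps of HashToMin, so the following is only a sequential simulation of the message rounds as the algorithm is commonly described: each node v keeps a cluster C_v, sends the whole cluster to its smallest member, and sends that smallest member to every node in the cluster. The function name and the example graph are assumptions, and the secondary-sort optimization is not modeled.

```python
def hash_to_min(nodes, edges):
    """Simulate HashToMin rounds; return (label per node, rounds used)."""
    # Initially C_v holds v and its neighbours.
    cluster = {v: {v} for v in nodes}
    for u, v in edges:
        cluster[u].add(v)
        cluster[v].add(u)
    rounds = 0
    while True:
        received = {v: set() for v in nodes}
        for v in nodes:
            v_min = min(cluster[v])
            received[v_min] |= cluster[v]   # send C_v to the smallest member
            for u in cluster[v]:
                received[u].add(v_min)      # send the smallest member to all of C_v
        if received == cluster:             # nothing changed: converged
            break
        cluster = received
        rounds += 1
    return {v: min(cluster[v]) for v in nodes}, rounds

# Hypothetical graph: a path 1 - 2 - 3 - 4 and a separate edge 5 - 6.
labels, rounds = hash_to_min([1, 2, 3, 4, 5, 6], [(1, 2), (2, 3), (3, 4), (5, 6)])
print(labels, rounds)
```

At convergence the smallest node of each CC holds the whole component and every other node holds only that smallest node, which serves as the component label.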


HashGToMin

The HashGToMin algorithm is expected to finish in O(log(n)) rounds, with O(m+n) total communication cost in each round.

However, it needs O(n) memory for a single machine to hold a whole CC in memory.

Thus, HashGToMin is not suitable to handle a graph with large n.


PRAM Simulation

PRAM-Simulation simulates, in MapReduce, an algorithm designed for the Parallel Random Access Machine (PRAM) model. The PRAM model allows multiple processors to compute in parallel using a shared memory.

A theoretical result shows that a CREW PRAM algorithm that runs in O(t) time can be simulated in MapReduce in O(t) rounds. For the CC computation problem, the best result in the literature computes CCs in O(log(n)) time.

However, it needs to compute the 2-hop node pairs, which requires O(n^2) communication cost in the worst case in each round. Thus, the simulation algorithm is impractical.


Connected Component in SGC

We introduce our algorithm to compute CCs in SGC. Conceptually, the algorithm shares similar ideas with most deterministic O(log(n)) PRAM algorithms, but it is non-trivial.

Our algorithm maintains a forest using a parent pointer p(v) for each v ∈ V. Each rooted tree in the forest represents a partial CC.

A singleton is a tree with one node, and a star is a tree of height 1.

A tree is an isolated tree if there are no edges in E that connect the tree to another tree.

The forest is iteratively updated using two operations: hooking and pointer jumping. Hooking merges several trees into a larger tree, and pointer jumping changes the parent of each node to its grandparent in each tree.

When the algorithm ends, each tree becomes an isolated star that represents a CC in the graph.
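To illustrate the two operations, here is a deliberately simplified sequential sketch. It hooks the root with the larger id under the smaller parent and then applies pointer jumping, repeating until nothing changes. It is not the paper's algorithm and carries no O(log(n)) round guarantee; the function name and the hooking rule are assumptions chosen to keep the example short.

```python
def connected_components(nodes, edges):
    """Return a parent pointer p(v) per node; on exit every tree is a star
    whose root is the smallest node id of its connected component."""
    parent = {v: v for v in nodes}  # start with singletons
    changed = True
    while changed:
        changed = False
        # Hooking: an edge between two trees merges them by attaching
        # the larger root under the smaller parent.
        for u, v in edges:
            pu, pv = parent[u], parent[v]
            if pu != pv:
                hi, lo = max(pu, pv), min(pu, pv)
                if parent[hi] == hi:  # only roots are hooked
                    parent[hi] = lo
                    changed = True
        # Pointer jumping: replace each parent by the grandparent.
        for v in nodes:
            grandparent = parent[parent[v]]
            if grandparent != parent[v]:
                parent[v] = grandparent
                changed = True
    return parent

# Hypothetical graph: component {1, 2, 3}, component {4, 5}, isolated node 6.
parent = connected_components([1, 2, 3, 4, 5, 6], [(1, 2), (2, 3), (4, 5)])
print(parent)
```

Parent pointers only ever decrease, so no cycle can form, and the loop can only stop when every tree is a star and no edge joins two different stars.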


Comparison

We can now compare the running times of these algorithms. We omit PRAM-Simulation since it is impractical.

Note that the CC algorithm in the SGC class has the best bounds in each category. This indicates the significant improvement that SGC represents for scalable big graph processing.


Minimum Spanning Forest

Given a weighted undirected graph G(V,E) of n nodes and m edges, with each edge (u, v) ∈ E assigned a weight w((u, v)), a Minimum Spanning Forest (MSF) is a spanning forest of G with the minimum total edge weight.

We also use (u, v, w((u, v))) to denote an edge.

Although an MSF can be efficiently computed on a sequential machine using O(m + n log(n)) time, it is non-trivial to solve the problem in MapReduce.


Minimum Spanning Forest

The following is an example of a Minimum Spanning Tree. A forest is made up of many trees.


MSF Algorithm in SGC

Suppose there is a total order among all edges as follows. For any two edges e1 = (u1, v1, w1) and e2 = (u2, v2, w2), e1 < e2 iff one of the following conditions holds:

1 w1 < w2

2 w1 = w2 and min(u1, v1) < min(u2, v2)

3 w1 = w2 and min(u1, v1) = min(u2, v2), and max(u1, v1) < max(u2, v2)
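The three conditions amount to a lexicographic comparison of the triple (w, min(u, v), max(u, v)), which can be written as a sort key. The function name and the example edges below are assumptions for illustration.

```python
def edge_key(edge):
    """Sort key realising the total order on edges (u, v, w):
    compare by weight, then by smaller endpoint, then by larger endpoint."""
    u, v, w = edge
    return (w, min(u, v), max(u, v))

# Hypothetical edges (u, v, w); three of them tie on weight 2.0.
edges = [(3, 4, 2.0), (2, 4, 1.0), (1, 5, 2.0), (3, 1, 2.0)]
ordered = sorted(edges, key=edge_key)
print(ordered)
```

Breaking ties this way means no two distinct edges compare as equal, so "the smallest edge leaving a tree" is always well defined.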


MSF Comparisons

The comparison of two existing algorithms, OneRoundMSF and MultiRoundMSF, and our algorithm MSF is shown below in terms of memory consumption per machine, total communication cost per round, and the number of rounds.

As we will show in our performance testing, the high memory requirement of OneRoundMSF and MultiRoundMSF becomes the bottleneck for the algorithms to achieve high scalability when handling graphs with large n.


Performance Testing

We tested the performance of the aforementioned algorithms on a cluster of 17 computing nodes, including one master node and 16 slave nodes, each of which has four Intel Xeon 2.4GHz CPUs and 15GB RAM and runs 64-bit Ubuntu Linux.

We implement all algorithms using Hadoop (version 1.2.1) with Java 1.6.

We allow each node to run three mappers and three reducers concurrently.


Data Sets

We use two web-scale graphs, Twitter-2010 and Friendster, with different graph characteristics for testing.

Twitter-2010 contains 41,652,230 nodes and 1,468,365,182 edges with an average degree of 71. The maximum degree is 3,081,112 and the diameter of Twitter-2010 is around 24.

Friendster contains 65,608,366 nodes and 1,806,067,135 edges with an average degree of 55. The maximum degree is 5,214 and the diameter of Friendster is around 32.


Algorithms

Besides the five algorithms PageRank (Algorithm 1), BFS (Algorithm 2), KWS (Algorithm 3), CC (Algorithm 4), and MSF (Algorithm 5), we also implement the algorithms for PageRank, BFS, and graph keyword search using the join operations supported by Pig on Hadoop, denoted PageRank-Pig, BFS-Pig, and KWS-Pig respectively.


PageRank Algorithm


BFS Algorithm


CC Algorithm


MSF Algorithm


Conclusions

In this paper, we studied scalable big graph processing in MapReduce.

We reviewed previous MapReduce classes and proposed a new class, SGC, to guide the development of scalable graph processing algorithms in MapReduce. We introduced two graph join operators, using which a large range of graph algorithms can be designed in SGC.

In particular, for two fundamental graph problems, CC computation and MSF computation, we improved the state-of-the-art algorithms both in theory and in practice. We conducted extensive performance studies using real web-scale graphs to show the high scalability achieved by our algorithms in SGC.

