Joey gonzalez, graph lab, m lconf 2013

44
Joseph Gonzalez Co-Founder, GraphLab Inc. [email protected] Postdoc, UC Berkeley AMPLab [email protected] Machine Learning on Graphs

description

Joseph Gonzalez, Co-Founder at GraphLab: Large-Scale Machine Learning on Graphs

Transcript of Joey gonzalez, graph lab, m lconf 2013

Page 1: Joey gonzalez, graph lab, m lconf 2013

Joseph Gonzalez Co-Founder, GraphLab Inc.

[email protected] Postdoc, UC Berkeley AMPLab

[email protected]

Machine Learning on Graphs

6. Before

8. After

7. After

Page 2: Joey gonzalez, graph lab, m lconf 2013

2!

Big Graphs Data

More Signal

More Noise

Page 3: Joey gonzalez, graph lab, m lconf 2013

Social Media

Graphs encode relationships between:

Big: billions of vertices and edges & rich metadata Facebook  (10/2012):  1B  users,  144B  friendships    Twi>er  (2011):  15B  follower  edges  

Advertising Science Web

People Facts

Products Interests

Ideas

3

Page 4: Joey gonzalez, graph lab, m lconf 2013

Graphs are Essential to "Data Mining and Machine Learning Identify influential people and information Find communities

Understand people’s shared interests Model complex data dependencies

Page 5: Joey gonzalez, graph lab, m lconf 2013

Liberal Conservative

Post

Post

Post

Post

Post

Post

Post

Post

Predicting User Behavior

Post

Post

Post

Post

Post

Post

Post  

Post  

Post  

Post  

Post  

Post  

Post  

Post  

?  ?  

?  

?  

?  ?  

?  

?   ?  ?  

?  

?  

?  ?  

?  ?  

?  

?  

?  

?  

?  

?  

?  

?  

?  

?  

?  

?   ?  

?  

5  

Conditional Random Field!Belief Propagation!

Page 6: Joey gonzalez, graph lab, m lconf 2013

Count triangles passing through each vertex: "

Measures “cohesiveness” of local community

More Triangles Stronger Community

Fewer Triangles Weaker Community

1 2 3

4

Finding Communities

Page 7: Joey gonzalez, graph lab, m lconf 2013

Ratings Items

Recommending Products Users

Page 8: Joey gonzalez, graph lab, m lconf 2013

Low-Rank Matrix Factorization:

8

r13

r14

r24

r25

f(1)

f(2)

f(3)

f(4)

f(5) User

Fac

tors

(U)

Movie Factors (M

) Us

ers Movies

Netflix Us

ers

≈ x

Movies

f(i)

f(j)

Iterate:

f [i] = arg minw2Rd

X

j2Nbrs(i)

�rij � wT f [j]

�2+ �||w||22

Recommending Products

Page 9: Joey gonzalez, graph lab, m lconf 2013

9  

Identifying Leaders

Page 10: Joey gonzalez, graph lab, m lconf 2013

Everyone starts with equal ranks Update ranks in parallel Iterate until convergence

Rank of user i Weighted sum of

neighbors’ ranks

10  

R[i] = 0.15 +X

j2Nbrs(i)

wjiR[j]

Identifying Leaders

Page 11: Joey gonzalez, graph lab, m lconf 2013

Graph-Parallel Algorithms

11  

Model / Alg. State

Computation depends only on the neighbors

Page 12: Joey gonzalez, graph lab, m lconf 2013

Many More Graph Algorithms •  Collaborative Filtering!

–  Alternating Least Squares!–  Stochastic Gradient Descent!–  Tensor Factorization!–  SVD!

•  Structured Prediction!–  Loopy Belief Propagation!–  Max-Product Linear

Programs!–  Gibbs Sampling!

•  Semi-supervised ML!– Graph SSL !– CoEM!

•  Graph Analytics!–  PageRank!–  Shortest Path!–  Triangle-Counting!–  Graph Coloring!–  K-core Decomposition!–  Personalized PageRank!

•  Classification!– Neural Networks!–  Lasso!…!

12

Page 13: Joey gonzalez, graph lab, m lconf 2013

How should we program"graph-parallel algorithms?

13

Page 14: Joey gonzalez, graph lab, m lconf 2013

Dependency Graph

Table

Structure of Computation

14

Result

Data-Parallel Graph-Parallel

Row

Row

Row

Row

6. Before

8. After

7. After

Page 15: Joey gonzalez, graph lab, m lconf 2013

How should we program"graph-parallel algorithms?

“Think like a Vertex.” - Pregel [SIGMOD’10]

15

Page 16: Joey gonzalez, graph lab, m lconf 2013

The Graph-Parallel Abstraction A user-defined Vertex-Program runs on each vertex Graph constrains interaction along edges

Using messages (e.g. Pregel [PODC’09, SIGMOD’10]) Through shared state (e.g., GraphLab [UAI’10, VLDB’12])

Parallelism: run multiple vertex programs simultaneously 16

Page 17: Joey gonzalez, graph lab, m lconf 2013

The GraphLab Vertex Program Vertex Programs directly access adjacent vertices and edges

GraphLab_PageRank(i)        //  Compute  sum  over  neighbors      total  =  0      foreach  (j  in  neighbors(i)):            total  =  total  +  R[j]  *  wji        //  Update  the  PageRank      R[i]  =  0.15  +  total          //  Trigger  neighbors  to  run  again      if  R[i]  not  converged  then        signal  nbrsOf(i)  to  be  recomputed  

17  

R[4]  *  w41  

+  +  

4 1

3 2

Signaled vertices are recomputed eventually.

Page 18: Joey gonzalez, graph lab, m lconf 2013

Convergence of Dynamic PageRank

1  100  

10000  1000000  

100000000  

0   10   20   30   40   50   60   70  

Num

-­‐Ver1ces  

Number  of  Updates  

51%  updated  only  once!  

Be>er  

18  

Page 19: Joey gonzalez, graph lab, m lconf 2013

Adaptive Belief Propagation

Noisy “Sunset” Image

Cumulative Vertex Updates

Many Updates

Few Updates

Algorithm identifies and focuses on hidden sequential structure

Graphical Model

Challenge = Boundaries

Splash  

Page 20: Joey gonzalez, graph lab, m lconf 2013

BeDer  for  Machine  Learning  

Graph-­‐parallel  Abstrac(ons  

20  

Shared  State  

6. Before

8. After

7. After

i

Dynamic  Asynchronous  

 Messaging  

i

Synchronous  

Page 21: Joey gonzalez, graph lab, m lconf 2013

21  

Natural GraphsGraphs derived from natural

phenomena

Page 22: Joey gonzalez, graph lab, m lconf 2013

Properties of Natural Graphs

22

Power-Law Degree Distribution

Regular Mesh Natural Graph

Page 23: Joey gonzalez, graph lab, m lconf 2013

Power-Law Degree Distribution

23

“Star Like” Motif

President Obama Followers

Page 24: Joey gonzalez, graph lab, m lconf 2013

Challenges  of  High-­‐Degree  VerMces  

Touches  a  large  fracMon  of  graph  

SequenMally  process  edges  

24  

CPU 1 CPU 2

Provably  Difficult  to  ParMMon  

Page 25: Joey gonzalez, graph lab, m lconf 2013

Machine  1   Machine  2  

Random  ParMMoning  

•  GraphLab  resorts  to  random  (hashed)  parMMoning  on  natural  graphs  

3"2"

1"

D

A"

C"

B" 2"3"

C"

D

B"A"

1"

D

A"

C"C"

B"

(a) Edge-Cut

B"A" 1"

C" D3"

C" B"2"

C" D

B"A" 1"

3"

(b) Vertex-Cut

Figure 4: (a) An edge-cut and (b) vertex-cut of a graph intothree parts. Shaded vertices are ghosts and mirrors respectively.

5 Distributed Graph Placement

The PowerGraph abstraction relies on the distributed data-graph to store the computation state and encode the in-teraction between vertex programs. The placement ofthe data-graph structure and data plays a central role inminimizing communication and ensuring work balance.

A common approach to placing a graph on a cluster of pmachines is to construct a balanced p-way edge-cut (e.g.,Fig. 4a) in which vertices are evenly assigned to machinesand the number of edges spanning machines is minimized.Unfortunately, the tools [21, 31] for constructing balancededge-cuts perform poorly [1, 26, 23] or even fail on power-law graphs. When the graph is difficult to partition, bothGraphLab and Pregel resort to hashed (random) vertexplacement. While fast and easy to implement, hashedvertex placement cuts most of the edges:

Theorem 5.1. If vertices are randomly assigned to pmachines then the expected fraction of edges cut is:

E|Edges Cut|

|E|

�= 1� 1

p(5.1)

For example if just two machines are used, half of theof edges will be cut requiring order |E|/2 communication.

5.1 Balanced p-way Vertex-CutThe PowerGraph abstraction enables a single vertex pro-gram to span multiple machines. Hence, we can ensurework balance by evenly assigning edges to machines.Communication is minimized by limiting the number ofmachines a single vertex spans. A balanced p-way vertex-cut formalizes this objective by assigning each edge e2 Eto a machine A(e) 2 {1, . . . , p}. Each vertex then spansthe set of machines A(v)✓ {1, . . . , p} that contain its ad-jacent edges. We define the balanced vertex-cut objective:

minA

1|V | Â

v2V|A(v)| (5.2)

s.t. maxm

|{e 2 E | A(e) = m}|< l |E|p

(5.3)

where the imbalance factor l � 1 is a small constant. Weuse the term replicas of a vertex v to denote the |A(v)|copies of the vertex v: each machine in A(v) has a replicaof v. The objective term (Eq. 5.2) therefore minimizes the

average number of replicas in the graph and as a conse-quence the total storage and communication requirementsof the PowerGraph engine.

Vertex-cuts address many of the major issues associatedwith edge-cuts in power-law graphs. Percolation theory[3] suggests that power-law graphs have good vertex-cuts.Intuitively, by cutting a small fraction of the very highdegree vertices we can quickly shatter a graph. Further-more, because the balance constraint (Eq. 5.3) ensuresthat edges are uniformly distributed over machines, wenaturally achieve improved work balance even in the pres-ence of very high-degree vertices.

The simplest method to construct a vertex cut is torandomly assign edges to machines. Random (hashed)edge placement is fully data-parallel, achieves nearly per-fect balance on large graphs, and can be applied in thestreaming setting. In the following we relate the expectednormalized replication factor (Eq. 5.2) to the number ofmachines and the power-law constant a .

Theorem 5.2 (Randomized Vertex Cuts). Let D[v] denotethe degree of vertex v. A uniform random edge placementon p machines has an expected replication factor

E"

1|V | Â

v2V|A(v)|

#=

p|V | Â

v2V

1�✓

1� 1p

◆D[v]!. (5.4)

For a graph with power-law constant a we obtain:

E"

1|V | Â

v2V|A(v)|

#= p� pLia

✓p�1

p

◆/z (a) (5.5)

where Lia (x) is the transcendental polylog function andz (a) is the Riemann Zeta function (plotted in Fig. 5a).

Higher a values imply a lower replication factor, con-firming our earlier intuition. In contrast to a random 2-way edge-cut which requires order |E|/2 communicationa random 2-way vertex-cut on an a = 2 power-law graphrequires only order 0.3 |V | communication, a substantialsavings on natural graphs where E can be an order ofmagnitude larger than V (see Tab. 1a).

5.2 Greedy Vertex-CutsWe can improve upon the randomly constructed vertex-cut by de-randomizing the edge-placement process. Theresulting algorithm is a sequential greedy heuristic whichplaces the next edge on the machine that minimizes theconditional expected replication factor. To construct thede-randomization we consider the task of placing the i+1edge after having placed the previous i edges. Using theconditional expectation we define the objective:

argmink

E"

Âv2V

|A(v)|

����� Ai,A(ei+1) = k

#(5.6)

6

10  Machines  !  90%  of  edges  cut  100  Machines  !  99%  of  edges  cut!  

25  

Page 26: Joey gonzalez, graph lab, m lconf 2013

Machine 1 Machine 2

•  Split  High-­‐Degree  verMces  •  New  Abstrac1on  !  Equivalence  on  Split  Ver(ces  

26  

Program  For  This  

Run  on  This  

Page 27: Joey gonzalez, graph lab, m lconf 2013

Gather  Informa1on  About  Neighborhood  

Update  Vertex  

Signal  Neighbors  &  Modify  Edge  Data  

A Common Pattern inVertex Programs

GraphLab_PageRank(i)        //  Compute  sum  over  neighbors      total  =  0      foreach(  j  in  neighbors(i)):            total  =  total  +  R[j]  *  wji        //  Update  the  PageRank      R[i]  =  total          //  Trigger  neighbors  to  run  again      priority  =  |R[i]  –  oldR[i]|      if  R[i]  not  converged  then          signal  neighbors(i)  with  priority    

27  

Page 28: Joey gonzalez, graph lab, m lconf 2013

Machine  2  Machine  1  

Machine  4  Machine  3  

GAS Decomposition

Σ1   Σ2  

Σ3   Σ4  

+                        +                        +      

Y  Y  Y  Y  

Y’  

Σ  

Y’  Y’  Y’  Gather  

Apply  

Sca>er  

28  

Master  

Mirror  

Mirror  Mirror  

Page 29: Joey gonzalez, graph lab, m lconf 2013

Minimizing Communication in PowerGraph

Y Y Y

A vertex-cut minimizes "machines each vertex spans

Percolation theory suggests that power law graphs have good vertex cuts. [Albert et al. 2000]

Communication is linear in "the number of machines "

each vertex spans

29

Page 30: Joey gonzalez, graph lab, m lconf 2013

EC2  HPC  Nodes  

MPI/TCP-­‐IP   PThreads   HDFS  

GraphLab2  System  

Graph    AnalyMcs  

Graphical  Models  

Computer  Vision   Clustering   Topic  

Modeling  CollaboraMve  

Filtering  

Machine Learning and Data-Mining Toolkits

Apache 2 License http://graphlab.org

Page 31: Joey gonzalez, graph lab, m lconf 2013

PageRank on Twitter Follower Graph Natural Graph with 40M Users, 1.4 Billion Links

Hadoop results from [Kang et al. '11] Twister (in-memory MapReduce) [Ekanayake et al. ‘10]

31

0   50   100   150   200  

Hadoop  

GraphLab  

Twister  

Piccolo  

PowerGraph  

Run1me  Per  Itera1on  

Order of magnitude by exploiting

properties of Natural Graphs

Page 32: Joey gonzalez, graph lab, m lconf 2013

GraphLab2 is Scalable Yahoo Altavista Web Graph (2002):

One of the largest publicly available web graphs

1.4 Billion Webpages, 6.6 Billion Links

1024 Cores (2048 HT) 64 HPC Nodes

7 Seconds per Iter. 1B links processed per second

30 lines of user code

32

Page 33: Joey gonzalez, graph lab, m lconf 2013

Topic Modeling English language Wikipedia

–  2.6M Documents, 8.3M Words, 500M Tokens –  Computationally intensive algorithm

33  

0   20   40   60   80   100   120   140   160  

Smola  et  al.  

PowerGraph  

Million  Tokens  Per  Second  

100 Yahoo! Machines Specifically engineered for this task

64 cc2.8xlarge EC2 Nodes 200 lines of code & 4 human hours

Page 34: Joey gonzalez, graph lab, m lconf 2013

Counted: 34.8 Billion Triangles

34  

Triangle Counting on Twitter

64 Machines 15 Seconds

1536 Machines 423 Minutes

Hadoop���[WWW’11]

S.  Suri  and  S.  Vassilvitskii,  “CounMng  triangles  and  the  curse  of  the  last  reducer,”  WWW’11  

1000 x Faster

40M Users, 1.4 Billion Links

Page 35: Joey gonzalez, graph lab, m lconf 2013

Orders of magnitude improvements over existing systems

New ways execute graph algorithms

Machine 1 Machine 2

New ways to represent real-world graphs

6. Before

8. After

7. After

By exploiting common patterns in graph data and computation:

Page 36: Joey gonzalez, graph lab, m lconf 2013

6. Before

8. After

7. After

Possibility

Scalability

Usability

Page 37: Joey gonzalez, graph lab, m lconf 2013

Exciting Time to Work in ML

With ML, I will" cure cancer!!!

With ML I will "find true love.

Why won’t "ML read"

my mind???

L Building scalable learning system requires experts …

J Unique opportunities to change the world!!

Page 38: Joey gonzalez, graph lab, m lconf 2013

ML key to any new service we want to build

But…

Even  basics  of  scalable  ML  can  be  challenging  

>6  months  from  prototype  to  producMon  

State-­‐of-­‐art  ML  algorithms  trapped  in  research  papers  

Goal of GraphLab 3: Make large-scale machine learning accessible to all! J

Page 39: Joey gonzalez, graph lab, m lconf 2013

EC2  HPC  Nodes  

MPI/TCP-­‐IP   PThreads   HDFS  

GraphLab2  System  

Graph    AnalyMcs  

Graphical  Models  

Computer  Vision   Clustering   Topic  

Modeling  CollaboraMve  

Filtering  

Adding a Python Layer

Python  API  

Page 40: Joey gonzalez, graph lab, m lconf 2013

Learning ML with GraphLab Notebook

https://beta.graphlab.com/examples!

Page 41: Joey gonzalez, graph lab, m lconf 2013

Prototype to Productionwith Python GraphLab:

Easily install & prototype locally

Deploy to the cluster in one step

Page 42: Joey gonzalez, graph lab, m lconf 2013

Learn: GraphLab Notebook

Prototype: pip install graphlab

è local prototyping

Production: Same code scales

to EC2 cluster

Page 43: Joey gonzalez, graph lab, m lconf 2013

GraphLab Toolkits

Highly scalable, state-of-the-art machine learning straight from python

Graph Analytics

GraphicalModels

ComputerVision Clustering Topic

Modeling Collaborative

Filtering

Page 44: Joey gonzalez, graph lab, m lconf 2013

Joseph Gonzalez Co-Founder, GraphLab Inc.

[email protected]

[email protected]

NIPS Workshop on Big Learning: biglearn.org"Lake Tahoe, December 9th

Machine Learning on Graphs

6. Before

8. After

7. After