Large Graph Algorithms

31
CMU SCS C. Faloutsos (CMU) #1 Large Graph Algorithms Christos Faloutsos CMU McGlohon, Mary Prakash, Aditya Tong, Hanghang Tsourakakis, Babis Akoglu, Leman Chau, Polo Kang, U OpenCirrus'10

description

Large Graph Algorithms. Christos Faloutsos CMU. Akoglu, Leman Chau, Polo Kang, U. McGlohon, Mary Prakash, Aditya Tong, Hanghang Tsourakakis, Babis. Graphs - why should we care?. Internet Map [lumeta.com]. Food Web [Martinez ’91]. Protein Interactions [genomebiology.com]. - PowerPoint PPT Presentation

Transcript of Large Graph Algorithms

Page 1: Large Graph Algorithms

CMU SCS

C. Faloutsos (CMU) #1

Large Graph Algorithms

Christos Faloutsos

CMU

McGlohon, MaryPrakash, AdityaTong, HanghangTsourakakis, Babis

Akoglu, LemanChau, PoloKang, U

OpenCirrus'10

Page 2: Large Graph Algorithms

CMU SCS

ICDM-LDMTA 2009 C. Faloutsos 2

Graphs - why should we care?

Internet Map [lumeta.com]

Food Web [Martinez ’91]

Protein Interactions [genomebiology.com]

Friendship Network [Moody ’01]

Page 3: Large Graph Algorithms

CMU SCS

ICDM-LDMTA 2009 C. Faloutsos 3

Graphs - why should we care?

• IR: bi-partite graphs (doc-terms)

• web: hyper-text graph

• Social networking sites (Facebook, twitter)

• Users posing and answering questions

• Click-streams (user – page bipartite graph)

• ... and more – any M:N db relationship

D1

DN

T1

TM

... ...

Page 4: Large Graph Algorithms

CMU SCS

C. Faloutsos (CMU) 4

Our goal:

One-stop solution for mining huge graphs:

PEGASUS project (PEta GrAph mining System)

• www.cs.cmu.edu/~pegasus

• Open-source code and papers

OpenCirrus'10

Page 5: Large Graph Algorithms

CMU SCS

C. Faloutsos (CMU) 5

Centralized Hadoop/PEGASUS

Degree Distr. old old

Pagerank old old

Diameter/ANF old DONE

Conn. Comp old DONE

Triangles DONE

Visualization STARTED

Outline – Algorithms & results

OpenCirrus'10

Page 6: Large Graph Algorithms

CMU SCS

HADI for diameter estimation

• Radius Plots for Mining Tera-byte Scale Graphs U Kang, Charalampos Tsourakakis, Ana Paula Appel, Christos Faloutsos, Jure Leskovec, SDM’10

• Naively: diameter needs O(N**2) space and up to O(N**3) time – prohibitive (N~1B)

• Our HADI: linear on E (~10B)– Near-linear scalability wrt # machines– Several optimizations -> 5x faster

C. Faloutsos (CMU) 6OpenCirrus'10

Page 7: Large Graph Algorithms

CMU SCS

YahooWeb graph (120Gb, 1.4B nodes, 6.6 B edges)• Largest publicly available graph ever studied.

????????

??

19+? [Barabasi+]

7C. Faloutsos (CMU)OpenCirrus'10

Radius

Count

Page 8: Large Graph Algorithms

CMU SCS

YahooWeb graph (120Gb, 1.4B nodes, 6.6 B edges)• effective diameter: surprisingly small.• Multi-modality: probably mixture of cores .

8C. Faloutsos (CMU)OpenCirrus'10

Page 9: Large Graph Algorithms

CMU SCS

YahooWeb graph (120Gb, 1.4B nodes, 6.6 B edges)• effective diameter: surprisingly small.• Multi-modality: probably mixture of cores .

9C. Faloutsos (CMU)OpenCirrus'10

Page 10: Large Graph Algorithms

CMU SCS

Radius Plot of GCC of YahooWeb.

10C. Faloutsos (CMU)OpenCirrus'10

Page 11: Large Graph Algorithms

CMU SCS

Running time - Kronecker and Erdos-Renyi Graphs with billions edges.

#11C. Faloutsos (CMU)OpenCirrus'10

Page 12: Large Graph Algorithms

CMU SCS

C. Faloutsos (CMU) 12

Centralized Hadoop/PEGASUS

Degree Distr. old old

Pagerank old old

Diameter/ANF old DONE

Conn. Comp old DONE

Triangles DONE

Visualization STARTED

Outline – Algorithms & results

OpenCirrus'10

Page 13: Large Graph Algorithms

CMU SCS

Generalized Iterated Matrix Vector Multiplication (GIMV)

OpenCirrus'10 C. Faloutsos (CMU) 13

PEGASUS: A Peta-Scale Graph Mining System - Implementation and Observations. U Kang, Charalampos E. Tsourakakis, and Christos Faloutsos. (ICDM) 2009, Miami, Florida, USA. Best Application Paper (runner-up).

Page 14: Large Graph Algorithms

CMU SCS

Generalized Iterated Matrix Vector Multiplication (GIMV)

OpenCirrus'10 C. Faloutsos (CMU) 14

• PageRank• proximity (RWR)• Diameter• Connected components• (eigenvectors, • Belief Prop. • … )

Matrix – vectorMultiplication(iterated)

Page 15: Large Graph Algorithms

CMU SCS

15

Example: GIM-V At Work

• Connected Components

Size

Count

C. Faloutsos (CMU)OpenCirrus'10

Page 16: Large Graph Algorithms

CMU SCS

16

Example: GIM-V At Work

• Connected Components

Size

Count300-size cmptX 500.Why?

1100-size cmptX 65.Why?

C. Faloutsos (CMU)OpenCirrus'10

Page 17: Large Graph Algorithms

CMU SCS

17

Example: GIM-V At Work

• Connected Components

Size

Count

suspiciousfinancial-advice sites(not existing now)

C. Faloutsos (CMU)OpenCirrus'10

Page 18: Large Graph Algorithms

CMU SCS

C. Faloutsos (CMU) 18

Centralized Hadoop/PEGASUS

Degree Distr. old old

Pagerank old old

Diameter/ANF old DONE

Conn. Comp old DONE

Triangles DONE

Visualization STARTED

Outline – Algorithms & results

OpenCirrus'10

Page 19: Large Graph Algorithms

CMU SCS

ASONAM 2009 C. Faloutsos 19

Triangles

• Real social networks have a lot of triangles

Page 20: Large Graph Algorithms

CMU SCS

ASONAM 2009 C. Faloutsos 20

Triangles

• Real social networks have a lot of triangles– Friends of friends are friends

• Q1: how to compute quickly?

• Q2: Any patterns?

Page 21: Large Graph Algorithms

CMU SCS

ASONAM 2009 C. Faloutsos 21

Triangles : Computations [Tsourakakis ICDM 2008]

Q: Can we do that quickly?

Triangles are expensive to compute(3-way join; several approx. algos)

Page 22: Large Graph Algorithms

CMU SCS

ASONAM 2009 C. Faloutsos 22

Triangles : Computations [Tsourakakis ICDM 2008]

But: triangles are expensive to compute(3-way join; several approx. algos)

Q: Can we do that quickly?A: Yes!

#triangles = 1/6 Sum ( i3 )

(and, because of skewness, we only need the top few eigenvalues!

Page 23: Large Graph Algorithms

CMU SCS

ASONAM 2009 C. Faloutsos 23

Triangles : Computations [Tsourakakis ICDM 2008]

1000x+ speed-up, high accuracy

Page 24: Large Graph Algorithms

CMU SCS

C. Faloutsos (CMU) 24

Triangles• Easy to implement on hadoop: it only needs

eigenvalues (working on it, using Lanczos)

OpenCirrus'10

Page 25: Large Graph Algorithms

CMU SCS

ASONAM 2009 C. Faloutsos 25

Triangles

• Real social networks have a lot of triangles– Friends of friends are friends

• Q1: how to compute quickly?

• Q2: Any patterns?

Page 26: Large Graph Algorithms

CMU SCS

ASONAM 2009 C. Faloutsos 26

Triangle Law: #1 [Tsourakakis ICDM 2008]

ASNHEP-TH

Epinions X-axis: # of Triangles a node participates inY-axis: count of such nodes

Page 27: Large Graph Algorithms

CMU SCS

ASONAM 2009 C. Faloutsos 27

Triangle Law: #2 [Tsourakakis ICDM 2008]

SNReuters

EpinionsX-axis: degreeY-axis: mean # trianglesNotice: slope ~ degree exponent (insets)

Page 28: Large Graph Algorithms

CMU SCS

C. Faloutsos (CMU) 28

Centralized Hadoop/PEGASUS

Degree Distr. old old

Pagerank old old

Diameter/ANF old DONE

Conn. Comp old DONE

Triangles DONE

Visualization STARTED

Outline – Algorithms & results

OpenCirrus'10

Page 29: Large Graph Algorithms

CMU SCS

Visualization: ShiftR

• Supporting Ad Hoc Sensemaking: Integrating Cognitive, HCI, and Data Mining ApproachesAniket Kittur, Duen Horng (‘Polo’) Chau, Christos Faloutsos, Jason I. HongSensemaking Workshop at CHI 2009, April 4-5. Boston, MA, USA.

OpenCirrus'10 C. Faloutsos (CMU) 29

Page 30: Large Graph Algorithms

CMU SCS

Page 31: Large Graph Algorithms

CMU SCS

C. Faloutsos (CMU) 31

Conclusions

One-stop shopping for large graph mining:

• www.cs.cmu.edu/~pegasus

OpenCirrus'10

Akoglu, Leman

Chau, PoloKang, U

McGlohon, MaryTsourakakis, Babis

THANKS: NSF, Yahoo (M45), LLNL