Mining Billion-Node Graphs: Patterns and influence...

114
CMU SCS Mining Billion-Node Graphs: Patterns and influence propagation Christos Faloutsos CMU

Transcript of Mining Billion-Node Graphs: Patterns and influence...

Page 1: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

Mining Billion-Node Graphs: Patterns and influence

propagation Christos Faloutsos

CMU

Page 2: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

Thank you! •  Sinan Aral •  Foster Provost •  Arun Sundararajan

•  Shirley Lau

WIN'11 C. Faloutsos (CMU) 2

Page 3: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

C. Faloutsos (CMU) 3

Our goal:

Open source system for mining huge graphs:

PEGASUS project (PEta GrAph mining System)

•  www.cs.cmu.edu/~pegasus •  code and papers

WIN'11

Page 4: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

C. Faloutsos (CMU) 4

Outline

•  Introduction – Motivation •  Problem#1: Patterns in graphs •  Problem#2: Tools •  Conclusions

WIN'11

Page 5: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

C. Faloutsos (CMU) 5

Graphs - why should we care?

Internet Map [lumeta.com]

Food Web [Martinez ’91]

Recommendation

systems

WIN'11

Page 6: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

C. Faloutsos (CMU) 6

Outline

•  Introduction – Motivation •  Problem#1: Patterns in graphs

–  overview – Triangles – Diameter –  `Eigenspokes’ – Phonecall duration

•  Problem#2: Tools •  Conclusions WIN'11

Page 7: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

C. Faloutsos (CMU) 7

Problem #1 - network and graph mining

•  What does the Internet look like? •  What does FaceBook look like?

•  What is ‘normal’/‘abnormal’? •  which patterns/laws hold?

WIN'11

Page 8: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

C. Faloutsos (CMU) 8

Problem #1 - network and graph mining

•  What does the Internet look like? •  What does FaceBook look like?

•  What is ‘normal’/‘abnormal’? •  which patterns/laws hold?

–  BIG DATA helps: finds patterns that would be ‘invisible’

WIN'11

Page 9: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

Typical viewpoint •  ‘Signal’ and •  noise

WIN'11 C. Faloutsos (CMU) 9

Page 10: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

MORE REALISTIC viewpoint •  ‘Signal’ and •  Weaker signal

WIN'11 C. Faloutsos (CMU) 10

Page 11: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

MORE REALISTIC viewpoint •  ‘Signal’ and •  Weaker signal and •  Even weaker signal •  …

BIG DATA helps

WIN'11 C. Faloutsos (CMU) 11

Page 12: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

MORE REALISTIC viewpoint •  ‘Signal’ and •  Weaker signal and •  Even weaker signal •  …

BIG DATA helps (and sampling may

hurt) WIN'11 C. Faloutsos (CMU) 12

Page 13: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

C. Faloutsos (CMU) 13

Graph mining •  Are real graphs random?

WIN'11

Page 14: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

C. Faloutsos (CMU) 14

Laws and patterns •  Are real graphs random? •  A: NO!!

– Diameter (small, and decreasing!) –  in- and out- degree distributions (skewed/PL) –  # triangles (skewed) –  other (surprising) patterns

•  So, let’s look at the data

WIN'11

Page 15: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

C. Faloutsos (CMU) 15

Solution# S.1 •  Power law in the degree distribution

[SIGCOMM99]

log(rank)

log(degree)

-0.82

internet domains

att.com

ibm.com

WIN'11

Page 16: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

C. Faloutsos (CMU) 16

Solution# S.2: Eigen Exponent E

•  A2: power law in the eigenvalues of the adjacency matrix

E = -0.48

Exponent = slope

Eigenvalue

Rank of decreasing eigenvalue

May 2001

WIN'11

Page 17: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

Real Graph Patterns unweighted weighted static

P01. Power-law degree distribution [Faloutsos et. al.`99, Kleinberg et. al.`99, Chakrabarti et. al. `04, Newman`04] P02. Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos et. al. `03] P04. Community structure [Flake et. al.`02, Girvan and Newman `02] P05. Clique Power Laws [Du et. al. ‘09]

P12. Snapshot Power Law [McGlohon et. al. `08]

dynamic

P06. Densification Power Law [Leskovec et. al.`05] P07. Small and shrinking diameter [Albert and Barabási `99, Leskovec et. al. ‘05, McGlohon et. al. ‘08] P08. Gelling point [McGlohon et. al. `08] P09. Constant size 2nd and 3rd connected components [McGlohon et. al. `08] P10. Principal Eigenvalue Power Law [Akoglu et. al. `08] P11. Bursty/self-similar edge/weight additions [Gomez and Santonja `98, Gribble et. al. `98, Crovella and Bestavros `99, McGlohon et .al. `08]

P13. Weight Power Law [McGlohon et. al. `08] P14. Skewed call duration distributions [Vaz de Melo et. al. `10]

17 WIN'11 C. Faloutsos (CMU) RTG: A Recursive Realistic Graph Generator using Random Typing Leman Akoglu and Christos Faloutsos. ECML PKDD’09.

Page 18: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

Real Graph Patterns unweighted weighted static

P01. Power-law degree distribution [Faloutsos et. al.`99, Kleinberg et. al.`99, Chakrabarti et. al. `04, Newman`04] P02. Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos et. al. `03] P04. Community structure [Flake et. al.`02, Girvan and Newman `02] P05. Clique Power Laws [Du et. al. ‘09]

P12. Snapshot Power Law [McGlohon et. al. `08]

dynamic

P06. Densification Power Law [Leskovec et. al.`05] P07. Small and shrinking diameter [Albert and Barabási `99, Leskovec et. al. ‘05, McGlohon et. al. ‘08] P08. Gelling point [McGlohon et. al. `08] P09. Constant size 2nd and 3rd connected components [McGlohon et. al. `08] P10. Principal Eigenvalue Power Law [Akoglu et. al. `08] P11. Bursty/self-similar edge/weight additions [Gomez and Santonja `98, Gribble et. al. `98, Crovella and Bestavros `99, McGlohon et .al. `08]

P13. Weight Power Law [McGlohon et. al. `08] P14. Skewed call duration distributions [Vaz de Melo et. al. `10]

18 WIN'11 C. Faloutsos (CMU) RTG: A Recursive Realistic Graph Generator using Random Typing Leman Akoglu and Christos Faloutsos. ECML PKDD’09.

Page 19: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

Triangle counting for large graphs?

Anomalous nodes in Twitter(~ 3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11]

19 WIN'11 19 C. Faloutsos (CMU)

Page 20: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

Triangle counting for large graphs?

Anomalous nodes in Twitter(~ 3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11]

20 WIN'11 20 C. Faloutsos (CMU)

Page 21: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

Triangle counting for large graphs?

Anomalous nodes in Twitter(~ 3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11]

21 WIN'11 21 C. Faloutsos (CMU)

Page 22: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

Triangle counting for large graphs?

Anomalous nodes in Twitter(~ 3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11]

22 WIN'11 22 C. Faloutsos (CMU)

Page 23: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

Triangle counting for large graphs?

Anomalous nodes in Twitter(~ 3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11]

23 WIN'11 23 C. Faloutsos (CMU)

Page 24: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

Triangle counting for large graphs?

Q: How to compute # triangles in B-node graph? (O(dmax ** 2) )?

24 WIN'11 24 C. Faloutsos (CMU)

Page 25: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

Triangle counting for large graphs?

Q: How to compute # triangles in B-node graph? (O(dmax ** 2) )? A: cubes of eigvals

25 WIN'11 25 C. Faloutsos (CMU)

Page 26: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

C. Faloutsos (CMU) 26

Outline

•  Introduction – Motivation •  Problem#1: Patterns in graphs

–  overview – Triangles – Diameter –  `Eigenspokes’ – Phonecall duration

•  Problem#2: Tools •  Conclusions WIN'11

Page 27: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

HADI for diameter estimation •  Radius Plots for Mining Tera-byte Scale

Graphs U Kang, Charalampos Tsourakakis, Ana Paula Appel, Christos Faloutsos, Jure Leskovec, SDM’10

•  Naively: diameter needs O(N**2) space and up to O(N**3) time – prohibitive (N~1B)

•  Our HADI: linear on E (~10B) – Near-linear scalability wrt # machines – Several optimizations -> 5x faster

C. Faloutsos (CMU) 27 WIN'11

Page 28: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

????

19+ [Barabasi+]

28 C. Faloutsos (CMU)

Radius

Count

WIN'11

~1999, ~1M nodes

Page 29: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

YahooWeb graph (120Gb, 1.4B nodes, 6.6 B edges) •  Largest publicly available graph ever studied.

????

19+ [Barabasi+]

29 C. Faloutsos (CMU)

Radius

Count

WIN'11

??

~1999, ~1M nodes

Page 30: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

YahooWeb graph (120Gb, 1.4B nodes, 6.6 B edges) •  Largest publicly available graph ever studied.

????

19+? [Barabasi+]

30 C. Faloutsos (CMU)

Radius

Count

WIN'11

14 (dir.) ~7 (undir.)

Page 31: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

YahooWeb graph (120Gb, 1.4B nodes, 6.6 B edges) • 7 degrees of separation (!) • Diameter: shrunk

????

19+? [Barabasi+]

31 C. Faloutsos (CMU)

Radius

Count

WIN'11

14 (dir.) ~7 (undir.)

Page 32: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

YahooWeb graph (120Gb, 1.4B nodes, 6.6 B edges) Q: Shape?

????

32 C. Faloutsos (CMU)

Radius

Count

WIN'11

~7 (undir.)

Page 33: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

33 C. Faloutsos (CMU)

YahooWeb graph (120Gb, 1.4B nodes, 6.6 B edges) •  effective diameter: surprisingly small. •  Multi-modality (?!)

WIN'11

Page 34: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

Radius Plot of GCC of YahooWeb.

34 C. Faloutsos (CMU) WIN'11

Page 35: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

35 C. Faloutsos (CMU)

YahooWeb graph (120Gb, 1.4B nodes, 6.6 B edges) •  effective diameter: surprisingly small. •  Multi-modality: probably mixture of cores .

WIN'11

Page 36: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

36 C. Faloutsos (CMU)

YahooWeb graph (120Gb, 1.4B nodes, 6.6 B edges) •  effective diameter: surprisingly small. •  Multi-modality: probably mixture of cores .

WIN'11

EN

~7

Conjecture: DE

BR

Page 37: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

37 C. Faloutsos (CMU)

YahooWeb graph (120Gb, 1.4B nodes, 6.6 B edges) •  effective diameter: surprisingly small. •  Multi-modality: probably mixture of cores .

WIN'11

~7

Conjecture:

Page 38: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

Running time - Kronecker and Erdos-Renyi Graphs with billions edges.

details

Page 39: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

C. Faloutsos (CMU) 39

Outline

•  Introduction – Motivation •  Problem#1: Patterns in graphs

–  overview – Triangles – Diameter –  `Eigenspokes’ – Phonecall duration

•  Problem#2: Tools •  Conclusions WIN'11

Page 40: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

EigenSpokes B. Aditya Prakash, Mukund Seshadri, Ashwin

Sridharan, Sridhar Machiraju and Christos Faloutsos: EigenSpokes: Surprising Patterns and Scalable Community Chipping in Large Graphs, PAKDD 2010, Hyderabad, India, 21-24 June 2010.

C. Faloutsos (CMU) 40 WIN'11

Page 41: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

EigenSpokes • Eigenvectors of adjacency matrix

  equivalent to singular vectors (symmetric, undirected graph)

41 C. Faloutsos (CMU) WIN'11

Page 42: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

EigenSpokes • Eigenvectors of adjacency matrix

  equivalent to singular vectors (symmetric, undirected graph)

42 C. Faloutsos (CMU) WIN'11

N

N

details

Page 43: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

EigenSpokes • Eigenvectors of adjacency matrix

  equivalent to singular vectors (symmetric, undirected graph)

43 C. Faloutsos (CMU) WIN'11

N

N

details

Page 44: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

EigenSpokes • Eigenvectors of adjacency matrix

  equivalent to singular vectors (symmetric, undirected graph)

44 C. Faloutsos (CMU) WIN'11

N

N

details

Page 45: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

EigenSpokes • Eigenvectors of adjacency matrix

  equivalent to singular vectors (symmetric, undirected graph)

45 C. Faloutsos (CMU) WIN'11

N

N

details

Page 46: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

EigenSpokes •  EE plot: •  Scatter plot of

scores of u1 vs u2 •  One would expect

– Many points @ origin

– A few scattered ~randomly

C. Faloutsos (CMU) 46

u1

u2

WIN'11

1st Principal component

2nd Principal component

Page 47: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

EigenSpokes •  EE plot: •  Scatter plot of

scores of u1 vs u2 •  One would expect

– Many points @ origin

– A few scattered ~randomly

C. Faloutsos (CMU) 47

u1

u2 90o

WIN'11

Page 48: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

EigenSpokes - pervasiveness • Present in mobile social graph

 across time and space

• Patent citation graph

48 C. Faloutsos (CMU) WIN'11

Page 49: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

EigenSpokes - explanation

Near-cliques, or near-bipartite-cores, loosely connected

49 C. Faloutsos (CMU) WIN'11

Page 50: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

EigenSpokes - explanation

Near-cliques, or near-bipartite-cores, loosely connected

50 C. Faloutsos (CMU) WIN'11

Page 51: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

EigenSpokes - explanation

Near-cliques, or near-bipartite-cores, loosely connected

51 C. Faloutsos (CMU) WIN'11

Page 52: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

EigenSpokes - explanation

Near-cliques, or near-bipartite-cores, loosely connected

So what?  Extract nodes with high

scores   high connectivity  Good “communities”

spy plot of top 20 nodes

52 C. Faloutsos (CMU) WIN'11

Page 53: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

Bipartite Communities!

magnified bipartite community

patents from same inventor(s)

`cut-and-paste’ bibliography!

53 C. Faloutsos (CMU) WIN'11

Page 54: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

C. Faloutsos (CMU) 54

Outline

•  Introduction – Motivation •  Problem#1: Patterns in graphs

–  overview – Triangles – Diameter –  `Eigenspokes’ – Phonecall duration

•  Problem#2: Tools •  Conclusions WIN'11

Page 55: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

Real Graph Patterns unweighted weighted static

P01. Power-law degree distribution [Faloutsos et. al.`99, Kleinberg et. al.`99, Chakrabarti et. al. `04, Newman`04] P02. Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos et. al. `03] P04. Community structure [Flake et. al.`02, Girvan and Newman `02] P05. Clique Power Laws [Du et. al. ‘09]

P12. Snapshot Power Law [McGlohon et. al. `08]

dynamic

P06. Densification Power Law [Leskovec et. al.`05] P07. Small and shrinking diameter [Albert and Barabási `99, Leskovec et. al. ‘05, McGlohon et. al. ‘08] P08. Gelling point [McGlohon et. al. `08] P09. Constant size 2nd and 3rd connected components [McGlohon et. al. `08] P10. Principal Eigenvalue Power Law [Akoglu et. al. `08] P11. Bursty/self-similar edge/weight additions [Gomez and Santonja `98, Gribble et. al. `98, Crovella and Bestavros `99, McGlohon et .al. `08]

P13. Weight Power Law [McGlohon et. al. `08] P14. Skewed call duration distributions [Vaz de Melo et. al. `10]

55 WIN'11 C. Faloutsos (CMU) RTG: A Recursive Realistic Graph Generator using Random Typing Leman Akoglu and Christos Faloutsos. ECML PKDD’09.

Page 56: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

Duration of phonecalls Surprising Patterns for the Call

Duration Distribution of Mobile Phone Users

Pedro O. S. Vaz de Melo, Leman Akoglu, Christos Faloutsos, Antonio A. F. Loureiro

PKDD 2010 WIN'11 C. Faloutsos (CMU) 56

Page 57: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

Probably, power law (?)

WIN'11 C. Faloutsos (CMU) 57

??

Page 58: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

No Power Law!?

WIN'11 C. Faloutsos (CMU) 58

Page 59: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

‘TLaC: Lazy Contractor’ •  The longer a task (phonecall) has taken, •  The even longer it will take

WIN'11 C. Faloutsos (CMU) 59

Odds ratio=

Casualties(<x): Survivors(>=x)

== power law

Page 60: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

60

Data Description

  Data from a private mobile operator of a large city   4 months of data   3.1 million users   more than 1 billion phone records

  Over 96% of ‘talkative’ users obeyed a TLAC distribution (‘talkative’: >30 calls)

WIN'11 C. Faloutsos (CMU)

Page 61: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

Outliers:

WIN'11 C. Faloutsos (CMU) 61

Page 62: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

C. Faloutsos (CMU) 62

Outline

•  Introduction – Motivation •  Problem#1: Patterns in graphs

– … –  `Eigenspokes’ – Phonecall duration – Connected components

•  Problem#2: Tools •  Conclusions

WIN'11

Page 63: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS Generalized Iterated Matrix

Vector Multiplication (GIMV)

C. Faloutsos (CMU) 63

PEGASUS: A Peta-Scale Graph Mining System - Implementation and Observations. U Kang, Charalampos E. Tsourakakis, and Christos Faloutsos. (ICDM) 2009, Miami, Florida, USA. Best Application Paper (runner-up).

WIN'11

Page 64: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

64

Example: GIM-V At Work •  Connected Components – 4 observations:

Size

Count

C. Faloutsos (CMU) WIN'11

Page 65: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

65

Example: GIM-V At Work •  Connected Components

Size

Count

C. Faloutsos (CMU) WIN'11

1) 10K x larger than next

Page 66: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

66

Example: GIM-V At Work •  Connected Components

Size

Count

C. Faloutsos (CMU) WIN'11

2) ~0.7B singleton nodes

Page 67: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

67

Example: GIM-V At Work •  Connected Components

Size

Count

C. Faloutsos (CMU) WIN'11

3) SLOPE!

Page 68: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

68

Example: GIM-V At Work •  Connected Components

Size

Count 300-size

cmpt X 500. Why? 1100-size cmpt

X 65. Why?

C. Faloutsos (CMU) WIN'11

4) Spikes!

Page 69: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

69

Example: GIM-V At Work •  Connected Components

Size

Count

suspicious financial-advice sites

(not existing now)

C. Faloutsos (CMU) WIN'11

Page 70: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

70

GIM-V At Work •  Connected Components over Time •  LinkedIn: 7.5M nodes and 58M edges

Stable tail slope after the gelling point

C. Faloutsos (CMU) WIN'11

Page 71: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

C. Faloutsos (CMU) 71

Outline

•  Introduction – Motivation •  Problem#1: Patterns in graphs •  Problem#2: Tools

–  Immunization – BP –  visualization

•  Conclusions

WIN'11

Page 72: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS Immunization and epidemic

thresholds •  Q1: which nodes to immunize? •  Q2: will a virus vanish, or will it create an

epidemic?

WIN'11 C. Faloutsos (CMU) 72

Page 73: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

Q1: Immunization: • Given

• a network, • k vaccines, and • the virus details

• Which nodes to immunize?

WIN'11 73 C. Faloutsos (CMU)

Page 74: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

Q1: Immunization: • Given

• a network, • k vaccines, and • the virus details

• Which nodes to immunize?

WIN'11 74 C. Faloutsos (CMU)

Page 75: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

Q1: Immunization: • Given

• a network, • k vaccines, and • the virus details

• Which nodes to immunize?

WIN'11 75 C. Faloutsos (CMU)

Page 76: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

Q1: Immunization: • Given

• a network, • k vaccines, and • the virus details

• Which nodes to immunize?

A: immunize the ones that maximally raise the `epidemic threshold’ [Tong+, ICDM’10]

WIN'11 76 C. Faloutsos (CMU)

Page 77: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

Q2: will a virus take over? •  Flu-like virus (no immunity, ‘SIS’) •  Mumps (life-time immunity, ‘SIR’) •  Pertussis (finite-length immunity, ‘SIRS’)

WIN'11 C. Faloutsos (CMU) 77

β: attack prob δ: heal prob

Page 78: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

Q2: will a virus take over? •  Flu-like virus (no immunity, ‘SIS’) •  Mumps (life-time immunity, ‘SIR’) •  Pertussis (finite-length immunity, ‘SIRS’)

WIN'11 C. Faloutsos (CMU) 78

β: attack prob δ: heal prob

Α: depends on connectivity (avg degree? Something else?)

Page 79: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

WIN'11 C. Faloutsos (CMU) 79

Epidemic threshold τ

What should τ depend on? •  avg. degree? and/or highest degree? •  and/or variance of degree? •  and/or third moment of degree? •  and/or diameter?

Page 80: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

WIN'11 C. Faloutsos (CMU) 80

Epidemic threshold - SIS

•  [Theorem] We have no epidemic, if

β/δ <τ = 1/ λ1,A

Page 81: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

WIN'11 C. Faloutsos (CMU) 81

Epidemic threshold - SIS

•  [Theorem] We have no epidemic, if

β/δ <τ = 1/ λ1,A

largest eigenvalue of adj. matrix A

attack prob.

recovery prob. epidemic threshold

Proof: [Wang+03] (for SIS=flu only)

Page 82: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

WIN'11 C. Faloutsos (CMU) 82

Epidemic threshold - SIS

•  [Theorem] We have no epidemic, if

β/δ <τ = 1/ λ1,A

Proof: [Wang+03] (for SIS=flu only)

What about other V.P.M.? (SIR, SIRS, etc?)

Page 83: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

Theorem: •  For all typical virus propagation models (flu,

mumps, pertussis, HIV, etc) •  The only connectivity measure that matters, is

1/λ1 the first eigenvalue of the adj. matrix [Prakash+, ‘10, arxiv]

WIN'11 C. Faloutsos (CMU) 83

Page 84: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

84 C. Faloutsos (CMU) WIN'11

Thresholds for some models

•  s = effective strength •  s < 1 : below threshold

Models Effective Strength (s)

Threshold (tipping point)

SIS, SIR, SIRS, SEIR s = λ .

s = 1 SIV, SEIV s = λ .

(H.I.V.) s = λ .

Page 85: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

A2: will a virus take over?

WIN'11 C. Faloutsos (CMU) 85

Fraction of infected

Time ticks

Below: exp. extinction

Above: take-over

Graph: Portland, OR 31M links 1.5M nodes

Page 86: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

Q1: Immunization: • Given

• a network, • k vaccines, and • the virus details

• Which nodes to immunize?

A: immunize the ones that maximally raise the `epidemic threshold’ [Tong+, ICDM’10]

WIN'11 86 C. Faloutsos (CMU)

Page 87: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

Q1: Immunization: • Given

• a network, • k vaccines, and • the virus details

• Which nodes to immunize?

A: immunize the ones that maximally raise the `epidemic threshold’ [Tong+, ICDM’10]

WIN'11 87 C. Faloutsos (CMU)

Max eigen-drop Δλ For any virus!

Page 88: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

C. Faloutsos (CMU) 88

Outline

•  Introduction – Motivation •  Problem#1: Patterns in graphs •  Problem#2: Tools

–  Immunization – BP –  visualization

•  Conclusions

WIN'11

Page 89: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

WIN'11 C. Faloutsos (CMU) 89

E-bay Fraud detection

w/ Polo Chau & Shashank Pandit, CMU [www’07]

Page 90: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

WIN'11 C. Faloutsos (CMU) 90

E-bay Fraud detection

Page 91: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

WIN'11 C. Faloutsos (CMU) 91

E-bay Fraud detection

Page 92: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

WIN'11 C. Faloutsos (CMU) 92

E-bay Fraud detection - NetProbe

Page 93: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

Popular press

And less desirable attention: •  E-mail from ‘Belgium police’ (‘copy of

your code?’) WIN'11 C. Faloutsos (CMU) 93

Page 94: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

C. Faloutsos (CMU) 94

Outline

•  Introduction – Motivation •  Problem#1: Patterns in graphs •  Problem#2: Tools

–  Immunization – BP - theory –  visualization

•  Conclusions

WIN'11

Page 95: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS Guilt-by-Association Techniques

Given: •  graph and •  few labeled nodes

Find: class (red/green) for rest nodes Assuming: network effects (homophily/ heterophily)

WIN'11 95 C. Faloutsos (CMU)

Page 96: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

Correspondence  of  Methods

Random Walk with Restarts (RWR) Google Semi-supervised Learning (SSL) Belief Propagation (BP) Bayesian

WIN'11 96 C. Faloutsos (CMU)

Page 97: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

Correspondence  of  Methods

Random Walk with Restarts (RWR) ≈ Semi-supervised Learning (SSL) ≈ Belief Propagation (BP)

Method Matrix unknown known RWR [I – c AD-1] × x = (1-c)y SSL [I + a(D - A)] × x = y

FABP [I + a D - c’A] × bh = φh

0 1 0 1 0 1 0 1 0

? 0 1 1

WIN'11 97 C. Faloutsos (CMU) Unifying Guilt-by-Association Approaches: Theorems and Fast Algorithms. Danai Koutra, et al PKDD’11

Page 98: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

C. Faloutsos (CMU) 98

Outline

•  Introduction – Motivation •  Problem#1: Patterns in graphs •  Problem#2: Tools

–  Immunization – BP –  visualization

•  Conclusions

WIN'11

Page 99: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

Apolo  Making  Sense  of  Large  Network  Data:  Combining  Rich  User  Interac<on  &  Machine  Learning  CHI 2011, Vancouver, Canada

Polo  Chau   Prof.  Niki  KiCur   Prof.  Jason  Hong   Prof.  Christos  Faloutsos  

WIN'11 99 C. Faloutsos (CMU)

Page 100: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

Exemplars  Rest  of  the  nodes  are  considered  relevant  (by  BP);  relevance  indicated  by  color  satura<on.  Note  that  BP  supports  mul<ple  groups   WIN'11 100 C. Faloutsos (CMU)

Page 101: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

C. Faloutsos (CMU) 101

Outline

•  Introduction – Motivation •  Problem#1: Patterns in graphs •  Problem#2: Tools

–  Immunization – BP –  visualization

•  Conclusions

WIN'11

Page 102: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

C. Faloutsos (CMU) 102

OVERALL CONCLUSIONS – low level:

•  Several new patterns (eigenspokes, radius plot etc)

•  New tools and theoretical results –  belief propagation (~ RWR ~ SSL )

–  Immunization: Δ λ, for ‘any’ V.P.M.

WIN'11

Page 103: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

C. Faloutsos (CMU) 103

OVERALL CONCLUSIONS – high level

•  BIG DATA: -> patterns/outliers that are invisible otherwise

WIN'11

Page 104: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

C. Faloutsos (CMU) 104

References •  Leman Akoglu, Christos Faloutsos: RTG: A Recursive

Realistic Graph Generator Using Random Typing. ECML/PKDD (1) 2009: 13-28

•  Deepayan Chakrabarti, Christos Faloutsos: Graph mining: Laws, generators, and algorithms. ACM Comput. Surv. 38(1): (2006)

WIN'11

Page 105: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

C. Faloutsos (CMU) 105

References •  Deepayan Chakrabarti, Yang Wang, Chenxi Wang,

Jure Leskovec, Christos Faloutsos: Epidemic thresholds in real networks. ACM Trans. Inf. Syst. Secur. 10(4): (2008)

•  Deepayan Chakrabarti, Jure Leskovec, Christos Faloutsos, Samuel Madden, Carlos Guestrin, Michalis Faloutsos: Information Survival Threshold in Sensor and P2P Networks. INFOCOM 2007: 1316-1324

WIN'11

Page 106: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

C. Faloutsos (CMU) 106

References •  Christos Faloutsos, Tamara G. Kolda, Jimeng Sun:

Mining large graphs and streams using matrix and tensor tools. Tutorial, SIGMOD Conference 2007: 1174

WIN'11

Page 107: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

C. Faloutsos (CMU) 107

References •  T. G. Kolda and J. Sun. Scalable Tensor

Decompositions for Multi-aspect Data Mining. In: ICDM 2008, pp. 363-372, December 2008.

WIN'11

Page 108: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

C. Faloutsos (CMU) 108

References •  Jure Leskovec, Jon Kleinberg and Christos Faloutsos

Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations, KDD 2005 (Best Research paper award).

•  Jure Leskovec, Deepayan Chakrabarti, Jon M. Kleinberg, Christos Faloutsos: Realistic, Mathematically Tractable Graph Generation and Evolution, Using Kronecker Multiplication. PKDD 2005: 133-145

WIN'11

Page 109: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

C. Faloutsos (CMU) 109

References •  Jimeng Sun, Yinglian Xie, Hui Zhang, Christos

Faloutsos. Less is More: Compact Matrix Decomposition for Large Sparse Graphs, SDM, Minneapolis, Minnesota, Apr 2007.

•  Jimeng Sun, Spiros Papadimitriou, Philip S. Yu, and Christos Faloutsos, GraphScope: Parameter-free Mining of Large Time-evolving Graphs ACM SIGKDD Conference, San Jose, CA, August 2007

WIN'11

Page 110: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

References •  Jimeng Sun, Dacheng Tao, Christos

Faloutsos: Beyond streams and graphs: dynamic tensor analysis. KDD 2006: 374-383

WIN'11 C. Faloutsos (CMU) 110

Page 111: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

C. Faloutsos (CMU) 111

References •  Hanghang Tong, Christos Faloutsos, and

Jia-Yu Pan, Fast Random Walk with Restart and Its Applications, ICDM 2006, Hong Kong.

•  Hanghang Tong, Christos Faloutsos, Center-Piece Subgraphs: Problem Definition and Fast Solutions, KDD 2006, Philadelphia, PA

WIN'11

Page 112: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

C. Faloutsos (CMU) 112

References •  Hanghang Tong, Christos Faloutsos, Brian

Gallagher, Tina Eliassi-Rad: Fast best-effort pattern matching in large attributed graphs. KDD 2007: 737-746

WIN'11

Page 113: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

C. Faloutsos (CMU) 113

Project info

Akoglu, Leman

Chau, Polo

Kang, U McGlohon, Mary

Tong, Hanghang

Prakash, Aditya

WIN'11

Thanks to: NSF IIS-1017415, 0705359, 0534205, CTA-INARC; Yahoo (M45), LLNL, IBM, SPRINT, Google, INTEL, HP, iLab

www.cs.cmu.edu/~pegasus

Out, in ‘12

Koutra, Danae

Page 114: Mining Billion-Node Graphs: Patterns and influence propagationchristos/TALKS/11-WIN/faloutsos_WIN_2011.pdf · Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos

CMU SCS

C. Faloutsos (CMU) 114

OVERALL CONCLUSIONS – high level

•  BIG DATA: -> patterns/outliers that are invisible otherwise

WIN'11