Discovering Roles and Anomalies in Graphs: Theory and...

140
Discovering Roles and Anomalies in Graphs: Theory and Applications Part 2: Patterns and Anomalies Tina Eliassi-Rad (Rutgers) Christos Faloutsos (CMU) ECML PKDD 2013 Tutorial

Transcript of Discovering Roles and Anomalies in Graphs: Theory and...

Page 1: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

Discovering Roles and Anomalies in Graphs:

Theory and Applications

Part 2: Patterns and Anomalies Tina Eliassi-Rad (Rutgers) Christos Faloutsos (CMU)

ECML PKDD 2013 Tutorial

Page 2: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

T. Eliassi-Rad & C. Faloutsos 2

OVERVIEW - high level:

ECML PKDD 2013 Tutorial

Roles

Features

Anomalies

Patterns

= rare roles

Page 3: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

T. Eliassi-Rad & C. Faloutsos 3

Resource:

Open source system for mining huge graphs: PEGASUS project (PEta GrAph mining

System) •  www.cs.cmu.edu/~pegasus •  code and papers ECML PKDD 2013 Tutorial

Page 4: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

T. Eliassi-Rad & C. Faloutsos 4

Roadmap

•  Patterns in graphs – Overview – Static graphs – Weighted graphs – Time-evolving graphs

•  Anomaly Detection •  Application: ebay fraud •  Conclusions

ECML PKDD 2013 Tutorial

Page 5: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

T. Eliassi-Rad & C. Faloutsos 5

Problem #1 - network and graph mining

•  What does the Internet look like? •  What does FaceBook look like?

•  What is ‘normal’/‘abnormal’? •  which patterns/laws hold?

ECML PKDD 2013 Tutorial

Page 6: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

T. Eliassi-Rad & C. Faloutsos 6

Problem #1 - network and graph mining

•  What does the Internet look like? •  What does FaceBook look like?

•  What is ‘normal’/‘abnormal’? •  Which patterns/laws hold?

–  To spot anomalies (rarities), we have to discover patterns

ECML PKDD 2013 Tutorial

Page 7: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

T. Eliassi-Rad & C. Faloutsos 7

Problem #1 - network and graph mining

•  What does the Internet look like? •  What does FaceBook look like?

•  What is ‘normal’/‘abnormal’? •  Which patterns/laws hold?

–  To spot anomalies (rarities), we have to discover patterns

–  Large datasets reveal patterns/anomalies that may be invisible otherwise…

ECML PKDD 2013 Tutorial

Page 8: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

T. Eliassi-Rad & C. Faloutsos 8

Graph mining •  Are real graphs random?

ECML PKDD 2013 Tutorial

Page 9: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

T. Eliassi-Rad & C. Faloutsos 9

Laws and patterns •  Are real graphs random? •  A: NO!!

– Diameter –  In- and out- degree distributions – Other (surprising) patterns

•  So, let’s look at the data

ECML PKDD 2013 Tutorial

Page 10: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

Real Graph Patterns Unweighted Weighted Static

P01. Power-law degree distribution [Faloutsos et. al.`99, Kleinberg et. al.`99, Chakrabarti et. al. `04, Newman`04] P02. Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos et. al. `03] P04. Community structure [Flake et. al.`02, Girvan and Newman `02] P05. Clique Power Laws [Du et. al. ‘09]

P12. Snapshot Power Law [McGlohon et. al. `08]

Dynam

ic

P06. Densification Power Law [Leskovec et. al.`05] P07. Small and shrinking diameter [Albert and Barabási `99, Leskovec et. al. ‘05, McGlohon et. al. ‘08] P08. Gelling point [McGlohon et. al. `08] P09. Constant size 2nd and 3rd connected components [McGlohon et. al. `08] P10. Principal Eigenvalue Power Law [Akoglu et. al. `08] P11. Bursty/self-similar edge/weight additions [Gomez and Santonja `98, Gribble et. al. `98, Crovella and Bestavros `99, McGlohon et .al. `08]

P13. Weight Power Law [McGlohon et. al. `08] P14. Skewed call duration distributions [Vaz de Melo et. al. `10]

10 ECML PKDD 2013 Tutorial

T. Eliassi-Rad & C. Faloutsos RTG: A Recursive Realistic Graph Generator using Random Typing Leman Akoglu and Christos Faloutsos. ECML PKDD’09.

Page 11: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

11

Roadmap

•  Patterns in graphs – Overview – Static graphs – Weighted graphs – Time-evolving graphs

•  Anomaly Detection •  Application: ebay fraud •  Conclusions

T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 12: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

T. Eliassi-Rad & C. Faloutsos 12

Solution# S.1

•  Power law in the degree distribution [SIGCOMM99]

log(rank)

log(degree)

Internet Domains att.com

ibm.com

ECML PKDD 2013 Tutorial

Page 13: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

T. Eliassi-Rad & C. Faloutsos 13

Solution# S.1

•  Power law in the degree distribution [SIGCOMM99]

log(rank)

log(degree)

-0.82

att.com

ibm.com

Internet Domains

ECML PKDD 2013 Tutorial

Page 14: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

T. Eliassi-Rad & C. Faloutsos 14

Solution# S.2: Eigen Exponent E

•  A2: power law in the eigenvalues of the adjacency matrix

E = -0.48

Exponent = slope

Eigenvalue

Rank of decreasing eigenvalue

May 2001

ECML PKDD 2013 Tutorial

Page 15: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

T. Eliassi-Rad & C. Faloutsos 15

Solution# S.2: Eigen Exponent E

•  [Mihail, Papadimitriou ‘02]: slope is ½ of rank exponent

E = -0.48

Exponent = slope

Eigenvalue

Rank of decreasing eigenvalue

May 2001

ECML PKDD 2013 Tutorial

Page 16: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

T. Eliassi-Rad & C. Faloutsos 16

But: How about graphs from other domains?

ECML PKDD 2013 Tutorial

Page 17: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

T. Eliassi-Rad & C. Faloutsos 17

More power laws: •  web hit counts [w/ A. Montgomery]

Web Site Traffic

in-degree (log scale)

Count (log scale)

Zipf

users sites

``ebay’’

ECML PKDD 2013 Tutorial

Page 18: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

T. Eliassi-Rad & C. Faloutsos 18

epinions.com •  who-trusts-whom

[Richardson + Domingos, KDD 2001]

(out) degree

count

trusts-2000-people user

ECML PKDD 2013 Tutorial

Page 19: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

And numerous more •  # of sexual contacts •  Income [Pareto] – ‘80-20 distribution’ •  Duration of downloads [Bestavros+] •  Duration of UNIX jobs (`mice and

elephants’) •  Size of files of a user •  … •  ‘Black swans’

T. Eliassi-Rad & C. Faloutsos 19 ECML PKDD 2013 Tutorial

Page 20: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

T. Eliassi-Rad & C. Faloutsos 20

Roadmap

•  Patterns in graphs –  overview – Static graphs

•  S1: Degree, S2: Eigenvalues •  S3-4: Triangles, S5: Cliques •  Radius plot •  Other observations (‘EigenSpokes’)

– Weighted graphs – Time-evolving graphs

ECML PKDD 2013 Tutorial

Page 21: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

T. Eliassi-Rad & C. Faloutsos 21

Solution# S.3: Triangle ‘Laws’

•  Real social networks have a lot of triangles

ECML PKDD 2013 Tutorial

Page 22: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

T. Eliassi-Rad & C. Faloutsos 22

Solution# S.3: Triangle ‘Laws’

•  Real social networks have a lot of triangles –  Friends of friends are friends

•  Any patterns?

ECML PKDD 2013 Tutorial

Page 23: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

T. Eliassi-Rad & C. Faloutsos 23

Triangle Law: #S.3 [Tsourakakis ICDM 2008]

ASN HEP-TH

Epinions X-axis: # of participating triangles Y: count (~ pdf)

ECML PKDD 2013 Tutorial

Page 24: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

Triangle Law: #S.3 [Tsourakakis ICDM 2008]

ASN HEP-TH

Epinions X-axis: # of participating triangles Y: count (~ pdf)

T. Eliassi-Rad & C. Faloutsos 24 ECML PKDD 2013 Tutorial

Page 25: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

Triangle Law: #S.4 [Tsourakakis ICDM 2008]

SN Reuters

Epinions X-axis: degree Y-axis: mean # triangles n friends → ~n1.6 triangles

T. Eliassi-Rad & C. Faloutsos 25 ECML PKDD 2013 Tutorial

Page 26: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

T. Eliassi-Rad & C. Faloutsos 26

Triangle Law: Computations [Tsourakakis ICDM 2008]

Triangles are expensive to compute (3-way join; several approximation algorithms)

Q: Can we do this computation quickly?

ECML PKDD 2013 Tutorial

Page 27: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

T. Eliassi-Rad & C. Faloutsos 27

Triangle Law: Computations [Tsourakakis ICDM 2008]

Triangle are expensive to compute (3-way join; several approx. algos)

Q: Can we do this computation quickly? A: Yes!

#triangles = 1/6 Sum ( λi3 )

(and, because of skewness (S2) , we only need the top few eigenvalues!)

ECML PKDD 2013 Tutorial

Page 28: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

T. Eliassi-Rad & C. Faloutsos 28

Triangle Law: Computations [Tsourakakis ICDM 2008]

1000x+ speed-up, >90% accuracy ECML PKDD 2013 Tutorial

Page 29: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

Triangle Counting on Big Graphs Anomalous nodes in Twitter (~ 3 billion edges)

[U Kang, Brendan Meeder, +, PAKDD’11] T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial 29

Page 30: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

Triangle Counting on Big Graphs Anomalous nodes in Twitter (~ 3 billion edges)

[U Kang, Brendan Meeder, +, PAKDD’11] T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial 30

Page 31: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

Triangle Counting on Big Graphs Anomalous nodes in Twitter(~ 3 billion edges)

[U Kang, Brendan Meeder, +, PAKDD’11] T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial 31

Page 32: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

Triangle Counting on Big Graphs Anomalous nodes in Twitter (~ 3 billion edges)

[U Kang, Brendan Meeder, +, PAKDD’11] T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial 32

Page 33: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

Triangle Counting on Big Graphs Anomalous nodes in Twitter (~ 3 billion edges)

[U Kang, Brendan Meeder, +, PAKDD’11] T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial 33

Page 34: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

Triangle Counting on Big Graphs Q: How to compute # triangles in B-node

graph? (O(dmax2) )?

T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial 34

Page 35: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

Triangle Counting on Big Graphs Q: How to compute # triangles in B-node

graph? A: cubes of eigvals T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial 35

Page 36: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

36

Roadmap

•  Patterns in graphs –  overview – Static graphs

•  S1: Degree, S2: Eigenvalues •  S3-4: Triangles, S5: Cliques •  Radius plot •  Other observations (‘EigenSpokes’)

– Weighted graphs – Time-evolving graphs

T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 37: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

How about cliques? •  Large Human Communication Networks

Patterns and a Utility-Driven Generator. Nan Du, Christos Faloutsos, Bai Wang, Leman Akoglu, KDD 2009

37 T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 38: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

38

Cliques •  Clique is a complete subgraph. •  If a clique can not be

contained by any larger clique, it is called the maximal clique.

2 0

1 3

4

T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 39: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

39

Clique •  Clique is a complete subgraph. •  If a clique can not be

contained by any larger clique, it is called the maximal clique.

2 0

1 3

4

T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 40: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

40

Clique •  Clique is a complete subgraph. •  If a clique can not be

contained by any larger clique, it is called the maximal clique.

2 0

1 3

4

T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 41: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

41

Clique •  Clique is a complete subgraph. •  If a clique can not be

contained by any larger clique, it is called the maximal clique.

•  {0,1,2}, {0,1,3}, {1,2,3} {2,3,4}, {0,1,2,3} are cliques;

•  {0,1,2,3} and {2,3,4} are the maximal cliques.

2 0

1 3

4

T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 42: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

42

S5: Clique-Degree Power-Law •  Power law:

α∝idg iavC d

1 8 2 2 is the power law exponent

[ . , . ] for S1~S3αα ∈

More friends, even more social circles !

# maximal cliques of node i

degree of node i

T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 43: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

T. Eliassi-Rad & C. Faloutsos 43

S5: Clique-Degree Power-Law •  Outlier Detection

ECML PKDD 2013 Tutorial

Page 44: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

T. Eliassi-Rad & C. Faloutsos 44

S5: Clique-Degree Power-Law •  Outlier Detection

ECML PKDD 2013 Tutorial

Page 45: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

T. Eliassi-Rad & C. Faloutsos 45

Roadmap

•  Patterns in graphs –  overview – Static graphs

•  S1: Degree, S2: Eigenvalues •  S3-4: Triangles, S5: Cliques •  Radius plot •  Other observations (‘EigenSpokes’)

– Weighted graphs – Time-evolving graphs

ECML PKDD 2013 Tutorial

Page 46: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

HADI for Diameter Estimation •  Radius Plots for Mining Tera-byte Scale Graphs

U Kang, Charalampos Tsourakakis, Ana Paula Appel, Christos Faloutsos, Jure Leskovec, SDM’10

•  Naively: diameter needs O(N2) space and up to O(N3) time – prohibitive (N~1B)

•  Our HADI: linear on E (~10B) –  Near-linear scalability w.r.t. # machines –  Several optimizations → 5x faster

T. Eliassi-Rad & C. Faloutsos 46 ECML PKDD 2013 Tutorial

Page 47: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

????

19+ [Barabasi+]

47 T. Eliassi-Rad & C. Faloutsos

Radius

Count

~1999, ~1M nodes

ECML PKDD 2013 Tutorial

Page 48: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

YahooWeb graph (120Gb, 1.4B nodes, 6.6 B edges) •  Largest publicly available graph ever studied.

????

19+ [Barabasi+]

48 T. Eliassi-Rad & C. Faloutsos

Radius

Count ??

~1999, ~1M nodes

ECML PKDD 2013 Tutorial

Page 49: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

????

19+? [Barabasi+]

49 T. Eliassi-Rad & C. Faloutsos

Radius

Count

14 (dir.) ~7 (undir.)

YahooWeb graph (120Gb, 1.4B nodes, 6.6 B edges) •  Largest publicly available graph ever studied.

ECML PKDD 2013 Tutorial

Page 50: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

YahooWeb graph (120Gb, 1.4B nodes, 6.6 B edges) •  7 degrees of separation (!) •  Diameter: shrunk

????

19+? [Barabasi+]

50 T. Eliassi-Rad & C. Faloutsos

Radius

Count

14 (dir.) ~7 (undir.)

ECML PKDD 2013 Tutorial

Page 51: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

YahooWeb graph (120Gb, 1.4B nodes, 6.6 B edges) Q: Shape?

????

51 T. Eliassi-Rad & C. Faloutsos

Radius

Count

~7 (undir.)

ECML PKDD 2013 Tutorial

Page 52: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

52 T. Eliassi-Rad & C. Faloutsos

YahooWeb graph (120Gb, 1.4B nodes, 6.6 B edges) •  Effective diameter: surprisingly small •  Multi-modality (?!)

ECML PKDD 2013 Tutorial

Page 53: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

Radius Plot of GCC of YahooWeb Graph

53 T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 54: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

54 T. Eliassi-Rad & C. Faloutsos

YahooWeb graph (120Gb, 1.4B nodes, 6.6 B edges) •  Effective diameter: surprisingly small •  Multi-modality: probably mixture of cores

ECML PKDD 2013 Tutorial

Page 55: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

55 T. Eliassi-Rad & C. Faloutsos

YahooWeb graph (120Gb, 1.4B nodes, 6.6 B edges) •  Effective diameter: surprisingly small •  Multi-modality: probably mixture of cores

EN

~7

Conjecture: DE

BR

ECML PKDD 2013 Tutorial

Page 56: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

56 T. Eliassi-Rad & C. Faloutsos

YahooWeb graph (120Gb, 1.4B nodes, 6.6 B edges) •  effective diameter: surprisingly small •  Multi-modality: probably mixture of cores

~7

Conjecture:

ECML PKDD 2013 Tutorial

Page 57: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

T. Eliassi-Rad & C. Faloutsos 57

Roadmap

•  Patterns in graphs –  overview – Static graphs

•  S1: Degree, S2: Eigenvalues •  S3-4: Triangles, S5: Cliques •  Radius plot •  Other observations (‘EigenSpokes’)

– Weighted graphs – Time-evolving graphs

ECML PKDD 2013 Tutorial

Page 58: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

S6: EigenSpokes EigenSpokes: Surprising Patterns and Scalable Community Chipping in Large Graphs. B. Aditya Prakash, Mukund Seshadri, Ashwin Sridharan, Sridhar Machiraju, and Christos Faloutsos: PAKDD 2010.

T. Eliassi-Rad & C. Faloutsos 58 ECML PKDD 2013 Tutorial

Page 59: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

EigenSpokes • Eigenvectors of adjacency matrix

§  equivalent to singular vectors (symmetric, undirected graph)

A = U�UT

59 T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 60: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

EigenSpokes • Eigenvectors of adjacency matrix

§  equivalent to singular vectors (symmetric, undirected graph)

A = U�UT

�u1 �ui60 T. Eliassi-Rad & C. Faloutsos

N

N

ECML PKDD 2013 Tutorial

Page 61: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

EigenSpokes • Eigenvectors of adjacency matrix

§  equivalent to singular vectors (symmetric, undirected graph)

A = U�UT

�u1 �ui61 T. Eliassi-Rad & C. Faloutsos

N

N

ECML PKDD 2013 Tutorial

Page 62: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

EigenSpokes • Eigenvectors of adjacency matrix

§  equivalent to singular vectors (symmetric, undirected graph)

A = U�UT

�u1 �ui62 T. Eliassi-Rad & C. Faloutsos

N

N

ECML PKDD 2013 Tutorial

Page 63: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

EigenSpokes • Eigenvectors of adjacency matrix

§  equivalent to singular vectors (symmetric, undirected graph)

A = U�UT

�u1 �ui63 T. Eliassi-Rad & C. Faloutsos

N

N

ECML PKDD 2013 Tutorial

Page 64: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

EigenSpokes •  EigenEigen (EE) Plot

– Scatter plot of scores of u1 vs. u2

•  One would expect – Many points @

origin – A few scattered

~randomly

T. Eliassi-Rad & C. Faloutsos 64

u1

u2

1st Principal component

2nd Principal component

ECML PKDD 2013 Tutorial

Page 65: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

EigenSpokes •  EigenEigen (EE)

Plot – Scatter plot of

scores of u1 vs u2 •  One would expect

– Many points @ origin

– A few scattered ~randomly

65

u1

u2 90o

T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 66: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

EigenSpokes - pervasiveness • Present in mobile social graph

§ Across time and space • Patent citation graph

66 T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 67: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

EigenSpokes - explanation

Near-cliques, or near-bipartite-cores, loosely connected

67 T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 68: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

EigenSpokes - explanation

Near-cliques, or near-bipartite-cores, loosely connected

68 T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 69: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

EigenSpokes - explanation

Near-cliques, or near-bipartite-cores, loosely connected

69 T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 70: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

EigenSpokes - explanation

Near-cliques, or near-bipartite-cores, loosely connected

So what?

§ Extract nodes with high scores

§ High connectivity § Good “communities”

spy plot of top 20 nodes

70 T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 71: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

Bipartite Communities!

magnified bipartite community

patents from same inventor(s)

`cut-and-paste’ bibliography!

71 T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 72: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

72

Roadmap

•  Patterns in graphs – Overview – Static graphs – Weighted graphs – Time-evolving graphs

•  Anomaly Detection •  Application: ebay fraud •  Conclusions

T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 73: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

73

Observations on Weighted Graphs

•  A: yes - even more ‘laws’!

M. McGlohon, L. Akoglu, and C. Faloutsos Weighted Graphs and Disconnected Components: Patterns and a Generator. KDD 2008

T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 74: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

74

Observation W.1: Fortification

Q: How do the weights of nodes relate to degree?

T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 75: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

75

Observation W.1: Fortification

More donors, more $ ?

$10

$5

‘Reagan’

‘Clinton’ $7

T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 76: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

Edges (# donors)

In-weights ($)

76

Observation W.1: fortification: Snapshot Power Law

•  Weight: super-linear on in-degree •  exponent ‘iw’: 1.01 < iw < 1.26

Orgs-Candidates

e.g. John Kerry, $10M received, from 1K donors

More donors, even more $

$10

$5

T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 77: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

77

Roadmap

•  Patterns in graphs – Overview – Static graphs – Weighted graphs – Time-evolving graphs

•  Anomaly Detection •  Application: ebay fraud •  Conclusions

T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 78: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

78

Problem: Time evolution •  with Jure Leskovec

(CMU → Stanford)

•  and Jon Kleinberg (Cornell – sabb. @ CMU)

T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 79: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

79

T.1 Evolution of the Diameter •  Prior work on Power Law graphs hinted

at slowly growing diameter: –  diameter ~ O(log N) –  diameter ~ O(log log N)

•  What is happening in real data?

T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 80: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

80

T.1 Evolution of the Diameter •  Prior work on Power Law graphs hints

at slowly growing diameter: –  diameter ~ O(log N) –  diameter ~ O(log log N)

•  What is happening in real data? •  Diameter shrinks over time

T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 81: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

81

T.1 Diameter – “Patents”

•  Patent citation network

•  25 years of data •  @1999

–  2.9 M nodes –  16.5 M edges

time [years]

diameter

T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 82: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

82

T.2 Temporal Evolution of the Graphs

•  N(t) … nodes at time t •  E(t) … edges at time t •  Suppose that

N(t+1) = 2 * N(t) •  Q: what is your guess for

E(t+1) =? 2 * E(t)

T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 83: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

83

T.2 Temporal Evolution of the Graphs

•  N(t) … nodes at time t •  E(t) … edges at time t •  Suppose that

N(t+1) = 2 * N(t) •  Q: what is your guess for

E(t+1) =? 2 * E(t)

•  A: over-doubled! – But obeying the “Densification Power Law”

T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 84: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

84

T.2 Densification – Patent Citations

•  Citations among patents granted

•  @1999 –  2.9 M nodes –  16.5 M edges

•  Each year is a datapoint

N(t)

E(t)

1.66

T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 85: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

T. Eliassi-Rad & C. Faloutsos 85

Roadmap

•  Patterns in graphs – … – Time-evolving graphs

•  T1: Shrinking diameter •  T2: Densification •  T3: Connected components •  T4: Popularity over time •  T5: Phone-call patterns

•  …

ECML PKDD 2013 Tutorial

Page 86: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

86

More on Time-evolving graphs

M. McGlohon, L. Akoglu, and C. Faloutsos. Weighted Graphs and Disconnected Components: Patterns and a Generator. KDD 2008

T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 87: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

87

Observation T.3: NLCC behavior Q: How do the NLCCs emerge and join with

the GCC? (“NLCC” = non-largest conn. components) – Do they continue to grow in size? –  or do they shrink? –  or stabilize?

T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 88: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

88

Observation T.3: NLCC behavior Q: How do NLCCs emerge and join with the

GCC? (“NLCC” = non-largest conn. components) – Do they continue to grow in size? –  or do they shrink? –  or stabilize?

T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 89: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

89

Observation T.3: NLCC behavior Q: How do NLCCs emerge and join with the

GCC? (“NLCC” = non-largest conn. components) – Do they continue to grow in size? –  or do they shrink? –  or stabilize?

YES

YES

YES

T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 90: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

90

Observation T.3: NLCC behavior •  After the gelling point, the GCC takes off, but

the NLCCs remain ~constant (actually, oscillate).

IMDB

CC size

Time-stamp T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 91: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

91

Roadmap

•  Patterns in graphs – … – Time-evolving graphs

•  T1: Shrinking diameter •  T2: Densification •  T3: Connected components •  T4: Popularity over time •  T5: Phone-call patterns

•  …

T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 92: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

92

Timing for Blogs

•  with Mary McGlohon (CMU→Google) •  Jure Leskovec (CMU→Stanford) •  Natalie Glance (now at Google) •  Matt Hurst (now at MSR) [SDM’07]

T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 93: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

93

T.4 : popularity over time

Post popularity drops-off – exponentially?

lag: days after post

# in links

1 2 3

@t

@t + lag T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 94: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

94

T.4 : popularity over time

Post popularity drops-off – exponentially? POWER LAW! Exponent?

# in links (log)

days after post (log)

T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 95: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

95

T.4 : popularity over time

Post popularity drops-off – exponentially? POWER LAW! Exponent? -1.6 •  close to -1.5: Barabasi’s stack model •  and like the zero-crossings of a random walk

# in links (log) -1.6

days after post (log)

T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 96: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

96

-1.5 slope

J. G. Oliveira & A.-L. Barabási. Human Dynamics: The Correspondence Patterns of Darwin and Einstein. Nature 437, 1251 (2005) . [PDF]

Log # days to respond

Log Prob() -1.5

T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 97: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

97

Roadmap

•  Patterns in graphs – … – Time-evolving graphs

•  T1: Shrinking diameter •  T2: Densification •  T3: Connected components •  T4: Popularity over time •  T5: Phone-call patterns

•  …

T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 98: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

T.5: duration of phonecalls Surprising Patterns for the Call Duration Distribution of Mobile Phone Users. Pedro O. S. Vaz de Melo, Leman Akoglu, Christos Faloutsos, Antonio A. F. Loureiro. PKDD 2010

98 T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 99: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

Probably, power law (?)

99

??

T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 100: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

No Power Law!

100 T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 101: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

‘TLaC: Lazy Contractor’ •  The longer a task (phone-call) has taken,

the even longer it will take

101

Odds ratio= Casualties(<x): Survivors(>=x) == power law

T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 102: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

102

Data Description

n  Data from a private mobile operator of a large city n  4 months of data n  3.1 million users n  more than 1 billion phone records

n  Over 96% of ‘talkative’ users obeyed a TLAC distribution (‘talkative’: >30 calls)

T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 103: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

Outliers:

103 T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 104: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

Real Graph Patterns unweighted weighted static

P01. Power-law degree distribution [Faloutsos et. al.`99, Kleinberg et. al.`99, Chakrabarti et. al. `04, Newman`04] P02. Triangle Power Law [Tsourakakis `08] P03. Eigenvalue Power Law [Siganos et. al. `03] P04. Community structure [Flake et. al.`02, Girvan and Newman `02] P05. Clique Power Laws [Du et. al. ‘09]

P12. Snapshot Power Law [McGlohon et. al. `08]

dynamic

P06. Densification Power Law [Leskovec et. al.`05] P07. Small and shrinking diameter [Albert and Barabási `99, Leskovec et. al. ‘05, McGlohon et. al. ‘08] P08. Gelling point [McGlohon et. al. `08] P09. Constant size 2nd and 3rd connected components [McGlohon et. al. `08] P10. Principal Eigenvalue Power Law [Akoglu et. al. `08] P11. Bursty/self-similar edge/weight additions [Gomez and Santonja `98, Gribble et. al. `98, Crovella and Bestavros `99, McGlohon et .al. `08]

P13. Weight Power Law [McGlohon et. al. `08] P14. Skewed call duration distributions [Vaz de Melo et. al. `10]

104 ECML PKDD 2013 Tutorial

T. Eliassi-Rad & C. Faloutsos RTG: A Recursive Realistic Graph Generator using Random Typing Leman Akoglu and Christos Faloutsos. ECML PKDD’09.

✓ ✓ ✓

✓ ✓

✓ ✓

Page 105: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

105

Roadmap

•  Patterns in graphs – Overview – Static graphs – Weighted graphs – Time-evolving graphs

•  Anomaly Detection •  Application: ebay fraud •  Conclusions

T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 106: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

OddBall: Spotting Anomalies in Weighted Graphs

Leman Akoglu, Mary McGlohon,

Christos Faloutsos

PAKDD 2010, Hyderabad, India

Page 107: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

Main idea For each node, •  Extract ‘ego-net’ (=1-step-away neighbors) •  Extract features (#edges, total weight, etc) •  Compare with the rest of the population

107 T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 108: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

What is an egonet?

ego

108

egonet

T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 109: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

Selected Features §  Ni: number of neighbors (degree) of ego i §  Ei: number of edges in egonet i §  Wi: total weight of egonet i §  λw,i: principal eigenvalue of the weighted

adjacency matrix of egonet I

109 T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 110: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

Near-Clique/Star

110 T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 111: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

Near-Clique/Star

111 T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 112: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

Near-Clique/Star

112 T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 113: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

Andrew Lewis (director)

Near-Clique/Star

113 T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 114: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

Dominant Heavy Link

114 T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 115: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

115

Roadmap

•  Patterns in graphs –  overview – Static graphs – Weighted graphs – Time-evolving graphs

•  Anomaly Detection •  Application: ebay fraud •  Conclusions

T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 116: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

NetProbe: The Problem Find bad sellers (fraudsters) on eBay who don’t deliver their (expensive) items

116

$$$

X T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 117: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

117

E-bay Fraud detection

w/ Polo Chau & Shashank Pandit, CMU [www’07]

T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 118: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

118

E-bay Fraud detection

T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 119: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

119

E-bay Fraud detection

T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 120: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

120

E-bay Fraud detection - NetProbe

T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 121: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

NetProbe: Key Ideas •  Fraudsters fabricate their reputation by

“trading” with their accomplices •  Transactions form near bipartite cores •  How to detect them?

121 T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 122: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

NetProbe: Key Ideas Use ‘Belief Propagation’ and ~heterophily

122

F A H Fraudster

Accomplice Honest

Darker means more likely

T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 123: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

NetProbe: Main Results

123 T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 124: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

124

Roadmap

•  Patterns in graphs •  Anomaly Detection •  Application: ebay fraud

– How-to: Belief Propagation •  Conclusions

T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 125: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

Guilt-by-Association Techniques

Given: •  graph and •  few labeled nodes

Find: class (red/green) for rest nodes Assuming: network effects (homophily/ heterophily, etc)

125

red green

F

H A

T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 126: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

Correspondence  of  Methods

Random Walk with Restarts (RWR) Google Semi-supervised Learning (SSL) Belief Propagation (BP) Bayesian

126 T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 127: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

Correspondence  of  Methods

Random Walk with Restarts (RWR) ≈ Semi-supervised Learning (SSL) ≈ Belief Propagation (BP)

Method Matrix unknown known RWR [I – c AD-1] × x = (1-c)y SSL [I + a(D - A)] × x = y

FABP [I + a D - c’A] × bh = φh

0 1 0 1 0 1 0 1 0

? 0 1 1

ECML PKDD 2013 Tutorial

127 T. Eliassi-Rad & C. Faloutsos Unifying Guilt-by-Association Approaches: Theorems and Fast Algorithms. Danai Koutra, et al PKDD’11

Page 128: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

128

Roadmap

•  Patterns in graphs •  Anomaly Detection •  Application: ebay fraud •  Conclusions

T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 129: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

Overall Conclusions •  Roles

– Past work in social networks (‘regular’, ‘structural’, etc)

– Scalable algorithms’ to find such roles •  Anomalies & patterns

– Static (power-laws, ‘six degrees’) – Weighted (super-linearity) – Time-evolving (densification, -1.5 exponent)

129 T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 130: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

130

Thank You!

Roles

Features

Anomalies

Patterns

= rare roles

T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

[email protected] [email protected]

Page 131: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

T. Eliassi-Rad & C. Faloutsos 131

Project info

Akoglu, Leman

Chau, Polo

Kang, U McGlohon, Mary

Tong, Hanghang

Prakash, Aditya

ECML PKDD 2013 Tutorial

Thanks to: NSF IIS-0705359, IIS-0534205, CTA-INARC; ADAMS-DARPA; Yahoo (M45), LLNL, IBM, SPRINT, Google, INTEL, HP, iLab

www.cs.cmu.edu/~pegasus

Koutra, Danai

Page 132: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

132

References •  Leman Akoglu, Christos Faloutsos: RTG: A Recursive

Realistic Graph Generator Using Random Typing. ECML/PKDD (1) 2009: 13-28

•  Deepayan Chakrabarti, Christos Faloutsos: Graph mining: Laws, generators, and algorithms. ACM Comput. Surv. 38(1): (2006)

T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 133: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

133

References •  Deepayan Chakrabarti, Yang Wang, Chenxi Wang,

Jure Leskovec, Christos Faloutsos: Epidemic thresholds in real networks. ACM Trans. Inf. Syst. Secur. 10(4): (2008)

•  Deepayan Chakrabarti, Jure Leskovec, Christos Faloutsos, Samuel Madden, Carlos Guestrin, Michalis Faloutsos: Information Survival Threshold in Sensor and P2P Networks. INFOCOM 2007: 1316-1324

T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 134: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

134

References •  Christos Faloutsos, Tamara G. Kolda, Jimeng Sun:

Mining large graphs and streams using matrix and tensor tools. Tutorial, SIGMOD Conference 2007: 1174

T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 135: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

135

References •  T. G. Kolda and J. Sun. Scalable Tensor

Decompositions for Multi-aspect Data Mining. In: ICDM 2008, pp. 363-372, December 2008.

T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 136: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

136

References •  Jure Leskovec, Jon Kleinberg and Christos Faloutsos

Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations, KDD 2005 (Best Research paper award).

•  Jure Leskovec, Deepayan Chakrabarti, Jon M. Kleinberg, Christos Faloutsos: Realistic, Mathematically Tractable Graph Generation and Evolution, Using Kronecker Multiplication. PKDD 2005: 133-145

T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 137: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

137

References •  Jimeng Sun, Yinglian Xie, Hui Zhang, Christos

Faloutsos. Less is More: Compact Matrix Decomposition for Large Sparse Graphs, SDM, Minneapolis, Minnesota, Apr 2007.

•  Jimeng Sun, Spiros Papadimitriou, Philip S. Yu, and Christos Faloutsos, GraphScope: Parameter-free Mining of Large Time-evolving Graphs ACM SIGKDD Conference, San Jose, CA, August 2007

T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 138: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

References •  Jimeng Sun, Dacheng Tao, Christos

Faloutsos: Beyond streams and graphs: dynamic tensor analysis. KDD 2006: 374-383

138 T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 139: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

139

References •  Hanghang Tong, Christos Faloutsos, and

Jia-Yu Pan, Fast Random Walk with Restart and Its Applications, ICDM 2006, Hong Kong.

•  Hanghang Tong, Christos Faloutsos, Center-Piece Subgraphs: Problem Definition and Fast Solutions, KDD 2006, Philadelphia, PA

T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial

Page 140: Discovering Roles and Anomalies in Graphs: Theory and …eliassi.org/ecmlpkdd13tut/ecmlpkdd13tut_roles_part2... · 2013. 9. 22. · 11 Roadmap • Patterns in graphs – Overview

140

References •  Hanghang Tong, Christos Faloutsos, Brian

Gallagher, Tina Eliassi-Rad: Fast best-effort pattern matching in large attributed graphs. KDD 2007: 737-746

T. Eliassi-Rad & C. Faloutsos ECML PKDD 2013 Tutorial