Fast Direction-Aware Proximity for Graph Mining


Page 1: Fast Direction-Aware Proximity for Graph Mining

2007-8-13 KDD 2007, San Jose

Fast Direction-Aware Proximity for Graph Mining

Speaker: Hanghang Tong

Joint work w/ Yehuda Koren, Christos Faloutsos

Page 2: Fast Direction-Aware Proximity for Graph Mining

Proximity on Graph

• Undirected graph
  – What is Prox between A and B?
  – ‘How close is Smith to Johnson?’

But many real graphs are directed…

[Figure: example graph containing nodes A and B; every edge has weight 1]

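Page 3: Fast Direction-Aware Proximity for Graph Mining

Edge Direction w/ Proximity

[Figure: two versions of the example graph with nodes A and B; in the first every edge label is 1, in the second the edge labels are 1 and 0.5]

What is Prox from A to B?
What is Prox from B to A?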

Page 4: Fast Direction-Aware Proximity for Graph Mining

Motivating Questions (Fast DAP)

• Q1: How to define it?

• Q2: How to compute it efficiently?

• Q3: How to benefit real applications?

Page 5: Fast Direction-Aware Proximity for Graph Mining

Roadmap

• DAP definitions
  – Escape Probability
  – Issue #1: ‘degree-1 node’ effect
  – Issue #2: weakly connected pair
• Computational Issues
  – FastAllDAP: all pairs
  – FastOneDAP: one pair
• Experimental Results
• Conclusion

Page 6: Fast Direction-Aware Proximity for Graph Mining

Defining DAP: escape probability

• Define Random Walk (RW) on the graph
• Esc_Prob(A->B)
  – Prob(starting at A, the walk reaches B before returning to A)

Esc_Prob = Pr(smile before cry)

[Figure: nodes A and B, connected through "the remaining graph"]
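The definition can be checked numerically on a toy graph. The sketch below is only an illustration, not the authors' code: the function name `escape_probability`, the three-node transition matrix `P`, and the number of sweeps are all assumptions made for the example. It computes, for every intermediate node, the probability of hitting the target before the source by repeated averaging, then takes one step out of the source.

```python
def escape_probability(P, i, j, sweeps=200):
    """Pr(a walk started at i reaches j before returning to i).

    P is a row-normalized transition matrix given as a list of lists.
    """
    n = len(P)
    # v[k] = Pr(walk at k hits j before i); fixed at 0 for i and 1 for j.
    v = [0.0] * n
    v[j] = 1.0
    for _ in range(sweeps):
        for k in range(n):
            if k != i and k != j:
                v[k] = sum(P[k][l] * v[l] for l in range(n))
    # One step out of i, then the hit-j-before-i probability from there.
    return sum(P[i][k] * v[k] for k in range(n))


# Hypothetical directed graph (not the one drawn on the slides):
# A -> M, M -> B, B -> A (0.5), B -> M (0.5).
A, M, B = 0, 1, 2
P = [
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.5, 0.5, 0.0],
]
esc_ab = escape_probability(P, A, B)
esc_ba = escape_probability(P, B, A)
```

On this toy graph a walk from A is forced through M to B, so `esc_ab` is 1. A walk from B reaches A directly half the time and otherwise bounces back to B via M, so `esc_ba` is 0.5. The two directions differ, which is the point of a direction-aware measure.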

Page 7: Fast Direction-Aware Proximity for Graph Mining

Esc_Prob: Example

Esc_Prob(a->b)=1 > Esc_Prob(b->a)=0.5

[Figure: example graph with nodes A and B; edge labels 1 and 0.5]

Page 8: Fast Direction-Aware Proximity for Graph Mining

Esc_Prob is good, but…

• Issue #1:
  – ‘Degree-1 node’ effect
• Issue #2:
  – Weakly connected pair

Need some practical modifications!

Page 9: Fast Direction-Aware Proximity for Graph Mining

Issue #1: ‘degree-1 node’ effect [Faloutsos+] [Koren+]

• No influence for degree-1 nodes (E, F)!
  – Known as the ‘pizza delivery guy’ problem in undirected graphs
• Solution: Universal Absorbing Boundary!

[Figure: two graphs. First: nodes A, D, B with edge labels 1, 1; Esc_Prob(a->b)=1. Second: the same graph with degree-1 nodes E and F added; edge labels 1, 1/3, 1/3, 1/3, 1, 1; Esc_Prob(a->b)=1.]

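Page 10: Fast Direction-Aware Proximity for Graph Mining

Universal Absorbing Boundary

U-A-B is a black hole!

[Figure: nodes A, D, B with edge labels 1, 1, plus an extra node U-A-B. With U-A-B attached, the original edge labels become 0.9, 0.9, three edges labeled 0.1 lead into U-A-B, and one edge is labeled 1.]

Footnote: fly-out probability = 0.1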
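As a rough sketch of what attaching such a boundary could look like on a transition matrix: every original transition is scaled by c = 0.9 and each node sends the remaining 0.1 to one extra absorbing node. The function name `add_absorbing_boundary`, the matrix layout (absorbing node stored last, with a self-loop), and the three-node path are assumptions made for this illustration.

```python
def add_absorbing_boundary(P, c=0.9):
    """Return an (n+1) x (n+1) matrix with an extra absorbing node.

    Every original transition is scaled by c; each original node gets a
    transition of probability 1 - c to the new last node, which only
    loops to itself.
    """
    n = len(P)
    rows = [[c * p for p in row] + [1.0 - c] for row in P]
    rows.append([0.0] * n + [1.0])
    return rows


# Hypothetical 3-node path 0 -> 1 -> 2; node 2 has no out-links.
P = [
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0],
]
P_uab = add_absorbing_boundary(P, c=0.9)
```

Rows of nodes that had out-links still sum to 1 after the change. A node without out-links keeps only its 0.1 fly-out transition in this sketch; how such nodes are treated in the paper is not shown on the slide.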

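Page 11: Fast Direction-Aware Proximity for Graph Mining

Introducing Universal-Absorbing-Boundary

[Figure: the two graphs from Issue #1, without and with U-A-B.
First graph (A, D, B; edge labels 1, 1): Esc_Prob(a->b)=1. With U-A-B (edge labels 0.9, 0.9; 0.1 into U-A-B): Prox(a->b)=0.91.
Second graph (A, D, B, E, F; edge labels 1, 1/3, 1/3, 1/3, 1, 1): Esc_Prob(a->b)=1. With U-A-B (edge labels 0.9, 0.3, 0.3, 0.3, 0.9, 0.9; 0.1 into U-A-B): Prox(a->b)=0.74.]

Footnote: fly-out probability = 0.1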

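Page 12: Fast Direction-Aware Proximity for Graph Mining

Issue #2: Weakly connected pair

[Figure: A and B joined by a path of three edges, each of weight 1]

Prox(A->B) = Prox(B->A) = 0

Solution: Partial symmetry!

[Figure: an edge from i to j of weight w is replaced by a forward edge of weight a·w and a backward edge of weight (1-a)·w]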

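Page 13: Fast Direction-Aware Proximity for Graph Mining

Practical Modifications: Partial Symmetry

[Figure: before, the three-edge path between A and B with weights 1, 1, 1]

Prox(A->B) = Prox(B->A) = 0

[Figure: after, the same path with forward weights 0.9, 0.9, 0.9 and backward weights 0.1, 0.1, 0.1]

Prox(A->B) = 0.081 > Prox(B->A) = 0.009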
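The weight adjustment itself is simple to sketch. In the snippet below, the function name `partial_symmetry`, the dense weight matrix `W`, and the orientation of the example path are assumptions for illustration; in particular, adding the two contributions when edges already exist in both directions is a choice made here, not something the slides state.

```python
def partial_symmetry(W, a=0.9):
    """Split each edge weight w(i, j) into a*w forward and (1-a)*w backward."""
    n = len(W)
    return [
        [a * W[i][j] + (1.0 - a) * W[j][i] for j in range(n)]
        for i in range(n)
    ]


# Hypothetical weakly connected path: 0 -> 1 -> 2 <- 3, all weights 1.
W = [
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
W_sym = partial_symmetry(W, a=0.9)
```

Before the adjustment there is no directed path between nodes 0 and 3 in either direction. Afterwards every original edge can be traversed both ways, with weight 0.9 along its direction and 0.1 against it, so both proximities can become non-zero while still differing.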

Page 14: Fast Direction-Aware Proximity for Graph Mining

Roadmap

• DAP definitions
  – Escape Probability
  – Issue #1: ‘degree-1 node’ effect
  – Issue #2: weakly connected pair
• Computational Issues
  – FastAllDAP: all pairs
  – FastOneDAP: one pair
• Experimental Results
• Conclusion

Page 15: Fast Direction-Aware Proximity for Graph Mining

Solving Esc_Prob: [Doyle+]

$$\mathrm{Esc\_Prob}(i \to j) = \mathbf{p}_i^{t}\,(I - \hat{P})^{-1}\,\mathbf{p}_j + p_{i,j}$$

P: transition matrix (row norm.)
n: # of nodes in the graph
$\mathbf{p}_i^{t}$, 1 x (n-2): i^th row of P, removing i^th & j^th elements
$\hat{P}$, (n-2) x (n-2): P, removing i^th & j^th rows & cols
$\mathbf{p}_j$, (n-2) x 1: j^th col of P, removing i^th & j^th elements

One matrix inversion, one Esc_Prob!
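A minimal sketch of this computation in plain Python follows. It assumes the escape probability takes the form row-of-i times (I - P̂)^-1 times column-of-j plus the direct transition p_ij, that is, the DAP formula with c = 1. The helper names `solve` and `esc_prob` and the four-node graph are made up for the example. A linear solve stands in for the explicit inverse, which gives the same number.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[r]] for r, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def esc_prob(P, i, j):
    """Esc_Prob(i->j) = p_i^t (I - P_hat)^-1 p_j + p_ij (assumed form)."""
    rest = [k for k in range(len(P)) if k not in (i, j)]
    # I - P_hat, where P_hat is P without rows and columns i and j.
    A = [[(1.0 if r == s else 0.0) - P[r][s] for s in rest] for r in rest]
    # x = (I - P_hat)^-1 p_j, obtained from one linear solve.
    x = solve(A, [P[r][j] for r in rest])
    return P[i][j] + sum(P[i][r] * x[t] for t, r in enumerate(rest))


# Hypothetical 4-node directed graph (row-normalized), not from the slides.
P = [
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.5, 0.5, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
]
esc_03 = esc_prob(P, 0, 3)
esc_30 = esc_prob(P, 3, 0)
```

For this graph `esc_03` comes out as 0.5 and `esc_30` as 1.0. Each pair (i, j) removes different rows and columns, so each pair needs its own solve. The sketch also assumes I - P̂ is invertible, which fails if the walk can get trapped among the remaining nodes forever.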

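Page 16: Fast Direction-Aware Proximity for Graph Mining

[Figure: 6-node example graph (nodes 1 to 6) with transition probabilities 0.5 and 1 on its edges]

P = (p_{u,v}), u, v = 1, …, 6: transition matrix (row norm.)

$$
\mathrm{Esc\_Prob}(1 \to 5) =
\begin{pmatrix} p_{1,2} & p_{1,3} & p_{1,4} & p_{1,6} \end{pmatrix}
\left( I -
\begin{pmatrix}
p_{2,2} & p_{2,3} & p_{2,4} & p_{2,6} \\
p_{3,2} & p_{3,3} & p_{3,4} & p_{3,6} \\
p_{4,2} & p_{4,3} & p_{4,4} & p_{4,6} \\
p_{6,2} & p_{6,3} & p_{6,4} & p_{6,6}
\end{pmatrix}
\right)^{-1}
\begin{pmatrix} p_{2,5} \\ p_{3,5} \\ p_{4,5} \\ p_{6,5} \end{pmatrix}
+ p_{1,5}
$$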

Page 17: Fast Direction-Aware Proximity for Graph Mining

Solving DAP (Straight-forward way)

$$\mathrm{Prox}(i \to j) = c^{2}\,\mathbf{p}_i^{t}\,(I - c\hat{P})^{-1}\,\mathbf{p}_j + c\,p_{i,j}$$

Dimensions: $\mathbf{p}_i^{t}$ is 1 x (n-2), $\hat{P}$ is (n-2) x (n-2), $\mathbf{p}_j$ is (n-2) x 1

1-c: fly-out probability (to black-hole)

One matrix inversion, one proximity!
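One way to read this formula, offered as an interpretation rather than something stated on the slide: because c < 1 and the reduced transition matrix has row sums of at most 1, the inverse can be expanded as a geometric (Neumann) series.

```latex
\begin{align*}
(I - c\hat{P})^{-1} &= \sum_{k=0}^{\infty} c^{k}\,\hat{P}^{k}, \\
\mathrm{Prox}(i \to j) &= c\,p_{i,j}
  + \sum_{k=0}^{\infty} c^{k+2}\,\mathbf{p}_i^{t}\,\hat{P}^{k}\,\mathbf{p}_j .
\end{align*}
```

The term with exponent k+2 collects walks that leave i, spend k steps among the other n-2 nodes, and then step into j, so a walk of length ℓ is discounted by c^ℓ. Setting c = 1 recovers the Esc_Prob expression whenever that inverse exists.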

Page 18: Fast Direction-Aware Proximity for Graph Mining

Challenges

• Case 1: Medium-size graph
  – Matrix inversion is feasible, but…
  – What if we want many proximities?
  – Q: How to get all (n²) proximities efficiently?
  – A: FastAllDAP!
• Case 2: Large graph
  – Matrix inversion is infeasible
  – Q: How to get one proximity efficiently?
  – A: FastOneDAP!

Page 19: Fast Direction-Aware Proximity for Graph Mining

FastAllDAP

• Q1: How to efficiently compute all possible proximities on a medium-size graph?
  – a.k.a. how to efficiently solve multiple linear systems simultaneously?
• Goal: reduce # of matrix inversions!

Page 20: Fast Direction-Aware Proximity for Graph Mining

FastAllDAP: Observation

[Figure: the 6-node example graph and two copies of its 6 x 6 transition matrix P = (p_{u,v}), each highlighting a different submatrix]

Need two different matrix inversions!

Page 21: Fast Direction-Aware Proximity for Graph Mining

FastAllDAP: Rescue

[Figure: two copies of the 6 x 6 transition matrix P = (p_{u,v}), with the submatrices used for Prox(1->5) and Prox(1->6) shaded gray]

Redundancy among different linear systems!

Overlap between two gray parts!

Page 22: Fast Direction-Aware Proximity for Graph Mining

FastAllDAP: Theorem

• Theorem:

• Proof: by SM Lemma

• Example:

Page 23: Fast Direction-Aware Proximity for Graph Mining

FastAllDAP: Algorithm

• Alg.
  – Compute Q
  – For i, j = 1, …, n, compute Prox(i->j) from Q
• Computational savings: O(1) matrix inversions instead of O(n²)!
• Example
  – w/ 1000 nodes,
  – 1M matrix inversions vs. 1 matrix inversion!
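The theorem's formula did not survive in this transcript, so the following is a reconstruction under stated assumptions, not a quotation. It takes Q = (I - cP)^-1 for the full transition matrix and uses the closed form Prox(i->j) = q_ij / (q_ii·q_jj - q_ij·q_ji), which follows from a first-return argument on the walk with fly-out and needs only four entries of Q per pair. The helper names and the four-node graph are hypothetical. The sketch cross-checks the closed form against the straightforward per-pair inversion.

```python
def invert(A):
    """Invert a square matrix by Gauss-Jordan elimination (partial pivoting)."""
    n = len(A)
    M = [row[:] + [1.0 if r == k else 0.0 for k in range(n)]
         for r, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        d = M[col][col]
        M[col] = [x / d for x in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]


def identity_minus(P, idx, c):
    """Build I - c * P restricted to the rows and columns listed in idx."""
    return [[(1.0 if r == s else 0.0) - c * P[r][s] for s in idx] for r in idx]


def fast_all_dap(P, c=0.9):
    """All-pairs proximities from a single inverse Q = (I - cP)^-1."""
    n = len(P)
    Q = invert(identity_minus(P, range(n), c))
    prox = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                # Assumed closed form; uses only four entries of Q.
                prox[i][j] = Q[i][j] / (Q[i][i] * Q[j][j] - Q[i][j] * Q[j][i])
    return prox


def direct_prox(P, i, j, c=0.9):
    """Straightforward way: one (n-2) x (n-2) inverse per pair."""
    rest = [k for k in range(len(P)) if k not in (i, j)]
    G = invert(identity_minus(P, rest, c))
    inner = sum(P[i][r] * G[a][b] * P[s][j]
                for a, r in enumerate(rest) for b, s in enumerate(rest))
    return c * c * inner + c * P[i][j]


# Hypothetical 4-node directed graph (row-normalized), not from the slides.
P = [
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.5, 0.5, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
]
table = fast_all_dap(P, c=0.9)
max_gap = max(abs(table[i][j] - direct_prox(P, i, j, 0.9))
              for i in range(4) for j in range(4) if i != j)
```

On this example the two routes agree to rounding error (`max_gap` is tiny). `fast_all_dap` performs one inversion in total, while calling `direct_prox` for every ordered pair performs one per pair, which is where the count of roughly a million inversions for 1000 nodes comes from.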

Page 24: Fast Direction-Aware Proximity for Graph Mining

FastOneDAP

• Q1: How to efficiently compute one single proximity on a large graph?
  – a.k.a. how to solve one linear system efficiently?
• Goal: avoid matrix inversion!

Page 25: Fast Direction-Aware Proximity for Graph Mining

FastOneDAP: Observation

[Figure: the 6-node example graph with transition probabilities 0.5 and 1 on its edges]

Partial info. (4 elements / 2 cols) of Q is enough!

Page 26: Fast Direction-Aware Proximity for Graph Mining

FastOneDAP: Observation

• Q: How to compute one column of Q?
• A: Taylor expansion

Reminder:

[Figure: the i-th col of Q, selected by the unit vector [0, …, 0, 1, 0, …, 0]^T]

Page 27: Fast Direction-Aware Proximity for Graph Mining

FastOneDAP: Observation

[Figure: the i-th col of Q written as a series of matrix products applied to the unit vector [0, …, 0, 1, 0, …, 0]^T]

Sparse matrix-vector multiplications!

Page 28: Fast Direction-Aware Proximity for Graph Mining

FastOneDAP: Iterative Alg.

• Alg. to estimate the i-th col of Q
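The algorithm body is missing from this transcript, so here is one plausible sketch rather than the authors' procedure. It assumes Q = (I - cP)^-1 and approximates its i-th column by the truncated series e_i + (cP)e_i + (cP)²e_i + …, implemented as the update x <- e_i + c·P·x with a sparse adjacency-list matrix. The names `q_column` and `fast_one_dap`, the dictionary layout, and the use of two columns with the closed form q_ij / (q_ii·q_jj - q_ij·q_ji) are assumptions for illustration.

```python
def q_column(P_sparse, n, i, c=0.9, iters=50):
    """Approximate column i of Q = (I - c*P)^-1 by a truncated series.

    P_sparse maps a node r to a list of (s, p_rs) pairs for its out-links,
    so each iteration is one sparse matrix-vector product.
    """
    x = [0.0] * n
    for _ in range(iters):
        y = [0.0] * n
        for r, out_links in P_sparse.items():
            y[r] = c * sum(p * x[s] for s, p in out_links)
        y[i] += 1.0  # x <- e_i + c * P * x
        x = y
    return x


def fast_one_dap(P_sparse, n, i, j, c=0.9, iters=50):
    """One proximity from two columns of Q, with no matrix inversion."""
    col_i = q_column(P_sparse, n, i, c, iters)  # holds q_ii and q_ji
    col_j = q_column(P_sparse, n, j, c, iters)  # holds q_ij and q_jj
    q_ii, q_ji = col_i[i], col_i[j]
    q_ij, q_jj = col_j[i], col_j[j]
    # Assumed closed form in terms of four entries of Q.
    return q_ij / (q_ii * q_jj - q_ij * q_ji)


# Hypothetical 4-node directed graph (row-normalized), not from the slides.
P_sparse = {
    0: [(1, 0.5), (2, 0.5)],
    1: [(2, 0.5), (3, 0.5)],
    2: [(0, 0.5), (1, 0.5)],
    3: [(0, 1.0)],
}
prox_03 = fast_one_dap(P_sparse, 4, 0, 3, c=0.9, iters=400)
```

Each iteration touches every edge once, so the cost per column is roughly (number of iterations) x (number of edges). The truncation error shrinks like c^iters, so with c = 0.9 a run of 50 iterations leaves an error on the order of 0.9^50, about half a percent; the example uses 400 iterations only to make the result match the exact value closely.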

Page 29: Fast Direction-Aware Proximity for Graph Mining

FastOneDAP: Property

• Convergence guaranteed!
• Computational savings
  – Example:
    • 100K nodes and 1M edges (50 iterations)
    • 10,000,000x faster!
• Footnote: 1 col is enough! (details in paper)

Page 30: Fast Direction-Aware Proximity for Graph Mining

Roadmap

• DAP definitions
  – Escape Probability
  – Issue #1: ‘degree-1 node’ effect
  – Issue #2: weakly connected pair
• Computational Issues
  – FastAllDAP: all pairs
  – FastOneDAP: one pair
• Experimental Results
• Conclusion

Page 31: Fast Direction-Aware Proximity for Graph Mining

Datasets (all real)

Name   Node #   Edge #   Directionality
WL     4k       10k      A-links-to-B
PC     36k      64k      Who-contact-whom
EP     76k      509k     Who-trust-whom
CN     28k      353k     A-cites-B
AE     38k      115k     Who-email-to-whom

Page 32: Fast Direction-Aware Proximity for Graph Mining

Link Prediction: existence

[Figure: two density plots, one for node pairs with no link and one for pairs with a link; x-axis: Prox(i->j) + Prox(j->i); y-axis: density]

DAP is effective to distinguish red and blue!

Page 33: Fast Direction-Aware Proximity for Graph Mining

Link Prediction: existence

Dataset   Accuracy
WL        65.40%
PC        79.60%
AE        81.51%
CN        86.71%
EP        92.21%

Page 34: Fast Direction-Aware Proximity for Graph Mining

Link Prediction: direction

• Q: Given the existence of the link, what is the direction of the link?
• A: Compare Prox(i->j) and Prox(j->i)

[Figure: density plot annotated ">70%"; x-axis: Prox(i->j) - Prox(j->i); y-axis: density]
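The two link-prediction rules can be sketched as a small decision function. The slides do not give the threshold or say which sign of the difference maps to which direction, so the sketch assumes that a larger sum suggests a link and that the link points in the direction of the larger proximity. The function name `predict_link`, the threshold, and the input values are hypothetical.

```python
def predict_link(prox_ij, prox_ji, threshold):
    """Return (exists, direction) for a node pair from its two proximities."""
    # Existence: score the pair by Prox(i->j) + Prox(j->i).
    exists = (prox_ij + prox_ji) > threshold
    # Direction: assumed to follow the larger of the two proximities.
    direction = "i->j" if prox_ij >= prox_ji else "j->i"
    return exists, direction


# Hypothetical proximity values and threshold, for illustration only.
result = predict_link(prox_ij=0.08, prox_ji=0.01, threshold=0.05)
```

In practice the threshold would have to be tuned on pairs whose link status is known; the accuracy figures on the slides come from the authors' own setup, which this sketch does not reproduce.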

Page 35: Fast Direction-Aware Proximity for Graph Mining

Efficiency: FastAllDAP

[Figure: time (sec) vs. size of graph for Straight-Solver and FastAllDAP]

1,000x faster!

Page 36: Fast Direction-Aware Proximity for Graph Mining

Efficiency: FastOneDAP

[Figure: time (sec) vs. size of graph for FastOneDAP and Straight-Solver]

10,000x faster!

Page 37: Fast Direction-Aware Proximity for Graph Mining

Roadmap

• DAP definitions
  – Escape Probability
  – Issue #1: ‘degree-1 node’ effect
  – Issue #2: weakly connected pair
• Computational Issues
  – FastAllDAP: all pairs
  – FastOneDAP: one pair
• Experimental Results
• Conclusion

Page 38: Fast Direction-Aware Proximity for Graph Mining

Conclusion (Fast DAP)

• Q1: How to define it?
• A1: Esc_Prob + Practical Modifications
• Q2: How to compute it efficiently?
• A2: FastAllDAP & FastOneDAP
  – (100x – 10,000x faster!)
• Q3: How to benefit real applications?
• A3: Link Prediction (existence & direction)

Page 39: Fast Direction-Aware Proximity for Graph Mining

More in the paper…

• Generalization to group proximity
  – Definitions; fast solutions
  – ‘How close between/from CEOs and/to Accountants?’
• More applications
  – Dir-CePS, attributed graphs

[Figure: four small graphs on nodes A, B, C, labeled: CePS; Common descendant; Common ancestor; Descendant of B & common ancestor of A and C; ...]

Page 40: Fast Direction-Aware Proximity for Graph Mining

Cupid uses arrows, so does graph mining!

Thank you!

www.cs.cmu.edu/~htong