SCS CMU Joint Work by Hanghang Tong, Yasushi Sakurai, Tina Eliassi-Rad, Christos Faloutsos Speaker: Hanghang Tong Oct. 26-30, 2008, Napa, CA CIKM 2008 Fast Mining of Complex Time-Stamped Events

Page 1: SCS CMU Joint Work by Hanghang Tong, Yasushi Sakurai, Tina Eliassi-Rad, Christos Faloutsos Speaker: Hanghang Tong Oct. 26-30, 2008, Napa, CA CIKM 2008.

SCS CMU

Joint Work by

Hanghang Tong, Yasushi Sakurai,

Tina Eliassi-Rad, Christos Faloutsos

Speaker: Hanghang Tong

Oct. 26-30, 2008, Napa, CA CIKM 2008

Fast Mining of Complex Time-Stamped Events

Page 2:

A Motivating Example: Inputs

Time      Event (e.g., Session)   Entity
Oct. 26   Link Analysis           Tom, Bob
          Clustering              Bob, Alan
Oct. 27   Classification          Bob, Alan
          Anomaly Detection       Alan, Beck
Oct. 28   Party                   Beck, Dan
Oct. 29   Web Search              Dan, Jack
          Advertising             Jack, Peter
Oct. 30   Enterprise Search       Jack, Peter
Oct. 31   Q & A                   Peter, Smith

Page 3:

A Motivating Example: Outputs

[Figure: time stamps (Oct. 26 to Oct. 30) plotted on the 1st eigen-vector vs. the 2nd eigen-vector]

• Time Cluster, Rep. Entities: "Tom", "Bob", "Alan"

• Abnormal Time, Rep. Entities: "Beck", "Dan"

• Time Cluster, Rep. Entities: "Jack", "Peter", "Smith"

Page 4:

Problem Definitions: (How to Understand Time in Such a Complex Context)

• Given data sets collected at different time stamps;

• Find
  – Q1: Time clusters
  – Q2: Abnormal time stamps
  – Q3: Interpretations
  – Q4: Right time granularity

Page 5:

Roadmap

• Motivation

• T3: Single Resolution Analysis

• MT3: Multi Resolution Analysis

• Experimental Evaluations

• Conclusion

Page 6:

T3: Single Resolution Analysis

• Given the data sets collected at different time stamps…

• Find
  – (1) Clusters for time stamps
  – (2) Abnormal time stamps
  – (3) Interpretations

Page 7:

How to represent the data sets?

Time      Event (e.g., Session)   Entity
Oct. 26   Link Analysis           Tom, Bob
          Clustering              Bob, Alan
Oct. 27   Classification          Bob, Alan
          Anomaly Detection       Alan, Beck
Oct. 28   Party                   Beck, Dan
Oct. 29   Web Search              Dan, Jack
          Advertising             Jack, Peter
Oct. 30   Enterprise Search       Jack, Peter
Oct. 31   Q & A                   Peter, Smith

Page 8:

A: Graph Representation!

[Figure: graph over three groups of nodes]

• Time nodes: Oct. 26, 2008; Oct. 27, 2008; Oct. 28, 2008; Oct. 29, 2008; Oct. 30, 2008; Oct. 31, 2008

• Event nodes: Link Analysis, Clustering, Classification, Anomaly Detection, Party, Web Search, Advertising, Enterprise Search, Q & A

• Entity nodes: Tom, Bob, Alan, Beck, Dan, Jack, Peter, Smith
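The graph representation can be sketched in a few lines of Python. This is only an illustration built from the input table of the motivating example. The edge rule (each event node links to its time node and to each of its entity nodes) is an assumption, and `records` and `build_graph` are hypothetical names, not part of the talk.

```python
from collections import defaultdict

# Rows of the motivating example: (time stamp, event, entities).
records = [
    ("Oct. 26", "Link Analysis", ["Tom", "Bob"]),
    ("Oct. 26", "Clustering", ["Bob", "Alan"]),
    ("Oct. 27", "Classification", ["Bob", "Alan"]),
    ("Oct. 27", "Anomaly Detection", ["Alan", "Beck"]),
    ("Oct. 28", "Party", ["Beck", "Dan"]),
    ("Oct. 29", "Web Search", ["Dan", "Jack"]),
    ("Oct. 29", "Advertising", ["Jack", "Peter"]),
    ("Oct. 30", "Enterprise Search", ["Jack", "Peter"]),
    ("Oct. 31", "Q & A", ["Peter", "Smith"]),
]


def build_graph(rows):
    """Return an undirected adjacency map over typed nodes.

    Assumption: an event node is linked to its time node and to
    every entity node that took part in the event.
    """
    adj = defaultdict(set)
    for time, event, entities in rows:
        t, ev = ("time", time), ("event", event)
        adj[t].add(ev)
        adj[ev].add(t)
        for name in entities:
            en = ("entity", name)
            adj[ev].add(en)
            adj[en].add(ev)
    return adj


graph = build_graph(records)
time_nodes = sorted(n for n in graph if n[0] == "time")
event_nodes = sorted(n for n in graph if n[0] == "event")
entity_nodes = sorted(n for n in graph if n[0] == "entity")
```

Typed node keys such as `("time", "Oct. 28")` keep the three node groups apart even if a label were to appear in more than one group.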

Page 9:

A: Graph Representation!

[Figure: the same graph, extended with attribute nodes]

• Time nodes: Oct. 26, 2008; Oct. 27, 2008; Oct. 28, 2008; Oct. 29, 2008; Oct. 30, 2008; Oct. 31, 2008

• Event nodes: Link Analysis, Clustering, Classification, Anomaly Detection, Party, Web Search, Advertising, Enterprise Search, Q & A

• Entity nodes: Tom, Bob, Alan, Beck, Dan, Jack, Peter, Smith

• Attribute nodes: Prof., CEO, Stu.

Page 10:

Qs: Given the graph,

• How to cluster time nodes?

• How to spot abnormal time nodes?

• How to interpret?

Page 11:

Q1: How to cluster time nodes?

• Step 1: Time-To-Time (TT) proximity matrix

[Figure: 6 x 6 TT matrix; rows and columns are the time stamps Oct. 26 to Oct. 31]

Page 12:

Q1: How to cluster time nodes?

• Step 2: Cluster time nodes by the TT matrix
  – Spectral clustering algorithm (and a lot of others)

[Figure: 6 x 6 TT matrix; rows and columns are the time stamps Oct. 26 to Oct. 31]
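As a rough illustration of Step 2, the sketch below runs a basic spectral clustering (normalized affinity, top-k eigenvectors, a small k-means) on a TT matrix. The TT values are hypothetical, since the transcript does not preserve the numbers from the slide, and this particular variant is an assumption. The talk only says "spectral clustering (and a lot of others)".

```python
import numpy as np


def spectral_clusters(proximity, k, n_iter=20):
    """Cluster nodes from a proximity matrix with a basic spectral method."""
    A = (proximity + proximity.T) / 2.0        # symmetrize the proximities
    d = A.sum(axis=1)
    L = A / np.sqrt(np.outer(d, d))            # D^-1/2 A D^-1/2
    _, vecs = np.linalg.eigh(L)                # eigenvalues in ascending order
    X = vecs[:, -k:]                           # top-k eigenvectors as embedding
    X = X / np.linalg.norm(X, axis=1, keepdims=True)

    # Deterministic farthest-first initialization, then Lloyd iterations.
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        gaps = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(gaps, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels


# Hypothetical TT proximity matrix for Oct. 26 .. Oct. 31 (illustrative values).
TT = np.array([
    [1.00, 0.80, 0.05, 0.01, 0.01, 0.01],
    [0.80, 1.00, 0.05, 0.01, 0.01, 0.01],
    [0.05, 0.05, 1.00, 0.05, 0.01, 0.01],
    [0.01, 0.01, 0.05, 1.00, 0.80, 0.80],
    [0.01, 0.01, 0.01, 0.80, 1.00, 0.80],
    [0.01, 0.01, 0.01, 0.80, 0.80, 1.00],
])
labels = spectral_clusters(TT, k=3)
```

With these made-up values the third time stamp is only weakly tied to the others, so it should end up in a cluster of its own.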

Page 13:

Q2: How to find abnormal time nodes?

• Abnormal time = a time cluster with a single time stamp (singleton)

[Figure: 6 x 6 TT matrix; rows and columns are the time stamps Oct. 26 to Oct. 31]

Oct. 28 is abnormal!
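The singleton rule is easy to state in code. In this minimal sketch the cluster labels are hypothetical (they stand in for the output of whatever clustering step was used), and `abnormal_time_stamps` is an illustrative name.

```python
from collections import Counter


def abnormal_time_stamps(time_stamps, labels):
    """Return the time stamps whose cluster contains no other time stamp."""
    counts = Counter(labels)
    return [t for t, lab in zip(time_stamps, labels) if counts[lab] == 1]


days = ["Oct. 26", "Oct. 27", "Oct. 28", "Oct. 29", "Oct. 30", "Oct. 31"]
cluster_labels = [0, 0, 1, 2, 2, 2]   # hypothetical clustering result
abnormal = abnormal_time_stamps(days, cluster_labels)   # ['Oct. 28']
```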

Page 14:

Q3: How to interpret?

• Step 1: Time-to-People (TP) proximity matrix

          Tom   Bob   Alan  Beck  Dan   Jack  Peter Smith
Oct. 26   .9    .8    .8    .01   .01   .01   .01   .3
Oct. 27   .5    .8    .8    .5    .01   .01   .01   .3
Oct. 28   .2    .2    .2    .9    .9    .01   .01   .3
Oct. 29   .01   .01   .01   .01   .5    .8    .8    .8
Oct. 30   .01   .01   .01   .01   .01   .8    .8    .8
Oct. 31   .01   .01   .01   .01   .01   .8    .8    .8

e.g., we want to use people to interpret time cluster/anomaly

Page 15:

Q3: How to interpret?

• Step 2: Time Cluster-to-People (TCP) matrix

          Tom   Bob   Alan  Beck  Dan   Jack  Peter Smith
Oct. 26   .9    .8    .8    .01   .01   .01   .01   .3
Oct. 27   .5    .8    .8    .5    .01   .01   .01   .3
Oct. 28   .2    .2    .2    .9    .9    .2    .2    .3
Oct. 29   .01   .01   .01   .01   .5    .8    .8    .8
Oct. 30   .01   .01   .01   .01   .01   .8    .8    .8
Oct. 31   .01   .01   .01   .01   .01   .8    .8    .8

e.g., we want to use people to interpret time cluster/anomaly

Page 16:

Q3: How to interpret?

• Step 2: Time Cluster-to-People (TCP) matrix

[Figure: matrix with rows Oct. 26 to Oct. 31 and columns Tom, Bob, Alan, Beck, Dan, Jack, Peter, Smith]

e.g., we want to use people to interpret time cluster/anomaly
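One plausible way to get from the TP matrix to a TCP matrix is to average the TP rows of the time stamps in each cluster. The sketch below does that with the TP values from the Step 1 slide. The averaging rule and the cluster assignment are assumptions made for illustration, and `time_cluster_to_people` is a hypothetical name.

```python
people = ["Tom", "Bob", "Alan", "Beck", "Dan", "Jack", "Peter", "Smith"]

# TP proximity matrix from the Step 1 slide (rows: Oct. 26 .. Oct. 31).
TP = [
    [.9, .8, .8, .01, .01, .01, .01, .3],
    [.5, .8, .8, .5, .01, .01, .01, .3],
    [.2, .2, .2, .9, .9, .01, .01, .3],
    [.01, .01, .01, .01, .5, .8, .8, .8],
    [.01, .01, .01, .01, .01, .8, .8, .8],
    [.01, .01, .01, .01, .01, .8, .8, .8],
]

# Assumed time clusters: {Oct. 26, 27}, {Oct. 28}, {Oct. 29, 30, 31}.
time_labels = [0, 0, 1, 2, 2, 2]


def time_cluster_to_people(tp, labels):
    """Average the TP rows of each time cluster (one row per cluster)."""
    tcp = []
    for c in sorted(set(labels)):
        rows = [row for row, lab in zip(tp, labels) if lab == c]
        tcp.append([sum(col) / len(rows) for col in zip(*rows)])
    return tcp


TCP = time_cluster_to_people(TP, time_labels)
```

Each TCP row then reads as "how close is this person to the cluster as a whole", which is the quantity the interpretation step works with.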

Page 17:

Q3: How to interpret?

• Step 3: Find `unique' entity nodes

[Figure: matrix with rows Oct. 26 to Oct. 31 and columns Tom, Bob, Alan, Beck, Dan, Jack, Peter, Smith; highlighted values: .9, .8, .8]

e.g., "Bob is close to the green cluster on average, but far away from both the red & blue clusters"
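A simple way to make "close to one cluster, far from the others" concrete is to score each entity by its proximity to the cluster minus its largest proximity to any other cluster. This scoring rule is an assumption made for illustration, not necessarily the measure used in the talk. The TCP values below are hypothetical and only loosely follow the TP matrix shown earlier.

```python
people = ["Tom", "Bob", "Alan", "Beck", "Dan", "Jack", "Peter", "Smith"]

# Hypothetical TCP matrix: one row per time cluster, one column per person.
TCP = [
    [.70, .80, .80, .26, .01, .01, .01, .30],   # cluster 0
    [.20, .20, .20, .90, .90, .01, .01, .30],   # cluster 1 (singleton)
    [.01, .01, .01, .01, .17, .80, .80, .80],   # cluster 2
]


def representative_entities(tcp, names, cluster, top=3):
    """Rank entities by (own-cluster proximity - best other-cluster proximity)."""
    scores = []
    for j, name in enumerate(names):
        own = tcp[cluster][j]
        others = max(row[j] for i, row in enumerate(tcp) if i != cluster)
        scores.append((own - others, name))
    scores.sort(key=lambda s: s[0], reverse=True)
    return [name for _, name in scores[:top]]


reps_0 = representative_entities(TCP, people, cluster=0)
reps_1 = representative_entities(TCP, people, cluster=1, top=2)
reps_2 = representative_entities(TCP, people, cluster=2)
```

The subtraction is what keeps an entity like Smith, who is moderately close to every cluster, from being picked as representative of the first two.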

Page 18:

Summary So Far…

• Given the data sets collected at different time stamps,

• We
  – Construct a graph representation
  – Get two proximity matrices
  – Find time clusters/abnormal time stamps
  – Provide the interpretations.

Q: How to get proximity matrices?

Page 19:

How to get proximity matrices? (i.e., TT/TP matrices)

[Figure: example graph with nodes A, B, D, E, F, G, H, I, J and edge weights of 1; node callouts: "e.g., Oct. 28, 2008" and "e.g., Oct. 28, 2008, or `John Smith'"]

a.k.a. Relevance, Closeness, `Similarity'…

Page 20:

What is a ``good'' Proximity?

[Figure: example graph with nodes A, B, D, E, F, G, H, I, J and edge weights of 1]

• Multiple connections/paths

• Quality of connection
  • Direct & indirect connections
  • Length, degree, weight…

Page 21:

Random walk with restart

[Figure: example graph with 12 nodes, numbered 1 to 12]

Page 22:

Random walk with restart

[Figure: the 12-node example graph with Node 4 as the query node; each node is annotated with its score. More red, more relevant]

Ranking vector $\vec{r}_4$:

Node 1    0.13
Node 2    0.10
Node 3    0.13
Node 4    0.22
Node 5    0.13
Node 6    0.05
Node 7    0.05
Node 8    0.08
Node 9    0.04
Node 10   0.03
Node 11   0.04
Node 12   0.02

Nearby nodes, higher scores

Page 23:

Computing RWR

$\vec{r}_i = c\,W\,\vec{r}_i + (1 - c)\,\vec{e}_i$

• $\vec{r}_i$: ranking vector (n x 1)
• $W$: adjacency matrix (n x n)
• $\vec{e}_i$: starting vector (n x 1)
• $1 - c$: restart p

For the 12-node example (query node 4), with $c = 0.9$ and $1 - c = 0.1$:

$\vec{r}_4 = 0.9\,W\,\vec{r}_4 + 0.1\,\vec{e}_4$

$\vec{r}_4$ = (0.13, 0.10, 0.13, 0.22, 0.13, 0.05, 0.05, 0.08, 0.04, 0.03, 0.04, 0.02)^T

$\vec{e}_4$ = (0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0)^T

W (each column sums to 1) =

 0    1/3  1/3  1/3  0    0    0    0    0    0    0    0
 1/3  0    1/3  0    0    0    0    1/4  0    0    0    0
 1/3  1/3  0    1/3  0    0    0    0    0    0    0    0
 1/3  0    1/3  0    1/4  0    0    0    0    0    0    0
 0    0    0    1/3  0    1/2  1/2  1/4  0    0    0    0
 0    0    0    0    1/4  0    1/2  0    0    0    0    0
 0    0    0    0    1/4  1/2  0    0    0    0    0    0
 0    1/3  0    0    1/4  0    0    0    1/2  0    1/3  0
 0    0    0    0    0    0    0    1/4  0    1/3  0    0
 0    0    0    0    0    0    0    0    1/2  0    1/3  1/2
 0    0    0    0    0    0    0    1/4  0    1/3  0    1/2
 0    0    0    0    0    0    0    0    0    1/3  1/3  0

A lot of techniques exist to solve this, e.g., the iterative method
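The iterative method mentioned on the slide can be sketched as repeated application of r ← c·W·r + (1 − c)·e. In the sketch below the edge list is read off the non-zero entries of the adjacency matrix on the slide, and c = 0.9 follows the slide. The function names and the fixed iteration count are assumptions. Solving the linear system (I − cW) r = (1 − c) e directly gives the same vector and is included as a cross-check.

```python
import numpy as np

# Undirected edges of the 12-node example (1-based node ids),
# read off the non-zero entries of the matrix on the slide.
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 8), (3, 4), (4, 5), (5, 6),
         (5, 7), (5, 8), (6, 7), (8, 9), (8, 11), (9, 10), (10, 11),
         (10, 12), (11, 12)]


def column_normalized_adjacency(edges, n):
    """Build the adjacency matrix and scale every column to sum to 1."""
    A = np.zeros((n, n))
    for u, v in edges:
        A[u - 1, v - 1] = A[v - 1, u - 1] = 1.0
    return A / A.sum(axis=0, keepdims=True)


def rwr(W, start, c=0.9, n_iter=200):
    """Iterate r <- c W r + (1 - c) e, where e is the indicator of `start`."""
    e = np.zeros(W.shape[0])
    e[start] = 1.0
    r = e.copy()
    for _ in range(n_iter):
        r = c * (W @ r) + (1 - c) * e
    return r


W = column_normalized_adjacency(EDGES, 12)
r4 = rwr(W, start=3)                      # query node 4 (0-based index 3)

# Cross-check: solve (I - c W) r = (1 - c) e directly.
e4 = np.zeros(12)
e4[3] = 1.0
r4_direct = np.linalg.solve(np.eye(12) - 0.9 * W, 0.1 * e4)
```

Because W is column-normalized and the iteration starts from e, the scores keep summing to 1, so the result can be read as a ranking over all nodes relative to the query node.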

Page 24:

Roadmap

• Motivation

• T3: Single Resolution Analysis

• MT3: Multi Resolution Analysis

• Experimental Evaluations

• Conclusion

Page 25:

MT3: Multiple Resolution Analysis

• Given
  – (1) the data sets collected at different time stamps;
  – (2) different time resolutions

• Find
  – (1) Clusters for time stamps
  – (2) Abnormal time stamps
  – (3) Interpretations

• At each of the given resolutions, efficiently.

Page 26:

MT3: an example

• Given the data sets (shown as a figure on the slide)

• We want to …
  – (At the finest res.) Mine & interpret
    • `Oct 26', `Oct 27', `Oct 28', `Oct 29', `Oct 30', `Oct 31'
  – (At the coarser res.) Mine & interpret
    • `Oct 26-27', `Oct 28-29', `Oct 30-31'
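Moving from the finest to a coarser resolution amounts to grouping consecutive time stamps into bins. The sketch below builds the two-day bins of this example. The helper names and the labelling scheme are illustrative assumptions, and the sketch only covers the grouping, not the speed-up that MT3 is about.

```python
def coarsen(time_stamps, length):
    """Group consecutive time stamps into bins of `length` stamps each."""
    return [time_stamps[i:i + length]
            for i in range(0, len(time_stamps), length)]


def bin_label(stamps):
    """Label a bin by its first and last stamp, e.g. 'Oct 26-27'."""
    first, last = stamps[0], stamps[-1]
    return first if first == last else f"{first}-{last.split()[-1]}"


days = ["Oct 26", "Oct 27", "Oct 28", "Oct 29", "Oct 30", "Oct 31"]

# Coarser time nodes and the finest-resolution stamps each one covers.
coarse = {bin_label(b): b for b in coarsen(days, 2)}
```

A coarser time node would then inherit the events of the stamps it covers, which is why the graphs at the two resolutions share most of their structure.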

Page 27:

Outputs

[Figure: two panels, "At the finest resolution" and "At the coarser resolution"]

Page 28:

MT3: How to (Naïve Solution)

[Diagram: the same pipeline drawn once per resolution]

• Finest resolution: TT → Time Cluster & Anomaly; TP → Annotations/interpretations

• Coarser resolution (matrices marked ~): TT → Time Cluster & Anomaly; TP → Annotations/interpretations

Page 29:

Challenges

• Given the mining results at the finest resolution,

• How to speed up the analysis at the coarser resolutions?

Page 30:

MT3: Observation

A lot of overlap between two graphs!

[Figure: two graphs, one for the finest resolution and one for the coarser resolution]

Page 31:

MT3: Solution

[Diagram: TT and TP (finest resolution) alongside the TT and TP matrices marked ~ (coarser resolution)]

Page 32:

Roadmap

• Motivation

• T3: Single Resolution Analysis

• MT3: Multi Resolution Analysis

• Experimental Evaluations

• Conclusion

Page 33:

Data Sets

• CIKM: from CIKM proceedings
  • Time: Publication year (1993-2007, 15)
  • Event: Paper-published (952)
  • Entities: Author (1895) & Session (279)
  • Attribute: Keyword (158)

• DeviceScan: from MIT Reality Mining
  • Time: the day scanning happened (1/1/2004-5/5/2005, 294)
  • Event: Bluetooth device scanning person (114,046)
  • Entities: Device (103) & Person (97)
  • Attribute: NA

Page 34:

T3 on `CIKM' Data Set

• Rep. Authors: James P. Callan, W. Bruce Croft, James Allan, Philip S. Yu, George Karypis, Charles Clarke
  Rep. Keywords: Web, Cluster, Classification, XML, Language, Stream

• Rep. Authors: Elke Rundensteiner, Daniel Miranker, Andreas Henrich, Il-Yeol Song, Scott B Huffman, Robert J. Hall
  Rep. Keywords: Knowledge, System, Unstructured, Rule, Object-oriented, Deductive

Page 35:

MT3 on `DeviceScan' Data Set

[Figure: two panels]

• Aggregate by Month: Apr. 2004 is an anomaly

• Aggregate by Day: Work day; Semester Break & Holiday

Page 36:

Evaluation on Speed of MT3

[Figure: Log Time (Sec.) vs. Aggregation Length on the DeviceScan data set; series: MT3 and Naïve Sol.]

120x speed up

Page 37:

Conclusion

• T3: Single Resolution Analysis
  • Graph Representation
  • Using Proximity to
    • Find Time Cluster/Anomaly
    • Provide Interpretations

• MT3: Multiple Resolution Analysis
  • Redundancy among different resolutions
  • Up to 2 orders of magnitude speedup (same quality)

Page 38:

Thank you!