Fast Mining of Complex Time-Stamped Events
SCS CMU
Joint work by Hanghang Tong, Yasushi Sakurai, Tina Eliassi-Rad, Christos Faloutsos
Speaker: Hanghang Tong
CIKM 2008, Oct. 26-30, 2008, Napa, CA

A Motivating Example: Inputs
Time      Event (e.g., Session)   Entity
Oct. 26   Link Analysis           Tom, Bob
          Clustering              Bob, Alan
Oct. 27   Classification          Bob, Alan
          Anomaly Detection       Alan, Beck
Oct. 28   Party                   Beck, Dan
Oct. 29   Web Search              Dan, Jack
          Advertising             Jack, Peter
Oct. 30   Enterprise Search       Jack, Peter
Oct. 31   Q & A                   Peter, Smith

A Motivating Example: Outputs
[Figure: time stamps (Oct. 26, Oct. 27, Oct. 28, Oct. 29, Oct. 30) plotted against the 1st and 2nd eigenvectors]
• Time cluster. Rep. entities: "Tom", "Bob", "Alan"
• Abnormal time. Rep. entities: "Beck", "Dan"
• Time cluster. Rep. entities: "Jack", "Peter", "Smith"

Problem Definitions: How to Understand Time in Such a Complex Context
• Given datasets collected at different time stamps;
• Find
  – Q1: Time clusters
  – Q2: Abnormal time stamps
  – Q3: Interpretations
  – Q4: Right time granularity

Roadmap
• Motivation
• T3: Single Resolution Analysis
• MT3: Multi Resolution Analysis
• Experimental Evaluations
• Conclusion

T3: Single Resolution Analysis
• Given the data sets collected at different time stamps…
• Find
  – (1) Clusters for time stamps
  – (2) Abnormal time stamps
  – (3) Interpretations

How to represent the data sets?
(Input table as in the motivating example: Time, Event (e.g., Session), Entity; Oct. 26 to Oct. 31.)

A: Graph Representation!
[Figure: graph linking time nodes, event nodes, and entity nodes]
• Time nodes: Oct. 26, 2008; Oct. 27, 2008; Oct. 28, 2008; Oct. 29, 2008; Oct. 30, 2008; Oct. 31, 2008
• Event nodes: Link Analysis, Clustering, Classification, Anomaly Dect., Party, Web Search, Advertising, En. Search, Q & A
• Entity nodes: Tom, Bob, Alan, Beck, Dan, Jack, Peter, Smith
• Further nodes added in the second build of the slide: Prof., CEO, Stu.
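As a minimal sketch of the graph representation, the input table can be turned into an adjacency map. The edge layout (each time stamp linked to its events, each event linked to its entities) is an assumption read off the figure labels, and `RECORDS` and `build_graph` are hypothetical names used only for illustration.

```python
from collections import defaultdict

# (time stamp, event, entities) rows copied from the input table.
RECORDS = [
    ("Oct. 26", "Link Analysis", ["Tom", "Bob"]),
    ("Oct. 26", "Clustering", ["Bob", "Alan"]),
    ("Oct. 27", "Classification", ["Bob", "Alan"]),
    ("Oct. 27", "Anomaly Detection", ["Alan", "Beck"]),
    ("Oct. 28", "Party", ["Beck", "Dan"]),
    ("Oct. 29", "Web Search", ["Dan", "Jack"]),
    ("Oct. 29", "Advertising", ["Jack", "Peter"]),
    ("Oct. 30", "Enterprise Search", ["Jack", "Peter"]),
    ("Oct. 31", "Q & A", ["Peter", "Smith"]),
]


def build_graph(records):
    """Return an undirected adjacency map over time, event, and entity nodes.

    Assumed layout: time -- event and event -- entity edges only.
    """
    graph = defaultdict(set)
    for time, event, entities in records:
        t_node, ev_node = ("time", time), ("event", event)
        graph[t_node].add(ev_node)
        graph[ev_node].add(t_node)
        for name in entities:
            en_node = ("entity", name)
            graph[ev_node].add(en_node)
            graph[en_node].add(ev_node)
    return graph


graph = build_graph(RECORDS)
print(len(graph), "nodes")
print(sorted(name for _, name in graph[("entity", "Bob")]))
```

Tagging each node with its type keeps a time stamp and an entity from colliding if they ever share a label.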

Qs: Given the graph,
• How to cluster time nodes?
• How to spot abnormal time nodes?
• How to interpret?

Q1: How to cluster time nodes?
• Step 1: Time-to-Time (TT) proximity matrix
[Figure: 6 x 6 TT matrix; rows and columns are Oct. 26 to Oct. 31]

Q1: How to cluster time nodes?
• Step 2: Cluster the time nodes by the TT matrix
  – Spectral clustering algorithm (and a lot of others)
[Figure: 6 x 6 TT matrix; rows and columns are Oct. 26 to Oct. 31]
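One way to picture Step 2 is the sketch below. It embeds each time node on the top two eigenvectors of a TT matrix (computed here by plain power iteration with deflation) and then groups nearby points. The TT values are made up for illustration, not taken from the slides, and the simple "leader" grouping with a fixed radius is a stand-in for whichever clustering step the spectral algorithm would normally use.

```python
import math

TIMES = ["Oct. 26", "Oct. 27", "Oct. 28", "Oct. 29", "Oct. 30", "Oct. 31"]

# Made-up symmetric TT proximities (illustrative values only).
TT = [
    [1.0, 0.9, 0.2, 0.1, 0.1, 0.1],
    [0.9, 1.0, 0.2, 0.1, 0.1, 0.1],
    [0.2, 0.2, 1.0, 0.2, 0.2, 0.2],
    [0.1, 0.1, 0.2, 1.0, 0.9, 0.9],
    [0.1, 0.1, 0.2, 0.9, 1.0, 0.9],
    [0.1, 0.1, 0.2, 0.9, 0.9, 1.0],
]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def leading_eigenpair(m, iters=300):
    """Power iteration: dominant eigenvalue and unit eigenvector of m."""
    v = [float(i + 1) for i in range(len(m))]
    for _ in range(iters):
        w = matvec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, matvec(m, v)))
    return eigenvalue, v


def spectral_embedding(m):
    """Coordinates of every node on the top two eigenvectors of m."""
    lam1, v1 = leading_eigenpair(m)
    n = len(m)
    # Deflation removes the first eigenvector so the second one dominates.
    deflated = [[m[i][j] - lam1 * v1[i] * v1[j] for j in range(n)]
                for i in range(n)]
    _, v2 = leading_eigenpair(deflated)
    return list(zip(v1, v2))


def group_points(points, radius=0.2):
    """Assign each point to the first earlier 'leader' within radius."""
    leaders, labels = [], []
    for p in points:
        for k, q in enumerate(leaders):
            if math.hypot(p[0] - q[0], p[1] - q[1]) <= radius:
                labels.append(k)
                break
        else:
            leaders.append(p)
            labels.append(len(leaders) - 1)
    return labels


labels = group_points(spectral_embedding(TT))
print(dict(zip(TIMES, labels)))
```

The radius of 0.2 is tuned to these toy values; real data would call for a proper clustering step such as k-means on the embedded points.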

Q2: How to find abnormal time nodes?
• Abnormal time = time cluster with a single member (singleton)
[Figure: 6 x 6 TT matrix; rows and columns are Oct. 26 to Oct. 31]
Oct. 28 is abnormal!
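Given cluster labels for the time nodes, the singleton rule can be sketched in a few lines. The labels below are hypothetical (they only mirror the example, where Oct. 28 sits alone), and `abnormal_time_stamps` is an illustrative helper name.

```python
from collections import Counter

TIME_STAMPS = ["Oct. 26", "Oct. 27", "Oct. 28", "Oct. 29", "Oct. 30", "Oct. 31"]
# Hypothetical cluster labels, one per time stamp.
LABELS = [0, 0, 1, 2, 2, 2]


def abnormal_time_stamps(time_stamps, labels):
    """Return the time stamps whose cluster has exactly one member."""
    sizes = Counter(labels)
    return [t for t, k in zip(time_stamps, labels) if sizes[k] == 1]


print(abnormal_time_stamps(TIME_STAMPS, LABELS))
```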

Q3: How to interpret?
• Step 1: Time-to-People (TP) proximity matrix
          Tom   Bob   Alan  Beck  Dan   Jack  Peter Smith
Oct. 26   .9    .8    .8    .01   .01   .01   .01   .3
Oct. 27   .5    .8    .8    .5    .01   .01   .01   .3
Oct. 28   .2    .2    .2    .9    .9    .01   .01   .3
Oct. 29   .01   .01   .01   .01   .5    .8    .8    .8
Oct. 30   .01   .01   .01   .01   .01   .8    .8    .8
Oct. 31   .01   .01   .01   .01   .01   .8    .8    .8
e.g., we want to use people to interpret a time cluster/anomaly

Q3: How to interpret?
• Step 2: Time Cluster-to-People (TCP) matrix
          Tom   Bob   Alan  Beck  Dan   Jack  Peter Smith
Oct. 26   .9    .8    .8    .01   .01   .01   .01   .3
Oct. 27   .5    .8    .8    .5    .01   .01   .01   .3
Oct. 28   .2    .2    .2    .9    .9    .2    .2    .3
Oct. 29   .01   .01   .01   .01   .5    .8    .8    .8
Oct. 30   .01   .01   .01   .01   .01   .8    .8    .8
Oct. 31   .01   .01   .01   .01   .01   .8    .8    .8

Q3: How to interpret?
• Step 3: Find 'unique' entity nodes
[Figure: matrix with time stamps Oct. 26 to Oct. 31 as rows and Tom, Bob, Alan, Beck, Dan, Jack, Peter, Smith as columns; highlighted values: .9, .8, .8]
e.g., "Bob is close to the green cluster on average, but far away from both the red & blue clusters"
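Steps 2 and 3 can be sketched as follows, using the TP values from the Step 1 slide. Several choices here are assumptions rather than the authors' stated method: the cluster membership (Oct. 26-27, Oct. 28 alone, Oct. 29-31), the TCP entry as the plain average of a cluster's TP rows, and the 'uniqueness' score as a person's proximity to one cluster minus the largest proximity to any other cluster. The cluster names and function names are hypothetical.

```python
PEOPLE = ["Tom", "Bob", "Alan", "Beck", "Dan", "Jack", "Peter", "Smith"]

# TP proximities as shown on the Step 1 slide.
TP = {
    "Oct. 26": [.9, .8, .8, .01, .01, .01, .01, .3],
    "Oct. 27": [.5, .8, .8, .5, .01, .01, .01, .3],
    "Oct. 28": [.2, .2, .2, .9, .9, .01, .01, .3],
    "Oct. 29": [.01, .01, .01, .01, .5, .8, .8, .8],
    "Oct. 30": [.01, .01, .01, .01, .01, .8, .8, .8],
    "Oct. 31": [.01, .01, .01, .01, .01, .8, .8, .8],
}

# Assumed time clusters (hypothetical names).
CLUSTERS = {
    "early": ["Oct. 26", "Oct. 27"],
    "anomaly": ["Oct. 28"],
    "late": ["Oct. 29", "Oct. 30", "Oct. 31"],
}


def tcp_matrix(tp, clusters):
    """Average the TP rows of each cluster (assumed aggregation rule)."""
    return {
        name: [sum(col) / len(members)
               for col in zip(*(tp[t] for t in members))]
        for name, members in clusters.items()
    }


def unique_entities(tcp, people):
    """People close to one cluster and far from every other cluster."""
    result = {}
    for name, row in tcp.items():
        scored = []
        for j, person in enumerate(people):
            best_other = max(tcp[o][j] for o in tcp if o != name)
            margin = row[j] - best_other  # assumed uniqueness score
            if margin > 0:
                scored.append((margin, person))
        result[name] = [p for _, p in sorted(scored, reverse=True)]
    return result


representatives = unique_entities(tcp_matrix(TP, CLUSTERS), PEOPLE)
for cluster, names in representatives.items():
    print(cluster, names)
```

Under these assumptions the sketch recovers the representative entities listed on the outputs slide of the motivating example.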

Summary So Far…
• Given the data sets collected at different time stamps,
• We
  – Construct a graph representation
  – Get two proximity matrices
  – Find time clusters/abnormal time stamps
  – Provide the interpretations
Q: How to get the proximity matrices?

How to get proximity matrices? (i.e., TT/TP matrices)
[Figure: example graph with nodes A, B, D, E, F, G, H, I, J and edge weights of 1; node labels are, e.g., Oct. 28, 2008, or "John Smith"]
Proximity is a.k.a. relevance, closeness, 'similarity'…

What is a "good" proximity?
[Figure: the same example graph]
• Multiple connections/paths
• Quality of connection
  – Direct & indirect connections
  – Length, degree, weight…
• …

Random walk with restart
[Figure: example graph with 12 nodes, numbered 1 to 12]

Random walk with restart
[Figure: the 12-node graph shaded by relevance to Node 4; more red, more relevant]
Ranking vector $r_4$ (scores with respect to Node 4):
Node      Score
Node 1    0.13
Node 2    0.10
Node 3    0.13
Node 4    0.22
Node 5    0.13
Node 6    0.05
Node 7    0.05
Node 8    0.08
Node 9    0.04
Node 10   0.03
Node 11   0.04
Node 12   0.02
Nearby nodes, higher scores

Computing RWR
$r_i = c\,W\,r_i + (1 - c)\,e_i$
• $r_i$: ranking vector (n x 1)
• $W$: adjacency matrix (n x n); in the example each column sums to 1
• $e_i$: starting vector (n x 1)
• $1 - c$: restart probability
Example for Node 4 with c = 0.9: $r_4 = 0.9\,W\,r_4 + 0.1\,e_4$
r_4 = (0.13, 0.10, 0.13, 0.22, 0.13, 0.05, 0.05, 0.08, 0.04, 0.03, 0.04, 0.02)
e_4 = (0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0)
W =
  0    1/3  1/3  1/3  0    0    0    0    0    0    0    0
  1/3  0    1/3  0    0    0    0    1/4  0    0    0    0
  1/3  1/3  0    1/3  0    0    0    0    0    0    0    0
  1/3  0    1/3  0    1/4  0    0    0    0    0    0    0
  0    0    0    1/3  0    1/2  1/2  1/4  0    0    0    0
  0    0    0    0    1/4  0    1/2  0    0    0    0    0
  0    0    0    0    1/4  1/2  0    0    0    0    0    0
  0    1/3  0    0    1/4  0    0    0    1/2  0    1/3  0
  0    0    0    0    0    0    0    1/4  0    1/3  0    0
  0    0    0    0    0    0    0    0    1/2  0    1/3  1/2
  0    0    0    0    0    0    0    1/4  0    1/3  0    1/2
  0    0    0    0    0    0    0    0    0    1/3  1/3  0
A lot of techniques exist to solve this, e.g., the iterative method.
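The iterative method can be sketched as repeated application of the update r ← c W r + (1 − c) e until the scores settle. The edge list below is read off the nonzero entries of the 12 x 12 matrix on the "Computing RWR" slide. The function name, the fixed iteration count, and the dictionary representation are illustrative choices, not the authors' implementation.

```python
# Undirected edges read off the nonzero entries of W (nodes numbered 1..12).
EDGES = [
    (1, 2), (1, 3), (1, 4), (2, 3), (2, 8), (3, 4), (4, 5), (5, 6),
    (5, 7), (5, 8), (6, 7), (8, 9), (8, 11), (9, 10), (10, 11),
    (10, 12), (11, 12),
]


def random_walk_with_restart(edges, start, c=0.9, iters=300):
    """Iterate r <- c * W * r + (1 - c) * e with column-normalized W."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, []).append(v)
        neighbors.setdefault(v, []).append(u)

    # Start from the starting vector e (all mass on the start node).
    r = {node: 0.0 for node in neighbors}
    r[start] = 1.0
    for _ in range(iters):
        nxt = {node: 0.0 for node in neighbors}
        nxt[start] = 1.0 - c  # restart term (1 - c) * e
        for j, nbrs in neighbors.items():
            share = c * r[j] / len(nbrs)  # column j of W spreads r[j] evenly
            for i in nbrs:
                nxt[i] += share
        r = nxt
    return r


scores = random_walk_with_restart(EDGES, start=4)
for node in sorted(scores):
    print(f"Node {node}: {scores[node]:.2f}")
```

Each pass shrinks the remaining error by a factor of c, so with c = 0.9 a few hundred passes are more than enough for this small graph.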

Roadmap
• Motivation
• T3: Single Resolution Analysis
• MT3: Multi Resolution Analysis
• Experimental Evaluations
• Conclusion

MT3: Multiple Resolution Analysis
• Given
  – (1) the data sets collected at different time stamps;
  – (2) different time resolutions
• Find
  – (1) Clusters for time stamps
  – (2) Abnormal time stamps
  – (3) Interpretations
• At each of the given resolutions, efficiently.

MT3: an example
• Given the data sets [figure]
• We want to …
  – (At the finest resolution) mine & interpret 'Oct 26', 'Oct 27', 'Oct 28', 'Oct 29', 'Oct 30', 'Oct 31'
  – (At the coarser resolution) mine & interpret 'Oct 26-27', 'Oct 28-29', 'Oct 30-31'

Outputs
[Figure, two panels: at the finest resolution; at the coarser resolution]

MT3: How to (Naïve Solution)
[Figure: pipeline from proximity matrices to time clusters & anomalies to annotations/interpretations, shown once with TT, TP and once with $\widetilde{TT}$, $\widetilde{TP}$]

Challenges
• Given the mining results at the finest resolution,
• How to speed up the analysis at the coarser resolutions?

MT3: Observation
A lot of overlap between the two graphs!
[Figure, two panels: graph for the finest resolution; graph for the coarser resolution]
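The overlap can be illustrated on the motivating example by building the edge set at both resolutions and intersecting them. This sketch assumes the graph has time-event and event-entity edges only, and that moving to a coarser resolution just relabels the time nodes; `COARSE` and `edge_set` are hypothetical names, and the sketch shows the redundancy, not the MT3 speed-up technique itself.

```python
# (time stamp, event, entities) rows copied from the input table.
RECORDS = [
    ("Oct. 26", "Link Analysis", ["Tom", "Bob"]),
    ("Oct. 26", "Clustering", ["Bob", "Alan"]),
    ("Oct. 27", "Classification", ["Bob", "Alan"]),
    ("Oct. 27", "Anomaly Detection", ["Alan", "Beck"]),
    ("Oct. 28", "Party", ["Beck", "Dan"]),
    ("Oct. 29", "Web Search", ["Dan", "Jack"]),
    ("Oct. 29", "Advertising", ["Jack", "Peter"]),
    ("Oct. 30", "Enterprise Search", ["Jack", "Peter"]),
    ("Oct. 31", "Q & A", ["Peter", "Smith"]),
]

# Finest time stamp -> coarser two-day bucket, as in the MT3 example.
COARSE = {
    "Oct. 26": "Oct. 26-27", "Oct. 27": "Oct. 26-27",
    "Oct. 28": "Oct. 28-29", "Oct. 29": "Oct. 28-29",
    "Oct. 30": "Oct. 30-31", "Oct. 31": "Oct. 30-31",
}


def edge_set(records, time_of=lambda t: t):
    """Edges of the graph, with time stamps relabeled by time_of."""
    edges = set()
    for time, event, entities in records:
        edges.add((time_of(time), event))                  # time -- event
        edges.update((event, name) for name in entities)  # event -- entity
    return edges


fine = edge_set(RECORDS)
coarse = edge_set(RECORDS, COARSE.get)
shared = fine & coarse
print(len(fine), len(coarse), len(shared))
```

Only the edges touching time nodes differ between the two graphs; every event-entity edge is shared.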

MT3: Solution
[Figure: TT, TP and $\widetilde{TT}$, $\widetilde{TP}$]

Roadmap
• Motivation
• T3: Single Resolution Analysis
• MT3: Multi Resolution Analysis
• Experimental Evaluations
• Conclusion

Data Sets
• CIKM: from CIKM proceedings
  – Time: publication year (1993-2007, 15 years)
  – Event: paper published (952)
  – Entities: author (1895) & session (279)
  – Attribute: keyword (158)
• DeviceScan: from MIT Reality Mining
  – Time: the day the scanning happened (1/1/2004-5/5/2005, 294 days)
  – Event: Bluetooth device scanning a person (114,046)
  – Entities: device (103) & person (97)
  – Attribute: NA

T3 on 'CIKM' Data Set
[Figure, two panels, each listing representative authors and keywords]
• First panel
  – Rep. authors: James P. Callan, W. Bruce Croft, James Allan, Philip S. Yu, George Karypis, Charles Clarke
  – Rep. keywords: Web, Cluster, Classification, XML, Language, Stream
• Second panel
  – Rep. authors: Elke Rundensteiner, Daniel Miranker, Andreas Henrich, Il-Yeol Song, Scott B Huffman, Robert J. Hall
  – Rep. keywords: Knowledge, System, Unstructured, Rule, Object-oriented, Deductive

MT3 on 'DeviceScan' Data Set
[Figure, two panels]
• Aggregate by month: Apr. 2004 is an anomaly
• Aggregate by day: labeled groups are work day and semester break & holiday

Evaluation on Speed of MT3
[Figure: log time (sec.) vs. aggregation length on the DeviceScan data set, MT3 vs. naïve solution]
120x speed-up

Conclusion
• T3: Single Resolution Analysis
  – Graph representation
  – Using proximity to find time clusters/anomalies and provide interpretations
• MT3: Multiple Resolution Analysis
  – Redundancy among different resolutions
  – Up to 2 orders of magnitude speed-up (same quality)