Abdullah Mueen and Eamonn Keogh, University of California, Riverside
Online Discovery and Maintenance of Time Series Motifs
Time Series Motifs
• Repeated pattern in a time series
• Utility:
– Activity/Event discovery
– Summarization
– Classification
• Application:
– Data Center Chiller Management (Patnaik et al., KDD 09)
– Designing Smart Cane for Elders (Vahdatpour et al., IJCAI 09)
Online Time Series Motifs
• Streaming time series
• Sliding window of the recent history
– What minute-long trace repeated in the last hour?
Problem Formulation
Discovery
[Figure: a time series observed up to time 1000, with its subsequences numbered 1 to 873]
The most similar pair of non-overlapping subsequences
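The discovery step can be sketched as a brute-force search. This is a minimal illustration, not the authors' code: the function names, the plain Euclidean distance, and the toy series are assumptions made here.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def find_motif(series, m):
    """Return (distance, i, j) for the closest pair of
    non-overlapping subsequences of length m (hypothetical helper)."""
    n = len(series) - m + 1          # number of subsequences
    best = (math.inf, None, None)
    for i in range(n):
        for j in range(i + m, n):    # j >= i + m, so the pair cannot overlap
            d = euclidean(series[i:i + m], series[j:j + m])
            if d < best[0]:
                best = (d, i, j)
    return best

# Toy series in which the shape 0, 1, 5, 1 occurs twice.
series = [0, 1, 5, 1, 0, 9, 9, 0, 1, 5, 1, 0]
print(find_motif(series, 4))
```

The double loop compares every pair of subsequences, so it is quadratic in the window size.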
Maintenance
[Figure: the window at times 1001 to 1005, with subsequences 874 to 878 arriving one per time tick]
Update motif pair after every time tick
Challenge
• A subsequence is a high dimensional point
– The dynamic closest pair of points problem
• Closest pair may change upon every update
• Naïve approach: Do quadratic comparisons.
[Figure: example point set before and after a deletion and an insertion]
Our Approach
• Goal: Algorithm with linear update time
• Previous method for dynamic closest pair (Eppstein, 00)
– A matrix of all-pair distances is maintained; O(w²) space required
– Quad-tree is used to update the matrix
• Maintain a set of neighbors and reverse neighbors for all points
• We do it in O(w√w) space
Outline
• Methods
• Evaluation
• Extensions
• Case Studies
• Conclusion
Maintaining Motif
• Smallest nearest neighbor → Closest pair
• Upon insertion
– Find the nearest neighbor; needs O(w) comparisons.
• Upon deletion
– Find the next NN of all the reverse NN
[Figure: example points 1 to 9, illustrating an insertion and a deletion]
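The insertion and deletion rules can be sketched with one nearest neighbor and one reverse-neighbor set per point. This is a simplified illustration under assumptions made here: the class and method names are hypothetical, points are plain vectors, and the non-overlap constraint between subsequences is ignored. Deletion rescans the window for each reverse neighbor instead of keeping ordered neighbor lists.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class NNWindow:
    """Hypothetical sketch: closest pair via per-point nearest neighbors."""

    def __init__(self):
        self.points = {}  # id -> vector
        self.nn = {}      # id -> (distance, id of nearest neighbor)
        self.rnn = {}     # id -> ids whose nearest neighbor is this id

    def _nearest(self, p):
        """Scan the window for the nearest neighbor of p: O(w) comparisons."""
        best = (math.inf, None)
        for q, vec in self.points.items():
            if q != p:
                d = dist(self.points[p], vec)
                if d < best[0]:
                    best = (d, q)
        return best

    def _link(self, p, d, q):
        """Record q as the nearest neighbor of p and fix the reverse sets."""
        old = self.nn.get(p, (math.inf, None))[1]
        if old is not None:
            self.rnn[old].discard(p)
        self.nn[p] = (d, q)
        if q is not None:
            self.rnn[q].add(p)

    def insert(self, p, vec):
        self.points[p] = vec
        self.rnn[p] = set()
        self._link(p, *self._nearest(p))
        # The new point may also be closer to an old point than its current NN.
        for q, other in self.points.items():
            if q != p:
                d = dist(vec, other)
                if d < self.nn[q][0]:
                    self._link(q, d, p)

    def delete(self, p):
        del self.points[p]
        orphans = self.rnn.pop(p)       # points that lose their NN
        _, q = self.nn.pop(p)
        if q is not None:
            self.rnn[q].discard(p)
        for r in orphans:               # find the next NN of each reverse NN
            d, q = self._nearest(r)
            self.nn[r] = (d, q)
            if q is not None:
                self.rnn[q].add(r)

    def closest_pair(self):
        """Smallest nearest-neighbor distance gives the closest pair."""
        p = min(self.nn, key=lambda k: self.nn[k][0])
        d, q = self.nn[p]
        return d, p, q

w = NNWindow()
for pid, value in [(1, 0.0), (2, 10.0), (3, 1.0), (4, 10.5)]:
    w.insert(pid, [value])
print(w.closest_pair())
w.delete(2)
print(w.closest_pair())
```

In this sketch the update of old points during insertion is done eagerly, which costs a second pass over the window.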
Data Structure
• FIFO sliding window: points 1 to 8
• N-lists: neighbors in order of distances
• R-lists: points that should be updated
• Each list node stores (ptr, distance)
• Total number of nodes in R-lists is w
• Still O(w²) space
[Figure: the FIFO window of points 1 to 8 with the N-list and R-list of each point]
Observations
While inserting
• Updating NN of old points is not necessary
• A point can be removed from the neighbor list if it violates the temporal order
– Average length O(√w)
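One reading of the second observation: in the N-list of a new point, an older candidate can be dropped when a closer candidate arrived after it, because the older one leaves the FIFO first and so can never become the nearest neighbor. The sketch below builds such a pruned list. The function name, the (timestamp, vector) layout, and the restriction of the list to points that arrived earlier are assumptions made for illustration.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_n_list(new_vec, window):
    """window: list of (timestamp, vector) for points already in the FIFO.
    Returns [(distance, timestamp), ...] sorted by distance, keeping only
    entries whose timestamps increase along the list."""
    candidates = sorted((dist(new_vec, vec), t) for t, vec in window)
    n_list, latest = [], -math.inf
    for d, t in candidates:
        if t > latest:   # outlives every closer candidate, so keep it
            n_list.append((d, t))
            latest = t
    return n_list

window = [(1, [5.0]), (2, [1.0]), (3, [9.0]), (4, [2.5])]
print(build_n_list([0.0], window))
```

After pruning, the head of the list is the current nearest neighbor, and the oldest point in the window can only appear at the head of a list, so a deletion only pops list heads.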
Example
[Figure: the points and their N-lists as the window slides from points 1 to 8, to points 2 to 9, to points 3 to 10]
Evaluation
[Figure: average update time (sec) vs. window size (w) for Fast Pair, Insect, EEG, EOG, RW]
[Figure: average update time (sec) vs. motif length (m) for Fast Pair, Insect, EEG, EOG, RW]
[Figure: average length of N-list vs. window size (w) for Insect, EEG, EOG, RW]
[Figure: average length of N-list vs. motif length (m) for Insect, EEG, EOG, RW]
• Up to 8x speedup from general dynamic closest pair
• Stable space cost per point with increasing window size
Extension
• Multidimensional Motif
– Maintain motif in multiple synchronous time series
– Points in one series can have neighbors in others
[Figure: one FIFO and set of N-lists for each stream, each over points 1 to 8]
• Arbitrary Data Rate
– Load shedding: skip points when the data arrives too fast to handle
[Figure: fraction of motifs discovered vs. fraction of data used for EOG, Random Walk, EEG, Insect]
Case Study 1: Online Compression
• Replace all the occurrences of a motif by a pointer to that motif.
• For signals with regular repetitions
– Higher compression rate with less error.
[Figure: a signal of length 2000 shown as Raw, PLA, and MR, with a dictionary of motifs A and B]
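The pointer-replacement idea can be sketched as follows. This is an illustration only: the token format, the distance threshold, and the greedy left-to-right scan are assumptions, and the slides do not say how the motif dictionary is encoded.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def compress(series, dictionary, threshold):
    """Replace each window within `threshold` of a dictionary motif by
    ("ref", k); keep every other sample as ("raw", value)."""
    out, i = [], 0
    while i < len(series):
        for k, motif in enumerate(dictionary):
            window = series[i:i + len(motif)]
            if len(window) == len(motif) and dist(window, motif) <= threshold:
                out.append(("ref", k))
                i += len(motif)
                break
        else:                       # no motif matched at position i
            out.append(("raw", series[i]))
            i += 1
    return out

def decompress(tokens, dictionary):
    """Expand pointers back into the motifs they refer to."""
    out = []
    for kind, value in tokens:
        out.extend(dictionary[value] if kind == "ref" else [value])
    return out

series = [0, 1, 5, 1, 0, 9, 9, 0, 1, 5, 1, 0]
dictionary = [[0, 1, 5, 1]]
tokens = compress(series, dictionary, threshold=0.5)
print(tokens)
print(decompress(tokens, dictionary))
```

With a threshold above zero the scheme is lossy: a matched window is reconstructed as the motif, not as the original samples.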
Case Study 2: Closing The Loop
• Check if a robot closed a loop
• Convert the stream of video frames to stream of color histograms.
• Most similar color histograms are good candidates for loop detection.
[Figure: two panels marking Start and Loop Closure Detected]
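The frame-to-histogram step can be sketched as below. The frame format (a list of RGB tuples), the number of bins, the minimum frame gap, and the brute-force pair search are all assumptions made for illustration, not details from the slides.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def color_histogram(pixels, bins=4):
    """pixels: list of (r, g, b) with channels in 0..255.
    Returns a normalized histogram with bins**3 cells."""
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        cell = ((r * bins // 256) * bins + (g * bins // 256)) * bins + (b * bins // 256)
        hist[cell] += 1
    total = len(pixels)
    return [h / total for h in hist] if total else hist

def best_loop_candidate(frames, min_gap):
    """Return (distance, i, j) for the most similar pair of frames
    that are at least min_gap frames apart."""
    hists = [color_histogram(f) for f in frames]
    best = (math.inf, None, None)
    for i in range(len(hists)):
        for j in range(i + min_gap, len(hists)):
            d = dist(hists[i], hists[j])
            if d < best[0]:
                best = (d, i, j)
    return best

red, green, blue = [(255, 0, 0)] * 4, [(0, 255, 0)] * 4, [(0, 0, 255)] * 4
frames = [red, green, blue, red]   # the last frame revisits the first scene
print(best_loop_candidate(frames, min_gap=2))
```

In the online setting the histograms would be the points fed to the sliding window, in place of the brute-force search used here.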
Conclusion
• First attempt to maintain time series motif online
– Maintains minutes long repetition in hours long sliding window
• Linear update time with less space cost than quad-tree based method
– O(w√w) vs. O(w²)
• Faster than general dynamic closest pair solution
Thank You
Code and Data: http://www.cs.ucr.edu/~mueen/OnlineMotif