A Multiresolution Symbolic Representation of Time Series
Vasileios Megalooikonomou (1), Qiang Wang (1), Guo Li (1), Christos Faloutsos (2)
(1) Temple University, Philadelphia, USA
(2) Carnegie Mellon University, Pittsburgh, USA
Outline
Background
Methodology
Experimental results
Conclusion
Introduction
Time Sequence: A sequence (ordered collection) of real values: X = x1, x2, …, xn
Challenges:
• High dimensionality
• Large volume of data
• Similarity metric definition
……
Introduction
Goal: To achieve
• High efficiency
• High accuracy
in similarity searches among time series and in discovering interesting patterns
Introduction
Similarity metric for time series
• Euclidean Distance: most common, sensitive to shifts
• Dynamic Time Warping (DTW): improves accuracy, but is time consuming, O(n^2)
• Envelope-based DTW: improves time complexity, O(n)
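To make the contrast between Euclidean distance and DTW concrete, here is a minimal sketch in plain Python. The function names and the squared point cost are choices made for this illustration, not details taken from the slides; the DTW shown is the textbook quadratic-time dynamic program, not the envelope-based variant.

```python
import math

def euclidean(a, b):
    """Point-by-point Euclidean distance; both series must have equal length."""
    if len(a) != len(b):
        raise ValueError("series must have equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw(a, b):
    """Textbook DTW with squared point cost; O(len(a) * len(b)) time and space."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest warping cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])

a = [0, 0, 1, 2, 1, 0, 0, 0]
b = [0, 0, 0, 1, 2, 1, 0, 0]  # the same bump, shifted by one time step
print(euclidean(a, b))  # penalizes the shift
print(dtw(a, b))        # warps the time axis to absorb the shift
```

On this pair the one-step shift gives a non-zero Euclidean distance, while DTW can align the two bumps exactly.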
Introduction
Similarity metric for time series
A more intuitive idea:
two series should be considered similar if they have enough non-overlapping time-ordered pairs of subsequences that are similar (Agrawal et al. VLDB, 1995)
Introduction
Dimensionality reduction techniques:
• DFT: Discrete Fourier Transform
• DWT: Discrete Wavelet Transform
• SVD: Singular Value Decomposition
• APCA: Adaptive Piecewise Constant Approximation
• PAA: Piecewise Aggregate Approximation
• SAX: Symbolic Aggregate approXimation
• …
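As one concrete instance of the techniques listed, PAA can be sketched in a few lines: the series is cut into equal-width frames and each frame is replaced by its mean. The function name and the requirement that the length divide evenly are simplifying assumptions for this illustration.

```python
def paa(series, segments):
    """Piecewise Aggregate Approximation: mean of each equal-width frame.

    Assumes len(series) is a multiple of `segments` (a simplification).
    """
    n = len(series)
    if segments <= 0 or n % segments != 0:
        raise ValueError("len(series) must be a positive multiple of segments")
    w = n // segments  # frame width
    return [sum(series[k * w:(k + 1) * w]) / w for k in range(segments)]

# Eight points reduced to four values.
print(paa([1, 3, 5, 7, 2, 4, 6, 8], 4))
```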
Introduction
Suggested Solution: Multiresolution Vector Quantized (MVQ) approximation
1) Uses a ‘vocabulary’ of subsequences
2) Takes multiple resolutions into account
3) Unlike wavelets, partially ignores the ordering of ‘codewords’
4) Exploits prior knowledge about the data
5) Provides a new distance metric
Methodology
A new framework (four steps):
• Create a ‘vocabulary’ of subsequences (codebook)
• Represent time series using codewords
• Utilize multiple resolutions
• Employ a new distance metric
Methodology
[Figure: MVQ pipeline with codebook generation (s = 16), series transformation, and series encoding; the example output shows series as strings of codeword labels and as vectors of codeword counts]
Methodology
Creating a ‘vocabulary’
Frequently appearing patterns in subsequences
Q: How to create?
A: Use Vector Quantization, in particular, the Generalized Lloyd Algorithm (GLA)
Produces a codebook based on two conditions:
• Nearest Neighbor Condition (NNC)
• Centroid Condition (CC)
Output:
A codebook with s codewords
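The two conditions can be sketched as an alternating loop. This is a minimal illustration in plain Python, not the authors' implementation: the random initialization, the fixed iteration count, the squared Euclidean distortion, and the use of non-overlapping training subsequences are all assumptions made here, and the names `gla`, `sq_dist`, and `subsequences` are hypothetical.

```python
import random

def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))

def subsequences(series, l):
    """Non-overlapping subsequences of length l (trailing remainder dropped)."""
    return [series[i:i + l] for i in range(0, len(series) - l + 1, l)]

def gla(vectors, s, iterations=20, seed=0):
    """Generalized Lloyd Algorithm sketch: returns a codebook of s codewords."""
    rng = random.Random(seed)
    # Assumption: start from s distinct training vectors chosen at random.
    codebook = [list(v) for v in rng.sample(vectors, s)]
    for _ in range(iterations):
        # Nearest Neighbor Condition: assign each vector to its closest codeword.
        cells = [[] for _ in range(s)]
        for v in vectors:
            k = min(range(s), key=lambda j: sq_dist(v, codebook[j]))
            cells[k].append(v)
        # Centroid Condition: move each codeword to the mean of its cell.
        for k, cell in enumerate(cells):
            if cell:  # an empty cell keeps its previous codeword
                codebook[k] = [sum(col) / len(cell) for col in zip(*cell)]
    return codebook

training = [[0, 0], [0, 1], [10, 10], [10, 11]]
print(sorted(gla(training, s=2)))
```

In practice the training vectors would come from `subsequences(series, l)` applied to the series in the dataset, with `l` the codeword length at a given resolution level.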
Methodology
Representing time series
X = x1, x2, …, xn is encoded with a new representation f = (f1, f2, …, fs)
(fi is the frequency of the i-th codeword in X)
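A small sketch of this encoding step, under assumptions not stated on the slide: the series is cut into non-overlapping windows of the codeword length, and each window is mapped to its nearest codeword by squared Euclidean distance. The function name `encode` is hypothetical.

```python
def encode(series, codebook):
    """Return f = (f1, ..., fs): how often each codeword is the nearest match.

    Assumes non-overlapping windows and squared Euclidean distance.
    """
    l = len(codebook[0])         # codeword length
    freq = [0] * len(codebook)   # one counter per codeword
    for start in range(0, len(series) - l + 1, l):
        window = series[start:start + l]
        nearest = min(
            range(len(codebook)),
            key=lambda j: sum((x - y) ** 2 for x, y in zip(window, codebook[j])),
        )
        freq[nearest] += 1
    return freq

codebook = [[0, 0], [1, 1], [0, 1]]
print(encode([0, 0, 1, 1, 0, 1, 1, 1], codebook))
```

The result has one entry per codeword regardless of the series length, which is where the dimensionality reduction comes from.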
Methodology
New distance metric:
The histogram model is used to calculate similarity at each resolution level:
$$S_{HM}(q,t) = \frac{1}{1 + dis(q,t)}$$
with
$$dis(q,t) = \sum_{i=1}^{s} \frac{|f_{i,t} - f_{i,q}|}{1 + f_{i,t} + f_{i,q}}$$
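A direct transcription of the histogram model into Python, as a sketch. It assumes the reading S_HM(q,t) = 1 / (1 + dis(q,t)), with dis(q,t) the sum over codewords of |f_{i,t} - f_{i,q}| / (1 + f_{i,t} + f_{i,q}); the absolute value and the signs are part of that assumed reading, and the function names are hypothetical.

```python
def dis(f_q, f_t):
    """Histogram distance between two codeword-frequency vectors."""
    if len(f_q) != len(f_t):
        raise ValueError("histograms must use the same codebook size")
    return sum(abs(t - q) / (1 + t + q) for q, t in zip(f_q, f_t))

def s_hm(f_q, f_t):
    """Histogram-model similarity in (0, 1]; equals 1 for identical histograms."""
    return 1.0 / (1.0 + dis(f_q, f_t))

print(s_hm([1, 2, 1], [1, 2, 1]))  # identical histograms
print(s_hm([2, 0], [0, 2]))        # histograms with no codeword in common
```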
Methodology
Time series summarization:
• High-level information (frequently appearing patterns) is more useful
• The new representation can provide this kind of information
[Figure: example series in which codewords (patterns) 3 and 5 both show up 2 times]
Methodology
Problems of frequency-based encoding:
• It cannot record the location of a subsequence
• It is hard to define an appropriate resolution (codeword length)
• It may lose global information
Methodology
Utilizing multiple resolutions:
Solution: encode with multiple resolutions
The resolution levels are complementary to each other
[Figure: reconstruction of a time series using different resolutions]
Methodology
New distance metric:
For all resolution levels, a weighted similarity metric is defined as:
$$S_{HHM}(q, d_j) = \sum_{i=1}^{c} w_i \cdot S_{HM,i}(q, d_j)$$
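The weighted combination across levels can be sketched as follows. It assumes each series has already been encoded into one frequency histogram per resolution level, and that the per-level similarity is the histogram model 1 / (1 + dis); the function names are hypothetical, and the weights are used as given, without normalization.

```python
def s_hm(f_q, f_t):
    """Per-level histogram-model similarity, 1 / (1 + dis)."""
    d = sum(abs(t - q) / (1 + t + q) for q, t in zip(f_q, f_t))
    return 1.0 / (1.0 + d)

def s_hhm(hists_q, hists_t, weights):
    """Weighted sum of per-level similarities over all c resolution levels."""
    if not (len(hists_q) == len(hists_t) == len(weights)):
        raise ValueError("need one histogram and one weight per level")
    return sum(w * s_hm(fq, ft) for w, fq, ft in zip(weights, hists_q, hists_t))

# Two levels: the series agree at level 1 and disagree at level 2.
q = [[1, 0], [2, 0]]
t = [[1, 0], [0, 2]]
print(s_hhm(q, t, weights=[1, 1]))  # both levels contribute
print(s_hhm(q, t, weights=[1, 0]))  # a single-level weight vector
```

A weight vector with a single 1 reduces the metric to single-level VQ at that level.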
Methodology
Parameters of MVQ
| Symbol | Meaning |
|---|---|
| X | Original time series, X = x1, x2, …, xn of length n |
| X′ | Encoded form of the original time series, X′ = f′1, f′2, …, f′s |
| N | Number of time series in the dataset |
| n | Length of original time series |
| C | Codebook: a set of codewords {c1, …, ck, …, cs} |
| c | Number of resolution levels |
| s | Size of codebook |
| l | Length of codeword |
Methodology
Parameters of MVQ
• Number of resolution levels: c = log2(n / l_min) + 1, where l_min is the minimal codeword length
• Length of codeword (on the i-th level): l = n / 2^(i-1)
• Size of codebook: data dependent. However, in practice, small codebooks can achieve very good results
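A worked example of the two formulas, as a sketch. It assumes a base-2 logarithm (consistent with the codeword length halving from one level to the next) and that n and l_min are powers of two; the function name is hypothetical.

```python
import math

def codeword_lengths(n, l_min):
    """Codeword length l = n / 2**(i - 1) for each level i = 1..c."""
    c = int(math.log2(n / l_min)) + 1  # number of resolution levels
    return [n // 2 ** (i - 1) for i in range(1, c + 1)]

# A series of length 64 with minimal codeword length 4 gives c = 5 levels.
print(codeword_lengths(64, 4))
```

The first level uses a single codeword spanning the whole series, and the last level uses codewords of length l_min.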
Experiments
Datasets
• SYNDATA (control chart data): synthetic
• CAMMOUSE: 3 * 5 sequences obtained using the Camera Mouse Program
• RTT: RTT measurements from UCR to CMU with a sending rate of 50 msec for a day
Experiments
Best Match Searching: For a given query, time series within the same class as the query (given our prior knowledge) form the standard set (std_set(q)), and the results found by different approaches (knn(q)) are compared to this set
The matching accuracy is defined as:
$$\text{Accuracy} = \frac{|std\_set(q) \cap knn(q)|}{k} \times 100\%$$
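The accuracy measure is the share of the k returned neighbors that belong to the query's class, i.e. |std_set(q) ∩ knn(q)| / k as a percentage. A minimal sketch, with series represented by hypothetical integer ids and k taken as the number of returned neighbors:

```python
def matching_accuracy(std_set, knn):
    """Percentage of the k retrieved series that fall in the standard set."""
    k = len(knn)
    if k == 0:
        raise ValueError("knn must contain at least one result")
    return 100.0 * len(set(std_set) & set(knn)) / k

# Series 1-5 share the query's class; 3 of the 4 returned neighbors are among them.
print(matching_accuracy({1, 2, 3, 4, 5}, [1, 2, 9, 4]))
```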
Experiments
Best Match Searching
SYNDATA
| Method | Weight Vector | Accuracy |
|---|---|---|
| Single-level VQ | [1 0 0 0 0] | 0.55 |
| Single-level VQ | [0 1 0 0 0] | 0.70 |
| Single-level VQ | [0 0 1 0 0] | 0.65 |
| Single-level VQ | [0 0 0 1 0] | 0.48 |
| Single-level VQ | [0 0 0 0 1] | 0.46 |
| MVQ | [1 1 1 1 1] | 0.83 |
| Euclidean | | 0.51 |
CAMMOUSE
| Method | Weight Vector | Accuracy |
|---|---|---|
| Single-level VQ | [1 0 0 0 0] | 0.56 |
| Single-level VQ | [0 1 0 0 0] | 0.60 |
| Single-level VQ | [0 0 1 0 0] | 0.44 |
| Single-level VQ | [0 0 0 1 0] | 0.56 |
| Single-level VQ | [0 0 0 0 1] | 0.60 |
| MVQ | [1 1 1 1 1] | 0.83 |
| Euclidean | | 0.58 |
Experiments
Best Match Searching
[Figure: precision-recall for different methods (a) on the SYNDATA dataset, (b) on the CAMMOUSE dataset]
Experiments
Clustering experiments
Given two clusterings, G = G1, G2, …, Gk (the true clusters) and A = A1, A2, …, Ak (the clustering result of a certain method), the clustering accuracy is evaluated with the cluster similarity defined as:
$$Sim(G, A) = \frac{\sum_{i} \max_{j} Sim(G_i, A_j)}{k}$$
with
$$Sim(G_i, A_j) = \frac{2\,|G_i \cap A_j|}{|G_i| + |A_j|}$$
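The cluster similarity can be sketched with Python sets. This assumes the reading Sim(G, A) = (sum over i of max over j of Sim(G_i, A_j)) / k with Sim(G_i, A_j) = 2 |G_i ∩ A_j| / (|G_i| + |A_j|), and takes k as the number of true clusters; the function names and the integer ids are hypothetical.

```python
def pair_sim(g, a):
    """Overlap between one true cluster and one found cluster, in [0, 1]."""
    return 2.0 * len(g & a) / (len(g) + len(a))

def cluster_sim(G, A):
    """Average, over true clusters, of the best-matching found cluster."""
    return sum(max(pair_sim(g, a) for a in A) for g in G) / len(G)

G = [{1, 2, 3}, {4, 5, 6}]   # true clusters
A = [{1, 2}, {3, 4, 5, 6}]   # clustering result with item 3 misplaced
print(cluster_sim(G, A))
print(cluster_sim(G, G))     # a perfect clustering scores 1.0
```

Note that the measure is not symmetric: swapping G and A generally changes the value.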
Experiments
Clustering experiments
SYNDATA
| Method | Weight Vector | Accuracy |
|---|---|---|
| Single-level VQ | [1 0 0 0 0] | 0.69 |
| Single-level VQ | [0 1 0 0 0] | 0.71 |
| Single-level VQ | [0 0 1 0 0] | 0.63 |
| Single-level VQ | [0 0 0 1 0] | 0.51 |
| Single-level VQ | [0 0 0 0 1] | 0.49 |
| MVQ | [1 1 1 1 1] | 0.82 |
| DFT | | 0.67 |
| SAX | | 0.65 |
| DTW | | 0.80 |
| Euclidean | | 0.55 |
RTT
| Method | Weight Vector | Accuracy |
|---|---|---|
| Single-level VQ | [1 0 0 0 0] | 0.55 |
| Single-level VQ | [0 1 0 0 0] | 0.52 |
| Single-level VQ | [0 0 1 0 0] | 0.57 |
| Single-level VQ | [0 0 0 1 0] | 0.80 |
| Single-level VQ | [0 0 0 0 1] | 0.79 |
| MVQ | [0 0 0 1 1] | 0.81 |
| DFT | | 0.54 |
| SAX | | 0.54 |
| DTW | | 0.62 |
| Euclidean | | 0.50 |
Experiments
Summarization (SYNDATA)
[Figure: typical series]
Experiments
[Figure: two panels, First Level and Second Level]
Conclusion
• A new symbolic representation of time series
• A more meaningful similarity metric
• Improved efficiency due to the dimensionality reduction
• Nice summarization of time series
• Utilizes multiple resolutions
• Uses prior knowledge (training process)