Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1...
Transcript of Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1...
![Page 1: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/1.jpg)
CMU SCS
Indexing and Mining Time
Sequences
Christos Faloutsos and Lei Li
CMU
![Page 2: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/2.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 2
Outline
• Motivation
• Similarity Search and Indexing
• DSP (Digital Signal Processing)
• Linear Forecasting
• Kalman filters
• fractals and multifractals
• Non-linear forecasting
• Conclusions
![Page 3: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/3.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 3
Problem definition
• Given: one or more sequences
x1 , x2 , … , xt , …
(y1, y2, … , yt, …
… )
• Find
– similar sequences; forecasts
– patterns; clusters; outliers
![Page 4: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/4.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 4
Motivation - Applications
• Financial, sales, economic series
• Medical
– reactions to new drugs
– elderly care
![Page 5: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/5.jpg)
CMU SCS
ECG - physionet.org
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 5
![Page 6: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/6.jpg)
CMU SCS
EEG - epilepsy
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 6from wikipedia
![Page 7: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/7.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 7
Motivation - Applications
(cont‟d)
• „Smart house‟
– sensors monitor temperature, humidity,
air quality
• video surveillance
![Page 8: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/8.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 8
Motivation - Applications
(cont‟d)
• civil/automobile infrastructure
– bridge vibrations [Oppenheim+02]
– road conditions / traffic monitoring
![Page 9: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/9.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 9
Motivation - Applications
(cont‟d)
• Weather, environment/anti-pollution
– volcano monitoring
– air/water pollutant monitoring
![Page 10: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/10.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 10
Motivation - Applications
(cont‟d)
• Computer systems
– „Active Disks‟ (buffering, prefetching)
– web servers (ditto)
– network traffic monitoring
– ...
![Page 11: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/11.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 11
Stream Data: Disk accesses
time
#bytes
![Page 12: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/12.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 12
Problem #1:
Goal: given a signal (eg., #packets over time)
Find: patterns, periodicities, and/or compress
year
count lynx caught per year
(packets per day;
temperature per day)
![Page 13: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/13.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 13
Problem#2: Forecast
Given xt, xt-1, …, forecast xt+1
0
10
20
30
40
50
60
70
80
90
1 3 5 7 9 11
Time Tick
Nu
mb
er o
f p
ack
ets
sen
t
??
![Page 14: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/14.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 14
Problem#2‟: Similarity search
Eg., Find a 3-tick pattern, similar to the last one
0
10
20
30
40
50
60
70
80
90
1 3 5 7 9 11
Time Tick
Nu
mb
er o
f p
ack
ets
sen
t
??
![Page 15: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/15.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 15
Problem #3:
• Given: A set of correlated time sequences
• Forecast „Sent(t)‟
0
10
20
30
40
50
60
70
80
90
1 3 5 7 9 11
Time Tick
Nu
mb
er o
f p
ack
ets
sent
lost
repeated
![Page 16: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/16.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 16
Important observations
Patterns, rules, forecasting and similarity indexing are closely related:
• To do forecasting, we need
– to find patterns/rules
– to find similar settings in the past
• to find outliers, we need to have forecasts
– (outlier = too far away from our forecast)
![Page 17: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/17.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 17
Important topics NOT in this
tutorial:
• Continuous queries
– [Babu+Widom ] [Gehrke+] [Madden+]
• Categorical data streams
– [Hatonen+96]
• Outlier detection (discontinuities)
– [Breunig+00]
![Page 18: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/18.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 18
Outline
• Motivation
• Similarity Search and Indexing
• DSP
• Linear Forecasting
• Bursty traffic - fractals and multifractals
• Non-linear forecasting
• Conclusions
![Page 19: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/19.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 19
Outline
• Motivation
• Similarity Search and Indexing
– distance functions: Euclidean;Time-warping
– indexing
– feature extraction
• DSP
• ...
![Page 20: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/20.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 20
Importance of distance functions
Subtle, but absolutely necessary:
• A „must‟ for similarity indexing (->
forecasting)
• A „must‟ for clustering
Two major families
– Euclidean and Lp norms
– Time warping and variations
![Page 21: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/21.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 21
Euclidean and Lp
n
i
ii yxyxD1
2)(),(x(t) y(t)
...
n
i
p
iip yxyxL1
||),(
•L1: city-block = Manhattan
•L2 = Euclidean
•L
![Page 22: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/22.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 22
Observation #1
• Time sequence -> n-d
vector
...
Day-1
Day-2
Day-n
![Page 23: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/23.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 23
Observation #2
Euclidean distance is
closely related to
– cosine similarity
– dot product
– „cross-correlation‟
function
...
Day-1
Day-2
Day-n
![Page 24: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/24.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 24
Time Warping
• allow accelerations - decelerations
– (with or w/o penalty)
• THEN compute the (Euclidean) distance (+
penalty)
• related to the string-editing distance
![Page 25: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/25.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 25
Time Warping
„stutters‟:
![Page 26: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/26.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 26
Time Warping
Q: how to compute it?
A: dynamic programming
D( i, j ) = cost to match
prefix of length i of first sequence x with prefix of length j of second sequence y
Skip
![Page 27: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/27.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 27
Thus, with no penalty for stutter, for sequences
x1, x2, …, xi,; y1, y2, …, yj
),1(
)1,(
)1,1(
min][][),(
jiD
jiD
jiD
jyixjiD x-stutter
y-stutter
no stutter
Time WarpingSkip
![Page 28: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/28.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 34
Other Distance functions
• piece-wise linear/flat approx.; compare
pieces [Keogh+01] [Faloutsos+97]
• „cepstrum‟ (for voice [Rabiner+Juang])
– do DFT; take log of amplitude; do DFT again!
• Allow for small gaps [Agrawal+95]
![Page 29: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/29.jpg)
CMU SCS
More distance functions.
• Chen + Ng [vldb‟04]: ERP „Edit distance
with Real Penalty‟: give a penalty to stutters
• Keogh+ [kdd‟04]: VERY NICE, based on
information theory: compress each
sequence (quantize + Lempel-Ziv), using
the other sequences‟ LZ tables
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 35
On The Marriage of Lp-norms and Edit Distance, Lei Chen,
Raymond T. Ng:, VLDB‟04
Towards Parameter-Free Data Mining, E. Keogh, S. Lonardi,
C.A. Ratanamahatana, KDD‟04
![Page 30: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/30.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 36
Conclusions
Prevailing distances:
– Euclidean and
– time-warping
![Page 31: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/31.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 37
Outline
• Motivation
• Similarity Search and Indexing
– distance functions
– indexing
– feature extraction
• DSP
• ...
![Page 32: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/32.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 38
Indexing
Problem:
• given a set of time sequences,
• find the ones similar to a desirable query
sequence
![Page 33: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/33.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 39
day
$price
1 365
day
$price
1 365
day
$price
1 365
distance function: by expert
![Page 34: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/34.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 40
Idea: „GEMINI‟
Eg., „find stocks similar to MSFT’
Seq. scanning: too slow
How to accelerate the search?
[Faloutsos96]
![Page 35: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/35.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 41
day1 365
day1 365
S1
Sn
F(S1)
F(Sn)
„GEMINI‟ - Pictorially
eg, avg
eg,. std
![Page 36: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/36.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 42
GEMINI
Solution: Quick-and-dirty' filter:
• extract n features (numbers, eg., avg., etc.)
• map into a point in n-d feature space
• organize points with off-the-shelf spatial
access method („SAM‟)
• discard false alarms
![Page 37: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/37.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 43
Examples of GEMINI
• Time sequences: DFT (up to 100 times
faster) [SIGMOD94];
• [Kanellakis+], [Mendelzon+]
![Page 38: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/38.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 44
Indexing - SAMs
Q: How do Spatial Access Methods (SAMs)
work?
A: they group nearby points (or regions)
together, on nearby disk pages, and answer
spatial queries quickly („range queries‟,
„nearest neighbor‟ queries etc)
For example:
![Page 39: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/39.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 45
R-trees
• [Guttman84] eg., w/ fanout 4: group nearby
rectangles to parent MBRs; each group ->
disk page
A
B
C
DE
FG
H
I
J
Skip
![Page 40: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/40.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 46
R-trees
• eg., w/ fanout 4:
A
B
C
DE
FG
H
I
J
P1
P2
P3
P4
F GD E
H I JA B C
Skip
![Page 41: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/41.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 47
R-trees
• eg., w/ fanout 4:
A
B
C
DE
FG
H
I
J
P1
P2
P3
P4
P1 P2 P3 P4
F GD E
H I JA B C
Skip
![Page 42: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/42.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 48
R-trees - range search?
A
B
C
DE
FG
H
I
J
P1
P2
P3
P4
P1 P2 P3 P4
F GD E
H I JA B C
Skip
![Page 43: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/43.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 49
R-trees - range search?
A
B
C
DE
FG
H
I
J
P1
P2
P3
P4
P1 P2 P3 P4
F GD E
H I JA B C
Skip
![Page 44: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/44.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 50
Conclusions
• Fast indexing: through GEMINI
– feature extraction and
– (off the shelf) Spatial Access Methods
[Gaede+98]
![Page 45: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/45.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 51
Outline
• Motivation
• Similarity Search and Indexing
– distance functions
– indexing
– feature extraction
• DSP
• ...
![Page 46: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/46.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 52
Outline
• Motivation
• Similarity Search and Indexing
– distance functions
– indexing
– feature extraction
• DFT, DWT, DCT (data independent)
• SVD, etc (data dependent)
• MDS, FastMap
![Page 47: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/47.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 53
DFT and cousins
• very good for compressing real signals
• more details on DFT/DCT/DWT: later
![Page 48: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/48.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 54
DFT and stocks
0.00
2000.00
4000.00
6000.00
8000.00
10000.00
12000.00
1 11 21 31 41 51 61 71 81 91 101 111 121
Fourier appx actual
• Dow Jones Industrial
index, 6/18/2001-
12/21/2001
![Page 49: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/49.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 55
DFT and stocks
0.00
2000.00
4000.00
6000.00
8000.00
10000.00
12000.00
1 11 21 31 41 51 61 71 81 91 101 111 121
Fourier appx actual
• Dow Jones Industrial
index, 6/18/2001-
12/21/2001
• just 3 DFT
coefficients give very
good approximation
1
10
100
1000
10000
100000
1000000
10000000
1 11 21 31 41 51 61 71 81 91 101 111 121
Series1
freq
Log(ampl)
![Page 50: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/50.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 56
Outline
• Motivation
• Similarity Search and Indexing
– distance functions
– indexing
– feature extraction
• DFT, DWT, DCT (data independent)
• SVD etc (data dependent)
• MDS, FastMap
![Page 51: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/51.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 57
SVD
• THE optimal method for dimensionality
reduction
– (under the Euclidean metric)
![Page 52: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/52.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 58
Singular Value Decomposition
(SVD)• SVD (~LSI ~ KL ~ PCA ~ spectral
analysis...) LSI: S. Dumais; M. Berry
KL: eg, Duda+Hart
PCA: eg., Jolliffe
Details: [Press+],
[Faloutsos96]
day1
day2
![Page 53: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/53.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 59
SVD
• Extremely useful tool
– (also behind PageRank/google and Kleinberg‟s
algorithm for hubs and authorities)
• But may be slow: O(N * M * M) if N>M
• any approximate, faster method?
![Page 54: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/54.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 60
SVD shorcuts
• random projections (Johnson-Lindenstrauss
thm [Papadimitriou+ pods98])
![Page 55: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/55.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 61
Random projections
• pick „enough‟ random directions (will be
~orthogonal, in high-d!!)
• distances are preserved probabilistically,
within epsilon
• (also, use as a pre-processing step for SVD
[Papadimitriou+ PODS98])
![Page 56: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/56.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 62
Outline
• Motivation
• Similarity Search and Indexing
– distance functions
– indexing
– feature extraction
• DFT, DWT, DCT (data independent)
• SVD etc (data dependent), ICA
• MDS, FastMap
![Page 57: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/57.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 63
Citation
• AutoSplit: Fast and Scalable Discovery of
Hidden Variables in Stream and Multimedia
Databases, Jia-Yu Pan, Hiroyuki Kitagawa,
Christos Faloutsos and Masafumi
Hamamoto
PAKDD 2004, Sydney, Australia
![Page 58: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/58.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 64
Motivation:
(Q1) Find patterns in data• Motion capture data (broad jumps)
Left Knee
Right Knee
Energy exerted
Take-off
Landing
![Page 59: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/59.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 65
PCA sometimes misses essential
features
• Best SVD axis: not always meaningful!
![Page 60: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/60.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 66
Motivation:
(Q1) Find patterns in data
• Human would say
– Pattern 1: along
diagonal
– Pattern 2: along
vertical axis
• How to find these
automatically? Left Knee
Right Knee
60:1
1:1
![Page 61: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/61.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 67
Motivation:
(Q2) Find hidden variables
Dow Jones Industrial Average
Alcoa
American
Express
Boeing
Caterpillar
Citi
Group
Find common
hidden variables,
and weights.
![Page 62: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/62.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 68
Motivation:
(Q2) Find hidden variables
Hidden variable 1 Hidden variable 2
B1,CAT
B1,INTC B2,CAT
B2,INTC?
? ?
Caterpillar Intel
![Page 63: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/63.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 69
Motivation:
(Q2) Find hidden variables
“Hidden variable 1” “Hidden variable 2”
0.940.63 0.03
0.64
Caterpillar Intel
![Page 64: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/64.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 70
Motivation:
(Q2) Find hidden variables
“General trend” “Internet bubble”
0.940.63 0.03
0.64
Caterpillar Intel
![Page 65: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/65.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 80
Outline
• Motivation
• Similarity Search and Indexing
– distance functions
– indexing
– feature extraction
• DFT, DWT, DCT (data independent)
• SVD (data dependent)
• MDS, FastMap
![Page 66: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/66.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 81
MDS / FastMap
• but, what if we have NO points to start
with?
(eg. Time-warping distance)
• A: Multi-dimensional Scaling (MDS) ;
FastMap
![Page 67: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/67.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 82
MDS/FastMap
01100100100O5
10100100100O4
100100011O3
100100101O2
100100110O1
O5O4O3O2O1~100
~1
![Page 68: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/68.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 83
MDS
Multi Dimensional
Scaling
![Page 69: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/69.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 84
FastMap
• Multi-dimensional scaling (MDS) can do
that, but in O(N**2) time
• FastMap [Faloutsos+95] takes O(N) time
![Page 70: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/70.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 85
FastMap: Application
VideoTrails [Kobla+97]
scene-cut detection (about 10% errors)
![Page 71: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/71.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 86
Outline
• Motivation
• Similarity Search and Indexing
– distance functions
– indexing
– feature extraction
• DFT, DWT, DCT (data independent)
• SVD (data dependent)
• MDS, FastMap, IsoMap etc
![Page 72: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/72.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li #87
Variations
• Isomap [Tenenbaum, de Silva, Langford,
2000]
• LLE (Local Linear Embedding) [Roweis,
Saul, 2000]
• MVE (Minimum Volume Embedding)
[Shaw & Jebara, 2007]
![Page 73: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/73.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li #88
Variations
• Isomap [Tenenbaum, de Silva, Langford,
2000]
• LLE (Local Linear Embedding) [Roweis,
Saul, 2000]
• MVE (Minimum Volume Embedding)
[Shaw & Jebara, 2007]
![Page 74: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/74.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 89
Outline
• Motivation
• Similarity Search and Indexing
– distance functions
– indexing
– feature extraction
• DFT, DWT, DCT (data independent)
• SVD (data dependent)
• MDS, FastMap
![Page 75: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/75.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 90
Conclusions - Practitioner‟s
guide
Similarity search in time sequences
1) establish/choose distance (Euclidean, time-
warping,…)
2) extract features (SVD, DWT, MDS), and use
an SAM (R-tree/variant) or a Metric Tree (M-
tree)
2‟) for high intrinsic dimensionalities, consider sequential
scan (it might win…)
![Page 76: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/76.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 91
Books
• William H. Press, Saul A. Teukolsky, William T.
Vetterling and Brian P. Flannery: Numerical Recipes in C,
Cambridge University Press, 1992, 2nd Edition. (Great
description, intuition and code for SVD)
• C. Faloutsos: Searching Multimedia Databases by Content,
Kluwer Academic Press, 1996 (introduction to SVD, and
GEMINI)
![Page 77: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/77.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 92
References
• Agrawal, R., K.-I. Lin, et al. (Sept. 1995). Fast Similarity
Search in the Presence of Noise, Scaling and Translation in
Time-Series Databases. Proc. of VLDB, Zurich,
Switzerland.
• Babu, S. and J. Widom (2001). “Continuous Queries over
Data Streams.” SIGMOD Record 30(3): 109-120.
• Breunig, M. M., H.-P. Kriegel, et al. (2000). LOF:
Identifying Density-Based Local Outliers. SIGMOD
Conference, Dallas, TX.• Berry, Michael: http://www.cs.utk.edu/~lsi/
![Page 78: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/78.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 93
References
• Ciaccia, P., M. Patella, et al. (1997). M-tree: An Efficient
Access Method for Similarity Search in Metric Spaces.
VLDB.
• Foltz, P. W. and S. T. Dumais (Dec. 1992). “Personalized
Information Delivery: An Analysis of Information
Filtering Methods.” Comm. of ACM (CACM) 35(12): 51-
60.
• Guttman, A. (June 1984). R-Trees: A Dynamic Index
Structure for Spatial Searching. Proc. ACM SIGMOD,
Boston, Mass.
![Page 79: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/79.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 94
References
• Gaede, V. and O. Guenther (1998). “Multidimensional
Access Methods.” Computing Surveys 30(2): 170-231.
• Gehrke, J. E., F. Korn, et al. (May 2001). On Computing
Correlated Aggregates Over Continual Data Streams.
ACM Sigmod, Santa Barbara, California.
![Page 80: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/80.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li Part2.1 #95
References
• Gunopulos, D. and G. Das (2001). Time Series
Similarity Measures and Time Series Indexing.
SIGMOD Conference, Santa Barbara, CA.
• Eamonn J. Keogh, Themis Palpanas, Victor B.
Zordan, Dimitrios Gunopulos, Marc Cardle:
Indexing Large Human-Motion Databases. VLDB
2004: 780-791
![Page 81: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/81.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 96
References
• Hatonen, K., M. Klemettinen, et al. (1996). Knowledge
Discovery from Telecommunication Network Alarm
Databases. ICDE, New Orleans, Louisiana.
• Jolliffe, I. T. (1986). Principal Component Analysis,
Springer Verlag.
![Page 82: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/82.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 97
References
• Keogh, E. J., K. Chakrabarti, et al. (2001). Locally
Adaptive Dimensionality Reduction for Indexing Large
Time Series Databases. SIGMOD Conference, Santa
Barbara, CA.
• Kobla, V., D. S. Doermann, et al. (Nov. 1997).
VideoTrails: Representing and Visualizing Structure in
Video Sequences. ACM Multimedia 97, Seattle, WA.
![Page 83: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/83.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 98
References
• Oppenheim, I. J., A. Jain, et al. (March 2002). A MEMS
Ultrasonic Transducer for Resident Monitoring of Steel
Structures. SPIE Smart Structures Conference SS05, San
Diego.
• Papadimitriou, C. H., P. Raghavan, et al. (1998). Latent
Semantic Indexing: A Probabilistic Analysis. PODS,
Seattle, WA.
• Rabiner, L. and B.-H. Juang (1993). Fundamentals of
Speech Recognition, Prentice Hall.
![Page 84: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/84.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 99
References
• Traina, C., A. Traina, et al. (October 2000). Fast feature
selection using the fractal dimension,. XV Brazilian
Symposium on Databases (SBBD), Paraiba, Brazil.
![Page 85: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/85.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 100
References• Dennis Shasha and Yunyue Zhu High Performance
Discovery in Time Series: Techniques and Case Studies Springer 2004
• Yunyue Zhu, Dennis Shasha ``StatStream: Statistical Monitoring of Thousands of Data Streams in Real Time'‘ VLDB, August, 2002. pp. 358-369.
• Samuel R. Madden, Michael J. Franklin, Joseph M. Hellerstein, and Wei Hong. The Design of an Acquisitional Query Processor for Sensor Networks. SIGMOD, June 2003, San Diego, CA.
![Page 86: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/86.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li #101
References
• Lawrence Saul & Sam Roweis. An Introduction to Locally Linear Embedding (draft)
• Sam Roweis & Lawrence Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, v.290 no.5500 , Dec.22, 2000. pp.2323--2326.
• B. Shaw and T. Jebara. "Minimum Volume Embedding" . Artificial Intelligence and Statistics, AISTATS, March 2007.
![Page 87: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/87.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li #102
References
• Josh Tenenbaum, Vin de Silva and John Langford. A
Global Geometric Framework for Nonlinear
dimensionality Reduction. Science 290, pp. 2319-2323,
2000.
![Page 88: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/88.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 103
![Page 89: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/89.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 104
Outline
• Motivation
• Similarity Search and Indexing
• DSP (DFT, DWT)
• Linear Forecasting
• Kalman filters
• Bursty traffic - fractals and multifractals
• Non-linear forecasting
• Conclusions
![Page 90: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/90.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 105
Outline
• DFT
– Definition of DFT and properties
– how to read the DFT spectrum
• DWT
– Definition of DWT and properties
– how to read the DWT scalogram
![Page 91: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/91.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 106
Introduction - Problem#1
Goal: given a signal (eg., packets over time)
Find: patterns and/or compress
year
count
lynx caught per year
(packets per day;
automobiles per hour)-2000
0
2000
4000
6000
8000
1 14 27 40 53 66 79 92 105
![Page 92: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/92.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 107
What does DFT do?
A: highlights the periodicities
![Page 93: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/93.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 108
DFT: definition• For a sequence x0, x1, … xn-1
• the (n-point) Discrete Fourier Transform is
• X0, X1, … Xn-1 :
)/2exp(*/1
)1(
1,,0)/2exp(*/1
1
0
1
0
ntfjXnx
j
nfntfjxnX
n
t
ft
n
t
tf
inverse DFT
Skip
![Page 94: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/94.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 109
DFT: definition
• Good news: Available in all symbolic math
packages, eg., in „mathematica‟
x = [1,2,1,2];
X = Fourier[x];
Plot[ Abs[X] ];
![Page 95: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/95.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 110
DFT: Amplitude spectrum
actual mean mean+freq12
1
12
23
34
45
56
67
78
89
10
0
11
1
year
count
Freq.
Ampl.
freq=12
freq=0
)(Im)(Re 222
fffXXA Amplitude:
![Page 96: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/96.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 111
DFT: examples
flat
0.000
0.010
0.020
0.030
0.040
0.050
0.060
0.070
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0
0.2
0.4
0.6
0.8
1
1.2
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
time freq
Amplitude
Skip
![Page 97: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/97.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 112
DFT: examples
Low frequency sinusoid
-0.150
-0.100
-0.050
0.000
0.050
0.100
0.150
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0
0.2
0.4
0.6
0.8
1
1.2
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
time freq
Skip
![Page 98: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/98.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 113
DFT: examples
• Sinusoid - symmetry property: Xf = X*n-f
-0.150
-0.100
-0.050
0.000
0.050
0.100
0.150
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0
0.2
0.4
0.6
0.8
1
1.2
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
time freq
Skip
![Page 99: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/99.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 114
DFT: examples
• Higher freq. sinusoid
-0.080
-0.060
-0.040
-0.020
0.000
0.020
0.040
0.060
0.080
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0
0.1
0.2
0.3
0.4
0.5
0.6
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
time freq
Skip
![Page 100: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/100.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 115
DFT: examples
examples
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
=
+
+
Skip
![Page 101: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/101.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 116
DFT: examples
examples
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0
0.2
0.4
0.6
0.8
1
1.2
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Freq.
Ampl.
Skip
![Page 102: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/102.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 117
Outline
• Motivation
• Similarity Search and Indexing
• DSP
• Linear Forecasting
• Bursty traffic - fractals and multifractals
• Non-linear forecasting
• Conclusions
![Page 103: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/103.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 118
Outline
• Motivation
• Similarity Search and Indexing
• DSP
– DFT
• Definition of DFT and properties
• how to read the DFT spectrum
– DWT
![Page 104: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/104.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 119
DFT: Amplitude spectrum
actual mean mean+freq12
1
12
23
34
45
56
67
78
89
10
0
11
1
year
count
Freq.
Ampl.
freq=12
freq=0
)(Im)(Re 222
fffXXA Amplitude:
![Page 105: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/105.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 120
DFT: Amplitude spectrum
actual mean mean+freq12
1
12
23
34
45
56
67
78
89
10
0
11
1
year
count
Freq.
Ampl.
freq=12
freq=0
![Page 106: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/106.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 121
1
12
23
34
45
56
67
78
89
10
0
11
1
DFT: Amplitude spectrum
actual mean mean+freq12
year
count
Freq.
Ampl.
freq=12
freq=0
![Page 107: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/107.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 122
DFT: Amplitude spectrum
• excellent approximation, with only 2
frequencies!
• so what?
actual mean mean+freq12
1
12
23
34
45
56
67
78
89
10
0
11
1
Freq.
![Page 108: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/108.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 123
DFT: Amplitude spectrum
• excellent approximation, with only 2
frequencies!
• so what?
• A1: (lossy) compression
• A2: pattern discovery 1
12
23
34
45
56
67
78
89
10
0
11
1
![Page 109: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/109.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 124
DFT: Amplitude spectrum
• excellent approximation, with only 2
frequencies!
• so what?
• A1: (lossy) compression
• A2: pattern discovery
actual mean mean+freq12
![Page 110: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/110.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 125
DFT - Conclusions
• It spots periodicities (with the „amplitude spectrum’)
• can be quickly computed (O( n log n)), thanks to the FFT algorithm.
• standard tool in signal processing (speech, image etc signals)
• (closely related to DCT and JPEG)
![Page 111: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/111.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 126
Outline
• Motivation
• Similarity Search and Indexing
• DSP
– DFT
– DWT
• Definition of DWT and properties
• how to read the DWT scalogram
![Page 112: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/112.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 127
Problem #1:
Goal: given a signal (eg., #packets over time)
Find: patterns, periodicities, and/or compress
year
count lynx caught per year
(packets per day;
virus infections per month)
![Page 113: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/113.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 128
Wavelets - DWT
• DFT is great - but, how about compressing
a spike?
value
time
0
0.2
0.4
0.6
0.8
1
1.2
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
![Page 114: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/114.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 129
0
0.2
0.4
0.6
0.8
1
1.2
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
Wavelets - DWT
• DFT is great - but, how about compressing
a spike?
• A: Terrible - all DFT coefficients needed!
0
0.2
0.4
0.6
0.8
1
1.2
1 3 5 7 9 11 13 15
Freq.
Ampl.value
time
![Page 115: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/115.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 130
Wavelets - DWT
• DFT is great - but, how about compressing
a spike?
• A: Terrible - all DFT coefficients needed!
0
0.2
0.4
0.6
0.8
1
1.2
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
value
time
0
0.2
0.4
0.6
0.8
1
1.2
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
![Page 116: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/116.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 131
Wavelets - DWT
• Similarly, DFT suffers on short-duration
waves (eg., baritone, silence, soprano)
time
value
![Page 117: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/117.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 132
Wavelets - DWT
• Solution#1: Short window Fourier
transform (SWFT)
• But: how short should be the window?
time
freq
time
value
![Page 118: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/118.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 133
Wavelets - DWT
• Answer: multiple window sizes! -> DWT
time
freq
Time
domain DFT SWFT DWT
![Page 119: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/119.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 134
Haar Wavelets
• subtract sum of left half from right half
• repeat recursively for quarters, eight-ths, ...
![Page 120: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/120.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 135
Wavelets - construction
x0 x1 x2 x3 x4 x5 x6 x7
Skip
![Page 121: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/121.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 136
Wavelets - construction
x0 x1 x2 x3 x4 x5 x6 x7
s1,0
+-
d1,0 s1,1d1,1 .......level 1
Skip
![Page 122: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/122.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 137
Wavelets - construction
d2,0
x0 x1 x2 x3 x4 x5 x6 x7
s1,0
+-
d1,0 s1,1d1,1 .......
s2,0level 2
Skip
![Page 123: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/123.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 138
Wavelets - construction
d2,0
x0 x1 x2 x3 x4 x5 x6 x7
s1,0
+-
d1,0 s1,1d1,1 .......
s2,0
etc ...
Skip
![Page 124: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/124.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 139
Wavelets - construction
d2,0
x0 x1 x2 x3 x4 x5 x6 x7
s1,0
+-
d1,0 s1,1d1,1 .......
s2,0
Q: map each coefficient
on the time-freq. plane
t
f
Skip
![Page 125: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/125.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 140
Wavelets - construction
d2,0
x0 x1 x2 x3 x4 x5 x6 x7
s1,0
+-
d1,0 s1,1d1,1 .......
s2,0
Q: map each coefficient
on the time-freq. plane
t
f
Skip
![Page 126: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/126.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 141
Haar wavelets - code
#!/usr/bin/perl5
# expects a file with numbers
# and prints the dwt transform
# The number of time-ticks should be a power of 2
# USAGE
# haar.pl <fname>
my @vals=();
my @smooth; # the smooth component of the signal
my @diff; # the high-freq. component
# collect the values into the array @val
while(<>){
@vals = ( @vals , split );
}
my $len = scalar(@vals);
my $half = int($len/2);
while($half >= 1 ){
for(my $i=0; $i< $half; $i++){
$diff [$i] = ($vals[2*$i] - $vals[2*$i + 1] )/ sqrt(2);
print "\t", $diff[$i];
$smooth [$i] = ($vals[2*$i] + $vals[2*$i + 1] )/ sqrt(2);
}
print "\n";
@vals = @smooth;
$half = int($half/2);
}
print "\t", $vals[0], "\n" ; # the final, smooth component
![Page 127: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/127.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 142
Wavelets - construction
Observation1:
„+‟ can be some weighted addition
„-‟ is the corresponding weighted difference
(„Quadrature mirror filters‟)
Observation2: unlike DFT/DCT,
there are *many* wavelet bases: Haar, Daubechies-
4, Daubechies-6, Coifman, Morlet, Gabor, ...
![Page 128: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/128.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 143
Wavelets - how do they look
like?
• E.g., Daubechies-4
![Page 129: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/129.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 144
Wavelets - how do they look
like?
• E.g., Daubechies-4
?
?
![Page 130: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/130.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 145
Wavelets - how do they look
like?
• E.g., Daubechies-4
![Page 131: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/131.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 146
Outline
• Motivation
• Similarity Search and Indexing
• DSP
– DFT
– DWT
• Definition of DWT and properties
• how to read the DWT scalogram
![Page 132: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/132.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 147
Wavelets - Drill#1:
t
f
• Q: baritone/silence/soprano - DWT?
time
value
![Page 133: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/133.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 148
Wavelets - Drill#1:
t
f
• Q: baritone/silence/soprano - DWT?
time
value
![Page 134: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/134.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 149
Wavelets - Drill#2:
• Q: spike - DWT?
t
f
1 2 3 4 5 6 7 8
![Page 135: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/135.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 150
Wavelets - Drill#2:
t
f
• Q: spike - DWT?
1 2 3 4 5 6 7 8
0.00 0.00 0.71 0.00
0.00 0.50
-0.35
0.35
![Page 136: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/136.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 151
Wavelets - Drill#3:
• Q: weekly + daily periodicity, + spike -
DWT?
t
f
![Page 137: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/137.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 152
Wavelets - Drill#3:
• Q: weekly + daily periodicity, + spike -
DWT?
t
f
![Page 138: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/138.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 153
Wavelets - Drill#3:
• Q: weekly + daily periodicity, + spike -
DWT?
t
f
![Page 139: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/139.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 154
Wavelets - Drill#3:
• Q: weekly + daily periodicity, + spike -
DWT?
t
f
![Page 140: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/140.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 155
Wavelets - Drill#3:
• Q: weekly + daily periodicity, + spike -
DWT?
t
f
![Page 141: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/141.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 156
Wavelets - Drill#3:
• Q: DFT?
t
f
t
f
DWT DFT
![Page 142: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/142.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 157
Advantages of Wavelets
• Better compression (better RMSE with same
number of coefficients - used in JPEG-2000)
• fast to compute (usually: O(n)!)
• very good for „spikes‟
• mammalian eye and ear: Gabor wavelets
![Page 143: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/143.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 158
Overall Conclusions
• DFT, DCT spot periodicities
• DWT : multi-resolution - matches
processing of mammalian ear/eye better
• All three: powerful tools for compression,
pattern detection in real signals
• All three: included in math packages
– (matlab, „R‟, mathematica, … - often in
spreadsheets!)
![Page 144: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/144.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 159
Overall Conclusions
• DWT : very suitable for self-similar traffic
• DWT: used for summarization of streams [Gilbert+01], db histograms etc
![Page 145: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/145.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 160
Resources - software and urls
• http://www.dsptutor.freeuk.com/jsanalyser/
FFTSpectrumAnalyser.html : Nice java
applets for FFT
• http://www.relisoft.com/freeware/freq.html
voice frequency analyzer (needs
microphone)
![Page 146: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/146.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 161
Resources: software and urls
• xwpl: open source wavelet package from
Yale, with excellent GUI
• http://monet.me.ic.ac.uk/people/gavin/java
/waveletDemos.html : wavelets and
scalograms
![Page 147: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/147.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 162
Books
• William H. Press, Saul A. Teukolsky, William T.
Vetterling and Brian P. Flannery: Numerical Recipes in C,
Cambridge University Press, 1992, 2nd Edition. (Great
description, intuition and code for DFT, DWT)
• C. Faloutsos: Searching Multimedia Databases by Content,
Kluwer Academic Press, 1996 (introduction to DFT,
DWT)
![Page 148: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/148.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 163
Additional Reading
• [Gilbert+01] Anna C. Gilbert, Yannis Kotidis and S.
Muthukrishnan and Martin Strauss, Surfing Wavelets on
Streams: One-Pass Summaries for Approximate Aggregate
Queries, VLDB 2001
![Page 149: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/149.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 164
![Page 150: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/150.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 165
Outline
• Motivation
• Similarity Search and Indexing
• DSP
• Linear Forecasting
• Kalman filters
• Bursty traffic - fractals and multifractals
• Non-linear forecasting
• Conclusions
![Page 151: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/151.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 166
Forecasting
"Prediction is very difficult, especially about
the future." - Nils Bohr
http://www.hfac.uh.edu/MediaFutures/thoughts.html
![Page 152: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/152.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 167
Outline
• Motivation
• ...
• Linear Forecasting
– Auto-regression: Least Squares; RLS
– Co-evolving time sequences
– Examples
– Conclusions
![Page 153: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/153.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 168
Problem#2: Forecast
• Example: give xt-1, xt-2, …, forecast xt
0
10
20
30
40
50
60
70
80
90
1 3 5 7 9 11
Time Tick
Nu
mb
er o
f p
ack
ets
sen
t
??
![Page 154: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/154.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 169
Forecasting: Preprocessing
MANUALLY:
remove trends spot periodicities
0
1
2
3
4
5
6
1 2 3 4 5 6 7 8 9 10
0
0.5
1
1.5
2
2.5
3
3.5
1 3 5 7 9 11 13
timetime
7 days
![Page 155: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/155.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 170
Problem#2: Forecast
• Solution: try to express
xt
as a linear function of the past: xt-2, xt-2, …,
(up to a window of w)
Formally:
0102030405060708090
1 3 5 7 9 11Time Tick
??noisexaxax wtwtt 11
![Page 156: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/156.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 171
(Problem: Back-cast; interpolate)• Solution - interpolate: try to express
xt
as a linear function of the past AND the future:
xt+1, xt+2, … xt+wfuture; xt-1, … xt-wpast
(up to windows of wpast , wfuture)
• EXACTLY the same algo‟s
0102030405060708090
1 3 5 7 9 11Time Tick
??
![Page 157: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/157.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 172
Linear Regression: idea
40
45
50
55
60
65
70
75
80
85
15 25 35 45
Body weight
patient weight height
1 27 43
2 43 54
3 54 72
……
…
N 25 ??
• express what we don‟t know (= „dependent variable‟)
• as a linear function of what we know (= „indep. variable(s)‟)
Body height
![Page 158: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/158.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 173
Linear Auto Regression:
Time Packets
Sent (t-1)
Packets
Sent(t)
1 - 43
2 43 54
3 54 72
……
…
N 25 ??
![Page 159: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/159.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 174
Linear Auto Regression:
40
45
50
55
60
65
70
75
80
85
15 25 35 45
Number of packets sent (t-1)N
um
ber
of
pa
cket
s se
nt
(t)
Time Packets
Sent (t-1)
Packets
Sent(t)
1 - 43
2 43 54
3 54 72
……
…
N 25 ??
• lag w=1
• Dependent variable = # of packets sent (S [t])
• Independent variable = # of packets sent (S[t-1])
„lag-plot‟
![Page 160: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/160.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 175
Outline
• Motivation
• ...
• Linear Forecasting
– Auto-regression: Least Squares; RLS
– Co-evolving time sequences
– Examples
– Conclusions
![Page 161: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/161.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 176
More details:
• Q1: Can it work with window w>1?
• A1: YES!
xt-2
xt
xt-1
![Page 162: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/162.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 177
More details:
• Q1: Can it work with window w>1?
• A1: YES! (we‟ll fit a hyper-plane, then!)
xt-2
xt
xt-1
![Page 163: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/163.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 178
More details:
• Q1: Can it work with window w>1?
• A1: YES! (we‟ll fit a hyper-plane, then!)
xt-2
xt-1
xt
![Page 164: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/164.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 179
More details:
• Q1: Can it work with window w>1?
• A1: YES! The problem becomes:
X[N w] a[w 1] = y[N 1]
• OVER-CONSTRAINED– a is the vector of the regression coefficients
– X has the N values of the w indep. variables
– y has the N values of the dependent variable
Skip
![Page 165: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/165.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 180
More details:
• X[N w] a[w 1] = y[N 1]
N
w
NwNN
w
w
y
y
y
a
a
a
XXX
XXX
XXX
2
1
2
1
21
22221
11211
,,,
,,,
,,,
Ind-var1 Ind-var-w
time
Skip
![Page 166: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/166.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 181
More details:
• X[N w] a[w 1] = y[N 1]
N
w
NwNN
w
w
y
y
y
a
a
a
XXX
XXX
XXX
2
1
2
1
21
22221
11211
,,,
,,,
,,,
Ind-var1 Ind-var-w
time
Skip
![Page 167: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/167.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 182
More details
• Q2: How to estimate a1, a2, … aw = a?
• A2: with Least Squares fit
• (Moore-Penrose pseudo-inverse)
• a is the vector that minimizes the RMSE
from y
a = ( XT X )-1 (XT y)
Skip
![Page 168: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/168.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 183
Even more details
• Q3: Can we estimate a incrementally?
• A3: Yes, with the brilliant, classic method
of „Recursive Least Squares‟ (RLS) (see,
e.g., [Yi+00], for details) - pictorially:
Skip
![Page 169: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/169.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 184
Even more details
• Given:
Independent Variable
Dep
enden
t V
aria
ble
Skip
![Page 170: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/170.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 185
Even more details
Independent Variable
Dep
enden
t V
aria
ble
.
new point
Skip
![Page 171: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/171.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 186
Even more details
Independent Variable
Dep
enden
t V
aria
ble
RLS: quickly compute new best fit
new point
Skip
![Page 172: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/172.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 187
Even more details
• Straightforward Least
Squares
– Needs huge matrix
(growing in size)
O(N×w)
– Costly matrix operation
O(N×w2)
• Recursive LS
– Need much smaller, fixed
size matrix
O(w×w)
– Fast, incremental
computation
O(1×w2)
N = 106, w = 1-100
Skip
![Page 173: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/173.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 188
Even more details
• Q4: can we „forget‟ the older samples?
• A4: Yes - RLS can easily handle that
[Yi+00]:
Skip
![Page 174: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/174.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 189
Adaptability - „forgetting‟
Independent Variable
eg., #packets sent
Dep
enden
t V
aria
ble
eg.,
#by
tes
sent
Skip
![Page 175: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/175.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 202
Outline
• Motivation
• ...
• Linear Forecasting
– Auto-regression: Least Squares; RLS
– Co-evolving time sequences
– Examples
– Conclusions
![Page 176: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/176.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 203
Co-Evolving Time Sequences
• Given: A set of correlated time sequences
• Forecast „Repeated(t)‟
0
10
20
30
40
50
60
70
80
90
1 3 5 7 9 11
Time Tick
Nu
mb
er o
f p
ack
ets
sent
lost
repeated
??
![Page 177: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/177.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 204
Solution:
Q: what should we do?
![Page 178: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/178.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 205
Solution:
Least Squares, with
• Dep. Variable: Repeated(t)
• Indep. Variables: Sent(t-1) … Sent(t-w);
Lost(t-1) …Lost(t-w); Repeated(t-1), ...
• (named: „MUSCLES‟ [Yi+00])
![Page 179: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/179.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 206
Time Series Analysis - Outline
• Auto-regression
• Least Squares; recursive least squares
• Co-evolving time sequences
• Examples
• Conclusions
![Page 180: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/180.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 207
Conclusions - Practitioner‟s
guide
• AR(IMA) methodology: prevailing method
for linear forecasting
• Brilliant method of Recursive Least Squares
for fast, incremental estimation.
• See [Box-Jenkins]
![Page 181: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/181.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 208
Resources: software and urls
• MUSCLES: Prof. Byoung-Kee Yi:
http://www.postech.ac.kr/~bkyi/
• free-ware: „R‟ for stat. analysis
(clone of Splus)
http://cran.r-project.org/
![Page 182: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/182.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 209
Books
• George E.P. Box and Gwilym M. Jenkins and Gregory C.
Reinsel, Time Series Analysis: Forecasting and Control,
Prentice Hall, 1994 (the classic book on ARIMA, 3rd ed.)
• Brockwell, P. J. and R. A. Davis (1987). Time Series:
Theory and Methods. New York, Springer Verlag.
![Page 183: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/183.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 210
Additional Reading
• [Papadimitriou+ vldb2003] Spiros Papadimitriou, Anthony
Brockwell and Christos Faloutsos Adaptive, Hands-Off
Stream Mining VLDB 2003, Berlin, Germany, Sept. 2003
• [Yi+00] Byoung-Kee Yi et al.: Online Data Mining for
Co-Evolving Time Sequences, ICDE 2000. (Describes
MUSCLES and Recursive Least Squares)
![Page 184: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/184.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 211
Next: Kalman filters
BREAK!
![Page 185: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/185.jpg)
CMU SCS
Indexing and Mining Time
Sequences
Part 4
Kalman Filters
Christos Faloutsos
Lei Li
CMU
KDD 2010 1Copyright: C. Faloutsos & L. Li, 2010
![Page 186: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/186.jpg)
CMU SCS
Outline
• Intuition, example, and definition
• Extensions
• Kalman filters at work
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010 2
![Page 187: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/187.jpg)
CMU SCS
Intuition
• Tracking moving objects, estimate velocity
and acceleration on the fly
3KDD 2010 Copyright: C. Faloutsos & L. Li, 2010
from FIFA 2010RoboCup 2010
![Page 188: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/188.jpg)
CMU SCS
Linear Dynamical System
• Known parameters
– Original Kalman Filters [Kalman 1960, Rauch
1965]
• Unknown parameters
– Parameter estimation through EM algorithm
[Shumway et al 1982, Ghahramani 1996]
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010 4
![Page 189: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/189.jpg)
CMU SCS
Kalman Filters
Given observations of the soccer ball position
t=1..T, “Model parameters”
Goal: two types of prediction
Kalman filtering: Estimate the true position,
velocity & acceleration based on the
previous observations
Kalman smoothing: Estimate for every time
tick, based on all observationsKDD 2010 Copyright: C. Faloutsos & L. Li, 2010 5
?
?
![Page 190: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/190.jpg)
CMU SCS
Kalman Filters (intuition)
t=1, soccer with initial pos, vel and acc.
To estimate the future
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010 6
v1
a1
p1
past observations
![Page 191: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/191.jpg)
CMU SCS
Kalman Filters (intuition)
t=2, according to Newton’s law, it should be...
7KDD 2010 Copyright: C. Faloutsos & L. Li, 2010
v1
a1
p1
2p~
2v~2a~
![Page 192: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/192.jpg)
CMU SCS
Kalman Filters (intuition)
t=2, according to Newton’s law, it should be...
however, imperfect soccer/kick movement…
8KDD 2010 Copyright: C. Faloutsos & L. Li, 2010
v1
a1
p1
v2
a2
p2
![Page 193: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/193.jpg)
CMU SCS
Kalman Filters (intuition)
Now take a photo, due to imperfect camera…
9KDD 2010 Copyright: C. Faloutsos & L. Li, 2010
v1
a1
p1
v2
a2
p2
![Page 194: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/194.jpg)
CMU SCS
Kalman Filters (intuition)
What is the best estimate for next time tick?
10KDD 2010 Copyright: C. Faloutsos & L. Li, 2010
v1
a1
p1
v2
a2
p2
![Page 195: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/195.jpg)
CMU SCS
Kalman Filters (intuition)
What is the best estimate for next time tick?
11KDD 2010 Copyright: C. Faloutsos & L. Li, 2010
v1
a1
p1
2p
2v2a
![Page 196: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/196.jpg)
CMU SCS
Some math notationname
A transition matrix
C transmission/ projection/
output matrix
Q transition covariance
R transmission/ projection/
output covariance
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010 12
v1
a1
p1
‘Newton’s dynamics’: Transition matrix A
2p~
2v~2a~
v1
a1
p1
2p
2v2a
![Page 197: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/197.jpg)
CMU SCS
Example
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010 13
observation
hidden states
(details)
z1=(p1, v1, a1)T
x1=(observed1)
A= 1,1,1/2
0, 1, 1
0, 0, 1
v1
a1
p1
2p
2v2a
C=(1, 0, 0)
transition matrix
output matrix
transition covariance Q
output covariance R
![Page 198: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/198.jpg)
CMU SCS
Example
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010 14
observation
hidden states
(details)
z1=(p1, v1, a1)T
x1=(observed1)
A= 1,1,1/2
0, 1, 1
0, 0, 1
C=(1, 0, 0)
transition matrix
output matrix
p2=p1+ v1*∆t + 0.5*a1*∆t2
v2=v1+a1*∆t
a2=a1
v1
a1
p1
![Page 199: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/199.jpg)
CMU SCS
Kalman Filtering (intuition)
Step1, forecast next time tick before observation
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010 15
v1
a1
p1
‘Newton’s dynamics’:
Transition matrix A
2p~
2v~2a~
![Page 200: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/200.jpg)
CMU SCS
Kalman Filtering (intuition)
Step 2: adjust estimation after observation
16KDD 2010 Copyright: C. Faloutsos & L. Li, 2010
v1
a1
p1
2p
2v2a
![Page 201: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/201.jpg)
CMU SCS
Example: Kalman filtering
(forward)
17
Time*
observed
Position
*
** *
*
Given:
a sequence of observations, Model parameters (A, C …)
Goal: remove noise and forecast real position
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010
![Page 202: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/202.jpg)
CMU SCS
Example: Kalman filtering
(forward)
18
Time*
observed
estimated
Position
[R. E. Kalman. A new approach to linear filtering and prediction problems. 1960]
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010
t=1
![Page 203: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/203.jpg)
CMU SCS
Example: Kalman filtering
19
Time*
observed
estimated
Position
*
QAVAP
RCPCCPK
PKIV
zACxKzA
T
nn
T
n
T
nn
nnn
nnnnn
11
1
11
1
11
ˆ
)(
)(ˆ
)ˆ(ˆz
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010
t=2
Intuition: #2 may
be close to #1
Using the “Newton
Dynamics”
![Page 204: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/204.jpg)
CMU SCS
20
Time*
observed
Position
** *
*
Example: Kalman filtering
*estimated
QAVAP
RCPCCPK
PKIV
zACxKzA
T
nn
T
n
T
nn
nnn
nnnnn
11
1
11
1
11
ˆ
)(
)(ˆ
)ˆ(ˆz
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010
![Page 205: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/205.jpg)
CMU SCS
Kalman Filters
Given observations of the soccer position
t=1..T, and model parameters (A, C …)
Goal: two types of prediction
Kalman filtering: Estimate the true position,
velocity & acceleration based on the
previous observations
Kalman smoothing: Estimate for every time
tick, based on all observationsKDD 2010 Copyright: C. Faloutsos & L. Li, 2010 21
?
?
![Page 206: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/206.jpg)
CMU SCS
Kalman Smoothing
Given: all observation x1, …, xn
Estimate: the hidden state for every time tick
zt (t=1..n)
Difference from Kalman filtering:
bring future observation back in history
estimate
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010 22
??
![Page 207: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/207.jpg)
CMU SCS
23
Time
Forward
*
observed
Position
** *
*
Recap: Kalman filtering
*estimated
QAVAP
RCPCCPK
PKIV
zACxKzA
T
nn
T
n
T
nn
nnn
nnnnn
11
1
11
1
11
ˆ
)(
)(ˆ
)ˆ(ˆz
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010
![Page 208: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/208.jpg)
CMU SCS
Example: Kalman Smoothing
24
Backward
Time*
observed
Position
** *
*
**
11
1
1
ˆ
)ˆ
(ˆˆ
)ˆˆ(ˆz
n
T
nn
T
nnnnnn
nnnnn
PAVJ
JPVJVV
zAzJz
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010
[Rauch et al, Maximum likelihood estimates of linear dynamic systems. 1965]
![Page 209: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/209.jpg)
CMU SCS
Example: Kalman Smoothing
25
Backward
Time*
observed
Position
** *
*
*
estimated
*
**
*
*
1
1
1
ˆ
)ˆ
(ˆˆ
)ˆˆ(ˆz
n
T
nn
T
nnnnnn
nnnnn
PAVJ
JPVJVV
zAzJz
2
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010
![Page 210: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/210.jpg)
CMU SCS
Example: Kalman Smoothing
26
Backward
Time*
observed
Position
** *
*
*
estimated
*
**
*
*
Reconstructed
signal after
smoothing
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010
![Page 211: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/211.jpg)
CMU SCS
Outline
• Intuition, example, and definition
– Original Kalman
– Kalman filters with parameter estimation
• Extensions
• Kalman filters at work
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010 27
![Page 212: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/212.jpg)
CMU SCS
What if not know the model
parameters?
• E.g. Datacenter sensor temperatures
• no longer “Newton dynamics”
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010 28
v1
a1
p1
Transition matrix A =
output matrix C =
2p
2v2a
![Page 213: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/213.jpg)
CMU SCS
Graphical Model Representation
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010 29
z1 = µ0 + ω0
zn+1= A∙zn + ωn
xn = C∙zn + εn
Model parameters:
θ={µ0, Q0, A, Q, C, R}
observation
hidden states
(details)
Z1 Z2 Z3 Z4
X1 X2 X3X4
N(A∙z1,Q)
N(µ0,Q0)
N(C∙z3,R)N(C∙z1,R) N(C∙z2,R) N(C∙z4,R) …
N(A∙z2,Q) N(A∙z3,Q) N(A∙z4,Q)
Gaussian
noise
![Page 214: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/214.jpg)
CMU SCS
Linear Dynamical Systems:
parametersname meaning & example
µ0 initial state for hidden
variable
e.g. initial position, velocity &
acceleration
A transition matrix how the states move forward, e.g.
soccer flying in the air
C transmission/ projection/
output matrix
hidden state observation, e.g.
camera taking picture of the soccer
Q0 Initial covariance
Q transition covariance how precision is the soccer motion
R transmission/ projection
covariance
i.e. observation noise; e.g. how
accurate is the cameraKDD 2010 Copyright: C. Faloutsos & L. Li, 2010 30
![Page 215: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/215.jpg)
CMU SCS
Estimating parameters of LDS• Given: a sequence of observations (e.g. car
positions)
• Find: best-fit parameters
• Basic principle: maximum likelihood
• Through Expectation-Maximization Alg.
31
Z1 Z2 Z3 Z4
X1 X2 X3X4
N(A∙z1,Q)
N(µ0,Q0)
N(C∙z3,R)N(C∙z1,R) N(C∙z2,R) N(C∙z4,R)…
N(A∙z2,Q) N(A∙z3,Q) N(A∙z4,Q)
![Page 216: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/216.jpg)
CMU SCS
Learning LDS: EM alg.
• E-step: Kalman filtering-smoothing
– Estimate the hidden variables based on
observation and current parameters
• M-step:
– Update parameters (A, C…) to
maximize
the log-likelihood.
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010 32
Z1
Z2
Z3
Z4
X1
X2
X3
X4
N(A∙z1,Q)
N(µ0,Q0)
N(C∙z3,R)N(C∙z1,R) N(C∙z2,R) N(C∙z4,R)…
N(A∙z2,Q) N(A∙z3,Q) N(A∙z4,Q)
[Shumway et al 1982, Ghahramani 1996]
![Page 217: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/217.jpg)
CMU SCS
EM alg. Intuition
33
E-step: compute hidden states using Kalman
filtering-smoothing (exactly as before)
Time*
observed
Position
** *
*
*
estimated
*
**
*
*
Reconstructed
signal after
smoothing
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010
![Page 218: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/218.jpg)
CMU SCS
EM alg.
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010 34
M-step: Maximizing the log-likelihood
by solving 0A
L
0C
L
...
![Page 219: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/219.jpg)
CMU SCS
Outline
• Intuition, example, and definition
• Extensions
– Handling missing values
– Switching LDS
– Particle filters
• Kalman filters at work
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010 35
![Page 220: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/220.jpg)
CMU SCS
Motivation Examples
• Motion Capture:– Markers on human actors
– Cameras used to track the 3D positions
– Duration: 100-500
– 93 dimensional body-local coordinates after preprocessing (31-bones)
• Sensor data missing due to:– Low battery
– RF error
36
From mocap.cs.cmu.edu
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010
![Page 221: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/221.jpg)
CMU SCS
Time
sensor 1
sensor 2
…
sensorm
blackout
Illustration of data
37KDD 2010 Copyright: C. Faloutsos & L. Li, 2010
![Page 222: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/222.jpg)
CMU SCS
DynaMMo: Intuition
38
Position of
Left hand
marker
Position of
right hand
marker
missing
Recover using
Correlation
among multiple
sequences
[Lei Li et al DynaMMo. KDD 2009]KDD 2010 Copyright: C. Faloutsos & L. Li, 2010
![Page 223: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/223.jpg)
CMU SCS
DynaMMo: Intuition
39
missing
Recover using
Dynamics
temporal moving
pattern
Position of
Left hand
marker
Position of
right hand
marker
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010
[Lei Li et al DynaMMo. KDD 2009]
![Page 224: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/224.jpg)
CMU SCS
LDS with Missing Value
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010 40
observation
hidden states Z1 Z2 Z3 Z4
X1 X2 X3X4
N(A∙z1,Q)
N(µ0,Q0)
N(C∙z3,R)N(C∙z1,R) N(C∙z2,R) N(C∙z4,R) …
N(A∙z2,Q) N(A∙z3,Q) N(A∙z4,Q)
partially
observed
![Page 225: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/225.jpg)
CMU SCS
DynaMMo Illustration:
estimate all colored elements
41x1 x2 x3
× × ×
×× ×
C C C
Details in [Li+2009]
z1 z2 z3
A A A
![Page 226: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/226.jpg)
CMU SCS
DynaMMo Illustration: step 1
estimate hidden variables
42x1 x2 x3
× × ×
×× ×
C C C
Details in [Li+2009]
z1 z2 z3
A A A
![Page 227: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/227.jpg)
CMU SCS
DynaMMo Illustration: step 2
estimate missing values
43x1 x2 x3
× × ×
×× ×
Details in [Li+2009]
z1 z2 z3
C C C
A A A
![Page 228: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/228.jpg)
CMU SCS
DynaMMo Illustration: step 3
update model parameters
44x1 x2 x3
× × ×
×× ×
C C C
Details in [Li+2009]
z1 z2 z3
A A A
![Page 229: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/229.jpg)
CMU SCS
DynaMMo Intuition, forward-
backward algorithm
• How to recover the missing values?
45x1 x2 x3
![Page 230: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/230.jpg)
CMU SCS
DynaMMo: How to Recover?
46
×
x1
hidden variables
e.g. velocity,
acceleration
projection
matrix C
z1
C
![Page 231: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/231.jpg)
CMU SCS
DynaMMo: How to Recover?
47
×
×
×
x1 x2
Transition
matrix
projection
matrix
z1 z2
A
C C
![Page 232: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/232.jpg)
CMU SCS
DynaMMo: How to Recover?
48
×
x1 x2
× ×
×z1 z2 z3
C C
x3
A A
![Page 233: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/233.jpg)
CMU SCS
DynaMMo: How to Recover?
49x1 x2 x3
× × ×
××
C C C
z1 z2 z3
A A
![Page 234: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/234.jpg)
CMU SCS
DynaMMo: How to Recover?
50x1 x2 x3
× × ×
×× ×
C C C
z1 z2 z3
similarly backward pass
A A A
![Page 235: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/235.jpg)
CMU SCS
DynaMMo Learning
51
Guess Initial
Estimate Hidden
Recover Missing
Update Model
Fix X,
Estimate P(Z|X;):E(zn|X;), E(znz’n|X;)E(znz’n+1|X;)
fix Z,
estimate
E(X_missing|Z;)Using E(zn|X;),
Fix both X and Z,
estimate new model
parameters
argmax E[log(X,Z;)]
Random
Guess model
parameters
1
2
3
0
(details)
![Page 236: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/236.jpg)
CMU SCS
Result – Better Missing Value Recovery
52
Reconstruction
error
Average missing length
Ideal
Proposed DynaMMo
MSVD
[Srebro’03]Linear
Interpolation
Spline
Dataset:
CMU Mocap #16
mocap.cs.cmu.edumore results in [Li+2009]
![Page 237: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/237.jpg)
CMU SCS
Outline
• Intuition, example, and definition
• Extensions
– Handling missing values
– Switching LDS
– Particle filters
• Kalman filters at work
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010 53
![Page 238: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/238.jpg)
CMU SCS
Switching LDS: Intuition
• Tracking human motion
– Start from slow walking
– Transition to running for a while
– Gradually stop
• Each part corresponds to a different
state & dynamics
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010 54
![Page 239: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/239.jpg)
CMU SCS
Switching LDS
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010 55
S={running,
walking}
Z2,1 Z2,2 Z2,3 Z2,4
X1 X2 X3X4
s1 s2 s3 s4
Z1,1 Z1,2 Z1,3 Z1,4
![Page 240: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/240.jpg)
CMU SCS
Switching LDS: example
S=1, walking
A=A1, C=C1
S=2, running
A=A2 , C=C2
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010 56
Z2,1 Z2,2 Z2,3 Z2,4
X1 X2 X3X4
s1 s2 s3 s4
Z1,1 Z1,2 Z1,3 Z1,4
![Page 241: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/241.jpg)
CMU SCS
Switching LDS in Penalty Kick
Before touching ball
S=1, running towards left
A=A1, C=C1
After hitting ball
S=2, kicking towards right
A=A2 , C=C2
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010 57
Z2,1 Z2,2 Z2,3 Z2,4
X1 X2 X3X4
s1 s2 s3 s4
Z1,1 Z1,2 Z1,3 Z1,4
From FIFA 2010
![Page 242: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/242.jpg)
CMU SCS
Estimating Parameters for SLDS
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010 58
Approximate inference: Variational EM
[Ghahramani&Hinton. Variational learning for switching state-space
models.2000]
Z2,1 Z2,2 Z2,3 Z2,4
X1 X2 X3X4
s1 s2 s3 s4
Z1,1 Z1,2 Z1,3 Z1,4
Z2,1 Z2,2 Z2,3 Z2,4
s1 s2 s3 s4
Z1,1 Z1,2 Z1,3 Z1,4
E-step
![Page 243: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/243.jpg)
CMU SCS
Outline
• Intuition, example, and definition
• Extensions
– Handling missing values
– Switching LDS
– Particle filters
• Kalman filters at work
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010 59
![Page 244: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/244.jpg)
CMU SCS
Particle Filters
• What if non-Gaussian noise?
• Inference using Markov chain Monte Carlo
sampling
60[Gordon et al, Nonlinear/non-gaussian bayesian state
estimation. 1993]
v1
a1
p1
v2
a2
p2
![Page 245: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/245.jpg)
CMU SCS
Outline
• Intuition, example, and definition
• Extensions
• Kalman filters at work
– Segmentation & Compression
– Parallel learning on Multi-core
– Motion Stitching
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010 61
![Page 246: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/246.jpg)
CMU SCS
How to Segment
• Segment by threshold on reconstruction error
62
original data
reconstruction
error
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010
[Lei Li et al DynaMMo. KDD 2009]
![Page 247: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/247.jpg)
CMU SCS
Results – Segmentation
• Find the transition during “running” to
“stop”.
63
left hip
left femur
reconstruction
error
KDD 2010
![Page 248: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/248.jpg)
CMU SCS
Results – Segmentation• Find the transition during “running” to
“stop”.
64
left hip
left femur
reconstruction
error
run stopslow
down
KDD 2010
![Page 249: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/249.jpg)
CMU SCS
How to Compress (DynaMMo)
65
Original data w/ missing values
hidden variables
keep only a portion
(optimal samples)
C
DynaMMo
A
![Page 250: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/250.jpg)
CMU SCS
Outline
• Intuition, example, and definition
• Extensions
• Kalman filters at work
– Segmentation & Compression
– Parallel learning on Multi-core
– Motion Stitching
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010 66
![Page 251: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/251.jpg)
CMU SCS
Challenge illustration
Expectation-Maximization Alg.
Step 1
Step 2
Step 3
Step 4
Step 5
Step 6
Step 7
Step 8
Timeline for E-step (forward-backward) in learning LDS
1 2 3 4 5
67
EM can
only use
single CPU
due to data
dependency
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010
![Page 252: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/252.jpg)
CMU SCS
Parallel learning by
Cut-And-Stitch Method
Step 1
Step 2
Step 3
Step 4
1 2 3 4 5
68
Goal:
with 2 CPUs
[Li et al 2008b]
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010
[Li et al Cut-And-Stitch. KDD 2008]
![Page 253: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/253.jpg)
CMU SCS
It works!
69
sp
eed
up
# of processors
ideal
Proposed
Cut-And-Stitch
EM algorithm
Dataset:
58 motion sequences
CMU Mocap #16
mocap.cs.cmu.edu,
tested on NCSA super
computer,
![Page 254: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/254.jpg)
CMU SCS
Outline
• Intuition, example, and definition
• Extensions
• Kalman filters at work
– Segmentation & Compression
– Parallel learning on Multi-core
– Motion Stitching
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010 70
![Page 255: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/255.jpg)
CMU SCS
Motion Stitching
A Database Approach• Select best
stitchable
segments from a
set of basic motion
pieces and
generate new
natural motions
71KDD 2010 Copyright: C. Faloutsos & L. Li, 2010
![Page 256: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/256.jpg)
CMU SCS
Problem Definition
• Given two motion-capture sequences that are to be
stitched together, how can we assess the goodness
of the stitching?
72
12
3
Which stitching looks best?
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010
![Page 257: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/257.jpg)
CMU SCS
Proposed Method:
Laziness Score [Li+2008a]
• Conjecture: less human effort more natural
• Proposed: use Kalman filters to estimate position,
velocity, acceleration Compute effort/ energy
73KDD 2010 Copyright: C. Faloutsos & L. Li, 2010
![Page 258: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/258.jpg)
CMU SCS
Which continues to?
Green or Blue?
74
straight moving U-Turn
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010
![Page 259: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/259.jpg)
CMU SCS
Result – Laziness-score prefers
straightforward moving
75
straight moving U-Turn
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010
[Li et al Laziness Score. Eurographics 2008]
![Page 260: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/260.jpg)
CMU SCS
Conclusion• Intuition, example, and definition
– Original kalman filter (known parameters)• Kalman filtering
• Kalman smoothing
– Kalman filters with parameter estimation (EM)
• Extensions– Handling missing values
– Switching linear dynamical
– Particle filters (MCMC sampling)
• Kalman filters at work– Segmentation & compression
– Parallel learning
– Motion stitchingKDD 2010 Copyright: C. Faloutsos & L. Li, 2010 76
![Page 261: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/261.jpg)
CMU SCS
References
• Zoubin Ghahramani and Geoffrey E. Hinton. Parameter estimation for linear dynamical systems. 1996
• Zoubin Ghahramani and Geoffrey E. Hinton. Variational learning for switching state-space models. Neural Computation, 12(4):831–864, 2000.
• N. J. Gordon, D. J. Salmond, and A. F. M. Smith. Novel approach to nonlinear/non-gaussian bayesian state estimation. IEE Proceedings F Radar and Signal Processing, 140(2):107–113, 1993.
• R. H. Shumway and D. S. Stoffer. An approach to time series smoothing and forecasting using the em algorithm. Journal of Time Series Analysis, 3:253–264, 1982.
• R. E. Kalman. A new approach to linear filtering and prediction problems. Transactions of the ASME šC Journal of Basic Engineering, (82 (Series D)):35–45, 1960.
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010 77
![Page 262: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/262.jpg)
CMU SCS
References (cont’)
• H. Rauch, F Tung, and C T Striebel. Maximum likelihood estimates of linear dynamic systems. AIAA Journal, pages 1445–1450, 1965.
• Lei Li, Wenjie Fu, Fan Guo, Todd C Mowry, and Christos Faloutsos. Cut-and-stitch: efficient parallel learning of linear dynamical systems on smps. In Proceedings of the 14th ACM SIGKDD, pages 471–479, 2008.
• Lei Li, James McCann, Christos Faloutsos, and Nancy Pollard. Laziness is a virtue: Motion stitching using effort minimization. In Short Papers Proceedings of EUROGRAPHICS, 2008.
• Lei Li, James McCann, Nancy S Pollard, and Christos Faloutsos. DynaMMo: mining and summarization of coevolving sequences with missing values. In Proceedings of the 15th ACM SIGKDD,, pages 507–516, 2009.
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010 78
![Page 263: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/263.jpg)
CMU SCS
Software
• DynaMMo code (matlab) for missing value,
compression & segmentation.
• Parallel learning (in C) for LDS
• http://www.cs.cmu.edu/~leili/
• http://www.cs.cmu.edu/~leili/pubs/dynamm
o.2.1.2.zip
• http://www.cs.cmu.edu/~leili/paralearn/para
learn.0.1.zip (running on gcc 4.2.0 above)
KDD 2010 Copyright: C. Faloutsos & L. Li, 2010 79
![Page 264: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/264.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 212
![Page 265: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/265.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 213
Outline• Motivation
• Similarity Search and Indexing
• DSP
• Linear Forecasting
• Kalman filters
• Bursty traffic - fractals and multifractals
• Non-linear forecasting
• Conclusions
![Page 266: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/266.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 214
Outline
• Motivation
• ...
• Linear Forecasting
• Bursty traffic - fractals and multifractals
– Problem
– Main idea (80/20, Hurst exponent)
– Results
![Page 267: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/267.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 215
Recall: Problem #1:
Goal: given a signal (eg., #bytes over time)
Find: patterns, periodicities, and/or compress
time
#bytes Bytes per 30‟
(packets per day;
earthquakes per year)
![Page 268: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/268.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 216
Problem #1
• model bursty traffic
• generate realistic traces
• (Poisson does not work)
time
# bytes
Poisson
![Page 269: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/269.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 217
Motivation
• predict queue length distributions (e.g., to
give probabilistic guarantees)
• “learn” traffic, for buffering, prefetching,
„active disks‟, web servers
![Page 270: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/270.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 218
Q: any „pattern‟?
time
# bytes• Not Poisson
• spike; silence; more
spikes; more silence…
• any rules?
![Page 271: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/271.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 219
solution: self-similarity
# bytes
time time
# bytes
![Page 272: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/272.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 220
solution: self-similarity
# bytes
time time
# bytes
![Page 273: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/273.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 221
solution: self-similarity
# bytes
time time
# bytes
![Page 274: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/274.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 222
solution: self-similarity
# bytes
time time
# bytes
![Page 275: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/275.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 223
But:
• Q1: How to generate realistic traces;
extrapolate; give guarantees?
• Q2: How to estimate the model parameters?
![Page 276: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/276.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 224
Outline
• Motivation
• ...
• Linear Forecasting
• Bursty traffic - fractals and multifractals
– Problem
– Main idea (80/20, Hurst exponent)
– Results
![Page 277: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/277.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 225
Approach
• Q1: How to generate a sequence, that is
– bursty
– self-similar
– and has similar queue length distributions
![Page 278: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/278.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 226
Approach
• A: „binomial multifractal‟ [Wang+02]
• ~ 80-20 „law‟:
– 80% of bytes/queries etc on first half
– repeat recursively
• b: bias factor (eg., 80%)
![Page 279: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/279.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 227
binary multifractals
20 80
![Page 280: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/280.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 228
binary multifractals
20 80
![Page 281: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/281.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 229
Parameter estimation
• Q2: How to estimate the bias factor b?
![Page 282: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/282.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 230
Parameter estimation
• Q2: How to estimate the bias factor b?
• A: MANY ways [Crovella+96]
– Hurst exponent
– variance plot
– even DFT amplitude spectrum! („periodogram‟)
– More robust: „entropy plot‟ [Wang+02]
![Page 283: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/283.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 231
Entropy plot
• Rationale:
– burstiness: inverse of uniformity
– entropy measures uniformity of a distribution
– find entropy at several granularities, to see
whether/how our distribution is close to
uniform.
![Page 284: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/284.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 232
Entropy plot
• Entropy E(n) after n
levels of splits
• n=1: E(1)= - p1 log2(p1)-
p2 log2(p2)
p1 p2
% of bytes
here
![Page 285: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/285.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 233
Entropy plot
• Entropy E(n) after n
levels of splits
• n=1: E(1)= - p1 log(p1)-
p2 log(p2)
• n=2: E(2) = - Si p2,i *
log2 (p2,i)
p2,1 p2,2 p2,3 p2,4
![Page 286: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/286.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 234
Real traffic
• Has linear entropy plot
(-> self-similar)
# of levels (n)
Entropy
E(n)
0.73
![Page 287: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/287.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 235
Observation - intuition:
intuition: slope =
intrinsic dimensionality =
info-bits per coordinate-bit
– unif. Dataset: slope =1
– multi-point: slope = 0
# of levels (n)
Entropy
E(n)
Skip
![Page 288: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/288.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 236
Entropy plot - Intuition
• Slope ~ intrinsic dimensionality (in fact,
„Information fractal dimension‟)
• = info bit per coordinate bit - eg
Dim = 1
Pick a point;
reveal its coordinate bit-by-bit -
how much info is each bit worth to me?
Skip
![Page 289: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/289.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 237
Entropy plot
• Slope ~ intrinsic dimensionality (in fact,
„Information fractal dimension‟)
• = info bit per coordinate bit - eg
Dim = 1
Is MSB 0?
„info‟ value = E(1): 1 bit
Skip
![Page 290: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/290.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 238
Entropy plot
• Slope ~ intrinsic dimensionality (in fact,
„Information fractal dimension‟)
• = info bit per coordinate bit - eg
Dim = 1
Is MSB 0?
Is next MSB =0?
Skip
![Page 291: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/291.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 239
Entropy plot
• Slope ~ intrinsic dimensionality (in fact,
„Information fractal dimension‟)
• = info bit per coordinate bit - eg
Dim = 1
Is MSB 0?
Is next MSB =0?
Info value =1 bit
= E(2) - E(1) =
slope!
Skip
![Page 292: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/292.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 240
Entropy plot
• Repeat, for all points at same position:
Dim=0
Skip
![Page 293: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/293.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 241
Entropy plot
• Repeat, for all points at same position:
• we need 0 bits of info, to determine position
• -> slope = 0 = intrinsic dimensionality
Dim=0
Skip
![Page 294: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/294.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 242
Entropy plot
• Real (and 80-20) datasets can be in-
between: bursts, gaps, smaller bursts,
smaller gaps, at every scale
Dim = 1
Dim=0
0<Dim<1
Skip
![Page 295: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/295.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 243
(Fractals)
• What set of points could have behavior
between point and line?
![Page 296: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/296.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 244
Cantor dust
• Eliminate the middle third
• Recursively!
![Page 297: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/297.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 245
Cantor dust
![Page 298: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/298.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 246
Cantor dust
![Page 299: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/299.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 247
Cantor dust
![Page 300: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/300.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 248
Cantor dust
![Page 301: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/301.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 249
Dimensionality?
(no length; infinite # points!)
Answer: log2 / log3 = 0.6
Cantor dust
![Page 302: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/302.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 250
Some more entropy plots:
• Poisson vs real
Poisson: slope = ~1 -> uniformly distributed
1 0.73
![Page 303: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/303.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 251
B-model
• b-model traffic gives perfectly
linear plot
• Lemma: its slope is
slope = -b log2b - (1-b) log2 (1-b)
• Fitting: do entropy plot; get
slope; solve for b
E(n)
n
![Page 304: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/304.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 252
Outline
• Motivation
• ...
• Linear Forecasting
• Bursty traffic - fractals and multifractals
– Problem
– Main idea (80/20, Hurst exponent)
– Experiments - Results
![Page 305: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/305.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 253
Experimental setup
• Disk traces (from HP [Wilkes 93])
• web traces from LBL
http://repository.cs.vt.edu/
lbl-conn-7.tar.Z
![Page 306: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/306.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 254
Model validation
• Linear entropy plots
Bias factors b: 0.6-0.8
smallest b / smoothest: nntp traffic
![Page 307: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/307.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 255
Web traffic - results
• LBL, NCDF of queue lengths (log-log scales)
(queue length l)
Prob( >l)
How to give guarantees?
![Page 308: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/308.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 256
Web traffic - results
• LBL, NCDF of queue lengths (log-log scales)
(queue length l)
Prob( >l)20% of the requests
will see
queue lengths <100
![Page 309: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/309.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 257
Conclusions
• Multifractals (80/20, „b-model‟,
Multiplicative Wavelet Model (MWM)) for
analysis and synthesis of bursty traffic
• can give (probabilistic) guarantees
![Page 310: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/310.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 258
Books
• Fractals: Manfred Schroeder: Fractals, Chaos, Power
Laws: Minutes from an Infinite Paradise W.H. Freeman
and Company, 1991 (Probably the BEST book on
fractals!)
![Page 311: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/311.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 259
Further reading:
• Crovella, M. and A. Bestavros (1996). Self-Similarity in
World Wide Web Traffic, Evidence and Possible Causes.
Sigmetrics.
• [ieeeTN94] W. E. Leland, M.S. Taqqu, W. Willinger,
D.V. Wilson, On the Self-Similar Nature of Ethernet
Traffic, IEEE Transactions on Networking, 2, 1, pp 1-15,
Feb. 1994.
![Page 312: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/312.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 260
Further reading
• [Riedi+99] R. H. Riedi, M. S. Crouse, V. J. Ribeiro, and R.
G. Baraniuk, A Multifractal Wavelet Model with
Application to Network Traffic, IEEE Special Issue on
Information Theory, 45. (April 1999), 992-1018.
• [Wang+02] Mengzhi Wang, Tara Madhyastha, Ngai Hang
Chang, Spiros Papadimitriou and Christos Faloutsos, Data
Mining Meets Performance Evaluation: Fast Algorithms
for Modeling Bursty Traffic, ICDE 2002, San Jose, CA,
2/26/2002 - 3/1/2002.
![Page 313: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/313.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 261
![Page 314: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/314.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 262
Outline
• Motivation
• Similarity Search and Indexing
• DSP
• Linear Forecasting
• Kalman filters
• Bursty traffic - fractals and multifractals
• Non-linear forecasting
• Conclusions
![Page 315: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/315.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 263
Detailed Outline
• Non-linear forecasting
– Problem
– Idea
– How-to
– Experiments
– Conclusions
![Page 316: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/316.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 264
Recall: Problem #1
Given a time series {xt}, predict its future
course, that is, xt+1, xt+2, ...
Time
Value
![Page 317: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/317.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 265
How to forecast?
• ARIMA - but: linearity assumption
• ANSWER: „Delayed Coordinate
Embedding‟ = Lag Plots [Sauer92]
![Page 318: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/318.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 266
General Intuition (Lag Plot)
xt-1
xt
4-NNNew Point
Interpolate
these…
To get the final
prediction
Lag = 1,
k = 4 NN
![Page 319: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/319.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 267
Questions:
• Q1: How to choose lag L?
• Q2: How to choose k (the # of NN)?
• Q3: How to interpolate?
• Q4: why should this work at all?
![Page 320: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/320.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 268
Q1: Choosing lag L
• Manually (16, in award winning system by
[Sauer94])
![Page 321: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/321.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 269
Q2: Choosing number of
neighbors k
• Manually (typically ~ 1-10)
![Page 322: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/322.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 270
Q3: How to interpolate?
How do we interpolate between the
k nearest neighbors?
A3.1: Average
A3.2: Weighted average (weights drop
with distance - how?)
![Page 323: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/323.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 271
Q3: How to interpolate?
A3.3: Using SVD - seems to perform best
([Sauer94] - first place in the Santa Fe
forecasting competition)
Xt-1
xt
![Page 324: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/324.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 272
Q4: Any theory behind it?
A4: YES!
![Page 325: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/325.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 273
Theoretical foundation
• Based on the “Takens‟ Theorem”
[Takens81]
• which says that long enough delay vectors
can do prediction, even if there are
unobserved variables in the dynamical
system (= diff. equations)
![Page 326: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/326.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 274
Theoretical foundation
Example: Lotka-Volterra equations
dH/dt = r H – a H*P
dP/dt = b H*P – m P
H is count of prey (e.g., hare)
P is count of predators (e.g., lynx)
Suppose only P(t) is observed (t=1, 2, …).
H
P
Skip
![Page 327: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/327.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 275
Theoretical foundation
• But the delay vector space is a faithful
reconstruction of the internal system state
• So prediction in delay vector space is as
good as prediction in state space
Skip
H
P
P(t-1)
P(t)
![Page 328: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/328.jpg)
CMU SCS
Solution to Volterra-Lotka eq.
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 276from wikipedia
time# prey
# predators
prey
predators
![Page 329: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/329.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 277
Detailed Outline
• Non-linear forecasting
– Problem
– Idea
– How-to
– Experiments
– Conclusions
![Page 330: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/330.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 278
Datasets
Logistic Parabola:xt = axt-1(1-xt-1) + noise Models population of flies [R. May/1976]
time
x(t
)
Lag-plot
![Page 331: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/331.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 279
Datasets
Logistic Parabola:xt = axt-1(1-xt-1) + noise Models population of flies [R. May/1976]
time
x(t
)
Lag-plot
ARIMA: fails
![Page 332: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/332.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 280
Logistic Parabola
Timesteps
Value
Our Prediction from
here
![Page 333: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/333.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 281
Logistic Parabola
Timesteps
Value
Comparison of prediction
to correct values
![Page 334: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/334.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 282
Datasets
LORENZ: Models convection currents in the air
dx / dt = a (y - x)
dy / dt = x (b - z) - y
dz / dt = xy - c z
Skip
![Page 335: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/335.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 283
LORENZ
Timesteps
Value
Comparison of prediction
to correct values
Skip
![Page 336: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/336.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 284
Datasets
Time
Value
• LASER: fluctuations in a Laser over time (used in Santa Fe competition)
Skip
![Page 337: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/337.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 285
Laser
Timesteps
Value
Comparison of prediction
to correct values
Skip
![Page 338: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/338.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 286
Conclusions
• Lag plots for non-linear forecasting
(Takens‟ theorem)
• suitable for „chaotic‟ signals
![Page 339: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/339.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 287
References
• Deepay Chakrabarti and Christos Faloutsos F4: Large-
Scale Automated Forecasting using Fractals CIKM 2002,
Washington DC, Nov. 2002.
• Sauer, T. (1994). Time series prediction using delay
coordinate embedding. (in book by Weigend and
Gershenfeld, below) Addison-Wesley.
• Takens, F. (1981). Detecting strange attractors in fluid
turbulence. Dynamical Systems and Turbulence. Berlin:
Springer-Verlag.
![Page 340: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/340.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 288
References
• Weigend, A. S. and N. A. Gerschenfeld (1994). Time
Series Prediction: Forecasting the Future and
Understanding the Past, Addison Wesley. (Excellent
collection of papers on chaotic/non-linear forecasting,
describing the algorithms behind the winners of the Santa
Fe competition.)
![Page 341: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/341.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 289
Overall conclusions
• Similarity search: Euclidean/time-warping;
feature extraction and SAMs
![Page 342: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/342.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 290
Overall conclusions
• Similarity search: Euclidean/time-warping;
feature extraction and SAMs
• Signal processing: DWT is a powerful tool
![Page 343: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/343.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 291
Overall conclusions
• Similarity search: Euclidean/time-warping;
feature extraction and SAMs
• Signal processing: DWT is a powerful tool
• Linear Forecasting: AR (Box-Jenkins)
![Page 344: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/344.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 292
Overall conclusions
• Similarity search: Euclidean/time-warping;
feature extraction and SAMs
• Signal processing: DWT is a powerful tool
• Linear Forecasting: AR (Box-Jenkins)
• Kalman filters & extensions: forecasting,
pattern discovery, segmentation
![Page 345: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/345.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 293
Overall conclusions
• Similarity search: Euclidean/time-warping;
feature extraction and SAMs
• Signal processing: DWT is a powerful tool
• Linear Forecasting: AR (Box-Jenkins)
• Kalman filters & extensions: forecasting,
pattern discovery, segmentation
• Bursty traffic: multifractals (80-20 „law‟)
![Page 346: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/346.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 294
Overall conclusions
• Similarity search: Euclidean/time-warping;
feature extraction and SAMs
• Signal processing: DWT is a powerful tool
• Linear Forecasting: AR (Box-Jenkins)
• Kalman filters & extensions: forecasting,
pattern discovery, segmentation
• Bursty traffic: multifractals (80-20 „law‟)
• Non-linear forecasting: lag-plots (Takens)
![Page 347: Indexing and Mining Streamscs.kangwon.ac.kr/~ysmoon/courses/2011_1/grad_mining/slides/03.pdf · 1 11 21 31 41 51 61 71 81 91 101 111 121 Fourier appx actual • Dow Jones Industrial](https://reader033.fdocuments.us/reader033/viewer/2022060606/605b3020beb76723cc1289c2/html5/thumbnails/347.jpg)
CMU SCS
KDD 2010 (c) 2010, C. Faloutsos, Lei Li 295
christos AT cs.cmu.edu
www.cs.cmu.edu/~christos
leili AT cs.cmu.edu
www.cs.cmu.edu/~leili
www.cs.cmu.edu/~leili/pubs/dynammo.2.1.2.zip
THANK YOU!