Parsimonious Linear Fingerprinting for Time Series

Lei Li, joint work with B. Aditya Prakash and Christos Faloutsos

School of Computer Science, Carnegie Mellon University

Machine Learning Lunch, 11/29/2010

Why study time series?

Motion Capture

• Synthesize human motion

• Design robots to assist the disabled

[Figure: right hand and left hand signals during a walking motion]

Network Security

• Anomaly detection in computer networks

[Figure: network traffic]

Healthcare

• Monitoring physiologic signals in the ICU

• Help early detection of fatal events using forecasting and classification/clustering

[Figure: blood pressure and respiratory rate signals]

Datacenter Monitoring

• Monitoring a datacenter with 5000 servers: 1 TB of data per day, 55 million streams [Reeves+ 2009]

• Goal: save energy in data centers

– In the US alone, $4.5 billion in power consumption (2006)

[Figure: temperature in a datacenter]

Environmental Monitoring

• Chlorine sensor in drinking water systems

Central Problem

• Estimate “similarity” among time sequences

Are they similar?

Extract features (e.g. average, Fourier) from each sequence, then compare the features:

Distance(features of sequence 1, features of sequence 2)
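
The feature-then-distance idea above can be sketched in a few lines. This is only an illustration with assumed choices: the function names `features`, `distance`, and `most_similar`, the use of the mean plus the first three Fourier magnitudes as features, and the toy sequences are not from the talk.

```python
import cmath
import math

def features(seq, n_freqs=3):
    """Return [mean, |F_1|, ..., |F_n_freqs|], where F_k is the k-th DFT coefficient."""
    n = len(seq)
    feats = [sum(seq) / n]
    for k in range(1, n_freqs + 1):
        coef = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(seq))
        feats.append(abs(coef) / n)
    return feats

def distance(f1, f2):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))

def most_similar(query, candidates):
    """Index of the candidate whose features are closest to the query's."""
    fq = features(query)
    return min(range(len(candidates)),
               key=lambda i: distance(fq, features(candidates[i])))

# Toy data (hypothetical): a slow wave, a fast wave, and a slightly weaker slow wave.
query = [math.sin(2 * math.pi * t / 100) for t in range(200)]
candidates = [
    [math.sin(2 * math.pi * t / 20) for t in range(200)],
    [0.9 * math.sin(2 * math.pi * t / 100) for t in range(200)],
]
best = most_similar(query, candidates)
```

Here `best` picks the second candidate, because it shares the query's dominant frequency. Hand-picked features like these are exactly what the rest of the talk tries to improve on.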

With a similarity function: find the most similar motion sequence

SELECT * FROM … WHERE time_seq.… LIKE [image: query motion]

Underlying Question

What are good features / “fingerprints”?

e.g. length = 4k, in Chlorine measurement

Requirements of good features:

1. Time shift

2. Frequency proximity

3. Grouping harmonics

Benefits

Good features / “fingerprints” help with

• Answering similarity queries

• Clustering/Classification

• Compression

• Forecasting

Outline

• Motivation & Problem Definition

• An Obvious Solution & Related Work

• Proposed Method: Intuition & Example

• Experiments & Results

• PLiF: High Level Ideas

• PLiF: Low Level Details

• Conclusion

Related Work

• Time series indexing [Keogh et al 02,04, …]

• Distance function

– Euclidean distance [Rafiei et al 97, Ogras et al 06, …]

– Dynamic time warping [Fu et al 05, Keogh 02, …]

• Fourier / Wavelets [Gilbert et al 01, Jahangiri et al 01, …]

• Dimensionality Reduction

– PCA / SVD [Jolliffe 86]

– ICA [Hyvarinen et al 01]

An Obvious (but failed) Solution

• Principal Component Analysis (PCA) / SVD

– Dimensionality reduction

– Very effective in many cases with high dimensional data

– “Swiss army knife” in linear algebra

Solved?

Let us try the “Swiss Army Knife”

PCA gets confused

[Figure: PCA on 49 motion sequences; legend: walking motion, running motion]

Outline

• Motivation

• An Obvious Solution & Related Work

• Proposed Method: Intuition & Example

• Experiments & Results

• PLiF: High Level Ideas

• PLiF: Low Level Details

• Conclusion

Why does PCA fail?

• Properties of Good features / “fingerprints”:

(a) Time shift / lag independent

(b) frequency proximity

(c) grouping harmonics

Example: synthetic signals

Equations

(X1) sin(2πt/100)

(X2) cos(2πt/100)

(X3) sin(2πt/98 + π/6)

(X4) sin(2πt/110) + 0.2 sin(2πt/30)

(X5) cos(2πt/110) + 0.2 sin(2πt/30 + π/4)
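
The five equations above are easy to generate for experimentation. A minimal sketch follows; the function name `synthetic_signals` and the default length of 500 ticks are assumptions (500 appears on later slides next to these signals, but the talk does not state the length here).

```python
import math

def synthetic_signals(length=500):
    """Generate the five synthetic signals X1..X5 for t = 0 .. length-1."""
    two_pi = 2 * math.pi
    t = range(length)
    return {
        "X1": [math.sin(two_pi * i / 100) for i in t],
        "X2": [math.cos(two_pi * i / 100) for i in t],
        "X3": [math.sin(two_pi * i / 98 + math.pi / 6) for i in t],
        "X4": [math.sin(two_pi * i / 110) + 0.2 * math.sin(two_pi * i / 30) for i in t],
        "X5": [math.cos(two_pi * i / 110)
               + 0.2 * math.sin(two_pi * i / 30 + math.pi / 4) for i in t],
    }

signals = synthetic_signals()
```

X1 and X2 differ only by a time shift, X3 has a nearby frequency, and X4 and X5 share the same pair of harmonics, which matches the three requirements discussed next.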

Intuition of “fingerprints”

(a) Time shift

e.g. left-foot-start walking vs. right-foot-start walking

[Figure: signals (X1) to (X5)]
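
To see why a time shift matters, one can compare (X1) and (X2) point by point and in the frequency domain. The sketch below is illustrative only; the helper names `correlation` and `amplitude_at` and the length of 500 ticks are assumptions.

```python
import cmath
import math

N = 500  # assumed length; five full periods of a period-100 wave
x1 = [math.sin(2 * math.pi * t / 100) for t in range(N)]
x2 = [math.cos(2 * math.pi * t / 100) for t in range(N)]

def correlation(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(va * vb)

def amplitude_at(seq, period):
    """Amplitude of the sinusoidal component with the given period (in ticks)."""
    n = len(seq)
    coef = sum(x * cmath.exp(-2j * math.pi * t / period) for t, x in enumerate(seq))
    return 2 * abs(coef) / n

corr = correlation(x1, x2)      # close to 0: point-wise they look unrelated
amp1 = amplitude_at(x1, 100)    # close to 1
amp2 = amplitude_at(x2, 100)    # close to 1: same harmonic, only the phase differs
```

The two signals are uncorrelated tick by tick, yet they carry the same harmonic at the same strength. A lag-independent fingerprint should treat them as the same.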

Intuition of “fingerprints”

(b) Nearby frequency

e.g. running vs. fast running

[Figure: signals (X1) to (X5)]

Intuition of “fingerprints”

(c) Groups of harmonics

~ human voices

[Figure: signals (X1) to (X5)]

How to extract fingerprints?

Proposed PLiF (details later)

        fingerprint1                 fingerprint2
        (harmonics 1/110 & 1/30)     (harmonics 1/100)
(X1)    0.1                          1.0
(X2)    0.1                          1.0
(X3)    0.1                          0.9
(X4)    1.0                          0.1
(X5)    1.0                          0.1

Labels on the slide: “walking”, “running”

For experts: why not SVD/PCA?

PCA: no clear grouping. Confused!

[Figure: for signals (X1) to (X5) of length 500, PCA components (PC1, PC2) next to PLiF fingerprints (FP1, FP2); color scale from − to +]
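
For readers who want to reproduce the PCA side of this comparison, here is a minimal sketch using NumPy's SVD on the five synthetic signals. The function name `pca_scores`, the choice to center across sequences, and the length of 500 are assumptions; the talk does not say how its PCA was configured.

```python
import numpy as np

t = np.arange(500)  # assumed length
X = np.vstack([
    np.sin(2 * np.pi * t / 100),
    np.cos(2 * np.pi * t / 100),
    np.sin(2 * np.pi * t / 98 + np.pi / 6),
    np.sin(2 * np.pi * t / 110) + 0.2 * np.sin(2 * np.pi * t / 30),
    np.cos(2 * np.pi * t / 110) + 0.2 * np.sin(2 * np.pi * t / 30 + np.pi / 4),
])

def pca_scores(X, k=2):
    """Project each row (one sequence) onto the top-k principal components."""
    Xc = X - X.mean(axis=0)                      # center every time tick
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * S[:k], Vt[:k]              # scores (rows x k), components (k x time)

scores, components = pca_scores(X, k=2)
print(np.round(scores, 2))  # one row of (PC1, PC2) per signal
```

Inspecting the printed scores shows whether signals with the same harmonics land together. PCA works on the raw time ticks, so a phase shift between two sequences changes their coordinates.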

Beyond features

Find good/interpretable fingerprints for:

Functional goals

G1. Good clustering

G2. Good compression

G3. Ability to forecast

Computational goals

G4. Scalability

Requirements

(a) lag independent

(b) frequency proximity

(c) grouping harmonics

Outline

• Motivation

• An Obvious Solution & Related Work

• Proposed Method: Intuition & Example

• Experiments & Results

• PLiF: High Level Ideas

• PLiF: Low Level Details

• Conclusion

Datasets

Mocap: size 49 × 100-500, mocap.cs.cmu.edu

BGP network traffic: size 10 × 103k, www.datapository.net

Chlorine: size 166 × 4k, courtesy of Prof. J. M. VanBriesen

Experiment: Goals to meet

Find good/interpretable fingerprints for:

Functional goals

G1. Good clustering

G2. Good compression

G3. Ability to forecast

Computational goals

G4. Scalability

Result – Clustering

PLiF, two “fingerprints”: accuracy = 93.9%

PCA, top 2 components: accuracy = 51.0%

[Figure legend: walking motion, running motion]

Result – Clustering

BGP data: PLiF + hierarchical clustering

Experiment: Goals

Find good/interpretable fingerprints for:

G1. Good clustering

G2. Good compression

G3. Ability to forecast

G4. Scalability

Result - Compression

Chlorine: 166 × 4k

Storing only the PLiF features & a sampling of the hidden variables

[Figure: error vs. compression ratio; annotation: Ideal]

Result - Compression

Mocap: 93 × 300

Storing only the PLiF features & a sampling of the hidden variables

[Figure: error vs. compression ratio; annotation: Ideal]

Experiment: Goals

Find good/interpretable fingerprints for:

G1. Good clustering

G2. Good compression

G3. Ability to forecast (later)

G4. Scalability

Scalability

Linear in sequence length!

[Figures: wall clock time (s) vs. sequence length]

Outline

• Motivation

• An Obvious Solution & Related Work

• Proposed Method: Intuition & Example

• Experiments & Results

• PLiF: High Level Ideas

• PLiF: Low Level Details

• Conclusion

How does PLiF work?

Find hidden variables / harmonics

[Figure: signals (X1) to (X5), length 500, and hidden variables HV1, HV2, HV3]

Mixing weights

        HV1    HV2    HV3
(X1)    0      0      1.0
(X2)    0      0      1.0
(X3)    0      0      0.9
(X4)    1.0    1.0    0
(X5)    1.0    1.0    0

An Analogy for Hidden Variables

Sound sources: harmonic f=1/100; harmonics f=1/110 and f=1/30

Observations (microphones): (X1) (X2) (X3) (X4) (X5)

Mixing weights = participation strength of the sound sources in each observation (mic.)

Grouping Correlated Harmonics

Harmonics f=1/110 and f=1/30 appear together in (X4) and (X5)

HV1’ = {HV1, HV2}

Fingerprints

        HV1’    HV2’ (=HV3)
(X1)    0       1.0
(X2)    0       1.0
(X3)    0       0.9
(X4)    1.0     0
(X5)    1.0     0

How to interpret?

Proposed PLiF

Fingerprint 1: group of harmonics 1/110 & 1/30

Fingerprint 2: harmonic 1/100

[Figure: fingerprint values on a color scale from − to +]

Outline

• Motivation

• An Obvious Solution & Related Work

• Proposed Method: Intuition & Example

• Experiments & Results

• PLiF: High Level Ideas

• PLiF: Low Level Details

• Conclusion

Steps of PLiF: Overview

S1. Learning Dynamics (Kalman/LDS)

S2. Finding Canonical Form (eigenvalue decomposition)

S3. Handling the Lag

S4. Grouping Harmonics (SVD)

Why do …?

PLiF alg. steps

S1. Learning Dynamics

S2. Canonical Form

S3. Handling Lag

S4. Grouping Harmonics

Requirements of fingerprints

(a) lag independent

(b) frequency proximity

(c) grouping harmonics

PLiF Goals

G1. Good clustering

G2. Good compression

G3. Ability to forecast

G4. Scalability

Warning! A lot of math

Only if you want to implement

http://www.cs.cmu.edu/~leili/

Step 1. Learning Dynamics

• Use Kalman filters / Linear Dynamical Systems (LDS) to learn the hidden variables

Underlying Model: Linear Dynamical Systems (details)
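
The model equations did not survive in this transcript. For orientation, a linear dynamical system is conventionally written as below. The symbols z (hidden variables), x (observations), and the noise covariances Q and R are textbook notation assumed here, not taken from the slide; A and C are the transition matrix and the mixing/output matrix named on the following slides.

```latex
\begin{aligned}
z_{t+1} &= A\, z_t + \omega_t, & \omega_t &\sim \mathcal{N}(0, Q) \\
x_t &= C\, z_t + \varepsilon_t, & \varepsilon_t &\sim \mathcal{N}(0, R)
\end{aligned}
```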

Dynamics/Transition in Hidden Variables

HV(t+1) = A · HV(t), where A is the transition matrix

- enables forecasting
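
The forecasting claim can be illustrated by rolling the transition forward. This is a noise-free sketch under assumptions: the helper names `mat_vec` and `forecast`, the 2×2 rotation used as A (one harmonic with period 100), and the output matrix C are all chosen for illustration, not learned from data as in the talk.

```python
import math

def mat_vec(A, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def forecast(A, C, hv, steps):
    """Apply HV(t+1) = A * HV(t) repeatedly and emit the observation C * HV each step."""
    out = []
    for _ in range(steps):
        hv = mat_vec(A, hv)
        out.append(mat_vec(C, hv))
    return out

theta = 2 * math.pi / 100                        # one harmonic with period 100
A = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]        # rotation by theta per tick
C = [[1.0, 0.0]]                                 # observe the first hidden variable
hv0 = [1.0, 0.0]                                 # hidden state at t = 0

pred = forecast(A, C, hv0, steps=100)            # pred[k-1][0] equals cos(k * theta)
```

After 50 steps the forecast reaches the trough of the wave and after 100 steps it returns to the start, so the learned dynamics alone are enough to extend the sequence.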

Learning Mixing Weights

Mixing/output matrix C

[Figure: entries of C on a color scale from − to +]

Expectation-Maximization algorithm [Ghahramani 96]

Step 2: Canonicalization

• Use eigen-decomposition of the transition matrix A to find the “harmonics” and the mixing weights of the harmonics
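
Why the eigenvalues of A reveal harmonics can be seen on a toy 2×2 case. In the sketch below, the helpers `eig2x2` and `harmonic` and the example matrix are assumptions for illustration; a learned A is usually larger and would need a general eigensolver such as `numpy.linalg.eig`.

```python
import cmath
import math

def eig2x2(A):
    """Eigenvalues of a 2x2 matrix via the characteristic polynomial."""
    (a, b), (c, d) = A
    tr, det = a + d, a * d - b * c
    disc = cmath.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2

def harmonic(lam):
    """Return (frequency in cycles per tick, per-tick growth factor) of an eigenvalue."""
    return abs(cmath.phase(lam)) / (2 * math.pi), abs(lam)

theta = 2 * math.pi / 110
A = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]

lam1, lam2 = eig2x2(A)          # a complex-conjugate pair
freq, growth = harmonic(lam1)   # freq close to 1/110, growth close to 1
```

The angle of each eigenvalue gives the frequency and its modulus gives the growing or shrinking trend: a modulus of 1 is a steady oscillation, below 1 decays, above 1 grows.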

Canonicalization adds Interpretability

[Figure: hidden variables before canonicalization, and time series of the hidden variables after canonicalization (real part). The resulting “harmonics” have f=1/110, f=1/100 and f=1/30; annotations: frequency, growing/shrinking trend]
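
Step 3 (handling the lag) has no slide of its own in this transcript. One plausible reading, assumed here and not confirmed by the slides, is that after canonicalization each sequence mixes the complex harmonics with complex weights, and that keeping only the magnitude of a weight discards its phase, that is, its lag. The names `render`, `w_cos`, and `w_sin` are hypothetical.

```python
import cmath
import math

theta = 2 * math.pi / 100
lam = cmath.exp(1j * theta)          # one harmonic with period 100

def render(weight, length):
    """Observed sequence: real part of weight * lam**t for t = 0 .. length-1."""
    return [(weight * lam ** t).real for t in range(length)]

w_cos = 1 + 0j                       # produces cos(theta * t)
w_sin = -1j                          # produces sin(theta * t): same harmonic, shifted in time

x_cos = render(w_cos, 200)
x_sin = render(w_sin, 200)

# The two weights differ only in phase; their magnitudes are identical,
# so a magnitude-based feature does not depend on the lag.
lag_free = (abs(w_cos), abs(w_sin))
phase_gap = cmath.phase(w_cos) - cmath.phase(w_sin)   # a quarter period, pi / 2
```

Under this assumption, the sine and cosine versions of the same harmonic receive the same feature value, which is requirement (a).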

Step 4: Grouping Harmonics

Group of harmonics 1/110 & 1/30

Harmonic 1/100

[Figure: fingerprint values on a color scale from − to +]
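
The overview slide lists SVD among the tools, so grouping can be sketched as an SVD of the mixing-weight table from the earlier “mixing weights” slide. The matrix layout (rows X1 to X5, columns HV1 to HV3), the rank threshold, and the scaling of the result are assumptions; the numbers are not expected to match the fingerprint table on the slides exactly.

```python
import numpy as np

# Rows: X1..X5. Columns: HV1 (f=1/110), HV2 (f=1/30), HV3 (f=1/100).
M = np.array([
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 0.9],
    [1.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
])

U, S, Vt = np.linalg.svd(M, full_matrices=False)
n_groups = int(np.sum(S > 1e-8))                      # number of non-trivial groups
fingerprints = np.abs(U[:, :n_groups] * S[:n_groups]) # one row per sequence
groups = np.abs(Vt[:n_groups])                        # harmonics making up each group

print(n_groups)
print(np.round(fingerprints, 2))
print(np.round(groups, 2))
```

HV1 and HV2 always occur together, so their columns are identical and the SVD merges them into one group. HV3 stays on its own, which leaves two fingerprints for five sequences.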

Conclusion

• Intuition of PLiF

– Three requirements of fingerprints

• How it works

– Four steps in the algorithm

• What to do with PLiF

– Similarity, clustering, compression, forecasting, etc.

• Experiments on a diverse set of data

– It really works

– It is fast & scalable (linear in sequence length)

Overview

PLiF alg. steps

S1. Learning Dynamics

S2. Canonical Form

S3. Handling Lag

S4. Grouping Harmonics

Requirements of fingerprints

(a) lag independent

(b) frequency proximity

(c) grouping harmonics

PLiF Goals

G1. Good clustering

G2. Good compression

G3. Ability to forecast

G4. Scalability

Future Work

• Non-Gaussian noise

• Nonlinear transition

Parsimonious Linear Fingerprinting for Time Series (PLiF)

Lei Li, joint work with B. Aditya Prakash and Christos Faloutsos

School of Computer Science, Carnegie Mellon University

PLiF, two “fingerprints”: accuracy = 93.9% (walking motion vs. running motion)

Thanks! www.cs.cmu.edu/~leili