Advanced Introduction to Machine Learning CMU-10715
Principal Component Analysis
Barnabás Póczos

Transcript of lecture slides.

Page 1:

Advanced Introduction to

Machine Learning CMU-10715

Principal Component Analysis

Barnabás Póczos

Page 2:

Contents

Motivation

PCA algorithms

Applications

Some of these slides are taken from

• Karl Booksh Research group

• Tom Mitchell

• Ron Parr

Page 3:

Motivation

Page 4:

PCA Applications

• Data Visualization

• Data Compression

• Noise Reduction

Page 5:

Data Visualization

Example:

• Given 53 blood and urine measurements (features) from 65 people.

• How can we visualize the measurements?

Page 6:

Data Visualization

• Matrix format (65x53)

        H-WBC    H-RBC    H-Hgb     H-Hct     H-MCV    H-MCH    H-MCHC
A1      8.0000   4.8200   14.1000   41.0000    85.0000  29.0000  34.0000
A2      7.3000   5.0200   14.7000   43.0000    86.0000  29.0000  34.0000
A3      4.3000   4.4800   14.1000   41.0000    91.0000  32.0000  35.0000
A4      7.5000   4.4700   14.9000   45.0000   101.0000  33.0000  33.0000
A5      7.3000   5.5200   15.4000   46.0000    84.0000  28.0000  33.0000
A6      6.9000   4.8600   16.0000   47.0000    97.0000  33.0000  34.0000
A7      7.8000   4.6800   14.7000   43.0000    92.0000  31.0000  34.0000
A8      8.6000   4.8200   15.8000   42.0000    88.0000  33.0000  37.0000
A9      5.1000   4.7100   14.0000   43.0000    92.0000  30.0000  32.0000

(rows = instances, columns = features; the excerpt shows 9 of the 65 instances and 7 of the 53 features)

Difficult to see the correlations between the features...

Page 7:

Data Visualization

• Spectral format (65 curves, one for each person)

[Figure: one curve per person; x-axis: measurement index (0-60), y-axis: measurement value (0-1000)]

Difficult to compare the different patients...

Page 8:

Data Visualization

• Spectral format (53 pictures, one for each feature)

[Figure: one plot per feature, e.g. H-Bands; x-axis: person (0-70), y-axis: feature value (0-1.8)]

Difficult to see the correlations between the features...

Page 9:

Data Visualization

[Figure: a bi-variate scatter plot (C-Triglycerides vs. C-LDH) and a tri-variate scatter plot (C-Triglycerides, C-LDH, M-EPI)]

How can we visualize the other variables???

... difficult to see in 4- or higher-dimensional spaces...

Page 10:

Data Visualization

• Is there a representation better than the coordinate axes?

• Is it really necessary to show all 53 dimensions?

• ... what if there are strong correlations between the features?

• How could we find the smallest subspace of the 53-D space that keeps the most information about the original data?

• A solution: Principal Component Analysis

Page 11:

PCA Algorithms

Page 12:

PCA: Principal Component Analysis

Orthogonal projection of the data onto a lower-dimensional linear space that...

• maximizes the variance of the projected data (purple line)

• minimizes the mean squared distance between each data point and its projection (sum of the blue lines)

Page 13:

Principal Component Analysis

Idea:

Given data points in a d-dimensional space, project them into a lower-dimensional space while preserving as much information as possible:

• Find the best planar approximation of 3D data

• Find the best 12-D approximation of 10^4-D data

In particular, choose a projection that minimizes the squared error in reconstructing the original data.

Page 14:

Principal Component Analysis

Properties:

PCA vectors originate from the center of mass.

Principal component #1 points in the direction of the largest variance.

Each subsequent principal component

• is orthogonal to the previous ones, and

• points in the direction of the largest variance of the residual subspace

Page 15:

2D Gaussian dataset

Page 16:

1st PCA axis

Page 17:

2nd PCA axis

Page 18:

PCA algorithm I (sequential)

Given the centered data {x_1, ..., x_m}, compute the principal vectors:

1st PCA vector: to find w_1, maximize the variance of the projection of x:

    w_1 = \arg\max_{\|w\|=1} \frac{1}{m} \sum_{i=1}^{m} (w^T x_i)^2

2nd PCA vector: to find w_2, maximize the variance of the projection in the residual subspace:

    w_2 = \arg\max_{\|w\|=1} \frac{1}{m} \sum_{i=1}^{m} [w^T (x_i - w_1 w_1^T x_i)]^2

PCA reconstruction: x' = w_1 (w_1^T x)

[Figure: a data point x, the first principal direction w_1, the reconstruction x' = w_1(w_1^T x), and the residual x - x' along w_2]

Page 19:

PCA algorithm I (sequential)

Given w_1, ..., w_{k-1}, we calculate the k-th principal vector as before, maximizing the variance of the projection in the residual subspace:

k-th PCA vector:

    w_k = \arg\max_{\|w\|=1} \frac{1}{m} \sum_{i=1}^{m} \left[ w^T \left( x_i - \sum_{j=1}^{k-1} w_j w_j^T x_i \right) \right]^2

PCA reconstruction: x' = w_1 (w_1^T x) + w_2 (w_2^T x)

[Figure: a data point x, principal directions w_1 and w_2, the projections w_1(w_1^T x) and w_2(w_2^T x), and their sum x']

Page 20:

PCA algorithm II (sample covariance matrix)

• Given data {x_1, ..., x_m}, compute the covariance matrix \Sigma:

    \Sigma = \frac{1}{m} \sum_{i=1}^{m} (x_i - \bar{x})(x_i - \bar{x})^T, \quad \text{where} \quad \bar{x} = \frac{1}{m} \sum_{i=1}^{m} x_i

• PCA basis vectors = the eigenvectors of \Sigma

• Larger eigenvalue ⇒ more important eigenvector

Page 21:

PCA algorithm II (sample covariance matrix)

PCA algorithm(X, k): top k eigenvalues/eigenvectors

% X = N x m data matrix,
% ... each data point x_i = column vector, i = 1..m

• \bar{x} = (1/m) \sum_{i=1}^{m} x_i

• X ← subtract the mean \bar{x} from each column vector x_i in X

• \Sigma = X X^T ... covariance matrix of X

• { \lambda_i, u_i }_{i=1..N} = eigenvalues/eigenvectors of \Sigma, sorted so that \lambda_1 ≥ \lambda_2 ≥ ... ≥ \lambda_N

• Return { \lambda_i, u_i }_{i=1..k} % top k PCA components
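To make Algorithm II concrete, here is a minimal NumPy sketch (the function name pca_eig and the toy 2D Gaussian data are illustrative assumptions, not from the slides):

    import numpy as np

    def pca_eig(X, k):
        """X: (N, m) data matrix, one column per data point.
        Returns the top-k eigenvalues and principal vectors (columns of U)."""
        x_bar = X.mean(axis=1, keepdims=True)   # center of mass
        Xc = X - x_bar                          # subtract the mean from each column
        Sigma = (Xc @ Xc.T) / Xc.shape[1]       # sample covariance matrix
        lam, U = np.linalg.eigh(Sigma)          # eigh: Sigma is symmetric
        order = np.argsort(lam)[::-1]           # sort by decreasing eigenvalue
        return lam[order][:k], U[:, order[:k]]

    # Example: a 2D Gaussian dataset, as on the slides that follow.
    rng = np.random.default_rng(0)
    X = rng.multivariate_normal([0, 0], [[3, 1], [1, 1]], size=500).T  # (2, 500)
    lam, U = pca_eig(X, 2)   # U[:, 0] is the 1st PCA axis, U[:, 1] the 2nd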

Page 22:

Finding Eigenvectors with Power Iteration

First eigenvector (power iteration 1): repeat until convergence

    v = Sigma*v;
    v = v/sqrt(v'*v);

⇒ v converges to the first eigenvector vPCA1.

First eigenvalue: lambda1 = vPCA1'*Sigma*vPCA1

Second eigenvector (power iteration 2): deflate the covariance matrix, then repeat

    Sigma2 = Sigma - lambda1*vPCA1*vPCA1';
    v = Sigma2*v;
    v = v/sqrt(v'*v);

⇒ v converges to the second eigenvector vPCA2.
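A runnable NumPy version of the same power-iteration-plus-deflation scheme (a sketch; the fixed iteration count is an illustrative choice, in practice one iterates until v stops changing):

    import numpy as np

    def power_iteration(Sigma, n_iter=500):
        """Returns (eigenvalue, eigenvector) for the dominant eigenpair."""
        v = np.random.default_rng(0).normal(size=Sigma.shape[0])
        for _ in range(n_iter):
            v = Sigma @ v                 # v = Sigma*v
            v = v / np.sqrt(v @ v)        # v = v/sqrt(v'*v)
        return v @ Sigma @ v, v           # Rayleigh quotient gives the eigenvalue

    Sigma = np.array([[3.0, 1.0], [1.0, 1.0]])
    lam1, v1 = power_iteration(Sigma)            # first eigenpair
    Sigma2 = Sigma - lam1 * np.outer(v1, v1)     # deflation: remove the 1st component
    lam2, v2 = power_iteration(Sigma2)           # second eigenpair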

Page 23:

Page 24:

PCA algorithm III (SVD of the data matrix)

Singular Value Decomposition of the centered data matrix X (features × samples):

    X = U S V^T

[Figure: block view of X = U S V^T; the leading columns of U, the leading singular values in S, and the leading rows of V^T carry the significant structure, the rest is noise]

Page 25:

PCA algorithm III

• Columns of U

• the principal vectors, { u^(1), ..., u^(k) }

• orthonormal (unit norm and mutually orthogonal), so U^T U = I

• can reconstruct the data using linear combinations of { u^(1), ..., u^(k) }

• Matrix S

• diagonal

• shows the importance of each eigenvector

• Columns of V^T

• the coefficients for reconstructing the samples
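A minimal NumPy sketch of Algorithm III (the function name is illustrative; np.linalg.svd returns V^T directly and orders the singular values from largest to smallest):

    import numpy as np

    def pca_svd(X, k):
        """X: (N, m) matrix, features x samples. Returns the top-k principal
        vectors (columns of U), singular values, and sample coefficients."""
        Xc = X - X.mean(axis=1, keepdims=True)             # center the data
        U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
        # Covariance eigenvalues relate to singular values via lambda_i = s_i**2 / m
        return U[:, :k], s[:k], Vt[:k, :]

    # Rank-k reconstruction: X_hat = U_k @ np.diag(s_k) @ Vt_k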

Page 26:

Applications

Page 27:

Face Recognition

Want to identify a specific person, based on a facial image

Robust to glasses, lighting,…

Can't just use the given 256 x 256 pixels

Page 28:

Applying PCA: Eigenfaces

Method A: Build a PCA subspace for each person and check which subspace can reconstruct the test image the best.

Method B: Build one PCA database for the whole dataset and then classify based on the weights.

Page 29:

Applying PCA: Eigenfaces

Example data set: Images of faces

• Eigenface approach [Turk & Pentland], [Sirovich & Kirby]

Each face x is ...

• 256 x 256 values (luminance at each location)

• x in R^(256 x 256) (view as a 64K-dimensional vector)

Form X = [ x_1, ..., x_m ] = centered data matrix: one column of 256 x 256 real values per face, m faces in total

Compute \Sigma = X X^T

Problem: \Sigma is 64K x 64K ... HUGE!!!

Page 30:

Computational Complexity

Suppose m instances, each of size N

• Eigenfaces: m = 500 faces, each of size N = 64K

Given the N x N covariance matrix \Sigma, we can compute

• all N eigenvectors/eigenvalues in O(N^3)

• the first k eigenvectors/eigenvalues in O(k N^2)

If N = 64K, this is EXPENSIVE!

Page 31:

A Clever Workaround

• Note that m << 64K

• Use L = X^T X (an m x m matrix) instead of \Sigma = X X^T (N x N)

• If v is an eigenvector of L, then Xv is an eigenvector of \Sigma

Proof:  L v = \lambda v

        X^T X v = \lambda v

        X (X^T X v) = X (\lambda v) = \lambda (X v)

        (X X^T) (X v) = \lambda (X v)

        \Sigma (X v) = \lambda (X v)
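A NumPy sketch of the workaround (the function name is illustrative): eigendecompose the small m x m matrix L, then map each eigenvector v back to Xv:

    import numpy as np

    def pca_small_m(X, k):
        """X: (N, m) centered data matrix with m << N (e.g. N = 64K for faces).
        Returns the top-k eigenvalues and principal vectors of Sigma = X X^T."""
        L = X.T @ X                          # m x m -- cheap compared to N x N
        lam, V = np.linalg.eigh(L)
        order = np.argsort(lam)[::-1][:k]
        U = X @ V[:, order]                  # X v is an eigenvector of X X^T
        U = U / np.linalg.norm(U, axis=0)    # renormalize each column to unit length
        return lam[order], U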

Page 32:

Principal Components (Method B)

Page 33:

Shortcomings

Requires carefully controlled data:

• All faces centered in frame

• Same size

• Some sensitivity to angle

Method is completely knowledge free

• (sometimes this is good!)

• Doesn't know that faces are wrapped around 3D objects (heads)

• Makes no effort to preserve class distinctions

Page 34:

Happiness subspace

Page 35:

Disgust subspace

Page 36:

Facial Expression Recognition Movies

Page 37:

Facial Expression Recognition Movies

Page 38:

Facial Expression Recognition Movies

Page 39:

Image Compression

Page 40:

Original Image

Divide the original 372x492 image into patches:

• Each patch is an instance that contains 12x12 pixels on a grid

Consider each patch as a 144-D vector; see the sketch below.
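A hedged NumPy sketch of this patch-based compression (the function name and details are illustrative assumptions; the slides that follow correspond to keeping k = 60, 16, 6, 3, and 1 components):

    import numpy as np

    def pca_compress_patches(img, k, p=12):
        """img: 2D grayscale array. Keeps k of the 144 PCA coefficients per
        12x12 patch and returns the reconstructed image."""
        H, W = (img.shape[0] // p) * p, (img.shape[1] // p) * p
        patches = (img[:H, :W]
                   .reshape(H // p, p, W // p, p)
                   .transpose(0, 2, 1, 3)
                   .reshape(-1, p * p))               # one 144-D row per patch
        mean = patches.mean(axis=0)
        Xc = patches - mean
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        B = Vt[:k]                                    # top-k patch eigenvectors
        recon = (Xc @ B.T) @ B + mean                 # keep k coefficients per patch
        return (recon.reshape(H // p, W // p, p, p)
                     .transpose(0, 2, 1, 3)
                     .reshape(H, W))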

Page 41:

L2 error and PCA dim

Page 42:

PCA compression: 144D ⇒ 60D

Page 43:

PCA compression: 144D ⇒ 16D

Page 44:

16 most important eigenvectors

[Figure: the 16 most important eigenvectors, each displayed as a 12x12 image patch]

Page 45:

PCA compression: 144D ⇒ 6D

Page 46:

6 most important eigenvectors

[Figure: the 6 most important eigenvectors, each displayed as a 12x12 image patch]

Page 47:

PCA compression: 144D ⇒ 3D

Page 48:

3 most important eigenvectors

[Figure: the 3 most important eigenvectors, each displayed as a 12x12 image patch]

Page 49:

PCA compression: 144D ⇒ 1D

Page 50:

60 most important eigenvectors

Looks like the discrete cosine basis of JPEG!...

Page 51:

2D Discrete Cosine Basis

http://en.wikipedia.org/wiki/Discrete_cosine_transform

Page 52:

Noise Filtering

Page 53:

Noise Filtering

[Figure: a sample x is mapped to its reconstruction x' through the top principal vectors U: x' = U (U^T x)]
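A minimal NumPy sketch of this filtering step (names are illustrative); the denoised image on the later slide corresponds to keeping k = 15 components:

    import numpy as np

    def pca_denoise(X, k):
        """X: (N, m), one noisy sample per column. Projects each sample onto the
        top-k principal components: x' = mean + U_k U_k^T (x - mean)."""
        mean = X.mean(axis=1, keepdims=True)
        Xc = X - mean
        U, _, _ = np.linalg.svd(Xc, full_matrices=False)
        Uk = U[:, :k]                       # keep the high-variance directions
        return mean + Uk @ (Uk.T @ Xc)      # the discarded directions are mostly noise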

Page 54:

Noisy image

Page 55:

Denoised image using 15 PCA components

Page 56:

PCA Shortcomings

Page 57:

Problematic Data Set for PCA

PCA doesn't know about class labels!

Page 58:

PCA vs. Fisher Linear Discriminant

• PCA maximizes variance, independent of class (magenta line)

• FLD attempts to separate the classes (green line)

Page 59:

Problematic Data Set for PCA

PCA cannot capture NON-LINEAR structure!

Page 60:

PCA Conclusions

PCA

• finds an orthonormal basis for the data

• sorts dimensions in order of "importance"

• discards low-significance dimensions

Applications:

• Get a compact description

• Remove noise

• Improve classification (hopefully)

Not magic:

• Doesn't know class labels

• Can only capture linear variations

One of many tricks to reduce dimensionality!

Page 61:

PCA Theory

Page 62:

Justification of Algorithm II

GOAL:

Page 63:

Justification of Algorithm II

x is centered!

Page 64:

Justification of Algorithm II

GOAL:

Use Lagrange multipliers for the constraints; see the sketch below.
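A sketch of the standard Lagrange-multiplier argument for the first principal vector, assuming the GOAL above is the variance-maximization problem from Algorithm I:

    \max_{\|w\|=1} \frac{1}{m} \sum_{i=1}^{m} (w^T x_i)^2
        = \max_{\|w\|=1} w^T \Sigma w
        % x is centered, so \Sigma = \frac{1}{m} \sum_i x_i x_i^T

    L(w, \lambda) = w^T \Sigma w - \lambda (w^T w - 1)
        % Lagrangian for the unit-norm constraint

    \nabla_w L = 2 \Sigma w - 2 \lambda w = 0
        \Longrightarrow \Sigma w = \lambda w

so any stationary point w is an eigenvector of \Sigma, with attained variance w^T \Sigma w = \lambda; taking the largest eigenvalue gives w_1, matching Algorithm II.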

Page 65:

Justification of Algorithm II

Page 66:

Thanks for the Attention!

Page 67:

Attic

Page 68:

Kernel PCA

Page 69:

Kernel PCA

Performing PCA in the feature space

Lemma

Proof:

Page 70:

Kernel PCA

Lemma

Page 71:

Kernel PCA

Proof

Page 72:

Kernel PCA

How can we use these coefficients to calculate the projection of a new sample t?

Where was I cheating?

The data should be centered in the feature space, too! But this is manageable... see the sketch below.
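A hedged NumPy sketch of the whole kernel-PCA pipeline, including the feature-space centering fix just mentioned (the RBF kernel, gamma, and all names are illustrative assumptions, not from the slides):

    import numpy as np

    def rbf_kernel(A, B, gamma=1.0):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)

    def kernel_pca(X, T, k, gamma=1.0):
        """X: (m, d) training samples; T: (n, d) new samples.
        Returns the k-dimensional kernel-PCA projections of T."""
        m = X.shape[0]
        K = rbf_kernel(X, X, gamma)
        J = np.ones((m, m)) / m
        Kc = K - J @ K - K @ J + J @ K @ J       # center K in feature space
        lam, A = np.linalg.eigh(Kc)
        order = np.argsort(lam)[::-1][:k]        # assumes the top-k lam are positive
        A = A[:, order] / np.sqrt(lam[order])    # scale the coefficient vectors
        Kt = rbf_kernel(T, X, gamma)             # (n, m) kernel rows for new samples
        Jt = np.ones((T.shape[0], m)) / m
        Ktc = Kt - Jt @ K - Kt @ J + Jt @ K @ J  # center the test kernel the same way
        return Ktc @ A                           # projections onto the k components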

Page 73:

Input points before kernel PCA

http://en.wikipedia.org/wiki/Kernel_principal_component_analysis

Page 74:

Output after kernel PCA

The three groups are distinguishable using the first component only.

Page 75:

Reconstructing… (Method B)

… faster if we train with …

• only people without glasses

• same lighting conditions