1 Introduction to Kernel Principal Component Analysis(PCA) Mohammed Nasser Dept. of Statistics,...
![Page 1: 1 Introduction to Kernel Principal Component Analysis(PCA) Mohammed Nasser Dept. of Statistics, RU,Bangladesh Email: mnasser.ru@gmail.com.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f395503460f94c55ec0/html5/thumbnails/1.jpg)
Introduction to Kernel Principal Component Analysis (PCA)
Mohammed Nasser Dept. of Statistics, RU,Bangladesh
Email: mnasser.ru@gmail.com
![Page 2: 1 Introduction to Kernel Principal Component Analysis(PCA) Mohammed Nasser Dept. of Statistics, RU,Bangladesh Email: mnasser.ru@gmail.com.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f395503460f94c55ec0/html5/thumbnails/2.jpg)
Contents
Basics of PCA
Application of PCA in Face Recognition
Some Terms in PCA
Motivation for KPCA
Basics of KPCA
Applications of KPCA
![Page 3: 1 Introduction to Kernel Principal Component Analysis(PCA) Mohammed Nasser Dept. of Statistics, RU,Bangladesh Email: mnasser.ru@gmail.com.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f395503460f94c55ec0/html5/thumbnails/3.jpg)
High-dimensional Data
Gene expression Face images Handwritten digits
![Page 4: 1 Introduction to Kernel Principal Component Analysis(PCA) Mohammed Nasser Dept. of Statistics, RU,Bangladesh Email: mnasser.ru@gmail.com.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f395503460f94c55ec0/html5/thumbnails/4.jpg)
Why Feature Reduction?
• Most machine learning and data mining techniques may not be effective for high-dimensional data
– Curse of dimensionality
– Query accuracy and efficiency degrade rapidly as the dimension increases.
• The intrinsic dimension may be small.
– For example, the number of genes responsible for a certain type of disease may be small.
![Page 5: 1 Introduction to Kernel Principal Component Analysis(PCA) Mohammed Nasser Dept. of Statistics, RU,Bangladesh Email: mnasser.ru@gmail.com.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f395503460f94c55ec0/html5/thumbnails/5.jpg)
Why Reduce Dimensionality?
1. Reduces time complexity: Less computation
2. Reduces space complexity: Fewer parameters
3. Saves the cost of observing the feature
4. Simpler models are more robust on small datasets
5. More interpretable; simpler explanation
6. Data visualization (structure, groups, outliers, etc) if plotted in 2 or 3 dimensions
![Page 6: 1 Introduction to Kernel Principal Component Analysis(PCA) Mohammed Nasser Dept. of Statistics, RU,Bangladesh Email: mnasser.ru@gmail.com.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f395503460f94c55ec0/html5/thumbnails/6.jpg)
Feature reduction algorithms
• Unsupervised
– Latent Semantic Indexing (LSI): truncated SVD
– Independent Component Analysis (ICA)
– Principal Component Analysis (PCA)
– Canonical Correlation Analysis (CCA)
• Supervised
– Linear Discriminant Analysis (LDA)
• Semi-supervised
– Research topic
![Page 7: 1 Introduction to Kernel Principal Component Analysis(PCA) Mohammed Nasser Dept. of Statistics, RU,Bangladesh Email: mnasser.ru@gmail.com.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f395503460f94c55ec0/html5/thumbnails/7.jpg)
Algebraic derivation of PCs
• Main steps for computing PCs
– Form the covariance matrix S.
– Compute its eigenvectors: $\{u_i\}_{i=1}^{p}$
– Use the first d eigenvectors $\{u_i\}_{i=1}^{d}$ to form the d PCs.
– The transformation G is given by $G = [u_1, u_2, \ldots, u_d]$
A test point $x \in \mathbb{R}^p \to G^T x \in \mathbb{R}^d$.
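The steps above can be sketched in NumPy. This is a minimal illustration, not the author's code: the function names `pca_fit` and `pca_transform` are hypothetical, samples are assumed to be stored as rows (the slides store them as columns), and the sample mean is subtracted before projecting, which the slide leaves implicit.

```python
import numpy as np

def pca_fit(X, d):
    """Fit PCA on X of shape (n_samples, p); return mean, G (p x d), eigenvalues."""
    mean = X.mean(axis=0)
    Xc = X - mean                               # centre the data
    S = Xc.T @ Xc / (X.shape[0] - 1)            # covariance matrix S (p x p)
    eigvals, eigvecs = np.linalg.eigh(S)        # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]           # re-sort in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    G = eigvecs[:, :d]                          # G = [u_1, ..., u_d]
    return mean, G, eigvals

def pca_transform(x, mean, G):
    """Project one point (p,) or many points (m, p) onto the d PCs."""
    return (x - mean) @ G                       # row-wise version of G^T x

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))  # synthetic correlated data
mean, G, eigvals = pca_fit(X, d=2)
Y = pca_transform(X, mean, G)                   # shape (200, 2)
```

The variance of each column of `Y` equals the corresponding eigenvalue, which is a quick sanity check on the ordering of the eigenvectors.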
![Page 8: 1 Introduction to Kernel Principal Component Analysis(PCA) Mohammed Nasser Dept. of Statistics, RU,Bangladesh Email: mnasser.ru@gmail.com.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f395503460f94c55ec0/html5/thumbnails/8.jpg)
Optimality property of PCA
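Original data: $X \in \mathbb{R}^{p \times n}$, with $G \in \mathbb{R}^{p \times d}$ and $G^T \in \mathbb{R}^{d \times p}$
Dimension reduction: $X \in \mathbb{R}^{p \times n} \to Y = G^T X \in \mathbb{R}^{d \times n}$
Reconstruction: $Y = G^T X \in \mathbb{R}^{d \times n} \to \bar{X} = G(G^T X) \in \mathbb{R}^{p \times n}$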
![Page 9: 1 Introduction to Kernel Principal Component Analysis(PCA) Mohammed Nasser Dept. of Statistics, RU,Bangladesh Email: mnasser.ru@gmail.com.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f395503460f94c55ec0/html5/thumbnails/9.jpg)
Optimality property of PCA
Main theoretical result:
The matrix G consisting of the first d eigenvectors of the covariance matrix S solves the following minimization problem:
$\min_{G \in \mathbb{R}^{p \times d}} \|X - G(G^T X)\|_F^2$ subject to $G^T G = I_d$
Here $\|X - \bar{X}\|_F^2$, with $\bar{X} = G(G^T X)$, is the reconstruction error.
PCA projection minimizes the reconstruction error among all linear projections of size d.
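The optimality claim can be checked numerically. The sketch below is an illustration under assumptions: the data are synthetic, X is stored as p x n with centred rows, and the competitor is a single random orthonormal G obtained from a QR factorization (so it demonstrates the inequality, it does not prove it). The helper name `reconstruction_error` is hypothetical.

```python
import numpy as np

def reconstruction_error(Xc, G):
    """Squared Frobenius norm of Xc - G (G^T Xc) for centred Xc of shape (p, n)."""
    return np.linalg.norm(Xc - G @ (G.T @ Xc), ord="fro") ** 2

rng = np.random.default_rng(1)
p, n, d = 6, 300, 2
X = rng.normal(size=(p, p)) @ rng.normal(size=(p, n))   # synthetic data, columns are samples
Xc = X - X.mean(axis=1, keepdims=True)                  # centre each variable

S = Xc @ Xc.T / n                                       # covariance matrix (p x p)
eigvals, eigvecs = np.linalg.eigh(S)                    # ascending eigenvalues
G_pca = eigvecs[:, ::-1][:, :d]                         # top-d eigenvectors

Q, _ = np.linalg.qr(rng.normal(size=(p, d)))            # random G with Q^T Q = I_d

err_pca = reconstruction_error(Xc, G_pca)
err_rand = reconstruction_error(Xc, Q)
print(err_pca <= err_rand)
```

With this scaling of S, the PCA reconstruction error equals n times the sum of the discarded (smallest p - d) eigenvalues.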
![Page 10: 1 Introduction to Kernel Principal Component Analysis(PCA) Mohammed Nasser Dept. of Statistics, RU,Bangladesh Email: mnasser.ru@gmail.com.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f395503460f94c55ec0/html5/thumbnails/10.jpg)
Dimensionality Reduction
• One approach to dealing with high-dimensional data is to reduce its dimensionality.
• Project high dimensional data onto a lower dimensional sub-space using linear or non-linear transformations.
![Page 11: 1 Introduction to Kernel Principal Component Analysis(PCA) Mohammed Nasser Dept. of Statistics, RU,Bangladesh Email: mnasser.ru@gmail.com.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f395503460f94c55ec0/html5/thumbnails/11.jpg)
Dimensionality Reduction
• Linear transformations are simple to compute and tractable.
• Classical (linear) approaches:
– Principal Component Analysis (PCA)
– Fisher Discriminant Analysis (FDA)
– Singular Value Decomposition (SVD)
– Factor Analysis (FA)
– Canonical Correlation Analysis (CCA)
$Y = U^t X$, i.e. $b_i = u_i^t a$
where Y is k x 1, $U^t$ is k x d, and X is d x 1 (k << d)
![Page 12: 1 Introduction to Kernel Principal Component Analysis(PCA) Mohammed Nasser Dept. of Statistics, RU,Bangladesh Email: mnasser.ru@gmail.com.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f395503460f94c55ec0/html5/thumbnails/12.jpg)
Principal Component Analysis (PCA)
• Each dimensionality reduction technique finds an appropriate transformation by satisfying certain criteria (e.g., information loss, data discrimination, etc.)
• The goal of PCA is to reduce the dimensionality of the data while retaining as much as possible of the variation present in the dataset.
![Page 13: 1 Introduction to Kernel Principal Component Analysis(PCA) Mohammed Nasser Dept. of Statistics, RU,Bangladesh Email: mnasser.ru@gmail.com.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f395503460f94c55ec0/html5/thumbnails/13.jpg)
Principal Component Analysis (PCA)
• Find a basis in a low dimensional sub-space:
– Approximate vectors by projecting them in a low dimensional sub-space:
(1) Original space representation:
$x = a_1 v_1 + a_2 v_2 + \cdots + a_N v_N$
where $v_1, v_2, \ldots, v_N$ is a basis in the original N-dimensional space
(2) Lower-dimensional sub-space representation:
$\hat{x} = b_1 u_1 + b_2 u_2 + \cdots + b_K u_K$
where $u_1, u_2, \ldots, u_K$ is a basis in the K-dimensional sub-space (K<N)
• Note: if K=N, then $\hat{x} = x$
![Page 14: 1 Introduction to Kernel Principal Component Analysis(PCA) Mohammed Nasser Dept. of Statistics, RU,Bangladesh Email: mnasser.ru@gmail.com.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f395503460f94c55ec0/html5/thumbnails/14.jpg)
Principal Component Analysis (PCA)
• Example (K=N):
![Page 15: 1 Introduction to Kernel Principal Component Analysis(PCA) Mohammed Nasser Dept. of Statistics, RU,Bangladesh Email: mnasser.ru@gmail.com.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f395503460f94c55ec0/html5/thumbnails/15.jpg)
Principal Component Analysis (PCA)
• Methodology
– Suppose x1, x2, ..., xM are N x 1 vectors
![Page 16: 1 Introduction to Kernel Principal Component Analysis(PCA) Mohammed Nasser Dept. of Statistics, RU,Bangladesh Email: mnasser.ru@gmail.com.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f395503460f94c55ec0/html5/thumbnails/16.jpg)
Principal Component Analysis (PCA)
• Methodology – cont.
$b_i = u_i^T (x - \bar{x})$
![Page 17: 1 Introduction to Kernel Principal Component Analysis(PCA) Mohammed Nasser Dept. of Statistics, RU,Bangladesh Email: mnasser.ru@gmail.com.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f395503460f94c55ec0/html5/thumbnails/17.jpg)
Principal Component Analysis (PCA)
• Linear transformation implied by PCA
– The linear transformation $\mathbb{R}^N \to \mathbb{R}^K$ that performs the dimensionality reduction is:
![Page 18: 1 Introduction to Kernel Principal Component Analysis(PCA) Mohammed Nasser Dept. of Statistics, RU,Bangladesh Email: mnasser.ru@gmail.com.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f395503460f94c55ec0/html5/thumbnails/18.jpg)
Principal Component Analysis (PCA)
• How many principal components to keep?
– To choose K, you can use the following criterion:
Unfortunately, for some data sets, meeting this requirement needs K almost equal to N. That is, no effective data reduction is possible.
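The criterion itself did not survive in this transcript. A commonly used rule, assumed here purely for illustration, keeps the smallest K whose leading eigenvalues account for at least a chosen fraction of the total variance. The function name `choose_k` and the default threshold of 0.9 are hypothetical.

```python
import numpy as np

def choose_k(eigvals, threshold=0.9):
    """Smallest K with (lambda_1 + ... + lambda_K) / (sum of all lambda) >= threshold.

    Assumed criterion: cumulative explained-variance ratio.
    """
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]   # descending order
    ratio = np.cumsum(lam) / lam.sum()                      # cumulative proportion
    k = int(np.searchsorted(ratio, threshold)) + 1          # first index reaching threshold
    return min(k, lam.size)                                 # guard against round-off at 1.0

k_decaying = choose_k([4.0, 3.0, 2.0, 1.0], threshold=0.8)  # ratios 0.4, 0.7, 0.9, 1.0
k_flat = choose_k(np.ones(4), threshold=0.9)                # ratios 0.25, 0.5, 0.75, 1.0
```

The flat spectrum illustrates the remark above: when all eigenvalues are equal, the rule returns K = N and no reduction is achieved.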
![Page 19: 1 Introduction to Kernel Principal Component Analysis(PCA) Mohammed Nasser Dept. of Statistics, RU,Bangladesh Email: mnasser.ru@gmail.com.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f395503460f94c55ec0/html5/thumbnails/19.jpg)
Principal Component Analysis (PCA)
• Eigenvalue spectrum
[Figure: scree plot of the eigenvalues λi against the index i, with K and N marked]
![Page 20: 1 Introduction to Kernel Principal Component Analysis(PCA) Mohammed Nasser Dept. of Statistics, RU,Bangladesh Email: mnasser.ru@gmail.com.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f395503460f94c55ec0/html5/thumbnails/20.jpg)
Principal Component Analysis (PCA)
• Standardization
– The principal components are dependent on the units used to measure the original variables as well as on the range of values they assume.
– We should always standardize the data prior to using PCA.
– A common standardization method is to transform all the data to have zero mean and unit standard deviation:
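A minimal sketch of this standardization, assuming samples are stored as rows and the sample standard deviation (`ddof=1`) is used; the function name `standardize` is hypothetical. A constant column would have zero standard deviation and is not handled here.

```python
import numpy as np

def standardize(X, ddof=1):
    """Return Z with zero mean and unit standard deviation per column."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=ddof)
    return (X - mean) / std, mean, std

rng = np.random.default_rng(2)
# Three variables on very different scales (synthetic data).
X = rng.normal(size=(100, 3)) * np.array([1.0, 50.0, 0.01]) + np.array([0.0, 10.0, -3.0])
Z, mean, std = standardize(X)

# The covariance of the standardized data is the correlation matrix of the original data.
same = np.allclose(np.cov(Z, rowvar=False), np.corrcoef(X, rowvar=False))
```

PCA on `Z` is therefore PCA on the correlation matrix of `X`.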
![Page 21: 1 Introduction to Kernel Principal Component Analysis(PCA) Mohammed Nasser Dept. of Statistics, RU,Bangladesh Email: mnasser.ru@gmail.com.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f395503460f94c55ec0/html5/thumbnails/21.jpg)
CS 479/679 Pattern Recognition – Spring 2006
Dimensionality Reduction Using PCA/LDA
Chapter 3 (Duda et al.) – Section 3.8
Case Studies: Face Recognition Using Dimensionality Reduction
M. Turk, A. Pentland, "Eigenfaces for Recognition", Journal of Cognitive Neuroscience, 3(1), pp. 71-86, 1991.
D. Swets, J. Weng, "Using Discriminant Eigenfeatures for Image Retrieval", IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(8), pp. 831-836, 1996.
A. Martinez, A. Kak, "PCA versus LDA", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 2, pp. 228-233, 2001.
![Page 22: 1 Introduction to Kernel Principal Component Analysis(PCA) Mohammed Nasser Dept. of Statistics, RU,Bangladesh Email: mnasser.ru@gmail.com.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f395503460f94c55ec0/html5/thumbnails/22.jpg)
Principal Component Analysis (PCA)
• Face Recognition
– The simplest approach is to think of it as a template matching problem
– Problems arise when performing recognition in a high-dimensional space.
– Significant improvements can be achieved by first mapping the data into a lower dimensionality space.
– How to find this lower-dimensional space?
![Page 23: 1 Introduction to Kernel Principal Component Analysis(PCA) Mohammed Nasser Dept. of Statistics, RU,Bangladesh Email: mnasser.ru@gmail.com.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f395503460f94c55ec0/html5/thumbnails/23.jpg)
Principal Component Analysis (PCA)
• Main idea behind eigenfaces
average face
![Page 24: 1 Introduction to Kernel Principal Component Analysis(PCA) Mohammed Nasser Dept. of Statistics, RU,Bangladesh Email: mnasser.ru@gmail.com.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f395503460f94c55ec0/html5/thumbnails/24.jpg)
Principal Component Analysis (PCA)
• Computation of the eigenfaces
![Page 25: 1 Introduction to Kernel Principal Component Analysis(PCA) Mohammed Nasser Dept. of Statistics, RU,Bangladesh Email: mnasser.ru@gmail.com.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f395503460f94c55ec0/html5/thumbnails/25.jpg)
Principal Component Analysis (PCA)
• Computation of the eigenfaces – cont.
![Page 26: 1 Introduction to Kernel Principal Component Analysis(PCA) Mohammed Nasser Dept. of Statistics, RU,Bangladesh Email: mnasser.ru@gmail.com.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f395503460f94c55ec0/html5/thumbnails/26.jpg)
Principal Component Analysis (PCA)
• Computation of the eigenfaces – cont.
Note that $u_i$ is normalized.
![Page 27: 1 Introduction to Kernel Principal Component Analysis(PCA) Mohammed Nasser Dept. of Statistics, RU,Bangladesh Email: mnasser.ru@gmail.com.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f395503460f94c55ec0/html5/thumbnails/27.jpg)
Principal Component Analysis (PCA)
• Computation of the eigenfaces – cont.
![Page 28: 1 Introduction to Kernel Principal Component Analysis(PCA) Mohammed Nasser Dept. of Statistics, RU,Bangladesh Email: mnasser.ru@gmail.com.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f395503460f94c55ec0/html5/thumbnails/28.jpg)
Principal Component Analysis (PCA)
• Representing faces onto this basis
![Page 29: 1 Introduction to Kernel Principal Component Analysis(PCA) Mohammed Nasser Dept. of Statistics, RU,Bangladesh Email: mnasser.ru@gmail.com.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f395503460f94c55ec0/html5/thumbnails/29.jpg)
Principal Component Analysis (PCA)
• Representing faces onto this basis – cont.
![Page 30: 1 Introduction to Kernel Principal Component Analysis(PCA) Mohammed Nasser Dept. of Statistics, RU,Bangladesh Email: mnasser.ru@gmail.com.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f395503460f94c55ec0/html5/thumbnails/30.jpg)
Principal Component Analysis (PCA)
• Face Recognition Using Eigenfaces
![Page 31: 1 Introduction to Kernel Principal Component Analysis(PCA) Mohammed Nasser Dept. of Statistics, RU,Bangladesh Email: mnasser.ru@gmail.com.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f395503460f94c55ec0/html5/thumbnails/31.jpg)
Principal Component Analysis (PCA)
• Face Recognition Using Eigenfaces – cont.
– The distance $e_r$ is called the distance within the face space (difs).
– Comment: we can use the common Euclidean distance to compute $e_r$; however, it has been reported that the Mahalanobis distance performs better:
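The formula for the Mahalanobis distance is missing from this transcript. The sketch below assumes the form often used with eigenfaces, in which each squared coefficient difference is divided by the corresponding eigenvalue. That form, and the names `euclidean_sq` and `mahalanobis_sq`, are assumptions for illustration rather than the slide's exact expression.

```python
import numpy as np

def euclidean_sq(w, w_ref):
    """Squared Euclidean distance between two coefficient vectors in face space."""
    w, w_ref = np.asarray(w, dtype=float), np.asarray(w_ref, dtype=float)
    return float(np.sum((w - w_ref) ** 2))

def mahalanobis_sq(w, w_ref, eigvals):
    """Squared Mahalanobis distance, assuming a diagonal covariance diag(eigvals)."""
    w, w_ref = np.asarray(w, dtype=float), np.asarray(w_ref, dtype=float)
    return float(np.sum((w - w_ref) ** 2 / np.asarray(eigvals, dtype=float)))

w = [3.0, 1.0]          # projection coefficients of a test face (made-up numbers)
w_ref = [0.0, 0.0]      # coefficients of a stored face
eigvals = [9.0, 1.0]    # variances along the two eigenfaces

d_euc = euclidean_sq(w, w_ref)              # 9 + 1 = 10
d_mah = mahalanobis_sq(w, w_ref, eigvals)   # 9/9 + 1/1 = 2
```

Dividing by the eigenvalues stops high-variance directions from dominating the comparison.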
![Page 32: 1 Introduction to Kernel Principal Component Analysis(PCA) Mohammed Nasser Dept. of Statistics, RU,Bangladesh Email: mnasser.ru@gmail.com.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f395503460f94c55ec0/html5/thumbnails/32.jpg)
Principal Component Analysis (PCA)
• Face Detection Using Eigenfaces
![Page 33: 1 Introduction to Kernel Principal Component Analysis(PCA) Mohammed Nasser Dept. of Statistics, RU,Bangladesh Email: mnasser.ru@gmail.com.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f395503460f94c55ec0/html5/thumbnails/33.jpg)
Principal Component Analysis (PCA)
• Face Detection Using Eigenfaces – cont.
![Page 34: 1 Introduction to Kernel Principal Component Analysis(PCA) Mohammed Nasser Dept. of Statistics, RU,Bangladesh Email: mnasser.ru@gmail.com.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f395503460f94c55ec0/html5/thumbnails/34.jpg)
Principal Components Analysis
So, principal components are given by:
b1 = u11x1 + u12x2 + ... + u1NxN
b2 = u21x1 + u22x2 + ... + u2NxN
...
bN = uN1x1 + uN2x2 + ... + uNNxN
xj’s are standardized if correlation matrix is used (mean 0.0, SD 1.0)
Score of ith unit on jth principal component
bi,j = uj1xi1 + uj2xi2 + ... + ujNxiN
![Page 35: 1 Introduction to Kernel Principal Component Analysis(PCA) Mohammed Nasser Dept. of Statistics, RU,Bangladesh Email: mnasser.ru@gmail.com.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f395503460f94c55ec0/html5/thumbnails/35.jpg)
PCA Scores
[Figure: scatter plot of xi2 against xi1, showing the scores bi,1 and bi,2 of a unit along the two principal axes]
![Page 36: 1 Introduction to Kernel Principal Component Analysis(PCA) Mohammed Nasser Dept. of Statistics, RU,Bangladesh Email: mnasser.ru@gmail.com.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f395503460f94c55ec0/html5/thumbnails/36.jpg)
Principal Components Analysis
Amount of variance accounted for by:
1st principal component: λ1, the 1st eigenvalue
2nd principal component: λ2, the 2nd eigenvalue
...
λ1 > λ2 > λ3 > λ4 > ...
Average λj = 1 (correlation matrix)
![Page 37: 1 Introduction to Kernel Principal Component Analysis(PCA) Mohammed Nasser Dept. of Statistics, RU,Bangladesh Email: mnasser.ru@gmail.com.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f395503460f94c55ec0/html5/thumbnails/37.jpg)
Principal Components Analysis: Eigenvalues
[Figure: data scatter with the principal axis U1 and the eigenvalues λ1, λ2 marked]
![Page 38: 1 Introduction to Kernel Principal Component Analysis(PCA) Mohammed Nasser Dept. of Statistics, RU,Bangladesh Email: mnasser.ru@gmail.com.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f395503460f94c55ec0/html5/thumbnails/38.jpg)
PCA: Terminology
• jth principal component is jth eigenvector of correlation/covariance matrix
• coefficients, ujk, are elements of eigenvectors and relate original variables (standardized if using correlation matrix) to components
• scores are values of units on components (produced using coefficients)
• amount of variance accounted for by component is given by eigenvalue, λj
• proportion of variance accounted for by component is given by λj / Σ λj
• loading of kth original variable on jth component is given by ujk√λj, the correlation between variable and component
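These terms can be made concrete with a small NumPy sketch. It assumes correlation-matrix PCA on synthetic data with samples as rows; the array names (`scores`, `proportion`, `loadings`) are chosen here for illustration. In the code, `U[k, j]` plays the role of ujk.

```python
import numpy as np

rng = np.random.default_rng(3)
mix = np.array([[2.0, 0.5, 0.0],
                [0.0, 1.0, 0.3],
                [0.0, 0.0, 0.5]])
X = rng.normal(size=(500, 3)) @ mix                 # synthetic correlated variables

Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)    # standardized variables
R = np.corrcoef(X, rowvar=False)                    # correlation matrix

lam, U = np.linalg.eigh(R)                          # ascending eigenvalues
lam, U = lam[::-1], U[:, ::-1]                      # descending; columns are eigenvectors

scores = Z @ U                                      # scores[i, j]: unit i on component j
proportion = lam / lam.sum()                        # proportion of variance per component
loadings = U * np.sqrt(lam)                         # loadings[k, j] = u_jk * sqrt(lambda_j)

# Empirical correlation between variable k and component j, for comparison.
corr = np.array([[np.corrcoef(Z[:, k], scores[:, j])[0, 1] for j in range(3)]
                 for k in range(3)])
```

Comparing `loadings` with `corr` confirms the last bullet: for correlation-matrix PCA the loading is the correlation between the variable and the component.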
![Page 39: 1 Introduction to Kernel Principal Component Analysis(PCA) Mohammed Nasser Dept. of Statistics, RU,Bangladesh Email: mnasser.ru@gmail.com.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f395503460f94c55ec0/html5/thumbnails/39.jpg)
Principal Components Analysis
• Covariance Matrix:
– Variables must be in same units
– Emphasizes variables with most variance
– Mean eigenvalue ≠1.0
– Useful in morphometrics, a few other cases
• Correlation Matrix:
– Variables are standardized (mean 0.0, SD 1.0)
– Variables can be in different units
– All variables have same impact on analysis
– Mean eigenvalue = 1.0
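The contrast between the two matrices can be seen on synthetic data in which one variable is measured in much larger units. This is an illustrative sketch under that assumption, not an analysis from the slides.

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(1000, 3))      # three independent variables
X[:, 0] *= 1000.0                   # first variable expressed in much larger units

lam_cov, U_cov = np.linalg.eigh(np.cov(X, rowvar=False))       # covariance-matrix PCA
lam_cor, U_cor = np.linalg.eigh(np.corrcoef(X, rowvar=False))  # correlation-matrix PCA

first_cov = U_cov[:, -1]            # leading eigenvector (eigh sorts ascending)
first_cor = U_cor[:, -1]

print(np.abs(first_cov))            # dominated by the large-unit variable
print(lam_cor.mean())               # trace of a correlation matrix is N, so the mean is 1
```

With the covariance matrix the first component is almost entirely the large-unit variable; with the correlation matrix no variable is favoured by its units.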
![Page 40: 1 Introduction to Kernel Principal Component Analysis(PCA) Mohammed Nasser Dept. of Statistics, RU,Bangladesh Email: mnasser.ru@gmail.com.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f395503460f94c55ec0/html5/thumbnails/40.jpg)
PCA: Potential Problems
• Lack of Independence
– NO PROBLEM
• Lack of Normality
– Normality desirable but not essential
• Lack of Precision
– Precision desirable but not essential
• Many Zeroes in Data Matrix
– Problem (use Correspondence Analysis)
![Page 41: 1 Introduction to Kernel Principal Component Analysis(PCA) Mohammed Nasser Dept. of Statistics, RU,Bangladesh Email: mnasser.ru@gmail.com.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f395503460f94c55ec0/html5/thumbnails/41.jpg)
Principal Component Analysis (PCA)
• PCA and classification (cont’d)
![Page 42: 1 Introduction to Kernel Principal Component Analysis(PCA) Mohammed Nasser Dept. of Statistics, RU,Bangladesh Email: mnasser.ru@gmail.com.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f395503460f94c55ec0/html5/thumbnails/42.jpg)
Motivation
[Figure: scatter plot of v against z]
![Page 43: 1 Introduction to Kernel Principal Component Analysis(PCA) Mohammed Nasser Dept. of Statistics, RU,Bangladesh Email: mnasser.ru@gmail.com.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f395503460f94c55ec0/html5/thumbnails/43.jpg)
Motivation
[Figure: scatter plot of u against z, annotated "???????"]
![Page 44: 1 Introduction to Kernel Principal Component Analysis(PCA) Mohammed Nasser Dept. of Statistics, RU,Bangladesh Email: mnasser.ru@gmail.com.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f395503460f94c55ec0/html5/thumbnails/44.jpg)
Motivation
Linear projections will not detect the pattern.
![Page 45: 1 Introduction to Kernel Principal Component Analysis(PCA) Mohammed Nasser Dept. of Statistics, RU,Bangladesh Email: mnasser.ru@gmail.com.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f395503460f94c55ec0/html5/thumbnails/45.jpg)
Limitations of linear PCA
$\lambda_{1,2,3} = 1/3$
![Page 46: 1 Introduction to Kernel Principal Component Analysis(PCA) Mohammed Nasser Dept. of Statistics, RU,Bangladesh Email: mnasser.ru@gmail.com.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f395503460f94c55ec0/html5/thumbnails/46.jpg)
Nonlinear PCA
Three popular methods are available:
1) Neural-network based PCA (E. Oja, 1982)
2) Method of Principal Curves (T.J. Hastie and W. Stuetzle, 1989)
3) Kernel based PCA (B. Schölkopf, A. Smola, and K. Müller, 1998)
![Page 47: 1 Introduction to Kernel Principal Component Analysis(PCA) Mohammed Nasser Dept. of Statistics, RU,Bangladesh Email: mnasser.ru@gmail.com.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f395503460f94c55ec0/html5/thumbnails/47.jpg)
PCA
NPCA
![Page 48: 1 Introduction to Kernel Principal Component Analysis(PCA) Mohammed Nasser Dept. of Statistics, RU,Bangladesh Email: mnasser.ru@gmail.com.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f395503460f94c55ec0/html5/thumbnails/48.jpg)
Kernel PCA: The main idea
![Page 49: 1 Introduction to Kernel Principal Component Analysis(PCA) Mohammed Nasser Dept. of Statistics, RU,Bangladesh Email: mnasser.ru@gmail.com.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f395503460f94c55ec0/html5/thumbnails/49.jpg)
A Useful Theorem for Hilbert space
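Let $\mathcal{H}$ be a Hilbert space and $x_1, \ldots, x_n \in \mathcal{H}$. Let $M = \mathrm{span}\{x_1, \ldots, x_n\}$, and let $u, v \in M$.
If $\langle x_i, u \rangle = \langle x_i, v \rangle$ for $i = 1, \ldots, n$, then $u = v$.
Proof.
Try it yourself.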
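For readers who want to check their attempt, here is one possible argument. The symbols $\mathcal{H}$ (the Hilbert space) and $M$ (the span of $x_1, \ldots, x_n$) are notation assumed here, since the original symbols were lost in the transcript, and the space is taken to be real for simplicity.

```latex
\begin{proof}[Sketch]
Let $w = u - v$. Since $u, v \in M = \mathrm{span}\{x_1, \ldots, x_n\}$, we have
$w \in M$, so $w = \sum_{i=1}^{n} c_i x_i$ for some scalars $c_i$.
By hypothesis, $\langle x_i, w \rangle = \langle x_i, u \rangle - \langle x_i, v \rangle = 0$
for every $i$. Therefore
\[
  \langle w, w \rangle
  = \Big\langle \sum_{i=1}^{n} c_i x_i,\; w \Big\rangle
  = \sum_{i=1}^{n} c_i \langle x_i, w \rangle
  = 0 ,
\]
which forces $w = 0$, that is, $u = v$.
\end{proof}
```

The assumption $u, v \in M$ is essential: without it, $u - v$ could be a nonzero vector orthogonal to every $x_i$.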
![Page 50: 1 Introduction to Kernel Principal Component Analysis(PCA) Mohammed Nasser Dept. of Statistics, RU,Bangladesh Email: mnasser.ru@gmail.com.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f395503460f94c55ec0/html5/thumbnails/50.jpg)
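Kernel Methods in PCA
Linear PCA: $C w = \lambda w$   (1)
where C is the covariance matrix for centered data X:
$C = \frac{1}{l} \sum_{i=1}^{l} x_i x_i'$
$\lambda w = C w = \frac{1}{l} \sum_{i=1}^{l} \langle x_i, w \rangle x_i$
$\Rightarrow w \in \mathrm{span}\{x_1, \ldots, x_l\}$ if $\lambda \neq 0$
$\lambda \langle x_i, w \rangle = \langle x_i, C w \rangle, \quad i = 1, \ldots, l$   (2)
(1) and (2) are equivalent conditions.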
![Page 51: 1 Introduction to Kernel Principal Component Analysis(PCA) Mohammed Nasser Dept. of Statistics, RU,Bangladesh Email: mnasser.ru@gmail.com.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f395503460f94c55ec0/html5/thumbnails/51.jpg)
Kernel Methods in PCA
In Kernel PCA, we do the PCA in feature space.
Now let us suppose:
$\Phi: \mathbb{R}^p \to F$, the feature space
Possibly F is a very high-dimensional space.
$C = \frac{1}{l} \sum_{i=1}^{l} \Phi(x_i) \Phi(x_i)^T$ (what is its meaning??)
Remember about centering!
$\lambda v = C v = \frac{1}{l} \sum_{i=1}^{l} \langle \Phi(x_i), v \rangle \Phi(x_i)$   (*)
![Page 52: 1 Introduction to Kernel Principal Component Analysis(PCA) Mohammed Nasser Dept. of Statistics, RU,Bangladesh Email: mnasser.ru@gmail.com.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f395503460f94c55ec0/html5/thumbnails/52.jpg)
Kernel Methods in PCA
Again, all solutions v with $\lambda \neq 0$ lie in the space generated by $\{\Phi(x_1), \ldots, \Phi(x_l)\}$.
It has two useful consequences:
1) $v \in \mathrm{span}\{\Phi(x_1), \ldots, \Phi(x_l)\}$, so $v = \sum_{i=1}^{l} \alpha_i \Phi(x_i)$
2) We may instead solve the set of equations
$\lambda \langle \Phi(x_i), v \rangle = \langle \Phi(x_i), C v \rangle, \quad i = 1, \ldots, l$
![Page 53: 1 Introduction to Kernel Principal Component Analysis(PCA) Mohammed Nasser Dept. of Statistics, RU,Bangladesh Email: mnasser.ru@gmail.com.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f395503460f94c55ec0/html5/thumbnails/53.jpg)
Kernel Methods in PCA
Defining an l x l kernel matrix K:
$K_{ij} = k(x_i, x_j) = \langle \Phi(x_i), \Phi(x_j) \rangle$
And using the result (1) in (2) we get
$l \lambda K \alpha = K^2 \alpha$   (3)
But we need not solve (3). It can be shown easily that the following simpler system gives us solutions that are interesting to us.
$l \lambda \alpha = K \alpha$   (4)
![Page 54: 1 Introduction to Kernel Principal Component Analysis(PCA) Mohammed Nasser Dept. of Statistics, RU,Bangladesh Email: mnasser.ru@gmail.com.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f395503460f94c55ec0/html5/thumbnails/54.jpg)
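Kernel Methods in PCA
Compute the eigenvalue problem for the kernel matrix:
$K \alpha = \lambda \alpha$
(here $\lambda$ denotes an eigenvalue of K, which is l times the $\lambda$ appearing in (4)).
The solutions $(\lambda_k, \alpha^k)$ further need to be normalized by imposing $\lambda_k \langle \alpha^k, \alpha^k \rangle = 1$, since $v^k$ should satisfy $\langle v^k, v^k \rangle = 1$.
If x is our new observation, the feature value (??) will be $\Phi(x)$ and the kth principal score will be
$\langle v^k, \Phi(x) \rangle = \sum_{i=1}^{l} \alpha_i^k \langle \Phi(x_i), \Phi(x) \rangle = \sum_{i=1}^{l} \alpha_i^k \, k(x_i, x)$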
![Page 55: 1 Introduction to Kernel Principal Component Analysis(PCA) Mohammed Nasser Dept. of Statistics, RU,Bangladesh Email: mnasser.ru@gmail.com.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f395503460f94c55ec0/html5/thumbnails/55.jpg)
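Kernel Methods in PCA
Data centering:
$\Phi_S = \frac{1}{l} \sum_{i=1}^{l} \Phi(x_i)$
$\hat{\Phi}(x) = \Phi(x) - \Phi_S$
Hence, the kernel for the transformed space is
$\hat{k}(x, z) = \langle \hat{\Phi}(x), \hat{\Phi}(z) \rangle = \left\langle \Phi(x) - \frac{1}{l} \sum_{i=1}^{l} \Phi(x_i),\; \Phi(z) - \frac{1}{l} \sum_{i=1}^{l} \Phi(x_i) \right\rangle$
$= k(x, z) - \frac{1}{l} \sum_{i=1}^{l} k(x, x_i) - \frac{1}{l} \sum_{i=1}^{l} k(z, x_i) + \frac{1}{l^2} \sum_{i,j=1}^{l} k(x_i, x_j)$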
![Page 56: 1 Introduction to Kernel Principal Component Analysis(PCA) Mohammed Nasser Dept. of Statistics, RU,Bangladesh Email: mnasser.ru@gmail.com.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f395503460f94c55ec0/html5/thumbnails/56.jpg)
Kernel Methods in PCA
Expressed as an operation on the kernel matrix this can be rewritten as
$\hat{K} = K - \frac{1}{l} j j' K - \frac{1}{l} K j j' + \frac{1}{l^2} (j' K j) \, j j'$
where j is the all 1s vector.
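The matrix form of the centering step can be checked numerically. The sketch below uses a linear kernel, for which the feature map is the identity and the centered kernel matrix must equal the Gram matrix of the explicitly centered data. The function name `center_kernel` and the synthetic data are assumptions for illustration.

```python
import numpy as np

def center_kernel(K):
    """K_hat = K - (1/l) j j' K - (1/l) K j j' + (1/l^2) (j' K j) j j'."""
    l = K.shape[0]
    j = np.ones((l, 1))                        # the all-ones column vector
    jj = j @ j.T                               # l x l matrix of ones
    total = (j.T @ K @ j).item()               # scalar j' K j
    return K - (jj @ K) / l - (K @ jj) / l + total * jj / l**2

rng = np.random.default_rng(4)
X = rng.normal(size=(8, 3)) + 5.0              # deliberately non-centred data, rows are samples
K = X @ X.T                                    # linear kernel: k(x, z) = <x, z>
K_hat = center_kernel(K)

Xc = X - X.mean(axis=0)                        # explicit centring in input space
matches = np.allclose(K_hat, Xc @ Xc.T)        # should agree for the linear kernel
```

For a nonlinear kernel the feature map is not available explicitly, which is why the centering has to be done on K itself.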
![Page 57: 1 Introduction to Kernel Principal Component Analysis(PCA) Mohammed Nasser Dept. of Statistics, RU,Bangladesh Email: mnasser.ru@gmail.com.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f395503460f94c55ec0/html5/thumbnails/57.jpg)
Linear PCA
Kernel PCA captures the nonlinear structure of the data
![Page 58: 1 Introduction to Kernel Principal Component Analysis(PCA) Mohammed Nasser Dept. of Statistics, RU,Bangladesh Email: mnasser.ru@gmail.com.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f395503460f94c55ec0/html5/thumbnails/58.jpg)
Linear PCA
Kernel PCA captures the nonlinear structure of the data
![Page 59: 1 Introduction to Kernel Principal Component Analysis(PCA) Mohammed Nasser Dept. of Statistics, RU,Bangladesh Email: mnasser.ru@gmail.com.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f395503460f94c55ec0/html5/thumbnails/59.jpg)
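Algorithm
Input: Data X = {x1, x2, …, xl} in n-dimensional space.
Process:
$K_{ij} = k(x_i, x_j); \quad i, j = 1, \ldots, l$   (kernel matrix)
$\hat{K} = K - \frac{1}{l} j j' K - \frac{1}{l} K j j' + \frac{1}{l^2} (j' K j) \, j j'$   (kernel matrix for centered data)
$[V, \Lambda] = \mathrm{eig}(\hat{K})$
$\alpha^{(j)} = \frac{1}{\sqrt{\lambda_j}} v_j, \quad j = 1, \ldots, k$
$\tilde{x}_j = \sum_{i=1}^{l} \alpha_i^{(j)} k(x_i, x), \quad j = 1, \ldots, k$   (k-dimensional vector projection of new data into this subspace)
Output: Transformed data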
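A compact NumPy sketch of the algorithm follows. It is an illustration under stated assumptions, not the author's implementation: the function names, the dictionary used as a model, the RBF kernel with parameter `gamma`, and the centering of the kernel values of new points (done consistently with the training centering) are all choices made here. With a linear kernel the scores should match ordinary PCA scores up to sign, which the demo checks.

```python
import numpy as np

def linear_kernel(A, B):
    return A @ B.T

def rbf_kernel(A, B, gamma=1.0):
    """k(a, b) = exp(-gamma * ||a - b||^2), evaluated for all row pairs."""
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def kernel_pca_fit(X, kernel, k):
    """Fit kernel PCA on rows of X; k must not exceed the number of positive eigenvalues."""
    l = X.shape[0]
    K = kernel(X, X)                                   # kernel matrix
    J = np.ones((l, l)) / l
    K_hat = K - J @ K - K @ J + J @ K @ J              # kernel matrix for centred data
    lam, V = np.linalg.eigh(K_hat)                     # ascending eigenvalues
    lam, V = lam[::-1][:k], V[:, ::-1][:, :k]          # keep the k largest
    alpha = V / np.sqrt(lam)                           # so that lam_j * <alpha_j, alpha_j> = 1
    return {"X": X, "K": K, "alpha": alpha, "lam": lam, "kernel": kernel}

def kernel_pca_transform(model, Z):
    """Project rows of Z onto the k kernel principal components."""
    K = model["K"]
    Kz = model["kernel"](Z, model["X"])                # shape (m, l)
    Kz_hat = (Kz - Kz.mean(axis=1, keepdims=True)      # centre new kernel values
              - K.mean(axis=0, keepdims=True) + K.mean())
    return Kz_hat @ model["alpha"]

rng = np.random.default_rng(5)
X = rng.normal(size=(20, 3))

# Sanity check: the linear kernel reproduces ordinary PCA scores up to sign.
model_lin = kernel_pca_fit(X, linear_kernel, k=2)
scores_kpca = kernel_pca_transform(model_lin, X)
Xc = X - X.mean(axis=0)
_, G = np.linalg.eigh(Xc.T @ Xc)
scores_pca = Xc @ G[:, ::-1][:, :2]

# Nonlinear example with an RBF kernel (gamma chosen arbitrarily).
model_rbf = kernel_pca_fit(X, lambda A, B: rbf_kernel(A, B, gamma=0.5), k=2)
scores_rbf = kernel_pca_transform(model_rbf, X)
```

The eigen-decomposition is of an l x l matrix, so the cost grows with the number of samples rather than with the dimension of the feature space.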
![Page 60: 1 Introduction to Kernel Principal Component Analysis(PCA) Mohammed Nasser Dept. of Statistics, RU,Bangladesh Email: mnasser.ru@gmail.com.](https://reader035.fdocuments.us/reader035/viewer/2022062314/56649f395503460f94c55ec0/html5/thumbnails/60.jpg)
Reference
• I.T. Jolliffe (2002). Principal Component Analysis.
• B. Schölkopf, et al. (1998). Kernel Principal Component Analysis.
• B. Schölkopf and A.J. Smola (2000/2001/2002). Learning with Kernels.
• Christopher J.C. Burges (2005). Geometric Methods for Feature Extraction and Dimensional Reduction.