Transcript of Lecture 4
Data mining and statistical learning, lecture 4
Outline
Regression on a large number of correlated inputs
A few comments about shrinkage methods, such as ridge regression
Methods using derived input directions
  Principal components regression (PCR)
  Partial least squares regression (PLS)
Partitioning of the expected squared prediction error
E(y_j − ŷ_j)² = (E(y_j) − E(ŷ_j))² + Var(ŷ_j)

The first term on the right-hand side is the squared bias.

Shrinkage decreases the variance but increases the bias.

Shrinkage methods are more robust to structural changes in the analysed data.
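The trade-off can be illustrated with a small simulation. The sketch below is my own illustration (not from the lecture; assumes numpy): over repeated samples from a fixed design, the ridge estimates vary less than OLS but their mean is further from the true coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, lam, n_rep = 50, 5, 10.0, 200
beta_true = np.ones(p)
X = rng.normal(size=(n, p))          # fixed design across replicates

est_ols, est_ridge = [], []
for _ in range(n_rep):
    y = X @ beta_true + rng.normal(size=n)
    est_ols.append(np.linalg.solve(X.T @ X, X.T @ y))
    est_ridge.append(np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y))

# variance: spread of the estimates; bias: distance of their mean from the truth
var_ols = np.mean(np.var(est_ols, axis=0))
var_ridge = np.mean(np.var(est_ridge, axis=0))
bias_ols = np.mean(np.abs(np.mean(est_ols, axis=0) - beta_true))
bias_ridge = np.mean(np.abs(np.mean(est_ridge, axis=0) - beta_true))
```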
Advantages of ridge regression over OLS
The models are easier to comprehend because strongly correlated inputs tend to get similar regression coefficients
Generalizations to new data sets are facilitated by a larger robustness to structural changes in the analysed data set
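A minimal sketch of the first point (toy data of my own; assumes numpy): when two inputs are nearly identical, ridge gives them similar coefficients, while OLS may split the shared effect between them almost arbitrarily.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)        # x2 almost identical to x1
X = np.column_stack([x1, x2])
y = x1 + x2 + 0.1 * rng.normal(size=n)

b_ols = np.linalg.solve(X.T @ X, X.T @ y)                 # ordinary least squares
b_ridge = np.linalg.solve(X.T @ X + np.eye(2), X.T @ y)   # ridge, lambda = 1
```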
Ridge regression
- a note on standardization
The principal components and the shrinkage in ridge regression are scale-dependent.
Inputs are normally standardized to mean zero and variance one prior to the regression
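The standardization step itself is just centering and scaling; a sketch with made-up numbers (assumes numpy):

```python
import numpy as np

X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])

# subtract the column means, divide by the column standard deviations
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
```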
Regression methods using derived input directions
Extract linear combinations of the inputs as derived features, and then model the target (response) as a linear function of these features
Z_m = α_{0m} + α_m^T X,   m = 1, …, M

[Diagram: inputs x_1, x_2, …, x_p → derived features z_1, z_2, …, z_M → response y]

y = β_0 + β^T Z
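As a sketch of the two-stage idea (the directions here are chosen arbitrarily just for illustration; assumes numpy): project the inputs onto M directions, then regress the response on the projections.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, M = 40, 6, 2
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + 0.1 * rng.normal(size=n)

A = rng.normal(size=(p, M))              # columns play the role of alpha_m
Z = X @ A                                # derived inputs z_1, ..., z_M
Z1 = np.column_stack([np.ones(n), Z])    # intercept column for beta_0
beta = np.linalg.lstsq(Z1, y, rcond=None)[0]
y_hat = Z1 @ beta                        # fitted values from the derived inputs
```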
Absorbance records for ten samples of chopped meat
[Figure: absorbance vs. channel (1–100) for Sample_1 through Sample_10]
1 response variable (fat)
100 predictors (absorbance at 100 wavelengths or channels)
The predictors are strongly correlated with each other
Absorbance records for ten samples of chopped meat
[Figure: absorbance vs. channel (1–100) for ten samples (Sample_12, 43, 44, 45, 48, 133, 145, 176, 186, 215); the high-fat and low-fat samples form separate groups of curves]
3-D plots of absorbance records for samples of meat
- channels 1, 50 and 100
[Figure: 3D scatterplot of Channel1 vs Channel50 vs Channel100]
3-D plots of absorbance records for samples of meat
- channels 40, 50 and 60
[Figure: 3D scatterplot of Channel60 vs Channel50 vs Channel40]
3-D plot of absorbance records for samples of meat
- channels 49, 50 and 51
[Figure: 3D scatterplot of Channel49 vs Channel50 vs Channel51]
Matrix plot of absorbance records for samples of meat
- channels 1, 50 and 100
[Figure: matrix plot of Channel1, Channel50, Channel100]
Principal Component Analysis (PCA)
• PCA is a technique for reducing the complexity of high dimensional data
• It can be used to approximate high dimensional data with a few dimensions so that important features can be visually examined
Principal Component Analysis - two inputs

[Figure: scatterplot of X1 vs X2 with the principal component directions PC1 and PC2]
3-D plot of artificially generated data
- three inputs
[Figure: 3D plot of z vs y, x with the principal component directions PC1 and PC2]
Principal Component Analysis
The first principal component (PC1) is the direction that maximizes the variance of the projected data
The second principal component (PC2) is the direction that maximizes the variance of the projected data after the variation along PC1 has been removed
The third principal component (PC3) is the direction that maximizes the variance of the projected data after the variation along PC1 and PC2 has been removed
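In practice the principal component directions are obtained as the eigenvectors of the sample covariance matrix, ordered by decreasing eigenvalue; the variance of the data projected onto each direction equals the corresponding eigenvalue. A numpy-only sketch with made-up data:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 3)) @ np.diag([3.0, 1.0, 0.3])  # unequal spread
Xc = X - X.mean(axis=0)                   # center the inputs

S = np.cov(Xc, rowvar=False)              # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(S)      # eigh returns ascending order
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

scores = Xc @ eigvecs                     # coordinates along PC1, PC2, PC3
```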
Eigenvector and eigenvalue
In this shear transformation of the Mona Lisa, the picture was deformed in such a way that its central vertical axis (red vector) was not modified, but the diagonal vector (blue) changed direction. Hence the red vector is an eigenvector of the transformation and the blue vector is not. Since the red vector was neither stretched nor compressed, its eigenvalue is 1.
Sample covariance matrix
S = ( s_11 … s_1m )
    (  ⋮   ⋱   ⋮ )
    ( s_m1 … s_mm )

where

s_ik = (1/(n − 1)) Σ_{j=1}^{n} (x_ij − x̄_i)(x_kj − x̄_k),   i, k = 1, …, m
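The element-wise formula can be checked against numpy's built-in covariance (toy data; rows are variables, matching numpy's default convention):

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 3, 10
X = rng.normal(size=(m, n))               # m variables, n observations

xbar = X.mean(axis=1, keepdims=True)
S_manual = (X - xbar) @ (X - xbar).T / (n - 1)   # element-wise sample covariances
S_numpy = np.cov(X)                       # numpy uses the same 1/(n-1) convention
```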
Eigenvectors of covariance and correlation matrices
The eigenvectors of a covariance matrix provide information about the major orthogonal directions of the variation in the inputs
The eigenvalues provide information about the strength of the variation along the different eigenvectors
The eigenvectors and eigenvalues of the correlation matrix provide scale-independent information about the variation of the inputs
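The scale-independence is easy to verify: rescaling an input (e.g. changing its units) leaves the correlation matrix, and hence its eigenvectors and eigenvalues, unchanged. A sketch with made-up data (assumes numpy):

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(100, 2))
X_scaled = X * np.array([1.0, 1000.0])    # change the units of input 2

corr_before = np.corrcoef(X, rowvar=False)
corr_after = np.corrcoef(X_scaled, rowvar=False)
# the covariance matrix, by contrast, changes drastically under this rescaling
```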
Principal Component Analysis
Eigenanalysis of the Covariance Matrix
Eigenvalue 2.8162 0.3835
Proportion 0.880 0.120
Cumulative 0.880 1.000
Variable PC1 PC2
X1 0.523 0.852
X2 0.852 -0.523
[Figure: scatterplot of X1 vs X2 with the PC1 and PC2 loadings]
Principal Component Analysis
[Figure: score plot of X1-X2 (second component vs first component)]
Coordinates in the coordinate system determined by the principal components
Principal Component Analysis
Eigenanalysis of the Covariance Matrix
Eigenvalue   1.6502   0.7456   0.0075
Proportion    0.687    0.310    0.003
Cumulative    0.687    0.997    1.000

Variable    PC1      PC2      PC3
x           0.887    0.218   -0.407
y           0.034   -0.909   -0.414
z           0.460   -0.354    0.814
[Figure: 3D plot of z vs y, x]
Scree plot
[Figure: scree plot of x, y, z (eigenvalue vs component number)]
Principal Component Analysis - absorbance data from samples of chopped meat
Eigenanalysis of the Covariance Matrix

Eigenvalue   26.127  0.239  0.078  0.030  0.002  0.001  0.000  0.000  0.000
Proportion    0.987  0.009  0.003  0.001  0.000  0.000  0.000  0.000  0.000
Cumulative    0.987  0.996  0.999  1.000  1.000  1.000  1.000  1.000  1.000
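The Proportion and Cumulative rows follow directly from the eigenvalues; a sketch using the first six eigenvalues of the absorbance data (assumes numpy):

```python
import numpy as np

eigenvalues = np.array([26.127, 0.239, 0.078, 0.030, 0.002, 0.001])
proportion = eigenvalues / eigenvalues.sum()   # share of the total variance
cumulative = np.cumsum(proportion)             # running total of the shares
```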
Scree plot - absorbance data

[Figure: scree plot of Channel1, ..., Channel100 (eigenvalue vs component number)]
One direction is responsible for most of the variation in the inputs
Loadings of PC1, PC2 and PC3 - absorbance data

[Figure: loadings of PC1, PC2 and PC3 plotted against the 100 channels]
The loadings define derived inputs (linear combinations of the inputs)
Software recommendations
Minitab 15: Stat > Multivariate > Principal Components
SAS Enterprise Miner: Princomp / Dmneural
Regression methods using derived input directions
- Partial Least Squares Regression
Extract linear combinations of the inputs as derived features, and then model the target (response) as a linear function of these features
[Diagram: inputs x_1, x_2, …, x_p → derived features z_1, z_2, …, z_M → response y]
Select the intermediates so that the covariance with the response variable is maximized
Normally, the inputs are standardized to mean zero and variance one prior to the PLS analysis
Partial least squares regression (PLS)
Step 1: Standardize inputs to mean zero and variance one
Step 2: Compute the first derived input by setting

z_1 = Σ_{j=1}^{p} φ̂_{1j} x_j

where the φ̂_{1j} are standardized univariate regression coefficients of the response vs each of the inputs

Repeat:
Remove the variation in the inputs along the directions determined by the existing z-vectors
Compute another derived input
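A numpy sketch of these steps on made-up data (my own illustration; the deflation step removes the component of each input along z_1 by least-squares projection):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 60, 4
X = rng.normal(size=(n, p))
y = X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=n)

# Step 1: standardize inputs to mean zero and variance one
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
yc = y - y.mean()

# Step 2: univariate regression coefficients of y on each input, then z1
phi1 = Xs.T @ yc                          # proportional to the univariate slopes
z1 = Xs @ phi1                            # first derived input

# Repeat: remove the variation in the inputs along the direction of z1
X_deflated = Xs - np.outer(z1, (z1 @ Xs)) / (z1 @ z1)
```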
Methods using derived input directions
Principal components regression (PCR): the derived directions are determined by the X-matrix alone, and are orthogonal.

Partial least squares regression (PLS): the derived directions are determined by the covariance of the output and linear combinations of the inputs, and are orthogonal.
PLS in SAS
The following statements are available in PROC PLS. Items within the brackets < > are optional.
PROC PLS < options > ;
BY variables ;
CLASS variables < / option > ;
MODEL dependent-variables = effects < / options > ;
OUTPUT OUT= SAS-data-set < options > ;
To analyze a data set, you must use the PROC PLS and MODEL statements. You can use the other statements as needed.
proc PLS in SAS
proc pls data=mining.tecatorscores method=pls nfac=10;
   model fat = channel1-channel100;
   output out=tecatorpls predicted=predpls;
run;

proc pls data=mining.tecatorscores method=pcr nfac=10;
   model fat = channel1-channel100;
   output out=tecatorpcr predicted=predpcr;
run;