Pattern Recognition for the Natural Sciences Explorative Data Analysis Principal Component Analysis...
![Page 1: Pattern Recognition for the Natural Sciences Explorative Data Analysis Principal Component Analysis (PCA) Lutgarde Buydens, IMM, Analytical Chemistry.](https://reader035.fdocuments.us/reader035/viewer/2022062423/56649c7c5503460f94930d41/html5/thumbnails/1.jpg)
Pattern Recognition for the Natural Sciences
Explorative Data Analysis
Principal Component Analysis (PCA)
Lutgarde Buydens, IMM, Analytical Chemistry
![Page 2](https://reader035.fdocuments.us/reader035/viewer/2022062423/56649c7c5503460f94930d41/html5/thumbnails/2.jpg)
Why Explorative Data Analysis?
Paradigm change in natural sciences
Classical science: hypothesis driven (a question is put to the system)
![Page 3](https://reader035.fdocuments.us/reader035/viewer/2022062423/56649c7c5503460f94930d41/html5/thumbnails/3.jpg)
Why Explorative Data Analysis?
Paradigm change in natural sciences
Classical science: hypothesis driven (a question is put to the system)
Science with advanced technologies: data driven (the system is measured, and questions emerge from explorative analysis of the data)
![Page 4](https://reader035.fdocuments.us/reader035/viewer/2022062423/56649c7c5503460f94930d41/html5/thumbnails/4.jpg)
Explorative Data Analysis
Advanced technology: high-throughput (high-quality) analysis
• NMR, HPLC, GC, MS/MS, immunoassays, hybrids
• Nano/sensor technology
• Genomics (gene expression profiling)
• Proteomics, metabolomics
• Fingerprinting
• Profiling in drug design
Overwhelming amount of data
![Page 5](https://reader035.fdocuments.us/reader035/viewer/2022062423/56649c7c5503460f94930d41/html5/thumbnails/5.jpg)
Explorative Data Analysis
• Visualization (principal component analysis, projections)
• Unsupervised pattern recognition (clustering)
• Supervised pattern recognition (classification)
• Quantitative analysis (correlations, predictions)
![Page 6](https://reader035.fdocuments.us/reader035/viewer/2022062423/56649c7c5503460f94930d41/html5/thumbnails/6.jpg)
Principal Component Analysis: an Example
150 samples of Italian wines from the same region, from 3 different cultivars
Is it possible to characterise the cultivars? Which variables are relevant for which cultivars?
![Page 7](https://reader035.fdocuments.us/reader035/viewer/2022062423/56649c7c5503460f94930d41/html5/thumbnails/7.jpg)
Data Matrix
The data matrix X has n = 150 rows (objects: the wine samples) and p = 13 columns (properties: the variables). Row $x_i$ holds all properties of sample i, column $x_j$ holds one property for all samples, and element $x_{ij}$ is the value of property j for sample i; for example, $x_{75,7}$ is the flavanoid concentration of sample 75.
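A minimal NumPy sketch of this layout. The values are random placeholders (the actual wine measurements are not reproduced in the slides), so only the shape and the indexing convention carry over; note that the slides count rows and columns from 1 while NumPy counts from 0.

```python
import numpy as np

# Placeholder values: the real wine measurements are not reproduced here.
# Shape follows the slides: n = 150 samples (rows, objects) by p = 13 properties (columns, variables).
rng = np.random.default_rng(0)
n, p = 150, 13
X = rng.normal(size=(n, p))

# The slides count from 1, NumPy counts from 0, so x_{75,7} is X[74, 6].
x_75_7 = X[74, 6]   # property 7 (flavanoids on the slides) of sample 75
x_i = X[74, :]      # row x_i: all 13 properties of sample 75
x_j = X[:, 6]       # column x_j: property 7 for all 150 samples

print(X.shape, x_i.shape, x_j.shape)  # (150, 13) (13,) (150,)
```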
![Page 8](https://reader035.fdocuments.us/reader035/viewer/2022062423/56649c7c5503460f94930d41/html5/thumbnails/8.jpg)
Principal Component Analysis
Bar plot of 1 wine sample
![Page 9](https://reader035.fdocuments.us/reader035/viewer/2022062423/56649c7c5503460f94930d41/html5/thumbnails/9.jpg)
Principal Component Analysis
Line plot and bar plot of 1 wine sample
![Page 10](https://reader035.fdocuments.us/reader035/viewer/2022062423/56649c7c5503460f94930d41/html5/thumbnails/10.jpg)
Principal Component Analysis
Line plot and bar plot of 1 wine sample
![Page 11](https://reader035.fdocuments.us/reader035/viewer/2022062423/56649c7c5503460f94930d41/html5/thumbnails/11.jpg)
Principal Component Analysis
Line plot and bar plot of 1 wine sample
![Page 12](https://reader035.fdocuments.us/reader035/viewer/2022062423/56649c7c5503460f94930d41/html5/thumbnails/12.jpg)
Data Matrix Representation
[Figure: data matrix X with n rows (# samples) and p columns (# properties); element $x_{ij}$, row $x_i$, column $x_j$]
![Page 13](https://reader035.fdocuments.us/reader035/viewer/2022062423/56649c7c5503460f94930d41/html5/thumbnails/13.jpg)
Data Matrix Representation
[Figure: each of the 150 samples (rows $x_i$ of the 150 × 13 matrix X) is a point in the p (13)-dimensional variable space $S^{13}$; sample 75 is highlighted]
![Page 14](https://reader035.fdocuments.us/reader035/viewer/2022062423/56649c7c5503460f94930d41/html5/thumbnails/14.jpg)
Data Matrix Representation
[Figure: each of the 150 samples is a point in the p (13)-dimensional variable space $S^{13}$, and each of the 13 variables (columns $x_j$) is a point in the n (150)-dimensional object space $S^{150}$; sample 75 and property 7 (flavanoids) are highlighted]
![Page 15](https://reader035.fdocuments.us/reader035/viewer/2022062423/56649c7c5503460f94930d41/html5/thumbnails/15.jpg)
Explorative Data Analysis
![Page 16](https://reader035.fdocuments.us/reader035/viewer/2022062423/56649c7c5503460f94930d41/html5/thumbnails/16.jpg)
Principal Component Analysis
PCA for visualization: projection onto 2 dimensions
[Figure: the 150 samples, points in the p (13)-dimensional space of variables $S^{13}$, are projected onto an r (2)-dimensional space $S^{2}$ with axes lv1 and lv2; likewise, the 13 variables, points in the n (150)-dimensional space of objects $S^{150}$, are projected onto an r (2)-dimensional space]
![Page 17](https://reader035.fdocuments.us/reader035/viewer/2022062423/56649c7c5503460f94930d41/html5/thumbnails/17.jpg)
Principal Component Analysis
[Figure: 12 samples plotted in the space of 3 variables, $S^{3}$, with axes x1, x2, x3]
![Page 18](https://reader035.fdocuments.us/reader035/viewer/2022062423/56649c7c5503460f94930d41/html5/thumbnails/18.jpg)
Principal Component Analysis
[Figure: 12 samples plotted in the space of 3 variables, $S^{3}$, with axes x1, x2, x3]
![Page 19](https://reader035.fdocuments.us/reader035/viewer/2022062423/56649c7c5503460f94930d41/html5/thumbnails/19.jpg)
Principal Component Analysis
[Figure: the 12 samples in $S^{3}$ (axes x1, x2, x3) with the first principal component, PC1, drawn through the point cloud]
$PC_1 = l_{11}x_1 + l_{12}x_2 + l_{13}x_3$
![Page 20](https://reader035.fdocuments.us/reader035/viewer/2022062423/56649c7c5503460f94930d41/html5/thumbnails/20.jpg)
Principal Component Analysis
[Figure: the 12 samples in $S^{3}$ projected (×) onto PC1]
$PC_1 = l_{11}x_1 + l_{12}x_2 + l_{13}x_3$, with $l_{11}^{2} + l_{12}^{2} + l_{13}^{2} = 1$
Criterion: maximum variance of the projections (×). The unit-length constraint is needed because otherwise the variance could be made arbitrarily large simply by scaling the coefficients.
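The maximum-variance criterion can be checked numerically. This is a sketch on hypothetical toy data (12 samples, 3 variables, generated with NumPy), not the computation behind the slides: it takes $(l_{11}, l_{12}, l_{13})$ as the leading eigenvector of the covariance matrix of the centred data, the standard solution of this constrained maximisation, and then confirms that no random unit-length direction gives a larger variance of the projections.

```python
import numpy as np

# Hypothetical toy data: 12 samples of 3 correlated variables (x1, x2, x3).
rng = np.random.default_rng(1)
mixing = np.array([[3.0, 1.0, 0.5],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 0.2]])
X = rng.normal(size=(12, 3)) @ mixing
Xc = X - X.mean(axis=0)                 # centre each variable

# Coefficients (l11, l12, l13): eigenvector of the covariance matrix with the
# largest eigenvalue. np.linalg.eigh returns eigenvalues in ascending order and
# unit-length eigenvectors, so the last column is the PC1 direction.
C = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(C)
l1 = eigvecs[:, -1]

pc1 = Xc @ l1                           # projections of the 12 samples onto PC1
var_pc1 = pc1.var(ddof=1)               # equals the largest eigenvalue

# No other unit-length direction yields a larger variance of the projections.
for _ in range(1000):
    u = rng.normal(size=3)
    u /= np.linalg.norm(u)
    assert (Xc @ u).var(ddof=1) <= var_pc1 + 1e-9

print(np.isclose(var_pc1, eigvals[-1]))  # True
```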
![Page 21](https://reader035.fdocuments.us/reader035/viewer/2022062423/56649c7c5503460f94930d41/html5/thumbnails/21.jpg)
Principal Component Analysis
$PC_1 = l_{11}x_1 + l_{12}x_2 + l_{13}x_3$
$PC_2 = l_{21}x_1 + l_{22}x_2 + l_{23}x_3$
Criterion: maximum variance of the projections (×); PC2 is orthogonal to PC1 and captures the largest part of the remaining variance.
[Figure: the 12 samples in $S^{3}$ with PC1 and PC2 drawn through the point cloud]
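Continuing on hypothetical toy data, a short NumPy sketch of the second component: the two leading eigenvectors of the covariance matrix give coefficient vectors that are orthogonal, the resulting projections are uncorrelated, and PC1 carries at least as much variance as PC2.

```python
import numpy as np

# Hypothetical toy data: 12 samples, 3 variables with different spreads.
rng = np.random.default_rng(2)
X = rng.normal(size=(12, 3)) * np.array([3.0, 1.5, 0.3])
Xc = X - X.mean(axis=0)

eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]       # sort PCs by decreasing variance
eigvals, L = eigvals[order], eigvecs[:, order]

l1, l2 = L[:, 0], L[:, 1]               # (l11, l12, l13) and (l21, l22, l23)
pc1, pc2 = Xc @ l1, Xc @ l2

print(np.isclose(l1 @ l2, 0.0))                  # True: orthogonal directions
print(np.isclose(np.cov(pc1, pc2)[0, 1], 0.0))   # True: uncorrelated projections
print(pc1.var(ddof=1) >= pc2.var(ddof=1))        # True: PC1 keeps the most variance
```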
![Page 22](https://reader035.fdocuments.us/reader035/viewer/2022062423/56649c7c5503460f94930d41/html5/thumbnails/22.jpg)
Principal Components Space
[Figure: the 12 samples in the 2-dimensional space $S^{2}$ spanned by PC1 and PC2]
![Page 23](https://reader035.fdocuments.us/reader035/viewer/2022062423/56649c7c5503460f94930d41/html5/thumbnails/23.jpg)
Principal Component Analysis
Score plot: the 150 samples, points in the p (13)-dimensional space of variables $S^{13}$, are projected onto the r (2)-dimensional space $S^{2}$ spanned by pc1 and pc2.
![Page 24](https://reader035.fdocuments.us/reader035/viewer/2022062423/56649c7c5503460f94930d41/html5/thumbnails/24.jpg)
Principal Component Analysis
Score plot: the 150 samples projected from $S^{13}$ onto the plane spanned by pc1 and pc2
Wine data: score plot; PC1 explains 38% and PC2 20% of the variance
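A score plot needs only the projections of the centred samples onto the first two PCs and the percentage of variance each PC explains (the numbers in the axis labels). The helper name `pca_scores` is hypothetical, the data are random placeholders with the wine data's shape, and only mean-centering is assumed; the slides do not state which preprocessing produced the 38% / 20% figures.

```python
import numpy as np

def pca_scores(X, r=2):
    """Scores on the first r PCs of an n x p matrix X, plus % variance per PC.

    Sketch only: columns are mean-centred, no further scaling is applied.
    """
    Xc = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs[:, :r]          # n x r: one point per sample in the score plot
    pct = 100 * eigvals / eigvals.sum()   # % of the total variance carried by each PC
    return scores, pct[:r]

# Placeholder data with the slides' shape (150 samples x 13 variables).
rng = np.random.default_rng(3)
X = rng.normal(size=(150, 13))
T, pct = pca_scores(X)
print(T.shape)  # (150, 2)
```

A score plot is then a scatter of `T[:, 0]` against `T[:, 1]`, with axis labels such as `f"PC1 ({pct[0]:.0f}%)"`.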
![Page 25](https://reader035.fdocuments.us/reader035/viewer/2022062423/56649c7c5503460f94930d41/html5/thumbnails/25.jpg)
Principal Component Analysis
Loading plot: the 13 variables, points in the n (150)-dimensional space of objects $S^{150}$, are projected onto the 2-dimensional space $S^{2}$ spanned by pc1 and pc2.
![Page 26](https://reader035.fdocuments.us/reader035/viewer/2022062423/56649c7c5503460f94930d41/html5/thumbnails/26.jpg)
Principal Component Analysis
Loading plot: the 13 variables projected from $S^{150}$ onto the plane spanned by pc1 and pc2
Wine data: loading plot; PC1 explains 38% and PC2 20% of the variance
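The coordinates in a loading plot are the PC coefficients themselves: variable j is plotted at $(l_{1j}, l_{2j})$. A NumPy sketch on placeholder data of the same shape; keep in mind that the sign of each PC is arbitrary, so loading (and score) plots from different software may appear mirrored.

```python
import numpy as np

# Placeholder data (150 samples x 13 variables); not the actual wine data.
rng = np.random.default_rng(4)
X = rng.normal(size=(150, 13))
Xc = X - X.mean(axis=0)

eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]
# 13 x 2: row j holds (l_1j, l_2j), the weights of variable j on PC1 and PC2.
P = eigvecs[:, order[:2]]

# In the loading plot each variable is a single point with these coordinates.
for j, (a, b) in enumerate(P, start=1):
    print(f"variable {j:2d}: pc1 = {a:+.2f}, pc2 = {b:+.2f}")
```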
![Page 27](https://reader035.fdocuments.us/reader035/viewer/2022062423/56649c7c5503460f94930d41/html5/thumbnails/27.jpg)
Singular Value Decomposition (SVD)
$X_{n \times p} = U_{n \times r}\, D_{r \times r}\, V^{T}_{r \times p}$, with $U^{T}U = V^{T}V = I$ and D diagonal (the singular values)
Left singular vectors (U): PC scores (the score matrix is often taken as $UD$, i.e. U scaled by the singular values)
Right singular vectors (V): PC loadings
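The same decomposition with `numpy.linalg.svd`, as a sketch on placeholder data. It checks the identities on the slide and shows that the scores $T = UD$ equal the projections of the centred data onto the loadings V.

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(150, 13))     # placeholder data, n = 150, p = 13
Xc = X - X.mean(axis=0)            # centre the columns first, as is usual for PCA

# Thin SVD: U is n x r, d holds the r singular values (the diagonal of D), Vt is r x p.
U, d, Vt = np.linalg.svd(Xc, full_matrices=False)
r = d.size                         # here r = p = 13

scores = U * d                     # T = U D: left singular vectors scaled by d
loadings = Vt.T                    # V: each column is one PC's loading vector

print(np.allclose(U @ np.diag(d) @ Vt, Xc))    # True: X = U D V^T
print(np.allclose(U.T @ U, np.eye(r)))         # True: U^T U = I
print(np.allclose(Vt @ Vt.T, np.eye(r)))       # True: V^T V = I
print(np.allclose(scores, Xc @ loadings))      # True: scores are projections onto V
```

As a further check, $d_i^2/(n-1)$ equals the eigenvalues of the covariance matrix of the centred data, so the SVD and the eigen-decomposition routes give the same PCs.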
![Page 28](https://reader035.fdocuments.us/reader035/viewer/2022062423/56649c7c5503460f94930d41/html5/thumbnails/28.jpg)
Principal Component Analysis: Biplot
[Figure: the loading plot (13 variables) and the score plot (150 samples), both in the (pc1, pc2) plane, are overlaid in a single biplot showing 150 samples + 13 variables]
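One common way to build biplot coordinates from the SVD follows Gabriel's biplot construction; the exponent convention below is an assumption, since the slides do not specify one. Samples are plotted at $G = U_r D_r^{\alpha}$ and variables at $H = V_r D_r^{1-\alpha}$ for the first r = 2 components. Whatever α is chosen, $GH^{T}$ is the same rank-2 approximation of the centred data, which is what lets samples and variables share one plot. The helper name `biplot_markers` is hypothetical and the data are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(150, 13))     # placeholder data
Xc = X - X.mean(axis=0)
U, d, Vt = np.linalg.svd(Xc, full_matrices=False)

def biplot_markers(U, d, Vt, alpha=1.0, r=2):
    """Sample markers G = U D^alpha and variable markers H = V D^(1 - alpha).

    alpha = 1 places the samples exactly as in the score plot (T = U D) and the
    variables at their loadings V; alpha = 0 scales the variable markers by D instead.
    """
    G = U[:, :r] * d[:r] ** alpha          # 150 x 2: one point per sample
    H = Vt[:r].T * d[:r] ** (1 - alpha)    # 13 x 2: one arrow per variable
    return G, H

# For any alpha, G @ H.T is the same rank-2 approximation of Xc, so the inner
# product of sample i's marker with variable j's marker approximates x_ij.
G1, H1 = biplot_markers(U, d, Vt, alpha=1.0)
G0, H0 = biplot_markers(U, d, Vt, alpha=0.0)
print(np.allclose(G1 @ H1.T, G0 @ H0.T))   # True
```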
![Page 29](https://reader035.fdocuments.us/reader035/viewer/2022062423/56649c7c5503460f94930d41/html5/thumbnails/29.jpg)
Principal Component Analysis: an Example
[Figure: wine data in the (pc1, pc2) plane; PC1 (38%), PC2 (20%)]
![Page 30](https://reader035.fdocuments.us/reader035/viewer/2022062423/56649c7c5503460f94930d41/html5/thumbnails/30.jpg)
Principal Component Analysis: Some Issues
• How many PCs?
• Scaling
• Outliers
![Page 31](https://reader035.fdocuments.us/reader035/viewer/2022062423/56649c7c5503460f94930d41/html5/thumbnails/31.jpg)
How many PCs?
Cumulative % of variance explained by the first k PCs, $\left(\sum_{i=1}^{k} d_i^{2} \big/ \sum_{i=1}^{p} d_i^{2}\right) \times 100\%$, plotted against the number of PCs; $d_i$ are the singular values (the diagonal of D), and the curve reaches 100% when all PCs are included.
Scree plot: log(variance) of each PC plotted against the number of PCs
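Both diagnostics follow directly from the singular values. A NumPy sketch with hypothetical helper names and synthetic data built from 2 strong underlying factors plus a little noise; the 90% cut-off used to pick k is an arbitrary illustration, not a rule from the slides.

```python
import numpy as np

def singular_values(X):
    """Singular values d_i of the column-centred data, in decreasing order."""
    Xc = X - X.mean(axis=0)
    return np.linalg.svd(Xc, compute_uv=False)

def cumulative_pct_variance(X):
    """100 * sum_{i<=k} d_i^2 / sum_i d_i^2 for k = 1, 2, ..., r."""
    d2 = singular_values(X) ** 2
    return 100 * np.cumsum(d2) / d2.sum()

def scree_values(X):
    """log10 of each PC's variance, d_i^2 / (n - 1): the y-axis of a scree plot."""
    return np.log10(singular_values(X) ** 2 / (X.shape[0] - 1))

# Hypothetical data: 2 strong underlying factors plus a little noise, 150 x 13.
rng = np.random.default_rng(7)
factors = rng.normal(size=(150, 2)) * np.array([5.0, 3.0])
X = factors @ rng.normal(size=(2, 13)) + 0.1 * rng.normal(size=(150, 13))

cum = cumulative_pct_variance(X)
scree = scree_values(X)
k = int(np.argmax(cum >= 90)) + 1   # smallest k reaching 90% (a choice, not a rule)
print(k, np.round(cum[:3], 1))
```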
![Page 32](https://reader035.fdocuments.us/reader035/viewer/2022062423/56649c7c5503460f94930d41/html5/thumbnails/32.jpg)
How many PCs?
Wine data
![Page 33](https://reader035.fdocuments.us/reader035/viewer/2022062423/56649c7c5503460f94930d41/html5/thumbnails/33.jpg)
How many PCs?
![Page 34](https://reader035.fdocuments.us/reader035/viewer/2022062423/56649c7c5503460f94930d41/html5/thumbnails/34.jpg)
PCA: Scaling
Scaling can make results easier to interpret, but it may also obscure them.
• Raw data
• Mean-centering (column-wise, row-wise, double)
• Autoscaling (column-wise, row-wise)
• …
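Sketches of the listed scaling options in NumPy (the function names are hypothetical). Autoscaling here divides by the sample standard deviation (`ddof=1`); some software uses `ddof=0`, which rescales every column by the same constant and therefore does not change the directions of the PCs.

```python
import numpy as np

def center_columns(X):
    """Column-wise mean-centering: every variable gets mean 0."""
    return X - X.mean(axis=0, keepdims=True)

def center_rows(X):
    """Row-wise mean-centering: every sample gets mean 0."""
    return X - X.mean(axis=1, keepdims=True)

def center_double(X):
    """Double centering: subtract row and column means, add back the grand mean."""
    return X - X.mean(axis=0, keepdims=True) - X.mean(axis=1, keepdims=True) + X.mean()

def autoscale_columns(X):
    """Column-wise autoscaling: every variable gets mean 0 and standard deviation 1."""
    return center_columns(X) / X.std(axis=0, ddof=1, keepdims=True)

def autoscale_rows(X):
    """Row-wise autoscaling: every sample gets mean 0 and standard deviation 1."""
    return center_rows(X) / X.std(axis=1, ddof=1, keepdims=True)

# Hypothetical data: 150 samples x 13 variables on very different scales.
rng = np.random.default_rng(8)
X = rng.normal(loc=10.0, size=(150, 13)) * np.logspace(0, 3, 13)
Z = autoscale_columns(X)
D = center_double(X)
print(np.allclose(Z.mean(axis=0), 0), np.allclose(Z.std(axis=0, ddof=1), 1))  # True True
```

A constant variable (standard deviation 0) would cause a division by zero in the autoscaling functions, so such columns need to be removed or handled separately first.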
![Page 35](https://reader035.fdocuments.us/reader035/viewer/2022062423/56649c7c5503460f94930d41/html5/thumbnails/35.jpg)
PCA: Scaling
[Figure panels: wine data, mean-centered; wine data, autoscaled]
![Page 36](https://reader035.fdocuments.us/reader035/viewer/2022062423/56649c7c5503460f94930d41/html5/thumbnails/36.jpg)
PCA: Scaling
[Figure panels: wine data, raw; wine data, mean-centered. In both panels PC1 explains 99.79% and PC2 0.20% of the variance]
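A single PC explaining almost all the variance of both the raw and the mean-centred data often indicates that one variable with a much larger numerical range dominates. The slides do not state the cause for the wine data, so the following is only a synthetic illustration of that general effect: one of 13 otherwise unrelated variables is multiplied by 1000 (a hypothetical choice), and column-wise autoscaling removes its dominance.

```python
import numpy as np

def pct_variance(X):
    """% of the total variance per PC after column mean-centering."""
    d = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
    return 100 * d**2 / np.sum(d**2)

# Hypothetical data: 13 unrelated variables, one of them on a ~1000x larger scale.
rng = np.random.default_rng(9)
X = rng.normal(size=(150, 13))
X[:, 12] *= 1000.0

X_auto = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # column-wise autoscaling

print(round(pct_variance(X)[0], 2))        # close to 100: PC1 is essentially the large-scale variable
print(round(pct_variance(X_auto)[0], 2))   # much lower once every variable has unit variance
```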
![Page 37](https://reader035.fdocuments.us/reader035/viewer/2022062423/56649c7c5503460f94930d41/html5/thumbnails/37.jpg)
PCA: Outliers
[Figure: 12 samples in the space of 3 variables, $S^{3}$ (axes x1, x2, x3), with PC1]
![Page 38](https://reader035.fdocuments.us/reader035/viewer/2022062423/56649c7c5503460f94930d41/html5/thumbnails/38.jpg)
PCA: Outliers
[Figure: the same 12 samples plus 1 outlier in $S^{3}$, with the original PC1]
![Page 39](https://reader035.fdocuments.us/reader035/viewer/2022062423/56649c7c5503460f94930d41/html5/thumbnails/39.jpg)
PCA: Outliers
[Figure: with the outlier included, PC1 is pulled towards it; the original and the new PC1 are both shown]
Leverage effect: a single outlying sample can rotate PC1 towards itself.
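A small numerical illustration of the leverage effect on hypothetical data resembling the slides: 12 samples spread mainly along x1, plus one distant sample along x3. The helper `pc1_direction` is an illustrative name; the angle between the PC1 directions with and without the outlier shows how far one sample rotates the component.

```python
import numpy as np

def pc1_direction(X):
    """Unit vector of PC1 for column-centred X (its sign is arbitrary)."""
    _, _, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
    return Vt[0]

# Hypothetical data: 12 samples spread mainly along x1, as on the slides.
rng = np.random.default_rng(10)
X12 = rng.normal(size=(12, 3)) * np.array([3.0, 0.5, 0.5])
outlier = np.array([[0.0, 0.0, 30.0]])          # one sample far away along x3
X13 = np.vstack([X12, outlier])

v12, v13 = pc1_direction(X12), pc1_direction(X13)
cos = min(1.0, abs(float(v12 @ v13)))           # abs(): the sign of a PC is arbitrary
angle = np.degrees(np.arccos(cos))
print(f"PC1 rotates by {angle:.0f} degrees")
```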
![Page 40](https://reader035.fdocuments.us/reader035/viewer/2022062423/56649c7c5503460f94930d41/html5/thumbnails/40.jpg)
Principal Component Analysis: a Recent Research Example
Gene expression values: data matrix X with 50,000 genes (rows) and 4 treatments (columns $x_j$), element $x_{ij}$
Organon, Department of Cell Biology
![Page 41](https://reader035.fdocuments.us/reader035/viewer/2022062423/56649c7c5503460f94930d41/html5/thumbnails/41.jpg)
PCA: Interaction Gene × Treatment