MATH 251 CLASSIFICATION HW 03
Felix Mbuga (sjsu.edu)
Problem 1: LDA vs PCA
[Figure: Raw data. Scatterplot matrix of Sepal.Length, Sepal.Width, Petal.Length, and Petal.Width. Legend: setosa, versicolor, virginica.]
Black - setosa, Red - versicolor, Green - virginica
[Figure: Iris - LDA. x-axis: Linear Discriminant 1; y-axis: Linear Discriminant 2.]
[Figure: Iris - PCA. x-axis: Principal Component 1; y-axis: Principal Component 2.]
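The difference between the two projections can be illustrated on a small example. The sketch below is not the code behind the plots: it uses made-up 2-D points (not the iris data), hypothetical helper names, and the two-class Fisher form of LDA, where the direction is proportional to S_W^{-1}(m_b - m_a). PCA ignores the labels and follows the direction of largest spread, while LDA uses the labels and follows the direction that separates the class means relative to the within-class scatter.

```python
import math

def mean(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def scatter(points, center):
    # Entries (sxx, sxy, syy) of the 2x2 scatter matrix around `center`.
    sxx = sum((p[0] - center[0]) ** 2 for p in points)
    sxy = sum((p[0] - center[0]) * (p[1] - center[1]) for p in points)
    syy = sum((p[1] - center[1]) ** 2 for p in points)
    return sxx, sxy, syy

def unit(v):
    norm = math.hypot(v[0], v[1])
    return (v[0] / norm, v[1] / norm)

def pca_direction(points):
    # First principal direction: leading eigenvector of the total scatter
    # matrix, computed without looking at the class labels.
    sxx, sxy, syy = scatter(points, mean(points))
    if sxy == 0:
        return (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    # Closed-form largest eigenvalue of a symmetric 2x2 matrix.
    lam = (sxx + syy) / 2 + math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    return unit((lam - syy, sxy))

def lda_direction(class_a, class_b):
    # Two-class Fisher direction: w proportional to S_W^{-1} (m_b - m_a),
    # where S_W is the sum of the per-class scatter matrices.
    m_a, m_b = mean(class_a), mean(class_b)
    s_a, s_b = scatter(class_a, m_a), scatter(class_b, m_b)
    sxx, sxy, syy = s_a[0] + s_b[0], s_a[1] + s_b[1], s_a[2] + s_b[2]
    det = sxx * syy - sxy * sxy
    dx, dy = m_b[0] - m_a[0], m_b[1] - m_a[1]
    return unit(((syy * dx - sxy * dy) / det, (sxx * dy - sxy * dx) / det))

def project(points, w):
    return [p[0] * w[0] + p[1] * w[1] for p in points]

# Made-up data: both classes are stretched along x and differ only in y.
class_a = [(0.0, 0.0), (2.0, 0.3), (4.0, -0.3), (6.0, 0.0)]
class_b = [(0.0, 1.5), (2.0, 1.8), (4.0, 1.2), (6.0, 1.5)]

w_pca = pca_direction(class_a + class_b)
w_lda = lda_direction(class_a, class_b)
print("PCA direction:", w_pca)
print("LDA direction:", w_lda)
```

On this toy data the PCA direction lies almost along the x-axis, where the two classes overlap, while the LDA direction lies almost along the y-axis, where they separate.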
Problem 3: PCA 95% + LDA
[Figure: USPS - PCA 95% - 0 & 1. x-axis: Principal Component 1; y-axis: Principal Component 2. Legend: 0, 1.]
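"PCA 95%" is read here as keeping the smallest number of principal components whose variances sum to at least 95% of the total variance. A minimal sketch of that selection rule follows; the function name and the example variances are made up for illustration and are not taken from the USPS data.

```python
def components_for_variance(variances, threshold=0.95):
    # Sort component variances (eigenvalues of the covariance matrix) from
    # largest to smallest and return the smallest count whose cumulative
    # share of the total variance reaches `threshold`.
    ordered = sorted(variances, reverse=True)
    total = sum(ordered)
    running = 0.0
    for count, value in enumerate(ordered, start=1):
        running += value
        if running / total >= threshold:
            return count
    return len(ordered)

# Hypothetical variances, for illustration only.
example_variances = [6.0, 2.0, 1.0, 0.7, 0.3]
n_components = components_for_variance(example_variances)
print(n_components)
```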
[Figure: USPS - PCA 95% + LDA - 0 & 1. x-axis: Index; y-axis: usps.train.0and1.pca95.lda.proj. Legend: 0, 1.]
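The two-class LDA projections are plotted against the sample index, presumably because LDA yields at most (number of classes - 1) discriminant directions: one for two classes, two for three classes. A short statement of why, in standard notation that is assumed here rather than taken from the slides (C classes, class means m_c, overall mean m, class sizes n_c):

```latex
S_W = \sum_{c=1}^{C} \sum_{i:\, y_i = c} (x_i - m_c)(x_i - m_c)^{\top},
\qquad
S_B = \sum_{c=1}^{C} n_c \, (m_c - m)(m_c - m)^{\top}

% The discriminant directions are the eigenvectors of S_W^{-1} S_B with
% nonzero eigenvalue. Since \sum_c n_c (m_c - m) = 0,
\operatorname{rank}(S_B) \le C - 1 .

% For C = 2 there is a single direction,
w \propto S_W^{-1} (m_1 - m_0),
% so each sample maps to one number w^{\top} x.
```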
[Figure: USPS - PCA 95% - 4 & 9. x-axis: Principal Component 1; y-axis: Principal Component 2. Legend: 4, 9.]
[Figure: USPS - PCA 95% + LDA - 4 & 9. x-axis: Index; y-axis: usps.train.4and9.pca95.lda.proj. Legend: 4, 9.]
[Figure: USPS - PCA 95% - 1, 2 & 3. x-axis: Principal Component 1; y-axis: Principal Component 2. Legend: 1, 2, 3.]
[Figure: USPS - PCA 95% + LDA - 1, 2 & 3. x-axis: Linear Discriminant 1; y-axis: Linear Discriminant 2. Legend: 1, 2, 3.]
[Figure: USPS - PCA 95% - 3, 5 & 8. x-axis: Principal Component 1; y-axis: Principal Component 2. Legend: 3, 5, 8.]
[Figure: USPS - PCA 95% + LDA - 3, 5 & 8. x-axis: Linear Discriminant 1; y-axis: Linear Discriminant 2. Legend: 3, 5, 8.]
Problem 4: PCA 95% + LDA + kNN vs PCA 95% + kNN
[Figure: Misclassification Error Rate vs. k (number of nearest neighbors used). Legend: PCA 95% + LDA, PCA 95%.]
PCA 95% + LDA has a higher misclassification error rate than PCA 95% alone for every k. The lowest misclassification error rates are ~5.1% (PCA 95%, k = 1) and ~9.3% (PCA 95% + LDA, k = 9).
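The error-rate-versus-k comparison can be sketched as follows. This is an illustration under assumptions, not the code behind the plot: it uses a plain majority-vote kNN with squared Euclidean distance, made-up 2-D points in place of the projected USPS digits, and hypothetical function names. Ties in the vote go to the label seen first among the nearest neighbors.

```python
from collections import Counter

def sq_dist(p, q):
    # Squared Euclidean distance; the square root is not needed for ranking.
    return sum((a - b) ** 2 for a, b in zip(p, q))

def knn_predict(train_x, train_y, query, k):
    # Rank training points by distance to the query and take a majority
    # vote among the labels of the k nearest ones.
    order = sorted(range(len(train_x)), key=lambda i: sq_dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def error_rate(train_x, train_y, test_x, test_y, k):
    # Fraction of test points whose predicted label differs from the truth.
    wrong = sum(knn_predict(train_x, train_y, x, k) != y
                for x, y in zip(test_x, test_y))
    return wrong / len(test_x)

# Made-up data: two clusters, plus one class-0 point sitting inside the
# class-1 cluster to mimic a noisy training sample.
train_x = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (3.2, 3.2),
           (3.0, 3.0), (3.0, 4.0), (4.0, 3.0)]
train_y = [0, 0, 0, 0, 1, 1, 1]
test_x = [(0.5, 0.5), (0.2, 0.8), (3.3, 3.3), (3.5, 3.5)]
test_y = [0, 0, 1, 1]

errors = {k: error_rate(train_x, train_y, test_x, test_y, k) for k in (1, 3, 5)}
for k, err in errors.items():
    print(k, err)
```

In practice the same loop would be run twice, once on the PCA 95% features and once on the PCA 95% + LDA features, and the two error curves plotted against k.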
Problem 5: PCA 95% + LDA + Nearest Local Centroid vs PCA 95% + Nearest Local Centroid
PCA 95% + LDA has a higher misclassification error rate than PCA 95% alone for every k. The lowest misclassification error rates are ~4.2% (PCA 95%, k = 4) and ~8.6% (PCA 95% + LDA, k = 10).
[Figure: USPS Dataset. Nearest Local Centroid with PCA 95% vs with PCA 95% and LDA. x-axis: k (number of nearest neighbors used); y-axis: Misclassification Error Rate. Legend: PCA 95%, PCA 95% + LDA.]
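The slides do not define the nearest local centroid classifier, so the sketch below assumes a common formulation: for each class, take the k training points of that class nearest to the query, average them into a local centroid, and assign the class whose local centroid is closest. The data and function names are made up for illustration.

```python
def sq_dist(p, q):
    # Squared Euclidean distance; sufficient for comparing distances.
    return sum((a - b) ** 2 for a, b in zip(p, q))

def local_centroid_predict(train_x, train_y, query, k):
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        # k nearest training points of this class only.
        members = [x for x, y in zip(train_x, train_y) if y == label]
        nearest = sorted(members, key=lambda p: sq_dist(p, query))[:k]
        # Local centroid: coordinate-wise mean of those neighbors.
        centroid = [sum(coord) / len(nearest) for coord in zip(*nearest)]
        dist = sq_dist(centroid, query)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Made-up data: two clusters, plus one class-0 point inside the class-1
# cluster to mimic a noisy training sample.
train_x = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (3.2, 3.2),
           (3.0, 3.0), (3.0, 4.0), (4.0, 3.0)]
train_y = [0, 0, 0, 0, 1, 1, 1]

query = (3.3, 3.3)
pred_k1 = local_centroid_predict(train_x, train_y, query, 1)
pred_k2 = local_centroid_predict(train_x, train_y, query, 2)
print(pred_k1, pred_k2)
```

With k = 1 this rule reduces to the ordinary 1-nearest-neighbor classifier, so the noisy point decides the label; with k = 2 the class-0 local centroid is pulled away from the query and the class-1 centroid wins.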