The Method Supervised Unsupervised - spectral Unsupervised - k-means Gaussian Noise Experiments Additional Material
When Dictionary Learning Meets Classification
Teresa Bufford[1], Yuxin Chen[2], Mitchell Horning[3], Liberty Shee[1]
Mentor: Professor Yohann Tendero
[1] UCLA, [2] Dalhousie University, [3] Harvey Mudd College
August 9, 2013
Outline
1. The Method: Dictionary Learning
2. Experiments and Results
2.1 Supervised Dictionary Learning
2.2 Unsupervised Dictionary Learning
2.3 Robustness w.r.t. Noise
3. Conclusion
Dictionary Learning

GOAL: Classify discrete image signals x ∈ R^n.

The Dictionary, D ∈ R^{n×K}

x ≈ Dα = [atom_1 · · · atom_K] (α_1, . . . , α_K)^t

• Each dictionary can be represented as a matrix whose columns are atoms ∈ R^n, learned from a set of training data.
• A signal x ∈ R^n can be approximated by a linear combination of atoms in the dictionary, with coefficient vector α ∈ R^K.
• We seek a sparse signal representation:

    arg min_{α ∈ R^K} ‖x − Dα‖₂² + λ‖α‖₁.    (1)
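Problem (1) is a lasso. As a concrete illustration, here is a minimal numpy sketch that solves it with ISTA (iterative soft-thresholding); the solver choice and all names are ours, since the slides do not specify an implementation.

```python
import numpy as np

def sparse_code(x, D, lam=0.1, n_iter=200):
    """Approximate arg min_a ||x - D a||_2^2 + lam*||a||_1 via ISTA
    (gradient step on the quadratic term, then soft-thresholding)."""
    L = 2 * np.linalg.norm(D, 2) ** 2 + 1e-12   # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = a - 2 * D.T @ (D @ a - x) / L       # gradient step on ||x - Da||^2
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-threshold
    return a

# Tiny demo: x is exactly one atom, so alpha should peak at that atom.
rng = np.random.default_rng(0)
D = rng.standard_normal((16, 8))
D /= np.linalg.norm(D, axis=0)    # unit-norm atoms
x = D[:, 3]
alpha = sparse_code(x, D, lam=0.05)
```

Because the l1 penalty shrinks coefficients, the recovered alpha is sparse and concentrated on atom 3 rather than an exact expansion.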
[1] Sprechmann et al., "Dictionary learning and sparse coding for unsupervised clustering." Acoustics, Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on. IEEE, 2010.
Supervised Dictionary Learning Algorithm
Illustrating Example
Training image signals: x1, x2, x3, x4; X = [x1 x2 x3 x4]. Classes: 0, 1.
Group training images according to their labels:
class 0 x1 x4
class 1 x2 x3
Use training images with label i to train dictionary i :
class 0 x1 x4 −→ D0
class 1 x2 x3 −→ D1
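Training one dictionary per class can be sketched as follows. The alternating scheme below (ISTA sparse coding plus a least-squares, MOD-style dictionary update) is our own stand-in for whatever learning algorithm was actually used; `train_dict` and its parameters are illustrative names.

```python
import numpy as np

def soft(z, t):
    """Entrywise soft-thresholding."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def train_dict(X, K, lam=0.1, n_outer=10, n_ista=50, seed=0):
    """Toy dictionary learning: alternate ISTA sparse coding with a
    least-squares (MOD-style) dictionary update. X holds one signal per column."""
    rng = np.random.default_rng(seed)
    D = X[:, rng.choice(X.shape[1], K, replace=False)].astype(float)  # init from signals
    D /= np.linalg.norm(D, axis=0) + 1e-12
    A = np.zeros((K, X.shape[1]))
    for _ in range(n_outer):
        L = 2 * np.linalg.norm(D, 2) ** 2 + 1e-12   # ISTA step size
        for _ in range(n_ista):                     # sparse coding step
            A = soft(A - 2 * D.T @ (D @ A - X) / L, lam / L)
        D = X @ np.linalg.pinv(A)                   # dictionary update (MOD)
        D /= np.linalg.norm(D, axis=0) + 1e-12      # keep atoms unit-norm
    return D

# Supervised use: one dictionary per class, e.g.
#   dicts = {c: train_dict(X[:, labels == c], K) for c in set(labels)}
rng = np.random.default_rng(1)
X0 = rng.standard_normal((10, 30))                  # stand-in for class-0 signals
D0 = train_dict(X0, K=5, n_outer=5, n_ista=30)
```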
Signal Classification
Illustrating Example
Test image signals: y1, y2, y3; Y = [y1 y2 y3]. Dictionaries: D0, D1.

Take one test image. Compute its optimal sparse representation α for each dictionary, and use α to compute the energy:

    E_i(y1) = min_α ‖y1 − D_i α‖₂² + λ‖α‖₁,    giving    [E0(y1); E1(y1)].

Do this for each test signal:

    E(Y) = [ E0(y1)  E0(y2)  E0(y3)
             E1(y1)  E1(y2)  E1(y3) ]

Classify each test signal y as class i* = arg min_i E_i(y). For example:

    E(Y) = [  5  12   8
             24   6   4 ]    −→    class 0: y1;  class 1: y2, y3.
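The energy-based classification rule can be sketched directly. The ISTA inner solver and all names here are ours; any lasso solver would serve.

```python
import numpy as np

def lasso_energy(y, D, lam=0.1, n_iter=200):
    """E_i(y) = min_a ||y - D a||_2^2 + lam*||a||_1, approximated with ISTA."""
    L = 2 * np.linalg.norm(D, 2) ** 2 + 1e-12
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = a - 2 * D.T @ (D @ a - y) / L
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    return float(np.sum((y - D @ a) ** 2) + lam * np.sum(np.abs(a)))

def classify(Y, dicts, lam=0.1):
    """Energy matrix E (one row per class) and the arg-min class per signal."""
    E = np.array([[lasso_energy(y, D, lam) for y in Y.T] for D in dicts])
    return np.argmin(E, axis=0), E

# Toy check: D0 spans {e1, e2}, D1 spans {e3, e4}; y1 = e1, y2 = e3.
D0, D1 = np.eye(4)[:, :2], np.eye(4)[:, 2:]
labels, E = classify(np.eye(4)[:, [0, 2]], [D0, D1])
```

A signal lying in a dictionary's span gets a near-zero residual under that dictionary and a large one under the other, so the arg-min recovers the intended class.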
Supervised Dictionary Learning Results
Table: Supervised Results

Cluster Type   Centered & Normalized   Digits        K     Misclassification rate average
Supervised     False                   {0, ..., 4}   500   1.3431%
Supervised     True                    {0, ..., 4}   500   0.5449%
Supervised     False                   {0, ..., 4}   800   0.7784%
Supervised     True                    {0, ..., 4}   800   0.3892%
Supervised     False                   {0, ..., 9}   800   3.1100%
Supervised     True                    {0, ..., 9}   800   1.6800%
The error rate for digits {0, ..., 9} and K = 800 reported in [1] is 1.26%.
Supervised Results, ctd.
Here is the confusion matrix for the supervised, centered, normalized case with K = 800 and digits {0, . . . , 4}.
Element c_ij of C is the number of times that an image of digit i was classified as digit j.

C =
        0     1     2     3     4
  0   978     0     1     1     0
  1     0  1132     2     1     0
  2     5     2  1023     0     2
  3     0     0     3  1006     1
  4     1     0     1     0   980
Unsupervised Dictionary Learning Algorithm (Spectral Clustering)

Illustrating Example
Training image signals: x1, x2, x3, x4; X = [x1 x2 x3 x4]. Classes: 0, 1.

Train a single dictionary D from all training images:

    All images x1, x2, x3, x4 −→ D = [atom1 atom2 atom3]

Reminder: The number of atoms in the dictionary is a parameter.

For each image, compute its optimal sparse representation α w.r.t. D, and collect them as columns:

    A = [α1 α2 α3 α4]    (rows indexed by atom1, atom2, atom3; column αj codes xj)

Reminder: each αj holds the coefficients of a linear combination of atoms in D.
Illustrating Example
Training image signals: x1, x2, x3, x4; X = [x1 x2 x3 x4]. Classes: 0, 1.
Construct a similarity matrix S1 = |A|^t |A|, and perform spectral clustering on the graph G1 = {X, S1}:

    G1 = {X, S1} −−spectral clustering−−→
class 0 x1 x4
class 1 x2 x3
Use signal clusters to train initial dictionaries:
class 0 x1 x4 −→ D0
class 1 x2 x3 −→ D1
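A minimal sketch of this initialization: build S = |A|^t |A| and bipartition the signals with standard normalized spectral clustering (sign of the Fiedler vector). This is a generic spectral-clustering variant, not necessarily the exact one used in the experiments.

```python
import numpy as np

def spectral_bipartition(A):
    """Split signals into two clusters from their sparse codes A (K x m):
    similarity S = |A|^t |A|, symmetric normalized Laplacian, then the
    sign of the Fiedler vector."""
    S = np.abs(A).T @ np.abs(A)
    d = S.sum(axis=1)                                 # degrees
    Dinv = np.diag(1.0 / np.sqrt(d + 1e-12))
    Lsym = np.eye(len(d)) - Dinv @ S @ Dinv           # normalized Laplacian
    _, V = np.linalg.eigh(Lsym)                       # eigenvalues ascending
    fiedler = V[:, 1]                                 # 2nd-smallest eigenvector
    return (fiedler > 0).astype(int)

# Toy codes: x1, x2 mostly use atoms 1-2; x3, x4 mostly use atoms 3-4.
A = np.array([[1.0, 1.0, 0.1, 0.0],
              [1.0, 1.0, 0.0, 0.1],
              [0.1, 0.0, 1.0, 1.0],
              [0.0, 0.1, 1.0, 1.0]])
labels = spectral_bipartition(A)
```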
Refining Dictionaries
Repeat:
1. Classify the training images using the current dictionaries:

    x1, x2, x3, x4 −−classify with D0, D1−−→  class 0: x4;  class 1: x1, x2, x3

2. Use these classifications to train new, refined dictionaries for each cluster:

    class 0: x4 −→ D0,new
    class 1: x1, x2, x3 −→ D1,new
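The refinement loop can be sketched as follows. To stay self-contained, this sketch substitutes a plain least-squares residual (λ = 0) for the lasso energy and re-seeds each dictionary with random cluster signals instead of retraining it; both are labeled simplifications of the actual method.

```python
import numpy as np

def ls_energy(y, D):
    """Simplified energy: squared least-squares residual (lam = 0),
    standing in for the lasso energy on the slides."""
    a, *_ = np.linalg.lstsq(D, y, rcond=None)
    return float(np.sum((y - D @ a) ** 2))

def refine(X, dicts, n_rounds=3, seed=0):
    """Alternate classification and per-cluster 'retraining'. Retraining here
    just re-seeds each dictionary with K random signals from its cluster,
    a crude stand-in for a real dictionary learning step."""
    rng = np.random.default_rng(seed)
    K = dicts[0].shape[1]
    labels = np.zeros(X.shape[1], dtype=int)
    for _ in range(n_rounds):
        labels = np.array([np.argmin([ls_energy(y, D) for D in dicts]) for y in X.T])
        for i in range(len(dicts)):
            Xi = X[:, labels == i]
            if Xi.shape[1] >= K:
                dicts[i] = Xi[:, rng.choice(Xi.shape[1], K, replace=False)]
    return dicts, labels

# Toy data: 6 signals in span{e1, e2}, 6 in span{e3, e4}.
coef = np.array([[1, 0], [0, 1], [1, 1], [2, 1], [1, 2], [0.5, 0.5]]).T
X = np.hstack([np.eye(4)[:, :2] @ coef, np.eye(4)[:, 2:] @ coef])
dicts, labels = refine(X, [np.eye(4)[:, :2], np.eye(4)[:, 2:]])
```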
Unsupervised Dictionary Learning: Spectral Clustering Results
Table: Unsupervised Results (spectral clustering)

Cluster   Centered & Normalized   Digits        K     Misclassification rate average
Atoms     True                    {0, ..., 4}   500   24.8881%
Atoms     True                    {0, ..., 4}   800   27.1843%
Signals   True                    {0, ..., 4}   500   27.4372%
Signals   True                    {0, ..., 4}   800   29.5777%
The misclassification rate for digits {0, ..., 4} and K = 500 reported in [1] is 1.44%.
Why do our results differ from Sprechmann et al.'s?

Hypothesis
The problem lies in the authors' choice of similarity measure, S = |A|^t |A|.

Possible Problem: Normalization (the columns of A do not have constant l2 norm)

Illustration of Problem
• Assume that A contains only positive entries. Then S = |A|^t |A| = A^t A, and the ij-th entry of S is ⟨αi, αj⟩ = ‖αi‖‖αj‖ cos(αi, αj).
• Nearly orthogonal vectors can therefore be scored as "more similar" than co-linear ones, whenever the product of their norms is large enough.
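A two-vector example (the numbers are ours) makes the issue concrete: the inner product ranks the large-norm, nearly orthogonal pair as more similar than the small-norm, nearly co-linear pair.

```python
import numpy as np

# Two nearly co-linear codes with small norms...
a1, a2 = np.array([1.0, 0.1]), np.array([1.0, 0.0])
# ...and two nearly orthogonal codes with large norms.
b1, b2 = np.array([10.0, 1.0]), np.array([1.0, 10.0])

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

s_small_angle = a1 @ a2   # inner product 1.0 despite cos ~ 0.995
s_large_angle = b1 @ b2   # inner product 20.0 despite cos ~ 0.198
```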
Attempted Solution
We normalized each column of A before computing S. However, this caused little change in the results, because the l2 norms of A's columns are nearly constant.

Figure: Normalized histogram of the l2 norms of the columns of A. The l2 norms of most columns of A lie in [3, 4].
Possible Problem: Absolute Value (using |A|^t |A| instead of |A^t A|)

Illustration of Problem
a1 = (1, −1)^t and a2 = (1, 1)^t are orthogonal vectors, but they become co-linear once their entries are replaced by their absolute values.

Attempted Solution
In our experiments, changing |A|^t |A| to |A^t A| did not significantly improve the results, because only ≈ 13.5% of the entries of A associated with the MNIST data are negative.
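The orthogonal-pair example above, checked numerically:

```python
import numpy as np

a1 = np.array([1.0, -1.0])
a2 = np.array([1.0, 1.0])

before = a1 @ a2                  # 0.0: the codes are orthogonal
after = np.abs(a1) @ np.abs(a2)   # 2.0: after |.|, they look perfectly co-linear
```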
In the last part of Sprechmann et al., the authors state:

"We observed that best results are obtained for all the experiments when the initial dictionaries in the learning stage are constructed by randomly selecting signals from the training set. If the size of the dictionary compared to the dimension of the data is small, [it] is better to first partition the dataset (using for example Euclidean k-means) in order to obtain a more representative sample."
Algorithm
Use k-means to cluster the training signals, instead of spectral clustering.
Unsupervised Dictionary Learning (k-means)

Illustrating Example
Training image signals: x1, x2, x3, x4; X = [x1 x2 x3 x4]. Classes: 0, 1.

Use k-means to cluster the training images:

    x1, x2, x3, x4 −−k-means−−→  class 0: x1, x4;  class 1: x2, x3

Note: We must center and normalize after clustering.

Use the clusters to train the dictionaries:

    class 0: x1, x4 −→ D0
    class 1: x2, x3 −→ D1

The refinement process is the same as in the spectral clustering case.
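For completeness, here is a plain Lloyd's k-means on the columns of X; any off-the-shelf k-means implementation would serve the same role in the initialization.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Lloyd's k-means on the columns of X (n x m). Returns labels and centers."""
    rng = np.random.default_rng(seed)
    C = X[:, rng.choice(X.shape[1], k, replace=False)].astype(float)  # init
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - C[:, :, None]) ** 2).sum(axis=0)  # k x m distances
        labels = np.argmin(d2, axis=0)
        for j in range(k):
            if np.any(labels == j):
                C[:, j] = X[:, labels == j].mean(axis=1)         # update centers
    return labels, C

# Two well-separated toy blobs in R^2.
X = np.array([[0.0, 0, 0, 10, 10, 10],
              [0.0, 1, -1, 10, 11, 9]])
labels, _ = kmeans(X, 2)
```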
Unsupervised Dictionary Learning K-means Results
Table: Unsupervised Results (k-means), without refinement

Cluster   Centered & Normalized   Digits        K     iter   Misclassification rate average
Signals   True                    {0, ..., 4}   500   2      7.1025%

Table: Unsupervised Results (k-means), with refinement

Cluster   Centered & Normalized   Digits        K     iter   Misclassification rate average
Signals   True                    {0, ..., 4}   500   2      1.1286%
Signals   True                    {0, ..., 4}   500   20     0.5449%
The misclassification rate for digits {0, ..., 4} and K = 500 reported in [1] is 1.44%.
Unsupervised Results, ctd.
Here is the confusion matrix for the unsupervised (k-means), centered, normalized case with K = 500, iter = 20, and digits {0, . . . , 4}.
Element c_ij of C is the number of times that an image of digit i was classified as digit j.

C =
        0     1     2     3     4
  0   979     0     1     0     0
  1     0  1129     4     1     1
  2     4     2  1023     2     1
  3     0     0     3  1006     1
  4     0     1     3     0   978
Gaussian Noise Experiments
Classification of Noisy Images
Goal: Analyze the robustness of our method with respect to noise in images.
Questions to consider:
• How does adding noise affect misclassification rates?
• How does centering and normalizing the data after adding noise affect the results?
Results: Classifying Pure Gaussian Noise

[Figure: classification rates for each digit, with dictionaries trained on centered and normalized data (left) vs. data that is neither centered nor normalized (right); noise variance = 0.5.]
MNIST Images with Gaussian Noise Added
Results: Adding Gaussian Noise to MNIST (noise in test images only)

Centered and Normalized (test noise variance = 0.5): misclassification rate 8.21%
Not Centered and Not Normalized (test noise variance = 0.5): misclassification rate 87.05%
Results: Adding Gaussian Noise to MNIST (same noise variance for test & training images)

Misclassification Rates

Noise variance                  0.1      0.5      1.0
Centered & Normalized           3.2%     17.33%   38.92%
Not Centered & Not Normalized   11.08%   57.56%   75.87%
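The preprocessing used in the centered-and-normalized rows can be sketched as follows (the helper name is ours): add white Gaussian noise of a given variance to each signal, then center and l2-normalize every column.

```python
import numpy as np

def add_noise_then_preprocess(X, var, seed=0):
    """Add white Gaussian noise of variance `var` to each signal (column),
    then center and l2-normalize each column."""
    rng = np.random.default_rng(seed)
    Xn = X + rng.normal(0.0, np.sqrt(var), X.shape)
    Xn = Xn - Xn.mean(axis=0)                        # center each signal
    Xn = Xn / (np.linalg.norm(Xn, axis=0) + 1e-12)   # unit l2 norm per signal
    return Xn

Xn = add_noise_then_preprocess(np.arange(12.0).reshape(3, 4), var=0.5)
```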
Conclusions:

• We were unable to reproduce the results reported by Sprechmann et al.
• k-means initialization produces better results than spectral clustering
• Centering and normalizing the data consistently improves results
• The method is robust to added noise
Thanks for your attention!
Unsupervised Dictionary Learning Algorithm 2 (with split initialization)

Step 1 Train a dictionary D0 ∈ R^{n×K} from all training images.
Step 2 Obtain the matrix A0 and the similarity matrix S0 = |A0||A0|^t as before.
Step 3 Apply spectral clustering to dictionary D0 to split its atoms into D1 and D2.
Step 4 Obtain matrices A1, S1, A2, and S2 for dictionaries D1 and D2.
Step 5 Apply spectral clustering to dictionaries D1 and D2 to split each into two clusters. Keep the split that results in the lower energy, and return the other dictionary to its original form. You now have three dictionaries.
Step 6 Repeat this process so that each iteration increases the number of dictionaries by one. Stop when the desired number of clusters is reached.
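The steps above can be sketched compactly. For brevity this sketch substitutes 2-means for spectral clustering and within-cluster SSE for the lasso energy; both substitutions are ours, so this only illustrates the grow-one-split-at-a-time control flow, not the actual splitting criterion.

```python
import numpy as np

def two_means(X, n_iter=30):
    """Split the columns of X in two with a deterministic 2-means
    (centers seeded from the first and last columns)."""
    C = X[:, [0, X.shape[1] - 1]].astype(float)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - C[:, :, None]) ** 2).sum(axis=0)
        lab = np.argmin(d2, axis=0)
        for j in (0, 1):
            if np.any(lab == j):
                C[:, j] = X[:, lab == j].mean(axis=1)
    return lab == 0

def sse(X):
    """Within-cluster sum of squares, standing in for the lasso energy."""
    return float(((X - X.mean(axis=1, keepdims=True)) ** 2).sum())

def split_init(X, n_clusters):
    """Grow clusters one split at a time, keeping the split with the
    largest energy decrease; discard the other candidate split."""
    clusters = [np.arange(X.shape[1])]
    while len(clusters) < n_clusters:
        best = None
        for i, idx in enumerate(clusters):
            if len(idx) < 2:
                continue
            m = two_means(X[:, idx])
            if m.all() or (~m).all():
                continue
            gain = sse(X[:, idx]) - sse(X[:, idx[m]]) - sse(X[:, idx[~m]])
            if best is None or gain > best[0]:
                best = (gain, i, m)
        _, i, m = best
        idx = clusters.pop(i)
        clusters += [idx[m], idx[~m]]
    return clusters

# Three well-separated toy blobs; the loop should recover them.
X = np.array([[0.0, 0, 0, 10, 10, 10, 20, 20, 20],
              [0.0, 1, 2, 0, 1, 2, 0, 1, 2]])
parts = sorted(tuple(int(i) for i in sorted(c)) for c in split_init(X, 3))
```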
Semisupervised Dictionary Learning Algorithm
Step 1 Use training images, a percentage of which have known labels; the remaining images are labeled randomly.
Step 2 Train dictionaries {D0, . . . , DN−1} independently.
Step 3 Classify the training images using the current dictionaries. Then use these classifications to train new, refined dictionaries for each cluster.
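Step 1's label perturbation can be sketched as follows. The helper name, and the interpretation of "perturb. percent" in the results table as the fraction of randomly relabeled images, are our assumptions.

```python
import numpy as np

def perturb_labels(labels, perturb_percent, n_classes, seed=0):
    """Randomly relabel `perturb_percent` percent of the training labels,
    leaving the rest as known ground truth (assumed semantics)."""
    rng = np.random.default_rng(seed)
    out = np.asarray(labels).copy()
    n_perturb = int(round(len(out) * perturb_percent / 100))
    idx = rng.choice(len(out), n_perturb, replace=False)      # which to perturb
    out[idx] = rng.integers(0, n_classes, n_perturb)          # random new labels
    return out

true_labels = np.arange(10) % 5
noisy = perturb_labels(true_labels, 40, n_classes=5)
```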
Semisupervised Dictionary Learning Results
Dictionaries for digits {0, . . . , 4}.

Cluster Type     Centered & Normalized   K     Perturb. percent   Misclassification rate average
Semisupervised   True                    200   40                 0.9730%
Semisupervised   True                    200   70                 0.8951%
Semisupervised   True                    200   90                 1.4205%
Figure: Misclassification rate average at each refinement iteration (iteration 1 = first refinement), digits 0−4. Left to right: 40%, 70%, 90% perturbation.
Figure: Energy at each refinement iteration (iteration 1 = first refinement), digits 0−4. Left to right: 40%, 70%, 90% perturbation.
Figure: Misclassification rate average at each refinement iteration (iteration 1 = first refinement), digits 0−4. Left to right: 40%, 70%, 90% perturbation.
Figure: Number of training images whose classification changed at each refinement iteration, digits 0−4. Left to right: 40%, 70%, 90% perturbation.
Unsupervised - Atoms (change over refinement iterations)

Figure: Unsupervised dictionary learning (K = 500), spectral clustering of centered and normalized signals by atoms. Left to right: (1) energy, (2) misclassification rate average, (3) number of training images whose classification changed per refinement iteration.
Unsupervised - Signals (change over refinement iterations)

Figure: Unsupervised dictionary learning (K = 500), spectral clustering of centered and normalized signals by signals. Left to right: (1) energy, (2) misclassification rate average, (3) number of training images whose classification changed per refinement iteration.
Unsupervised - k-means (change over refinement iterations)

Figure: Unsupervised dictionary learning (K = 500), k-means clustering of centered and normalized signals. Left to right: (1) energy, (2) misclassification rate average, (3) number of training images whose classification changed per refinement iteration.
Unsupervised - k-means (change over refinement iterations)

Figure: Unsupervised dictionary learning (K = 500), k-means clustering of centered and normalized signals. Left: 2 dictionary learning refinements. Right: 20 dictionary learning refinements.