Sparse Representation and Low Rank Methods for Image Restoration and Classification
Lei Zhang
Dept. of Computing, The Hong Kong Polytechnic University
http://www.comp.polyu.edu.hk/~cslzhang/
My recent research focuses on
• Sparse Representation, Dictionary Learning, Low Rank
– Image restoration
– Collaborative representation based pattern classification
• Image Quality Assessment
– Full reference and no reference IQA models
• Visual Tracking
– Fast and robust trackers
• Image Segmentation Evaluation
• Biometrics (face, finger-knuckle-print, palmprint)
A linear system

[Figure: an underdetermined linear system y = Φα; the same y admits a dense solution and a sparse solution]

Sparse solutions
How to solve the sparse coding problem?
• Greedy search approaches for L0-minimization
– Orthogonal Matching Pursuit
– Least angle regression
• Convex optimization methods for L1-minimization
– Interior point
– Gradient projection
– Proximal gradient descent (iterative soft-thresholding)
– Augmented Lagrangian methods
– Alternating direction method of multipliers
• Non-convex Lp-minimization
– W. Zuo, D. Meng, L. Zhang, X. Feng and D. Zhang, "A Generalized Iterated Shrinkage Algorithm for Non-convex Sparse Coding," in ICCV 2013.
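The proximal gradient (iterative soft-thresholding) approach listed above can be sketched in a few lines. A minimal NumPy illustration, assuming a generic dictionary `D` and a hand-picked regularization weight `lam`:

```python
import numpy as np

def soft_threshold(z, t):
    """Element-wise soft-thresholding: the proximal operator of the L1 norm."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista(D, y, lam, n_iter=200):
    """Solve min_a 0.5*||y - D a||_2^2 + lam*||a||_1 by iterative soft-thresholding."""
    L = np.linalg.norm(D, 2) ** 2        # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        a = a - D.T @ (D @ a - y) / L    # gradient step on the data term
        a = soft_threshold(a, lam / L)   # proximal step on the L1 term
    return a
```

Replacing `soft_threshold` with a generalized shrinkage operator gives the non-convex Lp variant in the spirit of the ICCV 2013 paper cited above.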
Example applications
• Denoising
• Deblurring
• Superresolution
• Medical image reconstruction (e.g., CT)
[Figure: TV-based method vs. our method]
Qiong Xu, Hengyong Yu, Xuanqin Mou, Lei Zhang, Jiang Hsieh, and Ge Wang, "Low-dose X-ray CT Reconstruction via Dictionary Learning," IEEE Transactions on Medical Imaging, vol. 31, pp. 1682-1697, 2012.
• Inpainting
• Morphological component analysis (cartoon-texture decomposition)
[Figure: image = cartoon component + texture component]
J. Bobin, J.-L. Starck, J. Fadili, Y. Moudden and D.L. Donoho, "Morphological Component Analysis: an adaptive thresholding strategy," IEEE Transactions on Image Processing, vol. 16, no. 11, pp. 2675-2681, 2007.
Why sparse: neuroscience perspective
• Observations on Primary Visual Cortex
– The Monkey Experiment by Hubel and Wiesel, 1968
Responses of a simple cell in a monkey's right striate cortex.
David Hubel and Torsten Wiesel, Nobel Prize winners
Why sparse: neuroscience perspective
• Olshausen and Field’s Sparse Coding, 1996
– The basis function can be updated by gradient descent:
Resulting basis functions (courtesy of Olshausen and Field, 1996).
Why sparse: probabilistic Bayes perspective
• Signal recovery in a Bayesian viewpoint
– Represent x as a linear combination of bases (dictionary atoms): x = Φα
– And assume that the representation coefficients are i.i.d. and follow some prior distribution: p(α_i) ∝ exp(−λ|α_i|^p)
– x̂ = arg max_x P(x | y) = arg max_x P(y | x) P(x)   (likelihood × prior)
– The maximum a posteriori (MAP) solution:
  α_MAP = arg max_α log P(α | y) = arg max_α [ log P(y | α) + log P(α) ]
– We have:
  α_MAP = arg min_α ||y − Φα||_2^2 + λ Σ_i |α_i|^p
• If p = 0, it is the L0-norm sparse coding problem.
• If p = 1, it becomes the convex L1-norm sparse coding.
• If 0 < p < 1, it is a non-convex Lp-norm minimization.
Why sparse: signal processing perspective
• x is called K-sparse if it is a linear combination of only K basis vectors: x = Σ_{i=1}^{K} α_i ψ_i. If K << N, we say x is compressible.
• Measurement of x: y = Φx
Why sparse: signal processing perspective
• Reconstruction
– If x is K-sparse, we can reconstruct x from y with M (M << N) measurements:
  α̂ = arg min_α ||α||_0, s.t. y = ΦΨα
– But the measurement matrix should satisfy the restricted isometry property (RIP) condition:
• For any vector v sharing the same K nonzero entries as α:
  (1 − δ_K) ||v||_2^2 ≤ ||Φv||_2^2 ≤ (1 + δ_K) ||v||_2^2
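The greedy L0 recovery mentioned earlier (Orthogonal Matching Pursuit) is easy to sketch. A compact version, assuming the measurement matrix `Phi` has (approximately) unit-norm columns:

```python
import numpy as np

def omp(Phi, y, K):
    """Orthogonal Matching Pursuit: greedily select K atoms of Phi to explain y."""
    residual = y.astype(float).copy()
    support = []
    for _ in range(K):
        j = int(np.argmax(np.abs(Phi.T @ residual)))   # atom most correlated with residual
        support.append(j)
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)  # refit on the support
        residual = y - Phi[:, support] @ coef
    x = np.zeros(Phi.shape[1])
    x[support] = coef
    return x
```

With random Gaussian measurements and K small relative to M, this typically recovers the K-sparse signal exactly, which is the compressed sensing claim on this slide.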
Image reconstruction: the problem
• Reconstruct x from its degraded measurement y
y = Hx + v
H: the degradation matrix
v: Gaussian white noise
[Figure: original image x and degraded measurement y]
Image reconstruction by sparse coding: the basic procedures
1. Partition the degraded image into overlapped patches.
2. Denote by Φ the employed dictionary. For each patch y, solve the following L1-norm sparse coding problem:
  α̂ = arg min_α ||y − Φα||_2^2 + λ||α||_1
3. Reconstruct each patch by x̂ = Φα̂.
4. Put the reconstructed patch back into the image. For pixels where patches overlap, average them.
5. In practice, the above procedure can be iterated for several rounds.
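The patch-based procedure above can be sketched end-to-end. A toy single-pass version under strong simplifying assumptions: H = I (pure denoising), an orthonormal 2D-DCT dictionary (so the L1 sparse code has a closed form via soft-thresholding), and no iteration; patch size and λ are arbitrary choices:

```python
import numpy as np

def dct_dictionary(n):
    """Orthonormal n x n 1D-DCT basis; its Kronecker square codes n x n patches."""
    D = np.array([[np.cos(np.pi * (2 * j + 1) * k / (2 * n)) for j in range(n)]
                  for k in range(n)])
    D[0] /= np.sqrt(2)
    D *= np.sqrt(2.0 / n)
    return np.kron(D, D).T            # columns are 2D-DCT atoms

def denoise(img, patch=8, lam=25.0):
    """Sparse-code every overlapping patch over the DCT dictionary, average overlaps."""
    Phi = dct_dictionary(patch)
    out = np.zeros_like(img, dtype=float)
    weight = np.zeros_like(img, dtype=float)
    H, W = img.shape
    for i in range(H - patch + 1):
        for j in range(W - patch + 1):
            p = img[i:i+patch, j:j+patch].ravel()
            a = Phi.T @ p                                     # exact code: Phi is orthonormal
            a = np.sign(a) * np.maximum(np.abs(a) - lam, 0)   # soft-threshold = L1 coding
            out[i:i+patch, j:j+patch] += (Phi @ a).reshape(patch, patch)
            weight[i:i+patch, j:j+patch] += 1                 # count overlaps for averaging
    return out / weight
```

Swapping the fixed DCT basis for a learned dictionary and iterating the whole pass corresponds to steps 1-5 as stated on the slide.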
How sparsity helps?
An illustrative example
• You are looking for a girlfriend/boyfriend.
– i.e., you are “reconstructing” the desired signal.
• Your objective is that she/he is "白-富-美" (fair, rich, beautiful) / "高-富-帅" (tall, rich, handsome).
– i.e., you want a "clean" and "perfect" reconstruction.
• However, the candidates are limited.
– i.e., the dictionary is small.
• Can you find your ideal girlfriend/boyfriend?
How sparsity helps?
An illustrative example
• Candidate A is tall; however, he is not handsome.
• Candidate B is rich; however, he is too fat.
• Candidate C is handsome; however, he is poor.
• If you sparsely select one of them, none is ideal for you.
– i.e., a sparse representation vector such as [0, 1, 0].
• How about a dense solution: (A+B+C)/3?
– i.e., a dense representation vector [1, 1, 1]/3.
– The "reconstructed" boyfriend is a compromise of "高-富-帅" (tall, rich, handsome), and he is fat (i.e., has some noise) at the same time.
How sparsity helps?
An illustrative example
• So what’s wrong?
– This is because the dictionary is too small!
• If you can select your boyfriend/girlfriend from boys/girls all over the world (i.e., a large enough dictionary), there is a very high probability (nearly 1) that you will find him/her!
– i.e., a very sparse solution such as [0, …, 1, …, 0]
• In summary, a sparse solution with an over-complete dictionary often works!
• Sparsity and redundancy are two sides of the same coin.
The dictionary
• Usually, an over-complete dictionary is required in doing sparse representation.
• The dictionary can be formed by the off-the-shelf bases such as DCT bases, wavelets, curvelets, etc.
• Learning dictionaries from natural images has shown very promising results in image reconstruction.
• Dictionary learning has become a hot topic in image processing and computer vision.
M. Aharon, M. Elad, and A.M. Bruckstein, “The K-SVD: An algorithm for designing of overcomplete dictionaries for sparse representation,” IEEE Trans. Signal Processing, vol. 54, no. 11, pp. 4311-4322, Nov. 2006.
Nonlocal self-similarity
• In natural images, we can usually find many patches similar to a given patch, possibly spatially far from it. This is called nonlocal self-similarity.
• Nonlocal self-similarity has been widely and successfully used in image reconstruction.
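The patch-grouping step behind nonlocal self-similarity can be sketched directly; patch size, search window and group size below are arbitrary choices, not values from any specific method:

```python
import numpy as np

def nonlocal_matches(img, i, j, patch=7, window=20, k=10):
    """Top-k patches most similar (in L2 distance) to the patch at (i, j),
    searched in a window around it: the grouping step of BM3D/LSSC-style methods."""
    H, W = img.shape
    ref = img[i:i+patch, j:j+patch]
    cands = []
    for r in range(max(0, i - window), min(H - patch, i + window) + 1):
        for c in range(max(0, j - window), min(W - patch, j + window) + 1):
            d = float(np.sum((img[r:r+patch, c:c+patch] - ref) ** 2))
            cands.append((d, (r, c)))
    cands.sort(key=lambda t: t[0])     # smallest distance first
    return [rc for _, rc in cands[:k]]
```

The returned group of similar patches is exactly what collaborative filtering (BM3D), group sparse coding (LSSC), or low-rank methods (WNNM, below) operate on.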
Representative image restoration methods
• K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, “Image denoising by sparse 3-d transform-domain collaborative filtering," TIP 2007. (BM3D)
• J. Mairal, F. Bach, J. Ponce, G. Sapiro, and A. Zisserman, “Non-local sparse models for image restoration," ICCV 2009. (LSSC)
• J. Yang, J. Wright, T. Huang and Y. Ma, “Image super-resolution via sparse representation,” TIP 2010. (ScSR)
• D. Zoran and Y. Weiss, “From learning models of natural image patches to whole image restoration,” ICCV 2011. (EPLL)
• W. Dong, L. Zhang, G. Shi and X. Wu, "Image deblurring and super-resolution by adaptive sparse domain selection and adaptive regularization," TIP 2011. (ASDS)
• S. Wang, L. Zhang, Y. Liang and Q. Pan, “Semi-Coupled Dictionary Learning with Applications to Image Superresolution and Photo-Sketch Image Synthesis,” CVPR 2012. (SCDL)
• W. Dong, L. Zhang, G. Shi, and X. Li, “Nonlocally Centralized Sparse Representation for Image Restoration,” TIP 2013. (NCSR)
NCSR (ICCV’11, TIP’13)
• A simple but very effective sparse representation model was proposed.
• It outperforms many state-of-the-art methods in image denoising, deblurring and super-resolution.
W. Dong, L. Zhang and G. Shi, "Centralized Sparse Representation for Image Restoration," in ICCV 2011.
W. Dong, L. Zhang, G. Shi and X. Li, "Nonlocally Centralized Sparse Representation for Image Restoration," IEEE Trans. on Image Processing, vol. 22, no. 4, pp. 1620-1630, April 2013.
NCSR: The idea
• For the true signal x:
  α_x = arg min_α ||α||_1, s.t. ||x − Φα||_2^2 ≤ ε
• For the degraded signal y:
  α_y = arg min_α ||α||_1, s.t. ||y − HΦα||_2^2 ≤ ε
• The sparse coding noise (SCN): υ_α = α_y − α_x
• To better reconstruct the signal, we need to reduce the SCN, because:
  x̂ − x = Φα_y − Φα_x = Φυ_α
NCSR: The objective function
• The proposed objective function:
  α_y = arg min_α ||y − HΦα||_2^2 + λ ||α − α̂_x||_{l_p}
• Key idea: suppressing the SCN.
• How to estimate α_x? The unbiased estimate: α̂_x = E[α_x].
• The zero-mean property of the SCN makes E[α_x] ≈ E[α_y].
NCSR: The solution
• The nonlocal estimate of E[α_y]:
  μ_i = Σ_{j∈C_i} w_{i,j} α_{i,j},  with  w_{i,j} ∝ exp(−||x̂_i − x̂_{i,j}||_2^2 / h)
  (C_i: the set of patches similar to patch i)
• The simplified objective function:
  α_y = arg min_α ||y − HΦα||_2^2 + λ Σ_{i=1}^{N} ||α_i − μ_i||_{l_p}
• The iterative solution:
  α_y^{(j+1)} = arg min_α ||y − HΦα||_2^2 + λ Σ_{i=1}^{N} ||α_i − μ_i^{(j)}||_{l_p}
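For the simple case of an orthonormal dictionary and H = I, the per-patch NCSR update has a closed form: shrink each code toward its nonlocal estimate μ instead of toward zero. A sketch (the weight bandwidth `h` and the grouping of similar patches are assumptions for illustration):

```python
import numpy as np

def ncsr_shrink(alpha_y, mu, tau):
    """Closed form of min_a ||a - alpha_y||_2^2 + tau*||a - mu||_1:
    soft-threshold the code toward its nonlocal estimate mu, not toward zero."""
    d = alpha_y - mu
    return mu + np.sign(d) * np.maximum(np.abs(d) - tau / 2.0, 0.0)

def nonlocal_mu(codes, groups, h=10.0):
    """mu_i: weighted average of the codes of patches similar to patch i,
    with weights proportional to exp(-distance^2 / h)."""
    mu = np.zeros_like(codes)
    for i, idx in enumerate(groups):       # groups[i]: indices of patches similar to i
        d = np.sum((codes[idx] - codes[i]) ** 2, axis=1)
        w = np.exp(-d / h)
        mu[i] = (w / w.sum()) @ codes[idx]
    return mu
```

Alternating `nonlocal_mu` (re-estimating μ) with `ncsr_shrink` (re-coding the patches) mirrors the iterative solution on the slide.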
NCSR: The parameters and dictionaries
• The Lp-norm is set to the L1-norm, since the SCN is generally Laplacian distributed.
• The regularization parameter λ is adaptively determined based on the MAP estimation principle.
• Local PCA dictionaries are used, and they are adaptively learned from the image: we cluster the image patches, and for each cluster a PCA dictionary is learned and used to code the patches within that cluster.
Denoising results
From left to right and top to bottom: original image, noisy image (σ=100), denoised images by SAPCA-BM3D (PSNR=25.20 dB; FSIM=0.8065), LSSC (PSNR=25.63 dB; FSIM=0.8017), EPLL (PSNR=25.44 dB; FSIM=0.8100), and NCSR (PSNR=25.65 dB; FSIM=0.8068).
Deblurring results
Blurred | FISTA (27.75 dB) | BM3D (28.61 dB) | NCSR (30.30 dB)
Blurred | Fergus et al. [SIGGRAPH'06] | NCSR | close-up view
Super-resolution results
Low resolution | TV (31.24 dB) | ScSR (32.87 dB) | NCSR (33.68 dB)
Low resolution | TV (31.34 dB) | ScSR (31.55 dB) | NCSR (34.00 dB)
GHP (CVPR’13, TIP’14)
• Like noise, textures are fine-scale structures in images, and most denoising algorithms remove the textures while removing noise.
• Is it possible to preserve the texture structures, to some extent, during denoising?
• We made a good attempt in:
W. Zuo, L. Zhang, C. Song, and D. Zhang, "Texture Enhanced Image Denoising via Gradient Histogram Preservation," in CVPR 2013.
W. Zuo, L. Zhang, C. Song, D. Zhang, and H. Gao, "Gradient Histogram Estimation and Preservation for Texture Enhanced Image Denoising," in TIP 2014.
GHP
• The key is to estimate the gradient histogram h_r of the true image and preserve it in the denoised image:
  (x̂, α̂) = arg min_{x,α,F} ||y − x||_2^2 + λ Σ_i ||α_i||_1 + β ||∇x − F||_2^2, s.t. x_i = D α_i, h_F = h_r
  (F: a transformed gradient field whose histogram h_F matches the reference histogram h_r)
• An iterative histogram specification algorithm is developed to solve the GHP model efficiently.
• GHP achieves PSNR/SSIM measures similar to BM3D, LSSC and NCSR, but more natural and visually pleasant denoising results.
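GHP's solver relies on a histogram-specification operator for the gradient field. A minimal quantile-mapping sketch of that operator (not the authors' exact iterative algorithm):

```python
import numpy as np

def histogram_specification(values, reference):
    """Monotonically remap `values` so that they follow the empirical
    distribution of `reference` (quantile mapping)."""
    n = len(values)
    ranks = np.argsort(np.argsort(values)) / max(n - 1, 1)  # empirical CDF of values
    return np.quantile(reference, ranks)                    # map to reference quantiles
```

In GHP, `values` would be the gradient magnitudes of the current denoised estimate and `reference` samples from the estimated true-image gradient histogram h_r, applied once per outer iteration.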
GHP results: CVPR’13 logo
[Figure: original and noisy CVPR'13 logo, and the denoising results of BM3D and GHP]
[Figure: ground truth, noisy image, and denoising results of BM3D, LSSC, NCSR and GHP]
Group sparsity
[Figure: in sparse coding, nonzero coefficients may fall anywhere (a sparse solution); in group sparse coding, nonzero coefficients are concentrated in a few groups (a group sparse solution)]
From 1D to 2D: rank minimization
Nuclear norm
Nuclear Norm Minimization (NNM)
NNM: pros and cons
• Pros
– Tightest convex envelope of the rank minimization problem.
– Closed-form solution.
• Cons
– Treats all singular values equally, ignoring the different significance of matrix singular values.
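The closed-form solution mentioned above is singular value thresholding; a minimal sketch for the proximal problem min_X 0.5||Y − X||_F^2 + λ||X||_*:

```python
import numpy as np

def nnm(Y, lam):
    """Closed form of min_X 0.5*||Y - X||_F^2 + lam*||X||_*:
    soft-threshold the singular values of Y (singular value thresholding)."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return U @ np.diag(np.maximum(s - lam, 0.0)) @ Vt
```

Note how every singular value is shrunk by the same amount λ, which is exactly the "treats all singular values equally" drawback listed under Cons.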
Weighted nuclear norm minimization (WNNM)
Optimization of WNNM
Q. Xie, D. Meng, S. Gu, L. Zhang, W. Zuo, X. Feng, and Z. Xu, “On the optimization of weighted nuclear norm minimization,” Technical Report, to be online soon.
An important corollary
An important corollary
Application of WNNM to image denoising
1. For each noisy patch, search the image for its nonlocal similar patches and stack them into a matrix Y.
2. Solve the WNNM problem to estimate the clean patches X from Y.
3. Put the clean patches back into the image.
4. Repeat the above procedure several times to obtain the denoised image.
46
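Step 2 above is a weighted singular value thresholding. A sketch of one WNNM proximal step, using a weight schedule of the form w_i = C√n / (σ̂_i + ε) with σ̂_i an estimate of the clean singular value; the constant C and the one-step approximation are assumptions for illustration:

```python
import numpy as np

def wnnm_step(Y, noise_sigma, C=2.8, eps=1e-8):
    """One weighted singular value thresholding step. Weights are inversely
    proportional to the estimated clean singular values, so large
    (signal-dominated) singular values are shrunk less; since the weights are
    non-ascending in this order, the weighted problem keeps a closed-form solution."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    n = Y.shape[1]
    s_clean = np.sqrt(np.maximum(s**2 - n * noise_sigma**2, 0.0))  # clean-value estimate
    w = C * np.sqrt(n) / (s_clean + eps)                           # assumed weight schedule
    return U @ np.diag(np.maximum(s - w, 0.0)) @ Vt
```

Applied to a matrix of grouped nonlocal patches, this keeps the dominant (signal) singular directions nearly intact while wiping out the small noise-dominated ones.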
WNNM based image denoising
S. Gu, L. Zhang, W. Zuo and X. Feng, "Weighted Nuclear Norm Minimization with Application to Image Denoising," CVPR 2014.
The weights
Experimental Results
(a) Ground truth (b) Noisy image (PSNR: 14.16dB) (c) BM3D (PSNR: 26.78dB) (d) EPLL (PSNR: 26.65dB)
(e) LSSC (PSNR: 26.77dB) (f) NCSR (PSNR: 26.66dB) (g) SAIST (PSNR: 26.63dB) (h) WNNM (PSNR: 26.98dB)
Denoising results on image Boats by different methods (noise level σ=50).
Experimental Results
(a) Ground truth (b) Noisy image (c) BM3D (PSNR: 24.22 dB) (d) EPLL (PSNR: 22.46 dB)
(e) LSSC (PSNR: 24.04 dB) (f) NCSR (PSNR: 23.76 dB) (g) SAIST (PSNR: 24.26 dB) (h) WNNM (PSNR: 24.68 dB)
Denoising results on image Fence by different methods (noise level σ=75).
Experimental Results
(a) Ground truth (b) Noisy image (PSNR: 8.10 dB) (c) BM3D (PSNR: 22.52 dB) (d) EPLL (PSNR: 22.23 dB)
(e) LSSC (PSNR: 22.24 dB) (f) NCSR (PSNR: 22.11 dB) (g) SAIST (PSNR: 22.61 dB) (h) WNNM (PSNR: 22.91 dB)
Denoising results on image Monarch by different methods (noise level σ=100).
Experimental Results
(a) Ground truth (b) Noisy image (PSNR: 8.10 dB) (c) BM3D (PSNR: 33.05 dB) (d) EPLL (PSNR: 32.61 dB)
(e) LSSC (PSNR: 32.88 dB) (f) NCSR (PSNR: 32.95 dB) (g) SAIST (PSNR: 33.08 dB) (h) WNNM (PSNR: 33.12 dB)
Denoising results on image House by different methods (noise level σ=100).
Experimental Results (σ=20)
Image      BM3D   EPLL   LSSC   NCSR   SAIST  WNNM
C-Man      30.48  30.34  30.61  30.47  30.45  30.75
House      33.77  32.98  34.11  33.87  33.75  34.04
Peppers    31.29  31.16  31.37  31.19  31.32  31.55
Montage    33.61  32.56  33.51  33.26  33.41  34.16
Leaves     30.10  29.39  30.47  30.45  30.64  31.10
Starfish   29.68  29.57  29.96  29.93  29.97  30.28
Monarch    30.35  30.48  30.59  30.62  30.76  31.10
Airplane   29.55  29.67  29.69  29.58  29.65  29.89
Paint      30.36  30.39  30.59  30.33  30.51  30.75
JellyBean  34.18  33.81  34.55  34.52  34.31  34.85
Fence      29.93  29.25  30.07  30.11  30.16  30.37
Parrot     29.96  29.96  29.95  29.90  29.97  30.19
Lena       33.05  32.61  32.88  32.95  33.08  33.12
Barbara    31.78  29.76  31.59  31.78  32.16  32.21
Boat       30.88  30.66  30.91  30.79  30.84  31.00
Hill       30.72  30.49  30.72  30.65  30.69  30.80
F.print    28.81  28.28  28.78  28.96  29.01  29.02
Man        30.59  30.63  30.72  30.59  30.67  30.74
Couple     30.76  30.54  30.74  30.60  30.66  30.82
Straw      26.98  26.80  27.17  27.32  27.42  27.44
AVE        30.84  30.47  30.95  30.89  30.97  31.21
Experimental Results (σ=40)
Image      BM3D   EPLL   LSSC   NCSR   SAIST  WNNM
C-Man      27.18  27.04  27.33  27.12  27.09  27.48
House      30.65  29.88  31.10  30.80  31.14  31.31
Peppers    27.70  27.73  27.86  27.68  27.77  28.05
Montage    29.52  28.47  29.43  29.00  29.32  29.92
Leaves     25.71  25.62  26.04  26.25  26.49  26.95
Starfish   26.06  26.11  26.22  26.21  26.39  26.60
Monarch    26.72  26.89  26.87  26.85  27.16  27.47
Airplane   26.08  26.29  26.23  25.96  26.26  26.51
Paint      26.69  26.88  26.77  26.50  26.94  27.10
JellyBean  30.21  29.98  30.76  30.56  30.51  30.85
Fence      26.84  25.75  26.89  26.77  27.01  27.32
Parrot     26.69  26.80  26.75  26.66  26.88  27.10
Lena       29.86  29.47  29.90  29.92  30.07  30.11
Barbara    27.99  26.02  28.17  28.20  28.77  28.77
Boat       27.74  27.64  27.77  27.65  27.68  27.96
Hill       27.99  27.81  27.99  27.83  27.96  28.12
F.print    25.30  24.73  25.30  25.51  25.60  25.57
Man        27.65  27.63  27.64  27.54  27.60  27.80
Couple     27.48  27.27  27.41  27.24  27.40  27.61
Straw      23.05  23.11  23.56  23.46  23.78  23.76
AVE        27.36  27.06  27.50  27.39  27.59  27.82
Experimental Results (σ=50)
Image      BM3D    EPLL     LSSC    NCSR    SAIST   WNNM
C-Man      26.12   26.02    26.35   26.14   26.15   26.42
House      29.69   28.76    29.99   29.62   30.17   30.32
Peppers    26.68   26.63    26.79   26.82   26.73   26.91
Montage    27.9    27.17    28.10   27.84   28.0    28.27
Leaves     24.68   24.38    24.81   25.04   25.25   25.47
Starfish   25.04   25.04    25.12   25.07   25.29   25.44
Monarch    25.82   25.78    25.88   25.73   26.1    26.32
Airplane   25.10   25.24    25.25   24.93   25.34   25.43
Paint      25.67   25.77    25.59   25.37   25.77   25.98
JellyBean  29.26   28.75    29.42   29.29   29.32   29.62
Fence      25.92   24.58    25.87   25.78   26.00   26.43
Parrot     25.90   25.84    25.82   25.71   25.95   26.09
Lena       29.05   28.42    28.95   28.90   29.01   29.24
Barbara    27.23   24.82    27.03   26.99   27.51   27.79
Boat       26.78   26.65    26.77   26.66   26.63   26.97
Hill       27.19   26.96    27.14   26.99   27.04   27.34
F.print    24.53   23.59    24.26   24.48   24.52   24.67
Man        26.81   26.72    26.72   26.67   26.68   26.94
Couple     26.46   26.24    26.35   26.19   26.30   26.65
Straw      22.29   21.93    22.51   22.30   22.65   22.74
AVE        26.406  25.9645  26.436  26.326  26.520  26.752
Experimental Results (σ=75)
Image      BM3D   EPLL   LSSC   NCSR   SAIST  WNNM
C-Man      24.33  24.20  24.39  24.23  24.30  24.55
House      27.51  26.68  27.75  27.22  28.08  28.25
Peppers    24.73  24.56  24.65  24.36  24.66  24.93
Montage    25.52  24.90  25.40  25.49  25.46  25.73
Leaves     22.49  22.03  22.16  22.60  22.89  23.06
Starfish   23.27  23.16  23.12  23.20  23.35  23.47
Monarch    23.91  23.72  23.66  23.67  23.98  24.31
Airplane   23.47  23.35  23.41  23.18  23.60  23.75
Paint      23.80  23.82  23.52  23.44  23.83  24.07
JellyBean  27.22  26.58  27.21  27.18  27.08  27.44
Fence      24.22  22.46  24.04  23.76  24.26  24.68
Parrot     24.19  24.04  24.01  23.90  24.17  24.37
Lena       27.26  26.57  27.21  27.00  27.23  27.54
Barbara    25.12  22.94  25.01  24.73  25.54  25.81
Boat       25.12  24.89  25.03  24.87  24.99  25.29
Hill       25.68  25.46  25.57  25.40  25.56  25.88
F.print    22.83  21.46  22.55  22.66  22.88  23.02
Man        25.32  25.14  25.10  25.10  25.14  25.42
Couple     24.70  24.45  24.51  24.33  24.54  24.86
Straw      20.56  19.94  20.71  20.39  20.90  21.00
AVE        24.56  24.02  24.45  24.34  24.62  24.86
Experimental Results (σ=100)
Image      BM3D    EPLL    LSSC     NCSR    SAIST    WNNM
C-Man      23.07   22.86   23.15    22.93   23.09    23.36
House      25.87   25.19   25.71    25.56   26.53    26.68
Peppers    23.39   23.08   23.20    22.84   23.32    23.46
Montage    23.89   23.42   23.77    23.74   23.98    24.16
Leaves     20.91   20.25   20.58    20.86   21.40    21.57
Starfish   22.10   21.92   21.77    21.91   22.10    22.22
Monarch    22.52   22.23   22.24    22.11   22.61    22.95
Airplane   22.11   22.02   21.69    21.83   22.27    22.55
Paint      22.51   22.50   22.14    22.11   22.42    22.74
JellyBean  25.80   25.17   25.64    26.66   25.82    26.04
Fence      22.92   21.11   22.71    22.23   22.98    23.37
Parrot     22.96   22.71   22.79    22.53   23.04    23.19
Lena       25.95   25.30   25.96    25.71   25.93    26.20
Barbara    23.62   22.14   23.54    23.20   24.07    24.37
Boat       23.97   23.71   23.87    23.68   23.80    24.10
Hill       24.58   24.43   24.47    24.36   24.29    24.75
F.print    21.61   19.85   21.30    21.39   21.62    21.81
Man        24.22   24.07   23.98    24.02   24.01    24.36
Couple     23.51   23.32   23.27    23.15   23.21    23.56
Straw      19.43   18.84   19.43    19.10   19.42    19.67
AVE        23.247  22.706  23.0605  22.996  23.2955  23.555
[Diagram: patch-based image modeling connects image patches with nonlocal self-similarity, sparse representation, dictionary learning and low rank]
With patch-based image modeling, nonlocal self-similarity, sparse representation, low rank and dictionary learning can be used individually or jointly for image processing.
What’s next?
• Actually I don’t know …
• Probably “Sparse/Low-rank + Big Data”?
– Theoretical analysis?
–Algorithms and implementation?
• W.r.t. image restoration, one interesting topic (at least I think) is perceptual quality oriented image restoration.
59
Sparse representation: data perspective
• Curse of dimensionality
– For real-world high-dimensional data, the available samples are usually insufficient.
– Fortunately, real data often lie on low-dimensional, sparse, or degenerate structures in the high-dimensional space.
Subspace methods: PCA, LLE, ISOMAP, ICA, …
Coding methods: bag-of-words, mixture models, …
Sparse representation based classification (SRC)
[Figure: test image y ≈ training dictionary X × coefficient vector α]
α is sparse: ideally, it is only supported on images of the same subject.
J. Wright, A. Yang, A. Ganesh, S. S. Sastry, and Y. Ma, "Robust Face Recognition via Sparse Representation," PAMI 2009.
Representation: min_α ||α||_1, s.t. y = Xα
Classification: label(y) = arg min_k r_k, where r_k = ||y − X_k α̂_k||_2
X = [X_1, X_2, …, X_K], α = [α_1; α_2; …; α_K]
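The SRC pipeline just described fits in a few lines; a sketch reusing the ISTA solver from earlier in the talk (the class dictionaries `X_blocks` and λ are placeholders):

```python
import numpy as np

def src_classify(X_blocks, y, lam=0.01, n_iter=300):
    """Sparse representation based classification: L1-code y over the concatenated
    class dictionaries (via ISTA), then pick the class with the smallest
    class-wise reconstruction residual."""
    X = np.hstack(X_blocks)                  # dictionary = all classes' training samples
    L = np.linalg.norm(X, 2) ** 2
    a = np.zeros(X.shape[1])
    for _ in range(n_iter):                  # ISTA for min 0.5||y - Xa||^2 + lam||a||_1
        a = a - X.T @ (X @ a - y) / L
        a = np.sign(a) * np.maximum(np.abs(a) - lam / L, 0.0)
    residuals, start = [], 0
    for Xk in X_blocks:
        ak = a[start:start + Xk.shape[1]]    # coefficients belonging to this class
        residuals.append(np.linalg.norm(y - Xk @ ak))
        start += Xk.shape[1]
    return int(np.argmin(residuals))
```

Note the key design choice of SRC: the coding is done over all classes jointly, but the decision is taken from per-class residuals.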
How to process a corrupted face?
[Figure: corrupted test image y ≈ [X, I] × [α; e]]
Representation: min_{α,e} ||α||_1 + ||e||_1, s.t. y = Xα + e
Equivalent to: min_ω ||ω||_1, s.t. y = [X, I] ω, with ω = [α; e]
Classification: label(y) = arg min_k r_k, where r_k = ||y − X_k α̂_k − ê||_2
X = [X_1, X_2, …, X_K], α = [α_1; α_2; …; α_K]
• Can we have a more principled way to deal with various types of outliers in face images?
Regularized robust coding
• Our solution:
Meng Yang, Lei Zhang, Jian Yang and David Zhang, "Robust sparse coding for face recognition," in CVPR 2011.
Meng Yang, Lei Zhang, Jian Yang, and David Zhang, "Regularized robust coding for face recognition," IEEE Trans. Image Processing, 2013.
One big question!
• Is it true that sparse representation helps face recognition?
L. Zhang, M. Yang, and X. Feng, "Sparse Representation or Collaborative Representation: Which Helps Face Recognition?" in ICCV 2011.
L. Zhang, M. Yang, X. Feng, Y. Ma and D. Zhang, "Collaborative Representation based Classification for Face Recognition," arXiv:1204.2358.
Within-class or across-class representation
[Flowchart: are there enough training samples per class? Yes: within-class representation, leading to regularized nearest subspace. No: across-class representation, leading to collaborative representation.]
This analyzes the working mechanism of SRC.
Regularized nearest subspace (RNS)
• When there are enough training samples per class, the query sample is represented on the training samples of the same class, with a regularized representation coefficient:
  min_α ||α||_{l_p}, s.t. ||y − X_i α||_2^2 ≤ ε
Why regularization?
• Assume that we have enough training samples for each class, so that all the images of class i can be faithfully represented by X_i.
• All face images are somewhat similar, and some subjects may have very similar face images.
• Let X_j = X_i + Δ. If Δ is small (meeting some condition), the representation errors e_j and e_i of X_j and X_i in representing y without any constraint on the representation coefficient satisfy e_j / e_i ≈ 1; that is, a wrong but similar class can represent y almost as well as the right one, so the unregularized solution is unstable.
Regularized nearest subspace with Lp-norm
  min_α ||α||_{l_p}, s.t. ||y − X_i α||_2^2 ≤ ε
• Regularization makes classification more stable.
• L2-norm regularization can play a similar role to the L0 and L1 norms in this classification task.
[Figure: representation residual vs. the L0, L1 and L2 norms of the representation coefficients, for the correct class and a wrong class; the three regularizers behave similarly]
Why collaborative representation?
• FR is a typical small-sample-size problem, and X_i is under-complete in general.
• Face images of different classes share similarities.
• Samples from other classes can be used to collaboratively represent the sample of one class.
• When there are not enough training samples per class, collaborative representation
– dilutes the small-sample-size problem;
– considers the competition between different classes:
  min_α ||y − Xα||_2^2 + λ ||α||_{l_p},  X = [X_1, X_2, …, X_K]
Why collaborative representation?
• Without considering the lp-norm regularization in coding, the representation α̂ = arg min_α ||y − Xα||_2^2 is actually the perpendicular projection of y onto the space spanned by X.
• For classification we use e_k* = ||y − X_k α̂_k||_2 / ||α̂_k||_2, which only works for classification.
• The "double checking" in e_k* (a small class-wise residual and large class-wise coefficients) makes the classification more effective and robust.
L1 vs. L2 in regularization
  min_α ||y − Xα||_2^2 + λ ||α||_{l_p}
[Figure: coefficients of l1-regularized minimization are sparse, while coefficients of l2-regularized minimization are non-sparse; yet the recognition rates of the two are similar over a wide range of λ]
Though L1 leads to sparser coefficients, the classification rates are similar.
Collaborative representation model
  min_α ||y − Xα||_{l_q} + λ ||α||_{l_p},  p, q = 1 or 2
• q=2, p=1: Sparse Representation based Classification (S-SRC)
• q=1, p=1: Robust Sparse Representation based Classification (R-SRC)
• q=2, p=2: Collaborative Representation based Classification with regularized least square (CRC_RLS)
• q=1, p=2: Robust Collaborative Representation based Classification (R-CRC)
CRC_RLS has a closed-form solution; the others have iterative solutions.
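The CRC_RLS closed form is just ridge regression followed by the class-wise "double checking" residual; a minimal sketch (λ and the regularized-residual rule as described on the earlier slide):

```python
import numpy as np

def crc_rls(X_blocks, y, lam=1.0):
    """CRC_RLS: collaborative coding with L2 regularization has the ridge-regression
    closed form a = (X^T X + lam*I)^{-1} X^T y; classify by the class-wise residual
    normalized by the coefficient energy."""
    X = np.hstack(X_blocks)
    a = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
    scores, start = [], 0
    for Xk in X_blocks:
        ak = a[start:start + Xk.shape[1]]
        scores.append(np.linalg.norm(y - Xk @ ak) / (np.linalg.norm(ak) + 1e-12))
        start += Xk.shape[1]
    return int(np.argmin(scores))
```

Because the projection matrix (X^T X + λI)^{-1} X^T can be precomputed once for the whole gallery, per-query cost is a single matrix-vector product, which is the source of the large speed-ups reported below.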
Gender classification
Male or female?
Training set: 700 male samples and 700 female samples
Feature (AR) RNS_L1 RNS_L2 CRC-RLS S-SRC SVM LRC NN
300-d Eigenface 94.9% 94.9% 93.7% 92.3% 92.4% 27.3% 90.7%
A big benefit (about 67 percentage points over LRC) brought by regularization on the coding vector!
Face recognition without occlusion
Identity?
Training samples per subject are limited.
[Figure: recognition rate (65-95%) vs. Eigenface dimension (50-300) on the AR database, for NN, LRC, SVM, S-SRC and CRC_RLS]
CRC_RLS gives the highest accuracy when the feature dimension is not too low.
Face recognition with pixel corruption
Identity?
[Figure: recognition rate vs. corruption percentage (0-90%) on Extended Yale B, for R-SRC and R-CRC; the example query is shown with 70% random pixel corruption]
Face recognition with real disguise
Identity?
Disguise (AR) | Sunglass (test 1) | Scarf (test 1) | Sunglass (test 2) | Scarf (test 2)
R-SRC | 87.0% | 59.5% | 69.8% | 40.8%
CRC-RLS | 68.5% | 90.5% | 57.2% | 71.8%
R-CRC | 87.0% | 86.0% | 65.8% | 73.2%
R-CRC brings a significant improvement in the scarf cases.
Running time
Corruption (MPIE) | L1_ls | Homotopy | SpaRSA | FISTA | ALM | R-CRC
Average running time (s) | 17.35 | 8.05 | 15.97 | 8.76 | 16.02 | 0.916
Speed-up of 8.79-18.94 times!
No occlusion (MPIE) | L1_ls | Homotopy | FISTA | ALM | CRC-RLS
Recognition rate | 92.6% | 92.0% | 79.6% | 92.0% | 92.2%
Running time (s) | 21.290 | 1.7600 | 1.6360 | 0.5277 | 0.0133
Speed-up of 39.7-1600.7 times!
One even bigger question!
L. Zhang, W. Zuo, X. Feng, and Y. Ma, “A Probabilistic Formulation of Collaborative Representation based Classifier,” Preprint, to be online soon.
• SRC/CRC represents the query face by gallery faces from all classes, yet it uses the representation residual of each class for classification.
• So what kind of classifier is SRC/CRC?
• Why does SRC/CRC work?
Probabilistic subspace of X_k
• Samples of class k: X_k = [x_1, x_2, ..., x_n].
• S: the subspace spanned by X_k.
– Each data point x in S can be written as x = X_k α.
• We assume that the probability that x belongs to class k is determined by α:
  P(label(x) = k) ∝ exp(−c ||α||_2^2)
• It can be shown that such a probability depends on the distribution of the samples in X_k.
• [Figure: the red point will have a much higher probability than the green one]
Representation of query sample y
• The query sample y usually lies outside the subspace of X_k.
• The probability that y belongs to class k is determined by two factors:
– Given x = X_k α, how likely is y to have the same class label as x?
– What is the probability that x belongs to class k?
• Maximizing the product of the two probabilities, we have
  p_k = max_α log P(label(y) = k)  ⟺  min_α ||y − X_k α||_2^2 + c ||α||_2^2
Two classes
• X_1 = [x_{1,1}, x_{1,2}, ..., x_{1,n}]; X_2 = [x_{2,1}, x_{2,2}, ..., x_{2,n}]
• S: the subspace spanned by [X_1, X_2]
– Each data point x in S can be written as x = X_1 α_1 + X_2 α_2.
• x belongs to X_1 or X_2 with certain probability:
  P(label(x) = 1) ∝ exp(−(||x − X_1 α_1||_2^2 + c ||α_1||_2^2))
  P(label(x) = 2) ∝ exp(−(||x − X_2 α_2||_2^2 + c ||α_2||_2^2))
Collaborative representation of y
• y lies outside the subspace {X_1 α_1 + X_2 α_2}. The probability that y belongs to class 1 or 2 depends on how likely y has the same class label as x = X_1 α_1 + X_2 α_2, and on the probability that x belongs to class 1 or 2:
  p_1 = max_{α_1, α_2} log P(label(y) = 1)  ⟺  min_{α_1, α_2} ||y − (X_1 α_1 + X_2 α_2)||_2^2 + ||Xα − X_1 α_1||_2^2 + c ||α_1||_2^2
  p_2 = max_{α_1, α_2} log P(label(y) = 2)  ⟺  min_{α_1, α_2} ||y − (X_1 α_1 + X_2 α_2)||_2^2 + ||Xα − X_2 α_2||_2^2 + c ||α_2||_2^2
  (with Xα = X_1 α_1 + X_2 α_2)
General case
• The probability that query sample y belongs to class k can be computed as:
  p_k  ⟺  min_{α_i} ||y − Σ_{i=1}^{K} X_i α_i||_2^2 + ||Σ_{i=1}^{K} X_i α_i − X_k α_k||_2^2 + c ||α_k||_2^2
• The classification rule: label(y) = arg max_k { p_k }
• Problem: for each class k we need to solve the optimization once, which can be costly.
Joint probability
• For a data point x in the subspace spanned by all classes X, we define the joint probability:
  P(label(x) = 1, ..., label(x) = K) ∝ exp(−Σ_{i=1}^{K} (||x − X_i α_i||_2^2 + c ||α_i||_2^2))
• For the query sample y outside the subspace of X, we have
  max_α log P  ⟺  min_α ||y − Xα||_2^2 + Σ_{i=1}^{K} (||Xα − X_i α_i||_2^2 + c ||α_i||_2^2)
• We use the marginal probability for classification:
  p_k = P(label(y) = k) ∝ exp(−(||y − Xα̂||_2^2 + ||Xα̂ − X_k α̂_k||_2^2 + c ||α̂_k||_2^2))
  label(y) = arg max_k { p_k }
• We only need to solve the optimization once.
Variants
• ProCRC-l2 (closed-form solution):
  min_α ||y − Xα||_2^2 + Σ_{i=1}^{K} ||Xα − X_i α_i||_2^2 + c ||α||_2^2
• ProCRC-l1:
  min_α ||y − Xα||_2^2 + Σ_{i=1}^{K} ||Xα − X_i α_i||_2^2 + c ||α||_1
• Robust ProCRC (ProCRC-r), with an L1 data-fidelity term:
  min_α ||y − Xα||_1 + Σ_{i=1}^{K} ||Xα − X_i α_i||_2^2 + c ||α||_1
Face recognition: AR
Dim. 2580 500 300 100 50
SVM 86.98 86.98 86.84 84.26 78.11
NSC 76.40 76.40 76.10 74.39 70.39
CRC 92.13 93.70 93.85 88.84 78.97
SRC 93.85 93.56 92.85 90.99 84.26
CROC 92.28 92.42 91.56 86.84 78.97
ProCRC-l2 93.99 94.13 93.85 89.41 80.26
ProCRC-l1 94.13 94.13 93.42 90.99 84.55
Face recognition: Extended Yale B
Dim. 1024 500 300 100 50
SVM 93.72 94.74 95.68 93.72 89.72
NSC 93.49 94.11 92.07 91.99 89.17
CRC 96.86 96.78 95.92 91.99 83.60
SRC 97.17 97.10 96.31 93.80 90.27
CROC 95.76 96.39 94.51 92.47 89.17
ProCRC-l2 98.04 97.57 96.63 94.19 90.03
ProCRC-l1 97.65 98.12 97.10 94.74 90.98
Robust face recognition
• Random corruption (YaleB)
• Block occlusion (YaleB)
• Disguise (AR)
Corruption ratio 10% 20% 40% 60%
SRC-r 97.49 95.60 90.19 76.85
ProCRC-r 98.45 98.20 93.25 82.42
Occlusion ratio 10% 20% 30% 40%
SRC-r 90.42 85.64 78.89 70.09
ProCRC-r 98.12 92.62 86.42 77.16
Disguise Sunglasses Scarf
SRC-r 69.17 69.50
ProCRC-r 70.50 70.67
Handwritten digit recognition: MNIST
Number of training samples per class | 50 | 100 | 300 | 500
SVM 89.35 92.10 94.88 95.93
NSC 91.06 92.86 85.29 78.26
CRC 72.21 82.22 86.54 87.46
SRC 80.12 85.63 89.30 92.70
CROC 91.06 92.86 89.93 89.37
ProCRC-l2 92.16 94.56 95.58 95.88
ProCRC-l1 92.59 94.83 95.97 96.26
Handwritten digit recognition: USPS
Number of training samples per class | 50 | 100 | 200 | 300
SVM 93.46 95.31 95.91 96.30
NSC 93.48 93.25 90.21 87.85
CRC 89.89 91.67 92.36 92.79
SRC 92.58 93.99 95.63 95.86
CROC 93.48 93.25 91.40 91.87
ProCRC-l2 93.84 95.62 96.03 96.43
ProCRC-l1 94.69 96.19 97.03 97.27
Running time
• Intel Core (TM) i7-2720QM 2.20 GHz CPU with 8 GB RAM
• Running time (second) of different methods on the Extended Yale B dataset:
Remarks
• ProCRC provides a good probabilistic interpretation of collaborative representation based classifiers (NSC, SRC and CRC).
• ProCRC achieves higher classification accuracy than the competing classifiers in most experiments.
• ProCRC shows small performance variation under different numbers of training samples and feature dimensions; it is robust to training sample size and feature dimension.
Take Research as Fun! Thank you!