
Sparse Representation and Low Rank Methods for Image Restoration and Classification

Lei Zhang

Dept. of Computing, The Hong Kong Polytechnic University

http://www.comp.polyu.edu.hk/~cslzhang/

My recent research focuses

• Sparse Representation, Dictionary Learning, Low Rank

– Image restoration

– Collaborative representation based pattern classification

• Image Quality Assessment

– Full reference and no reference IQA models

• Visual Tracking

– Fast and robust trackers

• Image Segmentation Evaluation

• Biometrics (face, finger-knuckle-print, palmprint)

A linear system

[Figure: an underdetermined linear system y = Φα, illustrated with a dense solution and a sparse solution]

Sparse solutions

How to solve the sparse coding problem?

• Greedy search approaches for L0-minimization

– Orthogonal Matching Pursuit

– Least Angle Regression

• Convex optimization methods for L1-minimization (see the ISTA sketch after this list)

– Interior point

– Gradient projection

– Proximal gradient descent (iterative soft-thresholding)

– Augmented Lagrangian methods

– Alternating direction method of multipliers

• Non-convex Lp-minimization

– W. Zuo, D. Meng, L. Zhang, X. Feng and D. Zhang, “A Generalized Iterated Shrinkage Algorithm for Non-convex Sparse Coding,” in ICCV 2013.
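To make the proximal gradient (iterative soft-thresholding) option above concrete, here is a minimal numpy sketch of ISTA for the penalized L1 problem min_α ½‖y − Φα‖₂² + λ‖α‖₁; the function names and the fixed step size 1/L are illustrative choices, not code from the talk.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding: the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(Phi, y, lam, n_iter=200):
    """ISTA for min_a 0.5*||y - Phi a||_2^2 + lam*||a||_1 (a sketch).

    The step size 1/L uses L = ||Phi||_2^2, the Lipschitz constant of the
    gradient of the smooth data-fidelity term.
    """
    L = np.linalg.norm(Phi, 2) ** 2
    a = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ a - y)              # gradient of the smooth term
        a = soft_threshold(a - grad / L, lam / L) # proximal (shrinkage) step
    return a
```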


Example applications

• Denoising


Example applications

• Deblurring


Example applications

• Superresolution


Example applications

• Medical image reconstruction (e.g., CT)


Qiong Xu, Hengyong Yu, Xuanqin Mou, Lei Zhang, Jiang Hsieh, and Ge Wang, “Low-dose X-ray CT Reconstruction via Dictionary Learning”, IEEE Transactions on Medical Imaging, vol. 31, pp. 1682-1697, 2012.

[Figure: low-dose CT reconstructions, TV-based method vs. our method]

Example applications

• Inpainting


Example applications

• Morphological component analysis (cartoon-texture decomposition)


J. Bobin, J.-L. Starck, J. Fadili, Y. Moudden and D. L. Donoho, "Morphological Component Analysis: An Adaptive Thresholding Strategy," IEEE Transactions on Image Processing, vol. 16, no. 11, pp. 2675-2681, 2007.

[Figure: image = cartoon component + texture component]

Why sparse: neuroscience perspective

• Observations on Primary Visual Cortex

– The Monkey Experiment by Hubel and Wiesel, 1968

Responses of a simple cell in monkeys' right striate cortex.

David Hubel and Torsten Wiesel, Nobel Prize winners

Why sparse: neuroscience perspective

• Olshausen and Field’s Sparse Coding, 1996

– The basis functions can be updated by gradient descent.

Resulting basis functions, courtesy of Olshausen and Field, 1996.

Why sparse: probabilistic Bayes perspective

• Signal recovery in a Bayesian viewpoint

– Represent x as a linear combination of bases (dictionary atoms): x = Φα

– And assume that the representation coefficients are i.i.d. and follow some prior distribution:

P(\alpha) \propto \prod_i \exp(-\lambda |\alpha_i|^p)

– The maximum a posteriori (MAP) solution:

\hat{x} = \arg\max_x P(x|y) = \arg\max_x \underbrace{P(y|x)}_{\text{likelihood}} \, \underbrace{P(x)}_{\text{prior}}

– We have:

\hat{\alpha}_{MAP} = \arg\max_\alpha \{\log P(y|\alpha) + \log P(\alpha)\} = \arg\min_\alpha \|y - \Phi\alpha\|_2^2 + \lambda \|\alpha\|_p^p

• If p=0, it is the L0-norm sparse coding problem.

• If p=1, it becomes the convex L1-norm sparse coding.

• If 0<p<1, it will be non-convex Lp-norm minimization.

Why sparse: signal processing perspective

• x is called K-sparse if it is a linear combination of only K basis vectors:

x = \sum_{i=1}^{K} \alpha_i \psi_i

If K << N, we say x is compressible.

• Measurement of x:

y = \Phi x

Why sparse: signal processing perspective

• Reconstruction

– If x is K-sparse, we can reconstruct x from y with M (M << N) measurements:

\hat{\alpha} = \arg\min_\alpha \|\alpha\|_0, \quad \text{s.t. } y = \Phi\Psi\alpha

(An OMP sketch for this L0 problem follows.)

– But the measurement matrix should satisfy the restricted isometry property (RIP) condition. For any vector v sharing the same K nonzero entries as α:

(1-\delta_K)\|v\|_2^2 \le \|\Phi v\|_2^2 \le (1+\delta_K)\|v\|_2^2
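The greedy route listed earlier (Orthogonal Matching Pursuit) recovers such a K-sparse vector; a minimal numpy sketch, assuming unit-norm columns of Phi (names and structure are illustrative):

```python
import numpy as np

def omp(Phi, y, K):
    """Orthogonal Matching Pursuit (a sketch): greedily recover a K-sparse x
    from y = Phi @ x, assuming the columns of Phi have unit l2 norm."""
    residual = y.astype(float).copy()
    support = []
    x = np.zeros(Phi.shape[1])
    for _ in range(K):
        idx = int(np.argmax(np.abs(Phi.T @ residual)))  # most correlated atom
        if idx not in support:
            support.append(idx)
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        x[:] = 0.0
        x[support] = coef                               # least squares on support
        residual = y - Phi @ x
    return x
```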

Image reconstruction: the problem

• Reconstruct x from its degraded measurement y

y = Hx + v

H: the degradation matrix

v: Gaussian white noise

[Figure: original image x and its degraded observation y]

Image reconstruction by sparse coding: the basic procedures

1. Partition the degraded image into overlapped patches.

2. Denote by Φ the employed dictionary. For each patch, solve the following L1-norm sparse coding problem:

\hat{\alpha} = \arg\min_\alpha \|y - \Phi\alpha\|_2^2 + \lambda\|\alpha\|_1

3. Reconstruct each patch by \hat{x} = \Phi\hat{\alpha}.

4. Put the reconstructed patch back to the original image. For the overlapped pixels between patches, average them.

5. In practice, the above procedures can be iterated for several rounds. (A code sketch of steps 1-4 follows.)
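A minimal numpy sketch of steps 1-4 for the denoising case (H = I), assuming a precomputed patch dictionary D whose columns are vectorized atoms; the patch size, stride, and names are illustrative:

```python
import numpy as np

def ista(D, p, lam, n_iter=100):
    """L1 sparse coding of patch vector p over dictionary D (as sketched above)."""
    L = np.linalg.norm(D, 2) ** 2
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        a = a - D.T @ (D @ a - p) / L
        a = np.sign(a) * np.maximum(np.abs(a) - lam / L, 0.0)
    return a

def restore(y, D, lam, patch=8, stride=4):
    """Patch-wise sparse-coding restoration of a grayscale image y (steps 1-4).

    Overlapping reconstructions are averaged; repeat the whole call for step 5.
    """
    H, W = y.shape
    out = np.zeros_like(y, dtype=float)
    cnt = np.zeros_like(y, dtype=float)
    for i in range(0, H - patch + 1, stride):
        for j in range(0, W - patch + 1, stride):
            vec = y[i:i+patch, j:j+patch].reshape(-1)   # step 1: extract a patch
            a = ista(D, vec, lam)                       # step 2: sparse code it
            rec = (D @ a).reshape(patch, patch)         # step 3: reconstruct
            out[i:i+patch, j:j+patch] += rec            # step 4: put back...
            cnt[i:i+patch, j:j+patch] += 1.0
    return out / np.maximum(cnt, 1.0)                   # ...and average overlaps
```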

How sparsity helps?

An illustrative example

• You are looking for a girlfriend/boyfriend.

– i.e., you are “reconstructing” the desired signal.

• Your objective is that she/he is “fair, rich, and beautiful” / “tall, rich, and handsome” (白-富-美 / 高-富-帅).

– i.e., you want a “clean” and “perfect” reconstruction.

• However, the candidates are limited.

– i.e., the dictionary is small.

• Can you find your ideal girlfriend/boyfriend?


How sparsity helps?

An illustrative example

• Candidate A is tall; however, he is not handsome.

• Candidate B is rich; however, he is too fat.

• Candidate C is handsome; however, he is poor.

• If you sparsely select one of them, none is ideal for you.

– i.e., a sparse representation vector such as [0, 1, 0].

• How about a dense solution: (A+B+C)/3?

– i.e., a dense representation vector [1, 1, 1]/3

– The “reconstructed” boyfriend is a compromise of “tall, rich, and handsome” (高-富-帅), and he is fat (i.e., has some noise) at the same time.


How sparsity helps?

An illustrative example

• So what’s wrong?

– This is because the dictionary is too small!

• If you can select your boyfriend/girlfriend from boys/girls all over the world (i.e., a large enough dictionary), there is a very high probability (nearly 1) that you will find him/her!

– i.e., a very sparse solution such as [0, …, 1, …, 0]

• In summary, a sparse solution with an over-complete dictionary often works!

• Sparsity and redundancy are two sides of the same coin.


The dictionary

• Usually, an over-complete dictionary is required for sparse representation.

• The dictionary can be formed by the off-the-shelf bases such as DCT bases, wavelets, curvelets, etc.

• Learning dictionaries from natural images has shown very promising results in image reconstruction.

• Dictionary learning has become a hot topic in image processing and computer vision.


M. Aharon, M. Elad, and A. M. Bruckstein, “K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation,” IEEE Trans. Signal Processing, vol. 54, no. 11, pp. 4311-4322, Nov. 2006.

Nonlocal self-similarity

• In natural images, we can usually find many patches similar to a given patch, and they can be spatially far from it. This is called nonlocal self-similarity.

• Nonlocal self-similarity has been widely and successfully used in image reconstruction.

Representative image restoration methods

• K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, “Image denoising by sparse 3-d transform-domain collaborative filtering," TIP 2007. (BM3D)

• J. Mairal, F. Bach, J. Ponce, G. Sapiro, and A. Zisserman, “Non-local sparse models for image restoration," ICCV 2009. (LSSC)

• J. Yang, J. Wright, T. Huang and Y. Ma, “Image super-resolution via sparse representation,” TIP 2010. (ScSR)

• D. Zoran and Y. Weiss, “From learning models of natural image patches to whole image restoration,” ICCV 2011. (EPLL)

• W. Dong, L. Zhang, G. Shi and X. Wu, “Image deblurring and super-resolution by adaptive sparse domain selection and adaptive regularization,” TIP 2011. (ASDS)

• S. Wang, L. Zhang, Y. Liang and Q. Pan, “Semi-Coupled Dictionary Learning with Applications to Image Superresolution and Photo-Sketch Image Synthesis,” CVPR 2012. (SCDL)

• W. Dong, L. Zhang, G. Shi, and X. Li, “Nonlocally Centralized Sparse Representation for Image Restoration,” TIP 2013. (NCSR)


NCSR (ICCV’11, TIP’13)

• A simple but very effective sparse representation model was proposed.

• It outperforms many state-of-the-art methods in image denoising, deblurring and super-resolution.

W. Dong, L. Zhang and G. Shi, “Centralized Sparse Representation for Image Restoration,” in ICCV 2011.

W. Dong, L. Zhang, G. Shi and X. Li, “Nonlocally Centralized Sparse Representation for Image Restoration,” IEEE Trans. on Image Processing, vol. 22, no. 4, pp. 1620-1630, April 2013.

NCSR: The idea

• For the true signal:

\alpha_x = \arg\min_\alpha \|\alpha\|_1, \quad \text{s.t. } \|x - \Phi\alpha\|_2 \le \varepsilon

• For the degraded signal:

\alpha_y = \arg\min_\alpha \|\alpha\|_1, \quad \text{s.t. } \|y - H\Phi\alpha\|_2 \le \varepsilon

• The sparse coding noise (SCN):

\upsilon_\alpha = \alpha_y - \alpha_x

• To better reconstruct the signal, we need to reduce the SCN because:

\hat{x} - x = \Phi\alpha_y - \Phi\alpha_x = \Phi\upsilon_\alpha

NCSR: The objective function

• The proposed objective function:

\alpha_y = \arg\min_\alpha \|y - H\Phi\alpha\|_2^2 + \lambda \sum_i \|\alpha_i - \hat{\alpha}_{x,i}\|_p

• Key idea: suppressing the SCN.

• How to estimate \alpha_x? The unbiased estimate: \hat{\alpha}_x = E[\alpha_x]

• The zero-mean property of SCN makes E[\alpha_x] = E[\alpha_y].

NCSR: The solution

• The nonlocal estimation of E[\alpha_y] (sketched in code below):

\mu_i = \sum_{j \in C_i} w_{i,j} \alpha_{i,j}, \quad w_{i,j} = \exp(-\|\hat{x}_i - \hat{x}_{i,j}\|_2^2 / h) / W

• The simplified objective function:

\alpha_y = \arg\min_\alpha \|y - H\Phi\alpha\|_2^2 + \lambda \sum_{i=1}^{N} \|\alpha_i - \mu_i\|_p

• The iterative solution:

\alpha_y^{(j+1)} = \arg\min_\alpha \|y - H\Phi\alpha\|_2^2 + \lambda \sum_{i=1}^{N} \|\alpha_i - \mu_i^{(j)}\|_p
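For concreteness, a minimal sketch of the nonlocal estimate μ_i above for one patch group; the array shapes, the convention that patches[0] is patch i itself, and all names are assumptions for illustration:

```python
import numpy as np

def nonlocal_mu(alphas, patches, h):
    """mu_i = sum_j w_ij * alpha_ij for one patch group (a sketch).

    alphas:  (m, k) coding vectors of the m patches similar to patch i
    patches: (m, d) current estimates of those patches; patches[0] is patch i
    h:       bandwidth of the exponential weighting kernel
    """
    d2 = np.sum((patches - patches[0]) ** 2, axis=1)  # ||x_i - x_{i,j}||_2^2
    w = np.exp(-d2 / h)
    w /= w.sum()                                      # normalization constant W
    return w @ alphas                                 # weighted average of codes
```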

NCSR: The parameters and dictionaries

• The Lp-norm is set to the L1-norm, since the SCN is generally Laplacian distributed.

• The regularization parameter is adaptively determined based on the MAP estimation principle.

• Local PCA dictionaries are used, and they are adaptively learned from the image.

– We cluster the image patches and, for each cluster, a PCA dictionary is learned and used to code the patches within this cluster.

Denoising results

From left to right and top to bottom: original image, noisy image (σ=100), denoised images by SAPCA-BM3D (PSNR=25.20 dB; FSIM=0.8065), LSSC (PSNR=25.63 dB; FSIM=0.8017), EPLL (PSNR=25.44 dB; FSIM=0.8100), and NCSR (PSNR=25.65 dB; FSIM=0.8068).


Deblurring results

Blurred, FISTA (27.75 dB), BM3D (28.61 dB), NCSR (30.30 dB)

Blurred, Fergus et al. [SIGGRAPH 06], NCSR (close-up views)

Super-resolution results

Low resolution, TV (31.24 dB), ScSR (32.87 dB), NCSR (33.68 dB)

Low resolution, TV (31.34 dB), ScSR (31.55 dB), NCSR (34.00 dB)

GHP (CVPR’13, TIP’14)

• Like noise, textures are fine-scale structures in images, and most denoising algorithms remove textures along with the noise.

• Is it possible to preserve the texture structures, to some extent, in denoising?

• We made a good attempt in:

W. Zuo, L. Zhang, C. Song, and D. Zhang, “Texture Enhanced Image Denoising via Gradient Histogram Preservation,” in CVPR 2013.

W. Zuo, L. Zhang, C. Song, D. Zhang, and H. Gao, “Gradient Histogram Estimation and Preservation for Texture Enhanced Image Denoising,” IEEE Trans. on Image Processing, 2014.

GHP

• The key is to estimate the gradient histogram of the true image and preserve it in the denoised image:

\hat{x} = \arg\min_{x,\alpha,F} \|y - x\|_2^2 + \sum_i \lambda \|\alpha_i\|_1 + \beta \|\nabla x - F(\nabla x)\|_F^2, \quad \text{s.t. } x = D \circ \alpha, \; h_F = h_r

where h_r is the estimated reference gradient histogram and h_F is the histogram of F(\nabla x).

• An iterative histogram specification algorithm is developed to efficiently solve the GHP model (a rank-matching sketch follows).

• GHP attains PSNR/SSIM measures similar to BM3D, LSSC and NCSR, but produces more natural and visually pleasant denoising results.
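The histogram specification step can be illustrated with simple rank matching; a sketch assuming equal-length gradient arrays (this is the textbook specification idea, not necessarily the paper's exact algorithm):

```python
import numpy as np

def specify_histogram(values, reference):
    """Map `values` monotonically so that their empirical distribution
    matches that of `reference` (classic histogram specification)."""
    order = np.argsort(values)
    out = np.empty_like(values, dtype=float)
    out[order] = np.sort(reference)  # rank matching; assumes equal lengths
    return out
```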

GHP results: CVPR’13 logo

[Figure: Original, Noisy, BM3D, GHP]

[Figure: Ground truth, Noisy image, BM3D, LSSC, NCSR, GHP]

Group sparsity

[Figure: linear systems illustrating a sparse solution vs. a group sparse solution]

From 1D to 2D: rank minimization


Nuclear norm


Nuclear Norm Minimization(NNM)


NNM: pros and cons

• Pros

– Tightest convex envelope of rank minimization.

– Closed form solution.

• Cons

– Treat equally all the singular values. Ignore the different significances of matrix singular values.
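The closed-form solution noted under Pros is singular value soft-thresholding; a minimal sketch, assuming the common formulation min_X ½‖Y − X‖_F² + λ‖X‖_* (not code from the talk):

```python
import numpy as np

def svt(Y, lam):
    """Singular value thresholding: closed-form minimizer of
    0.5*||Y - X||_F^2 + lam*||X||_*. Every singular value is shrunk by the
    same amount lam -- the 'equal treatment' criticized above."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return U @ (np.maximum(s - lam, 0.0)[:, None] * Vt)
```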


Weighted nuclear norm minimization (WNNM)


Optimization of WNNM


Q. Xie, D. Meng, S. Gu, L. Zhang, W. Zuo, X. Feng, and Z. Xu, “On the optimization of weighted nuclear norm minimization,” Technical Report, to be online soon.

An important corollary


Q. Xie, D. Meng, S. Gu, L. Zhang, W. Zuo, X. Feng, and Z. Xu, “On the optimization of weighted nuclear norm minimization,” Technical Report, to be online soon.

Application of WNNM to image denoising

1. For each noisy patch, search in the image for its nonlocal similar patches to form matrix Y.

2. Solve the WNNM problem to estimate the clean patches X from Y (see the solver sketch after this list).

3. Put the clean patch back to the image.

4. Repeat the above procedures several times to obtain the denoised image.
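A minimal sketch of one WNNM step on the stacked-patch matrix Y, following the reweighting w_i = c·sqrt(n)/(σ_i + ε) reported in the CVPR'14 paper; the constant c and the noise-based estimate of the clean singular values are illustrative:

```python
import numpy as np

def wnnm_step(Y, sigma, c=2.8, eps=1e-8):
    """One WNNM proximal step on Y, whose columns are similar noisy patches.

    Estimates the clean singular values, sets weights inversely proportional
    to them (shrink informative components less), then soft-thresholds.
    """
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    n = Y.shape[1]                                       # number of patches
    s_clean = np.sqrt(np.maximum(s ** 2 - n * sigma ** 2, 0.0))
    w = c * np.sqrt(n) / (s_clean + eps)                 # small s -> big weight
    return U @ (np.maximum(s - w, 0.0)[:, None] * Vt)    # weighted thresholding
```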


WNNM based image denoising


[Figure: stacking nonlocal similar patches and applying WNNM]

S. Gu, L. Zhang, W. Zuo and X. Feng, “Weighted Nuclear Norm Minimization with Application to Image Denoising,” CVPR 2014.

The weights


Experimental Results


(a) Ground truth (b) Noisy image (PSNR: 14.16dB) (c) BM3D (PSNR: 26.78dB) (d) EPLL (PSNR: 26.65dB)

(e) LSSC (PSNR: 26.77dB) (f) NCSR (PSNR: 26.66dB) (g) SAIST (PSNR: 26.63dB) (h) WNNM (PSNR: 26.98dB)

Denoising results on image Boats by different methods (noise level σ=50).

Experimental Results


(a) Ground truth (b) Noisy image (PSNR:dB) (c) BM3D (PSNR: 24.22dB) (d) EPLL (PSNR: 22.46dB)

(e) LSSC (PSNR: 24.04dB) (f) NCSR (PSNR: 23.76dB) (g) SAIST (PSNR: 24.26dB) (h) WNNM (PSNR: 24.68dB)

Denoising results on image Fence by different methods (noise level σ=75).

Experimental Results


(a) Ground truth (b) Noisy image ( PSNR: 8.10dB) (c) BM3D (PSNR: 22.52dB) (d) EPLL (PSNR: 22.23dB)

(e) LSSC (PSNR: 22.24dB) (f) NCSR (PSNR: 22.11dB) (g) SAIST (PSNR: 22.61dB) (h) WNNM (PSNR: 22.91dB)

Denoising results on image Monarch by different methods (noise level σ=100).

Experimental Results


(a) Ground truth (b) Noisy image (PSNR:8.10dB) (c) BM3D (PSNR: 33.05dB) (d) EPLL (PSNR: 32.61dB)

(e) LSSC (PSNR: 32.88dB) (f) NCSR (PSNR: 32.95dB) (g) SAIST (PSNR: 33.08dB) (h) WNNM (PSNR: 33.12dB)

Denoising results on image House by different methods (noise level σ=100).

Experimental Results: Sigma=20

Image      BM3D   EPLL   LSSC   NCSR   SAIST  WNNM
C-Man      30.48  30.34  30.61  30.47  30.45  30.75
House      33.77  32.98  34.11  33.87  33.75  34.04
Peppers    31.29  31.16  31.37  31.19  31.32  31.55
Montage    33.61  32.56  33.51  33.26  33.41  34.16
Leaves     30.10  29.39  30.47  30.45  30.64  31.10
Starfish   29.68  29.57  29.96  29.93  29.97  30.28
Monarch    30.35  30.48  30.59  30.62  30.76  31.10
Airplane   29.55  29.67  29.69  29.58  29.65  29.89
Paint      30.36  30.39  30.59  30.33  30.51  30.75
JellyBean  34.18  33.81  34.55  34.52  34.31  34.85
Fence      29.93  29.25  30.07  30.11  30.16  30.37
Parrot     29.96  29.96  29.95  29.90  29.97  30.19
Lena       33.05  32.61  32.88  32.95  33.08  33.12
Barbara    31.78  29.76  31.59  31.78  32.16  32.21
Boat       30.88  30.66  30.91  30.79  30.84  31.00
Hill       30.72  30.49  30.72  30.65  30.69  30.80
F.print    28.81  28.28  28.78  28.96  29.01  29.02
Man        30.59  30.63  30.72  30.59  30.67  30.74
Couple     30.76  30.54  30.74  30.60  30.66  30.82
Straw      26.98  26.80  27.17  27.32  27.42  27.44
AVE        30.84  30.47  30.95  30.89  30.97  31.21

Experimental Results: Sigma=40

Image      BM3D   EPLL   LSSC   NCSR   SAIST  WNNM
C-Man      27.18  27.04  27.33  27.12  27.09  27.48
House      30.65  29.88  31.10  30.80  31.14  31.31
Peppers    27.70  27.73  27.86  27.68  27.77  28.05
Montage    29.52  28.47  29.43  29.00  29.32  29.92
Leaves     25.71  25.62  26.04  26.25  26.49  26.95
Starfish   26.06  26.11  26.22  26.21  26.39  26.60
Monarch    26.72  26.89  26.87  26.85  27.16  27.47
Airplane   26.08  26.29  26.23  25.96  26.26  26.51
Paint      26.69  26.88  26.77  26.50  26.94  27.10
JellyBean  30.21  29.98  30.76  30.56  30.51  30.85
Fence      26.84  25.75  26.89  26.77  27.01  27.32
Parrot     26.69  26.80  26.75  26.66  26.88  27.10
Lena       29.86  29.47  29.90  29.92  30.07  30.11
Barbara    27.99  26.02  28.17  28.20  28.77  28.77
Boat       27.74  27.64  27.77  27.65  27.68  27.96
Hill       27.99  27.81  27.99  27.83  27.96  28.12
F.print    25.30  24.73  25.30  25.51  25.60  25.57
Man        27.65  27.63  27.64  27.54  27.60  27.80
Couple     27.48  27.27  27.41  27.24  27.40  27.61
Straw      23.05  23.11  23.56  23.46  23.78  23.76
AVE        27.36  27.06  27.50  27.39  27.59  27.82

Experimental Results: Sigma=50

Image      BM3D    EPLL     LSSC    NCSR    SAIST   WNNM
C-Man      26.12   26.02    26.35   26.14   26.15   26.42
House      29.69   28.76    29.99   29.62   30.17   30.32
Peppers    26.68   26.63    26.79   26.82   26.73   26.91
Montage    27.90   27.17    28.10   27.84   28.00   28.27
Leaves     24.68   24.38    24.81   25.04   25.25   25.47
Starfish   25.04   25.04    25.12   25.07   25.29   25.44
Monarch    25.82   25.78    25.88   25.73   26.10   26.32
Airplane   25.10   25.24    25.25   24.93   25.34   25.43
Paint      25.67   25.77    25.59   25.37   25.77   25.98
JellyBean  29.26   28.75    29.42   29.29   29.32   29.62
Fence      25.92   24.58    25.87   25.78   26.00   26.43
Parrot     25.90   25.84    25.82   25.71   25.95   26.09
Lena       29.05   28.42    28.95   28.90   29.01   29.24
Barbara    27.23   24.82    27.03   26.99   27.51   27.79
Boat       26.78   26.65    26.77   26.66   26.63   26.97
Hill       27.19   26.96    27.14   26.99   27.04   27.34
F.print    24.53   23.59    24.26   24.48   24.52   24.67
Man        26.81   26.72    26.72   26.67   26.68   26.94
Couple     26.46   26.24    26.35   26.19   26.30   26.65
Straw      22.29   21.93    22.51   22.30   22.65   22.74
AVE        26.406  25.9645  26.436  26.326  26.520  26.752

Experimental Results: Sigma=75

Image      BM3D   EPLL   LSSC   NCSR   SAIST  WNNM
C-Man      24.33  24.20  24.39  24.23  24.30  24.55
House      27.51  26.68  27.75  27.22  28.08  28.25
Peppers    24.73  24.56  24.65  24.36  24.66  24.93
Montage    25.52  24.90  25.40  25.49  25.46  25.73
Leaves     22.49  22.03  22.16  22.60  22.89  23.06
Starfish   23.27  23.16  23.12  23.20  23.35  23.47
Monarch    23.91  23.72  23.66  23.67  23.98  24.31
Airplane   23.47  23.35  23.41  23.18  23.60  23.75
Paint      23.80  23.82  23.52  23.44  23.83  24.07
JellyBean  27.22  26.58  27.21  27.18  27.08  27.44
Fence      24.22  22.46  24.04  23.76  24.26  24.68
Parrot     24.19  24.04  24.01  23.90  24.17  24.37
Lena       27.26  26.57  27.21  27.00  27.23  27.54
Barbara    25.12  22.94  25.01  24.73  25.54  25.81
Boat       25.12  24.89  25.03  24.87  24.99  25.29
Hill       25.68  25.46  25.57  25.40  25.56  25.88
F.print    22.83  21.46  22.55  22.66  22.88  23.02
Man        25.32  25.14  25.10  25.10  25.14  25.42
Couple     24.70  24.45  24.51  24.33  24.54  24.86
Straw      20.56  19.94  20.71  20.39  20.90  21.00
AVE        24.56  24.02  24.45  24.34  24.62  24.86

Experimental Results: Sigma=100

Image      BM3D    EPLL    LSSC     NCSR    SAIST    WNNM
C-Man      23.07   22.86   23.15    22.93   23.09    23.36
House      25.87   25.19   25.71    25.56   26.53    26.68
Peppers    23.39   23.08   23.20    22.84   23.32    23.46
Montage    23.89   23.42   23.77    23.74   23.98    24.16
Leaves     20.91   20.25   20.58    20.86   21.40    21.57
Starfish   22.10   21.92   21.77    21.91   22.10    22.22
Monarch    22.52   22.23   22.24    22.11   22.61    22.95
Airplane   22.11   22.02   21.69    21.83   22.27    22.55
Paint      22.51   22.50   22.14    22.11   22.42    22.74
JellyBean  25.80   25.17   25.64    26.66   25.82    26.04
Fence      22.92   21.11   22.71    22.23   22.98    23.37
Parrot     22.96   22.71   22.79    22.53   23.04    23.19
Lena       25.95   25.30   25.96    25.71   25.93    26.20
Barbara    23.62   22.14   23.54    23.20   24.07    24.37
Boat       23.97   23.71   23.87    23.68   23.80    24.10
Hill       24.58   24.43   24.47    24.36   24.29    24.75
F.print    21.61   19.85   21.30    21.39   21.62    21.81
Man        24.22   24.07   23.98    24.02   24.01    24.36
Couple     23.51   23.32   23.27    23.15   23.21    23.56
Straw      19.43   18.84   19.43    19.10   19.42    19.67
AVE        23.247  22.706  23.0605  22.996  23.2955  23.555

What’s next?

• Actually I don’t know …

• Probably “sparse/low-rank + big data”?

– Theoretical analysis?

– Algorithms and implementation?

• W.r.t. image restoration, one interesting topic (at least I think so) is perceptual quality oriented image restoration.

Sparse representation: data perspective

• Curse of dimensionality

– For real-world high-dimensional data, the available samples are usually insufficient.

– Fortunately, real data often lie on low-dimensional, sparse, or degenerate structures in the high-dimensional space.

Subspace methods: PCA, LLE, ISOMAP, ICA, …

Coding methods: bag-of-words, mixture models, …

Sparse representation based classification (SRC)

[Figure: test image y ≈ training dictionary X × coefficient vector α]

α is sparse: ideally, it is supported only on images of the same subject.

J. Wright, A. Yang, A. Ganesh, S. S. Sastry, and Y. Ma, “Robust Face Recognition via Sparse Representation,” PAMI 2009.

Representation: \min_\alpha \|\alpha\|_1, \quad \text{s.t. } y = X\alpha

Classification: label(y) = \arg\min_k r_k, \quad r_k = \|y - X_k \hat{\alpha}_k\|_2

where X = [X_1, X_2, ..., X_K] and \alpha = [\alpha_1; \alpha_2; ...; \alpha_K]. (A code sketch follows.)
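A minimal sketch of SRC as defined above, relaxing the equality constraint to the usual penalized L1 problem and solving it by ISTA; λ, the iteration count, and the function name are illustrative:

```python
import numpy as np

def src_classify(X_list, y, lam=0.01, n_iter=300):
    """SRC sketch: sparse-code y over all classes, classify by class residual.

    X_list[k] holds class k's training samples as unit-norm columns.
    """
    X = np.hstack(X_list)
    L = np.linalg.norm(X, 2) ** 2
    a = np.zeros(X.shape[1])
    for _ in range(n_iter):              # ISTA on 0.5||y-Xa||^2 + lam||a||_1
        a = a - X.T @ (X @ a - y) / L
        a = np.sign(a) * np.maximum(np.abs(a) - lam / L, 0.0)
    r, start = [], 0
    for Xk in X_list:                    # class-wise residuals r_k
        ak = a[start:start + Xk.shape[1]]
        r.append(np.linalg.norm(y - Xk @ ak))
        start += Xk.shape[1]
    return int(np.argmin(r))             # label(y) = arg min_k r_k
```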

How to process a corrupted face?

[Figure: corrupted test image y = training dictionary [X, I] × coefficients [α; e]]

Representation: \min_{\alpha,e} \|[\alpha; e]\|_1, \quad \text{s.t. } y = [X, I]\,[\alpha; e]

Equivalent to:

\min_{\alpha,e} \|\alpha\|_1 + \|e\|_1, \quad \text{s.t. } y = X\alpha + e

Classification: label(y) = \arg\min_k r_k, \quad r_k = \|y - X_k \hat{\alpha}_k - \hat{e}\|_2

where X = [X_1, X_2, ..., X_K] and \alpha = [\alpha_1; \alpha_2; ...; \alpha_K]. (A sketch of the dictionary extension follows.)
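In code the extension is a single concatenation; a sketch (any L1 solver, e.g. the SRC sketch above, can then be run on the extended dictionary):

```python
import numpy as np

def extended_dictionary(X):
    """Build [X, I]: the first block codes the face, while the identity block
    absorbs sparse corruptions/occlusions as the error vector e."""
    return np.hstack([X, np.eye(X.shape[0])])
```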

Regularized robust coding

• Can we have a more principled way to deal with various types of outliers in face images?

• Our solution: regularized robust coding (RRC).

Meng Yang, Lei Zhang, Jian Yang and David Zhang, “Robust sparse coding for face recognition,” in CVPR 2011.

Meng Yang, Lei Zhang, Jian Yang, and David Zhang, “Regularized robust coding for face recognition,” IEEE Trans. Image Processing, 2013.

One big question!

• Is it true that the sparse representation helps face recognition?


L. Zhang, M. Yang, and X. Feng, “Sparse Representation or Collaborative Representation: Which Helps Face Recognition?” in ICCV 2011.

L. Zhang, M. Yang, X. Feng, Y. Ma and D. Zhang, “Collaborative Representation based Classification for Face Recognition,” arXiv:1204.2358.

Within-class or across-class representation

[Flowchart: Are the training samples of each class enough? Yes → within-class representation (regularized nearest subspace). No → across-class representation (collaborative representation). Either way, the goal is to analyze the working mechanism of SRC.]

Regularized nearest subspace (RNS)

The query sample is represented on the training samples from the same class, with a regularized representation coefficient:

\min_\alpha \|y - X_i \alpha\|_2^2, \quad \text{s.t. } \|\alpha\|_{l_p} \le \varepsilon

Why regularization?

Assume that we have enough training samples for each class, so that all the images of class i can be faithfully represented by X_i.

All face images are somewhat similar, and some subjects may have very similar face images.

Let X_j = X_i + Δ. If Δ is small (meeting certain conditions), the representation errors e_j and e_i of X_j and X_i in representing y (without any constraint on the representation coefficient) become nearly equal, so the two classes cannot be distinguished without regularization.

Regularized nearest subspace with Lp-norm

\min_\alpha \|y - X_i \alpha\|_2^2, \quad \text{s.t. } \|\alpha\|_{l_p} \le \varepsilon

• Regularization makes classification more stable.

• L2-norm regularization can play a similar role to the L0 and L1 norms in this classification task.

[Figure: representation residual vs. the L0, L1, and L2 norms of the representation coefficients, for the correct class and a wrong class]

Why collaborative representation?

• Dilutes the small-sample-size problem.

• Considers the competition between different classes.

\min_\alpha \|y - X\alpha\|_2^2, \quad \text{s.t. } \|\alpha\|_{l_p} \le \varepsilon, \quad X = [X_1, X_2, ..., X_K]

• FR is a typical small-sample-size problem, and X_i is under-complete in general.

• Face images of different classes share similarities.

• Samples from other classes can thus be used to collaboratively represent the sample of one class.

Why collaborative representation?

Without considering the l_p-norm regularization in coding, the associated representation is simply the perpendicular projection of y onto the space spanned by X:

\hat{\alpha} = \arg\min_\alpha \|y - X\alpha\|_2^2

e_i^* = \|y - X_i \hat{\alpha}_i\|_2 \,/\, \|\hat{\alpha}_i\|_2 \quad \text{(only works for classification)}

The "double checking" in e^* (class residual checked against coefficient energy) makes the classification more effective and robust.

L1 vs. L2 in regularization

\min_\alpha \|y - X\alpha\|_2^2 + \lambda \|\alpha\|_{l_p}

[Figure: coefficients of l1-regularized minimization (sparse) vs. l2-regularized minimization (non-sparse), and recognition rate vs. the regularization parameter for both]

Though L1 leads to sparser coefficients, the classification rates are similar.

Collaborative representation model

\min_\alpha \|y - X\alpha\|_{l_q} + \lambda \|\alpha\|_{l_p}, \quad p, q = 1 \text{ or } 2

q=2, p=1: Sparse Representation based Classification (S-SRC)

q=1, p=1: Robust Sparse Representation based Classification (R-SRC)

q=2, p=2: Collaborative Representation based Classification with regularized least square (CRC_RLS)

q=1, p=2: Robust Collaborative Representation based Classification (R-CRC)

CRC_RLS has a closed-form solution; the others require iterative solvers. (A CRC_RLS sketch follows.)
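A minimal sketch of CRC_RLS (q = 2, p = 2): the coding step is a closed-form ridge solution, and classification uses the regularized residual ("double checking") from the earlier slide; λ is illustrative:

```python
import numpy as np

def crc_rls(X_list, y, lam=1e-3):
    """CRC_RLS sketch: closed-form collaborative coding + regularized residual."""
    X = np.hstack(X_list)
    # ridge solution (X^T X + lam I)^{-1} X^T y; for many queries the
    # projection matrix P = (X^T X + lam I)^{-1} X^T can be precomputed once
    a = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
    scores, start = [], 0
    for Xk in X_list:
        ak = a[start:start + Xk.shape[1]]
        # residual divided by coefficient energy ("double checking")
        scores.append(np.linalg.norm(y - Xk @ ak) / (np.linalg.norm(ak) + 1e-12))
        start += Xk.shape[1]
    return int(np.argmin(scores))
```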

Gender classification

Male? Female?

Training set: 700 male samples, 700 female samples

Feature (AR)     RNS_L1  RNS_L2  CRC-RLS  S-SRC  SVM    LRC    NN
300-d Eigenface  94.9%   94.9%   93.7%    92.3%  92.4%  27.3%  90.7%

A big benefit (about 67% improvement over LRC) is brought by regularization on the coding vector!

Face recognition without occlusion

Identity? Training samples per subject are limited.

[Figure: recognition rate vs. Eigenface dimension (50-300) on the AR database, for NN, LRC, SVM, S-SRC and CRC_RLS]

CRC_RLS gives the highest accuracy when the feature dimension is not too low.

Face recognition with pixel corruption

Identity? Original image vs. 70% random corruption.

[Figure: recognition rate vs. corruption percentage (0-90%) on Extended Yale B, for R-SRC and R-CRC]

Face recognition with real disguise

Identity?

Disguise (AR)  Sunglasses (test 1)  Scarf (test 1)  Sunglasses (test 2)  Scarf (test 2)
R-SRC          87.0%                59.5%           69.8%                40.8%
CRC-RLS        68.5%                90.5%           57.2%                71.8%
R-CRC          87.0%                86.0%           65.8%                73.2%

Significant improvement in the case of scarf.

Running time

Corruption (MPIE)         L1_ls  Homotopy  SpaRSA  FISTA  ALM    R-CRC
Average running time (s)  17.35  8.05      15.97   8.76   16.02  0.916

Speed-up: 8.79-18.94 times!

No occlusion (MPIE)  L1_ls   Homotopy  FISTA   ALM     CRC-RLS
Recognition rate     92.6%   92.0%     79.6%   92.0%   92.2%
Running time (s)     21.290  1.7600    1.6360  0.5277  0.0133

Speed-up: 39.7-1600.7 times!

One even bigger question!

L. Zhang, W. Zuo, X. Feng, and Y. Ma, “A Probabilistic Formulation of Collaborative Representation based Classifier,” Preprint, to be online soon.

• SRC/CRC represents the query face by gallery faces from all classes; however, it uses the representation residual of each class for classification.

• So what kind of classifier is SRC/CRC?

• Why does SRC/CRC work?

Probabilistic subspace of X_k

• Samples of class k: X_k = [x_1, x_2, ..., x_n].

• S: the subspace spanned by X_k.

– Each data point x in S can be written as x = X_k \alpha.

• We assume that the probability that x belongs to class k is determined by \alpha:

P(label(x) = k) = c \cdot \exp(-\|\alpha\|_2^2)

• It can be shown that such a probability depends on the distribution of the samples in X_k.

• The red point will have a much higher probability than the green one.

Representation of query sample y

• The query sample y usually lies outside the subspace spanned by X_k.

• The probability that y belongs to class k is determined by two factors:

– Given x = X_k \alpha, how likely is it that y has the same class label as x?

– What is the probability that x belongs to class k?

• By maximizing the product of the two probabilities, we have:

p_k = \max_\alpha \log P(label(y) = k) \;\Leftrightarrow\; \min_\alpha \|y - X_k \alpha\|_2^2 + \|\alpha\|_2^2

Two classes

• X_1 = [x_{1,1}, x_{1,2}, ..., x_{1,n}]; X_2 = [x_{2,1}, x_{2,2}, ..., x_{2,n}]

• S: the subspace spanned by [X_1, X_2].

– Each data point x in S can be written as x = X_1 \alpha_1 + X_2 \alpha_2.

• x belongs to class 1 or class 2 with certain probability:

P(label(x) = 1) \propto \exp(-(\|x - X_1 \alpha_1\|_2^2 + \|\alpha_1\|_2^2))

P(label(x) = 2) \propto \exp(-(\|x - X_2 \alpha_2\|_2^2 + \|\alpha_2\|_2^2))

Collaborative representation of y

• y lies outside the subspace {X_1 \alpha_1 + X_2 \alpha_2}. The probability that y belongs to class 1 or 2 depends on how likely y has the same class label as x = X_1 \alpha_1 + X_2 \alpha_2, and on the probability that x belongs to class 1 or 2:

p_1 = \max_{\{\alpha_1,\alpha_2\}} \log P(label(y) = 1) \;\Leftrightarrow\; \min_{\{\alpha_1,\alpha_2\}} \|y - (X_1\alpha_1 + X_2\alpha_2)\|_2^2 + \|X\alpha - X_1\alpha_1\|_2^2 + \|\alpha_1\|_2^2

p_2 = \max_{\{\alpha_1,\alpha_2\}} \log P(label(y) = 2) \;\Leftrightarrow\; \min_{\{\alpha_1,\alpha_2\}} \|y - (X_1\alpha_1 + X_2\alpha_2)\|_2^2 + \|X\alpha - X_2\alpha_2\|_2^2 + \|\alpha_2\|_2^2

where X\alpha = X_1\alpha_1 + X_2\alpha_2.

General case

• The probability that the query sample y belongs to class k can be computed as:

p_k \;\Leftrightarrow\; \min_{\{\alpha_i\}} \left\|y - \sum_{i=1}^{K} X_i \alpha_i\right\|_2^2 + \left\|\sum_{i=1}^{K} X_i \alpha_i - X_k \alpha_k\right\|_2^2 + \|\alpha_k\|_2^2

• The classification rule:

label(y) = \arg\max_k \{p_k\}

• Problem: for each class k, we need to solve the optimization once, which can be costly.

Joint probability

• For a data point x in the subspace spanned by all classes X, we define the joint probability:

P(label(x) = 1, ..., label(x) = K) \propto \exp\left(-\sum_{i=1}^{K} \left(\|x - X_i \alpha_i\|_2^2 + \|\alpha_i\|_2^2\right)\right)

• For a query sample y outside the subspace of X, we have:

\max \log P \;\Leftrightarrow\; \min_\alpha \|y - X\alpha\|_2^2 + \sum_{i=1}^{K} \left(\|X\alpha - X_i \alpha_i\|_2^2 + \|\alpha_i\|_2^2\right)

• We use the marginal probability for classification:

p_k = P(label(y) = k) \propto \exp\left(-\left(\|y - X\hat{\alpha}\|_2^2 + \|X\hat{\alpha} - X_k \hat{\alpha}_k\|_2^2 + \|\hat{\alpha}_k\|_2^2\right)\right)

label(y) = \arg\max_k \{p_k\}

• We only need to solve the optimization once.

Variants

• ProCRC-l2 (closed-form solution, see the sketch after this list):

\min_\alpha \|y - X\alpha\|_2^2 + \sum_{i=1}^{K} \left(\|X\alpha - X_i \alpha_i\|_2^2 + \|\alpha_i\|_2^2\right)

• ProCRC-l1:

\min_\alpha \|y - X\alpha\|_2^2 + \sum_{i=1}^{K} \left(\|X\alpha - X_i \alpha_i\|_2^2 + \|\alpha_i\|_1\right)

• Robust ProCRC (ProCRC-r):

\min_\alpha \|y - X\alpha\|_1 + \sum_{i=1}^{K} \left(\|X\alpha - X_i \alpha_i\|_2^2 + \|\alpha_i\|_1\right)
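Because the ProCRC-l2 objective above is quadratic in α, it has a closed-form solution; a sketch derived from that reconstructed objective (the column-masking trick and all names are illustrative, and classification uses the marginal probability):

```python
import numpy as np

def procrc_l2(X_list, y):
    """ProCRC-l2 sketch: solve the quadratic objective in closed form, then
    pick the class with the largest marginal probability (smallest exponent)."""
    X = np.hstack(X_list)
    n = X.shape[1]
    sizes = [Xk.shape[1] for Xk in X_list]
    A = X.T @ X + np.eye(n)           # from ||y - Xa||^2 and sum_i ||a_i||^2
    start = 0
    for nk in sizes:
        keep = np.zeros(n)
        keep[start:start + nk] = 1.0
        M = X * (1.0 - keep)          # zero class-k columns: M a = Xa - X_k a_k
        A += M.T @ M
        start += nk
    a = np.linalg.solve(A, X.T @ y)   # normal equations of the objective
    scores, start = [], 0
    for Xk in X_list:
        ak = a[start:start + Xk.shape[1]]
        # exponent of p_k: ||Xa - X_k a_k||^2 + ||a_k||^2 (||y - Xa||^2 is shared)
        scores.append(np.linalg.norm(X @ a - Xk @ ak) ** 2
                      + np.linalg.norm(ak) ** 2)
        start += Xk.shape[1]
    return int(np.argmin(scores))
```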

Face recognition: AR

Dim. 2580 500 300 100 50

SVM 86.98 86.98 86.84 84.26 78.11

NSC 76.40 76.40 76.10 74.39 70.39

CRC 92.13 93.70 93.85 88.84 78.97

SRC 93.85 93.56 92.85 90.99 84.26

CROC 92.28 92.42 91.56 86.84 78.97

ProCRC-l2 93.99 94.13 93.85 89.41 80.26

ProCRC-l1 94.13 94.13 93.42 90.99 84.55

Face recognition: Extended Yale B

Dim. 1024 500 300 100 50

SVM 93.72 94.74 95.68 93.72 89.72

NSC 93.49 94.11 92.07 91.99 89.17

CRC 96.86 96.78 95.92 91.99 83.60

SRC 97.17 97.10 96.31 93.80 90.27

CROC 95.76 96.39 94.51 92.47 89.17

ProCRC-l2 98.04 97.57 96.63 94.19 90.03

ProCRC-l1 97.65 98.12 97.10 94.74 90.98

Robust face recognition

• Random corruption (YaleB)

• Block occlusion (YaleB)

• Disguise (AR)

Corruption ratio 10% 20% 40% 60%

SRC-r 97.49 95.60 90.19 76.85

ProCRC-r 98.45 98.20 93.25 82.42

Occlusion ratio 10% 20% 30% 40%

SRC-r 90.42 85.64 78.89 70.09

ProCRC-r 98.12 92.62 86.42 77.16

Disguise Sunglasses Scarf

SRC-r 69.17 69.50

ProCRC-r 70.50 70.67

Handwritten digit recognition: MNIST

Training samples per class   50   100   300   500

SVM 89.35 92.10 94.88 95.93

NSC 91.06 92.86 85.29 78.26

CRC 72.21 82.22 86.54 87.46

SRC 80.12 85.63 89.30 92.70

CROC 91.06 92.86 89.93 89.37

ProCRC-l2 92.16 94.56 95.58 95.88

ProCRC-l1 92.59 94.83 95.97 96.26

Handwritten digit recognition: USPS

Training samples per class   50   100   200   300

SVM 93.46 95.31 95.91 96.30

NSC 93.48 93.25 90.21 87.85

CRC 89.89 91.67 92.36 92.79

SRC 92.58 93.99 95.63 95.86

CROC 93.48 93.25 91.40 91.87

ProCRC-l2 93.84 95.62 96.03 96.43

ProCRC-l1 94.69 96.19 97.03 97.27

Running time

• Intel Core (TM) i7-2720QM 2.20 GHz CPU with 8 GB RAM

• Running time (seconds) of different methods was measured on the Extended Yale B dataset.

Remarks

• ProCRC provides a good probabilistic interpretation of collaborative representation based classifiers (NSC, SRC and CRC).

• ProCRC achieves higher classification accuracy than the competing classifiers in most experiments.

• ProCRC shows small performance variation across different numbers of training samples and feature dimensions, i.e., it is robust to training sample size and feature dimension.

Take research as fun! Thank you!