Sparse Representation and Low Rank Methods for Image Restoration and Classification
Lei Zhang
Dept. of Computing, The Hong Kong Polytechnic University
http://www.comp.polyu.edu.hk/~cslzhang/
My recent research focuses on
• Sparse Representation, Dictionary Learning, Low Rank
– Image restoration
– Collaborative representation based pattern classification
• Image Quality Assessment
– Full reference and no reference IQA models
• Visual Tracking
– Fast and robust trackers
• Image Segmentation Evaluation
• Biometrics (face, finger-knuckle-print, palmprint)
A linear system

[Figure: an underdetermined linear system y = Φα; the same y admits a dense solution and a sparse solution]

Sparse solutions
How to solve the sparse coding problem?
• Greedy search approaches for L0-minimization
– Orthogonal Matching Pursuit
– Least angle regression
• Convex optimization methods for L1-minimization
– Interior point
– Gradient projection
– Proximal gradient descent (iterative soft-thresholding)
– Augmented Lagrangian methods
– Alternating direction method of multipliers
• Non-convex Lp-minimization
– W. Zuo, D. Meng, L. Zhang, X. Feng and D. Zhang, "A Generalized Iterated Shrinkage Algorithm for Non-convex Sparse Coding," in ICCV 2013.
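The proximal gradient (iterative soft-thresholding) approach listed above can be sketched in a few lines. A minimal NumPy illustration, assuming a generic dictionary `D` and a hand-picked regularization weight `lam`:

```python
import numpy as np

def soft_threshold(z, t):
    """Element-wise soft-thresholding: the proximal operator of the L1 norm."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista(D, y, lam, n_iter=200):
    """Solve min_a 0.5*||y - D a||_2^2 + lam*||a||_1 by iterative soft-thresholding."""
    L = np.linalg.norm(D, 2) ** 2        # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        a = a - D.T @ (D @ a - y) / L    # gradient step on the data term
        a = soft_threshold(a, lam / L)   # proximal step on the L1 term
    return a
```

Replacing `soft_threshold` with a generalized shrinkage operator gives the non-convex Lp variant in the spirit of the ICCV 2013 paper cited above.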
Example applications
• Denoising
• Deblurring
• Superresolution
• Medical image reconstruction (e.g., CT)
[Figure: TV-based method vs. our method]
Qiong Xu, Hengyong Yu, Xuanqin Mou, Lei Zhang, Jiang Hsieh, and Ge Wang, "Low-dose X-ray CT Reconstruction via Dictionary Learning," IEEE Transactions on Medical Imaging, vol. 31, pp. 1682-1697, 2012.
• Inpainting
• Morphological component analysis (cartoon-texture decomposition)
[Figure: image = cartoon component + texture component]
J. Bobin, J.-L. Starck, J. Fadili, Y. Moudden and D.L. Donoho, "Morphological Component Analysis: an adaptive thresholding strategy," IEEE Transactions on Image Processing, vol. 16, no. 11, pp. 2675-2681, 2007.
Why sparse: neuroscience perspective
• Observations on Primary Visual Cortex
– The Monkey Experiment by Hubel and Wiesel, 1968
Responses of a simple cell in a monkey's right striate cortex.
David Hubel and Torsten Wiesel, Nobel Prize winners
Why sparse: neuroscience perspective
• Olshausen and Field’s Sparse Coding, 1996
– The basis function can be updated by gradient descent:
Resulting basis functions (courtesy of Olshausen and Field, 1996).
Why sparse: probabilistic Bayes perspective
• Signal recovery in a Bayesian viewpoint
– Represent x as a linear combination of bases (dictionary atoms): x = Φα
– And assume that the representation coefficients are i.i.d. and follow some prior distribution: p(α_i) ∝ exp(−λ|α_i|^p)
– x̂ = arg max_x P(x | y) = arg max_x P(y | x) P(x)   (likelihood × prior)
– The maximum a posteriori (MAP) solution:
  α_MAP = arg max_α log P(α | y) = arg max_α [ log P(y | α) + log P(α) ]
– We have:
  α_MAP = arg min_α ||y − Φα||_2^2 + λ Σ_i |α_i|^p
• If p = 0, it is the L0-norm sparse coding problem.
• If p = 1, it becomes the convex L1-norm sparse coding.
• If 0 < p < 1, it is a non-convex Lp-norm minimization.
Why sparse: signal processing perspective
• x is called K-sparse if it is a linear combination of only K basis vectors: x = Σ_{i=1}^{K} α_i ψ_i. If K << N, we say x is compressible.
• Measurement of x: y = Φx
Why sparse: signal processing perspective
• Reconstruction
– If x is K-sparse, we can reconstruct x from y with M (M << N) measurements:
  α̂ = arg min_α ||α||_0, s.t. y = ΦΨα
– But the measurement matrix should satisfy the restricted isometry property (RIP) condition:
• For any vector v sharing the same K nonzero entries as α:
  (1 − δ_K) ||v||_2^2 ≤ ||Φv||_2^2 ≤ (1 + δ_K) ||v||_2^2
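The greedy L0 recovery mentioned earlier (Orthogonal Matching Pursuit) is easy to sketch. A compact version, assuming the measurement matrix `Phi` has (approximately) unit-norm columns:

```python
import numpy as np

def omp(Phi, y, K):
    """Orthogonal Matching Pursuit: greedily select K atoms of Phi to explain y."""
    residual = y.astype(float).copy()
    support = []
    for _ in range(K):
        j = int(np.argmax(np.abs(Phi.T @ residual)))   # atom most correlated with residual
        support.append(j)
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)  # refit on the support
        residual = y - Phi[:, support] @ coef
    x = np.zeros(Phi.shape[1])
    x[support] = coef
    return x
```

With random Gaussian measurements and K small relative to M, this typically recovers the K-sparse signal exactly, which is the compressed sensing claim on this slide.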
Image reconstruction: the problem
• Reconstruct x from its degraded measurement y
y = Hx + v
H: the degradation matrix
v: Gaussian white noise
[Figure: original image x and degraded measurement y]
Image reconstruction by sparse coding: the basic procedures
1. Partition the degraded image into overlapped patches.
2. Denote by Φ the employed dictionary. For each patch y, solve the following L1-norm sparse coding problem:
  α̂ = arg min_α ||y − Φα||_2^2 + λ||α||_1
3. Reconstruct each patch by x̂ = Φα̂.
4. Put the reconstructed patch back into the image. For pixels where patches overlap, average them.
5. In practice, the above procedure can be iterated for several rounds.
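The patch-based procedure above can be sketched end-to-end. A toy single-pass version under strong simplifying assumptions: H = I (pure denoising), an orthonormal 2D-DCT dictionary (so the L1 sparse code has a closed form via soft-thresholding), and no iteration; patch size and λ are arbitrary choices:

```python
import numpy as np

def dct_dictionary(n):
    """Orthonormal n x n 1D-DCT basis; its Kronecker square codes n x n patches."""
    D = np.array([[np.cos(np.pi * (2 * j + 1) * k / (2 * n)) for j in range(n)]
                  for k in range(n)])
    D[0] /= np.sqrt(2)
    D *= np.sqrt(2.0 / n)
    return np.kron(D, D).T            # columns are 2D-DCT atoms

def denoise(img, patch=8, lam=25.0):
    """Sparse-code every overlapping patch over the DCT dictionary, average overlaps."""
    Phi = dct_dictionary(patch)
    out = np.zeros_like(img, dtype=float)
    weight = np.zeros_like(img, dtype=float)
    H, W = img.shape
    for i in range(H - patch + 1):
        for j in range(W - patch + 1):
            p = img[i:i+patch, j:j+patch].ravel()
            a = Phi.T @ p                                     # exact code: Phi is orthonormal
            a = np.sign(a) * np.maximum(np.abs(a) - lam, 0)   # soft-threshold = L1 coding
            out[i:i+patch, j:j+patch] += (Phi @ a).reshape(patch, patch)
            weight[i:i+patch, j:j+patch] += 1                 # count overlaps for averaging
    return out / weight
```

Swapping the fixed DCT basis for a learned dictionary and iterating the whole pass corresponds to steps 1-5 as stated on the slide.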
How sparsity helps?
An illustrative example
• You are looking for a girlfriend/boyfriend.
– i.e., you are “reconstructing” the desired signal.
• Your objective is that she/he is "白-富-美" (fair, rich, beautiful) / "高-富-帅" (tall, rich, handsome).
– i.e., you want a "clean" and "perfect" reconstruction.
• However, the candidates are limited.
– i.e., the dictionary is small.
• Can you find your ideal girlfriend/boyfriend?
How sparsity helps?
An illustrative example
• Candidate A is tall; however, he is not handsome.
• Candidate B is rich; however, he is too fat.
• Candidate C is handsome; however, he is poor.
• If you sparsely select one of them, none is ideal for you.
– i.e., a sparse representation vector such as [0, 1, 0].
• How about a dense solution: (A+B+C)/3?
– i.e., a dense representation vector [1, 1, 1]/3.
– The "reconstructed" boyfriend is a compromise of "高-富-帅" (tall, rich, handsome), and he is fat (i.e., has some noise) at the same time.
How sparsity helps?
An illustrative example
• So what’s wrong?
– This is because the dictionary is too small!
• If you can select your boyfriend/girlfriend from boys/girls all over the world (i.e., a large enough dictionary), there is a very high probability (nearly 1) that you will find him/her!
– i.e., a very sparse solution such as [0, …, 1, …, 0]
• In summary, a sparse solution with an over-complete dictionary often works!
• Sparsity and redundancy are two sides of the same coin.
The dictionary
• Usually, an over-complete dictionary is required in doing sparse representation.
• The dictionary can be formed by the off-the-shelf bases such as DCT bases, wavelets, curvelets, etc.
• Learning dictionaries from natural images has shown very promising results in image reconstruction.
• Dictionary learning has become a hot topic in image processing and computer vision.
M. Aharon, M. Elad, and A.M. Bruckstein, “The K-SVD: An algorithm for designing of overcomplete dictionaries for sparse representation,” IEEE Trans. Signal Processing, vol. 54, no. 11, pp. 4311-4322, Nov. 2006.
Nonlocal self-similarity
• In natural images, we can usually find many patches similar to a given patch, possibly spatially far from it. This is called nonlocal self-similarity.
• Nonlocal self-similarity has been widely and successfully used in image reconstruction.
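The patch-grouping step behind nonlocal self-similarity can be sketched directly; patch size, search window and group size below are arbitrary choices, not values from any specific method:

```python
import numpy as np

def nonlocal_matches(img, i, j, patch=7, window=20, k=10):
    """Top-k patches most similar (in L2 distance) to the patch at (i, j),
    searched in a window around it: the grouping step of BM3D/LSSC-style methods."""
    H, W = img.shape
    ref = img[i:i+patch, j:j+patch]
    cands = []
    for r in range(max(0, i - window), min(H - patch, i + window) + 1):
        for c in range(max(0, j - window), min(W - patch, j + window) + 1):
            d = float(np.sum((img[r:r+patch, c:c+patch] - ref) ** 2))
            cands.append((d, (r, c)))
    cands.sort(key=lambda t: t[0])     # smallest distance first
    return [rc for _, rc in cands[:k]]
```

The returned group of similar patches is exactly what collaborative filtering (BM3D), group sparse coding (LSSC), or low-rank methods (WNNM, below) operate on.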
Representative image restoration methods
• K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, “Image denoising by sparse 3-d transform-domain collaborative filtering," TIP 2007. (BM3D)
• J. Mairal, F. Bach, J. Ponce, G. Sapiro, and A. Zisserman, “Non-local sparse models for image restoration," ICCV 2009. (LSSC)
• J. Yang, J. Wright, T. Huang and Y. Ma, “Image super-resolution via sparse representation,” TIP 2010. (ScSR)
• D. Zoran and Y. Weiss, “From learning models of natural image patches to whole image restoration,” ICCV 2011. (EPLL)
• W. Dong, L. Zhang, G. Shi and X. Wu, "Image deblurring and super-resolution by adaptive sparse domain selection and adaptive regularization," TIP 2011. (ASDS)
• S. Wang, L. Zhang, Y. Liang and Q. Pan, “Semi-Coupled Dictionary Learning with Applications to Image Superresolution and Photo-Sketch Image Synthesis,” CVPR 2012. (SCDL)
• W. Dong, L. Zhang, G. Shi, and X. Li, “Nonlocally Centralized Sparse Representation for Image Restoration,” TIP 2013. (NCSR)
NCSR (ICCV’11, TIP’13)
• A simple but very effective sparse representation model was proposed.
• It outperforms many state-of-the-art methods in image denoising, deblurring and super-resolution.
W. Dong, L. Zhang and G. Shi, "Centralized Sparse Representation for Image Restoration," in ICCV 2011.
W. Dong, L. Zhang, G. Shi and X. Li, "Nonlocally Centralized Sparse Representation for Image Restoration," IEEE Trans. on Image Processing, vol. 22, no. 4, pp. 1620-1630, April 2013.
NCSR: The idea
• For the true signal x:
  α_x = arg min_α ||α||_1, s.t. ||x − Φα||_2^2 ≤ ε
• For the degraded signal y:
  α_y = arg min_α ||α||_1, s.t. ||y − HΦα||_2^2 ≤ ε
• The sparse coding noise (SCN): υ_α = α_y − α_x
• To better reconstruct the signal, we need to reduce the SCN, because:
  x̂ − x = Φα_y − Φα_x = Φυ_α
NCSR: The objective function
• The proposed objective function:
  α_y = arg min_α ||y − HΦα||_2^2 + λ ||α − α̂_x||_{l_p}
• Key idea: suppressing the SCN.
• How to estimate α_x? The unbiased estimate: α̂_x = E[α_x].
• The zero-mean property of the SCN makes E[α_x] ≈ E[α_y].
NCSR: The solution
• The nonlocal estimate of E[α_y]:
  μ_i = Σ_{j∈C_i} w_{i,j} α_{i,j},  with  w_{i,j} ∝ exp(−||x̂_i − x̂_{i,j}||_2^2 / h)
  (C_i: the set of patches similar to patch i)
• The simplified objective function:
  α_y = arg min_α ||y − HΦα||_2^2 + λ Σ_{i=1}^{N} ||α_i − μ_i||_{l_p}
• The iterative solution:
  α_y^{(j+1)} = arg min_α ||y − HΦα||_2^2 + λ Σ_{i=1}^{N} ||α_i − μ_i^{(j)}||_{l_p}
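For the simple case of an orthonormal dictionary and H = I, the per-patch NCSR update has a closed form: shrink each code toward its nonlocal estimate μ instead of toward zero. A sketch (the weight bandwidth `h` and the grouping of similar patches are assumptions for illustration):

```python
import numpy as np

def ncsr_shrink(alpha_y, mu, tau):
    """Closed form of min_a ||a - alpha_y||_2^2 + tau*||a - mu||_1:
    soft-threshold the code toward its nonlocal estimate mu, not toward zero."""
    d = alpha_y - mu
    return mu + np.sign(d) * np.maximum(np.abs(d) - tau / 2.0, 0.0)

def nonlocal_mu(codes, groups, h=10.0):
    """mu_i: weighted average of the codes of patches similar to patch i,
    with weights proportional to exp(-distance^2 / h)."""
    mu = np.zeros_like(codes)
    for i, idx in enumerate(groups):       # groups[i]: indices of patches similar to i
        d = np.sum((codes[idx] - codes[i]) ** 2, axis=1)
        w = np.exp(-d / h)
        mu[i] = (w / w.sum()) @ codes[idx]
    return mu
```

Alternating `nonlocal_mu` (re-estimating μ) with `ncsr_shrink` (re-coding the patches) mirrors the iterative solution on the slide.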
NCSR: The parameters and dictionaries
• The Lp-norm is set to the L1-norm, since the SCN is generally Laplacian distributed.
• The regularization parameter λ is adaptively determined based on the MAP estimation principle.
• Local PCA dictionaries are used, and they are adaptively learned from the image: we cluster the image patches, and for each cluster a PCA dictionary is learned and used to code the patches within that cluster.
Denoising results
From left to right and top to bottom: original image, noisy image (σ=100), denoised images by SAPCA-BM3D (PSNR=25.20 dB; FSIM=0.8065), LSSC (PSNR=25.63 dB; FSIM=0.8017), EPLL (PSNR=25.44 dB; FSIM=0.8100), and NCSR (PSNR=25.65 dB; FSIM=0.8068).
Deblurring results
Blurred | FISTA (27.75 dB) | BM3D (28.61 dB) | NCSR (30.30 dB)
Blurred | Fergus et al. [SIGGRAPH'06] | NCSR | close-up view
Super-resolution results
Low resolution | TV (31.24 dB) | ScSR (32.87 dB) | NCSR (33.68 dB)
Low resolution | TV (31.34 dB) | ScSR (31.55 dB) | NCSR (34.00 dB)
GHP (CVPR’13, TIP’14)
• Like noise, textures are fine-scale structures in images, and most denoising algorithms remove the textures while removing noise.
• Is it possible to preserve the texture structures, to some extent, during denoising?
• We made a good attempt in:
W. Zuo, L. Zhang, C. Song, and D. Zhang, "Texture Enhanced Image Denoising via Gradient Histogram Preservation," in CVPR 2013.
W. Zuo, L. Zhang, C. Song, D. Zhang, and H. Gao, "Gradient Histogram Estimation and Preservation for Texture Enhanced Image Denoising," in TIP 2014.
GHP
• The key is to estimate the gradient histogram h_r of the true image and preserve it in the denoised image:
  (x̂, α̂) = arg min_{x,α,F} ||y − x||_2^2 + λ Σ_i ||α_i||_1 + β ||∇x − F||_2^2, s.t. x_i = D α_i, h_F = h_r
  (F: a transformed gradient field whose histogram h_F matches the reference histogram h_r)
• An iterative histogram specification algorithm is developed to solve the GHP model efficiently.
• GHP achieves PSNR/SSIM measures similar to BM3D, LSSC and NCSR, but more natural and visually pleasant denoising results.
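GHP's solver relies on a histogram-specification operator for the gradient field. A minimal quantile-mapping sketch of that operator (not the authors' exact iterative algorithm):

```python
import numpy as np

def histogram_specification(values, reference):
    """Monotonically remap `values` so that they follow the empirical
    distribution of `reference` (quantile mapping)."""
    n = len(values)
    ranks = np.argsort(np.argsort(values)) / max(n - 1, 1)  # empirical CDF of values
    return np.quantile(reference, ranks)                    # map to reference quantiles
```

In GHP, `values` would be the gradient magnitudes of the current denoised estimate and `reference` samples from the estimated true-image gradient histogram h_r, applied once per outer iteration.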
GHP results: CVPR’13 logo
[Figure: original and noisy CVPR'13 logo, and the denoising results of BM3D and GHP]
[Figure: ground truth, noisy image, and denoising results of BM3D, LSSC, NCSR and GHP]
Group sparsity
[Figure: in sparse coding, nonzero coefficients may fall anywhere (a sparse solution); in group sparse coding, nonzero coefficients are concentrated in a few groups (a group sparse solution)]
From 1D to 2D: rank minimization
Nuclear norm
Nuclear Norm Minimization (NNM)
NNM: pros and cons
• Pros
– Tightest convex envelope of the rank minimization problem.
– Closed-form solution.
• Cons
– Treats all singular values equally, ignoring the different significance of matrix singular values.
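The closed-form solution mentioned above is singular value thresholding; a minimal sketch for the proximal problem min_X 0.5||Y − X||_F^2 + λ||X||_*:

```python
import numpy as np

def nnm(Y, lam):
    """Closed form of min_X 0.5*||Y - X||_F^2 + lam*||X||_*:
    soft-threshold the singular values of Y (singular value thresholding)."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return U @ np.diag(np.maximum(s - lam, 0.0)) @ Vt
```

Note how every singular value is shrunk by the same amount λ, which is exactly the "treats all singular values equally" drawback listed under Cons.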
Weighted nuclear norm minimization (WNNM)
Optimization of WNNM
Q. Xie, D. Meng, S. Gu, L. Zhang, W. Zuo, X. Feng, and Z. Xu, “On the optimization of weighted nuclear norm minimization,” Technical Report, to be online soon.
An important corollary
An important corollary
Application of WNNM to image denoising
1. For each noisy patch, search the image for its nonlocal similar patches and stack them into a matrix Y.
2. Solve the WNNM problem to estimate the clean patches X from Y.
3. Put the clean patches back into the image.
4. Repeat the above procedure several times to obtain the denoised image.
46
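Step 2 above is a weighted singular value thresholding. A sketch of one WNNM proximal step, using a weight schedule of the form w_i = C√n / (σ̂_i + ε) with σ̂_i an estimate of the clean singular value; the constant C and the one-step approximation are assumptions for illustration:

```python
import numpy as np

def wnnm_step(Y, noise_sigma, C=2.8, eps=1e-8):
    """One weighted singular value thresholding step. Weights are inversely
    proportional to the estimated clean singular values, so large
    (signal-dominated) singular values are shrunk less; since the weights are
    non-ascending in this order, the weighted problem keeps a closed-form solution."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    n = Y.shape[1]
    s_clean = np.sqrt(np.maximum(s**2 - n * noise_sigma**2, 0.0))  # clean-value estimate
    w = C * np.sqrt(n) / (s_clean + eps)                           # assumed weight schedule
    return U @ np.diag(np.maximum(s - w, 0.0)) @ Vt
```

Applied to a matrix of grouped nonlocal patches, this keeps the dominant (signal) singular directions nearly intact while wiping out the small noise-dominated ones.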
WNNM based image denoising
S. Gu, L. Zhang, W. Zuo and X. Feng, "Weighted Nuclear Norm Minimization with Application to Image Denoising," CVPR 2014.
The weights
Experimental Results
(a) Ground truth (b) Noisy image (PSNR: 14.16dB) (c) BM3D (PSNR: 26.78dB) (d) EPLL (PSNR: 26.65dB)
(e) LSSC (PSNR: 26.77dB) (f) NCSR (PSNR: 26.66dB) (g) SAIST (PSNR: 26.63dB) (h) WNNM (PSNR: 26.98dB)
Denoising results on image Boats by different methods (noise level σ=50).
Experimental Results
(a) Ground truth (b) Noisy image (c) BM3D (PSNR: 24.22 dB) (d) EPLL (PSNR: 22.46 dB)
(e) LSSC (PSNR: 24.04 dB) (f) NCSR (PSNR: 23.76 dB) (g) SAIST (PSNR: 24.26 dB) (h) WNNM (PSNR: 24.68 dB)
Denoising results on image Fence by different methods (noise level σ=75).
Experimental Results
(a) Ground truth (b) Noisy image (PSNR: 8.10 dB) (c) BM3D (PSNR: 22.52 dB) (d) EPLL (PSNR: 22.23 dB)
(e) LSSC (PSNR: 22.24 dB) (f) NCSR (PSNR: 22.11 dB) (g) SAIST (PSNR: 22.61 dB) (h) WNNM (PSNR: 22.91 dB)
Denoising results on image Monarch by different methods (noise level σ=100).
Experimental Results
(a) Ground truth (b) Noisy image (PSNR: 8.10 dB) (c) BM3D (PSNR: 33.05 dB) (d) EPLL (PSNR: 32.61 dB)
(e) LSSC (PSNR: 32.88 dB) (f) NCSR (PSNR: 32.95 dB) (g) SAIST (PSNR: 33.08 dB) (h) WNNM (PSNR: 33.12 dB)
Denoising results on image House by different methods (noise level σ=100).
Experimental Results (σ=20)
Image      BM3D   EPLL   LSSC   NCSR   SAIST  WNNM
C-Man      30.48  30.34  30.61  30.47  30.45  30.75
House      33.77  32.98  34.11  33.87  33.75  34.04
Peppers    31.29  31.16  31.37  31.19  31.32  31.55
Montage    33.61  32.56  33.51  33.26  33.41  34.16
Leaves     30.10  29.39  30.47  30.45  30.64  31.10
Starfish   29.68  29.57  29.96  29.93  29.97  30.28
Monarch    30.35  30.48  30.59  30.62  30.76  31.10
Airplane   29.55  29.67  29.69  29.58  29.65  29.89
Paint      30.36  30.39  30.59  30.33  30.51  30.75
JellyBean  34.18  33.81  34.55  34.52  34.31  34.85
Fence      29.93  29.25  30.07  30.11  30.16  30.37
Parrot     29.96  29.96  29.95  29.90  29.97  30.19
Lena       33.05  32.61  32.88  32.95  33.08  33.12
Barbara    31.78  29.76  31.59  31.78  32.16  32.21
Boat       30.88  30.66  30.91  30.79  30.84  31.00
Hill       30.72  30.49  30.72  30.65  30.69  30.80
F.print    28.81  28.28  28.78  28.96  29.01  29.02
Man        30.59  30.63  30.72  30.59  30.67  30.74
Couple     30.76  30.54  30.74  30.60  30.66  30.82
Straw      26.98  26.80  27.17  27.32  27.42  27.44
AVE        30.84  30.47  30.95  30.89  30.97  31.21
Experimental Results (σ=40)
Image      BM3D   EPLL   LSSC   NCSR   SAIST  WNNM
C-Man      27.18  27.04  27.33  27.12  27.09  27.48
House      30.65  29.88  31.10  30.80  31.14  31.31
Peppers    27.70  27.73  27.86  27.68  27.77  28.05
Montage    29.52  28.47  29.43  29.00  29.32  29.92
Leaves     25.71  25.62  26.04  26.25  26.49  26.95
Starfish   26.06  26.11  26.22  26.21  26.39  26.60
Monarch    26.72  26.89  26.87  26.85  27.16  27.47
Airplane   26.08  26.29  26.23  25.96  26.26  26.51
Paint      26.69  26.88  26.77  26.50  26.94  27.10
JellyBean  30.21  29.98  30.76  30.56  30.51  30.85
Fence      26.84  25.75  26.89  26.77  27.01  27.32
Parrot     26.69  26.80  26.75  26.66  26.88  27.10
Lena       29.86  29.47  29.90  29.92  30.07  30.11
Barbara    27.99  26.02  28.17  28.20  28.77  28.77
Boat       27.74  27.64  27.77  27.65  27.68  27.96
Hill       27.99  27.81  27.99  27.83  27.96  28.12
F.print    25.30  24.73  25.30  25.51  25.60  25.57
Man        27.65  27.63  27.64  27.54  27.60  27.80
Couple     27.48  27.27  27.41  27.24  27.40  27.61
Straw      23.05  23.11  23.56  23.46  23.78  23.76
AVE        27.36  27.06  27.50  27.39  27.59  27.82
Experimental Results (σ=50)
Image      BM3D    EPLL     LSSC    NCSR    SAIST   WNNM
C-Man      26.12   26.02    26.35   26.14   26.15   26.42
House      29.69   28.76    29.99   29.62   30.17   30.32
Peppers    26.68   26.63    26.79   26.82   26.73   26.91
Montage    27.9    27.17    28.10   27.84   28.0    28.27
Leaves     24.68   24.38    24.81   25.04   25.25   25.47
Starfish   25.04   25.04    25.12   25.07   25.29   25.44
Monarch    25.82   25.78    25.88   25.73   26.1    26.32
Airplane   25.10   25.24    25.25   24.93   25.34   25.43
Paint      25.67   25.77    25.59   25.37   25.77   25.98
JellyBean  29.26   28.75    29.42   29.29   29.32   29.62
Fence      25.92   24.58    25.87   25.78   26.00   26.43
Parrot     25.90   25.84    25.82   25.71   25.95   26.09
Lena       29.05   28.42    28.95   28.90   29.01   29.24
Barbara    27.23   24.82    27.03   26.99   27.51   27.79
Boat       26.78   26.65    26.77   26.66   26.63   26.97
Hill       27.19   26.96    27.14   26.99   27.04   27.34
F.print    24.53   23.59    24.26   24.48   24.52   24.67
Man        26.81   26.72    26.72   26.67   26.68   26.94
Couple     26.46   26.24    26.35   26.19   26.30   26.65
Straw      22.29   21.93    22.51   22.30   22.65   22.74
AVE        26.406  25.9645  26.436  26.326  26.520  26.752
Experimental Results (σ=75)
Image      BM3D   EPLL   LSSC   NCSR   SAIST  WNNM
C-Man      24.33  24.20  24.39  24.23  24.30  24.55
House      27.51  26.68  27.75  27.22  28.08  28.25
Peppers    24.73  24.56  24.65  24.36  24.66  24.93
Montage    25.52  24.90  25.40  25.49  25.46  25.73
Leaves     22.49  22.03  22.16  22.60  22.89  23.06
Starfish   23.27  23.16  23.12  23.20  23.35  23.47
Monarch    23.91  23.72  23.66  23.67  23.98  24.31
Airplane   23.47  23.35  23.41  23.18  23.60  23.75
Paint      23.80  23.82  23.52  23.44  23.83  24.07
JellyBean  27.22  26.58  27.21  27.18  27.08  27.44
Fence      24.22  22.46  24.04  23.76  24.26  24.68
Parrot     24.19  24.04  24.01  23.90  24.17  24.37
Lena       27.26  26.57  27.21  27.00  27.23  27.54
Barbara    25.12  22.94  25.01  24.73  25.54  25.81
Boat       25.12  24.89  25.03  24.87  24.99  25.29
Hill       25.68  25.46  25.57  25.40  25.56  25.88
F.print    22.83  21.46  22.55  22.66  22.88  23.02
Man        25.32  25.14  25.10  25.10  25.14  25.42
Couple     24.70  24.45  24.51  24.33  24.54  24.86
Straw      20.56  19.94  20.71  20.39  20.90  21.00
AVE        24.56  24.02  24.45  24.34  24.62  24.86
Experimental Results (σ=100)
Image      BM3D    EPLL    LSSC     NCSR    SAIST    WNNM
C-Man      23.07   22.86   23.15    22.93   23.09    23.36
House      25.87   25.19   25.71    25.56   26.53    26.68
Peppers    23.39   23.08   23.20    22.84   23.32    23.46
Montage    23.89   23.42   23.77    23.74   23.98    24.16
Leaves     20.91   20.25   20.58    20.86   21.40    21.57
Starfish   22.10   21.92   21.77    21.91   22.10    22.22
Monarch    22.52   22.23   22.24    22.11   22.61    22.95
Airplane   22.11   22.02   21.69    21.83   22.27    22.55
Paint      22.51   22.50   22.14    22.11   22.42    22.74
JellyBean  25.80   25.17   25.64    26.66   25.82    26.04
Fence      22.92   21.11   22.71    22.23   22.98    23.37
Parrot     22.96   22.71   22.79    22.53   23.04    23.19
Lena       25.95   25.30   25.96    25.71   25.93    26.20
Barbara    23.62   22.14   23.54    23.20   24.07    24.37
Boat       23.97   23.71   23.87    23.68   23.80    24.10
Hill       24.58   24.43   24.47    24.36   24.29    24.75
F.print    21.61   19.85   21.30    21.39   21.62    21.81
Man        24.22   24.07   23.98    24.02   24.01    24.36
Couple     23.51   23.32   23.27    23.15   23.21    23.56
Straw      19.43   18.84   19.43    19.10   19.42    19.67
AVE        23.247  22.706  23.0605  22.996  23.2955  23.555
[Diagram: patch-based image modeling connects image patches with nonlocal self-similarity, sparse representation, dictionary learning and low rank]
With patch-based image modeling, nonlocal self-similarity, sparse representation, low rank and dictionary learning can be used individually or jointly for image processing.
What’s next?
• Actually I don’t know …
• Probably “Sparse/Low-rank + Big Data”?
– Theoretical analysis?
–Algorithms and implementation?
• W.r.t. image restoration, one interesting topic (at least I think) is perceptual quality oriented image restoration.
59
Sparse representation: data perspective
• Curse of dimensionality
– For real-world high-dimensional data, the available samples are usually insufficient.
– Fortunately, real data often lie on low-dimensional, sparse, or degenerate structures in the high-dimensional space.
Subspace methods: PCA, LLE, ISOMAP, ICA, …
Coding methods: bag-of-words, mixture models, …
Sparse representation based classification (SRC)
[Figure: test image y ≈ training dictionary X × coefficient vector α]
α is sparse: ideally, it is only supported on images of the same subject.
J. Wright, A. Yang, A. Ganesh, S. S. Sastry, and Y. Ma, "Robust Face Recognition via Sparse Representation," PAMI 2009.
Representation: min_α ||α||_1, s.t. y = Xα
Classification: label(y) = arg min_k r_k, where r_k = ||y − X_k α̂_k||_2
X = [X_1, X_2, …, X_K], α = [α_1; α_2; …; α_K]
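The SRC pipeline just described fits in a few lines; a sketch reusing the ISTA solver from earlier in the talk (the class dictionaries `X_blocks` and λ are placeholders):

```python
import numpy as np

def src_classify(X_blocks, y, lam=0.01, n_iter=300):
    """Sparse representation based classification: L1-code y over the concatenated
    class dictionaries (via ISTA), then pick the class with the smallest
    class-wise reconstruction residual."""
    X = np.hstack(X_blocks)                  # dictionary = all classes' training samples
    L = np.linalg.norm(X, 2) ** 2
    a = np.zeros(X.shape[1])
    for _ in range(n_iter):                  # ISTA for min 0.5||y - Xa||^2 + lam||a||_1
        a = a - X.T @ (X @ a - y) / L
        a = np.sign(a) * np.maximum(np.abs(a) - lam / L, 0.0)
    residuals, start = [], 0
    for Xk in X_blocks:
        ak = a[start:start + Xk.shape[1]]    # coefficients belonging to this class
        residuals.append(np.linalg.norm(y - Xk @ ak))
        start += Xk.shape[1]
    return int(np.argmin(residuals))
```

Note the key design choice of SRC: the coding is done over all classes jointly, but the decision is taken from per-class residuals.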
How to process a corrupted face?
[Figure: corrupted test image y ≈ [X, I] × [α; e]]
Representation: min_{α,e} ||α||_1 + ||e||_1, s.t. y = Xα + e
Equivalent to: min_ω ||ω||_1, s.t. y = [X, I] ω, with ω = [α; e]
Classification: label(y) = arg min_k r_k, where r_k = ||y − X_k α̂_k − ê||_2
X = [X_1, X_2, …, X_K], α = [α_1; α_2; …; α_K]
• Can we have a more principled way to deal with various types of outliers in face images?
Regularized robust coding
• Our solution:
Meng Yang, Lei Zhang, Jian Yang and David Zhang, "Robust sparse coding for face recognition," in CVPR 2011.
Meng Yang, Lei Zhang, Jian Yang, and David Zhang, "Regularized robust coding for face recognition," IEEE Trans. Image Processing, 2013.
One big question!
• Is it true that sparse representation helps face recognition?
L. Zhang, M. Yang, and X. Feng, "Sparse Representation or Collaborative Representation: Which Helps Face Recognition?" in ICCV 2011.
L. Zhang, M. Yang, X. Feng, Y. Ma and D. Zhang, "Collaborative Representation based Classification for Face Recognition," arXiv:1204.2358.
Within-class or across-class representation
[Flowchart: are there enough training samples per class? Yes: within-class representation, leading to regularized nearest subspace. No: across-class representation, leading to collaborative representation.]
This analyzes the working mechanism of SRC.
Regularized nearest subspace (RNS)
• When there are enough training samples per class, the query sample is represented on the training samples of the same class, with a regularized representation coefficient:
  min_α ||α||_{l_p}, s.t. ||y − X_i α||_2^2 ≤ ε
Why regularization?
• Assume that we have enough training samples for each class, so that all the images of class i can be faithfully represented by X_i.
• All face images are somewhat similar, and some subjects may have very similar face images.
• Let X_j = X_i + Δ. If Δ is small (meeting some condition), the representation errors e_j and e_i of X_j and X_i in representing y without any constraint on the representation coefficient satisfy e_j / e_i ≈ 1; that is, a wrong but similar class can represent y almost as well as the right one, so the unregularized solution is unstable.
Regularized nearest subspace with Lp-norm
  min_α ||α||_{l_p}, s.t. ||y − X_i α||_2^2 ≤ ε
• Regularization makes classification more stable.
• L2-norm regularization can play a similar role to the L0 and L1 norms in this classification task.
[Figure: representation residual vs. the L0, L1 and L2 norms of the representation coefficients, for the correct class and a wrong class; the three regularizers behave similarly]
Why collaborative representation?
• FR is a typical small-sample-size problem, and X_i is under-complete in general.
• Face images of different classes share similarities.
• Samples from other classes can be used to collaboratively represent the sample of one class.
• When there are not enough training samples per class, collaborative representation
– dilutes the small-sample-size problem;
– considers the competition between different classes:
  min_α ||y − Xα||_2^2 + λ ||α||_{l_p},  X = [X_1, X_2, …, X_K]
Why collaborative representation?
• Without considering the lp-norm regularization in coding, the representation α̂ = arg min_α ||y − Xα||_2^2 is actually the perpendicular projection of y onto the space spanned by X.
• For classification we use e_k* = ||y − X_k α̂_k||_2 / ||α̂_k||_2, which only works for classification.
• The "double checking" in e_k* (a small class-wise residual and large class-wise coefficients) makes the classification more effective and robust.
L1 vs. L2 in regularization
  min_α ||y − Xα||_2^2 + λ ||α||_{l_p}
[Figure: coefficients of l1-regularized minimization are sparse, while coefficients of l2-regularized minimization are non-sparse; yet the recognition rates of the two are similar over a wide range of λ]
Though L1 leads to sparser coefficients, the classification rates are similar.
Collaborative representation model
  min_α ||y − Xα||_{l_q} + λ ||α||_{l_p},  p, q = 1 or 2
• q=2, p=1: Sparse Representation based Classification (S-SRC)
• q=1, p=1: Robust Sparse Representation based Classification (R-SRC)
• q=2, p=2: Collaborative Representation based Classification with regularized least square (CRC_RLS)
• q=1, p=2: Robust Collaborative Representation based Classification (R-CRC)
CRC_RLS has a closed-form solution; the others have iterative solutions.
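The CRC_RLS closed form is just ridge regression followed by the class-wise "double checking" residual; a minimal sketch (λ and the regularized-residual rule as described on the earlier slide):

```python
import numpy as np

def crc_rls(X_blocks, y, lam=1.0):
    """CRC_RLS: collaborative coding with L2 regularization has the ridge-regression
    closed form a = (X^T X + lam*I)^{-1} X^T y; classify by the class-wise residual
    normalized by the coefficient energy."""
    X = np.hstack(X_blocks)
    a = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
    scores, start = [], 0
    for Xk in X_blocks:
        ak = a[start:start + Xk.shape[1]]
        scores.append(np.linalg.norm(y - Xk @ ak) / (np.linalg.norm(ak) + 1e-12))
        start += Xk.shape[1]
    return int(np.argmin(scores))
```

Because the projection matrix (X^T X + λI)^{-1} X^T can be precomputed once for the whole gallery, per-query cost is a single matrix-vector product, which is the source of the large speed-ups reported below.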
Gender classification
Male or female?
Training set: 700 male samples and 700 female samples
Feature (AR) RNS_L1 RNS_L2 CRC-RLS S-SRC SVM LRC NN
300-d Eigenface 94.9% 94.9% 93.7% 92.3% 92.4% 27.3% 90.7%
A big benefit (about 67 percentage points over LRC) brought by regularization on the coding vector!
Face recognition without occlusion
Identity?
Training samples per subject are limited.
[Figure: recognition rate (65-95%) vs. Eigenface dimension (50-300) on the AR database, for NN, LRC, SVM, S-SRC and CRC_RLS]
CRC_RLS gives the highest accuracy when the feature dimension is not too low.
Face recognition with pixel corruption
Identity?
[Figure: recognition rate vs. corruption percentage (0-90%) on Extended Yale B, for R-SRC and R-CRC; the example query is shown with 70% random pixel corruption]
Face recognition with real disguise
Identity?
Disguise (AR) | Sunglass (test 1) | Scarf (test 1) | Sunglass (test 2) | Scarf (test 2)
R-SRC | 87.0% | 59.5% | 69.8% | 40.8%
CRC-RLS | 68.5% | 90.5% | 57.2% | 71.8%
R-CRC | 87.0% | 86.0% | 65.8% | 73.2%
R-CRC brings a significant improvement in the scarf cases.
Running time
Corruption (MPIE) | L1_ls | Homotopy | SpaRSA | FISTA | ALM | R-CRC
Average running time (s) | 17.35 | 8.05 | 15.97 | 8.76 | 16.02 | 0.916
Speed-up of 8.79-18.94 times!
No occlusion (MPIE) | L1_ls | Homotopy | FISTA | ALM | CRC-RLS
Recognition rate | 92.6% | 92.0% | 79.6% | 92.0% | 92.2%
Running time (s) | 21.290 | 1.7600 | 1.6360 | 0.5277 | 0.0133
Speed-up of 39.7-1600.7 times!
One even bigger question!
L. Zhang, W. Zuo, X. Feng, and Y. Ma, “A Probabilistic Formulation of Collaborative Representation based Classifier,” Preprint, to be online soon.
• SRC/CRC represents the query face by gallery faces from all classes, yet it uses the representation residual of each class for classification.
• So what kind of classifier is SRC/CRC?
• Why does SRC/CRC work?
Probabilistic subspace of X_k
• Samples of class k: X_k = [x_1, x_2, ..., x_n].
• S: the subspace spanned by X_k.
– Each data point x in S can be written as x = X_k α.
• We assume that the probability that x belongs to class k is determined by α:
  P(label(x) = k) ∝ exp(−c ||α||_2^2)
• It can be shown that such a probability depends on the distribution of the samples in X_k.
• [Figure: the red point will have a much higher probability than the green one]
Representation of query sample y
• The query sample y usually lies outside the subspace of X_k.
• The probability that y belongs to class k is determined by two factors:
– Given x = X_k α, how likely is y to have the same class label as x?
– What is the probability that x belongs to class k?
• Maximizing the product of the two probabilities, we have
  p_k = max_α log P(label(y) = k)  ⟺  min_α ||y − X_k α||_2^2 + c ||α||_2^2
Two classes
• X_1 = [x_{1,1}, x_{1,2}, ..., x_{1,n}]; X_2 = [x_{2,1}, x_{2,2}, ..., x_{2,n}]
• S: the subspace spanned by [X_1, X_2]
– Each data point x in S can be written as x = X_1 α_1 + X_2 α_2.
• x belongs to X_1 or X_2 with certain probability:
  P(label(x) = 1) ∝ exp(−(||x − X_1 α_1||_2^2 + c ||α_1||_2^2))
  P(label(x) = 2) ∝ exp(−(||x − X_2 α_2||_2^2 + c ||α_2||_2^2))
Collaborative representation of y
• y lies outside the subspace {X_1 α_1 + X_2 α_2}. The probability that y belongs to class 1 or 2 depends on how likely y has the same class label as x = X_1 α_1 + X_2 α_2, and on the probability that x belongs to class 1 or 2:
  p_1 = max_{α_1, α_2} log P(label(y) = 1)  ⟺  min_{α_1, α_2} ||y − (X_1 α_1 + X_2 α_2)||_2^2 + ||Xα − X_1 α_1||_2^2 + c ||α_1||_2^2
  p_2 = max_{α_1, α_2} log P(label(y) = 2)  ⟺  min_{α_1, α_2} ||y − (X_1 α_1 + X_2 α_2)||_2^2 + ||Xα − X_2 α_2||_2^2 + c ||α_2||_2^2
  (with Xα = X_1 α_1 + X_2 α_2)
General case
• The probability that query sample y belongs to class k can be computed as:
  p_k  ⟺  min_{α_i} ||y − Σ_{i=1}^{K} X_i α_i||_2^2 + ||Σ_{i=1}^{K} X_i α_i − X_k α_k||_2^2 + c ||α_k||_2^2
• The classification rule: label(y) = arg max_k { p_k }
• Problem: for each class k we need to solve the optimization once, which can be costly.
Joint probability
• For a data point x in the subspace spanned by all classes X, we define the joint probability:
  P(label(x) = 1, ..., label(x) = K) ∝ exp(−Σ_{i=1}^{K} (||x − X_i α_i||_2^2 + c ||α_i||_2^2))
• For the query sample y outside the subspace of X, we have
  max_α log P  ⟺  min_α ||y − Xα||_2^2 + Σ_{i=1}^{K} (||Xα − X_i α_i||_2^2 + c ||α_i||_2^2)
• We use the marginal probability for classification:
  p_k = P(label(y) = k) ∝ exp(−(||y − Xα̂||_2^2 + ||Xα̂ − X_k α̂_k||_2^2 + c ||α̂_k||_2^2))
  label(y) = arg max_k { p_k }
• We only need to solve the optimization once.
Variants
• ProCRC-l2 (closed-form solution):
  min_α ||y − Xα||_2^2 + Σ_{i=1}^{K} ||Xα − X_i α_i||_2^2 + c ||α||_2^2
• ProCRC-l1:
  min_α ||y − Xα||_2^2 + Σ_{i=1}^{K} ||Xα − X_i α_i||_2^2 + c ||α||_1
• Robust ProCRC (ProCRC-r), with an L1 data-fidelity term:
  min_α ||y − Xα||_1 + Σ_{i=1}^{K} ||Xα − X_i α_i||_2^2 + c ||α||_1
Face recognition: AR
Dim. 2580 500 300 100 50
SVM 86.98 86.98 86.84 84.26 78.11
NSC 76.40 76.40 76.10 74.39 70.39
CRC 92.13 93.70 93.85 88.84 78.97
SRC 93.85 93.56 92.85 90.99 84.26
CROC 92.28 92.42 91.56 86.84 78.97
ProCRC-l2 93.99 94.13 93.85 89.41 80.26
ProCRC-l1 94.13 94.13 93.42 90.99 84.55
Face recognition: Extended Yale B
Dim. 1024 500 300 100 50
SVM 93.72 94.74 95.68 93.72 89.72
NSC 93.49 94.11 92.07 91.99 89.17
CRC 96.86 96.78 95.92 91.99 83.60
SRC 97.17 97.10 96.31 93.80 90.27
CROC 95.76 96.39 94.51 92.47 89.17
ProCRC-l2 98.04 97.57 96.63 94.19 90.03
ProCRC-l1 97.65 98.12 97.10 94.74 90.98
Robust face recognition
• Random corruption (YaleB)
• Block occlusion (YaleB)
• Disguise (AR)
Corruption ratio 10% 20% 40% 60%
SRC-r 97.49 95.60 90.19 76.85
ProCRC-r 98.45 98.20 93.25 82.42
Occlusion ratio 10% 20% 30% 40%
SRC-r 90.42 85.64 78.89 70.09
ProCRC-r 98.12 92.62 86.42 77.16
Disguise Sunglasses Scarf
SRC-r 69.17 69.50
ProCRC-r 70.50 70.67
Handwritten digit recognition: MNIST
Number of training samples per class | 50 | 100 | 300 | 500
SVM 89.35 92.10 94.88 95.93
NSC 91.06 92.86 85.29 78.26
CRC 72.21 82.22 86.54 87.46
SRC 80.12 85.63 89.30 92.70
CROC 91.06 92.86 89.93 89.37
ProCRC-l2 92.16 94.56 95.58 95.88
ProCRC-l1 92.59 94.83 95.97 96.26
Handwritten digit recognition: USPS
Number of training samples per class | 50 | 100 | 200 | 300
SVM 93.46 95.31 95.91 96.30
NSC 93.48 93.25 90.21 87.85
CRC 89.89 91.67 92.36 92.79
SRC 92.58 93.99 95.63 95.86
CROC 93.48 93.25 91.40 91.87
ProCRC-l2 93.84 95.62 96.03 96.43
ProCRC-l1 94.69 96.19 97.03 97.27
Running time
• Intel Core (TM) i7-2720QM 2.20 GHz CPU with 8 GB RAM
• Running time (second) of different methods on the Extended Yale B dataset:
Remarks
• ProCRC provides a good probabilistic interpretation of collaborative representation based classifiers (NSC, SRC and CRC).
• ProCRC achieves higher classification accuracy than the competing classifiers in most experiments.
• ProCRC shows small performance variation under different numbers of training samples and feature dimensions; it is robust to training sample size and feature dimension.
Take Research as Fun! Thank you!