Compressed Sensing and Tomography


Presentation given at the "Workshop on Tomography Reconstruction", ENS Paris, December 11, 2012.


Overview

•Compressive Sensing Acquisition

•Theoretical Guarantees

•Fourier Domain Measurements

•Parameter Selection

Single Pixel Camera (Rice)

$y[i] = \langle f, \varphi_i \rangle$

$P$ measures $\ll$ $N$ micro-mirrors.

(Figure: reconstructions from $P/N = 1$, $P/N = 0.16$ and $P/N = 0.02$ measurements.)

CS Hardware Model

CS is about designing hardware: input signals $f \in L^2(\mathbb{R}^2)$.

Physical hardware resolution limit: target resolution $\tilde f \in \mathbb{R}^N$.

Acquisition chain: $f \in L^2$ $\to$ micro-mirrors array (resolution $\tilde f \in \mathbb{R}^N$) $\to$ CS hardware (operator $K$) $\to$ $y \in \mathbb{R}^P$.

Sparse CS Recovery

(Discretized) sampling acquisition:
$y = K f_0 + w = K \Psi(x_0) + w = \Phi x_0 + w$

$f_0 = \Psi x_0 \in \mathbb{R}^N$ sparse in an ortho-basis $\Psi$.

$K$ drawn from the Gaussian matrix ensemble, $K_{i,j} \sim \mathcal{N}(0, P^{-1/2})$ i.i.d.
$\Rightarrow$ $\Phi = K \Psi$ is also drawn from the Gaussian matrix ensemble.

Sparse recovery: $\min_{\|\Phi x - y\| \le \|w\|} \|x\|_1$

CS Simulation Example

$\Psi$ = translation invariant wavelet frame. (Figure: original $f_0$ and its CS reconstructions.)
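This acquisition/recovery loop is easy to simulate. Below is a minimal NumPy sketch, assuming the sparsity basis is the identity (the talk's figure uses a translation invariant wavelet frame) and using ISTA on an unconstrained lasso as a stand-in for the constrained $\ell^1$ problem; all sizes and parameters are illustrative.

```python
# Minimal compressed-sensing simulation: Gaussian measurements + l1 recovery.
import numpy as np

rng = np.random.default_rng(0)
N, P, k = 512, 128, 10                    # ambient dim, measurements, sparsity

# Sparse signal x0 (coefficients in the ortho-basis Psi; here Psi = Id).
x0 = np.zeros(N)
support = rng.choice(N, k, replace=False)
x0[support] = rng.standard_normal(k)

Phi = rng.standard_normal((P, N)) / np.sqrt(P)   # Gaussian ensemble, std P^{-1/2}
w = 0.01 * rng.standard_normal(P)
y = Phi @ x0 + w

# ISTA for the lasso min_x 0.5||Phi x - y||^2 + lam ||x||_1,
# used here as a simple proxy for the constrained l1 problem.
lam = 0.005
L = np.linalg.norm(Phi, 2) ** 2                  # Lipschitz const. of the gradient
x = np.zeros(N)
for _ in range(2000):
    z = x - Phi.T @ (Phi @ x - y) / L
    x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0)   # soft threshold

print("support recovered:", set(np.flatnonzero(np.abs(x) > 1e-3)) == set(support))
print("relative error:", np.linalg.norm(x - x0) / np.linalg.norm(x0))
```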

Overview

•Compressive Sensing Acquisition

•Theoretical Guarantees

•Fourier Domain Measurements

•Parameter Selection

CS with RIP

Restricted Isometry Constants:
$\forall x,\ \|x\|_0 \le k \Rightarrow (1-\delta_k)\|x\|^2 \le \|\Phi x\|^2 \le (1+\delta_k)\|x\|^2$

$\ell^1$ recovery:
$x^\star \in \operatorname{argmin}_{\|\Phi x - y\| \le \varepsilon} \|x\|_1$ where $y = \Phi x_0 + w$, $\|w\| \le \varepsilon$.

Theorem [Candès 2009]: If $\delta_{2k} \le \sqrt{2} - 1$, then
$\|x_0 - x^\star\| \le \frac{C_0}{\sqrt{k}} \|x_0 - x_k\|_1 + C_1 \varepsilon$
where $x_k$ is the best $k$-term approximation of $x_0$.

Singular Values Distributions

Eigenvalues of $\Phi_I^* \Phi_I$ with $|I| = k$ are essentially in $[a, b]$, where
$a = (1 - \sqrt{\beta})^2$ and $b = (1 + \sqrt{\beta})^2$, with $\beta = k/P$.

When $k = \beta P \to +\infty$, the eigenvalue distribution tends to
$f_\beta(\lambda) = \frac{1}{2\pi\beta\lambda}\sqrt{(b - \lambda)_+ (\lambda - a)_+}$
[Marchenko-Pastur]

Large deviation inequality [Ledoux].

(Figure: empirical eigenvalue histograms vs $f_\beta(\lambda)$ for $P = 200$ and $k = 10, 30, 50$.)
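The Marchenko-Pastur prediction can be checked empirically by drawing random $P \times k$ Gaussian blocks; a small sketch with illustrative sizes, comparing the predicted bulk $[a, b]$ with observed eigenvalues and evaluating $f_\beta$ for comparison against a histogram:

```python
# Empirical check: eigenvalues of Phi_I^* Phi_I for |I| = k concentrate on [a, b].
import numpy as np

rng = np.random.default_rng(1)
P, k = 200, 30
beta = k / P
a, b = (1 - np.sqrt(beta)) ** 2, (1 + np.sqrt(beta)) ** 2

eigs = []
for _ in range(200):
    Phi_I = rng.standard_normal((P, k)) / np.sqrt(P)   # k Gaussian columns
    eigs.extend(np.linalg.eigvalsh(Phi_I.T @ Phi_I))
eigs = np.asarray(eigs)

print(f"predicted bulk  [{a:.3f}, {b:.3f}]")
print(f"empirical range [{eigs.min():.3f}, {eigs.max():.3f}]")  # slightly wider

# Marchenko-Pastur density on the bulk, for comparison with a histogram:
lam = np.linspace(a + 1e-6, b - 1e-6, 200)
f_beta = np.sqrt((b - lam) * (lam - a)) / (2 * np.pi * beta * lam)
```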

Numerics with RIP

Theorem: If $k \le \frac{C\,P}{\log(N/P)}$, then $\delta_{2k} \le \sqrt{2} - 1$ with high probability.

Stability constants of $A$:
$(1 - \delta_1(A))\|\alpha\|^2 \le \|A\alpha\|^2 \le (1 + \delta_2(A))\|\alpha\|^2$
(smallest / largest eigenvalues of $A^* A$).

Upper/lower RIC: $\delta_k^i = \max_{|I| = k} \delta_i(\Phi_I)$, and $\delta_k = \min(\delta_k^1, \delta_k^2)$.

Monte-Carlo estimation: $\delta_k \approx \hat\delta_k$.

(Figure: $\hat\delta_{2k}$ as a function of $k$ for $N = 4000$, $P = 1000$, against the $\sqrt{2} - 1$ threshold.)
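A sketch of this Monte-Carlo estimation: sampling random supports only explores "most" supports, so the empirical $\hat\delta_k$ is a lower bound on the true (combinatorially hard) RIC. Sizes and the number of trials are illustrative.

```python
# Monte-Carlo lower bound on the restricted isometry constants: draw random
# supports of size k, record extreme eigenvalues of Phi_I^* Phi_I.
import numpy as np

rng = np.random.default_rng(2)
N, P, k, trials = 4000, 1000, 50, 200
Phi = rng.standard_normal((P, N)) / np.sqrt(P)

d1, d2 = 0.0, 0.0                  # lower / upper deviations seen so far
for _ in range(trials):
    I = rng.choice(N, k, replace=False)
    e = np.linalg.eigvalsh(Phi[:, I].T @ Phi[:, I])
    d1 = max(d1, 1 - e[0])         # worst lower stability constant
    d2 = max(d2, e[-1] - 1)        # worst upper stability constant

print(f"hat(delta1)_k = {d1:.3f}, hat(delta2)_k = {d2:.3f}")
print("RIP threshold sqrt(2)-1 =", np.sqrt(2) - 1)
```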

Polytope Noiseless Recovery

Counting faces of random polytopes [Donoho]:
All $x_0$ such that $\|x_0\|_0 \le C_{\mathrm{all}}(P/N)\,P$ are identifiable.
Most $x_0$ such that $\|x_0\|_0 \le C_{\mathrm{most}}(P/N)\,P$ are identifiable.

$C_{\mathrm{all}}(1/4) \approx 0.065$, $C_{\mathrm{most}}(1/4) \approx 0.25$.

Compared with RIP: sharp constants, but no noise robustness.

$\to$ Computation of "pathological" signals [Dossal, Peyré, Fadili, 2010].

(Figure: empirical phase transition, fraction of recovered signals as a function of sparsity.)

Overview

•Compressive Sensing Acquisition

•Theoretical Guarantees

•Fourier Domain Measurements

•Parameter Selection

Tomography and Fourier Measures

Partial Fourier measurements: $K f = (\hat f[\omega])_{\omega \in \Omega}$, where $\hat f = \mathrm{FFT}_2(f)$.

Fourier slice theorem: $\hat p_\theta(\rho) = \hat f(\rho \cos(\theta), \rho \sin(\theta))$, i.e. the 1D Fourier transform of a projection is a radial slice of the 2D Fourier transform.

Partial Fourier measurements along $K$ radial lines are thus equivalent to acquiring the projections $\{p_{\theta_k}(t)\}_{t \in \mathbb{R},\ 0 \le k < K}$.
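At angle $\theta = 0$ the discrete analogue of the slice theorem is exact and can be checked in a few lines; other angles require interpolation and only hold approximately.

```python
# Discrete sanity check of the Fourier slice theorem at theta = 0:
# the 1D FFT of the projection p_0 (sum over rows) equals the
# zero-frequency row of FFT2(f).
import numpy as np

rng = np.random.default_rng(3)
f = rng.random((64, 64))                 # toy image

p0 = f.sum(axis=0)                       # projection along the vertical axis
slice_hat = np.fft.fft2(f)[0, :]         # horizontal slice of the 2D spectrum

print(np.allclose(np.fft.fft(p0), slice_hat))   # True
```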

Regularized Inversion

Noisy measurements: $\forall \omega \in \Omega,\ y[\omega] = \hat f_0[\omega] + w[\omega]$.

Noise: $w[\omega] \sim \mathcal{N}(0, \sigma^2)$, white noise.

$\ell^1$ regularization:
$f^\star = \operatorname{argmin}_f\ \frac{1}{2} \sum_{\omega \in \Omega} |y[\omega] - \hat f[\omega]|^2 + \lambda \sum_m |\langle f, \psi_m \rangle|$
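A minimal sketch of this regularized inversion, assuming for simplicity that the image itself is sparse ($\Psi$ = Dirac basis) rather than sparse in a wavelet frame, and solving the unconstrained lasso form by ISTA; the step size is 1 since the partial Fourier operator is a restriction of a unitary transform.

```python
# l1-regularized inversion from partial Fourier measurements via ISTA.
import numpy as np

rng = np.random.default_rng(4)
n = 64
f0 = np.zeros((n, n))
idx = rng.choice(n * n, 20, replace=False)
f0.ravel()[idx] = 1.0                                  # sparse toy "image"

mask = rng.random((n, n)) < 0.3                        # random Omega
F = lambda g: np.fft.fft2(g, norm="ortho")
Fi = lambda G: np.fft.ifft2(G, norm="ortho")
noise = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
y = mask * F(f0) + 0.01 * mask * noise

lam = 0.02
f = np.zeros((n, n))
for _ in range(300):                                   # ISTA, step size 1
    grad = Fi(mask * (F(f) - y)).real
    z = f - grad
    f = np.sign(z) * np.maximum(np.abs(z) - lam, 0)    # soft threshold

print("relative error:", np.linalg.norm(f - f0) / np.linalg.norm(f0))
```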

MRI Imaging

Fourier sub-sampling pattern: randomization. (Figure from [Lustig et al.].)

MRI Reconstruction

(Figure from [Lustig et al.]: high-resolution and low-resolution references vs linear and sparsity-based reconstructions.)

Structured Measurements

Gaussian matrices: intractable for large $N$.

Random partial orthogonal matrix: $\{\varphi_\omega\}_\omega$ orthogonal basis,
$K f = (\langle \varphi_\omega, f \rangle)_{\omega \in \Omega}$ where $|\Omega| = P$ is drawn uniformly at random.

Fast measurements (e.g. Fourier basis).

$\Phi = K\Psi$ not universal: requires incoherence.

Mutual incoherence: $\mu = \sqrt{N} \max_{\omega, m} |\langle \varphi_\omega, \psi_m \rangle| \in [1, \sqrt{N}]$.

Theorem [Rudelson, Vershynin, 2006]: If
$k \le \frac{C\,P}{\mu^2 \log(N)^4}$,
then $\delta_{2k} \le \sqrt{2} - 1$ with high probability on $\Omega$, for $\Phi = K\Psi$.
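The two extreme values of $\mu$ are easy to exhibit numerically: the Fourier/Dirac pair is maximally incoherent ($\mu = 1$), while measuring in the sparsity basis itself is maximally coherent ($\mu = \sqrt{N}$).

```python
# Mutual incoherence mu = sqrt(N) * max |<phi_w, psi_m>| for two classic pairs.
import numpy as np

N = 64
Fourier = np.fft.fft(np.eye(N), norm="ortho")   # orthonormal DFT vectors
Dirac = np.eye(N)

def incoherence(measure_basis, sparsity_basis):
    return np.sqrt(N) * np.abs(measure_basis.conj() @ sparsity_basis).max()

print(incoherence(Fourier, Dirac))   # 1.0  (ideal)
print(incoherence(Dirac, Dirac))     # 8.0 = sqrt(64)  (useless)
```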

Overview

•Compressive Sensing Acquisition

•Theoretical Guarantees

•Fourier Domain Measurements

•Parameter Selection

Risk Minimization

Estimator: e.g. $x_\lambda(y) \in \operatorname{argmin}_x\ \frac{1}{2}\|y - \Phi x\|^2 + \lambda \|x\|_1$

Average risk: $R(\lambda) = \mathbb{E}_w(\|x_\lambda(y) - x_0\|^2)$

Plug-in estimator: $x_{\lambda^\star(y)}(y)$ where $\lambda^\star(y) = \operatorname{argmin}_\lambda R(\lambda)$

But:
$x_0$ is not accessible $\to$ needs risk estimators.
$\mathbb{E}_w$ is not accessible $\to$ use one observation.
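In the simplest setting $\Phi = \mathrm{Id}$ (denoising), $x_\lambda(y)$ is soft thresholding and the risk curve $\lambda \mapsto R(\lambda)$ can be approximated by brute-force Monte-Carlo, which makes the plug-in $\lambda^\star$ concrete. A sketch with illustrative parameters:

```python
# Monte-Carlo picture of the average risk lambda -> R(lambda) for Phi = Id,
# where x_lambda(y) = soft-threshold(y, lambda) has a closed form.
import numpy as np

rng = np.random.default_rng(5)
N, k, sigma, draws = 400, 20, 0.5, 200
x0 = np.zeros(N)
x0[rng.choice(N, k, replace=False)] = 3.0

lambdas = np.linspace(0.1, 3.0, 30)
risk = np.zeros_like(lambdas)
for _ in range(draws):
    y = x0 + sigma * rng.standard_normal(N)
    for j, lam in enumerate(lambdas):
        x_lam = np.sign(y) * np.maximum(np.abs(y) - lam, 0)
        risk[j] += np.sum((x_lam - x0) ** 2) / draws

print("grid minimizer lambda* =", lambdas[np.argmin(risk)])
```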


Prediction Risk Estimation

Prediction: $\mu_\lambda(y) = \Phi x_\lambda(y)$

Sensitivity analysis: if $\mu_\lambda$ is weakly differentiable,
$\mu_\lambda(y + \delta) = \mu_\lambda(y) + \partial\mu_\lambda(y)\cdot\delta + O(\|\delta\|^2)$

Stein Unbiased Risk Estimator:
$\mathrm{df}_\lambda(y) = \operatorname{tr}(\partial\mu_\lambda(y)) = \operatorname{div}(\mu_\lambda)(y)$
$\mathrm{SURE}_\lambda(y) = \|y - \mu_\lambda(y)\|^2 - \sigma^2 P + 2\sigma^2\,\mathrm{df}_\lambda(y)$

Theorem [Stein, 1981]:
$\mathbb{E}_w(\mathrm{SURE}_\lambda(y)) = \mathbb{E}_w(\|\Phi x_0 - \mu_\lambda(y)\|^2)$

Other estimators: GCV, BIC, AIC, ...

Generalized SURE: estimate $\mathbb{E}_w(\|P_{\ker(\Phi)^\perp}(x_0 - x_\lambda(y))\|^2)$.
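For $\Phi = \mathrm{Id}$, the degrees of freedom of soft thresholding equal the number of surviving coordinates, so SURE is directly computable from $y$ alone; the sketch below compares it with the (normally unknown) true prediction risk on a single observation. Parameters are illustrative.

```python
# SURE for soft-thresholding denoising (Phi = Id): df_lambda(y) is the number
# of nonzero coordinates of the estimate.
import numpy as np

rng = np.random.default_rng(6)
N, k, sigma = 400, 20, 0.5
x0 = np.zeros(N)
x0[rng.choice(N, k, replace=False)] = 3.0
y = x0 + sigma * rng.standard_normal(N)

for lam in [0.5, 1.0, 1.5, 2.0]:
    mu = np.sign(y) * np.maximum(np.abs(y) - lam, 0)
    df = np.count_nonzero(mu)                        # trace of the Jacobian
    sure = np.sum((y - mu) ** 2) - sigma**2 * N + 2 * sigma**2 * df
    true = np.sum((x0 - mu) ** 2)                    # unknown in practice
    print(f"lambda={lam:.1f}  SURE={sure:8.2f}  true risk={true:8.2f}")
```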

Computation for L1 Regularization

Sparse estimator: $x_\lambda(y) \in \operatorname{argmin}_x\ \frac{1}{2}\|y - \Phi x\|^2 + \lambda \|x\|_1$

Theorem [Dossal et al. 2011]: for all $y$, there exists a solution $x^\star$ such that $\Phi_I$ is injective, and
$\mathrm{df}_\lambda(y) = \operatorname{div}(\Phi x_\lambda)(y) = \|x^\star\|_0$.

Example: $\Psi$ = TI wavelets. (Figure: pseudo-inverse reconstruction $\Phi^+ y$ vs the sparse estimate.)
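The identity $\mathrm{df}_\lambda(y) = \|x^\star\|_0$ can be probed numerically: estimate the divergence of $y \mapsto \Phi x_\lambda(y)$ by Monte-Carlo finite differences and compare with the support size. The ISTA solver and all sizes are illustrative; the solver must be run to high accuracy for the finite differences to be meaningful.

```python
# Empirical check of df_lambda(y) = ||x*||_0 for the lasso [Dossal et al. 2011].
import numpy as np

rng = np.random.default_rng(7)
N, P, lam, eps = 60, 30, 0.1, 1e-4
Phi = rng.standard_normal((P, N)) / np.sqrt(P)
x_true = np.pad(rng.standard_normal(5), (0, N - 5))    # 5-sparse vector
y = Phi @ x_true + 0.05 * rng.standard_normal(P)

def lasso(y, iters=5000):                   # plain ISTA, run to high accuracy
    L = np.linalg.norm(Phi, 2) ** 2
    x = np.zeros(N)
    for _ in range(iters):
        z = x - Phi.T @ (Phi @ x - y) / L
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0)
    return x

x_star = lasso(y)
mu = Phi @ x_star
div = 0.0
for _ in range(50):                         # Monte-Carlo divergence estimate
    d = rng.standard_normal(P)
    div += d @ (Phi @ lasso(y + eps * d) - mu) / eps / 50

print("||x*||_0 =", np.count_nonzero(x_star), " estimated div =", round(div, 2))
```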

Unbiased Risk Estimation for Sparse Analysis Regularization
Charles Deledalle¹, Samuel Vaiter¹, Gabriel Peyré¹, Jalal Fadili³ and Charles Dossal²
¹CEREMADE, Université Paris-Dauphine; ²GREYC, ENSICAEN; ³IMB, Université Bordeaux 1

Problem statement

Consider the convex but non-smooth analysis sparsity regularization problem
$x^\star(y,\lambda) \in \operatorname{argmin}_{x \in \mathbb{R}^N}\ \frac{1}{2}\|y - \Phi x\|^2 + \lambda \|D^* x\|_1$   $(\mathcal{P}_\lambda(y))$
which aims at inverting $y = \Phi x_0 + w$ by promoting sparsity, with
• $x_0 \in \mathbb{R}^N$ the unknown image of interest,
• $y \in \mathbb{R}^Q$ the low-dimensional noisy observation of $x_0$,
• $\Phi \in \mathbb{R}^{Q \times N}$ a linear operator that models the acquisition process,
• $w \sim \mathcal{N}(0, \sigma^2 \mathrm{Id}_Q)$ the noise component,
• $D \in \mathbb{R}^{N \times P}$ an analysis dictionary, and
• $\lambda > 0$ a regularization parameter.

How to choose the value of the parameter $\lambda$?

Risk-based selection of λ

• Risk associated to $\lambda$: measure of the expected quality of $x^\star(y,\lambda)$ w.r.t. $x_0$,
$R(\lambda) = \mathbb{E}_w\big(\|x^\star(y,\lambda) - x_0\|^2\big)$.
• The optimal (theoretical) $\lambda$ minimizes the risk.

The risk is unknown since it depends on $x_0$.
Can we estimate the risk solely from $x^\star(y,\lambda)$?

Risk estimation

• Assume $y \mapsto \Phi x^\star(y,\lambda)$ is weakly differentiable (a fortiori uniquely defined).

Prediction risk estimation via SURE

• The Stein Unbiased Risk Estimator
$\mathrm{SURE}(y,\lambda) = \|y - \Phi x^\star(y,\lambda)\|^2 - \sigma^2 Q + 2\sigma^2 \operatorname{tr}\Big(\frac{\partial \Phi x^\star(y,\lambda)}{\partial y}\Big)$,
whose trace term is an estimator of the DOF, is an unbiased estimator of the prediction risk [Stein, 1981]:
$\mathbb{E}_w(\mathrm{SURE}(y,\lambda)) = \mathbb{E}_w\big(\|\Phi x_0 - \Phi x^\star(y,\lambda)\|^2\big)$.

Projection risk estimation via GSURE

• Let $\Pi = \Phi^*(\Phi\Phi^*)^+\Phi$ be the orthogonal projector on $\ker(\Phi)^\perp = \operatorname{Im}(\Phi^*)$.
• Denote $x_{\mathrm{ML}}(y) = \Phi^*(\Phi\Phi^*)^+ y$.
• The Generalized Stein Unbiased Risk Estimator
$\mathrm{GSURE}(y,\lambda) = \|x_{\mathrm{ML}}(y) - \Pi x^\star(y,\lambda)\|^2 - \sigma^2 \operatorname{tr}\big((\Phi\Phi^*)^+\big) + 2\sigma^2 \operatorname{tr}\Big((\Phi\Phi^*)^+ \frac{\partial \Phi x^\star(y,\lambda)}{\partial y}\Big)$
is an unbiased estimator of the projection risk [Vaiter et al., 2012]:
$\mathbb{E}_w(\mathrm{GSURE}(y,\lambda)) = \mathbb{E}_w\big(\|\Pi x_0 - \Pi x^\star(y,\lambda)\|^2\big)$
(see also [Eldar, 2009, Pesquet et al., 2009, Vonesch et al., 2008] for similar results).

Illustration of risk estimation

(Figure; here $x^\star$ denotes $x^\star(y,\lambda)$ for an arbitrary value of $\lambda$.)

How to estimate the quantity $\operatorname{tr}\big((\Phi\Phi^*)^+ \frac{\partial \Phi x^\star(y,\lambda)}{\partial y}\big)$?

Main notations and assumptions

• Let $I = \operatorname{supp}(D^* x^\star(y,\lambda))$ be the support of $D^* x^\star(y,\lambda)$,
• Let $J = I^c$ be the co-support of $D^* x^\star(y,\lambda)$,
• Let $D_I$ be the submatrix of $D$ whose columns are indexed by $I$,
• Let $s_I = \operatorname{sign}(D^* x^\star(y,\lambda))_I$ be the sign subvector of $D^* x^\star(y,\lambda)$ indexed by $I$,
• Let $G_J = \ker D_J^*$ be the "cospace" associated to $x^\star(y,\lambda)$,
• To study the local behaviour of $x^\star(y,\lambda)$, we impose $\Phi$ to be "invertible" on $G_J$:
$G_J \cap \ker\Phi = \{0\}$,
• This allows us to define the matrix
$A^{[J]} = U (U^* \Phi^* \Phi U)^{-1} U^*$,
where $U$ is a matrix whose columns form a basis of $G_J$,
• In this case, we obtain an implicit equation: $x^\star(y,\lambda)$ is a solution of $\mathcal{P}_\lambda(y)$ satisfying
$x^\star(y,\lambda) = \hat x(y,\lambda) = A^{[J]} \Phi^* y - \lambda A^{[J]} D_I s_I$.

Is this relation true in a neighbourhood of $(y,\lambda)$?

Is this relation true in a neighbourhood of (y,�)?

Theorem (Local Parameterization)

I Even if the solutions x?(y,�) of P�

(y) might benot unique, �x?(y,�) is uniquely defined.

I If (y,�) 62 H, for (y, �) close to (y,�), x(y, �)is a solution of P(y, �) where

x(y, �) = A

[J ]�⇤y � �A

[J ]D

I

s

I

.

I Hence, it allows us writing

@�x?(y,�)

@y

= �A[J ]�⇤,

I Moreover, the DOF can be estimated by

tr

✓@�x?(y,�)

@y

◆= dim(G

J

) .

Can we compute this quantity e�ciently?

x1

x2

�0 = 0 �k

x�k = 0

x�0

P0(y)

Monday, September 24, 12

Computation of GSURE

• One has, for $Z \sim \mathcal{N}(0, \mathrm{Id}_P)$,
$\operatorname{tr}\Big((\Phi\Phi^*)^+ \frac{\partial \Phi x^\star(y,\lambda)}{\partial y}\Big) = \mathbb{E}_Z\big(\langle \nu(Z), \Phi^*(\Phi\Phi^*)^+ Z\rangle\big)$
where, for any $z \in \mathbb{R}^P$, $\nu = \nu(z)$ solves the linear system
$\begin{pmatrix} \Phi^*\Phi & D_J \\ D_J^* & 0 \end{pmatrix} \begin{pmatrix} \nu \\ \tilde\nu \end{pmatrix} = \begin{pmatrix} \Phi^* z \\ 0 \end{pmatrix}$.
• In practice, by the law of large numbers, the expectation is replaced by an empirical mean over a few draws of $Z$.
• The computation of $\nu(z)$ is achieved by solving the linear system with a conjugate gradient solver.
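A small dense sketch of this probe-based trace estimator. The co-support $J$ is fixed arbitrarily here (in the method it comes from $D^* x^\star(y,\lambda)$), and the block system is solved directly rather than by conjugate gradient; the Monte-Carlo estimate is compared with the exact trace $\operatorname{tr}((\Phi\Phi^*)^+ \Phi A^{[J]} \Phi^*)$.

```python
# Probe-based estimate of tr((Phi Phi*)^+ d(Phi x*)/dy) vs its closed form.
import numpy as np

rng = np.random.default_rng(8)
N, Q, Pdict, J_size = 40, 20, 60, 30
Phi = rng.standard_normal((Q, N))
D = rng.standard_normal((N, Pdict))
D_J = D[:, :J_size]                       # arbitrary co-support columns

# Basis U of G_J = ker(D_J*) and A^[J] = U (U* Phi* Phi U)^{-1} U*.
_, s, Vt = np.linalg.svd(D_J.T)
U = Vt[(s > 1e-10).sum():].T
A_J = U @ np.linalg.inv(U.T @ Phi.T @ Phi @ U) @ U.T

pinv = np.linalg.pinv(Phi @ Phi.T)
exact = np.trace(pinv @ Phi @ A_J @ Phi.T)

# Block system ((Phi*Phi, D_J), (D_J*, 0)) (nu, nu~) = (Phi* z, 0).
Kkt = np.block([[Phi.T @ Phi, D_J], [D_J.T, np.zeros((J_size, J_size))]])
est, trials = 0.0, 2000
for _ in range(trials):
    z = rng.standard_normal(Q)
    sol = np.linalg.solve(Kkt, np.concatenate([Phi.T @ z, np.zeros(J_size)]))
    nu = sol[:N]
    est += nu @ (Phi.T @ (pinv @ z)) / trials

print(f"exact trace {exact:.3f}, probe estimate {est:.3f}")
```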

Numerical example

Super-resolution using (anisotropic) total variation:
(Figures: (a) $y$, (b) $x^\star(y,\lambda)$ at the optimal $\lambda$; quadratic loss as a function of the regularization parameter $\lambda$, comparing projection risk, GSURE and true risk.)

Compressed sensing using multi-scale wavelet thresholding:
(Figures: (c) $x_{\mathrm{ML}}$, (d) $x^\star(y,\lambda)$ at the optimal $\lambda$; quadratic loss as a function of the regularization parameter $\lambda$, comparing projection risk, GSURE and true risk.)

Perspectives: how to efficiently minimize $\mathrm{GSURE}(y,\lambda)$ with respect to $\lambda$?

References

Eldar, Y. C. (2009). Generalized SURE for exponential families: Applications to regularization. IEEE Transactions on Signal Processing, 57(2):471–481.

Pesquet, J.-C., Benazza-Benyahia, A., and Chaux, C. (2009). A SURE approach for digital signal/image deconvolution problems. IEEE Transactions on Signal Processing, 57(12):4616–4632.

Stein, C. (1981). Estimation of the mean of a multivariate normal distribution. The Annals of Statistics, 9(6):1135–1151.

Vaiter, S., Deledalle, C., Peyré, G., Dossal, C., and Fadili, J. (2012). Local behavior of sparse analysis regularization: Applications to risk estimation. arXiv preprint arXiv:1204.3212.

Vonesch, C., Ramani, S., and Unser, M. (2008). Recursive risk estimation for non-linear image deconvolution with a wavelet-domain sparsity constraint. In ICIP, pages 665–668. IEEE.

http://www.ceremade.dauphine.fr/~deledall/  deledalle@ceremade.dauphine.fr


Numerical Illustrations

Compressed sensing: $\Phi \in \mathbb{R}^{P \times N}$ realization of a random matrix, $P = N/4$; $\Psi$: TI wavelets. (Figures: $\Phi^+ y$ and $x_{\lambda^\star}(y)$; quadratic loss as a function of $\lambda$, with the optimal $\lambda^\star$ marked.)

Anisotropic total variation: $\Phi$: vertical sub-sampling; finite-differences gradient dictionary $D = [\partial_1, \partial_2]$. (Figures: observations $y$ and $x_{\lambda^\star}(y)$; quadratic loss as a function of $\lambda$, with the optimal $\lambda^\star$ marked.)

Extension to $\ell^1$ analysis, TV. [Vaiter et al. 2012]

Conclusion

Sparsity: approximate signals with few atoms in a dictionary.

Compressed sensing ideas:
→ Randomized sensors + sparse recovery.
→ Number of measurements ∼ signal complexity.
→ CS is about designing new hardware.

The devil is in the constants:
→ Worst-case analysis is problematic.
→ Designing good signal models.