Sparsity and Compressed Sensing


Slides of the lectures given at the summer school "Biomedical Image Analysis Summer School: Modalities, Methodologies & Clinical Research", Centrale Paris, Paris, July 9-13, 2012.

Transcript of Sparsity and Compressed Sensing

Sparsity and Compressed Sensing

Gabriel Peyré

www.numerical-tours.com

Overview

• Inverse Problems Regularization

• Sparse Synthesis Regularization

• Theoretical Recovery Guarantees

• Compressed Sensing

• RIP and Polytopes CS Theory

• Fourier Measurements

• Convex Optimization via Proximal Splitting

Inverse Problems

Forward model: $y = K f_0 + w \in \mathbb{R}^P$

(Unknown) input $f_0 \in \mathbb{R}^Q$, observation operator $K : \mathbb{R}^Q \to \mathbb{R}^P$, noise $w$.

Denoising: $K = \mathrm{Id}_Q$, $P = Q$.

Inpainting: set $\Omega$ of missing pixels, $P = Q - |\Omega|$,
$(Kf)(x) = 0$ if $x \in \Omega$, $(Kf)(x) = f(x)$ if $x \notin \Omega$.

Super-resolution: $Kf = (f \star k) \downarrow_s$, $P = Q/s$.

Inverse Problem in Medical Imaging

Tomography: $Kf = (p_{\theta_k})_{1 \leq k \leq K}$ (projections along directions $\theta_k$).

Magnetic resonance imaging (MRI): $Kf = (\hat f(\omega))_{\omega \in \Omega}$ (partial Fourier samples).

Other examples: MEG, EEG, . . .

Inverse Problem Regularization

Noisy measurements: $y = K f_0 + w$.

$f^\star \in \operatorname*{argmin}_{f \in \mathbb{R}^Q} \tfrac{1}{2}\|y - Kf\|^2 + \lambda J(f)$ (data fidelity + regularization)

Prior model: $J : \mathbb{R}^Q \to \mathbb{R}$ assigns a score to images.

Choice of $\lambda$: tradeoff between the noise level $\|w\|$ and the regularity $J(f_0)$ of $f_0$.

No noise: $\lambda \to 0^+$, minimize $f^\star \in \operatorname*{argmin}_{f \in \mathbb{R}^Q,\, Kf = y} J(f)$.

Smooth and Cartoon Priors

Sobolev prior (smooth images): $J(f) = \int \|\nabla f(x)\|^2 \, dx$

Total variation prior (cartoon images): $J(f) = \int \|\nabla f(x)\| \, dx$

Co-area formula: $J(f) = \int_{\mathbb{R}} \mathrm{length}(C_t)\, dt$, where $C_t$ is the level set of $f$ at level $t$.
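A minimal sketch (assumption: NumPy, a grayscale image as a 2-D array; the function names are illustrative, not from the slides) contrasting the two discrete energies:

```python
import numpy as np

def gradient(f):
    """Discrete gradient by forward differences (Neumann boundary)."""
    gx = np.diff(f, axis=0, append=f[-1:, :])
    gy = np.diff(f, axis=1, append=f[:, -1:])
    return gx, gy

def sobolev_energy(f):
    gx, gy = gradient(f)
    return np.sum(gx**2 + gy**2)

def total_variation(f):
    gx, gy = gradient(f)
    return np.sum(np.sqrt(gx**2 + gy**2))

# For an edge of height h, the Sobolev energy scales like h^2 while TV scales like h,
# which is why TV tolerates sharp edges (cartoon images) much better.
f = np.zeros((64, 64)); f[:, 32:] = 10.0   # ideal step edge of height 10
print(sobolev_energy(f), total_variation(f))
```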

Inpainting Example

Input $y = K f_0 + w$; Sobolev reconstruction; total variation reconstruction.

Overview

• Inverse Problems Regularization

• Sparse Synthesis Regularization

• Theoretical Recovery Guarantees

• Compressed Sensing

• RIP and Polytopes CS Theory

• Fourier Measurements

• Convex Optimization via Proximal Splitting

Redundant Dictionaries

Dictionary $\Psi = (\psi_m)_m \in \mathbb{R}^{Q \times N}$, $N \geq Q$.

Fourier: $\psi_m = e^{i \langle \omega_m, \cdot \rangle}$, $m$ indexes the frequency $\omega_m$.

Wavelets: $\psi_m = \psi(2^{-j} R_\theta x - n)$, $m = (j, \theta, n)$: scale $j$, orientation $\theta$, position $n$.

DCT, curvelets, bandlets, . . .

Synthesis: $f = \sum_m x_m \psi_m = \Psi x$ (image $f = \Psi x$, coefficients $x$).

Sparse Priors

Ideal sparsity: for most $m$, $x_m = 0$.

$J_0(x) = \# \{ m \,;\, x_m \neq 0 \}$

Sparse approximation: $f = \Psi x$ where
$x \in \operatorname*{argmin}_{x \in \mathbb{R}^N} \|f_0 - \Psi x\|^2 + T^2 J_0(x)$.

Orthogonal $\Psi$ ($\Psi \Psi^* = \Psi^* \Psi = \mathrm{Id}_N$): the solution is hard thresholding,
$x_m = \langle f_0, \psi_m \rangle$ if $|\langle f_0, \psi_m \rangle| > T$, and $x_m = 0$ otherwise,
i.e. $f = \Psi \circ S_T \circ \Psi^*(f_0)$ with $S_T$ the hard-thresholding operator.

Non-orthogonal $\Psi$: the minimization is NP-hard.
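A minimal sketch of sparse approximation in an orthogonal basis by hard thresholding (assumption: NumPy/SciPy; the orthonormal DCT stands in for the basis $\Psi$ of the slides, and the test signal is arbitrary):

```python
import numpy as np
from scipy.fft import dct, idct

def hard_threshold_approx(f0, T):
    """Sparse approximation f = Psi S_T Psi^*(f0) in an orthogonal basis.

    Here Psi^* is the orthonormal DCT-II (norm='ortho'), so Psi Psi^* = Id."""
    x = dct(f0, norm='ortho')              # analysis: x = Psi^* f0
    x_T = np.where(np.abs(x) > T, x, 0.0)  # hard thresholding S_T
    return idct(x_T, norm='ortho'), x_T    # synthesis: f = Psi x_T

f0 = np.cumsum(np.random.default_rng(0).standard_normal(256))  # piecewise-smooth-ish signal
f, x_T = hard_threshold_approx(f0, T=2.0)
print("kept coefficients:", np.count_nonzero(x_T), " error:", np.linalg.norm(f - f0))
```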

Convex Relaxation: L1 Prior

Image with 2 pixels $(x_1, x_2)$:
$J_0(x) = 0$: null image; $J_0(x) = 1$: sparse image; $J_0(x) = 2$: non-sparse image.

$\ell^q$ priors: $J_q(x) = \sum_m |x_m|^q$ (convex for $q \geq 1$; level sets shown for $q = 0, 1/2, 1, 3/2, 2$).

Sparse $\ell^1$ prior: $J_1(x) = \sum_m |x_m|$.

L1 Regularization

Coefficients $x_0 \in \mathbb{R}^N$ $\xrightarrow{\ \Psi\ }$ image $f_0 = \Psi x_0 \in \mathbb{R}^Q$ $\xrightarrow{\ K,\ +w\ }$ observations $y = K f_0 + w \in \mathbb{R}^P$.

Combined operator: $\Phi = K \Psi \in \mathbb{R}^{P \times N}$.

Sparse recovery: $f^\star = \Psi x^\star$ where $x^\star$ solves
$\min_{x \in \mathbb{R}^N} \tfrac{1}{2}\|y - \Phi x\|^2 + \lambda \|x\|_1$ (data fidelity + regularization).

Noiseless Sparse Regularization

Noiseless measurements: $y = \Phi x_0$.

$\ell^1$ recovery: $x^\star \in \operatorname*{argmin}_{\Phi x = y} \sum_m |x_m|$,

compared with the $\ell^2$ solution $x^\star \in \operatorname*{argmin}_{\Phi x = y} \sum_m |x_m|^2$.

Convex linear program. Interior points, cf. [Chen, Donoho, Saunders] "basis pursuit".
Douglas-Rachford splitting, see [Combettes, Pesquet].

Noisy Sparse Regularization

Noisy measurements: $y = \Phi x_0 + w$.

Penalized form: $x^\star \in \operatorname*{argmin}_{x \in \mathbb{R}^N} \tfrac{1}{2}\|y - \Phi x\|^2 + \lambda \|x\|_1$ (data fidelity + regularization).

Constrained form: $x^\star \in \operatorname*{argmin}_{\|\Phi x - y\| \leq \varepsilon} \|x\|_1$.

Equivalence: $\lambda \leftrightarrow \varepsilon$.

Algorithms: iterative soft thresholding (forward-backward splitting), Nesterov multi-step schemes; see [Daubechies et al.], [Pesquet et al.], etc.

Image De-blurring

Original $f_0$, observations $y = h \star f_0 + w$.

Sobolev regularization: $f^\star = \operatorname*{argmin}_{f \in \mathbb{R}^N} \|f \star h - y\|^2 + \lambda \|\nabla f\|^2$,
solved in closed form in Fourier: $\hat f^\star(\omega) = \dfrac{\overline{\hat h(\omega)}}{|\hat h(\omega)|^2 + \lambda |\omega|^2}\, \hat y(\omega)$.
Result: SNR = 22.7 dB.

Sparsity regularization: $\Psi$ = translation invariant wavelets,
$x^\star \in \operatorname*{argmin}_x \tfrac{1}{2}\|h \star (\Psi x) - y\|^2 + \lambda \|x\|_1$, $f^\star = \Psi x^\star$.
Result: SNR = 24.7 dB.

Inpainting Problem

Measurements: $y = K f_0 + w$, with $(Kf)(x) = 0$ if $x \in \Omega$ (missing pixels) and $(Kf)(x) = f(x)$ otherwise.

Image Separation

Model: $f = f_1 + f_2 + w$, with $(f_1, f_2)$ the components and $w$ the noise.

Union dictionary: $\Psi = [\Psi_1, \Psi_2] \in \mathbb{R}^{Q \times (N_1 + N_2)}$.

$(x_1^\star, x_2^\star) \in \operatorname*{argmin}_{x = (x_1, x_2) \in \mathbb{R}^N} \tfrac{1}{2}\|f - \Psi x\|^2 + \lambda \|x\|_1$

Recovered components: $f_i^\star = \Psi_i x_i^\star$.

Examples of Decompositions

Cartoon+Texture Separation

Overview

• Inverse Problems Regularization

• Sparse Synthesis Regularization

• Theoretical Recovery Guarantees

• Compressed Sensing

• RIP and Polytopes CS Theory

• Fourier Measurements

• Convex Optimization via Proximal Splitting

Basics of Convex Analysis

Setting: $G : \mathcal{H} \to \mathbb{R} \cup \{+\infty\}$; here $\mathcal{H} = \mathbb{R}^N$.

Problem: $\min_{x \in \mathcal{H}} G(x)$

Convex: $\forall\, t \in [0, 1]$, $G(tx + (1-t)y) \leq t G(x) + (1-t) G(y)$.

Sub-differential: $\partial G(x) = \{ u \in \mathcal{H} \,;\, \forall z,\ G(z) \geq G(x) + \langle u, z - x \rangle \}$.

Smooth functions: if $G$ is $C^1$, $\partial G(x) = \{\nabla G(x)\}$.

Example: $G(x) = |x|$, $\partial G(0) = [-1, 1]$.

First-order condition: $x^\star \in \operatorname*{argmin}_{x \in \mathcal{H}} G(x) \iff 0 \in \partial G(x^\star)$.

L1 Regularization: First Order Conditions

$x^\star \in \operatorname*{argmin}_{x \in \mathbb{R}^N} G(x) = \tfrac{1}{2}\|y - \Phi x\|^2 + \lambda \|x\|_1 \qquad (P_\lambda(y))$

$\partial G(x) = \Phi^*(\Phi x - y) + \lambda\, \partial \|\cdot\|_1(x)$, where
$\partial \|\cdot\|_1(x)_i = \operatorname{sign}(x_i)$ if $x_i \neq 0$, and $[-1, 1]$ if $x_i = 0$.

Support of the solution: $I = \{ i \in \{0, \ldots, N-1\} \,;\, x^\star_i \neq 0 \}$.

Restrictions: $x_I = (x_i)_{i \in I} \in \mathbb{R}^{|I|}$, $\Phi_I = (\phi_i)_{i \in I} \in \mathbb{R}^{P \times |I|}$.

First-order condition: $\Phi^*(\Phi x^\star - y) + \lambda s = 0$ where $s_I = \operatorname{sign}(x^\star_I)$ and $\|s_{I^c}\|_\infty \leq 1$,
i.e. $s_{I^c} = \frac{1}{\lambda} \Phi^*_{I^c}(y - \Phi x^\star)$.

Theorem: $x^\star$ is a solution of $P_\lambda(y)$ $\iff$ the first-order condition holds on $I$ and $\|\Phi^*_{I^c}(\Phi x^\star - y)\|_\infty \leq \lambda$.

Theorem: if $\Phi_I$ has full rank and $\|\Phi^*_{I^c}(\Phi x^\star - y)\|_\infty < \lambda$, then $x^\star$ is the unique solution of $P_\lambda(y)$.

Local Behavior of the Solution

The first-order condition $\Phi^*(\Phi x^\star - y) + \lambda s = 0$ restricted to $I$ gives the implicit equation
$x^\star_I = \Phi_I^+ y - \lambda (\Phi_I^* \Phi_I)^{-1} \operatorname{sign}(x^\star_I) = x_{0,I} + \Phi_I^+ w - \lambda (\Phi_I^* \Phi_I)^{-1} s_I$.

Intuition: for small $w$, $s_I = \operatorname{sign}(x^\star_I) = \operatorname{sign}(x_{0,I}) = s_{0,I}$ (the unknown sign becomes known).

Candidate for the solution: $\hat x_I = x_{0,I} + \Phi_I^+ w - \lambda (\Phi_I^* \Phi_I)^{-1} s_{0,I}$.

To prove that $\hat x$ is the unique solution: $\big\| \tfrac{1}{\lambda} \Phi^*_{I^c}(\Phi_I \hat x_I - y) \big\|_\infty < 1$.

Writing $\Psi_I = \Phi^*_{I^c}(\Phi_I \Phi_I^+ - \mathrm{Id})$ and $\Omega_I = \Phi^*_{I^c} \Phi_I^{+,*}$,
$\tfrac{1}{\lambda} \Phi^*_{I^c}(\Phi_I \hat x_I - y) = \Psi_I\!\left(\tfrac{w}{\lambda}\right) - \Omega_I(s_{0,I})$:
the first term can be made small when $w \to 0$, while the second must satisfy $\|\Omega_I(s_{0,I})\|_\infty < 1$.

Robustness to Small Noise

Identifiability criterion [Fuchs]: for $s \in \{-1, 0, +1\}^N$, let $I = \operatorname{supp}(s)$ and
$F(s) = \|\Omega_I s_I\|_\infty$ where $\Omega_I = \Phi^*_{I^c} \Phi_I^{+,*}$.

Theorem [Fuchs 2004]: if $F(\operatorname{sign}(x_0)) < 1$ and, with $T = \min_{i \in I} |x_{0,i}|$, $\|w\|/T$ is small enough and $\lambda \sim \|w\|$, then
$x_{0,I} + \Phi_I^+ w - \lambda (\Phi_I^* \Phi_I)^{-1} \operatorname{sign}(x_{0,I})$
is the unique solution of $P_\lambda(y)$.

When $w = 0$: $F(\operatorname{sign}(x_0)) < 1 \implies x^\star = x_0$.

Theorem [Grasmair et al. 2010]: if $F(\operatorname{sign}(x_0)) < 1$ and $\lambda \sim \|w\|$, then $\|x^\star - x_0\| = O(\|w\|)$.
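A minimal sketch (assumption: NumPy, small dense $\Phi$; function name and test sizes are illustrative) of how the Fuchs criterion $F(s)$ can be evaluated numerically:

```python
import numpy as np

def fuchs_criterion(Phi, s):
    """F(s) = || Phi_{I^c}^T Phi_I^{+,T} s_I ||_inf  for a sign vector s in {-1,0,+1}^N."""
    I = np.flatnonzero(s)
    Ic = np.setdiff1d(np.arange(Phi.shape[1]), I)
    Phi_I_pinv = np.linalg.pinv(Phi[:, I])    # Phi_I^+
    d_I = Phi_I_pinv.T @ s[I]                 # dual certificate d_I = Phi_I^{+,*} s_I
    return np.max(np.abs(Phi[:, Ic].T @ d_I))

# Example: random Gaussian Phi, a 5-sparse sign pattern.
rng = np.random.default_rng(0)
P, N = 200, 1000
Phi = rng.standard_normal((P, N)) / np.sqrt(P)
s = np.zeros(N); s[:5] = 1.0
print("F(s) =", fuchs_criterion(Phi, s))      # identifiable if F(s) < 1
```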

Geometric Interpretation

Dual certificate: $d_I = \Phi_I^{+,*} s_I = \Phi_I (\Phi_I^* \Phi_I)^{-1} s_I$, defined by $\forall\, i \in I$, $\langle d_I, \phi_i \rangle = s_i$.

$F(s) = \|\Omega_I s_I\|_\infty = \max_{j \notin I} |\langle d_I, \phi_j \rangle|$

Condition $F(s) < 1$: no vector $\phi_j$, $j \notin I$, lies inside the cap $C_s$, i.e. all of them satisfy $|\langle d_I, \phi_j \rangle| < 1$.

Robustness to Bounded Noise

Exact Recovery Criterion (ERC) [Tropp]: for a support $I \subset \{0, \ldots, N-1\}$ with $\Phi_I$ of full rank,
$\mathrm{ERC}(I) = \|\Omega_I\|_{\infty,\infty}$ where $\Omega_I = \Phi^*_{I^c} \Phi_I^{+,*}$,
so $\mathrm{ERC}(I) = \|\Phi_I^+ \Phi_{I^c}\|_{1,1} = \max_{j \in I^c} \|\Phi_I^+ \phi_j\|_1$
(using $\|(a_j)_j\|_{1,1} = \max_j \|a_j\|_1$).

Relation with the $F$ criterion: $\mathrm{ERC}(I) = \max_{s,\ \operatorname{supp}(s) \subset I} F(s)$.

Theorem: if $\mathrm{ERC}(\operatorname{supp}(x_0)) < 1$ and $\lambda \sim \|w\|$, then $x^\star$ is unique, satisfies $\operatorname{supp}(x^\star) \subset \operatorname{supp}(x_0)$, and $\|x_0 - x^\star\| = O(\|w\|)$.
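A companion sketch (same assumptions as above: NumPy, dense $\Phi$, illustrative names) computing Tropp's Exact Recovery Criterion for a given support:

```python
import numpy as np

def erc(Phi, I):
    """ERC(I) = max_{j in I^c} || Phi_I^+ phi_j ||_1  [Tropp]."""
    Ic = np.setdiff1d(np.arange(Phi.shape[1]), I)
    Phi_I_pinv = np.linalg.pinv(Phi[:, I])                     # Phi_I^+
    return np.max(np.sum(np.abs(Phi_I_pinv @ Phi[:, Ic]), axis=0))

rng = np.random.default_rng(1)
P, N, k = 200, 1000, 10
Phi = rng.standard_normal((P, N)) / np.sqrt(P)
I = rng.choice(N, size=k, replace=False)
print("ERC(I) =", erc(Phi, I))   # recovery is stable to bounded noise if ERC(I) < 1
```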

Example: Random Matrix

$P = 200$, $N = 1000$: probability, over random supports of increasing size $|I| \in [0, 50]$, that $\mathrm{ERC} < 1$, w-$\mathrm{ERC} < 1$, $F < 1$, and that $x^\star = x_0$.

Example: Deconvolution

$\Phi x = \sum_i x_i\, \varphi(\cdot - \Delta i)$ (spikes $x_0$ convolved with a kernel $\varphi$ on a grid of spacing $\Delta$).

Increasing $\Delta$ reduces the correlation between atoms, but also reduces the resolution. The criteria $F(s)$, $\mathrm{ERC}(I)$ and w-$\mathrm{ERC}(I)$ are compared as $\Delta$ varies.

Coherence Bounds

Mutual coherence: $\mu(\Phi) = \max_{i \neq j} |\langle \phi_i, \phi_j \rangle|$.

Theorem: $F(s) \leq \mathrm{ERC}(I) \leq \text{w-}\mathrm{ERC}(I) \leq \dfrac{|I|\, \mu(\Phi)}{1 - (|I| - 1)\mu(\Phi)}$

Theorem: if $\|x_0\|_0 < \tfrac{1}{2}\left(1 + \tfrac{1}{\mu(\Phi)}\right)$ and $\lambda \sim \|w\|$, one has $\operatorname{supp}(x^\star) \subset I$ and $\|x_0 - x^\star\| = O(\|w\|)$.

One has $\mu(\Phi) \geq \sqrt{\dfrac{N - P}{P(N - 1)}}$, so the optimistic setting is $\|x_0\|_0 \sim O(\sqrt{P})$.

For Gaussian matrices: $\mu(\Phi) \sim \sqrt{\log(PN)/P}$.

For convolution matrices: a useless criterion.
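A minimal sketch (NumPy, dense matrix with normalized columns; names are illustrative) of the mutual coherence and the sparsity level it guarantees:

```python
import numpy as np

def mutual_coherence(Phi):
    """mu(Phi) = max_{i != j} |<phi_i, phi_j>| for a dictionary with unit-norm columns."""
    Phi = Phi / np.linalg.norm(Phi, axis=0, keepdims=True)  # normalize columns
    G = np.abs(Phi.T @ Phi)                                  # Gram matrix
    np.fill_diagonal(G, 0.0)
    return G.max()

rng = np.random.default_rng(2)
P, N = 200, 1000
Phi = rng.standard_normal((P, N))
mu = mutual_coherence(Phi)
print("mu =", mu, " -> coherence bound on sparsity:", 0.5 * (1 + 1 / mu))
```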

Spikes and Sinusoids Separation

Incoherent pair of orthobases (Diracs/Fourier):
$\Psi_1 = \{ k \mapsto \delta[k - m] \}_m$, $\Psi_2 = \{ k \mapsto N^{-1/2} e^{\frac{2i\pi}{N} mk} \}_m$, $\Phi = [\Psi_1, \Psi_2] \in \mathbb{R}^{N \times 2N}$.

$\min_{x \in \mathbb{R}^{2N}} \tfrac{1}{2}\|y - \Phi x\|^2 + \lambda \|x\|_1
\;\Longleftrightarrow\;
\min_{x_1, x_2 \in \mathbb{R}^N} \tfrac{1}{2}\|y - \Psi_1 x_1 - \Psi_2 x_2\|^2 + \lambda \|x_1\|_1 + \lambda \|x_2\|_1$

$\mu(\Phi) = \dfrac{1}{\sqrt{N}}$ $\implies$ separates up to $\sim \sqrt{N}/2$ Diracs + sines.

Overview

• Inverse Problems Regularization

• Sparse Synthesis Regularization

• Theoretical Recovery Guarantees

• Compressed Sensing

• RIP and Polytopes CS Theory

• Fourier Measurements

• Convex Optimization via Proximal Splitting

Pointwise Sampling and Smoothness

Data acquisition: sensors sample $f \in L^2$ to get $\tilde f \in \mathbb{R}^N$, with $\tilde f[i] = f(i/N) = \langle f, \delta_{i/N} \rangle$ (Diracs).

Shannon interpolation: if $\operatorname{Supp}(\hat f) \subset [-N\pi, N\pi]$, then
$f(t) = \sum_i \tilde f[i]\, h(Nt - i)$ where $h(t) = \dfrac{\sin(\pi t)}{\pi t}$.

$\Longrightarrow$ Natural images are not smooth.
$\Longrightarrow$ But they can be compressed efficiently.

Single Pixel Camera (Rice)

Measurements: $y[i] = \langle f_0, \varphi_i \rangle$.

Original $f_0$, $N = 256^2$; recovery $f^\star$ with $P/N = 0.16$; recovery $f^\star$ with $P/N = 0.02$.

CS Hardware Model

CS is about designing hardware: input signals $f \in L^2(\mathbb{R}^2)$.

Physical hardware resolution limit: target resolution $\tilde f \in \mathbb{R}^N$.

$f \in L^2 \;\longrightarrow\; \tilde f \in \mathbb{R}^N$ (micromirror array resolution) $\;\xrightarrow{\ \text{CS hardware } K\ }\; y \in \mathbb{R}^P$

The operator $K$ is implemented by the hardware (e.g. random masks on the mirror array).

Sparse CS Recovery

$f_0 \in \mathbb{R}^N$ sparse in an ortho-basis $\Psi$: $f_0 = \Psi x_0$, $x_0 \in \mathbb{R}^N$.

(Discretized) sampling acquisition: $y = K f_0 + w = K\Psi(x_0) + w = \Phi x_0 + w$.

$K$ drawn from the Gaussian matrix ensemble, $K_{i,j} \sim \mathcal{N}(0, P^{-1/2})$ i.i.d.
$\implies \Phi = K\Psi$ is also drawn from the Gaussian matrix ensemble.

Sparse recovery: $\min_{\|\Phi x - y\| \leq \|w\|} \|x\|_1$, or $\min_x \tfrac{1}{2}\|\Phi x - y\|^2 + \lambda \|x\|_1$ with $\lambda \sim \|w\|$.
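A minimal sketch of noiseless CS recovery by basis pursuit, recast as a linear program (assumptions: SciPy's linprog with the 'highs' solver, $\Psi = \mathrm{Id}$ so $\Phi$ is directly the Gaussian matrix, and small illustrative sizes):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(3)
P, N, k = 50, 200, 5
Phi = rng.standard_normal((P, N)) / np.sqrt(P)        # Gaussian measurement matrix
x0 = np.zeros(N); x0[rng.choice(N, k, replace=False)] = rng.standard_normal(k)
y = Phi @ x0                                          # noiseless measurements

# Basis pursuit  min ||x||_1  s.t.  Phi x = y,  as an LP with x = u - v, u, v >= 0.
c = np.ones(2 * N)
A_eq = np.hstack([Phi, -Phi])
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
x_rec = res.x[:N] - res.x[N:]
print("recovery error:", np.linalg.norm(x_rec - x0))
```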

CS Simulation Example

$\Psi$ = translation invariant wavelet frame; original $f_0$ and CS reconstructions.

Overview

• Inverse Problems Regularization

• Sparse Synthesis Regularization

• Theoretical Recovery Guarantees

• Compressed Sensing

• RIP and Polytopes CS Theory

• Fourier Measurements

• Convex Optimization via Proximal Splitting

CS with RIP

Restricted Isometry Constants: $\forall x$ with $\|x\|_0 \leq k$,
$(1 - \delta_k)\|x\|^2 \leq \|\Phi x\|^2 \leq (1 + \delta_k)\|x\|^2$.

$\ell^1$ recovery: $x^\star \in \operatorname*{argmin}_{\|\Phi x - y\| \leq \varepsilon} \|x\|_1$, where $y = \Phi x_0 + w$, $\|w\| \leq \varepsilon$.

Theorem [Candès 2009]: if $\delta_{2k} \leq \sqrt{2} - 1$, then
$\|x_0 - x^\star\| \leq \dfrac{C_0}{\sqrt{k}} \|x_0 - x_k\|_1 + C_1 \varepsilon$,
where $x_k$ is the best $k$-term approximation of $x_0$.

Singular Values Distributions

Eigenvalues of $\Phi_I^* \Phi_I$ with $|I| = k$ are essentially in $[a, b]$, with $a = (1 - \sqrt{\beta})^2$ and $b = (1 + \sqrt{\beta})^2$ where $\beta = k/P$.

When $k = \beta P \to +\infty$, the eigenvalue distribution tends to the Marcenko-Pastur law
$f_\beta(\lambda) = \dfrac{1}{2\pi \beta \lambda} \sqrt{(b - \lambda)_+ (\lambda - a)_+}$.

Large deviation inequality [Ledoux].

Empirical histograms (Gaussian $\Phi$, $P = 200$, $k = 10, 30, 50$) match $f_\beta(\lambda)$.
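A minimal numerical check of this behavior (assumption: NumPy; sizes and trial count are illustrative): sample eigenvalues of $\Phi_I^* \Phi_I$ over random supports and compare with the Marcenko-Pastur support $[a, b]$.

```python
import numpy as np

rng = np.random.default_rng(4)
P, N, k, trials = 200, 1000, 30, 200
Phi = rng.standard_normal((P, N)) / np.sqrt(P)

eigs = []
for _ in range(trials):
    I = rng.choice(N, size=k, replace=False)
    eigs.append(np.linalg.eigvalsh(Phi[:, I].T @ Phi[:, I]))  # spectrum of Phi_I^* Phi_I
eigs = np.concatenate(eigs)

beta = k / P
a, b = (1 - np.sqrt(beta)) ** 2, (1 + np.sqrt(beta)) ** 2
print(f"observed range [{eigs.min():.2f}, {eigs.max():.2f}], "
      f"Marcenko-Pastur support [{a:.2f}, {b:.2f}]")
```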

RIP for Gaussian Matrices

Link with coherence: $\delta_k \leq (k - 1)\, \mu(\Phi)$, and $\delta_2 = \mu(\Phi)$, where $\mu(\Phi) = \max_{i \neq j} |\langle \phi_i, \phi_j \rangle|$.

For Gaussian matrices: $\mu(\Phi) \sim \sqrt{\log(PN)/P}$.

Stronger result. Theorem: if $k \leq C \dfrac{P}{\log(N/P)}$, then $\delta_{2k} \leq \sqrt{2} - 1$ with high probability.

Numerics with RIP

Stability constants of $A$: $(1 - \delta_1(A))\|\varepsilon\|^2 \leq \|A\varepsilon\|^2 \leq (1 + \delta_2(A))\|\varepsilon\|^2$,
given by the smallest / largest eigenvalues of $A^*A$.

Upper/lower RIC: $\delta_k^i = \max_{|I| = k} \delta_i(\Phi_I)$, $\delta_k = \max(\delta_k^1, \delta_k^2)$.

Monte-Carlo estimation over random supports gives a lower bound $\tilde\delta_k \leq \delta_k$.

Numerical example: $N = 4000$, $P = 1000$; $\tilde\delta_{2k}$ is plotted against $k$ and compared with the threshold $\sqrt{2} - 1$.

Polytopes-based Guarantees

Noiseless recovery: $x^\star \in \operatorname*{argmin}_{\Phi x = y} \|x\|_1 \quad (P_0(y))$, seen as a map $y \mapsto x^\star$.

Example: $\Phi = (\phi_i)_i \in \mathbb{R}^{2 \times 3}$; $B_\alpha = \{x \,;\, \|x\|_1 \leq \alpha\}$ with $\alpha = \|x_0\|_1$; $\Phi(B_\alpha)$ is a polytope with vertices among $\pm\alpha\phi_i$.

$x_0$ is a solution of $P_0(\Phi x_0) \iff \Phi x_0 \in \partial\, \Phi(B_\alpha)$.

L1 Recovery in 2-D

For a sign vector $s$ (e.g. $s = (0, 1, 1)$), the quadrant $K_s = \{ (\alpha_i s_i)_i \in \mathbb{R}^3 \,;\, \alpha_i \geq 0 \}$ maps to the 2-D cone $C_s = \Phi K_s$; recovery of $x_0$ with $\operatorname{sign}(x_0) = s$ is governed by the geometry of these cones under $y \mapsto x^\star$.

Polytope Noiseless Recovery

Counting faces of random polytopes [Donoho]:
All $x_0$ such that $\|x_0\|_0 \leq C_{\text{all}}(P/N)\, P$ are identifiable.
Most $x_0$ such that $\|x_0\|_0 \leq C_{\text{most}}(P/N)\, P$ are identifiable.
$C_{\text{all}}(1/4) \approx 0.065$, $C_{\text{most}}(1/4) \approx 0.25$.

Compared with RIP: sharp constants, but no noise robustness.

Computation of "pathological" signals [Dossal, Peyré, Fadili, 2010].

Overview

• Inverse Problems Regularization

• Sparse Synthesis Regularization

• Theoretical Recovery Guarantees

• Compressed Sensing

• RIP and Polytopes CS Theory

• Fourier Measurements

• Convex Optimization via Proximal Splitting

Tomography and Fourier Measures

Fourier slice theorem: $\hat p_\theta(\rho) = \hat f(\rho \cos\theta, \rho \sin\theta)$ (1D Fourier transform of a projection = a slice of the 2D Fourier transform $\hat f = \mathrm{FFT2}(f)$).

Partial Fourier measurements: the projections $\{p_{\theta_k}(t)\}_{t \in \mathbb{R},\ 0 \leq k < K}$ are equivalent to the Fourier samples $\Phi f = \{\hat f[\omega]\}_{\omega \in \Omega}$ on radial lines $\Omega$.

Disclaimer: this is not compressed sensing.

Regularized Inversion

Noisy measurements: $\forall \omega \in \Omega$, $y[\omega] = \hat f_0[\omega] + w[\omega]$, with white noise $w[\omega] \sim \mathcal{N}(0, \sigma)$.

$\ell^1$ regularization:
$f^\star = \operatorname*{argmin}_f \tfrac{1}{2} \sum_{\omega \in \Omega} |y[\omega] - \hat f[\omega]|^2 + \lambda \sum_m |\langle f, \psi_m \rangle|$

MRI Imaging (from [Lustig et al.])

Fourier sub-sampling pattern: randomization.

MRI Reconstruction (from [Lustig et al.]): high resolution / low resolution / linear (pseudo-inverse) / sparsity (sparse wavelets).

$\Longrightarrow$ Sampling low frequencies helps.

Compressive Fourier Measurements

Structured Measurements

Gaussian matrices: intractable for large $N$.

Fast measurements (e.g. Fourier basis): random partial orthogonal matrix $\Phi = (\varphi_\omega)_{\omega \in \Omega}$, where $\{\varphi_\omega\}_\omega$ is an orthogonal basis and $|\Omega| = P$ is drawn uniformly at random:
$\forall \omega \in \Omega, \quad y[\omega] = \langle f, \varphi_\omega \rangle = \hat f[\omega]$.

Mutual incoherence: $\mu = \sqrt{N} \max_{\omega, m} |\langle \varphi_\omega, \psi_m \rangle| \in [1, \sqrt{N}]$.

$\Longrightarrow$ not universal: requires incoherence between the measurement basis and the sparsity basis.

Theorem [Rudelson, Vershynin, 2006]: with high probability on $\Omega$, if
$M \leq C \dfrac{P}{\mu^2 \log(N)^4}$, then $\delta_{2M} \leq \sqrt{2} - 1$.

Overview

• Inverse Problems Regularization

• Sparse Synthesis Regularization

• Theoretical Recovery Guarantees

• Compressed Sensing

• RIP and Polytopes CS Theory

• Fourier Measurements

• Convex Optimization via Proximal Splitting

Convex Optimization

Setting: $\mathcal{H}$ Hilbert space, here $\mathcal{H} = \mathbb{R}^N$; $G : \mathcal{H} \to \mathbb{R} \cup \{+\infty\}$.

Problem: $\min_{x \in \mathcal{H}} G(x)$

Class of functions:
Convex: $G(tx + (1-t)y) \leq t G(x) + (1-t) G(y)$, $\forall\, t \in [0, 1]$.
Lower semi-continuous: $\liminf_{x \to x_0} G(x) \geq G(x_0)$.
Proper: $\{x \in \mathcal{H} \,;\, G(x) \neq +\infty\} \neq \emptyset$.

Indicator of a closed convex set $C$: $\iota_C(x) = 0$ if $x \in C$, $+\infty$ otherwise.

Proximal Operators

Proximal operator of $G$: $\operatorname{Prox}_{\gamma G}(x) = \operatorname*{argmin}_z \tfrac{1}{2}\|x - z\|^2 + \gamma G(z)$.

$G(x) = \|x\|_1 = \sum_i |x_i|$: $\operatorname{Prox}_{\gamma G}(x)_i = \max\!\left(0,\ 1 - \dfrac{\gamma}{|x_i|}\right) x_i$ (soft thresholding).

$G(x) = \|x\|_0 = |\{i \,;\, x_i \neq 0\}|$: $\operatorname{Prox}_{\gamma G}(x)_i = x_i$ if $|x_i| \geq \sqrt{2\gamma}$, $0$ otherwise (hard thresholding).

$G(x) = \sum_i \log(1 + |x_i|^2)$: $\operatorname{Prox}_{\gamma G}$ is obtained as the root of a 3rd order polynomial.
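A minimal sketch (NumPy; function names are illustrative) of the two componentwise proximal maps above:

```python
import numpy as np

def prox_l1(x, gamma):
    """Soft thresholding: prox of gamma * ||x||_1 (componentwise)."""
    return np.maximum(0.0, 1.0 - gamma / np.maximum(np.abs(x), 1e-12)) * x

def prox_l0(x, gamma):
    """Hard thresholding: prox of gamma * ||x||_0 (componentwise)."""
    return np.where(np.abs(x) >= np.sqrt(2 * gamma), x, 0.0)

x = np.array([-3.0, -0.5, 0.0, 0.2, 2.0])
print(prox_l1(x, 1.0))   # shrinks toward 0 by 1, small entries set to 0
print(prox_l0(x, 1.0))   # keeps entries with |x_i| >= sqrt(2)
```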

Proximal Calculus

Separability: $G(x) = G_1(x_1) + \ldots + G_n(x_n)$
$\implies \operatorname{Prox}_G(x) = (\operatorname{Prox}_{G_1}(x_1), \ldots, \operatorname{Prox}_{G_n}(x_n))$.

Quadratic functionals: $G(x) = \tfrac{1}{2}\|\Phi x - y\|^2$
$\implies \operatorname{Prox}_{\gamma G}(x) = (\mathrm{Id} + \gamma \Phi^*\Phi)^{-1}(x + \gamma \Phi^* y)$,
with $(\mathrm{Id} + \gamma \Phi^*\Phi)^{-1}\Phi^* = \Phi^*(\mathrm{Id} + \gamma \Phi\Phi^*)^{-1}$.

Composition by a tight frame ($A \circ A^* = \mathrm{Id}$): $\operatorname{Prox}_{G \circ A}(x) = A^* \circ \operatorname{Prox}_G \circ A + \mathrm{Id} - A^* \circ A$.

Indicators: $G(x) = \iota_C(x)$ $\implies$ $\operatorname{Prox}_{\gamma G}(x) = \operatorname{Proj}_C(x) = \operatorname*{argmin}_{z \in C} \|x - z\|$.
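A minimal sketch of the quadratic case (assumption: NumPy, small dense $\Phi$ so the linear system can be solved directly; names are illustrative), with an optimality check:

```python
import numpy as np

def prox_quadratic(x, Phi, y, gamma):
    """Prox of G(z) = (1/2)||Phi z - y||^2 with parameter gamma:
    solve (Id + gamma Phi^T Phi) z = x + gamma Phi^T y."""
    N = Phi.shape[1]
    return np.linalg.solve(np.eye(N) + gamma * Phi.T @ Phi, x + gamma * Phi.T @ y)

rng = np.random.default_rng(5)
Phi = rng.standard_normal((20, 50))
y, x = rng.standard_normal(20), rng.standard_normal(50)
z = prox_quadratic(x, Phi, y, gamma=0.5)
# Optimality: (z - x) + gamma * Phi^T (Phi z - y) should be ~0.
print(np.linalg.norm(z - x + 0.5 * Phi.T @ (Phi @ z - y)))
```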

Gradient and Proximal Descents

Gradient descent ($G$ is $C^1$ and $\nabla G$ is $L$-Lipschitz) [explicit]:
$x^{(\ell+1)} = x^{(\ell)} - \tau_\ell \nabla G(x^{(\ell)})$
Theorem: if $0 < \tau_\ell < 2/L$, then $x^{(\ell)} \to x^\star$, a solution.

Sub-gradient descent:
$x^{(\ell+1)} = x^{(\ell)} - \tau_\ell v^{(\ell)}$, $v^{(\ell)} \in \partial G(x^{(\ell)})$
Theorem: if $\tau_\ell \sim 1/\ell$, then $x^{(\ell)} \to x^\star$, a solution. $\Longrightarrow$ Problem: slow.

Proximal-point algorithm [implicit]:
$x^{(\ell+1)} = \operatorname{Prox}_{\tau_\ell G}(x^{(\ell)})$
Theorem: if $\tau_\ell \geq c > 0$, then $x^{(\ell)} \to x^\star$, a solution. $\Longrightarrow$ Problem: $\operatorname{Prox}_{\gamma G}$ is hard to compute.

Proximal Splitting Methods

Solve $\min_{x \in \mathcal{H}} E(x)$; problem: $\operatorname{Prox}_{\gamma E}$ is not available.

Splitting: $E(x) = F(x) + \sum_i G_i(x)$ with $F$ smooth and each $G_i$ simple (its prox is available).

Iterative algorithms using only $\nabla F(x)$ and $\operatorname{Prox}_{\gamma G_i}(x)$:
Forward-Backward: solves $F + G$.
Douglas-Rachford: solves $\sum_i G_i$.
Generalized FB: solves $F + \sum_i G_i$.
Primal-Dual: handles $G_i \circ A$.

Smooth + Simple Splitting

Model: $f_0 = \Psi x_0$ sparse in a dictionary $\Psi$; inverse problem $y = K f_0 + w$, $K : \mathbb{R}^N \to \mathbb{R}^P$, $P \ll N$.

Data fidelity: $F(x) = \tfrac{1}{2}\|y - \Phi x\|^2$ with $\Phi = K\Psi$ (smooth).

Regularization: $G(x) = \|x\|_1 = \sum_i |x_i|$ (simple).

Sparse recovery: $f^\star = \Psi x^\star$ where $x^\star$ solves $\min_{x \in \mathbb{R}^N} F(x) + G(x)$.

Forward-Backward

$x^\star \in \operatorname*{argmin}_x F(x) + G(x) \quad (\star)
\iff 0 \in \nabla F(x^\star) + \partial G(x^\star)
\iff (x^\star - \gamma \nabla F(x^\star)) \in x^\star + \gamma\, \partial G(x^\star)$

Fixed point equation: $x^\star = \operatorname{Prox}_{\gamma G}(x^\star - \gamma \nabla F(x^\star))$.

Forward-backward iteration: $x^{(\ell+1)} = \operatorname{Prox}_{\gamma G}\!\left(x^{(\ell)} - \gamma \nabla F(x^{(\ell)})\right)$.

Special case $G = \iota_C$: projected gradient descent.

Theorem: let $\nabla F$ be $L$-Lipschitz. If $\gamma < 2/L$, then $x^{(\ell)} \to x^\star$, a solution of $(\star)$.

Example: L1 Regularization

$\min_x \tfrac{1}{2}\|\Phi x - y\|^2 + \lambda \|x\|_1 \;\equiv\; \min_x F(x) + G(x)$ with
$F(x) = \tfrac{1}{2}\|\Phi x - y\|^2$, $\nabla F(x) = \Phi^*(\Phi x - y)$, $L = \|\Phi^*\Phi\|$,
$G(x) = \lambda \|x\|_1$, $\operatorname{Prox}_{\gamma G}(x)_i = \max\!\left(0,\ 1 - \dfrac{\lambda\gamma}{|x_i|}\right) x_i$.

$\Longrightarrow$ Forward-backward $\;\Longleftrightarrow\;$ iterative soft thresholding.
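A minimal ISTA sketch of the forward-backward iteration above (assumption: NumPy, dense $\Phi$; the step size $\gamma = 1/L$ and all test sizes are illustrative choices, not from the slides):

```python
import numpy as np

def ista(Phi, y, lam, n_iter=500):
    """Iterative soft thresholding for min 1/2||Phi x - y||^2 + lam ||x||_1."""
    L = np.linalg.norm(Phi, 2) ** 2          # Lipschitz constant of the gradient
    gamma = 1.0 / L                           # step size gamma < 2/L
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ x - y)          # forward (gradient) step
        z = x - gamma * grad
        x = np.sign(z) * np.maximum(np.abs(z) - gamma * lam, 0.0)  # backward (prox) step
    return x

rng = np.random.default_rng(6)
P, N, k = 100, 400, 10
Phi = rng.standard_normal((P, N)) / np.sqrt(P)
x0 = np.zeros(N); x0[rng.choice(N, k, replace=False)] = rng.standard_normal(k)
y = Phi @ x0 + 0.01 * rng.standard_normal(P)
x_hat = ista(Phi, y, lam=0.02)
print("recovery error:", np.linalg.norm(x_hat - x0))
```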

Douglas-Rachford Scheme

$\min_x G_1(x) + G_2(x) \quad (\star)$, with $G_1$ and $G_2$ both simple.

Reflected prox: $\operatorname{RProx}_{\gamma G}(x) = 2\operatorname{Prox}_{\gamma G}(x) - x$.

Douglas-Rachford iterations:
$z^{(\ell+1)} = \left(1 - \tfrac{\alpha}{2}\right) z^{(\ell)} + \tfrac{\alpha}{2}\, \operatorname{RProx}_{\gamma G_2}\!\big(\operatorname{RProx}_{\gamma G_1}(z^{(\ell)})\big), \qquad
x^{(\ell+1)} = \operatorname{Prox}_{\gamma G_1}(z^{(\ell+1)})$

Theorem: if $0 < \alpha < 2$ and $\gamma > 0$, then $x^{(\ell)} \to x^\star$, a solution of $(\star)$.

Example: Constrained L1

$\min_{\Phi x = y} \|x\|_1 \;\Longleftrightarrow\; \min_x G_1(x) + G_2(x)$ with

$G_1(x) = \iota_C(x)$, $C = \{x \,;\, \Phi x = y\}$, $\operatorname{Prox}_{\gamma G_1}(x) = \operatorname{Proj}_C(x) = x + \Phi^*(\Phi\Phi^*)^{-1}(y - \Phi x)$;

$G_2(x) = \|x\|_1$, $\operatorname{Prox}_{\gamma G_2}(x) = \left( \max\!\left(0,\ 1 - \dfrac{\gamma}{|x_i|}\right) x_i \right)_i$.

$\Longrightarrow$ efficient if $\Phi\Phi^*$ is easy to invert.

Example (compressed sensing): $\Phi \in \mathbb{R}^{100 \times 400}$ Gaussian matrix, $\|x_0\|_0 = 17$, $y = \Phi x_0$; convergence of $\log_{10}(\|x^{(\ell)}\|_1 - \|x^\star\|_1)$ for $\gamma = 0.01, 1, 10$.
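A minimal Douglas-Rachford sketch for this constrained $\ell^1$ problem (assumptions: NumPy, dense $\Phi$ so that $\Phi\Phi^*$ can be inverted directly; the parameter values and function names are illustrative):

```python
import numpy as np

def soft_threshold(x, gamma):
    return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)

def dr_basis_pursuit(Phi, y, gamma=1.0, alpha=1.0, n_iter=500):
    """Douglas-Rachford for min ||x||_1 s.t. Phi x = y (G1 = constraint indicator, G2 = ||.||_1)."""
    pinv = Phi.T @ np.linalg.inv(Phi @ Phi.T)      # Phi^* (Phi Phi^*)^{-1}
    proj = lambda x: x + pinv @ (y - Phi @ x)       # Prox of G1: projection on {Phi x = y}
    rprox1 = lambda x: 2 * proj(x) - x
    rprox2 = lambda x: 2 * soft_threshold(x, gamma) - x
    z = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        z = (1 - alpha / 2) * z + (alpha / 2) * rprox2(rprox1(z))
    return proj(z)                                  # output x = Prox_{gamma G1}(z)

rng = np.random.default_rng(7)
P, N, k = 100, 400, 17
Phi = rng.standard_normal((P, N)) / np.sqrt(P)
x0 = np.zeros(N); x0[rng.choice(N, k, replace=False)] = rng.standard_normal(k)
x_hat = dr_basis_pursuit(Phi, Phi @ x0)
print("recovery error:", np.linalg.norm(x_hat - x0))
```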

More than 2 Functionals

$\min_x G_1(x) + \ldots + G_k(x)$, each $G_i$ simple
$\;\Longleftrightarrow\; \min_{(x_1, \ldots, x_k)} G(x_1, \ldots, x_k) + \iota_C(x_1, \ldots, x_k)$

with $G(x_1, \ldots, x_k) = G_1(x_1) + \ldots + G_k(x_k)$ and $C = \{ (x_1, \ldots, x_k) \in \mathcal{H}^k \,;\, x_1 = \ldots = x_k \}$.

$G$ and $\iota_C$ are simple:
$\operatorname{Prox}_{\gamma G}(x_1, \ldots, x_k) = (\operatorname{Prox}_{\gamma G_i}(x_i))_i$,
$\operatorname{Prox}_{\gamma \iota_C}(x_1, \ldots, x_k) = (\bar x, \ldots, \bar x)$ where $\bar x = \tfrac{1}{k} \sum_i x_i$.

Auxiliary Variables

$\min_x G_1(x) + G_2 \circ A(x)$, with $G_1, G_2$ simple and $A : \mathcal{H} \to \mathcal{E}$ a linear map
$\;\Longleftrightarrow\; \min_{z \in \mathcal{H} \times \mathcal{E}} G(z) + \iota_C(z)$, with $G(x, y) = G_1(x) + G_2(y)$ and $C = \{(x, y) \in \mathcal{H} \times \mathcal{E} \,;\, Ax = y\}$.

$\operatorname{Prox}_{\gamma G}(x, y) = (\operatorname{Prox}_{\gamma G_1}(x), \operatorname{Prox}_{\gamma G_2}(y))$.

$\operatorname{Prox}_{\iota_C}(x, y) = (x - A^* \hat y,\ y + \hat y) = (\tilde x, A \tilde x)$, where
$\hat y = (\mathrm{Id} + AA^*)^{-1}(Ax - y)$ and $\tilde x = (\mathrm{Id} + A^*A)^{-1}(A^* y + x)$.

$\Longrightarrow$ efficient if $\mathrm{Id} + AA^*$ or $\mathrm{Id} + A^*A$ is easy to invert.

Example: TV Regularization

Compute the solution of $\min_f \tfrac{1}{2}\|Kf - y\|^2 + \lambda \|\nabla f\|_1$, where $\|u\|_1 = \sum_i \|u_i\|$ sums the norms of the gradient vectors.

Rewrite it with an auxiliary variable $u = \nabla f$:

$G_1(u) = \|u\|_1$, $\operatorname{Prox}_{\gamma G_1}(u)_i = \max\!\left(0,\ 1 - \dfrac{\gamma}{\|u_i\|}\right) u_i$;

$G_2(f) = \tfrac{1}{2}\|Kf - y\|^2$, $\operatorname{Prox}_{\gamma G_2}(f) = (\mathrm{Id} + \gamma K^*K)^{-1}(f + \gamma K^* y)$;

$C = \{ (f, u) \in \mathbb{R}^N \times \mathbb{R}^{N \times 2} \,;\, u = \nabla f \}$, $\operatorname{Prox}_{\iota_C}(f, u) = (\tilde f, \nabla \tilde f)$ with $(\mathrm{Id} + \Delta)\tilde f = -\operatorname{div}(u) + f$, where $\Delta = \nabla^*\nabla$.

$\Longrightarrow$ $O(N \log N)$ operations using the FFT.

Illustration: original $f_0$, measurements $y = K f_0 + w$, recovery $f^\star$, and the decay of the objective along the iterations.
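A minimal sketch (assumptions: NumPy, periodic boundary conditions, illustrative function names) of two building blocks used above: a discrete gradient/divergence pair and the vectorial soft-thresholding prox of $\|u\|_1 = \sum_i \|u_i\|$.

```python
import numpy as np

def grad(f):
    """Forward-difference gradient of an image, shape (n, n, 2)."""
    gx = np.roll(f, -1, axis=0) - f
    gy = np.roll(f, -1, axis=1) - f
    return np.stack([gx, gy], axis=-1)

def div(u):
    """Discrete divergence with backward differences, so that grad^* = -div."""
    dx = u[..., 0] - np.roll(u[..., 0], 1, axis=0)
    dy = u[..., 1] - np.roll(u[..., 1], 1, axis=1)
    return dx + dy

def prox_tv_term(u, gamma):
    """Vectorial soft thresholding: prox of gamma * sum_i ||u_i|| on the gradient field u."""
    norm = np.maximum(np.sqrt(np.sum(u ** 2, axis=-1, keepdims=True)), 1e-12)
    return np.maximum(0.0, 1.0 - gamma / norm) * u

f = np.random.default_rng(8).standard_normal((64, 64))
u = prox_tv_term(grad(f), gamma=0.5)   # shrinks small gradient vectors to zero
print(u.shape, np.mean(np.all(u == 0, axis=-1)))
```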

Conclusion

Sparsity: approximate signals with few atoms of a dictionary.

Compressed sensing ideas:
$\Longrightarrow$ CS is about designing new hardware.
$\Longrightarrow$ Randomized sensors + sparse recovery.
$\Longrightarrow$ Number of measurements $\sim$ signal complexity.

The devil is in the constants:
$\Longrightarrow$ Worst-case analysis is problematic.
$\Longrightarrow$ Designing good signal models.

dictionary

ConclusionSparsity: approximate signals with few atoms.

Dictionary learning:

learning

Some Hot Topics

MAIRAL et al.: SPARSE REPRESENTATION FOR COLOR IMAGE RESTORATION 57

Fig. 2. Dictionaries with 256 atoms learned on a generic database of natural images, with two different sizes of patches. Note the large number of color-less atoms.Since the atoms can have negative values, the vectors are presented scaled and shifted to the [0,255] range per channel: (a) 5 5 3 patches; (b) 8 8 3 patches.

Fig. 3. Examples of color artifacts while reconstructing a damaged version of the image (a) without the improvement here proposed ( in the new metric).Color artifacts are reduced with our proposed technique ( in our proposed new metric). Both images have been denoised with the same global dictionary.In (b), one observes a bias effect in the color from the castle and in some part of the water. What is more, the color of the sky is piecewise constant when(false contours), which is another artifact our approach corrected. (a) Original. (b) Original algorithm, dB. (c) Proposed algorithm,

dB.

Fig. 4. (a) Training Image; (b) resulting dictionary; (b) is the dictionary learned in the image in (a). The dictionary is more colored than the global one.

MAIRAL et al.: SPARSE REPRESENTATION FOR COLOR IMAGE RESTORATION 57

Fig. 2. Dictionaries with 256 atoms learned on a generic database of natural images, with two different sizes of patches. Note the large number of color-less atoms.Since the atoms can have negative values, the vectors are presented scaled and shifted to the [0,255] range per channel: (a) 5 5 3 patches; (b) 8 8 3 patches.

Fig. 3. Examples of color artifacts while reconstructing a damaged version of the image (a) without the improvement here proposed ( in the new metric).Color artifacts are reduced with our proposed technique ( in our proposed new metric). Both images have been denoised with the same global dictionary.In (b), one observes a bias effect in the color from the castle and in some part of the water. What is more, the color of the sky is piecewise constant when(false contours), which is another artifact our approach corrected. (a) Original. (b) Original algorithm, dB. (c) Proposed algorithm,

dB.

Fig. 4. (a) Training Image; (b) resulting dictionary; (b) is the dictionary learned in the image in (a). The dictionary is more colored than the global one.

MAIRAL et al.: SPARSE REPRESENTATION FOR COLOR IMAGE RESTORATION 57

Fig. 2. Dictionaries with 256 atoms learned on a generic database of natural images, with two different sizes of patches. Note the large number of color-less atoms.Since the atoms can have negative values, the vectors are presented scaled and shifted to the [0,255] range per channel: (a) 5 5 3 patches; (b) 8 8 3 patches.

Fig. 3. Examples of color artifacts while reconstructing a damaged version of the image (a) without the improvement here proposed ( in the new metric).Color artifacts are reduced with our proposed technique ( in our proposed new metric). Both images have been denoised with the same global dictionary.In (b), one observes a bias effect in the color from the castle and in some part of the water. What is more, the color of the sky is piecewise constant when(false contours), which is another artifact our approach corrected. (a) Original. (b) Original algorithm, dB. (c) Proposed algorithm,

dB.

Fig. 4. (a) Training Image; (b) resulting dictionary; (b) is the dictionary learned in the image in (a). The dictionary is more colored than the global one.

MA

IRA

Letal.:SPA

RSE

RE

PRE

SEN

TAT

ION

FOR

CO

LO

RIM

AG

ER

EST

OR

AT

ION

61

Fig.7.D

atasetused

forevaluating

denoisingexperim

ents.

TAB

LE

IPSN

RR

ESU

LTS

OF

OU

RD

EN

OISIN

GA

LG

OR

ITH

MW

ITH

256A

TO

MS

OF

SIZ

E7

73

FOR

AN

D6

63

FOR

.EA

CH

CA

SEIS

DIV

IDE

DIN

FO

UR

PA

RT

S:TH

ET

OP-L

EFT

RE

SULT

SA

RE

TH

OSE

GIV

EN

BY

MCA

UL

EY

AN

DA

L[28]W

ITH

TH

EIR

“33

MO

DE

L.”T

HE

TO

P-RIG

HT

RE

SULT

SA

RE

TH

OSE

OB

TAIN

ED

BY

APPLY

ING

TH

EG

RA

YSC

AL

EK

-SVD

AL

GO

RIT

HM

[2]O

NE

AC

HC

HA

NN

EL

SE

PAR

AT

ELY

WIT

H8

8A

TO

MS.T

HE

BO

TT

OM

-LE

FTA

RE

OU

RR

ESU

LTS

OB

TAIN

ED

WIT

HA

GL

OB

AL

LYT

RA

INE

DD

ICT

ION

AR

Y.TH

EB

OT

TO

M-R

IGH

TA

RE

TH

EIM

PRO

VE

ME

NT

SO

BTA

INE

DW

ITH

TH

EA

DA

PTIV

EA

PPRO

AC

HW

ITH

20IT

ER

AT

ION

S.B

OL

DIN

DIC

AT

ES

TH

EB

EST

RE

SULT

SFO

RE

AC

HG

RO

UP.

AS

CA

NB

ESE

EN,

OU

RP

RO

POSE

DT

EC

HN

IQU

EC

ON

SISTE

NT

LYP

RO

DU

CE

ST

HE

BE

STR

ESU

LTS

TAB

LE

IIC

OM

PAR

ISON

OF

TH

EPSN

RR

ESU

LTS

ON

TH

EIM

AG

E“C

AST

LE”

BE

TW

EE

N[28]

AN

DW

HA

TW

EO

BTA

INE

DW

ITH

2566

63

AN

D7

73

PA

TC

HE

S.F

OR

TH

EA

DA

PTIV

EA

PPRO

AC

H,20IT

ER

AT

ION

SH

AV

EB

EE

NP

ER

FOR

ME

D.BO

LD

IND

ICA

TE

ST

HE

BE

STR

ESU

LT,IN

DIC

AT

ING

ON

CE

AG

AIN

TH

EC

ON

SISTE

NT

IMPR

OV

EM

EN

TO

BTA

INE

DW

ITH

OU

RP

RO

POSE

DT

EC

HN

IQU

E

patch),inorderto

preventanylearning

ofthese

artifacts(over-

fitting).W

edefine

thenthe

patchsparsity

ofthe

decompo-

sitionas

thisnum

berof

steps.The

stoppingcriteria

in(2)

be-com

esthe

number

ofatom

sused

insteadof

thereconstruction

error.Using

asm

allduring

theO

MP

permits

tolearn

adic-

tionaryspecialized

inproviding

acoarse

approximation.

Our

assumption

isthat

(pattern)artifacts

areless

presentin

coarseapproxim

ations,preventingthe

dictionaryfrom

learningthem

.W

epropose

thenthe

algorithmdescribed

inFig.6.W

etypically

usedto

preventthe

learningof

artifactsand

foundout

thattwo

outeriterationsin

theschem

ein

Fig.6are

sufficienttogive

satisfactoryresults,w

hilew

ithinthe

K-SV

D,10–20

itera-tions

arerequired.

Toconclude,in

ordertoaddressthe

demosaicing

problem,w

euse

them

odifiedK

-SVD

algorithmthatdeals

with

nonuniformnoise,as

describedin

previoussection,and

addto

itanadaptive

dictionarythathas

beenlearned

with

lowpatch

sparsityin

orderto

avoidover-fitting

them

osaicpattern.T

hesam

etechnique

canbe

appliedto

genericcolor

inpaintingas

demonstrated

inthe

nextsection.

V.

EX

PER

IME

NTA

LR

ESU

LTS

We

arenow

readyto

presentthe

colorim

agedenoising,in-

painting,anddem

osaicingresultsthatare

obtainedw

iththe

pro-posed

framew

ork.

A.

Denoising

Color

Images

The

state-of-the-artperform

anceof

thealgorithm

ongrayscale

images

hasalready

beenstudied

in[2].

We

nowevaluate

ourextension

forcolor

images.

We

trainedsom

edictionaries

with

differentsizesof

atoms

55

3,66

3,7

73

and8

83,

on200

000patches

takenfrom

adatabase

of15

000im

agesw

iththe

patch-sparsityparam

eter(six

atoms

inthe

representations).We

usedthe

databaseL

abelMe

[55]to

buildour

image

database.T

henw

etrained

eachdictionary

with

600iterations.

This

providedus

aset

ofgeneric

dictionariesthat

we

usedas

initialdictionaries

inour

denoisingalgorithm

.C

omparing

theresults

obtainedw

iththe

globalapproach

andthe

adaptiveone

permits

usto

seethe

improvem

entsin

thelearning

process.W

echose

toevaluate

MAIRAL et al.: SPARSE REPRESENTATION FOR COLOR IMAGE RESTORATION 61

Fig. 7. Data set used for evaluating denoising experiments.

TABLE IPSNR RESULTS OF OUR DENOISING ALGORITHM WITH 256 ATOMS OF SIZE 7 7 3 FOR AND 6 6 3 FOR . EACH CASE IS DIVIDED IN FOURPARTS: THE TOP-LEFT RESULTS ARE THOSE GIVEN BY MCAULEY AND AL [28] WITH THEIR “3 3 MODEL.” THE TOP-RIGHT RESULTS ARE THOSE OBTAINED BY

APPLYING THE GRAYSCALE K-SVD ALGORITHM [2] ON EACH CHANNEL SEPARATELY WITH 8 8 ATOMS. THE BOTTOM-LEFT ARE OUR RESULTS OBTAINEDWITH A GLOBALLY TRAINED DICTIONARY. THE BOTTOM-RIGHT ARE THE IMPROVEMENTS OBTAINED WITH THE ADAPTIVE APPROACH WITH 20 ITERATIONS.

BOLD INDICATES THE BEST RESULTS FOR EACH GROUP. AS CAN BE SEEN, OUR PROPOSED TECHNIQUE CONSISTENTLY PRODUCES THE BEST RESULTS

TABLE IICOMPARISON OF THE PSNR RESULTS ON THE IMAGE “CASTLE” BETWEEN [28] AND WHAT WE OBTAINED WITH 256 6 6 3 AND 7 7 3 PATCHES.

FOR THE ADAPTIVE APPROACH, 20 ITERATIONS HAVE BEEN PERFORMED. BOLD INDICATES THE BEST RESULT, INDICATING ONCEAGAIN THE CONSISTENT IMPROVEMENT OBTAINED WITH OUR PROPOSED TECHNIQUE

patch), in order to prevent any learning of these artifacts (over-fitting). We define then the patch sparsity of the decompo-sition as this number of steps. The stopping criteria in (2) be-comes the number of atoms used instead of the reconstructionerror. Using a small during the OMP permits to learn a dic-tionary specialized in providing a coarse approximation. Ourassumption is that (pattern) artifacts are less present in coarseapproximations, preventing the dictionary from learning them.We propose then the algorithm described in Fig. 6. We typicallyused to prevent the learning of artifacts and found outthat two outer iterations in the scheme in Fig. 6 are sufficient togive satisfactory results, while within the K-SVD, 10–20 itera-tions are required.

To conclude, in order to address the demosaicing problem, weuse the modified K-SVD algorithm that deals with nonuniformnoise, as described in previous section, and add to it an adaptivedictionary that has been learned with low patch sparsity in orderto avoid over-fitting the mosaic pattern. The same technique canbe applied to generic color inpainting as demonstrated in thenext section.

V. EXPERIMENTAL RESULTS

We are now ready to present the color image denoising, in-painting, and demosaicing results that are obtained with the pro-posed framework.

A. Denoising Color Images

The state-of-the-art performance of the algorithm ongrayscale images has already been studied in [2]. We nowevaluate our extension for color images. We trained somedictionaries with different sizes of atoms 5 5 3, 6 6 3,7 7 3 and 8 8 3, on 200 000 patches taken from adatabase of 15 000 images with the patch-sparsity parameter

(six atoms in the representations). We used the databaseLabelMe [55] to build our image database. Then we trainedeach dictionary with 600 iterations. This provided us a set ofgeneric dictionaries that we used as initial dictionaries in ourdenoising algorithm. Comparing the results obtained with theglobal approach and the adaptive one permits us to see theimprovements in the learning process. We chose to evaluate

Dictionary learning:

Analysis vs. synthesis:

learning

Js(f) = minf=�x

||x||1

Some Hot Topics

MAIRAL et al.: SPARSE REPRESENTATION FOR COLOR IMAGE RESTORATION 57

Fig. 2. Dictionaries with 256 atoms learned on a generic database of natural images, with two different sizes of patches. Note the large number of color-less atoms.Since the atoms can have negative values, the vectors are presented scaled and shifted to the [0,255] range per channel: (a) 5 5 3 patches; (b) 8 8 3 patches.

Fig. 3. Examples of color artifacts while reconstructing a damaged version of the image (a) without the improvement here proposed ( in the new metric).Color artifacts are reduced with our proposed technique ( in our proposed new metric). Both images have been denoised with the same global dictionary.In (b), one observes a bias effect in the color from the castle and in some part of the water. What is more, the color of the sky is piecewise constant when(false contours), which is another artifact our approach corrected. (a) Original. (b) Original algorithm, dB. (c) Proposed algorithm,

dB.

Fig. 4. (a) Training Image; (b) resulting dictionary; (b) is the dictionary learned in the image in (a). The dictionary is more colored than the global one.

MAIRAL et al.: SPARSE REPRESENTATION FOR COLOR IMAGE RESTORATION 57

Fig. 2. Dictionaries with 256 atoms learned on a generic database of natural images, with two different sizes of patches. Note the large number of color-less atoms.Since the atoms can have negative values, the vectors are presented scaled and shifted to the [0,255] range per channel: (a) 5 5 3 patches; (b) 8 8 3 patches.

Fig. 3. Examples of color artifacts while reconstructing a damaged version of the image (a) without the improvement here proposed ( in the new metric).Color artifacts are reduced with our proposed technique ( in our proposed new metric). Both images have been denoised with the same global dictionary.In (b), one observes a bias effect in the color from the castle and in some part of the water. What is more, the color of the sky is piecewise constant when(false contours), which is another artifact our approach corrected. (a) Original. (b) Original algorithm, dB. (c) Proposed algorithm,

dB.

Fig. 4. (a) Training Image; (b) resulting dictionary; (b) is the dictionary learned in the image in (a). The dictionary is more colored than the global one.

MAIRAL et al.: SPARSE REPRESENTATION FOR COLOR IMAGE RESTORATION 57

Fig. 2. Dictionaries with 256 atoms learned on a generic database of natural images, with two different sizes of patches. Note the large number of color-less atoms.Since the atoms can have negative values, the vectors are presented scaled and shifted to the [0,255] range per channel: (a) 5 5 3 patches; (b) 8 8 3 patches.

Fig. 3. Examples of color artifacts while reconstructing a damaged version of the image (a) without the improvement here proposed ( in the new metric).Color artifacts are reduced with our proposed technique ( in our proposed new metric). Both images have been denoised with the same global dictionary.In (b), one observes a bias effect in the color from the castle and in some part of the water. What is more, the color of the sky is piecewise constant when(false contours), which is another artifact our approach corrected. (a) Original. (b) Original algorithm, dB. (c) Proposed algorithm,

dB.

Fig. 4. (a) Training Image; (b) resulting dictionary; (b) is the dictionary learned in the image in (a). The dictionary is more colored than the global one.

MA

IRA

Letal.:SPA

RSE

RE

PRE

SEN

TAT

ION

FOR

CO

LO

RIM

AG

ER

EST

OR

AT

ION

61

Fig.7.D

atasetused

forevaluating

denoisingexperim

ents.

TAB

LE

IPSN

RR

ESU

LTS

OF

OU

RD

EN

OISIN

GA

LG

OR

ITH

MW

ITH

256A

TO

MS

OF

SIZ

E7

73

FOR

AN

D6

63

FOR

.EA

CH

CA

SEIS

DIV

IDE

DIN

FO

UR

PA

RT

S:TH

ET

OP-L

EFT

RE

SULT

SA

RE

TH

OSE

GIV

EN

BY

MCA

UL

EY

AN

DA

L[28]W

ITH

TH

EIR

“33

MO

DE

L.”T

HE

TO

P-RIG

HT

RE

SULT

SA

RE

TH

OSE

OB

TAIN

ED

BY

APPLY

ING

TH

EG

RA

YSC

AL

EK

-SVD

AL

GO

RIT

HM

[2]O

NE

AC

HC

HA

NN

EL

SE

PAR

AT

ELY

WIT

H8

8A

TO

MS.T

HE

BO

TT

OM

-LE

FTA

RE

OU

RR

ESU

LTS

OB

TAIN

ED

WIT

HA

GL

OB

AL

LYT

RA

INE

DD

ICT

ION

AR

Y.TH

EB

OT

TO

M-R

IGH

TA

RE

TH

EIM

PRO

VE

ME

NT

SO

BTA

INE

DW

ITH

TH

EA

DA

PTIV

EA

PPRO

AC

HW

ITH

20IT

ER

AT

ION

S.B

OL

DIN

DIC

AT

ES

TH

EB

EST

RE

SULT

SFO

RE

AC

HG

RO

UP.

AS

CA

NB

ESE

EN,

OU

RP

RO

POSE

DT

EC

HN

IQU

EC

ON

SISTE

NT

LYP

RO

DU

CE

ST

HE

BE

STR

ESU

LTS

TAB

LE

IIC

OM

PAR

ISON

OF

TH

EPSN

RR

ESU

LTS

ON

TH

EIM

AG

E“C

AST

LE”

BE

TW

EE

N[28]

AN

DW

HA

TW

EO

BTA

INE

DW

ITH

2566

63

AN

D7

73

PA

TC

HE

S.F

OR

TH

EA

DA

PTIV

EA

PPRO

AC

H,20IT

ER

AT

ION

SH

AV

EB

EE

NP

ER

FOR

ME

D.BO

LD

IND

ICA

TE

ST

HE

BE

STR

ESU

LT,IN

DIC

AT

ING

ON

CE

AG

AIN

TH

EC

ON

SISTE

NT

IMPR

OV

EM

EN

TO

BTA

INE

DW

ITH

OU

RP

RO

POSE

DT

EC

HN

IQU

E

patch),inorderto

preventanylearning

ofthese

artifacts(over-

fitting).W

edefine

thenthe

patchsparsity

ofthe

decompo-

sitionas

thisnum

berof

steps.The

stoppingcriteria

in(2)

be-com

esthe

number

ofatom

sused

insteadof

thereconstruction

error.Using

asm

allduring

theO

MP

permits

tolearn

adic-

tionaryspecialized

inproviding

acoarse

approximation.

Our

assumption

isthat

(pattern)artifacts

areless

presentin

coarseapproxim

ations,preventingthe

dictionaryfrom

learningthem

.W

epropose

thenthe

algorithmdescribed

inFig.6.W

etypically

usedto

preventthe

learningof

artifactsand

foundout

thattwo

outeriterationsin

theschem

ein

Fig.6are

sufficienttogive

satisfactoryresults,w

hilew

ithinthe

K-SV

D,10–20

itera-tions

arerequired.

Toconclude,in

ordertoaddressthe

demosaicing

problem,w

euse

them

odifiedK

-SVD

algorithmthatdeals

with

nonuniformnoise,as

describedin

previoussection,and

addto

itanadaptive

dictionarythathas

beenlearned

with

lowpatch

sparsityin

orderto

avoidover-fitting

them

osaicpattern.T

hesam

etechnique

canbe

appliedto

genericcolor

inpaintingas

demonstrated

inthe

nextsection.

V.

EX

PER

IME

NTA

LR

ESU

LTS

We

arenow

readyto

presentthe

colorim

agedenoising,in-

painting,anddem

osaicingresultsthatare

obtainedw

iththe

pro-posed

framew

ork.

A.

Denoising

Color

Images

The

state-of-the-artperform

anceof

thealgorithm

ongrayscale

images

hasalready

beenstudied

in[2].

We

nowevaluate

ourextension

forcolor

images.

We

trainedsom

edictionaries

with

differentsizesof

atoms

55

3,66

3,7

73

and8

83,

on200

000patches

takenfrom

adatabase

of15

000im

agesw

iththe

patch-sparsityparam

eter(six

atoms

inthe

representations).We

usedthe

databaseL

abelMe

[55]to

buildour

image

database.T

henw

etrained

eachdictionary

with

600iterations.

This

providedus

aset

ofgeneric

dictionariesthat

we

usedas

initialdictionaries

inour

denoisingalgorithm

.C

omparing

theresults

obtainedw

iththe

globalapproach

andthe

adaptiveone

permits

usto

seethe

improvem

entsin

thelearning

process.W

echose

toevaluate

MAIRAL et al.: SPARSE REPRESENTATION FOR COLOR IMAGE RESTORATION 61

Fig. 7. Data set used for evaluating denoising experiments.

TABLE IPSNR RESULTS OF OUR DENOISING ALGORITHM WITH 256 ATOMS OF SIZE 7 7 3 FOR AND 6 6 3 FOR . EACH CASE IS DIVIDED IN FOURPARTS: THE TOP-LEFT RESULTS ARE THOSE GIVEN BY MCAULEY AND AL [28] WITH THEIR “3 3 MODEL.” THE TOP-RIGHT RESULTS ARE THOSE OBTAINED BY

APPLYING THE GRAYSCALE K-SVD ALGORITHM [2] ON EACH CHANNEL SEPARATELY WITH 8 8 ATOMS. THE BOTTOM-LEFT ARE OUR RESULTS OBTAINEDWITH A GLOBALLY TRAINED DICTIONARY. THE BOTTOM-RIGHT ARE THE IMPROVEMENTS OBTAINED WITH THE ADAPTIVE APPROACH WITH 20 ITERATIONS.

BOLD INDICATES THE BEST RESULTS FOR EACH GROUP. AS CAN BE SEEN, OUR PROPOSED TECHNIQUE CONSISTENTLY PRODUCES THE BEST RESULTS

TABLE IICOMPARISON OF THE PSNR RESULTS ON THE IMAGE “CASTLE” BETWEEN [28] AND WHAT WE OBTAINED WITH 256 6 6 3 AND 7 7 3 PATCHES.

FOR THE ADAPTIVE APPROACH, 20 ITERATIONS HAVE BEEN PERFORMED. BOLD INDICATES THE BEST RESULT, INDICATING ONCEAGAIN THE CONSISTENT IMPROVEMENT OBTAINED WITH OUR PROPOSED TECHNIQUE

patch), in order to prevent any learning of these artifacts (over-fitting). We define then the patch sparsity of the decompo-sition as this number of steps. The stopping criteria in (2) be-comes the number of atoms used instead of the reconstructionerror. Using a small during the OMP permits to learn a dic-tionary specialized in providing a coarse approximation. Ourassumption is that (pattern) artifacts are less present in coarseapproximations, preventing the dictionary from learning them.We propose then the algorithm described in Fig. 6. We typicallyused to prevent the learning of artifacts and found outthat two outer iterations in the scheme in Fig. 6 are sufficient togive satisfactory results, while within the K-SVD, 10–20 itera-tions are required.

To conclude, in order to address the demosaicing problem, weuse the modified K-SVD algorithm that deals with nonuniformnoise, as described in previous section, and add to it an adaptivedictionary that has been learned with low patch sparsity in orderto avoid over-fitting the mosaic pattern. The same technique canbe applied to generic color inpainting as demonstrated in thenext section.

V. EXPERIMENTAL RESULTS

We are now ready to present the color image denoising, in-painting, and demosaicing results that are obtained with the pro-posed framework.

A. Denoising Color Images

The state-of-the-art performance of the algorithm ongrayscale images has already been studied in [2]. We nowevaluate our extension for color images. We trained somedictionaries with different sizes of atoms 5 5 3, 6 6 3,7 7 3 and 8 8 3, on 200 000 patches taken from adatabase of 15 000 images with the patch-sparsity parameter

(six atoms in the representations). We used the databaseLabelMe [55] to build our image database. Then we trainedeach dictionary with 600 iterations. This provided us a set ofgeneric dictionaries that we used as initial dictionaries in ourdenoising algorithm. Comparing the results obtained with theglobal approach and the adaptive one permits us to see theimprovements in the learning process. We chose to evaluate

Image f = �x

Coe�cients x

Dictionary learning:

Analysis vs. synthesis:

learning

Ja(f) = ||D�f ||1

Js(f) = minf=�x

||x||1

Some Hot Topics

[Excerpt: Mairal et al., "Sparse Representation for Color Image Restoration" (figure captions):]

Fig. 2. Dictionaries with 256 atoms learned on a generic database of natural images, with two different sizes of patches. Note the large number of colorless atoms. Since the atoms can have negative values, the vectors are presented scaled and shifted to the [0,255] range per channel: (a) 5×5×3 patches; (b) 8×8×3 patches.

Fig. 3. Examples of color artifacts when reconstructing a damaged version of the image (a) without the proposed improvement. Color artifacts are reduced with the proposed technique; both images have been denoised with the same global dictionary. In (b), one observes a bias effect in the color of the castle and in some parts of the water. What is more, the color of the sky is piecewise constant (false contours), which is another artifact the proposed approach corrects. (a) Original. (b) Original algorithm. (c) Proposed algorithm.

Fig. 4. (a) Training image; (b) resulting dictionary, learned on the image in (a). The dictionary is more colored than the global one.
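Dictionaries like those of Fig. 2 are trained on vectorized color patches: a w×w patch with three channels becomes a column of length 3w² (75 for 5×5×3). Here is a small sketch of how such a training matrix can be assembled; the image array img and all sizes are hypothetical.

    import numpy as np

    def extract_color_patches(img, w=5, n_patches=10000, seed=0):
        """Collect random w x w x 3 patches from a color image `img` (H, W, 3)
        and vectorize them into the columns of a (3*w*w, n_patches) matrix."""
        rng = np.random.default_rng(seed)
        H, W, _ = img.shape
        ys = rng.integers(0, H - w + 1, n_patches)            # random top-left corners
        xs = rng.integers(0, W - w + 1, n_patches)
        P = np.stack([img[y:y + w, x:x + w, :].ravel() for y, x in zip(ys, xs)], axis=1)
        return P - P.mean(axis=0, keepdims=True)              # remove each patch's mean

    # usage (hypothetical image): Y = extract_color_patches(img, w=5)
    # the columns of Y are then the training vectors for dictionary learning.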


Other sparse priors:
    |x1| + |x2|                        (ℓ1 norm: sparse coefficients)
    max(|x1|, |x2|)                    (ℓ∞ norm)
    |x1| + (x2^2 + x3^2)^(1/2)         (mixed ℓ1/ℓ2 norm: group sparsity)
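These penalties, and the block thresholding associated with the group norm, are straightforward to code. A minimal NumPy sketch with helper names of my own (indices are zero-based, so the grouping [[0], [1, 2]] matches the last formula above):

    import numpy as np

    def l1(x):                      # |x1| + |x2| + ... : sparse coefficients
        return np.abs(x).sum()

    def linf(x):                    # max_i |x_i| : the l-infinity norm
        return np.abs(x).max()

    def group_l1(x, groups):
        """Mixed l1/l2 norm sum_g ||x_g||_2, e.g. |x1| + sqrt(x2^2 + x3^2):
        promotes sparsity at the level of whole groups of coefficients."""
        return sum(np.linalg.norm(x[g]) for g in groups)

    def prox_group_l1(x, groups, lam):
        """Block soft-thresholding: proximal operator of lam * group_l1."""
        y = x.copy()
        for g in groups:
            n = np.linalg.norm(x[g])
            y[g] = 0.0 if n <= lam else (1 - lam / n) * x[g]
        return y

    x = np.array([0.5, 1.0, -2.0])
    groups = [[0], [1, 2]]
    print(l1(x), linf(x), group_l1(x, groups))   # 3.5, 2.0, 0.5 + sqrt(5)

The group prox zeroes a whole group when its ℓ2 norm falls below the threshold, which is exactly how such a prior enforces structured (group) sparsity.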

[Excerpt on atomic norms, from Chandrasekaran et al., "The Convex Geometry of Linear Inverse Problems":]

Figure 1: Unit balls of some atomic norms. In each figure, the set of atoms is graphed in red and the unit ball of the associated atomic norm is graphed in blue. In (a), the atoms are the unit-Euclidean-norm one-sparse vectors, and the atomic norm is the ℓ1 norm. In (b), the atoms are the 2×2 symmetric unit-Euclidean-norm rank-one matrices, and the atomic norm is the nuclear norm. In (c), the atoms are the vectors {−1,+1}², and the atomic norm is the ℓ∞ norm.

[...] natural procedure to go from the set of one-sparse vectors A to the ℓ1 norm? We observe that the convex hull of (unit-Euclidean-norm) one-sparse vectors is the unit ball of the ℓ1 norm, or the cross-polytope. Similarly the convex hull of the (unit-Euclidean-norm) rank-one matrices is the nuclear norm ball; see Figure 1 for illustrations. These constructions suggest a natural generalization to other settings. Under suitable conditions the convex hull conv(A) defines the unit ball of a norm, which is called the atomic norm induced by the atomic set A. We can then minimize the atomic norm subject to measurement constraints, which results in a convex programming heuristic for recovering simple models given linear measurements. As an example, suppose we wish to recover the sum of a few permutation matrices given linear measurements. The convex hull of the set of permutation matrices is the Birkhoff polytope of doubly stochastic matrices [73], and our proposal is to solve a convex program that minimizes the norm induced by this polytope. Similarly, if we wish to recover an orthogonal matrix from linear measurements, we would solve a spectral norm minimization problem, as the spectral norm ball is the convex hull of all orthogonal matrices. As discussed in Section 2.5, the atomic norm minimization problem is, in some sense, the best convex heuristic for recovering simple models with respect to a given atomic set.

We give general conditions for exact and robust recovery using the atomic norm heuristic. In Section 3 we provide concrete bounds on the number of generic linear measurements required for the atomic norm heuristic to succeed. This analysis is based on computing certain Gaussian widths of tangent cones with respect to the unit balls of the atomic norm [37]. Arguments based on Gaussian width have been fruitfully applied to obtain bounds on the number of Gaussian measurements for the special case of recovering sparse vectors via ℓ1 norm minimization [64, 67], but computing Gaussian widths of general cones is not easy. Therefore it is important to exploit the special structure in atomic norms, while still obtaining sufficiently general results that are broadly applicable. An important theme in this paper is the connection between Gaussian widths and various notions of symmetry. Specifically, by exploiting symmetry structure in certain atomic norms as well as convex duality properties, we give bounds on the number of measurements required for recovery using very general atomic norm heuristics. For example, we provide precise estimates of the number of generic measurements required for exact recovery of an orthogonal matrix via spectral norm minimization, and the number of generic measurements required for exact recovery of a permutation matrix by minimizing the norm induced by the Birkhoff polytope. While these results correspond [...]
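For the atomic set of one-sparse vectors, "minimize the atomic norm subject to measurement constraints" is ℓ1 minimization under linear equality constraints (basis pursuit). The sketch below solves a random instance with a Douglas-Rachford-type proximal iteration; it is a generic illustration with made-up sizes, not an algorithm taken from the excerpt.

    import numpy as np

    def soft_threshold(x, t):
        """Proximal operator of t * ||.||_1 (component-wise shrinkage)."""
        return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

    def basis_pursuit_dr(K, y, gamma=1.0, n_iter=500):
        """min ||x||_1  s.t.  K x = y, via Douglas-Rachford splitting
        (K is assumed to have full row rank)."""
        pinvK = K.T @ np.linalg.inv(K @ K.T)
        proj = lambda x: x + pinvK @ (y - K @ x)      # projection onto {K x = y}
        z = np.zeros(K.shape[1])
        for _ in range(n_iter):
            x = proj(z)
            z = z + soft_threshold(2 * x - z, gamma) - x
        return proj(z)

    rng = np.random.default_rng(1)
    N, P, s = 40, 20, 3                               # ambient dim, measurements, sparsity
    x0 = np.zeros(N)
    x0[rng.choice(N, s, replace=False)] = rng.standard_normal(s)
    K = rng.standard_normal((P, N))                   # generic Gaussian measurements
    x_rec = basis_pursuit_dr(K, K @ x0)
    print(np.linalg.norm(x_rec - x0))                 # small when l1 recovery succeeds

The same template covers the other atomic norms mentioned in the excerpt: replacing the soft-thresholding step by the proximal operator of the nuclear norm (singular-value thresholding) gives low-rank matrix recovery, and so on.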

3

(a) (b) (c)

Figure 1: Unit balls of some atomic norms: In each figure, the set of atoms is graphed in red andthe unit ball of the associated atomic norm is graphed in blue. In (a), the atoms are the unit-Euclidean-norm one-sparse vectors, and the atomic norm is the !1 norm. In (b), the atoms are the2!2 symmetric unit-Euclidean-norm rank-one matrices, and the atomic norm is the nuclear norm.In (c), the atoms are the vectors {"1,+1}2, and the atomic norm is the !! norm.

natural procedure to go from the set of one-sparse vectors A to the !1 norm? We observe thatthe convex hull of (unit-Euclidean-norm) one-sparse vectors is the unit ball of the !1 norm, or thecross-polytope. Similarly the convex hull of the (unit-Euclidean-norm) rank-one matrices is thenuclear norm ball; see Figure 1 for illustrations. These constructions suggest a natural generaliza-tion to other settings. Under suitable conditions the convex hull conv(A) defines the unit ball ofa norm, which is called the atomic norm induced by the atomic set A. We can then minimize theatomic norm subject to measurement constraints, which results in a convex programming heuristicfor recovering simple models given linear measurements. As an example suppose we wish to recoverthe sum of a few permutation matrices given linear measurements. The convex hull of the set ofpermutation matrices is the Birkho! polytope of doubly stochastic matrices [73], and our proposalis to solve a convex program that minimizes the norm induced by this polytope. Similarly if wewish to recover an orthogonal matrix from linear measurements we would solve a spectral normminimization problem, as the spectral norm ball is the convex hull of all orthogonal matrices. Asdiscussed in Section 2.5 the atomic norm minimization problem is, in some sense, the best convexheuristic for recovering simple models with respect to a given atomic set.

We give general conditions for exact and robust recovery using the atomic norm heuristic. InSection 3 we provide concrete bounds on the number of generic linear measurements required forthe atomic norm heuristic to succeed. This analysis is based on computing certain Gaussian widthsof tangent cones with respect to the unit balls of the atomic norm [37]. Arguments based on Gaus-sian width have been fruitfully applied to obtain bounds on the number of Gaussian measurementsfor the special case of recovering sparse vectors via !1 norm minimization [64, 67], but computingGaussian widths of general cones is not easy. Therefore it is important to exploit the special struc-ture in atomic norms, while still obtaining su!ciently general results that are broadly applicable.An important theme in this paper is the connection between Gaussian widths and various notionsof symmetry. Specifically by exploiting symmetry structure in certain atomic norms as well as con-vex duality properties, we give bounds on the number of measurements required for recovery usingvery general atomic norm heuristics. For example we provide precise estimates of the number ofgeneric measurements required for exact recovery of an orthogonal matrix via spectral norm min-imization, and the number of generic measurements required for exact recovery of a permutationmatrix by minimizing the norm induced by the Birkho" polytope. While these results correspond

3

Dictionary learning:

Analysis vs. synthesis:

learning

Ja(f) = ||D�f ||1

Js(f) = minf=�x

||x||1

|x1| + (x22 + x2

3)12

Some Hot Topics

MAIRAL et al.: SPARSE REPRESENTATION FOR COLOR IMAGE RESTORATION 57

Fig. 2. Dictionaries with 256 atoms learned on a generic database of natural images, with two different sizes of patches. Note the large number of color-less atoms.Since the atoms can have negative values, the vectors are presented scaled and shifted to the [0,255] range per channel: (a) 5 5 3 patches; (b) 8 8 3 patches.

Fig. 3. Examples of color artifacts while reconstructing a damaged version of the image (a) without the improvement here proposed ( in the new metric).Color artifacts are reduced with our proposed technique ( in our proposed new metric). Both images have been denoised with the same global dictionary.In (b), one observes a bias effect in the color from the castle and in some part of the water. What is more, the color of the sky is piecewise constant when(false contours), which is another artifact our approach corrected. (a) Original. (b) Original algorithm, dB. (c) Proposed algorithm,

dB.

Fig. 4. (a) Training Image; (b) resulting dictionary; (b) is the dictionary learned in the image in (a). The dictionary is more colored than the global one.

MAIRAL et al.: SPARSE REPRESENTATION FOR COLOR IMAGE RESTORATION 57

Fig. 2. Dictionaries with 256 atoms learned on a generic database of natural images, with two different sizes of patches. Note the large number of color-less atoms.Since the atoms can have negative values, the vectors are presented scaled and shifted to the [0,255] range per channel: (a) 5 5 3 patches; (b) 8 8 3 patches.

Fig. 3. Examples of color artifacts while reconstructing a damaged version of the image (a) without the improvement here proposed ( in the new metric).Color artifacts are reduced with our proposed technique ( in our proposed new metric). Both images have been denoised with the same global dictionary.In (b), one observes a bias effect in the color from the castle and in some part of the water. What is more, the color of the sky is piecewise constant when(false contours), which is another artifact our approach corrected. (a) Original. (b) Original algorithm, dB. (c) Proposed algorithm,

dB.

Fig. 4. (a) Training Image; (b) resulting dictionary; (b) is the dictionary learned in the image in (a). The dictionary is more colored than the global one.

MAIRAL et al.: SPARSE REPRESENTATION FOR COLOR IMAGE RESTORATION 57

Fig. 2. Dictionaries with 256 atoms learned on a generic database of natural images, with two different sizes of patches. Note the large number of color-less atoms.Since the atoms can have negative values, the vectors are presented scaled and shifted to the [0,255] range per channel: (a) 5 5 3 patches; (b) 8 8 3 patches.

Fig. 3. Examples of color artifacts while reconstructing a damaged version of the image (a) without the improvement here proposed ( in the new metric).Color artifacts are reduced with our proposed technique ( in our proposed new metric). Both images have been denoised with the same global dictionary.In (b), one observes a bias effect in the color from the castle and in some part of the water. What is more, the color of the sky is piecewise constant when(false contours), which is another artifact our approach corrected. (a) Original. (b) Original algorithm, dB. (c) Proposed algorithm,

dB.

Fig. 4. (a) Training Image; (b) resulting dictionary; (b) is the dictionary learned in the image in (a). The dictionary is more colored than the global one.

MA

IRA

Letal.:SPA

RSE

RE

PRE

SEN

TAT

ION

FOR

CO

LO

RIM

AG

ER

EST

OR

AT

ION

61

Fig.7.D

atasetused

forevaluating

denoisingexperim

ents.

TAB

LE

IPSN

RR

ESU

LTS

OF

OU

RD

EN

OISIN

GA

LG

OR

ITH

MW

ITH

256A

TO

MS

OF

SIZ

E7

73

FOR

AN

D6

63

FOR

.EA

CH

CA

SEIS

DIV

IDE

DIN

FO

UR

PA

RT

S:TH

ET

OP-L

EFT

RE

SULT

SA

RE

TH

OSE

GIV

EN

BY

MCA

UL

EY

AN

DA

L[28]W

ITH

TH

EIR

“33

MO

DE

L.”T

HE

TO

P-RIG

HT

RE

SULT

SA

RE

TH

OSE

OB

TAIN

ED

BY

APPLY

ING

TH

EG

RA

YSC

AL

EK

-SVD

AL

GO

RIT

HM

[2]O

NE

AC

HC

HA

NN

EL

SE

PAR

AT

ELY

WIT

H8

8A

TO

MS.T

HE

BO

TT

OM

-LE

FTA

RE

OU

RR

ESU

LTS

OB

TAIN

ED

WIT

HA

GL

OB

AL

LYT

RA

INE

DD

ICT

ION

AR

Y.TH

EB

OT

TO

M-R

IGH

TA

RE

TH

EIM

PRO

VE

ME

NT

SO

BTA

INE

DW

ITH

TH

EA

DA

PTIV

EA

PPRO

AC

HW

ITH

20IT

ER

AT

ION

S.B

OL

DIN

DIC

AT

ES

TH

EB

EST

RE

SULT

SFO

RE

AC

HG

RO

UP.

AS

CA

NB

ESE

EN,

OU

RP

RO

POSE

DT

EC

HN

IQU

EC

ON

SISTE

NT

LYP

RO

DU

CE

ST

HE

BE

STR

ESU

LTS

TAB

LE

IIC

OM

PAR

ISON

OF

TH

EPSN

RR

ESU

LTS

ON

TH

EIM

AG

E“C

AST

LE”

BE

TW

EE

N[28]

AN

DW

HA

TW

EO

BTA

INE

DW

ITH

2566

63

AN

D7

73

PA

TC

HE

S.F

OR

TH

EA

DA

PTIV

EA

PPRO

AC

H,20IT

ER

AT

ION

SH

AV

EB

EE

NP

ER

FOR

ME

D.BO

LD

IND

ICA

TE

ST

HE

BE

STR

ESU

LT,IN

DIC

AT

ING

ON

CE

AG

AIN

TH

EC

ON

SISTE

NT

IMPR

OV

EM

EN

TO

BTA

INE

DW

ITH

OU

RP

RO

POSE

DT

EC

HN

IQU

E

patch),inorderto

preventanylearning

ofthese

artifacts(over-

fitting).W

edefine

thenthe

patchsparsity

ofthe

decompo-

sitionas

thisnum

berof

steps.The

stoppingcriteria

in(2)

be-com

esthe

number

ofatom

sused

insteadof

thereconstruction

error.Using

asm

allduring

theO

MP

permits

tolearn

adic-

tionaryspecialized

inproviding

acoarse

approximation.

Our

assumption

isthat

(pattern)artifacts

areless

presentin

coarseapproxim

ations,preventingthe

dictionaryfrom

learningthem

.W

epropose

thenthe

algorithmdescribed

inFig.6.W

etypically

usedto

preventthe

learningof

artifactsand

foundout

thattwo

outeriterationsin

theschem

ein

Fig.6are

sufficienttogive

satisfactoryresults,w

hilew

ithinthe

K-SV

D,10–20

itera-tions

arerequired.

Toconclude,in

ordertoaddressthe

demosaicing

problem,w

euse

them

odifiedK

-SVD

algorithmthatdeals

with

nonuniformnoise,as

describedin

previoussection,and

addto

itanadaptive

dictionarythathas

beenlearned

with

lowpatch

sparsityin

orderto

avoidover-fitting

them

osaicpattern.T

hesam

etechnique

canbe

appliedto

genericcolor

inpaintingas

demonstrated

inthe

nextsection.

V.

EX

PER

IME

NTA

LR

ESU

LTS

We

arenow

readyto

presentthe

colorim

agedenoising,in-

painting,anddem

osaicingresultsthatare

obtainedw

iththe

pro-posed

framew

ork.

A.

Denoising

Color

Images

The

state-of-the-artperform

anceof

thealgorithm

ongrayscale

images

hasalready

beenstudied

in[2].

We

nowevaluate

ourextension

forcolor

images.

We

trainedsom

edictionaries

with

differentsizesof

atoms

55

3,66

3,7

73

and8

83,

on200

000patches

takenfrom

adatabase

of15

000im

agesw

iththe

patch-sparsityparam

eter(six

atoms

inthe

representations).We

usedthe

databaseL

abelMe

[55]to

buildour

image

database.T

henw

etrained

eachdictionary

with

600iterations.

This

providedus

aset

ofgeneric

dictionariesthat

we

usedas

initialdictionaries

inour

denoisingalgorithm

.C

omparing

theresults

obtainedw

iththe

globalapproach

andthe

adaptiveone

permits

usto

seethe

improvem

entsin

thelearning

process.W

echose

toevaluate

MAIRAL et al.: SPARSE REPRESENTATION FOR COLOR IMAGE RESTORATION 61

Fig. 7. Data set used for evaluating denoising experiments.

TABLE IPSNR RESULTS OF OUR DENOISING ALGORITHM WITH 256 ATOMS OF SIZE 7 7 3 FOR AND 6 6 3 FOR . EACH CASE IS DIVIDED IN FOURPARTS: THE TOP-LEFT RESULTS ARE THOSE GIVEN BY MCAULEY AND AL [28] WITH THEIR “3 3 MODEL.” THE TOP-RIGHT RESULTS ARE THOSE OBTAINED BY

APPLYING THE GRAYSCALE K-SVD ALGORITHM [2] ON EACH CHANNEL SEPARATELY WITH 8 8 ATOMS. THE BOTTOM-LEFT ARE OUR RESULTS OBTAINEDWITH A GLOBALLY TRAINED DICTIONARY. THE BOTTOM-RIGHT ARE THE IMPROVEMENTS OBTAINED WITH THE ADAPTIVE APPROACH WITH 20 ITERATIONS.

BOLD INDICATES THE BEST RESULTS FOR EACH GROUP. AS CAN BE SEEN, OUR PROPOSED TECHNIQUE CONSISTENTLY PRODUCES THE BEST RESULTS

TABLE IICOMPARISON OF THE PSNR RESULTS ON THE IMAGE “CASTLE” BETWEEN [28] AND WHAT WE OBTAINED WITH 256 6 6 3 AND 7 7 3 PATCHES.

FOR THE ADAPTIVE APPROACH, 20 ITERATIONS HAVE BEEN PERFORMED. BOLD INDICATES THE BEST RESULT, INDICATING ONCEAGAIN THE CONSISTENT IMPROVEMENT OBTAINED WITH OUR PROPOSED TECHNIQUE

patch), in order to prevent any learning of these artifacts (over-fitting). We define then the patch sparsity of the decompo-sition as this number of steps. The stopping criteria in (2) be-comes the number of atoms used instead of the reconstructionerror. Using a small during the OMP permits to learn a dic-tionary specialized in providing a coarse approximation. Ourassumption is that (pattern) artifacts are less present in coarseapproximations, preventing the dictionary from learning them.We propose then the algorithm described in Fig. 6. We typicallyused to prevent the learning of artifacts and found outthat two outer iterations in the scheme in Fig. 6 are sufficient togive satisfactory results, while within the K-SVD, 10–20 itera-tions are required.

To conclude, in order to address the demosaicing problem, weuse the modified K-SVD algorithm that deals with nonuniformnoise, as described in previous section, and add to it an adaptivedictionary that has been learned with low patch sparsity in orderto avoid over-fitting the mosaic pattern. The same technique canbe applied to generic color inpainting as demonstrated in thenext section.

V. EXPERIMENTAL RESULTS

We are now ready to present the color image denoising, in-painting, and demosaicing results that are obtained with the pro-posed framework.

A. Denoising Color Images

The state-of-the-art performance of the algorithm ongrayscale images has already been studied in [2]. We nowevaluate our extension for color images. We trained somedictionaries with different sizes of atoms 5 5 3, 6 6 3,7 7 3 and 8 8 3, on 200 000 patches taken from adatabase of 15 000 images with the patch-sparsity parameter

(six atoms in the representations). We used the databaseLabelMe [55] to build our image database. Then we trainedeach dictionary with 600 iterations. This provided us a set ofgeneric dictionaries that we used as initial dictionaries in ourdenoising algorithm. Comparing the results obtained with theglobal approach and the adaptive one permits us to see theimprovements in the learning process. We chose to evaluate

Other sparse priors:

Image f = �x

Coe�cients x c = D�f

� D�

|x1| + |x2| max(|x1|, |x2|)

(a) (b) (c)

Figure 1: Unit balls of some atomic norms: In each figure, the set of atoms is graphed in red andthe unit ball of the associated atomic norm is graphed in blue. In (a), the atoms are the unit-Euclidean-norm one-sparse vectors, and the atomic norm is the !1 norm. In (b), the atoms are the2!2 symmetric unit-Euclidean-norm rank-one matrices, and the atomic norm is the nuclear norm.In (c), the atoms are the vectors {"1,+1}2, and the atomic norm is the !! norm.

natural procedure to go from the set of one-sparse vectors A to the !1 norm? We observe thatthe convex hull of (unit-Euclidean-norm) one-sparse vectors is the unit ball of the !1 norm, or thecross-polytope. Similarly the convex hull of the (unit-Euclidean-norm) rank-one matrices is thenuclear norm ball; see Figure 1 for illustrations. These constructions suggest a natural generaliza-tion to other settings. Under suitable conditions the convex hull conv(A) defines the unit ball ofa norm, which is called the atomic norm induced by the atomic set A. We can then minimize theatomic norm subject to measurement constraints, which results in a convex programming heuristicfor recovering simple models given linear measurements. As an example suppose we wish to recoverthe sum of a few permutation matrices given linear measurements. The convex hull of the set ofpermutation matrices is the Birkho! polytope of doubly stochastic matrices [73], and our proposalis to solve a convex program that minimizes the norm induced by this polytope. Similarly if wewish to recover an orthogonal matrix from linear measurements we would solve a spectral normminimization problem, as the spectral norm ball is the convex hull of all orthogonal matrices. Asdiscussed in Section 2.5 the atomic norm minimization problem is, in some sense, the best convexheuristic for recovering simple models with respect to a given atomic set.

We give general conditions for exact and robust recovery using the atomic norm heuristic. InSection 3 we provide concrete bounds on the number of generic linear measurements required forthe atomic norm heuristic to succeed. This analysis is based on computing certain Gaussian widthsof tangent cones with respect to the unit balls of the atomic norm [37]. Arguments based on Gaus-sian width have been fruitfully applied to obtain bounds on the number of Gaussian measurementsfor the special case of recovering sparse vectors via !1 norm minimization [64, 67], but computingGaussian widths of general cones is not easy. Therefore it is important to exploit the special struc-ture in atomic norms, while still obtaining su!ciently general results that are broadly applicable.An important theme in this paper is the connection between Gaussian widths and various notionsof symmetry. Specifically by exploiting symmetry structure in certain atomic norms as well as con-vex duality properties, we give bounds on the number of measurements required for recovery usingvery general atomic norm heuristics. For example we provide precise estimates of the number ofgeneric measurements required for exact recovery of an orthogonal matrix via spectral norm min-imization, and the number of generic measurements required for exact recovery of a permutationmatrix by minimizing the norm induced by the Birkho" polytope. While these results correspond

3

(a) (b) (c)

Figure 1: Unit balls of some atomic norms: In each figure, the set of atoms is graphed in red andthe unit ball of the associated atomic norm is graphed in blue. In (a), the atoms are the unit-Euclidean-norm one-sparse vectors, and the atomic norm is the !1 norm. In (b), the atoms are the2!2 symmetric unit-Euclidean-norm rank-one matrices, and the atomic norm is the nuclear norm.In (c), the atoms are the vectors {"1,+1}2, and the atomic norm is the !! norm.

natural procedure to go from the set of one-sparse vectors A to the !1 norm? We observe thatthe convex hull of (unit-Euclidean-norm) one-sparse vectors is the unit ball of the !1 norm, or thecross-polytope. Similarly the convex hull of the (unit-Euclidean-norm) rank-one matrices is thenuclear norm ball; see Figure 1 for illustrations. These constructions suggest a natural generaliza-tion to other settings. Under suitable conditions the convex hull conv(A) defines the unit ball ofa norm, which is called the atomic norm induced by the atomic set A. We can then minimize theatomic norm subject to measurement constraints, which results in a convex programming heuristicfor recovering simple models given linear measurements. As an example suppose we wish to recoverthe sum of a few permutation matrices given linear measurements. The convex hull of the set ofpermutation matrices is the Birkho! polytope of doubly stochastic matrices [73], and our proposalis to solve a convex program that minimizes the norm induced by this polytope. Similarly if wewish to recover an orthogonal matrix from linear measurements we would solve a spectral normminimization problem, as the spectral norm ball is the convex hull of all orthogonal matrices. Asdiscussed in Section 2.5 the atomic norm minimization problem is, in some sense, the best convexheuristic for recovering simple models with respect to a given atomic set.

We give general conditions for exact and robust recovery using the atomic norm heuristic. InSection 3 we provide concrete bounds on the number of generic linear measurements required forthe atomic norm heuristic to succeed. This analysis is based on computing certain Gaussian widthsof tangent cones with respect to the unit balls of the atomic norm [37]. Arguments based on Gaus-sian width have been fruitfully applied to obtain bounds on the number of Gaussian measurementsfor the special case of recovering sparse vectors via !1 norm minimization [64, 67], but computingGaussian widths of general cones is not easy. Therefore it is important to exploit the special struc-ture in atomic norms, while still obtaining su!ciently general results that are broadly applicable.An important theme in this paper is the connection between Gaussian widths and various notionsof symmetry. Specifically by exploiting symmetry structure in certain atomic norms as well as con-vex duality properties, we give bounds on the number of measurements required for recovery usingvery general atomic norm heuristics. For example we provide precise estimates of the number ofgeneric measurements required for exact recovery of an orthogonal matrix via spectral norm min-imization, and the number of generic measurements required for exact recovery of a permutationmatrix by minimizing the norm induced by the Birkho" polytope. While these results correspond

3

Dictionary learning:

Analysis vs. synthesis:

learning

Ja(f) = ||D�f ||1

Js(f) = minf=�x

||x||1

|x1| + (x22 + x2

3)12

Some Hot Topics

MAIRAL et al.: SPARSE REPRESENTATION FOR COLOR IMAGE RESTORATION 57

Fig. 2. Dictionaries with 256 atoms learned on a generic database of natural images, with two different sizes of patches. Note the large number of color-less atoms.Since the atoms can have negative values, the vectors are presented scaled and shifted to the [0,255] range per channel: (a) 5 5 3 patches; (b) 8 8 3 patches.

Fig. 3. Examples of color artifacts while reconstructing a damaged version of the image (a) without the improvement here proposed ( in the new metric).Color artifacts are reduced with our proposed technique ( in our proposed new metric). Both images have been denoised with the same global dictionary.In (b), one observes a bias effect in the color from the castle and in some part of the water. What is more, the color of the sky is piecewise constant when(false contours), which is another artifact our approach corrected. (a) Original. (b) Original algorithm, dB. (c) Proposed algorithm,

dB.

Fig. 4. (a) Training Image; (b) resulting dictionary; (b) is the dictionary learned in the image in (a). The dictionary is more colored than the global one.

MAIRAL et al.: SPARSE REPRESENTATION FOR COLOR IMAGE RESTORATION 57

Fig. 2. Dictionaries with 256 atoms learned on a generic database of natural images, with two different sizes of patches. Note the large number of color-less atoms.Since the atoms can have negative values, the vectors are presented scaled and shifted to the [0,255] range per channel: (a) 5 5 3 patches; (b) 8 8 3 patches.

Fig. 3. Examples of color artifacts while reconstructing a damaged version of the image (a) without the improvement here proposed ( in the new metric).Color artifacts are reduced with our proposed technique ( in our proposed new metric). Both images have been denoised with the same global dictionary.In (b), one observes a bias effect in the color from the castle and in some part of the water. What is more, the color of the sky is piecewise constant when(false contours), which is another artifact our approach corrected. (a) Original. (b) Original algorithm, dB. (c) Proposed algorithm,

dB.

Fig. 4. (a) Training Image; (b) resulting dictionary; (b) is the dictionary learned in the image in (a). The dictionary is more colored than the global one.

MAIRAL et al.: SPARSE REPRESENTATION FOR COLOR IMAGE RESTORATION 57

Fig. 2. Dictionaries with 256 atoms learned on a generic database of natural images, with two different sizes of patches. Note the large number of color-less atoms.Since the atoms can have negative values, the vectors are presented scaled and shifted to the [0,255] range per channel: (a) 5 5 3 patches; (b) 8 8 3 patches.

Fig. 3. Examples of color artifacts while reconstructing a damaged version of the image (a) without the improvement here proposed ( in the new metric).Color artifacts are reduced with our proposed technique ( in our proposed new metric). Both images have been denoised with the same global dictionary.In (b), one observes a bias effect in the color from the castle and in some part of the water. What is more, the color of the sky is piecewise constant when(false contours), which is another artifact our approach corrected. (a) Original. (b) Original algorithm, dB. (c) Proposed algorithm,

dB.

Fig. 4. (a) Training Image; (b) resulting dictionary; (b) is the dictionary learned in the image in (a). The dictionary is more colored than the global one.



Fig. 7. Data set used for evaluating denoising experiments.

TABLE I
PSNR RESULTS OF OUR DENOISING ALGORITHM WITH 256 ATOMS OF SIZE 7 × 7 × 3 AND 6 × 6 × 3 (THE SIZE DEPENDS ON THE NOISE LEVEL). EACH CASE IS DIVIDED INTO FOUR PARTS: THE TOP-LEFT RESULTS ARE THOSE GIVEN BY MCAULEY ET AL. [28] WITH THEIR “3 × 3 MODEL.” THE TOP-RIGHT RESULTS ARE THOSE OBTAINED BY APPLYING THE GRAYSCALE K-SVD ALGORITHM [2] ON EACH CHANNEL SEPARATELY WITH 8 × 8 ATOMS. THE BOTTOM-LEFT ARE OUR RESULTS OBTAINED WITH A GLOBALLY TRAINED DICTIONARY. THE BOTTOM-RIGHT ARE THE IMPROVEMENTS OBTAINED WITH THE ADAPTIVE APPROACH WITH 20 ITERATIONS. BOLD INDICATES THE BEST RESULTS FOR EACH GROUP. AS CAN BE SEEN, OUR PROPOSED TECHNIQUE CONSISTENTLY PRODUCES THE BEST RESULTS

TABLE II
COMPARISON OF THE PSNR RESULTS ON THE IMAGE “CASTLE” BETWEEN [28] AND WHAT WE OBTAINED WITH 256 ATOMS AND 6 × 6 × 3 AND 7 × 7 × 3 PATCHES. FOR THE ADAPTIVE APPROACH, 20 ITERATIONS HAVE BEEN PERFORMED. BOLD INDICATES THE BEST RESULT, SHOWING ONCE AGAIN THE CONSISTENT IMPROVEMENT OBTAINED WITH OUR PROPOSED TECHNIQUE

patch), in order to prevent any learning of these artifacts (over-fitting). We then define the patch sparsity of the decomposition as this number of steps. The stopping criterion in (2) becomes the number of atoms used instead of the reconstruction error. Using a small patch sparsity during the OMP makes it possible to learn a dictionary specialized in providing a coarse approximation. Our assumption is that (pattern) artifacts are less present in coarse approximations, which prevents the dictionary from learning them. We then propose the algorithm described in Fig. 6. We typically used a small patch sparsity to prevent the learning of artifacts, and found that two outer iterations in the scheme in Fig. 6 are sufficient to give satisfactory results, while within the K-SVD, 10–20 iterations are required.
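As a concrete illustration of using the number of atoms (patch sparsity), rather than the reconstruction error, as the stopping criterion, here is a minimal OMP sketch in Python/NumPy. It is not the paper's implementation; it assumes a dictionary D with unit-norm columns, and all names are illustrative.

import numpy as np

def omp_fixed_sparsity(D, y, n_atoms):
    # Greedy OMP that stops after n_atoms selections (patch sparsity)
    # instead of stopping at a target reconstruction error.
    residual = y.copy()
    support = []
    x = np.zeros(D.shape[1])
    for _ in range(n_atoms):
        # Pick the atom most correlated with the current residual.
        correlations = np.abs(D.T @ residual)
        correlations[support] = 0.0          # do not reselect atoms
        support.append(int(np.argmax(correlations)))
        # Re-fit the coefficients on the current support (least squares).
        coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        x[:] = 0.0
        x[support] = coeffs
        residual = y - D @ x
    return x

# Usage: a random unit-norm dictionary and a coarse 2-atom approximation.
rng = np.random.default_rng(0)
D = rng.standard_normal((64, 256))
D /= np.linalg.norm(D, axis=0)
y = rng.standard_normal(64)
x = omp_fixed_sparsity(D, y, n_atoms=2)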

To conclude, in order to address the demosaicing problem, we use the modified K-SVD algorithm that deals with nonuniform noise, as described in the previous section, and add to it an adaptive dictionary that has been learned with low patch sparsity in order to avoid over-fitting the mosaic pattern. The same technique can be applied to generic color inpainting, as demonstrated in the next section.
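How a learned dictionary fills in missing or mosaiced samples can be sketched by sparse-coding each patch on its observed entries only and then synthesizing the full patch from the complete dictionary. The snippet below uses scikit-learn's OrthogonalMatchingPursuit as a stand-in for the modified, nonuniform-noise coding step; it is a simplified illustration, not the exact algorithm of Fig. 6, and the sizes are made up for the example.

import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

def inpaint_patch(D, y, mask, n_atoms):
    # Sparse-code a patch y observed only where mask is True, by restricting
    # the fit to the observed rows of the dictionary, then synthesize the
    # full patch (including the missing entries) with the complete dictionary.
    Dm, ym = D[mask], y[mask]
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=n_atoms, fit_intercept=False)
    omp.fit(Dm, ym)
    return D @ omp.coef_

# Usage: drop roughly half of the entries of a truly 3-sparse patch and reconstruct it.
rng = np.random.default_rng(1)
D = rng.standard_normal((64, 256))
D /= np.linalg.norm(D, axis=0)
y = D[:, :3] @ np.array([1.0, -0.5, 0.2])
mask = rng.random(64) > 0.5
y_hat = inpaint_patch(D, y, mask, n_atoms=3)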

V. EXPERIMENTAL RESULTS

We are now ready to present the color image denoising, inpainting, and demosaicing results that are obtained with the proposed framework.

A. Denoising Color Images

The state-of-the-art performance of the algorithm on grayscale images has already been studied in [2]. We now evaluate our extension for color images. We trained dictionaries with different atom sizes, 5 × 5 × 3, 6 × 6 × 3, 7 × 7 × 3, and 8 × 8 × 3, on 200 000 patches taken from a database of 15 000 images, with the patch-sparsity parameter set to six atoms per representation. We used the LabelMe database [55] to build our image database. We then trained each dictionary with 600 iterations. This provided us with a set of generic dictionaries that we used as initial dictionaries in our denoising algorithm. Comparing the results obtained with the global approach and with the adaptive one lets us see the improvements brought by the learning process. We chose to evaluate
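A compact sketch of the training stage just described (random color patches drawn from an image collection, then a generic dictionary used to initialize the denoiser) is given below. It uses scikit-learn's MiniBatchDictionaryLearning as a stand-in for the K-SVD iterations, so the patch counts, sizes, and sparsity are illustrative parameters rather than the paper's exact settings.

import numpy as np
from sklearn.feature_extraction.image import extract_patches_2d
from sklearn.decomposition import MiniBatchDictionaryLearning

def train_global_dictionary(images, patch_size=(7, 7), n_atoms=256,
                            n_patches=200_000, sparsity=6, seed=0):
    # Learn a generic (global) dictionary from random color patches;
    # MiniBatchDictionaryLearning stands in for the K-SVD iterations.
    rng = np.random.default_rng(seed)
    per_image = max(1, n_patches // len(images))
    patches = []
    for img in images:                       # img: (H, W, 3) float array
        p = extract_patches_2d(img, patch_size, max_patches=per_image,
                               random_state=int(rng.integers(1 << 31)))
        patches.append(p.reshape(len(p), -1))    # flatten to 7*7*3 vectors
    X = np.concatenate(patches)
    X -= X.mean(axis=1, keepdims=True)           # optional: remove each patch's mean
    learner = MiniBatchDictionaryLearning(n_components=n_atoms,
                                          transform_algorithm="omp",
                                          transform_n_nonzero_coefs=sparsity,
                                          random_state=0)
    learner.fit(X)
    return learner.components_                   # (n_atoms, 7*7*3) learned atoms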

Other sparse priors:

Synthesis: image f = Φ x, optimizing over the coefficients x.

Analysis: correlations c = D∗ f, computed with an analysis operator D∗.

Unit balls of the corresponding penalties in 2-D: ℓ1 norm |x1| + |x2| versus ℓ∞ norm max(|x1|, |x2|).
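For reference, the two sparse priors sketched above lead to the following regularized problems (a plain restatement in the slides' notation, with the ℓ1 norm as the sparsity-promoting penalty):

% Synthesis sparsity: optimize over coefficients x, then synthesize f = \Phi x
f^\star = \Phi x^\star, \qquad
x^\star \in \operatorname*{argmin}_{x \in \mathbb{R}^N} \; \tfrac{1}{2}\|y - K \Phi x\|^2 + \lambda \|x\|_1

% Analysis sparsity: penalize the correlations c = D^* f directly
f^\star \in \operatorname*{argmin}_{f \in \mathbb{R}^Q} \; \tfrac{1}{2}\|y - K f\|^2 + \lambda \|D^* f\|_1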


Figure 1: Unit balls of some atomic norms. In each figure, the set of atoms is graphed in red and the unit ball of the associated atomic norm is graphed in blue. In (a), the atoms are the unit-Euclidean-norm one-sparse vectors, and the atomic norm is the ℓ1 norm. In (b), the atoms are the 2 × 2 symmetric unit-Euclidean-norm rank-one matrices, and the atomic norm is the nuclear norm. In (c), the atoms are the vectors {−1, +1}², and the atomic norm is the ℓ∞ norm.

natural procedure to go from the set of one-sparse vectors A to the ℓ1 norm? We observe that the convex hull of (unit-Euclidean-norm) one-sparse vectors is the unit ball of the ℓ1 norm, or the cross-polytope. Similarly the convex hull of the (unit-Euclidean-norm) rank-one matrices is the nuclear norm ball; see Figure 1 for illustrations. These constructions suggest a natural generalization to other settings. Under suitable conditions the convex hull conv(A) defines the unit ball of a norm, which is called the atomic norm induced by the atomic set A. We can then minimize the atomic norm subject to measurement constraints, which results in a convex programming heuristic for recovering simple models given linear measurements. As an example, suppose we wish to recover the sum of a few permutation matrices given linear measurements. The convex hull of the set of permutation matrices is the Birkhoff polytope of doubly stochastic matrices [73], and our proposal is to solve a convex program that minimizes the norm induced by this polytope. Similarly, if we wish to recover an orthogonal matrix from linear measurements we would solve a spectral norm minimization problem, as the spectral norm ball is the convex hull of all orthogonal matrices. As discussed in Section 2.5, the atomic norm minimization problem is, in some sense, the best convex heuristic for recovering simple models with respect to a given atomic set.
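To make the construction in the preceding paragraph concrete, the norm obtained from conv(A) is the gauge of the convex hull; this is the standard definition used in this line of work (assuming the atomic set is centrally symmetric and spans the space, so the gauge is indeed a norm):

\|x\|_{\mathcal{A}}
  = \inf\{\, t > 0 \;:\; x \in t\,\mathrm{conv}(\mathcal{A}) \,\}
  = \inf\Big\{ \textstyle\sum_{a \in \mathcal{A}} c_a \;:\;
      x = \textstyle\sum_{a \in \mathcal{A}} c_a\, a,\; c_a \ge 0 \Big\},

% and the recovery heuristic solves, given measurements y = \Phi x_0,
\hat{x} \in \operatorname*{argmin}_{x} \; \|x\|_{\mathcal{A}}
  \quad \text{subject to} \quad y = \Phi x .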

We give general conditions for exact and robust recovery using the atomic norm heuristic. In Section 3 we provide concrete bounds on the number of generic linear measurements required for the atomic norm heuristic to succeed. This analysis is based on computing certain Gaussian widths of tangent cones with respect to the unit balls of the atomic norm [37]. Arguments based on Gaussian width have been fruitfully applied to obtain bounds on the number of Gaussian measurements for the special case of recovering sparse vectors via ℓ1 norm minimization [64, 67], but computing Gaussian widths of general cones is not easy. Therefore it is important to exploit the special structure in atomic norms, while still obtaining sufficiently general results that are broadly applicable. An important theme in this paper is the connection between Gaussian widths and various notions of symmetry. Specifically, by exploiting symmetry structure in certain atomic norms as well as convex duality properties, we give bounds on the number of measurements required for recovery using very general atomic norm heuristics. For example, we provide precise estimates of the number of generic measurements required for exact recovery of an orthogonal matrix via spectral norm minimization, and the number of generic measurements required for exact recovery of a permutation matrix by minimizing the norm induced by the Birkhoff polytope. While these results correspond
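For completeness, the Gaussian width invoked here is defined, for a set S in R^p and a standard Gaussian vector g, as the expected supremum of the Gaussian process over S; informally, in this framework the number of generic measurements needed for exact recovery scales with the squared width of the tangent cone of the atomic norm ball at the signal, intersected with the unit sphere:

w(S) = \mathbb{E}_{g \sim \mathcal{N}(0, I_p)}\Big[\, \sup_{z \in S} \; \langle g, z \rangle \,\Big],
\qquad
m \gtrsim w\big(T_{\mathcal{A}}(x^\star) \cap \mathbb{S}^{p-1}\big)^2
\;\Longrightarrow\; \text{exact recovery with high probability,}

where T_{\mathcal{A}}(x^\star) denotes the tangent cone of the atomic norm ball at x^\star.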


Nuclear