Sparsity and Compressed Sensing


Slides of the lectures given at the summer school "Biomedical Image Analysis Summer School: Modalities, Methodologies & Clinical Research", Centrale Paris, Paris, July 9-13, 2012.

Transcript of Sparsity and Compressed Sensing

Sparsity and Compressed Sensing

Gabriel Peyré

www.numerical-tours.com

Overview

• Inverse Problems Regularization

• Sparse Synthesis Regularization

• Theoretical Recovery Guarantees

• Compressed Sensing

• RIP and Polytopes CS Theory

• Fourier Measurements

• Convex Optimization via Proximal Splitting

Inverse Problems

Forward model: $y = K f_0 + w \in \mathbb{R}^P$

(Unknown) input $f_0 \in \mathbb{R}^Q$, observation operator $K : \mathbb{R}^Q \to \mathbb{R}^P$, noise $w$.

Denoising: $K = \mathrm{Id}_Q$, $P = Q$.

Inpainting: set $\Omega$ of missing pixels, $P = Q - |\Omega|$,
$(Kf)(x) = 0$ if $x \in \Omega$, $(Kf)(x) = f(x)$ if $x \notin \Omega$.

Super-resolution: $Kf = (f \star k) \downarrow_s$, $P = Q/s$.

Inverse Problem in Medical Imaging

Tomography: $Kf = (p_{\theta_k})_{1 \leq k \leq K}$ (projections along directions $\theta_k$).

Magnetic resonance imaging (MRI): $Kf = (\hat f(\omega))_{\omega \in \Omega}$ (partial Fourier samples).

Other examples: MEG, EEG, . . .

Inverse Problem Regularization

Noisy measurements: $y = K f_0 + w$.

$f^\star \in \operatorname*{argmin}_{f \in \mathbb{R}^Q} \tfrac{1}{2}\|y - Kf\|^2 + \lambda J(f)$ (data fidelity + regularization)

Prior model: $J : \mathbb{R}^Q \to \mathbb{R}$ assigns a score to images.

Choice of $\lambda$: tradeoff between the noise level $\|w\|$ and the regularity $J(f_0)$ of $f_0$.

No noise: $\lambda \to 0^+$, minimize $f^\star \in \operatorname*{argmin}_{f \in \mathbb{R}^Q,\, Kf = y} J(f)$.

Smooth and Cartoon Priors

Sobolev prior (smooth images): $J(f) = \int \|\nabla f(x)\|^2 \, dx$

Total variation prior (cartoon images): $J(f) = \int \|\nabla f(x)\| \, dx$

Co-area formula: $J(f) = \int_{\mathbb{R}} \mathrm{length}(C_t)\, dt$, where $C_t$ is the level set of $f$ at level $t$.
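A minimal sketch (assumption: NumPy, a grayscale image as a 2-D array; the function names are illustrative, not from the slides) contrasting the two discrete energies:

```python
import numpy as np

def gradient(f):
    """Discrete gradient by forward differences (Neumann boundary)."""
    gx = np.diff(f, axis=0, append=f[-1:, :])
    gy = np.diff(f, axis=1, append=f[:, -1:])
    return gx, gy

def sobolev_energy(f):
    gx, gy = gradient(f)
    return np.sum(gx**2 + gy**2)

def total_variation(f):
    gx, gy = gradient(f)
    return np.sum(np.sqrt(gx**2 + gy**2))

# For an edge of height h, the Sobolev energy scales like h^2 while TV scales like h,
# which is why TV tolerates sharp edges (cartoon images) much better.
f = np.zeros((64, 64)); f[:, 32:] = 10.0   # ideal step edge of height 10
print(sobolev_energy(f), total_variation(f))
```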

Inpainting Example

Input $y = K f_0 + w$; Sobolev reconstruction; total variation reconstruction.

Overview

• Inverse Problems Regularization

• Sparse Synthesis Regularization

• Theoretical Recovery Guarantees

• Compressed Sensing

• RIP and Polytopes CS Theory

• Fourier Measurements

• Convex Optimization via Proximal Splitting

Redundant Dictionaries

Dictionary $\Psi = (\psi_m)_m \in \mathbb{R}^{Q \times N}$, $N \geq Q$.

Fourier: $\psi_m = e^{i \langle \omega_m, \cdot \rangle}$, $m$ indexes the frequency $\omega_m$.

Wavelets: $\psi_m = \psi(2^{-j} R_\theta x - n)$, $m = (j, \theta, n)$: scale $j$, orientation $\theta$, position $n$.

DCT, curvelets, bandlets, . . .

Synthesis: $f = \sum_m x_m \psi_m = \Psi x$ (image $f = \Psi x$, coefficients $x$).

Sparse Priors

Ideal sparsity: for most $m$, $x_m = 0$.

$J_0(x) = \# \{ m \,;\, x_m \neq 0 \}$

Sparse approximation: $f = \Psi x$ where
$x \in \operatorname*{argmin}_{x \in \mathbb{R}^N} \|f_0 - \Psi x\|^2 + T^2 J_0(x)$.

Orthogonal $\Psi$ ($\Psi \Psi^* = \Psi^* \Psi = \mathrm{Id}_N$): the solution is hard thresholding,
$x_m = \langle f_0, \psi_m \rangle$ if $|\langle f_0, \psi_m \rangle| > T$, and $x_m = 0$ otherwise,
i.e. $f = \Psi \circ S_T \circ \Psi^*(f_0)$ with $S_T$ the hard-thresholding operator.

Non-orthogonal $\Psi$: the minimization is NP-hard.
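A minimal sketch of sparse approximation in an orthogonal basis by hard thresholding (assumption: NumPy/SciPy; the orthonormal DCT stands in for the basis $\Psi$ of the slides, and the test signal is arbitrary):

```python
import numpy as np
from scipy.fft import dct, idct

def hard_threshold_approx(f0, T):
    """Sparse approximation f = Psi S_T Psi^*(f0) in an orthogonal basis.

    Here Psi^* is the orthonormal DCT-II (norm='ortho'), so Psi Psi^* = Id."""
    x = dct(f0, norm='ortho')              # analysis: x = Psi^* f0
    x_T = np.where(np.abs(x) > T, x, 0.0)  # hard thresholding S_T
    return idct(x_T, norm='ortho'), x_T    # synthesis: f = Psi x_T

f0 = np.cumsum(np.random.default_rng(0).standard_normal(256))  # piecewise-smooth-ish signal
f, x_T = hard_threshold_approx(f0, T=2.0)
print("kept coefficients:", np.count_nonzero(x_T), " error:", np.linalg.norm(f - f0))
```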

Convex Relaxation: L1 Prior

Image with 2 pixels $(x_1, x_2)$:
$J_0(x) = 0$: null image; $J_0(x) = 1$: sparse image; $J_0(x) = 2$: non-sparse image.

$\ell^q$ priors: $J_q(x) = \sum_m |x_m|^q$ (convex for $q \geq 1$; level sets shown for $q = 0, 1/2, 1, 3/2, 2$).

Sparse $\ell^1$ prior: $J_1(x) = \sum_m |x_m|$.

L1 Regularization

Coefficients $x_0 \in \mathbb{R}^N$ $\xrightarrow{\ \Psi\ }$ image $f_0 = \Psi x_0 \in \mathbb{R}^Q$ $\xrightarrow{\ K,\ +w\ }$ observations $y = K f_0 + w \in \mathbb{R}^P$.

Combined operator: $\Phi = K \Psi \in \mathbb{R}^{P \times N}$.

Sparse recovery: $f^\star = \Psi x^\star$ where $x^\star$ solves
$\min_{x \in \mathbb{R}^N} \tfrac{1}{2}\|y - \Phi x\|^2 + \lambda \|x\|_1$ (data fidelity + regularization).

Noiseless Sparse Regularization

Noiseless measurements: $y = \Phi x_0$.

$\ell^1$ recovery: $x^\star \in \operatorname*{argmin}_{\Phi x = y} \sum_m |x_m|$,

compared with the $\ell^2$ solution $x^\star \in \operatorname*{argmin}_{\Phi x = y} \sum_m |x_m|^2$.

Convex linear program. Interior points, cf. [Chen, Donoho, Saunders] "basis pursuit".
Douglas-Rachford splitting, see [Combettes, Pesquet].

Noisy Sparse Regularization

Noisy measurements: $y = \Phi x_0 + w$.

Penalized form: $x^\star \in \operatorname*{argmin}_{x \in \mathbb{R}^N} \tfrac{1}{2}\|y - \Phi x\|^2 + \lambda \|x\|_1$ (data fidelity + regularization).

Constrained form: $x^\star \in \operatorname*{argmin}_{\|\Phi x - y\| \leq \varepsilon} \|x\|_1$.

Equivalence: $\lambda \leftrightarrow \varepsilon$.

Algorithms: iterative soft thresholding (forward-backward splitting), Nesterov multi-step schemes; see [Daubechies et al.], [Pesquet et al.], etc.

Image De-blurring

Original $f_0$, observations $y = h \star f_0 + w$.

Sobolev regularization: $f^\star = \operatorname*{argmin}_{f \in \mathbb{R}^N} \|f \star h - y\|^2 + \lambda \|\nabla f\|^2$,
solved in closed form in Fourier: $\hat f^\star(\omega) = \dfrac{\overline{\hat h(\omega)}}{|\hat h(\omega)|^2 + \lambda |\omega|^2}\, \hat y(\omega)$.
Result: SNR = 22.7 dB.

Sparsity regularization: $\Psi$ = translation invariant wavelets,
$x^\star \in \operatorname*{argmin}_x \tfrac{1}{2}\|h \star (\Psi x) - y\|^2 + \lambda \|x\|_1$, $f^\star = \Psi x^\star$.
Result: SNR = 24.7 dB.

Inpainting Problem

Measurements: $y = K f_0 + w$, with $(Kf)(x) = 0$ if $x \in \Omega$ (missing pixels) and $(Kf)(x) = f(x)$ otherwise.

Image Separation

Model: $f = f_1 + f_2 + w$, with $(f_1, f_2)$ the components and $w$ the noise.

Union dictionary: $\Psi = [\Psi_1, \Psi_2] \in \mathbb{R}^{Q \times (N_1 + N_2)}$.

$(x_1^\star, x_2^\star) \in \operatorname*{argmin}_{x = (x_1, x_2) \in \mathbb{R}^N} \tfrac{1}{2}\|f - \Psi x\|^2 + \lambda \|x\|_1$

Recovered components: $f_i^\star = \Psi_i x_i^\star$.

Examples of Decompositions

Cartoon+Texture Separation

Overview

• Inverse Problems Regularization

• Sparse Synthesis Regularization

• Theoretical Recovery Guarantees

• Compressed Sensing

• RIP and Polytopes CS Theory

• Fourier Measurements

• Convex Optimization via Proximal Splitting

Basics of Convex Analysis

Setting: $G : \mathcal{H} \to \mathbb{R} \cup \{+\infty\}$; here $\mathcal{H} = \mathbb{R}^N$.

Problem: $\min_{x \in \mathcal{H}} G(x)$

Convex: $\forall\, t \in [0, 1]$, $G(tx + (1-t)y) \leq t G(x) + (1-t) G(y)$.

Sub-differential: $\partial G(x) = \{ u \in \mathcal{H} \,;\, \forall z,\ G(z) \geq G(x) + \langle u, z - x \rangle \}$.

Smooth functions: if $G$ is $C^1$, $\partial G(x) = \{\nabla G(x)\}$.

Example: $G(x) = |x|$, $\partial G(0) = [-1, 1]$.

First-order condition: $x^\star \in \operatorname*{argmin}_{x \in \mathcal{H}} G(x) \iff 0 \in \partial G(x^\star)$.

L1 Regularization: First Order Conditions

$x^\star \in \operatorname*{argmin}_{x \in \mathbb{R}^N} G(x) = \tfrac{1}{2}\|y - \Phi x\|^2 + \lambda \|x\|_1 \qquad (P_\lambda(y))$

$\partial G(x) = \Phi^*(\Phi x - y) + \lambda\, \partial \|\cdot\|_1(x)$, where
$\partial \|\cdot\|_1(x)_i = \operatorname{sign}(x_i)$ if $x_i \neq 0$, and $[-1, 1]$ if $x_i = 0$.

Support of the solution: $I = \{ i \in \{0, \ldots, N-1\} \,;\, x^\star_i \neq 0 \}$.

Restrictions: $x_I = (x_i)_{i \in I} \in \mathbb{R}^{|I|}$, $\Phi_I = (\phi_i)_{i \in I} \in \mathbb{R}^{P \times |I|}$.

First-order condition: $\Phi^*(\Phi x^\star - y) + \lambda s = 0$ where $s_I = \operatorname{sign}(x^\star_I)$ and $\|s_{I^c}\|_\infty \leq 1$,
i.e. $s_{I^c} = \frac{1}{\lambda} \Phi^*_{I^c}(y - \Phi x^\star)$.

Theorem: $x^\star$ is a solution of $P_\lambda(y)$ $\iff$ the first-order condition holds on $I$ and $\|\Phi^*_{I^c}(\Phi x^\star - y)\|_\infty \leq \lambda$.

Theorem: if $\Phi_I$ has full rank and $\|\Phi^*_{I^c}(\Phi x^\star - y)\|_\infty < \lambda$, then $x^\star$ is the unique solution of $P_\lambda(y)$.

Local Behavior of the Solution

The first-order condition $\Phi^*(\Phi x^\star - y) + \lambda s = 0$ restricted to $I$ gives the implicit equation
$x^\star_I = \Phi_I^+ y - \lambda (\Phi_I^* \Phi_I)^{-1} \operatorname{sign}(x^\star_I) = x_{0,I} + \Phi_I^+ w - \lambda (\Phi_I^* \Phi_I)^{-1} s_I$.

Intuition: for small $w$, $s_I = \operatorname{sign}(x^\star_I) = \operatorname{sign}(x_{0,I}) = s_{0,I}$ (the unknown sign becomes known).

Candidate for the solution: $\hat x_I = x_{0,I} + \Phi_I^+ w - \lambda (\Phi_I^* \Phi_I)^{-1} s_{0,I}$.

To prove that $\hat x$ is the unique solution: $\big\| \tfrac{1}{\lambda} \Phi^*_{I^c}(\Phi_I \hat x_I - y) \big\|_\infty < 1$.

Writing $\Psi_I = \Phi^*_{I^c}(\Phi_I \Phi_I^+ - \mathrm{Id})$ and $\Omega_I = \Phi^*_{I^c} \Phi_I^{+,*}$,
$\tfrac{1}{\lambda} \Phi^*_{I^c}(\Phi_I \hat x_I - y) = \Psi_I\!\left(\tfrac{w}{\lambda}\right) - \Omega_I(s_{0,I})$:
the first term can be made small when $w \to 0$, while the second must satisfy $\|\Omega_I(s_{0,I})\|_\infty < 1$.

Robustness to Small Noise

Identifiability criterion [Fuchs]: for $s \in \{-1, 0, +1\}^N$, let $I = \operatorname{supp}(s)$ and
$F(s) = \|\Omega_I s_I\|_\infty$ where $\Omega_I = \Phi^*_{I^c} \Phi_I^{+,*}$.

Theorem [Fuchs 2004]: if $F(\operatorname{sign}(x_0)) < 1$ and, with $T = \min_{i \in I} |x_{0,i}|$, $\|w\|/T$ is small enough and $\lambda \sim \|w\|$, then
$x_{0,I} + \Phi_I^+ w - \lambda (\Phi_I^* \Phi_I)^{-1} \operatorname{sign}(x_{0,I})$
is the unique solution of $P_\lambda(y)$.

When $w = 0$: $F(\operatorname{sign}(x_0)) < 1 \implies x^\star = x_0$.

Theorem [Grasmair et al. 2010]: if $F(\operatorname{sign}(x_0)) < 1$ and $\lambda \sim \|w\|$, then $\|x^\star - x_0\| = O(\|w\|)$.
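A minimal sketch (assumption: NumPy, small dense $\Phi$; function name and test sizes are illustrative) of how the Fuchs criterion $F(s)$ can be evaluated numerically:

```python
import numpy as np

def fuchs_criterion(Phi, s):
    """F(s) = || Phi_{I^c}^T Phi_I^{+,T} s_I ||_inf  for a sign vector s in {-1,0,+1}^N."""
    I = np.flatnonzero(s)
    Ic = np.setdiff1d(np.arange(Phi.shape[1]), I)
    Phi_I_pinv = np.linalg.pinv(Phi[:, I])    # Phi_I^+
    d_I = Phi_I_pinv.T @ s[I]                 # dual certificate d_I = Phi_I^{+,*} s_I
    return np.max(np.abs(Phi[:, Ic].T @ d_I))

# Example: random Gaussian Phi, a 5-sparse sign pattern.
rng = np.random.default_rng(0)
P, N = 200, 1000
Phi = rng.standard_normal((P, N)) / np.sqrt(P)
s = np.zeros(N); s[:5] = 1.0
print("F(s) =", fuchs_criterion(Phi, s))      # identifiable if F(s) < 1
```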

Geometric Interpretation

Dual certificate: $d_I = \Phi_I^{+,*} s_I = \Phi_I (\Phi_I^* \Phi_I)^{-1} s_I$, defined by $\forall\, i \in I$, $\langle d_I, \phi_i \rangle = s_i$.

$F(s) = \|\Omega_I s_I\|_\infty = \max_{j \notin I} |\langle d_I, \phi_j \rangle|$

Condition $F(s) < 1$: no vector $\phi_j$, $j \notin I$, lies inside the cap $C_s$, i.e. all of them satisfy $|\langle d_I, \phi_j \rangle| < 1$.

Robustness to Bounded Noise

Exact Recovery Criterion (ERC) [Tropp]: for a support $I \subset \{0, \ldots, N-1\}$ with $\Phi_I$ of full rank,
$\mathrm{ERC}(I) = \|\Omega_I\|_{\infty,\infty}$ where $\Omega_I = \Phi^*_{I^c} \Phi_I^{+,*}$,
so $\mathrm{ERC}(I) = \|\Phi_I^+ \Phi_{I^c}\|_{1,1} = \max_{j \in I^c} \|\Phi_I^+ \phi_j\|_1$
(using $\|(a_j)_j\|_{1,1} = \max_j \|a_j\|_1$).

Relation with the $F$ criterion: $\mathrm{ERC}(I) = \max_{s,\ \operatorname{supp}(s) \subset I} F(s)$.

Theorem: if $\mathrm{ERC}(\operatorname{supp}(x_0)) < 1$ and $\lambda \sim \|w\|$, then $x^\star$ is unique, satisfies $\operatorname{supp}(x^\star) \subset \operatorname{supp}(x_0)$, and $\|x_0 - x^\star\| = O(\|w\|)$.
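A companion sketch (same assumptions as above: NumPy, dense $\Phi$, illustrative names) computing Tropp's Exact Recovery Criterion for a given support:

```python
import numpy as np

def erc(Phi, I):
    """ERC(I) = max_{j in I^c} || Phi_I^+ phi_j ||_1  [Tropp]."""
    Ic = np.setdiff1d(np.arange(Phi.shape[1]), I)
    Phi_I_pinv = np.linalg.pinv(Phi[:, I])                     # Phi_I^+
    return np.max(np.sum(np.abs(Phi_I_pinv @ Phi[:, Ic]), axis=0))

rng = np.random.default_rng(1)
P, N, k = 200, 1000, 10
Phi = rng.standard_normal((P, N)) / np.sqrt(P)
I = rng.choice(N, size=k, replace=False)
print("ERC(I) =", erc(Phi, I))   # recovery is stable to bounded noise if ERC(I) < 1
```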

Example: Random Matrix

$P = 200$, $N = 1000$: probability, over random supports of increasing size $|I| \in [0, 50]$, that $\mathrm{ERC} < 1$, w-$\mathrm{ERC} < 1$, $F < 1$, and that $x^\star = x_0$.

Example: Deconvolution

$\Phi x = \sum_i x_i\, \varphi(\cdot - \Delta i)$ (spikes $x_0$ convolved with a kernel $\varphi$ on a grid of spacing $\Delta$).

Increasing $\Delta$ reduces the correlation between atoms, but also reduces the resolution. The criteria $F(s)$, $\mathrm{ERC}(I)$ and w-$\mathrm{ERC}(I)$ are compared as $\Delta$ varies.

Coherence Bounds

Mutual coherence: $\mu(\Phi) = \max_{i \neq j} |\langle \phi_i, \phi_j \rangle|$.

Theorem: $F(s) \leq \mathrm{ERC}(I) \leq \text{w-}\mathrm{ERC}(I) \leq \dfrac{|I|\, \mu(\Phi)}{1 - (|I| - 1)\mu(\Phi)}$

Theorem: if $\|x_0\|_0 < \tfrac{1}{2}\left(1 + \tfrac{1}{\mu(\Phi)}\right)$ and $\lambda \sim \|w\|$, one has $\operatorname{supp}(x^\star) \subset I$ and $\|x_0 - x^\star\| = O(\|w\|)$.

One has $\mu(\Phi) \geq \sqrt{\dfrac{N - P}{P(N - 1)}}$, so the optimistic setting is $\|x_0\|_0 \sim O(\sqrt{P})$.

For Gaussian matrices: $\mu(\Phi) \sim \sqrt{\log(PN)/P}$.

For convolution matrices: a useless criterion.
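A minimal sketch (NumPy, dense matrix with normalized columns; names are illustrative) of the mutual coherence and the sparsity level it guarantees:

```python
import numpy as np

def mutual_coherence(Phi):
    """mu(Phi) = max_{i != j} |<phi_i, phi_j>| for a dictionary with unit-norm columns."""
    Phi = Phi / np.linalg.norm(Phi, axis=0, keepdims=True)  # normalize columns
    G = np.abs(Phi.T @ Phi)                                  # Gram matrix
    np.fill_diagonal(G, 0.0)
    return G.max()

rng = np.random.default_rng(2)
P, N = 200, 1000
Phi = rng.standard_normal((P, N))
mu = mutual_coherence(Phi)
print("mu =", mu, " -> coherence bound on sparsity:", 0.5 * (1 + 1 / mu))
```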

Spikes and Sinusoids Separation

Incoherent pair of orthobases (Diracs/Fourier):
$\Psi_1 = \{ k \mapsto \delta[k - m] \}_m$, $\Psi_2 = \{ k \mapsto N^{-1/2} e^{\frac{2i\pi}{N} mk} \}_m$, $\Phi = [\Psi_1, \Psi_2] \in \mathbb{R}^{N \times 2N}$.

$\min_{x \in \mathbb{R}^{2N}} \tfrac{1}{2}\|y - \Phi x\|^2 + \lambda \|x\|_1
\;\Longleftrightarrow\;
\min_{x_1, x_2 \in \mathbb{R}^N} \tfrac{1}{2}\|y - \Psi_1 x_1 - \Psi_2 x_2\|^2 + \lambda \|x_1\|_1 + \lambda \|x_2\|_1$

$\mu(\Phi) = \dfrac{1}{\sqrt{N}}$ $\implies$ separates up to $\sim \sqrt{N}/2$ Diracs + sines.

Overview

• Inverse Problems Regularization

• Sparse Synthesis Regularization

• Theoretical Recovery Guarantees

• Compressed Sensing

• RIP and Polytopes CS Theory

• Fourier Measurements

• Convex Optimization via Proximal Splitting

Pointwise Sampling and Smoothness

Data acquisition: sensors sample $f \in L^2$ to get $\tilde f \in \mathbb{R}^N$, with $\tilde f[i] = f(i/N) = \langle f, \delta_{i/N} \rangle$ (Diracs).

Shannon interpolation: if $\operatorname{Supp}(\hat f) \subset [-N\pi, N\pi]$, then
$f(t) = \sum_i \tilde f[i]\, h(Nt - i)$ where $h(t) = \dfrac{\sin(\pi t)}{\pi t}$.

$\Longrightarrow$ Natural images are not smooth.
$\Longrightarrow$ But they can be compressed efficiently.

Single Pixel Camera (Rice)

Measurements: $y[i] = \langle f_0, \varphi_i \rangle$.

Original $f_0$, $N = 256^2$; recovery $f^\star$ with $P/N = 0.16$; recovery $f^\star$ with $P/N = 0.02$.

CS Hardware Model

CS is about designing hardware: input signals $f \in L^2(\mathbb{R}^2)$.

Physical hardware resolution limit: target resolution $\tilde f \in \mathbb{R}^N$.

$f \in L^2 \;\longrightarrow\; \tilde f \in \mathbb{R}^N$ (micromirror array resolution) $\;\xrightarrow{\ \text{CS hardware } K\ }\; y \in \mathbb{R}^P$

The operator $K$ is implemented by the hardware (e.g. random masks on the mirror array).

Sparse CS Recovery

$f_0 \in \mathbb{R}^N$ sparse in an ortho-basis $\Psi$: $f_0 = \Psi x_0$, $x_0 \in \mathbb{R}^N$.

(Discretized) sampling acquisition: $y = K f_0 + w = K\Psi(x_0) + w = \Phi x_0 + w$.

$K$ drawn from the Gaussian matrix ensemble, $K_{i,j} \sim \mathcal{N}(0, P^{-1/2})$ i.i.d.
$\implies \Phi = K\Psi$ is also drawn from the Gaussian matrix ensemble.

Sparse recovery: $\min_{\|\Phi x - y\| \leq \|w\|} \|x\|_1$, or $\min_x \tfrac{1}{2}\|\Phi x - y\|^2 + \lambda \|x\|_1$ with $\lambda \sim \|w\|$.
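A minimal sketch of noiseless CS recovery by basis pursuit, recast as a linear program (assumptions: SciPy's linprog with the 'highs' solver, $\Psi = \mathrm{Id}$ so $\Phi$ is directly the Gaussian matrix, and small illustrative sizes):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(3)
P, N, k = 50, 200, 5
Phi = rng.standard_normal((P, N)) / np.sqrt(P)        # Gaussian measurement matrix
x0 = np.zeros(N); x0[rng.choice(N, k, replace=False)] = rng.standard_normal(k)
y = Phi @ x0                                          # noiseless measurements

# Basis pursuit  min ||x||_1  s.t.  Phi x = y,  as an LP with x = u - v, u, v >= 0.
c = np.ones(2 * N)
A_eq = np.hstack([Phi, -Phi])
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
x_rec = res.x[:N] - res.x[N:]
print("recovery error:", np.linalg.norm(x_rec - x0))
```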

CS Simulation Example

$\Psi$ = translation invariant wavelet frame; original $f_0$ and CS reconstructions.

Overview

• Inverse Problems Regularization

• Sparse Synthesis Regularization

• Theoretical Recovery Guarantees

• Compressed Sensing

• RIP and Polytopes CS Theory

• Fourier Measurements

• Convex Optimization via Proximal Splitting

CS with RIP

Restricted Isometry Constants: $\forall x$ with $\|x\|_0 \leq k$,
$(1 - \delta_k)\|x\|^2 \leq \|\Phi x\|^2 \leq (1 + \delta_k)\|x\|^2$.

$\ell^1$ recovery: $x^\star \in \operatorname*{argmin}_{\|\Phi x - y\| \leq \varepsilon} \|x\|_1$, where $y = \Phi x_0 + w$, $\|w\| \leq \varepsilon$.

Theorem [Candès 2009]: if $\delta_{2k} \leq \sqrt{2} - 1$, then
$\|x_0 - x^\star\| \leq \dfrac{C_0}{\sqrt{k}} \|x_0 - x_k\|_1 + C_1 \varepsilon$,
where $x_k$ is the best $k$-term approximation of $x_0$.

Singular Values Distributions

Eigenvalues of $\Phi_I^* \Phi_I$ with $|I| = k$ are essentially in $[a, b]$, with $a = (1 - \sqrt{\beta})^2$ and $b = (1 + \sqrt{\beta})^2$ where $\beta = k/P$.

When $k = \beta P \to +\infty$, the eigenvalue distribution tends to the Marcenko-Pastur law
$f_\beta(\lambda) = \dfrac{1}{2\pi \beta \lambda} \sqrt{(b - \lambda)_+ (\lambda - a)_+}$.

Large deviation inequality [Ledoux].

Empirical histograms (Gaussian $\Phi$, $P = 200$, $k = 10, 30, 50$) match $f_\beta(\lambda)$.
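A minimal numerical check of this behavior (assumption: NumPy; sizes and trial count are illustrative): sample eigenvalues of $\Phi_I^* \Phi_I$ over random supports and compare with the Marcenko-Pastur support $[a, b]$.

```python
import numpy as np

rng = np.random.default_rng(4)
P, N, k, trials = 200, 1000, 30, 200
Phi = rng.standard_normal((P, N)) / np.sqrt(P)

eigs = []
for _ in range(trials):
    I = rng.choice(N, size=k, replace=False)
    eigs.append(np.linalg.eigvalsh(Phi[:, I].T @ Phi[:, I]))  # spectrum of Phi_I^* Phi_I
eigs = np.concatenate(eigs)

beta = k / P
a, b = (1 - np.sqrt(beta)) ** 2, (1 + np.sqrt(beta)) ** 2
print(f"observed range [{eigs.min():.2f}, {eigs.max():.2f}], "
      f"Marcenko-Pastur support [{a:.2f}, {b:.2f}]")
```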

RIP for Gaussian Matrices

Link with coherence: $\delta_k \leq (k - 1)\, \mu(\Phi)$, and $\delta_2 = \mu(\Phi)$, where $\mu(\Phi) = \max_{i \neq j} |\langle \phi_i, \phi_j \rangle|$.

For Gaussian matrices: $\mu(\Phi) \sim \sqrt{\log(PN)/P}$.

Stronger result. Theorem: if $k \leq C \dfrac{P}{\log(N/P)}$, then $\delta_{2k} \leq \sqrt{2} - 1$ with high probability.

Numerics with RIP

Stability constants of $A$: $(1 - \delta_1(A))\|\varepsilon\|^2 \leq \|A\varepsilon\|^2 \leq (1 + \delta_2(A))\|\varepsilon\|^2$,
given by the smallest / largest eigenvalues of $A^*A$.

Upper/lower RIC: $\delta_k^i = \max_{|I| = k} \delta_i(\Phi_I)$, $\delta_k = \max(\delta_k^1, \delta_k^2)$.

Monte-Carlo estimation over random supports gives a lower bound $\tilde\delta_k \leq \delta_k$.

Numerical example: $N = 4000$, $P = 1000$; $\tilde\delta_{2k}$ is plotted against $k$ and compared with the threshold $\sqrt{2} - 1$.

Polytopes-based Guarantees

Noiseless recovery: $x^\star \in \operatorname*{argmin}_{\Phi x = y} \|x\|_1 \quad (P_0(y))$, seen as a map $y \mapsto x^\star$.

Example: $\Phi = (\phi_i)_i \in \mathbb{R}^{2 \times 3}$; $B_\alpha = \{x \,;\, \|x\|_1 \leq \alpha\}$ with $\alpha = \|x_0\|_1$; $\Phi(B_\alpha)$ is a polytope with vertices among $\pm\alpha\phi_i$.

$x_0$ is a solution of $P_0(\Phi x_0) \iff \Phi x_0 \in \partial\, \Phi(B_\alpha)$.

L1 Recovery in 2-D

For a sign vector $s$ (e.g. $s = (0, 1, 1)$), the quadrant $K_s = \{ (\alpha_i s_i)_i \in \mathbb{R}^3 \,;\, \alpha_i \geq 0 \}$ maps to the 2-D cone $C_s = \Phi K_s$; recovery of $x_0$ with $\operatorname{sign}(x_0) = s$ is governed by the geometry of these cones under $y \mapsto x^\star$.

Polytope Noiseless Recovery

Counting faces of random polytopes [Donoho]:
All $x_0$ such that $\|x_0\|_0 \leq C_{\text{all}}(P/N)\, P$ are identifiable.
Most $x_0$ such that $\|x_0\|_0 \leq C_{\text{most}}(P/N)\, P$ are identifiable.
$C_{\text{all}}(1/4) \approx 0.065$, $C_{\text{most}}(1/4) \approx 0.25$.

Compared with RIP: sharp constants, but no noise robustness.

Computation of "pathological" signals [Dossal, Peyré, Fadili, 2010].

Overview

• Inverse Problems Regularization

• Sparse Synthesis Regularization

• Theoretical Recovery Guarantees

• Compressed Sensing

• RIP and Polytopes CS Theory

• Fourier Measurements

• Convex Optimization via Proximal Splitting

Tomography and Fourier Measures

Fourier slice theorem: $\hat p_\theta(\rho) = \hat f(\rho \cos\theta, \rho \sin\theta)$ (1D Fourier transform of a projection = a slice of the 2D Fourier transform $\hat f = \mathrm{FFT2}(f)$).

Partial Fourier measurements: the projections $\{p_{\theta_k}(t)\}_{t \in \mathbb{R},\ 0 \leq k < K}$ are equivalent to the Fourier samples $\Phi f = \{\hat f[\omega]\}_{\omega \in \Omega}$ on radial lines $\Omega$.

Disclaimer: this is not compressed sensing.

Regularized Inversion

Noisy measurements: $\forall \omega \in \Omega$, $y[\omega] = \hat f_0[\omega] + w[\omega]$, with white noise $w[\omega] \sim \mathcal{N}(0, \sigma)$.

$\ell^1$ regularization:
$f^\star = \operatorname*{argmin}_f \tfrac{1}{2} \sum_{\omega \in \Omega} |y[\omega] - \hat f[\omega]|^2 + \lambda \sum_m |\langle f, \psi_m \rangle|$

MRI Imaging (from [Lustig et al.])

Fourier sub-sampling pattern: randomization.

MRI Reconstruction (from [Lustig et al.]): high resolution / low resolution / linear (pseudo-inverse) / sparsity (sparse wavelets).

$\Longrightarrow$ Sampling low frequencies helps.

Compressive Fourier Measurements

Structured Measurements

Gaussian matrices: intractable for large $N$.

Fast measurements (e.g. Fourier basis): random partial orthogonal matrix $\Phi = (\varphi_\omega)_{\omega \in \Omega}$, where $\{\varphi_\omega\}_\omega$ is an orthogonal basis and $|\Omega| = P$ is drawn uniformly at random:
$\forall \omega \in \Omega, \quad y[\omega] = \langle f, \varphi_\omega \rangle = \hat f[\omega]$.

Mutual incoherence: $\mu = \sqrt{N} \max_{\omega, m} |\langle \varphi_\omega, \psi_m \rangle| \in [1, \sqrt{N}]$.

$\Longrightarrow$ not universal: requires incoherence between the measurement basis and the sparsity basis.

Theorem [Rudelson, Vershynin, 2006]: with high probability on $\Omega$, if
$M \leq C \dfrac{P}{\mu^2 \log(N)^4}$, then $\delta_{2M} \leq \sqrt{2} - 1$.

Overview

• Inverse Problems Regularization

• Sparse Synthesis Regularization

• Theoretical Recovery Guarantees

• Compressed Sensing

• RIP and Polytopes CS Theory

• Fourier Measurements

• Convex Optimization via Proximal Splitting

Convex Optimization

Setting: $\mathcal{H}$ Hilbert space, here $\mathcal{H} = \mathbb{R}^N$; $G : \mathcal{H} \to \mathbb{R} \cup \{+\infty\}$.

Problem: $\min_{x \in \mathcal{H}} G(x)$

Class of functions:
Convex: $G(tx + (1-t)y) \leq t G(x) + (1-t) G(y)$, $\forall\, t \in [0, 1]$.
Lower semi-continuous: $\liminf_{x \to x_0} G(x) \geq G(x_0)$.
Proper: $\{x \in \mathcal{H} \,;\, G(x) \neq +\infty\} \neq \emptyset$.

Indicator of a closed convex set $C$: $\iota_C(x) = 0$ if $x \in C$, $+\infty$ otherwise.

Proximal Operators

Proximal operator of $G$: $\operatorname{Prox}_{\gamma G}(x) = \operatorname*{argmin}_z \tfrac{1}{2}\|x - z\|^2 + \gamma G(z)$.

$G(x) = \|x\|_1 = \sum_i |x_i|$: $\operatorname{Prox}_{\gamma G}(x)_i = \max\!\left(0,\ 1 - \dfrac{\gamma}{|x_i|}\right) x_i$ (soft thresholding).

$G(x) = \|x\|_0 = |\{i \,;\, x_i \neq 0\}|$: $\operatorname{Prox}_{\gamma G}(x)_i = x_i$ if $|x_i| \geq \sqrt{2\gamma}$, $0$ otherwise (hard thresholding).

$G(x) = \sum_i \log(1 + |x_i|^2)$: $\operatorname{Prox}_{\gamma G}$ is obtained as the root of a 3rd order polynomial.
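A minimal sketch (NumPy; function names are illustrative) of the two componentwise proximal maps above:

```python
import numpy as np

def prox_l1(x, gamma):
    """Soft thresholding: prox of gamma * ||x||_1 (componentwise)."""
    return np.maximum(0.0, 1.0 - gamma / np.maximum(np.abs(x), 1e-12)) * x

def prox_l0(x, gamma):
    """Hard thresholding: prox of gamma * ||x||_0 (componentwise)."""
    return np.where(np.abs(x) >= np.sqrt(2 * gamma), x, 0.0)

x = np.array([-3.0, -0.5, 0.0, 0.2, 2.0])
print(prox_l1(x, 1.0))   # shrinks toward 0 by 1, small entries set to 0
print(prox_l0(x, 1.0))   # keeps entries with |x_i| >= sqrt(2)
```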

Proximal Calculus

Separability: $G(x) = G_1(x_1) + \ldots + G_n(x_n)$
$\implies \operatorname{Prox}_G(x) = (\operatorname{Prox}_{G_1}(x_1), \ldots, \operatorname{Prox}_{G_n}(x_n))$.

Quadratic functionals: $G(x) = \tfrac{1}{2}\|\Phi x - y\|^2$
$\implies \operatorname{Prox}_{\gamma G}(x) = (\mathrm{Id} + \gamma \Phi^*\Phi)^{-1}(x + \gamma \Phi^* y)$,
with $(\mathrm{Id} + \gamma \Phi^*\Phi)^{-1}\Phi^* = \Phi^*(\mathrm{Id} + \gamma \Phi\Phi^*)^{-1}$.

Composition by a tight frame ($A \circ A^* = \mathrm{Id}$): $\operatorname{Prox}_{G \circ A}(x) = A^* \circ \operatorname{Prox}_G \circ A + \mathrm{Id} - A^* \circ A$.

Indicators: $G(x) = \iota_C(x)$ $\implies$ $\operatorname{Prox}_{\gamma G}(x) = \operatorname{Proj}_C(x) = \operatorname*{argmin}_{z \in C} \|x - z\|$.
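A minimal sketch of the quadratic case (assumption: NumPy, small dense $\Phi$ so the linear system can be solved directly; names are illustrative), with an optimality check:

```python
import numpy as np

def prox_quadratic(x, Phi, y, gamma):
    """Prox of G(z) = (1/2)||Phi z - y||^2 with parameter gamma:
    solve (Id + gamma Phi^T Phi) z = x + gamma Phi^T y."""
    N = Phi.shape[1]
    return np.linalg.solve(np.eye(N) + gamma * Phi.T @ Phi, x + gamma * Phi.T @ y)

rng = np.random.default_rng(5)
Phi = rng.standard_normal((20, 50))
y, x = rng.standard_normal(20), rng.standard_normal(50)
z = prox_quadratic(x, Phi, y, gamma=0.5)
# Optimality: (z - x) + gamma * Phi^T (Phi z - y) should be ~0.
print(np.linalg.norm(z - x + 0.5 * Phi.T @ (Phi @ z - y)))
```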

Gradient and Proximal Descents

Gradient descent ($G$ is $C^1$ and $\nabla G$ is $L$-Lipschitz) [explicit]:
$x^{(\ell+1)} = x^{(\ell)} - \tau_\ell \nabla G(x^{(\ell)})$
Theorem: if $0 < \tau_\ell < 2/L$, then $x^{(\ell)} \to x^\star$, a solution.

Sub-gradient descent:
$x^{(\ell+1)} = x^{(\ell)} - \tau_\ell v^{(\ell)}$, $v^{(\ell)} \in \partial G(x^{(\ell)})$
Theorem: if $\tau_\ell \sim 1/\ell$, then $x^{(\ell)} \to x^\star$, a solution. $\Longrightarrow$ Problem: slow.

Proximal-point algorithm [implicit]:
$x^{(\ell+1)} = \operatorname{Prox}_{\tau_\ell G}(x^{(\ell)})$
Theorem: if $\tau_\ell \geq c > 0$, then $x^{(\ell)} \to x^\star$, a solution. $\Longrightarrow$ Problem: $\operatorname{Prox}_{\gamma G}$ is hard to compute.

Proximal Splitting Methods

Solve $\min_{x \in \mathcal{H}} E(x)$; problem: $\operatorname{Prox}_{\gamma E}$ is not available.

Splitting: $E(x) = F(x) + \sum_i G_i(x)$ with $F$ smooth and each $G_i$ simple (its prox is available).

Iterative algorithms using only $\nabla F(x)$ and $\operatorname{Prox}_{\gamma G_i}(x)$:
Forward-Backward: solves $F + G$.
Douglas-Rachford: solves $\sum_i G_i$.
Generalized FB: solves $F + \sum_i G_i$.
Primal-Dual: handles $G_i \circ A$.

Smooth + Simple Splitting

Model: $f_0 = \Psi x_0$ sparse in a dictionary $\Psi$; inverse problem $y = K f_0 + w$, $K : \mathbb{R}^N \to \mathbb{R}^P$, $P \ll N$.

Data fidelity: $F(x) = \tfrac{1}{2}\|y - \Phi x\|^2$ with $\Phi = K\Psi$ (smooth).

Regularization: $G(x) = \|x\|_1 = \sum_i |x_i|$ (simple).

Sparse recovery: $f^\star = \Psi x^\star$ where $x^\star$ solves $\min_{x \in \mathbb{R}^N} F(x) + G(x)$.

Forward-Backward

$x^\star \in \operatorname*{argmin}_x F(x) + G(x) \quad (\star)
\iff 0 \in \nabla F(x^\star) + \partial G(x^\star)
\iff (x^\star - \gamma \nabla F(x^\star)) \in x^\star + \gamma\, \partial G(x^\star)$

Fixed point equation: $x^\star = \operatorname{Prox}_{\gamma G}(x^\star - \gamma \nabla F(x^\star))$.

Forward-backward iteration: $x^{(\ell+1)} = \operatorname{Prox}_{\gamma G}\!\left(x^{(\ell)} - \gamma \nabla F(x^{(\ell)})\right)$.

Special case $G = \iota_C$: projected gradient descent.

Theorem: let $\nabla F$ be $L$-Lipschitz. If $\gamma < 2/L$, then $x^{(\ell)} \to x^\star$, a solution of $(\star)$.

Example: L1 Regularization

$\min_x \tfrac{1}{2}\|\Phi x - y\|^2 + \lambda \|x\|_1 \;\equiv\; \min_x F(x) + G(x)$ with
$F(x) = \tfrac{1}{2}\|\Phi x - y\|^2$, $\nabla F(x) = \Phi^*(\Phi x - y)$, $L = \|\Phi^*\Phi\|$,
$G(x) = \lambda \|x\|_1$, $\operatorname{Prox}_{\gamma G}(x)_i = \max\!\left(0,\ 1 - \dfrac{\lambda\gamma}{|x_i|}\right) x_i$.

$\Longrightarrow$ Forward-backward $\;\Longleftrightarrow\;$ iterative soft thresholding.
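A minimal ISTA sketch of the forward-backward iteration above (assumption: NumPy, dense $\Phi$; the step size $\gamma = 1/L$ and all test sizes are illustrative choices, not from the slides):

```python
import numpy as np

def ista(Phi, y, lam, n_iter=500):
    """Iterative soft thresholding for min 1/2||Phi x - y||^2 + lam ||x||_1."""
    L = np.linalg.norm(Phi, 2) ** 2          # Lipschitz constant of the gradient
    gamma = 1.0 / L                           # step size gamma < 2/L
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ x - y)          # forward (gradient) step
        z = x - gamma * grad
        x = np.sign(z) * np.maximum(np.abs(z) - gamma * lam, 0.0)  # backward (prox) step
    return x

rng = np.random.default_rng(6)
P, N, k = 100, 400, 10
Phi = rng.standard_normal((P, N)) / np.sqrt(P)
x0 = np.zeros(N); x0[rng.choice(N, k, replace=False)] = rng.standard_normal(k)
y = Phi @ x0 + 0.01 * rng.standard_normal(P)
x_hat = ista(Phi, y, lam=0.02)
print("recovery error:", np.linalg.norm(x_hat - x0))
```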

Douglas-Rachford Scheme

$\min_x G_1(x) + G_2(x) \quad (\star)$, with $G_1$ and $G_2$ both simple.

Reflected prox: $\operatorname{RProx}_{\gamma G}(x) = 2\operatorname{Prox}_{\gamma G}(x) - x$.

Douglas-Rachford iterations:
$z^{(\ell+1)} = \left(1 - \tfrac{\alpha}{2}\right) z^{(\ell)} + \tfrac{\alpha}{2}\, \operatorname{RProx}_{\gamma G_2}\!\big(\operatorname{RProx}_{\gamma G_1}(z^{(\ell)})\big), \qquad
x^{(\ell+1)} = \operatorname{Prox}_{\gamma G_1}(z^{(\ell+1)})$

Theorem: if $0 < \alpha < 2$ and $\gamma > 0$, then $x^{(\ell)} \to x^\star$, a solution of $(\star)$.

Example: Constrained L1

$\min_{\Phi x = y} \|x\|_1 \;\Longleftrightarrow\; \min_x G_1(x) + G_2(x)$ with

$G_1(x) = \iota_C(x)$, $C = \{x \,;\, \Phi x = y\}$, $\operatorname{Prox}_{\gamma G_1}(x) = \operatorname{Proj}_C(x) = x + \Phi^*(\Phi\Phi^*)^{-1}(y - \Phi x)$;

$G_2(x) = \|x\|_1$, $\operatorname{Prox}_{\gamma G_2}(x) = \left( \max\!\left(0,\ 1 - \dfrac{\gamma}{|x_i|}\right) x_i \right)_i$.

$\Longrightarrow$ efficient if $\Phi\Phi^*$ is easy to invert.

Example (compressed sensing): $\Phi \in \mathbb{R}^{100 \times 400}$ Gaussian matrix, $\|x_0\|_0 = 17$, $y = \Phi x_0$; convergence of $\log_{10}(\|x^{(\ell)}\|_1 - \|x^\star\|_1)$ for $\gamma = 0.01, 1, 10$.
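A minimal Douglas-Rachford sketch for this constrained $\ell^1$ problem (assumptions: NumPy, dense $\Phi$ so that $\Phi\Phi^*$ can be inverted directly; the parameter values and function names are illustrative):

```python
import numpy as np

def soft_threshold(x, gamma):
    return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)

def dr_basis_pursuit(Phi, y, gamma=1.0, alpha=1.0, n_iter=500):
    """Douglas-Rachford for min ||x||_1 s.t. Phi x = y (G1 = constraint indicator, G2 = ||.||_1)."""
    pinv = Phi.T @ np.linalg.inv(Phi @ Phi.T)      # Phi^* (Phi Phi^*)^{-1}
    proj = lambda x: x + pinv @ (y - Phi @ x)       # Prox of G1: projection on {Phi x = y}
    rprox1 = lambda x: 2 * proj(x) - x
    rprox2 = lambda x: 2 * soft_threshold(x, gamma) - x
    z = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        z = (1 - alpha / 2) * z + (alpha / 2) * rprox2(rprox1(z))
    return proj(z)                                  # output x = Prox_{gamma G1}(z)

rng = np.random.default_rng(7)
P, N, k = 100, 400, 17
Phi = rng.standard_normal((P, N)) / np.sqrt(P)
x0 = np.zeros(N); x0[rng.choice(N, k, replace=False)] = rng.standard_normal(k)
x_hat = dr_basis_pursuit(Phi, Phi @ x0)
print("recovery error:", np.linalg.norm(x_hat - x0))
```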

More than 2 Functionals

$\min_x G_1(x) + \ldots + G_k(x)$, each $G_i$ simple
$\;\Longleftrightarrow\; \min_{(x_1, \ldots, x_k)} G(x_1, \ldots, x_k) + \iota_C(x_1, \ldots, x_k)$

with $G(x_1, \ldots, x_k) = G_1(x_1) + \ldots + G_k(x_k)$ and $C = \{ (x_1, \ldots, x_k) \in \mathcal{H}^k \,;\, x_1 = \ldots = x_k \}$.

$G$ and $\iota_C$ are simple:
$\operatorname{Prox}_{\gamma G}(x_1, \ldots, x_k) = (\operatorname{Prox}_{\gamma G_i}(x_i))_i$,
$\operatorname{Prox}_{\gamma \iota_C}(x_1, \ldots, x_k) = (\bar x, \ldots, \bar x)$ where $\bar x = \tfrac{1}{k} \sum_i x_i$.

Auxiliary Variables

$\min_x G_1(x) + G_2 \circ A(x)$, with $G_1, G_2$ simple and $A : \mathcal{H} \to \mathcal{E}$ a linear map
$\;\Longleftrightarrow\; \min_{z \in \mathcal{H} \times \mathcal{E}} G(z) + \iota_C(z)$, with $G(x, y) = G_1(x) + G_2(y)$ and $C = \{(x, y) \in \mathcal{H} \times \mathcal{E} \,;\, Ax = y\}$.

$\operatorname{Prox}_{\gamma G}(x, y) = (\operatorname{Prox}_{\gamma G_1}(x), \operatorname{Prox}_{\gamma G_2}(y))$.

$\operatorname{Prox}_{\iota_C}(x, y) = (x - A^* \hat y,\ y + \hat y) = (\tilde x, A \tilde x)$, where
$\hat y = (\mathrm{Id} + AA^*)^{-1}(Ax - y)$ and $\tilde x = (\mathrm{Id} + A^*A)^{-1}(A^* y + x)$.

$\Longrightarrow$ efficient if $\mathrm{Id} + AA^*$ or $\mathrm{Id} + A^*A$ is easy to invert.

Example: TV Regularization

Compute the solution of $\min_f \tfrac{1}{2}\|Kf - y\|^2 + \lambda \|\nabla f\|_1$, where $\|u\|_1 = \sum_i \|u_i\|$ sums the norms of the gradient vectors.

Rewrite it with an auxiliary variable $u = \nabla f$:

$G_1(u) = \|u\|_1$, $\operatorname{Prox}_{\gamma G_1}(u)_i = \max\!\left(0,\ 1 - \dfrac{\gamma}{\|u_i\|}\right) u_i$;

$G_2(f) = \tfrac{1}{2}\|Kf - y\|^2$, $\operatorname{Prox}_{\gamma G_2}(f) = (\mathrm{Id} + \gamma K^*K)^{-1}(f + \gamma K^* y)$;

$C = \{ (f, u) \in \mathbb{R}^N \times \mathbb{R}^{N \times 2} \,;\, u = \nabla f \}$, $\operatorname{Prox}_{\iota_C}(f, u) = (\tilde f, \nabla \tilde f)$ with $(\mathrm{Id} + \Delta)\tilde f = -\operatorname{div}(u) + f$, where $\Delta = \nabla^*\nabla$.

$\Longrightarrow$ $O(N \log N)$ operations using the FFT.

Illustration: original $f_0$, measurements $y = K f_0 + w$, recovery $f^\star$, and the decay of the objective along the iterations.
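A minimal sketch (assumptions: NumPy, periodic boundary conditions, illustrative function names) of two building blocks used above: a discrete gradient/divergence pair and the vectorial soft-thresholding prox of $\|u\|_1 = \sum_i \|u_i\|$.

```python
import numpy as np

def grad(f):
    """Forward-difference gradient of an image, shape (n, n, 2)."""
    gx = np.roll(f, -1, axis=0) - f
    gy = np.roll(f, -1, axis=1) - f
    return np.stack([gx, gy], axis=-1)

def div(u):
    """Discrete divergence with backward differences, so that grad^* = -div."""
    dx = u[..., 0] - np.roll(u[..., 0], 1, axis=0)
    dy = u[..., 1] - np.roll(u[..., 1], 1, axis=1)
    return dx + dy

def prox_tv_term(u, gamma):
    """Vectorial soft thresholding: prox of gamma * sum_i ||u_i|| on the gradient field u."""
    norm = np.maximum(np.sqrt(np.sum(u ** 2, axis=-1, keepdims=True)), 1e-12)
    return np.maximum(0.0, 1.0 - gamma / norm) * u

f = np.random.default_rng(8).standard_normal((64, 64))
u = prox_tv_term(grad(f), gamma=0.5)   # shrinks small gradient vectors to zero
print(u.shape, np.mean(np.all(u == 0, axis=-1)))
```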

Conclusion

Sparsity: approximate signals with few atoms of a dictionary.

Compressed sensing ideas:
$\Longrightarrow$ CS is about designing new hardware.
$\Longrightarrow$ Randomized sensors + sparse recovery.
$\Longrightarrow$ Number of measurements $\sim$ signal complexity.

The devil is in the constants:
$\Longrightarrow$ Worst-case analysis is problematic.
$\Longrightarrow$ Designing good signal models.

dictionary

ConclusionSparsity: approximate signals with few atoms.

Dictionary learning:

learning

Some Hot Topics

MAIRAL et al.: SPARSE REPRESENTATION FOR COLOR IMAGE RESTORATION 57

Fig. 2. Dictionaries with 256 atoms learned on a generic database of natural images, with two different sizes of patches. Note the large number of color-less atoms.Since the atoms can have negative values, the vectors are presented scaled and shifted to the [0,255] range per channel: (a) 5 5 3 patches; (b) 8 8 3 patches.

Fig. 3. Examples of color artifacts while reconstructing a damaged version of the image (a) without the improvement here proposed ( in the new metric).Color artifacts are reduced with our proposed technique ( in our proposed new metric). Both images have been denoised with the same global dictionary.In (b), one observes a bias effect in the color from the castle and in some part of the water. What is more, the color of the sky is piecewise constant when(false contours), which is another artifact our approach corrected. (a) Original. (b) Original algorithm, dB. (c) Proposed algorithm,

dB.

Fig. 4. (a) Training Image; (b) resulting dictionary; (b) is the dictionary learned in the image in (a). The dictionary is more colored than the global one.

MAIRAL et al.: SPARSE REPRESENTATION FOR COLOR IMAGE RESTORATION 57

Fig. 2. Dictionaries with 256 atoms learned on a generic database of natural images, with two different sizes of patches. Note the large number of color-less atoms.Since the atoms can have negative values, the vectors are presented scaled and shifted to the [0,255] range per channel: (a) 5 5 3 patches; (b) 8 8 3 patches.

Fig. 3. Examples of color artifacts while reconstructing a damaged version of the image (a) without the improvement here proposed ( in the new metric).Color artifacts are reduced with our proposed technique ( in our proposed new metric). Both images have been denoised with the same global dictionary.In (b), one observes a bias effect in the color from the castle and in some part of the water. What is more, the color of the sky is piecewise constant when(false contours), which is another artifact our approach corrected. (a) Original. (b) Original algorithm, dB. (c) Proposed algorithm,

dB.

Fig. 4. (a) Training Image; (b) resulting dictionary; (b) is the dictionary learned in the image in (a). The dictionary is more colored than the global one.

MAIRAL et al.: SPARSE REPRESENTATION FOR COLOR IMAGE RESTORATION 57

Fig. 2. Dictionaries with 256 atoms learned on a generic database of natural images, with two different sizes of patches. Note the large number of color-less atoms.Since the atoms can have negative values, the vectors are presented scaled and shifted to the [0,255] range per channel: (a) 5 5 3 patches; (b) 8 8 3 patches.

Fig. 3. Examples of color artifacts while reconstructing a damaged version of the image (a) without the improvement here proposed ( in the new metric).Color artifacts are reduced with our proposed technique ( in our proposed new metric). Both images have been denoised with the same global dictionary.In (b), one observes a bias effect in the color from the castle and in some part of the water. What is more, the color of the sky is piecewise constant when(false contours), which is another artifact our approach corrected. (a) Original. (b) Original algorithm, dB. (c) Proposed algorithm,

dB.

Fig. 4. (a) Training Image; (b) resulting dictionary; (b) is the dictionary learned in the image in (a). The dictionary is more colored than the global one.

MA

IRA

Letal.:SPA

RSE

RE

PRE

SEN

TAT

ION

FOR

CO

LO

RIM

AG

ER

EST

OR

AT

ION

61

Fig.7.D

atasetused

forevaluating

denoisingexperim

ents.

TAB

LE

IPSN

RR

ESU

LTS

OF

OU

RD

EN

OISIN

GA

LG

OR

ITH

MW

ITH

256A

TO

MS

OF

SIZ

E7

73

FOR

AN

D6

63

FOR

.EA

CH

CA

SEIS

DIV

IDE

DIN

FO

UR

PA

RT

S:TH

ET

OP-L

EFT

RE

SULT

SA

RE

TH

OSE

GIV

EN

BY

MCA

UL

EY

AN

DA

L[28]W

ITH

TH

EIR

“33

MO

DE

L.”T

HE

TO

P-RIG

HT

RE

SULT

SA

RE

TH

OSE

OB

TAIN

ED

BY

APPLY

ING

TH

EG

RA

YSC

AL

EK

-SVD

AL

GO

RIT

HM

[2]O

NE

AC

HC

HA

NN

EL

SE

PAR

AT

ELY

WIT

H8

8A

TO

MS.T

HE

BO

TT

OM

-LE

FTA

RE

OU

RR

ESU

LTS

OB

TAIN

ED

WIT

HA

GL

OB

AL

LYT

RA

INE

DD

ICT

ION

AR

Y.TH

EB

OT

TO

M-R

IGH

TA

RE

TH

EIM

PRO

VE

ME

NT

SO

BTA

INE

DW

ITH

TH

EA

DA

PTIV

EA

PPRO

AC

HW

ITH

20IT

ER

AT

ION

S.B

OL

DIN

DIC

AT

ES

TH

EB

EST

RE

SULT

SFO

RE

AC

HG

RO

UP.

AS

CA

NB

ESE

EN,

OU

RP

RO

POSE

DT

EC

HN

IQU

EC

ON

SISTE

NT

LYP

RO

DU

CE

ST

HE

BE

STR

ESU

LTS

TAB

LE

IIC

OM

PAR

ISON

OF

TH

EPSN

RR

ESU

LTS

ON

TH

EIM

AG

E“C

AST

LE”

BE

TW

EE

N[28]

AN

DW

HA

TW

EO

BTA

INE

DW

ITH

2566

63

AN

D7

73

PA

TC

HE

S.F

OR

TH

EA

DA

PTIV

EA

PPRO

AC

H,20IT

ER

AT

ION

SH

AV

EB

EE

NP

ER

FOR

ME

D.BO

LD

IND

ICA

TE

ST

HE

BE

STR

ESU

LT,IN

DIC

AT

ING

ON

CE

AG

AIN

TH

EC

ON

SISTE

NT

IMPR

OV

EM

EN

TO

BTA

INE

DW

ITH

OU

RP

RO

POSE

DT

EC

HN

IQU

E

patch),inorderto

preventanylearning

ofthese

artifacts(over-

fitting).W

edefine

thenthe

patchsparsity

ofthe

decompo-

sitionas

thisnum

berof

steps.The

stoppingcriteria

in(2)

be-com

esthe

number

ofatom

sused

insteadof

thereconstruction

error.Using

asm

allduring

theO

MP

permits

tolearn

adic-

tionaryspecialized

inproviding

acoarse

approximation.

Our

assumption

isthat

(pattern)artifacts

areless

presentin

coarseapproxim

ations,preventingthe

dictionaryfrom

learningthem

.W

epropose

thenthe

algorithmdescribed

inFig.6.W

etypically

usedto

preventthe

learningof

artifactsand

foundout

thattwo

outeriterationsin

theschem

ein

Fig.6are

sufficienttogive

satisfactoryresults,w

hilew

ithinthe

K-SV

D,10–20

itera-tions

arerequired.

Toconclude,in

ordertoaddressthe

demosaicing

problem,w

euse

them

odifiedK

-SVD

algorithmthatdeals

with

nonuniformnoise,as

describedin

previoussection,and

addto

itanadaptive

dictionarythathas

beenlearned

with

lowpatch

sparsityin

orderto

avoidover-fitting

them

osaicpattern.T

hesam

etechnique

canbe

appliedto

genericcolor

inpaintingas

demonstrated

inthe

nextsection.

V.

EX

PER

IME

NTA

LR

ESU

LTS

We

arenow

readyto

presentthe

colorim

agedenoising,in-

painting,anddem

osaicingresultsthatare

obtainedw

iththe

pro-posed

framew

ork.

A.

Denoising

Color

Images

The

state-of-the-artperform

anceof

thealgorithm

ongrayscale

images

hasalready

beenstudied

in[2].

We

nowevaluate

ourextension

forcolor

images.

We

trainedsom

edictionaries

with

differentsizesof

atoms

55

3,66

3,7

73

and8

83,

on200

000patches

takenfrom

adatabase

of15

000im

agesw

iththe

patch-sparsityparam

eter(six

atoms

inthe

representations).We

usedthe

databaseL

abelMe

[55]to

buildour

image

database.T

henw

etrained

eachdictionary

with

600iterations.

This

providedus

aset

ofgeneric

dictionariesthat

we

usedas

initialdictionaries

inour

denoisingalgorithm

.C

omparing

theresults

obtainedw

iththe

globalapproach

andthe

adaptiveone

permits

usto

seethe

improvem

entsin

thelearning

process.W

echose

toevaluate

MAIRAL et al.: SPARSE REPRESENTATION FOR COLOR IMAGE RESTORATION 61

Fig. 7. Data set used for evaluating denoising experiments.

TABLE IPSNR RESULTS OF OUR DENOISING ALGORITHM WITH 256 ATOMS OF SIZE 7 7 3 FOR AND 6 6 3 FOR . EACH CASE IS DIVIDED IN FOURPARTS: THE TOP-LEFT RESULTS ARE THOSE GIVEN BY MCAULEY AND AL [28] WITH THEIR “3 3 MODEL.” THE TOP-RIGHT RESULTS ARE THOSE OBTAINED BY

APPLYING THE GRAYSCALE K-SVD ALGORITHM [2] ON EACH CHANNEL SEPARATELY WITH 8 8 ATOMS. THE BOTTOM-LEFT ARE OUR RESULTS OBTAINEDWITH A GLOBALLY TRAINED DICTIONARY. THE BOTTOM-RIGHT ARE THE IMPROVEMENTS OBTAINED WITH THE ADAPTIVE APPROACH WITH 20 ITERATIONS.

BOLD INDICATES THE BEST RESULTS FOR EACH GROUP. AS CAN BE SEEN, OUR PROPOSED TECHNIQUE CONSISTENTLY PRODUCES THE BEST RESULTS

TABLE IICOMPARISON OF THE PSNR RESULTS ON THE IMAGE “CASTLE” BETWEEN [28] AND WHAT WE OBTAINED WITH 256 6 6 3 AND 7 7 3 PATCHES.

FOR THE ADAPTIVE APPROACH, 20 ITERATIONS HAVE BEEN PERFORMED. BOLD INDICATES THE BEST RESULT, INDICATING ONCEAGAIN THE CONSISTENT IMPROVEMENT OBTAINED WITH OUR PROPOSED TECHNIQUE

patch), in order to prevent any learning of these artifacts (over-fitting). We define then the patch sparsity of the decompo-sition as this number of steps. The stopping criteria in (2) be-comes the number of atoms used instead of the reconstructionerror. Using a small during the OMP permits to learn a dic-tionary specialized in providing a coarse approximation. Ourassumption is that (pattern) artifacts are less present in coarseapproximations, preventing the dictionary from learning them.We propose then the algorithm described in Fig. 6. We typicallyused to prevent the learning of artifacts and found outthat two outer iterations in the scheme in Fig. 6 are sufficient togive satisfactory results, while within the K-SVD, 10–20 itera-tions are required.

To conclude, in order to address the demosaicing problem, weuse the modified K-SVD algorithm that deals with nonuniformnoise, as described in previous section, and add to it an adaptivedictionary that has been learned with low patch sparsity in orderto avoid over-fitting the mosaic pattern. The same technique canbe applied to generic color inpainting as demonstrated in thenext section.

V. EXPERIMENTAL RESULTS

We are now ready to present the color image denoising, in-painting, and demosaicing results that are obtained with the pro-posed framework.

A. Denoising Color Images

The state-of-the-art performance of the algorithm ongrayscale images has already been studied in [2]. We nowevaluate our extension for color images. We trained somedictionaries with different sizes of atoms 5 5 3, 6 6 3,7 7 3 and 8 8 3, on 200 000 patches taken from adatabase of 15 000 images with the patch-sparsity parameter

(six atoms in the representations). We used the databaseLabelMe [55] to build our image database. Then we trainedeach dictionary with 600 iterations. This provided us a set ofgeneric dictionaries that we used as initial dictionaries in ourdenoising algorithm. Comparing the results obtained with theglobal approach and the adaptive one permits us to see theimprovements in the learning process. We chose to evaluate

Dictionary learning:

Analysis vs. synthesis:

learning

Js(f) = minf=�x

||x||1

Some Hot Topics

MAIRAL et al.: SPARSE REPRESENTATION FOR COLOR IMAGE RESTORATION 57

Fig. 2. Dictionaries with 256 atoms learned on a generic database of natural images, with two different sizes of patches. Note the large number of color-less atoms.Since the atoms can have negative values, the vectors are presented scaled and shifted to the [0,255] range per channel: (a) 5 5 3 patches; (b) 8 8 3 patches.

Fig. 3. Examples of color artifacts while reconstructing a damaged version of the image (a) without the improvement here proposed ( in the new metric).Color artifacts are reduced with our proposed technique ( in our proposed new metric). Both images have been denoised with the same global dictionary.In (b), one observes a bias effect in the color from the castle and in some part of the water. What is more, the color of the sky is piecewise constant when(false contours), which is another artifact our approach corrected. (a) Original. (b) Original algorithm, dB. (c) Proposed algorithm,

dB.

Fig. 4. (a) Training Image; (b) resulting dictionary; (b) is the dictionary learned in the image in (a). The dictionary is more colored than the global one.

MAIRAL et al.: SPARSE REPRESENTATION FOR COLOR IMAGE RESTORATION 57

Fig. 2. Dictionaries with 256 atoms learned on a generic database of natural images, with two different sizes of patches. Note the large number of color-less atoms.Since the atoms can have negative values, the vectors are presented scaled and shifted to the [0,255] range per channel: (a) 5 5 3 patches; (b) 8 8 3 patches.

Fig. 3. Examples of color artifacts while reconstructing a damaged version of the image (a) without the improvement here proposed ( in the new metric).Color artifacts are reduced with our proposed technique ( in our proposed new metric). Both images have been denoised with the same global dictionary.In (b), one observes a bias effect in the color from the castle and in some part of the water. What is more, the color of the sky is piecewise constant when(false contours), which is another artifact our approach corrected. (a) Original. (b) Original algorithm, dB. (c) Proposed algorithm,

dB.

Fig. 4. (a) Training Image; (b) resulting dictionary; (b) is the dictionary learned in the image in (a). The dictionary is more colored than the global one.

MAIRAL et al.: SPARSE REPRESENTATION FOR COLOR IMAGE RESTORATION 57

Fig. 2. Dictionaries with 256 atoms learned on a generic database of natural images, with two different sizes of patches. Note the large number of color-less atoms.Since the atoms can have negative values, the vectors are presented scaled and shifted to the [0,255] range per channel: (a) 5 5 3 patches; (b) 8 8 3 patches.

Fig. 3. Examples of color artifacts while reconstructing a damaged version of the image (a) without the improvement here proposed ( in the new metric).Color artifacts are reduced with our proposed technique ( in our proposed new metric). Both images have been denoised with the same global dictionary.In (b), one observes a bias effect in the color from the castle and in some part of the water. What is more, the color of the sky is piecewise constant when(false contours), which is another artifact our approach corrected. (a) Original. (b) Original algorithm, dB. (c) Proposed algorithm,

dB.

Fig. 4. (a) Training Image; (b) resulting dictionary; (b) is the dictionary learned in the image in (a). The dictionary is more colored than the global one.

MA

IRA

Letal.:SPA

RSE

RE

PRE

SEN

TAT

ION

FOR

CO

LO

RIM

AG

ER

EST

OR

AT

ION

61

Fig.7.D

atasetused

forevaluating

denoisingexperim

ents.

TAB

LE

IPSN

RR

ESU

LTS

OF

OU

RD

EN

OISIN

GA

LG

OR

ITH

MW

ITH

256A

TO

MS

OF

SIZ

E7

73

FOR

AN

D6

63

FOR

.EA

CH

CA

SEIS

DIV

IDE

DIN

FO

UR

PA

RT

S:TH

ET

OP-L

EFT

RE

SULT

SA

RE

TH

OSE

GIV

EN

BY

MCA

UL

EY

AN

DA

L[28]W

ITH

TH

EIR

“33

MO

DE

L.”T

HE

TO

P-RIG

HT

RE

SULT

SA

RE

TH

OSE

OB

TAIN

ED

BY

APPLY

ING

TH

EG

RA

YSC

AL

EK

-SVD

AL

GO

RIT

HM

[2]O

NE

AC

HC

HA

NN

EL

SE

PAR

AT

ELY

WIT

H8

8A

TO

MS.T

HE

BO

TT

OM

-LE

FTA

RE

OU

RR

ESU

LTS

OB

TAIN

ED

WIT

HA

GL

OB

AL

LYT

RA

INE

DD

ICT

ION

AR

Y.TH

EB

OT

TO

M-R

IGH

TA

RE

TH

EIM

PRO

VE

ME

NT

SO

BTA

INE

DW

ITH

TH

EA

DA

PTIV

EA

PPRO

AC

HW

ITH

20IT

ER

AT

ION

S.B

OL

DIN

DIC

AT

ES

TH

EB

EST

RE

SULT

SFO

RE

AC

HG

RO

UP.

AS

CA

NB

ESE

EN,

OU

RP

RO

POSE

DT

EC

HN

IQU

EC

ON

SISTE

NT

LYP

RO

DU

CE

ST

HE

BE

STR

ESU

LTS

TAB

LE

IIC

OM

PAR

ISON

OF

TH

EPSN

RR

ESU

LTS

ON

TH

EIM

AG

E“C

AST

LE”

BE

TW

EE

N[28]

AN

DW

HA

TW

EO

BTA

INE

DW

ITH

2566

63

AN

D7

73

PA

TC

HE

S.F

OR

TH

EA

DA

PTIV

EA

PPRO

AC

H,20IT

ER

AT

ION

SH

AV

EB

EE

NP

ER

FOR

ME

D.BO

LD

IND

ICA

TE

ST

HE

BE

STR

ESU

LT,IN

DIC

AT

ING

ON

CE

AG

AIN

TH

EC

ON

SISTE

NT

IMPR

OV

EM

EN

TO

BTA

INE

DW

ITH

OU

RP

RO

POSE

DT

EC

HN

IQU

E

patch),inorderto

preventanylearning

ofthese

artifacts(over-

fitting).W

edefine

thenthe

patchsparsity

ofthe

decompo-

sitionas

thisnum

berof

steps.The

stoppingcriteria

in(2)

be-com

esthe

number

ofatom

sused

insteadof

thereconstruction

error.Using

asm

allduring

theO

MP

permits

tolearn

adic-

tionaryspecialized

inproviding

acoarse

approximation.

Our

assumption

isthat

(pattern)artifacts

areless

presentin

coarseapproxim

ations,preventingthe

dictionaryfrom

learningthem

.W

epropose

thenthe

algorithmdescribed

inFig.6.W

etypically

usedto

preventthe

learningof

artifactsand

foundout

thattwo

outeriterationsin

theschem

ein

Fig.6are

sufficienttogive

satisfactoryresults,w

hilew

ithinthe

K-SV

D,10–20

itera-tions

arerequired.

Toconclude,in

ordertoaddressthe

demosaicing

problem,w

euse

them

odifiedK

-SVD

algorithmthatdeals

with

nonuniformnoise,as

describedin

previoussection,and

addto

itanadaptive

dictionarythathas

beenlearned

with

lowpatch

sparsityin

orderto

avoidover-fitting

them

osaicpattern.T

hesam

etechnique

canbe

appliedto

genericcolor

inpaintingas

demonstrated

inthe

nextsection.

V.

EX

PER

IME

NTA

LR

ESU

LTS

We

arenow

readyto

presentthe

colorim

agedenoising,in-

painting,anddem

osaicingresultsthatare

obtainedw

iththe

pro-posed

framew

ork.

A.

Denoising

Color

Images

The

state-of-the-artperform

anceof

thealgorithm

ongrayscale

images

hasalready

beenstudied

in[2].

We

nowevaluate

ourextension

forcolor

images.

We

trainedsom

edictionaries

with

differentsizesof

atoms

55

3,66

3,7

73

and8

83,

on200

000patches

takenfrom

adatabase

of15

000im

agesw

iththe

patch-sparsityparam

eter(six

atoms

inthe

representations).We

usedthe

databaseL

abelMe

[55]to

buildour

image

database.T

henw

etrained

eachdictionary

with

600iterations.

This

providedus

aset

ofgeneric

dictionariesthat

we

usedas

initialdictionaries

inour

denoisingalgorithm

.C

omparing

theresults

obtainedw

iththe

globalapproach

andthe

adaptiveone

permits

usto

seethe

improvem

entsin

thelearning

process.W

echose

toevaluate

MAIRAL et al.: SPARSE REPRESENTATION FOR COLOR IMAGE RESTORATION 61

Fig. 7. Data set used for evaluating denoising experiments.

TABLE IPSNR RESULTS OF OUR DENOISING ALGORITHM WITH 256 ATOMS OF SIZE 7 7 3 FOR AND 6 6 3 FOR . EACH CASE IS DIVIDED IN FOURPARTS: THE TOP-LEFT RESULTS ARE THOSE GIVEN BY MCAULEY AND AL [28] WITH THEIR “3 3 MODEL.” THE TOP-RIGHT RESULTS ARE THOSE OBTAINED BY

APPLYING THE GRAYSCALE K-SVD ALGORITHM [2] ON EACH CHANNEL SEPARATELY WITH 8 8 ATOMS. THE BOTTOM-LEFT ARE OUR RESULTS OBTAINEDWITH A GLOBALLY TRAINED DICTIONARY. THE BOTTOM-RIGHT ARE THE IMPROVEMENTS OBTAINED WITH THE ADAPTIVE APPROACH WITH 20 ITERATIONS.

BOLD INDICATES THE BEST RESULTS FOR EACH GROUP. AS CAN BE SEEN, OUR PROPOSED TECHNIQUE CONSISTENTLY PRODUCES THE BEST RESULTS

TABLE IICOMPARISON OF THE PSNR RESULTS ON THE IMAGE “CASTLE” BETWEEN [28] AND WHAT WE OBTAINED WITH 256 6 6 3 AND 7 7 3 PATCHES.

FOR THE ADAPTIVE APPROACH, 20 ITERATIONS HAVE BEEN PERFORMED. BOLD INDICATES THE BEST RESULT, INDICATING ONCEAGAIN THE CONSISTENT IMPROVEMENT OBTAINED WITH OUR PROPOSED TECHNIQUE

patch), in order to prevent any learning of these artifacts (over-fitting). We define then the patch sparsity of the decompo-sition as this number of steps. The stopping criteria in (2) be-comes the number of atoms used instead of the reconstructionerror. Using a small during the OMP permits to learn a dic-tionary specialized in providing a coarse approximation. Ourassumption is that (pattern) artifacts are less present in coarseapproximations, preventing the dictionary from learning them.We propose then the algorithm described in Fig. 6. We typicallyused to prevent the learning of artifacts and found outthat two outer iterations in the scheme in Fig. 6 are sufficient togive satisfactory results, while within the K-SVD, 10–20 itera-tions are required.

To conclude, in order to address the demosaicing problem, weuse the modified K-SVD algorithm that deals with nonuniformnoise, as described in previous section, and add to it an adaptivedictionary that has been learned with low patch sparsity in orderto avoid over-fitting the mosaic pattern. The same technique canbe applied to generic color inpainting as demonstrated in thenext section.

V. EXPERIMENTAL RESULTS

We are now ready to present the color image denoising, in-painting, and demosaicing results that are obtained with the pro-posed framework.

A. Denoising Color Images

The state-of-the-art performance of the algorithm ongrayscale images has already been studied in [2]. We nowevaluate our extension for color images. We trained somedictionaries with different sizes of atoms 5 5 3, 6 6 3,7 7 3 and 8 8 3, on 200 000 patches taken from adatabase of 15 000 images with the patch-sparsity parameter

(six atoms in the representations). We used the databaseLabelMe [55] to build our image database. Then we trainedeach dictionary with 600 iterations. This provided us a set ofgeneric dictionaries that we used as initial dictionaries in ourdenoising algorithm. Comparing the results obtained with theglobal approach and the adaptive one permits us to see theimprovements in the learning process. We chose to evaluate

Image f = �x

Coe�cients x

Dictionary learning:

Analysis vs. synthesis:

learning

Ja(f) = ||D�f ||1

Js(f) = minf=�x

||x||1

Some Hot Topics

[Excerpt: Mairal et al., "Sparse Representation for Color Image Restoration" (figure captions):]

Fig. 2. Dictionaries with 256 atoms learned on a generic database of natural images, with two different sizes of patches. Note the large number of colorless atoms. Since the atoms can have negative values, the vectors are presented scaled and shifted to the [0,255] range per channel: (a) 5×5×3 patches; (b) 8×8×3 patches.

Fig. 3. Examples of color artifacts when reconstructing a damaged version of the image (a) without the proposed improvement. Color artifacts are reduced with the proposed technique; both images have been denoised with the same global dictionary. In (b), one observes a bias effect in the color of the castle and in some parts of the water. What is more, the color of the sky is piecewise constant (false contours), which is another artifact the proposed approach corrects. (a) Original. (b) Original algorithm. (c) Proposed algorithm.

Fig. 4. (a) Training image; (b) resulting dictionary, learned on the image in (a). The dictionary is more colored than the global one.
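Dictionaries like those of Fig. 2 are trained on vectorized color patches: a w×w patch with three channels becomes a column of length 3w² (75 for 5×5×3). Here is a small sketch of how such a training matrix can be assembled; the image array img and all sizes are hypothetical.

    import numpy as np

    def extract_color_patches(img, w=5, n_patches=10000, seed=0):
        """Collect random w x w x 3 patches from a color image `img` (H, W, 3)
        and vectorize them into the columns of a (3*w*w, n_patches) matrix."""
        rng = np.random.default_rng(seed)
        H, W, _ = img.shape
        ys = rng.integers(0, H - w + 1, n_patches)            # random top-left corners
        xs = rng.integers(0, W - w + 1, n_patches)
        P = np.stack([img[y:y + w, x:x + w, :].ravel() for y, x in zip(ys, xs)], axis=1)
        return P - P.mean(axis=0, keepdims=True)              # remove each patch's mean

    # usage (hypothetical image): Y = extract_color_patches(img, w=5)
    # the columns of Y are then the training vectors for dictionary learning.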


Other sparse priors:
    |x1| + |x2|                        (ℓ1 norm: sparse coefficients)
    max(|x1|, |x2|)                    (ℓ∞ norm)
    |x1| + (x2^2 + x3^2)^(1/2)         (mixed ℓ1/ℓ2 norm: group sparsity)
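These penalties, and the block thresholding associated with the group norm, are straightforward to code. A minimal NumPy sketch with helper names of my own (indices are zero-based, so the grouping [[0], [1, 2]] matches the last formula above):

    import numpy as np

    def l1(x):                      # |x1| + |x2| + ... : sparse coefficients
        return np.abs(x).sum()

    def linf(x):                    # max_i |x_i| : the l-infinity norm
        return np.abs(x).max()

    def group_l1(x, groups):
        """Mixed l1/l2 norm sum_g ||x_g||_2, e.g. |x1| + sqrt(x2^2 + x3^2):
        promotes sparsity at the level of whole groups of coefficients."""
        return sum(np.linalg.norm(x[g]) for g in groups)

    def prox_group_l1(x, groups, lam):
        """Block soft-thresholding: proximal operator of lam * group_l1."""
        y = x.copy()
        for g in groups:
            n = np.linalg.norm(x[g])
            y[g] = 0.0 if n <= lam else (1 - lam / n) * x[g]
        return y

    x = np.array([0.5, 1.0, -2.0])
    groups = [[0], [1, 2]]
    print(l1(x), linf(x), group_l1(x, groups))   # 3.5, 2.0, 0.5 + sqrt(5)

The group prox zeroes a whole group when its ℓ2 norm falls below the threshold, which is exactly how such a prior enforces structured (group) sparsity.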

[Excerpt on atomic norms, from Chandrasekaran et al., "The Convex Geometry of Linear Inverse Problems":]

Figure 1: Unit balls of some atomic norms. In each figure, the set of atoms is graphed in red and the unit ball of the associated atomic norm is graphed in blue. In (a), the atoms are the unit-Euclidean-norm one-sparse vectors, and the atomic norm is the ℓ1 norm. In (b), the atoms are the 2×2 symmetric unit-Euclidean-norm rank-one matrices, and the atomic norm is the nuclear norm. In (c), the atoms are the vectors {−1,+1}², and the atomic norm is the ℓ∞ norm.

[...] natural procedure to go from the set of one-sparse vectors A to the ℓ1 norm? We observe that the convex hull of (unit-Euclidean-norm) one-sparse vectors is the unit ball of the ℓ1 norm, or the cross-polytope. Similarly the convex hull of the (unit-Euclidean-norm) rank-one matrices is the nuclear norm ball; see Figure 1 for illustrations. These constructions suggest a natural generalization to other settings. Under suitable conditions the convex hull conv(A) defines the unit ball of a norm, which is called the atomic norm induced by the atomic set A. We can then minimize the atomic norm subject to measurement constraints, which results in a convex programming heuristic for recovering simple models given linear measurements. As an example, suppose we wish to recover the sum of a few permutation matrices given linear measurements. The convex hull of the set of permutation matrices is the Birkhoff polytope of doubly stochastic matrices [73], and our proposal is to solve a convex program that minimizes the norm induced by this polytope. Similarly, if we wish to recover an orthogonal matrix from linear measurements, we would solve a spectral norm minimization problem, as the spectral norm ball is the convex hull of all orthogonal matrices. As discussed in Section 2.5, the atomic norm minimization problem is, in some sense, the best convex heuristic for recovering simple models with respect to a given atomic set.

We give general conditions for exact and robust recovery using the atomic norm heuristic. In Section 3 we provide concrete bounds on the number of generic linear measurements required for the atomic norm heuristic to succeed. This analysis is based on computing certain Gaussian widths of tangent cones with respect to the unit balls of the atomic norm [37]. Arguments based on Gaussian width have been fruitfully applied to obtain bounds on the number of Gaussian measurements for the special case of recovering sparse vectors via ℓ1 norm minimization [64, 67], but computing Gaussian widths of general cones is not easy. Therefore it is important to exploit the special structure in atomic norms, while still obtaining sufficiently general results that are broadly applicable. An important theme in this paper is the connection between Gaussian widths and various notions of symmetry. Specifically, by exploiting symmetry structure in certain atomic norms as well as convex duality properties, we give bounds on the number of measurements required for recovery using very general atomic norm heuristics. For example, we provide precise estimates of the number of generic measurements required for exact recovery of an orthogonal matrix via spectral norm minimization, and the number of generic measurements required for exact recovery of a permutation matrix by minimizing the norm induced by the Birkhoff polytope. While these results correspond [...]
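For the atomic set of one-sparse vectors, "minimize the atomic norm subject to measurement constraints" is ℓ1 minimization under linear equality constraints (basis pursuit). The sketch below solves a random instance with a Douglas-Rachford-type proximal iteration; it is a generic illustration with made-up sizes, not an algorithm taken from the excerpt.

    import numpy as np

    def soft_threshold(x, t):
        """Proximal operator of t * ||.||_1 (component-wise shrinkage)."""
        return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

    def basis_pursuit_dr(K, y, gamma=1.0, n_iter=500):
        """min ||x||_1  s.t.  K x = y, via Douglas-Rachford splitting
        (K is assumed to have full row rank)."""
        pinvK = K.T @ np.linalg.inv(K @ K.T)
        proj = lambda x: x + pinvK @ (y - K @ x)      # projection onto {K x = y}
        z = np.zeros(K.shape[1])
        for _ in range(n_iter):
            x = proj(z)
            z = z + soft_threshold(2 * x - z, gamma) - x
        return proj(z)

    rng = np.random.default_rng(1)
    N, P, s = 40, 20, 3                               # ambient dim, measurements, sparsity
    x0 = np.zeros(N)
    x0[rng.choice(N, s, replace=False)] = rng.standard_normal(s)
    K = rng.standard_normal((P, N))                   # generic Gaussian measurements
    x_rec = basis_pursuit_dr(K, K @ x0)
    print(np.linalg.norm(x_rec - x0))                 # small when l1 recovery succeeds

The same template covers the other atomic norms mentioned in the excerpt: replacing the soft-thresholding step by the proximal operator of the nuclear norm (singular-value thresholding) gives low-rank matrix recovery, and so on.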

3

(a) (b) (c)

Figure 1: Unit balls of some atomic norms: In each figure, the set of atoms is graphed in red andthe unit ball of the associated atomic norm is graphed in blue. In (a), the atoms are the unit-Euclidean-norm one-sparse vectors, and the atomic norm is the !1 norm. In (b), the atoms are the2!2 symmetric unit-Euclidean-norm rank-one matrices, and the atomic norm is the nuclear norm.In (c), the atoms are the vectors {"1,+1}2, and the atomic norm is the !! norm.

natural procedure to go from the set of one-sparse vectors A to the !1 norm? We observe thatthe convex hull of (unit-Euclidean-norm) one-sparse vectors is the unit ball of the !1 norm, or thecross-polytope. Similarly the convex hull of the (unit-Euclidean-norm) rank-one matrices is thenuclear norm ball; see Figure 1 for illustrations. These constructions suggest a natural generaliza-tion to other settings. Under suitable conditions the convex hull conv(A) defines the unit ball ofa norm, which is called the atomic norm induced by the atomic set A. We can then minimize theatomic norm subject to measurement constraints, which results in a convex programming heuristicfor recovering simple models given linear measurements. As an example suppose we wish to recoverthe sum of a few permutation matrices given linear measurements. The convex hull of the set ofpermutation matrices is the Birkho! polytope of doubly stochastic matrices [73], and our proposalis to solve a convex program that minimizes the norm induced by this polytope. Similarly if wewish to recover an orthogonal matrix from linear measurements we would solve a spectral normminimization problem, as the spectral norm ball is the convex hull of all orthogonal matrices. Asdiscussed in Section 2.5 the atomic norm minimization problem is, in some sense, the best convexheuristic for recovering simple models with respect to a given atomic set.

We give general conditions for exact and robust recovery using the atomic norm heuristic. InSection 3 we provide concrete bounds on the number of generic linear measurements required forthe atomic norm heuristic to succeed. This analysis is based on computing certain Gaussian widthsof tangent cones with respect to the unit balls of the atomic norm [37]. Arguments based on Gaus-sian width have been fruitfully applied to obtain bounds on the number of Gaussian measurementsfor the special case of recovering sparse vectors via !1 norm minimization [64, 67], but computingGaussian widths of general cones is not easy. Therefore it is important to exploit the special struc-ture in atomic norms, while still obtaining su!ciently general results that are broadly applicable.An important theme in this paper is the connection between Gaussian widths and various notionsof symmetry. Specifically by exploiting symmetry structure in certain atomic norms as well as con-vex duality properties, we give bounds on the number of measurements required for recovery usingvery general atomic norm heuristics. For example we provide precise estimates of the number ofgeneric measurements required for exact recovery of an orthogonal matrix via spectral norm min-imization, and the number of generic measurements required for exact recovery of a permutationmatrix by minimizing the norm induced by the Birkho" polytope. While these results correspond

3

Dictionary learning:

Analysis vs. synthesis:

learning

Ja(f) = ||D�f ||1

Js(f) = minf=�x

||x||1

|x1| + (x22 + x2

3)12

Some Hot Topics

MAIRAL et al.: SPARSE REPRESENTATION FOR COLOR IMAGE RESTORATION 57

Fig. 2. Dictionaries with 256 atoms learned on a generic database of natural images, with two different sizes of patches. Note the large number of color-less atoms.Since the atoms can have negative values, the vectors are presented scaled and shifted to the [0,255] range per channel: (a) 5 5 3 patches; (b) 8 8 3 patches.

Fig. 3. Examples of color artifacts while reconstructing a damaged version of the image (a) without the improvement here proposed ( in the new metric).Color artifacts are reduced with our proposed technique ( in our proposed new metric). Both images have been denoised with the same global dictionary.In (b), one observes a bias effect in the color from the castle and in some part of the water. What is more, the color of the sky is piecewise constant when(false contours), which is another artifact our approach corrected. (a) Original. (b) Original algorithm, dB. (c) Proposed algorithm,

dB.

Fig. 4. (a) Training Image; (b) resulting dictionary; (b) is the dictionary learned in the image in (a). The dictionary is more colored than the global one.

MAIRAL et al.: SPARSE REPRESENTATION FOR COLOR IMAGE RESTORATION 57

Fig. 2. Dictionaries with 256 atoms learned on a generic database of natural images, with two different sizes of patches. Note the large number of color-less atoms.Since the atoms can have negative values, the vectors are presented scaled and shifted to the [0,255] range per channel: (a) 5 5 3 patches; (b) 8 8 3 patches.

Fig. 3. Examples of color artifacts while reconstructing a damaged version of the image (a) without the improvement here proposed ( in the new metric).Color artifacts are reduced with our proposed technique ( in our proposed new metric). Both images have been denoised with the same global dictionary.In (b), one observes a bias effect in the color from the castle and in some part of the water. What is more, the color of the sky is piecewise constant when(false contours), which is another artifact our approach corrected. (a) Original. (b) Original algorithm, dB. (c) Proposed algorithm,

dB.

Fig. 4. (a) Training Image; (b) resulting dictionary; (b) is the dictionary learned in the image in (a). The dictionary is more colored than the global one.

MAIRAL et al.: SPARSE REPRESENTATION FOR COLOR IMAGE RESTORATION 57

Fig. 2. Dictionaries with 256 atoms learned on a generic database of natural images, with two different sizes of patches. Note the large number of color-less atoms.Since the atoms can have negative values, the vectors are presented scaled and shifted to the [0,255] range per channel: (a) 5 5 3 patches; (b) 8 8 3 patches.

Fig. 3. Examples of color artifacts while reconstructing a damaged version of the image (a) without the improvement here proposed ( in the new metric).Color artifacts are reduced with our proposed technique ( in our proposed new metric). Both images have been denoised with the same global dictionary.In (b), one observes a bias effect in the color from the castle and in some part of the water. What is more, the color of the sky is piecewise constant when(false contours), which is another artifact our approach corrected. (a) Original. (b) Original algorithm, dB. (c) Proposed algorithm,

dB.

Fig. 4. (a) Training Image; (b) resulting dictionary; (b) is the dictionary learned in the image in (a). The dictionary is more colored than the global one.

MA

IRA

Letal.:SPA

RSE

RE

PRE

SEN

TAT

ION

FOR

CO

LO

RIM

AG

ER

EST

OR

AT

ION

61

Fig.7.D

atasetused

forevaluating

denoisingexperim

ents.

TAB

LE

IPSN

RR

ESU

LTS

OF

OU

RD

EN

OISIN

GA

LG

OR

ITH

MW

ITH

256A

TO

MS

OF

SIZ

E7

73

FOR

AN

D6

63

FOR

.EA

CH

CA

SEIS

DIV

IDE

DIN

FO

UR

PA

RT

S:TH

ET

OP-L

EFT

RE

SULT

SA

RE

TH

OSE

GIV

EN

BY

MCA

UL

EY

AN

DA

L[28]W

ITH

TH

EIR

“33

MO

DE

L.”T

HE

TO

P-RIG

HT

RE

SULT

SA

RE

TH

OSE

OB

TAIN

ED

BY

APPLY

ING

TH

EG

RA

YSC

AL

EK

-SVD

AL

GO

RIT

HM

[2]O

NE

AC

HC

HA

NN

EL

SE

PAR

AT

ELY

WIT

H8

8A

TO

MS.T

HE

BO

TT

OM

-LE

FTA

RE

OU

RR

ESU

LTS

OB

TAIN

ED

WIT

HA

GL

OB

AL

LYT

RA

INE

DD

ICT

ION

AR

Y.TH

EB

OT

TO

M-R

IGH

TA

RE

TH

EIM

PRO

VE

ME

NT

SO

BTA

INE

DW

ITH

TH

EA

DA

PTIV

EA

PPRO

AC

HW

ITH

20IT

ER

AT

ION

S.B

OL

DIN

DIC

AT

ES

TH

EB

EST

RE

SULT

SFO

RE

AC

HG

RO

UP.

AS

CA

NB

ESE

EN,

OU

RP

RO

POSE

DT

EC

HN

IQU

EC

ON

SISTE

NT

LYP

RO

DU

CE

ST

HE

BE

STR

ESU

LTS

TAB

LE

IIC

OM

PAR

ISON

OF

TH

EPSN

RR

ESU

LTS

ON

TH

EIM

AG

E“C

AST

LE”

BE

TW

EE

N[28]

AN

DW

HA

TW

EO

BTA

INE

DW

ITH

2566

63

AN

D7

73

PA

TC

HE

S.F

OR

TH

EA

DA

PTIV

EA

PPRO

AC

H,20IT

ER

AT

ION

SH

AV

EB

EE

NP

ER

FOR

ME

D.BO

LD

IND

ICA

TE

ST

HE

BE

STR

ESU

LT,IN

DIC

AT

ING

ON

CE

AG

AIN

TH

EC

ON

SISTE

NT

IMPR

OV

EM

EN

TO

BTA

INE

DW

ITH

OU

RP

RO

POSE

DT

EC

HN

IQU

E

patch),inorderto

preventanylearning

ofthese

artifacts(over-

fitting).W

edefine

thenthe

patchsparsity

ofthe

decompo-

sitionas

thisnum

berof

steps.The

stoppingcriteria

in(2)

be-com

esthe

number

ofatom

sused

insteadof

thereconstruction

error.Using

asm

allduring

theO

MP

permits

tolearn

adic-

tionaryspecialized

inproviding

acoarse

approximation.

Our

assumption

isthat

(pattern)artifacts

areless

presentin

coarseapproxim

ations,preventingthe

dictionaryfrom

learningthem

.W

epropose

thenthe

algorithmdescribed

inFig.6.W

etypically

usedto

preventthe

learningof

artifactsand

foundout

thattwo

outeriterationsin

theschem

ein

Fig.6are

sufficienttogive

satisfactoryresults,w

hilew

ithinthe

K-SV

D,10–20

itera-tions

arerequired.

Toconclude,in

ordertoaddressthe

demosaicing

problem,w

euse

them

odifiedK

-SVD

algorithmthatdeals

with

nonuniformnoise,as

describedin

previoussection,and

addto

itanadaptive

dictionarythathas

beenlearned

with

lowpatch

sparsityin

orderto

avoidover-fitting

them

osaicpattern.T

hesam

etechnique

canbe

appliedto

genericcolor

inpaintingas

demonstrated

inthe

nextsection.

V.

EX

PER

IME

NTA

LR

ESU

LTS

We

arenow

readyto

presentthe

colorim

agedenoising,in-

painting,anddem

osaicingresultsthatare

obtainedw

iththe

pro-posed

framew

ork.

A.

Denoising

Color

Images

The

state-of-the-artperform

anceof

thealgorithm

ongrayscale

images

hasalready

beenstudied

in[2].

We

nowevaluate

ourextension

forcolor

images.

We

trainedsom

edictionaries

with

differentsizesof

atoms

55

3,66

3,7

73

and8

83,

on200

000patches

takenfrom

adatabase

of15

000im

agesw

iththe

patch-sparsityparam

eter(six

atoms

inthe

representations).We

usedthe

databaseL

abelMe

[55]to

buildour

image

database.T

henw

etrained

eachdictionary

with

600iterations.

This

providedus

aset

ofgeneric

dictionariesthat

we

usedas

initialdictionaries

inour

denoisingalgorithm

.C

omparing

theresults

obtainedw

iththe

globalapproach

andthe

adaptiveone

permits

usto

seethe

improvem

entsin

thelearning

process.W

echose

toevaluate

MAIRAL et al.: SPARSE REPRESENTATION FOR COLOR IMAGE RESTORATION 61

Fig. 7. Data set used for evaluating denoising experiments.

TABLE IPSNR RESULTS OF OUR DENOISING ALGORITHM WITH 256 ATOMS OF SIZE 7 7 3 FOR AND 6 6 3 FOR . EACH CASE IS DIVIDED IN FOURPARTS: THE TOP-LEFT RESULTS ARE THOSE GIVEN BY MCAULEY AND AL [28] WITH THEIR “3 3 MODEL.” THE TOP-RIGHT RESULTS ARE THOSE OBTAINED BY

APPLYING THE GRAYSCALE K-SVD ALGORITHM [2] ON EACH CHANNEL SEPARATELY WITH 8 8 ATOMS. THE BOTTOM-LEFT ARE OUR RESULTS OBTAINEDWITH A GLOBALLY TRAINED DICTIONARY. THE BOTTOM-RIGHT ARE THE IMPROVEMENTS OBTAINED WITH THE ADAPTIVE APPROACH WITH 20 ITERATIONS.

BOLD INDICATES THE BEST RESULTS FOR EACH GROUP. AS CAN BE SEEN, OUR PROPOSED TECHNIQUE CONSISTENTLY PRODUCES THE BEST RESULTS

TABLE IICOMPARISON OF THE PSNR RESULTS ON THE IMAGE “CASTLE” BETWEEN [28] AND WHAT WE OBTAINED WITH 256 6 6 3 AND 7 7 3 PATCHES.

FOR THE ADAPTIVE APPROACH, 20 ITERATIONS HAVE BEEN PERFORMED. BOLD INDICATES THE BEST RESULT, INDICATING ONCEAGAIN THE CONSISTENT IMPROVEMENT OBTAINED WITH OUR PROPOSED TECHNIQUE

patch), in order to prevent any learning of these artifacts (over-fitting). We define then the patch sparsity of the decompo-sition as this number of steps. The stopping criteria in (2) be-comes the number of atoms used instead of the reconstructionerror. Using a small during the OMP permits to learn a dic-tionary specialized in providing a coarse approximation. Ourassumption is that (pattern) artifacts are less present in coarseapproximations, preventing the dictionary from learning them.We propose then the algorithm described in Fig. 6. We typicallyused to prevent the learning of artifacts and found outthat two outer iterations in the scheme in Fig. 6 are sufficient togive satisfactory results, while within the K-SVD, 10–20 itera-tions are required.

To conclude, in order to address the demosaicing problem, weuse the modified K-SVD algorithm that deals with nonuniformnoise, as described in previous section, and add to it an adaptivedictionary that has been learned with low patch sparsity in orderto avoid over-fitting the mosaic pattern. The same technique canbe applied to generic color inpainting as demonstrated in thenext section.

V. EXPERIMENTAL RESULTS

We are now ready to present the color image denoising, in-painting, and demosaicing results that are obtained with the pro-posed framework.

A. Denoising Color Images

The state-of-the-art performance of the algorithm ongrayscale images has already been studied in [2]. We nowevaluate our extension for color images. We trained somedictionaries with different sizes of atoms 5 5 3, 6 6 3,7 7 3 and 8 8 3, on 200 000 patches taken from adatabase of 15 000 images with the patch-sparsity parameter

(six atoms in the representations). We used the databaseLabelMe [55] to build our image database. Then we trainedeach dictionary with 600 iterations. This provided us a set ofgeneric dictionaries that we used as initial dictionaries in ourdenoising algorithm. Comparing the results obtained with theglobal approach and the adaptive one permits us to see theimprovements in the learning process. We chose to evaluate

Other sparse priors:

Image f = �x

Coe�cients x c = D�f

� D�

|x1| + |x2| max(|x1|, |x2|)

(a) (b) (c)

Figure 1: Unit balls of some atomic norms: In each figure, the set of atoms is graphed in red andthe unit ball of the associated atomic norm is graphed in blue. In (a), the atoms are the unit-Euclidean-norm one-sparse vectors, and the atomic norm is the !1 norm. In (b), the atoms are the2!2 symmetric unit-Euclidean-norm rank-one matrices, and the atomic norm is the nuclear norm.In (c), the atoms are the vectors {"1,+1}2, and the atomic norm is the !! norm.

natural procedure to go from the set of one-sparse vectors A to the !1 norm? We observe thatthe convex hull of (unit-Euclidean-norm) one-sparse vectors is the unit ball of the !1 norm, or thecross-polytope. Similarly the convex hull of the (unit-Euclidean-norm) rank-one matrices is thenuclear norm ball; see Figure 1 for illustrations. These constructions suggest a natural generaliza-tion to other settings. Under suitable conditions the convex hull conv(A) defines the unit ball ofa norm, which is called the atomic norm induced by the atomic set A. We can then minimize theatomic norm subject to measurement constraints, which results in a convex programming heuristicfor recovering simple models given linear measurements. As an example suppose we wish to recoverthe sum of a few permutation matrices given linear measurements. The convex hull of the set ofpermutation matrices is the Birkho! polytope of doubly stochastic matrices [73], and our proposalis to solve a convex program that minimizes the norm induced by this polytope. Similarly if wewish to recover an orthogonal matrix from linear measurements we would solve a spectral normminimization problem, as the spectral norm ball is the convex hull of all orthogonal matrices. Asdiscussed in Section 2.5 the atomic norm minimization problem is, in some sense, the best convexheuristic for recovering simple models with respect to a given atomic set.

We give general conditions for exact and robust recovery using the atomic norm heuristic. InSection 3 we provide concrete bounds on the number of generic linear measurements required forthe atomic norm heuristic to succeed. This analysis is based on computing certain Gaussian widthsof tangent cones with respect to the unit balls of the atomic norm [37]. Arguments based on Gaus-sian width have been fruitfully applied to obtain bounds on the number of Gaussian measurementsfor the special case of recovering sparse vectors via !1 norm minimization [64, 67], but computingGaussian widths of general cones is not easy. Therefore it is important to exploit the special struc-ture in atomic norms, while still obtaining su!ciently general results that are broadly applicable.An important theme in this paper is the connection between Gaussian widths and various notionsof symmetry. Specifically by exploiting symmetry structure in certain atomic norms as well as con-vex duality properties, we give bounds on the number of measurements required for recovery usingvery general atomic norm heuristics. For example we provide precise estimates of the number ofgeneric measurements required for exact recovery of an orthogonal matrix via spectral norm min-imization, and the number of generic measurements required for exact recovery of a permutationmatrix by minimizing the norm induced by the Birkho" polytope. While these results correspond

3

(a) (b) (c)

Figure 1: Unit balls of some atomic norms: In each figure, the set of atoms is graphed in red andthe unit ball of the associated atomic norm is graphed in blue. In (a), the atoms are the unit-Euclidean-norm one-sparse vectors, and the atomic norm is the !1 norm. In (b), the atoms are the2!2 symmetric unit-Euclidean-norm rank-one matrices, and the atomic norm is the nuclear norm.In (c), the atoms are the vectors {"1,+1}2, and the atomic norm is the !! norm.

natural procedure to go from the set of one-sparse vectors A to the !1 norm? We observe thatthe convex hull of (unit-Euclidean-norm) one-sparse vectors is the unit ball of the !1 norm, or thecross-polytope. Similarly the convex hull of the (unit-Euclidean-norm) rank-one matrices is thenuclear norm ball; see Figure 1 for illustrations. These constructions suggest a natural generaliza-tion to other settings. Under suitable conditions the convex hull conv(A) defines the unit ball ofa norm, which is called the atomic norm induced by the atomic set A. We can then minimize theatomic norm subject to measurement constraints, which results in a convex programming heuristicfor recovering simple models given linear measurements. As an example suppose we wish to recoverthe sum of a few permutation matrices given linear measurements. The convex hull of the set ofpermutation matrices is the Birkho! polytope of doubly stochastic matrices [73], and our proposalis to solve a convex program that minimizes the norm induced by this polytope. Similarly if wewish to recover an orthogonal matrix from linear measurements we would solve a spectral normminimization problem, as the spectral norm ball is the convex hull of all orthogonal matrices. Asdiscussed in Section 2.5 the atomic norm minimization problem is, in some sense, the best convexheuristic for recovering simple models with respect to a given atomic set.

We give general conditions for exact and robust recovery using the atomic norm heuristic. InSection 3 we provide concrete bounds on the number of generic linear measurements required forthe atomic norm heuristic to succeed. This analysis is based on computing certain Gaussian widthsof tangent cones with respect to the unit balls of the atomic norm [37]. Arguments based on Gaus-sian width have been fruitfully applied to obtain bounds on the number of Gaussian measurementsfor the special case of recovering sparse vectors via !1 norm minimization [64, 67], but computingGaussian widths of general cones is not easy. Therefore it is important to exploit the special struc-ture in atomic norms, while still obtaining su!ciently general results that are broadly applicable.An important theme in this paper is the connection between Gaussian widths and various notionsof symmetry. Specifically by exploiting symmetry structure in certain atomic norms as well as con-vex duality properties, we give bounds on the number of measurements required for recovery usingvery general atomic norm heuristics. For example we provide precise estimates of the number ofgeneric measurements required for exact recovery of an orthogonal matrix via spectral norm min-imization, and the number of generic measurements required for exact recovery of a permutationmatrix by minimizing the norm induced by the Birkho" polytope. While these results correspond

3

Dictionary learning:

Analysis vs. synthesis:

learning

Ja(f) = ||D�f ||1

Js(f) = minf=�x

||x||1

|x1| + (x22 + x2

3)12

Some Hot Topics

MAIRAL et al.: SPARSE REPRESENTATION FOR COLOR IMAGE RESTORATION 57

Fig. 2. Dictionaries with 256 atoms learned on a generic database of natural images, with two different sizes of patches. Note the large number of color-less atoms.Since the atoms can have negative values, the vectors are presented scaled and shifted to the [0,255] range per channel: (a) 5 5 3 patches; (b) 8 8 3 patches.

Fig. 3. Examples of color artifacts while reconstructing a damaged version of the image (a) without the improvement here proposed ( in the new metric).Color artifacts are reduced with our proposed technique ( in our proposed new metric). Both images have been denoised with the same global dictionary.In (b), one observes a bias effect in the color from the castle and in some part of the water. What is more, the color of the sky is piecewise constant when(false contours), which is another artifact our approach corrected. (a) Original. (b) Original algorithm, dB. (c) Proposed algorithm,

dB.

Fig. 4. (a) Training Image; (b) resulting dictionary; (b) is the dictionary learned in the image in (a). The dictionary is more colored than the global one.

MAIRAL et al.: SPARSE REPRESENTATION FOR COLOR IMAGE RESTORATION 57

Fig. 2. Dictionaries with 256 atoms learned on a generic database of natural images, with two different sizes of patches. Note the large number of color-less atoms.Since the atoms can have negative values, the vectors are presented scaled and shifted to the [0,255] range per channel: (a) 5 5 3 patches; (b) 8 8 3 patches.

Fig. 3. Examples of color artifacts while reconstructing a damaged version of the image (a) without the improvement here proposed ( in the new metric).Color artifacts are reduced with our proposed technique ( in our proposed new metric). Both images have been denoised with the same global dictionary.In (b), one observes a bias effect in the color from the castle and in some part of the water. What is more, the color of the sky is piecewise constant when(false contours), which is another artifact our approach corrected. (a) Original. (b) Original algorithm, dB. (c) Proposed algorithm,

dB.

Fig. 4. (a) Training Image; (b) resulting dictionary; (b) is the dictionary learned in the image in (a). The dictionary is more colored than the global one.

MAIRAL et al.: SPARSE REPRESENTATION FOR COLOR IMAGE RESTORATION 57

Fig. 2. Dictionaries with 256 atoms learned on a generic database of natural images, with two different sizes of patches. Note the large number of color-less atoms.Since the atoms can have negative values, the vectors are presented scaled and shifted to the [0,255] range per channel: (a) 5 5 3 patches; (b) 8 8 3 patches.

Fig. 3. Examples of color artifacts while reconstructing a damaged version of the image (a) without the improvement here proposed ( in the new metric).Color artifacts are reduced with our proposed technique ( in our proposed new metric). Both images have been denoised with the same global dictionary.In (b), one observes a bias effect in the color from the castle and in some part of the water. What is more, the color of the sky is piecewise constant when(false contours), which is another artifact our approach corrected. (a) Original. (b) Original algorithm, dB. (c) Proposed algorithm,

dB.

Fig. 4. (a) Training Image; (b) resulting dictionary; (b) is the dictionary learned in the image in (a). The dictionary is more colored than the global one.



Fig. 7. Data set used for evaluating denoising experiments.

TABLE I
PSNR RESULTS OF OUR DENOISING ALGORITHM WITH 256 ATOMS OF SIZE 7 × 7 × 3 AND 6 × 6 × 3 (THE SIZE DEPENDS ON THE NOISE LEVEL). EACH CASE IS DIVIDED INTO FOUR PARTS: THE TOP-LEFT RESULTS ARE THOSE GIVEN BY MCAULEY ET AL. [28] WITH THEIR “3 × 3 MODEL.” THE TOP-RIGHT RESULTS ARE THOSE OBTAINED BY APPLYING THE GRAYSCALE K-SVD ALGORITHM [2] ON EACH CHANNEL SEPARATELY WITH 8 × 8 ATOMS. THE BOTTOM-LEFT ARE OUR RESULTS OBTAINED WITH A GLOBALLY TRAINED DICTIONARY. THE BOTTOM-RIGHT ARE THE IMPROVEMENTS OBTAINED WITH THE ADAPTIVE APPROACH WITH 20 ITERATIONS. BOLD INDICATES THE BEST RESULTS FOR EACH GROUP. AS CAN BE SEEN, OUR PROPOSED TECHNIQUE CONSISTENTLY PRODUCES THE BEST RESULTS

TABLE II
COMPARISON OF THE PSNR RESULTS ON THE IMAGE “CASTLE” BETWEEN [28] AND WHAT WE OBTAINED WITH 256 ATOMS AND 6 × 6 × 3 AND 7 × 7 × 3 PATCHES. FOR THE ADAPTIVE APPROACH, 20 ITERATIONS HAVE BEEN PERFORMED. BOLD INDICATES THE BEST RESULT, SHOWING ONCE AGAIN THE CONSISTENT IMPROVEMENT OBTAINED WITH OUR PROPOSED TECHNIQUE

patch), in order to prevent any learning of these artifacts (over-fitting). We then define the patch sparsity of the decomposition as this number of steps. The stopping criterion in (2) becomes the number of atoms used instead of the reconstruction error. Using a small patch sparsity during the OMP makes it possible to learn a dictionary specialized in providing a coarse approximation. Our assumption is that (pattern) artifacts are less present in coarse approximations, which prevents the dictionary from learning them. We then propose the algorithm described in Fig. 6. We typically used a small patch sparsity to prevent the learning of artifacts, and found that two outer iterations in the scheme in Fig. 6 are sufficient to give satisfactory results, while within the K-SVD, 10–20 iterations are required.
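As a concrete illustration of using the number of atoms (patch sparsity), rather than the reconstruction error, as the stopping criterion, here is a minimal OMP sketch in Python/NumPy. It is not the paper's implementation; it assumes a dictionary D with unit-norm columns, and all names are illustrative.

import numpy as np

def omp_fixed_sparsity(D, y, n_atoms):
    # Greedy OMP that stops after n_atoms selections (patch sparsity)
    # instead of stopping at a target reconstruction error.
    residual = y.copy()
    support = []
    x = np.zeros(D.shape[1])
    for _ in range(n_atoms):
        # Pick the atom most correlated with the current residual.
        correlations = np.abs(D.T @ residual)
        correlations[support] = 0.0          # do not reselect atoms
        support.append(int(np.argmax(correlations)))
        # Re-fit the coefficients on the current support (least squares).
        coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        x[:] = 0.0
        x[support] = coeffs
        residual = y - D @ x
    return x

# Usage: a random unit-norm dictionary and a coarse 2-atom approximation.
rng = np.random.default_rng(0)
D = rng.standard_normal((64, 256))
D /= np.linalg.norm(D, axis=0)
y = rng.standard_normal(64)
x = omp_fixed_sparsity(D, y, n_atoms=2)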

To conclude, in order to address the demosaicing problem, we use the modified K-SVD algorithm that deals with nonuniform noise, as described in the previous section, and add to it an adaptive dictionary that has been learned with low patch sparsity in order to avoid over-fitting the mosaic pattern. The same technique can be applied to generic color inpainting, as demonstrated in the next section.
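How a learned dictionary fills in missing or mosaiced samples can be sketched by sparse-coding each patch on its observed entries only and then synthesizing the full patch from the complete dictionary. The snippet below uses scikit-learn's OrthogonalMatchingPursuit as a stand-in for the modified, nonuniform-noise coding step; it is a simplified illustration, not the exact algorithm of Fig. 6, and the sizes are made up for the example.

import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

def inpaint_patch(D, y, mask, n_atoms):
    # Sparse-code a patch y observed only where mask is True, by restricting
    # the fit to the observed rows of the dictionary, then synthesize the
    # full patch (including the missing entries) with the complete dictionary.
    Dm, ym = D[mask], y[mask]
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=n_atoms, fit_intercept=False)
    omp.fit(Dm, ym)
    return D @ omp.coef_

# Usage: drop roughly half of the entries of a truly 3-sparse patch and reconstruct it.
rng = np.random.default_rng(1)
D = rng.standard_normal((64, 256))
D /= np.linalg.norm(D, axis=0)
y = D[:, :3] @ np.array([1.0, -0.5, 0.2])
mask = rng.random(64) > 0.5
y_hat = inpaint_patch(D, y, mask, n_atoms=3)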

V. EXPERIMENTAL RESULTS

We are now ready to present the color image denoising, inpainting, and demosaicing results that are obtained with the proposed framework.

A. Denoising Color Images

The state-of-the-art performance of the algorithm on grayscale images has already been studied in [2]. We now evaluate our extension for color images. We trained dictionaries with different atom sizes, 5 × 5 × 3, 6 × 6 × 3, 7 × 7 × 3, and 8 × 8 × 3, on 200 000 patches taken from a database of 15 000 images, with the patch-sparsity parameter set to six atoms per representation. We used the LabelMe database [55] to build our image database. We then trained each dictionary with 600 iterations. This provided us with a set of generic dictionaries that we used as initial dictionaries in our denoising algorithm. Comparing the results obtained with the global approach and with the adaptive one lets us see the improvements brought by the learning process. We chose to evaluate
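A compact sketch of the training stage just described (random color patches drawn from an image collection, then a generic dictionary used to initialize the denoiser) is given below. It uses scikit-learn's MiniBatchDictionaryLearning as a stand-in for the K-SVD iterations, so the patch counts, sizes, and sparsity are illustrative parameters rather than the paper's exact settings.

import numpy as np
from sklearn.feature_extraction.image import extract_patches_2d
from sklearn.decomposition import MiniBatchDictionaryLearning

def train_global_dictionary(images, patch_size=(7, 7), n_atoms=256,
                            n_patches=200_000, sparsity=6, seed=0):
    # Learn a generic (global) dictionary from random color patches;
    # MiniBatchDictionaryLearning stands in for the K-SVD iterations.
    rng = np.random.default_rng(seed)
    per_image = max(1, n_patches // len(images))
    patches = []
    for img in images:                       # img: (H, W, 3) float array
        p = extract_patches_2d(img, patch_size, max_patches=per_image,
                               random_state=int(rng.integers(1 << 31)))
        patches.append(p.reshape(len(p), -1))    # flatten to 7*7*3 vectors
    X = np.concatenate(patches)
    X -= X.mean(axis=1, keepdims=True)           # optional: remove each patch's mean
    learner = MiniBatchDictionaryLearning(n_components=n_atoms,
                                          transform_algorithm="omp",
                                          transform_n_nonzero_coefs=sparsity,
                                          random_state=0)
    learner.fit(X)
    return learner.components_                   # (n_atoms, 7*7*3) learned atoms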

Other sparse priors:

Synthesis: image f = Φ x, optimizing over the coefficients x.

Analysis: correlations c = D∗ f, computed with an analysis operator D∗.

Unit balls of the corresponding penalties in 2-D: ℓ1 norm |x1| + |x2| versus ℓ∞ norm max(|x1|, |x2|).
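For reference, the two sparse priors sketched above lead to the following regularized problems (a plain restatement in the slides' notation, with the ℓ1 norm as the sparsity-promoting penalty):

% Synthesis sparsity: optimize over coefficients x, then synthesize f = \Phi x
f^\star = \Phi x^\star, \qquad
x^\star \in \operatorname*{argmin}_{x \in \mathbb{R}^N} \; \tfrac{1}{2}\|y - K \Phi x\|^2 + \lambda \|x\|_1

% Analysis sparsity: penalize the correlations c = D^* f directly
f^\star \in \operatorname*{argmin}_{f \in \mathbb{R}^Q} \; \tfrac{1}{2}\|y - K f\|^2 + \lambda \|D^* f\|_1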


Figure 1: Unit balls of some atomic norms. In each figure, the set of atoms is graphed in red and the unit ball of the associated atomic norm is graphed in blue. In (a), the atoms are the unit-Euclidean-norm one-sparse vectors, and the atomic norm is the ℓ1 norm. In (b), the atoms are the 2 × 2 symmetric unit-Euclidean-norm rank-one matrices, and the atomic norm is the nuclear norm. In (c), the atoms are the vectors {−1, +1}², and the atomic norm is the ℓ∞ norm.

natural procedure to go from the set of one-sparse vectors A to the ℓ1 norm? We observe that the convex hull of (unit-Euclidean-norm) one-sparse vectors is the unit ball of the ℓ1 norm, or the cross-polytope. Similarly the convex hull of the (unit-Euclidean-norm) rank-one matrices is the nuclear norm ball; see Figure 1 for illustrations. These constructions suggest a natural generalization to other settings. Under suitable conditions the convex hull conv(A) defines the unit ball of a norm, which is called the atomic norm induced by the atomic set A. We can then minimize the atomic norm subject to measurement constraints, which results in a convex programming heuristic for recovering simple models given linear measurements. As an example, suppose we wish to recover the sum of a few permutation matrices given linear measurements. The convex hull of the set of permutation matrices is the Birkhoff polytope of doubly stochastic matrices [73], and our proposal is to solve a convex program that minimizes the norm induced by this polytope. Similarly, if we wish to recover an orthogonal matrix from linear measurements we would solve a spectral norm minimization problem, as the spectral norm ball is the convex hull of all orthogonal matrices. As discussed in Section 2.5, the atomic norm minimization problem is, in some sense, the best convex heuristic for recovering simple models with respect to a given atomic set.
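To make the construction in the preceding paragraph concrete, the norm obtained from conv(A) is the gauge of the convex hull; this is the standard definition used in this line of work (assuming the atomic set is centrally symmetric and spans the space, so the gauge is indeed a norm):

\|x\|_{\mathcal{A}}
  = \inf\{\, t > 0 \;:\; x \in t\,\mathrm{conv}(\mathcal{A}) \,\}
  = \inf\Big\{ \textstyle\sum_{a \in \mathcal{A}} c_a \;:\;
      x = \textstyle\sum_{a \in \mathcal{A}} c_a\, a,\; c_a \ge 0 \Big\},

% and the recovery heuristic solves, given measurements y = \Phi x_0,
\hat{x} \in \operatorname*{argmin}_{x} \; \|x\|_{\mathcal{A}}
  \quad \text{subject to} \quad y = \Phi x .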

We give general conditions for exact and robust recovery using the atomic norm heuristic. In Section 3 we provide concrete bounds on the number of generic linear measurements required for the atomic norm heuristic to succeed. This analysis is based on computing certain Gaussian widths of tangent cones with respect to the unit balls of the atomic norm [37]. Arguments based on Gaussian width have been fruitfully applied to obtain bounds on the number of Gaussian measurements for the special case of recovering sparse vectors via ℓ1 norm minimization [64, 67], but computing Gaussian widths of general cones is not easy. Therefore it is important to exploit the special structure in atomic norms, while still obtaining sufficiently general results that are broadly applicable. An important theme in this paper is the connection between Gaussian widths and various notions of symmetry. Specifically, by exploiting symmetry structure in certain atomic norms as well as convex duality properties, we give bounds on the number of measurements required for recovery using very general atomic norm heuristics. For example, we provide precise estimates of the number of generic measurements required for exact recovery of an orthogonal matrix via spectral norm minimization, and the number of generic measurements required for exact recovery of a permutation matrix by minimizing the norm induced by the Birkhoff polytope. While these results correspond
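For completeness, the Gaussian width invoked here is defined, for a set S in R^p and a standard Gaussian vector g, as the expected supremum of the Gaussian process over S; informally, in this framework the number of generic measurements needed for exact recovery scales with the squared width of the tangent cone of the atomic norm ball at the signal, intersected with the unit sphere:

w(S) = \mathbb{E}_{g \sim \mathcal{N}(0, I_p)}\Big[\, \sup_{z \in S} \; \langle g, z \rangle \,\Big],
\qquad
m \gtrsim w\big(T_{\mathcal{A}}(x^\star) \cap \mathbb{S}^{p-1}\big)^2
\;\Longrightarrow\; \text{exact recovery with high probability,}

where T_{\mathcal{A}}(x^\star) denotes the tangent cone of the atomic norm ball at x^\star.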


Nuclear