A Review of Proximal Methods, with a New One

A Review of Proximal Splitting Methods, with a new one
Gabriel Peyré, Jalal Fadili, Hugo Raguet
www.numerical-tours.com


Slides of presentation at the conference ISMP 2012, Aug. 19-24, 2012, Berlin, Germany

Transcript of A Review of Proximal Methods, with a New One

Page 1: A Review of Proximal Methods, with a New One

A Review of Proximal Splitting Methods, with a new one

Gabriel Peyré, Jalal Fadili, Hugo Raguet

www.numerical-tours.com

Page 2: A Review of Proximal Methods, with a New One

Overview

• Inverse Problems Regularization

• Proximal Splitting

• Generalized Forward-Backward

Pages 3-6: A Review of Proximal Methods, with a New One

Inverse Problems

Forward model:

y = K f_0 + w ∈ R^P

(y: observations; K: operator; f_0: (unknown) input; w: noise), with K : R^Q → R^P.

Denoising: K = Id_Q, P = Q.

Inpainting: for a set Ω of missing pixels,

(Kf)(x) = { 0 if x ∈ Ω; f(x) if x ∉ Ω },   P = Q − |Ω|.

Super-resolution: Kf = (f ⋆ k) ↓_s,   P = Q/s.

Pages 7-10: A Review of Proximal Methods, with a New One

Inverse Problem Regularization

Noisy measurements: y = K f_0 + w.

Prior model: J : R^Q → R assigns a score to images.

f* ∈ argmin_{f ∈ R^Q} (1/2)||y − K f||^2 + λ J(f)

(data fidelity + regularity)

Choice of λ: tradeoff between the noise level ||w|| and the regularity J(f_0) of f_0.

No noise: λ → 0^+, minimize

f* ∈ argmin_{f ∈ R^Q, Kf = y} J(f)

Page 11: A Review of Proximal Methods, with a New One

L1 Regularization

coe�cientsx0 � RN

Page 12: A Review of Proximal Methods, with a New One

L1 Regularization

coe�cients image�

x0 � RN f0 = �x0 � RQ

Page 13: A Review of Proximal Methods, with a New One

L1 Regularization

observations

w

coe�cients image� K

x0 � RN f0 = �x0 � RQ y = Kf0 + w � RP

Page 14: A Review of Proximal Methods, with a New One

L1 Regularization

observations

� = K �⇥ ⇥ RP�N

w

coe�cients image� K

x0 � RN f0 = �x0 � RQ y = Kf0 + w � RP

Page 15: A Review of Proximal Methods, with a New One

Fidelity Regularization

minx�RN

12

||y � �x||2 + �||x||1

L1 Regularization

Sparse recovery: f� = �x� where x� solves

observations

� = K �⇥ ⇥ RP�N

w

coe�cients image� K

x0 � RN f0 = �x0 � RQ y = Kf0 + w � RP

Page 16: A Review of Proximal Methods, with a New One

K

y = Kf0 + wMeasures:

Inpainting Problem

(Kf)(x) =�

0 if x � �,f(x) if x /� �.

Page 17: A Review of Proximal Methods, with a New One

Overview

• Inverse Problems Regularization

• Proximal Splitting

• Generalized Forward-Backward

Page 18: A Review of Proximal Methods, with a New One

Proximal operator of G:Prox�G(x) = argmin

z

12

||x� z||2 + �G(z)

Proximal Operators

Page 19: A Review of Proximal Methods, with a New One

Proximal operator of G:Prox�G(x) = argmin

z

12

||x� z||2 + �G(z)

Prox�G(x)i = max�

0, 1� �

|xi|

�xi

G(x) = ||x||1 =�

i

|xi|

Proximal Operators

−10 −8 −6 −4 −2 0 2 4 6 8 10

−2

0

2

4

6

8

10

12

−10 −8 −6 −4 −2 0 2 4 6 8 10−10

−8

−6

−4

−2

0

2

4

6

8

10

||x||0|x|log(1 + x2)

G(x)

ProxG(x)

Page 20: A Review of Proximal Methods, with a New One

Proximal operator of G:Prox�G(x) = argmin

z

12

||x� z||2 + �G(z)

Prox�G(x)i = max�

0, 1� �

|xi|

�xi

Prox�G(x)i =�

xi if |xi| � �2�,0 otherwise.

G(x) = ||x||1 =�

i

|xi|

G(x) = ||x||0 = | {i \ xi �= 0} |

Proximal Operators

−10 −8 −6 −4 −2 0 2 4 6 8 10

−2

0

2

4

6

8

10

12

−10 −8 −6 −4 −2 0 2 4 6 8 10−10

−8

−6

−4

−2

0

2

4

6

8

10

||x||0|x|log(1 + x2)

G(x)

ProxG(x)

Page 21: A Review of Proximal Methods, with a New One

�� 3rd order polynomial root.

Proximal operator of G:Prox�G(x) = argmin

z

12

||x� z||2 + �G(z)

Prox�G(x)i = max�

0, 1� �

|xi|

�xi

Prox�G(x)i =�

xi if |xi| � �2�,0 otherwise.

G(x) = ||x||1 =�

i

|xi|

G(x) = ||x||0 = | {i \ xi �= 0} |

G(x) =�

i

log(1 + |xi|2)

Proximal Operators

−10 −8 −6 −4 −2 0 2 4 6 8 10

−2

0

2

4

6

8

10

12

−10 −8 −6 −4 −2 0 2 4 6 8 10−10

−8

−6

−4

−2

0

2

4

6

8

10

||x||0|x|log(1 + x2)

G(x)

ProxG(x)
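The two closed-form proximal operators above can be checked numerically. A minimal sketch (function names `prox_l1` and `prox_l0` are ours, not from the slides):

```python
import numpy as np

def prox_l1(x, gamma):
    """Soft thresholding: prox of gamma*||x||_1, applied entrywise."""
    return np.maximum(0.0, 1.0 - gamma / np.maximum(np.abs(x), 1e-12)) * x

def prox_l0(x, gamma):
    """Hard thresholding: prox of gamma*||x||_0 (keep entries with |x_i| >= sqrt(2*gamma))."""
    return np.where(np.abs(x) >= np.sqrt(2.0 * gamma), x, 0.0)

x = np.array([-3.0, -0.5, 0.0, 0.2, 4.0])
print(prox_l1(x, 1.0))  # shrinks magnitudes by gamma, zeroing small entries
print(prox_l0(x, 1.0))  # keeps only entries with |x_i| >= sqrt(2)
```

Note the qualitative difference visible in the figure: soft thresholding shrinks every surviving entry, while hard thresholding leaves them untouched.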

Pages 22-24: A Review of Proximal Methods, with a New One

Proximal Splitting Methods

Solve min_{x ∈ H} E(x).

Problem: Prox_{γE} is not available.

Splitting: E(x) = F(x) + Σ_i G_i(x), with F smooth and each G_i simple.

Iterative algorithms using only ∇F(x) and Prox_{γG_i}(x):

Forward-Backward: solves F + G
Douglas-Rachford: solves Σ_i G_i
Primal-Dual: solves Σ_i G_i ∘ A_i
Generalized FB: solves F + Σ_i G_i

Page 25: A Review of Proximal Methods, with a New One

minx�RN

F (x) + G(x)

Forward-Backward

SimpleSmooth

(�)

Page 26: A Review of Proximal Methods, with a New One

minx�RN

F (x) + G(x)

x(�+1) = Prox�G

�x(�) � ��F (x(�))

Forward-Backward

Forward-backward:

SimpleSmooth

(�)

Page 27: A Review of Proximal Methods, with a New One

G = �C

minx�RN

F (x) + G(x)

x(�+1) = Prox�G

�x(�) � ��F (x(�))

Forward-Backward

Forward-backward:

Projected gradient descent:

SimpleSmooth

(�)

Page 28: A Review of Proximal Methods, with a New One

G = �C

minx�RN

F (x) + G(x)

x(�+1) = Prox�G

�x(�) � ��F (x(�))

Forward-Backward

Forward-backward:

Projected gradient descent:

Theorem:

a solution of (�)If � < 2/L,

Let �F be L-Lipschitz.x(�) � x�

SimpleSmooth

(�)

Page 29: A Review of Proximal Methods, with a New One

G = �C

�� Multi-step accelerations (Nesterov, Beck-Teboule).

minx�RN

F (x) + G(x)

x(�+1) = Prox�G

�x(�) � ��F (x(�))

Forward-Backward

Forward-backward:

Projected gradient descent:

Theorem:

a solution of (�)If � < 2/L,

Let �F be L-Lipschitz.x(�) � x�

SimpleSmooth

(�)

Page 30: A Review of Proximal Methods, with a New One

minx

12

||�x� y||2 + �||x||1 minx

F (x) + G(x)

F (x) =12

||�x� y||2

G(x) = �||x||1

�F (x) = ��(�x� y)

Prox�G(x)i = max�

0, 1� �⇥

|xi|

�xi

L = ||���||

Example: L1 Regularization

��

Forward-backward Iterative soft thresholding��
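Iterative soft thresholding can be sketched in a few lines; this is a minimal illustration of the forward-backward scheme above (the problem sizes and regularization value are our own choices, not from the slides):

```python
import numpy as np

def ista(Phi, y, lam, n_iter=200):
    """Forward-backward (ISTA) for min_x 0.5*||Phi x - y||^2 + lam*||x||_1."""
    L = np.linalg.norm(Phi.T @ Phi, 2)  # Lipschitz constant of the gradient
    gamma = 1.0 / L                     # step size, satisfies gamma < 2/L
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ x - y)    # forward (explicit gradient) step
        z = x - gamma * grad
        # backward (proximal) step: soft thresholding at level lam*gamma
        x = np.sign(z) * np.maximum(np.abs(z) - lam * gamma, 0.0)
    return x

rng = np.random.default_rng(0)
Phi = rng.standard_normal((50, 100)) / np.sqrt(50)
x0 = np.zeros(100); x0[:5] = 1.0
y = Phi @ x0
x_hat = ista(Phi, y, lam=0.01)
```

With gamma = 1/L each iteration is a majorization-minimization step, so the objective decreases monotonically.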

Page 31: A Review of Proximal Methods, with a New One

Douglas Rachford Scheme

(�)minx

G1(x) + G2(x)

SimpleSimple

Page 32: A Review of Proximal Methods, with a New One

Douglas-Rachford iterations:

RProx�G(x) = 2Prox�G(x)� x

Reflexive prox:

z(�+1) =�1� �

2

�z(�) +

2RProx�G2 � RProx�G1(z

(�))

x(�+1) = Prox�G2(z(�+1))

Douglas Rachford Scheme

(�)minx

G1(x) + G2(x)

SimpleSimple

Page 33: A Review of Proximal Methods, with a New One

Douglas-Rachford iterations:

Theorem:

a solution of (�)

RProx�G(x) = 2Prox�G(x)� x

x(�) � x�

If 0 < � < 2 and ⇥ > 0,

Reflexive prox:

z(�+1) =�1� �

2

�z(�) +

2RProx�G2 � RProx�G1(z

(�))

x(�+1) = Prox�G2(z(�+1))

Douglas Rachford Scheme

(�)minx

G1(x) + G2(x)

SimpleSimple

Page 34: A Review of Proximal Methods, with a New One

C = {x \ �x = y}

Prox�G1(x) = ProjC(x) = x + �⇥(��⇥)�1(y � �x)

Prox�G2(x) =�

max�

0, 1� �

|xi|

�xi

i

�� e⇥cient if ��� easy to invert.

minx

G1(x) + G2(x)min�x=y

||x||1

G1(x) = iC(x),

G2(x) = ||x||1

Example: Constrainted L1

��

Page 35: A Review of Proximal Methods, with a New One

50 100 150 200 250

−5

−4

−3

−2

−1

0

1

C = {x \ �x = y}

Prox�G1(x) = ProjC(x) = x + �⇥(��⇥)�1(y � �x)

Prox�G2(x) =�

max�

0, 1� �

|xi|

�xi

i

�� e⇥cient if ��� easy to invert.

� = 0.01� = 1� = 10

Example: compressed sensing

� � R100�400 Gaussian matrix

||x0||0 = 17y = �x0

log10(||x(�)||1 � ||x�||1)

minx

G1(x) + G2(x)min�x=y

||x||1

G1(x) = iC(x),

G2(x) = ||x||1

Example: Constrainted L1

��
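The compressed-sensing experiment can be reproduced in a few lines. This sketch uses an equivalent standard form of the Douglas-Rachford recursion (written with the prox rather than the reflected prox); the sparsity pattern, step size, and iteration count are our assumptions, not taken from the slides:

```python
import numpy as np

def dr_constrained_l1(Phi, y, gamma=1.0, alpha=1.0, n_iter=2000):
    """Douglas-Rachford for min ||x||_1 subject to Phi x = y (basis pursuit).
    Equivalent form: z <- z + alpha*(Prox_{gG2}(2*Prox_{gG1}(z) - z) - Prox_{gG1}(z))."""
    PhiT = Phi.T
    gram_inv = np.linalg.inv(Phi @ PhiT)  # assumes Phi has full row rank
    proj_C = lambda x: x + PhiT @ (gram_inv @ (y - Phi @ x))  # G1 = indicator of {Phi x = y}
    soft = lambda x, t: np.sign(x) * np.maximum(np.abs(x) - t, 0.0)  # prox of t*||.||_1
    z = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        x = proj_C(z)
        z = z + alpha * (soft(2 * x - z, gamma) - x)
    return proj_C(z)  # feasible point by construction

rng = np.random.default_rng(1)
Phi = rng.standard_normal((100, 400))         # Gaussian matrix, as on the slide
x0 = np.zeros(400)
support = rng.choice(400, 17, replace=False)  # ||x0||_0 = 17
x0[support] = rng.standard_normal(17)
y = Phi @ x0
x_star = dr_constrained_l1(Phi, y)
```

Here (ΦΦ*)^{-1} is precomputed once, which is exactly the "efficient if ΦΦ* easy to invert" remark above.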

Page 36: A Review of Proximal Methods, with a New One

Overview

• Inverse Problems Regularization

• Proximal Splitting

• Generalized Forward-Backward

Page 37: A Review of Proximal Methods, with a New One

GFB Splitting

(�)minx�RN

F (x) +n�

i=1

Gi(x)

SimpleSmooth

Page 38: A Review of Proximal Methods, with a New One

� i = 1, . . . , n,

x(�+1) =1n

n�

i=1

z(�+1)i

z(�+1)i =z(�)

i + Proxn�Gi(2x(�)�z(�)i ���F (x(�)))�x(�)

GFB Splitting

(�)minx�RN

F (x) +n�

i=1

Gi(x)

SimpleSmooth

Page 39: A Review of Proximal Methods, with a New One

� i = 1, . . . , n,

n = 1 �� Forward-backward.F = 0 �� Douglas-Rachford.

x(�+1) =1n

n�

i=1

z(�+1)i

z(�+1)i =z(�)

i + Proxn�Gi(2x(�)�z(�)i ���F (x(�)))�x(�)

GFB Splitting

Theorem:

a solution of (�)x(�) � x�If � < 2/L,Let �F be L-Lipschitz.

(�)minx�RN

F (x) +n�

i=1

Gi(x)

SimpleSmooth
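The GFB recursion above translates almost line for line into code. A minimal sketch (the generic `gfb` driver and the toy problem are ours; only the update rule comes from the slide):

```python
import numpy as np

def gfb(grad_F, proxes, x_init, gamma, n_iter=300):
    """Generalized Forward-Backward for min F(x) + sum_i G_i(x).
    grad_F: gradient of the smooth term F.
    proxes: list of functions (v, t) -> Prox_{t G_i}(v), one per simple term."""
    n = len(proxes)
    x = x_init.copy()
    z = [x_init.copy() for _ in range(n)]  # one auxiliary variable per G_i
    for _ in range(n_iter):
        g = grad_F(x)
        for i in range(n):
            # z_i <- z_i + Prox_{n*gamma*G_i}(2x - z_i - gamma*grad_F(x)) - x
            z[i] = z[i] + proxes[i](2 * x - z[i] - gamma * g, n * gamma) - x
        x = sum(z) / n
    return x

# Toy use with n = 1 (reduces to forward-backward):
# min 0.5*||x - y||^2 + lam*||x||_1
y = np.array([3.0, -0.3, 0.1, -2.0])
lam = 1.0
soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - lam * t, 0.0)
x_hat = gfb(lambda x: x - y, [soft], np.zeros(4), gamma=0.5)
```

With a single `soft` prox the loop is exactly forward-backward; passing several block proxes in `proxes` gives the general scheme.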

Page 40: A Review of Proximal Methods, with a New One

Coe�cients x.Image f = �x

�1 � �2 block sparsity:

Block Regularization

Towards More Complex Penalization

⇥⇥x ⇥⇥1 = �i ⇥xi ⇥ �b�B��i�b x2

i

�b�B1

��i�b x2i+

�b�B2

��i�b x2i

Decomposition G = �k Gk

Numerical ExperimentsDeconvolution minx

12 ��Y ⇥K � x ��2 + �(2)`1�`2 �4

k=1 ��x ��Bk1,2

10 20 30 40−1

0

1

2

3

tEFB: 161s; tPR: 173s; tCP: 190s

iteration #

log 10

(E−E

min

)

EFBPRCP

N: 256

noise: 0.025; convol.: 2λl1/l2

2 : 1.30e−03; it. #50; SNR: 22.49dB

G(x) =�

b�B||x[b]||,

b � B

||x[b]||2 =�

m�b

x2m

Page 41: A Review of Proximal Methods, with a New One

Coe�cients x.Image f = �x Blocks B1

Non-overlapping decomposition:

�1 � �2 block sparsity:

B = B1 � . . . � Bn

G(x) =n�

i=1

Gi(x) Gi(x) =�

b�Bi

||x[b]||,

Block Regularization

Towards More Complex Penalization

⇥⇥x ⇥⇥1 = �i ⇥xi ⇥ �b�B��i�b x2

i

�b�B1

��i�b x2i+

�b�B2

��i�b x2i

Decomposition G = �k Gk

Numerical ExperimentsDeconvolution minx

12 ��Y ⇥K � x ��2 + �(2)`1�`2 �4

k=1 ��x ��Bk1,2

10 20 30 40−1

0

1

2

3

tEFB: 161s; tPR: 173s; tCP: 190s

iteration #

log 10

(E−E

min

)

EFBPRCP

N: 256

noise: 0.025; convol.: 2λl1/l2

2 : 1.30e−03; it. #50; SNR: 22.49dB

Towards More Complex Penalization

⇥⇥x ⇥⇥1 = �i ⇥xi ⇥ �b�B��i�b x2

i

�b�B1

��i�b x2i+

�b�B2

��i�b x2i

Decomposition G = �k Gk

Towards More Complex Penalization

⇥⇥x ⇥⇥1 = �i ⇥xi ⇥ �b�B��i�b x2

i

�b�B1

��i�b x2i+

�b�B2

��i�b x2i

Decomposition G = �k Gk

B1 � B2

G(x) =�

b�B||x[b]||,

b � B

||x[b]||2 =�

m�b

x2m

Page 42: A Review of Proximal Methods, with a New One

Coe�cients x.Image f = �x Blocks B1

Non-overlapping decomposition:

�1 � �2 block sparsity:

B = B1 � . . . � Bn

G(x) =n�

i=1

Gi(x)

⇤m ⇥ b ⇥ Bi, Prox�Gi(x)m = max�

0, 1� �

||x[b]||

�xm

Gi(x) =�

b�Bi

||x[b]||,

Each Gi is simple:

Block Regularization

Towards More Complex Penalization

⇥⇥x ⇥⇥1 = �i ⇥xi ⇥ �b�B��i�b x2

i

�b�B1

��i�b x2i+

�b�B2

��i�b x2i

Decomposition G = �k Gk

Numerical ExperimentsDeconvolution minx

12 ��Y ⇥K � x ��2 + �(2)`1�`2 �4

k=1 ��x ��Bk1,2

10 20 30 40−1

0

1

2

3

tEFB: 161s; tPR: 173s; tCP: 190s

iteration #

log 10

(E−E

min

)

EFBPRCP

N: 256

noise: 0.025; convol.: 2λl1/l2

2 : 1.30e−03; it. #50; SNR: 22.49dB

Towards More Complex Penalization

⇥⇥x ⇥⇥1 = �i ⇥xi ⇥ �b�B��i�b x2

i

�b�B1

��i�b x2i+

�b�B2

��i�b x2i

Decomposition G = �k Gk

Towards More Complex Penalization

⇥⇥x ⇥⇥1 = �i ⇥xi ⇥ �b�B��i�b x2

i

�b�B1

��i�b x2i+

�b�B2

��i�b x2i

Decomposition G = �k Gk

B1 � B2

G(x) =�

b�B||x[b]||,

b � B

||x[b]||2 =�

m�b

x2m
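The block prox above is entrywise rescaling by the block norm. A minimal sketch of the prox for one G_i over its non-overlapping blocks (the function name and example blocks are ours):

```python
import numpy as np

def prox_block_l1l2(x, blocks, gamma):
    """Block soft thresholding: prox of gamma * sum_b ||x[b]|| over
    non-overlapping index blocks b."""
    out = x.copy()
    for b in blocks:
        norm_b = np.linalg.norm(x[b])
        # max(0, 1 - gamma/||x[b]||) applied to every entry of the block
        scale = max(0.0, 1.0 - gamma / norm_b) if norm_b > 0 else 0.0
        out[b] = scale * x[b]
    return out

x = np.array([3.0, 4.0, 0.3, 0.4])
blocks = [np.array([0, 1]), np.array([2, 3])]
print(prox_block_l1l2(x, blocks, 1.0))  # first block shrunk, second zeroed
```

Small blocks are set to zero as a group, which is the point of ℓ1 − ℓ2 sparsity: entire blocks, not isolated coefficients, are switched off.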

Page 43: A Review of Proximal Methods, with a New One

y = �x0 + w

x0

x�

� = TI wavelets

� = convolution

Numerical Illustration

Numerical ExperimentsDeconvolution minx

12 ��Y ⇥K � x ��2 + �(2)`1�`2 �4

k=1 ��x ��Bk1,2

10 20 30 40−1

0

1

2

3

tEFB: 161s; tPR: 173s; tCP: 190s

iteration #

log 10

(E−E

min

)

EFBPRCP

N: 256

noise: 0.025; convol.: 2λl1/l2

2 : 1.30e−03; it. #50; SNR: 22.49dB

Numerical ExperimentsDeconvolution minx

12 ��Y ⇥K � x ��2 + �(2)`1�`2 �4

k=1 ��x ��Bk1,2

10 20 30 40−1

0

1

2

3

tEFB: 161s; tPR: 173s; tCP: 190s

iteration #

log 10

(E−E

min

)

EFBPRCP

N: 256

noise: 0.025; convol.: 2λl1/l2

2 : 1.30e−03; it. #50; SNR: 22.49dB

Numerical ExperimentsDeconvolution minx

12 ��Y ⇥K � x ��2 + �(2)`1�`2 �4

k=1 ��x ��Bk1,2

10 20 30 40−1

0

1

2

3

tEFB: 161s; tPR: 173s; tCP: 190s

iteration #

log 10

(E−E

min

)

EFBPRCP

N: 256

noise: 0.025; convol.: 2λl1/l2

2 : 1.30e−03; it. #50; SNR: 22.49dB

Numerical ExperimentsDeconv. + Inpaint. minx

12 ��Y ⇥ P�K � x ��2 + �(4)`1�`2 �16

k=1 ��x ��Bk1,2

10 20 30 40

0

1

2

3

tEFB: 283s; tPR: 298s; tCP: 368s

iteration #

log 10

(E−E

min

)

EFBPRCP

noise: 0.025; degrad.: 0.4; convol.: 2λl1/l2

4 : 1.00e−03; it. #50; SNR: 21.80dB

Numerical ExperimentsDeconv. + Inpaint. minx

12 ��Y ⇥ P�K � x ��2 + �(4)`1�`2 �16

k=1 ��x ��Bk1,2

10 20 30 40

0

1

2

3

tEFB: 283s; tPR: 298s; tCP: 368s

iteration #

log 10

(E−E

min

)

EFBPRCP

noise: 0.025; degrad.: 0.4; convol.: 2λl1/l2

4 : 1.00e−03; it. #50; SNR: 21.80dB

log10(E(x(�))� E(x�))

minx

12

||y � �⇥x||2 + ��

i

Gi(x)

� = inpainting+convolution

Page 44: A Review of Proximal Methods, with a New One

Inverse problems in imaging:� Large scale, N � 106.

� Non-smooth (sparsity, TV, . . . )

� (Sometimes) convex.

� Highly structured (separability, �p norms, . . . ).

Conclusion

Page 45: A Review of Proximal Methods, with a New One

Inverse problems in imaging:� Large scale, N � 106.

� Non-smooth (sparsity, TV, . . . )

� (Sometimes) convex.

� Highly structured (separability, �p norms, . . . ).

Proximal splitting:

� Parallelizable.� Unravel the structure of problems.

Conclusion

Towards More Complex Penalization

⇥⇥x ⇥⇥1 = �i ⇥xi ⇥ �b�B��i�b x2

i

�b�B1

��i�b x2i+

�b�B2

��i�b x2i

Decomposition G = �k Gk

Page 46: A Review of Proximal Methods, with a New One

Inverse problems in imaging:� Large scale, N � 106.

� Non-smooth (sparsity, TV, . . . )

� (Sometimes) convex.

� Highly structured (separability, �p norms, . . . ).

Proximal splitting:

Open problems:� Less structured problems without smoothness.� Non-convex optimization.

� Parallelizable.� Unravel the structure of problems.

Conclusion

Towards More Complex Penalization

⇥⇥x ⇥⇥1 = �i ⇥xi ⇥ �b�B��i�b x2

i

�b�B1

��i�b x2i+

�b�B2

��i�b x2i

Decomposition G = �k Gk