A Review of Proximal Methods, with a New One

A Review of Proximal Splitting Methods, with a new one
Gabriel Peyré, Jalal Fadili, Hugo Raguet
www.numerical-tours.com


Slides of presentation at the conference ISMP 2012, Aug. 19-24, 2012, Berlin, Germany

Transcript of A Review of Proximal Methods, with a New One

Page 1: A Review of Proximal Methods, with a New One

A Review of Proximal Splitting Methods, with a new one

Gabriel Peyré, Jalal Fadili, Hugo Raguet

www.numerical-tours.com

Page 2: A Review of Proximal Methods, with a New One

Overview

• Inverse Problems Regularization

• Proximal Splitting

• Generalized Forward-Backward

Pages 3-6: A Review of Proximal Methods, with a New One

Inverse Problems

Forward model:

y = K f_0 + w ∈ R^P

(y: observations; K: operator; f_0: (unknown) input; w: noise), with K : R^Q → R^P.

Denoising: K = Id_Q, P = Q.

Inpainting: for a set Ω of missing pixels,

(Kf)(x) = { 0 if x ∈ Ω; f(x) if x ∉ Ω },   P = Q − |Ω|.

Super-resolution: Kf = (f ⋆ k) ↓_s,   P = Q/s.

Pages 7-10: A Review of Proximal Methods, with a New One

Inverse Problem Regularization

Noisy measurements: y = K f_0 + w.

Prior model: J : R^Q → R assigns a score to images.

f* ∈ argmin_{f ∈ R^Q} (1/2)||y − K f||^2 + λ J(f)

(data fidelity + regularity)

Choice of λ: tradeoff between the noise level ||w|| and the regularity J(f_0) of f_0.

No noise: λ → 0^+, minimize

f* ∈ argmin_{f ∈ R^Q, Kf = y} J(f)

Page 11: A Review of Proximal Methods, with a New One

L1 Regularization

coe�cientsx0 � RN

Page 12: A Review of Proximal Methods, with a New One

L1 Regularization

coe�cients image�

x0 � RN f0 = �x0 � RQ

Page 13: A Review of Proximal Methods, with a New One

L1 Regularization

observations

w

coe�cients image� K

x0 � RN f0 = �x0 � RQ y = Kf0 + w � RP

Page 14: A Review of Proximal Methods, with a New One

L1 Regularization

observations

� = K �⇥ ⇥ RP�N

w

coe�cients image� K

x0 � RN f0 = �x0 � RQ y = Kf0 + w � RP

Page 15: A Review of Proximal Methods, with a New One

Fidelity Regularization

minx�RN

12

||y � �x||2 + �||x||1

L1 Regularization

Sparse recovery: f� = �x� where x� solves

observations

� = K �⇥ ⇥ RP�N

w

coe�cients image� K

x0 � RN f0 = �x0 � RQ y = Kf0 + w � RP

Page 16: A Review of Proximal Methods, with a New One

K

y = Kf0 + wMeasures:

Inpainting Problem

(Kf)(x) =�

0 if x � �,f(x) if x /� �.

Page 17: A Review of Proximal Methods, with a New One

Overview

• Inverse Problems Regularization

• Proximal Splitting

• Generalized Forward-Backward

Page 18: A Review of Proximal Methods, with a New One

Proximal operator of G:Prox�G(x) = argmin

z

12

||x� z||2 + �G(z)

Proximal Operators

Page 19: A Review of Proximal Methods, with a New One

Proximal operator of G:Prox�G(x) = argmin

z

12

||x� z||2 + �G(z)

Prox�G(x)i = max�

0, 1� �

|xi|

�xi

G(x) = ||x||1 =�

i

|xi|

Proximal Operators

−10 −8 −6 −4 −2 0 2 4 6 8 10

−2

0

2

4

6

8

10

12

−10 −8 −6 −4 −2 0 2 4 6 8 10−10

−8

−6

−4

−2

0

2

4

6

8

10

||x||0|x|log(1 + x2)

G(x)

ProxG(x)

Page 20: A Review of Proximal Methods, with a New One

Proximal operator of G:Prox�G(x) = argmin

z

12

||x� z||2 + �G(z)

Prox�G(x)i = max�

0, 1� �

|xi|

�xi

Prox�G(x)i =�

xi if |xi| � �2�,0 otherwise.

G(x) = ||x||1 =�

i

|xi|

G(x) = ||x||0 = | {i \ xi �= 0} |

Proximal Operators

−10 −8 −6 −4 −2 0 2 4 6 8 10

−2

0

2

4

6

8

10

12

−10 −8 −6 −4 −2 0 2 4 6 8 10−10

−8

−6

−4

−2

0

2

4

6

8

10

||x||0|x|log(1 + x2)

G(x)

ProxG(x)

Page 21: A Review of Proximal Methods, with a New One

�� 3rd order polynomial root.

Proximal operator of G:Prox�G(x) = argmin

z

12

||x� z||2 + �G(z)

Prox�G(x)i = max�

0, 1� �

|xi|

�xi

Prox�G(x)i =�

xi if |xi| � �2�,0 otherwise.

G(x) = ||x||1 =�

i

|xi|

G(x) = ||x||0 = | {i \ xi �= 0} |

G(x) =�

i

log(1 + |xi|2)

Proximal Operators

−10 −8 −6 −4 −2 0 2 4 6 8 10

−2

0

2

4

6

8

10

12

−10 −8 −6 −4 −2 0 2 4 6 8 10−10

−8

−6

−4

−2

0

2

4

6

8

10

||x||0|x|log(1 + x2)

G(x)

ProxG(x)
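The two closed-form proximal operators above can be checked numerically. A minimal sketch (function names `prox_l1` and `prox_l0` are ours, not from the slides):

```python
import numpy as np

def prox_l1(x, gamma):
    """Soft thresholding: prox of gamma*||x||_1, applied entrywise."""
    return np.maximum(0.0, 1.0 - gamma / np.maximum(np.abs(x), 1e-12)) * x

def prox_l0(x, gamma):
    """Hard thresholding: prox of gamma*||x||_0 (keep entries with |x_i| >= sqrt(2*gamma))."""
    return np.where(np.abs(x) >= np.sqrt(2.0 * gamma), x, 0.0)

x = np.array([-3.0, -0.5, 0.0, 0.2, 4.0])
print(prox_l1(x, 1.0))  # shrinks magnitudes by gamma, zeroing small entries
print(prox_l0(x, 1.0))  # keeps only entries with |x_i| >= sqrt(2)
```

Note the qualitative difference visible in the figure: soft thresholding shrinks every surviving entry, while hard thresholding leaves them untouched.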

Pages 22-24: A Review of Proximal Methods, with a New One

Proximal Splitting Methods

Solve min_{x ∈ H} E(x).

Problem: Prox_{γE} is not available.

Splitting: E(x) = F(x) + Σ_i G_i(x), with F smooth and each G_i simple.

Iterative algorithms using only ∇F(x) and Prox_{γG_i}(x):

Forward-Backward: solves F + G
Douglas-Rachford: solves Σ_i G_i
Primal-Dual: solves Σ_i G_i ∘ A_i
Generalized FB: solves F + Σ_i G_i

Page 25: A Review of Proximal Methods, with a New One

minx�RN

F (x) + G(x)

Forward-Backward

SimpleSmooth

(�)

Page 26: A Review of Proximal Methods, with a New One

minx�RN

F (x) + G(x)

x(�+1) = Prox�G

�x(�) � ��F (x(�))

Forward-Backward

Forward-backward:

SimpleSmooth

(�)

Page 27: A Review of Proximal Methods, with a New One

G = �C

minx�RN

F (x) + G(x)

x(�+1) = Prox�G

�x(�) � ��F (x(�))

Forward-Backward

Forward-backward:

Projected gradient descent:

SimpleSmooth

(�)

Page 28: A Review of Proximal Methods, with a New One

G = �C

minx�RN

F (x) + G(x)

x(�+1) = Prox�G

�x(�) � ��F (x(�))

Forward-Backward

Forward-backward:

Projected gradient descent:

Theorem:

a solution of (�)If � < 2/L,

Let �F be L-Lipschitz.x(�) � x�

SimpleSmooth

(�)

Page 29: A Review of Proximal Methods, with a New One

G = �C

�� Multi-step accelerations (Nesterov, Beck-Teboule).

minx�RN

F (x) + G(x)

x(�+1) = Prox�G

�x(�) � ��F (x(�))

Forward-Backward

Forward-backward:

Projected gradient descent:

Theorem:

a solution of (�)If � < 2/L,

Let �F be L-Lipschitz.x(�) � x�

SimpleSmooth

(�)

Page 30: A Review of Proximal Methods, with a New One

minx

12

||�x� y||2 + �||x||1 minx

F (x) + G(x)

F (x) =12

||�x� y||2

G(x) = �||x||1

�F (x) = ��(�x� y)

Prox�G(x)i = max�

0, 1� �⇥

|xi|

�xi

L = ||���||

Example: L1 Regularization

��

Forward-backward Iterative soft thresholding��
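Iterative soft thresholding can be sketched in a few lines; this is a minimal illustration of the forward-backward scheme above (the problem sizes and regularization value are our own choices, not from the slides):

```python
import numpy as np

def ista(Phi, y, lam, n_iter=200):
    """Forward-backward (ISTA) for min_x 0.5*||Phi x - y||^2 + lam*||x||_1."""
    L = np.linalg.norm(Phi.T @ Phi, 2)  # Lipschitz constant of the gradient
    gamma = 1.0 / L                     # step size, satisfies gamma < 2/L
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ x - y)    # forward (explicit gradient) step
        z = x - gamma * grad
        # backward (proximal) step: soft thresholding at level lam*gamma
        x = np.sign(z) * np.maximum(np.abs(z) - lam * gamma, 0.0)
    return x

rng = np.random.default_rng(0)
Phi = rng.standard_normal((50, 100)) / np.sqrt(50)
x0 = np.zeros(100); x0[:5] = 1.0
y = Phi @ x0
x_hat = ista(Phi, y, lam=0.01)
```

With gamma = 1/L each iteration is a majorization-minimization step, so the objective decreases monotonically.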

Page 31: A Review of Proximal Methods, with a New One

Douglas Rachford Scheme

(�)minx

G1(x) + G2(x)

SimpleSimple

Page 32: A Review of Proximal Methods, with a New One

Douglas-Rachford iterations:

RProx�G(x) = 2Prox�G(x)� x

Reflexive prox:

z(�+1) =�1� �

2

�z(�) +

2RProx�G2 � RProx�G1(z

(�))

x(�+1) = Prox�G2(z(�+1))

Douglas Rachford Scheme

(�)minx

G1(x) + G2(x)

SimpleSimple

Page 33: A Review of Proximal Methods, with a New One

Douglas-Rachford iterations:

Theorem:

a solution of (�)

RProx�G(x) = 2Prox�G(x)� x

x(�) � x�

If 0 < � < 2 and ⇥ > 0,

Reflexive prox:

z(�+1) =�1� �

2

�z(�) +

2RProx�G2 � RProx�G1(z

(�))

x(�+1) = Prox�G2(z(�+1))

Douglas Rachford Scheme

(�)minx

G1(x) + G2(x)

SimpleSimple

Page 34: A Review of Proximal Methods, with a New One

C = {x \ �x = y}

Prox�G1(x) = ProjC(x) = x + �⇥(��⇥)�1(y � �x)

Prox�G2(x) =�

max�

0, 1� �

|xi|

�xi

i

�� e⇥cient if ��� easy to invert.

minx

G1(x) + G2(x)min�x=y

||x||1

G1(x) = iC(x),

G2(x) = ||x||1

Example: Constrainted L1

��

Page 35: A Review of Proximal Methods, with a New One

50 100 150 200 250

−5

−4

−3

−2

−1

0

1

C = {x \ �x = y}

Prox�G1(x) = ProjC(x) = x + �⇥(��⇥)�1(y � �x)

Prox�G2(x) =�

max�

0, 1� �

|xi|

�xi

i

�� e⇥cient if ��� easy to invert.

� = 0.01� = 1� = 10

Example: compressed sensing

� � R100�400 Gaussian matrix

||x0||0 = 17y = �x0

log10(||x(�)||1 � ||x�||1)

minx

G1(x) + G2(x)min�x=y

||x||1

G1(x) = iC(x),

G2(x) = ||x||1

Example: Constrainted L1

��
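The compressed-sensing experiment can be reproduced in a few lines. This sketch uses an equivalent standard form of the Douglas-Rachford recursion (written with the prox rather than the reflected prox); the sparsity pattern, step size, and iteration count are our assumptions, not taken from the slides:

```python
import numpy as np

def dr_constrained_l1(Phi, y, gamma=1.0, alpha=1.0, n_iter=2000):
    """Douglas-Rachford for min ||x||_1 subject to Phi x = y (basis pursuit).
    Equivalent form: z <- z + alpha*(Prox_{gG2}(2*Prox_{gG1}(z) - z) - Prox_{gG1}(z))."""
    PhiT = Phi.T
    gram_inv = np.linalg.inv(Phi @ PhiT)  # assumes Phi has full row rank
    proj_C = lambda x: x + PhiT @ (gram_inv @ (y - Phi @ x))  # G1 = indicator of {Phi x = y}
    soft = lambda x, t: np.sign(x) * np.maximum(np.abs(x) - t, 0.0)  # prox of t*||.||_1
    z = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        x = proj_C(z)
        z = z + alpha * (soft(2 * x - z, gamma) - x)
    return proj_C(z)  # feasible point by construction

rng = np.random.default_rng(1)
Phi = rng.standard_normal((100, 400))         # Gaussian matrix, as on the slide
x0 = np.zeros(400)
support = rng.choice(400, 17, replace=False)  # ||x0||_0 = 17
x0[support] = rng.standard_normal(17)
y = Phi @ x0
x_star = dr_constrained_l1(Phi, y)
```

Here (ΦΦ*)^{-1} is precomputed once, which is exactly the "efficient if ΦΦ* easy to invert" remark above.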

Page 36: A Review of Proximal Methods, with a New One

Overview

• Inverse Problems Regularization

• Proximal Splitting

• Generalized Forward-Backward

Page 37: A Review of Proximal Methods, with a New One

GFB Splitting

(�)minx�RN

F (x) +n�

i=1

Gi(x)

SimpleSmooth

Page 38: A Review of Proximal Methods, with a New One

� i = 1, . . . , n,

x(�+1) =1n

n�

i=1

z(�+1)i

z(�+1)i =z(�)

i + Proxn�Gi(2x(�)�z(�)i ���F (x(�)))�x(�)

GFB Splitting

(�)minx�RN

F (x) +n�

i=1

Gi(x)

SimpleSmooth

Page 39: A Review of Proximal Methods, with a New One

� i = 1, . . . , n,

n = 1 �� Forward-backward.F = 0 �� Douglas-Rachford.

x(�+1) =1n

n�

i=1

z(�+1)i

z(�+1)i =z(�)

i + Proxn�Gi(2x(�)�z(�)i ���F (x(�)))�x(�)

GFB Splitting

Theorem:

a solution of (�)x(�) � x�If � < 2/L,Let �F be L-Lipschitz.

(�)minx�RN

F (x) +n�

i=1

Gi(x)

SimpleSmooth
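The GFB recursion above translates almost line for line into code. A minimal sketch (the generic `gfb` driver and the toy problem are ours; only the update rule comes from the slide):

```python
import numpy as np

def gfb(grad_F, proxes, x_init, gamma, n_iter=300):
    """Generalized Forward-Backward for min F(x) + sum_i G_i(x).
    grad_F: gradient of the smooth term F.
    proxes: list of functions (v, t) -> Prox_{t G_i}(v), one per simple term."""
    n = len(proxes)
    x = x_init.copy()
    z = [x_init.copy() for _ in range(n)]  # one auxiliary variable per G_i
    for _ in range(n_iter):
        g = grad_F(x)
        for i in range(n):
            # z_i <- z_i + Prox_{n*gamma*G_i}(2x - z_i - gamma*grad_F(x)) - x
            z[i] = z[i] + proxes[i](2 * x - z[i] - gamma * g, n * gamma) - x
        x = sum(z) / n
    return x

# Toy use with n = 1 (reduces to forward-backward):
# min 0.5*||x - y||^2 + lam*||x||_1
y = np.array([3.0, -0.3, 0.1, -2.0])
lam = 1.0
soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - lam * t, 0.0)
x_hat = gfb(lambda x: x - y, [soft], np.zeros(4), gamma=0.5)
```

With a single `soft` prox the loop is exactly forward-backward; passing several block proxes in `proxes` gives the general scheme.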

Page 40: A Review of Proximal Methods, with a New One

Coe�cients x.Image f = �x

�1 � �2 block sparsity:

Block Regularization

Towards More Complex Penalization

⇥⇥x ⇥⇥1 = �i ⇥xi ⇥ �b�B��i�b x2

i

�b�B1

��i�b x2i+

�b�B2

��i�b x2i

Decomposition G = �k Gk

Numerical ExperimentsDeconvolution minx

12 ��Y ⇥K � x ��2 + �(2)`1�`2 �4

k=1 ��x ��Bk1,2

10 20 30 40−1

0

1

2

3

tEFB: 161s; tPR: 173s; tCP: 190s

iteration #

log 10

(E−E

min

)

EFBPRCP

N: 256

noise: 0.025; convol.: 2λl1/l2

2 : 1.30e−03; it. #50; SNR: 22.49dB

G(x) =�

b�B||x[b]||,

b � B

||x[b]||2 =�

m�b

x2m

Page 41: A Review of Proximal Methods, with a New One

Coe�cients x.Image f = �x Blocks B1

Non-overlapping decomposition:

�1 � �2 block sparsity:

B = B1 � . . . � Bn

G(x) =n�

i=1

Gi(x) Gi(x) =�

b�Bi

||x[b]||,

Block Regularization

Towards More Complex Penalization

⇥⇥x ⇥⇥1 = �i ⇥xi ⇥ �b�B��i�b x2

i

�b�B1

��i�b x2i+

�b�B2

��i�b x2i

Decomposition G = �k Gk

Numerical ExperimentsDeconvolution minx

12 ��Y ⇥K � x ��2 + �(2)`1�`2 �4

k=1 ��x ��Bk1,2

10 20 30 40−1

0

1

2

3

tEFB: 161s; tPR: 173s; tCP: 190s

iteration #

log 10

(E−E

min

)

EFBPRCP

N: 256

noise: 0.025; convol.: 2λl1/l2

2 : 1.30e−03; it. #50; SNR: 22.49dB

Towards More Complex Penalization

⇥⇥x ⇥⇥1 = �i ⇥xi ⇥ �b�B��i�b x2

i

�b�B1

��i�b x2i+

�b�B2

��i�b x2i

Decomposition G = �k Gk

Towards More Complex Penalization

⇥⇥x ⇥⇥1 = �i ⇥xi ⇥ �b�B��i�b x2

i

�b�B1

��i�b x2i+

�b�B2

��i�b x2i

Decomposition G = �k Gk

B1 � B2

G(x) =�

b�B||x[b]||,

b � B

||x[b]||2 =�

m�b

x2m

Page 42: A Review of Proximal Methods, with a New One

Coe�cients x.Image f = �x Blocks B1

Non-overlapping decomposition:

�1 � �2 block sparsity:

B = B1 � . . . � Bn

G(x) =n�

i=1

Gi(x)

⇤m ⇥ b ⇥ Bi, Prox�Gi(x)m = max�

0, 1� �

||x[b]||

�xm

Gi(x) =�

b�Bi

||x[b]||,

Each Gi is simple:

Block Regularization

Towards More Complex Penalization

⇥⇥x ⇥⇥1 = �i ⇥xi ⇥ �b�B��i�b x2

i

�b�B1

��i�b x2i+

�b�B2

��i�b x2i

Decomposition G = �k Gk

Numerical ExperimentsDeconvolution minx

12 ��Y ⇥K � x ��2 + �(2)`1�`2 �4

k=1 ��x ��Bk1,2

10 20 30 40−1

0

1

2

3

tEFB: 161s; tPR: 173s; tCP: 190s

iteration #

log 10

(E−E

min

)

EFBPRCP

N: 256

noise: 0.025; convol.: 2λl1/l2

2 : 1.30e−03; it. #50; SNR: 22.49dB

Towards More Complex Penalization

⇥⇥x ⇥⇥1 = �i ⇥xi ⇥ �b�B��i�b x2

i

�b�B1

��i�b x2i+

�b�B2

��i�b x2i

Decomposition G = �k Gk

Towards More Complex Penalization

⇥⇥x ⇥⇥1 = �i ⇥xi ⇥ �b�B��i�b x2

i

�b�B1

��i�b x2i+

�b�B2

��i�b x2i

Decomposition G = �k Gk

B1 � B2

G(x) =�

b�B||x[b]||,

b � B

||x[b]||2 =�

m�b

x2m
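The block prox above is entrywise rescaling by the block norm. A minimal sketch of the prox for one G_i over its non-overlapping blocks (the function name and example blocks are ours):

```python
import numpy as np

def prox_block_l1l2(x, blocks, gamma):
    """Block soft thresholding: prox of gamma * sum_b ||x[b]|| over
    non-overlapping index blocks b."""
    out = x.copy()
    for b in blocks:
        norm_b = np.linalg.norm(x[b])
        # max(0, 1 - gamma/||x[b]||) applied to every entry of the block
        scale = max(0.0, 1.0 - gamma / norm_b) if norm_b > 0 else 0.0
        out[b] = scale * x[b]
    return out

x = np.array([3.0, 4.0, 0.3, 0.4])
blocks = [np.array([0, 1]), np.array([2, 3])]
print(prox_block_l1l2(x, blocks, 1.0))  # first block shrunk, second zeroed
```

Small blocks are set to zero as a group, which is the point of ℓ1 − ℓ2 sparsity: entire blocks, not isolated coefficients, are switched off.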

Page 43: A Review of Proximal Methods, with a New One

y = �x0 + w

x0

x�

� = TI wavelets

� = convolution

Numerical Illustration

Numerical ExperimentsDeconvolution minx

12 ��Y ⇥K � x ��2 + �(2)`1�`2 �4

k=1 ��x ��Bk1,2

10 20 30 40−1

0

1

2

3

tEFB: 161s; tPR: 173s; tCP: 190s

iteration #

log 10

(E−E

min

)

EFBPRCP

N: 256

noise: 0.025; convol.: 2λl1/l2

2 : 1.30e−03; it. #50; SNR: 22.49dB

Numerical ExperimentsDeconvolution minx

12 ��Y ⇥K � x ��2 + �(2)`1�`2 �4

k=1 ��x ��Bk1,2

10 20 30 40−1

0

1

2

3

tEFB: 161s; tPR: 173s; tCP: 190s

iteration #

log 10

(E−E

min

)

EFBPRCP

N: 256

noise: 0.025; convol.: 2λl1/l2

2 : 1.30e−03; it. #50; SNR: 22.49dB

Numerical ExperimentsDeconvolution minx

12 ��Y ⇥K � x ��2 + �(2)`1�`2 �4

k=1 ��x ��Bk1,2

10 20 30 40−1

0

1

2

3

tEFB: 161s; tPR: 173s; tCP: 190s

iteration #

log 10

(E−E

min

)

EFBPRCP

N: 256

noise: 0.025; convol.: 2λl1/l2

2 : 1.30e−03; it. #50; SNR: 22.49dB

Numerical ExperimentsDeconv. + Inpaint. minx

12 ��Y ⇥ P�K � x ��2 + �(4)`1�`2 �16

k=1 ��x ��Bk1,2

10 20 30 40

0

1

2

3

tEFB: 283s; tPR: 298s; tCP: 368s

iteration #

log 10

(E−E

min

)

EFBPRCP

noise: 0.025; degrad.: 0.4; convol.: 2λl1/l2

4 : 1.00e−03; it. #50; SNR: 21.80dB

Numerical ExperimentsDeconv. + Inpaint. minx

12 ��Y ⇥ P�K � x ��2 + �(4)`1�`2 �16

k=1 ��x ��Bk1,2

10 20 30 40

0

1

2

3

tEFB: 283s; tPR: 298s; tCP: 368s

iteration #

log 10

(E−E

min

)

EFBPRCP

noise: 0.025; degrad.: 0.4; convol.: 2λl1/l2

4 : 1.00e−03; it. #50; SNR: 21.80dB

log10(E(x(�))� E(x�))

minx

12

||y � �⇥x||2 + ��

i

Gi(x)

� = inpainting+convolution

Page 44: A Review of Proximal Methods, with a New One

Inverse problems in imaging:� Large scale, N � 106.

� Non-smooth (sparsity, TV, . . . )

� (Sometimes) convex.

� Highly structured (separability, �p norms, . . . ).

Conclusion

Page 45: A Review of Proximal Methods, with a New One

Inverse problems in imaging:� Large scale, N � 106.

� Non-smooth (sparsity, TV, . . . )

� (Sometimes) convex.

� Highly structured (separability, �p norms, . . . ).

Proximal splitting:

� Parallelizable.� Unravel the structure of problems.

Conclusion

Towards More Complex Penalization

⇥⇥x ⇥⇥1 = �i ⇥xi ⇥ �b�B��i�b x2

i

�b�B1

��i�b x2i+

�b�B2

��i�b x2i

Decomposition G = �k Gk

Page 46: A Review of Proximal Methods, with a New One

Inverse problems in imaging:� Large scale, N � 106.

� Non-smooth (sparsity, TV, . . . )

� (Sometimes) convex.

� Highly structured (separability, �p norms, . . . ).

Proximal splitting:

Open problems:� Less structured problems without smoothness.� Non-convex optimization.

� Parallelizable.� Unravel the structure of problems.

Conclusion

Towards More Complex Penalization

⇥⇥x ⇥⇥1 = �i ⇥xi ⇥ �b�B��i�b x2

i

�b�B1

��i�b x2i+

�b�B2

��i�b x2i

Decomposition G = �k Gk