
Accelerated Proximal Point Method for Maximally Monotone Operators

Donghwan Kim

KAIST

ICCOPT 2019

Aug 6, 2019

Goal

Accelerate the proximal point method for maximally monotone operators in terms of the fixed-point residual via the performance estimation problem (PEP) approach.


1 Monotone Operators

2 Proximal Point Method

3 Accelerated Proximal Point Method

4 Numerical Experiments

5 Accelerated Forward Method

1. Monotone Operators

Monotone Operators

Let H be a real Hilbert space equipped with inner product 〈·, ·〉 and associated norm || · ||.

A set-valued operator T : H⇒ H is monotone if

〈x− y, Tx− Ty〉 ≥ 0 for all x,y ∈ H.

or more precisely,

〈x− y, u− v〉 ≥ 0 for all x,y ∈ H, u ∈ T (x),v ∈ T (y).

It is said to be maximally monotone if the graph

gra(T ) = {(x,u) ∈ H ×H : u ∈ Tx}

is not properly contained in the graph of any other monotone operator.



Monotone Inclusion Problem

Monotone inclusion problem

Find x ∈ H s.t. 0 ∈ Tx,

where T : H⇒ H is maximally monotone.

Subdifferential: T = ∂f for a proper closed convex f

Saddle subdifferential: $T = \begin{bmatrix} \partial_x \phi(x,y) \\ \partial_y(-\phi(x,y)) \end{bmatrix}$ for a convex-concave $\phi$

...


2. Proximal Point Method

Proximal Point Method

Proximal Point Method [Martinet, 1970]

Initialize $x_0 \in H$, $\lambda > 0$.
for $i = 0, 1, \dots$
    $x_{i+1} = \mathrm{prox}_{\lambda f}(x_i) = \arg\min_{x \in H} \left\{ \frac{1}{2}\|x - x_i\|^2 + \lambda f(x) \right\}$
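As a minimal sketch of the iteration above, the following Python snippet runs the proximal point method on $f(x) = \|x\|_1$, whose proximal map is soft-thresholding. The choice of $f$, the starting point, the step $\lambda = 0.5$, and the helper names `soft_threshold` and `proximal_point` are illustrative assumptions, not part of the talk.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal map of tau * ||.||_1: argmin_x 0.5 * ||x - v||^2 + tau * ||x||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def proximal_point(prox, x0, lam, num_iters):
    """Iterate x_{i+1} = prox_{lam f}(x_i) and return all iterates."""
    x = np.asarray(x0, dtype=float)
    iterates = [x]
    for _ in range(num_iters):
        x = prox(x, lam)
        iterates.append(x)
    return iterates

# Assumed example data: f = ||.||_1, whose unique minimizer is the origin.
iterates = proximal_point(soft_threshold, x0=[3.0, -0.5, 1.2], lam=0.5, num_iters=10)
x_final = iterates[-1]
print(x_final)
```

Each coordinate shrinks toward zero by at most $\lambda$ per step, so in this example the iterates reach the minimizer after finitely many steps.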

Proximal point method on dual problem

= Augmented Lagrangian method

Inspired by [Nesterov, 1983],

an accelerated version was developed in [Guler, 1992],

which is an instance of FISTA [Beck-Teboulle, 2009].



Proximal Point Method for Maximally Monotone Operators

Resolvent operator

$J_T := (I + T)^{-1}$

Proximal Point Method for Maximally Monotone Operators [Rockafellar, 1976]


$x_{i+1} = J_{\lambda T}(x_i)$

Includes the Douglas-Rachford Splitting (DRS) Method (and ADMM).

Only empirical accelerations are known.
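For a linear monotone operator $x \mapsto Ax$, the resolvent is a matrix inverse, which makes the definition easy to check numerically. The sketch below assumes a 2-by-2 skew-symmetric matrix and $\lambda = 0.5$ purely for illustration; the helper name `resolvent` is hypothetical.

```python
import numpy as np

def resolvent(A, lam):
    """Matrix of J_{lam A} = (I + lam A)^{-1} for the linear operator x -> A x."""
    n = A.shape[0]
    return np.linalg.inv(np.eye(n) + lam * A)

# Assumed example: a skew-symmetric matrix, monotone because <x, A x> = 0.
A = np.array([[0.0, 1.0], [-1.0, 0.0]])
lam = 0.5
J = resolvent(A, lam)

x = np.array([1.0, 0.0])
x_next = J @ x  # one proximal point step x_{i+1} = J_{lam T}(x_i)

# By definition of the resolvent, (x - x_next) / lam must equal A x_next.
inclusion_gap = np.linalg.norm((x - x_next) / lam - A @ x_next)
print(inclusion_gap)
```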



Douglas-Rachford Splitting Method

Consider the problem

$0 \in Tx = (T_1 + T_2)x$,

where $J_{T_1}$ and $J_{T_2}$ are more efficient to compute than $J_T$.

Douglas-Rachford Splitting (DRS) Method [Lions-Mercier, 1979]

Initialize $x_0 \in H$, $\rho > 0$.
for $i = 0, 1, \dots$
    $x_{i+1} = \left(J_{\rho T_1} \circ (2J_{\rho T_2} - I) + (I - J_{\rho T_2})\right)x_i$

$\{J_{\rho T_2}(x_i)\}$ converges to some zero of $T_1 + T_2$

ADMM ∈ DRS ∈ Proximal Point [Eckstein-Bertsekas, 1992]

$J_{T_{\mathrm{DRS}}} = J_{\rho T_1} \circ (2J_{\rho T_2} - I) + (I - J_{\rho T_2})$

where $T_{\mathrm{DRS}}$ is maximally monotone.
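A small sketch of the DRS iteration, under assumed data: in one dimension, take $T_1 = \partial|\cdot|$ and $T_2(x) = x - 3$, so that $T_1 + T_2$ has the single zero $x = 2$. Both resolvents are available in closed form. The problem, the value $\rho = 1$, and the function names are illustrative choices, not taken from the talk.

```python
import numpy as np

def soft_threshold(v, tau):
    """Resolvent of tau * d|.| (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def drs(J1, J2, x0, num_iters):
    """Iterate x_{i+1} = J1(2 J2(x_i) - x_i) + (x_i - J2(x_i)).

    Returns the final x and J2(x), the sequence that approaches a zero of T1 + T2.
    """
    x = float(x0)
    for _ in range(num_iters):
        z = J2(x)
        x = J1(2.0 * z - x) + (x - z)
    return x, J2(x)

rho = 1.0
J1 = lambda v: soft_threshold(v, rho)           # resolvent of rho * T1, T1 = d|.|
J2 = lambda v: (v + 3.0 * rho) / (1.0 + rho)    # resolvent of rho * T2, T2(x) = x - 3

x_fixed, z_sol = drs(J1, J2, x0=0.0, num_iters=100)
print(x_fixed, z_sol)
```

Note that the DRS iterate itself converges to a fixed point of the DRS map (here 1), while the solution of the inclusion is read off from $J_{\rho T_2}(x_i)$ (here 2).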



Goal

Accelerate the proximal point method for maximally monotone operators in terms of the fixed-point residual via the performance estimation problem (PEP) approach, and also accelerate DRS and ADMM.

[Ryu-Taylor-Bergeling-Giselsson, 2018] uses PEP to find the optimal parameter for DRS under additional assumptions.


3. Accelerated Proximal Point Method

Proximal Point Method for Maximally Monotone Operators

Proximal Point Method [Rockafellar, 1976]


$x_{i+1} = J_{\lambda T}(x_i)$

Theorem 3.1 [Brezis-Lions, 1978]

The proximal point method satisfies

$$\underbrace{\|J_{\lambda T}(x_{i-1}) - x_{i-1}\|^2}_{\text{fixed-point residual}} = \|x_i - x_{i-1}\|^2 \le \frac{\|x_0 - x_*\|^2}{i}.$$

Recently improved by a constant factor $\left(1 - \frac{1}{i}\right)^{i-1}$ via PEP [Gu-Yang, 2019].
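The two bounds can be checked numerically on a simple example. The sketch below assumes a linear monotone operator given by a 2-by-2 matrix whose symmetric part is positive semidefinite, with unique zero $x_* = 0$. The matrix, $\lambda = 2$, and the starting point are illustrative assumptions.

```python
import numpy as np

# Assumed example operator: symmetric part diag(0.2, 0) is PSD, so x -> A x is monotone.
A = np.array([[0.2, 1.0], [-1.0, 0.0]])
lam = 2.0
J = np.linalg.inv(np.eye(2) + lam * A)  # resolvent J_{lam T}

x0 = np.array([1.0, 0.0])
x_star = np.zeros(2)                    # unique zero of A, since det(A) != 0
R2 = float(np.sum((x0 - x_star) ** 2))

N = 50
residuals, brezis_lions, gu_yang = [], [], []
x_prev = x0
for i in range(1, N + 1):
    x = J @ x_prev
    residuals.append(float(np.sum((x - x_prev) ** 2)))    # ||x_i - x_{i-1}||^2
    brezis_lions.append(R2 / i)                           # Theorem 3.1 bound
    gu_yang.append((1.0 - 1.0 / i) ** (i - 1) * R2 / i)   # bound with the improved constant
    x_prev = x

bounds_hold = all(
    r <= g + 1e-12 and g <= b + 1e-12
    for r, g, b in zip(residuals, gu_yang, brezis_lions)
)
print(bounds_hold)
```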



Guler’s Accelerated Proximal Point Method

For convex minimization (with T = ∂f),

an accelerated version was developed in [Guler, 1992].

Guler’s Accelerated Proximal Point Method [Guler, 1992]

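Initialize $x_0 = y_0 \in H$, $\lambda > 0$, $t_0 = 1$.
for $i = 0, 1, \dots$
    $x_{i+1} = J_{\lambda \partial f}(y_i) = \mathrm{prox}_{\lambda f}(y_i)$
    $t_{i+1} = \frac{1}{2}\left(1 + \sqrt{1 + 4t_i^2}\right)$
    $y_{i+1} = x_{i+1} + \frac{t_i - 1}{t_{i+1}}(x_{i+1} - x_i)$   (momentum update)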

This diverges for some monotone operators T ...



Performance Estimation Problem (PEP)

General Proximal Point Method

Initialize $x_0 = y_0 \in H$, $\lambda > 0$.
for $i = 0, 1, \dots, N-1$
    $x_{i+1} = J_{\lambda T}(y_i)$,   $y_{i+1} = y_i + \sum_{k=0}^{i} h_{i+1,k+1}(x_{k+1} - y_k)$

Using PEP [Drori-Teboulle, 2014, Taylor-Hendrickx-Glineur, 2017], its worst-case rate can be found by solving

$$\max_{\substack{T, \\ x_1,\dots,x_N \in H, \\ y_0,\dots,y_{N-1} \in H}} \ \frac{1}{R^2}\|x_N - y_{N-1}\|^2$$

subject to $T : H \rightrightarrows H$ is maximally monotone,
    $\{x_i\}$, $\{y_i\}$ generated by the general proximal point method,
    $0 \in Tx_*$, $\|y_0 - x_*\| \le R$.

[Gu-Yang, 2019], [Ryu-Taylor-Bergeling-Giselsson, 2018]



Performance Estimation Problem (PEP) (cont’d)

Similar to [Drori-Teboulle, 2014], find an accelerated method by solving the minimax problem:

$$\min_{\{h_{i+1,k+1}\}} \ \max_{\substack{T, \\ x_1,\dots,x_N \in H, \\ y_0,\dots,y_{N-1} \in H}} \ \frac{1}{R^2}\|x_N - y_{N-1}\|^2$$

subject to $T : H \rightrightarrows H$ is maximally monotone,
    $\{x_i\}$, $\{y_i\}$ generated by the general proximal point method,
    $0 \in Tx_*$, $\|y_0 - x_*\| \le R$.



Proposed Accelerated Proximal Point Method

Proposed Accelerated Proximal Point Method [Kim, 2019]

Initialize $x_0 = y_0 = y_{-1} \in H$, $\lambda > 0$.
for $i = 0, 1, \dots$
    $x_{i+1} = J_{\lambda T}(y_i)$
    $y_{i+1} = x_{i+1} + \frac{i}{i+2}(x_{i+1} - x_i) - \frac{i}{i+2}(x_i - y_{i-1})$

Theorem 3.2 [Kim, 2019]

The proposed accelerated proximal point method satisfies

$$\|x_i - y_{i-1}\|^2 \le \frac{\|x_0 - x_*\|^2}{i^2}.$$

Similar to [Nesterov, 1983, Guler, 1992] when the last term, $-\frac{i}{i+2}(x_i - y_{i-1})$ (shown in red on the slide), is discarded.
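The proposed update can be sketched in a few lines of Python. The function name `accelerated_proximal_point`, the example operator (a linear monotone map with zero $x_* = 0$), and $\lambda = 1$ are assumptions made for illustration. The snippet also checks the $1/i^2$ bound of Theorem 3.2 along the run.

```python
import numpy as np

def accelerated_proximal_point(resolvent, x0, num_iters):
    """Run the accelerated proximal point update with x_0 = y_0 = y_{-1}.

    Returns the last x and the residuals ||x_i - y_{i-1}||^2 for i = 1..num_iters.
    """
    x = np.asarray(x0, dtype=float)
    y = x.copy()        # y_i
    y_prev = x.copy()   # y_{i-1}, with y_{-1} = x_0
    residuals = []
    for i in range(num_iters):
        x_next = resolvent(y)                                      # x_{i+1} = J_{lam T}(y_i)
        residuals.append(float(np.sum((x_next - y) ** 2)))
        c = i / (i + 2)
        y_next = x_next + c * (x_next - x) - c * (x - y_prev)      # momentum and correction
        x, y_prev, y = x_next, y, y_next
    return x, residuals

# Assumed example operator x -> A x (monotone, unique zero at the origin).
A = np.array([[0.2, 1.0], [-1.0, 0.0]])
lam = 1.0
J = np.linalg.inv(np.eye(2) + lam * A)

x0 = np.array([1.0, 0.0])
R2 = float(np.sum(x0 ** 2))   # ||x_0 - x_*||^2 with x_* = 0
x_last, residuals = accelerated_proximal_point(lambda v: J @ v, x0, num_iters=100)

bound_holds = all(r <= R2 / i ** 2 + 1e-12 for i, r in enumerate(residuals, start=1))
print(bound_holds, residuals[-1])
```

Passing the resolvent as a callable keeps the loop independent of how $J_{\lambda T}$ is evaluated, so the same function could wrap a proximal map or a DRS step.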



Strongly Monotone Operators

A set-valued operator T : H⇒ H is µ-strongly monotone if

〈x− y, Tx− Ty〉 ≥ µ||x− y||2 for all x,y ∈ H.

Proximal Point Method [Rockafellar, 1976]


$x_{i+1} = J_{\lambda T}(x_i)$

Theorem 3.3 [Rockafellar, 1976]

The proximal point method has a linear rate

$$\|x_i - x_{i-1}\|^2 \le \left(\frac{1}{1 + \lambda\mu}\right)^{2i} \lambda^2 \|Tx_0\|^2.$$



Restarting for Strongly Monotone Operators

Restart the proposed method every k iterations (e.g., [Nesterov, 2013]).

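$x_{0,0} \xrightarrow{k \text{ iter}} (x_{0,k} = x_{1,0}) \xrightarrow{k \text{ iter}} \cdots \xrightarrow{k \text{ iter}} (x_{j-1,k} = x_{j,0}) \xrightarrow{k \text{ iter}} \cdots$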


The proposed accelerated proximal point method with restarting every k iterations has a linear rate

$$\|x_{j,k} - y_{j,k-1}\|^2 \le \frac{1}{\lambda^2 \mu k^2}\|x_{j-1,k} - y_{j-1,k-1}\|^2$$

for a maximally and $\mu$-strongly monotone operator T.



Accelerated Douglas-Rachford Splitting Method

Accelerated DRS ∈ Accelerated Proximal Point

Accelerated Douglas-Rachford Splitting (DRS) Method [Kim, 2019]

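Initialize $x_0 = y_0 = y_{-1} \in H$, $\rho > 0$.
for $i = 0, 1, \dots$
    $x_{i+1} = J_{T_{\mathrm{DRS}}}(y_i) = \left(J_{\rho T_1} \circ (2J_{\rho T_2} - I) + (I - J_{\rho T_2})\right)y_i$
    $y_{i+1} = x_{i+1} + \frac{i}{i+2}(x_{i+1} - x_i) - \frac{i}{i+2}(x_i - y_{i-1})$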

Corollary 3.1 [Kim, 2019]

The proposed accelerated DRS satisfies

$$\|x_i - y_{i-1}\|^2 \le \frac{\|x_0 - x_*\|^2}{i^2}.$$



Accelerated ADMM

Accelerated ADMM = Accelerated DRS on Dual

∈ Accelerated DRS ∈ Accelerated Proximal Point

Accelerated ADMM [Kim, 2019]

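Initialize $x_0 \in H_1$, $z_0 \in H_2$, $\nu_0 \in G$, $\rho > 0$.
for $i = 0, 1, \dots$
    $x_{i+1} = \arg\min_{x \in H_1} \left\{ f(x) + \langle \nu_i, Ax + Bz_i - c \rangle + \frac{\rho}{2}\|Ax + Bz_i - c\|^2 \right\}$
    $\eta_i = \begin{cases} \nu_i, & i = 0, 1, \\ \nu_i + \frac{i-1}{i+1}\left(\nu_i - \nu_{i-1} + \rho A(x_{i+1} - x_i)\right) - \frac{i-1}{i+1}\left(\nu_{i-1} - \eta_{i-2} + \rho A(x_i - x_{i-1})\right), & i = 2, 3, \dots \end{cases}$
    $z_{i+1} = \arg\min_{z \in H_2} \left\{ g(z) + \langle \eta_i, Ax_{i+1} + Bz - c \rangle + \frac{\rho}{2}\|Ax_{i+1} + Bz - c\|^2 \right\}$
    $\nu_{i+1} = \eta_i + \rho(Ax_{i+1} + Bz_{i+1} - c)$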



Accelerated ADMM (cont’d)

Corollary 3.2

ADMM satisfies

$$\|Ax_{i+1} + Bz_i - c\|^2 \le \frac{\|\nu_0 + \rho A(x_0 - c) - \nu_*\|^2}{\rho^2 i}.$$

Corollary 3.3 [Kim, 2019]

The proposed accelerated ADMM satisfies

$$\|Ax_{i+1} + Bz_i - c\|^2 \le \frac{\|\nu_0 + \rho A(x_0 - c) - \nu_*\|^2}{\rho^2 i^2}.$$


4. Numerical Experiments

Numerical Experiment 1

Consider a monotone operator [Gu-Yang, 2019]

$$T = \frac{1}{\sqrt{99}}\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix},$$

which is the worst case for a total of 100 iterations of the proximal point method.

[Figure: two plots, one over iterations 0 to 100 and one in the plane [−1, 1] × [−1, 1]; the images are not recoverable from the extracted text.]

Markers displayed every 5th iteration.
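The experiment can be reproduced approximately with the sketch below, which runs the proximal point method and the proposed accelerated method on the operator above for 100 iterations and compares the final fixed-point residuals. The step $\lambda = 1$ and the starting point $x_0 = (1, 0)$ are assumptions, since the extracted slide does not state them, and the function names are hypothetical.

```python
import numpy as np

T = np.array([[0.0, 1.0], [-1.0, 0.0]]) / np.sqrt(99.0)
lam = 1.0                                   # assumed step size
J = np.linalg.inv(np.eye(2) + lam * T)      # resolvent J_{lam T}
x0 = np.array([1.0, 0.0])                   # assumed starting point, ||x0 - x*|| = 1
N = 100

def run_ppm(J, x0, N):
    """Residuals ||x_i - x_{i-1}||^2 of the proximal point method."""
    x, out = x0.copy(), []
    for _ in range(N):
        x_next = J @ x
        out.append(float(np.sum((x_next - x) ** 2)))
        x = x_next
    return out

def run_accelerated(J, x0, N):
    """Residuals ||x_i - y_{i-1}||^2 of the accelerated proximal point method."""
    x, y, y_prev, out = x0.copy(), x0.copy(), x0.copy(), []
    for i in range(N):
        x_next = J @ y
        out.append(float(np.sum((x_next - y) ** 2)))
        c = i / (i + 2)
        y_next = x_next + c * (x_next - x) - c * (x - y_prev)
        x, y_prev, y = x_next, y, y_next
    return out

ppm_res = run_ppm(J, x0, N)
acc_res = run_accelerated(J, x0, N)
print(ppm_res[-1], acc_res[-1])
```

Under these assumptions the proximal point residual at iteration 100 equals $0.99^{99}/100$ (about 0.0037), which matches the bound with the Gu-Yang constant, while the accelerated residual stays below the $1/i^2$ bound of $10^{-4}$.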



Numerical Experiment 2

Consider a $\mu$-strongly monotone operator with $\mu = 0.02$:

$$T = \frac{1}{\sqrt{99}}\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} + \mu\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

Restarted every 19 iterations.

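[Figure: two plots, one with a logarithmic vertical axis over iterations 0 to 200 and one in the plane; the images are not recoverable from the extracted text.]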

Markers displayed every 5th iteration.



Numerical Experiment 3

Consider a total-variation-regularized problem

$$\min_{x \in \mathbb{R}^{d_1},\, z \in \mathbb{R}^{d_2}} \ \frac{1}{2}\|Hx - b\|^2 + \gamma\|z\|_1$$

subject to $Dx - z = 0$,

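where $H \in \mathbb{R}^{p \times d_1}$, $b \in \mathbb{R}^p$, and

$$D = \begin{bmatrix} 1 & -1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & -1 & 0 & \cdots & 0 \\ \vdots & \ddots & \ddots & \ddots & \ddots & \vdots \\ \vdots & & \ddots & 0 & 1 & -1 & 0 \\ 0 & \cdots & \cdots & 0 & 1 & -1 \end{bmatrix} \in \mathbb{R}^{d_2 \times d_1}.$$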

d1 = 100, d2 = 99, p = 5, γ = 3.

ρ = 0.05 for ADMM.



Numerical Experiment 3 (cont’d)

[Figures: two plots with logarithmic vertical axes over iterations 0 to 200; the images are not recoverable from the extracted text.]


5. Accelerated Forward Method

Cocoercive Operators

A single-valued operator T : H → H is β-cocoercive if

〈x− y, Tx− Ty〉 ≥ β||Tx− Ty||2 for all x,y ∈ H.

General Forward Method

Initialize $x_0 = y_0 \in H$.
for $i = 0, 1, \dots, N-1$
    $x_{i+1} = (I - \beta T)y_i$,   $y_{i+1} = y_i + \sum_{k=0}^{i} h_{i+1,k+1}(x_{k+1} - y_k)$



Performance Estimation Problem (PEP)

Find an accelerated method by solving the minimax problem:

$$\min_{\{h_{i+1,k+1}\}} \ \max_{\substack{T, \\ x_1,\dots,x_N \in H, \\ y_0,\dots,y_{N-1} \in H}} \ \frac{1}{R^2}\|x_N - y_{N-1}\|^2$$

subject to $T : H \rightrightarrows H$ is $\beta$-cocoercive,
    $\{x_i\}$, $\{y_i\}$ generated by the general forward method,
    $0 \in Tx_*$, $\|y_0 - x_*\| \le R$.



Proposed Accelerated Forward Method

Proposed Accelerated Forward Method [Kim, 2019]

Initialize $x_0 = y_0 = y_{-1} \in H$.
for $i = 0, 1, \dots$
    $x_{i+1} = (I - \beta T)y_i$
    $y_{i+1} = x_{i+1} + \frac{i}{i+2}(x_{i+1} - x_i) - \frac{i}{i+2}(x_i - y_{i-1})$


The proposed accelerated forward method satisfies

$$\|x_i - y_{i-1}\|^2 \le \frac{\|x_0 - x_*\|^2}{i^2}.$$
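A minimal sketch of the accelerated forward method follows, using a 1-cocoercive linear operator of the form $\frac{1}{100}\begin{bmatrix} 1 & \sqrt{99} \\ -\sqrt{99} & 1 \end{bmatrix}$ with $\beta = 1$ and zero $x_* = 0$. The starting point and the function name `accelerated_forward` are illustrative assumptions. The run also checks the stated $1/i^2$ bound.

```python
import numpy as np

def accelerated_forward(T, beta, x0, num_iters):
    """Accelerated forward method with x_0 = y_0 = y_{-1}.

    Returns the last x and the residuals ||x_i - y_{i-1}||^2 for i = 1..num_iters.
    """
    x = np.asarray(x0, dtype=float)
    y, y_prev = x.copy(), x.copy()
    residuals = []
    for i in range(num_iters):
        x_next = y - beta * T(y)                                   # forward step (I - beta T) y_i
        residuals.append(float(np.sum((x_next - y) ** 2)))
        c = i / (i + 2)
        y_next = x_next + c * (x_next - x) - c * (x - y_prev)
        x, y_prev, y = x_next, y, y_next
    return x, residuals

s = np.sqrt(99.0)
M = np.array([[1.0, s], [-s, 1.0]]) / 100.0   # <x, Mx> = ||Mx||^2, so M is 1-cocoercive
beta = 1.0
x0 = np.array([1.0, 0.0])                     # assumed starting point
R2 = float(np.sum(x0 ** 2))                   # ||x_0 - x_*||^2 with x_* = 0

x_last, residuals = accelerated_forward(lambda v: M @ v, beta, x0, num_iters=100)
bound_holds = all(r <= R2 / i ** 2 + 1e-12 for i, r in enumerate(residuals, start=1))
print(bound_holds, residuals[-1])
```

Because $I - \beta T$ is firmly nonexpansive when $T$ is $\beta$-cocoercive, the loop has the same structure as the accelerated proximal point method with the resolvent replaced by a forward step.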



Numerical Experiment

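Consider a 1-cocoercive operator [Kim, 2019]

$$T = \frac{1}{100}\begin{bmatrix} 1 & \sqrt{99} \\ -\sqrt{99} & 1 \end{bmatrix} \left( + \mu\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \right),$$

which is the worst case for a total of 100 iterations of the forward method.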

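[Figure: two plots, one over iterations 0 to 100 and one with a logarithmic vertical axis over iterations 0 to 200; the images are not recoverable from the extracted text.]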


References

1. Beck and Teboulle, “A fast iterative shrinkage-thresholding algorithm for linear inverse problems,” SIAM J. Imaging Sciences, 2009.

2. Brezis and Lions, “Produits infinis de resolvantes,” Israel Journal of Mathematics, 1978.

3. Drori and Teboulle, “Performance of first-order methods for smooth convex minimization: a novel approach,” Mathematical Programming, 2014.

4. Eckstein and Bertsekas, “On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators,” Mathematical Programming, 1992.

5. Guler, “New proximal point algorithms for convex minimization,” SIAM J. Optimization, 1992.

6. Gu and Yang, “Optimal nonergodic sublinear convergence rate of proximal point algorithm for maximal monotone inclusion problems,” arXiv, 2019.

7. Kim, “Accelerated proximal point method and forward method for monotone inclusions,” arXiv, 2019.

8. Lions and Mercier, “Splitting algorithms for the sum of two nonlinear operators,” SIAM J. on Numerical Analysis, 1979.

9. Martinet, “Regularisation d’inequations variationnelles par approximations successives,” Rev. Francaise Informat. Recherche Operationnelle, 1970.

10. Nesterov, “A method for unconstrained convex minimization problem with the rate of convergence O(1/k^2),” Dokl. Akad. Nauk. USSR, 1983.


11. Nesterov, “Gradient methods for minimizing composite functions,” Mathematical Programming, 2013.

12. Rockafellar, “Monotone operators and the proximal point algorithm,” SIAM J. Control and Optimization, 1976.

13. Ryu, Taylor, Bergeling and Giselsson, “Operator splitting performance estimation: Tight contraction factors and optimal parameter selection,” arXiv, 2018.

14. Taylor, Hendrickx and Glineur, “Smooth strongly convex interpolation and exact worst-case performance of first-order methods,” Mathematical Programming, 2017.

Thank You! Questions?
