Accelerated Proximal Point Method for Maximally Monotone Operators
Donghwan Kim
KAIST
ICCOPT 2019
Aug 6, 2019
Goal
Accelerate the proximal point method for maximally monotone operators in terms of the fixed-point residual via the performance estimation problem (PEP) approach.
Donghwan Kim (KAIST) Accelerated Proximal Point Method 1 / 25
1 Monotone Operators
2 Proximal Point Method
3 Accelerated Proximal Point Method
4 Numerical Experiments
5 Accelerated Forward Method
1. Monotone Operators
Monotone Operators
Let H be a real Hilbert space equipped with an inner product 〈·, ·〉 and the associated norm ||·||.
A set-valued operator T : H ⇒ H is monotone if
〈x − y, Tx − Ty〉 ≥ 0 for all x, y ∈ H,
or, more precisely,
〈x − y, u − v〉 ≥ 0 for all x, y ∈ H, u ∈ T(x), v ∈ T(y).
It is said to be maximally monotone if the graph
gra(T ) = {(x,u) ∈ H ×H : u ∈ Tx}
is not properly contained in the graph of any other monotone operator.
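For a linear operator T(x) = Mx on R^n, the inequality above reduces to 〈d, Md〉 ≥ 0 for every d = x − y, i.e. the symmetric part (M + Mᵀ)/2 must be positive semidefinite; a continuous monotone operator defined on all of H is also maximally monotone. A minimal sketch, assuming NumPy; `is_monotone_linear` is a hypothetical helper and the two test matrices are illustrative choices:

```python
import numpy as np

def is_monotone_linear(M: np.ndarray, tol: float = 1e-12) -> bool:
    """Hypothetical helper: True if x -> M @ x is monotone, i.e. (M + M^T)/2 is PSD."""
    sym = 0.5 * (M + M.T)
    # eigvalsh handles symmetric matrices; PSD means the smallest eigenvalue is >= 0
    return bool(np.linalg.eigvalsh(sym).min() >= -tol)

S = np.array([[0.0, 1.0], [-1.0, 0.0]])   # skew-symmetric: <z, S z> = 0 for every z
B = np.array([[1.0, 0.0], [0.0, -0.5]])   # <e_2, B e_2> = -0.5 < 0: not monotone

print(is_monotone_linear(S))  # True
print(is_monotone_linear(B))  # False

# Spot-check the defining inequality <x - y, Tx - Ty> >= 0 on one random pair
rng = np.random.default_rng(0)
x, y = rng.standard_normal(2), rng.standard_normal(2)
print(np.dot(x - y, S @ x - S @ y))  # zero up to rounding, since S is skew-symmetric
```

The skew-symmetric S is monotone but not the gradient of any function, which is why monotone operators are a strictly larger class than subdifferentials.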
Monotone Inclusion Problem
Monotone inclusion problem
Find x ∈ H s.t. 0 ∈ Tx,
where T : H⇒ H is maximally monotone.
Subdifferential: T = ∂f for a proper closed convex f
Saddle subdifferential: $T = \begin{bmatrix} \partial_x \phi(x,y) \\ \partial_y(-\phi(x,y)) \end{bmatrix}$ for a convex-concave φ
...
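As a concrete instance of the saddle subdifferential, a bilinear φ(x, y) = xᵀKy gives ∂ₓφ = Ky and ∂_y(−φ) = −Kᵀx, so T is linear and skew-symmetric, hence monotone. The sketch below assumes NumPy; `saddle_operator` is a hypothetical helper and K is random. With K = 1 (scalars) it yields the matrix [[0, 1], [−1, 0]].

```python
import numpy as np

def saddle_operator(K: np.ndarray) -> np.ndarray:
    """Hypothetical helper: matrix of T(x, y) = (d_x phi, d_y(-phi)) for phi(x, y) = x^T K y."""
    m, n = K.shape
    # d_x phi = K y and d_y(-phi) = -K^T x, stacked as one linear map acting on (x, y)
    return np.block([[np.zeros((m, m)), K], [-K.T, np.zeros((n, n))]])

rng = np.random.default_rng(1)
K = rng.standard_normal((3, 2))
T = saddle_operator(K)
z, w = rng.standard_normal(5), rng.standard_normal(5)
gap = np.dot(z - w, T @ (z - w))   # monotonicity inequality; zero for skew-symmetric T
print(np.allclose(T, -T.T), abs(gap) < 1e-12)
```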
2. Proximal Point Method
Proximal Point Method
Proximal Point Method [Martinet, 1970]
Initialize $x_0 \in H$, $\lambda > 0$.
for i = 0, 1, ...
$$x_{i+1} = \mathrm{prox}_{\lambda f}(x_i) = \arg\min_{x \in H}\left\{\frac{1}{2}\|x - x_i\|^2 + \lambda f(x)\right\}$$
Proximal point method on the dual problem = augmented Lagrangian method
Inspired by [Nesterov, 1983],
an accelerated version was developed in [Guler, 1992],
which is an instance of FISTA [Beck-Teboulle, 2009].
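For f = ||·||₁ the prox above has the closed form of soft-thresholding, so Martinet's iteration can be written out directly. A minimal sketch, assuming NumPy; the function names and the starting point are illustrative, not from the talk. Each step shrinks every coordinate toward 0 by λ, so this instance reaches the minimizer x = 0 in finitely many steps.

```python
import numpy as np

def prox_l1(v: np.ndarray, lam: float) -> np.ndarray:
    """prox_{lam*||.||_1}(v) = argmin_x 1/2||x - v||^2 + lam*||x||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def proximal_point(prox, x0: np.ndarray, lam: float, iters: int) -> list:
    """Martinet's iteration x_{i+1} = prox_{lam f}(x_i); returns all iterates."""
    xs = [x0]
    for _ in range(iters):
        xs.append(prox(xs[-1], lam))
    return xs

xs = proximal_point(prox_l1, np.array([3.0, -1.2, 0.5]), lam=1.0, iters=5)
for i, x in enumerate(xs):
    print(i, x)
```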
Proximal Point Method for Maximally Monotone Operators
Resolvent operator
$$J_T := (I + T)^{-1}$$
Proximal Point Method for Maximally Monotone Operators [Rockafellar, 1976]
Initialize $x_0 \in H$, $\lambda > 0$.
for i = 0, 1, ...
$$x_{i+1} = J_{\lambda T}(x_i)$$
Includes the Douglas-Rachford splitting (DRS) method (and ADMM) as special cases.
Only empirical accelerations are known.
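When T(x) = Mx is linear, evaluating J_{λT} amounts to solving the linear system (I + λM)x = v, which is always solvable when M is monotone and λ > 0. The sketch below assumes NumPy; `resolvent_linear` and `ppm` are hypothetical helper names and the random M is illustrative. It runs Rockafellar's iteration and tracks ||x_{i+1} − x_i||, which is nonincreasing because J_{λT} is firmly nonexpansive.

```python
import numpy as np

def resolvent_linear(M: np.ndarray, lam: float):
    """Hypothetical helper: J_{lam T} = (I + lam*M)^{-1} for the linear operator T(x) = M x."""
    n = M.shape[0]
    A = np.eye(n) + lam * M  # invertible whenever M is monotone and lam > 0
    return lambda v: np.linalg.solve(A, v)

def ppm(J, x0: np.ndarray, iters: int) -> np.ndarray:
    """Rockafellar's proximal point method x_{i+1} = J(x_i); returns iterates as rows."""
    xs = [x0]
    for _ in range(iters):
        xs.append(J(xs[-1]))
    return np.array(xs)

rng = np.random.default_rng(2)
G = rng.standard_normal((4, 4))
M = (G - G.T) + 0.1 * np.eye(4)  # skew part plus a small PSD part: monotone
xs = ppm(resolvent_linear(M, lam=1.0), rng.standard_normal(4), iters=50)
res = np.linalg.norm(np.diff(xs, axis=0), axis=1)  # ||x_{i+1} - x_i||
print(res[:3], res[-1])
```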
Douglas-Rachford Splitting Method
Consider the problem
$$0 \in Tx = (T_1 + T_2)x,$$
where $J_{T_1}$ and $J_{T_2}$ are more efficient to compute than $J_T$.
Douglas-Rachford Splitting (DRS) Method [Lions-Mercier, 1979]
Initialize $x_0 \in H$, $\rho > 0$.
for i = 0, 1, ...
$$x_{i+1} = \left(J_{\rho T_1} \circ (2J_{\rho T_2} - I) + (I - J_{\rho T_2})\right)x_i$$
$\{J_{\rho T_2}(x_i)\}$ converges to some zero of $T_1 + T_2$.
ADMM ∈ DRS ∈ Proximal Point [Eckstein-Bertsekas, 1992]
$$J_{T_{\mathrm{DRS}}} = J_{\rho T_1} \circ (2J_{\rho T_2} - I) + (I - J_{\rho T_2}),$$
where $T_{\mathrm{DRS}}$ is maximally monotone.
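To make the splitting concrete, consider minimizing ½||x − b||² + γ||x||₁, i.e. T₁ = ∂(γ||·||₁) and T₂(x) = x − b, whose resolvents are soft-thresholding and an averaging step. A minimal sketch, assuming NumPy; the test problem, ρ = 1, and the names `soft` and `drs` are illustrative choices, not from the talk. For this separable problem the minimizer is the soft-thresholded b, which gives an independent check on the shadow sequence J_{ρT₂}(x_i).

```python
import numpy as np

def soft(v, t):
    """Soft-thresholding: the resolvent of t * d||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def drs(b, gamma, rho, x0, iters):
    """DRS for 0 in T1 x + T2 x with T1 = d(gamma*||.||_1) and T2 x = x - b."""
    J1 = lambda v: soft(v, rho * gamma)       # J_{rho T1}
    J2 = lambda v: (v + rho * b) / (1 + rho)  # J_{rho T2}, from u + rho*(u - b) = v
    x = x0
    for _ in range(iters):
        j2 = J2(x)
        x = J1(2 * j2 - x) + (x - j2)  # x_{i+1} = (J1 o (2 J2 - I) + (I - J2)) x_i
    return J2(x)  # the shadow sequence J_{rho T2}(x_i) approaches a zero of T1 + T2

b = np.array([5.0, 0.3, -2.0])
sol = drs(b, gamma=1.0, rho=1.0, x0=np.zeros(3), iters=500)
print(sol, soft(b, 1.0))  # DRS estimate vs closed-form minimizer of 1/2||x-b||^2 + ||x||_1
```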
Goal
Accelerate the proximal point method for maximally monotone operators in terms of the fixed-point residual via the performance estimation problem (PEP) approach, and also accelerate DRS and ADMM.
[Ryu-Taylor-Bergeling-Giselsson, 2018] uses PEP to find the optimal parameter for DRS under additional assumptions.
3. Accelerated Proximal Point Method
Proximal Point Method for Maximally Monotone Operators
Proximal Point Method [Rockafellar, 1976]
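Initialize $x_0 \in H$, $\lambda > 0$.
for i = 0, 1, ...
$$x_{i+1} = J_{\lambda T}(x_i)$$
Theorem 3.1 [Brezis-Lions, 1978]
The proximal point method satisfies
$$\underbrace{\|J_{\lambda T}(x_{i-1}) - x_{i-1}\|^2}_{\text{fixed-point residual}} = \|x_i - x_{i-1}\|^2 \le \frac{\|x_0 - x_*\|^2}{i},$$
where $x_*$ is any zero of $T$.
Recently improved by the factor $\left(1 - \frac{1}{i}\right)^{i-1}$ via PEP [Gu-Yang, 2019].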
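Both bounds are easy to evaluate numerically. The sketch below assumes NumPy; the random operator, λ = 1, and N = 100 are arbitrary illustrative choices. It runs the proximal point method on a linear monotone T whose zero is x* = 0 and compares the squared residuals with ||x₀ − x*||²/i and with the sharpened bound (1 − 1/i)^{i−1}||x₀ − x*||²/i.

```python
import numpy as np

rng = np.random.default_rng(3)
G = rng.standard_normal((4, 4))
M = (G - G.T) + 0.01 * np.eye(4)   # monotone (indeed strongly), unique zero x* = 0
J = np.linalg.inv(np.eye(4) + M)   # J_{lam T} with lam = 1, formed once for this tiny example
x = rng.standard_normal(4)
r0 = np.dot(x, x)                  # ||x_0 - x*||^2 with x* = 0

N = 100
res2 = []                          # squared fixed-point residuals ||x_i - x_{i-1}||^2
for _ in range(N):
    x_next = J @ x
    res2.append(np.sum((x_next - x) ** 2))
    x = x_next
res2 = np.array(res2)

i = np.arange(1, N + 1)
brezis_lions = r0 / i                          # Theorem 3.1
gu_yang = (1 - 1 / i) ** (i - 1) * r0 / i      # bound sharpened by the (1 - 1/i)^(i-1) factor
print(np.all(res2 <= gu_yang + 1e-12), np.all(gu_yang <= brezis_lions))
```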
Guler’s Accelerated Proximal Point Method
For convex minimization (with T = ∂f),
an accelerated version was developed in [Guler, 1992].
Guler’s Accelerated Proximal Point Method [Guler, 1992]
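Initialize $x_0 = y_0 \in H$, $\lambda > 0$, $t_0 = 1$.
for i = 0, 1, ...
$$x_{i+1} = J_{\lambda \partial f}(y_i) = \mathrm{prox}_{\lambda f}(y_i)$$
$$t_{i+1} = \frac{1}{2}\left(1 + \sqrt{1 + 4t_i^2}\right)$$
$$y_{i+1} = x_{i+1} + \frac{t_i - 1}{t_{i+1}}(x_{i+1} - x_i) \quad \text{(momentum update)}$$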
This diverges for some monotone operators T ...
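The sketch below covers only the setting T = ∂f in which Güler's method is analyzed; it makes no claim about the general monotone case. It assumes NumPy; the quadratic f(x) = ½xᵀQx, λ = 1, and the helper name `guler_app` are illustrative assumptions. It also records the t-sequence, which satisfies t_{i+1}² − t_{i+1} = t_i² by construction.

```python
import numpy as np

def guler_app(prox, x0, lam, iters):
    """Hypothetical helper: Guler's accelerated proximal point method for T = df, t_0 = 1."""
    x, y, t = x0, x0, 1.0
    ts = [t]
    for _ in range(iters):
        x_next = prox(y, lam)                          # x_{i+1} = prox_{lam f}(y_i)
        t_next = 0.5 * (1 + np.sqrt(1 + 4 * t * t))    # t_{i+1}
        y = x_next + (t - 1) / t_next * (x_next - x)   # momentum update y_{i+1}
        x, t = x_next, t_next
        ts.append(t)
    return x, np.array(ts)

# Illustrative convex quadratic f(x) = 1/2 x^T Q x; its prox solves (I + lam Q) x = y
Q = np.diag([1.0, 0.01])
f = lambda v: 0.5 * v @ Q @ v
prox_quad = lambda y, lam: np.linalg.solve(np.eye(2) + lam * Q, y)

x0 = np.array([1.0, 1.0])
xN, ts = guler_app(prox_quad, x0, lam=1.0, iters=200)
print(f(x0), f(xN))
```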
Performance Estimation Problem (PEP)
General Proximal Point Method
Initialize $x_0 = y_0 \in H$, $\lambda > 0$.
for i = 0, 1, ..., N − 1
$$x_{i+1} = J_{\lambda T}(y_i), \qquad y_{i+1} = y_i + \sum_{k=0}^{i} h_{i+1,k+1}(x_{k+1} - y_k)$$
Using PEP [Drori-Teboulle, 2014, Taylor-Hendrickx-Glineur, 2017], its worst-case rate can be found by solving
$$\max_{T,\; x_1,\dots,x_N \in H,\; y_0,\dots,y_{N-1} \in H} \; \frac{1}{R^2}\|x_N - y_{N-1}\|^2$$
subject to $T : H \rightrightarrows H$ is maximally monotone,
$\{x_i\}, \{y_i\}$ generated by the general proximal point method,
$0 \in Tx_*$, $\|y_0 - x_*\| \le R$.
[Gu-Yang, 2019], [Ryu-Taylor-Bergeling-Giselsson, 2018]
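In PEP the constraint "T is maximally monotone" is replaced by finitely many inequalities on the points the method touches: each step x_{i+1} = J_{λT}(y_i) certifies that u_{i+1} = (y_i − x_{i+1})/λ ∈ Tx_{i+1}, and since every monotone operator has a maximally monotone extension, a finite set of pairs (x_a, u_a) is consistent with some maximally monotone T exactly when 〈x_a − x_b, u_a − u_b〉 ≥ 0 for all a, b. The sketch below (NumPy; the random operator and N = 5 are illustrative) only assembles and checks these constraints on one run; it does not solve the PEP, which requires a semidefinite programming solver.

```python
import numpy as np

rng = np.random.default_rng(4)
G = rng.standard_normal((3, 3))
M = (G - G.T) + 0.5 * np.eye(3)      # monotone linear operator; its unique zero is x* = 0
lam, N = 1.0, 5
J = np.linalg.inv(np.eye(3) + lam * M)

# Plain PPM as an instance of the general scheme: h_{i+1,i+1} = 1, other h = 0, so y_{i+1} = x_{i+1}
y = rng.standard_normal(3)
pairs = [(np.zeros(3), np.zeros(3))]  # (x*, 0) because 0 is in T x*
for _ in range(N):
    x = J @ y
    u = (y - x) / lam                 # the resolvent step certifies u in T x
    pairs.append((x, u))
    y = x

# Finitely many PEP constraints: <x_a - x_b, u_a - u_b> >= 0 for every pair of points
gaps = [np.dot(xa - xb, ua - ub) for (xa, ua) in pairs for (xb, ub) in pairs]
print(len(pairs), min(gaps))
```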
Performance Estimation Problem (PEP) (cont’d)
Similar to [Drori-Teboulle, 2014], find an accelerated method by solving the minimax problem:
$$\min_{\{h_{i+1,k+1}\}} \; \max_{T,\; x_1,\dots,x_N \in H,\; y_0,\dots,y_{N-1} \in H} \; \frac{1}{R^2}\|x_N - y_{N-1}\|^2$$
subject to $T : H \rightrightarrows H$ is maximally monotone,
$\{x_i\}, \{y_i\}$ generated by the general proximal point method,
$0 \in Tx_*$, $\|y_0 - x_*\| \le R$.
Proposed Accelerated Proximal Point Method
Proposed Accelerated Proximal Point Method [Kim, 2019]
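Initialize $x_0 = y_0 = y_{-1} \in H$, $\lambda > 0$.
for i = 0, 1, ...
$$x_{i+1} = J_{\lambda T}(y_i)$$
$$y_{i+1} = x_{i+1} + \frac{i}{i+2}(x_{i+1} - x_i) - \frac{i}{i+2}(x_i - y_{i-1})$$
Theorem 3.2 [Kim, 2019]
The proposed accelerated proximal point method satisfies
$$\|x_i - y_{i-1}\|^2 \le \frac{\|x_0 - x_*\|^2}{i^2}.$$
When the last term $-\frac{i}{i+2}(x_i - y_{i-1})$ is discarded, the update is similar to the momentum steps of [Nesterov, 1983, Guler, 1992].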
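The update above needs only x_i, y_{i−1}, and y_i at each step. A minimal sketch, assuming NumPy; `accelerated_ppm` is a hypothetical helper name, and the random skew-symmetric operator (whose zero includes x* = 0), λ = 1, and N = 100 are illustrative choices. It checks the Theorem 3.2 bound at every iteration.

```python
import numpy as np

def accelerated_ppm(J, x0, iters):
    """Hypothetical helper: proposed accelerated PPM; returns ||x_i - y_{i-1}||^2 for i = 1..iters."""
    x_prev, y_prev, y = x0, x0, x0               # x_0 = y_0 = y_{-1}
    res2 = []
    for i in range(iters):
        x = J(y)                                  # x_{i+1} = J_{lam T}(y_i)
        res2.append(np.sum((x - y) ** 2))         # ||x_{i+1} - y_i||^2
        y_next = x + i / (i + 2) * (x - x_prev) - i / (i + 2) * (x_prev - y_prev)
        x_prev, y_prev, y = x, y, y_next          # shift (x_i, y_{i-1}, y_i) forward by one
    return np.array(res2)

rng = np.random.default_rng(5)
G = rng.standard_normal((6, 6))
A = np.eye(6) + (G - G.T)                         # I + lam*M with lam = 1, M skew-symmetric
J = lambda v: np.linalg.solve(A, v)               # resolvent J_{lam T}
x0 = rng.standard_normal(6)
N = 100
res2 = accelerated_ppm(J, x0, N)
bound = np.dot(x0, x0) / np.arange(1, N + 1) ** 2     # Theorem 3.2 with x* = 0
print(np.all(res2 <= bound + 1e-12))
```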
Strongly Monotone Operators
A set-valued operator T : H⇒ H is µ-strongly monotone if
〈x− y, Tx− Ty〉 ≥ µ||x− y||2 for all x,y ∈ H.
Proximal Point Method [Rockafellar, 1976]
Initialize $x_0 \in H$, $\lambda > 0$.
for i = 0, 1, ...
$$x_{i+1} = J_{\lambda T}(x_i)$$
Theorem 3.3 [Rockafellar, 1976]
The proximal point method has a linear rate
$$\|x_i - x_{i-1}\|^2 \le \left(\frac{1}{1 + \lambda\mu}\right)^{2i} \lambda^2\|Tx_0\|^2.$$
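The linear rate can be checked on a strongly monotone linear operator. The sketch below (NumPy) uses T = (1/√99)[[0, 1], [−1, 0]] + μI with μ = 0.02, the operator from Numerical Experiment 2; λ = 1, x₀ = (1, 0), and N = 200 are assumptions made for illustration.

```python
import numpy as np

mu, lam, N = 0.02, 1.0, 200
S = np.array([[0.0, 1.0], [-1.0, 0.0]])
M = S / np.sqrt(99) + mu * np.eye(2)       # mu-strongly monotone linear operator
J = np.linalg.inv(np.eye(2) + lam * M)     # resolvent J_{lam T}

x = np.array([1.0, 0.0])
Tx0_sq = np.sum((M @ x) ** 2)              # ||T x_0||^2
res2 = []
for _ in range(N):
    x_next = J @ x
    res2.append(np.sum((x_next - x) ** 2))  # ||x_i - x_{i-1}||^2
    x = x_next

i = np.arange(1, N + 1)
bound = (1 / (1 + lam * mu)) ** (2 * i) * lam**2 * Tx0_sq   # Theorem 3.3
print(np.all(np.array(res2) <= bound * (1 + 1e-10)))
```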
Restarting for Strongly Monotone Operators
Restart the proposed method every k iterations (e.g., [Nesterov, 2013]).
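$$x_{0,0} \xrightarrow{k \text{ iter}} (x_{0,k} = x_{1,0}) \xrightarrow{k \text{ iter}} \cdots \xrightarrow{k \text{ iter}} (x_{j-1,k} = x_{j,0}) \xrightarrow{k \text{ iter}} \cdots$$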
Theorem 3.4 [Kim, 2019]
The proposed accelerated proximal point method with restarting every k iterations has a linear rate
$$\|x_{j,k} - y_{j,k-1}\|^2 \le \frac{1}{\lambda^2\mu^2k^2}\|x_{j-1,k} - y_{j-1,k-1}\|^2$$
for a maximally and µ-strongly monotone operator T (a contraction whenever $\lambda\mu k > 1$).
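The per-epoch factor can be traced back to two facts: Theorem 3.2 bounds each epoch's residual by ||x_{j,0} − x*||²/k², and strong monotonicity gives ||x_{j,0} − x*|| ≤ ||x_{j−1,k} − y_{j−1,k−1}||/(λμ), because (y_{j−1,k−1} − x_{j−1,k})/λ ∈ Tx_{j,0}. Together they give 1/(λ²μ²k²). The sketch below assumes NumPy; `accelerated_epoch` is a hypothetical helper, and μ = 0.2, λ = 1, k = 20, five epochs, and x₀ = (1, 0) are illustrative values chosen so that λμk > 1.

```python
import numpy as np

def accelerated_epoch(J, x0, k):
    """Hypothetical helper: k steps of the proposed accelerated PPM from x_0 = y_0 = y_{-1} = x0.
    Returns x_k and the squared residual ||x_k - y_{k-1}||^2."""
    x_prev, y_prev, y = x0, x0, x0
    for i in range(k):
        x = J(y)                                   # x_{i+1} = J_{lam T}(y_i)
        last_res2 = np.sum((x - y) ** 2)           # ||x_{i+1} - y_i||^2
        y_next = x + i / (i + 2) * (x - x_prev) - i / (i + 2) * (x_prev - y_prev)
        x_prev, y_prev, y = x, y, y_next
    return x_prev, last_res2

mu, lam, k, epochs = 0.2, 1.0, 20, 5    # illustrative values with lam * mu * k = 4 > 1
S = np.array([[0.0, 1.0], [-1.0, 0.0]])
M = S / np.sqrt(99) + mu * np.eye(2)    # maximally and mu-strongly monotone, zero x* = 0
A = np.eye(2) + lam * M
J = lambda v: np.linalg.solve(A, v)

x = np.array([1.0, 0.0])
res2 = []
for _ in range(epochs):                 # restart: x_{j,0} = x_{j-1,k}
    x, r = accelerated_epoch(J, x, k)
    res2.append(r)

factor = 1 / (lam**2 * mu**2 * k**2)    # per-epoch contraction factor, here about 1/16
ratios = [res2[j] / res2[j - 1] for j in range(1, epochs)]
print(factor, ratios)
```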
Accelerated Douglas-Rachford Splitting Method
Accelerated DRS ∈ Accelerated Proximal Point
Accelerated Douglas-Rachford Splitting (DRS) Method [Kim, 2019]
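Initialize $x_0 = y_0 = y_{-1} \in H$, $\rho > 0$.
for i = 0, 1, ...
$$x_{i+1} = J_{T_{\mathrm{DRS}}}(y_i) = \left(J_{\rho T_1} \circ (2J_{\rho T_2} - I) + (I - J_{\rho T_2})\right)y_i$$
$$y_{i+1} = x_{i+1} + \frac{i}{i+2}(x_{i+1} - x_i) - \frac{i}{i+2}(x_i - y_{i-1})$$
Corollary 3.1 [Kim, 2019]
The proposed accelerated DRS satisfies
$$\|x_i - y_{i-1}\|^2 \le \frac{\|x_0 - x_*\|^2}{i^2},$$
where $x_*$ is a zero of $T_{\mathrm{DRS}}$.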
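The same momentum can wrap the DRS map directly. The sketch below assumes NumPy and reuses an illustrative problem, minimizing ½||x − b||² + γ||x||₁ with T₁ = ∂(γ||·||₁) and T₂(x) = x − b, ρ = 1; `accelerated_fixed_point` is a hypothetical helper. For this problem the zero of T_DRS is z* = (1 + ρ)x_sol − ρb, where x_sol is the soft-thresholded b, so the Corollary 3.1 bound can be checked exactly.

```python
import numpy as np

def soft(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

b, gamma, rho = np.array([5.0, 0.3, -2.0]), 1.0, 1.0
J1 = lambda v: soft(v, rho * gamma)            # J_{rho T1}, T1 = d(gamma ||.||_1)
J2 = lambda v: (v + rho * b) / (1 + rho)       # J_{rho T2}, T2 x = x - b
J_drs = lambda v: J1(2 * J2(v) - v) + v - J2(v)

def accelerated_fixed_point(J, x0, iters):
    """Hypothetical helper: proposed accelerated iteration applied to a resolvent J."""
    x_prev, y_prev, y = x0, x0, x0
    res2 = []
    for i in range(iters):
        x = J(y)
        res2.append(np.sum((x - y) ** 2))      # ||x_{i+1} - y_i||^2
        y_next = x + i / (i + 2) * (x - x_prev) - i / (i + 2) * (x_prev - y_prev)
        x_prev, y_prev, y = x, y, y_next
    return x_prev, np.array(res2)

x0, N = np.zeros(3), 100
xN, res2 = accelerated_fixed_point(J_drs, x0, N)
x_sol = soft(b, gamma)                          # minimizer of 1/2||x-b||^2 + gamma||x||_1
z_star = (1 + rho) * x_sol - rho * b            # fixed point of J_drs (J2(z_star) = x_sol)
bound = np.sum((x0 - z_star) ** 2) / np.arange(1, N + 1) ** 2   # Corollary 3.1
print(np.all(res2 <= bound + 1e-12), J2(xN))
```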
Accelerated ADMM
Accelerated ADMM = Accelerated DRS on the dual ∈ Accelerated DRS ∈ Accelerated Proximal Point
Accelerated ADMM [Kim, 2019]
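Initialize $x_0 \in H_1$, $z_0 \in H_2$, $\nu_0 \in G$, $\rho > 0$.
for i = 0, 1, ...
$$x_{i+1} = \arg\min_{x \in H_1}\left\{f(x) + \langle \nu_i, Ax + Bz_i - c\rangle + \frac{\rho}{2}\|Ax + Bz_i - c\|^2\right\}$$
$$\eta_i = \begin{cases} \nu_i, & i = 0, 1, \\ \nu_i + \frac{i-1}{i+1}\left(\nu_i - \nu_{i-1} + \rho A(x_{i+1} - x_i)\right) - \frac{i-1}{i+1}\left(\nu_{i-1} - \eta_{i-2} + \rho A(x_i - x_{i-1})\right), & i = 2, 3, \dots \end{cases}$$
$$z_{i+1} = \arg\min_{z \in H_2}\left\{g(z) + \langle \eta_i, Ax_{i+1} + Bz - c\rangle + \frac{\rho}{2}\|Ax_{i+1} + Bz - c\|^2\right\}$$
$$\nu_{i+1} = \eta_i + \rho(Ax_{i+1} + Bz_{i+1} - c)$$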
Accelerated ADMM (cont’d)
Corollary 3.2
ADMM satisfies
$$\|Ax_{i+1} + Bz_i - c\|^2 \le \frac{\|\nu_0 + \rho A(x_0 - c) - \nu_*\|^2}{\rho^2 i}.$$
Corollary 3.3 [Kim, 2019]
The proposed accelerated ADMM satisfies
$$\|Ax_{i+1} + Bz_i - c\|^2 \le \frac{\|\nu_0 + \rho A(x_0 - c) - \nu_*\|^2}{\rho^2 i^2}.$$
4. Numerical Experiments
Numerical Experiment 1
Consider a monotone operator [Gu-Yang, 2019]
$$T = \frac{1}{\sqrt{99}}\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix},$$
which is the worst case for a total of 100 iterations of the proximal point method.
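Experiment 1 can be reproduced in a few lines. The sketch below assumes NumPy; λ = 1 and x₀ = (1, 0) are assumptions, since the slide does not state them. With these choices the resolvent is a scaled rotation, and the proximal-point residual after N = 100 steps equals (1 − 1/N)^{N−1}||x₀ − x*||²/N, i.e. it attains the sharpened bound, which is what makes this operator a worst case; the accelerated residual stays below ||x₀ − x*||²/N².

```python
import numpy as np

N, lam = 100, 1.0
S = np.array([[0.0, 1.0], [-1.0, 0.0]])
M = S / np.sqrt(N - 1)                      # T = (1/sqrt(99)) [[0, 1], [-1, 0]]
A = np.eye(2) + lam * M
J = lambda v: np.linalg.solve(A, v)         # resolvent J_{lam T}
x0 = np.array([1.0, 0.0])                   # assumed starting point, ||x0 - x*|| = 1

# Proximal point method
x, ppm_res2 = x0, []
for _ in range(N):
    x_next = J(x)
    ppm_res2.append(np.sum((x_next - x) ** 2))
    x = x_next

# Proposed accelerated proximal point method
x_prev, y_prev, y, acc_res2 = x0, x0, x0, []
for i in range(N):
    x = J(y)
    acc_res2.append(np.sum((x - y) ** 2))
    y_next = x + i / (i + 2) * (x - x_prev) - i / (i + 2) * (x_prev - y_prev)
    x_prev, y_prev, y = x, y, y_next

print(ppm_res2[-1], (1 - 1 / N) ** (N - 1) / N, acc_res2[-1], 1 / N**2)
```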
[Figure: two panels. First: horizontal axis 0 to 100, vertical axis 0 to 0.015. Second: planar plot over [−1, 1] × [−1, 1]. Markers displayed every 5th iteration.]
Numerical Experiment 2
Consider a µ-strongly monotone operator with µ = 0.02:
$$T = \frac{1}{\sqrt{99}}\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} + \mu\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.$$
Restarted every 19 iterations.
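[Figure: two panels. First: horizontal axis 0 to 200, logarithmic vertical axis (ticks 10^-10, 10^-5). Second: planar plot over roughly [−0.5, 1] × [−0.5, 1]. Markers displayed every 5th iteration.]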
Numerical Experiment 3
Consider a total-variation-regularized problem
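$$\min_{x \in \mathbb{R}^{d_1},\, z \in \mathbb{R}^{d_2}} \; \frac{1}{2}\|Hx - b\|^2 + \gamma\|z\|_1 \quad \text{subject to} \quad Dx - z = 0,$$
where $H \in \mathbb{R}^{p \times d_1}$, $b \in \mathbb{R}^p$, and
$$D = \begin{bmatrix} 1 & -1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & -1 & 0 & \cdots & 0 \\ \vdots & & \ddots & \ddots & & \vdots \\ 0 & \cdots & 0 & 1 & -1 & 0 \\ 0 & \cdots & \cdots & 0 & 1 & -1 \end{bmatrix} \in \mathbb{R}^{d_2 \times d_1}.$$
$d_1 = 100$, $d_2 = 99$, $p = 5$, $\gamma = 3$.
$\rho = 0.05$ for ADMM.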
Numerical Experiment 3 (cont’d)
[Figure: horizontal axis 0 to 200, logarithmic vertical axis (ticks 10^0, 10^2).]
[Figure: horizontal axis 0 to 200, logarithmic vertical axis (ticks 10^-5, 10^0).]
5. Accelerated Forward Method
Cocoercive Operators
A single-valued operator T : H → H is β-cocoercive if
〈x− y, Tx− Ty〉 ≥ β||Tx− Ty||2 for all x,y ∈ H.
General Forward Method
Initialize $x_0 = y_0 \in H$.
for i = 0, 1, ..., N − 1
$$x_{i+1} = (I - \beta T)y_i, \qquad y_{i+1} = y_i + \sum_{k=0}^{i} h_{i+1,k+1}(x_{k+1} - y_k)$$
Performance Estimation Problem (PEP)
Find an accelerated method by solving the minimax problem:
$$\min_{\{h_{i+1,k+1}\}} \; \max_{T,\; x_1,\dots,x_N \in H,\; y_0,\dots,y_{N-1} \in H} \; \frac{1}{R^2}\|x_N - y_{N-1}\|^2$$
subject to $T : H \to H$ is β-cocoercive,
$\{x_i\}, \{y_i\}$ generated by the general forward method,
$0 = Tx_*$, $\|y_0 - x_*\| \le R$.
Proposed Accelerated Forward Method
Proposed Accelerated Forward Method [Kim, 2019]
Initialize $x_0 = y_0 = y_{-1} \in H$.
for i = 0, 1, ...
$$x_{i+1} = (I - \beta T)y_i$$
$$y_{i+1} = x_{i+1} + \frac{i}{i+2}(x_{i+1} - x_i) - \frac{i}{i+2}(x_i - y_{i-1})$$
Theorem 5.1 [Kim, 2019]
The proposed accelerated forward method satisfies
$$\|x_i - y_{i-1}\|^2 \le \frac{\|x_0 - x_*\|^2}{i^2}.$$
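One way to see why the bound mirrors Theorem 3.2: if T is β-cocoercive, then βT is firmly nonexpansive, so I − βT is firmly nonexpansive as well and is therefore the resolvent of some maximally monotone operator with the same zeros as T. The sketch below (NumPy) runs the method with β = 1 on T = (1/100)[[1, √99], [−√99, 1]], the 1-cocoercive matrix of the experiment on this section's final slide (without the μ term); x₀ = (1, 0) and N = 100 are assumptions, and `accelerated_forward` is a hypothetical helper.

```python
import numpy as np

beta, N = 1.0, 100
M = np.array([[1.0, np.sqrt(99)], [-np.sqrt(99), 1.0]]) / 100   # T(x) = M x, zero x* = 0

def accelerated_forward(x0, iters):
    """Hypothetical helper: proposed accelerated forward method; returns ||x_i - y_{i-1}||^2."""
    x_prev, y_prev, y = x0, x0, x0                      # x_0 = y_0 = y_{-1}
    res2 = []
    for i in range(iters):
        x = y - beta * (M @ y)                          # x_{i+1} = (I - beta T) y_i
        res2.append(np.sum((x - y) ** 2))               # equals beta^2 ||T y_i||^2
        y_next = x + i / (i + 2) * (x - x_prev) - i / (i + 2) * (x_prev - y_prev)
        x_prev, y_prev, y = x, y, y_next
    return np.array(res2)

x0 = np.array([1.0, 0.0])
res2 = accelerated_forward(x0, N)
bound = np.dot(x0, x0) / np.arange(1, N + 1) ** 2       # Theorem 5.1 with x* = 0
print(np.all(res2 <= bound + 1e-12))

# 1-cocoercivity check: <v, M v> = ||v||^2 / 100 = ||M v||^2 for this matrix
v = np.random.default_rng(6).standard_normal(2)
print(np.isclose(v @ M @ v, np.sum((M @ v) ** 2)))
```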
Numerical Experiment
Consider a 1-cocoercive operator [Kim, 2019]
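$$T = \frac{1}{100}\begin{bmatrix} 1 & \sqrt{99} \\ -\sqrt{99} & 1 \end{bmatrix} \left(+\,\mu\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\right),$$
which is the worst case for a total of 100 iterations of the forward method.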
[Figure: two panels. First: horizontal axis 0 to 100, vertical axis 0 to 0.015. Second: horizontal axis 0 to 200, logarithmic vertical axis (ticks 10^-10, 10^-5).]
References
1. Beck and Teboulle, “A fast iterative shrinkage-thresholding algorithm for linear inverse problems,” SIAM J. Imaging Sciences, 2009.
2. Brezis and Lions, “Produits infinis de résolvantes,” Israel Journal of Mathematics, 1978.
3. Drori and Teboulle, “Performance of first-order methods for smooth convex minimization: a novel approach,” Mathematical Programming, 2014.
4. Eckstein and Bertsekas, “On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators,” Mathematical Programming, 1992.
5. Guler, “New proximal point algorithms for convex minimization,” SIAM J. Optimization, 1992.
6. Gu and Yang, “Optimal nonergodic sublinear convergence rate of proximal point algorithm for maximal monotone inclusion problems,” arXiv, 2019.
7. Kim, “Accelerated proximal point method and forward method for monotone inclusions,” arXiv, 2019.
8. Lions and Mercier, “Splitting algorithms for the sum of two nonlinear operators,” SIAM J. on Numerical Analysis, 1979.
9. Martinet, “Régularisation d’inéquations variationnelles par approximations successives,” Rev. Française Informat. Recherche Opérationnelle, 1970.
10. Nesterov, “A method for unconstrained convex minimization problem with the rate of convergence O(1/k^2),” Dokl. Akad. Nauk. USSR, 1983.
11. Nesterov, “Gradient methods for minimizing composite functions,” Mathematical Programming, 2013.
12. Rockafellar, “Monotone operators and the proximal point algorithm,” SIAM J. Control and Optimization, 1976.
13. Ryu, Taylor, Bergeling and Giselsson, “Operator splitting performance estimation: Tight contraction factors and optimal parameter selection,” arXiv, 2018.
14. Taylor, Hendrickx and Glineur, “Smooth strongly convex interpolation and exact worst-case performance of first-order methods,” Mathematical Programming, 2017.
Thank You! Questions?