
LECTURE 20

LECTURE OUTLINE

• Approximation methods

• Cutting plane methods

• Proximal minimization algorithm

• Proximal cutting plane algorithm

• Bundle methods

All figures are courtesy of Athena Scientific, and are used with permission.

APPROXIMATION APPROACHES

• Approximation methods replace the original problem with an approximate problem.

• The approximation may be iteratively refined, for convergence to an exact optimum.

• A partial list of methods:

− Cutting plane/outer approximation.

− Simplicial decomposition/inner approximation.

− Proximal methods (including Augmented Lagrangian methods for constrained minimization).

− Interior point methods.

• A partial list of combinations of methods:

− Combined inner-outer approximation.

− Bundle methods (proximal-cutting plane).

− Combined proximal-subgradient (incremental option).

SUBGRADIENTS-OUTER APPROXIMATION

• Consider minimization of a convex function f : ℜn ↦ ℜ over a closed convex set X.

• We assume that at each x ∈ X, a subgradient g of f can be computed.

• We have

f(z) ≥ f(x) + g′(z − x), ∀ z ∈ ℜn,

so each subgradient defines a plane (a linear function) that approximates f from below.

• The idea of the outer approximation/cutting plane approach is to build an ever more accurate approximation of f using such planes.
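
As a tiny numerical illustration of this lower-bound property (the function f(x) = x² and the point x = 1 below are illustrative choices, not from the lecture):

```python
# Check the subgradient inequality for f(x) = x^2 at x = 1, where g = 2:
# the plane f(1) + g*(z - 1) = 2z - 1 approximates f from below everywhere.
import numpy as np

z = np.linspace(-3.0, 3.0, 7)
assert np.all(z**2 >= 2*z - 1)   # f(z) >= f(x) + g*(z - x) for all sampled z
```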

CUTTING PLANE METHOD

• Start with any x0 ∈ X. For k ≥ 0, set

xk+1 ∈ arg min_{x∈X} Fk(x),

where

Fk(x) = max{ f(x0) + (x − x0)′g0, . . . , f(xk) + (x − xk)′gk }

and gi is a subgradient of f at xi.

• Note that Fk(x) ≤ f(x) for all x, and that Fk(xk+1) increases monotonically with k. These imply that all limit points of xk are optimal.

Proof: If xk → x̄, then Fk(xk) → f(x̄) [otherwise there would exist a hyperplane strictly separating epi(f) and (x̄, lim_{k→∞} Fk(xk))]. This implies that f(x̄) ≤ lim_{k→∞} Fk(xk) ≤ f(x) for all x ∈ X, so x̄ is optimal. Q.E.D.

CONVERGENCE AND TERMINATION

• We have, for all k,

Fk(xk+1) ≤ f∗ ≤ min_{i≤k} f(xi).

• Termination when min_{i≤k} f(xi) − Fk(xk+1) comes to within some small tolerance.

• For f polyhedral, we have finite termination with an exactly optimal solution.
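
A minimal sketch of the method with the above termination test, assuming a box constraint set X = [lo, hi]^n and solving the master problem as a linear program with scipy's linprog; the test function, its subgradient oracle, and all parameter values are illustrative assumptions:

```python
# Cutting plane method sketch over a box X = [lo, hi]^n.
# The master problem min_{x in X} Fk(x) is solved as an LP in (x, t).
import numpy as np
from scipy.optimize import linprog

def cutting_plane(f, subgrad, x0, lo, hi, tol=1e-6, max_iter=100):
    x = np.asarray(x0, dtype=float)
    n = len(x)
    cuts = []                       # stored cuts (f(x_i), g_i, x_i)
    best, best_x = np.inf, x.copy() # min_{i<=k} f(x_i), an upper bound on f*
    for _ in range(max_iter):
        fx, g = f(x), subgrad(x)
        cuts.append((fx, g, x.copy()))
        if fx < best:
            best, best_x = fx, x.copy()
        # LP in (x, t): minimize t subject to
        #   t >= f(x_i) + g_i'(x - x_i) for every stored cut, and lo <= x <= hi.
        c = np.r_[np.zeros(n), 1.0]
        A_ub = np.array([np.r_[gi, -1.0] for (_, gi, _) in cuts])
        b_ub = np.array([gi @ xi - fi for (fi, gi, xi) in cuts])
        res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                      bounds=[(lo, hi)] * n + [(None, None)], method="highs")
        x, Fk_val = res.x[:n], res.x[n]   # x_{k+1} and Fk(x_{k+1}), a lower bound on f*
        if best - Fk_val <= tol:          # termination test from the lecture
            return best_x
    return best_x

# Example: f(x) = max_i |x_i|, with a single active coordinate as the subgradient.
def f(x):
    return float(np.max(np.abs(x)))

def subgrad(x):
    g = np.zeros_like(x)
    j = int(np.argmax(np.abs(x)))
    g[j] = np.sign(x[j])
    return g

print(cutting_plane(f, subgrad, x0=[2.0, -1.5], lo=-5.0, hi=5.0))
```

Since this example f is polyhedral, the gap min_{i≤k} f(xi) − Fk(xk+1) closes after finitely many cuts, consistent with the finite termination property above.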

• Instability problem: The method can make large moves that deteriorate the value of f.

• Starting from the exact minimum it typically moves away from that minimum.

VARIANTS

• Variant I: Simultaneously with f , construct polyhedral approximations to X.

• Variant II: Central cutting plane methods

• Variant III: Proximal methods - to be discussed next.

PROXIMAL/BUNDLE METHODS

• Aim to reduce the instability problem at the expense of solving a more difficult subproblem.

• A general form:

xk+1 ∈ arg min_{x∈X} { Fk(x) + pk(x) }

Fk(x) = max{ f(x0) + (x − x0)′g0, . . . , f(xk) + (x − xk)′gk }

pk(x) = (1/(2ck)) ‖x − yk‖²

where ck is a positive scalar parameter.

• We refer to pk(x) as the proximal term, and to its center yk as the proximal center.

PROXIMAL MINIMIZATION ALGORITHM

• Starting point for analysis: A general algorithm for convex function minimization

xk+1 ∈ arg min_{x∈ℜn} { f(x) + (1/(2ck)) ‖x − xk‖² }

− f : ℜn ↦ (−∞, ∞] is closed proper convex

− ck is a positive scalar parameter

− x0 is an arbitrary starting point

• Convergence mechanism:

γk = f(xk+1) + (1/(2ck)) ‖xk+1 − xk‖² < f(xk).

Cost improves by at least (1/(2ck)) ‖xk+1 − xk‖², and this is sufficient to guarantee convergence.
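
A minimal sketch of the algorithm for one case where the subproblem has a closed-form solution, f(x) = ‖x‖1, whose proximal step is componentwise soft thresholding; the constant parameter ck = c and the starting point are illustrative assumptions:

```python
# Proximal minimization sketch for f(x) = ||x||_1, whose proximal subproblem
#   argmin_x { ||x||_1 + (1/(2c)) ||x - x_k||^2 }
# is solved in closed form by componentwise soft thresholding.
import numpy as np

def prox_l1(y, c):
    return np.sign(y) * np.maximum(np.abs(y) - c, 0.0)

def proximal_minimization(x0, c=0.5, num_iters=20):
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        x_next = prox_l1(x, c)
        # the cost decreases by at least (1/(2c)) ||x_{k+1} - x_k||^2 at each step
        x = x_next
    return x

print(proximal_minimization([3.0, -1.2, 0.4]))   # reaches the minimizer x* = 0
```

Since this f is polyhedral, the iterates reach the minimizer exactly after finitely many steps (see the finite convergence discussion below).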

RATE OF CONVERGENCE I

• Role of penalty parameter ck.

• Role of growth properties of f near optimal solution set.

RATE OF CONVERGENCE II

• Assume that for some scalars β > 0, δ > 0, and α ≥ 1,

f∗ + β (d(x))^α ≤ f(x), ∀ x ∈ ℜn with d(x) ≤ δ,

where

d(x) = min_{x∗∈X∗} ‖x − x∗‖,

i.e., growth of order α from optimal solution set X∗.

• If α = 2 and lim_{k→∞} ck = c̄, then

lim sup_{k→∞} d(xk+1)/d(xk) ≤ 1/(1 + βc̄),

linear convergence.

• If 1 < α < 2, then

lim sup_{k→∞} d(xk+1)/(d(xk))^{1/(α−1)} < ∞,

superlinear convergence.
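
A quick numerical check of the α = 2 case, assuming the one-dimensional example f(x) = βx² (so f∗ = 0, X∗ = {0}, d(x) = |x|) and a constant parameter ck = c; the proximal step then has the closed form xk+1 = xk/(1 + 2βc), and the contraction ratio stays within the bound 1/(1 + βc̄):

```python
# Linear convergence check for f(x) = beta * x**2 with constant c_k = c:
# the proximal step is x_{k+1} = argmin_z { beta*z^2 + (z - x_k)^2 / (2c) }
#                              = x_k / (1 + 2*beta*c).
beta, c, x = 2.0, 0.5, 1.0
bound = 1.0 / (1.0 + beta * c)          # the lecture's asymptotic bound, = 0.5 here
for k in range(5):
    x_next = x / (1.0 + 2.0 * beta * c)
    print(k, abs(x_next) / abs(x), "<=", bound)   # ratio is 1/3 at every step
    x = x_next
```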

FINITE CONVERGENCE

• Assume growth of order α = 1:

f∗ + β d(x) ≤ f(x), ∀ x ∈ ℜn,

e.g., f is polyhedral.

• Method converges finitely (in a single step for c0 sufficiently large).
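
A one-line check of the single-step claim, assuming f(x) = |x| with illustrative values x0 = 3 and c0 = 4 ≥ |x0|:

```python
# For f(x) = |x| with c0 >= |x0|, one proximal (soft-thresholding) step
# lands exactly on the minimizer x* = 0.
import numpy as np
x0, c0 = 3.0, 4.0
x1 = np.sign(x0) * max(abs(x0) - c0, 0.0)
print(x1)   # 0.0
```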

PROXIMAL CUTTING PLANE METHODS

• Same as proximal minimization algorithm, but f is replaced by a cutting plane approximation Fk:

xk+1 ∈ arg min_{x∈X} { Fk(x) + (1/(2ck)) ‖x − xk‖² }

where

Fk(x) = max{ f(x0) + (x − x0)′g0, . . . , f(xk) + (x − xk)′gk }
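
A minimal sketch of one such iteration, taking X = ℜn for simplicity and solving the quadratic master problem in epigraph form with scipy's SLSQP; the cut-list format and the value of ck are illustrative assumptions:

```python
# One proximal cutting plane step: minimize Fk(x) + (1/(2c_k)) ||x - x_k||^2,
# written as min over z = (x, t) of t + (1/(2c_k)) ||x - x_k||^2 subject to
# t >= f(x_i) + g_i'(x - x_i) for every stored cut (X = R^n here).
import numpy as np
from scipy.optimize import minimize

def prox_cutting_plane_step(cuts, x_k, c_k):
    """cuts: list of (f(x_i), g_i, x_i); returns the next iterate x_{k+1}."""
    x_k = np.asarray(x_k, dtype=float)
    n = len(x_k)

    def obj(z):                       # z = (x, t)
        return z[n] + np.sum((z[:n] - x_k) ** 2) / (2.0 * c_k)

    cons = [{"type": "ineq",
             "fun": lambda z, fi=fi, gi=gi, xi=xi: z[n] - fi - gi @ (z[:n] - xi)}
            for (fi, gi, xi) in cuts]
    t0 = max(fi + gi @ (x_k - xi) for (fi, gi, xi) in cuts)   # Fk(x_k), a feasible start
    res = minimize(obj, np.r_[x_k, t0], method="SLSQP", constraints=cons)
    return res.x[:n]

# Example: two cuts for f(x) = |x|, taken at x = 2 and x = -1 (g = +1 and -1).
cuts = [(2.0, np.array([1.0]), np.array([2.0])),
        (1.0, np.array([-1.0]), np.array([-1.0]))]
print(prox_cutting_plane_step(cuts, x_k=[2.0], c_k=1.0))   # approx. [1.0]
```

As ck grows, the proximal term fades and the step approaches the ordinary cutting plane iteration; this is the stability tradeoff noted in drawback (a) below.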

• Drawbacks:

(a) Hard stability tradeoff: For large enough ck and polyhedral X, xk+1 is the exact minimum of Fk over X in a single minimization, so it is identical to the ordinary cutting plane method. For small ck, convergence is slow.

(b) The number of subgradients used in Fk may become very large; the quadratic program may become very time-consuming.

• These drawbacks motivate algorithmic variants, called bundle methods.

BUNDLE METHODS

• Allow a proximal center yk ≠ xk:

xk+1 ∈ arg min_{x∈X} { Fk(x) + pk(x) }

Fk(x) = max{ f(x0) + (x − x0)′g0, . . . , f(xk) + (x − xk)′gk }

pk(x) = (1/(2ck)) ‖x − yk‖²

• Null/Serious test for changing yk: For some fixed β ∈ (0, 1),

yk+1 = xk+1 if f(yk) − f(xk+1) ≥ βδk,

yk+1 = yk if f(yk) − f(xk+1) < βδk,

where

δk = f(yk) − (Fk(xk+1) + pk(xk+1)) > 0.
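
A minimal sketch of the resulting bundle iteration, again taking X = ℜn and solving the quadratic subproblem with scipy's SLSQP; the test function, the subgradient oracle, the stopping rule based on δk, and all parameter values are illustrative assumptions:

```python
# Bundle method sketch: proximal cutting plane steps around a center y_k,
# with the null/serious test deciding whether to move the center.
import numpy as np
from scipy.optimize import minimize

def bundle_method(f, subgrad, x0, c=1.0, beta=0.5, tol=1e-6, max_iter=50):
    y = np.asarray(x0, dtype=float)                 # proximal center y_k
    n = len(y)
    cuts = [(f(y), subgrad(y), y.copy())]
    for _ in range(max_iter):
        # solve min_x Fk(x) + (1/(2c)) ||x - y||^2 in epigraph form z = (x, t)
        center = y.copy()

        def obj(z, center=center):
            return z[n] + np.sum((z[:n] - center) ** 2) / (2.0 * c)

        cons = [{"type": "ineq",
                 "fun": lambda z, fi=fi, gi=gi, xi=xi: z[n] - fi - gi @ (z[:n] - xi)}
                for (fi, gi, xi) in cuts]
        res = minimize(obj, np.r_[y, f(y)], method="SLSQP", constraints=cons)
        x_new = res.x[:n]
        delta = f(y) - obj(res.x)       # delta_k = f(y_k) - (Fk(x_{k+1}) + p_k(x_{k+1}))
        if delta <= tol:                # simple stopping rule (an assumption): model matches f
            return y
        if f(y) - f(x_new) >= beta * delta:
            y = x_new                   # serious step: move the proximal center
        # null step: y is kept, but the new cut below still refines Fk
        cuts.append((f(x_new), subgrad(x_new), x_new.copy()))
    return y

# Example: the same f(x) = max_i |x_i| used for the cutting plane sketch.
def f(x):
    return float(np.max(np.abs(x)))

def subgrad(x):
    g = np.zeros_like(x)
    j = int(np.argmax(np.abs(x)))
    g[j] = np.sign(x[j])
    return g

print(bundle_method(f, subgrad, x0=[2.0, -1.5]))
```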

MIT OpenCourseWare http://ocw.mit.edu

6.253 Convex Analysis and Optimization Spring 2010

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.