Unconstrained minimization · 2017-11-14
Unconstrained minimization
• terminology and assumptions
• gradient descent method
• steepest descent method
• Newton's method
• self-concordant functions
• implementation
IOE 611: Nonlinear Programming, Fall 2017 7. Unconstrained minimization Page 7–1
Unconstrained minimization: assumptions

    minimize f(x)

• f convex, twice continuously differentiable (hence dom f open)
• we assume optimal value p⋆ = inf_x f(x) is attained (and finite)

Unconstrained minimization methods
• produce sequence of points x^(k) ∈ dom f, k = 0, 1, . . . with

      f(x^(k)) → p⋆  (commonly, f(x^(k)) ↓ p⋆)

• in practice, terminated in finite time, when f(x^(k)) − p⋆ ≤ ε, where ε > 0 is pre-specified
• can be interpreted as iterative methods for solving optimality condition

      ∇f(x⋆) = 0
Descent methods

    x^(k+1) = x^(k) + t^(k)∆x^(k)  with f(x^(k+1)) < f(x^(k))

• other notation options: x⁺ = x + t∆x, x := x + t∆x
• ∆x is the step, or search direction; t is the step size, or step length
• from convexity, f(x⁺) < f(x) implies ∇f(x)ᵀ∆x < 0
  (i.e., ∆x is a descent direction)

General descent method.

given a starting point x ∈ dom f.
repeat
    1. Determine a descent direction ∆x.
    2. Line search. Choose a step size t > 0.
    3. Update. x := x + t∆x.
until stopping criterion is satisfied.
Further assumptions: initial point and sublevel set

Algorithms in this chapter require a starting point x^(0) such that
• x^(0) ∈ dom f
• sublevel set S = {x | f(x) ≤ f(x^(0))} is closed

2nd condition is hard to verify, except when all sublevel sets are closed:
• equivalent to condition that epi f is closed
• true if dom f = Rⁿ
• true if f(x) → ∞ as x → bd dom f

examples of differentiable functions with closed sublevel sets:

    f(x) = log( Σ_{i=1}^m exp(aᵢᵀx + bᵢ) ),    f(x) = −Σ_{i=1}^m log(bᵢ − aᵢᵀx)
Strong convexity and implications

f is strongly convex on S if there exists an m > 0 such that

    ∇²f(x) ⪰ mI  for all x ∈ S

Implications
1. for x, y ∈ S,  f(y) ≥ f(x) + ∇f(x)ᵀ(y − x) + (m/2)‖x − y‖₂²
2. hence, S is bounded: take any y ∈ S and x = x⋆ ∈ S in the above
3. p⋆ > −∞, and for x ∈ S,

       f(x) − p⋆ ≤ (1/(2m))‖∇f(x)‖₂²

   useful as stopping criterion (if you know m)
4. also, ∀x ∈ S,

       ‖x − x⋆‖₂ ≤ (2/m)‖∇f(x)‖₂,

   and hence the optimal solution is unique.
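The suboptimality bound in implication 3 can be checked numerically. A minimal sketch: take a strongly convex quadratic f(x) = (1/2)xᵀAx, whose Hessian A has smallest eigenvalue m and whose optimal value is p⋆ = 0 at x⋆ = 0 (the matrix A and the test point here are illustrative, not from the slides):

```python
import numpy as np

# f(x) = 0.5 x^T A x is strongly convex with m = min eigenvalue of A,
# and p* = 0 at x* = 0; check f(x) - p* <= ||grad f(x)||_2^2 / (2m).
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))
eigs = np.array([0.5, 1.0, 2.0, 4.0, 8.0])     # so m = 0.5
A = Q @ np.diag(eigs) @ Q.T
m = eigs.min()

def f(x):    return 0.5 * x @ A @ x
def grad(x): return A @ x

x = rng.standard_normal(5)
gap   = f(x) - 0.0                              # f(x) - p*
bound = grad(x) @ grad(x) / (2 * m)
assert gap <= bound                             # the stopping-criterion bound holds
```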
Descent methods (reprise)

    x^(k+1) = x^(k) + t^(k)∆x^(k)  with f(x^(k+1)) < f(x^(k))

• other notation options: x⁺ = x + t∆x, x := x + t∆x
• ∆x is the step, or search direction; t is the step size, or step length
• from convexity, f(x⁺) < f(x) implies ∇f(x)ᵀ∆x < 0
  (i.e., ∆x is a descent direction)

General descent method.

given a starting point x ∈ dom f.
repeat
    1. Determine a descent direction ∆x.
    2. Line search. Choose a step size t > 0.
    3. Update. x := x + t∆x.
until stopping criterion is satisfied.
Line search types

Exact line search: t = argmin_{t>0} f(x + t∆x)

Backtracking line search (with parameters α ∈ (0, 1/2), β ∈ (0, 1))
• starting at t = 1, repeat t := βt until

      f(x + t∆x) < f(x) + αt∇f(x)ᵀ∆x

• graphical interpretation: backtrack until t ≤ t₀
  [figure: f(x + t∆x) vs. t, with the lines f(x) + αt∇f(x)ᵀ∆x and f(x) + t∇f(x)ᵀ∆x; t₀ marks where f(x + t∆x) crosses the upper line]
• after backtracking, step size t will satisfy t = 1 or t ∈ (βt₀, t₀]
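The backtracking loop above translates directly into code. A minimal sketch (the function and starting point in the usage example are illustrative):

```python
import numpy as np

# Backtracking line search: start at t = 1 and shrink by beta until the
# sufficient-decrease condition f(x + t dx) < f(x) + alpha t grad^T dx holds.
def backtracking(f, grad_fx, x, dx, alpha=0.3, beta=0.8):
    t = 1.0
    fx = f(x)
    slope = grad_fx @ dx          # directional derivative; < 0 for a descent direction
    while f(x + t * dx) >= fx + alpha * t * slope:
        t *= beta
    return t

# usage on f(x) = ||x||^2 with the negative-gradient direction
f = lambda x: x @ x
x = np.array([1.0, 2.0])
g = 2 * x
t = backtracking(f, g, x, -g)
assert 0 < t <= 1 and f(x + t * (-g)) < f(x)
```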
Gradient descent method

general descent method with ∆x = −∇f(x)

given a starting point x ∈ dom f.
repeat
    1. ∆x := −∇f(x).
    2. Line search. Choose step size t via exact or backtracking line search.
    3. Update. x := x + t∆x.
until stopping criterion is satisfied.

• stopping criterion usually of the form ‖∇f(x)‖₂ ≤ ε
• convergence result: for strongly convex f,

      f(x^(k)) − p⋆ ≤ cᵏ(f(x^(0)) − p⋆),

  where c ∈ (0, 1) depends on m, x^(0), line search type
• will terminate in at most

      log((f(x^(0)) − p⋆)/ε) / log(1/c)

  iterations
• very simple, but often very slow; rarely used in practice
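The algorithm above can be sketched as a short routine combining the negative-gradient direction with backtracking. The test problem (an ill-conditioned quadratic) is illustrative, not from the slides:

```python
import numpy as np

# Gradient descent with backtracking: dx = -grad f(x), backtrack,
# update, stop when ||grad f(x)||_2 <= eps.
def gradient_descent(f, grad, x0, eps=1e-8, alpha=0.3, beta=0.8, max_iter=10_000):
    x = x0.astype(float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= eps:
            break
        dx = -g
        t = 1.0
        while f(x + t * dx) >= f(x) + alpha * t * (g @ dx):
            t *= beta
        x = x + t * dx
    return x

# usage: minimize f(x) = 0.5 (x1^2 + 10 x2^2); the minimizer is the origin
f = lambda x: 0.5 * (x[0]**2 + 10 * x[1]**2)
grad = lambda x: np.array([x[0], 10 * x[1]])
x_star = gradient_descent(f, grad, np.array([10.0, 1.0]))
assert np.linalg.norm(x_star) < 1e-6
```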
Quadratic problem in R²

    f(x) = (1/2)(x₁² + γx₂²)    (γ > 0)

with exact line search, starting at x^(0) = (γ, 1):

    x₁^(k) = γ((γ − 1)/(γ + 1))ᵏ,    x₂^(k) = (−(γ − 1)/(γ + 1))ᵏ

• very slow if γ ≫ 1 or γ ≪ 1
• example for γ = 10:
  [figure: iterates x^(0), x^(1), . . . zig-zagging across the elongated contour lines of f]
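The closed-form iterates above can be verified numerically: for a quadratic f(x) = (1/2)xᵀHx, the exact line-search step along −g is t = gᵀg / (gᵀHg), so we can run the recursion and compare with the formula. A minimal sketch:

```python
import numpy as np

# Exact-line-search gradient descent on f(x) = 0.5 (x1^2 + gamma x2^2),
# compared against x1^(k) = gamma r^k, x2^(k) = (-r)^k with r = (gamma-1)/(gamma+1).
gamma = 10.0
r = (gamma - 1) / (gamma + 1)
x = np.array([gamma, 1.0])
for k in range(5):
    g = np.array([x[0], gamma * x[1]])           # gradient of f
    t = (g @ g) / (g[0]**2 + gamma * g[1]**2)    # exact step: g^T g / (g^T H g)
    x = x - t * g
    predicted = np.array([gamma * r**(k + 1), (-r)**(k + 1)])
    assert np.allclose(x, predicted)
```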
Nonquadratic example

    f(x₁, x₂) = e^(x₁+3x₂−0.1) + e^(x₁−3x₂−0.1) + e^(−x₁−0.1)

[figures: iterates on the contour plot of f for backtracking line search (left) and exact line search (right)]
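A minimal sketch of gradient descent with backtracking on this example; the starting point is illustrative, and the optimal value p⋆ = 2√2·e^(−0.1) follows by setting x₂ = 0 and minimizing over x₁:

```python
import numpy as np

# Gradient descent with backtracking on
# f(x1,x2) = e^{x1+3x2-0.1} + e^{x1-3x2-0.1} + e^{-x1-0.1}.
def f(x):
    return (np.exp(x[0] + 3*x[1] - 0.1) + np.exp(x[0] - 3*x[1] - 0.1)
            + np.exp(-x[0] - 0.1))

def grad(x):
    a = np.exp(x[0] + 3*x[1] - 0.1)
    b = np.exp(x[0] - 3*x[1] - 0.1)
    c = np.exp(-x[0] - 0.1)
    return np.array([a + b - c, 3*a - 3*b])

x = np.array([-1.0, 1.0])
for _ in range(200):
    g = grad(x)
    t, dx = 1.0, -g
    while f(x + t*dx) >= f(x) + 0.1 * t * (g @ dx):
        t *= 0.7
    x = x + t*dx

p_star = 2 * np.sqrt(2) * np.exp(-0.1)   # optimal value (x2* = 0, x1* = -0.5 log 2)
assert f(x) - p_star < 1e-6
```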
A problem in R¹⁰⁰

    f(x) = cᵀx − Σ_{i=1}^500 log(bᵢ − aᵢᵀx)

[figure: f(x^(k)) − p⋆ vs. k for exact and backtracking line search, decreasing from ~10⁴ to ~10⁻⁴ over roughly 100–200 iterations]

'linear' convergence, i.e., a straight line on a semilog plot
Complexity analysis of gradient descent
Assumptions and observations

• Assumptions:
  • f ∈ C²(D)
  • S = {x : f(x) ≤ f(x^(0))} is closed
  • ∃m > 0 : ∇²f(x) ⪰ mI ∀x ∈ S
• Observations:
  • ∀x, y ∈ S, f(y) ≥ f(x) + ∇f(x)ᵀ(y − x) + (m/2)‖x − y‖₂²
  • S is bounded
  • p⋆ > −∞, and for x ∈ S, f(x) − p⋆ ≤ (1/(2m))‖∇f(x)‖₂²
  • ∃M ≥ m: ∇²f(x) ⪯ MI ∀x ∈ S (since S is bounded). This implies:
    • ∀x, y ∈ S, f(y) ≤ f(x) + ∇f(x)ᵀ(y − x) + (M/2)‖x − y‖₂²
    • minimizing each side over y gives: ∀x ∈ S,

          p⋆ ≤ f(x) − (1/(2M))‖∇f(x)‖₂²

• Geometric implication: if S_α is a sublevel set of f with α ≤ f(x^(0)), then

      cond(S_α) ≤ M/m,

  where cond(·) is the square of the ratio of maximum to minimum "widths" of a convex set
Complexity analysis of gradient descent
Algorithm analysis

Let x(t) = x − t∇f(x). Then:
• f(x(t)) ≤ f(x) − t‖∇f(x)‖₂² + (Mt²/2)‖∇f(x)‖₂² for any t
• If exact line search is used:
  • f(x⁺) ≤ min_t { f(x) − t‖∇f(x)‖₂² + (Mt²/2)‖∇f(x)‖₂² }
          = f(x) − (1/(2M))‖∇f(x)‖₂²
          ≤ f(x) − (1/(2M)) · 2m(f(x) − p⋆)
  • We conclude:

        f(x⁺) − p⋆ ≤ (f(x) − p⋆)(1 − m/M)

    Linear convergence with c = 1 − m/M
• If backtracking (BT) line search is used:
  • BT exit condition is guaranteed for t ∈ [0, 1/M] (for α < 1/2)
  • BT terminates either with t = 1 or t ≥ β/M
  • Using these bounds on t and analysis similar to the above, get linear convergence with

        c = 1 − min{2mα, 2βαm/M} < 1.
Steepest descent method

Normalized steepest descent direction (at x, for general norm ‖ · ‖):

    ∆x_nsd = argmin{∇f(x)ᵀv | ‖v‖ = 1}

interpretation: for small v, f(x + v) ≈ f(x) + ∇f(x)ᵀv; direction ∆x_nsd is the unit-norm step with most negative directional derivative

(Unnormalized) steepest descent direction

    ∆x_sd = ‖∇f(x)‖_* ∆x_nsd

satisfies ∇f(x)ᵀ∆x_sd = −‖∇f(x)‖_*²

Steepest descent method
• general descent method with ∆x = ∆x_sd
• convergence properties similar to gradient descent
Examples

• Euclidean norm: ∆x_sd = −∇f(x)
• quadratic norm ‖x‖_P = (xᵀPx)^(1/2) (P ∈ Sⁿ₊₊): ∆x_sd = −P⁻¹∇f(x)
• ℓ₁-norm: ∆x_sd = −(∂f(x)/∂xᵢ)eᵢ, where i is chosen so that |∂f(x)/∂xᵢ| = ‖∇f(x)‖_∞

unit balls and normalized steepest descent directions for a quadratic norm and the ℓ₁-norm:
[figure: unit balls of the two norms, each showing −∇f(x) and the resulting ∆x_nsd]
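The three steepest descent directions above can be computed in a few lines; every one of them satisfies ∇f(x)ᵀ∆x_sd < 0, so each is a descent direction. The gradient vector and matrix P below are illustrative:

```python
import numpy as np

# Steepest descent directions for a given gradient g, in three norms.
def sd_euclidean(g):
    return -g                               # Euclidean norm

def sd_quadratic(g, P):
    return -np.linalg.solve(P, g)           # quadratic norm ||x||_P: -P^{-1} g

def sd_l1(g):
    i = np.argmax(np.abs(g))                # coordinate with |g_i| = ||g||_inf
    dx = np.zeros_like(g)
    dx[i] = -g[i]
    return dx                               # l1-norm: a coordinate-descent step

g = np.array([3.0, -1.0])
P = np.array([[2.0, 0.0], [0.0, 8.0]])
for dx in (sd_euclidean(g), sd_quadratic(g, P), sd_l1(g)):
    assert g @ dx < 0                       # all are descent directions
```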
Choice of norm for steepest descent

[figures: steepest descent iterates for two different quadratic norms on the same problem]

• steepest descent with backtracking line search for two quadratic norms
• ellipses show {x | ‖x − x^(k)‖_P = 1}
• equivalent interpretation of steepest descent with quadratic norm ‖ · ‖_P: gradient descent after change of variables x̄ = P^(1/2)x

shows choice of P has strong effect on speed of convergence
Newton step

    ∆x_nt = −∇²f(x)⁻¹∇f(x)

interpretations
• x + ∆x_nt minimizes second-order approximation

      f̂(x + v) = f(x) + ∇f(x)ᵀv + (1/2)vᵀ∇²f(x)v

• x + ∆x_nt solves linearized optimality condition

      ∇f(x + v) ≈ ∇f̂(x + v) = ∇f(x) + ∇²f(x)v = 0

[figures: f and its quadratic model f̂ touching at (x, f(x)), with the model minimized at (x + ∆x_nt, f(x + ∆x_nt)); f′ and its linearization f̂′ crossing zero at x + ∆x_nt]
Newton step — steepest descent interpretation

• ∆x_nt is the steepest descent direction at x in the local Hessian norm

      ‖u‖_{∇²f(x)} = (uᵀ∇²f(x)u)^(1/2)

  [figure: dashed lines are contour lines of f; ellipse is {x + v | vᵀ∇²f(x)v = 1}; arrow shows −∇f(x); both x + ∆x_nt and x + ∆x_nsd are marked]
• Newton step is affine invariant, i.e., independent of affine change of coordinates
Newton decrement

    λ(x) = (∇f(x)ᵀ∇²f(x)⁻¹∇f(x))^(1/2)

a measure of the proximity of x to x⋆

Properties
• gives an estimate of f(x) − p⋆, using quadratic approximation f̂:

      f(x) − inf_y f̂(y) = f(x) − f̂(x + ∆x_nt) = (1/2)λ(x)²

• equal to the norm of the Newton step in the quadratic Hessian norm:

      λ(x) = (∆x_ntᵀ∇²f(x)∆x_nt)^(1/2)

• directional derivative in the Newton direction: ∇f(x)ᵀ∆x_nt = −λ(x)²
• affine invariant (unlike ‖∇f(x)‖₂)
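These identities are easy to confirm on a quadratic, where the model f̂ is exact and hence f(x) − p⋆ equals λ(x)²/2 exactly. A minimal sketch with a randomly generated positive definite Hessian (illustrative, not from the slides):

```python
import numpy as np

# Check the Newton-decrement identities on f(x) = 0.5 x^T H x (p* = 0 at x* = 0).
rng = np.random.default_rng(1)
B = rng.standard_normal((4, 4))
H = B @ B.T + 4 * np.eye(4)                  # positive definite Hessian
x = rng.standard_normal(4)

g = H @ x                                    # gradient at x
dx_nt = -np.linalg.solve(H, g)               # Newton step
lam2 = g @ np.linalg.solve(H, g)             # lambda(x)^2

f = lambda z: 0.5 * z @ H @ z
assert np.isclose(f(x), lam2 / 2)            # suboptimality estimate (exact here)
assert np.isclose(dx_nt @ H @ dx_nt, lam2)   # Hessian-norm identity
assert np.isclose(g @ dx_nt, -lam2)          # directional derivative
```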
Newton's method

given a starting point x ∈ dom f, tolerance ε > 0.
repeat
    1. Compute the Newton step and decrement.
       ∆x_nt := −∇²f(x)⁻¹∇f(x);  λ² := ∇f(x)ᵀ∇²f(x)⁻¹∇f(x).
    2. Stopping criterion. quit if λ²/2 ≤ ε.
    3. Line search. Choose step size t by backtracking line search.
    4. Update. x := x + t∆x_nt.

Affine invariant, i.e., independent of linear changes of coordinates: Newton iterates for f̃(y) = f(By) with starting point y^(0) = B⁻¹x^(0) are

    y^(k) = B⁻¹x^(k)
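A minimal sketch of the algorithm above, run on the earlier nonquadratic example in R²; the starting point is illustrative, and p⋆ = 2√2·e^(−0.1) comes from minimizing by hand:

```python
import numpy as np

# Damped Newton's method: Newton step, decrement-based stopping
# criterion lambda^2/2 <= eps, backtracking line search.
def newton(f, grad, hess, x, eps=1e-10, alpha=0.1, beta=0.7):
    iters = 0
    while True:
        g, H = grad(x), hess(x)
        dx = -np.linalg.solve(H, g)              # Newton step
        lam2 = -g @ dx                           # Newton decrement squared
        if lam2 / 2 <= eps:
            return x, iters
        t = 1.0                                  # backtracking line search
        while f(x + t * dx) >= f(x) + alpha * t * (g @ dx):
            t *= beta
        x = x + t * dx
        iters += 1

def f(x):
    return sum(np.exp(e) for e in (x[0]+3*x[1]-0.1, x[0]-3*x[1]-0.1, -x[0]-0.1))

def grad(x):
    a, b, c = np.exp(x[0]+3*x[1]-0.1), np.exp(x[0]-3*x[1]-0.1), np.exp(-x[0]-0.1)
    return np.array([a + b - c, 3*a - 3*b])

def hess(x):
    a, b, c = np.exp(x[0]+3*x[1]-0.1), np.exp(x[0]-3*x[1]-0.1), np.exp(-x[0]-0.1)
    return np.array([[a + b + c, 3*a - 3*b], [3*a - 3*b, 9*a + 9*b]])

x_opt, iters = newton(f, grad, hess, np.array([-1.0, 1.0]))
p_star = 2 * np.sqrt(2) * np.exp(-0.1)
assert abs(f(x_opt) - p_star) < 1e-8             # converges, in few iterations
```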
Classical convergence analysis

assumptions
• f strongly convex on S with constant m
• ∇²f is Lipschitz continuous on S, with constant L > 0:

      ‖∇²f(x) − ∇²f(y)‖₂ ≤ L‖x − y‖₂

  (L measures how well f can be approximated by a quadratic function)

outline: there exist constants η ∈ (0, m²/L), γ > 0 such that
• if ‖∇f(x^(k))‖₂ ≥ η, then f(x^(k+1)) − f(x^(k)) ≤ −γ
• if ‖∇f(x^(k))‖₂ < η, then t^(k) = 1, and

      (L/(2m²))‖∇f(x^(k+1))‖₂ ≤ ((L/(2m²))‖∇f(x^(k))‖₂)²
Two phases of Newton's method:

    η ≤ 3(1 − 2α)m²/L,    γ = αβη² m/M²

Damped Newton phase (‖∇f(x)‖₂ ≥ η)
• most iterations require backtracking steps
• function value decreases by at least γ
• if p⋆ > −∞, this phase ends after at most (f(x^(0)) − p⋆)/γ iterations

Quadratically convergent phase (‖∇f(x)‖₂ < η)
• all iterations use step size t = 1
• ‖∇f(x)‖₂ converges to zero quadratically: if ‖∇f(x^(k))‖₂ < η, then

      (L/(2m²))‖∇f(x^(l))‖₂ ≤ ((L/(2m²))‖∇f(x^(k))‖₂)^(2^(l−k)) ≤ (1/2)^(2^(l−k)),    l ≥ k
Conclusion:

number of iterations until f(x) − p⋆ ≤ ε is bounded above by

    (f(x^(0)) − p⋆)/γ + log₂ log₂(ε₀/ε)

• γ, ε₀ are constants that depend on m, L, x^(0)
• second term is small (of the order of 6) and almost constant for practical purposes
• in practice, constants m, L (hence γ, ε₀) are usually unknown
• provides qualitative insight into convergence properties (i.e., explains two algorithm phases)
Examples

example in R² (the nonquadratic example above)

[figures: iterates x^(0), x^(1) on the contour plot; f(x^(k)) − p⋆ vs. k, dropping from ~10⁵ to ~10⁻¹⁵ within 5 iterations]

• backtracking parameters α = 0.1, β = 0.7
• converges in only 5 steps
• quadratic local convergence
Example in R¹⁰⁰ (the problem in R¹⁰⁰ above)

[figures: f(x^(k)) − p⋆ vs. k for exact and backtracking line search, reaching ~10⁻¹⁵ within about 10 iterations; step size t^(k) vs. k for both line searches]

• backtracking parameters α = 0.01, β = 0.5
• backtracking line search almost as fast as exact l.s. (and much simpler)
• clearly shows two phases in algorithm
Example in R¹⁰⁰⁰⁰ (with sparse aᵢ)

    f(x) = −Σ_{i=1}^10000 log(1 − xᵢ²) − Σ_{i=1}^100000 log(bᵢ − aᵢᵀx)

[figure: f(x^(k)) − p⋆ vs. k, reaching ~10⁻⁵ within about 20 iterations]

• backtracking parameters α = 0.01, β = 0.5
• performance similar to small examples
Self-concordance

shortcomings of classical convergence analysis
• depends on unknown constants (m, L, . . . )
• bound is not affinely invariant, although Newton's method is

convergence analysis via self-concordance (Nesterov and Nemirovski)
• does not depend on any unknown constants
• gives affine-invariant bound
• applies to special class of convex functions ('self-concordant' functions)
• developed to analyze polynomial-time interior-point methods for convex optimization
Self-concordant functions

definition
• convex f : R → R is self-concordant if |f′′′(x)| ≤ 2f′′(x)^(3/2) for all x ∈ dom f
• convex f : Rⁿ → R is self-concordant if g(t) = f(x + tv) is self-concordant for all x ∈ dom f, v ∈ Rⁿ

examples on R
• linear and quadratic functions
• negative logarithm f(x) = −log x
• negative entropy plus negative logarithm: f(x) = x log x − log x

affine invariance: if f : R → R is s.c., then f̃(y) = f(ay + b) is s.c.:

    f̃′′′(y) = a³f′′′(ay + b),    f̃′′(y) = a²f′′(ay + b)

be careful: if f is s.c., αf may not be
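The defining inequality |f′′′(x)| ≤ 2f′′(x)^(3/2) can be checked numerically for the examples above. For f(x) = −log x it holds with equality (f′′ = 1/x², f′′′ = −2/x³); for f(x) = x log x − log x the derivatives are f′′ = 1/x + 1/x² and f′′′ = −1/x² − 2/x³. A minimal sketch over a grid of positive x:

```python
import numpy as np

xs = np.linspace(0.05, 10.0, 1000)

# f(x) = -log x: equality |f'''| = 2 f''^{3/2}
lhs = 2 / xs**3                       # |f'''(x)|
rhs = 2 * (1 / xs**2)**1.5            # 2 f''(x)^{3/2}
assert np.allclose(lhs, rhs)

# f(x) = x log x - log x: strict self-concordance inequality
f2 = 1 / xs + 1 / xs**2               # f''(x)
f3 = -1 / xs**2 - 2 / xs**3           # f'''(x)
assert np.all(np.abs(f3) <= 2 * f2**1.5)
```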
Self-concordant calculus

properties
• preserved under positive scaling α ≥ 1, and sum
• preserved under composition with affine function
• if g is convex with dom g = R₊₊ and |g′′′(x)| ≤ 3g′′(x)/x then

      f(x) = −log(−g(x)) − log x

  is self-concordant

examples: properties can be used to show that the following are s.c.
• f(x) = −Σ_{i=1}^m log(bᵢ − aᵢᵀx) on {x | aᵢᵀx < bᵢ, i = 1, . . . , m}
• f(X) = −log det X on Sⁿ₊₊
• f(x) = −log(y² − xᵀx) on {(x, y) | ‖x‖₂ < y}
Convergence analysis for self-concordant functions

summary: there exist constants η ∈ (0, 1/4], γ > 0 such that
• if λ(x) > η, then

      f(x^(k+1)) − f(x^(k)) ≤ −γ

• if λ(x) ≤ η, then

      2λ(x^(k+1)) ≤ (2λ(x^(k)))²

(η and γ only depend on backtracking parameters α, β)

complexity bound: number of Newton iterations bounded by

    (f(x^(0)) − p⋆)/γ + log₂ log₂(1/ε)

for α = 0.1, β = 0.8, ε = 10⁻¹⁰, bound evaluates to 375(f(x^(0)) − p⋆) + 6
Bounding second derivatives of a SC function

• Let φ : R → R be a strictly convex self-concordant (SCSC) function
• SC condition can be restated as

      −1 ≤ (d/dτ)(φ′′(τ)^(−1/2)) ≤ 1   ∀τ ∈ dom φ    (2)

• Assume [0, t] ⊆ dom φ. Integrating (2) between 0 and t, we get

      −t ≤ ∫₀ᵗ (d/dτ)(φ′′(τ)^(−1/2)) dτ ≤ t,    (3)

  i.e., −t ≤ φ′′(t)^(−1/2) − φ′′(0)^(−1/2) ≤ t
• Rearranging:

      φ′′(0)/(1 + tφ′′(0)^(1/2))² ≤ φ′′(t) ≤ φ′′(0)/(1 − tφ′′(0)^(1/2))²    (4)

• the lower bound is valid for any 0 ≤ t ∈ dom φ, and the upper bound is valid for 0 ≤ t < φ′′(0)^(−1/2)
Preliminaries for analysis of the Newton's algorithm

• Newton step: ∆x_nt = −∇²f(x)⁻¹∇f(x)
• Newton decrement: λ(x) = ‖∇f(x)‖_{∇²f(x)⁻¹}
• Can show:

      λ(x) = sup_{v ≠ 0} ( −vᵀ∇f(x) / ‖v‖_{∇²f(x)} )    (5)

• Let vᵀ∇f(x) < 0 and define φ_v(t) = f(x + tv)
• φ_v′(0) = ∇f(x)ᵀv and φ_v′′(0) = vᵀ∇²f(x)v
• According to (5),

      λ(x) ≥ −φ_v′(0)φ_v′′(0)^(−1/2)

  (with equality when v = ∆x_nt)
• If v = ∆x_nt, the exit condition for BT line search with parameters α < 1/2 and β is

      φ(t) ≤ φ(0) + αtφ′(0) = φ(0) − αtλ(x)²    (6)

• (we will drop the subscript of φ when v is the Newton direction)
Termination criterion and bound on suboptimality

• Integrate the lower bound in (4) as applied to φ_v(·) twice:

      φ_v(t) ≥ φ_v(0) + tφ_v′(0) + tφ_v′′(0)^(1/2) − log(1 + tφ_v′′(0)^(1/2))    (7)

• The right-hand side of (7) can be minimized analytically, so

      inf_{t≥0} φ_v(t) ≥ φ_v(0) − φ_v′(0)φ_v′′(0)^(−1/2) + log(1 + φ_v′(0)φ_v′′(0)^(−1/2))    (8)

• u + log(1 − u) is decreasing in u, so bound (8) further:

      inf_{t≥0} φ_v(t) ≥ φ_v(0) + λ(x) + log(1 − λ(x))    (9)

• If v is the descent direction that leads from x to x⋆,

      p⋆ ≥ φ_v(0) + λ(x) + log(1 − λ(x)) ≥ f(x) − λ(x)²  if λ(x) ≤ 0.68

• Conclusion:

      f(x) − p⋆ ≤ λ(x)²  for λ(x) ≤ 0.68,    (10)

  so termination criterion "λ(x)² ≤ ε" results in a guaranteed suboptimality bound when applied to a SCSC function
One iteration of Newton's method on SC functions

• From now on, v = ∆x_nt, and drop the subscript on φ(t)
• We would like to show that there exist (problem-independent) parameters

      γ > 0 and η ∈ (0, 1/4],

  such that
  • if λ(x) > η, then f(x⁺) − f(x) ≤ −γ, and
  • if λ(x) ≤ η, then t = 1 and 2λ(x⁺) ≤ (2λ(x))²
• Useful derivation step: integrating the upper bound of (4) twice, we get:

      φ(t) ≤ φ(0) − tλ(x)² − tλ(x) − log(1 − tλ(x))  for t ∈ [0, 1/λ(x))    (11)
One iteration of Newton's method on SC functions
Analysis of the damped Newton phase

• Claim: t̂ = 1/(1 + λ(x)) satisfies the BT exit condition. Using (11):

      φ(t̂) ≤ φ(0) − λ(x) + log(1 + λ(x))
           ≤ φ(0) − λ(x)²/(2(1 + λ(x)))    (true since λ(x) ≥ 0)
           ≤ φ(0) − αλ(x)²/(1 + λ(x))      (true since α < 1/2)
           = φ(0) − αλ(x)²t̂

• Hence, after BT, t ≥ β/(1 + λ(x))
• Combining t ≥ β/(1 + λ(x)) with (6), we get:

      φ(t) ≤ φ(0) − αβλ(x)²/(1 + λ(x)),    (12)

  so, set

      γ = αβη²/(1 + η)    (13)
One iteration of Newton's method on SC functions
Analysis of the quadratically convergent phase

• The following values only depend on BT settings:

      η = (1 − 2α)/4 < 1/4  and  γ = αβη²/(1 + η)    (14)

• Returning to (11):
  • it implies that t = 1 ∈ dom φ if λ(x) < 1
  • since in this phase λ(x) ≤ η < (1 − 2α)/2, we have

        φ(1) ≤ φ(0) − λ(x)² − λ(x) − log(1 − λ(x))
             ≤ φ(0) − (1/2)λ(x)² + λ(x)³    (true since 0 ≤ λ(x) ≤ 0.81)
             ≤ φ(0) − αλ(x)²,

    so t = 1 satisfies the BT exit condition.
• Can show: λ(x⁺) ≤ λ(x)²/(1 − λ(x))² for x⁺ = x + ∆x_nt and SCSC f
• In particular, if λ(x) ≤ 1/4, we have λ(x⁺) ≤ 2λ(x)²
One iteration of Newton's method on SC functions
Final complexity bound

• Termination criterion: λ(x)² ≤ ε
• The algorithm will perform as follows:
  • it will spend at most (f(x^(0)) − p⋆)/γ iterations in the damped Newton phase.
  • upon entering the quadratically convergent phase in iteration k, the Newton decrement at iteration l ≥ k will satisfy

        2λ(x^(l)) ≤ (2λ(x^(k)))^(2^(l−k)) ≤ (2η)^(2^(l−k)) ≤ (1/2)^(2^(l−k))

• Thus, the termination criterion will be satisfied for any l such that

      (1/2)^(2^(l−k+1)) ≤ ε,

  from which we conclude that at most log₂ log₂(1/ε) iterations will be needed in this second phase
Numerical example: 150 randomly generated instances of

    minimize f(x) = −Σ_{i=1}^m log(bᵢ − aᵢᵀx)

[figure: number of iterations vs. f(x^(0)) − p⋆; ◦: m = 100, n = 50; □: m = 1000, n = 500; ◇: m = 1000, n = 50]

• number of iterations much smaller than 375(f(x^(0)) − p⋆) + 6
• bound of the form c(f(x^(0)) − p⋆) + 6 with smaller c (empirically) valid
Implementation

main effort in each iteration: evaluate derivatives and solve Newton system

    H∆x = g

where H = ∇²f(x), g = −∇f(x)

via Cholesky factorization

    H = LLᵀ,    ∆x_nt = L⁻ᵀL⁻¹g,    λ(x) = ‖L⁻¹g‖₂

• cost (1/3)n³ flops for unstructured system
• cost ≪ (1/3)n³ if H sparse, banded
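The Cholesky-based computation above can be sketched in a few lines: factor H once, then get both the Newton step (two triangular solves) and the decrement (norm of the intermediate vector) from the same factor. The test matrix is illustrative:

```python
import numpy as np
from scipy.linalg import solve_triangular

# Newton step and decrement via H = L L^T:
# dx_nt = L^{-T} L^{-1} g, lambda(x) = ||L^{-1} g||_2.
def newton_step(H, g):
    L = np.linalg.cholesky(H)                   # lower-triangular factor
    w = solve_triangular(L, g, lower=True)      # w = L^{-1} g
    dx = solve_triangular(L.T, w, lower=False)  # dx = L^{-T} w
    lam = np.linalg.norm(w)                     # Newton decrement
    return dx, lam

rng = np.random.default_rng(2)
B = rng.standard_normal((6, 6))
H = B @ B.T + np.eye(6)                         # positive definite
g = rng.standard_normal(6)
dx, lam = newton_step(H, g)
assert np.allclose(H @ dx, g)                   # solves H dx = g
assert np.isclose(lam**2, g @ dx)               # lambda^2 = g^T H^{-1} g
```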
Example of dense Newton system with structure

    f(x) = Σ_{i=1}^n ψᵢ(xᵢ) + ψ₀(Ax + b),    H = D + AᵀH₀A

• assume A ∈ R^(p×n), dense, with p ≪ n
• D diagonal with diagonal elements ψᵢ′′(xᵢ); H₀ = ∇²ψ₀(Ax + b)

method 1: form H, solve via dense Cholesky factorization (cost (1/3)n³)

method 2: factor H₀ = L₀L₀ᵀ; write Newton system as

    D∆x + AᵀL₀w = −g,    L₀ᵀA∆x − w = 0

eliminate ∆x from first equation; compute w and ∆x from

    (I + L₀ᵀAD⁻¹AᵀL₀)w = −L₀ᵀAD⁻¹g,    D∆x = −g − AᵀL₀w

cost: 2p²n (dominated by computation of L₀ᵀAD⁻¹AᵀL₀)
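Method 2 can be sketched and checked against method 1 directly. The dimensions and random data below are illustrative; the key point is that only the small p×p system (I + L₀ᵀAD⁻¹AᵀL₀)w = −L₀ᵀAD⁻¹g is factored, never the full n×n matrix H:

```python
import numpy as np

# Block elimination for (D + A^T H0 A) dx = -g, p << n.
rng = np.random.default_rng(3)
p, n = 4, 50
A = rng.standard_normal((p, n))
d = rng.uniform(1.0, 2.0, n)                 # diagonal of D
C = rng.standard_normal((p, p))
H0 = C @ C.T + np.eye(p)                     # positive definite H0
g = rng.standard_normal(n)

# method 2: factor H0, solve the small p-by-p system for w, back out dx
L0 = np.linalg.cholesky(H0)                  # H0 = L0 L0^T
ADinv = A / d                                # A D^{-1} (D diagonal)
S = np.eye(p) + L0.T @ (ADinv @ A.T) @ L0    # I + L0^T A D^{-1} A^T L0
w = np.linalg.solve(S, -L0.T @ (ADinv @ g))
dx = (-g - A.T @ (L0 @ w)) / d               # D dx = -g - A^T L0 w

# method 1 (reference): form H and solve the full n-by-n system
H = np.diag(d) + A.T @ H0 @ A
assert np.allclose(H @ dx, -g)
```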