Unconstrained minimization · 2017-11-14
Unconstrained minimization
• terminology and assumptions
• gradient descent method
• steepest descent method
• Newton's method
• self-concordant functions
• implementation
IOE 611: Nonlinear Programming, Fall 2017 7. Unconstrained minimization Page 7–1
Unconstrained minimization: assumptions

    minimize f(x)

• f convex, twice continuously differentiable (hence dom f open)
• we assume optimal value p⋆ = inf_x f(x) is attained (and finite)

Unconstrained minimization methods
• produce sequence of points x^(k) ∈ dom f, k = 0, 1, . . . with

      f(x^(k)) → p⋆  (commonly, f(x^(k)) ↓ p⋆)

• in practice, terminated in finite time, when f(x^(k)) − p⋆ ≤ ε, where ε > 0 is pre-specified
• can be interpreted as iterative methods for solving optimality condition

      ∇f(x⋆) = 0
Descent methods

    x^(k+1) = x^(k) + t^(k)∆x^(k)  with f(x^(k+1)) < f(x^(k))

• other notation options: x⁺ = x + t∆x, x := x + t∆x
• ∆x is the step, or search direction; t is the step size, or step length
• from convexity, f(x⁺) < f(x) implies ∇f(x)ᵀ∆x < 0
  (i.e., ∆x is a descent direction)

General descent method.

given a starting point x ∈ dom f.
repeat
    1. Determine a descent direction ∆x.
    2. Line search. Choose a step size t > 0.
    3. Update. x := x + t∆x.
until stopping criterion is satisfied.
Further assumptions: initial point and sublevel set

Algorithms in this chapter require a starting point x^(0) such that
• x^(0) ∈ dom f
• sublevel set S = {x | f(x) ≤ f(x^(0))} is closed

2nd condition is hard to verify, except when all sublevel sets are closed:
• equivalent to condition that epi f is closed
• true if dom f = Rⁿ
• true if f(x) → ∞ as x → bd dom f

examples of differentiable functions with closed sublevel sets:

    f(x) = log( Σ_{i=1}^m exp(aᵢᵀx + bᵢ) ),    f(x) = −Σ_{i=1}^m log(bᵢ − aᵢᵀx)
Strong convexity and implications

f is strongly convex on S if there exists an m > 0 such that

    ∇²f(x) ⪰ mI  for all x ∈ S

Implications
1. for x, y ∈ S,  f(y) ≥ f(x) + ∇f(x)ᵀ(y − x) + (m/2)‖x − y‖₂²
2. hence, S is bounded: take any y ∈ S and x = x⋆ ∈ S in the above
3. p⋆ > −∞, and for x ∈ S,

       f(x) − p⋆ ≤ (1/(2m))‖∇f(x)‖₂²

   useful as stopping criterion (if you know m)
4. also, ∀x ∈ S,

       ‖x − x⋆‖₂ ≤ (2/m)‖∇f(x)‖₂,

   and hence the optimal solution is unique.
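The suboptimality bound in implication 3 can be checked numerically. A minimal sketch: take a strongly convex quadratic f(x) = (1/2)xᵀAx, whose Hessian A has smallest eigenvalue m and whose optimal value is p⋆ = 0 at x⋆ = 0 (the matrix A and the test point here are illustrative, not from the slides):

```python
import numpy as np

# f(x) = 0.5 x^T A x is strongly convex with m = min eigenvalue of A,
# and p* = 0 at x* = 0; check f(x) - p* <= ||grad f(x)||_2^2 / (2m).
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))
eigs = np.array([0.5, 1.0, 2.0, 4.0, 8.0])     # so m = 0.5
A = Q @ np.diag(eigs) @ Q.T
m = eigs.min()

def f(x):    return 0.5 * x @ A @ x
def grad(x): return A @ x

x = rng.standard_normal(5)
gap   = f(x) - 0.0                              # f(x) - p*
bound = grad(x) @ grad(x) / (2 * m)
assert gap <= bound                             # the stopping-criterion bound holds
```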
Descent methods (reprise)

    x^(k+1) = x^(k) + t^(k)∆x^(k)  with f(x^(k+1)) < f(x^(k))

• other notation options: x⁺ = x + t∆x, x := x + t∆x
• ∆x is the step, or search direction; t is the step size, or step length
• from convexity, f(x⁺) < f(x) implies ∇f(x)ᵀ∆x < 0
  (i.e., ∆x is a descent direction)

General descent method.

given a starting point x ∈ dom f.
repeat
    1. Determine a descent direction ∆x.
    2. Line search. Choose a step size t > 0.
    3. Update. x := x + t∆x.
until stopping criterion is satisfied.
Line search types

Exact line search: t = argmin_{t>0} f(x + t∆x)

Backtracking line search (with parameters α ∈ (0, 1/2), β ∈ (0, 1))
• starting at t = 1, repeat t := βt until

      f(x + t∆x) < f(x) + αt∇f(x)ᵀ∆x

• graphical interpretation: backtrack until t ≤ t₀
  [figure: f(x + t∆x) vs. t, with the lines f(x) + αt∇f(x)ᵀ∆x and f(x) + t∇f(x)ᵀ∆x; t₀ marks where f(x + t∆x) crosses the upper line]
• after backtracking, step size t will satisfy t = 1 or t ∈ (βt₀, t₀]
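The backtracking loop above translates directly into code. A minimal sketch (the function and starting point in the usage example are illustrative):

```python
import numpy as np

# Backtracking line search: start at t = 1 and shrink by beta until the
# sufficient-decrease condition f(x + t dx) < f(x) + alpha t grad^T dx holds.
def backtracking(f, grad_fx, x, dx, alpha=0.3, beta=0.8):
    t = 1.0
    fx = f(x)
    slope = grad_fx @ dx          # directional derivative; < 0 for a descent direction
    while f(x + t * dx) >= fx + alpha * t * slope:
        t *= beta
    return t

# usage on f(x) = ||x||^2 with the negative-gradient direction
f = lambda x: x @ x
x = np.array([1.0, 2.0])
g = 2 * x
t = backtracking(f, g, x, -g)
assert 0 < t <= 1 and f(x + t * (-g)) < f(x)
```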
Gradient descent method

general descent method with ∆x = −∇f(x)

given a starting point x ∈ dom f.
repeat
    1. ∆x := −∇f(x).
    2. Line search. Choose step size t via exact or backtracking line search.
    3. Update. x := x + t∆x.
until stopping criterion is satisfied.

• stopping criterion usually of the form ‖∇f(x)‖₂ ≤ ε
• convergence result: for strongly convex f,

      f(x^(k)) − p⋆ ≤ cᵏ(f(x^(0)) − p⋆),

  where c ∈ (0, 1) depends on m, x^(0), line search type
• will terminate in at most

      log((f(x^(0)) − p⋆)/ε) / log(1/c)

  iterations
• very simple, but often very slow; rarely used in practice
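The algorithm above can be sketched as a short routine combining the negative-gradient direction with backtracking. The test problem (an ill-conditioned quadratic) is illustrative, not from the slides:

```python
import numpy as np

# Gradient descent with backtracking: dx = -grad f(x), backtrack,
# update, stop when ||grad f(x)||_2 <= eps.
def gradient_descent(f, grad, x0, eps=1e-8, alpha=0.3, beta=0.8, max_iter=10_000):
    x = x0.astype(float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= eps:
            break
        dx = -g
        t = 1.0
        while f(x + t * dx) >= f(x) + alpha * t * (g @ dx):
            t *= beta
        x = x + t * dx
    return x

# usage: minimize f(x) = 0.5 (x1^2 + 10 x2^2); the minimizer is the origin
f = lambda x: 0.5 * (x[0]**2 + 10 * x[1]**2)
grad = lambda x: np.array([x[0], 10 * x[1]])
x_star = gradient_descent(f, grad, np.array([10.0, 1.0]))
assert np.linalg.norm(x_star) < 1e-6
```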
Quadratic problem in R²

    f(x) = (1/2)(x₁² + γx₂²)    (γ > 0)

with exact line search, starting at x^(0) = (γ, 1):

    x₁^(k) = γ((γ − 1)/(γ + 1))ᵏ,    x₂^(k) = (−(γ − 1)/(γ + 1))ᵏ

• very slow if γ ≫ 1 or γ ≪ 1
• example for γ = 10:
  [figure: iterates x^(0), x^(1), . . . zig-zagging across the elongated contour lines of f]
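The closed-form iterates above can be verified numerically: for a quadratic f(x) = (1/2)xᵀHx, the exact line-search step along −g is t = gᵀg / (gᵀHg), so we can run the recursion and compare with the formula. A minimal sketch:

```python
import numpy as np

# Exact-line-search gradient descent on f(x) = 0.5 (x1^2 + gamma x2^2),
# compared against x1^(k) = gamma r^k, x2^(k) = (-r)^k with r = (gamma-1)/(gamma+1).
gamma = 10.0
r = (gamma - 1) / (gamma + 1)
x = np.array([gamma, 1.0])
for k in range(5):
    g = np.array([x[0], gamma * x[1]])           # gradient of f
    t = (g @ g) / (g[0]**2 + gamma * g[1]**2)    # exact step: g^T g / (g^T H g)
    x = x - t * g
    predicted = np.array([gamma * r**(k + 1), (-r)**(k + 1)])
    assert np.allclose(x, predicted)
```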
Nonquadratic example

    f(x₁, x₂) = e^(x₁+3x₂−0.1) + e^(x₁−3x₂−0.1) + e^(−x₁−0.1)

[figures: iterates on the contour plot of f for backtracking line search (left) and exact line search (right)]
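A minimal sketch of gradient descent with backtracking on this example; the starting point is illustrative, and the optimal value p⋆ = 2√2·e^(−0.1) follows by setting x₂ = 0 and minimizing over x₁:

```python
import numpy as np

# Gradient descent with backtracking on
# f(x1,x2) = e^{x1+3x2-0.1} + e^{x1-3x2-0.1} + e^{-x1-0.1}.
def f(x):
    return (np.exp(x[0] + 3*x[1] - 0.1) + np.exp(x[0] - 3*x[1] - 0.1)
            + np.exp(-x[0] - 0.1))

def grad(x):
    a = np.exp(x[0] + 3*x[1] - 0.1)
    b = np.exp(x[0] - 3*x[1] - 0.1)
    c = np.exp(-x[0] - 0.1)
    return np.array([a + b - c, 3*a - 3*b])

x = np.array([-1.0, 1.0])
for _ in range(200):
    g = grad(x)
    t, dx = 1.0, -g
    while f(x + t*dx) >= f(x) + 0.1 * t * (g @ dx):
        t *= 0.7
    x = x + t*dx

p_star = 2 * np.sqrt(2) * np.exp(-0.1)   # optimal value (x2* = 0, x1* = -0.5 log 2)
assert f(x) - p_star < 1e-6
```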
A problem in R¹⁰⁰

    f(x) = cᵀx − Σ_{i=1}^500 log(bᵢ − aᵢᵀx)

[figure: f(x^(k)) − p⋆ vs. k for exact and backtracking line search, decreasing from ~10⁴ to ~10⁻⁴ over roughly 100–200 iterations]

'linear' convergence, i.e., a straight line on a semilog plot
Complexity analysis of gradient descent
Assumptions and observations

• Assumptions:
  • f ∈ C²(D)
  • S = {x : f(x) ≤ f(x^(0))} is closed
  • ∃m > 0 : ∇²f(x) ⪰ mI ∀x ∈ S
• Observations:
  • ∀x, y ∈ S, f(y) ≥ f(x) + ∇f(x)ᵀ(y − x) + (m/2)‖x − y‖₂²
  • S is bounded
  • p⋆ > −∞, and for x ∈ S, f(x) − p⋆ ≤ (1/(2m))‖∇f(x)‖₂²
  • ∃M ≥ m: ∇²f(x) ⪯ MI ∀x ∈ S (since S is bounded). This implies:
    • ∀x, y ∈ S, f(y) ≤ f(x) + ∇f(x)ᵀ(y − x) + (M/2)‖x − y‖₂²
    • minimizing each side over y gives: ∀x ∈ S,

          p⋆ ≤ f(x) − (1/(2M))‖∇f(x)‖₂²

• Geometric implication: if S_α is a sublevel set of f with α ≤ f(x^(0)), then

      cond(S_α) ≤ M/m,

  where cond(·) is the square of the ratio of maximum to minimum "widths" of a convex set
Complexity analysis of gradient descent
Algorithm analysis

Let x(t) = x − t∇f(x). Then:
• f(x(t)) ≤ f(x) − t‖∇f(x)‖₂² + (Mt²/2)‖∇f(x)‖₂² for any t
• If exact line search is used:
  • f(x⁺) ≤ min_t { f(x) − t‖∇f(x)‖₂² + (Mt²/2)‖∇f(x)‖₂² }
          = f(x) − (1/(2M))‖∇f(x)‖₂²
          ≤ f(x) − (1/(2M)) · 2m(f(x) − p⋆)
  • We conclude:

        f(x⁺) − p⋆ ≤ (f(x) − p⋆)(1 − m/M)

    Linear convergence with c = 1 − m/M
• If backtracking (BT) line search is used:
  • BT exit condition is guaranteed for t ∈ [0, 1/M] (for α < 1/2)
  • BT terminates either with t = 1 or t ≥ β/M
  • Using these bounds on t and analysis similar to the above, get linear convergence with

        c = 1 − min{2mα, 2βαm/M} < 1.
Steepest descent method

Normalized steepest descent direction (at x, for general norm ‖ · ‖):

    ∆x_nsd = argmin{∇f(x)ᵀv | ‖v‖ = 1}

interpretation: for small v, f(x + v) ≈ f(x) + ∇f(x)ᵀv; direction ∆x_nsd is the unit-norm step with most negative directional derivative

(Unnormalized) steepest descent direction

    ∆x_sd = ‖∇f(x)‖_* ∆x_nsd

satisfies ∇f(x)ᵀ∆x_sd = −‖∇f(x)‖_*²

Steepest descent method
• general descent method with ∆x = ∆x_sd
• convergence properties similar to gradient descent
Examples

• Euclidean norm: ∆x_sd = −∇f(x)
• quadratic norm ‖x‖_P = (xᵀPx)^(1/2) (P ∈ Sⁿ₊₊): ∆x_sd = −P⁻¹∇f(x)
• ℓ₁-norm: ∆x_sd = −(∂f(x)/∂xᵢ)eᵢ, where i is chosen so that |∂f(x)/∂xᵢ| = ‖∇f(x)‖_∞

unit balls and normalized steepest descent directions for a quadratic norm and the ℓ₁-norm:
[figure: unit balls of the two norms, each showing −∇f(x) and the resulting ∆x_nsd]
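The three steepest descent directions above can be computed in a few lines; every one of them satisfies ∇f(x)ᵀ∆x_sd < 0, so each is a descent direction. The gradient vector and matrix P below are illustrative:

```python
import numpy as np

# Steepest descent directions for a given gradient g, in three norms.
def sd_euclidean(g):
    return -g                               # Euclidean norm

def sd_quadratic(g, P):
    return -np.linalg.solve(P, g)           # quadratic norm ||x||_P: -P^{-1} g

def sd_l1(g):
    i = np.argmax(np.abs(g))                # coordinate with |g_i| = ||g||_inf
    dx = np.zeros_like(g)
    dx[i] = -g[i]
    return dx                               # l1-norm: a coordinate-descent step

g = np.array([3.0, -1.0])
P = np.array([[2.0, 0.0], [0.0, 8.0]])
for dx in (sd_euclidean(g), sd_quadratic(g, P), sd_l1(g)):
    assert g @ dx < 0                       # all are descent directions
```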
Choice of norm for steepest descent

[figures: steepest descent iterates for two different quadratic norms on the same problem]

• steepest descent with backtracking line search for two quadratic norms
• ellipses show {x | ‖x − x^(k)‖_P = 1}
• equivalent interpretation of steepest descent with quadratic norm ‖ · ‖_P: gradient descent after change of variables x̄ = P^(1/2)x

shows choice of P has strong effect on speed of convergence
Newton step

    ∆x_nt = −∇²f(x)⁻¹∇f(x)

interpretations
• x + ∆x_nt minimizes second-order approximation

      f̂(x + v) = f(x) + ∇f(x)ᵀv + (1/2)vᵀ∇²f(x)v

• x + ∆x_nt solves linearized optimality condition

      ∇f(x + v) ≈ ∇f̂(x + v) = ∇f(x) + ∇²f(x)v = 0

[figures: f and its quadratic model f̂ touching at (x, f(x)), with the model minimized at (x + ∆x_nt, f(x + ∆x_nt)); f′ and its linearization f̂′ crossing zero at x + ∆x_nt]
Newton step — steepest descent interpretation

• ∆x_nt is the steepest descent direction at x in the local Hessian norm

      ‖u‖_{∇²f(x)} = (uᵀ∇²f(x)u)^(1/2)

  [figure: dashed lines are contour lines of f; ellipse is {x + v | vᵀ∇²f(x)v = 1}; arrow shows −∇f(x); both x + ∆x_nt and x + ∆x_nsd are marked]
• Newton step is affine invariant, i.e., independent of affine change of coordinates
Newton decrement

    λ(x) = (∇f(x)ᵀ∇²f(x)⁻¹∇f(x))^(1/2)

a measure of the proximity of x to x⋆

Properties
• gives an estimate of f(x) − p⋆, using quadratic approximation f̂:

      f(x) − inf_y f̂(y) = f(x) − f̂(x + ∆x_nt) = (1/2)λ(x)²

• equal to the norm of the Newton step in the quadratic Hessian norm:

      λ(x) = (∆x_ntᵀ∇²f(x)∆x_nt)^(1/2)

• directional derivative in the Newton direction: ∇f(x)ᵀ∆x_nt = −λ(x)²
• affine invariant (unlike ‖∇f(x)‖₂)
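These identities are easy to confirm on a quadratic, where the model f̂ is exact and hence f(x) − p⋆ equals λ(x)²/2 exactly. A minimal sketch with a randomly generated positive definite Hessian (illustrative, not from the slides):

```python
import numpy as np

# Check the Newton-decrement identities on f(x) = 0.5 x^T H x (p* = 0 at x* = 0).
rng = np.random.default_rng(1)
B = rng.standard_normal((4, 4))
H = B @ B.T + 4 * np.eye(4)                  # positive definite Hessian
x = rng.standard_normal(4)

g = H @ x                                    # gradient at x
dx_nt = -np.linalg.solve(H, g)               # Newton step
lam2 = g @ np.linalg.solve(H, g)             # lambda(x)^2

f = lambda z: 0.5 * z @ H @ z
assert np.isclose(f(x), lam2 / 2)            # suboptimality estimate (exact here)
assert np.isclose(dx_nt @ H @ dx_nt, lam2)   # Hessian-norm identity
assert np.isclose(g @ dx_nt, -lam2)          # directional derivative
```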
Newton's method

given a starting point x ∈ dom f, tolerance ε > 0.
repeat
    1. Compute the Newton step and decrement.
       ∆x_nt := −∇²f(x)⁻¹∇f(x);  λ² := ∇f(x)ᵀ∇²f(x)⁻¹∇f(x).
    2. Stopping criterion. quit if λ²/2 ≤ ε.
    3. Line search. Choose step size t by backtracking line search.
    4. Update. x := x + t∆x_nt.

Affine invariant, i.e., independent of linear changes of coordinates: Newton iterates for f̃(y) = f(By) with starting point y^(0) = B⁻¹x^(0) are

    y^(k) = B⁻¹x^(k)
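A minimal sketch of the algorithm above, run on the earlier nonquadratic example in R²; the starting point is illustrative, and p⋆ = 2√2·e^(−0.1) comes from minimizing by hand:

```python
import numpy as np

# Damped Newton's method: Newton step, decrement-based stopping
# criterion lambda^2/2 <= eps, backtracking line search.
def newton(f, grad, hess, x, eps=1e-10, alpha=0.1, beta=0.7):
    iters = 0
    while True:
        g, H = grad(x), hess(x)
        dx = -np.linalg.solve(H, g)              # Newton step
        lam2 = -g @ dx                           # Newton decrement squared
        if lam2 / 2 <= eps:
            return x, iters
        t = 1.0                                  # backtracking line search
        while f(x + t * dx) >= f(x) + alpha * t * (g @ dx):
            t *= beta
        x = x + t * dx
        iters += 1

def f(x):
    return sum(np.exp(e) for e in (x[0]+3*x[1]-0.1, x[0]-3*x[1]-0.1, -x[0]-0.1))

def grad(x):
    a, b, c = np.exp(x[0]+3*x[1]-0.1), np.exp(x[0]-3*x[1]-0.1), np.exp(-x[0]-0.1)
    return np.array([a + b - c, 3*a - 3*b])

def hess(x):
    a, b, c = np.exp(x[0]+3*x[1]-0.1), np.exp(x[0]-3*x[1]-0.1), np.exp(-x[0]-0.1)
    return np.array([[a + b + c, 3*a - 3*b], [3*a - 3*b, 9*a + 9*b]])

x_opt, iters = newton(f, grad, hess, np.array([-1.0, 1.0]))
p_star = 2 * np.sqrt(2) * np.exp(-0.1)
assert abs(f(x_opt) - p_star) < 1e-8             # converges, in few iterations
```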
Classical convergence analysis

assumptions
• f strongly convex on S with constant m
• ∇²f is Lipschitz continuous on S, with constant L > 0:

      ‖∇²f(x) − ∇²f(y)‖₂ ≤ L‖x − y‖₂

  (L measures how well f can be approximated by a quadratic function)

outline: there exist constants η ∈ (0, m²/L), γ > 0 such that
• if ‖∇f(x^(k))‖₂ ≥ η, then f(x^(k+1)) − f(x^(k)) ≤ −γ
• if ‖∇f(x^(k))‖₂ < η, then t^(k) = 1, and

      (L/(2m²))‖∇f(x^(k+1))‖₂ ≤ ((L/(2m²))‖∇f(x^(k))‖₂)²
Two phases of Newton's method:

    η ≤ 3(1 − 2α)m²/L,    γ = αβη² m/M²

Damped Newton phase (‖∇f(x)‖₂ ≥ η)
• most iterations require backtracking steps
• function value decreases by at least γ
• if p⋆ > −∞, this phase ends after at most (f(x^(0)) − p⋆)/γ iterations

Quadratically convergent phase (‖∇f(x)‖₂ < η)
• all iterations use step size t = 1
• ‖∇f(x)‖₂ converges to zero quadratically: if ‖∇f(x^(k))‖₂ < η, then

      (L/(2m²))‖∇f(x^(l))‖₂ ≤ ((L/(2m²))‖∇f(x^(k))‖₂)^(2^(l−k)) ≤ (1/2)^(2^(l−k)),    l ≥ k
Conclusion:

number of iterations until f(x) − p⋆ ≤ ε is bounded above by

    (f(x^(0)) − p⋆)/γ + log₂ log₂(ε₀/ε)

• γ, ε₀ are constants that depend on m, L, x^(0)
• second term is small (of the order of 6) and almost constant for practical purposes
• in practice, constants m, L (hence γ, ε₀) are usually unknown
• provides qualitative insight into convergence properties (i.e., explains two algorithm phases)
Examples

example in R² (the nonquadratic example above)

[figures: iterates x^(0), x^(1) on the contour plot; f(x^(k)) − p⋆ vs. k, dropping from ~10⁵ to ~10⁻¹⁵ within 5 iterations]

• backtracking parameters α = 0.1, β = 0.7
• converges in only 5 steps
• quadratic local convergence
Example in R¹⁰⁰ (the problem in R¹⁰⁰ above)

[figures: f(x^(k)) − p⋆ vs. k for exact and backtracking line search, reaching ~10⁻¹⁵ within about 10 iterations; step size t^(k) vs. k for both line searches]

• backtracking parameters α = 0.01, β = 0.5
• backtracking line search almost as fast as exact l.s. (and much simpler)
• clearly shows two phases in algorithm
Example in R¹⁰⁰⁰⁰ (with sparse aᵢ)

    f(x) = −Σ_{i=1}^10000 log(1 − xᵢ²) − Σ_{i=1}^100000 log(bᵢ − aᵢᵀx)

[figure: f(x^(k)) − p⋆ vs. k, reaching ~10⁻⁵ within about 20 iterations]

• backtracking parameters α = 0.01, β = 0.5
• performance similar to small examples
Self-concordance

shortcomings of classical convergence analysis
• depends on unknown constants (m, L, . . . )
• bound is not affinely invariant, although Newton's method is

convergence analysis via self-concordance (Nesterov and Nemirovski)
• does not depend on any unknown constants
• gives affine-invariant bound
• applies to special class of convex functions ('self-concordant' functions)
• developed to analyze polynomial-time interior-point methods for convex optimization
Self-concordant functions

definition
• convex f : R → R is self-concordant if |f′′′(x)| ≤ 2f′′(x)^(3/2) for all x ∈ dom f
• convex f : Rⁿ → R is self-concordant if g(t) = f(x + tv) is self-concordant for all x ∈ dom f, v ∈ Rⁿ

examples on R
• linear and quadratic functions
• negative logarithm f(x) = −log x
• negative entropy plus negative logarithm: f(x) = x log x − log x

affine invariance: if f : R → R is s.c., then f̃(y) = f(ay + b) is s.c.:

    f̃′′′(y) = a³f′′′(ay + b),    f̃′′(y) = a²f′′(ay + b)

be careful: if f is s.c., αf may not be
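The defining inequality |f′′′(x)| ≤ 2f′′(x)^(3/2) can be checked numerically for the examples above. For f(x) = −log x it holds with equality (f′′ = 1/x², f′′′ = −2/x³); for f(x) = x log x − log x the derivatives are f′′ = 1/x + 1/x² and f′′′ = −1/x² − 2/x³. A minimal sketch over a grid of positive x:

```python
import numpy as np

xs = np.linspace(0.05, 10.0, 1000)

# f(x) = -log x: equality |f'''| = 2 f''^{3/2}
lhs = 2 / xs**3                       # |f'''(x)|
rhs = 2 * (1 / xs**2)**1.5            # 2 f''(x)^{3/2}
assert np.allclose(lhs, rhs)

# f(x) = x log x - log x: strict self-concordance inequality
f2 = 1 / xs + 1 / xs**2               # f''(x)
f3 = -1 / xs**2 - 2 / xs**3           # f'''(x)
assert np.all(np.abs(f3) <= 2 * f2**1.5)
```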
Self-concordant calculus

properties
• preserved under positive scaling α ≥ 1, and sum
• preserved under composition with affine function
• if g is convex with dom g = R₊₊ and |g′′′(x)| ≤ 3g′′(x)/x then

      f(x) = −log(−g(x)) − log x

  is self-concordant

examples: properties can be used to show that the following are s.c.
• f(x) = −Σ_{i=1}^m log(bᵢ − aᵢᵀx) on {x | aᵢᵀx < bᵢ, i = 1, . . . , m}
• f(X) = −log det X on Sⁿ₊₊
• f(x) = −log(y² − xᵀx) on {(x, y) | ‖x‖₂ < y}
Convergence analysis for self-concordant functions

summary: there exist constants η ∈ (0, 1/4], γ > 0 such that
• if λ(x) > η, then

      f(x^(k+1)) − f(x^(k)) ≤ −γ

• if λ(x) ≤ η, then

      2λ(x^(k+1)) ≤ (2λ(x^(k)))²

(η and γ only depend on backtracking parameters α, β)

complexity bound: number of Newton iterations bounded by

    (f(x^(0)) − p⋆)/γ + log₂ log₂(1/ε)

for α = 0.1, β = 0.8, ε = 10⁻¹⁰, bound evaluates to 375(f(x^(0)) − p⋆) + 6
Bounding second derivatives of a SC function

• Let φ : R → R be a strictly convex self-concordant (SCSC) function
• SC condition can be restated as

      −1 ≤ (d/dτ)(φ′′(τ)^(−1/2)) ≤ 1   ∀τ ∈ dom φ    (2)

• Assume [0, t] ⊆ dom φ. Integrating (2) between 0 and t, we get

      −t ≤ ∫₀ᵗ (d/dτ)(φ′′(τ)^(−1/2)) dτ ≤ t,    (3)

  i.e., −t ≤ φ′′(t)^(−1/2) − φ′′(0)^(−1/2) ≤ t
• Rearranging:

      φ′′(0)/(1 + tφ′′(0)^(1/2))² ≤ φ′′(t) ≤ φ′′(0)/(1 − tφ′′(0)^(1/2))²    (4)

• the lower bound is valid for any 0 ≤ t ∈ dom φ, and the upper bound is valid for 0 ≤ t < φ′′(0)^(−1/2)
Preliminaries for analysis of the Newton's algorithm

• Newton step: ∆x_nt = −∇²f(x)⁻¹∇f(x)
• Newton decrement: λ(x) = ‖∇f(x)‖_{∇²f(x)⁻¹}
• Can show:

      λ(x) = sup_{v ≠ 0} ( −vᵀ∇f(x) / ‖v‖_{∇²f(x)} )    (5)

• Let vᵀ∇f(x) < 0 and define φ_v(t) = f(x + tv)
• φ_v′(0) = ∇f(x)ᵀv and φ_v′′(0) = vᵀ∇²f(x)v
• According to (5),

      λ(x) ≥ −φ_v′(0)φ_v′′(0)^(−1/2)

  (with equality when v = ∆x_nt)
• If v = ∆x_nt, the exit condition for BT line search with parameters α < 1/2 and β is

      φ(t) ≤ φ(0) + αtφ′(0) = φ(0) − αtλ(x)²    (6)

• (we will drop the subscript of φ when v is the Newton direction)
Termination criterion and bound on suboptimality

• Integrate the lower bound in (4) as applied to φ_v(·) twice:

      φ_v(t) ≥ φ_v(0) + tφ_v′(0) + tφ_v′′(0)^(1/2) − log(1 + tφ_v′′(0)^(1/2))    (7)

• The right-hand side of (7) can be minimized analytically, so

      inf_{t≥0} φ_v(t) ≥ φ_v(0) − φ_v′(0)φ_v′′(0)^(−1/2) + log(1 + φ_v′(0)φ_v′′(0)^(−1/2))    (8)

• u + log(1 − u) is decreasing in u, so bound (8) further:

      inf_{t≥0} φ_v(t) ≥ φ_v(0) + λ(x) + log(1 − λ(x))    (9)

• If v is the descent direction that leads from x to x⋆,

      p⋆ ≥ φ_v(0) + λ(x) + log(1 − λ(x)) ≥ f(x) − λ(x)²  if λ(x) ≤ 0.68

• Conclusion:

      f(x) − p⋆ ≤ λ(x)²  for λ(x) ≤ 0.68,    (10)

  so termination criterion "λ(x)² ≤ ε" results in a guaranteed suboptimality bound when applied to a SCSC function
One iteration of Newton's method on SC functions

• From now on, v = ∆x_nt, and drop the subscript on φ(t)
• We would like to show that there exist (problem-independent) parameters

      γ > 0 and η ∈ (0, 1/4],

  such that
  • if λ(x) > η, then f(x⁺) − f(x) ≤ −γ, and
  • if λ(x) ≤ η, then t = 1 and 2λ(x⁺) ≤ (2λ(x))²
• Useful derivation step: integrating the upper bound of (4) twice, we get:

      φ(t) ≤ φ(0) − tλ(x)² − tλ(x) − log(1 − tλ(x))  for t ∈ [0, 1/λ(x))    (11)
One iteration of Newton's method on SC functions
Analysis of the damped Newton phase

• Claim: t̂ = 1/(1 + λ(x)) satisfies the BT exit condition. Using (11):

      φ(t̂) ≤ φ(0) − λ(x) + log(1 + λ(x))
           ≤ φ(0) − λ(x)²/(2(1 + λ(x)))    (true since λ(x) ≥ 0)
           ≤ φ(0) − αλ(x)²/(1 + λ(x))      (true since α < 1/2)
           = φ(0) − αλ(x)²t̂

• Hence, after BT, t ≥ β/(1 + λ(x))
• Combining t ≥ β/(1 + λ(x)) with (6), we get:

      φ(t) ≤ φ(0) − αβλ(x)²/(1 + λ(x)),    (12)

  so, set

      γ = αβη²/(1 + η)    (13)
One iteration of Newton's method on SC functions
Analysis of the quadratically convergent phase

• The following values only depend on BT settings:

      η = (1 − 2α)/4 < 1/4  and  γ = αβη²/(1 + η)    (14)

• Returning to (11):
  • it implies that t = 1 ∈ dom φ if λ(x) < 1
  • since in this phase λ(x) ≤ η < (1 − 2α)/2, we have

        φ(1) ≤ φ(0) − λ(x)² − λ(x) − log(1 − λ(x))
             ≤ φ(0) − (1/2)λ(x)² + λ(x)³    (true since 0 ≤ λ(x) ≤ 0.81)
             ≤ φ(0) − αλ(x)²,

    so t = 1 satisfies the BT exit condition.
• Can show: λ(x⁺) ≤ λ(x)²/(1 − λ(x))² for x⁺ = x + ∆x_nt and SCSC f
• In particular, if λ(x) ≤ 1/4, we have λ(x⁺) ≤ 2λ(x)²
One iteration of Newton's method on SC functions
Final complexity bound

• Termination criterion: λ(x)² ≤ ε
• The algorithm will perform as follows:
  • it will spend at most (f(x^(0)) − p⋆)/γ iterations in the damped Newton phase.
  • upon entering the quadratically convergent phase in iteration k, the Newton decrement at iteration l ≥ k will satisfy

        2λ(x^(l)) ≤ (2λ(x^(k)))^(2^(l−k)) ≤ (2η)^(2^(l−k)) ≤ (1/2)^(2^(l−k))

• Thus, the termination criterion will be satisfied for any l such that

      (1/2)^(2^(l−k+1)) ≤ ε,

  from which we conclude that at most log₂ log₂(1/ε) iterations will be needed in this second phase
Numerical example: 150 randomly generated instances of

    minimize f(x) = −Σ_{i=1}^m log(bᵢ − aᵢᵀx)

[figure: number of iterations vs. f(x^(0)) − p⋆; ◦: m = 100, n = 50; □: m = 1000, n = 500; ◇: m = 1000, n = 50]

• number of iterations much smaller than 375(f(x^(0)) − p⋆) + 6
• bound of the form c(f(x^(0)) − p⋆) + 6 with smaller c (empirically) valid
Implementation

main effort in each iteration: evaluate derivatives and solve Newton system

    H∆x = g

where H = ∇²f(x), g = −∇f(x)

via Cholesky factorization

    H = LLᵀ,    ∆x_nt = L⁻ᵀL⁻¹g,    λ(x) = ‖L⁻¹g‖₂

• cost (1/3)n³ flops for unstructured system
• cost ≪ (1/3)n³ if H sparse, banded
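The Cholesky-based computation above can be sketched in a few lines: factor H once, then get both the Newton step (two triangular solves) and the decrement (norm of the intermediate vector) from the same factor. The test matrix is illustrative:

```python
import numpy as np
from scipy.linalg import solve_triangular

# Newton step and decrement via H = L L^T:
# dx_nt = L^{-T} L^{-1} g, lambda(x) = ||L^{-1} g||_2.
def newton_step(H, g):
    L = np.linalg.cholesky(H)                   # lower-triangular factor
    w = solve_triangular(L, g, lower=True)      # w = L^{-1} g
    dx = solve_triangular(L.T, w, lower=False)  # dx = L^{-T} w
    lam = np.linalg.norm(w)                     # Newton decrement
    return dx, lam

rng = np.random.default_rng(2)
B = rng.standard_normal((6, 6))
H = B @ B.T + np.eye(6)                         # positive definite
g = rng.standard_normal(6)
dx, lam = newton_step(H, g)
assert np.allclose(H @ dx, g)                   # solves H dx = g
assert np.isclose(lam**2, g @ dx)               # lambda^2 = g^T H^{-1} g
```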
Example of dense Newton system with structure

    f(x) = Σ_{i=1}^n ψᵢ(xᵢ) + ψ₀(Ax + b),    H = D + AᵀH₀A

• assume A ∈ R^(p×n), dense, with p ≪ n
• D diagonal with diagonal elements ψᵢ′′(xᵢ); H₀ = ∇²ψ₀(Ax + b)

method 1: form H, solve via dense Cholesky factorization (cost (1/3)n³)

method 2: factor H₀ = L₀L₀ᵀ; write Newton system as

    D∆x + AᵀL₀w = −g,    L₀ᵀA∆x − w = 0

eliminate ∆x from first equation; compute w and ∆x from

    (I + L₀ᵀAD⁻¹AᵀL₀)w = −L₀ᵀAD⁻¹g,    D∆x = −g − AᵀL₀w

cost: 2p²n (dominated by computation of L₀ᵀAD⁻¹AᵀL₀)
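Method 2 can be sketched and checked against method 1 directly. The dimensions and random data below are illustrative; the key point is that only the small p×p system (I + L₀ᵀAD⁻¹AᵀL₀)w = −L₀ᵀAD⁻¹g is factored, never the full n×n matrix H:

```python
import numpy as np

# Block elimination for (D + A^T H0 A) dx = -g, p << n.
rng = np.random.default_rng(3)
p, n = 4, 50
A = rng.standard_normal((p, n))
d = rng.uniform(1.0, 2.0, n)                 # diagonal of D
C = rng.standard_normal((p, p))
H0 = C @ C.T + np.eye(p)                     # positive definite H0
g = rng.standard_normal(n)

# method 2: factor H0, solve the small p-by-p system for w, back out dx
L0 = np.linalg.cholesky(H0)                  # H0 = L0 L0^T
ADinv = A / d                                # A D^{-1} (D diagonal)
S = np.eye(p) + L0.T @ (ADinv @ A.T) @ L0    # I + L0^T A D^{-1} A^T L0
w = np.linalg.solve(S, -L0.T @ (ADinv @ g))
dx = (-g - A.T @ (L0 @ w)) / d               # D dx = -g - A^T L0 w

# method 1 (reference): form H and solve the full n-by-n system
H = np.diag(d) + A.T @ H0 @ A
assert np.allclose(H @ dx, -g)
```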