Convex Optimization CMU-10725ryantibs/convexopt-F13/lectures/09-newton... · 3 Books to read •...

61
Convex Optimization CMU-10725 Newton Method Barnabás Póczos & Ryan Tibshirani

Transcript of Convex Optimization CMU-10725ryantibs/convexopt-F13/lectures/09-newton... · 3 Books to read •...

Page 1: Convex Optimization CMU-10725ryantibs/convexopt-F13/lectures/09-newton... · 3 Books to read • Boyd and Vandenberghe:Convex Optimization, Chapters 9.5 • Nesterov: Introductory

Convex Optimization

CMU-10725Newton Method

Barnabás Póczos & Ryan Tibshirani

Page 2: Convex Optimization CMU-10725ryantibs/convexopt-F13/lectures/09-newton... · 3 Books to read • Boyd and Vandenberghe:Convex Optimization, Chapters 9.5 • Nesterov: Introductory

2

Administrivia

� Scribing

� Projects

� HW1 solutions

� Feedback about lectures / solutions on blackboard

Page 3: Convex Optimization CMU-10725ryantibs/convexopt-F13/lectures/09-newton... · 3 Books to read • Boyd and Vandenberghe:Convex Optimization, Chapters 9.5 • Nesterov: Introductory

3

Books to read

• Boyd and Vandenberghe: Convex Optimization, Chapters 9.5

• Nesterov: Introductory lectures on convex optimization

• Bazaraa, Sherali, Shetty: Nonlinear Programming

• Dimitri P. Bestsekas: Nonlinear Programming

• Wikipedia

• http://www.chiark.greenend.org.uk/~sgtatham/newton/

Page 4: Convex Optimization CMU-10725ryantibs/convexopt-F13/lectures/09-newton... · 3 Books to read • Boyd and Vandenberghe:Convex Optimization, Chapters 9.5 • Nesterov: Introductory

4

Goal of this lecture

Newton method

� Finding a root

� Unconstrained minimization

� Motivation with quadratic approximation

� Rate of Newton’s method

� Newton fractals

Next lectures:

� Conjugate gradients

� Quasi Newton Methods

Page 5: Convex Optimization CMU-10725ryantibs/convexopt-F13/lectures/09-newton... · 3 Books to read • Boyd and Vandenberghe:Convex Optimization, Chapters 9.5 • Nesterov: Introductory

5

Newton method for finding a root

Page 6: Convex Optimization CMU-10725ryantibs/convexopt-F13/lectures/09-newton... · 3 Books to read • Boyd and Vandenberghe:Convex Optimization, Chapters 9.5 • Nesterov: Introductory

6

Newton method for finding a root� Newton method: originally developed for finding a root of a function

� also known as the Newton–Raphson method

Page 7: Convex Optimization CMU-10725ryantibs/convexopt-F13/lectures/09-newton... · 3 Books to read • Boyd and Vandenberghe:Convex Optimization, Chapters 9.5 • Nesterov: Introductory

7

History

This is a special case of Newton’s method

Babylonian method:

S=100. starting values x0 = 50, x0 = 1, and x0 = −5.

Page 8: Convex Optimization CMU-10725ryantibs/convexopt-F13/lectures/09-newton... · 3 Books to read • Boyd and Vandenberghe:Convex Optimization, Chapters 9.5 • Nesterov: Introductory

8

History

� 1669, Isaac Newton [published in 1711]:

finding roots of polynomials

� 1690, Joseph Raphson:

finding roots of polynomials

� 1740, Thomas Simpson:

solving general nonlinear equation

generalization to systems of two equations

solving optimization problems (gradient = zero)

� 1879, Arthur Cayley:

generalizing the Newton's method to finding complexroots of polynomials

Page 9: Convex Optimization CMU-10725ryantibs/convexopt-F13/lectures/09-newton... · 3 Books to read • Boyd and Vandenberghe:Convex Optimization, Chapters 9.5 • Nesterov: Introductory

9

Newton Method for Finding a Root

Linear Approximation (1st order Taylor approx):

Goal:

Therefore,

Page 10: Convex Optimization CMU-10725ryantibs/convexopt-F13/lectures/09-newton... · 3 Books to read • Boyd and Vandenberghe:Convex Optimization, Chapters 9.5 • Nesterov: Introductory

10

Illustration of Newton’s method

Goal: finding a root

In the next step we will linearize here in x

Page 11: Convex Optimization CMU-10725ryantibs/convexopt-F13/lectures/09-newton... · 3 Books to read • Boyd and Vandenberghe:Convex Optimization, Chapters 9.5 • Nesterov: Introductory

11

Example: Finding a Root

http://en.wikipedia.org/wiki/Newton%27s_method

Page 12: Convex Optimization CMU-10725ryantibs/convexopt-F13/lectures/09-newton... · 3 Books to read • Boyd and Vandenberghe:Convex Optimization, Chapters 9.5 • Nesterov: Introductory

12

Newton Method for Finding a Root

This can be generalized to multivariate functions

Therefore,

[Pseudo inverse if there is no inverse]

Page 13: Convex Optimization CMU-10725ryantibs/convexopt-F13/lectures/09-newton... · 3 Books to read • Boyd and Vandenberghe:Convex Optimization, Chapters 9.5 • Nesterov: Introductory

13

Newton method for minimization

Page 14: Convex Optimization CMU-10725ryantibs/convexopt-F13/lectures/09-newton... · 3 Books to read • Boyd and Vandenberghe:Convex Optimization, Chapters 9.5 • Nesterov: Introductory

14

Newton method for minimization

Iterate until convergence, or max number of iterations exceeded

(divergence, loops, division by zero might happen…)

Page 15: Convex Optimization CMU-10725ryantibs/convexopt-F13/lectures/09-newton... · 3 Books to read • Boyd and Vandenberghe:Convex Optimization, Chapters 9.5 • Nesterov: Introductory

15

How good is the Newton method?

Page 16: Convex Optimization CMU-10725ryantibs/convexopt-F13/lectures/09-newton... · 3 Books to read • Boyd and Vandenberghe:Convex Optimization, Chapters 9.5 • Nesterov: Introductory

16

Descent direction

Lemma [Descent direction]

Proof:

We know that if a vector has negative inner product with the gradient vector, then that direction is a descent direction

Page 17: Convex Optimization CMU-10725ryantibs/convexopt-F13/lectures/09-newton... · 3 Books to read • Boyd and Vandenberghe:Convex Optimization, Chapters 9.5 • Nesterov: Introductory

17

Newton method properties

� Quadratic convergence in the neighborhood of a strict local minimum [under some conditions].

� It can break down if f’’(xk) is degenerate.

[no inverse]

� It can diverge.

� It can be trapped in a loop.

� It can converge to a loop…

Page 18: Convex Optimization CMU-10725ryantibs/convexopt-F13/lectures/09-newton... · 3 Books to read • Boyd and Vandenberghe:Convex Optimization, Chapters 9.5 • Nesterov: Introductory

18

Motivation with Quadratic Approximation

Page 19: Convex Optimization CMU-10725ryantibs/convexopt-F13/lectures/09-newton... · 3 Books to read • Boyd and Vandenberghe:Convex Optimization, Chapters 9.5 • Nesterov: Introductory

19

Motivation with Quadratic Approximation

Second order Taylor approximation:

Assume that

Newton step:

Page 20: Convex Optimization CMU-10725ryantibs/convexopt-F13/lectures/09-newton... · 3 Books to read • Boyd and Vandenberghe:Convex Optimization, Chapters 9.5 • Nesterov: Introductory

20

Motivation with Quadratic Approximation

Page 21: Convex Optimization CMU-10725ryantibs/convexopt-F13/lectures/09-newton... · 3 Books to read • Boyd and Vandenberghe:Convex Optimization, Chapters 9.5 • Nesterov: Introductory

21

Convergence rate (f: R→ R case)

Page 22: Convex Optimization CMU-10725ryantibs/convexopt-F13/lectures/09-newton... · 3 Books to read • Boyd and Vandenberghe:Convex Optimization, Chapters 9.5 • Nesterov: Introductory

22

Rates

Example:

Superlinear rate:

Quadratic rate:

Sublinear rate:

Example:

Example:

Example:

Page 23: Convex Optimization CMU-10725ryantibs/convexopt-F13/lectures/09-newton... · 3 Books to read • Boyd and Vandenberghe:Convex Optimization, Chapters 9.5 • Nesterov: Introductory

23

Finding a root: Convergence rate

Goal:

Assumption:

Taylor theorem:

Therefore,

Page 24: Convex Optimization CMU-10725ryantibs/convexopt-F13/lectures/09-newton... · 3 Books to read • Boyd and Vandenberghe:Convex Optimization, Chapters 9.5 • Nesterov: Introductory

24

Finding a root: Convergence rate

We have seen that

Page 25: Convex Optimization CMU-10725ryantibs/convexopt-F13/lectures/09-newton... · 3 Books to read • Boyd and Vandenberghe:Convex Optimization, Chapters 9.5 • Nesterov: Introductory

25

Problematic cases

Page 26: Convex Optimization CMU-10725ryantibs/convexopt-F13/lectures/09-newton... · 3 Books to read • Boyd and Vandenberghe:Convex Optimization, Chapters 9.5 • Nesterov: Introductory

26

Finding a root: chaotic behavior

Let f(x)=x^3-2x^2-11x+12

Goal: find the roots, (-3, 1, 4), using Newton’s method

2.35287527 converges to 4;

2.35284172 converges to −3;

2.35283735 converges to 4;

2.352836327 converges to −3;

2.352836323 converges to 1.

Page 27: Convex Optimization CMU-10725ryantibs/convexopt-F13/lectures/09-newton... · 3 Books to read • Boyd and Vandenberghe:Convex Optimization, Chapters 9.5 • Nesterov: Introductory

27

Finding a root: Cycle

Goal: find its roots

Stating point is important!

Page 28: Convex Optimization CMU-10725ryantibs/convexopt-F13/lectures/09-newton... · 3 Books to read • Boyd and Vandenberghe:Convex Optimization, Chapters 9.5 • Nesterov: Introductory

28

Finding a root: divergence everywhere (except in the root)

Newton’s method might never converge (except in the root)!

Page 29: Convex Optimization CMU-10725ryantibs/convexopt-F13/lectures/09-newton... · 3 Books to read • Boyd and Vandenberghe:Convex Optimization, Chapters 9.5 • Nesterov: Introductory

29

Finding a root: Linear convergence only

If the first derivative is zero at the root, then convergence might be only linear (not quadratic)

Linear convergence only!

Page 30: Convex Optimization CMU-10725ryantibs/convexopt-F13/lectures/09-newton... · 3 Books to read • Boyd and Vandenberghe:Convex Optimization, Chapters 9.5 • Nesterov: Introductory

30

Difficulties in minimization

range of quadratic convergence

Page 31: Convex Optimization CMU-10725ryantibs/convexopt-F13/lectures/09-newton... · 3 Books to read • Boyd and Vandenberghe:Convex Optimization, Chapters 9.5 • Nesterov: Introductory

31

Generalizations

� Newton method in Banach spaces

• Newton method on the Banach space of functions

• We need Frechet derivatives

� Newton method on curved manifolds

• E.g. on space of otrthonormal matrices

� Newton method on complex numbers

Page 32: Convex Optimization CMU-10725ryantibs/convexopt-F13/lectures/09-newton... · 3 Books to read • Boyd and Vandenberghe:Convex Optimization, Chapters 9.5 • Nesterov: Introductory

32

Newton Fractals

Page 33: Convex Optimization CMU-10725ryantibs/convexopt-F13/lectures/09-newton... · 3 Books to read • Boyd and Vandenberghe:Convex Optimization, Chapters 9.5 • Nesterov: Introductory

33

Gradient descent

Page 34: Convex Optimization CMU-10725ryantibs/convexopt-F13/lectures/09-newton... · 3 Books to read • Boyd and Vandenberghe:Convex Optimization, Chapters 9.5 • Nesterov: Introductory

34

Compex functions

f(z)= z4-1, Roots: -1, +1, -i, +i

color the starting point according to which root it ended uphttp://www.chiark.greenend.org.uk/~sgtatham/newton/

Page 35: Convex Optimization CMU-10725ryantibs/convexopt-F13/lectures/09-newton... · 3 Books to read • Boyd and Vandenberghe:Convex Optimization, Chapters 9.5 • Nesterov: Introductory

35

Basins of attraction

f(z)= z4-1, Roots: -1, +1, -i, +i

Shading according to how many iterations it needed till convergence

http://www.chiark.greenend.org.uk/~sgtatham/newton/

Page 36: Convex Optimization CMU-10725ryantibs/convexopt-F13/lectures/09-newton... · 3 Books to read • Boyd and Vandenberghe:Convex Optimization, Chapters 9.5 • Nesterov: Introductory

36

Basins of attractionf’(z)=(z-1)4(z+1)4

Page 37: Convex Optimization CMU-10725ryantibs/convexopt-F13/lectures/09-newton... · 3 Books to read • Boyd and Vandenberghe:Convex Optimization, Chapters 9.5 • Nesterov: Introductory

37

No convergence

polynomial f, having the roots at +i, -i, -2.3 and +2.3

Black circles: no convergence, attracting cycle with period 2

Page 38: Convex Optimization CMU-10725ryantibs/convexopt-F13/lectures/09-newton... · 3 Books to read • Boyd and Vandenberghe:Convex Optimization, Chapters 9.5 • Nesterov: Introductory

38

Avoiding divergence

Page 39: Convex Optimization CMU-10725ryantibs/convexopt-F13/lectures/09-newton... · 3 Books to read • Boyd and Vandenberghe:Convex Optimization, Chapters 9.5 • Nesterov: Introductory

39

Damped Newton method

In order to avoid possible divergence do line search with back tracking

We already know that the Newton direction is descent direction

Page 40: Convex Optimization CMU-10725ryantibs/convexopt-F13/lectures/09-newton... · 3 Books to read • Boyd and Vandenberghe:Convex Optimization, Chapters 9.5 • Nesterov: Introductory

40

Classes of differentiable functions

Page 41: Convex Optimization CMU-10725ryantibs/convexopt-F13/lectures/09-newton... · 3 Books to read • Boyd and Vandenberghe:Convex Optimization, Chapters 9.5 • Nesterov: Introductory

41

Classes of Differentiable functions

• Any continuous (but otherwise ugly, nondifferentiable) function can be approximated arbitrarily well with smooth (k times differentiable) function

• Assuming smoothness only is not enough to talk about rates…

• We need stronger assumptions on the derivatives. [e.g. their magnitudes behaves nicely]

Page 42: Convex Optimization CMU-10725ryantibs/convexopt-F13/lectures/09-newton... · 3 Books to read • Boyd and Vandenberghe:Convex Optimization, Chapters 9.5 • Nesterov: Introductory

42

The CLk,p(Q) Class

Notation

[Lipschitz continuous pth order derivative]

Definition

Page 43: Convex Optimization CMU-10725ryantibs/convexopt-F13/lectures/09-newton... · 3 Books to read • Boyd and Vandenberghe:Convex Optimization, Chapters 9.5 • Nesterov: Introductory

43

Trivial LemmasLemma [Linear combinations]

Lemma [Class hierarchy]

Page 44: Convex Optimization CMU-10725ryantibs/convexopt-F13/lectures/09-newton... · 3 Books to read • Boyd and Vandenberghe:Convex Optimization, Chapters 9.5 • Nesterov: Introductory

44

Relation between 1st and 2nd derivatives

Lemma

Proof

Therefore,

Special case, Q.E.D

Page 45: Convex Optimization CMU-10725ryantibs/convexopt-F13/lectures/09-newton... · 3 Books to read • Boyd and Vandenberghe:Convex Optimization, Chapters 9.5 • Nesterov: Introductory

45

CL2,1(Rn) and the norm of f’’

Lemma [CL2,1 and the norm of f”]

Proof

Q.E.D

Page 46: Convex Optimization CMU-10725ryantibs/convexopt-F13/lectures/09-newton... · 3 Books to read • Boyd and Vandenberghe:Convex Optimization, Chapters 9.5 • Nesterov: Introductory

46

CL2,1(Rn) and the norm of f’’

Proving the other direction

Q.E.D

Page 47: Convex Optimization CMU-10725ryantibs/convexopt-F13/lectures/09-newton... · 3 Books to read • Boyd and Vandenberghe:Convex Optimization, Chapters 9.5 • Nesterov: Introductory

47

Examples

Page 48: Convex Optimization CMU-10725ryantibs/convexopt-F13/lectures/09-newton... · 3 Books to read • Boyd and Vandenberghe:Convex Optimization, Chapters 9.5 • Nesterov: Introductory

48

Error of 1st orderTaylor approx. in CL1,1

Lemma [1st orderTaylor approximation in CL1,1]

Proof

Page 49: Convex Optimization CMU-10725ryantibs/convexopt-F13/lectures/09-newton... · 3 Books to read • Boyd and Vandenberghe:Convex Optimization, Chapters 9.5 • Nesterov: Introductory

49

Error of 1st orderTaylor approx. in CL1,1

Q.E.D

Page 50: Convex Optimization CMU-10725ryantibs/convexopt-F13/lectures/09-newton... · 3 Books to read • Boyd and Vandenberghe:Convex Optimization, Chapters 9.5 • Nesterov: Introductory

50

Sandwiching with quadratic functions in CL1,1

We have proved:

Corollary [Sandwiching CL1,1 functions with quadratic functions]

Function f can be lower and upper bounded with quadratic functions

Page 51: Convex Optimization CMU-10725ryantibs/convexopt-F13/lectures/09-newton... · 3 Books to read • Boyd and Vandenberghe:Convex Optimization, Chapters 9.5 • Nesterov: Introductory

51

CL2,2(Rn) Class

Lemma [Properties of CL2,2 Class]

Proof

[Error of the 1st order approximation of f’]

[Error of the 2nd order approximation of f]

(*1) Definition

(*2) Same as previous lemma [f’ instead of f]

(*3) Similar [Homework]

Page 52: Convex Optimization CMU-10725ryantibs/convexopt-F13/lectures/09-newton... · 3 Books to read • Boyd and Vandenberghe:Convex Optimization, Chapters 9.5 • Nesterov: Introductory

52

Sandwiching f’’(y) in CL2,2(Rn)

By definition

Corollary [Sandwiching f’’(y) matrix]

Proof

Q.E.D

Page 53: Convex Optimization CMU-10725ryantibs/convexopt-F13/lectures/09-newton... · 3 Books to read • Boyd and Vandenberghe:Convex Optimization, Chapters 9.5 • Nesterov: Introductory

53

Convergence rate of Newton’s method

Page 54: Convex Optimization CMU-10725ryantibs/convexopt-F13/lectures/09-newton... · 3 Books to read • Boyd and Vandenberghe:Convex Optimization, Chapters 9.5 • Nesterov: Introductory

54

Convergence rate of Newton’s method

Assumptions

[Local convergence only]

Page 55: Convex Optimization CMU-10725ryantibs/convexopt-F13/lectures/09-newton... · 3 Books to read • Boyd and Vandenberghe:Convex Optimization, Chapters 9.5 • Nesterov: Introductory

55

Convergence rate of Newton’s method

We already know:

Therefore,

Newton step:

Trivial identity:

Page 56: Convex Optimization CMU-10725ryantibs/convexopt-F13/lectures/09-newton... · 3 Books to read • Boyd and Vandenberghe:Convex Optimization, Chapters 9.5 • Nesterov: Introductory

56

Convergence rate of Newton’s method

Page 57: Convex Optimization CMU-10725ryantibs/convexopt-F13/lectures/09-newton... · 3 Books to read • Boyd and Vandenberghe:Convex Optimization, Chapters 9.5 • Nesterov: Introductory

57

Convergence rate of Newton’s methodWe have already proved:

Therefore,

and thus,

Page 58: Convex Optimization CMU-10725ryantibs/convexopt-F13/lectures/09-newton... · 3 Books to read • Boyd and Vandenberghe:Convex Optimization, Chapters 9.5 • Nesterov: Introductory

58

Convergence rate of Newton’s method

We already know:

Page 59: Convex Optimization CMU-10725ryantibs/convexopt-F13/lectures/09-newton... · 3 Books to read • Boyd and Vandenberghe:Convex Optimization, Chapters 9.5 • Nesterov: Introductory

59

Convergence rate of Newton’s method

The error doesn’t increase!

Now, we have that

Page 60: Convex Optimization CMU-10725ryantibs/convexopt-F13/lectures/09-newton... · 3 Books to read • Boyd and Vandenberghe:Convex Optimization, Chapters 9.5 • Nesterov: Introductory

60

Convergence rate of Newton’s method

We have proved the following theorem

Theorem [Rate of Newton’s method]

Quadratic rate!

Page 61: Convex Optimization CMU-10725ryantibs/convexopt-F13/lectures/09-newton... · 3 Books to read • Boyd and Vandenberghe:Convex Optimization, Chapters 9.5 • Nesterov: Introductory

61

Summary

Newton method

� Finding a root

� Unconstrained minimization

� Motivation with quadratic approximation

� Rate of Newton’s method

� Newton fractals

Classes of differentiable functions