Convex Optimization CMU-10725ryantibs/convexopt-F13/lectures/11-QuasiNewton... · Convex...
Convex Optimization
CMU-10725
Quasi Newton Methods
Barnabás Póczos & Ryan Tibshirani
Quasi Newton Methods
Outline
• Modified Newton Method
• Rank one correction of the inverse
• Rank two correction of the inverse
• Davidon–Fletcher–Powell Method (DFP)
• Broyden–Fletcher–Goldfarb–Shanno Method (BFGS)
Books to Read
David G. Luenberger, Yinyu Ye: Linear and Nonlinear Programming
Nesterov: Introductory lectures on convex optimization
Bazaraa, Sherali, Shetty: Nonlinear Programming
Dimitri P. Bertsekas: Nonlinear Programming
Motivation
Quasi Newton: somewhere between steepest descent and Newton’s method.
Motivation: evaluation and use of the Hessian matrix is impractical or costly.
Idea: use an approximation to the inverse Hessian.
Modified Newton Method
Goal:
Gradient descent:
Newton method:
Modified Newton method: [Method of Deflected Gradients]
Special cases:
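The formulas on this slide did not survive in the transcript. A hedged reconstruction, assuming the standard notation of Luenberger and Ye (listed under Books to Read), with S_k a symmetric n×n matrix and α_k a step size chosen by line search (both symbols are assumptions here, not recovered from the slide), would read:

```latex
\begin{align*}
\text{Goal:} \quad & \min_{x \in \mathbb{R}^n} f(x) \\
\text{Gradient descent:} \quad & x_{k+1} = x_k - \alpha_k \nabla f(x_k) \\
\text{Newton method:} \quad & x_{k+1} = x_k - [\nabla^2 f(x_k)]^{-1} \nabla f(x_k) \\
\text{Modified Newton method:} \quad & x_{k+1} = x_k - \alpha_k S_k \nabla f(x_k) \\
\text{Special cases:} \quad & S_k = I \ \text{(gradient descent)}, \qquad
  S_k = [\nabla^2 f(x_k)]^{-1} \ \text{(Newton)}
\end{align*}
```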
Modified Newton Method
Lemma [Descent direction]
Proof:
We know that if a vector has a negative inner product with the gradient vector, then that vector is a descent direction.
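The statement of the lemma is missing from the transcript. Assuming it is the usual result that a positive definite S_k yields a descent direction (an assumption based on the slide title and the proof hint), the argument can be written as:

```latex
% Assumption: S_k is symmetric positive definite and \nabla f(x_k) \neq 0.
\[
d_k = -S_k \nabla f(x_k)
\quad\Longrightarrow\quad
\nabla f(x_k)^{\top} d_k = -\nabla f(x_k)^{\top} S_k \nabla f(x_k) < 0 ,
\]
% so d_k has negative inner product with the gradient and is a descent direction.
```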
Quadratic problem
Modified Newton Method update rule:
Lemma [αk in quadratic problems]
Quadratic problem
Lemma [αk in quadratic problems]
Proof [αk]
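The step-size formula itself is not in the transcript. A minimal sketch, assuming the quadratic f(x) = ½ xᵀQx − bᵀx with gradient g_k = Qx_k − b and the standard exact line-search step α_k = g_kᵀS_k g_k / (g_kᵀS_k Q S_k g_k); the matrices `Q`, `S` and the starting point are made-up illustration data:

```python
import numpy as np

def exact_step(Q, S, g):
    """Exact line-search step along d = -S g for f(x) = 0.5 x^T Q x - b^T x."""
    Sg = S @ g
    return float(g @ Sg) / float(Sg @ Q @ Sg)

Q = np.array([[4.0, 1.0], [1.0, 3.0]])   # symmetric positive definite (illustrative)
b = np.array([1.0, 2.0])
S = np.diag([0.25, 0.5])                 # hypothetical S_k; any SPD matrix works
x = np.array([2.0, -1.0])

g = Q @ x - b                            # gradient at x
alpha = exact_step(Q, S, g)
x_next = x - alpha * (S @ g)
g_next = Q @ x_next - b

# At the exact minimizer along the line, the new gradient is
# orthogonal to the search direction S g.
directional_derivative = float(g_next @ (S @ g))
```

The vanishing directional derivative is exactly the optimality condition that the formula for α_k is derived from.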
Convergence rate (Quadratic case)
Theorem [Convergence rate of the modified Newton method]
Then for the modified Newton method it holds at every step k
Corollary
If S_k^{-1} is close to Q, then b_k is close to B_k, and then convergence is fast.
Proof: No time for it…
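The bound in the theorem is missing from the transcript. The sketch below assumes the standard form from Luenberger and Ye: with E(x) = ½ (x − x*)ᵀQ(x − x*) and b_k, B_k the smallest and largest eigenvalues of S_k Q, one has E(x_{k+1}) ≤ ((B_k − b_k)/(B_k + b_k))² E(x_k). It checks that bound numerically on made-up data with a fixed S.

```python
import numpy as np

Q = np.array([[4.0, 1.0], [1.0, 3.0]])   # illustrative SPD matrix
b = np.array([1.0, 2.0])
S = np.diag([0.25, 0.5])                 # hypothetical fixed S_k
x_star = np.linalg.solve(Q, b)

def error(x):
    """E(x) = 0.5 (x - x*)^T Q (x - x*)."""
    e = x - x_star
    return 0.5 * float(e @ Q @ e)

# Eigenvalues of S Q are real and positive when S and Q are SPD.
eigs = np.linalg.eigvals(S @ Q).real
b_k, B_k = eigs.min(), eigs.max()
rate = ((B_k - b_k) / (B_k + b_k)) ** 2

x = np.array([2.0, -1.0])
ratios = []
for _ in range(5):
    g = Q @ x - b
    Sg = S @ g
    alpha = float(g @ Sg) / float(Sg @ Q @ Sg)   # exact line search
    x_new = x - alpha * Sg
    ratios.append(error(x_new) / error(x))
    x = x_new
```

Every entry of `ratios` should stay at or below `rate`; the closer S is to Q⁻¹, the closer `rate` gets to zero.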
Classical modified Newton’s method
The Hessian at the initial point x0 is used throughout the process.
The effectiveness depends on how fast the Hessian is changing.
Classical modified Newton:
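The update rule on this slide is missing. A minimal sketch, assuming the iteration x_{k+1} = x_k − α_k [∇²f(x_0)]⁻¹ ∇f(x_k) with α_k = 1, on a hypothetical test function f(x) = Σ (exp(x_i) − x_i) whose minimizer is x = 0 (function, starting point and iteration count are illustration choices, not from the slides):

```python
import numpy as np

def grad(x):
    # Gradient of the hypothetical test function f(x) = sum(exp(x_i) - x_i).
    return np.exp(x) - 1.0

def hessian(x):
    # Hessian of the same test function (diagonal).
    return np.diag(np.exp(x))

x0 = np.array([0.5, -0.5])
H0_inv = np.linalg.inv(hessian(x0))   # computed once, reused at every step

x = x0.copy()
for _ in range(60):
    x = x - H0_inv @ grad(x)          # alpha_k = 1 is an assumption here
```

Convergence is only linear here because the frozen Hessian drifts away from the true one; the second coordinate, where the Hessian changes more relative to its initial value, converges more slowly.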
Construction of the inverse of the Hessian
Construction of the inverse
Idea behind quasi-Newton methods: construct the approximation of the inverse Hessian using information gathered during the process
We show how the inverse Hessian can be built up from gradient information obtained at various points.
In the quadratic case
Notation:
Construction of the inverse
Quadratic case:
Goal:
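The notation and the goal are missing from the transcript. The sketch below assumes the usual definitions p_k = x_{k+1} − x_k and q_k = ∇f(x_{k+1}) − ∇f(x_k), for which a quadratic with Hessian Q gives q_k = Q p_k, and the goal H_{k+1} q_i = p_i for all i ≤ k. It shows that n linearly independent steps determine Q and its inverse from gradient differences alone; the points are arbitrary illustration data.

```python
import numpy as np

Q = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])          # illustrative SPD Hessian
b = np.array([1.0, 2.0, 3.0])

def grad(x):
    """Gradient of f(x) = 0.5 x^T Q x - b^T x."""
    return Q @ x - b

# Four arbitrary points giving three linearly independent steps.
points = [np.zeros(3),
          np.array([1.0, 0.0, 0.0]),
          np.array([1.0, 2.0, 0.0]),
          np.array([1.0, 2.0, -1.0])]

P = np.column_stack([points[k + 1] - points[k] for k in range(3)])               # steps p_k
Qd = np.column_stack([grad(points[k + 1]) - grad(points[k]) for k in range(3)])  # q_k

Q_recovered = Qd @ np.linalg.inv(P)      # since Qd = Q P
H = P @ np.linalg.inv(Qd)                # satisfies H q_i = p_i, hence H = Q^{-1}
```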
Symmetric rank one correction (SR1)
We want an update of Hk such that:
Let us look for the update in this form [rank one correction]:
Theorem [Rank one update of Hk]
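The update formula is not in the transcript. A minimal sketch, assuming the standard symmetric rank one update H_{k+1} = H_k + (p_k − H_k q_k)(p_k − H_k q_k)ᵀ / (q_kᵀ(p_k − H_k q_k)); the function name `sr1_update` and the data are illustrative:

```python
import numpy as np

def sr1_update(H, p, q):
    """Symmetric rank one correction of the inverse-Hessian estimate H.

    Assumes the denominator q^T (p - H q) is nonzero.
    """
    r = p - H @ q
    return H + np.outer(r, r) / float(q @ r)

Q = np.array([[4.0, 1.0], [1.0, 3.0]])   # illustrative SPD Hessian
H = np.eye(2)                            # initial inverse-Hessian estimate
p = np.array([1.0, -1.0])                # step x_{k+1} - x_k
q = Q @ p                                # gradient difference in the quadratic case

H_new = sr1_update(H, p, q)
```

By construction `H_new @ q` equals `p`, and the correction term has rank one, so symmetry of H is preserved.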
Symmetric rank one correction (SR1)
Proof:
We already know that
Therefore,
Q.E.D.
Symmetric rank one correction (SR1)
We still have to prove that this update will be good for us:
Theorem [Hk update works]
Corollary
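The theorem and corollary statements are missing from the transcript. Assuming they are the usual ones (in the quadratic case the rank one update keeps H_{k+1} q_i = p_i for all i ≤ k, so after n linearly independent steps H_n = Q⁻¹), the following sketch checks this numerically. The steps are arbitrary illustration data, chosen so that no denominator vanishes.

```python
import numpy as np

def sr1_update(H, p, q):
    """Symmetric rank one update; assumes q^T (p - H q) != 0."""
    r = p - H @ q
    return H + np.outer(r, r) / float(q @ r)

Q = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])          # illustrative SPD Hessian
steps = [np.array([1.0, 0.0, 0.0]),
         np.array([0.0, 1.0, 0.0]),
         np.array([0.0, 0.0, 1.0])]      # linearly independent p_0, p_1, p_2

H = np.eye(3)
history = []
for p in steps:
    q = Q @ p                            # q_k = Q p_k in the quadratic case
    H = sr1_update(H, p, q)
    history.append((p, q))
    # Earlier secant equations H q_i = p_i remain satisfied after each update.
    assert all(np.allclose(H @ qi, pi) for pi, qi in history)
```

After the third update `H` coincides with the inverse of `Q` up to rounding.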
Symmetric rank one correction (SR1)
Issues:
Algorithm: [Modified Newton method with rank 1 correction]
Davidon–Fletcher–Powell Method [Rank two correction]
Davidon–Fletcher–Powell Method
• For a quadratic objective, it simultaneously generates the directions of the conjugate gradient method while constructing the inverse Hessian.
• At each step the inverse Hessian is updated by the sum of two symmetric rank one matrices. [rank two correction procedure]
• The method is also often referred to as the variable metric method.
Davidon–Fletcher–Powell Method
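The content of this slide is missing from the transcript. A minimal sketch of the DFP rank two correction, assuming the standard formula H_{k+1} = H_k + p_k p_kᵀ/(p_kᵀq_k) − H_k q_k q_kᵀ H_k/(q_kᵀH_k q_k); the function name `dfp_update` and the data are illustrative:

```python
import numpy as np

def dfp_update(H, p, q):
    """DFP update of the inverse-Hessian estimate H (sum of two rank one terms)."""
    Hq = H @ q
    return H + np.outer(p, p) / float(p @ q) - np.outer(Hq, Hq) / float(q @ Hq)

Q = np.array([[4.0, 1.0], [1.0, 3.0]])   # illustrative SPD Hessian
H = np.eye(2)
p = np.array([1.0, -1.0])
q = Q @ p                                # p^T q > 0 because Q is positive definite

H_new = dfp_update(H, p, q)
eigenvalues = np.linalg.eigvalsh(H_new)  # all positive if H_new is positive definite
```

The update satisfies H_new q = p, and with p·q > 0 the new estimate stays positive definite.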
Davidon–Fletcher–Powell Method
Theorem [Hk is positive definite]
Theorem [DFP is a conjugate direction method]
Corollary [finite step convergence for quadratic functions]
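The theorem statements are not in the transcript. As a sketch of what the corollary asserts, the code below runs DFP with exact line search on a made-up n = 3 quadratic f(x) = ½ xᵀQx − bᵀx and checks that it reaches the minimizer in n steps with H_n = Q⁻¹. The function name `dfp_quadratic` and the starting choices H_0 = I, x_0 = 0 are assumptions for illustration.

```python
import numpy as np

def dfp_quadratic(Q, b, x0):
    """Run n DFP steps with exact line search on f(x) = 0.5 x^T Q x - b^T x."""
    n = len(b)
    x, H = x0.astype(float), np.eye(n)
    g = Q @ x - b
    for _ in range(n):
        d = -H @ g                                   # search direction
        alpha = -float(g @ d) / float(d @ Q @ d)     # exact line search
        p = alpha * d
        x = x + p
        g_new = Q @ x - b
        q = g_new - g
        Hq = H @ q
        H = H + np.outer(p, p) / float(p @ q) - np.outer(Hq, Hq) / float(q @ Hq)
        g = g_new
    return x, H

Q = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
x_final, H_final = dfp_quadratic(Q, b, np.zeros(3))
```

The sketch has no guard for a gradient that vanishes before step n; that would need an early exit.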
Broyden–Fletcher–Goldfarb–Shanno
In DFP, at each step the inverse Hessian is updated by the sum of two symmetric rank one matrices.
In BFGS, we estimate the Hessian Q instead of its inverse.
In the quadratic case we already proved:
To estimate H, we used the update:
Therefore, if we switch q and p, the same update also gives an estimate Q_k of Q
BFGS
Similarly, we already know that the DFP update rule for H is
Switching q and p, this can also be used to estimate Q:
In the minimization algorithm, however, we will need an estimator of Q^{-1}
To get an update for H_{k+1}, let us use the Sherman-Morrison formula twice
Sherman-Morrison matrix inversion formula
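The formula is missing from the transcript. The well-known statement is (A + u vᵀ)⁻¹ = A⁻¹ − A⁻¹ u vᵀ A⁻¹ / (1 + vᵀ A⁻¹ u), valid when A is invertible and 1 + vᵀ A⁻¹ u ≠ 0. A small numerical check on made-up data, with `sherman_morrison` as an illustrative helper name:

```python
import numpy as np

def sherman_morrison(A_inv, u, v):
    """Inverse of (A + u v^T) computed from A^{-1} without a new factorization."""
    Au = A_inv @ u
    vA = v @ A_inv
    return A_inv - np.outer(Au, vA) / (1.0 + float(v @ Au))

A = np.array([[2.0, 0.0], [0.0, 4.0]])
u = np.array([1.0, 2.0])
v = np.array([3.0, 1.0])

updated_inv = sherman_morrison(np.linalg.inv(A), u, v)
direct_inv = np.linalg.inv(A + np.outer(u, v))
```

Since the BFGS estimate of Q is a rank two correction, applying this identity twice gives the corresponding update of the inverse.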
BFGS Algorithm
BFGS is almost the same as DFP; only the H update is different.
In practice BFGS seems to work better than DFP.
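The H update itself is missing from the transcript. A sketch assuming the standard BFGS inverse update H_{k+1} = H_k + (1 + q_kᵀH_k q_k/(p_kᵀq_k)) p_k p_kᵀ/(p_kᵀq_k) − (p_k q_kᵀH_k + H_k q_k p_kᵀ)/(p_kᵀq_k). It also checks the duality described on the earlier slides: the Hessian estimate B = H⁻¹ updated by the DFP formula with p and q switched has inverse equal to the BFGS update of H. Names and data are illustrative.

```python
import numpy as np

def bfgs_update(H, p, q):
    """BFGS update of the inverse-Hessian estimate H."""
    pq = float(p @ q)
    Hq = H @ q
    return (H
            + (1.0 + float(q @ Hq) / pq) * np.outer(p, p) / pq
            - (np.outer(p, Hq) + np.outer(Hq, p)) / pq)

def dfp_swapped(B, p, q):
    """DFP formula with p and q switched, applied to the Hessian estimate B."""
    Bp = B @ p
    return B + np.outer(q, q) / float(q @ p) - np.outer(Bp, Bp) / float(p @ Bp)

Q = np.array([[4.0, 1.0], [1.0, 3.0]])   # illustrative SPD Hessian
H = np.array([[1.0, 0.2], [0.2, 0.5]])   # current SPD inverse-Hessian estimate
p = np.array([1.0, -1.0])
q = Q @ p

H_new = bfgs_update(H, p, q)
B_new = dfp_swapped(np.linalg.inv(H), p, q)
```

In an implementation of the algorithm, only `bfgs_update` replaces the DFP update; the rest of the loop (direction, line search, p and q) stays the same.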
Summary
• Modified Newton Method
• Rank one correction of the inverse
• Rank two correction of the inverse
• Davidon–Fletcher–Powell Method (DFP)
• Broyden–Fletcher–Goldfarb–Shanno Method (BFGS)