Convex Optimization
CMU-10725
Conjugate Direction Methods
Barnabás Póczos & Ryan Tibshirani
Conjugate Direction Methods
Books to Read
David G. Luenberger, Yinyu Ye: Linear and Nonlinear Programming
Nesterov: Introductory Lectures on Convex Optimization
Bazaraa, Sherali, Shetty: Nonlinear Programming
Dimitri P. Bertsekas: Nonlinear Programming
Motivation
Conjugate direction methods can be regarded as lying between the method of steepest descent and Newton's method.
Motivation:
• Steepest descent is slow. Goal: accelerate it!
• Newton's method is fast… BUT: we need to calculate the inverse of the Hessian matrix…
Is there something between steepest descent and Newton's method?
Conjugate Direction Methods
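Goal:
• Accelerate the convergence rate of steepest descent
• while avoiding the high computational cost of Newton's method
Originally developed for solving the quadratic problem:
$\min_{x \in \mathbb{R}^n} f(x) = \tfrac{1}{2} x^\top Q x - b^\top x$, where $Q$ is symmetric and positive definite.
Equivalently, our goal is to solve:
$Qx = b$
Conjugate direction methods solve this problem in at most $n$ iterations (in exact arithmetic); for large $n$, far fewer iterations are usually enough.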
Conjugate Direction Methods
• An algorithm for the numerical solution of systems of linear equations whose matrix Q is symmetric and positive definite.
• An iterative method, so it can be applied to systems that are too large to be handled by direct methods (such as the Cholesky decomposition).
• An algorithm for seeking minima of nonlinear functions.
Conjugate directions
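• In the applications that we consider, the matrix Q will be positive definite, but this is not inherent in the basic definition.
• If Q = 0, any two vectors are conjugate.
• If Q = I, conjugacy is equivalent to the usual notion of orthogonality.
Definition [Q-conjugate directions]
Given a symmetric matrix $Q$, two vectors $d_1$ and $d_2$ are Q-orthogonal (conjugate with respect to $Q$) if $d_1^\top Q d_2 = 0$. A finite set of vectors $d_0, \dots, d_k$ is Q-orthogonal if $d_i^\top Q d_j = 0$ for all $i \neq j$.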
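To make the definition concrete, here is a small helper (`are_q_conjugate` is a hypothetical name, and the matrix $Q$ is an example, not from the slides) that checks $d_i^\top Q d_j = 0$ for every pair $i \neq j$. It also reproduces the two special cases from the bullets: with $Q = I$ conjugacy is ordinary orthogonality, and with $Q = 0$ any two vectors are conjugate.

```python
import numpy as np

def are_q_conjugate(directions, Q, tol=1e-10):
    """Return True if d_i^T Q d_j == 0 (up to tol) for every pair i != j."""
    D = np.column_stack(directions)          # columns are d_0, ..., d_k
    G = D.T @ Q @ D                          # G[i, j] = d_i^T Q d_j
    off_diag = G - np.diag(np.diag(G))
    return bool(np.all(np.abs(off_diag) <= tol))

Q = np.array([[4.0, 1.0], [1.0, 3.0]])       # example symmetric positive definite matrix

# Eigenvectors of a symmetric Q are orthogonal, hence also Q-conjugate:
# v_i^T Q v_j = lambda_j v_i^T v_j = 0 for i != j.
_, V = np.linalg.eigh(Q)
print(are_q_conjugate([V[:, 0], V[:, 1]], Q))             # True

# The standard basis is orthogonal but not Q-conjugate here, since e_0^T Q e_1 = 1.
e0, e1 = np.eye(2)
print(are_q_conjugate([e0, e1], Q))                       # False
print(are_q_conjugate([e0, e1], np.eye(2)))               # True  (Q = I)
print(are_q_conjugate([e0, e1], np.zeros((2, 2))))        # True  (Q = 0)
```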
Linear independence lemma
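Lemma [Linear Independence]
If $Q$ is positive definite and the set of nonzero vectors $d_0, \dots, d_k$ is Q-orthogonal, then these vectors are linearly independent.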
Proof: [Proof by contradiction]
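The contradiction argument can be written out as a short derivation (a sketch of the standard proof, as in Luenberger and Ye):

```latex
\text{Suppose } \sum_{i=0}^{k} \alpha_i d_i = 0 \text{ with not all } \alpha_i = 0.
\text{ For each } j \in \{0, \dots, k\}:
\qquad
0 = d_j^\top Q \Big( \sum_{i=0}^{k} \alpha_i d_i \Big)
  = \sum_{i=0}^{k} \alpha_i \, d_j^\top Q d_i
  = \alpha_j \, d_j^\top Q d_j .
```

Since $Q$ is positive definite and $d_j \neq 0$, we have $d_j^\top Q d_j > 0$, so $\alpha_j = 0$ for every $j$, contradicting the assumption that not all $\alpha_i$ vanish.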
Why is Q-conjugacy useful?
Quadratic problem
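Goal: $\min_{x \in \mathbb{R}^n} f(x) = \tfrac{1}{2} x^\top Q x - b^\top x$, with $Q$ symmetric and positive definite.
The unique solution $x^*$ to this problem is also the unique solution to the linear equation $Qx = b$.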
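A small numerical check of this equivalence, assuming example values for $Q$ and $b$ (they are not from the slides): the gradient of $f$ is $Qx - b$, so it vanishes exactly at the solution of $Qx = b$, and positive definiteness makes that point the unique global minimizer.

```python
import numpy as np

Q = np.array([[4.0, 1.0], [1.0, 3.0]])   # symmetric positive definite (example values)
b = np.array([1.0, 2.0])

def f(x):
    """Quadratic objective f(x) = 1/2 x^T Q x - b^T x."""
    return 0.5 * x @ Q @ x - b @ x

def grad_f(x):
    """Gradient of f: Q x - b (uses the symmetry of Q)."""
    return Q @ x - b

x_star = np.linalg.solve(Q, b)            # direct solve, used here only as a reference
print(np.allclose(grad_f(x_star), 0.0))   # True: first-order condition Q x* = b

# f(x* + h) = f(x*) + 1/2 h^T Q h > f(x*) for h != 0, so x* is the unique minimizer.
rng = np.random.default_rng(0)
print(all(f(x_star) <= f(x_star + rng.normal(size=2)) for _ in range(1000)))  # True
```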
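Importance of Q-conjugacy
Let $d_0, \dots, d_{n-1}$ be nonzero Q-orthogonal vectors; by the lemma they form a basis, so $x^* = \alpha_0 d_0 + \dots + \alpha_{n-1} d_{n-1}$. Multiplying by $d_j^\top Q$ cancels every cross term, so $d_j^\top Q x^* = \alpha_j \, d_j^\top Q d_j$.
Therefore, $\alpha_j = \dfrac{d_j^\top Q x^*}{d_j^\top Q d_j} = \dfrac{d_j^\top b}{d_j^\top Q d_j}$.
Substituting $Qx^* = b$ is the step where standard orthogonality is no longer enough: we need Q-conjugacy.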
Importance of Q-conjugacy
No need to do matrix inversion! We only need to calculate inner products.
This can be generalized further so that the starting point $x_0$ of the iteration is arbitrary.
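A sketch of the "inner products only" formula $x^* = \sum_i \alpha_i d_i$ with $\alpha_i = d_i^\top b / d_i^\top Q d_i$, assuming we already have Q-orthogonal directions. Here they are produced by Gram–Schmidt orthogonalization in the $Q$ inner product; `conjugate_gram_schmidt` is a hypothetical helper and just one way to obtain conjugate directions (the conjugate gradient method builds them more cheaply). No matrix is inverted.

```python
import numpy as np

def conjugate_gram_schmidt(Q, vectors):
    """Turn linearly independent vectors into Q-orthogonal ones (Gram-Schmidt in the Q inner product)."""
    ds = []
    for v in vectors:
        d = np.array(v, dtype=float)             # copy, so the input is not modified
        for p in ds:
            d -= (p @ Q @ d) / (p @ Q @ p) * p   # remove the Q-component along earlier directions
        ds.append(d)
    return ds

Q = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])   # example SPD matrix
b = np.array([1.0, 2.0, 3.0])
ds = conjugate_gram_schmidt(Q, np.eye(3))

# x* = sum_i alpha_i d_i with alpha_i = d_i^T b / d_i^T Q d_i: only inner products, no inverse.
x_star = sum((d @ b) / (d @ Q @ d) * d for d in ds)
print(np.allclose(Q @ x_star, b))   # True
```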
Conjugate Direction Theorem
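Theorem [Conjugate Direction Theorem]
Let $d_0, \dots, d_{n-1}$ be nonzero Q-orthogonal vectors. For any $x_0 \in \mathbb{R}^n$, the sequence generated by
$x_{k+1} = x_k + \alpha_k d_k$, with $\alpha_k = -\dfrac{g_k^\top d_k}{d_k^\top Q d_k}$ [update rule]
$g_k = Q x_k - b$ [gradient of f]
converges to the unique solution $x^*$ of $Qx = b$ after $n$ steps, that is, $x_n = x^*$.
In the previous slide we had $x_0 = 0$, where the update reduces to $\alpha_k = \dfrac{d_k^\top b}{d_k^\top Q d_k}$.
Again, there is no need for matrix inversion; we only need to calculate inner products.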
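The theorem's update rule, sketched in Python for an arbitrary starting point: $\alpha_k = -g_k^\top d_k / d_k^\top Q d_k$ with $g_k = Q x_k - b$. For illustration only, the Q-orthogonal directions are the eigenvectors of an example matrix $Q$ (eigenvectors of a symmetric matrix are mutually Q-orthogonal); computing them costs more than solving the system directly, so this only demonstrates the $n$-step property.

```python
import numpy as np

def conjugate_direction_method(Q, b, x0, directions):
    """Run x_{k+1} = x_k + alpha_k d_k with alpha_k = -g_k^T d_k / d_k^T Q d_k."""
    x = np.array(x0, dtype=float)
    iterates = [x.copy()]
    for d in directions:
        g = Q @ x - b                      # g_k: gradient of f at x_k
        alpha = -(g @ d) / (d @ Q @ d)     # exact minimizer of f along d_k
        x = x + alpha * d
        iterates.append(x.copy())
    return x, iterates

Q = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])   # example SPD matrix
b = np.array([1.0, 2.0, 3.0])
_, V = np.linalg.eigh(Q)                  # columns: Q-orthogonal directions (illustration only)
x_n, path = conjugate_direction_method(Q, b, x0=[5.0, -2.0, 7.0], directions=V.T)
print(np.allclose(x_n, np.linalg.solve(Q, b)))   # True after n = 3 steps
```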
Proof
Therefore, it is enough to prove that with these $\alpha_k$ values we have $x^* - x_0 = \alpha_0 d_0 + \dots + \alpha_{n-1} d_{n-1}$.
Proof
Therefore,
We already know
Q.E.D.
Another motivation for Q-conjugacy
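Goal: minimize $f(x) = \tfrac{1}{2} x^\top Q x - b^\top x$ over $x = \alpha_0 d_0 + \dots + \alpha_{n-1} d_{n-1}$, where $d_0, \dots, d_{n-1}$ are Q-orthogonal.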
Therefore,
n separate 1-dimensional optimization problems!
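Written out, the decomposition behind this remark is the following worked derivation, expanding $f$ in a Q-orthogonal basis $d_0, \dots, d_{n-1}$:

```latex
\begin{aligned}
f\Big(\sum_{i=0}^{n-1} \alpha_i d_i\Big)
  &= \tfrac{1}{2} \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} \alpha_i \alpha_j \, d_i^\top Q d_j
     - \sum_{i=0}^{n-1} \alpha_i \, b^\top d_i \\
  &= \sum_{i=0}^{n-1} \Big( \tfrac{1}{2} \alpha_i^2 \, d_i^\top Q d_i - \alpha_i \, b^\top d_i \Big)
     \qquad (d_i^\top Q d_j = 0 \text{ for } i \neq j) \\
\frac{\partial f}{\partial \alpha_i} = 0
  &\iff \alpha_i \, d_i^\top Q d_i = b^\top d_i
  \iff \alpha_i = \frac{d_i^\top b}{d_i^\top Q d_i}
\end{aligned}
```

Each coefficient is found by its own one-dimensional minimization, and the result matches the expansion coefficients of $x^*$ in the Q-orthogonal basis.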
Expanding Subspace Theorem
Expanding Subspace Theorem
Expanding Subspace Theorem
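Theorem [Expanding Subspace Theorem]
Let $d_0, \dots, d_{n-1}$ be nonzero Q-orthogonal vectors in $\mathbb{R}^n$, and let $\mathcal{B}_k = \operatorname{span}\{d_0, \dots, d_{k-1}\}$. For any $x_0 \in \mathbb{R}^n$, the sequence generated by $x_{k+1} = x_k + \alpha_k d_k$, $\alpha_k = -\dfrac{g_k^\top d_k}{d_k^\top Q d_k}$, has the property that $x_k$ minimizes $f(x) = \tfrac{1}{2} x^\top Q x - b^\top x$ on the line $x = x_{k-1} + \alpha d_{k-1}$, $-\infty < \alpha < \infty$, as well as on the linear variety $x_0 + \mathcal{B}_k$.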
Expanding Subspace Theorem
Proof
David G. Luenberger, Yinyu Ye: Linear and Nonlinear Programming
Proof
By definition,
Therefore,
Proof
We have proved
By definition,
Therefore,
Proof
Since
[By induction assumption],
Therefore,
Q.E.D.
[We have proved this]
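Corollary of the Expanding Subspace Theorem
Corollary
In the conjugate direction method, the gradients $g_k$, $k = 0, 1, \dots, n$, satisfy $g_k^\top d_i = 0$ for $i < k$.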
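A numerical check of the corollary $g_k^\top d_i = 0$ for $i < k$. The matrix, right-hand side and starting point are arbitrary example choices, and the eigenvectors of $Q$ serve only as a convenient Q-orthogonal set of directions.

```python
import numpy as np

Q = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])   # example SPD matrix
b = np.array([1.0, 2.0, 3.0])
_, V = np.linalg.eigh(Q)
ds = list(V.T)                         # Q-orthogonal directions (eigenvectors, illustration only)

x = np.array([5.0, -2.0, 7.0])         # arbitrary x_0
gs = [Q @ x - b]                       # g_0
for d in ds:
    alpha = -(gs[-1] @ d) / (d @ Q @ d)
    x = x + alpha * d
    gs.append(Q @ x - b)               # g_{k+1}

# Corollary: each gradient is orthogonal to all earlier directions (up to rounding).
max_violation = max(abs(gs[k] @ ds[i]) for k in range(len(gs)) for i in range(k))
print(max_violation < 1e-10)           # True
```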
Expanding Subspace Theorem
David G. Luenberger, Yinyu Ye: Linear and Nonlinear Programming
THE CONJUGATE GRADIENT METHOD
THE CONJUGATE GRADIENT METHOD
• The conjugate gradient method is a conjugate direction method.
• It selects the successive direction vectors as a conjugate version of the successive gradients obtained as the method progresses.
• The conjugate directions are not specified beforehand, but rather are determined sequentially at each step of the iteration.
Given $d_0, \dots, d_{n-1}$, we already have an update rule for $\alpha_k$.
How should we choose the vectors $d_0, \dots, d_{n-1}$?
The conjugate gradient method
THE CONJUGATE GRADIENT METHOD
Advantages
• Simple update rule.
• The directions are based on the gradients, so the process makes good uniform progress toward the solution at every step.
For arbitrary sequences of conjugate directions, the progress may be slight until the final few steps.
Conjugate Gradient Algorithm
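• The CGA is only slightly more complicated to implement than the method of steepest descent, but it converges in a finite number of steps on quadratic problems.
• In contrast to Newton's method, there is no need for matrix inversion.
Conjugate Gradient Algorithm
Starting at any $x_0 \in \mathbb{R}^n$, define $d_0 = -g_0 = b - Q x_0$ and
$x_{k+1} = x_k + \alpha_k d_k$, with $\alpha_k = -\dfrac{g_k^\top d_k}{d_k^\top Q d_k}$
$d_{k+1} = -g_{k+1} + \beta_k d_k$, with $\beta_k = \dfrac{g_{k+1}^\top Q d_k}{d_k^\top Q d_k}$
where $g_k = Q x_k - b$.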
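A minimal NumPy sketch of the conjugate gradient algorithm: $d_0 = -g_0$, $\alpha_k = -g_k^\top d_k / d_k^\top Q d_k$, $x_{k+1} = x_k + \alpha_k d_k$, $\beta_k = g_{k+1}^\top Q d_k / d_k^\top Q d_k$, $d_{k+1} = -g_{k+1} + \beta_k d_k$. The function name and the stopping tolerance `tol` are illustrative choices, not from the slides. The sketch keeps $g_k$ up to date with $g_{k+1} = g_k + \alpha_k Q d_k$, so each iteration needs only one matrix-vector product with $Q$.

```python
import numpy as np

def conjugate_gradient(Q, b, x0, tol=1e-12):
    """Conjugate gradient for Q x = b with Q symmetric positive definite."""
    x = np.array(x0, dtype=float)
    g = Q @ x - b                        # g_0
    d = -g                               # d_0 = -g_0
    for _ in range(len(b)):              # at most n steps in exact arithmetic
        if np.linalg.norm(g) <= tol:     # already at the solution
            break
        Qd = Q @ d                       # the only product with Q in this iteration
        alpha = -(g @ d) / (d @ Qd)
        x = x + alpha * d
        g = g + alpha * Qd               # equals Q x_{k+1} - b
        beta = (g @ Qd) / (d @ Qd)
        d = -g + beta * d
    return x

Q = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])   # example SPD matrix
b = np.array([1.0, 2.0, 3.0])
x = conjugate_gradient(Q, b, x0=np.zeros(3))
print(np.allclose(x, np.linalg.solve(Q, b)))   # True: n = 3 steps suffice
```

In floating point the $n$-step termination holds only approximately, so practical codes usually iterate until $\|g_k\|$ is small rather than stopping at exactly $n$ steps.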
Conjugate Gradient Theorem
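To verify that the algorithm is a conjugate direction algorithm, all we need to check is that the vectors $d_0, \dots, d_k$ are Q-orthogonal.
Theorem [Conjugate Gradient Theorem]
The conjugate gradient algorithm is a conjugate direction method. If it does not terminate at $x_k$, then
a) $\operatorname{span}\{g_0, g_1, \dots, g_k\} = \operatorname{span}\{g_0, Q g_0, \dots, Q^k g_0\}$
b) $\operatorname{span}\{d_0, d_1, \dots, d_k\} = \operatorname{span}\{g_0, Q g_0, \dots, Q^k g_0\}$
c) $d_k^\top Q d_i = 0$ for $i \le k-1$
d) $\alpha_k = \dfrac{g_k^\top g_k}{d_k^\top Q d_k}$
e) $\beta_k = \dfrac{g_{k+1}^\top g_{k+1}}{g_k^\top g_k}$
Only $\alpha_k$ needs the matrix $Q$ in the algorithm!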
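A numerical sanity check of parts c), d) and e) of the Conjugate Gradient Theorem, using randomly generated example data: along the iterates, the two formulas for $\alpha_k$ agree ($-g_k^\top d_k / d_k^\top Q d_k$ versus $g_k^\top g_k / d_k^\top Q d_k$), the two formulas for $\beta_k$ agree ($g_{k+1}^\top Q d_k / d_k^\top Q d_k$ versus $g_{k+1}^\top g_{k+1} / g_k^\top g_k$), and the directions are mutually Q-orthogonal. Agreement is up to rounding, since the theorem holds in exact arithmetic.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
M = rng.normal(size=(n, n))
Q = M.T @ M + n * np.eye(n)            # example well-conditioned SPD matrix
b = rng.normal(size=n)

x = np.zeros(n)
g = Q @ x - b                           # g_0
d = -g                                  # d_0
directions, checks = [d], []
for k in range(n - 1):
    Qd = Q @ d
    alpha = -(g @ d) / (d @ Qd)                                  # general formula
    checks.append(np.isclose(alpha, (g @ g) / (d @ Qd)))         # part d)
    x = x + alpha * d
    g_new = g + alpha * Qd                                       # g_{k+1} = Q x_{k+1} - b
    beta = (g_new @ Qd) / (d @ Qd)                               # general formula
    checks.append(np.isclose(beta, (g_new @ g_new) / (g @ g)))   # part e)
    g, d = g_new, -g_new + beta * d
    directions.append(d)

D = np.column_stack(directions)
G = D.T @ Q @ D                         # G[i, j] = d_i^T Q d_j
off_diag = G - np.diag(np.diag(G))
print(all(checks), np.allclose(off_diag, 0.0, atol=1e-8))   # part c) as well
```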
Proofs
It would be nice to discuss them, but there is no time…
EXTENSION TO NONQUADRATIC PROBLEMS
EXTENSION TO NONQUADRATIC PROBLEMS
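Do a quadratic approximation: $f(x) \approx f(x_k) + g_k^\top (x - x_k) + \tfrac{1}{2} (x - x_k)^\top F(x_k) (x - x_k)$, where $g_k = \nabla f(x_k)^\top$ and $F(x_k)$ is the Hessian of $f$ at $x_k$.
This is similar to Newton's method. [f is approximated by a quadratic function]
• When applied to nonquadratic problems, conjugate gradient methods will not usually terminate within n steps.
• After n steps, we can restart the process from this point and run the algorithm for another n steps…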
Goal:
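Conjugate Gradient Algorithm for nonquadratic functions
Step 1: Starting at $x_0$, compute $g_0 = \nabla f(x_0)^\top$ and set $d_0 = -g_0$.
Step 2: For $k = 0, 1, \dots, n-1$:
a) Set $x_{k+1} = x_k + \alpha_k d_k$, where $\alpha_k = -\dfrac{g_k^\top d_k}{d_k^\top F(x_k) d_k}$ and $F(x_k)$ is the Hessian of $f$ at $x_k$.
b) Compute $g_{k+1} = \nabla f(x_{k+1})^\top$.
c) Unless $k = n-1$, set $d_{k+1} = -g_{k+1} + \beta_k d_k$, where $\beta_k = \dfrac{g_{k+1}^\top F(x_k) d_k}{d_k^\top F(x_k) d_k}$, and repeat (a).
Step 3: Replace $x_0$ by $x_n$ and go back to Step 1.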
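A sketch of Steps 1 to 3 on a smooth test function chosen only for illustration (it is not from the slides): $f(x) = \tfrac{1}{2} x^\top A x - b^\top x + \sum_i \log\cosh x_i$, so $\nabla f(x) = Ax - b + \tanh x$ and $F(x) = A + \operatorname{diag}(1 - \tanh^2 x)$. The names `grad`, `hessian` and `cg_hessian` are hypothetical. Both $\alpha_k$ and $\beta_k$ use the Hessian $F(x_k)$, and the method restarts from $d_0 = -g_0$ every $n$ steps.

```python
import numpy as np

# Hypothetical test problem: f(x) = 1/2 x^T A x - b^T x + sum_i log(cosh(x_i))
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])

def grad(x):
    """Gradient of the test function: A x - b + tanh(x)."""
    return A @ x - b + np.tanh(x)

def hessian(x):
    """Hessian F(x) of the test function: A + diag(1 - tanh(x)^2)."""
    return A + np.diag(1.0 - np.tanh(x) ** 2)

def cg_hessian(grad_f, hess_f, x0, cycles=50, tol=1e-10):
    """Hessian-based conjugate gradient for nonquadratic f, restarting every n steps."""
    x = np.array(x0, dtype=float)
    n = x.size
    for _ in range(cycles):
        g = grad_f(x)                      # Step 1: g_0 and d_0 = -g_0
        d = -g
        for k in range(n):                 # Step 2
            if np.linalg.norm(g) <= tol:
                return x
            Fd = hess_f(x) @ d             # F(x_k) d_k
            alpha = -(g @ d) / (d @ Fd)    # (a) alpha_k from the quadratic model
            x = x + alpha * d
            g_new = grad_f(x)              # (b) g_{k+1}
            if k < n - 1:                  # (c) beta_k, also evaluated with F(x_k)
                d = -g_new + (g_new @ Fd) / (d @ Fd) * d
            g = g_new
        # Step 3: x_n becomes the new x_0 and the outer loop restarts
    return x

x_min = cg_hessian(grad, hessian, x0=[3.0, -3.0])
print(np.linalg.norm(grad(x_min)) < 1e-8)
```

This test function's Hessian stays between $A$ and $A + I$, which keeps the Newton-like steps well behaved here; as the slides note, the method in this form is not globally convergent in general.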
Properties of CGA
• An attractive feature of the algorithm is that, just as in the pure form of Newton's method, no line search is required at any stage.
• The algorithm converges in a finite number of steps for a quadratic problem.
• The undesirable features are that the Hessian matrix must be evaluated at each point and that the algorithm is not, in this form, globally convergent.
LINE SEARCH METHODS
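Step 1: Given $x_0$, compute $g_0 = \nabla f(x_0)^\top$ and set $d_0 = -g_0$.
Step 2: For $k = 0, 1, \dots, n-1$:
a) Set $x_{k+1} = x_k + \alpha_k d_k$, where $\alpha_k$ minimizes $f(x_k + \alpha d_k)$.
b) Compute $g_{k+1} = \nabla f(x_{k+1})^\top$.
c) Unless $k = n-1$, set $d_{k+1} = -g_{k+1} + \beta_k d_k$, where $\beta_k = \dfrac{g_{k+1}^\top g_{k+1}}{g_k^\top g_k}$.
Step 3: Replace $x_0$ by $x_n$ and go back to Step 1.
Fletcher–Reeves method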
Line search method
The Hessian is not used in the algorithm.
In the quadratic case, it is identical to the original conjugate direction algorithm.
Fletcher–Reeves method
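A sketch of the Fletcher–Reeves method with a restart every $n$ steps and $\beta_k = g_{k+1}^\top g_{k+1} / g_k^\top g_k$. The line search is an assumption: the method only requires that $\alpha_k$ minimizes $f(x_k + \alpha d_k)$, and the hypothetical helper `line_search` approximates that minimizer by bisection on $\varphi'(\alpha) = \nabla f(x_k + \alpha d_k)^\top d_k$, which is valid when $f$ is convex along the line and $d_k$ is a descent direction. Only gradients are used; the Hessian never appears.

```python
import numpy as np

def line_search(grad, x, d, iters=100):
    """Approximate argmin_a f(x + a d) by bisection on phi'(a) = grad(x + a d)^T d.
    Assumes phi is convex and d is a descent direction (phi'(0) < 0)."""
    def dphi(a):
        return grad(x + a * d) @ d
    lo, hi = 0.0, 1.0
    while dphi(hi) < 0:                  # expand until the minimizer lies in [lo, hi]
        lo, hi = hi, 2.0 * hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if dphi(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def fletcher_reeves(grad, x0, cycles=50, tol=1e-10):
    """Fletcher-Reeves conjugate gradient with a restart every n steps."""
    x = np.array(x0, dtype=float)
    n = x.size
    for _ in range(cycles):
        g = grad(x)                                  # Step 1: d_0 = -g_0
        d = -g
        for k in range(n):                           # Step 2
            if np.linalg.norm(g) <= tol:
                return x
            x = x + line_search(grad, x, d) * d      # (a) line search along d_k
            g_new = grad(x)                          # (b)
            if k < n - 1:                            # (c) Fletcher-Reeves beta
                d = -g_new + (g_new @ g_new) / (g @ g) * d
            g = g_new
        # Step 3: restart from x_n
    return x

# Usage 1: on a quadratic the gradient is Q x - b, and FR reproduces conjugate gradient.
Q = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])   # example data
b = np.array([1.0, 2.0, 3.0])
x_quad = fletcher_reeves(lambda x: Q @ x - b, np.zeros(3))
print(np.allclose(x_quad, np.linalg.solve(Q, b)))

# Usage 2: nonquadratic example f(x) = 1/2 x^T Q x - b^T x + sum(log cosh x_i).
def grad_nq(x):
    return Q @ x - b + np.tanh(x)

x_nq = fletcher_reeves(grad_nq, np.array([3.0, -3.0, 3.0]))
print(np.linalg.norm(grad_nq(x_nq)) < 1e-8)
```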
Polak–Ribiere method
Same as the Fletcher–Reeves method, BUT: $\beta_k = \dfrac{(g_{k+1} - g_k)^\top g_{k+1}}{g_k^\top g_k}$
Again, this leads to a value identical to the standard formula in the quadratic case.
Experimental evidence seems to favor the Polak–Ribiere method over other methods of this general type.
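A sketch comparing the Fletcher–Reeves value $g_{k+1}^\top g_{k+1} / g_k^\top g_k$ and the Polak–Ribiere value $(g_{k+1} - g_k)^\top g_{k+1} / g_k^\top g_k$ with the standard conjugate gradient value $g_{k+1}^\top Q d_k / d_k^\top Q d_k$ on a randomly generated example quadratic with exact line search. They agree up to rounding because successive gradients are orthogonal there, so $g_k^\top g_{k+1} = 0$; on nonquadratic problems that term no longer vanishes, which is where the two methods differ.

```python
import numpy as np

def beta_fletcher_reeves(g_new, g):
    """Fletcher-Reeves: g_{k+1}^T g_{k+1} / g_k^T g_k."""
    return (g_new @ g_new) / (g @ g)

def beta_polak_ribiere(g_new, g):
    """Polak-Ribiere: (g_{k+1} - g_k)^T g_{k+1} / g_k^T g_k."""
    return ((g_new - g) @ g_new) / (g @ g)

rng = np.random.default_rng(2)
n = 5
M = rng.normal(size=(n, n))
Q = M.T @ M + n * np.eye(n)            # example SPD matrix
b = rng.normal(size=n)

x = np.zeros(n)
g = Q @ x - b
d = -g
rows = []
for k in range(n - 1):
    Qd = Q @ d
    alpha = -(g @ d) / (d @ Qd)         # exact line search on the quadratic
    x = x + alpha * d
    g_new = g + alpha * Qd
    beta_std = (g_new @ Qd) / (d @ Qd)  # standard conjugate gradient beta
    rows.append((beta_std, beta_fletcher_reeves(g_new, g), beta_polak_ribiere(g_new, g)))
    g, d = g_new, -g_new + beta_std * d

print(all(np.isclose(s, fr) and np.isclose(s, pr) for s, fr, pr in rows))
```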
Convergence rate
Under some conditions the line search method is globally convergent.
Under some conditions, the rate is quadratic with respect to complete $n$-step cycles: $\|x_{k+n} - x^*\| \le c \, \|x_k - x^*\|^2$
[since one complete n-step cycle solves a quadratic problem, similarly to Newton's method]
Numerical Experiments
A comparison of
* gradient descent with optimal step size (in green) and
* conjugate gradient (in red)
for minimizing a quadratic function.
Conjugate gradient converges in at most n steps (here n=2).
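A small reproduction of this kind of comparison, assuming an example 2×2 quadratic and starting point (not the data behind the original figure): after two steps, steepest descent with the optimal step $g_k^\top g_k / g_k^\top Q g_k$ is still visibly away from $x^*$, while conjugate gradient has reached it up to rounding.

```python
import numpy as np

Q = np.array([[4.0, 1.0], [1.0, 3.0]])   # example 2x2 SPD matrix (not the figure's data)
b = np.array([1.0, 2.0])
x_star = np.linalg.solve(Q, b)           # reference solution
x0 = np.array([2.0, 1.0])

def steepest_descent(x, steps):
    """Gradient descent with the optimal (exact line search) step size."""
    for _ in range(steps):
        g = Q @ x - b
        if not g.any():
            break
        x = x - (g @ g) / (g @ Q @ g) * g
    return x

def conjugate_gradient(x, steps):
    """Conjugate gradient; steps must not exceed n = 2 here."""
    g = Q @ x - b
    d = -g
    for _ in range(steps):
        Qd = Q @ d
        alpha = -(g @ d) / (d @ Qd)
        x = x + alpha * d
        g_new = g + alpha * Qd
        d = -g_new + (g_new @ Qd) / (d @ Qd) * d
        g = g_new
    return x

err_sd = np.linalg.norm(steepest_descent(x0, 2) - x_star)
err_cg = np.linalg.norm(conjugate_gradient(x0, 2) - x_star)
print(err_cg < 1e-12, err_sd > 1e-3)     # True True
```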
Summary
• Conjugate Direction Methods
- conjugate directions
• Minimizing quadratic functions
• Conjugate Gradient Methods for nonquadratic functions
- Line search methods
* Fletcher–Reeves
* Polak–Ribiere