Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning...
Transcript of Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning...
![Page 1: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete](https://reader034.fdocuments.us/reader034/viewer/2022052105/6040eacb219a353d6951d553/html5/thumbnails/1.jpg)
Optimization Methods in Machine Learning
Lectures 13-14
TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: AAAAAAAAAAAAAAAA
Katya Scheinberg Lehigh University
![Page 2: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete](https://reader034.fdocuments.us/reader034/viewer/2022052105/6040eacb219a353d6951d553/html5/thumbnails/2.jpg)
First Order Methods
![Page 3: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete](https://reader034.fdocuments.us/reader034/viewer/2022052105/6040eacb219a353d6951d553/html5/thumbnails/3.jpg)
• Consider:
• Linear lower approximation
• Quadratic upper approximation
First-order proximal gradient methods
![Page 4: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete](https://reader034.fdocuments.us/reader034/viewer/2022052105/6040eacb219a353d6951d553/html5/thumbnails/4.jpg)
• Minimize quadratic upper approximation on each iteration
• If µ· 1/L then
First-order proximal gradient method
![Page 5: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete](https://reader034.fdocuments.us/reader034/viewer/2022052105/6040eacb219a353d6951d553/html5/thumbnails/5.jpg)
Slides from L. Vandenberghe http://www.ee.ucla.edu/~vandenbe/ee236c.html
![Page 6: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete](https://reader034.fdocuments.us/reader034/viewer/2022052105/6040eacb219a353d6951d553/html5/thumbnails/6.jpg)
Complexity bound derivation outline
![Page 7: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete](https://reader034.fdocuments.us/reader034/viewer/2022052105/6040eacb219a353d6951d553/html5/thumbnails/7.jpg)
• Minimize quadratic upper approximation on each iteration
• If µ· 1/L then in O(L||x0-x*||/²) iterations finds solution
Complexity of proximal gradient method
Compare to O(log(L/²)) of interior point methods.
Can we do better?
![Page 8: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete](https://reader034.fdocuments.us/reader034/viewer/2022052105/6040eacb219a353d6951d553/html5/thumbnails/8.jpg)
• Minimize upper approximation at an intermediate point.
• If µ· 1/L then
Accelerated first-order method Nesterov, ’83, ‘00s,
Beck&Teboulle ‘09
![Page 9: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete](https://reader034.fdocuments.us/reader034/viewer/2022052105/6040eacb219a353d6951d553/html5/thumbnails/9.jpg)
• Minimize upper approximation at an intermediate point.
• If µ· 1/L then in iterations finds solution
Complexity of accelerated first-order method Nesterov, ’83, ‘00s,
Beck&Teboulle ‘09
This method is optimal if only gradient information is used.
![Page 10: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete](https://reader034.fdocuments.us/reader034/viewer/2022052105/6040eacb219a353d6951d553/html5/thumbnails/10.jpg)
![Page 11: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete](https://reader034.fdocuments.us/reader034/viewer/2022052105/6040eacb219a353d6951d553/html5/thumbnails/11.jpg)
Slides from L. Vandenberghe http://www.ee.ucla.edu/~vandenbe/ee236c.html
![Page 12: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete](https://reader034.fdocuments.us/reader034/viewer/2022052105/6040eacb219a353d6951d553/html5/thumbnails/12.jpg)
• Minimize upper approximation at an intermediate point.
• If µ· 1/L then in iterations finds solution
FISTA method Beck&Teboulle ‘09
![Page 13: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete](https://reader034.fdocuments.us/reader034/viewer/2022052105/6040eacb219a353d6951d553/html5/thumbnails/13.jpg)
Slides from L. Vandenberghe http://www.ee.ucla.edu/~vandenbe/ee236c.html
µ is not a prox parameter here
![Page 14: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete](https://reader034.fdocuments.us/reader034/viewer/2022052105/6040eacb219a353d6951d553/html5/thumbnails/14.jpg)
Slides from L. Vandenberghe http://www.ee.ucla.edu/~vandenbe/ee236c.html
![Page 15: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete](https://reader034.fdocuments.us/reader034/viewer/2022052105/6040eacb219a353d6951d553/html5/thumbnails/15.jpg)
Slides from L. Vandenberghe http://www.ee.ucla.edu/~vandenbe/ee236c.html
![Page 16: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete](https://reader034.fdocuments.us/reader034/viewer/2022052105/6040eacb219a353d6951d553/html5/thumbnails/16.jpg)
Unconstrained formulation of the SVM problem
![Page 17: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete](https://reader034.fdocuments.us/reader034/viewer/2022052105/6040eacb219a353d6951d553/html5/thumbnails/17.jpg)
SVM problem using Huber loss function
![Page 18: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete](https://reader034.fdocuments.us/reader034/viewer/2022052105/6040eacb219a353d6951d553/html5/thumbnails/18.jpg)
First order methods for composite functions
![Page 19: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete](https://reader034.fdocuments.us/reader034/viewer/2022052105/6040eacb219a353d6951d553/html5/thumbnails/19.jpg)
• Lasso or CS:
• Group Lasso or MMV
• Matrix Completion
• Robust PCA
• SICS
Examples
![Page 20: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete](https://reader034.fdocuments.us/reader034/viewer/2022052105/6040eacb219a353d6951d553/html5/thumbnails/20.jpg)
• Consider:
• Quadratic upper approximation
Prox method with nonsmooth term
Assume that g(y) is such that the above function is easy to optimize over y
![Page 21: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete](https://reader034.fdocuments.us/reader034/viewer/2022052105/6040eacb219a353d6951d553/html5/thumbnails/21.jpg)
• Minimize upper approximation function Qf,µ(x,y) on each iteration
Example 1 (Lasso and SICS)
Closed form solution!
O(n) effort
![Page 22: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete](https://reader034.fdocuments.us/reader034/viewer/2022052105/6040eacb219a353d6951d553/html5/thumbnails/22.jpg)
![Page 23: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete](https://reader034.fdocuments.us/reader034/viewer/2022052105/6040eacb219a353d6951d553/html5/thumbnails/23.jpg)
Very similar to the previous case, but with ||.|| instead of |.|
Example 2 (Group Lasso)
Closed form solution!
O(n) effort
![Page 24: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete](https://reader034.fdocuments.us/reader034/viewer/2022052105/6040eacb219a353d6951d553/html5/thumbnails/24.jpg)
Example 3 (Collaborative Prediction)
Closed form solution!
O(n^3) effort
![Page 25: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete](https://reader034.fdocuments.us/reader034/viewer/2022052105/6040eacb219a353d6951d553/html5/thumbnails/25.jpg)
• Minimize quadratic upper approximation on each iteration
• If µ· 1/L then in O(L/²) iterations finds solution
ISTA/Gradient prox method
![Page 26: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete](https://reader034.fdocuments.us/reader034/viewer/2022052105/6040eacb219a353d6951d553/html5/thumbnails/26.jpg)
• Minimize upper approximation at an “accelerated” point.
• If µ· 1/L then in iterations finds solution
Fast first-order method Nesterov, Beck & Teboulle
![Page 27: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete](https://reader034.fdocuments.us/reader034/viewer/2022052105/6040eacb219a353d6951d553/html5/thumbnails/27.jpg)
Practical first order algorithms using backtracking search
![Page 28: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete](https://reader034.fdocuments.us/reader034/viewer/2022052105/6040eacb219a353d6951d553/html5/thumbnails/28.jpg)
• Minimize quadratic upper relaxation on each iteration
• Using line search find µk such that
• In O(1/µmin²) iterations finds ²-optimal solution (in practice better)
Nesterov, 07 Beck&Teboulle, Tseng, Auslender&Teboulle, 08
Iterative Shrinkage Threshholding Algorithm (ISTA)
![Page 29: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete](https://reader034.fdocuments.us/reader034/viewer/2022052105/6040eacb219a353d6951d553/html5/thumbnails/29.jpg)
• Minimize quadratic upper relaxation on each iteration
• Using line search find µk · µk-1 such that
• In iterations finds ²-optimal solution
Fast Iterative Shrinkage Threshholding Algorithm (FISTA)
Nesterov, Beck&Teboulle, Tseng
Very restrictive
![Page 30: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete](https://reader034.fdocuments.us/reader034/viewer/2022052105/6040eacb219a353d6951d553/html5/thumbnails/30.jpg)
• ISTA’s complexity is O(L/²) while FISTA’s is
• However, FISTA’s condition µk · µk-1 often slows down practical performance and simply ignoring the condition does not help.
• We want to modify FISTA algorithm to relax µk · µk-1, while maintaining complexity bound or maybe even improving it
FISTA with line search Goldfarb and S. 2010
![Page 31: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete](https://reader034.fdocuments.us/reader034/viewer/2022052105/6040eacb219a353d6951d553/html5/thumbnails/31.jpg)
Find µk · µk-1 such that
Cycle to find µk
Convergence rate:
![Page 32: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete](https://reader034.fdocuments.us/reader034/viewer/2022052105/6040eacb219a353d6951d553/html5/thumbnails/32.jpg)
Find µk such that
Goldfarb & S. 2011
This condition….
… gives this bound on the error
![Page 33: Optimization Methods in Machine Learning Lectures 13-14Optimization Methods in Machine Learning Lectures 13-14 TexPoint fonts used in EMF. Read the TexPoint manual before you delete](https://reader034.fdocuments.us/reader034/viewer/2022052105/6040eacb219a353d6951d553/html5/thumbnails/33.jpg)
Find µk such that
Goldfarb & S. 2011
FISTA with full line search
Cycle to find µ and t