Large Scale Optimization for Machine Learning
What is this course about?
• Useful optimization tools for machine learning
• Focus is on the algorithms and the analysis
• This is NOT an ML course
  • Don’t expect to learn detailed ML
• This is NOT a classical optimization course
  • We won’t cover many classical optimization results
• We cover some basics, though
  • A few weeks on optimization
  • Some ML examples will be explained in detail
Prerequisites
• Probability and Statistics
  • Expected value, variance, statistical independence, conditional probability, maximum likelihood estimation, regression, etc.
• Linear Algebra and Mathematical Analysis
  • Sets, functions, limits, liminf, limsup, derivative, gradient, subspace, linear dependence, inner product, eigenvalue, singular value, norms, etc.
• Programming skills
  • Matrix/vector operations in Matlab/Python/C++
  • “For, while, repeat until” loops
What is optimization?
• Decision variable (discrete/continuous)
• Objective/cost function
• Feasible region

• Existence of a solution?
• Checking if a candidate x is optimal?
• Finding the optimal solution numerically
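The labels "decision variable", "objective/cost function", and "feasible region" annotate a problem statement whose formula did not survive in this transcript. A generic form consistent with those labels might read as follows; the symbols $f$, $x$, and $\mathcal{X}$ are assumptions chosen here, not taken from the slide.

```latex
\min_{x} \; f(x) \quad \text{subject to} \quad x \in \mathcal{X}
```

In this notation, $x$ is the decision variable, $f$ is the objective (cost) function, and $\mathcal{X}$ is the feasible region.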
Why do we care?
• Many engineering problems require optimization
• Optimization is key to efficient implementation of ML algorithms
Problem Example: Regression

| Area | CrimeRate | Age | RAD | PTRATIO | Bedrooms | … | Price (K) |
|------|-----------|-----|-----|---------|----------|---|-----------|
| 600  | 1.05 | 12 | 2.4 | 10.1 | 1 | … | 500  |
| 1000 | 2.34 | 10 | 2.5 | 20.1 | 1 | … | 800  |
| 1200 | 1.45 | 3  | 3.1 | 9.7  | 3 | … | 1500 |
| 1500 | 1.56 | 30 | 1.7 | 7.2  | 2 | … | 1200 |
| …    | …    | …  | …   | …    | … | … | …    |
| 2700 | 1.01 | 20 | 0.9 | 4.3  | 4 | … | 5000 |

• Rows: samples/data points
• Columns before the price: features/independent variables
• Price (K): target/dependent variable/label

Can we use this dataset to predict the price of this house?

| Area | CrimeRate | Age | RAD | PTRATIO | Bedrooms | … | Price (K) |
|------|-----------|-----|-----|---------|----------|---|-----------|
| 1400 | 2.2 | 3 | 3.1 | 7.6 | 2 | … | ???? |

Training data → Learning algorithm → Prediction
Learning Algorithms
• Various methods in ML
  • Decision trees, deep learning, Bayes, empirical Bayes, linear regression, logistic regression, …
• Many methods
  • Model
  • Minimize the loss/maximize the likelihood
Algorithm Example: Linear regression

| Area | CrimeRate | Age | RAD | PTRATIO | Bedrooms | … | Price (K) |
|------|-----------|-----|-----|---------|----------|---|-----------|
| 600  | 1.05 | 12 | 2.4 | 10.1 | 1 | … | 500  |
| 1000 | 2.34 | 10 | 2.5 | 20.1 | 1 | … | 800  |
| 1200 | 1.45 | 3  | 3.1 | 9.7  | 3 | … | 1500 |
| 1500 | 1.56 | 30 | 1.7 | 7.2  | 2 | … | 1200 |
| …    | …    | …  | …   | …    | … | … | …    |
| 2700 | 1.01 | 20 | 0.9 | 4.3  | 4 | … | 5000 |

Can we use this dataset to predict the price of this house?

| Area | CrimeRate | Age | RAD | PTRATIO | Bedrooms | … | Price (K) |
|------|-----------|-----|-----|---------|----------|---|-----------|
| 1400 | 2.2 | 3 | 3.1 | 7.6 | 2 | … | ???? |

• Model: linear predictor
• Loss: L2 difference
• Offset included
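The slide's formulas for the model and the loss are not preserved in this transcript. A minimal sketch of one common reading, an affine predictor fit by minimizing the sum of squared errors, is given below. It uses only the five visible rows and, as an illustrative simplification, only the Area and Bedrooms columns so that the problem stays overdetermined. The names `X`, `y`, `theta`, `w`, `b` and the use of `numpy.linalg.lstsq` are assumptions, not the lecturer's notation or method.

```python
import numpy as np

# Visible rows of the housing table; only Area and Bedrooms are used here
# (an illustrative choice so that 5 samples overdetermine 3 unknowns).
X = np.array([[600, 1], [1000, 1], [1200, 3], [1500, 2], [2700, 4]], dtype=float)
y = np.array([500, 800, 1500, 1200, 5000], dtype=float)  # Price (K)

# "Offset included": append a column of ones so the last coefficient is the offset b.
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])

# Minimize the L2 loss ||X_aug @ theta - y||^2 over theta = (w, b).
theta, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
w, b = theta[:-1], theta[-1]

# Predict the price of the query house (Area 1400, Bedrooms 2).
x_new = np.array([1400.0, 2.0])
prediction = float(x_new @ w + b)
print("weights:", w, "offset:", b, "predicted price (K):", prediction)
```

With all feature columns and many more samples the same call applies unchanged; for large n and d, the iterative methods this course focuses on would typically replace the direct solve.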
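Another Example: Classification

| Radius | Texture | Area | Compactness | Symmetry | … | Rec/non-Rec |
|--------|---------|------|-------------|----------|---|-------------|
| 1.1 | 2.3 | 3.5 | 2.4 | 1.4 | … | 1 |
| 0.7 | 1.2 | 2.5 | 1.4 | 3.2 | … | 0 |
| 1.7 | 2.4 | 1.5 | 3.3 | 1.3 | … | 1 |
| …   | …   | …   | …   | …   | … | … |
| 0.2 | 3.4 | 0.7 | 4.3 | 2.0 | … | 1 |
| 0.2 | 2.7 | 0.9 | 2.3 | 1.0 | … | ???? |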
Algorithm Example: Logistic Regression

| Radius | Texture | Area | Compactness | Symmetry | … | Rec/non-Rec |
|--------|---------|------|-------------|----------|---|-------------|
| 1.1 | 2.3 | 3.5 | 2.4 | 1.4 | … | 1 |
| 0.7 | 1.2 | 2.5 | 1.4 | 3.2 | … | 0 |
| 1.7 | 2.4 | 1.5 | 3.3 | 1.3 | … | 1 |
| …   | …   | …   | …   | …   | … | … |
| 0.2 | 3.4 | 0.7 | 4.3 | 2.0 | … | 1 |

• Model: logistic
• Maximum likelihood estimator
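The logistic model and likelihood formulas are missing from this transcript. The sketch below assumes the usual form, P(label = 1 | x) = sigmoid(w·x + b), and maximizes the likelihood by running plain gradient descent on the negative log-likelihood. It uses only the four visible rows; the step size, iteration count, offset term, and all variable names are hypothetical choices. With so few rows the classes are linearly separable, so the likelihood has no finite maximizer and the loop simply stops after a fixed number of steps.

```python
import numpy as np

# Visible rows of the table: Radius, Texture, Area, Compactness, Symmetry.
X = np.array([[1.1, 2.3, 3.5, 2.4, 1.4],
              [0.7, 1.2, 2.5, 1.4, 3.2],
              [1.7, 2.4, 1.5, 3.3, 1.3],
              [0.2, 3.4, 0.7, 4.3, 2.0]])
y = np.array([1.0, 0.0, 1.0, 1.0])  # Rec/non-Rec labels

X_aug = np.hstack([X, np.ones((X.shape[0], 1))])  # assumed offset term


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def neg_log_likelihood(theta):
    z = X_aug @ theta
    # sum_i [log(1 + exp(z_i)) - y_i * z_i], with logaddexp for stability
    return float(np.sum(np.logaddexp(0.0, z) - y * z))


theta = np.zeros(X_aug.shape[1])
step = 0.02  # hypothetical step size, small enough for monotone decrease here
losses = [neg_log_likelihood(theta)]
for _ in range(2000):
    grad = X_aug.T @ (sigmoid(X_aug @ theta) - y)  # gradient of the NLL
    theta -= step * grad
    losses.append(neg_log_likelihood(theta))

# Estimated probability of class 1 for the unlabeled row of the classification table.
x_new = np.array([0.2, 2.7, 0.9, 2.3, 1.0, 1.0])  # features plus the constant 1
p_new = float(sigmoid(x_new @ theta))
print("final NLL:", losses[-1], "P(label = 1):", p_new)
```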
Optimization in ML
• Many more examples (K-means, SVM, deep learning, …)
• Efficient algorithms: CPU, memory requirements, parallelizability, robustness, etc.
• Other issues: non-convexity, sparsity, large values of n/d, online implementation, implicit bias, overfitting, etc.
• But first, we need to review a bit of optimization (targeted review!)
• Even before this, let’s review a bit of linear algebra and mathematical analysis
Notations
• Sets
  • Real numbers ℝ, complex numbers ℂ
• Inf and Sup
  • Supremum of the set X is the smallest scalar y such that y ≥ x for all x ∈ X
  • Infimum of the set X is the largest scalar y such that y ≤ x for all x ∈ X
  • inf{sin n : n ≥ 1} = ?
• Functions:
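On the exercise inf{sin n : n ≥ 1}: sin n never equals −1 for an integer n, since that would require n = 3π/2 + 2πk and π is irrational. The values do come arbitrarily close to −1, however, so the infimum is −1 and is not attained. The short numerical check below is only an illustration, not a proof; the cutoff `N` is an arbitrary choice.

```python
import math

N = 10_000  # arbitrary cutoff for the illustration

# Smallest value of sin(n) over n = 1, ..., N and the integer where it occurs.
best_n = min(range(1, N + 1), key=math.sin)
running_min = math.sin(best_n)

print(f"min of sin(n) for 1 <= n <= {N}: {running_min!r} at n = {best_n}")
print("attains -1 exactly:", running_min == -1.0)
```

This is the typical situation in which an infimum exists but a minimum does not.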
Vectors and Subspaces
• Linear combination:
• Subspace and linear independence
  • A set is called a subspace if it is closed under linear combination
  • A set of vectors is called linearly independent if no nontrivial linear combination of them is equal to zero
  • Inner product:
  • Orthogonality:
• Cauchy-Schwarz inequality
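The formulas attached to these bullets were lost in extraction. For reference, the standard definitions for vectors in $\mathbb{R}^n$ are sketched below; the symbols $x$, $y$, $x_i$, and $\alpha_i$ are assumptions and may differ from the slide's notation.

```latex
\begin{align*}
\text{Linear combination:} \quad & \alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_k x_k,
  \qquad \alpha_i \in \mathbb{R},\; x_i \in \mathbb{R}^n \\
\text{Inner product:} \quad & \langle x, y \rangle = x^\top y = \sum_{i=1}^{n} x_i y_i \\
\text{Orthogonality:} \quad & \langle x, y \rangle = 0 \\
\text{Cauchy-Schwarz:} \quad & |\langle x, y \rangle| \le \|x\|_2 \, \|y\|_2
\end{align*}
```

Equality in the Cauchy-Schwarz inequality holds exactly when one of the two vectors is a scalar multiple of the other.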
Matrices
• Matrix addition
• Matrix product
• Square matrix
• Inner product:
• Spectral radius:
• Eigenvalue decomposition of symmetric matrices
• Positive (semi-)definite matrices
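The definitions behind these bullets are not preserved in the transcript. The NumPy sketch below illustrates the standard ones, which are assumed here rather than copied from the slide: the matrix inner product trace(AᵀB), the spectral radius as the largest eigenvalue magnitude, the decomposition A = Q diag(λ) Qᵀ of a symmetric matrix, and positive (semi-)definiteness via the signs of the eigenvalues. The matrices `A` and `B` are made-up examples.

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])  # small symmetric example (made up)
B = np.array([[0.0, 1.0], [1.0, 0.0]])

# Matrix inner product <A, B> = trace(A^T B) = sum of entrywise products.
inner = np.trace(A.T @ B)

# Spectral radius: largest eigenvalue magnitude of a square matrix.
spectral_radius = np.max(np.abs(np.linalg.eigvals(A)))

# Eigenvalue decomposition of a symmetric matrix: A = Q diag(lam) Q^T,
# with real eigenvalues lam (ascending) and orthonormal columns in Q.
lam, Q = np.linalg.eigh(A)
A_rebuilt = Q @ np.diag(lam) @ Q.T

# A symmetric matrix is positive semidefinite iff all eigenvalues are >= 0
# (a small tolerance absorbs rounding), and positive definite iff all are > 0.
is_psd = bool(np.all(lam >= -1e-12))
is_pd = bool(np.all(lam > 0))

print(inner, spectral_radius, lam, is_psd, is_pd)
```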
Matrices (continued)
• Singular values:
• Singular value decomposition of a matrix:
• Norms:
  • Frobenius norm:
  • Nuclear norm:
  • Matrix 2-norm:
• Useful inequalities:
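The formulas for these items, including the slide's own list of inequalities, did not survive extraction. The sketch below uses the standard definitions in terms of the singular values σᵢ: Frobenius norm √(Σσᵢ²), nuclear norm Σσᵢ, and 2-norm maxᵢ σᵢ. It also checks one well-known chain of inequalities, ‖A‖₂ ≤ ‖A‖_F ≤ ‖A‖_* ≤ √r ‖A‖_F with r = rank(A), which may or may not be the one shown on the slide. The matrix `A` is a made-up example.

```python
import numpy as np

A = np.array([[3.0, 0.0], [4.0, 5.0], [0.0, 1.0]])  # made-up 3x2 example

# Thin SVD: A = U diag(s) V^T, singular values s in decreasing order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_rebuilt = U @ np.diag(s) @ Vt

fro = np.linalg.norm(A, "fro")  # sqrt of the sum of squared entries
nuc = np.linalg.norm(A, "nuc")  # sum of the singular values
two = np.linalg.norm(A, 2)      # largest singular value

rank = np.linalg.matrix_rank(A)
chain_holds = bool(two <= fro + 1e-12
                   and fro <= nuc + 1e-12
                   and nuc <= np.sqrt(rank) * fro + 1e-12)

print("singular values:", s)
print("2-norm, Frobenius, nuclear:", two, fro, nuc, "chain holds:", chain_holds)
```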