Optimization for Deep Learning
Industrial AI Lab.
Prof. Seungchul Lee
Optimization
• 3 key components
1) Objective function
2) Decision variables (unknowns)
3) Constraints
• Procedures
1) Identify the objective, variables, and constraints for a given problem (known as "modeling")
2) Once the model has been formulated, an optimization algorithm can be used to find its solution
Optimization: Mathematical Model
• In mathematical expression:

$$\min_{x}\ f(x) \quad \text{subject to}\quad g_i(x) \le 0,\ i = 1, \cdots, m$$

– $x = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} \in \mathbb{R}^n$ is the decision variable
– $f:\mathbb{R}^n \rightarrow \mathbb{R}$ is the objective function
– Feasible region: $C = \{x : g_i(x) \le 0,\ i = 1, \cdots, m\}$
– $x^* \in \mathbb{R}^n$ is an optimal solution if $x^* \in C$ and $f(x^*) \le f(x)$ for all $x \in C$
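The model above can be made concrete with a small, hypothetical instance. Here `f`, `g1`, and `x_star` are illustrative names for a two-variable problem chosen for this sketch, not part of the original slides:

```python
# Hypothetical instance of the model above:
# minimize f(x) = x1^2 + x2^2  subject to  g1(x) = 1 - x1 - x2 <= 0.

def f(x):
    # objective function f: R^2 -> R
    return x[0] ** 2 + x[1] ** 2

def g1(x):
    # single inequality constraint, feasible when g1(x) <= 0
    return 1.0 - x[0] - x[1]

def feasible(x):
    # x lies in the feasible region C when every constraint g_i(x) <= 0
    return g1(x) <= 0.0

# For this problem the optimum sits on the constraint boundary at (0.5, 0.5)
x_star = (0.5, 0.5)
print(feasible(x_star), f(x_star))  # True 0.5
```

The unconstrained minimizer $(0, 0)$ is infeasible here, which is why the solution moves to the boundary of $C$.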
Optimization: Mathematical Model
• In mathematical expression
• Remarks: the following formulations are equivalent
– Maximizing $f(x)$ is equivalent to minimizing $-f(x)$
– A constraint $g_i(x) \ge 0$ is equivalent to $-g_i(x) \le 0$
Solving Optimization Problems
Solving Optimization Problems
• Starting with the unconstrained, one-dimensional case
– To find a minimum point $x^*$, we can look at the derivative of the function, $f'(x)$
– Any location where $f'(x) = 0$ will be a "flat" (stationary) point of the function
• For convex problems, this is guaranteed to be a minimum
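As a minimal sketch of this condition, take the convex quadratic $f(x) = x^2 - 4x + 3$ (a function chosen for illustration, not from the slides). Its derivative $f'(x) = 2x - 4$ vanishes at $x^* = 2$, which is the minimum since $f$ is convex:

```python
# Locate the stationary point of a convex quadratic by hand.
# f(x) = x^2 - 4x + 3, so f'(x) = 2x - 4, and f'(x) = 0 gives x* = 2.

def f(x):
    return x * x - 4 * x + 3

def f_prime(x):
    return 2 * x - 4

x_star = 2.0            # root of f'(x) = 0
print(f_prime(x_star))  # 0.0 -- a "flat" point
print(f(x_star))        # -1.0 -- a minimum, since f is convex
```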
Solving Optimization Problems
• Generalization to a multivariate function $f:\mathbb{R}^n \rightarrow \mathbb{R}$
– the gradient of $f$ must be zero
• For $f$ defined as above, the gradient is an $n$-dimensional vector containing the partial derivatives with respect to each dimension:

$$\nabla_x f(x) = \begin{bmatrix} \dfrac{\partial f(x)}{\partial x_1} \\ \vdots \\ \dfrac{\partial f(x)}{\partial x_n} \end{bmatrix}$$

• For continuously differentiable $f$ and unconstrained optimization, an optimal point must satisfy

$$\nabla_x f(x^*) = 0$$
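The gradient's entries can be approximated with finite differences, which also lets us check that the gradient vanishes at a minimizer. The function and helper below (`numerical_gradient`, the quadratic `f`) are illustrative choices for this sketch:

```python
# Finite-difference gradient for f: R^n -> R, used here to verify that
# the gradient vanishes at the minimizer of f(x) = (x1-1)^2 + (x2+2)^2.

def f(x):
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2

def numerical_gradient(f, x, h=1e-6):
    # central differences: one partial derivative per dimension
    grad = []
    for i in range(len(x)):
        xp = list(x); xp[i] += h
        xm = list(x); xm[i] -= h
        grad.append((f(xp) - f(xm)) / (2 * h))
    return grad

print(numerical_gradient(f, [1.0, -2.0]))  # ~[0.0, 0.0] at the optimum
```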
How Do We Find $\nabla_x f(x) = 0$?
• Direct solution
– In some cases, it is possible to analytically compute $x^*$ such that $\nabla_x f(x^*) = 0$
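A classic case where the direct solution exists is least squares: for $f(x) = \|Ax - b\|^2$ the gradient is $2A^\top(Ax - b)$, and setting it to zero gives the normal equations $A^\top A\, x^* = A^\top b$. A sketch, assuming NumPy is available (the matrix `A` and vector `b` are made-up example data):

```python
# Direct (analytic) solution of a least-squares problem via the
# normal equations A^T A x* = A^T b, obtained from grad f(x*) = 0.
import numpy as np

A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])  # 3 samples, 2 params
b = np.array([1.0, 2.0, 2.9])                       # observed values

x_star = np.linalg.solve(A.T @ A, A.T @ b)          # closed-form optimum

grad = 2 * A.T @ (A @ x_star - b)                   # should be ~0
print(x_star, grad)
```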
How Do We Find $\nabla_x f(x) = 0$?
• Iterative methods
– More commonly, the condition that the gradient equals zero has no analytical solution, requiring iterative methods
– The gradient points in the direction of "steepest ascent" for the function $f$
Descent Direction (1D)
• This motivates the gradient descent algorithm, which repeatedly takes steps in the direction of the negative gradient
Gradient Descent
Gradient Descent
• Update rule:

$$x \leftarrow x - \alpha \, \nabla_x f(x)$$

– where $\alpha > 0$ is the step size (learning rate)
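The update rule translates directly into a few lines of code. Below is a minimal 1D sketch (the quadratic and the parameter values are illustrative choices): starting from $x_0 = 10$ on $f(x) = x^2 - 4x + 3$, repeated negative-gradient steps converge to the minimizer $x^* = 2$.

```python
# Minimal gradient descent for the update rule x <- x - alpha * f'(x),
# applied to the convex function f(x) = x^2 - 4x + 3 (minimum at x* = 2).

def f_prime(x):
    return 2 * x - 4

def gradient_descent(f_prime, x0, alpha=0.1, steps=100):
    x = x0
    for _ in range(steps):
        x = x - alpha * f_prime(x)  # step along the negative gradient
    return x

x_star = gradient_descent(f_prime, x0=10.0)
print(x_star)  # ~2.0
```

For this quadratic each step shrinks the error by a constant factor $(1 - 2\alpha)$, so convergence is fast; too large an $\alpha$ would make the iterates diverge instead.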
Practically Solving Optimization Problems
• The good news: for many classes of optimization problems, people have already done all the "hard work" of developing numerical algorithms
– A wide range of tools can take optimization problems in "natural" form and compute a solution
• Gradient descent
– Easy to implement
– Very general: can be applied to any differentiable loss function
– Requires less memory and computation (for stochastic methods)
– Neural networks / deep learning → TensorFlow
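The stochastic variant mentioned above can be sketched in pure Python: instead of the full gradient, each update uses the gradient of the loss on a single sample. The toy linear model `y ≈ w·x` and the helper `sgd` are made-up names for this illustration, not the lab's code:

```python
# Sketch of stochastic gradient descent (SGD) on a tiny linear model
# y ~ w * x with squared loss per sample: one example per update,
# which is what keeps the memory and per-step computation low.
import random

data = [(x, 3.0 * x) for x in range(1, 6)]  # noiseless samples of y = 3x

def sgd(data, w0=0.0, alpha=0.01, epochs=200, seed=0):
    rng = random.Random(seed)
    w = w0
    for _ in range(epochs):
        x, y = rng.choice(data)        # draw one sample
        grad = 2 * (w * x - y) * x     # d/dw of the per-sample loss (w*x - y)^2
        w = w - alpha * grad           # gradient descent step
    return w

print(sgd(data))  # ~3.0
```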