Announcements10601/lectures/10601_Fa20... · 2020. 12. 11. · Announcements Assignments HW3 Mon,...
Announcements
Assignments
▪ HW3
▪ Mon, 9/28, 11:59 pm
Midterm 1
▪ Mon, 10/5
▪ See Piazza for details
▪ Fill out swap-section / conflict form by Friday
Plan
Last time
▪ Regression
▪ Linear regression
▪ Optimization for linear regression
Today
▪ Optimization for linear regression
▪ Linear and convex function
▪ (Batch) Gradient descent
▪ Closed-form solution
▪ Stochastic gradient descent
Introduction to Machine Learning
Linear Regression and Optimization
Instructor: Pat Virtue
Linear Regression
Selling my car
Linear in Higher Dimensions
What are these linear shapes called for 1-D, 2-D, 3-D, M-D input?
$y = \mathbf{w}^T\mathbf{x} + b$
$\mathbf{w}^T\mathbf{x} + b = 0$
$\mathbf{w}^T\mathbf{x} + b \geq 0$
$\mathbf{x} \in \mathbb{R}$, $\mathbf{x} \in \mathbb{R}^2$, $\mathbf{x} \in \mathbb{R}^3$, $\mathbf{x} \in \mathbb{R}^M$
Linear Function
If $f(\mathbf{x})$ is linear, then:
▪ $f(\mathbf{x} + \mathbf{z}) = f(\mathbf{x}) + f(\mathbf{z})$
▪ $f(\alpha\mathbf{x}) = \alpha f(\mathbf{x}) \quad \forall \alpha$
▪ $f(\alpha\mathbf{x} + (1 - \alpha)\mathbf{z}) = \alpha f(\mathbf{x}) + (1 - \alpha) f(\mathbf{z}) \quad \forall \alpha$
Piazza Poll 1
Based on the following definition of linear, is the equation for a line, $y = wx + b$, linear? Example: $y = 3x + 5$
$f(\mathbf{x})$ is linear if and only if:
▪ $f(\mathbf{x} + \mathbf{z}) = f(\mathbf{x}) + f(\mathbf{z})$ and
▪ $f(\alpha\mathbf{x}) = \alpha f(\mathbf{x}) \quad \forall \alpha$
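One way to explore this poll is to test the two properties numerically on the example line. The sketch below is illustrative only: the helper names `is_additive` and `is_homogeneous` are made up here, and the test points are arbitrary. Passing at a few points does not prove linearity, but a single failure is enough to rule it out.

```python
def f(x, w=3.0, b=5.0):
    """The line from the poll example: y = 3x + 5."""
    return w * x + b


def is_additive(func, x, z, tol=1e-9):
    """Check f(x + z) == f(x) + f(z) for one pair of inputs."""
    return abs(func(x + z) - (func(x) + func(z))) < tol


def is_homogeneous(func, x, alpha, tol=1e-9):
    """Check f(alpha * x) == alpha * f(x) for one input and one scalar."""
    return abs(func(alpha * x) - alpha * func(x)) < tol


# With a nonzero intercept, both properties fail at these test points:
# f(1 + 2) = 14 but f(1) + f(2) = 8 + 11 = 19, and f(2 * 1) = 11 but 2 * f(1) = 16.
line_additive = is_additive(f, 1.0, 2.0)
line_homogeneous = is_homogeneous(f, 1.0, 2.0)

# Dropping the intercept (b = 0) gives y = 3x, which passes both checks.
g = lambda x: f(x, b=0.0)
origin_additive = is_additive(g, 1.0, 2.0)
origin_homogeneous = is_homogeneous(g, 1.0, 2.0)
```

Under this definition, a line with a nonzero intercept is affine rather than linear, even though it is "linear" in the everyday sense.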
Linear Regression
Linear algebra formulation
Linear Regression
Error and objectives
Linear Regression
Linear algebra formulation
Previous Piazza Poll
For fixed data and a fixed slope $w$, what shape do we get by plotting the MSE objective vs. the intercept $b$?
A. Line
B. Plane
C. Half-plane
D. Convex parabola (U-shape)
E. Concave parabola (upside-down U)
F. None of the above
[Figure: plot with axes x and y]
Linear Regression
Optimizing the objective
[Figure: plot with axes x and y]
$J(w, b) = \frac{1}{2}\left[\left(y^{(1)} - (w x^{(1)} + b)\right)^2 + \left(y^{(2)} - (w x^{(2)} + b)\right)^2\right]$
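To make the two-point objective concrete, here is a minimal sketch that evaluates the mean squared error for a given slope and intercept. The data values and the function name `mse_objective` are made up for illustration and do not come from the lecture.

```python
def mse_objective(w, b, xs, ys):
    """J(w, b) = (1/N) * sum_i (y_i - (w * x_i + b))^2."""
    n = len(xs)
    return sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys)) / n


# Two made-up training points (N = 2, so the leading factor is 1/2).
xs = [1.0, 2.0]
ys = [2.0, 4.0]

# The line y = 2x passes through both points, so the objective is zero.
j_exact = mse_objective(2.0, 0.0, xs, ys)

# Raising the intercept by 1 misses each point by 1, so J = (1 + 1) / 2 = 1.
j_offset = mse_objective(2.0, 1.0, xs, ys)
```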
Linear Regression
Methods for optimizing the objective
▪ Grid search
▪ Random search
▪ Closed-form solution
▪ (Batch) Gradient descent
▪ Stochastic gradient descent
[Figure: $J(w, b)$ plotted against $w$ and $b$]
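The first two methods in the list can be sketched in a few lines. This is a hedged illustration rather than the lecture's implementation: the data, the search range of [-5, 5], the grid resolution, and the function names are all assumptions chosen for the example.

```python
import random


def mse_objective(w, b, xs, ys):
    """J(w, b) = (1/N) * sum_i (y_i - (w * x_i + b))^2."""
    return sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]


def grid_search(xs, ys, lo=-5.0, hi=5.0, steps=21):
    """Evaluate J on an evenly spaced (w, b) grid and keep the best point."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return min((mse_objective(w, b, xs, ys), w, b) for w in grid for b in grid)


def random_search(xs, ys, lo=-5.0, hi=5.0, trials=2000, seed=0):
    """Evaluate J at uniformly random (w, b) points and keep the best point."""
    rng = random.Random(seed)
    best = (float("inf"), None, None)
    for _ in range(trials):
        w, b = rng.uniform(lo, hi), rng.uniform(lo, hi)
        best = min(best, (mse_objective(w, b, xs, ys), w, b))
    return best


grid_best = grid_search(xs, ys)      # (J, w, b); the grid contains w = 2, b = 1
rand_j, rand_w, rand_b = random_search(xs, ys)
```

Both approaches only evaluate the objective, so they need no gradients, but the number of evaluations grows quickly with the number of parameters.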
Optimization
Linear function
If $f(\mathbf{x})$ is linear, then:
▪ $f(\mathbf{x} + \mathbf{z}) = f(\mathbf{x}) + f(\mathbf{z})$
▪ $f(\alpha\mathbf{x}) = \alpha f(\mathbf{x}) \quad \forall \alpha$
▪ $f(\alpha\mathbf{x} + (1 - \alpha)\mathbf{z}) = \alpha f(\mathbf{x}) + (1 - \alpha) f(\mathbf{z}) \quad \forall \alpha$
Optimization
Convex function
If $f(\mathbf{x})$ is convex, then:
▪ $f(\alpha\mathbf{x} + (1 - \alpha)\mathbf{z}) \leq \alpha f(\mathbf{x}) + (1 - \alpha) f(\mathbf{z}) \quad \forall\, 0 \leq \alpha \leq 1$
Convex optimization
If $f(\mathbf{x})$ is convex, then:
▪ Every local minimum is also a global minimum ☺
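The convexity inequality can be spot-checked numerically for the MSE objective. The sketch below uses made-up data and a hypothetical helper `satisfies_convexity`. Random checks cannot prove convexity, but they illustrate the definition, and a concave function shows what a violation looks like.

```python
import random


def mse_objective(theta, xs, ys):
    """MSE of the line y = w * x + b, with theta = (w, b)."""
    w, b = theta
    return sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def satisfies_convexity(func, p, q, alpha, tol=1e-9):
    """Check f(alpha*p + (1-alpha)*q) <= alpha*f(p) + (1-alpha)*f(q)."""
    mix = tuple(alpha * pi + (1 - alpha) * qi for pi, qi in zip(p, q))
    return func(mix) <= alpha * func(p) + (1 - alpha) * func(q) + tol


# Made-up data; any fixed data set gives an objective that is convex in (w, b).
xs = [0.0, 1.0, 2.0]
ys = [1.0, 0.0, 3.0]
J = lambda theta: mse_objective(theta, xs, ys)

rng = random.Random(0)
all_hold = all(
    satisfies_convexity(
        J,
        (rng.uniform(-10, 10), rng.uniform(-10, 10)),
        (rng.uniform(-10, 10), rng.uniform(-10, 10)),
        rng.random(),  # alpha in [0, 1)
    )
    for _ in range(1000)
)

# A concave function (an upside-down U) violates the inequality:
# f(0) = 0, but the average of f(-1) and f(1) is -1.
neg_square = lambda theta: -(theta[0] ** 2)
concave_holds = satisfies_convexity(neg_square, (-1.0,), (1.0,), 0.5)
```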
Linear Regression
Optimizing the objective
[Figure: plot with axes x and y]
Optimization
Gradients
Optimization
Gradient descent
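Linear Regression
Expanding objective before computing gradient
$$\begin{aligned}
J(\boldsymbol{\theta}) &= \frac{1}{N}\|\mathbf{y} - \mathbf{X}\boldsymbol{\theta}\|_2^2 \\
&= \frac{1}{N}(\mathbf{y} - \mathbf{X}\boldsymbol{\theta})^T(\mathbf{y} - \mathbf{X}\boldsymbol{\theta}) \\
&= \frac{1}{N}(\mathbf{y}^T - \boldsymbol{\theta}^T\mathbf{X}^T)(\mathbf{y} - \mathbf{X}\boldsymbol{\theta}) \\
&= \frac{1}{N}\left(\mathbf{y}^T\mathbf{y} - \boldsymbol{\theta}^T\mathbf{X}^T\mathbf{y} - \mathbf{y}^T\mathbf{X}\boldsymbol{\theta} + \boldsymbol{\theta}^T\mathbf{X}^T\mathbf{X}\boldsymbol{\theta}\right) \\
&= \frac{1}{N}\left(\mathbf{y}^T\mathbf{y} - 2\boldsymbol{\theta}^T\mathbf{X}^T\mathbf{y} + \boldsymbol{\theta}^T\mathbf{X}^T\mathbf{X}\boldsymbol{\theta}\right)
\end{aligned}$$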
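Linear Regression
Gradient of objective with respect to parameters
$$\begin{aligned}
J(\boldsymbol{\theta}) &= \frac{1}{N}\|\mathbf{y} - \mathbf{X}\boldsymbol{\theta}\|_2^2 \\
&= \frac{1}{N}\left(\mathbf{y}^T\mathbf{y} - 2\boldsymbol{\theta}^T\mathbf{X}^T\mathbf{y} + \boldsymbol{\theta}^T\mathbf{X}^T\mathbf{X}\boldsymbol{\theta}\right)
\end{aligned}$$
$$\begin{aligned}
\nabla J(\boldsymbol{\theta}) &= \frac{1}{N}\left(0 - 2\mathbf{X}^T\mathbf{y} + 2\boldsymbol{\theta}^T\mathbf{X}^T\mathbf{X}\right) \\
&= \frac{1}{N}\left(0 - 2\mathbf{X}^T\mathbf{y} + 2\mathbf{X}^T\mathbf{X}\boldsymbol{\theta}\right) \\
&= \frac{2}{N}\left(-\mathbf{X}^T\mathbf{y} + \mathbf{X}^T\mathbf{X}\boldsymbol{\theta}\right)
\end{aligned}$$
The second line rewrites the last term in column form, $(\boldsymbol{\theta}^T\mathbf{X}^T\mathbf{X})^T = \mathbf{X}^T\mathbf{X}\boldsymbol{\theta}$, so that it matches the column vector $\mathbf{X}^T\mathbf{y}$.
Matrix calculus identities used:
$\frac{\partial \mathbf{z}^T\mathbf{u}}{\partial \mathbf{z}} = \mathbf{u}$
-- or --
$\frac{\partial \mathbf{z}^T\mathbf{u}}{\partial \mathbf{z}} = \mathbf{u}^T$
$\frac{\partial \mathbf{z}^T\mathbf{A}\mathbf{z}}{\partial \mathbf{z}} = (\mathbf{A} + \mathbf{A}^T)\mathbf{z}$
-- or --
$\frac{\partial \mathbf{z}^T\mathbf{A}\mathbf{z}}{\partial \mathbf{z}} = \mathbf{z}^T(\mathbf{A} + \mathbf{A}^T)$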
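A common sanity check for a derived gradient is to compare it against central finite differences. The sketch below does this for the column-form result $\frac{2}{N}(\mathbf{X}^T\mathbf{X}\boldsymbol{\theta} - \mathbf{X}^T\mathbf{y})$ on random data. It assumes NumPy is available, and the function names and data shapes are illustrative choices.

```python
import numpy as np


def objective(theta, X, y):
    """J(theta) = (1/N) * ||y - X theta||_2^2."""
    residual = y - X @ theta
    return (residual @ residual) / len(y)


def gradient(theta, X, y):
    """Analytic gradient: (2/N) * (X^T X theta - X^T y)."""
    return (2.0 / len(y)) * (X.T @ X @ theta - X.T @ y)


def numerical_gradient(theta, X, y, eps=1e-6):
    """Central finite differences, one coordinate of theta at a time."""
    grad = np.zeros_like(theta)
    for j in range(len(theta)):
        step = np.zeros_like(theta)
        step[j] = eps
        grad[j] = (objective(theta + step, X, y)
                   - objective(theta - step, X, y)) / (2 * eps)
    return grad


# Random made-up problem: N = 20 examples, 3 parameters.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = rng.normal(size=20)
theta = rng.normal(size=3)

max_gap = np.max(np.abs(gradient(theta, X, y) - numerical_gradient(theta, X, y)))
```

Because the objective is quadratic, the two gradients should agree up to floating-point rounding.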
Linear Regression
Closed-form solution
$\nabla J(\boldsymbol{\theta}) = \frac{2}{N}\left(-\mathbf{X}^T\mathbf{y} + \mathbf{X}^T\mathbf{X}\boldsymbol{\theta}\right)$
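Setting this gradient to zero gives the normal equations $\mathbf{X}^T\mathbf{X}\boldsymbol{\theta} = \mathbf{X}^T\mathbf{y}$. A minimal sketch of solving them with NumPy follows. It assumes $\mathbf{X}^T\mathbf{X}$ is invertible, uses made-up data, and folds the intercept into $\boldsymbol{\theta}$ through a column of ones. The name `fit_closed_form` is hypothetical.

```python
import numpy as np


def fit_closed_form(X, y):
    """Solve the normal equations X^T X theta = X^T y.

    Assumes X^T X is invertible; np.linalg.solve raises LinAlgError otherwise.
    """
    return np.linalg.solve(X.T @ X, X.T @ y)


# Made-up data from y = 2x + 1. The column of ones carries the intercept b,
# so theta = [w, b].
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([x, np.ones_like(x)])
y = 2.0 * x + 1.0

theta_hat = fit_closed_form(X, y)

# The gradient should vanish at the closed-form solution.
grad_at_solution = (2.0 / len(y)) * (X.T @ X @ theta_hat - X.T @ y)
```

Solving the linear system directly is generally preferred over forming an explicit matrix inverse.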
Linear Regression
Number of solutions
A Note on Matrix Rank
Underlying dimensionality of the data
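As a hedged illustration of how rank relates to the number of solutions, the sketch below builds a design matrix with a redundant column, so its rank is lower than its number of columns. The data and variable names are made up. In this rank-deficient case, more than one parameter vector fits the data exactly.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])

# Two independent columns: the feature and a column of ones.
X_full = np.column_stack([x, np.ones_like(x)])

# Adding a column that is just 2 * x contributes no new information.
X_deficient = np.column_stack([x, 2.0 * x, np.ones_like(x)])

rank_full = np.linalg.matrix_rank(X_full)            # 2 columns, rank 2
rank_deficient = np.linalg.matrix_rank(X_deficient)  # 3 columns, still rank 2

y = 3.0 * x + 1.0

# lstsq returns the minimum-norm least-squares solution when X is rank deficient.
theta_min_norm, _, rank_reported, _ = np.linalg.lstsq(X_deficient, y, rcond=None)

# A different parameter vector that also fits the data exactly.
theta_other = np.array([3.0, 0.0, 1.0])
```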
Linear Regression
Methods for optimizing the objective
▪ Grid search
▪ Random search
▪ Closed-form solution
▪ (Batch) Gradient descent
▪ Stochastic gradient descent
[Figure: $J(w, b)$ plotted against $w$ and $b$]
Linear Regression Gradient Descent
What happens in gradient descent when we have N=1,000,000 training points?
[Figure: plot with axes Input x and Output y]
(Batch) Gradient Descent
Slide credit: CMU MLD Matt Gormley
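The algorithm on this slide did not survive extraction, so the following is only a minimal sketch of batch gradient descent for the MSE objective, not the slide's pseudocode. The learning rate, step count, zero initialization, and data are assumptions chosen for illustration.

```python
import numpy as np


def batch_gradient_descent(X, y, lr=0.05, num_steps=2000):
    """Minimize J(theta) = (1/N) * ||y - X theta||^2 with full-batch updates."""
    n = len(y)
    theta = np.zeros(X.shape[1])  # uninformed initialization
    for _ in range(num_steps):
        # Every step uses all N examples to compute the gradient.
        grad = (2.0 / n) * (X.T @ (X @ theta - y))
        theta = theta - lr * grad
    return theta


# Made-up data from y = 2x + 1; the column of ones carries the intercept.
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([x, np.ones_like(x)])
y = 2.0 * x + 1.0

theta_gd = batch_gradient_descent(X, y)
```

Each update touches every training example, which is why a single step becomes expensive when N is in the millions.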
Stochastic Gradient Descent (SGD)
We need a per-example objective:
Slide credit: CMU MLD Matt Gormley
Linear Regression
Optimizing the objective
[Figure: plot with axes x and y]
Stochastic Gradient Descent
Stochastic Gradient Descent (SGD)
We need a per-example objective:
In practice, it is common to implement SGD using sampling without replacement (i.e. shuffle({1,2,…N})), even though most of the theory is for sampling with replacement (i.e. Uniform({1,2,…N})).
Slide credit: CMU MLD Matt Gormley
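The two sampling schemes can be compared in a short sketch. The per-example objective is not shown in the extracted slide, so the code assumes $J^{(i)}(\boldsymbol{\theta}) = (y^{(i)} - \boldsymbol{\theta}^T\mathbf{x}^{(i)})^2$, which averages to the MSE used earlier. The learning rate, epoch count, and data are also illustrative assumptions.

```python
import numpy as np


def sgd(X, y, lr=0.05, num_epochs=200, with_replacement=False, seed=0):
    """SGD on the assumed per-example objective (y_i - theta^T x_i)^2."""
    rng = np.random.default_rng(seed)
    n = len(y)
    theta = np.zeros(X.shape[1])
    for _ in range(num_epochs):
        if with_replacement:
            # Uniform({1..N}) drawn N times: some examples may repeat or be skipped.
            order = rng.integers(0, n, size=n)
        else:
            # shuffle({1..N}): every example is visited exactly once per epoch.
            order = rng.permutation(n)
        for i in order:
            # Gradient of the per-example objective; one update per example.
            grad_i = 2.0 * (X[i] @ theta - y[i]) * X[i]
            theta = theta - lr * grad_i
    return theta


# Made-up noise-free data from y = 2x + 1; the column of ones carries the intercept.
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([x, np.ones_like(x)])
y = 2.0 * x + 1.0

theta_shuffle = sgd(X, y, with_replacement=False)
theta_uniform = sgd(X, y, with_replacement=True)
```

On this tiny noise-free data set, both variants should approach the same parameters. With noisy data and a fixed learning rate, SGD iterates typically keep fluctuating around the minimum instead of settling exactly.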
Convergence Curves
• SGD reduces MSE much more rapidly than GD
• For GD / SGD, training MSE is initially large due to uninformed initialization
[Figure: convergence curves for Gradient Descent, SGD, and Closed-form (normal eq.s)]
• Def: an epoch is a single pass through the training data
1. For GD, only one update per epoch
2. For SGD, N updates per epoch, where N = (# train examples)
Slide credit: CMU MLD Matt Gormley and Eric P. Xing