Lesson 26 (Chapter 17): Unconstrained Optimization II: Data Fitting
Math 20
November 21, 2007

Announcements
- Problem Set 10 assigned today. Due November 28.
- Next office hours: today, 1–3, by appointment.
- Midterm II: Thursday, 12/6, 7–8:30pm in Hall A.
Theorem (The Second Derivative Test)
Let $f(x, y)$ be a function of two variables, and let $(a, b)$ be a critical point of $f$. Then
- If $\frac{\partial^2 f}{\partial x^2}\frac{\partial^2 f}{\partial y^2} - \left(\frac{\partial^2 f}{\partial x\,\partial y}\right)^2 > 0$ and $\frac{\partial^2 f}{\partial x^2} > 0$, the critical point is a local minimum.
- If $\frac{\partial^2 f}{\partial x^2}\frac{\partial^2 f}{\partial y^2} - \left(\frac{\partial^2 f}{\partial x\,\partial y}\right)^2 > 0$ and $\frac{\partial^2 f}{\partial x^2} < 0$, the critical point is a local maximum.
- If $\frac{\partial^2 f}{\partial x^2}\frac{\partial^2 f}{\partial y^2} - \left(\frac{\partial^2 f}{\partial x\,\partial y}\right)^2 < 0$, the critical point is a saddle point.
All derivatives are evaluated at the critical point $(a, b)$.
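A quick way to sanity-check the test on a given function is to compute the Hessian entries symbolically. Here is a minimal sketch using SymPy; the helper name classify_critical_point is mine, not from the lecture.

```python
import sympy as sp

x, y = sp.symbols("x y")

def classify_critical_point(f, a, b):
    """Second derivative test for f(x, y) at the critical point (a, b)."""
    fxx = sp.diff(f, x, 2).subs({x: a, y: b})
    fyy = sp.diff(f, y, 2).subs({x: a, y: b})
    fxy = sp.diff(f, x, y).subs({x: a, y: b})
    D = fxx * fyy - fxy**2  # determinant of the Hessian at (a, b)
    if D > 0:
        return "local minimum" if fxx > 0 else "local maximum"
    if D < 0:
        return "saddle point"
    return "inconclusive (Hessian determinant is zero)"
```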
Return to the example
Let $f(x, y) = 8x^3 - 24xy + y^3$. Classify the critical points.
\[
\frac{\partial^2 f}{\partial x^2} = 48x \qquad
\frac{\partial^2 f}{\partial x\,\partial y} = -24 \qquad
\frac{\partial^2 f}{\partial y\,\partial x} = -24 \qquad
\frac{\partial^2 f}{\partial y^2} = 6y
\]
- $Hf(0, 0) = \begin{pmatrix} 0 & -24 \\ -24 & 0 \end{pmatrix}$, which has negative determinant. Hence $(0, 0)$ is a saddle point.
- $Hf(2, 4) = 24\begin{pmatrix} 4 & -1 \\ -1 & 1 \end{pmatrix}$ which, since the determinant is positive and the top left entry is positive, indicates a local minimum.
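Continuing the SymPy sketch above with the lecture's $f$ reproduces this classification; the real critical points are $(0, 0)$ and $(2, 4)$:

```python
# The function from the example
f = 8*x**3 - 24*x*y + y**3

# Critical points solve f_x = 0 and f_y = 0; keep only the real ones
for pt in sp.solve([sp.diff(f, x), sp.diff(f, y)], [x, y], dict=True):
    a, b = pt[x], pt[y]
    if a.is_real and b.is_real:
        print((a, b), "->", classify_critical_point(f, a, b))
# (0, 0) -> saddle point
# (2, 4) -> local minimum
```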
Online Demo
Try this site (thanks to Tony Pino):
http://www.slu.edu/classes/maymk/banchoff/LevelCurve.html
Launch the applet and enter:
- f(x, y) = x^3 - 3*x*y + y^3/8 (1/8 of the f from the example)
- x from −1 to 10 in 50 steps
- y from −1 to 10 in 50 steps
- z from −10 to 10 in 50 steps
Remarks
- The Hessian matrix will always be symmetric in our cases.
- If the Hessian has determinant zero, nothing can be said from this theorem:
  - $f(x, y) = x^4 + y^4$ has a local min at $(0, 0)$
  - $f(x, y) = -x^4 - y^4$ has a local max at $(0, 0)$
  - $f(x, y) = x^4 - y^4$ has a saddle point at $(0, 0)$
  In each case $Hf(x, y) = \begin{pmatrix} \pm 12x^2 & 0 \\ 0 & \pm 12y^2 \end{pmatrix}$, so $Hf(0, 0)$ is the zero matrix.
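On these quartic examples the same sketch reports the test as inconclusive, matching the remark; each case really needs a separate argument:

```python
# All three quartics have a zero Hessian at the origin, so the theorem is silent
for g in (x**4 + y**4, -x**4 - y**4, x**4 - y**4):
    print(g, "->", classify_critical_point(g, 0, 0))
# every line prints "inconclusive (Hessian determinant is zero)"
```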
Regression
We are going to find the line in the plane which is “closest” to the four points
(0, 1), (1, 1), (2, 2), (3, 2)
The equations for the slope m and y-intercept b take the form
\[
\begin{aligned}
m \cdot 0 + b &= 1 \\
m \cdot 1 + b &= 1 \\
m \cdot 2 + b &= 2 \\
m \cdot 3 + b &= 2
\end{aligned}
\]
Or, in matrix form:
\[
\begin{pmatrix} 0 & 1 \\ 1 & 1 \\ 2 & 1 \\ 3 & 1 \end{pmatrix}
\begin{pmatrix} m \\ b \end{pmatrix}
=
\begin{pmatrix} 1 \\ 1 \\ 2 \\ 2 \end{pmatrix}
\]
These equations are inconsistent!
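Since the system has no exact solution, one can ask a numerical library for the least-squares solution directly. A minimal NumPy sketch, used here only as an independent check on the answer derived below:

```python
import numpy as np

# Design matrix and right-hand side for the four data points
A = np.array([[0, 1], [1, 1], [2, 1], [3, 1]], dtype=float)
yvals = np.array([1, 1, 2, 2], dtype=float)

# lstsq minimizes ||A @ (m, b) - y||^2, which is exactly the SSE below
(m, b), *_ = np.linalg.lstsq(A, yvals, rcond=None)
print(m, b)  # 0.4 0.9, i.e. the line y = 0.4x + 0.9
```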
Treating m and b as variables now, write down the sum of the squares of the distances between each data point $y_i$ and the corresponding point $mx_i + b$.

Solution
I'll work in general:
\[
\mathrm{SSE} = \sum_{i=1}^{n} (y_i - mx_i - b)^2
\]
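As a concrete illustration, the SSE of any candidate line through the four data points can be evaluated directly; the helper sse is mine, purely for this demo:

```python
def sse(m, b, xs, ys):
    """Sum of squared vertical distances from the data points to y = mx + b."""
    return sum((yi - m * xi - b) ** 2 for xi, yi in zip(xs, ys))

xs, ys = [0, 1, 2, 3], [1, 1, 2, 2]
print(sse(0.4, 0.9, xs, ys))  # ~0.2, the minimum, as derived below
print(sse(0.5, 1.0, xs, ys))  # 0.5, any other line does worse
```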
Find the m and b which minimize the SSE.

Solution
Since
\[
\mathrm{SSE} = \sum_{i=1}^{n} (y_i - mx_i - b)^2
\]
we have
\[
\frac{\partial\,\mathrm{SSE}}{\partial m}
= \sum_{i=1}^{n} 2(y_i - mx_i - b)(-x_i)
= -2\sum_{i=1}^{n} x_i y_i + 2m\sum_{i=1}^{n} x_i^2 + 2b\sum_{i=1}^{n} x_i
\]
\[
\frac{\partial\,\mathrm{SSE}}{\partial b}
= \sum_{i=1}^{n} 2(y_i - mx_i - b)(-1)
= -2\sum_{i=1}^{n} y_i + 2m\sum_{i=1}^{n} x_i + 2bn
\]
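A CAS can carry out the same differentiation and solve for the critical point on the example data. A minimal SymPy sketch (variable names are mine):

```python
import sympy as sp

m, b = sp.symbols("m b")
xs, ys = [0, 1, 2, 3], [1, 1, 2, 2]

# SSE for the example data, as a polynomial in m and b
SSE = sum((yi - m * xi - b) ** 2 for xi, yi in zip(xs, ys))

# Set both partial derivatives to zero and solve
print(sp.solve([sp.diff(SSE, m), sp.diff(SSE, b)], [m, b]))
# {m: 2/5, b: 9/10}
```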
If $(m, b)$ is a critical point, then
\[
-2\sum_{i=1}^{n} x_i y_i + 2m\sum_{i=1}^{n} x_i^2 + 2b\sum_{i=1}^{n} x_i = 0
\implies
m\sum_{i=1}^{n} x_i^2 + b\sum_{i=1}^{n} x_i = \sum_{i=1}^{n} x_i y_i
\]
\[
-2\sum_{i=1}^{n} y_i + 2m\sum_{i=1}^{n} x_i + 2bn = 0
\implies
m\sum_{i=1}^{n} x_i + bn = \sum_{i=1}^{n} y_i
\]
Dividing each equation through by $n$: if $E$ is the expected value (average) of a random variable, then we get
\[
E(X^2)\,m + E(X)\,b = E(XY)
\]
\[
E(X)\,m + b = E(Y)
\]
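For the four data points, this 2-by-2 system can be set up and solved numerically; a small NumPy sketch, assuming the same data as above:

```python
import numpy as np

xs = np.array([0., 1., 2., 3.])
ys = np.array([1., 1., 2., 2.])

# Sample moments standing in for the expected values
EX, EY, EX2, EXY = xs.mean(), ys.mean(), (xs**2).mean(), (xs * ys).mean()

# The system  E(X^2) m + E(X) b = E(XY),  E(X) m + b = E(Y)
m, b = np.linalg.solve(np.array([[EX2, EX], [EX, 1.0]]),
                       np.array([EXY, EY]))
print(m, b)  # 0.4 0.9
```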
Using Cramer's Rule we have
\[
m = \frac{\begin{vmatrix} E(XY) & E(X) \\ E(Y) & 1 \end{vmatrix}}
         {\begin{vmatrix} E(X^2) & E(X) \\ E(X) & 1 \end{vmatrix}}
  = \frac{E(XY) - E(X)E(Y)}{E(X^2) - E(X)^2}
  = \frac{\mathrm{Cov}(X, Y)}{\mathrm{Var}(X)}
\]
Also,
\[
b = \frac{\begin{vmatrix} E(X^2) & E(XY) \\ E(X) & E(Y) \end{vmatrix}}
         {\begin{vmatrix} E(X^2) & E(X) \\ E(X) & 1 \end{vmatrix}}
  = \frac{E(X^2)E(Y) - E(X)E(XY)}{E(X^2) - E(X)^2}
  = E(Y) - E(X)\,\frac{\mathrm{Cov}(X, Y)}{\mathrm{Var}(X)}
\]
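As a final check, evaluating these closed forms on the example data recovers the least-squares line, and agrees with NumPy's own fit (np.polyfit used purely as an independent reference):

```python
import numpy as np

xs = np.array([0., 1., 2., 3.])
ys = np.array([1., 1., 2., 2.])

# m = Cov(X, Y) / Var(X)  and  b = E(Y) - E(X) m
m = ((xs * ys).mean() - xs.mean() * ys.mean()) / ((xs**2).mean() - xs.mean()**2)
b = ys.mean() - xs.mean() * m
print(m, b)                    # 0.4 0.9, so the best line is y = 0.4x + 0.9

# Agrees with NumPy's degree-1 least-squares fit
print(np.polyfit(xs, ys, 1))   # [0.4 0.9]
```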