Lesson 26 (Chapter 17): Unconstrained Optimization II: Data Fitting
Math 20
November 21, 2007

Announcements
- Problem Set 10 assigned today. Due November 28.
- Next office hours: today, 1–3, by appointment.
- Midterm II: Thursday, 12/6, 7–8:30pm in Hall A.
Theorem (The Second Derivative Test)
Let $f(x, y)$ be a function of two variables, and let $(a, b)$ be a critical point of $f$. Then
- If $\frac{\partial^2 f}{\partial x^2}\frac{\partial^2 f}{\partial y^2} - \left(\frac{\partial^2 f}{\partial x\,\partial y}\right)^2 > 0$ and $\frac{\partial^2 f}{\partial x^2} > 0$, the critical point is a local minimum.
- If $\frac{\partial^2 f}{\partial x^2}\frac{\partial^2 f}{\partial y^2} - \left(\frac{\partial^2 f}{\partial x\,\partial y}\right)^2 > 0$ and $\frac{\partial^2 f}{\partial x^2} < 0$, the critical point is a local maximum.
- If $\frac{\partial^2 f}{\partial x^2}\frac{\partial^2 f}{\partial y^2} - \left(\frac{\partial^2 f}{\partial x\,\partial y}\right)^2 < 0$, the critical point is a saddle point.
All derivatives are evaluated at the critical point $(a, b)$.
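A quick way to sanity-check the test on a given function is to compute the Hessian entries symbolically. Here is a minimal sketch using SymPy; the helper name classify_critical_point is mine, not from the lecture.

```python
import sympy as sp

x, y = sp.symbols("x y")

def classify_critical_point(f, a, b):
    """Second derivative test for f(x, y) at the critical point (a, b)."""
    fxx = sp.diff(f, x, 2).subs({x: a, y: b})
    fyy = sp.diff(f, y, 2).subs({x: a, y: b})
    fxy = sp.diff(f, x, y).subs({x: a, y: b})
    D = fxx * fyy - fxy**2  # determinant of the Hessian at (a, b)
    if D > 0:
        return "local minimum" if fxx > 0 else "local maximum"
    if D < 0:
        return "saddle point"
    return "inconclusive (Hessian determinant is zero)"
```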
Return to the example
Let $f(x, y) = 8x^3 - 24xy + y^3$. Classify the critical points.
\[
\frac{\partial^2 f}{\partial x^2} = 48x \qquad
\frac{\partial^2 f}{\partial x\,\partial y} = -24 \qquad
\frac{\partial^2 f}{\partial y\,\partial x} = -24 \qquad
\frac{\partial^2 f}{\partial y^2} = 6y
\]
- $Hf(0, 0) = \begin{pmatrix} 0 & -24 \\ -24 & 0 \end{pmatrix}$, which has negative determinant. Hence $(0, 0)$ is a saddle point.
- $Hf(2, 4) = 24\begin{pmatrix} 4 & -1 \\ -1 & 1 \end{pmatrix}$ which, since the determinant is positive and the top left entry is positive, indicates a local minimum.
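Continuing the SymPy sketch above with the lecture's $f$ reproduces this classification; the real critical points are $(0, 0)$ and $(2, 4)$:

```python
# The function from the example
f = 8*x**3 - 24*x*y + y**3

# Critical points solve f_x = 0 and f_y = 0; keep only the real ones
for pt in sp.solve([sp.diff(f, x), sp.diff(f, y)], [x, y], dict=True):
    a, b = pt[x], pt[y]
    if a.is_real and b.is_real:
        print((a, b), "->", classify_critical_point(f, a, b))
# (0, 0) -> saddle point
# (2, 4) -> local minimum
```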
Online Demo
Try this site (thanks to Tony Pino):
http://www.slu.edu/classes/maymk/banchoff/LevelCurve.html
Launch the applet and enter:
- f(x, y) = x^3 - 3*x*y + y^3/8 (1/8 of the f from the example)
- x from −1 to 10 in 50 steps
- y from −1 to 10 in 50 steps
- z from −10 to 10 in 50 steps
Remarks
- The Hessian matrix will always be symmetric in our cases.
- If the Hessian has determinant zero, nothing can be said from this theorem:
  - $f(x, y) = x^4 + y^4$ has a local min at $(0, 0)$
  - $f(x, y) = -x^4 - y^4$ has a local max at $(0, 0)$
  - $f(x, y) = x^4 - y^4$ has a saddle point at $(0, 0)$
  In each case $Hf(x, y) = \begin{pmatrix} \pm 12x^2 & 0 \\ 0 & \pm 12y^2 \end{pmatrix}$, so $Hf(0, 0)$ is the zero matrix.
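On these quartic examples the same sketch reports the test as inconclusive, matching the remark; each case really needs a separate argument:

```python
# All three quartics have a zero Hessian at the origin, so the theorem is silent
for g in (x**4 + y**4, -x**4 - y**4, x**4 - y**4):
    print(g, "->", classify_critical_point(g, 0, 0))
# every line prints "inconclusive (Hessian determinant is zero)"
```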
Regression
We are going to find the line in the plane which is “closest” to the four points
(0, 1), (1, 1), (2, 2), (3, 2)
The equations for the slope m and y-intercept b take the form
\[
\begin{aligned}
m \cdot 0 + b &= 1 \\
m \cdot 1 + b &= 1 \\
m \cdot 2 + b &= 2 \\
m \cdot 3 + b &= 2
\end{aligned}
\]
Or, in matrix form:
\[
\begin{pmatrix} 0 & 1 \\ 1 & 1 \\ 2 & 1 \\ 3 & 1 \end{pmatrix}
\begin{pmatrix} m \\ b \end{pmatrix}
=
\begin{pmatrix} 1 \\ 1 \\ 2 \\ 2 \end{pmatrix}
\]
These equations are inconsistent!
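Since the system has no exact solution, one can ask a numerical library for the least-squares solution directly. A minimal NumPy sketch, used here only as an independent check on the answer derived below:

```python
import numpy as np

# Design matrix and right-hand side for the four data points
A = np.array([[0, 1], [1, 1], [2, 1], [3, 1]], dtype=float)
yvals = np.array([1, 1, 2, 2], dtype=float)

# lstsq minimizes ||A @ (m, b) - y||^2, which is exactly the SSE below
(m, b), *_ = np.linalg.lstsq(A, yvals, rcond=None)
print(m, b)  # 0.4 0.9, i.e. the line y = 0.4x + 0.9
```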
Treating m and b as variables now, write down the sum of the squares of the distances between each data point $y_i$ and the corresponding point $mx_i + b$.

Solution
I'll work in general:
\[
\mathrm{SSE} = \sum_{i=1}^{n} (y_i - mx_i - b)^2
\]
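As a concrete illustration, the SSE of any candidate line through the four data points can be evaluated directly; the helper sse is mine, purely for this demo:

```python
def sse(m, b, xs, ys):
    """Sum of squared vertical distances from the data points to y = mx + b."""
    return sum((yi - m * xi - b) ** 2 for xi, yi in zip(xs, ys))

xs, ys = [0, 1, 2, 3], [1, 1, 2, 2]
print(sse(0.4, 0.9, xs, ys))  # ~0.2, the minimum, as derived below
print(sse(0.5, 1.0, xs, ys))  # 0.5, any other line does worse
```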
Find the m and b which minimize the SSE.

Solution
Since
\[
\mathrm{SSE} = \sum_{i=1}^{n} (y_i - mx_i - b)^2
\]
we have
\[
\frac{\partial\,\mathrm{SSE}}{\partial m}
= \sum_{i=1}^{n} 2(y_i - mx_i - b)(-x_i)
= -2\sum_{i=1}^{n} x_i y_i + 2m\sum_{i=1}^{n} x_i^2 + 2b\sum_{i=1}^{n} x_i
\]
\[
\frac{\partial\,\mathrm{SSE}}{\partial b}
= \sum_{i=1}^{n} 2(y_i - mx_i - b)(-1)
= -2\sum_{i=1}^{n} y_i + 2m\sum_{i=1}^{n} x_i + 2bn
\]
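A CAS can carry out the same differentiation and solve for the critical point on the example data. A minimal SymPy sketch (variable names are mine):

```python
import sympy as sp

m, b = sp.symbols("m b")
xs, ys = [0, 1, 2, 3], [1, 1, 2, 2]

# SSE for the example data, as a polynomial in m and b
SSE = sum((yi - m * xi - b) ** 2 for xi, yi in zip(xs, ys))

# Set both partial derivatives to zero and solve
print(sp.solve([sp.diff(SSE, m), sp.diff(SSE, b)], [m, b]))
# {m: 2/5, b: 9/10}
```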
If $(m, b)$ is a critical point, then
\[
-2\sum_{i=1}^{n} x_i y_i + 2m\sum_{i=1}^{n} x_i^2 + 2b\sum_{i=1}^{n} x_i = 0
\implies
m\sum_{i=1}^{n} x_i^2 + b\sum_{i=1}^{n} x_i = \sum_{i=1}^{n} x_i y_i
\]
\[
-2\sum_{i=1}^{n} y_i + 2m\sum_{i=1}^{n} x_i + 2bn = 0
\implies
m\sum_{i=1}^{n} x_i + bn = \sum_{i=1}^{n} y_i
\]
Dividing each equation through by $n$: if $E$ is the expected value (average) of a random variable, then we get
\[
E(X^2)\,m + E(X)\,b = E(XY)
\]
\[
E(X)\,m + b = E(Y)
\]
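For the four data points, this 2-by-2 system can be set up and solved numerically; a small NumPy sketch, assuming the same data as above:

```python
import numpy as np

xs = np.array([0., 1., 2., 3.])
ys = np.array([1., 1., 2., 2.])

# Sample moments standing in for the expected values
EX, EY, EX2, EXY = xs.mean(), ys.mean(), (xs**2).mean(), (xs * ys).mean()

# The system  E(X^2) m + E(X) b = E(XY),  E(X) m + b = E(Y)
m, b = np.linalg.solve(np.array([[EX2, EX], [EX, 1.0]]),
                       np.array([EXY, EY]))
print(m, b)  # 0.4 0.9
```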
Using Cramer's Rule we have
\[
m = \frac{\begin{vmatrix} E(XY) & E(X) \\ E(Y) & 1 \end{vmatrix}}
         {\begin{vmatrix} E(X^2) & E(X) \\ E(X) & 1 \end{vmatrix}}
  = \frac{E(XY) - E(X)E(Y)}{E(X^2) - E(X)^2}
  = \frac{\mathrm{Cov}(X, Y)}{\mathrm{Var}(X)}
\]
Also,
\[
b = \frac{\begin{vmatrix} E(X^2) & E(XY) \\ E(X) & E(Y) \end{vmatrix}}
         {\begin{vmatrix} E(X^2) & E(X) \\ E(X) & 1 \end{vmatrix}}
  = \frac{E(X^2)E(Y) - E(X)E(XY)}{E(X^2) - E(X)^2}
  = E(Y) - E(X)\,\frac{\mathrm{Cov}(X, Y)}{\mathrm{Var}(X)}
\]
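As a final check, evaluating these closed forms on the example data recovers the least-squares line, and agrees with NumPy's own fit (np.polyfit used purely as an independent reference):

```python
import numpy as np

xs = np.array([0., 1., 2., 3.])
ys = np.array([1., 1., 2., 2.])

# m = Cov(X, Y) / Var(X)  and  b = E(Y) - E(X) m
m = ((xs * ys).mean() - xs.mean() * ys.mean()) / ((xs**2).mean() - xs.mean()**2)
b = ys.mean() - xs.mean() * m
print(m, b)                    # 0.4 0.9, so the best line is y = 0.4x + 0.9

# Agrees with NumPy's degree-1 least-squares fit
print(np.polyfit(xs, ys, 1))   # [0.4 0.9]
```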