Chapter 13: Simple Linear Regression

Simple Linear Regression: Estimation and Properties
Outline
• Review of the reading
• Estimate parameters using OLS
• Other features of OLS
– Numerical properties of OLS
– Assumptions of OLS
– Goodness of fit
Checking Understanding
• What is the best estimate of E(Y)?
• How would we find E(Y|Xi)?
• Y = B1 + B2X + u
– What is B1?
– What is B2?
– What is u?
Checking Understanding
• What is a z-score?
• What is the mean of z(x)?
• What is the standard deviation of z(x)?

$$z(x) = \frac{x - \bar{x}}{\sigma_x}$$
Checking Understanding
• What is a z-score?

$$z(x) = \frac{x - \bar{x}}{\sigma_x}$$

• Correlation:

$$r = \frac{\sum z_x z_y}{n - 1}$$
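The two formulas above can be sketched in a few lines of Python. This is not from the slides; the data and function names are illustrative, and the z-scores use the sample standard deviation (dividing by n − 1), which matches the n − 1 in the correlation formula.

```python
# Minimal sketch: correlation computed via z-scores, r = sum(z_x * z_y) / (n - 1).
from statistics import mean, stdev

def z_scores(values):
    """Standardize each value: z = (x - mean) / sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def correlation(x, y):
    """r = sum(z_x * z_y) / (n - 1)."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)

# Illustrative data (not from the slides).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(correlation(x, y))
```

As the slides ask: the z-scores themselves have mean 0 and standard deviation 1, which is what makes this formula work.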
Checking Understanding
• Correlation:

$$r = \frac{\sum z_x z_y}{n - 1}$$

• The regression line in z-scores:

$$z_y = m z_x$$

• Can also be written as:

$$z_y = r z_x$$

• Remember:

$$m = \frac{\mathrm{cov}(X, Y)}{\mathrm{var}(X)}$$
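These two claims can be checked numerically. The sketch below is not from the slides (data and names are my own): it verifies that the least-squares slope equals cov(X, Y)/var(X), and that when both variables are in z-scores, that slope collapses to the correlation r.

```python
# Sketch: slope m = cov(X, Y) / var(X); in z-score units the slope equals r.
from statistics import mean, stdev

def cov(x, y):
    """Sample covariance (divides by n - 1)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def slope(x, y):
    """Least-squares slope: cov(X, Y) / var(X)."""
    return cov(x, y) / cov(x, x)

# Illustrative data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

zx = [(v - mean(x)) / stdev(x) for v in x]
zy = [(v - mean(y)) / stdev(y) for v in y]

r = cov(x, y) / (stdev(x) * stdev(y))
print(slope(zx, zy), r)  # the two should match
```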
And What is Covariance?
• Cov(X,Y) = E[(X − E[X])(Y − E[Y])]
• Cov(X,Y) = E[XY] − E[X]E[Y]
• Covariance is positive if X and Y tend to be below their means together or above their means together. It is negative if X tends to be above its mean while Y is below its mean, or vice versa.
• But it has units: the sign is easy to interpret, while the number itself is hard to interpret.

$$\sigma_{xy} = \mathrm{cov}(X, Y) = E[(X - \mu_x)(Y - \mu_y)]$$
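The two covariance formulas on this slide are algebraically identical, which is easy to confirm numerically. This sketch is not from the slides; it uses population-style expectations (dividing by n) so the identity holds exactly, and the data is illustrative.

```python
# Sketch: E[(X - E[X])(Y - E[Y])] equals E[XY] - E[X]E[Y].
from statistics import mean

def cov_centered(x, y):
    """Covariance as the mean of products of deviations."""
    mx, my = mean(x), mean(y)
    return mean([(a - mx) * (b - my) for a, b in zip(x, y)])

def cov_products(x, y):
    """Covariance as mean of products minus product of means."""
    return mean([a * b for a, b in zip(x, y)]) - mean(x) * mean(y)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(cov_centered(x, y), cov_products(x, y))  # identical values
```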
Total Population of Money Spent and the Number of Votes

[Figure: "Effect of Money on Votes" scatterplot. X axis: Amount Spent, in millions (0 to 10); Y axis: Number of Votes (0 to 50,000).]
What we can see from the graph
• We can see the average value of Y for each value of X
– These are the conditional expected values E(Y|X)
• If we join the conditional means of Y for each value of X, we get the Population Regression Line
Population Regression Function and the Linear Model
• E(Y|Xi) = f(Xi)
– The expected value of the distribution of Y given Xi is functionally related to Xi
• E(Y|Xi) = B1 + B2Xi
Two interpretations of linearity
• Linear in variables
– Which of the following is linear in variables, and why?
• E(Y|Xi) = B1 + B2Xi²
• E(Y|Xi) = B1 + B2Xi
• Linear in parameters
– Which of the following is linear in parameters, and why?
• E(Y|Xi) = B1 + B2Xi²
• E(Y|Xi) = B1 + B2²Xi
• Why should we care?
– Linear regression requires linearity in parameters only
Straight Line

Y = B1 + B2Xi

Quadratic

Y = B1 + B2X + B3X²
Adding in the Stochastic Term
• Yi = E(Y|Xi) + ui
• Systematic component: E(Y|Xi)
• Stochastic disturbance: ui
The Sample Regression Function (SRF)
• Because of sampling fluctuation, any sample will only approximate the true Population Regression Function
• Stochastic form of the SRF:

$$Y_i = \hat{B}_1 + \hat{B}_2 X_i + \hat{u}_i$$
Primary Goal in Regression Analysis
• We want to estimate the PRF
– Yi = B1 + B2Xi + ui
• on the basis of the SRF
One Method
• Choose the Sample Regression Function such that the sum of the residuals is as small as possible
Illustration and Problem

[Figure: scatterplot of Y against X with a fitted line and four residuals: u1 = 10, u2 = −2, u3 = 2, u4 = −10.]

• The problem: these residuals sum to zero (10 − 2 + 2 − 10 = 0), so even a poorly fitting line can make the sum of residuals small, because large positive and large negative errors cancel.
Alternative Method
• Ordinary Least Squares (OLS) is a method of finding the linear model that minimizes the sum of the squared errors
– Example: (10)² + (−2)² + (2)² + (−10)² = 208
• Under the classical assumptions, OLS is the best linear unbiased estimator
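The contrast between the two criteria can be shown with the slide's own four residuals: the raw residuals cancel to zero, while the squared residuals cannot cancel and add up to 208.

```python
# The four residuals from the illustration slide.
residuals = [10, -2, 2, -10]

raw_sum = sum(residuals)                      # 10 - 2 + 2 - 10 = 0
squared_sum = sum(u ** 2 for u in residuals)  # 100 + 4 + 4 + 100 = 208
print(raw_sum, squared_sum)  # 0 208
```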
Good Spot for a break
Minimizing the Sum of Squares
• Our goal is to minimize the sum of the squared errors:

$$\min_{B_1, B_2} \sum u_i^2 = \sum (Y_i - B_1 - B_2 X_i)^2$$

• Since we have two unknowns, B1 and B2, we need to take the partial derivative of this expression with respect to each
Partial Derivatives for the B's
• We start with our original equation:

$$\sum u_i^2 = \sum (Y_i - B_1 - B_2 X_i)^2$$

• Now we take the partial derivatives
– First equation is the partial derivative with respect to B1:

$$\frac{\partial \sum u_i^2}{\partial B_1} = -2 \sum (Y_i - B_1 - B_2 X_i)$$

– Second equation is with respect to B2:

$$\frac{\partial \sum u_i^2}{\partial B_2} = -2 \sum X_i (Y_i - B_1 - B_2 X_i)$$
Set Equal to Zero
• Setting the last set of equations equal to zero:

$$-2 \sum (Y_i - B_1 - B_2 X_i) = 0$$

$$-2 \sum X_i (Y_i - B_1 - B_2 X_i) = 0$$

• Next: simplify these into the normal equations
The Normal Equations
• Divide both equations by −2, multiply through, separate the summation terms, and rearrange:

$$\sum Y_i = n B_1 + B_2 \sum X_i$$

$$\sum X_i Y_i = B_1 \sum X_i + B_2 \sum X_i^2$$
Rewriting the Equations
• We can rewrite the normal equations as a pair of linear equations in the two unknowns B1 and B2:

$$n B_1 + B_2 \sum X_i = \sum Y_i$$

$$B_1 \sum X_i + B_2 \sum X_i^2 = \sum X_i Y_i$$
Solving the Equations
• We have two equations with two unknowns, which we can solve with algebra
• Multiply the first equation by ΣXi and the second by n; we end up with:

$$\sum X_i \sum Y_i = n B_1 \sum X_i + B_2 \left(\sum X_i\right)^2$$

$$n \sum X_i Y_i = n B_1 \sum X_i + n B_2 \sum X_i^2$$
Subtracting the first equation from the second and rearranging:

$$B_2 = \frac{n \sum X_i Y_i - \sum X_i \sum Y_i}{n \sum X_i^2 - \left(\sum X_i\right)^2}$$
Last Step
• Multiply the numerator and denominator by 1/n, recalling that (1/n)ΣXi = X̄ and (1/n)ΣYi = Ȳ
• We end up with:

$$B_2 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$$
We Can Now Solve for B1
• If we go back to the first normal equation and divide through by n:

$$B_1 = \bar{Y} - B_2 \bar{X}$$
What Does B2 Mean?
• The equation for B2 may not seem to make intuitive sense at first
• But if we break it into pieces we can begin to see the logic: the numerator measures how X and Y vary together, and the denominator measures how much X varies on its own, so B2 is the covariance of X and Y scaled by the variance of X
In Sum…
• If the changes in Y are EQUAL to the changes in X, then B2 = 1
• If the changes in Y are LARGER than the changes in X, then B2 > 1
• If the changes in Y are SMALLER than the changes in X, then B2 < 1
Let's Do An Example!

Calculating B1 and B2
• Mean of X is 4
• Mean of Y is 12.71429
Which Looks Like…This!

[Figure: "Regression of Y on X" scatterplot with fitted line; X axis 0 to 8, Y axis 0 to 30.]
Practice Problem
• We have a sample of the amount of money each candidate spent in a state (in millions) and the percentage of the vote they received
• Calculate the regression line and interpret it
Data

State   % vote   Money spent
CA      40       10
FL      35       12
GA      15        4
MO      20        6
OH      40       11
VT      25        8
Numerical Properties of OLS
• These are the properties that result from the method of OLS
– Estimates are expressed in terms of the observable quantities X and Y
– OLS provides point estimators of the B's
– The sample regression line passes through the sample means of Y and X
– The sum of the residuals is zero
– The residuals are uncorrelated with the predicted Yi
– The residuals are uncorrelated with Xi
Assumptions of Classical Linear Regression
• A1: Linear regression model: the model is linear in the parameters
• A2: X values are fixed in repeated sampling
• A3: Zero mean value of the disturbance term ui
• A4: Homoskedasticity, or equal variance of ui
More Assumptions
• A5: No autocorrelation between the disturbances
• A6: Zero covariance between ui and Xi
• A7: The number of observations n is greater than the number of parameters to be estimated
• A8: Variability in the X values
More Assumptions
• A9: The regression model is correctly specified
– The correct variables are included
– We have the correct functional form
– We make correct assumptions about the probability distributions of Yi, Xi, and ui
• A10: With multiple regression, we add the assumption of no perfect multicollinearity
How “good” does it fit?
• To measure “reduction in errors” we need a benchmark for comparison.
• The mean of the dependent variable is a relevant and tractable benchmark for comparing predictions.
• The mean of Y represents our “best guess” at the value of Yi absent other information.
![Page 46: Simple Linear Regression Estimation and Properties](https://reader031.fdocuments.us/reader031/viewer/2022022406/6216c59580ab8e78041cf9e6/html5/thumbnails/46.jpg)
Sums of Squares

• This gives us the following 'sum-of-squares' measures:

$$TSS = \sum (Y_i - \bar{Y})^2 \qquad ESS = \sum (\hat{Y}_i - \bar{Y})^2 \qquad USS = \sum (Y_i - \hat{Y}_i)^2$$

• Total Variation = Explained Variation + Unexplained Variation:

$$TSS = ESS + USS$$
How well does our model perform?
• The R-squared statistic
– R² = (TSS − USS)/TSS
– R² = ESS/TSS
• Bounded between 0 and 1
• Higher values indicate a better fit
• Lower values mean more unexplained than explained variance
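To tie goodness of fit back to the practice problem, the sketch below (my own code, reusing the practice-problem data) fits the line by OLS, forms TSS, ESS, and USS, and computes R² = ESS/TSS.

```python
# R-squared for the practice-problem regression (money spent vs. percent of vote).
from statistics import mean

x = [10, 12, 4, 6, 11, 8]
y = [40, 35, 15, 20, 40, 25]

xbar, ybar = mean(x), mean(y)
b2 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
      / sum((xi - xbar) ** 2 for xi in x))
b1 = ybar - b2 * xbar
yhat = [b1 + b2 * xi for xi in x]

tss = sum((yi - ybar) ** 2 for yi in y)               # total variation
ess = sum((yh - ybar) ** 2 for yh in yhat)            # explained variation
uss = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))  # unexplained variation

r_squared = ess / tss
print(round(r_squared, 3))
```

A value this close to 1 says most of the variation in vote share is explained by spending; note that TSS = ESS + USS holds by construction.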