Lecture 4
Econ 488
Ordinary Least Squares (OLS)
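Objective of OLS: minimize the sum of squared residuals,
$$\min_{\hat{\beta}_0, \ldots, \hat{\beta}_K} \sum_{i=1}^{n} e_i^2$$
where
$$e_i = Y_i - \hat{Y}_i$$
and the regression model is
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_K X_{Ki} + \varepsilon_i$$
Remember that OLS is not the only possible estimator of the βs.
But OLS is the best estimator under certain assumptions…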
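A minimal sketch of this objective in Python with NumPy. The data, the coefficient values, and the helper name `ssr` are made up for illustration; the point is only that the least-squares solution gives a smaller sum of squared residuals than a nearby coefficient vector.

```python
import numpy as np

def ssr(beta, X, y):
    """Sum of squared residuals for the coefficient vector beta."""
    residuals = y - X @ beta
    return float(residuals @ residuals)

rng = np.random.default_rng(0)
n = 200
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 5, n)
# Hypothetical population model: Y = 1 + 2*X1 - 3*X2 + error
y = 1.0 + 2.0 * x1 - 3.0 * x2 + rng.normal(0, 1, n)

# The column of ones corresponds to the intercept beta_0.
X = np.column_stack([np.ones(n), x1, x2])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]

# Moving away from the OLS solution can only raise the SSR.
perturbed = beta_hat + np.array([0.1, -0.05, 0.05])
print("OLS estimates:", beta_hat)
print("SSR at OLS estimates:", ssr(beta_hat, X, y))
print("SSR at perturbed values:", ssr(perturbed, X, y))
```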
Classical Assumptions
1. Regression is linear in parameters
2. Error term has zero population mean
3. Error term is not correlated with X’s
4. No serial correlation
5. No heteroskedasticity
6. No perfect multicollinearity
and we usually add:
7. Error term is normally distributed
Assumption 1: Linearity
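The regression model: A) is linear in the parameters
It can be written as
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_K X_{Ki} + \varepsilon_i$$
This doesn’t mean that the theory must be linear.
For example, suppose we believe that CEO salary is related to the firm’s sales and the CEO’s tenure. We might believe the model is:
$$\log(\text{salary}_i) = \beta_0 + \beta_1 \log(\text{sales}_i) + \beta_2 \text{tenure}_i + \beta_3 \text{tenure}_i^2 + \varepsilon_i$$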
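As a rough illustration of "linear in the parameters," the sketch below fits the CEO salary model by treating each transformed variable as one more column of X. The data and the coefficient values are invented for the example and are not estimates from any real data set.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
sales = rng.lognormal(mean=7.0, sigma=1.0, size=n)   # made-up firm sales
tenure = rng.uniform(0, 30, n)                       # made-up CEO tenure in years

# Hypothetical coefficients, chosen only for illustration.
b0, b1, b2, b3 = 4.0, 0.25, 0.05, -0.001
log_salary = (b0 + b1 * np.log(sales) + b2 * tenure + b3 * tenure**2
              + rng.normal(0, 0.3, n))

# Nonlinear in the variables, but linear in the coefficients:
# log(sales), tenure and tenure^2 are just columns of X.
X = np.column_stack([np.ones(n), np.log(sales), tenure, tenure**2])
beta_hat = np.linalg.lstsq(X, log_salary, rcond=None)[0]
print("estimated coefficients:", beta_hat)
```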
Assumption 1: Linearity
The regression model: B) is correctly specified
The model must have the right variables: no omitted variables.
The model must have the correct functional form.
This is all untestable. We need to rely on economic theory.
Assumption 1: Linearity
The regression model: C) must have an additive error term
The model must have + εi
Assumption 2: E(εi)=0
Error term has a zero population mean: E(εi)=0
Each observation has a random error with a mean of zero
What if E(εi)≠0?
This is actually fixed by adding a constant (AKA intercept) term
Assumption 2: E(εi)=0
Example: Suppose instead the mean of εi was -4.
Then we know E(εi+4)=0
We can add 4 to the error term and subtract 4 from the constant term:
Yi =β0+ β1Xi+εi
Yi =(β0-4)+ β1Xi+(εi+4)
We can rewrite:
Yi =β0*+ β1Xi+εi*
where β0*= β0-4 and εi*=εi+4
Now E(εi*)=0, so we are OK.
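A small simulation of this example, assuming hypothetical true values β0 = 10 and β1 = 2 and an error with mean -4. If the argument above is right, the estimated intercept should land near β0 - 4 = 6 while the slope is unaffected.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5_000
beta0, beta1 = 10.0, 2.0                        # hypothetical true values
x = rng.uniform(0, 10, n)
eps = rng.normal(loc=-4.0, scale=1.0, size=n)   # error with mean -4
y = beta0 + beta1 * x + eps

X = np.column_stack([np.ones(n), x])            # constant term included
b0_hat, b1_hat = np.linalg.lstsq(X, y, rcond=None)[0]
residuals = y - X @ np.array([b0_hat, b1_hat])

print("estimated intercept:", b0_hat)           # absorbs the -4
print("estimated slope:", b1_hat)
print("mean of residuals:", residuals.mean())
```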
Assumption 3: Exogeneity
Important!!
All explanatory variables are uncorrelated with the error term:
E(εi|X1i,X2i,…,XKi)=0
Explanatory variables are determined outside of the model (they are exogenous)
Assumption 3: Exogeneity
What happens if assumption 3 is violated?
Suppose we have the model
Yi =β0+ β1Xi+εi
Suppose Xi and εi are positively correlated.
When Xi is large, εi tends to be large as well.
Assumption 3: Exogeneity
[Figure, built up over three slides: vertical axis from -40 to 120, horizontal axis from 0 to 25. Panel 1 shows the “true” line; panel 2 adds the data; panel 3 adds the estimated line.]
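The gap between the "true" line and the estimated line can be reproduced with a short simulation. This is only a sketch: the true intercept and slope, and the way the error is made to depend on x, are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000
beta0, beta1 = 5.0, 2.0            # hypothetical "true" line

x = rng.normal(10.0, 3.0, n)
# Assumption 3 is violated on purpose: the error rises with x.
eps = 0.5 * (x - 10.0) + rng.normal(0.0, 2.0, n)
y = beta0 + beta1 * x + eps

X = np.column_stack([np.ones(n), x])
b0_hat, b1_hat = np.linalg.lstsq(X, y, rcond=None)[0]
corr_x_eps = float(np.corrcoef(x, eps)[0, 1])

print("corr(x, eps):", corr_x_eps)
print("true slope:", beta1, "estimated slope:", b1_hat)
```

With a positive correlation between x and the error, the estimated line comes out steeper than the true line, and collecting more data does not fix it.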
Assumption 3: Exogeneity
Why would x and ε be correlated?
Suppose you are trying to study the relationship between the price of a hamburger and the quantity sold across a wide variety of Ventura County restaurants.
Assumption 3: Exogeneity
We estimate the relationship using the following model:
salesi= β0+β1pricei+εi
What’s the problem?
Assumption 3: Exogeneity
What’s the problem?
What else determines sales of hamburgers?
How would you decide between buying a burger at McDonald’s ($0.89) or a burger at TGI Fridays ($9.99)?
Quality differs.
salesi= β0+β1pricei+εi
Quality isn’t an X variable even though it should be. It becomes part of εi.
Assumption 3: Exogeneity
What’s the problem?
But price and quality are highly positively correlated.
Quality raises sales and is part of ε, so x and ε are also positively correlated.
This means that the estimate of β1 will be too high.
This is called “Omitted Variables Bias” (more in Chapter 6).
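A sketch of the hamburger example with simulated data. All numbers here are hypothetical: price is assumed to lower sales, quality to raise sales, and better burgers to cost more. Leaving quality out pushes the price coefficient upward; including it recovers the assumed effect.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5_000
quality = rng.normal(0.0, 1.0, n)
price = 5.0 + 2.0 * quality + rng.normal(0.0, 1.0, n)   # better burgers cost more
# Hypothetical truth: price lowers sales, quality raises sales.
sales = 100.0 - 5.0 * price + 15.0 * quality + rng.normal(0.0, 5.0, n)

ones = np.ones(n)
# Regression that omits quality (quality ends up in the error term).
short_fit = np.linalg.lstsq(np.column_stack([ones, price]), sales, rcond=None)[0]
# Regression that includes quality.
long_fit = np.linalg.lstsq(np.column_stack([ones, price, quality]), sales, rcond=None)[0]

print("price coefficient, quality omitted: ", short_fit[1])
print("price coefficient, quality included:", long_fit[1])
```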
Assumption 4: No Serial Correlation
Serial correlation: the error terms across observations are correlated with each other, i.e. ε1 is correlated with ε2, etc.
This is most important in time series.
If errors are serially correlated, an increase in the error term in one time period affects the error term in the next.
Assumption 4: No Serial Correlation
The assumption that there is no serial correlation can be unrealistic in time series.
Think of data from a stock market…
Assumption 4: No Serial Correlation
[Figure: Real S&P 500 Stock Price Index (price) by year, 1870 to 2020.]
Stock data is serially correlated!
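One simple way to picture serial correlation is an error that carries over part of its previous value. The sketch below uses a first-order autoregressive process with a hypothetical persistence of 0.8 (this is an illustrative choice, not a model of the S&P 500) and compares it with independent shocks.

```python
import numpy as np

def lag1_corr(e):
    """Correlation between each value and the value one period earlier."""
    return float(np.corrcoef(e[:-1], e[1:])[0, 1])

rng = np.random.default_rng(5)
T = 2_000
rho = 0.8                        # hypothetical persistence of the error
shocks = rng.normal(0.0, 1.0, T)

eps = np.zeros(T)
for t in range(1, T):
    # Part of last period's error carries into this period.
    eps[t] = rho * eps[t - 1] + shocks[t]

print("lag-1 correlation, independent shocks:", lag1_corr(shocks))
print("lag-1 correlation, persistent errors: ", lag1_corr(eps))
```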
Assumption 5: Homoskedasticity
Homoskedasticity: The error has a constant variance
This is what we want… as opposed to
Heteroskedasticity: The variance of the error depends on the values of Xs.
Assumption 5: Homoskedasticity
[Figure: Homoskedasticity: the error has constant variance.]
[Figure: Heteroskedasticity: the spread of the error depends on X.]
[Figure: Another form of heteroskedasticity.]
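The difference between the two cases can be checked numerically. In this sketch (simulated data, hypothetical coefficients, and a made-up helper `spread_ratio`), the residual spread for high-X observations is compared with the spread for low-X observations; a ratio near 1 is what homoskedasticity looks like.

```python
import numpy as np

def spread_ratio(x, y):
    """Residual std. dev. for high-x observations divided by that for low-x ones."""
    X = np.column_stack([np.ones_like(x), x])
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta_hat
    high = x > np.median(x)
    return float(resid[high].std() / resid[~high].std())

rng = np.random.default_rng(6)
n = 4_000
x = rng.uniform(1.0, 10.0, n)
y_homo = 3.0 + 2.0 * x + rng.normal(0.0, 2.0, n)       # constant error variance
y_hetero = 3.0 + 2.0 * x + rng.normal(0.0, 0.5 * x)    # error spread grows with x

ratio_homo = spread_ratio(x, y_homo)
ratio_hetero = spread_ratio(x, y_hetero)
print("spread ratio, homoskedastic errors:  ", ratio_homo)
print("spread ratio, heteroskedastic errors:", ratio_hetero)
```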
Assumption 6: No Perfect Multicollinearity
Two variables are perfectly collinear if one can be determined perfectly from the other (i.e. if you know the value of x, you can always find the value of z).
Example: Suppose we regress income on age, and include both age in months and age in years.
But age in years = age in months/12,
e.g. if we know someone is 246 months old, we also know that they are 20.5 years old.
Assumption 6: No Perfect Multicollinearity
What’s wrong with this?
incomei= β0 + β1agemonthsi + β2ageyearsi + εi
What is β1?
It is the change in income associated with a one unit increase in “age in months,” holding age in years constant.
But if you hold age in years constant, age in months doesn’t change!
Assumption 6: No Perfect Multicollinearity
β1 = Δincome/Δagemonths, holding Δageyears = 0
If Δageyears = 0, then Δagemonths = 0
So β1 = Δincome/0
It is undefined!
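The same problem shows up in the data matrix. In this sketch (made-up ages), the matrix with a constant, age in months, and age in years has three columns but only two independent ones, so there is no unique set of coefficients. Dropping one of the age variables restores full column rank.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100
age_months = rng.integers(216, 780, n).astype(float)   # made-up ages, 18 to 65 years
age_years = age_months / 12.0                          # exact linear function of age_months

X_both = np.column_stack([np.ones(n), age_months, age_years])
X_one = np.column_stack([np.ones(n), age_months])

rank_both = int(np.linalg.matrix_rank(X_both))
rank_one = int(np.linalg.matrix_rank(X_one))
print("columns:", X_both.shape[1], "rank:", rank_both)   # rank is below the column count
print("columns:", X_one.shape[1], "rank:", rank_one)     # full column rank
```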
Assumption 6: No Perfect Multicollinearity
When one independent variable is a perfect linear combination of the other independent variables, it is called perfect multicollinearity.
Example: Total Cholesterol, HDL and LDL
Total Cholesterol = LDL + HDL
Can’t include all three as independent variables in a regression.
Solution: Drop one of the variables.
Assumption 7: Normally Distributed Error
This is not required for OLS, but it is important for hypothesis testing
More on this assumption next time.
Putting it all together
Last class, we talked about how to compare estimators. We want:
1. $\hat{\beta}$ is unbiased: $E(\hat{\beta}) = \beta$. On average, the estimator is equal to the population value.
2. $\hat{\beta}$ is efficient. The variance of the estimator is as small as possible.
Putting it all together
Gauss-Markov Theorem
Given OLS assumptions 1 through 6, the OLS estimator of βk is the minimum variance estimator from the set of all linear unbiased estimators of βk for k=0,1,2,…,K
OLS is BLUE: the Best Linear Unbiased Estimator
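A rough Monte Carlo illustration of what "best" means here. The OLS slope is compared with another linear unbiased estimator, the slope of the line through the first and last observations. That competitor, the true coefficients, and the error variance are all choices made for this sketch; both estimators should center on the true slope, with OLS showing the smaller variance.

```python
import numpy as np

rng = np.random.default_rng(8)
reps = 5_000
beta0, beta1 = 1.0, 2.0                   # hypothetical true values
x = np.linspace(0.0, 10.0, 20)            # X values held fixed across samples
xc = x - x.mean()

# Each row of Y is one simulated sample of 20 observations.
Y = beta0 + beta1 * x + rng.normal(0.0, 1.0, (reps, x.size))

# OLS slope for every sample: sum(xc * (Y - Ybar)) / sum(xc^2)
ols_slopes = ((Y - Y.mean(axis=1, keepdims=True)) @ xc) / (xc @ xc)
# Alternative linear unbiased estimator: slope through the two endpoints.
endpoint_slopes = (Y[:, -1] - Y[:, 0]) / (x[-1] - x[0])

print("mean of OLS slopes:     ", ols_slopes.mean())
print("mean of endpoint slopes:", endpoint_slopes.mean())
print("variance of OLS slopes:     ", ols_slopes.var())
print("variance of endpoint slopes:", endpoint_slopes.var())
```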
Gauss-Markov Theorem
What happens if we add assumption 7?
Given assumptions 1 through 7, OLS is the best unbiased estimator,
even out of the non-linear estimators.
OLS is BUE?
Gauss-Markov Theorem
With Assumptions 1-7 OLS is:
1. Unbiased: $E(\hat{\beta}) = \beta$
2. Minimum variance: the variance of the sampling distribution is as small as possible
3. Consistent: as n→∞, the estimators converge to the true parameters. As n increases, variance gets smaller, so each estimate approaches the true value of β.
4. Normally distributed: you can apply statistical tests to them.
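Unbiasedness and consistency can be seen in a simulation. The sketch below draws many samples at three sample sizes, with hypothetical true coefficients and normal errors, and records the OLS slope each time. The helper name `simulate_slopes` is made up for this example.

```python
import numpy as np

def simulate_slopes(n, reps, rng, beta0=1.0, beta1=2.0, sigma=2.0):
    """OLS slope estimates from `reps` simulated samples of size n."""
    x = rng.uniform(0.0, 10.0, (reps, n))
    y = beta0 + beta1 * x + rng.normal(0.0, sigma, (reps, n))
    xc = x - x.mean(axis=1, keepdims=True)
    # Simple-regression slope: sum(xc * y) / sum(xc^2), computed per sample.
    return (xc * y).sum(axis=1) / (xc**2).sum(axis=1)

rng = np.random.default_rng(9)
results = {}
for n in (25, 100, 400):
    slopes = simulate_slopes(n, reps=2_000, rng=rng)
    results[n] = (float(slopes.mean()), float(slopes.var()))
    print(f"n={n:4d}  mean of estimates={results[n][0]:.4f}  variance={results[n][1]:.5f}")
```

The mean of the estimates stays close to the true slope at every sample size, while the variance shrinks as n grows.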