Date posted: 21-Dec-2015
Multicollinearity
• Omitted Variables Bias is a problem when the omitted variable is an explanator of Y and correlated with X1
• Including the omitted variable in a multiple regression solves the problem.
• The multiple regression finds the coefficient on X1, holding X2 fixed.
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$
Multicollinearity (cont.)
• Multivariate regression finds the coefficient on X1, holding X2 fixed.
• To estimate β1, OLS requires weights w1i that satisfy:
$\sum_i w_{1i} = 0$
$\sum_i w_{1i} X_{1i} = 1$
$\sum_i w_{1i} X_{2i} = 0$
• Are these conditions always possible?
Multicollinearity (cont.)
• To strip out the bias caused by the correlation between X1 and X2, OLS has to impose the restriction
$\sum_i w_{1i} X_{2i} = 0$
• This restriction in essence removes those parts of X1 that are correlated with X2
• If X1 is very correlated with X2, OLS doesn’t have much left-over variation to work with.
• If X1 is perfectly correlated with X2, OLS has nothing left.
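The idea that OLS works only with the part of X1 that is uncorrelated with X2 can be checked numerically. The sketch below is illustrative, not from the slides: the simulated data, the coefficient values, and the variable names are all assumptions. It builds the weights from the residuals of a regression of X1 on a constant and X2, then confirms that the weighted sum of Y matches the multiple-regression coefficient on X1.

```python
import numpy as np

# Simulated data (assumed values, for illustration only)
rng = np.random.default_rng(0)
n = 500
x2 = rng.normal(size=n)
x1 = 0.8 * x2 + rng.normal(size=n)            # X1 is correlated with X2
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

# Step 1: regress X1 on a constant and X2, keep the left-over variation
Z = np.column_stack([np.ones(n), x2])
gamma, *_ = np.linalg.lstsq(Z, x1, rcond=None)
resid = x1 - Z @ gamma                        # part of X1 unrelated to X2

# Step 2: weights built from the left-over variation
w1 = resid / np.sum(resid ** 2)
beta1_from_weights = np.sum(w1 * y)

# Step 3: the usual multiple regression, for comparison
X = np.column_stack([np.ones(n), x1, x2])
beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)

print("sum w1      :", np.sum(w1))            # close to 0
print("sum w1 * X1 :", np.sum(w1 * x1))       # close to 1
print("sum w1 * X2 :", np.sum(w1 * x2))       # close to 0
print("beta1 (weights) vs beta1 (OLS):", beta1_from_weights, beta_full[1])
```

As the correlation between x1 and x2 rises, `resid` shrinks toward zero and the weights `w1` become large and unstable, which is the "not much left-over variation" problem in numerical form.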
Multicollinearity (cont.)
• Suppose X2 is simply a function of X1:
$X_{1i} = 12 \cdot X_{2i}$
• For some silly reason, we want to estimate the returns to an extra year of education AND the returns to an extra month of education.
• So we stick in two variables, one recording the number of years of education (X2) and one recording the number of months of education (X1).
Multicollinearity (cont.)
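$X_{1i} = 12 \cdot X_{2i}$
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$
$Y_i = \beta_0 + \beta_1 (12 X_{2i}) + \beta_2 X_{2i} + \varepsilon_i$
$Y_i = \beta_0 + (12\beta_1 + \beta_2) X_{2i} + \varepsilon_i$
Suppose the marginal contribution of another year of schooling (one more unit of X2) is $\beta$.
We can pick any $\beta_1, \beta_2$ so long as $12\beta_1 + \beta_2 = \beta$.
We cannot uniquely identify our coefficients.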
Multicollinearity (cont.)
• Let’s look at this problem in terms of our unbiasedness conditions.
We need $w_{1i}$ such that
$\sum_i w_{1i} X_{1i} = 1$ AND $\sum_i w_{1i} X_{2i} = 0$
Substituting in $X_{1i} = 12 X_{2i}$ ...
$12 \left( \sum_i w_{1i} X_{2i} \right) = 1$ AND $\sum_i w_{1i} X_{2i} = 0$
• No weights can do both these jobs!
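A small numerical sketch of the months-and-years example follows. The sample, the coefficient values, and the variable names are assumptions made for illustration. It shows that the design matrix loses a rank when one regressor is exactly 12 times the other, and that two different coefficient pairs produce identical fitted values, so the data cannot tell them apart.

```python
import numpy as np

# Hypothetical sample: years of education, and the same thing in months
rng = np.random.default_rng(1)
n = 200
years = rng.integers(8, 21, size=n).astype(float)
months = 12.0 * years                          # X1 = 12 * X2 exactly

# Columns: constant, months (X1), years (X2)
X = np.column_stack([np.ones(n), months, years])
rank = np.linalg.matrix_rank(X)
print("columns:", X.shape[1], "rank:", rank)   # rank is one short of the column count

# Two coefficient vectors with the same value of 12*beta1 + beta2
coef_a = np.array([1.0, 0.00, 0.12])           # all of the effect loaded on years
coef_b = np.array([1.0, 0.01, 0.00])           # all of the effect loaded on months
same_fit = np.allclose(X @ coef_a, X @ coef_b)
print("identical fitted values:", same_fit)
```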
Multicollinearity (cont.)
• Bottom Line: you CANNOT add variables that are perfectly correlated with each other (and nearly perfect correlation isn’t good).
• You CANNOT include a group of variables that are a linear combination of each other:
$X_1 = aX_2 + bX_3 + cX_4$
• You CANNOT include a group of variables that sum to 1 and also include a constant.
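The last rule is easiest to see with a full set of category dummies. The sketch below uses a made-up three-category variable (the name `region` and the sample size are assumptions): the dummies sum to 1 in every row, so together with a constant they are perfectly multicollinear, and dropping one dummy restores full column rank.

```python
import numpy as np

# Hypothetical categorical variable with three categories, coded 0, 1, 2
n = 99
region = np.arange(n) % 3
dummies = np.eye(3)[region]                    # n x 3, each row sums to 1
const = np.ones(n)

# Constant plus ALL three dummies: columns are linearly dependent
X_trap = np.column_stack([const, dummies])
rank_trap = np.linalg.matrix_rank(X_trap)

# Constant plus two dummies (one category omitted): full column rank
X_ok = np.column_stack([const, dummies[:, 1:]])
rank_ok = np.linalg.matrix_rank(X_ok)

print("with all dummies :", X_trap.shape[1], "columns, rank", rank_trap)
print("one dummy omitted:", X_ok.shape[1], "columns, rank", rank_ok)
```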
Multicollinearity (cont.)
• Multicollinearity is easy to fix. Simply omit one of the troublesome variables.
• Maybe you can find more data for which your variables are not multicollinear. This isn’t possible if your variables are weighted sums of each other by definition.
Checking Understanding
• You have a cross-section of workers from 1999. Which of the following sets of variables would lead to multicollinearity?
1. A Constant, Year of birth, Age
2. A Constant, Year of birth, Years since they finished high school
3. A Constant, Year of birth, Years since they started working for their current employer
Checking Understanding (cont.)
1. A Constant, Year of Birth, and Age will be a problem.
• These variables will be multicollinear (or nearly multicollinear, which is almost as bad).
$Age = 1999 - Birthyear$
$Age = 1999 \cdot 1 - 1 \cdot Birthyear$
(except for some slight slippage from month of birth)
Checking Understanding (cont.)
2. A Constant, Year of Birth, and Years Since High School PROBABLY suffers from ALMOST perfect multicollinearity.
• Most Americans graduate from high school around age 18. If this is true in your data, then
$1999 - Birthyear = 18 + \text{Years Since Graduation}$
$Birthyear = 1 \cdot (1999 - 18) - 1 \cdot (\text{Years since H.S.})$
Checking Understanding (cont.)
3. A Constant, Birthyear, Years with Current Employer is very unlikely to be a problem.
• There is usually ample variation in the ages at which different workers begin their employment with a particular firm.
• Multicollinearity
• When two or more of the explanatory variables are highly related (correlated)
• Some collinearity almost always exists, so the question is how much there can be before it becomes a problem.
• Perfect multicollinearity
• Imperfect Multicollinearity
• Using the Ballantine
• Detecting Multicollinearity
1. Check simple correlation coefficients (r)
If |r| > 0.8, then multicollinearity may be a problem
2. Perform a t-test on the correlation coefficient
$t_{n-2} = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}$
3. Check Variance Inflation Factors (VIF) or the Tolerance (TOL)
• Run a regression of each X on the other Xs and record its R-squared, $R_i^2$
• Calculate the VIF for each $\hat{\beta}_i$
$VIF(\hat{\beta}_i) = \dfrac{1}{1 - R_i^2}$
• The higher the VIF, the more severe the problem of multicollinearity
• If VIF is greater than 5, then there might be a problem (the cutoff is arbitrarily chosen)
• Tolerance (TOL) = (1 – Rsq) = $\dfrac{1}{VIF(\hat{\beta}_i)}$
0 < TOL ≤ 1
If TOL is close to zero then multicollinearity is severe.
You could use VIF or TOL.
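The three checks above can be sketched in a few lines of NumPy. This is a minimal illustration under assumed data: the helper names `corr_t_stat` and `vif` are hypothetical, not from any library, and the simulated regressors are chosen so that the first two are strongly correlated while the third is unrelated.

```python
import numpy as np

def corr_t_stat(r, n):
    """t-statistic (n - 2 degrees of freedom) for a simple correlation r."""
    return r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2)

def vif(X):
    """VIF for each column of X (n x k regressors, constant NOT included)."""
    n, k = X.shape
    out = np.empty(k)
    for i in range(k):
        target = X[:, i]
        # Auxiliary regression: X_i on a constant and the other Xs
        others = np.column_stack([np.ones(n), np.delete(X, i, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[i] = 1 / (1 - r2)
    return out

# Simulated regressors (assumed values): x1 and x2 strongly related, x3 not
rng = np.random.default_rng(3)
n = 300
x2 = rng.normal(size=n)
x1 = 0.95 * x2 + 0.2 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

r12 = np.corrcoef(x1, x2)[0, 1]                # check 1: simple correlation
t12 = corr_t_stat(r12, n)                      # check 2: t-test on r
vifs = vif(X)                                  # check 3: VIF ...
tols = 1 / vifs                                # ... or, equivalently, TOL

print("r(x1, x2):", r12, " t:", t12)
print("VIF:", vifs)
print("TOL:", tols)
```

With only two regressors, the VIF reduces to 1 / (1 - r²), so the simple-correlation check and the VIF check agree; with three or more, the VIF can flag a linear combination that no single pairwise correlation reveals.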
• EFFECTS OF MULTICOLLINEARITY
1. OLS estimates are still unbiased
2. Standard error of the estimated coefficients will be inflated
3. t-statistics will be small
4. Estimates will be sensitive to small changes, either from dropping a variable or adding a few more observations
• With multicollinearity, you may fail to reject H0 in all of your t-tests but still reject H0 in your F-test
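A small Monte Carlo sketch can illustrate the first two effects. Everything here is an assumption made for illustration: the function name `simulate`, the true coefficients, the sample size, and the two correlation levels. The estimate of the coefficient on X1 stays centered on its true value in both cases, but its spread across samples is much larger when X1 and X2 are highly correlated.

```python
import numpy as np

def simulate(rho, n=50, reps=2000, seed=4):
    """Mean and std. dev. of the OLS estimate of beta1 across simulated samples."""
    rng = np.random.default_rng(seed)
    est = np.empty(reps)
    for r in range(reps):
        x2 = rng.normal(size=n)
        # x1 has unit variance and population correlation rho with x2
        x1 = rho * x2 + np.sqrt(1 - rho ** 2) * rng.normal(size=n)
        y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)   # true beta1 = 2
        X = np.column_stack([np.ones(n), x1, x2])
        b, *_ = np.linalg.lstsq(X, y, rcond=None)
        est[r] = b[1]
    return est.mean(), est.std()

mean_low, sd_low = simulate(rho=0.0)
mean_high, sd_high = simulate(rho=0.99)

print("rho = 0.00: mean", mean_low, "sd", sd_low)
print("rho = 0.99: mean", mean_high, "sd", sd_high)
```

Rerunning with a larger `n` shrinks both standard deviations, which is the logic behind increasing the sample size as a remedy.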
Dealing with Multicollinearity
1. Ignore It.
Do this if multicollinearity is not causing any problems,
i.e., if the t-statistics are insignificant and unreliable, then do something. If not, do nothing.
2. Drop a variable.
If two variables are highly correlated, drop one of them (it is redundant).
3. Increase the sample size
The larger the sample size, the more accurate the estimates.