Multicollinearity Omitted Variables Bias is a problem when the omitted variable is an explanator of...

Date posted: 21-Dec-2015


Multicollinearity

• Omitted variables bias is a problem when the omitted variable is an explanator of Y and is correlated with X1.

• Including the omitted variable in a multiple regression solves the problem.

• The multiple regression finds the coefficient on X1, holding X2 fixed.

Multicollinearity (cont.)

• Multivariate regression finds the coefficient on X1, holding X2 fixed:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$$

• OLS estimates β1 as a weighted sum of the observations, $\hat\beta_1 = \sum_i w_{1i} Y_i$. To estimate β1 without bias, OLS requires:

$$\sum_i w_{1i} = 0, \qquad \sum_i w_{1i} X_{1i} = 1, \qquad \sum_i w_{1i} X_{2i} = 0$$

• Are these conditions always possible?
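A minimal sketch of the weights idea, assuming simulated data (the names `X`, `W`, `w1` are illustrative, not from the slides). Because $\hat\beta = (X'X)^{-1}X'Y$, the weights $w_{1i}$ are the second row of $(X'X)^{-1}X'$, and the three conditions can be checked numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x2 = rng.normal(size=n)
x1 = 0.5 * x2 + rng.normal(size=n)          # X1 correlated with X2, but not perfectly
X = np.column_stack([np.ones(n), x1, x2])   # columns: constant, X1, X2

# Row 1 of (X'X)^{-1} X' holds the OLS weights w_1i for beta_1.
W = np.linalg.solve(X.T @ X, X.T)           # shape (3, n)
w1 = W[1]

print(w1.sum())   # approximately 0: sum of w_1i
print(w1 @ x1)    # approximately 1: sum of w_1i * X_1i
print(w1 @ x2)    # approximately 0: sum of w_1i * X_2i

# The weighted sum of Y reproduces the usual OLS coefficient on X1.
y = 1.0 + 2.0 * x1 - 3.0 * x2 + rng.normal(size=n)
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w1 @ y, beta_hat[1])   # the two numbers agree
```

The conditions hold by construction here because $(X'X)^{-1}X'X = I$; they can only be met when $X'X$ is invertible.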

Multicollinearity (cont.)

• To strip out the bias caused by the correlation between X1 and X2, OLS has to impose the restriction

$$\sum_i w_{1i} X_{2i} = 0$$

• This restriction in essence removes those parts of X1 that are correlated with X2.

• If X1 is highly correlated with X2, OLS doesn’t have much left-over variation to work with.

• If X1 is perfectly correlated with X2, OLS has nothing left.
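One way to see the "left-over variation" is to regress X1 on a constant and X2 and look at the variance of the residual (the partialling-out idea behind the Frisch–Waugh–Lovell theorem, named here for reference). A sketch, assuming simulated standard-normal data where X1 has correlation of about `rho` with X2:

```python
import numpy as np

def leftover_variance(x1, x2):
    """Variance of X1 after removing the part explained by a constant and X2."""
    Z = np.column_stack([np.ones_like(x2), x2])
    coef, *_ = np.linalg.lstsq(Z, x1, rcond=None)
    resid = x1 - Z @ coef
    return resid.var()

rng = np.random.default_rng(1)
n = 1_000
x2 = rng.normal(size=n)
noise = rng.normal(size=n)

results = {}
for rho in [0.0, 0.5, 0.9, 0.99, 1.0]:
    # X1 has unit variance and correlation of about rho with X2
    x1 = rho * x2 + np.sqrt(1 - rho**2) * noise
    results[rho] = leftover_variance(x1, x2)
    print(f"rho={rho:4}: left-over variance of X1 = {results[rho]:.4f}")
```

The left-over variance shrinks roughly like $1-\rho^2$ and is essentially zero at $\rho = 1$, which is the "OLS has nothing left" case.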


Multicollinearity (cont.)

• Suppose X2 is simply a function of X1

• For some silly reason, we want to estimate the returns to an extra year of education AND the returns to an extra month of education.

• So we stick in two variables, one recording the number of years of education (X2) and one recording the number of months of education (X1):

$$X_{1i} = 12 \cdot X_{2i}$$


Multicollinearity (cont.)

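$$X_{1i} = 12 \cdot X_{2i}$$

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$$

$$Y_i = \beta_0 + \beta_1 (12 X_{2i}) + \beta_2 X_{2i} + \varepsilon_i$$

$$Y_i = \beta_0 + (12\beta_1 + \beta_2) X_{2i} + \varepsilon_i$$

Suppose the marginal contribution of another year of schooling (one more unit of X2) is β. We can pick any β1, β2 so long as 12β1 + β2 = β. We cannot uniquely identify our coefficients.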
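A sketch of the identification problem, assuming simulated whole-number years of education: with a constant, months and years, the design matrix has three columns but rank 2, so $X'X$ cannot be inverted. Two different $(\beta_1, \beta_2)$ pairs with the same $12\beta_1 + \beta_2$ (0.6 is an arbitrary illustrative value for β) produce identical fitted values:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
years = rng.integers(8, 21, size=n).astype(float)   # X2: years of education (simulated)
months = 12 * years                                  # X1 = 12 * X2 exactly
X = np.column_stack([np.ones(n), months, years])

# Three columns, but only two are linearly independent.
print(np.linalg.matrix_rank(X))   # 2, not 3

# Two (beta1, beta2) pairs with the same 12*beta1 + beta2 = 0.6
b0 = 1.0
fit_a = b0 + 0.05 * months + 0.00 * years   # 12*0.05 + 0.00 = 0.6
fit_b = b0 + 0.00 * months + 0.60 * years   # 12*0.00 + 0.60 = 0.6
print(np.allclose(fit_a, fit_b))            # True: the data cannot tell them apart
```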

Multicollinearity (cont.)

• Let’s look at this problem in terms of our unbiasedness conditions.

We need w1i such that

$$\sum_i w_{1i} X_{1i} = 1 \quad \text{AND} \quad \sum_i w_{1i} X_{2i} = 0$$

Substituting in X1i = 12·X2i ...

$$12 \sum_i w_{1i} X_{2i} = 1 \quad \text{AND} \quad \sum_i w_{1i} X_{2i} = 0$$

• No weights can do both these jobs! The first condition requires $\sum_i w_{1i} X_{2i} = 1/12$, while the second requires it to equal 0.
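The same contradiction can be checked as a linear system $A w = b$, with one row per condition. A sketch, assuming simulated whole-number years: by the Rouché–Capelli theorem the system has a solution only if $A$ and the augmented matrix $[A \mid b]$ have the same rank.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50
x2 = rng.integers(8, 21, size=n).astype(float)  # years (simulated)
x1 = 12 * x2                                     # months, exactly 12 * years

# Conditions on the weights w: sum(w * x1) = 1 and sum(w * x2) = 0
A = np.vstack([x1, x2])
b = np.array([1.0, 0.0])

rank_A = np.linalg.matrix_rank(A)
rank_aug = np.linalg.matrix_rank(np.column_stack([A, b]))
print(rank_A, rank_aug)  # 1 2 -> the ranks differ, so no weights satisfy both
```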

Multicollinearity (cont.)

• Bottom line: you CANNOT add variables that are perfectly correlated with each other (and nearly perfect correlation isn’t good either).

• You CANNOT include a group of variables that are a linear combination of each other, for example:

$$X_1 = aX_2 + bX_3 + cX_4$$

• You CANNOT include a group of variables that sum to 1 and also include a constant (for example, a full set of category dummies, which add up to the constant column).
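The "sum to 1 plus a constant" case is often called the dummy variable trap. A sketch, assuming three hypothetical regions coded 0, 1, 2: with a constant and all three dummies the design matrix loses a rank; dropping one dummy (making it the base category) restores full column rank.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 60
region = rng.permutation(np.arange(n) % 3)      # hypothetical regions 0, 1, 2, all present
dummies = np.eye(3)[region]                     # one 0/1 column per region; each row sums to 1
const = np.ones((n, 1))

X_trap = np.hstack([const, dummies])            # constant + all three dummies
X_ok = np.hstack([const, dummies[:, 1:]])       # constant + two dummies (region 0 is the base)

print(np.allclose(dummies.sum(axis=1), 1.0))          # True: dummies add up to the constant
print(np.linalg.matrix_rank(X_trap), X_trap.shape[1])  # 3 4 -> perfectly collinear
print(np.linalg.matrix_rank(X_ok), X_ok.shape[1])      # 3 3 -> full column rank
```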


Multicollinearity (cont.)

• Perfect multicollinearity is easy to fix: simply omit one of the troublesome variables.

• Maybe you can find more data for which your variables are not multicollinear. This isn’t possible if your variables are weighted sums of each other by definition.


Checking Understanding

• You have a cross-section of workers from 1999. Which of the following sets of variables would lead to multicollinearity?

1. A Constant, Year of birth, Age

2. A Constant, Year of birth, Years since they finished high school

3. A Constant, Year of birth, Years since they started working for their current employer


Checking Understanding (cont.)

1. A Constant, Year of Birth, and Age will be a problem.

• These variables will be multicollinear (or nearly multicollinear, which is almost as bad):

$$\text{Age} = 1999 - \text{Birthyear} = 1999 \cdot 1 - 1 \cdot \text{Birthyear}$$

(except for some slight slippage from month of birth)


Checking Understanding (cont.)

2. A Constant, Year of Birth, and Years Since High School PROBABLY suffers from ALMOST perfect multicollinearity.

• Most Americans graduate from high school around age 18. If this is true in your data, then

$$1999 - \text{Birthyear} = 18 + \text{Years Since Graduation}$$

$$\text{Birthyear} = 1 \cdot (1999 - 18) - 1 \cdot (\text{Years Since H.S.})$$


Checking Understanding (cont.)

3. A Constant, Birthyear, Years with Current Employer is very unlikely to be a problem.

• There is usually ample variation in the ages at which different workers begin their employment with a particular firm.
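A rough simulation of the three cases. Every modelling choice here is a hypothetical assumption, not data: workers are surveyed in mid-1999, high-school graduation age is 18 give or take a year for a few people, and tenure is a random fraction of the years since age 18. The first two variables end up almost perfectly correlated with birth year; tenure does not.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 500
birthyear = rng.integers(1940, 1980, size=n)
birth_month = rng.integers(1, 13, size=n)
# Assumption: survey in mid-1999, so people born after June have not had their birthday yet
age = 1999 - birthyear - (birth_month > 6)
# Assumption: most finish high school at 18, a few a year early or late
grad_age = 18 + rng.choice([-1, 0, 0, 0, 0, 0, 0, 1], size=n)
years_since_hs = (1999 - birthyear) - grad_age
# Assumption: tenure is a random share of the years since age 18
tenure = rng.uniform(size=n) * (age - 18)

corrs = {}
for name, x in [("Age", age), ("Years since HS", years_since_hs), ("Tenure", tenure)]:
    corrs[name] = np.corrcoef(birthyear, x)[0, 1]
    print(f"{name:15s} corr with Birthyear = {corrs[name]:+.3f}")
```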


• Multicollinearity

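• When two or more of the explanatory variables are highly related (correlated).

• Some collinearity almost always exists, so the question is how much there can be before it becomes a problem.

• Perfect multicollinearity: one explanatory variable is an exact linear function of the others.

• Imperfect multicollinearity: a strong, but not exact, linear relationship among the explanatory variables.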


• Using the Ballantine


• Detecting Multicollinearity

1. Check simple correlation coefficients (r)

If |r| > 0.8, then multicollinearity may be a problem.

2. Perform a t-test on the correlation coefficient. Under H0: ρ = 0,

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \sim t_{n-2}$$


3. Check Variance Inflation Factors (VIF) or the Tolerance (TOL)

• Run a regression of each X on the other Xs

• Calculate the VIF for each β̂i:

$$\mathrm{VIF}(\hat\beta_i) = \frac{1}{1 - R_i^2}$$

where R_i² is the R² from the regression of X_i on the other Xs.


• The higher the VIF, the more severe the problem of multicollinearity.

• If the VIF is greater than 5, there might be a problem (the cutoff of 5 is arbitrarily chosen).

$$\mathrm{VIF}(\hat\beta_i) \ge 1$$


• Tolerance (TOL) = 1 − R_i² = 1/VIF

0 < TOL ≤ 1

If TOL is close to zero, multicollinearity is severe.

You can use either VIF or TOL; since one is the reciprocal of the other, they carry the same information.
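The detection steps above can be sketched in NumPy, using the deck's rules of thumb (|r| > 0.8, VIF > 5). The function names and the simulated data are illustrative assumptions; each auxiliary regression includes a constant. The t-statistic still has to be compared with a $t_{n-2}$ critical value at your chosen significance level.

```python
import numpy as np

def corr_t_stat(r, n):
    """t-statistic for H0: population correlation = 0, with n - 2 degrees of freedom."""
    return r * np.sqrt(n - 2) / np.sqrt(1 - r**2)

def aux_r2(X, j):
    """R^2 from regressing column j of X on a constant and the other columns."""
    y = X[:, j]
    Z = np.column_stack([np.ones(len(y)), np.delete(X, j, axis=1)])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

def diagnostics(X, names):
    n, k = X.shape
    R = np.corrcoef(X, rowvar=False)
    # Steps 1 and 2: pairwise correlations and their t-statistics
    for i in range(k):
        for j in range(i + 1, k):
            r = R[i, j]
            flag = "  <-- |r| > 0.8" if abs(r) > 0.8 else ""
            print(f"r({names[i]}, {names[j]}) = {r:+.3f}, t = {corr_t_stat(r, n):8.2f}{flag}")
    # Step 3: tolerance and VIF from the auxiliary regressions
    out = {}
    for j, name in enumerate(names):
        r2 = aux_r2(X, j)
        tol = 1 - r2
        vif = 1 / tol
        flag = "  <-- VIF > 5" if vif > 5 else ""
        print(f"{name}: R^2 = {r2:.3f}, TOL = {tol:.3f}, VIF = {vif:.2f}{flag}")
        out[name] = (tol, vif)
    return out

rng = np.random.default_rng(6)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.3 * rng.normal(size=n)   # strongly related to x1
x3 = rng.normal(size=n)              # unrelated to the others
result = diagnostics(np.column_stack([x1, x2, x3]), ["x1", "x2", "x3"])
```

In practice a library routine may be preferable; statsmodels, for example, ships a `variance_inflation_factor` helper.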


• EFFECTS OF MULTICOLLINEARITY

1. OLS estimates are still unbiased.

2. Standard errors of the estimated coefficients will be inflated.

3. t-statistics will be small.

4. Estimates will be sensitive to small changes, such as dropping a variable or adding a few more observations.
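A small Monte Carlo sketch of effects 1 and 2, assuming a true model $Y = 1 + 2X_1 - X_2 + \varepsilon$ with standard-normal regressors and errors (all illustrative choices). As the correlation between X1 and X2 rises, the average of $\hat\beta_1$ stays near 2, but its spread across samples grows:

```python
import numpy as np

def simulate(rho, n=100, reps=2_000, seed=0):
    """Mean and standard deviation of OLS beta_1 estimates when corr(X1, X2) is about rho."""
    rng = np.random.default_rng(seed)
    beta = np.array([1.0, 2.0, -1.0])   # assumed true beta_0, beta_1, beta_2
    est = np.empty(reps)
    for r in range(reps):
        x2 = rng.normal(size=n)
        x1 = rho * x2 + np.sqrt(1 - rho**2) * rng.normal(size=n)
        X = np.column_stack([np.ones(n), x1, x2])
        y = X @ beta + rng.normal(size=n)
        est[r] = np.linalg.lstsq(X, y, rcond=None)[0][1]
    return est.mean(), est.std()

stats = {}
for rho in [0.0, 0.9, 0.99]:
    stats[rho] = simulate(rho)
    mean, sd = stats[rho]
    print(f"rho={rho:4}: mean of beta1_hat = {mean:.3f}, sd = {sd:.3f}")
```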


• With multicollinearity, you may fail to reject H0 in all of your individual t-tests but reject H0 in the joint F-test.
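A sketch of that situation, assuming simulated data where x2 is almost an exact copy of x1 and both true slopes equal 1. With data like these the individual t-statistics will usually fall below the critical value while the F-statistic is far above it; the exact numbers depend on the random draw. The critical values are the standard 5% table values for 27 degrees of freedom.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 30
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)     # almost an exact copy of x1
y = 1.0 + x1 + x2 + rng.normal(size=n)  # assumed true model: both slopes equal 1

X = np.column_stack([np.ones(n), x1, x2])
k = X.shape[1]
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
sigma2 = resid @ resid / (n - k)
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
t_stats = beta_hat / se

# F-test of H0: beta_1 = beta_2 = 0 (restricted model: intercept only)
ssr_u = resid @ resid
ssr_r = np.sum((y - y.mean()) ** 2)
q = 2                                   # number of restrictions
F = ((ssr_r - ssr_u) / q) / (ssr_u / (n - k))

# 5% critical values with n - k = 27 degrees of freedom, from standard tables
t_crit, F_crit = 2.052, 3.35
print("t-stats on x1, x2:", np.round(t_stats[1:], 2), "critical value", t_crit)
print("F-stat:", round(F, 1), "critical value", F_crit)
```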


Dealing with Multicollinearity

1. Ignore It.

Do this if multicollinearity is not causing any problems.

That is, if the t-statistics are insignificant and unreliable, do something. If not, do nothing.


2. Drop a variable.

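If two variables are significantly related, drop one of them, since it is largely redundant. (If the dropped variable belongs in the model, this reintroduces omitted variables bias.)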

3. Increase the sample size

The larger the sample size, the more precise the estimates (the smaller their standard errors).
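A sketch of remedy 3, assuming the same kind of simulated model with a fixed correlation of 0.95 between X1 and X2 (the helper `se_beta1` is hypothetical). The correlation does not go away, but the conventional standard error of $\hat\beta_1$ still shrinks roughly in proportion to $1/\sqrt{n}$:

```python
import numpy as np

def se_beta1(n, rho, seed=0):
    """Conventional OLS standard error of beta_1 for one simulated sample of size n."""
    rng = np.random.default_rng(seed)
    x2 = rng.normal(size=n)
    x1 = rho * x2 + np.sqrt(1 - rho**2) * rng.normal(size=n)
    X = np.column_stack([np.ones(n), x1, x2])
    y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)   # assumed true model
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_hat
    sigma2 = resid @ resid / (n - X.shape[1])
    return np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])

ses = {n: se_beta1(n, rho=0.95) for n in [50, 500, 5_000]}
for n, se in ses.items():
    print(f"n={n:5d}: SE(beta1_hat) = {se:.3f}")
```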