Multicollinearity
S. K. Bhaumik
Issues for discussion
1. Definition
2. Consequences
3. Tests
4. Some Remedial Measures
Reference: Sankar Kumar Bhaumik, Principles of Econometrics: A
Modern Approach Using EViews, Oxford University Press, 2015, Ch. 6
Definition
❖ While estimating multiple regression models, quite
often we obtain unsatisfactory results.
❖ This happens due to high variances and hence high
values of standard errors of the estimated coefficients.
❖ This is possible when there is little variation in
explanatory variables or high inter-correlations
among the explanatory variables or both.
❖ Multicollinearity represents a situation where the
explanatory variables of the multiple regression model
are highly correlated.
❖ So the multicollinearity problem arises only in the case of
multiple regression.
o Multicollinearity represents lack of independent
movement in the sample data on explanatory variables
→ multicollinearity is a feature of the sample data rather
than of the population.
o When multicollinearity is present, it becomes difficult
to disentangle the separate effects of the explanatory
variables on the dependent variable.
Thus, if $Y_i = f(X_{1i}, X_{2i})$, and $X_{1i}$ and $X_{2i}$ are perfectly
correlated, then either could predict $Y_i$ and the other
would become superfluous.
o Usually perfect correlation between the explanatory
variables is not observed → we observe high
correlation among the explanatory variables →
unable to obtain precise estimates for the unknown
parameters.
Consequences of Multicollinearity
The easiest way to understand the consequences of
multicollinearity problem is to compare simple and
multiple regression models.
Consider Simple Regression Models
$$Y_i = \alpha_1 + \beta_{Y1} X_{1i} + \varepsilon_{1i} \qquad (1)$$
$$Y_i = \alpha_2 + \beta_{Y2} X_{2i} + \varepsilon_{2i} \qquad (2)$$
and Multiple Regression Model
$$Y_i = \alpha + \beta_{Y1.2} X_{1i} + \beta_{Y2.1} X_{2i} + \varepsilon_i \qquad (3)$$
Applying the OLS method, we obtain
$$\hat{\beta}_{Y1} = \frac{\sum x_{1i} y_i}{\sum x_{1i}^2} \qquad (4)$$
$$\hat{\beta}_{Y2} = \frac{\sum x_{2i} y_i}{\sum x_{2i}^2} \qquad (5)$$
$$\hat{\beta}_{Y1.2} = \frac{(\sum x_{1i} y_i)(\sum x_{2i}^2) - (\sum x_{2i} y_i)(\sum x_{1i} x_{2i})}{(\sum x_{1i}^2)(\sum x_{2i}^2) - (\sum x_{1i} x_{2i})^2} \qquad (6)$$
$$\hat{\beta}_{Y2.1} = \frac{(\sum x_{2i} y_i)(\sum x_{1i}^2) - (\sum x_{1i} y_i)(\sum x_{1i} x_{2i})}{(\sum x_{1i}^2)(\sum x_{2i}^2) - (\sum x_{1i} x_{2i})^2} \qquad (7)$$
Here $\hat{\beta}_{Y1}$ and $\hat{\beta}_{Y2}$ are the two estimated slope coefficients
from simple regression models (1) and (2) respectively.
$\hat{\beta}_{Y1.2}$ and $\hat{\beta}_{Y2.1}$ respectively are the two estimated partial
regression coefficients from the multiple regression model (3).
The lowercase letters denote deviations from the sample
means of the variables (e.g., $y_i = Y_i - \bar{Y}$).
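A minimal numerical sketch (in Python; the simulated data and coefficient values are illustrative assumptions, not from the text) of how the simple slopes (4)-(5) and the partial slopes (6)-(7) are computed from these deviations:

```python
import numpy as np

# Simulated data with inter-correlated regressors (illustrative values only)
rng = np.random.default_rng(0)
n = 200
X1 = rng.normal(size=n)
X2 = 0.8 * X1 + 0.6 * rng.normal(size=n)           # X1 and X2 are correlated
Y = 1.0 + 2.0 * X1 - 1.5 * X2 + rng.normal(size=n)

# Lowercase letters: deviations from the sample means
y, x1, x2 = Y - Y.mean(), X1 - X1.mean(), X2 - X2.mean()

# Equations (4) and (5): simple-regression slopes
b_Y1 = (x1 @ y) / (x1 @ x1)
b_Y2 = (x2 @ y) / (x2 @ x2)

# Equations (6) and (7): partial-regression slopes of the multiple regression
den = (x1 @ x1) * (x2 @ x2) - (x1 @ x2) ** 2
b_Y1_2 = ((x1 @ y) * (x2 @ x2) - (x2 @ y) * (x1 @ x2)) / den
b_Y2_1 = ((x2 @ y) * (x1 @ x1) - (x1 @ y) * (x1 @ x2)) / den

print(b_Y1, b_Y2)        # each simple slope mixes the effects of X1 and X2
print(b_Y1_2, b_Y2_1)    # partial slopes roughly recover 2.0 and -1.5
```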
We may define
$$\sum x_{1i} y_i = n\, r_{Y1} S_Y S_1$$
$$\sum x_{2i} y_i = n\, r_{Y2} S_Y S_2$$
$$\sum x_{1i} x_{2i} = n\, r_{12} S_1 S_2$$
$$\sum x_{1i}^2 = n S_1^2$$
$$\sum x_{2i}^2 = n S_2^2$$
where $S_1$ = standard deviation of $X_{1i}$;
$S_2$ = standard deviation of $X_{2i}$;
$S_Y$ = standard deviation of $Y_i$;
$r_{Y1}$ = simple correlation between $Y_i$ and $X_{1i}$;
$r_{Y2}$ = simple correlation between $Y_i$ and $X_{2i}$;
$r_{12}$ = simple correlation between $X_{1i}$ and $X_{2i}$;
$n$ = number of observations.
Putting the above values into the formulas for the estimated
coefficients, we get
$$\hat{\beta}_{Y1} = r_{Y1}\,\frac{S_Y}{S_1} \qquad (8)$$
$$\hat{\beta}_{Y2} = r_{Y2}\,\frac{S_Y}{S_2} \qquad (9)$$
$$\hat{\beta}_{Y1.2} = \left(\frac{r_{Y1} - r_{Y2}\, r_{12}}{1 - r_{12}^2}\right)\frac{S_Y}{S_1} \qquad (10)$$
$$\hat{\beta}_{Y2.1} = \left(\frac{r_{Y2} - r_{Y1}\, r_{12}}{1 - r_{12}^2}\right)\frac{S_Y}{S_2} \qquad (11)$$
We now examine various cases depending upon the value
of $r_{12}^2$.
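Before turning to the cases, a quick numerical cross-check may help (a sketch; the function name and inputs are assumptions, not from the text): applying it to any data set, such as the simulated one above, the correlation-form expressions (8)-(11) reproduce the deviation-form slopes of (4)-(7).

```python
import numpy as np

def slopes_from_correlations(Y, X1, X2):
    """Return (beta_Y1, beta_Y2, beta_Y1.2, beta_Y2.1) via equations (8)-(11)."""
    SY, S1, S2 = Y.std(), X1.std(), X2.std()
    rY1 = np.corrcoef(Y, X1)[0, 1]
    rY2 = np.corrcoef(Y, X2)[0, 1]
    r12 = np.corrcoef(X1, X2)[0, 1]
    b_Y1 = rY1 * SY / S1                                      # eq. (8)
    b_Y2 = rY2 * SY / S2                                      # eq. (9)
    b_Y1_2 = (rY1 - rY2 * r12) / (1 - r12 ** 2) * SY / S1     # eq. (10)
    b_Y2_1 = (rY2 - rY1 * r12) / (1 - r12 ** 2) * SY / S2     # eq. (11)
    return b_Y1, b_Y2, b_Y1_2, b_Y2_1
```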
Case I: Absence of multicollinearity ($r_{12}^2 = 0$)
As $r_{12}^2 = 0$, equation (10) collapses to equation (8) and equation
(11) to equation (9) ⟹ we might abandon the multiple regression
and run two separate simple regressions.
However, there are two important points to remember here:
o How to obtain $\hat{\alpha}$?
We compute $\hat{\alpha}$ as
$$\hat{\alpha} = \bar{Y} - \hat{\beta}_{Y1}\bar{X}_1 - \hat{\beta}_{Y2}\bar{X}_2$$
o In the case of simple regressions, the variances of $\hat{\beta}_{Y1}$ and
$\hat{\beta}_{Y2}$ would be upwardly biased (i.e., greater than the variances
of $\hat{\beta}_{Y1.2}$ and $\hat{\beta}_{Y2.1}$).
Therefore, when we are working with multivariate data, it is
advisable to fit multiple regressions.
Case II: Perfect multicollinearity ($r_{12}^2 = 1$)
o As $r_{12}^2 = 1$, we cannot define equations (10) and (11).
o It would not be possible to obtain OLS estimates for the
unknown parameters of the multiple regression model.
Case III: High Degree of Multicollinearity (also called
Imperfect Multicollinearity) ($r_{12}^2 \to 1$)
o If $r_{12}^2$ is close to unity, we say that there is a high degree
of multicollinearity.
o Here it would be possible to perform OLS estimation
of the multiple regression model, but the variances of
the estimates would be very large.
The above point may be explained using the formulas for the
variances of $\hat{\beta}_{Y1.2}$ and $\hat{\beta}_{Y2.1}$:
$$Var(\hat{\beta}_{Y1.2}) = \frac{\sigma^2}{\sum x_{1i}^2\,(1 - r_{12}^2)}$$
$$Var(\hat{\beta}_{Y2.1}) = \frac{\sigma^2}{\sum x_{2i}^2\,(1 - r_{12}^2)}$$
It is clear that
o $r_{12}^2 = 1$ makes the computation undefined; and
o a high value of $r_{12}^2$ (close to 1) makes the variances of
the estimated partial regression coefficients large.
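A small illustration (the correlation values below are arbitrary) of how the factor $1/(1 - r_{12}^2)$ in these formulas inflates the variances as $r_{12}^2$ approaches 1:

```python
# Variance of a partial slope relative to the no-collinearity benchmark
for r12 in (0.0, 0.5, 0.9, 0.99, 0.999):
    inflation = 1.0 / (1.0 - r12 ** 2)
    print(f"r12 = {r12:5.3f}  ->  variance multiplied by {inflation:9.1f}")
```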
Tests for Multicollinearity
❖ Klein’s Rule: Multicollinearity would be regarded as a
problem if $R_Y^2 < R_k^2$.
Here $R_Y^2$ = squared multiple correlation coefficient between $Y_i$
and the explanatory variables $X_{1i}, X_{2i}, \ldots, X_{ki}$; and
$R_k^2$ = squared multiple correlation coefficient between the $k$th
explanatory variable and the other explanatory variables.
Limitation: It cannot always correctly diagnose the presence of
multicollinearity in data. In particular, it has been found that
there are instances where, in spite of $R_Y^2 < R_k^2$, we still have small
variances of the OLS estimates and hence significant t-ratios.
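One possible way to operationalize Klein's rule (a sketch; the helper `r_squared`, the function name, and the data layout are assumptions, not from the text): compute $R_Y^2$ from the full regression and $R_k^2$ from each auxiliary regression, then flag any variable whose $R_k^2$ exceeds $R_Y^2$.

```python
import numpy as np

def r_squared(y, Z):
    """R^2 from an OLS regression of y on Z (a constant column is added here)."""
    Zc = np.column_stack([np.ones(len(y)), Z])
    beta, *_ = np.linalg.lstsq(Zc, y, rcond=None)
    resid = y - Zc @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def klein_flags(Y, X):
    """X: (n, k) matrix of explanatory variables. Flag column k if R_k^2 > R_Y^2."""
    RY2 = r_squared(Y, X)
    flags = []
    for k in range(X.shape[1]):
        Rk2 = r_squared(X[:, k], np.delete(X, k, axis=1))
        flags.append(Rk2 > RY2)
    return RY2, flags
```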
❖ The Variance-Inflation Factor (VIF)
We know that multicollinearity produces high variances for the
OLS estimates and hence low t-ratios and insignificant
regression results → one way to understand the presence of
multicollinearity → compare the variances of OLS estimates for
two situations:
(i) where multicollinearity is absent (“ideal situation”); and
(ii) where multicollinearity is present (“observed situation”).
When multicollinearity is present, the variance of the estimated
coefficient of the $k$th explanatory variable is measured by:
$$Var(\hat{\beta}_k) = \frac{\sigma^2}{\sum x_{ki}^2\,(1 - R_k^2)}$$
Under the “ideal situation”, $R_k^2 = 0$, so that
$$Var(\hat{\beta}_k) = \frac{\sigma^2}{\sum x_{ki}^2}$$
The VIF compares these two situations by taking the ratio of the
two variances:
$$VIF(\hat{\beta}_k) = \frac{\sigma^2 / [\sum x_{ki}^2\,(1 - R_k^2)]}{\sigma^2 / \sum x_{ki}^2} = \frac{1}{1 - R_k^2}$$
Now if $VIF(\hat{\beta}_k) = 1$, we say that there is no multicollinearity,
while $VIF(\hat{\beta}_k) > 1$ indicates its presence.
Rule of thumb: $VIF > 10$ ⟹ serious multicollinearity
involving the corresponding explanatory variable.
Two points to note:
(i) VIF values are computed for each of the estimated slope
coefficients.
(ii) VIF values help to identify the multicollinear variables.
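A compact way to obtain the VIFs in practice (a sketch; `X` is an assumed $(n, k)$ matrix of explanatory variables, not from the text) uses the standard result that, for a model with an intercept, $1/(1 - R_k^2)$ equals the $k$th diagonal element of the inverse of the regressors' correlation matrix:

```python
import numpy as np

def vif(X):
    """Return one VIF per column of X (explanatory variables, no constant column)."""
    R = np.corrcoef(X, rowvar=False)      # correlation matrix of the regressors
    return np.diag(np.linalg.inv(R))      # diagonal of R^{-1} gives 1/(1 - R_k^2)
```

Values near 1 suggest little collinearity; by the rule of thumb above, values above 10 point to a serious problem with the corresponding variable.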
Limitations:
1. It comes more as a complaint that things are not
ideal; and
2. $R_k^2$ is not the only factor responsible for inflating
$Var(\hat{\beta}_k)$; the inflation might also be due to a low value of $\sum x_{ki}^2$.
❖ Tolerance
Tolerance is the reciprocal of VIF:
$$tolerance = \frac{1}{VIF} = 1 - R_k^2$$
Obviously, here the rule of thumb is that tolerance values
of 0.10 or less indicate the presence of serious
multicollinearity.
❖ The Condition Number (CN)
While VIF is computed for each of the estimated
coefficients, the condition number is an overall measure.
It conveys the status or condition of the data matrix X.
The formula used to compute the condition number is:
$$CN = \frac{\text{highest eigenvalue of the matrix } X'X}{\text{lowest eigenvalue of the matrix } X'X}$$
Rules of thumb:
o $CN = 1$ ⟹ no multicollinearity
o $1 < CN < 10$ ⟹ multicollinearity is negligible
o $10 \le CN \le 30$ ⟹ moderate to strong multicollinearity
o $CN > 30$ ⟹ severe multicollinearity
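A direct implementation of this ratio (a sketch; whether to rescale the columns of X beforehand is a modelling choice not specified here):

```python
import numpy as np

def condition_number(X):
    """Ratio of the largest to the smallest eigenvalue of X'X."""
    eigvals = np.linalg.eigvalsh(X.T @ X)   # X'X is symmetric, so eigenvalues are real
    return eigvals.max() / eigvals.min()
```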
Limitations of CN:
• It is also more of a complaint that things are not ideal.
• It has been shown that the CN value may change with a
reparametrization of the variables. It may be brought closer
to 1 with a suitable transformation of the variables.
Important points to remember:
✓ The tests of multicollinearity may provide some broad idea
about its presence. But they are of limited use from the
practical point of view.
✓ To understand the severity of the multicollinearity problem,
we should also look into the values of the standard errors and
t-ratios of the estimated coefficients, as well as their statistical
significance.
✓ Apart from inter-correlations amongst the explanatory
variables, one should look into other aspects like the standard
errors of the estimated coefficients, their t-ratios, $R^2$ and
$\bar{R}^2$, the F-ratio, and so on, to assess the usefulness of the
estimated multiple regression models.
Remedial Measures
• Of all econometric problems, multicollinearity is the most serious one
→ no measure can completely remove it when present in data.
• The measures discussed below only attempt to minimize its impact so
that reasonable regression results are obtained.
1. Increasing Sample Size → helps to reduce the severity of
multicollinearity → this becomes clear from the formulas for
variances of the estimates.
When multicollinearity is present, the variance of the estimated
coefficient of the $k$th explanatory variable is given by
$$Var(\hat{\beta}_k) = \frac{\sigma^2}{\sum x_{ki}^2\,(1 - R_k^2)}$$
Now as the sample size increases, $\sum x_{ki}^2$ increases and $Var(\hat{\beta}_k)$
falls, unless the additional observations on $X_{ki}$ are all equal to the
sample mean $\bar{X}_k$ (so that they add nothing to $\sum x_{ki}^2$), which is
most unlikely to happen.
Of course, we are unsure about what will happen to $R_k^2$ when the sample
size increases. But it is possible that with an increasing sample size $R_k^2$
would also fall, which would further reduce $Var(\hat{\beta}_k)$.
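An illustrative simulation of this point (two regressors; all parameter values are arbitrary assumptions): as $n$ grows, $\sum x_{ki}^2$ grows roughly in proportion, so the variance formula above shrinks even though $r_{12}$ stays about the same.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma2 = 1.0                                      # assumed error variance
for n in (30, 100, 1000):
    X1 = rng.normal(size=n)
    X2 = 0.9 * X1 + 0.3 * rng.normal(size=n)      # keeps r12 roughly constant across n
    x1 = X1 - X1.mean()
    r12 = np.corrcoef(X1, X2)[0, 1]
    var_b1 = sigma2 / ((x1 @ x1) * (1 - r12 ** 2))
    print(n, round(var_b1, 5))                    # falls roughly like 1/n
```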
2. Transformation of Variables:
• It has been found that the intensity of multicollinearity gets reduced
when transformed variables (ratio, first-difference etc.) are used
instead of variables in ‘levels’.
• For example, in a three-variable model, although the levels of $X_{1t}$
and $X_{2t}$ might be correlated, there is no a priori reason to believe
that the first differences of these variables, $(X_{1t} - X_{1,t-1})$ and
$(X_{2t} - X_{2,t-1})$, would also be correlated. However, there might be
autocorrelation in the first-difference regression model.
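A small simulation of the point about first differences (the trending series and their slopes are illustrative assumptions): two series that share a time trend are highly correlated in levels but far less so after differencing.

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.arange(200)
X1 = 0.5 * t + rng.normal(scale=5.0, size=200)    # two series sharing an upward trend
X2 = 0.4 * t + rng.normal(scale=5.0, size=200)

print(np.corrcoef(X1, X2)[0, 1])                        # close to 1 in levels
print(np.corrcoef(np.diff(X1), np.diff(X2))[0, 1])      # close to 0 after first-differencing
```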
3. Dropping Variables:
• One of the easiest ways to overcome the multicollinearity problem.
• After identification of the multicollinear variables, the researcher
often drops some of them from the model.
• This is justified if the model includes a large number of
explanatory variables, not all of which are important.
• However, the implications of the dropping-variables approach need to
be understood ⟶ it is necessary to clarify when dropping a
variable (or variables) is justified.
Suppose our three-variable multiple regression model is
$$Y_i = \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i \qquad (12)$$
[We exclude the intercept term for the sake of simplicity of discussion.]
Assume: $X_{1i}$ and $X_{2i}$ are highly correlated and we are interested in
$X_{1i}$, so we drop $X_{2i}$ to avoid multicollinearity. Then our model becomes
$$Y_i = \beta_1 X_{1i} + v_i \qquad (13)$$
Equation (12) → the “complete model”
Equation (13) → the “omitted-variable model”
Let the OLS estimate of $\beta_1$ from the complete model be $\hat{\beta}_1$ and that
from the omitted-variable model be $\tilde{\beta}_1$. As regards $\hat{\beta}_1$, we know that
$$E(\hat{\beta}_1) = \beta_1 \quad \text{and} \quad Var(\hat{\beta}_1) = \frac{\sigma^2}{\sum X_{1i}^2\,(1 - r_{12}^2)}$$
For $\tilde{\beta}_1$, we have to compute $E(\tilde{\beta}_1)$ and $Var(\tilde{\beta}_1)$.
$$\tilde{\beta}_1 = \frac{\sum X_{1i} Y_i}{\sum X_{1i}^2}
= \frac{\sum X_{1i}(\beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i)}{\sum X_{1i}^2}
= \beta_1 + \beta_2 \frac{\sum X_{1i} X_{2i}}{\sum X_{1i}^2} + \frac{\sum X_{1i}\varepsilon_i}{\sum X_{1i}^2}$$
Thus,
$$E(\tilde{\beta}_1) = \beta_1 + \beta_2 \frac{\sum X_{1i} X_{2i}}{\sum X_{1i}^2} \qquad [\because E(\varepsilon_i) = 0]$$
This implies that $\tilde{\beta}_1$ is a biased estimate.
$$Var(\tilde{\beta}_1) = E[\tilde{\beta}_1 - E(\tilde{\beta}_1)]^2
= E\left[\frac{\sum X_{1i}\varepsilon_i}{\sum X_{1i}^2}\right]^2
= \frac{1}{(\sum X_{1i}^2)^2}\, E\Big[\big(\textstyle\sum X_{1i}\varepsilon_i\big)^2\Big]
= \frac{\sigma^2}{\sum X_{1i}^2} \qquad [\because E(\varepsilon_i^2) = \sigma^2]$$
It shows that $\tilde{\beta}_1$, the OLS estimate of $\beta_1$ from the
omitted-variable model, is biased but has a smaller
variance than $\hat{\beta}_1$ (whose variance is inflated by the factor $1/(1 - r_{12}^2)$).
Implication →
• On the basis of the unbiasedness property, $\hat{\beta}_1$ seems
preferable when the two variables are highly
correlated.
• On the basis of the minimum variance property, $\tilde{\beta}_1$ seems
preferable.
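A Monte Carlo sketch of this trade-off (all numbers are illustrative assumptions, not from the text): $\hat{\beta}_1$ from the complete model centres on $\beta_1$ but is noisier, while $\tilde{\beta}_1$ from the omitted-variable model is shifted away from $\beta_1$ but varies less.

```python
import numpy as np

rng = np.random.default_rng(3)
n, beta1, beta2 = 50, 1.0, 0.2
X1 = rng.normal(size=n)
X2 = 0.9 * X1 + 0.2 * rng.normal(size=n)               # X1 and X2 highly correlated

hat, tilde = [], []
for _ in range(5000):
    Y = beta1 * X1 + beta2 * X2 + rng.normal(size=n)   # model (12), no intercept
    b = np.linalg.lstsq(np.column_stack([X1, X2]), Y, rcond=None)[0]
    hat.append(b[0])                                   # complete-model estimate of beta1
    tilde.append((X1 @ Y) / (X1 @ X1))                 # omitted-variable estimate (13)

print(np.mean(hat), np.var(hat))       # centred on beta1, larger variance
print(np.mean(tilde), np.var(tilde))   # biased away from beta1, smaller variance
```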
When one estimate is unbiased but does not have minimum
variance, while the other estimate is biased but has
minimum variance, we face a difficult choice problem.
To overcome this problem, we compare the MSE of the
two estimates → this also helps us choose correctly between the
two alternative estimates.
Let us consider the ratio of the two mean-square errors:
$$\frac{MSE(\tilde{\beta}_1)}{MSE(\hat{\beta}_1)}
= \frac{[bias(\tilde{\beta}_1)]^2 + Var(\tilde{\beta}_1)}{[bias(\hat{\beta}_1)]^2 + Var(\hat{\beta}_1)}
= \frac{[bias(\tilde{\beta}_1)]^2}{Var(\hat{\beta}_1)} + \frac{Var(\tilde{\beta}_1)}{Var(\hat{\beta}_1)}
\qquad [\because bias(\hat{\beta}_1) = 0]$$
Here
$$\frac{[bias(\tilde{\beta}_1)]^2}{Var(\hat{\beta}_1)}
= \frac{\beta_2^2\left(\dfrac{\sum X_{1i} X_{2i}}{\sum X_{1i}^2}\right)^2}{\dfrac{\sigma^2}{\sum X_{1i}^2\,(1 - r_{12}^2)}}
= \frac{(\sum X_{1i} X_{2i})^2}{\sum X_{1i}^2 \sum X_{2i}^2}\cdot\frac{\beta_2^2 \sum X_{2i}^2\,(1 - r_{12}^2)}{\sigma^2}
= r_{12}^2\,\frac{\beta_2^2}{Var(\hat{\beta}_2)}
= r_{12}^2\, t_2^2 \qquad (14)$$
Note that in equation (14), $t_2$ is not the estimated but the ‘true’ t-ratio
for $X_{2i}$.
Further,
$$\frac{Var(\tilde{\beta}_1)}{Var(\hat{\beta}_1)}
= \frac{\sigma^2 / \sum X_{1i}^2}{\sigma^2 / [\sum X_{1i}^2\,(1 - r_{12}^2)]}
= 1 - r_{12}^2 \qquad (15)$$
Now using equations (14) and (15), we can write
$$\frac{MSE(\tilde{\beta}_1)}{MSE(\hat{\beta}_1)} = r_{12}^2 t_2^2 + (1 - r_{12}^2) = 1 + r_{12}^2\,(t_2^2 - 1)$$
• Thus, if $t_2^2 < 1$, $MSE(\tilde{\beta}_1) < MSE(\hat{\beta}_1)$ and $\tilde{\beta}_1$ should be
preferred.
• But as $t_2$ is not known, we use $\hat{t}_2$ (i.e., the estimated t-value of
$\hat{\beta}_2$ from the complete model). Then, as an estimate of $\beta_1$, we may use
the conditional-omitted-variable estimator $\tilde{\tilde{\beta}}_1$, which is
defined as
$$\tilde{\tilde{\beta}}_1 =
\begin{cases}
\hat{\beta}_1 \ (\text{the OLS estimate}) & \text{if } |\hat{t}_2| \ge 1 \\
\tilde{\beta}_1 \ (\text{the omitted-variable estimate}) & \text{if } |\hat{t}_2| < 1
\end{cases}$$
Implication → only if the computed t-value for a variable is less
than 1 in absolute value might we drop that variable and accept the OV model.
Otherwise, continue with the complete model.
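A sketch of this decision rule in the simplified no-intercept setup (the function and variable names are assumptions, not from the text): estimate the complete model, compute $\hat{t}_2$, and fall back to the omitted-variable estimate only when $|\hat{t}_2| < 1$.

```python
import numpy as np

def conditional_ov_estimate(Y, X1, X2):
    """Return the conditional-omitted-variable estimate of beta1."""
    X = np.column_stack([X1, X2])
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ (X.T @ Y)                    # complete-model OLS estimates
    resid = Y - X @ beta_hat
    sigma2_hat = (resid @ resid) / (n - k)
    t2_hat = beta_hat[1] / np.sqrt(sigma2_hat * XtX_inv[1, 1])
    if abs(t2_hat) >= 1.0:
        return beta_hat[0]                            # keep the complete model
    return (X1 @ Y) / (X1 @ X1)                       # drop X2: omitted-variable estimate
```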
Limitation of the OV approach → this approach is not
very attractive when only a limited number of variables have been
used in the model.