Introductory Econometrics:
Intuition, Proof and Practice
The Basics
Jeffrey S. Zax
8/24/2008
Chapter 1: What is a Regression?
Section 1.0: The Basics
This chapter explains what a regression is and how to interpret it. Here are the essentials:
1. Section 1.4: The dependent or endogenous variable measures the behavior that we want
to explain with regression analysis.
2. Section 1.5: The explanatory, independent or exogenous variables measure things that
we think might determine the behavior that we want to explain. We usually think of them as
pre-determined.
3. Section 1.5: The slope estimates the effect of a change in the explanatory variable on the
value of the dependent variable.
4. Section 1.5: The t-statistic indicates whether the associated slope is reliable. The slope is
reliable if the prob-value associated with the t-statistic is .05 or less. In this case, we say
that the associated slope is statistically significant. This generally corresponds to an
absolute value of approximately two or greater for the t-statistic, itself. If the t-statistic has a
prob-value that is greater than .05, the associated slope coefficient is insignificant. This
means that it isn’t reliable.
5. Section 1.6: The intercept is usually uninteresting. It represents what everyone has in
common, rather than characteristics that might cause individuals to be different.
6. Section 1.6: We usually interpret only the slopes that are statistically significant. They
indicate the effect of their associated explanatory variables on the dependent variable
ceteris paribus, or holding constant all of the other characteristics that are included in the
regression.
7. Section 1.6: Continuous variables take on a wide range of values. Their slopes indicate
the change that would occur in the dependent variable if the value of the associated
explanatory variable increased by one unit.
8. Section 1.6: Discrete variables, sometimes called categorical variables, indicate the
presence or absence of a particular characteristic. Their slopes indicate the change that
would occur in the dependent variable if an individual who did not have that characteristic
were given it.
9. Section 1.7: Regression interpretation requires three steps. The first is to identify the
reliable slopes. The second is to understand their magnitudes. The third is to use this
understanding to verify or modify the behavioral intuition that motivated the regression in
the first place.
10. Section 1.7: Statistical significance is necessary in order to have interesting results, but not
sufficient. Important effects are those that are both statistically significant and substantively
large. Slopes that are statistically significant but substantively small indicate that the effects
of the associated explanatory variable can be reliably interpreted as unimportant.
11. Section 1.7: A proxy is a variable that is related to, but not exactly the variable we really
want. We use proxies when the variables we really want aren’t available. Sometimes this
makes interpretation difficult.
12. Section 1.8: If the prob-value associated with the F-statistic is .05 or less, the collective
effect of the ensemble of explanatory variables on the dependent variable is statistically
significant.
13. Section 1.8: Observations are the individual examples of the behavior under examination,
upon which the regression is based. All of the observations together constitute the sample.
14. Section 1.8: The R2, or coefficient of determination, represents the proportion of the
variation in the dependent variable that is explained by the explanatory variables. The
adjusted R2 modifies the R2 in order to take account of the numbers of explanatory variables
and observations. However, neither measures directly the reliability of the regression
results.
15. Section 1.9: F-statistics can be used to evaluate the contribution of a subset of explanatory
variables, as well as the collective statistical significance of all explanatory variables. In
both cases, the F-statistic is a transformation of R2 values.
16. Section 1.10: Regression results are useful only to the extent that the choice of variables in
the regression, variable construction and sample design are appropriate.
17. Section 1.11: Regression results may be presented in one of several different formats.
However, they all have to contain the same substantive information.
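As an added illustration (not part of the original chapter), here is a minimal Python sketch of the prob-value rule of thumb in item 4. The t-statistic and degrees of freedom are made-up numbers; scipy's t distribution supplies the two-sided prob-value.

    import scipy.stats as st

    t_stat = 2.31    # hypothetical t-statistic reported for a slope
    df = 120         # hypothetical degrees of freedom

    # Two-sided prob-value: probability of a t-statistic at least this large in absolute value.
    prob_value = 2 * st.t.sf(abs(t_stat), df)

    print(prob_value)              # roughly .02 here
    print(prob_value <= 0.05)      # True: the slope would be called statistically significant
    # The |t| >= 2 rule of thumb reflects the fact that the .05 critical value is near two:
    print(st.t.ppf(1 - 0.025, df)) # about 1.98 for df = 120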
Chapter 2: The essential tool
Section 2.0: The Basics
This chapter reassures us that we can handle the material in this course. It reviews the
essential results regarding the summation, the principal algebraic tool. Here are the
essentials:
1. From section 2.1: This is not a math course. This is almost all just addition, subtraction,
multiplication and division. We can do this.
2. Equation 2.5, section 2.3: The summation of a constant is n times that constant:
\sum_{i=1}^{n} a = na.
3. Equation 2.7, section 2.3: Constants factor out of summations:
\sum_{i=1}^{n} a x_i = a \sum_{i=1}^{n} x_i.
Variables do not:
\sum_{i=1}^{n} a x_i \neq x_i \sum_{i=1}^{n} a.
4. Equation 2.8, section 2.4: The average of the xi's is
\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}.
5. Equations 2.9 and 2.10, section 2.4: Weighted averages are
\bar{x}_w = \frac{\sum_{i=1}^{n} a_i x_i}{n}, \quad \text{where} \quad \sum_{i=1}^{n} a_i = n.
6. Equation 2.14, section 2.5: The summation of a sum is the sum of the individual
summations:
\sum_{i=1}^{n} (x_i + y_i) = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i.
7. Equation 2.19, section 2.5: The sum of the deviations from the average is zero:
\sum_{i=1}^{n} (x_i - \bar{x}) = 0.
8. Equation 2.21, section 2.6:
\sum_{i=1}^{n} (x_i - \bar{x})\,\bar{x} = 0.
9. Equation 2.28, section 2.6:
\sum_{i=1}^{n} (x_i - \bar{x})\,x_i = \sum_{i=1}^{n} (x_i - \bar{x})^2.
10. Equation 2.37, section 2.6:
\sum_{i=1}^{n} (x_i - \bar{x})\,y_i = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} (y_i - \bar{y})\,x_i.
11. Equation 2.40, section 2.7: Products within summations can be distributed and summed
individually:
\sum_{i=1}^{n} (x_i + y_i)\,z_i = \sum_{i=1}^{n} x_i z_i + \sum_{i=1}^{n} y_i z_i.
That's it! There are lots of other equations in this chapter, but they're all there either to help derive or
to help understand those listed here.
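The identities above are easy to check numerically. The following sketch is an added illustration (not from the chapter); the data are made up, and numpy does the arithmetic for items 2, 7, 9 and 10.

    import numpy as np

    x = np.array([2.0, 5.0, 7.0, 10.0])
    y = np.array([1.0, 4.0, 6.0, 9.0])
    a, n = 3.0, len(x)

    # Item 2: the summation of a constant is n times that constant.
    assert np.isclose(np.sum(np.full(n, a)), n * a)

    # Item 7: the sum of deviations from the average is zero.
    assert np.isclose(np.sum(x - x.mean()), 0.0)

    # Item 9: sum of (x_i - xbar) * x_i equals sum of (x_i - xbar)**2.
    assert np.isclose(np.sum((x - x.mean()) * x), np.sum((x - x.mean()) ** 2))

    # Item 10: sum of (x_i - xbar) * y_i equals sum of (x_i - xbar) * (y_i - ybar).
    assert np.isclose(np.sum((x - x.mean()) * y),
                      np.sum((x - x.mean()) * (y - y.mean())))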
Chapter 3: Covariance and Correlation
Section 3.0: The Basics
This chapter develops simple ways to measure the direction of the association and the reliability
of the association between two variables in a sample. Here are the essentials:
1. Equation 3.7, section 3.2: The sample covariance is
\mathrm{COV}(X, Y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}.
It is symmetric with regard to X and Y.
2. Exercise 3.2: The sample covariance is not invariant to scale:
\mathrm{COV}(aX, bY) = ab\,\mathrm{COV}(X, Y).
3. Section 3.3: A derivation begins with an accepted definition and concludes with an
implication that is usually not obvious and often very useful.
4. Equation 3.13, section 3.4: The sample correlation coefficient is
\mathrm{CORR}(X, Y) = \frac{\mathrm{COV}(X, Y)}{\mathrm{SD}(X)\,\mathrm{SD}(Y)}.
It is symmetric with regard to X and Y.
5. Exercise 3.4: The sample correlation is invariant to scale (for positive scaling factors a and b):
\mathrm{CORR}(aX, bY) = \mathrm{CORR}(X, Y).
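A short numerical sketch of items 1, 2, 4 and 5, added here for illustration; the data are made up. Note that numpy's cov and corrcoef use the same n − 1 denominator as equation 3.7.

    import numpy as np

    x = np.array([2.0, 5.0, 7.0, 10.0, 11.0])
    y = np.array([1.0, 4.0, 6.0, 9.0, 8.0])
    n = len(x)

    # Equation 3.7: sample covariance with an n - 1 denominator.
    cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)
    assert np.isclose(cov_xy, np.cov(x, y)[0, 1])

    # Equation 3.13: correlation is covariance divided by the two standard deviations.
    corr_xy = cov_xy / (x.std(ddof=1) * y.std(ddof=1))
    assert np.isclose(corr_xy, np.corrcoef(x, y)[0, 1])

    # Scale: covariance picks up the factor a*b; correlation does not (for positive a and b).
    a, b = 3.0, 10.0
    assert np.isclose(np.cov(a * x, b * y)[0, 1], a * b * cov_xy)
    assert np.isclose(np.corrcoef(a * x, b * y)[0, 1], corr_xy)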
Chapter 4: Fitting a Line
Section 4.0: The Basics
This chapter develops a simple method to measure the magnitude of the association between
two variables in a sample. The generic name for this method is regression analysis. The precise
name, in the case of only two variables, is bivariate regression. It assumes that the variable X
causes the variable Y. It identifies the best-fitting line as that which minimizes the sum of
squared errors in the Y dimension. The quality of this fit is measured informally by the
proportion of the variance in Y that is explained by the variance in X. Here are the
essentials:
1. Equation 4.1, section 4.3: The regression line predicts yi as a linear function of xi:
\hat{y}_i = a + b x_i.
2. Equation 4.2, section 4.3: The regression error is the difference between the actual value
of yi and the value predicted by the regression line:
e_i = y_i - \hat{y}_i.
3. Equation 4.28, section 4.3: The average error for the regression line is equal to zero:
\bar{e} = 0.
4. Equation 4.36, section 4.3: The errors are uncorrelated with the explanatory variable:
\mathrm{CORR}(e, X) = 0.
5. Equation 4.43, section 4.4: The regression intercept is the difference between the average
value of Y and the slope times the average value of X:
a = \bar{y} - b\bar{x}.
6. Equation 4.48, section 4.5: The slope is a function of only the observed values of xi and yi
in the sample:
b = \frac{\sum_{i=1}^{n} (y_i - \bar{y})\,x_i}{\sum_{i=1}^{n} (x_i - \bar{x})\,x_i} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}.
7. Equation 4.65, section 4.6: The R2 measures the strength of the association represented by
the regression line:
R^2 = 1 - \frac{\sum_{i=1}^{n} e_i^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} = \frac{b^2 \sum_{i=1}^{n} (x_i - \bar{x})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}.
8. Equations 4.66 and 4.67, section 4.6: The R2 in the bivariate regression is equal to the
squared correlation between X and Y, and to the squared correlation between Y and its
predicted values:
R^2 = \left(\mathrm{CORR}(X, Y)\right)^2 = \left(\mathrm{CORR}(Y, \hat{Y})\right)^2.
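The chapter's formulas can be reproduced directly. This is an added sketch with made-up data: it computes the slope and intercept from equations 4.48 and 4.43 and confirms the properties in items 3, 4, 7 and 8.

    import numpy as np

    x = np.array([1.0, 2.0, 4.0, 5.0, 7.0, 9.0])
    y = np.array([2.1, 2.9, 5.2, 5.8, 8.1, 9.7])

    # Equation 4.48: slope; equation 4.43: intercept.
    b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    a = y.mean() - b * x.mean()

    y_hat = a + b * x   # equation 4.1: predicted values
    e = y - y_hat       # equation 4.2: regression errors

    assert np.isclose(e.mean(), 0.0)                              # item 3
    assert np.isclose(np.corrcoef(e, x)[0, 1], 0.0, atol=1e-10)   # item 4

    # Equation 4.65: R-squared, and its equivalence to squared correlations (item 8).
    r2 = 1.0 - np.sum(e ** 2) / np.sum((y - y.mean()) ** 2)
    assert np.isclose(r2, np.corrcoef(x, y)[0, 1] ** 2)
    assert np.isclose(r2, np.corrcoef(y, y_hat)[0, 1] ** 2)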
Chapter 5: From Sample to Population
Section 5.0: The Basics
This chapter discusses the reasons why we can generalize from what we observe in one sample to
what we might expect in others. It defines the population, distinguishes between parameters and
estimators, and discusses why we work with samples when populations are what we are interested
in: The sample is an example. It demonstrates that, with the appropriate assumptions about the
structure of the population, a and b, as calculated in chapter 4, are Best Linear Unbiased Estimators
(BLUE) and Best Consistent Estimators (BCE) of the corresponding parameters in the population
relationship. Under these population assumptions, regression is frequently known as Ordinary
Least Squares regression, or OLS. Here are the essentials:
1. Section 5.1: We work with samples, even though populations are what we are interested in,
because samples are available to us and populations are not. Our intent is to generalize from
the sample that we see to the population that we don’t.
2. Equation 5.1, section 5.2: The population relationship between xi and yi is
y_i = \alpha + \beta x_i + \varepsilon_i.
It is sometimes referred to as the structural relationship.
3. Equation 5.5, section 5.3: Our first assumption about the disturbance is that its expected
value is zero:
E(\varepsilon_i) = 0.
4. Equation 5.16, section 5.4: The deterministic part of yi is
E(y_i) = \alpha + \beta x_i.
5. Equations 5.6, section 5.3 and equation 5.20, section 5.4: Our second assumption about
the disturbance is that its variance is fixed. This implies that its variance is equal to that of
the dependent variable:
V(\varepsilon_i) = V(y_i) = \sigma^2.
6. Equation 5.22, section 5.4: Our third assumption about the disturbances is that they are
uncorrelated. This implies that the values of the dependent variable are uncorrelated:
\mathrm{COV}(\varepsilon_i, \varepsilon_j) = \mathrm{COV}(y_i, y_j) = 0.
7. Equation 5.37, section 5.6: Our slope, b, is an unbiased estimator of the population
coefficient, β:
E(b) = \beta.
8. Equation 5.41, section 5.6: Our intercept, a, is an unbiased estimator of the population
constant, α:
E(a) = \alpha.
9. Section 5.7: A simulation is an analysis based on artificial data that we construct from
parameter values that we choose.
10. Equation 5.50, section 5.8: The variance of b is
V(b) = \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}.
11. Equation 5.51, section 5.8: The variance of a is
V(a) = \sigma^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \right).
12. Equation 5.60, section 5.9: According to the Gauss-Markov Theorem, no linear unbiased
estimator of β is more precise than b. In other words, the variance of any other linear
unbiased estimator of β, d, is at least as large as that of b:
V(d) \geq V(b).
13. Section 5.10: Consistent estimators get better as the amount of available information
increases. With enough information, consistent estimators are perfect.
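Item 9's notion of a simulation can be used to illustrate items 7, 8 and 10. The sketch below is an added illustration: the parameter values α = 1, β = 0.5 and σ = 2 are arbitrary choices, and the artificial samples are drawn from the population relationship of item 2.

    import numpy as np

    rng = np.random.default_rng(0)
    alpha, beta, sigma = 1.0, 0.5, 2.0     # chosen parameter values
    x = rng.uniform(0.0, 10.0, size=50)    # explanatory variable, held fixed across replications

    b_draws, a_draws = [], []
    for _ in range(5000):
        eps = rng.normal(0.0, sigma, size=x.size)   # disturbances with E(eps) = 0
        y = alpha + beta * x + eps                  # population relationship (equation 5.1)
        b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
        b_draws.append(b)
        a_draws.append(y.mean() - b * x.mean())

    print(np.mean(b_draws), np.mean(a_draws))   # close to 0.5 and 1.0: E(b) = beta, E(a) = alpha
    # Equation 5.50: the simulated variance of b is close to sigma**2 / sum((x - xbar)**2).
    print(np.var(b_draws), sigma ** 2 / np.sum((x - x.mean()) ** 2))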
Chapter 6: Confidence intervals and hypothesis tests
Section 6.0: The Basics
This chapter reviews the topics of confidence intervals and hypothesis tests. Confidence
intervals give us ranges that contain the parameters with pre-specified degrees of certainty. They are
more useful if they are narrower. Hypothesis tests evaluate whether the data at hand are consistent
or inconsistent with pre-specified beliefs about parameter values. They are more useful if they are
unlikely to contradict these beliefs when the beliefs are really true, and if they are unlikely to be
consistent with these beliefs when the beliefs are really false. Here are the essentials:
1. Equation 6.7, section 6.2: The fundamental equation of this chapter is
1 - \alpha = P\!\left( -t_{\alpha/2}(df) \leq \frac{d - \delta}{SD(d)} \leq t_{\alpha/2}(df) \right).
2. Equation 6.8, section 6.3: Confidence intervals consist of known boundaries with a
fixed probability of containing the unknown value of the parameter of interest. The
general expression is
1 - \alpha = P\!\left( d - t_{\alpha/2}(df)\,SD(d) \leq \delta \leq d + t_{\alpha/2}(df)\,SD(d) \right).
Confidence intervals ask the data for instruction.
3. Section 6.4: Hypothesis tests ask the data for validation. The null hypothesis is the
opposite of what we expect to find. Estimates in the acceptance region validate the null
hypothesis. Estimates in the rejection region contradict it.
4. Equation 6.14, section 6.4.1: The two-sided hypothesis test is
1 - \alpha = P\!\left( \delta_0 - t_{\alpha/2}(df)\,SD(d) < d < \delta_0 + t_{\alpha/2}(df)\,SD(d) \right).
5. Section 6.4.1: Reject the null hypothesis when the estimate falls in the rejection region,
the test statistic is greater than or equal to the critical value or the prob-value is less
than or equal to the significance level. These decision rules are all equivalent.
6. Equation 6.29, section 6.4.2: The one-sided, upper-tailed hypothesis test is
1 - \alpha = P\!\left( d < \delta_0 + t_{\alpha}(df)\,SD(d) \right).
7. Section 6.4.3: The size of the test is its significance level, the probability of a type I
error. A type I error occurs when the null hypothesis is rejected even though it is true. It is
the statistical equivalent of convicting an innocent person.
8. Equation 6.36, section 6.4.3: A type II error occurs when the null hypothesis is accepted
even though it is false. It is the equivalent of acquitting a guilty person. The power of the
test is the difference between one and the probability of a type II error.
9. Section 6.4.3: All else equal, reducing the probability of either a type I or a type II error
increases the probability of the other.
10. Equation 6.40, section 6.4.3: Statistical distance is what matters, not algebraic
distance. The standard deviation of the estimator is the metric for the statistical distance.
11. Section 6.5: Any value within the confidence interval constructed at the (1 − α)%
confidence level would, if chosen as the null hypothesis, not be rejected by a two-sided
hypothesis test at the α% significance level.
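A small added sketch showing how the fundamental equation in item 1 produces both a confidence interval (item 2) and a two-sided test (items 4 and 5). The estimate d, its standard deviation and the degrees of freedom are made-up numbers.

    import scipy.stats as st

    d, sd_d, df = 1.8, 0.7, 25      # hypothetical estimate, standard deviation, degrees of freedom
    alpha = 0.05
    t_crit = st.t.ppf(1 - alpha / 2, df)

    # Item 2: a 95% confidence interval for the unknown parameter delta.
    ci = (d - t_crit * sd_d, d + t_crit * sd_d)

    # Item 4: two-sided test of the null hypothesis delta = 0.
    delta_0 = 0.0
    t_stat = (d - delta_0) / sd_d
    prob_value = 2 * st.t.sf(abs(t_stat), df)
    reject = prob_value <= alpha    # equivalently, abs(t_stat) >= t_crit (item 5)

    # Item 11: delta_0 = 0 lies outside the confidence interval exactly when the test rejects.
    print(ci, t_stat, prob_value, reject)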
Chapter 7: Inference in OLS
Section 7.0: The Basics
This chapter addresses the question of how accurately we can estimate the values of β and α
from b and a. Regression produces an estimate of the standard deviation of εi. This, in turn,
serves as the basis for estimates of the standard deviations of b and a. With these, we can
construct confidence intervals for β and α and test hypotheses about their values. Here are the
essentials:
1. Section 7.1: We assume that the true distributions of b and a are normal. However,
because we have to estimate their variances in order to standardize them, we have to treat
their standardized versions as having t distributions if our samples are small.
2. Section 7.2: Degrees of freedom count the number of independent observations that
remain in the sample after accounting for the sample statistics that we’ve already
calculated.
3. Equation 7.2, section 7.3: The OLS estimator for σ2, the variance of εi, is
s^2 = \frac{\sum_{i=1}^{n} e_i^2}{n - 2}.
4. Equation 7.9, section 7.3: The (1 − α)% confidence interval for β is
1 - \alpha = P\!\left( b - t_{\alpha/2}(n-2)\,\frac{s}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}} \;<\; \beta \;<\; b + t_{\alpha/2}(n-2)\,\frac{s}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}} \right).
5. Section 7.3: Larger samples and greater variation in xi yield narrower confidence
intervals. So does smaller σ2, but we can’t control that.
6. Equation 7.16, section 7.4: The two-tailed hypothesis test for H0: β = β0 is
1 - \alpha = P\!\left( \beta_0 - t_{\alpha/2}(n-2)\,\frac{s}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}} \;<\; b \;<\; \beta_0 + t_{\alpha/2}(n-2)\,\frac{s}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}} \right).
The alternative hypothesis is H1: β ≠ β0.
7. From section 7.4: The test of the null hypothesis H0: β = 0 is always interesting because, if
true, it means that xi doesn’t affect yi.
8. Equation 7.26, section 7.4: The upper-tailed hypothesis test for H0: β = β0 is
1 - \alpha = P\!\left( b \;<\; \beta_0 + t_{\alpha}(n-2)\,\frac{s}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}} \right).
9. From section 7.5: The best linear unbiased estimator of E(y0) is ŷ0 = a + bx0.
10. From section 7.5: Predictions are usually more reliable if they are based on larger
samples, and made for values of the explanatory variable that are similar to those that
appear in the sample.
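The pieces in items 3 through 7 fit together as follows. This is an added sketch with made-up data: it estimates s2, the standard deviation of b, a 95% confidence interval for β, and the t-statistic for H0: β = 0.

    import numpy as np
    import scipy.stats as st

    x = np.array([1.0, 2.0, 4.0, 5.0, 7.0, 9.0, 10.0, 12.0])
    y = np.array([2.3, 2.8, 5.1, 5.6, 8.4, 9.2, 11.0, 12.9])
    n = len(x)

    b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    a = y.mean() - b * x.mean()
    e = y - (a + b * x)

    s2 = np.sum(e ** 2) / (n - 2)                      # equation 7.2
    sd_b = np.sqrt(s2 / np.sum((x - x.mean()) ** 2))   # standard deviation of b

    t_crit = st.t.ppf(0.975, n - 2)
    ci_beta = (b - t_crit * sd_b, b + t_crit * sd_b)   # equation 7.9 with alpha = .05

    t_stat = b / sd_b                                  # test of H0: beta = 0 (item 7)
    prob_value = 2 * st.t.sf(abs(t_stat), n - 2)
    print(b, sd_b, ci_beta, t_stat, prob_value)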
Chapter 8: What if the disturbances have non-zero
expectations?
Section 8.0: The basics
We can’t tell if the εi's have a constant expected value that is different from zero, and it doesn’t
make any substantive difference. However, if we aren’t even willing to assert that the εi's share
the same expected value, then we shouldn’t be running a regression in the first place. Here are the
essentials:
1. Section 8.2: If E(εi) equals some constant other than zero, b is still an unbiased estimator of
β and a is still an unbiased estimator of the fixed component of the deterministic part of yi.
2. Section 8.2: An identification problem arises when we don’t have a suitable estimator for
a parameter whose value we would like to estimate.
3. Section 8.2: A normalization is a value that we assign to a parameter when we don’t have
any way of estimating it and when assigning a value doesn’t have any substantive
implications.
4. Section 8.2: If E(εi) equals some constant other than zero, it wouldn’t really matter and we
couldn’t identify this constant. Therefore, we always normalize it to zero.
5. Section 8.3: If E(εi) is different for each observation, the observations don’t come from the
same population. In this case, b and a don’t estimate anything useful.
Chapter 9: What if the disturbances have different
variances?
Section 9.0: The basics
This chapter addresses the possibility that the disturbances have different variances. In this case,
OLS estimates are still unbiased. However, they’re no longer Best-Linear-Unbiased. In addition, the
true variances of b and a are different from those given by the OLS variance formulas. In order to
conduct inference, we can either estimate their true variances, or we can often get BLU estimators
by transforming the data so that the transformed disturbances share the same variance. Here are
the essentials:
1. Section 9.2: When the disturbances don’t all have the same variance, it’s called
heteroskedasticity.
2. Section 9.2: The OLS estimators b and a remain unbiased for β and α regardless of what
we assume for V(εi).
3. Section 9.3: The OLS estimators b and a are not Best-Linear-Unbiased and their true
variances are probably not estimated accurately by the OLS variance formulas.
4. Section 9.4: An auxiliary regression does not attempt to estimate a population
relationship in an observed sample. It provides supplemental information that helps us
interpret regressions that do.
5. Section 9.4: The White test identifies whether heteroskedasticity is bad enough to distort
OLS variance calculations.
6. Equation 9.6, section 9.5: The White heteroskedasticity-consistent variance estimator for
b is
V_W(b) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2 e_i^2}{\left( \sum_{i=1}^{n} (x_i - \bar{x})^2 \right)^2}.
It and the corresponding variance estimator for a are consistent even if heteroskedasticity is
present.
7. Equation 9.10, section 9.6: Weighted Least Squares (WLS) provides Best-Linear-
Unbiased estimators for β and α if the different disturbance variances are known or can be
estimated:
\frac{y_i}{s_i} = a_{WLS}\,\frac{1}{s_i} + b_{WLS}\,\frac{x_i}{s_i} + e_i.
8. Section 9.8: Maximum Likelihood provides Best-Consistent estimators for β and α if the
different disturbance distributions are known.
9. Section 9.9: Heteroskedasticity can take many forms. Regardless, the White
heteroskedasticity-consistent variance estimator provides trustworthy estimates of the
standard deviations of OLS estimates. In contrast, WLS or ML estimates require procedures
designed specifically for each heteroskedastic form.
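Equation 9.6 is simple to compute once a bivariate regression has been fit. The sketch below is an added illustration: the data are simulated with a disturbance variance that grows with x, so heteroskedasticity is present by construction, and the OLS variance formula for b is compared with the White estimator.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 200
    x = rng.uniform(1.0, 10.0, size=n)
    eps = rng.normal(0.0, 0.5 * x)          # disturbance standard deviation rises with x
    y = 1.0 + 0.5 * x + eps

    b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    a = y.mean() - b * x.mean()
    e = y - (a + b * x)

    # OLS variance formula (chapter 7), which assumes a common disturbance variance.
    s2 = np.sum(e ** 2) / (n - 2)
    v_ols = s2 / np.sum((x - x.mean()) ** 2)

    # Equation 9.6: White heteroskedasticity-consistent variance estimator for b.
    v_white = np.sum((x - x.mean()) ** 2 * e ** 2) / np.sum((x - x.mean()) ** 2) ** 2

    print(v_ols, v_white)    # with this design the two typically differ noticeably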
Chapter 10: What if the disturbances are correlated?
Section 10.0: The basics
This chapter deals with the possibility that the disturbances are correlated with each other. In this
case, OLS estimates are still unbiased. However, they’re no longer Best-Linear-Unbiased. In
addition, the true variances of b and a are probably different from those given by the OLS variance
formulas. In order to conduct inference, we can estimate their true variances. We can also attempt to
get BLU estimators by transforming the data so that the transformed disturbances have the
properties of chapter 5, or to get BC estimators by applying Maximum Likelihood. Here are the
essentials:
1. Section 10.2: When the disturbances are correlated, it’s called autocorrelation.
2. Section 10.2: Spatial autocorrelation describes the situation where a correlation exists
between the disturbances from observations that are near each other in spatial or social
terms.
3. Section 10.2: Serial correlation describes the situation where a correlation exists between
the disturbances from observations that are near each other in time. Shocks are the random
components of activity that are unique to the time unit in which they occur.
4. Section 10.2: Attenuation describes the situation in which the correlations between
observations get smaller as the observations get more distant from each other, either in
geographic, social or temporal terms.
5. Section 10.3: The OLS estimators b and a remain unbiased for β and α regardless of
what we assume for COV(εi, εj).
6. Section 10.3: The OLS estimators b and a are not Best-Linear-Unbiased and their true
variances are probably not estimated accurately by the OLS variance formulas.
7. Section 10.4: The Newey-West variance estimators for V(b) and V(a) are approximately
accurate even if autocorrelation is present.
8. Section 10.5: In time-series data, first-order autocorrelation occurs when the disturbances
of consecutive observations are correlated. More generally, autocorrelation of order k occurs
when the disturbances of observations that are separated by k units of time are correlated.
9. Section 10.5: In time-series data, the variance of the autocorrelated disturbances is larger,
and often a lot larger, than the variance of the underlying, uncorrelated shocks.
10. Section 10.6: Generalized Least Squares (GLS) provides Best-Linear-Unbiased
estimators for β and α by transforming the data so as to remove the autocorrelation. This
requires either knowledge or estimates of the relevant autocorrelation.
11. Section 10.7: Greek letters always represent parameters. Greek letters with carets (^) over
them represent estimators of the parameter represented by the Greek letter, itself.
12. Section 10.7: The Durbin-Watson test is a common test for first-order autocorrelation in
time-series data. It can be approximated as
DW \approx 2(1 - \hat{\rho}),
where ρ̂ is the estimated correlation between ei and ei-1.
13. Section 10.8: The GLS procedure of section 10.6 can usually be implemented in two steps.
The first step estimates the required correlations and the second step applies OLS to the
transformed data.
14. Section 10.10: If we have some form of autocorrelation other than first-order, the specific
procedures of sections 10.6 through 10.9 won’t work. However, procedures that follow the
same principles, tailored to the autocorrelation form that we’re actually dealing with, will.
15. Section 10.11: The trick is not in inventing a new procedure to fit each variation in the
assumptions. It is in reconceiving each population relationship so that it fits the one
procedure that we’ve already established.
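Item 12's approximation is easy to verify. The following is an added sketch: it constructs a residual-like series with first-order autocorrelation ρ = 0.6 (an arbitrary choice), computes the Durbin-Watson statistic, and compares it with 2(1 − ρ̂). It also illustrates item 9.

    import numpy as np

    rng = np.random.default_rng(2)
    n, rho = 500, 0.6
    shocks = rng.normal(0.0, 1.0, size=n)       # uncorrelated underlying shocks

    e = np.zeros(n)
    for t in range(1, n):
        e[t] = rho * e[t - 1] + shocks[t]       # first-order autocorrelated series

    # Durbin-Watson statistic: sum of squared changes over the sum of squares.
    dw = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

    rho_hat = np.corrcoef(e[1:], e[:-1])[0, 1]  # estimated correlation between e_t and e_{t-1}
    print(dw, 2 * (1 - rho_hat))                # the two numbers should be close

    # Item 9: the autocorrelated disturbances have a larger variance than the shocks.
    print(e.var(), shocks.var())                # roughly 1 / (1 - rho**2) times larger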
Chapter 11: What if the disturbances and the
explanatory variables are related?
Section 11.0: The basics
This chapter deals with the possibility that the disturbances and the explanatory variables are
related. This problem is called endogeneity or simultaneity. If we have it, our OLS estimators are
biased and inconsistent. Worse, they are unsalvageable. As in the last two chapters, the solution can
be thought of as a two-step procedure consisting of two applications of OLS. For this reason, it is
often called two-stage-least-squares. It is also called, more generally, instrumental variables.
This technique provides estimators that are not unbiased, only consistent. Therefore, the success of
the solution in this chapter is much more sensitive to the size of the sample than it was in the
previous two chapters. Here are the essentials:
1. Section 11.2: The three most common sources of endogeneity are reverse causality,
measurement error and dynamic choice.
2. Equation 11.12, section 11.2: Endogeneity means that the explanatory variable is a
random variable and it’s correlated with the disturbances,
\mathrm{COV}(x_i, \varepsilon_i) \neq 0.
3. Equation 11.13, section 11.3: The consequence of endogeneity is that b is neither
unbiased nor consistent for β:
E(b) = \beta + E\!\left( \frac{\sum_{i=1}^{n} (x_i - \bar{x})\,\varepsilon_i}{\sum_{i=1}^{n} (x_i - \bar{x})\,x_i} \right) \neq \beta.
4. Equation 11.18, section 11.3: With measurement error, b is approximately equal to
\beta\,\frac{V(x_i^*)}{V(x_i^*) + V(\nu_i)} < \beta.
OLS tends to understate the true magnitude of β.
5. Equations 11.19 and 11.20, section 11.4: An instrument, or an instrumental variable
zi has, roughly speaking, the following properties, at least as the sample size approaches
infinity:
\mathrm{COV}(x_i, z_i) \neq 0
and
\mathrm{COV}(\varepsilon_i, z_i) = 0.
6. Equation 11.21, section 11.4: The first step of our two-stage procedure is the OLS
instrumenting equation
x_i = c + d z_i + f_i.
It provides an estimator of xi, x̂i, that is purged of the correlation between xi and εi.
7. Equation 11.23, section 11.4: We obtain the two-stage-least-squares (2SLS) estimators
of α and β through OLS estimation of our second-step equation,
y_i = a_{2SLS} + b_{2SLS}\,\hat{x}_i + e_i.
8. Equation 11.24, section 11.4 and equations 11.30 and 11.32, section 11.5: The
instrumental variables (IV) estimator of β is the same as the 2SLS estimator,
b_{IV} = b_{2SLS} = \frac{\sum_{i=1}^{n} (\hat{x}_i - \bar{\hat{x}})\,y_i}{\sum_{i=1}^{n} (\hat{x}_i - \bar{\hat{x}})\,\hat{x}_i} = \frac{\sum_{i=1}^{n} (z_i - \bar{z})\,y_i}{\sum_{i=1}^{n} (z_i - \bar{z})\,x_i}.
9. Section 11.6: The IV estimator bIV is a consistent estimator of β.
10. Equation 11.41, section 11.6: The estimated variance of the IV slope estimator is
V(b_{IV}) = \frac{s_{IV}^2 \sum_{i=1}^{n} (z_i - \bar{z})^2}{\left( \sum_{i=1}^{n} (z_i - \bar{z})\,x_i \right)^2}.
11. Section 11.7: It’s often difficult to find an appropriate instrument. The more likely it is that
zi satisfies one of the assumptions in equations 11.19 and 11.20, the less likely it is that it
satisfies the other. The best instruments are moderately correlated with xi, in order to satisfy
equation 11.19 without invalidating equation 11.20.
12. Equation 11.46, section 11.8: The Hausman test for endogeneity consists of the auxiliary
regression
y_i = a + b_1 x_i + b_2 \hat{x}_i + e_i.
Endogeneity is present if the coefficient on x̂i, b2, is statistically significant.
13. Section 11.9: It’s always necessary to have a behavioral argument as to why an instrument
might be appropriate. It’s usually possible to offer a counter-argument as to why it might
not be. Great instruments are hard to come by. Even acceptable instruments often require
considerable ingenuity to construct and justify.
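An added sketch showing that the two-step procedure in items 6 and 7 and the IV formula in item 8 produce the same slope. The data are simulated so that x is correlated with the disturbance (endogeneity by construction) and z roughly satisfies the two instrument conditions in item 5; all numbers are made up.

    import numpy as np

    rng = np.random.default_rng(3)
    n = 5000
    z = rng.normal(size=n)                      # instrument
    common = rng.normal(size=n)                 # source of the x-disturbance correlation
    eps = common + rng.normal(size=n)           # disturbance
    x = 1.0 + 0.8 * z + common + rng.normal(size=n)   # x correlated with eps
    y = 2.0 + 0.5 * x + eps                     # true beta = 0.5

    def slope(u, v):
        # bivariate OLS slope from regressing v on u
        return np.sum((u - u.mean()) * (v - v.mean())) / np.sum((u - u.mean()) ** 2)

    b_ols = slope(x, y)                         # biased upward here because COV(x, eps) > 0

    # Step 1 (equation 11.21): instrumenting equation, x regressed on z.
    d = slope(z, x)
    x_hat = (x.mean() - d * z.mean()) + d * z

    # Step 2 (equation 11.23): y regressed on the purged x_hat.
    b_2sls = slope(x_hat, y)

    # Equation 11.24: the direct IV formula gives the same answer.
    b_iv = np.sum((z - z.mean()) * (y - y.mean())) / np.sum((z - z.mean()) * (x - x.mean()))
    print(b_ols, b_2sls, b_iv)                  # b_2sls and b_iv agree and are near 0.5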
Chapter 12: What if there is more than one x?
Section 12.0: The basics
If the population relationship includes two explanatory variables, but our sample regression
contains only one, our estimate of the effect of the included variable is almost surely biased. The
best remedy is to include the omitted variable in the sample regression. Minimizing the sum of
squared errors from a regression with two explanatory variables yields two slopes, each of which
represents the relationship between the parts of the dependent variable and the associated
explanatory variable that are not related to the other explanatory variable. These slopes are unbiased
estimators of the population coefficients. Here are the essentials:
1. Equation 12.1, Section 12.2: If there are two explanatory variables that affect yi, the
population relationship is
y_i = \alpha + \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i.
2. Equations 12.6, 12.10 and 12.11, Section 12.2: If the population relationship is equation
12.1, but we still run the bivariate regression of equation 4.4, the slope of that regression is a biased
estimator of the true effect of x1i on yi, β1:
E(b) = \beta_1 + \beta_2\,\frac{\sum_{i=1}^{n} (x_{1i} - \bar{x}_1)\,x_{2i}}{\sum_{i=1}^{n} (x_{1i} - \bar{x}_1)\,x_{1i}} = \beta_1 + \beta_2\, b_{x_2 x_1} \neq \beta_1,
where b_{x_2 x_1} is the slope from the auxiliary regression of x2i on x1i.
This bias is usually referred to as specification error, omitted-variable bias or left-out variable error (LOVE).
The regression of equation 4.4 mistakenly attributes some of β2, the effect of the omitted
variable x2i, to the included variable, x1i. The extent of this mistaken attribution is
determined by the extent to which x2i looks like x1i.
3. Equations 12.12, 12.19, 5.24 and 12.27, Section 12.3: If we minimize the sum of squared
errors for the multivariate sample relationship,
y_i = a + b_1 x_{1i} + b_2 x_{2i} + e_i,
we get errors that have an average value of zero and are unrelated, at least linearly, to either
of the two explanatory variables.
4. Equation 12.65, section 12.4: Regression estimates the effect of x1i on yi as equivalent to
b_1 = \frac{\sum_{i=1}^{n} \left( e_{x_1 x_2, i} - \bar{e}_{x_1 x_2} \right)\left( e_{y x_2, i} - \bar{e}_{y x_2} \right)}{\sum_{i=1}^{n} \left( e_{x_1 x_2, i} - \bar{e}_{x_1 x_2} \right)^2},
the effect of the part of x1i that is not related to x2i (the error e_{x_1 x_2, i} from the regression of
x1i on x2i) on the part of yi that is not related to x2i (the error e_{y x_2, i} from the regression of yi
on x2i). This is why we can interpret a regression slope as measuring the effect of an explanatory
variable ceteris paribus, holding constant all other explanatory variables. Analogously,
regression estimates the effect of x2i on yi as equivalent to the effect of the part of x2i that is
not related to x1i on the part of yi that is not related to x1i.
5. Equation 12.85, section 12.5 and exercises 12.16 and 12.17: The slope and intercept
estimators from the regression of equation 12.12 are unbiased estimators of the coefficients
and constant in the population relationship of equation 12.1: E(b1) = β1, E(b2) = β2 and E(a) = α.
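Equation 12.65 can be checked directly: the slope on x1 from the two-variable regression equals the bivariate slope between the parts of y and x1 that are unrelated to x2. This is an added sketch with made-up data; the two-variable regression itself is computed with numpy's least-squares routine.

    import numpy as np

    rng = np.random.default_rng(4)
    n = 300
    x2 = rng.normal(size=n)
    x1 = 0.6 * x2 + rng.normal(size=n)          # x1 and x2 are related
    y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)

    def bivariate(u, v):
        # intercept and slope from regressing v on u
        b = np.sum((u - u.mean()) * (v - v.mean())) / np.sum((u - u.mean()) ** 2)
        return v.mean() - b * u.mean(), b

    # Slopes from the regression with both explanatory variables (equation 12.12).
    X = np.column_stack([np.ones(n), x1, x2])
    a, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]

    # Equation 12.65: regress x1 on x2 and y on x2, keep the errors, then regress error on error.
    a_x, b_x = bivariate(x2, x1)
    e_x1x2 = x1 - (a_x + b_x * x2)              # part of x1 unrelated to x2
    a_y, b_y = bivariate(x2, y)
    e_yx2 = y - (a_y + b_y * x2)                # part of y unrelated to x2
    _, b1_partial = bivariate(e_x1x2, e_yx2)

    print(b1, b1_partial)                       # the two slopes are identical

    # Item 2: omitting x2 biases the estimated effect of x1.
    _, b_short = bivariate(x1, y)
    print(b_short)                              # not close to 2.0 because x2 is omitted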
Chapter 13: Understanding and interpreting regression with
two x’s
Section 13.0: The basics
The slopes which we obtain when we minimize the sum of squared errors for the regression of
equation 12.12 are BLU estimates of the population coefficients. If the population relationship
includes two explanatory variables, the precision of these slopes depends heavily on the extent to
which the two explanatory variables are related. Including an irrelevant variable is inefficient, but
does not create bias. Everything that we have done in chapters 8 through 11 holds with two
explanatory variables, either exactly or with minor, sensible extensions. Here are the essentials:
1. Section 13.3: If x1i and x2i are highly correlated, it’s often called multicollinearity. If we
omit either, we bias the estimated effect of the other. If we include both, their estimated
effects are unbiased but may have large variances, especially if n is small. In general, the
only responsible way to achieve greater precision is to increase n. Multicollinearity cannot
be responsible for slopes with implausible signs or magnitudes, and cannot create spurious
significance.
2. Equation 13.15, section 13.4: The sample estimate of σ2 is
s^2 = \frac{\sum_{i=1}^{n} \left( y_i - (a + b_1 x_{1i} + b_2 x_{2i}) \right)^2}{n - 3} = \frac{\sum_{i=1}^{n} e_i^2}{n - 3}.
3. Equation 13.16, section 13.4: The sample standard deviation of b1 is
SD(b_1) = \sqrt{ \frac{s^2}{\sum_{i=1}^{n} (x_{1i} - \bar{x}_1)^2 - \dfrac{\left( \sum_{i=1}^{n} (x_{1i} - \bar{x}_1)(x_{2i} - \bar{x}_2) \right)^2}{\sum_{i=1}^{n} (x_{2i} - \bar{x}_2)^2} } } = \sqrt{ \frac{s^2}{\sum_{i=1}^{n} \left( e_{x_1 x_2, i} - \bar{e}_{x_1 x_2} \right)^2} }.
4. Section 13.5: Joint hypotheses specify values for two or more parameters simultaneously.
5. Equation 13.23, section 13.5: A restricted regression adopts a null hypothesis regarding
the value or values of one or more parameters. This null hypothesis may also be referred to
as an assumption or, most commonly, a restriction. The sum of squared errors from a
restricted regression is always at least as large as the sum of squared errors from an
unrestricted regression:
\left( \sum_{i=1}^{n} e_i^2 \right)_R \geq \left( \sum_{i=1}^{n} e_i^2 \right)_U.
In particular, the sum of squared errors, and therefore the R2, can never go down when
another explanatory variable is added to the regression. However, the adjusted R2 can go
down.
6. Equation 13.29, section 13.6: The test of restrictions is
\frac{\left[ \left( \sum_{i=1}^{n} e_i^2 \right)_R - \left( \sum_{i=1}^{n} e_i^2 \right)_U \right] \Big/ j}{\left( \sum_{i=1}^{n} e_i^2 \right)_U \Big/ (n - 3)} \;\sim\; F(j,\, n - 3).
7. Section 13.7: If we include an irrelevant variable in our regression, the slopes are still
unbiased estimators of the true coefficient values. In particular, the slope for the irrelevant
variable should be pretty close to zero, at least statistically. However, the inclusion of an
irrelevant variable will usually reduce the precision of the estimated effects of relevant
variables.
8. Section 13.8: With two explanatory variables, OLS slopes are still unbiased estimators if
the disturbances are heteroskedastic or autocorrelated. The White test, the White
heteroskedasticity-consistent variance estimator and the Newey-West autocorrelation-
consistent variance estimator are all still valid, but need to be reformulated to incorporate
the second explanatory variable. WLS or GLS are still required to obtain BLU estimators.
9. Equation 13.47, section 13.8: When one explanatory variable is endogenous, the other
explanatory variable must be included in the instrumenting equation, along with the
instrumental variable, itself.
10. Equations 13.50 and 13.51, section 13.8: If both explanatory variables are endogenous,
we need at least two instrumental variables.
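Equation 13.29 can be computed by running the restricted and unrestricted regressions and comparing their sums of squared errors. The sketch below is an added illustration with made-up data; it tests the joint null hypothesis that both slopes are zero.

    import numpy as np
    import scipy.stats as st

    rng = np.random.default_rng(5)
    n = 120
    x1 = rng.normal(size=n)
    x2 = 0.5 * x1 + rng.normal(size=n)
    y = 1.0 + 0.8 * x1 + 0.4 * x2 + rng.normal(size=n)

    def sse(X, y):
        # sum of squared errors from a least-squares fit of y on the columns of X
        coef = np.linalg.lstsq(X, y, rcond=None)[0]
        e = y - X @ coef
        return np.sum(e ** 2)

    sse_u = sse(np.column_stack([np.ones(n), x1, x2]), y)   # unrestricted regression
    sse_r = sse(np.ones((n, 1)), y)                         # restricted: both slopes set to zero

    j = 2                                                   # number of restrictions
    f_stat = ((sse_r - sse_u) / j) / (sse_u / (n - 3))      # equation 13.29
    prob_value = st.f.sf(f_stat, j, n - 3)
    print(f_stat, prob_value)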
Chapter 14: Express yourself
Section 14.0: The Basics
For our purposes, regression has to be a linear function of the constant and coefficients so that their
estimators can be linear functions of the dependent variable. However, explanatory variables can
appear in discrete and non-linear form. These forms give us the opportunity to represent a wide and
varied range of possible relationships between the explanatory and dependent variables.
1. Section 14.2: Dummy variables identify the absence or presence of an indivisible
characteristic. The intercept for observations that don’t have this characteristic is a. The
effective intercept for observations that do have this characteristic is a+b2, where b2 is the
slope associated with the dummy variable. The slope b2 estimates the fixed difference in yi
between those that do and do not have the characteristic at issue.
2. Section 14.2: We fall into the dummy variable trap when we enter one dummy variable for
a particular characteristic, and another dummy variable for the opposite or absence of that
characteristic. These dummy variables are perfectly correlated, so slopes and their variances
are undefined. We only need one dummy variable, because its slope measures the difference
between the values of yi for observations that have the characteristic and those that don’t or
have its opposite.
3. Equations 14.13 and 14.18, Section 14.3: The quadratic specification is
y_i = \alpha + \beta_1 x_i + \beta_2 x_i^2 + \varepsilon_i.
β1 xi is the linear term. β2 xi2 is the quadratic term. If β1 > 0 and β2 < 0, small changes in xi
increase E(yi) when x_i < -\beta_1 / (2\beta_2) and reduce it when x_i > -\beta_1 / (2\beta_2).
4. Equations 14.35 and 14.36, Section 14.4: The semi-log specification is
\ln y_i = \alpha + \beta x_i + \varepsilon_i.
The coefficient is the expected relative change in the dependent variable in response to an
absolute change in the explanatory variable:
\beta = E\!\left[ \frac{\Delta y_i / y_i}{\Delta x_i} \right].
5. Equations 14.38 and 14.39, Section 14.4: The log-log specification is
\ln y_i = \alpha + \beta \ln x_i + \varepsilon_i.
The coefficient is the elasticity of the expected change in the dependent variable with
respect to the change in the explanatory variable:
\beta = E\!\left[ \frac{\Delta y_i / y_i}{\Delta x_i / x_i} \right].
6. Equations 14.48 and 14.52, Section 14.5: Interactions allow the effect of one variable to
depend on the value of another. The population relationship with an interaction is
y_i = \alpha + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{1i} x_{2i} + \varepsilon_i.
The change in the expected value of yi with a change in x1i is
\frac{\Delta y_i}{\Delta x_{1i}} = \beta_1 + \beta_3 x_{2i}.
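The slope interpretations in items 3 and 6 are just arithmetic on the estimated coefficients. A small added sketch, with made-up coefficient values:

    # Item 3: quadratic specification with beta1 > 0 and beta2 < 0 (made-up values).
    beta1, beta2 = 4.0, -0.5
    turning_point = -beta1 / (2 * beta2)   # x value where the effect switches sign
    print(turning_point)                   # 4.0: E(y) rises for x < 4 and falls for x > 4

    # Item 6: interaction specification; the effect of x1 depends on x2 (made-up values).
    b1, b3 = 2.0, -0.3
    for x2 in (0.0, 5.0, 10.0):
        print(x2, b1 + b3 * x2)            # marginal effect of x1 at this value of x2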
Chapter 15: More than two explanatory variables
Section 15.0: The Basics
The addition of a second explanatory variable in Chapter 12 adds only four new things to what there
is to know about regression. First, regression uses only the parts of each variable that are unrelated
to all of the other variables. Second, omitting a variable from the sample relationship that appears in
the population relationship almost surely biases our estimates. Third, including an irrelevant
variable does not bias estimates but reduces their precision. Fourth, the number of interesting joint
tests increases with the number of slopes. All four remain valid when we add additional explanatory
variables.
1. Equations 15.11, 15.1 and 15.2, Section 15.2: The general form of the multivariate
population relationship is
y_i = \alpha + \sum_{l=1}^{k} \beta_l x_{li} + \varepsilon_i.
The corresponding sample relationship is
y_i = a + \sum_{l=1}^{k} b_l x_{li} + e_i.
The predicted value of yi is
\hat{y}_i = a + \sum_{l=1}^{k} b_l x_{li}.
2. Equations 15.3 and 15.4, Section 15.2: When we minimize the sum of squared errors in
the multivariate regression, the errors sum to zero,
\sum_{i=1}^{n} e_i = 0,
and are uncorrelated in the sample with all explanatory variables,
0 = \sum_{i=1}^{n} e_i x_{pi} = \mathrm{COV}(e_i, x_{pi}) \quad \text{for all } p = 1, \ldots, k.
3. Equations 15.5 and 15.8, Section 15.2: The intercept in the multivariate regression is
a = \bar{y} - \sum_{l=1}^{k} b_l \bar{x}_l.
The slopes are
b_p = \frac{\sum_{i=1}^{n} \left( e_{x_p,i} - \bar{e}_{x_p} \right)\left( e_{y,i} - \bar{e}_{y} \right)}{\sum_{i=1}^{n} \left( e_{x_p,i} - \bar{e}_{x_p} \right)^2},
where e_{x_p,i} is the error from the regression of xpi on all of the other explanatory variables
and e_{y,i} is the error from the regression of yi on all of the other explanatory variables.
4. Equations 15.9 and 15.17, Section 15.2: R2 is
R^2 = 1 - \frac{\sum_{i=1}^{n} e_i^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} = \frac{\mathrm{COV}(y_i, \hat{y}_i)}{V(y_i)}.
Adjusted R2 is
\text{adjusted } R^2 = 1 - \frac{s^2}{V(y_i)}.
5. Section 15.2: Regression is not limited to two explanatory variables. However, the number
of observations must exceed the number of estimators and each explanatory variable must
have some part that is not related to all of the other explanatory variables in order to
calculate meaningful regression estimators.
6. Equations 15.12 and 15.14, Section 15.2: If the regression is specified correctly,
estimators are unbiased:
E(a) = \alpha \quad \text{and} \quad E(b_p) = \beta_p \quad \text{for all } p = 1, \ldots, k.
If the regression omits explanatory variables (say the last q of them), the estimators for the
included variables are biased:
E(b_p) = \beta_p + \sum_{m=k-q+1}^{k} \beta_m \left[ \frac{\sum_{i=1}^{n} e_{x_p,i}\, e_{x_m,i}}{\sum_{i=1}^{n} e_{x_p,i}^2} \right],
where e_{x_p,i} and e_{x_m,i} are now the errors from regressions of xpi and xmi, respectively, on
the included explanatory variables other than xpi.
7. Equations 15.16, 15.19 and 15.20, Section 15.3: The estimator of σ2 is
s^2 = \frac{\sum_{i=1}^{n} e_i^2}{n - k - 1}.
With this estimator, the standardized value of bp is a t random variable with n − k − 1
degrees of freedom:
\frac{b_p - \beta_p}{\sqrt{s^2 \Big/ \sum_{i=1}^{n} \left( e_{x_p,i} - \bar{e}_{x_p} \right)^2}} \;\sim\; t_{n-k-1}.
If n is sufficiently large, this can be approximated as a standard normal random variable.
If the sample regression is correctly specified, bp is the BLUE. If the disturbances are
normally distributed, it is also the BCE.
8. Equations 15.21 and 15.22, Section 15.3: The general form of the test between an
unrestricted alternative hypothesis and null hypothesis subject to j restrictions is
\frac{\left[ \left( \sum_{i=1}^{n} e_i^2 \right)_R - \left( \sum_{i=1}^{n} e_i^2 \right)_U \right] \Big/ j}{\left( \sum_{i=1}^{n} e_i^2 \right)_U \Big/ (n - k - 1)} \;\sim\; F(j,\, n - k - 1).
For the null hypothesis that all coefficients are equal to zero, this reduces to
\frac{R_U^2 / k}{\left( 1 - R_U^2 \right) / (n - k - 1)} \;\sim\; F(k,\, n - k - 1).
9. Equation 15.26, Section 15.3: The Chow test for differences in regimes is
\frac{\left[ \left( \sum_{i=1}^{n} e_i^2 \right)_R - \left( \left( \sum_{i=1}^{n_1} e_i^2 \right)_1 + \left( \sum_{i=1}^{n_2} e_i^2 \right)_2 \right) \right] \Big/ (k + 1)}{\left( \left( \sum_{i=1}^{n_1} e_i^2 \right)_1 + \left( \sum_{i=1}^{n_2} e_i^2 \right)_2 \right) \Big/ \left( n - 2(k + 1) \right)} \;\sim\; F\!\left( k + 1,\; n - 2(k + 1) \right),
where the subscripts 1 and 2 denote unrestricted regressions fit separately to the n1 and n2
observations in the two regimes, and the restricted regression pools all n = n1 + n2 observations.
10. Equations 15.54,15.57,15.68 and 15.76, Section 15.6: In matrix notation, the vector
consisting of the intercept and slopes for a regression with k explanatory variables is
\mathbf{b} = (X'X)^{-1} X'y.
It is an unbiased estimator of the vector of parameters,
E(\mathbf{b}) = \boldsymbol{\beta}.
Its variance is
V(\mathbf{b}) = \sigma^2 (X'X)^{-1}.
This is the smallest possible variance for an unbiased linear estimator vector: b is BLUE.
If the disturbances are normally distributed, it is also the BCE.
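Item 10's matrix expressions translate line for line into numpy. This closing sketch is an added illustration with made-up data and k = 3 explanatory variables; it computes b = (X'X)⁻¹X'y, the estimate of σ2 from item 7, and the estimated variance matrix with σ2 replaced by s2, whose diagonal gives the squared standard deviations used for the t-statistics in item 7.

    import numpy as np

    rng = np.random.default_rng(6)
    n, k = 200, 3
    X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])   # intercept column plus k regressors
    beta = np.array([1.0, 0.5, -0.25, 2.0])
    y = X @ beta + rng.normal(size=n)

    # Item 10: b = (X'X)^{-1} X'y.
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y

    # Item 7: s^2 = sum of squared errors / (n - k - 1).
    e = y - X @ b
    s2 = np.sum(e ** 2) / (n - k - 1)

    # Item 10's V(b) = sigma^2 (X'X)^{-1}, with sigma^2 replaced by its estimate s^2.
    v_b = s2 * XtX_inv
    print(b)                        # close to the chosen beta vector
    print(np.sqrt(np.diag(v_b)))    # estimated standard deviations of the elements of b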