Introductory Econometrics:
Intuition, Proof and Practice
The Basics
Jeffrey S. Zax
8/24/2008
Chapter 1: What is a Regression?
Section 1.0: The Basics
This chapter explains what a regression is and how to interpret it. Here are the essentials:
1. Section 1.4: The dependent or endogenous variable measures the behavior that we want
to explain with regression analysis.
2. Section 1.5: The explanatory, independent or exogenous variables measure things that
we think might determine the behavior that we want to explain. We usually think of them as
pre-determined.
3. Section 1.5: The slope estimates the effect of a change in the explanatory variable on the
value of the dependent variable.
4. Section 1.5: The t-statistic indicates whether the associated slope is reliable. The slope is
reliable if the prob-value associated with the t-statistic is .05 or less. In this case, we say
that the associated slope is statistically significant. This generally corresponds to an
absolute value of approximately two or greater for the t-statistic, itself. If the t-statistic has a
prob-value that is greater than .05, the associated slope coefficient is insignificant. This
means that it isn’t reliable.
5. Section 1.6: The intercept is usually uninteresting. It represents what everyone has in
common, rather than characteristics that might cause individuals to be different.
6. Section 1.6: We usually interpret only the slopes that are statistically significant. They
indicate the effect of their associated explanatory variables on the dependent variable
ceteris paribus, or holding constant all of the other characteristics that are included in the
regression.
7. Section 1.6: Continuous variables take on a wide range of values. Their slopes indicate
the change that would occur in the dependent variable if the value of the associated
explanatory variable increased by one unit.
8. Section 1.6: Discrete variables, sometimes called categorical variables, indicate the
presence or absence of a particular characteristic. Their slopes indicate the change that
would occur in the dependent variable if an individual who did not have that characteristic
were given it.
9. Section 1.7: Regression interpretation requires three steps. The first is to identify the
reliable slopes. The second is to understand their magnitudes. The third is to use this
understanding to verify or modify the behavioral intuition that motivated the regression in
the first place.
10. Section 1.7: Statistical significance is necessary in order to have interesting results, but not
sufficient. Important effects are those that are both statistically significant and substantively
large. Slopes that are statistically significant but substantively small indicate that the effects
of the associated explanatory variable can be reliably interpreted as unimportant.
11. Section 1.7: A proxy is a variable that is related to, but not exactly the variable we really
want. We use proxies when the variables we really want aren’t available. Sometimes this
makes interpretation difficult.
12. Section 1.8: If the prob-value associated with the F-statistic is .05 or less, the collective
effect of the ensemble of explanatory variables on the dependent variable is statistically
significant.
13. Section 1.8: Observations are the individual examples of the behavior under examination,
upon which the regression is based. All of the observations together constitute the sample.
14. Section 1.8: The R2, or coefficient of determination, represents the proportion of the
variation in the dependent variable that is explained by the explanatory variables. The
adjusted R2 modifies the R2 in order to take account of the numbers of explanatory variables
and observations. However, neither measures directly the reliability of the regression
results.
15. Section 1.9: F-statistics can be used to evaluate the contribution of a subset of explanatory
variables, as well as the collective statistical significance of all explanatory variables. In
both cases, the F-statistic is a transformation of R2 values.
16. Section 1.10: Regression results are useful only to the extent that the choice of variables in
the regression, variable construction and sample design are appropriate.
17. Section 1.11: Regression results may be presented in one of several different formats.
However, they all have to contain the same substantive information.
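As an added illustration (not part of the original chapter), here is a minimal Python sketch of the prob-value rule of thumb in item 4. The t-statistic and degrees of freedom are made-up numbers; scipy's t distribution supplies the two-sided prob-value.

    import scipy.stats as st

    t_stat = 2.31    # hypothetical t-statistic reported for a slope
    df = 120         # hypothetical degrees of freedom

    # Two-sided prob-value: probability of a t-statistic at least this large in absolute value.
    prob_value = 2 * st.t.sf(abs(t_stat), df)

    print(prob_value)              # roughly .02 here
    print(prob_value <= 0.05)      # True: the slope would be called statistically significant
    # The |t| >= 2 rule of thumb reflects the fact that the .05 critical value is near two:
    print(st.t.ppf(1 - 0.025, df)) # about 1.98 for df = 120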
Chapter 2: The essential tool
Section 2.0: The Basics
This chapter reassures us that we can handle the material in this course. It reviews the
essential results regarding the summation, the principal algebraic tool. Here are the
essentials:
1. From section 2.1: This is not a math course. This is almost all just addition, subtraction,
multiplication and division. We can do this.
2. Equation 2.5, section 2.3: The summation of a constant is n times that constant:
\sum_{i=1}^{n} a = na.
3. Equation 2.7, section 2.3: Constants factor out of summations:
\sum_{i=1}^{n} a x_i = a \sum_{i=1}^{n} x_i.
Variables do not:
\sum_{i=1}^{n} a x_i \neq x_i \sum_{i=1}^{n} a.
4. Equation 2.8, section 2.4: The average of the xi's is
\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}.
5. Equations 2.9 and 2.10, section 2.4: Weighted averages are
\bar{x}_w = \frac{\sum_{i=1}^{n} a_i x_i}{n}, \quad \text{where} \quad \sum_{i=1}^{n} a_i = n.
6. Equation 2.14, section 2.5: The summation of a sum is the sum of the individual
summations:
\sum_{i=1}^{n} (x_i + y_i) = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i.
7. Equation 2.19, section 2.5: The sum of the deviations from the average is zero:
\sum_{i=1}^{n} (x_i - \bar{x}) = 0.
8. Equation 2.21, section 2.6:
\sum_{i=1}^{n} (x_i - \bar{x})\,\bar{x} = 0.
9. Equation 2.28, section 2.6:
\sum_{i=1}^{n} (x_i - \bar{x})\,x_i = \sum_{i=1}^{n} (x_i - \bar{x})^2.
10. Equation 2.37, section 2.6:
\sum_{i=1}^{n} (x_i - \bar{x})\,y_i = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} (y_i - \bar{y})\,x_i.
11. Equation 2.40, section 2.7: Products within summations can be distributed and summed
individually:
\sum_{i=1}^{n} (x_i + y_i)\,z_i = \sum_{i=1}^{n} x_i z_i + \sum_{i=1}^{n} y_i z_i.
That's it! There are lots of other equations in this chapter, but they're all there either to help derive or
to help understand those listed here.
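The identities above are easy to check numerically. The following sketch is an added illustration (not from the chapter); the data are made up, and numpy does the arithmetic for items 2, 7, 9 and 10.

    import numpy as np

    x = np.array([2.0, 5.0, 7.0, 10.0])
    y = np.array([1.0, 4.0, 6.0, 9.0])
    a, n = 3.0, len(x)

    # Item 2: the summation of a constant is n times that constant.
    assert np.isclose(np.sum(np.full(n, a)), n * a)

    # Item 7: the sum of deviations from the average is zero.
    assert np.isclose(np.sum(x - x.mean()), 0.0)

    # Item 9: sum of (x_i - xbar) * x_i equals sum of (x_i - xbar)**2.
    assert np.isclose(np.sum((x - x.mean()) * x), np.sum((x - x.mean()) ** 2))

    # Item 10: sum of (x_i - xbar) * y_i equals sum of (x_i - xbar) * (y_i - ybar).
    assert np.isclose(np.sum((x - x.mean()) * y),
                      np.sum((x - x.mean()) * (y - y.mean())))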
Chapter 3: Covariance and Correlation
Section 3.0: The Basics
This chapter develops simple ways to measure the direction of the association and the reliability
of the association between two variables in a sample. Here are the essentials:
1. Equation 3.7, section 3.2: The sample covariance is
\mathrm{COV}(X, Y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}.
It is symmetric with regard to X and Y.
2. Exercise 3.2: The sample covariance is not invariant to scale:
\mathrm{COV}(aX, bY) = ab\,\mathrm{COV}(X, Y).
3. Section 3.3: A derivation begins with an accepted definition and concludes with an
implication that is usually not obvious and often very useful.
4. Equation 3.13, section 3.4: The sample correlation coefficient is
\mathrm{CORR}(X, Y) = \frac{\mathrm{COV}(X, Y)}{\mathrm{SD}(X)\,\mathrm{SD}(Y)}.
It is symmetric with regard to X and Y.
5. Exercise 3.4: The sample correlation is invariant to scale (for positive scaling factors a and b):
\mathrm{CORR}(aX, bY) = \mathrm{CORR}(X, Y).
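A short numerical sketch of items 1, 2, 4 and 5, added here for illustration; the data are made up. Note that numpy's cov and corrcoef use the same n − 1 denominator as equation 3.7.

    import numpy as np

    x = np.array([2.0, 5.0, 7.0, 10.0, 11.0])
    y = np.array([1.0, 4.0, 6.0, 9.0, 8.0])
    n = len(x)

    # Equation 3.7: sample covariance with an n - 1 denominator.
    cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)
    assert np.isclose(cov_xy, np.cov(x, y)[0, 1])

    # Equation 3.13: correlation is covariance divided by the two standard deviations.
    corr_xy = cov_xy / (x.std(ddof=1) * y.std(ddof=1))
    assert np.isclose(corr_xy, np.corrcoef(x, y)[0, 1])

    # Scale: covariance picks up the factor a*b; correlation does not (for positive a and b).
    a, b = 3.0, 10.0
    assert np.isclose(np.cov(a * x, b * y)[0, 1], a * b * cov_xy)
    assert np.isclose(np.corrcoef(a * x, b * y)[0, 1], corr_xy)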
Chapter 4: Fitting a Line
Section 4.0: The Basics
This chapter develops a simple method to measure the magnitude of the association between
two variables in a sample. The generic name for this method is regression analysis. The precise
name, in the case of only two variables, is bivariate regression. It assumes that the variable X
causes the variable Y. It identifies the best-fitting line as that which minimizes the sum of
squared errors in the Y dimension. The quality of this fit is measured informally by the
proportion of the variance in Y that is explained by the variance in X. Here are the
essentials:
1. Equation 4.1, section 4.3: The regression line predicts yi as a linear function of xi:
\hat{y}_i = a + b x_i.
2. Equation 4.2, section 4.3: The regression error is the difference between the actual value
of yi and the value predicted by the regression line:
e_i = y_i - \hat{y}_i.
3. Equation 4.28, section 4.3: The average error for the regression line is equal to zero:
\bar{e} = 0.
4. Equation 4.36, section 4.3: The errors are uncorrelated with the explanatory variable:
\mathrm{CORR}(e, X) = 0.
5. Equation 4.43, section 4.4: The regression intercept is the difference between the average
value of Y and the slope times the average value of X:
a = \bar{y} - b\bar{x}.
6. Equation 4.48, section 4.5: The slope is a function of only the observed values of xi and yi
in the sample:
b = \frac{\sum_{i=1}^{n} (y_i - \bar{y})\,x_i}{\sum_{i=1}^{n} (x_i - \bar{x})\,x_i} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}.
7. Equation 4.65, section 4.6: The R2 measures the strength of the association represented by
the regression line:
R^2 = 1 - \frac{\sum_{i=1}^{n} e_i^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} = \frac{b^2 \sum_{i=1}^{n} (x_i - \bar{x})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}.
8. Equations 4.66 and 4.67, section 4.6: The R2 in the bivariate regression is equal to the
squared correlation between X and Y, and to the squared correlation between Y and its
predicted values:
R^2 = \left(\mathrm{CORR}(X, Y)\right)^2 = \left(\mathrm{CORR}(Y, \hat{Y})\right)^2.
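The chapter's formulas can be reproduced directly. This is an added sketch with made-up data: it computes the slope and intercept from equations 4.48 and 4.43 and confirms the properties in items 3, 4, 7 and 8.

    import numpy as np

    x = np.array([1.0, 2.0, 4.0, 5.0, 7.0, 9.0])
    y = np.array([2.1, 2.9, 5.2, 5.8, 8.1, 9.7])

    # Equation 4.48: slope; equation 4.43: intercept.
    b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    a = y.mean() - b * x.mean()

    y_hat = a + b * x   # equation 4.1: predicted values
    e = y - y_hat       # equation 4.2: regression errors

    assert np.isclose(e.mean(), 0.0)                              # item 3
    assert np.isclose(np.corrcoef(e, x)[0, 1], 0.0, atol=1e-10)   # item 4

    # Equation 4.65: R-squared, and its equivalence to squared correlations (item 8).
    r2 = 1.0 - np.sum(e ** 2) / np.sum((y - y.mean()) ** 2)
    assert np.isclose(r2, np.corrcoef(x, y)[0, 1] ** 2)
    assert np.isclose(r2, np.corrcoef(y, y_hat)[0, 1] ** 2)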
Chapter 5: From Sample to Population
Section 5.0: The Basics
This chapter discusses the reasons why we can generalize from what we observe in one sample to
what we might expect in others. It defines the population, distinguishes between parameters and
estimators, and discusses why we work with samples when populations are what we are interested
in: The sample is an example. It demonstrates that, with the appropriate assumptions about the
structure of the population, a and b, as calculated in chapter 4, are Best Linear Unbiased Estimators
(BLUE) and Best Consistent Estimators (BCE) of the corresponding parameters in the population
relationship. Under these population assumptions, regression is frequently known as Ordinary
Least Squares regression, or OLS. Here are the essentials:
1. Section 5.1: We work with samples, even though populations are what we are interested in,
because samples are available to us and populations are not. Our intent is to generalize from
the sample that we see to the population that we don’t.
2. Equation 5.1, section 5.2: The population relationship between xi and yi is
y_i = \alpha + \beta x_i + \varepsilon_i.
It is sometimes referred to as the structural relationship.
3. Equation 5.5, section 5.3: Our first assumption about the disturbance is that its expected
value is zero:
E(\varepsilon_i) = 0.
4. Equation 5.16, section 5.4: The deterministic part of yi is
E(y_i) = \alpha + \beta x_i.
5. Equations 5.6, section 5.3 and equation 5.20, section 5.4: Our second assumption about
the disturbance is that its variance is fixed. This implies that its variance is equal to that of
the dependent variable:
V(\varepsilon_i) = V(y_i) = \sigma^2.
6. Equation 5.22, section 5.4: Our third assumption about the disturbances is that they are
uncorrelated. This implies that the values of the dependent variable are uncorrelated:
\mathrm{COV}(\varepsilon_i, \varepsilon_j) = \mathrm{COV}(y_i, y_j) = 0.
7. Equation 5.37, section 5.6: Our slope, b, is an unbiased estimator of the population
coefficient, β:
E(b) = \beta.
8. Equation 5.41, section 5.6: Our intercept, a, is an unbiased estimator of the population
constant, α:
E(a) = \alpha.
9. Section 5.7: A simulation is an analysis based on artificial data that we construct from
parameter values that we choose.
10. Equation 5.50, section 5.8: The variance of b is
V(b) = \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}.
11. Equation 5.51, section 5.8: The variance of a is
V(a) = \sigma^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \right).
12. Equation 5.60, section 5.9: According to the Gauss-Markov Theorem, no linear unbiased
estimator of β is more precise than b. In other words, the variance of any other linear
unbiased estimator of β, d, is at least as large as that of b:
V(d) \geq V(b).
13. Section 5.10: Consistent estimators get better as the amount of available information
increases. With enough information, consistent estimators are perfect.
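Item 9's notion of a simulation can be used to illustrate items 7, 8 and 10. The sketch below is an added illustration: the parameter values α = 1, β = 0.5 and σ = 2 are arbitrary choices, and the artificial samples are drawn from the population relationship of item 2.

    import numpy as np

    rng = np.random.default_rng(0)
    alpha, beta, sigma = 1.0, 0.5, 2.0     # chosen parameter values
    x = rng.uniform(0.0, 10.0, size=50)    # explanatory variable, held fixed across replications

    b_draws, a_draws = [], []
    for _ in range(5000):
        eps = rng.normal(0.0, sigma, size=x.size)   # disturbances with E(eps) = 0
        y = alpha + beta * x + eps                  # population relationship (equation 5.1)
        b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
        b_draws.append(b)
        a_draws.append(y.mean() - b * x.mean())

    print(np.mean(b_draws), np.mean(a_draws))   # close to 0.5 and 1.0: E(b) = beta, E(a) = alpha
    # Equation 5.50: the simulated variance of b is close to sigma**2 / sum((x - xbar)**2).
    print(np.var(b_draws), sigma ** 2 / np.sum((x - x.mean()) ** 2))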
Chapter 6: Confidence intervals and hypothesis tests
Section 6.0: The Basics
This chapter reviews the topics of confidence intervals and hypothesis tests. Confidence
intervals give us ranges that contain the parameters with pre-specified degrees of certainty. They are
more useful if they are narrower. Hypothesis tests evaluate whether the data at hand are consistent
or inconsistent with pre-specified beliefs about parameter values. They are more useful if they are
unlikely to contradict these beliefs when the beliefs are really true, and if they are unlikely to be
consistent with these beliefs when the beliefs are really false. Here are the essentials:
1. Equation 6.7, section 6.2: The fundamental equation of this chapter is
1 - \alpha = P\!\left( -t_{\alpha/2}(df) \leq \frac{d - \delta}{SD(d)} \leq t_{\alpha/2}(df) \right).
2. Equation 6.8, section 6.3: Confidence intervals consist of known boundaries with a
fixed probability of containing the unknown value of the parameter of interest. The
general expression is
1 - \alpha = P\!\left( d - t_{\alpha/2}(df)\,SD(d) \leq \delta \leq d + t_{\alpha/2}(df)\,SD(d) \right).
Confidence intervals ask the data for instruction.
3. Section 6.4: Hypothesis tests ask the data for validation. The null hypothesis is the
opposite of what we expect to find. Estimates in the acceptance region validate the null
hypothesis. Estimates in the rejection region contradict it.
4. Equation 6.14, section 6.4.1: The two-sided hypothesis test is
1 - \alpha = P\!\left( \delta_0 - t_{\alpha/2}(df)\,SD(d) < d < \delta_0 + t_{\alpha/2}(df)\,SD(d) \right).
5. Section 6.4.1: Reject the null hypothesis when the estimate falls in the rejection region,
the test statistic is greater than or equal to the critical value or the prob-value is less
than or equal to the significance level. These decision rules are all equivalent.
6. Equation 6.29, section 6.4.2: The one-sided, upper-tailed hypothesis test is
1 - \alpha = P\!\left( d < \delta_0 + t_{\alpha}(df)\,SD(d) \right).
7. Section 6.4.3: The size of the test is its significance level, the probability of a type I
error. A type I error occurs when the null hypothesis is rejected even though it is true. It is
the statistical equivalent of convicting an innocent person.
8. Equation 6.36, section 6.4.3: A type II error occurs when the null hypothesis is accepted
even though it is false. It is the equivalent of acquitting a guilty person. The power of the
test is the difference between one and the probability of a type II error.
9. Section 6.4.3: All else equal, reducing the probability of either a type I or a type II error
increases the probability of the other.
10. Equation 6.40, section 6.4.3: Statistical distance is what matters, not algebraic
distance. The standard deviation of the estimator is the metric for the statistical distance.
11. Section 6.5: Any value within the confidence interval constructed at the (1 − α)%
confidence level would, if chosen as the null hypothesis, not be rejected by a two-sided
hypothesis test at the α% significance level.
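A small added sketch showing how the fundamental equation in item 1 produces both a confidence interval (item 2) and a two-sided test (items 4 and 5). The estimate d, its standard deviation and the degrees of freedom are made-up numbers.

    import scipy.stats as st

    d, sd_d, df = 1.8, 0.7, 25      # hypothetical estimate, standard deviation, degrees of freedom
    alpha = 0.05
    t_crit = st.t.ppf(1 - alpha / 2, df)

    # Item 2: a 95% confidence interval for the unknown parameter delta.
    ci = (d - t_crit * sd_d, d + t_crit * sd_d)

    # Item 4: two-sided test of the null hypothesis delta = 0.
    delta_0 = 0.0
    t_stat = (d - delta_0) / sd_d
    prob_value = 2 * st.t.sf(abs(t_stat), df)
    reject = prob_value <= alpha    # equivalently, abs(t_stat) >= t_crit (item 5)

    # Item 11: delta_0 = 0 lies outside the confidence interval exactly when the test rejects.
    print(ci, t_stat, prob_value, reject)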
Chapter 7: Inference in OLS
Section 7.0: The Basics
This chapter addresses the question of how accurately we can estimate the values of β and α
from b and a. Regression produces an estimate of the standard deviation of εi. This, in turn,
serves as the basis for estimates of the standard deviations of b and a. With these, we can
construct confidence intervals for β and α and test hypotheses about their values. Here are the
essentials:
1. Section 7.1: We assume that the true distributions of b and a are normal. However,
because we have to estimate their variances in order to standardize them, we have to treat
their standardized versions as having t distributions if our samples are small.
2. Section 7.2: Degrees of freedom count the number of independent observations that
remain in the sample after accounting for the sample statistics that we’ve already
calculated.
3. Equation 7.2, section 7.3: The OLS estimator for σ2, the variance of εi, is
s^2 = \frac{\sum_{i=1}^{n} e_i^2}{n - 2}.
4. Equation 7.9, section 7.3: The (1 − α)% confidence interval for β is
1 - \alpha = P\!\left( b - t_{\alpha/2}(n-2)\,\frac{s}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}} \;<\; \beta \;<\; b + t_{\alpha/2}(n-2)\,\frac{s}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}} \right).
5. Section 7.3: Larger samples and greater variation in xi yield narrower confidence
intervals. So does smaller σ2, but we can’t control that.
6. Equation 7.16, section 7.4: The two-tailed hypothesis test for H0: β = β0 is
1 - \alpha = P\!\left( \beta_0 - t_{\alpha/2}(n-2)\,\frac{s}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}} \;<\; b \;<\; \beta_0 + t_{\alpha/2}(n-2)\,\frac{s}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}} \right).
The alternative hypothesis is H1: β ≠ β0.
7. From section 7.4: The test of the null hypothesis H0: β = 0 is always interesting because, if
true, it means that xi doesn’t affect yi.
8. Equation 7.26, section 7.4: The upper-tailed hypothesis test for H0: β = β0 is
1 - \alpha = P\!\left( b \;<\; \beta_0 + t_{\alpha}(n-2)\,\frac{s}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}} \right).
9. From section 7.5: The best linear unbiased estimator of E(y0) is ŷ0 = a + bx0.
10. From section 7.5: Predictions are usually more reliable if they are based on larger
samples, and made for values of the explanatory variable that are similar to those that
appear in the sample.
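The pieces in items 3 through 7 fit together as follows. This is an added sketch with made-up data: it estimates s2, the standard deviation of b, a 95% confidence interval for β, and the t-statistic for H0: β = 0.

    import numpy as np
    import scipy.stats as st

    x = np.array([1.0, 2.0, 4.0, 5.0, 7.0, 9.0, 10.0, 12.0])
    y = np.array([2.3, 2.8, 5.1, 5.6, 8.4, 9.2, 11.0, 12.9])
    n = len(x)

    b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    a = y.mean() - b * x.mean()
    e = y - (a + b * x)

    s2 = np.sum(e ** 2) / (n - 2)                      # equation 7.2
    sd_b = np.sqrt(s2 / np.sum((x - x.mean()) ** 2))   # standard deviation of b

    t_crit = st.t.ppf(0.975, n - 2)
    ci_beta = (b - t_crit * sd_b, b + t_crit * sd_b)   # equation 7.9 with alpha = .05

    t_stat = b / sd_b                                  # test of H0: beta = 0 (item 7)
    prob_value = 2 * st.t.sf(abs(t_stat), n - 2)
    print(b, sd_b, ci_beta, t_stat, prob_value)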
Chapter 8: What if the disturbances have non-zero
expectations?
Section 8.0: The basics
We can’t tell if the εi's have a constant expected value that is different from zero, and it doesn’t
make any substantive difference. However, if we aren’t even willing to assert that the εi's share
the same expected value, then we shouldn’t be running a regression in the first place. Here are the
essentials:
1. Section 8.2: If E(εi) equals some constant other than zero, b is still an unbiased estimator of
β and a is still an unbiased estimator of the fixed component of the deterministic part of yi.
2. Section 8.2: An identification problem arises when we don’t have a suitable estimator for
a parameter whose value we would like to estimate.
3. Section 8.2: A normalization is a value that we assign to a parameter when we don’t have
any way of estimating it and when assigning a value doesn’t have any substantive
implications.
4. Section 8.2: If E(εi) equals some constant other than zero, it wouldn’t really matter and we
couldn’t identify this constant. Therefore, we always normalize it to zero.
5. Section 8.3: If E(εi) is different for each observation, the observations don’t come from the
same population. In this case, b and a don’t estimate anything useful.
Chapter 9: What if the disturbances have different
variances?
Section 9.0: The basics
This chapter addresses the possibility that the disturbances have different variances. In this case,
OLS estimates are still unbiased. However, they’re no longer Best-Linear-Unbiased. In addition, the
true variances of b and a are different from those given by the OLS variance formulas. In order to
conduct inference, we can either estimate their true variances, or we can often get BLU estimators
by transforming the data so that the transformed disturbances share the same variance. Here are
the essentials:
1. Section 9.2: When the disturbances don’t all have the same variance, it’s called
heteroskedasticity.
2. Section 9.2: The OLS estimators b and a remain unbiased for β and α regardless of what
we assume for V(εi).
3. Section 9.3: The OLS estimators b and a are not Best-Linear-Unbiased and their true
variances are probably not estimated accurately by the OLS variance formulas.
4. Section 9.4: An auxiliary regression does not attempt to estimate a population
relationship in an observed sample. It provides supplemental information that helps us
interpret regressions that do.
5. Section 9.4: The White test identifies whether heteroskedasticity is bad enough to distort
OLS variance calculations.
6. Equation 9.6, section 9.5: The White heteroskedasticity-consistent variance estimator for
b is
V_W(b) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2 e_i^2}{\left( \sum_{i=1}^{n} (x_i - \bar{x})^2 \right)^2}.
It and the corresponding variance estimator for a are consistent even if heteroskedasticity is
present.
7. Equation 9.10, section 9.6: Weighted Least Squares (WLS) provides Best-Linear-
Unbiased estimators for β and α if the different disturbance variances are known or can be
estimated:
\frac{y_i}{s_i} = a_{WLS}\,\frac{1}{s_i} + b_{WLS}\,\frac{x_i}{s_i} + e_i.
8. Section 9.8: Maximum Likelihood provides Best-Consistent estimators for β and α if the
different disturbance distributions are known.
9. Section 9.9: Heteroskedasticity can take many forms. Regardless, the White
heteroskedasticity-consistent variance estimator provides trustworthy estimates of the
standard deviations of OLS estimates. In contrast, WLS or ML estimates require procedures
designed specifically for each heteroskedastic form.
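Equation 9.6 is simple to compute once a bivariate regression has been fit. The sketch below is an added illustration: the data are simulated with a disturbance variance that grows with x, so heteroskedasticity is present by construction, and the OLS variance formula for b is compared with the White estimator.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 200
    x = rng.uniform(1.0, 10.0, size=n)
    eps = rng.normal(0.0, 0.5 * x)          # disturbance standard deviation rises with x
    y = 1.0 + 0.5 * x + eps

    b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    a = y.mean() - b * x.mean()
    e = y - (a + b * x)

    # OLS variance formula (chapter 7), which assumes a common disturbance variance.
    s2 = np.sum(e ** 2) / (n - 2)
    v_ols = s2 / np.sum((x - x.mean()) ** 2)

    # Equation 9.6: White heteroskedasticity-consistent variance estimator for b.
    v_white = np.sum((x - x.mean()) ** 2 * e ** 2) / np.sum((x - x.mean()) ** 2) ** 2

    print(v_ols, v_white)    # with this design the two typically differ noticeably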
Chapter 10: What if the disturbances are correlated?
Section 10.0: The basics
This chapter deals with the possibility that the disturbances are correlated with each other. In this
case, OLS estimates are still unbiased. However, they’re no longer Best-Linear-Unbiased. In
addition, the true variances of b and a are probably different from those given by the OLS variance
formulas. In order to conduct inference, we can estimate their true variances. We can also attempt to
get BLU estimators by transforming the data so that the transformed disturbances have the
properties of chapter 5, or to get BC estimators by applying Maximum Likelihood. Here are the
essentials:
1. Section 10.2: When the disturbances are correlated, it’s called autocorrelation.
2. Section 10.2: Spatial autocorrelation describes the situation where a correlation exists
between the disturbances from observations that are near each other in spatial or social
terms.
3. Section 10.2: Serial correlation describes the situation where a correlation exists between
the disturbances from observations that are near each other in time. Shocks are the random
components of activity that are unique to the time unit in which they occur.
4. Section 10.2: Attenuation describes the situation in which the correlations between
observations get smaller as the observations get more distant from each other, either in
geographic, social or temporal terms.
5. Section 10.3: The OLS estimators b and a remain unbiased for β and α regardless of
what we assume for COV(εi, εj).
6. Section 10.3: The OLS estimators b and a are not Best-Linear-Unbiased and their true
variances are probably not estimated accurately by the OLS variance formulas.
7. Section 10.4: The Newey-West variance estimators for V(b) and V(a) are approximately
accurate even if autocorrelation is present.
8. Section 10.5: In time-series data, first-order autocorrelation occurs when the disturbances
of consecutive observations are correlated. More generally, autocorrelation of order k occurs
when the disturbances of observations that are separated by k units of time are correlated.
9. Section 10.5: In time-series data, the variance of the autocorrelated disturbances is larger,
and often a lot larger, than the variance of the underlying, uncorrelated shocks.
10. Section 10.6: Generalized Least Squares (GLS) provides Best-Linear-Unbiased
estimators for β and α by transforming the data so as to remove the autocorrelation. This
requires either knowledge or estimates of the relevant autocorrelation.
11. Section 10.7: Greek letters always represent parameters. Greek letters with carets (^) over
them represent estimators of the parameter represented by the Greek letter, itself.
12. Section 10.7: The Durbin-Watson test is a common test for first-order autocorrelation in
time-series data. It can be approximated as
DW \approx 2(1 - \hat{\rho}),
where ρ̂ is the estimated correlation between ei and ei-1.
13. Section 10.8: The GLS procedure of section 10.6 can usually be implemented in two steps.
The first step estimates the required correlations and the second step applies OLS to the
transformed data.
14. Section 10.10: If we have some form of autocorrelation other than first-order, the specific
procedures of sections 10.6 through 10.9 won’t work. However, procedures that follow the
same principles, tailored to the autocorrelation form that we’re actually dealing with, will.
15. Section 10.11: The trick is not in inventing a new procedure to fit each variation in the
assumptions. It is in reconceiving each population relationship so that it fits the one
procedure that we’ve already established.
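Item 12's approximation is easy to verify. The following is an added sketch: it constructs a residual-like series with first-order autocorrelation ρ = 0.6 (an arbitrary choice), computes the Durbin-Watson statistic, and compares it with 2(1 − ρ̂). It also illustrates item 9.

    import numpy as np

    rng = np.random.default_rng(2)
    n, rho = 500, 0.6
    shocks = rng.normal(0.0, 1.0, size=n)       # uncorrelated underlying shocks

    e = np.zeros(n)
    for t in range(1, n):
        e[t] = rho * e[t - 1] + shocks[t]       # first-order autocorrelated series

    # Durbin-Watson statistic: sum of squared changes over the sum of squares.
    dw = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

    rho_hat = np.corrcoef(e[1:], e[:-1])[0, 1]  # estimated correlation between e_t and e_{t-1}
    print(dw, 2 * (1 - rho_hat))                # the two numbers should be close

    # Item 9: the autocorrelated disturbances have a larger variance than the shocks.
    print(e.var(), shocks.var())                # roughly 1 / (1 - rho**2) times larger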
Chapter 11: What if the disturbances and the
explanatory variables are related?
Section 11.0: The basics
This chapter deals with the possibility that the disturbances and the explanatory variables are
related. This problem is called endogeneity or simultaneity. If we have it, our OLS estimators are
biased and inconsistent. Worse, they are unsalvageable. As in the last two chapters, the solution can
be thought of as a two-step procedure consisting of two applications of OLS. For this reason, it is
often called two-stage-least-squares. It is also called, more generally, instrumental variables.
This technique provides estimators that are not unbiased, only consistent. Therefore, the success of
the solution in this chapter is much more sensitive to the size of the sample than it was in the
previous two chapters. Here are the essentials:
1. Section 11.2: The three most common sources of endogeneity are reverse causality,
measurement error and dynamic choice.
2. Equation 11.12, section 11.2: Endogeneity means that the explanatory variable is a
random variable and it’s correlated with the disturbances,
\mathrm{COV}(x_i, \varepsilon_i) \neq 0.
3. Equation 11.13, section 11.3: The consequence of endogeneity is that b is neither
unbiased nor consistent for β:
E(b) = \beta + E\!\left( \frac{\sum_{i=1}^{n} (x_i - \bar{x})\,\varepsilon_i}{\sum_{i=1}^{n} (x_i - \bar{x})\,x_i} \right) \neq \beta.
4. Equation 11.18, section 11.3: With measurement error, b is approximately equal to
\beta\,\frac{V(x_i^*)}{V(x_i^*) + V(\nu_i)} < \beta.
OLS tends to understate the true magnitude of β.
5. Equations 11.19 and 11.20, section 11.4: An instrument, or an instrumental variable
zi has, roughly speaking, the following properties, at least as the sample size approaches
infinity:
\mathrm{COV}(x_i, z_i) \neq 0
and
\mathrm{COV}(\varepsilon_i, z_i) = 0.
6. Equation 11.21, section 11.4: The first step of our two-stage procedure is the OLS
instrumenting equation
x_i = c + d z_i + f_i.
It provides an estimator of xi, x̂i, that is purged of the correlation between xi and εi.
7. Equation 11.23, section 11.4: We obtain the two-stage-least-squares (2SLS) estimators
of α and β through OLS estimation of our second-step equation,
y_i = a_{2SLS} + b_{2SLS}\,\hat{x}_i + e_i.
8. Equation 11.24, section 11.4 and equations 11.30 and 11.32, section 11.5: The
instrumental variables (IV) estimator of β is the same as the 2SLS estimator,
b_{IV} = b_{2SLS} = \frac{\sum_{i=1}^{n} (\hat{x}_i - \bar{\hat{x}})\,y_i}{\sum_{i=1}^{n} (\hat{x}_i - \bar{\hat{x}})\,\hat{x}_i} = \frac{\sum_{i=1}^{n} (z_i - \bar{z})\,y_i}{\sum_{i=1}^{n} (z_i - \bar{z})\,x_i}.
9. Section 11.6: The IV estimator bIV is a consistent estimator of β.
10. Equation 11.41, section 11.6: The estimated variance of the IV slope estimator is
V(b_{IV}) = \frac{s_{IV}^2 \sum_{i=1}^{n} (z_i - \bar{z})^2}{\left( \sum_{i=1}^{n} (z_i - \bar{z})\,x_i \right)^2}.
11. Section 11.7: It’s often difficult to find an appropriate instrument. The more likely it is that
zi satisfies one of the assumptions in equations 11.19 and 11.20, the less likely it is that it
satisfies the other. The best instruments are moderately correlated with xi, in order to satisfy
equation 11.19 without invalidating equation 11.20.
12. Equation 11.46, section 11.8: The Hausman test for endogeneity consists of the auxiliary
regression
y_i = a + b_1 x_i + b_2 \hat{x}_i + e_i.
Endogeneity is present if the coefficient on x̂i, b2, is statistically significant.
13. Section 11.9: It’s always necessary to have a behavioral argument as to why an instrument
might be appropriate. It’s usually possible to offer a counter-argument as to why it might
not be. Great instruments are hard to come by. Even acceptable instruments often require
considerable ingenuity to construct and justify.
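An added sketch showing that the two-step procedure in items 6 and 7 and the IV formula in item 8 produce the same slope. The data are simulated so that x is correlated with the disturbance (endogeneity by construction) and z roughly satisfies the two instrument conditions in item 5; all numbers are made up.

    import numpy as np

    rng = np.random.default_rng(3)
    n = 5000
    z = rng.normal(size=n)                      # instrument
    common = rng.normal(size=n)                 # source of the x-disturbance correlation
    eps = common + rng.normal(size=n)           # disturbance
    x = 1.0 + 0.8 * z + common + rng.normal(size=n)   # x correlated with eps
    y = 2.0 + 0.5 * x + eps                     # true beta = 0.5

    def slope(u, v):
        # bivariate OLS slope from regressing v on u
        return np.sum((u - u.mean()) * (v - v.mean())) / np.sum((u - u.mean()) ** 2)

    b_ols = slope(x, y)                         # biased upward here because COV(x, eps) > 0

    # Step 1 (equation 11.21): instrumenting equation, x regressed on z.
    d = slope(z, x)
    x_hat = (x.mean() - d * z.mean()) + d * z

    # Step 2 (equation 11.23): y regressed on the purged x_hat.
    b_2sls = slope(x_hat, y)

    # Equation 11.24: the direct IV formula gives the same answer.
    b_iv = np.sum((z - z.mean()) * (y - y.mean())) / np.sum((z - z.mean()) * (x - x.mean()))
    print(b_ols, b_2sls, b_iv)                  # b_2sls and b_iv agree and are near 0.5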
Chapter 12: What if there is more than one x?
Section 12.0: The basics
If the population relationship includes two explanatory variables, but our sample regression
contains only one, our estimate of the effect of the included variable is almost surely biased. The
best remedy is to include the omitted variable in the sample regression. Minimizing the sum of
squared errors from a regression with two explanatory variables yields two slopes, each of which
represents the relationship between the parts of the dependent variable and the associated
explanatory variable that are not related to the other explanatory variable. These slopes are unbiased
estimators of the population coefficients. Here are the essentials:
1. Equation 12.1, Section 12.2: If there are two explanatory variables that affect yi, the
population relationship is
y_i = \alpha + \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i.
2. Equations 12.6, 12.10 and 12.11, Section 12.2: If the population relationship is equation
12.1, but we still run the bivariate regression of equation 4.4, the slope of that regression is a biased
estimator of the true effect of x1i on yi, β1:
E(b) = \beta_1 + \beta_2\,\frac{\sum_{i=1}^{n} (x_{1i} - \bar{x}_1)\,x_{2i}}{\sum_{i=1}^{n} (x_{1i} - \bar{x}_1)\,x_{1i}} = \beta_1 + \beta_2\, b_{x_2 x_1} \neq \beta_1,
where b_{x_2 x_1} is the slope from the auxiliary regression of x2i on x1i.
This bias is usually referred to as specification error, omitted-variable bias or left-out variable error (LOVE).
The regression of equation 4.4 mistakenly attributes some of β2, the effect of the omitted
variable x2i, to the included variable, x1i. The extent of this mistaken attribution is
determined by the extent to which x2i looks like x1i.
3. Equations 12.12, 12.19, 5.24 and 12.27, Section 12.3: If we minimize the sum of squared
errors for the multivariate sample relationship,
y_i = a + b_1 x_{1i} + b_2 x_{2i} + e_i,
we get errors that have an average value of zero and are unrelated, at least linearly, to either
of the two explanatory variables.
4. Equation 12.65, section 12.4: Regression estimates the effect of x1i on yi as equivalent to
b_1 = \frac{\sum_{i=1}^{n} \left( e_{x_1 x_2, i} - \bar{e}_{x_1 x_2} \right)\left( e_{y x_2, i} - \bar{e}_{y x_2} \right)}{\sum_{i=1}^{n} \left( e_{x_1 x_2, i} - \bar{e}_{x_1 x_2} \right)^2},
the effect of the part of x1i that is not related to x2i (the error e_{x_1 x_2, i} from the regression of
x1i on x2i) on the part of yi that is not related to x2i (the error e_{y x_2, i} from the regression of yi
on x2i). This is why we can interpret a regression slope as measuring the effect of an explanatory
variable ceteris paribus, holding constant all other explanatory variables. Analogously,
regression estimates the effect of x2i on yi as equivalent to the effect of the part of x2i that is
not related to x1i on the part of yi that is not related to x1i.
5. Equation 12.85, section 12.5 and exercises 12.16 and 12.17: The slope and intercept
estimators from the regression of equation 12.12 are unbiased estimators of the coefficients
and constant in the population relationship of equation 12.1: E(b1) = β1, E(b2) = β2 and E(a) = α.
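Equation 12.65 can be checked directly: the slope on x1 from the two-variable regression equals the bivariate slope between the parts of y and x1 that are unrelated to x2. This is an added sketch with made-up data; the two-variable regression itself is computed with numpy's least-squares routine.

    import numpy as np

    rng = np.random.default_rng(4)
    n = 300
    x2 = rng.normal(size=n)
    x1 = 0.6 * x2 + rng.normal(size=n)          # x1 and x2 are related
    y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)

    def bivariate(u, v):
        # intercept and slope from regressing v on u
        b = np.sum((u - u.mean()) * (v - v.mean())) / np.sum((u - u.mean()) ** 2)
        return v.mean() - b * u.mean(), b

    # Slopes from the regression with both explanatory variables (equation 12.12).
    X = np.column_stack([np.ones(n), x1, x2])
    a, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]

    # Equation 12.65: regress x1 on x2 and y on x2, keep the errors, then regress error on error.
    a_x, b_x = bivariate(x2, x1)
    e_x1x2 = x1 - (a_x + b_x * x2)              # part of x1 unrelated to x2
    a_y, b_y = bivariate(x2, y)
    e_yx2 = y - (a_y + b_y * x2)                # part of y unrelated to x2
    _, b1_partial = bivariate(e_x1x2, e_yx2)

    print(b1, b1_partial)                       # the two slopes are identical

    # Item 2: omitting x2 biases the estimated effect of x1.
    _, b_short = bivariate(x1, y)
    print(b_short)                              # not close to 2.0 because x2 is omitted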
Chapter 13: Understanding and interpreting regression with
two x’s
Section 13.0: The basics
The slopes which we obtain when we minimize the sum of squared errors for the regression of
equation 12.12 are BLU estimates of the population coefficients. If the population relationship
includes two explanatory variables, the precision of these slopes depends heavily on the extent to
which the two explanatory variables are related. Including an irrelevant variable is inefficient, but
does not create bias. Everything that we have done in chapters 8 through 11 holds with two
explanatory variables, either exactly or with minor, sensible extensions. Here are the essentials:
1. Section 13.3: If x1i and x2i are highly correlated, it’s often called multicollinearity. If we
omit either, we bias the estimated effect of the other. If we include both, their estimated
effects are unbiased but may have large variances, especially if n is small. In general, the
only responsible way to achieve greater precision is to increase n. Multicollinearity cannot
be responsible for slopes with implausible signs or magnitudes, and cannot create spurious
significance.
2. Equation 13.15, section 13.4: The sample estimate of σ2 is
s^2 = \frac{\sum_{i=1}^{n} \left( y_i - (a + b_1 x_{1i} + b_2 x_{2i}) \right)^2}{n - 3} = \frac{\sum_{i=1}^{n} e_i^2}{n - 3}.
3. Equation 13.16, section 13.4: The sample standard deviation of b1 is
SD(b_1) = \sqrt{ \frac{s^2}{\sum_{i=1}^{n} (x_{1i} - \bar{x}_1)^2 - \dfrac{\left( \sum_{i=1}^{n} (x_{1i} - \bar{x}_1)(x_{2i} - \bar{x}_2) \right)^2}{\sum_{i=1}^{n} (x_{2i} - \bar{x}_2)^2} } } = \sqrt{ \frac{s^2}{\sum_{i=1}^{n} \left( e_{x_1 x_2, i} - \bar{e}_{x_1 x_2} \right)^2} }.
4. Section 13.5: Joint hypotheses specify values for two or more parameters simultaneously.
5. Equation 13.23, section 13.5: A restricted regression adopts a null hypothesis regarding
the value or values of one or more parameters. This null hypothesis may also be referred to
as an assumption or, most commonly, a restriction. The sum of squared errors from a
restricted regression is always at least as large as the sum of squared errors from an
unrestricted regression:
\left( \sum_{i=1}^{n} e_i^2 \right)_R \geq \left( \sum_{i=1}^{n} e_i^2 \right)_U.
In particular, the sum of squared errors, and therefore the R2, can never go down when
another explanatory variable is added to the regression. However, the adjusted R2 can go
down.
6. Equation 13.29, section 13.6: The test of restrictions is
\frac{\left[ \left( \sum_{i=1}^{n} e_i^2 \right)_R - \left( \sum_{i=1}^{n} e_i^2 \right)_U \right] \Big/ j}{\left( \sum_{i=1}^{n} e_i^2 \right)_U \Big/ (n - 3)} \;\sim\; F(j,\, n - 3).
7. Section 13.7: If we include an irrelevant variable in our regression, the slopes are still
unbiased estimators of the true coefficient values. In particular, the slope for the irrelevant
variable should be pretty close to zero, at least statistically. However, the inclusion of an
irrelevant variable will usually reduce the precision of the estimated effects of relevant
variables.
8. Section 13.8: With two explanatory variables, OLS slopes are still unbiased estimators if
the disturbances are heteroskedastic or autocorrelated. The White test, the White
heteroskedasticity-consistent variance estimator and the Newey-West autocorrelation-
consistent variance estimator are all still valid, but need to be reformulated to incorporate
the second explanatory variable. WLS or GLS are still required to obtain BLU estimators.
9. Equation 13.47, section 13.8: When one explanatory variable is endogenous, the other
explanatory variable must be included in the instrumenting equation, along with the
instrumental variable, itself.
10. Equations 13.50 and 13.51, section 13.8: If both explanatory variables are endogenous,
we need at least two instrumental variables.
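Equation 13.29 can be computed by running the restricted and unrestricted regressions and comparing their sums of squared errors. The sketch below is an added illustration with made-up data; it tests the joint null hypothesis that both slopes are zero.

    import numpy as np
    import scipy.stats as st

    rng = np.random.default_rng(5)
    n = 120
    x1 = rng.normal(size=n)
    x2 = 0.5 * x1 + rng.normal(size=n)
    y = 1.0 + 0.8 * x1 + 0.4 * x2 + rng.normal(size=n)

    def sse(X, y):
        # sum of squared errors from a least-squares fit of y on the columns of X
        coef = np.linalg.lstsq(X, y, rcond=None)[0]
        e = y - X @ coef
        return np.sum(e ** 2)

    sse_u = sse(np.column_stack([np.ones(n), x1, x2]), y)   # unrestricted regression
    sse_r = sse(np.ones((n, 1)), y)                         # restricted: both slopes set to zero

    j = 2                                                   # number of restrictions
    f_stat = ((sse_r - sse_u) / j) / (sse_u / (n - 3))      # equation 13.29
    prob_value = st.f.sf(f_stat, j, n - 3)
    print(f_stat, prob_value)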
Chapter 14: Express yourself
Section 14.0: The Basics
For our purposes, regression has to be a linear function of the constant and coefficients so that their
estimators can be linear functions of the dependent variable. However, explanatory variables can
appear in discrete and non-linear form. These forms give us the opportunity to represent a wide and
varied range of possible relationships between the explanatory and dependent variables.
1. Section 14.2: Dummy variables identify the absence or presence of an indivisible
characteristic. The intercept for observations that don’t have this characteristic is a. The
effective intercept for observations that do have this characteristic is a+b2, where b2 is the
slope associated with the dummy variable. The slope b2 estimates the fixed difference in yi
between those that do and do not have the characteristic at issue.
2. Section 14.2: We fall into the dummy variable trap when we enter one dummy variable for
a particular characteristic, and another dummy variable for the opposite or absence of that
characteristic. These dummy variables are perfectly correlated, so slopes and their variances
are undefined. We only need one dummy variable, because its slope measures the difference
between the values of yi for observations that have the characteristic and those that don’t or
have its opposite.
3. Equations 14.13 and 14.18, Section 14.3: The quadratic specification is
y_i = \alpha + \beta_1 x_i + \beta_2 x_i^2 + \varepsilon_i.
β1 xi is the linear term. β2 xi2 is the quadratic term. If β1 > 0 and β2 < 0, small changes in xi
increase E(yi) when x_i < -\beta_1 / (2\beta_2) and reduce it when x_i > -\beta_1 / (2\beta_2).
4. Equations 14.35 and 14.36, Section 14.4: The semi-log specification is
\ln y_i = \alpha + \beta x_i + \varepsilon_i.
The coefficient is the expected relative change in the dependent variable in response to an
absolute change in the explanatory variable:
\beta = E\!\left[ \frac{\Delta y_i / y_i}{\Delta x_i} \right].
5. Equations 14.38 and 14.39, Section 14.4: The log-log specification is
\ln y_i = \alpha + \beta \ln x_i + \varepsilon_i.
The coefficient is the elasticity of the expected change in the dependent variable with
respect to the change in the explanatory variable:
\beta = E\!\left[ \frac{\Delta y_i / y_i}{\Delta x_i / x_i} \right].
6. Equations 14.48 and 14.52, Section 14.5: Interactions allow the effect of one variable to
depend on the value of another. The population relationship with an interaction is
y_i = \alpha + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{1i} x_{2i} + \varepsilon_i.
The change in the expected value of yi with a change in x1i is
\frac{\Delta y_i}{\Delta x_{1i}} = \beta_1 + \beta_3 x_{2i}.
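The slope interpretations in items 3 and 6 are just arithmetic on the estimated coefficients. A small added sketch, with made-up coefficient values:

    # Item 3: quadratic specification with beta1 > 0 and beta2 < 0 (made-up values).
    beta1, beta2 = 4.0, -0.5
    turning_point = -beta1 / (2 * beta2)   # x value where the effect switches sign
    print(turning_point)                   # 4.0: E(y) rises for x < 4 and falls for x > 4

    # Item 6: interaction specification; the effect of x1 depends on x2 (made-up values).
    b1, b3 = 2.0, -0.3
    for x2 in (0.0, 5.0, 10.0):
        print(x2, b1 + b3 * x2)            # marginal effect of x1 at this value of x2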
Chapter 15: More than two explanatory variables
Section 15.0: The Basics
The addition of a second explanatory variable in Chapter 12 adds only four new things to what there
is to know about regression. First, regression uses only the parts of each variable that are unrelated
to all of the other variables. Second, omitting a variable from the sample relationship that appears in
the population relationship almost surely biases our estimates. Third, including an irrelevant
variable does not bias estimates but reduces their precision. Fourth, the number of interesting joint
tests increases with the number of slopes. All four remain valid when we add additional explanatory
variables.
1. Equations 15.11, 15.1 and 15.2, Section 15.2: The general form of the multivariate
population relationship is
y_i = \alpha + \sum_{l=1}^{k} \beta_l x_{li} + \varepsilon_i.
The corresponding sample relationship is
y_i = a + \sum_{l=1}^{k} b_l x_{li} + e_i.
The predicted value of yi is
\hat{y}_i = a + \sum_{l=1}^{k} b_l x_{li}.
2. Equations 15.3 and 15.4, Section 15.2: When we minimize the sum of squared errors in
the multivariate regression, the errors sum to zero,
\sum_{i=1}^{n} e_i = 0,
and are uncorrelated in the sample with all explanatory variables,
0 = \sum_{i=1}^{n} e_i x_{pi} = \mathrm{COV}(e_i, x_{pi}) \quad \text{for all } p = 1, \ldots, k.
3. Equations 15.5 and 15.8, Section 15.2: The intercept in the multivariate regression is
a = \bar{y} - \sum_{l=1}^{k} b_l \bar{x}_l.
The slopes are
b_p = \frac{\sum_{i=1}^{n} \left( e_{x_p,i} - \bar{e}_{x_p} \right)\left( e_{y,i} - \bar{e}_{y} \right)}{\sum_{i=1}^{n} \left( e_{x_p,i} - \bar{e}_{x_p} \right)^2},
where e_{x_p,i} is the error from the regression of xpi on all of the other explanatory variables
and e_{y,i} is the error from the regression of yi on all of the other explanatory variables.
4. Equations 15.9 and 15.17, Section 15.2: R2 is
R^2 = 1 - \frac{\sum_{i=1}^{n} e_i^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} = \frac{\mathrm{COV}(y_i, \hat{y}_i)}{V(y_i)}.
Adjusted R2 is
\text{adjusted } R^2 = 1 - \frac{s^2}{V(y_i)}.
5. Section 15.2: Regression is not limited to two explanatory variables. However, the number
of observations must exceed the number of estimators and each explanatory variable must
have some part that is not related to all of the other explanatory variables in order to
calculate meaningful regression estimators.
6. Equations 15.12 and 15.14, Section 15.2: If the regression is specified correctly,
estimators are unbiased:
E(a) = \alpha \quad \text{and} \quad E(b_p) = \beta_p \quad \text{for all } p = 1, \ldots, k.
If the regression omits explanatory variables (say the last q of them), the estimators for the
included variables are biased:
E(b_p) = \beta_p + \sum_{m=k-q+1}^{k} \beta_m \left[ \frac{\sum_{i=1}^{n} e_{x_p,i}\, e_{x_m,i}}{\sum_{i=1}^{n} e_{x_p,i}^2} \right],
where e_{x_p,i} and e_{x_m,i} are now the errors from regressions of xpi and xmi, respectively, on
the included explanatory variables other than xpi.
7. Equations 15.16, 15.19 and 15.20, Section 15.3: The estimator of σ2 is
s^2 = \frac{\sum_{i=1}^{n} e_i^2}{n - k - 1}.
With this estimator, the standardized value of bp is a t random variable with n − k − 1
degrees of freedom:
\frac{b_p - \beta_p}{\sqrt{s^2 \Big/ \sum_{i=1}^{n} \left( e_{x_p,i} - \bar{e}_{x_p} \right)^2}} \;\sim\; t_{n-k-1}.
If n is sufficiently large, this can be approximated as a standard normal random variable.
If the sample regression is correctly specified, bp is the BLUE. If the disturbances are
normally distributed, it is also the BCE.
8. Equations 15.21 and 15.22, Section 15.3: The general form of the test between an
unrestricted alternative hypothesis and null hypothesis subject to j restrictions is
\frac{\left[ \left( \sum_{i=1}^{n} e_i^2 \right)_R - \left( \sum_{i=1}^{n} e_i^2 \right)_U \right] \Big/ j}{\left( \sum_{i=1}^{n} e_i^2 \right)_U \Big/ (n - k - 1)} \;\sim\; F(j,\, n - k - 1).
For the null hypothesis that all coefficients are equal to zero, this reduces to
\frac{R_U^2 / k}{\left( 1 - R_U^2 \right) / (n - k - 1)} \;\sim\; F(k,\, n - k - 1).
9. Equation 15.26, Section 15.3: The Chow test for differences in regimes is
\frac{\left[ \left( \sum_{i=1}^{n} e_i^2 \right)_R - \left( \left( \sum_{i=1}^{n_1} e_i^2 \right)_1 + \left( \sum_{i=1}^{n_2} e_i^2 \right)_2 \right) \right] \Big/ (k + 1)}{\left( \left( \sum_{i=1}^{n_1} e_i^2 \right)_1 + \left( \sum_{i=1}^{n_2} e_i^2 \right)_2 \right) \Big/ \left( n - 2(k + 1) \right)} \;\sim\; F\!\left( k + 1,\; n - 2(k + 1) \right),
where the subscripts 1 and 2 denote unrestricted regressions fit separately to the n1 and n2
observations in the two regimes, and the restricted regression pools all n = n1 + n2 observations.
10. Equations 15.54,15.57,15.68 and 15.76, Section 15.6: In matrix notation, the vector
consisting of the intercept and slopes for a regression with k explanatory variables is
\mathbf{b} = (X'X)^{-1} X'y.
It is an unbiased estimator of the vector of parameters,
E(\mathbf{b}) = \boldsymbol{\beta}.
Its variance is
V(\mathbf{b}) = \sigma^2 (X'X)^{-1}.
This is the smallest possible variance for an unbiased linear estimator vector: b is BLUE.
If the disturbances are normally distributed, it is also the BCE.
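Item 10's matrix expressions translate line for line into numpy. This closing sketch is an added illustration with made-up data and k = 3 explanatory variables; it computes b = (X'X)⁻¹X'y, the estimate of σ2 from item 7, and the estimated variance matrix with σ2 replaced by s2, whose diagonal gives the squared standard deviations used for the t-statistics in item 7.

    import numpy as np

    rng = np.random.default_rng(6)
    n, k = 200, 3
    X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])   # intercept column plus k regressors
    beta = np.array([1.0, 0.5, -0.25, 2.0])
    y = X @ beta + rng.normal(size=n)

    # Item 10: b = (X'X)^{-1} X'y.
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y

    # Item 7: s^2 = sum of squared errors / (n - k - 1).
    e = y - X @ b
    s2 = np.sum(e ** 2) / (n - k - 1)

    # Item 10's V(b) = sigma^2 (X'X)^{-1}, with sigma^2 replaced by its estimate s^2.
    v_b = s2 * XtX_inv
    print(b)                        # close to the chosen beta vector
    print(np.sqrt(np.diag(v_b)))    # estimated standard deviations of the elements of b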