Econometrics Notes


  • Topic 1: Background Material

    John Stapleton

    (ETC3410) Background Material 1 / 85

  • Table of Contents

    1.1 A review of some basic statistical concepts

    1.2 Random regressors

    1.3 Modelling the conditional mean

    1.3.1 Specifying a functional form for the conditional mean

    1.3.2 Choosing the regressors

    1.4 Some asymptotic theory

    1.4.1 Introduction

    1.4.2 Consistency

    1.4.3 Asymptotic normality

    1.4.4 Asymptotic efficiency

    1.5 Testing linear restrictions on the parameters

    1.6 A review of generalized least squares (GLS)

    (ETC3410) Background Material 2 / 85

  • 1.1 A Review of some basic statistical concepts I

    Definition (1.1)

    Let x be a discrete random variable which can take on the values (x1, x2, ..., xn) with probabilities (f(x1), f(x2), ..., f(xn)) respectively. Then the mean or expected value or expectation of x, which we denote by E(x), is defined as:

    E(x) = Σ_{i=1}^{n} xi f(xi).

    If x is a continuous random variable with probability density function f(x), then

    E(x) = ∫ x f(x) dx.

    For any set of random variables x, y and z, the expectations operator satisfies the following rules:

    (ETC3410) Background Material 3 / 85

  • 1.1 A Review of some basic statistical concepts II

    R1 E(x + y + z) = E(x) + E(y) + E(z).

    R2 E(k) = k for any constant k.

    R3 E(kx) = kE(x) for any constant k.

    R4 E(k + x) = k + E(x) for any constant k.

    R5 In general, E(xy) ≠ E(x)E(y).
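    Rules R1-R5 can be checked numerically. The sketch below (a minimal illustration using NumPy; the values and probabilities are made up for the example) computes E(x) from Definition (1.1) and verifies R2-R4, and that E(x²) ≠ E(x)² in the spirit of R5:

```python
import numpy as np

# Discrete random variable: values x_i and probabilities f(x_i)
values = np.array([1.0, 2.0, 3.0])
probs = np.array([0.2, 0.5, 0.3])

# Definition (1.1): E(x) = sum_i x_i f(x_i)
E_x = np.sum(values * probs)               # 0.2*1 + 0.5*2 + 0.3*3 = 2.1

# R3: E(kx) = kE(x); R4: E(k + x) = k + E(x), for constant k
k = 5.0
E_kx = np.sum(k * values * probs)          # should equal k * E_x
E_k_plus_x = np.sum((k + values) * probs)  # should equal k + E_x

# Taking y = x shows E(xy) = E(x^2) need not equal E(x)E(x)
E_x2 = np.sum(values**2 * probs)
```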

    (ETC3410) Background Material 4 / 85

  • 1.1 A Review of some basic statistical concepts III

    Definition (1.2)

    The variance of the random variable x, which we denote by Var(x), is defined as:

    Var(x) = E{[x − E(x)]²}
           = E[x² + E(x)² − 2xE(x)]
           = E(x²) + E(x)² − 2E(x)E(x)
           = E(x²) + E(x)² − 2E(x)²
           = E(x²) − E(x)².

    Informally, Var(x) measures how tightly the values of x are clustered around the mean.

    (ETC3410) Background Material 5 / 85

  • 1.1 A Review of some basic statistical concepts IV

    Definition (1.3)

    Let x and y be two random variables. Then the covariance between x and y, which we denote by Cov(x, y), is defined as:

    Cov(x, y) = E{[x − E(x)][y − E(y)]}.

    Cov(x, y) measures the degree of linear association between x and y.

    Notice that

    Cov(x, y) = E{[x − E(x)][y − E(y)]}
              = E[xy − xE(y) − yE(x) + E(x)E(y)]
              = E(xy) − E(x)E(y) − E(y)E(x) + E(x)E(y)
              = E(xy) − E(x)E(y).
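    The shortcut formulas Var(x) = E(x²) − E(x)² and Cov(x, y) = E(xy) − E(x)E(y) can be confirmed on simulated data. This sketch (the data-generating process is invented purely for the check) compares each definition with its shortcut form, using sample moments:

```python
import numpy as np

rng = np.random.default_rng(42)

# Two correlated samples: y depends on x, so Cov(x, y) != 0
n = 200_000
x = rng.normal(loc=1.0, scale=2.0, size=n)
y = 0.5 * x + rng.normal(size=n)

# Var(x) = E{[x - E(x)]^2} versus the shortcut E(x^2) - E(x)^2
var_direct = np.mean((x - x.mean()) ** 2)
var_identity = np.mean(x**2) - x.mean() ** 2

# Cov(x, y) = E{[x - E(x)][y - E(y)]} versus E(xy) - E(x)E(y)
cov_direct = np.mean((x - x.mean()) * (y - y.mean()))
cov_identity = np.mean(x * y) - x.mean() * y.mean()
```

    In sample moments the two forms agree exactly (up to floating-point error), mirroring the algebra above.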

    (ETC3410) Background Material 6 / 85

  • 1.1 A Review of some basic statistical concepts V

    Therefore, in the special case in which

    E (x) = 0 and/or E (y) = 0,

    the formula for the covariance between x and y simplifies to

    Cov(x , y) = E (xy).

    For any pair of random variables x and y and any constants a and b, the Var operator satisfies the following rules:

    R6 Var(a) = 0.

    R7 Var(ax) = a2Var(x).

    R8 Var(ax + by) = a2Var(x) + b2Var(y) + 2abCov(x , y).

    (ETC3410) Background Material 7 / 85

  • 1.1 A Review of some basic statistical concepts VI

    R9 If x and y are independent random variables, Cov(x,y) = 0 and

    Var(ax + by) = a2Var(x) + b2Var(y).

    As a measure of linear association, the covariance suffers from two serious limitations:

    The value of Cov(x, y) depends on the units in which x and y are measured.

    The value of Cov(x, y) is difficult to interpret. For example, how do we interpret the statement that

    Cov(x, y) = 2?

    Correlation, which we define below, is a superior measure of the degree of linear association between two random variables.

    (ETC3410) Background Material 8 / 85

  • 1.1 A Review of some basic statistical concepts VII

    Definition (1.4)

    Let x and y be two random variables. Then the correlation between x and y, which we denote by Corr(x, y), is defined as:

    Corr(x, y) = Cov(x, y) / [SD(x)SD(y)],

    where

    SD(x) = Var(x)^(1/2), SD(y) = Var(y)^(1/2).

    It can be shown that

    −1 ≤ Corr(x, y) ≤ 1.

    (ETC3410) Background Material 9 / 85

  • 1.1 A Review of some basic statistical concepts VIII

    Corr(x, y) is unit free and is easy to interpret. For example, if

    Corr(x, y) = 0.8

    we conclude that there is a strong, positive, linear relationship between x and y.
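    The claim that Corr(x, y) is unit free can be checked directly: rescaling x changes the covariance but not the correlation. A minimal sketch (simulated data, invented for the illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

n = 100_000
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)

def corr(a, b):
    # Definition (1.4): Corr(a, b) = Cov(a, b) / [SD(a) SD(b)]
    cov = np.mean((a - a.mean()) * (b - b.mean()))
    return cov / (a.std() * b.std())

r_original = corr(x, y)
# Rescaling x (say, metres to centimetres) multiplies Cov(x, y) by 100,
# but SD(x) is multiplied by 100 as well, so the correlation is unchanged.
r_rescaled = corr(100.0 * x, y)
```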

    (ETC3410) Background Material 10 / 85

  • 1.2 Random regressors I

    In introductory econometrics units it is often assumed that the regressors in the model are not random variables. For example, in the simple bivariate regression model

    yi = β0 + β1xi + ui

    yi and ui are assumed to be random variables, but xi is assumed to be a fixed number which does not change in value from sample to sample.

    While this assumption is useful for pedagogical purposes because it simplifies the analysis, it is inappropriate for the nonexperimental data with which we typically work in disciplines such as economics and finance.

    Nonexperimental data is data that is not generated by performing a controlled experiment.

    (ETC3410) Background Material 11 / 85

  • 1.2 Random regressors II

    When working with nonexperimental data, it is appropriate to treat both the dependent variable and the regressors in our regression models as random variables. Under this more realistic assumption, when we collect a sample of data

    (yi, xi), i = 1, 2, ..., N

    we are effectively making a drawing from the joint probability distribution of the random variables

    (yi, xi).

    Consider the multivariate linear regression model

    yi = β0 + β1xi1 + β2xi2 + ... + βkxik + ui. (1.1)

    Let

    fJ(yi, xi1, ..., xik | θ),

    (ETC3410) Background Material 12 / 85

  • 1.2 Random regressors III

    denote the joint probability distribution of the random variables (yi, xi1, ..., xik), with parameter vector θ. That is, θ is the vector of parameters that appears in the mathematical formula for the joint probability distribution of (yi, xi1, ..., xik).

    Recall from elementary statistics that

    fJ(yi, xi1, ..., xik | θ) = fC(yi | xi1, ..., xik, θ) fJ(xi1, ..., xik | θ), (1.2)

    where:

    fJ(yi, xi1, ..., xik | θ) is the joint probability distribution of (yi, xi1, ..., xik).

    fC(yi | xi1, ..., xik, θ) is the probability distribution of yi conditional on (xi1, ..., xik).

    fJ(xi1, ..., xik | θ) is the joint probability distribution of (xi1, ..., xik).

    (ETC3410) Background Material 13 / 85

  • 1.2 Random regressors IV

    Notice that the conditional probability distribution, fC(yi | xi1, ..., xik, θ), enables us to make probability statements about y conditional on the values of (xi1, ..., xik) being fixed.

    The most general statistical analysis of the behavior of (yi, xi1, ..., xik) would involve constructing a mathematical model of fJ(yi, xi1, ..., xik | θ). However, this task is usually too difficult and instead we restrict our attention to modelling fC(yi | xi1, ..., xik, θ). Since

    fJ(yi, xi1, ..., xik | θ) = fC(yi | xi1, ..., xik, θ) fJ(xi1, ..., xik | θ), (1.2)

    this strategy obviously means that we ignore fJ(xi1, ..., xik | θ), and lose any information that it contains regarding the parameter vector θ.

    (ETC3410) Background Material 14 / 85

  • 1.2 Random regressors V

    The strategy of focusing on fC(yi | xi1, ..., xik, θ) and ignoring fJ(xi1, ..., xik | θ) does not entail any loss of information in the following special case. Let

    θ = (β, γ),

    where β is the vector of parameters of interest, and assume that

    fJ(yi, xi1, ..., xik | θ) = fC(yi | xi1, ..., xik, β) fJ(xi1, ..., xik | γ) (1.3)

    Notice that in (1.3) the parameter vector of interest, β, appears only in the conditional distribution of yi.

    When (1.3) holds, (xi1, ..., xik) are said to be weakly exogenous with respect to β, and there is no loss of information as a result of ignoring fJ(xi1, ..., xik | γ) and focusing exclusively on fC(yi | xi1, ..., xik, β).

    (ETC3410) Background Material 15 / 85

  • 1.2 Random regressors VI

    In fact, even modelling fC(y | x1, x2, ..., xk) is usually too difficult. Instead, we typically focus on only one feature of the conditional distribution of yi, namely the conditional mean, which we denote by E(y | x1, x2, ..., xk). (To economize on notation, the parameter vector and the subscript i are suppressed).

    In particular, we are usually most interested in estimating and testing hypotheses about how the conditional mean of y changes in response to changes in (x1, x2, ..., xk).

    Typically, y will not assume its conditional mean value. Let u denote the deviation of y from its conditional mean. Then, by definition,

    u = y − E(y | x1, x2, ..., xk). (1.4)

    Rearranging (1.4) we obtain

    y = E(y | x1, x2, ..., xk) + u. (1.5)

    (ETC3410) Background Material 16 / 85

  • 1.2 Random regressors VII

    Equation (1.5) is sometimes referred to as the error form of the model, or the model in error form.

    When we take conditional expectations of both sides of (1.5) we obtain

    E(y | x1, x2, ..., xk) = E(y | x1, x2, ..., xk) + E(u | x1, x2, ..., xk),

    which implies that

    E(u | x1, x2, ..., xk) = 0. (1.6)

    Equations (1.5) and (1.6) together imply that we can always express y as the sum of its true conditional mean and a random error term, which itself has a conditional mean of zero.
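    The decomposition (1.5)-(1.6) can be seen in a simulation. In the sketch below the true conditional mean is known by construction (the specific function 1 + 2x is invented for the illustration), so we can form u = y − E(y|x) and check that it averages to roughly zero within any band of x values:

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulated model with known conditional mean: E(y|x) = 1 + 2x
n = 500_000
x = rng.uniform(0.0, 10.0, size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

# Deviation from the conditional mean, as in (1.4)
u = y - (1.0 + 2.0 * x)

# Check E(u|x) ≈ 0 on a few slices of the x-axis, as (1.6) requires
slice_means = [u[(x >= a) & (x < a + 2.0)].mean() for a in (0.0, 4.0, 8.0)]
```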

    (ETC3410) Background Material 17 / 85

  • 1.2 Random regressors VIII

    If xj is a continuous variable, the marginal or partial effect of xj on the average value of y is given by

    ∂E(y | x1, x2, ..., xk)/∂xj. (1.7)

    A great deal of applied econometrics consists of trying to correctly specify the conditional mean of the dependent variable y, and trying to obtain an estimator of the marginal effects of interest that has good statistical properties.

    There are two aspects to specifying the conditional mean of the dependentvariable:

    We must specify a functional form for the conditional mean.

    We must decide what explanatory variables to include in the conditional mean function.

    We briefly consider each of these issues in the following two subsections.

    (ETC3410) Background Material 18 / 85

  • 1.3 Modelling the conditional mean I

    1.3.1 Specifying a functional form for the conditional mean

    In order to model the conditional mean we have to make an assumption about its functional form. The assumption that we make has important implications for:

    How we compute the marginal effects of the x variables.

    The properties of the marginal effects.

    How we interpret the regression coefficients.

    The method we use to estimate the regression coefficients.

    In this section we briefly consider the most common specifications that are used for the conditional mean. To economize on notation, we assume a model with two explanatory variables and an intercept.

    M1 The conditional mean is assumed to be linear in both the parameters and the regressors.

    (ETC3410) Background Material 19 / 85

  • 1.3 Modelling the conditional mean II

    1.3.1 Specifying a functional form for the conditional mean

    Under this specification the conditional mean is given by

    E(y | x1, x2) = α + β1x1 + β2x2, (1.8)

    and the model in error form is

    y = E(y | x1, x2) + u
      = α + β1x1 + β2x2 + u. (1.9)

    From (1.8) we have

    ∂E(y | x1, x2)/∂xj = βj, j = 1, 2. (1.10)

    Under this specification for the conditional mean:

    The marginal effect of xj is constant and equal to βj.

    (ETC3410) Background Material 20 / 85

  • 1.3 Modelling the conditional mean III

    1.3.1 Specifying a functional form for the conditional mean

    βj measures the change in the conditional mean of the dependent variable arising from a one unit change in xj, holding the other regressor constant.

    The marginal effect of xj does not vary across observations and does not depend on the value of any of the regressors.

    M2 The conditional mean is assumed to be linear in the parameters but nonlinear in one or more of the regressors.

    For example,

    E(y | x1, x2) = α + β1x1² + β2x2², (1.11)

    or, in error form,

    y = E(y | x1, x2) + u
      = α + β1x1² + β2x2² + u. (1.12)

    (ETC3410) Background Material 21 / 85

  • 1.3 Modelling the conditional mean IV

    1.3.1 Specifying a functional form for the conditional mean

    From (1.11) we have

    ∂E(y | x1, x2)/∂x1 = 2β1x1, ∂E(y | x1, x2)/∂x2 = 2β2x2. (1.13)

    Under this specification:

    The marginal effect of xj is not measured by βj.

    The marginal effect of xj varies with the value of xj.

    The marginal effect of xj measures the change in the conditional mean of the dependent variable arising from a one unit change in xj, holding the other regressor constant.
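    The point that the marginal effect under M2 varies with xj can be checked numerically. In this sketch the parameter values are invented for the illustration; a central difference approximates the derivative and is compared with the analytic result 2β1x1 from (1.13):

```python
import numpy as np

# Conditional mean of form (1.11), with illustrative parameter values
a, b1, b2 = 1.0, 0.5, -0.3

def cond_mean(x1, x2):
    return a + b1 * x1**2 + b2 * x2**2

def marginal_x1(x1, x2, h=1e-6):
    # Numerical partial derivative of the conditional mean wrt x1
    return (cond_mean(x1 + h, x2) - cond_mean(x1 - h, x2)) / (2 * h)

# The marginal effect of x1 differs across evaluation points,
# matching the analytic value 2*b1*x1 at each point
me_at_1 = marginal_x1(1.0, 2.0)   # about 2*0.5*1 = 1.0
me_at_3 = marginal_x1(3.0, 2.0)   # about 2*0.5*3 = 3.0
```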

    (ETC3410) Background Material 22 / 85

  • 1.3 Modelling the conditional mean V

    1.3.1 Specifying a functional form for the conditional mean

    In some cases, a model specification that allows some of the marginal effects to vary, such as M2, may be more realistic than one that constrains all the marginal effects to be constant. For example, if we wished to study the effect of education on average wages, we might specify the conditional mean of wages as

    E(wage | educ, exper, race, gender)
      = α + β1educ + β2exper + β3race + β4gender + β5exper². (1.14)

    Since (1.14) implies that

    ∂E(wage | educ, exper, race, gender)/∂exper = β2 + 2β5exper,

    (ETC3410) Background Material 23 / 85

  • 1.3 Modelling the conditional mean VI

    1.3.1 Specifying a functional form for the conditional mean

    this specification allows the marginal effect of experience to depend on the level of experience.

    M3 The conditional mean of the natural log of the dependent variable is assumed to be linear in the parameters and the natural log of the explanatory variables. (log-linear model).

    Under this specification

    E(ln y | x1, x2) = α + β1 ln x1 + β2 ln x2, (1.15)

    or, in error form,

    ln y = E(ln y | x1, x2) + u
         = α + β1 ln x1 + β2 ln x2 + u. (1.16)

    (ETC3410) Background Material 24 / 85

  • 1.3 Modelling the conditional mean VII

    1.3.1 Specifying a functional form for the conditional mean

    Although the model is nonlinear in the regressors, it is linear in the natural log of the regressors and in the parameters and can easily be estimated by OLS.

    From (1.16) we have

    ∂ln y/∂ln xj = βj, j = 1, 2. (1.17)

    This specification is often attractive because the regression coefficients can be interpreted as elasticities or percentage changes.

    In (1.17) βj measures the percentage change in the level of y arising from a one percent change in the level of xj, holding the other regressor constant. That is, βj measures the elasticity of y (not ln y) with respect to xj (not ln xj), holding the other regressor constant.

    (ETC3410) Background Material 25 / 85

  • 1.3 Modelling the conditional mean VIII

    1.3.1 Specifying a functional form for the conditional mean

    To see this note that

    ∂ln y/∂ln x = lim_{Δln x → 0} (Δln y/Δln x) ≈ Δln y/Δln x, for small Δln x.

    Let

    Δln y = ln y1 − ln y0

    Δln x = ln x1 − ln x0.

    (ETC3410) Background Material 26 / 85

  • 1.3 Modelling the conditional mean IX

    1.3.1 Specifying a functional form for the conditional mean

    Then

    Δln y = ln y1 − ln y0
          = ln(y1/y0)
          = ln(y1/y0 − 1 + 1)
          = ln(y1/y0 − y0/y0 + 1)
          = ln((y1 − y0)/y0 + 1)
          ≈ (y1 − y0)/y0 for small changes in y

    100 Δln y ≈ ((y1 − y0)/y0) × 100

    = % change in y.

    (ETC3410) Background Material 27 / 85

  • 1.3 Modelling the conditional mean I

    1.3.1 Specifying a functional form for the conditional mean

    In deriving this approximation we have used the fact that

    ln(N + 1) ≈ N

    for any "small" number N. For example,

    ln(0.2 + 1) = 0.18 ≈ 0.2.

    Using the same logic,

    100 Δln x ≈ % change in x.
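    The quality of the approximation ln(N + 1) ≈ N, and hence of reading 100Δln x as a percentage change, is easy to check directly. The numbers below are chosen only to illustrate that the approximation is good for small changes and deteriorates for large ones:

```python
import math

# ln(N + 1) ≈ N: tight for small N, poor for large N
approx_error_small = abs(math.log(1.02) - 0.02)   # a 2% change
approx_error_large = abs(math.log(1.50) - 0.50)   # a 50% change

# Compare 100*Δln(x) with the exact percentage change for a 2% move
x0, x1 = 50.0, 51.0
log_change = 100 * (math.log(x1) - math.log(x0))
pct_change = 100 * (x1 - x0) / x0
```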

    (ETC3410) Background Material 28 / 85

  • 1.3 Modelling the conditional mean II

    1.3.1 Specifying a functional form for the conditional mean

    Therefore, for small changes in x and y,

    ∂ln y/∂ln x ≈ Δln y/Δln x = 100 Δln y / 100 Δln x ≈ % change in y / % change in x.

    For example, if

    β1 = 2

    in M3, then a one percent increase in x1, holding x2 fixed, is associated with a two percent increase in y.

    M4 The conditional mean of the log of the dependent variable is assumed to be linear in the parameters and in the level of the regressors. (log-level model)

    (ETC3410) Background Material 29 / 85

  • 1.3 Modelling the conditional mean III

    1.3.1 Specifying a functional form for the conditional mean

    Under this specification the model is given by

    E(ln y | x1, x2) = α + β1x1 + β2x2, (1.18)

    or, in error form,

    ln y = E(ln y | x1, x2) + u
         = α + β1x1 + β2x2 + u. (1.19)

    From (1.19) we have

    ∂ln y/∂xj = βj, j = 1, 2. (1.20)

    Under this specification:

    (ETC3410) Background Material 30 / 85

  • 1.3 Modelling the conditional mean IV

    1.3.1 Specifying a functional form for the conditional mean

    100βj measures the percentage change in the level of y arising from a one unit change in the level of xj, holding the other regressor constant, since

    100βj = 100 ∂ln y/∂xj ≈ 100 Δln y/Δxj ≈ (% change in y)/Δxj.

    For example, if

    β1 = 0.2

    (ETC3410) Background Material 31 / 85

  • 1.3 Modelling the conditional mean V

    1.3.1 Specifying a functional form for the conditional mean

    in M4, then a one unit increase in x1, holding x2 fixed, is associated with a twenty percent increase in y.

    The marginal effect of xj on the % change in y is constant.

    All of the specifications for the conditional mean of y that we have considered so far have the property that they are linear in the parameters. Models that are linear in the parameters can generally be estimated by OLS. Of course, whether or not the OLS estimator has good statistical properties depends on other features of the model such as, for example, whether or not the errors are homoskedastic.

    (ETC3410) Background Material 32 / 85

  • 1.3 Modelling the conditional mean VI

    1.3.1 Specifying a functional form for the conditional mean

    Many models that appear to be nonlinear in the parameters can be transformed into models that are linear in the parameters. For example, the model given by

    y = e^(α + β1x1 + β2x2) e^u (1.21)

    is nonlinear in the parameters. However, taking logs on both sides of (1.21) we obtain

    ln y = α + β1x1 + β2x2 + u, (1.22)

    which is linear in the parameters.

    Notice that the parameters in (1.22) are exactly the same as the parameters in (1.21), so when we estimate (1.22) we get estimates of the parameters in (1.21). However, because it is linear in the parameters, (1.22) is much easier to estimate than (1.21).
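    This log-transformation strategy can be sketched end to end on simulated data. The true parameter values below are invented for the illustration; after taking logs, OLS (here via NumPy's least-squares routine) recovers the parameters of the multiplicative model (1.21):

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulate the multiplicative model (1.21): y = exp(a + b1*x1 + b2*x2) * exp(u)
n = 50_000
a, b1, b2 = 0.5, 1.5, -0.8
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
u = rng.normal(scale=0.1, size=n)
y = np.exp(a + b1 * x1 + b2 * x2) * np.exp(u)

# Taking logs gives (1.22), which is linear in the parameters,
# so ordinary least squares applies directly to ln(y)
X = np.column_stack([np.ones(n), x1, x2])
coef, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
a_hat, b1_hat, b2_hat = coef
```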

    M5 The conditional mean of the dependent variable is intrinsically nonlinear in the parameters.

    (ETC3410) Background Material 33 / 85

  • 1.3 Modelling the conditional mean VII

    1.3.1 Specifying a functional form for the conditional mean

    Some models of the conditional mean of the dependent variable are intrinsically nonlinear in the parameters in the sense that they cannot be made linear by applying a mathematical transformation, such as taking logs.

    For example, assume that

    E(y | x1, x2) = 1 / [1 + e^−(α + β1x1 + β2x2)]. (1.23)

    This model is known as the logit model and is studied in topic 2. The logit model is intrinsically nonlinear since it cannot be made linear in the parameters by applying a mathematical transformation.

    Intrinsically nonlinear models cannot be estimated by OLS. They are typically estimated by using the method of maximum likelihood or, less commonly, the method of nonlinear least squares.
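    The shape of the logit conditional mean (1.23) can be explored numerically. The parameter values below are illustrative, not estimates; the sketch uses the standard derivative of the logistic function, under which the marginal effect of x1 is β1·p·(1 − p) and so depends on where the regressors are evaluated:

```python
import numpy as np

# Logit conditional mean (1.23), with illustrative parameter values
a, b1, b2 = 0.2, 1.0, -0.5

def logit_mean(x1, x2):
    return 1.0 / (1.0 + np.exp(-(a + b1 * x1 + b2 * x2)))

def marginal_x1(x1, x2):
    # Analytic marginal effect: b1 * p * (1 - p), from the logistic derivative
    p = logit_mean(x1, x2)
    return b1 * p * (1.0 - p)

me_at_origin = marginal_x1(0.0, 0.0)  # p near 0.5: effect near its maximum
me_far_out = marginal_x1(5.0, 0.0)    # p near 1: effect close to zero
```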

    (ETC3410) Background Material 34 / 85

  • 1.3 Modelling the conditional mean VIII

    1.3.1 Specifying a functional form for the conditional mean

    In intrinsically nonlinear models such as (1.23) the marginal effects of the regressors:

    Are not given by the regression coefficients.

    Depend on the values of the regressors.

    As we will see in Topic 2, a nonlinear specification for the conditional mean of the dependent variable is sometimes more appropriate than a linear specification, given the nature of the dependent variable.

    (ETC3410) Background Material 35 / 85

  • 1.3 Modelling the conditional mean I

    1.3.2 Choosing the regressors

    Consider the linear regression model

    y = α + β1x1 + β2x2 + ... + βk−1xk−1 + βkxk + u. (1.24)

    It is very important to understand the role of the error term, u, in (1.24). The error term represents all those variables that affect the dependent variable that have not been explicitly included as regressors in the model.

    If one of the regressors, say xi, in (1.24) is correlated with any of the omitted variables that are contained in u, then xi will necessarily be correlated with u. A regressor that is correlated with the error term is referred to as an endogenous regressor.

    (ETC3410) Background Material 36 / 85

  • 1.3 Modelling the conditional mean II

    1.3.2 Choosing the regressors

    For example, suppose that the correct model in error form is

    y = α + β1x1 + β2x2 + ... + βk−1xk−1 + βkxk + u (1.24)

    but we estimate

    y = α + β1x1 + β2x2 + ... + βk−1xk−1 + v. (1.25)

    In this case, we have omitted the relevant regressor xk. It follows from (1.24) and (1.25) that

    v = βkxk + u. (1.26)

    The omitted variable xk is now incorporated in the error term, v, in (1.25). If, for example, xk is correlated with x2, then x2 will be correlated with v in (1.25). That is, x2 will be an endogenous regressor.
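    The consequence of omitting a regressor that is correlated with an included one can be seen in a simulation. The data-generating process below is invented for the illustration: xk is built to be correlated with x2, and dropping xk pushes the OLS slope on x2 away from its true value even in a large sample:

```python
import numpy as np

rng = np.random.default_rng(11)

# True model: y = 1 + 1.0*x2 + 2.0*xk + u, with x2 correlated with xk
n = 200_000
xk = rng.normal(size=n)
x2 = 0.8 * xk + rng.normal(size=n)
y = 1.0 + 1.0 * x2 + 2.0 * xk + rng.normal(size=n)

def ols(X, y):
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Correct regression: includes xk, so the slope on x2 targets 1.0
b_full = ols(np.column_stack([np.ones(n), x2, xk]), y)

# Misspecified regression: xk omitted, so it sits in the error term v
# and x2 becomes an endogenous regressor
b_short = ols(np.column_stack([np.ones(n), x2]), y)

bias = b_short[1] - 1.0   # the slope on x2 no longer targets 1.0
```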

    (ETC3410) Background Material 37 / 85

  • 1.3 Modelling the conditional mean III

    1.3.2 Choosing the regressors

    As we will see in topic 3, when a regression equation contains one or more endogenous regressors both the OLS and GLS estimators of the regression coefficients lose their desirable statistical properties. Specifically, both estimators are inconsistent. (The concept of consistency is discussed in section 1.4 below).

    In light of this result, it is clearly very important to think carefully about which regressors to include in the model, and in particular what factors we wish to control for.

    However, even when we are very careful in selecting the regressors, omitting a relevant regressor may be unavoidable. This will be the case when one or more of the relevant regressors is unobservable.

    (ETC3410) Background Material 38 / 85

  • 1.3 Modelling the conditional mean IV

    1.3.2 Choosing the regressors

    For example, suppose that we are interested in estimating the marginal effect of education on an individual's wage, controlling for experience, race, gender and ability. In this case the conditional mean of interest is

    E(wage | educ, exper, race, gender, exper², ability)
      = α + β1educ + β2exper + β3race + β4gender + β5exper² + β6ability, (1.27)

    which implies that the model in error form is

    wage = α + β1educ + β2exper + β3race + β4gender + β5exper² + β6ability + u. (1.28)

    (ETC3410) Background Material 39 / 85

  • 1.3 Modelling the conditional mean V

    1.3.2 Choosing the regressors

    In (1.28)

    ∂E(wage | educ, exper, race, gender, exper², ability)/∂educ = β1.

    That is, β1 measures the marginal effect of education on the average wage, controlling for differences in experience, race, gender and ability.

    Unfortunately, since ability is unobservable, we can't explicitly include it in the model. Consequently, the equation that we actually estimate is

    wage = α + β1educ + β2exper + β3race + β4gender + β5exper² + v, (1.28a)

    where

    v = β6ability + u.

    (ETC3410) Background Material 40 / 85

  • 1.3 Modelling the conditional mean VI

    1.3.2 Choosing the regressors

    We will see in Topic 3 that if, as we suspect, education and ability are correlated, the OLS estimator of β1 in equation (1.28a) will no longer be "reliable" even in very large samples. More specifically, the OLS estimator of β1 will be an inconsistent estimator of the marginal effect of education on the average wage controlling for differences in experience, race, gender and ability. (The concept of consistency is discussed in section 1.4 below).

    Informally, if we estimate (1.28a) by OLS, the OLS estimate of β1 will be an "unreliable" estimate of the marginal effect of education on wages, controlling for exper, race, gender and ability.

    In Topic 4 we will discuss how to deal with the problem of endogenous regressors.

    (ETC3410) Background Material 41 / 85

  • 1.4 Some asymptotic theory I

    1.4.1 Introduction

    In topic 2 we will study models in which it is desirable to allow the conditional mean of the dependent variable to be nonlinear in the parameters, and in topics 3 to 8 we will allow the regressors in our models to be correlated with the error term. In these models it is generally impossible to derive estimators that can be shown to be unbiased, efficient and normally distributed in finite samples. In fact, in these models:

    The finite sample properties of the estimators that we use are typically unknown.

    In addition, the finite sample distributions of our test statistics are also typically unknown.

    When conducting inference in these models we are forced to rely almost entirely on asymptotic results, that is, results that can be proved to hold only as the sample size goes to infinity.

    (ETC3410) Background Material 42 / 85

  • 1.4 Some asymptotic theory II

    1.4.1 Introduction

    The strategy researchers use in these circumstances is to derive the asymptotic distributions of estimators and test statistics and to use these asymptotic distributions as approximations to the finite sample distributions of the estimators and test statistics. In effect, we proceed "as if" the asymptotic distributions are valid in finite samples. However, we never know how accurate these approximations are in a given application.

    In this section we provide a brief and relatively informal discussion of the important concepts of consistency, asymptotic normality and asymptotic efficiency. A more detailed and technical discussion of these concepts is provided in ETC3400.

    (ETC3410) Background Material 43 / 85

  • 1.4 Some asymptotic theory I

    1.4.2 Consistency

    Let bn denote an estimator of the parameter β, given a sample of size n. Formally, bn is said to be a consistent estimator of β if

    Pr(|bn − β| < ε) → 1 as n → ∞, for all ε > 0. (1.29)

    When (1.29) holds we say that bn converges in probability to β, or that β is the probability limit of bn, which we denote by

    plim(bn) = β. (1.30)

    Intuitively, bn is a consistent estimator of β if the probability that bn is arbitrarily close to β goes to 1 as the sample size gets infinitely large.
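    Definition (1.29) can be illustrated by simulation for a familiar consistent estimator, the sample mean. The distribution, μ, ε and the sample sizes below are chosen purely for the illustration; the simulated probability that the estimator lands within ε of the truth rises towards 1 as n grows:

```python
import numpy as np

rng = np.random.default_rng(5)

# Estimate Pr(|bn - mu| < eps) by simulation, where bn is the sample mean
mu, eps, reps = 3.0, 0.1, 2_000

def coverage(n):
    draws = rng.normal(loc=mu, scale=1.0, size=(reps, n))
    bn = draws.mean(axis=1)
    return np.mean(np.abs(bn - mu) < eps)

cov_small = coverage(10)      # probability well below 1 at small n
cov_large = coverage(2_000)   # probability close to 1 at large n
```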

    (ETC3410) Background Material 44 / 85

  • 1.4 Some asymptotic theory II

    1.4.2 Consistency

    The practical implication of bn being a consistent estimator of β is that there is a very high probability that bn will be very close to β when the sample size is large, and in this sense bn will be a good estimator of β in large samples.

    Obviously, consistency is a very desirable property for an estimator.

    There are four useful properties of the plim operator which we state below without proof. We will use these properties on several occasions later in the lecture notes.

    Let x1n and x2n be two random variables such that

    plim(x1n) = x1, plim(x2n) = x2.

    That is, the random variables x1n and x2n converge in probability to the random variables x1 and x2 respectively. Then the following properties can be shown to hold:

    (ETC3410) Background Material 45 / 85

  • 1.4 Some asymptotic theory III

    1.4.2 Consistency

    P1 The plim of a sum is the sum of the plims. That is,

    plim(x1n + x2n) = plim(x1n) + plim(x2n) = x1 + x2.

    P2 The plim of a product is the product of the plims. That is,

    plim(x1nx2n) = plim(x1n) plim(x2n) = x1x2.

    P3 The plim of the inverse is the inverse of the plim. That is,

    plim(x1n^−1) = [plim(x1n)]^−1 = 1/x1, x1 ≠ 0.

    P4 The plim of a ratio is the ratio of the plims. That is,

    plim(x1n/x2n) = plim(x1n)/plim(x2n) = x1/x2, x2 ≠ 0.

    (ETC3410) Background Material 46 / 85

  • 1.4 Some asymptotic theory IV

    1.4.2 Consistency

    Although P1, P2, P3 and P4 above have been stated for scalar random variables, they can be generalized to random vectors and random matrices. (That is, vectors and matrices whose elements are random variables).

    (ETC3410) Background Material 47 / 85

  • 1.4 Some asymptotic theory I

    1.4.3 Asymptotic normality

    Let the scalar bn denote an estimator of the unknown parameter β, given a sample of size n. The estimator bn is a random variable and, like any random variable, has a probability distribution. The form of this distribution may depend on n. That is, as n increases the form of the probability distribution of bn may change.

    Using a body of mathematics known as central limit theorems, many random variables whose probability distribution based on a finite sample (finite sample distribution) is unknown can be shown to have a well defined probability distribution as the sample size tends to infinity.

    When this is the case, the random variable in question is said to "converge in distribution" and the probability distribution to which it converges is called a limiting (or limit) distribution.

    (ETC3410) Background Material 48 / 85

  • 1.4 Some asymptotic theory II

    1.4.3 Asymptotic normality

    When bn is a consistent estimator,

    plim(bn) = β,

    which means that bn collapses to the single point β as n goes to infinity, in which case the limiting distribution of bn is degenerate.

    In order to obtain a non-degenerate limiting distribution for a consistent estimator we "normalize" bn as described below.

    Formally, we say that bn has a limiting normal distribution if

    √n(bn − β) →d N(0, V), (1.31)

    where N(0, V) denotes a normally distributed random variable with mean zero and some unknown variance V, and the notation →d denotes convergence in distribution as n tends to infinity.

    (ETC3410) Background Material 49 / 85

  • 1.4 Some asymptotic theory III

    1.4.3 Asymptotic normality

    Although we refer to the estimator bn as having a limiting normal distribution, it is clear from (1.31) that it is actually the random variable

    √n(bn − β)

    that converges to a normal random variable as n goes to infinity.

    Equation (1.31) is an exact result, not an approximation. It states that

    √n(bn − β) ∼ N(0, V) (1.32)

    is strictly true as n tends to infinity.

    However, assume that

    √n(bn − β) ≈ N(0, V)

    for large, but finite, n (where the symbol ≈ denotes "is approximately").

    (ETC3410) Background Material 50 / 85
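    The approximation (1.32) in large finite samples can be illustrated with the sample mean of deliberately non-normal data. The distribution and sample sizes below are invented for the sketch: for Exponential(1) draws the mean is β = 1 and the variance is V = 1, and √n(bn − β) behaves approximately like N(0, 1) even though the underlying data are heavily skewed:

```python
import numpy as np

rng = np.random.default_rng(9)

# bn = sample mean of n Exponential(1) draws; beta = 1, V = 1
beta, n, reps = 1.0, 1_000, 5_000
draws = rng.exponential(scale=1.0, size=(reps, n))
z = np.sqrt(n) * (draws.mean(axis=1) - beta)

# If the normal approximation holds, z has mean ~0, variance ~V = 1,
# and about 95% of its values fall inside (-1.96, 1.96)
z_mean = z.mean()
z_var = z.var()
share_within_196 = np.mean(np.abs(z) < 1.96)
```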

  • 1.4 Some asymptotic theory IV

    1.4.3 Asymptotic normality

    Recall that if x is a random variable and c and d are constants, then

    E(c + dx) = c + dE(x)

    Var(c + dx) = d²Var(x).

    Using these results it follows that if

    √n(bn − β) ≈ N(0, V)

    then

    bn − β ≈ (1/√n) N(0, V),

    bn − β ≈ N(0, V/n),

    (ETC3410) Background Material 51 / 85

  • 1.4 Some asymptotic theory V

    1.4.3 Asymptotic normality

    bn ≈ β + N(0, V/n),

    bn ≈ N(β, V/n). (1.33)

    Equation (1.33) states that in a large finite sample bn is approximately normally distributed with mean β and variance V/n.

    It is conventional to rewrite (1.33) as

    bn ∼asy N(β, V/n). (1.34)

    Equation (1.34) is referred to as the asymptotic distribution of bn, and V/n is referred to as the asymptotic variance of bn.

    (ETC3410) Background Material 52 / 85

  • 1.4 Some asymptotic theory VI

    1.4.3 Asymptotic normality

    In summary, whenever

    √n(bn − β) →d N(0, V), (1.31)

    we say that bn is asymptotically normally distributed with asymptotic distribution

    bn ∼asy N(β, V/n). (1.34)

    In econometrics we use the asymptotic distribution of bn as an approximation to the true distribution of bn in a finite sample (i.e. we use the asymptotic distribution of bn as an approximation to its finite sample distribution).


  • 1.4 Some asymptotic theory VII: 1.4.3 Asymptotic normality

    Notice that the asymptotic distribution (1.34) is derived from the limiting distribution (1.31) by assuming that the latter is approximately true in large finite samples.

    Obviously, the larger the sample size, the more likely it is that the asymptotic distribution is a good approximation to the true finite-sample distribution of θ̂n.
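    The quality of this approximation can be checked directly by simulation. A minimal Monte Carlo sketch (my own illustration, not from the notes): the sample mean of exponential(1) draws plays the role of θ̂n, so that θ = 1 and V = Var(xi) = 1, and θ̂n should be approximately N(θ, V/n).

    ```python
    import numpy as np

    # Sample mean of exponential(1) data as the estimator theta_hat_n:
    # theta = 1 and V = Var(x) = 1, so theta_hat_n ~approx N(theta, V/n).
    rng = np.random.default_rng(0)
    n, reps = 500, 20_000
    theta = 1.0

    # One estimate per replication: the mean of n exponential draws.
    estimates = rng.exponential(theta, size=(reps, n)).mean(axis=1)

    # The finite-sample mean and variance should be close to theta and V/n.
    print(estimates.mean())   # close to theta = 1.0
    print(estimates.var())    # close to V/n = 1/500 = 0.002
    ```

    Increasing n shrinks the variance toward zero at rate 1/n, which is the consistency result of section 1.4.2 seen through the asymptotic-normality lens.
    
    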

    Note:

    Most estimators used in econometrics satisfy

    √n(θ̂n − θ) →d N(0, V).    (1.31)

    The results stated in (1.31) and (1.34) generalize to the case in which θ̂n is a k×1 vector rather than a scalar, as assumed above. In the case in which θ̂n is a k×1 vector, θ is also a k×1 vector and V/n is a k×k variance matrix.


  • 1.4 Some asymptotic theory VIII: 1.4.3 Asymptotic normality

    Knowledge of the asymptotic distribution of θ̂n is useful for two principal reasons:

    (i) It can be used to construct confidence intervals for our estimates.

    (ii) It can be used to construct (asymptotically valid) hypothesis tests, as we will see in section 1.5 below.
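    As a sketch of the first use (my own illustration; the data and all names are hypothetical), a 95% confidence interval built from θ̂n ∼asy N(θ, V/n):

    ```python
    import numpy as np

    # Hypothetical example: theta is a population mean, estimated by the
    # sample mean; V is estimated by the sample variance.
    rng = np.random.default_rng(1)
    x = rng.normal(loc=2.0, scale=3.0, size=400)

    theta_hat = x.mean()
    avar_hat = x.var(ddof=1) / x.size   # estimated asymptotic variance V/n
    z_975 = 1.96                        # 97.5th percentile of N(0, 1)

    ci = (theta_hat - z_975 * np.sqrt(avar_hat),
          theta_hat + z_975 * np.sqrt(avar_hat))
    print(ci)  # an interval that covers theta = 2.0 about 95% of the time
    ```
    
    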


  • 1.4 Some asymptotic theory I: 1.4.4 Asymptotic efficiency

    The estimator θ̂n is asymptotically efficient if:

    (i) θ̂n is a consistent estimator of θ.

    (ii) The asymptotic variance of θ̂n is at least as small as that of any other consistent estimator. That is,

    Avar(θ̂n) ≤ Avar(θ̃n),

    where θ̃n denotes any other consistent estimator of θ.

    Notice that, just as we restrict our attention to unbiased estimators when defining finite-sample efficiency, we restrict our attention to consistent estimators when defining asymptotic efficiency.


  • 1.4 Some asymptotic theory II: 1.4.4 Asymptotic efficiency

    Asymptotic variance is the criterion that we use to choose between two or more consistent estimators. The consistent estimator with the smallest asymptotic variance is generally preferred.

    In Topic 2 we will introduce the estimation method known as maximum likelihood estimation. One of the most attractive features of maximum likelihood estimation is that, provided the statistical/econometric model is correctly specified, the maximum likelihood estimator will be:

    (i) consistent
    (ii) asymptotically normally distributed
    (iii) asymptotically efficient


  • 1.5 Testing linear restrictions on the parameters of an econometric model I

    A hypothesis test that is valid in a sample of any size is called an exact test. Tests that are valid only in large samples are called asymptotic tests.

    Generally speaking, exact tests are available only in the linear regression model with normally distributed, homoskedastic, serially uncorrelated errors. Once we relax these very restrictive assumptions, we are forced to use asymptotic tests.

    Many hypotheses of economic interest can be expressed as linear restrictions on the parameters of an econometric model. For example, consider the wage equation,

    wage = α + β1educ + β2exper + β3race + β4gender + β5exper² + v.    (1.28)


  • 1.5 Testing linear restrictions on the parameters of an econometric model II

    Suppose that we wish to simultaneously test the following hypotheses:

    (i) The marginal effect of educ is equal but opposite in sign to the marginal effect of exper for someone who has one year of experience.

    (ii) The marginal effect of gender is twice that of race.

    Since

    MEeduc = β1

    MEexper = β2 + 2β5,

    the hypothesis that the marginal effect of educ is equal but opposite in sign to the marginal effect of exper implies that

    β1 = −(β2 + 2β5), or β1 + β2 + 2β5 = 0.


  • 1.5 Testing linear restrictions on the parameters of an econometric model III

    Since

    MEgender = β4

    MErace = β3,

    the hypothesis that the marginal effect of gender is twice that of race implies that

    β4 = 2β3, or β4 − 2β3 = 0.

    Notice that each of these economic hypotheses has been expressed as a restriction on the parameters of the model.


  • 1.5 Testing linear restrictions on the parameters of an econometric model IV

    The two hypotheses we wish to test impose the following two linear restrictions on the parameters of the wage equation:

    β1 + β2 + 2β5 = 0
    β4 − 2β3 = 0    (1.35)

    The restrictions in (1.35) can be written more compactly as

    R β = r,    (1.36)

    where R is (2×6), β is (6×1) and r is (2×1).


  • 1.5 Testing linear restrictions on the parameters of an econometric model V

    where

    R = [ 0  1  1   0  0  2 ]
        [ 0  0  0  −2  1  0 ],

    β = (α, β1, β2, β3, β4, β5)′ and r = (0, 0)′.

    To see this, note that Rβ = r gives

    [ 0  1  1   0  0  2 ] (α, β1, β2, β3, β4, β5)′ = (0, 0)′
    [ 0  0  0  −2  1  0 ]


  • 1.5 Testing linear restrictions on the parameters of an econometric model VI

    ⟹ [ β1 + β2 + 2β5 ] = [ 0 ]
      [ −2β3 + β4    ]   [ 0 ]

    ⟹ β1 + β2 + 2β5 = 0
      β4 − 2β3 = 0    (1.35)

    In general, q (independent) linear restrictions on the k×1 vector β can be written as

    R β = r,    (1.37)

    where R is (q×k), β is (k×1) and r is (q×1).

    The precise definitions of R and r depend on the particular restrictions being tested.
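    For the wage-equation example, the pair (R, r) can be written down and checked numerically. A small sketch (my own illustration; the coefficient ordering is (α, β1, ..., β5), and the numeric β below is made up purely to satisfy the restrictions):

    ```python
    import numpy as np

    # Restriction matrices for the wage-equation example:
    # row 1 encodes b1 + b2 + 2*b5 = 0, row 2 encodes -2*b3 + b4 = 0.
    # Parameter ordering: (intercept, b1, b2, b3, b4, b5).
    R = np.array([[0, 1, 1,  0, 0, 2],
                  [0, 0, 0, -2, 1, 0]], dtype=float)
    r = np.zeros(2)

    # A made-up parameter vector satisfying b1 = -(b2 + 2*b5) and b4 = 2*b3,
    # so R @ beta - r should be exactly zero.
    beta = np.array([0.7, -0.9, 0.5, 0.3, 0.6, 0.2])
    print(R @ beta - r)   # [0. 0.]
    ```
    
    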


  • 1.5 Testing linear restrictions on the parameters of an econometric model VII

    The advantage of expressing our restrictions in the form of (1.37) is that it enables us to represent a set of linear restrictions on β without specifying exactly what the restrictions are, and to derive results that will hold for any set of linear restrictions on β.

    Under the null hypothesis that the restrictions in (1.37) are correct,

    Rβ − r = 0.    (1.38)

    However, since β is unknown, how do we determine whether or not (1.38) holds?

    An obvious approach is to consider whether or not

    Rβ̂ − r = 0,

    where β̂ is our estimator of β.


  • 1.5 Testing linear restrictions on the parameters of an econometric model VIII

    However, β̂ is a random variable, the value of which varies from sample to sample. Therefore, the question we need to consider is whether or not Rβ̂ − r is statistically significantly different from zero.

    To determine whether or not Rβ̂ − r is statistically significantly different from zero we need to know the probability distribution of Rβ̂ − r. We next show that the asymptotic distribution of Rβ̂ − r can be derived from our knowledge of the asymptotic distribution of β̂.

    Assume that

    β̂ ∼asy N(β, V/n).    (1.39)

    Then,

    Rβ̂ ∼asy R N(β, V/n)


  • 1.5 Testing linear restrictions on the parameters of an econometric model IX

    Rβ̂ ∼asy N(Rβ, RVR′/n),

    Rβ̂ − r ∼asy N(Rβ, RVR′/n) − r

    Rβ̂ − r ∼asy N(Rβ − r, RVR′/n).    (1.40)

    In going from the second line to the third line of the derivation we used the result that

    Var(Rβ̂) = R Var(β̂) R′ = RVR′/n.


  • 1.5 Testing linear restrictions on the parameters of an econometric model X

    Equation (1.40) implies that under the null hypothesis that

    Rβ − r = 0,

    Rβ̂ − r ∼asy N(0, RVR′/n).    (1.41)

    In principle, we could use (1.41) as our test statistic. However, if we did so, the critical value for our test would depend on R, and there would be a different critical value for each possible choice of R.

    We can eliminate the dependence on R of the critical value for our test statistic by transforming our test statistic from a normal random variable into a chi-square random variable. The transformation is achieved by appealing to the following well-known theorem in mathematical statistics.


  • 1.5 Testing linear restrictions on the parameters of an econometric model XI

    Theorem 1. Let Z be a k×1 random vector. If

    Z ∼asy N(0, Σ),

    then

    Z′ Σ⁻¹ Z ∼asy χ²(q),

    where q is the rank of the matrix Σ.


  • 1.5 Testing linear restrictions on the parameters of an econometric model XII

    Applying Theorem 1 to

    Rβ̂ − r ∼asy N(0, RVR′/n),    (1.41)

    with Rβ̂ − r playing the role of Z, we conclude that, under the null hypothesis

    Rβ − r = 0,    (1.42)

    (Rβ̂ − r)′ (RVR′/n)⁻¹ (Rβ̂ − r) ∼asy χ²(q),    (1.43)

    where q is the number of restrictions imposed under the null hypothesis.


  • 1.5 Testing linear restrictions on the parameters of an econometric model XIII

    The statistic on the left-hand side of (1.43) is not feasible, since it depends on the unknown matrix V. A feasible test statistic for testing (1.42) is given by

    W = (Rβ̂ − r)′ (RV̂R′/n)⁻¹ (Rβ̂ − r) ∼asy χ²(q),    (1.44)

    where V̂ is a consistent estimator of V, i.e.

    plim(V̂) = V.

    As long as V̂ is a consistent estimator of V, the left-hand sides of (1.43) and (1.44) are asymptotically equivalent.
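    A minimal numpy sketch of the feasible Wald statistic (1.44); β̂, V̂, and the restriction tested are made-up illustrative values, not estimates from any real model:

    ```python
    import numpy as np
    from scipy.stats import chi2

    # Made-up estimates: a 2x1 beta_hat and a consistent estimate V_hat of V.
    n = 200
    beta_hat = np.array([0.5, 1.2])
    V_hat = np.array([[2.0, 0.3],
                      [0.3, 1.5]])

    # Test the single restriction beta_2 = 1, i.e. R beta = r with:
    R = np.array([[0.0, 1.0]])
    r = np.array([1.0])

    # W = (R b - r)' (R V_hat R'/n)^{-1} (R b - r), asymptotically chi2(q).
    diff = R @ beta_hat - r
    W = diff @ np.linalg.inv(R @ V_hat @ R.T / n) @ diff
    p_value = chi2.sf(W, df=R.shape[0])   # q = number of restrictions

    print(W, p_value)
    ```

    With these illustrative numbers W ≈ 5.33, so the (made-up) restriction would be rejected at the 5% level.
    
    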

    Note:


  • 1.5 Testing linear restrictions on the parameters of an econometric model XIV

    A hypothesis test based on (1.44) is called a Wald test (because it was first proposed by Abraham Wald in 1943).

    A Wald test is the most common form of hypothesis test used in econometrics because, unlike other tests, a Wald test can be conducted no matter what estimation method is used to estimate the regression equation.

    Since only the asymptotic distribution of W is known, the Wald test is an asymptotic test, and may be unreliable in small samples.

    The Wald test statistic in (1.44) is sometimes written as

    W = n (Rβ̂ − r)′ (RV̂R′)⁻¹ (Rβ̂ − r).    (1.45)


  • 1.5 Testing linear restrictions on the parameters of an econometric model XV

    The derivation of (1.44) depends crucially on the result that

    β̂ ∼asy N(β, V/n),    (1.39)

    and illustrates how knowledge of the asymptotic distribution of an estimator can be used to construct an asymptotically valid test statistic.

    Testing at the 5% significance level, we reject the null hypothesis that

    Rβ − r = 0    (1.42)

    if

    Wcalc > χ²0.95(q),

    where Wcalc denotes the sample value of the test statistic, and χ²0.95(q) denotes the 95th percentile of the chi-square distribution with q degrees of freedom.


  • 1.5 Testing linear restrictions on the parameters of an econometric model XVI

    By Theorem 1 above,

    q = rank(RVR′/n),

    which in turn can be shown to equal the number of restrictions imposed under the null hypothesis. Therefore q in

    W = (Rβ̂ − r)′ (RV̂R′/n)⁻¹ (Rβ̂ − r) ∼asy χ²(q)    (1.44)

    is always equal to the number of restrictions imposed under the null hypothesis that

    Rβ − r = 0.    (1.42)


  • 1.5 Testing linear restrictions on the parameters of an econometric model XVII

    Equivalently, we reject (1.42) if

    p-value < 0.05,

    where

    p-value = prob[χ²(q) > Wcalc].

    It can be shown that

    F ≡ W/q ∼asy F(q, n − k),    (1.46)

    where n is the sample size and k denotes the number of regressors in the model (including the constant).


  • 1.5 Testing linear restrictions on the parameters of an econometric model XVIII

    Consequently, one can also implement the Wald test as an asymptotic F-test. In this case we reject the null hypothesis if

    Fcalc ≡ Wcalc/q > F0.95(q, n − k),

    where F0.95(q, n − k) denotes the 95th percentile of an F distribution with q degrees of freedom in the numerator and n − k degrees of freedom in the denominator.
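    The F form of the rejection rule can be sketched in the same way (W, q, n and k below are illustrative numbers only):

    ```python
    from scipy.stats import f as f_dist

    # Illustrative values: Wald statistic W for q = 2 restrictions in a
    # model with k = 5 regressors estimated on n = 120 observations.
    W, q, n, k = 7.2, 2, 120, 5

    F = W / q                                       # asymptotic F statistic
    critical = f_dist.ppf(0.95, dfn=q, dfd=n - k)   # F_0.95(q, n - k)
    print(F, F > critical)   # reject at the 5% level if True
    ```
    
    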

    Note:

    Tests based on (1.44) and (1.46) are asymptotically equivalent. However, they produce different p-values in finite samples.

    Some software packages report results based on (1.44), some report results based on (1.46), and some report both.


  • 1.5 Testing linear restrictions on the parameters of an econometric model XIX

    Some researchers believe that F(q, n − k) is a better approximation to the finite-sample distribution of W/q than χ²(q) is to the finite-sample distribution of W. Consequently, they use (1.46) in the hope that it will produce more reliable results in a finite sample.

    In the special case of testing

    H0: βk = 0

    in the linear regression equation

    y = β1 + β2x2 + ... + βkxk + u

    (i.e. testing the individual significance of xk), it can be shown that the test statistic

    W = (Rβ̂ − r)′ (RV̂R′/n)⁻¹ (Rβ̂ − r) ∼asy χ²(q)    (1.44)


  • 1.5 Testing linear restrictions on the parameters of an econometric model XX

    reduces to

    √W = β̂k / se(β̂k) ∼asy N(0, 1).    (1.47)
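    In this single-restriction case the Wald statistic is just the square of the familiar z-ratio. A tiny sketch with illustrative numbers:

    ```python
    # With R selecting the k-th coefficient and r = 0, the quadratic form in
    # (1.44) collapses to (beta_k_hat / se_k)**2, the square of (1.47).
    beta_k_hat, se_k = 0.8, 0.25     # illustrative values only
    z = beta_k_hat / se_k            # ~asy N(0, 1) under H0
    W = z ** 2                       # ~asy chi2(1) under H0
    print(z, W)
    ```
    
    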

    If the model is estimated by maximum likelihood, the null hypothesis

    Rβ − r = 0    (1.42)

    can also be tested by performing a likelihood ratio (LR) test. It can be shown that under (1.42) the test statistic

    LR ≡ 2(lu − lr) ∼asy χ²(q),    (1.48)

    where lu and lr respectively denote the maximized values of the log-likelihood function of the unrestricted and restricted models, and q again denotes the number of restrictions imposed under the null. The LR test will be discussed in more detail in Topic 2.
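    The LR statistic (1.48) can be illustrated in a model simple enough to maximize the likelihood by hand: normal data with known variance 1, where the unrestricted MLE of the mean is the sample mean and the restricted model imposes a zero mean. A hedged sketch (my own example, not from the notes):

    ```python
    import numpy as np
    from scipy.stats import chi2, norm

    # Simulated N(0, 1) data, so the restriction mean = 0 is actually true.
    rng = np.random.default_rng(2)
    x = rng.normal(loc=0.0, scale=1.0, size=100)

    # Unrestricted max log-likelihood: mean estimated by x-bar.
    lu = norm.logpdf(x, loc=x.mean(), scale=1.0).sum()
    # Restricted log-likelihood: mean fixed at 0.
    lr = norm.logpdf(x, loc=0.0, scale=1.0).sum()

    LR = 2 * (lu - lr)             # ~asy chi2(1) under H0; here LR = n*xbar^2
    p_value = chi2.sf(LR, df=1)
    print(LR, p_value)
    ```

    In this model the algebra gives LR = n·x̄² exactly, which the code reproduces.
    
    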


  • 1.6 A review of generalized least squares (GLS) I

    Consider the linear regression model

    yi = β0 + β1xi1 + β2xi2 + ... + βkxik + ui,    i = 1, ..., n,    (1.49)

    or in matrix notation

    y = Xβ + u.    (1.50)

    In introductory econometrics units it is often assumed that

    var(ui | xi) = σ²,    i = 1, ..., n    (1.51)

    and

    cov(ui, uj | xi, xj) = 0,    i ≠ j,    (1.52)

    where

    xi = (xi1, xi2, ..., xik).

    Equations (1.51) and (1.52) respectively state that the errors in (1.50) are conditionally homoskedastic and conditionally serially uncorrelated.


  • 1.6 A review of generalized least squares (GLS) II

    When (1.51) and (1.52) hold,

    Var(u|X) = [ σ²  0  ⋯  0 ]
      (n×n)    [ 0  σ²  ⋯  0 ]
               [ ⋮   ⋮   ⋱  ⋮ ]
               [ 0   0  ⋯  σ² ]

             = σ² [ 1  0  ⋯  0 ]
                  [ 0  1  ⋯  0 ]
                  [ ⋮  ⋮  ⋱  ⋮ ]
                  [ 0  0  ⋯  1 ]

             = σ² In.

    When

    Var(u|X) = σ² In,    (1.53)

    the errors in (1.50) are said to be "spherical", and when (1.53) is violated they are said to be "non-spherical".

    Notice that when the errors are spherical, the error covariance matrix is a scalar identity matrix, that is, an identity matrix multiplied by a scalar σ².

    Assumption (1.51) is usually unrealistic for cross-section data, and assumption (1.52) is usually unrealistic for time-series data.


  • 1.6 A review of generalized least squares (GLS) III

    Denote the conditional error variance matrix for non-spherical errors by

    Var(u|X) = Ω ≠ σ² In,    (1.54)

    where the precise form of Ω depends on the nature of the departure from sphericity. For example, in the case of conditionally uncorrelated, heteroskedastic errors

    Ω = [ σ1²  0   ⋯  0  ]
        [ 0   σ2²  ⋯  0  ]
        [ ⋮    ⋮   ⋱  ⋮  ]
        [ 0    0   ⋯  σn² ].


  • 1.6 A review of generalized least squares (GLS) IV

    It is well known that when (1.54) holds the OLS estimator of β in

    y = Xβ + u    (1.50)

    is inefficient. In this case an efficient estimator can be obtained by executing the following steps:

    S1 Multiply both sides of

    y = Xβ + u    (1.50)

    by Ω⁻¹/² and obtain

    Ω⁻¹/² y = Ω⁻¹/² Xβ + Ω⁻¹/² u,

    or

    y* = X*β + u*,    (1.55)

    where

    y* ≡ Ω⁻¹/² y,    X* ≡ Ω⁻¹/² X,    u* ≡ Ω⁻¹/² u.


  • 1.6 A review of generalized least squares (GLS) V

    Notice that

    Var(u*|X) = Var(Ω⁻¹/² u | X)
              = Ω⁻¹/² Var(u|X) Ω⁻¹/²
              = Ω⁻¹/² Ω Ω⁻¹/²    (using (1.54))
              = Ω⁻¹/² (Ω¹/² Ω¹/²) Ω⁻¹/²
              = (Ω⁻¹/² Ω¹/²)(Ω¹/² Ω⁻¹/²)
              = In.    (1.56)

    Therefore, the errors in (1.55) are spherical, and by the Gauss-Markov theorem β in (1.55) can be efficiently estimated by OLS.


  • 1.6 A review of generalized least squares (GLS) VI

    S2 Applying the usual OLS formula to

    y* = X*β + u*    (1.55)

    we obtain

    β̂ = (X*′X*)⁻¹ X*′y*

      = [(Ω⁻¹/²X)′(Ω⁻¹/²X)]⁻¹ (Ω⁻¹/²X)′ (Ω⁻¹/²y)

      = [X′Ω⁻¹/²Ω⁻¹/²X]⁻¹ X′Ω⁻¹/²Ω⁻¹/²y

      = [X′Ω⁻¹X]⁻¹ X′Ω⁻¹y.    (1.57)

    The estimator in (1.57) is called the generalized least squares (GLS) estimator of β in the regression equation

    y = Xβ + u,    (1.50)


  • 1.6 A review of generalized least squares (GLS) VII

    and is denoted by

    β̂GLS = [X′Ω⁻¹X]⁻¹ X′Ω⁻¹y.    (1.58)

    In summary, the OLS estimator of β in

    y = Xβ + u    (1.50)

    is

    β̂OLS = (X′X)⁻¹ X′y,

    and the GLS estimator of β is

    β̂GLS = [X′Ω⁻¹X]⁻¹ X′Ω⁻¹y.    (1.58)
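    A numerical check (my own sketch, with simulated heteroskedastic data) that the GLS formula (1.58) coincides with OLS applied to the Ω⁻¹/²-transformed regression (1.55):

    ```python
    import numpy as np

    # Simulate a linear model with diagonal (heteroskedastic) Omega.
    rng = np.random.default_rng(3)
    n, k = 50, 3
    X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
    sigma2 = rng.uniform(0.5, 2.0, size=n)          # diagonal of Omega
    y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n) * np.sqrt(sigma2)

    # GLS via (1.58): solve (X' Omega^{-1} X) b = X' Omega^{-1} y.
    Omega_inv = np.diag(1.0 / sigma2)
    beta_gls = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)

    # Transformed regression (1.55): premultiply by Omega^{-1/2}, then OLS.
    P = np.diag(1.0 / np.sqrt(sigma2))              # Omega^{-1/2} here
    beta_ols_star, *_ = np.linalg.lstsq(P @ X, P @ y, rcond=None)

    print(np.allclose(beta_gls, beta_ols_star))   # True
    ```

    For a diagonal Ω this is just weighted least squares: each observation is divided by its error standard deviation before running OLS.
    
    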


  • 1.6 A review of generalized least squares (GLS) VIII

    Notice that β̂GLS can be obtained from the formula for the OLS estimator,

    β̂OLS = (X′X)⁻¹ X′y,

    by inserting Ω⁻¹ between X′ and X, and between X′ and y, where

    Ω = Var(u|X).

    β̂GLS is not a feasible estimator, since it depends on the unknown matrix Ω⁻¹. A feasible GLS (FGLS) estimator is given by

    β̂FGLS = [X′Ω̂⁻¹X]⁻¹ X′Ω̂⁻¹y,    (1.59)

    where

    plim Ω̂ = Ω.

    That is, Ω̂ is a consistent estimator of Ω.

    Many of the estimators that we will discuss in this unit are FGLS estimators.
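    A hedged FGLS sketch (the skedastic function σi² = exp(γ0 + γ1·xi) and all names below are my own assumptions, chosen only to make Ω̂ estimable): fit OLS, model the log squared residuals to estimate Ω, then plug Ω̂ into (1.59):

    ```python
    import numpy as np

    # Simulate y = 1.0 + 0.5*x + u with Var(u_i) = exp(0.2 + 0.8*x_i).
    rng = np.random.default_rng(4)
    n = 400
    x = rng.uniform(0, 2, size=n)
    X = np.column_stack([np.ones(n), x])
    sigma = np.sqrt(np.exp(0.2 + 0.8 * x))
    y = X @ np.array([1.0, 0.5]) + rng.normal(size=n) * sigma

    # Step 1: OLS, then model log squared residuals to estimate sigma_i^2
    # (assumed skedastic form: log sigma_i^2 linear in x_i).
    b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ b_ols
    g, *_ = np.linalg.lstsq(X, np.log(u ** 2), rcond=None)
    sigma2_hat = np.exp(X @ g)

    # Step 2: FGLS = the GLS formula (1.59) with the estimated diagonal Omega.
    W = np.diag(1.0 / sigma2_hat)
    b_fgls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    print(b_fgls)   # should be close to the true (1.0, 0.5)
    ```

    Because Ω̂ is consistent for Ω under the assumed skedastic form, β̂FGLS shares the large-sample efficiency of the infeasible GLS estimator.
    
    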
