
Lecture 2: Simple Linear Regression Model
QF06-2015, Sunil Paul

    Recap

In the last lecture we discussed that a joint distribution of any two variables can be factored in two ways:

$f(X,Y) = f(Y|X)\,f(X)$ and $f(X,Y) = f(X|Y)\,f(Y)$. In particular,

1. If $f(X,Y)$ is bivariate normal (bvn), then $f(Y|X)$ and $f(X|Y)$ are normally distributed,

2. If $f(X,Y)$ is bivariate normal (bvn), then $E(Y|X)$ and $E(X|Y)$ are linear in X and Y, and

3. $\mathrm{var}(Y|X)$ and $\mathrm{var}(X|Y)$ are constant.

We can summarize these results as follows:

$(Y|X) \sim N(\beta_1 + \beta_2 X,\ \sigma^2_{Y|X})$ and $(X|Y) \sim N(\alpha_1 + \alpha_2 Y,\ \sigma^2_{X|Y})$,

where $\beta_1 = \mu_Y - \beta_2\mu_X$, $\beta_2 = \sigma_{XY}/\sigma^2_X$ (or $\rho\,\sigma_Y/\sigma_X$), and $\rho = \sigma_{XY}/(\sigma_X\sigma_Y)$.

Similarly, $\sigma^2_{Y|X} = \sigma^2_Y - \beta_2^2\,\sigma^2_X = (1-\rho^2)\,\sigma^2_Y$.
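As a quick numerical illustration (not from the lecture itself), the Python sketch below simulates draws from a bivariate normal and checks the two results above on a thin slice of X; the means, covariance matrix and seed are all assumed for illustration.

import numpy as np

# Illustrative bvn parameters (assumed): mu = (mu_X, mu_Y); cov[0,1] = sigma_XY.
rng = np.random.default_rng(0)
mu = np.array([2.0, 1.0])
cov = np.array([[1.0, 0.6],
                [0.6, 2.0]])
X, Y = rng.multivariate_normal(mu, cov, size=200_000).T

beta2 = cov[0, 1] / cov[0, 0]                     # sigma_XY / sigma_X^2
beta1 = mu[1] - beta2 * mu[0]                     # mu_Y - beta2 * mu_X
rho = cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])

# Condition on X close to 1 and compare sample moments with the formulas.
s = (X > 0.95) & (X < 1.05)
print(Y[s].mean(), beta1 + beta2 * 1.0)           # E(Y|X=1): both ~ 0.4
print(Y[s].var(), (1 - rho**2) * cov[1, 1])       # var(Y|X): both ~ 1.64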

Regression as a conditional expectation function

Consider a model of consumption (Y) as a function of income (X). Since the model includes two variables, the data-generating process of these variables for a population of N units may be governed by some complex bivariate distribution. Concentrating on the conditional distribution, economic theory would suggest that average consumption (Y) can be expressed as a function of income (X) using conditional expectations as follows:

$E(Y|X) = g(X)$.

Theoretically the conditional expectation function $g(\cdot)$ is expected to be an increasing function of X (as income increases, consumption increases). Now assume that the conditional expectation function is linear, as in the case of the bvn; then

$E(Y|X) = \alpha + \beta X$.

This conditional expectation function is known as the population regression function. The specification given above is an exact or deterministic relation between Y and X. Similarly, the average consumption of the $i$-th household for a given income is $E(Y|X_i) = \alpha + \beta X_i$. However, the actual consumption of the $i$-th household need not be equal to $E(Y|X_i)$. We denote the discrepancy between the actual $Y_i$ and the expected value by $u_i$:

$u_i = Y_i - E(Y|X_i)$.

The discrepancy can occur for many reasons.

Possible sources of the error term are:

Randomness in human behaviour,

Effects of omitted variables,

Measurement errors.

With $u_i$, the deterministic relation specified earlier becomes a stochastic one:

$Y_i = E(Y|X_i) + u_i$.

With linearity of the CEF we have

$Y_i = \alpha + \beta X_i + u_i$,

where $\alpha + \beta X_i$ is the deterministic component, $u_i$ is the stochastic or random component (the error term) with a known distribution function, and $\alpha$ and $\beta$ are known as the regression coefficients or population parameters.
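A minimal data-generating sketch (with assumed parameter values, not from the lecture) that draws a sample from this stochastic consumption model; later sketches in these notes reuse the X and Y arrays it creates.

import numpy as np

rng = np.random.default_rng(1)
alpha, beta, sigma = 20.0, 0.75, 5.0    # assumed population parameters
n = 100
X = rng.uniform(50, 250, size=n)        # income, treated as fixed in repeated samples
u = rng.normal(0.0, sigma, size=n)      # stochastic component u_i
Y = alpha + beta * X + u                # Y_i = E(Y|X_i) + u_i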


Estimation, estimators and estimates

We may not know the population distribution, and we do not have the luxury of population data. We need to estimate the parameters of the CEF from sample data. The task of estimation is to determine a rule, method or function that specifies the unknown parameters in terms of sample observations of random variables or (in some cases) fixed quantities. In the bivariate case Y and X are the variables (here X is assumed to be fixed in repeated samples).

There are different methods to estimate the regression coefficients. They are:

    Method of Moments

    Method of Least Squares

Method of Maximum Likelihood, and so on.

These methods, rules or formulas convert sample observations into actual estimates of the population regression coefficients ($\alpha$ and $\beta$) and are known as estimators. The estimators are often denoted by $\hat\alpha$, $\hat\beta$ and $\hat\sigma^2$. The numerical values obtained using these estimators are called estimates. As far as simple regression is concerned, these estimators give identical estimates. We will start our discussion with the method of least squares.

    Method of Least Squares (OLS)

OLS suggests a criterion for choosing the best-fitting line given the observations on the dependent and independent variables. The principle of least squares states that the parameters of a regression model are to be estimated in such a way that the sum of the squared deviations between the actual and the estimated values assumes a minimum value.

This can be stated as follows:

$\min_{\hat\alpha,\hat\beta}\ \mathrm{RSS} = \sum_{i=1}^{n} \hat u_i^2 = \sum_{i=1}^{n} (Y_i - \hat\alpha - \hat\beta X_i)^2$,

where RSS is the residual sum of squares, and $\hat\alpha$, $\hat\beta$ and $\hat u_i$ are estimated values of their counterparts. Each pair of values of $\hat\alpha$ and $\hat\beta$ will give a different set of values for $\hat u_i$ and $\hat Y_i = \hat\alpha + \hat\beta X_i$. We can get the expressions for $\hat\alpha$ and $\hat\beta$ as follows:

$\frac{\partial\,\mathrm{RSS}}{\partial \hat\alpha} = 0 \implies \sum 2(Y - \hat\alpha - \hat\beta X)(-1) = -2\sum \hat u = 0$,

i.e. $\sum Y = n\hat\alpha + \hat\beta \sum X$ ... (1).

Similarly,

$\frac{\partial\,\mathrm{RSS}}{\partial \hat\beta} = 0 \implies \sum 2(Y - \hat\alpha - \hat\beta X)(-X_i) = -2\sum X\hat u = 0$,

i.e. $\sum XY = \hat\alpha \sum X + \hat\beta \sum X^2$ ... (2).

Equations (1) and (2) are called the normal equations.
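Because (1) and (2) are linear in $\hat\alpha$ and $\hat\beta$, they can also be solved directly as a 2-by-2 system. A sketch (the function name is hypothetical; X and Y are the simulated arrays from the earlier sketch):

import numpy as np

def solve_normal_equations(X, Y):
    n = len(X)
    # Eq. (1): n*alpha       + (sum X)*beta   = sum Y
    # Eq. (2): (sum X)*alpha + (sum X^2)*beta = sum XY
    A = np.array([[n, X.sum()],
                  [X.sum(), (X ** 2).sum()]])
    b = np.array([Y.sum(), (X * Y).sum()])
    alpha_hat, beta_hat = np.linalg.solve(A, b)
    return alpha_hat, beta_hat

This gives the same answer as the closed-form expressions derived next.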

From eq. (1) we have

$\hat\alpha = \bar Y - \hat\beta \bar X$.

Substituting into eq. (2) we get:


$\sum XY = \sum X(\bar Y - \hat\beta\bar X) + \hat\beta\sum X^2 = n\bar X(\bar Y - \hat\beta\bar X) + \hat\beta\sum X^2$ (using the fact $\sum X = n\bar X$). Thus,

$\sum XY = n\bar X\bar Y + \hat\beta\left(\sum X^2 - n\bar X^2\right)$.

Rearranging the above equation we get the expression for $\hat\beta$:

$\hat\beta = \dfrac{\sum XY - n\bar X\bar Y}{\sum X^2 - n\bar X^2} = \dfrac{\sum (X-\bar X)(Y-\bar Y)}{\sum (X-\bar X)^2} = \dfrac{\sum xy}{\sum x^2} = \dfrac{S_{xy}}{S_{xx}}$,

where $S_{xy} = \sum xy = \sum (X-\bar X)(Y-\bar Y) = \sum XY - n\bar X\bar Y$,

$S_{xx} = \sum (X-\bar X)^2 = \sum X^2 - n\bar X^2$, and

$S_{yy} = \sum (Y-\bar Y)^2$.

Similarly, $\sigma^2$ has to be estimated from the residuals $\hat u_i$:

$\hat\sigma^2 = \dfrac{\sum \hat u^2}{n-2}$.

$\hat\alpha$, $\hat\beta$ and $\hat\sigma^2$ are the OLS estimators of $\alpha$, $\beta$ and $\sigma^2$. Once we get $\hat\alpha$ and $\hat\beta$ we can estimate Y, i.e. $\hat Y$. Using $\hat Y$ we can estimate the residuals $\hat u$.
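Putting the pieces together, a minimal sketch of the closed-form OLS estimators (the helper name ols is hypothetical; the checks below reuse it):

import numpy as np

def ols(X, Y):
    Sxy = ((X - X.mean()) * (Y - Y.mean())).sum()
    Sxx = ((X - X.mean()) ** 2).sum()
    beta_hat = Sxy / Sxx                          # beta_hat = S_xy / S_xx
    alpha_hat = Y.mean() - beta_hat * X.mean()    # alpha_hat = Ybar - beta_hat*Xbar
    u_hat = Y - (alpha_hat + beta_hat * X)        # residuals u_hat = Y - Y_hat
    sigma2_hat = (u_hat ** 2).sum() / (len(X) - 2)
    return alpha_hat, beta_hat, sigma2_hat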

Numerical properties of least squares

The normal equations show that:

1. Residuals sum to zero: $\sum \hat u = 0$. This also implies that the regression line passes through the mean values of X and Y (with $\sum \hat u = 0$ we have $\sum Y = n\hat\alpha + \hat\beta\sum X$; dividing both sides by the number of sample observations $n$ we get $\bar Y = \hat\alpha + \hat\beta\bar X$).

2. Residuals and the independent variable are uncorrelated: $\mathrm{cov}(X,\hat u) = 0$ [from the second normal equation we have $\sum X\hat u = 0$, and $\sum X\hat u = \sum (x + \bar X)\hat u = \sum x\hat u + \bar X\sum \hat u = \sum x\hat u$ (since $\sum \hat u = 0$); this implies $\mathrm{cov}(X,\hat u) = 0$].

3. The sum of the estimated $\hat Y_i$'s from the sample is equal to the sum of the actual $Y_i$'s, i.e. $\sum \hat Y = \sum Y$, or $\bar{\hat Y} = \bar Y$ (this can be proved using the first property: we know $\hat u_i = Y_i - \hat Y_i$, hence $\sum \hat u = \sum Y - \sum \hat Y$; since $\sum \hat u = 0$ we have $\sum \hat Y = \sum Y$).

4. LS residuals and the estimated $\hat Y_i$'s are uncorrelated: $\sum \hat Y\hat u = 0$ (proof: substitute $\hat Y_i = \hat\alpha + \hat\beta X_i$ into $\sum \hat Y\hat u$ and use properties 1 and 2). These four properties are checked numerically in the sketch after this list.
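A numerical check of properties 1 to 4, assuming the simulated X, Y and the hypothetical ols() helper from the sketches above; each printed value should be zero up to floating-point error.

alpha_hat, beta_hat, _ = ols(X, Y)
Y_hat = alpha_hat + beta_hat * X
u_hat = Y - Y_hat

print(u_hat.sum())            # property 1: residuals sum to zero
print((X * u_hat).sum())      # property 2: residuals uncorrelated with X
print(Y_hat.sum() - Y.sum())  # property 3: sum of fitted equals sum of actual
print((Y_hat * u_hat).sum())  # property 4: residuals uncorrelated with fitted values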


Decomposition of the sum of squares

    The total variation in Y can be decomposed into variation due to regression and residuals.

Consider

$\hat u_i = Y_i - \hat\alpha - \hat\beta X_i$.

Substituting $\hat\alpha = \bar Y - \hat\beta\bar X$ we get

$\hat u_i = (Y_i - \bar Y) - \hat\beta(X_i - \bar X) = y_i - \hat\beta x_i$,

where $y_i = Y_i - \bar Y$ and $x_i = X_i - \bar X$.

Squaring both sides and summing we get

$\sum \hat u^2 = \sum (y - \hat\beta x)^2 = \sum y^2 - 2\hat\beta\sum xy + \hat\beta^2\sum x^2$.

Using $\hat\beta = \dfrac{\sum xy}{\sum x^2}$ we have

$\sum \hat u^2 = \sum y^2 - 2\dfrac{(\sum xy)^2}{\sum x^2} + \dfrac{(\sum xy)^2}{\sum x^2} = \sum y^2 - \dfrac{(\sum xy)^2}{\sum x^2}$.

Therefore

$\sum \hat u^2 = \sum y^2 - \hat\beta\sum xy$,

where $\sum \hat u^2 = \mathrm{RSS}$, $\hat\beta\sum xy = \mathrm{ESS}$ (explained sum of squares), and $\sum y^2 = \mathrm{TSS}$ (total sum of squares).

Thus TSS = ESS + RSS.

The proportion of TSS explained by the regression is denoted by $r^2_{xy}$ (the coefficient of determination). In a simple regression it is the square of the correlation coefficient:

$r^2_{xy} = \dfrac{\mathrm{ESS}}{\mathrm{TSS}} = \dfrac{\mathrm{TSS} - \mathrm{RSS}}{\mathrm{TSS}} = \dfrac{\hat\beta\sum xy}{\sum y^2}$.
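A sketch verifying the decomposition and that $r^2_{xy}$ equals the squared sample correlation coefficient, again assuming the simulated X, Y and the hypothetical ols() helper from above:

import numpy as np

alpha_hat, beta_hat, _ = ols(X, Y)
x = X - X.mean()
y = Y - Y.mean()
TSS = (y ** 2).sum()
ESS = beta_hat * (x * y).sum()   # ESS = beta_hat * S_xy
RSS = ((Y - (alpha_hat + beta_hat * X)) ** 2).sum()

print(np.isclose(TSS, ESS + RSS))                           # True: TSS = ESS + RSS
print(np.isclose(ESS / TSS, np.corrcoef(X, Y)[0, 1] ** 2))  # True: r^2 = corr(X,Y)^2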

Assumptions of the classical linear regression model (CLRM)

Since there are different estimators for obtaining an estimate of a population parameter, we need to check the characteristics of these estimators in terms of their desirable properties. The properties of the estimators depend on the way the data are generated. Depending on the validity of these assumptions, we have to decide on the estimators that should be used when estimating the population parameters. The assumptions regarding the data-generating process are known as the assumptions of the classical linear regression model (CLRM). They are:

1. Linearity: the dependent variable Y is generated as a linear function of the independent variables and an error term.

Violations of this assumption are known as specification errors, such as wrong regressors, nonlinearity and changing parameters.

2. The expected value of the error term is zero, i.e. $E(u_i) = 0$ for all $i$.

Violation of this assumption leads to a biased intercept.

3. Common variance: $\mathrm{var}(u_i) = E(u_i^2) = \sigma^2$ for all $i$.

Violation of this assumption is known as heteroskedasticity.

4. Error terms are not correlated with each other, i.e. $E(u_i u_j) = 0$ for $i \neq j$.

Violation of this assumption is known as autocorrelation. Statements 2 to 4 above can be summarized as $u_i \sim iid(0, \sigma^2)$, i.e. the $u_i$ are independently and identically distributed with mean zero and a constant variance.

5. Independence of $X_j$ and $u_i$, i.e. the explanatory variable X is nonstochastic (fixed in repeated samples) and hence not correlated with the error term.

Violations of this assumption are errors in measuring the independent variable, autoregression and simultaneous-equations bias.

