
Chapter 7: The Multiple Regression Model: Estimation

Statistics and Introduction to Econometrics

M. Angeles Carnero

Departamento de Fundamentos del Análisis Económico

Year 2014-15


Definition of the multiple regression model

The main shortcoming of the simple regression model is that it usually does not reflect a causal relationship between y and x. The reason is that the crucial assumption that the mean of the error conditional on x is zero does not usually hold, since in most applications there are other factors that affect y and are correlated with x.

Multiple regression analysis is more useful because it allows us to explicitly control for several factors affecting the dependent variable at the same time.

In the multiple regression model we consider k+1 random variables y, x1, x2, ..., xk representing a population, and we are interested in explaining y with x1, x2, ..., xk. For example, y can be the hourly wage, x1 the years of education and x2 the labour experience.


We have to establish an equation relating y and x1, x2, ..., xk; the simplest model assumes a linear relationship that is taken to be valid for the population of interest:

y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + u

y → dependent variable, explained variable or response variable.
x1, x2, ..., xk → independent variables, explanatory variables, control variables or regressors.
u → error term or random shock capturing the effect of other factors that affect y and are not included in x1, x2, ..., xk. No matter how many explanatory variables are included in the model, there will always be unobserved factors left in the error term.
β0 is the intercept and β1, β2, ..., βk are denoted as slope parameters. We therefore have k+1 unknown parameters that we want to estimate using a random sample on (x1, x2, ..., xk, y).


βj reflects the variation of y given a one-unit increase in xj, holding fixed x1, ..., x_{j-1}, x_{j+1}, ..., xk as well as the rest of the factors affecting y that are captured by the error term u.

The main characteristic of the multiple regression model is that it is linear in parameters; as in the simple regression model, both the dependent variable and the explanatory variables can be nonlinear transformations of other variables.

The fact that the variables are transformations of others does not change how the parameters of the model are estimated. However, as in the simple regression model, it is important to take into account whether the variables are linear or nonlinear transformations when interpreting the estimation results.

We now illustrate with an example the difference between the simple regression model seen in Chapter 6 and the multiple regression model.


Example 1
Consider the following multiple regression model that relates the individual wage to the years of education and the labour experience:

\log(wage) = \beta_0 + \beta_1 educ + \beta_2 exper + u \quad (1)

In comparison with the simple regression model relating wage to educ, equation (1) takes exper out of the error term and includes it explicitly in the model.

As we will see below, as in the simple regression model, in the multiple regression model we need to impose assumptions on the relationship between u and the independent variables educ and exper. However, since (1) includes years of experience explicitly, we are able to capture the effect of educational attainment on wage holding experience fixed.

In the simple regression analysis (which leaves exper in the error term), we had to assume that experience is not correlated with education, which is not very realistic.


The OLS estimator. Interpretation.

We now see how to estimate the parameters β0, β1, β2, ..., βk in the multiple regression model using a random sample, {(x1i, x2i, ..., xki, yi) : i = 1, 2, ..., n}, from the population. Since these data have been drawn from a population described by the regression model, for each observation i we can write

y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_k x_{ki} + u_i

where ui is the error term of observation i, which contains all the factors affecting yi that are not included in x1i, x2i, ..., xki.

As in the simple regression model, the OLS estimation method chooses as estimators of β0, β1, β2, ..., βk those values b0, b1, ..., bk that minimise the sum of squared residuals, where the residuals are defined analogously to the simple regression case.


The objective function to be minimised is

s(b_0, b_1, \dots, b_k) = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_{1i} - \dots - b_k x_{ki})^2

Also, as in the simple regression model, the estimated coefficients are obtained by computing the partial derivatives of the objective function and setting them equal to zero.

Setting the partial derivatives equal to zero, we obtain that the values β̂0, β̂1, ..., β̂k minimising the objective function are the solutions to the following system of k+1 equations in k+1 unknowns:

\sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{1i} - \dots - \hat{\beta}_k x_{ki}) = 0

\sum_{i=1}^{n} x_{1i} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{1i} - \dots - \hat{\beta}_k x_{ki}) = 0

\vdots

\sum_{i=1}^{n} x_{ki} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{1i} - \dots - \hat{\beta}_k x_{ki}) = 0 \quad (2)


These formulae are known as the first order conditions of the OLS estimators.

In this course we are not going to see how to solve this system of equations, since any statistical package computes the OLS estimates; the objective here is to learn how to interpret the results of the estimation.

Note that, for the OLS estimators to be well defined, the solution must be unique. This is assumed here, since in practice this condition usually holds if the model is well specified. The condition required to guarantee a unique solution will be established later on.

The OLS regression line or sample regression function is defined as

\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \dots + \hat{\beta}_k x_k

where β̂0 is the OLS estimate of the intercept or constant term and β̂1, ..., β̂k are the OLS estimates of the slopes.
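As an illustrative sketch in Python with simulated data (all variable names here are hypothetical), the first order conditions (2) can be solved numerically as the normal equations of a least squares problem:

```python
import numpy as np

# Simulated data: n observations and k explanatory variables
rng = np.random.default_rng(0)
n, k = 100, 3
X = rng.normal(size=(n, k))                     # x_1, ..., x_k
y = 1.0 + X @ np.array([0.5, -0.2, 0.8]) + rng.normal(size=n)

# Add a column of ones so that the intercept beta_0 is estimated as well
Xmat = np.column_stack([np.ones(n), X])

# The k+1 first order conditions (2) are the normal equations (X'X) b = X'y
beta_hat = np.linalg.solve(Xmat.T @ Xmat, Xmat.T @ y)
print(beta_hat)                                 # [beta0_hat, beta1_hat, ..., betak_hat]

# Sample regression function: fitted values
y_hat = Xmat @ beta_hat
```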


Finally, it is convenient to point out that the OLS estimators can also be obtained by the method of moments.

To do so, we have to assume, as in the simple regression, that

E(u \mid x_1, \dots, x_k) = E(u) = 0

From this condition, we can conclude that

E(u) = 0

E(x_j u) = 0, \quad j = 1, 2, \dots, k

and, substituting u = y - \beta_0 - \beta_1 x_1 - \dots - \beta_k x_k, we have

E(y - \beta_0 - \beta_1 x_1 - \dots - \beta_k x_k) = 0

E\big(x_j (y - \beta_0 - \beta_1 x_1 - \dots - \beta_k x_k)\big) = 0, \quad j = 1, 2, \dots, k \quad (3)

Since the sample counterparts of these equations coincide with the OLS first order conditions divided by n, the method of moments estimators based on (3) are the OLS estimators.


Interpretation of the results of the regression

We next discuss how to interpret the results of the estimation. We start with the case of two explanatory variables:

\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 \quad (4)

The constant term, β̂0, is the predicted value of y when x1 = 0 and x2 = 0. In many cases it does not make sense to consider both x1 and x2 equal to zero, and in those cases β̂0 is not of interest in itself. However, it is important not to forget to include β̂0 when predicting y from the regression line.


β̂1 and β̂2 are interpreted as partial effects (or marginal effects). From the regression line (4) we have that

\Delta\hat{y} = \hat{\beta}_1 \Delta x_1 + \hat{\beta}_2 \Delta x_2

so we can obtain the change in the predicted value of y given a change of Δx1 units in x1 and a change of Δx2 units in x2. Therefore, if x2 is held fixed, that is, if Δx2 = 0, we have

\Delta\hat{y} = \hat{\beta}_1 \Delta x_1

and β̂1 is the change in ŷ when x1 increases by one unit (Δx1 = 1) while holding x2 fixed. Analogously, if x1 is held fixed, that is, if Δx1 = 0, we have

\Delta\hat{y} = \hat{\beta}_2 \Delta x_2

and β̂2 is the change in ŷ when x2 increases by one unit (Δx2 = 1) while holding x1 fixed.


Example 1 (cont.)
Using the data in file WAGE1 from Wooldridge, the multiple regression model in (1) has been estimated with the following results:

\widehat{\log(wage)} = 0.217 + 0.098\, educ + 0.010\, exper

where wage is the hourly wage in dollars, educ is years of education and exper is years of experience in the labour market.

As in the simple regression, since the dependent variable is in logs, the coefficients should be interpreted in percentage terms.

The estimated parameter β̂1 indicates that, holding labour experience constant, an additional year of education predicts an increase of 9.8% (100 × 0.098) in the hourly wage.

Analogously, β̂2 indicates that, holding years of education constant, an additional year of labour experience predicts an increase of 1% (100 × 0.010) in the hourly wage.
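A sketch of how such an estimation could be reproduced in Python; the file name wage1.csv and its column names are assumptions about how the WAGE1 data might be stored locally:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Assumed CSV export of the WAGE1 data with columns wage, educ and exper
df = pd.read_csv("wage1.csv")

y = np.log(df["wage"])                       # dependent variable in logs
X = sm.add_constant(df[["educ", "exper"]])   # intercept + two regressors

results = sm.OLS(y, X).fit()
print(results.params)                        # estimates of beta_0, beta_1, beta_2
```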


When there are more than two explanatory variables, the interpretation of the results is similar. If we consider the regression line

\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \dots + \hat{\beta}_k x_k

we can obtain the change in the predicted value of y given a change of Δx1 units in x1, Δx2 units in x2, ..., Δxk units in xk:

\Delta\hat{y} = \hat{\beta}_1 \Delta x_1 + \hat{\beta}_2 \Delta x_2 + \dots + \hat{\beta}_k \Delta x_k

Therefore, if x2, ..., xk are held fixed, that is, if Δx2 = 0, ..., Δxk = 0, we have

\Delta\hat{y} = \hat{\beta}_1 \Delta x_1

and β̂1 captures the change in ŷ when x1 increases by one unit (Δx1 = 1) holding x2, ..., xk constant. The interpretation of the rest of the estimated parameters is similar.


Example 1 (cont.)
Using the data in Example 1 again, the following multiple regression model has been estimated,

\log(wage) = \beta_0 + \beta_1 educ + \beta_2 exper + \beta_3 tenure + u

where tenure is the number of years with the current firm, with the following results:

\widehat{\log(wage)} = 0.284 + 0.092\, educ + 0.0041\, exper + 0.022\, tenure

The estimated coefficient on tenure indicates that, holding years of education and years of experience fixed, an additional year of tenure with the current firm predicts an increase of 2.2% (100 × 0.022) in the hourly wage.

Note that here we are holding total labour experience fixed. If we want to analyse the effect of an additional year of tenure holding previous labour experience and years of education fixed, we need to take into account that exper would also increase by one unit. In that case, the model predicts an increase of 2.61% (100 × (0.0041 + 0.022)) in the hourly wage.


Measurement units
It is very important to take into account the units of measurement when interpreting the results of a regression.

As in the simple regression case, the estimated values of the parameters of the regression model depend on the measurement units of the dependent variable and of the explanatory variables.

If we have already estimated the parameters of the model with the variables measured in certain units, the estimates corresponding to other units can easily be computed without re-estimating the model.

The way of computing the new parameter estimates when we change the measurement units of one or more variables in the model is analogous to the simple regression case.


Example 2
Using the data in file CEOSAL1 from Wooldridge, with a sample of n = 209 chief executive officers, the following model has been estimated:

\widehat{salary} = 830.63 + 19.63\, roe + 0.016\, sales

where salary is the annual salary in thousands of dollars, roe is the average return on equity of the firm (in percentage) and sales are the sales of the firm in millions of dollars.

Given this estimated model, an increase of one percentage point in the return on equity predicts an increase in the CEO's salary of 19.63 thousand dollars (19,630 dollars).

If we change the measurement units of the dependent variable so that salary is expressed in hundreds of dollars, what would the new estimated coefficients be?


Example 2 (cont.)

Let salary100 be the salary in hundreds of dollars. Clearly the relationship between salary100 and salary is

salary100 = 10 \cdot salary

This change of units implies that we have to multiply all the estimated coefficients by 10:

\widehat{salary100} = 8306.3 + 196.3\, roe + 0.16\, sales

n = 209, \quad R^2 = 0.029

As in the simple regression, the interpretation of the regression results does not change when the measurement units change: as before, an increase of one percentage point in the return on equity predicts an increase in the CEO's salary of 196.3 hundreds of dollars (19,630 dollars).

If we now change the measurement units of one of the explanatory variables, for example, if the return is measured as a proportion, what would the new estimated coefficients be?


Example 2 (cont.)

Let roe1 be the return on equity expressed as a proportion. It is clear that the relationship between roe1 and roe is

roe1 = \frac{1}{100}\, roe

This change in units implies that we should multiply the estimated coefficient on the return on equity by 100:

\widehat{salary100} = 8306.3 + 19630\, roe1 + 0.16\, sales

n = 209, \quad R^2 = 0.029

Again, we see that the interpretation of the regression results does not change when the measurement units change: as above, an increase of one percentage point in the return on equity predicts an increase in the CEO's salary of 19630 × 0.01 = 196.3 hundreds of dollars (19,630 dollars).
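A small simulated sketch of this rescaling rule (the data below are generated artificially, not taken from CEOSAL1): multiplying the dependent variable by 10 multiplies every coefficient by 10, and dividing roe by 100 multiplies its coefficient by a further 100.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 209
roe = rng.uniform(5, 30, n)                 # return on equity, in percent
sales = rng.uniform(100, 5000, n)           # sales, in millions of dollars
salary = 800 + 20 * roe + 0.02 * sales + rng.normal(0, 200, n)

def ols(y, *cols):
    X = np.column_stack([np.ones(len(y)), *cols])
    return np.linalg.lstsq(X, y, rcond=None)[0]

b = ols(salary, roe, sales)
b_rescaled = ols(10 * salary, roe / 100, sales)   # salary100 and roe1

print(b)            # original coefficients
print(b_rescaled)   # intercept and sales coefficients x10, roe coefficient x1000
```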


Functional form
As we saw in Example 1, we can also include logarithms of the variables in the multiple regression model. The interpretation of the estimation results when some of the variables are in logs is analogous to the simple regression case.

Example 3

Using a sample of 7038 Spanish households, the following model has been estimated,

\log(gvest) = \beta_0 + \beta_1 \log(renta) + \beta_2 nad + \beta_3 nhijos + u

with the following results:

\widehat{\log(gvest)} = -1.06 + 0.49 \log(renta) + 0.042\, nad + 0.088\, nhijos

where gvest is the annual expenditure of a household on shoes and clothes (in thousands of Euros), renta is the annual household income (in thousands of Euros), nad is the number of adults in the household and nhijos is the number of children under 18 in the household.


Example 3 (cont.)

From these results we obtain that:

The estimated income elasticity of the expenditure on shoes and clothes is 0.49. That is, holding fixed the number of adults and the number of children in the household, an increase of 1% in income implies an increase of 0.49% in the expenditure on shoes and clothes.
If the number of children increases by 1, holding fixed the number of adults and income, the estimated model predicts that the expenditure on shoes and clothes increases by 8.8% (100 × 0.088 = 8.8).
Analogously, if the number of adults increases by 1, holding fixed the number of children and income, the estimated model predicts that the expenditure on shoes and clothes increases by 4.2% (100 × 0.042 = 4.2).


The multiple regression model allows us to include several functions of the same variable in the model. A case used quite often in practice is the model that includes a variable and its square:

y = \beta_0 + \beta_1 x + \beta_2 x^2 + u

What is the partial effect of x on y in this model?

Clearly β1 does not capture the partial effect: β1 would be the change in y given a one-unit increase in x holding x² constant, and obviously if we increase x by one unit we cannot hold its square constant.

In order to compute the partial effect we must compute the derivative

\frac{\partial y}{\partial x} = \beta_1 + 2\beta_2 x

and therefore the partial effect (or marginal effect) of x on y, holding fixed the rest of the factors affecting y that are captured in u, is β1 + 2β2x.

Note that in this model the partial effect is not constant; it can be increasing or decreasing, depending on the sign of β2.


Example 1 (cont.)
Using again the data in Example 1, the following model has been estimated,

\log(wage) = \beta_0 + \beta_1 educ + \beta_2 exper + \beta_3 expersq + u

where expersq is the square of the years of labour experience, with the following results:

\widehat{\log(wage)} = 0.128 + 0.090\, educ + 0.041\, exper - 0.00071\, expersq

Based on this estimated model, holding fixed the number of years of education, an increase of one year of labour experience predicts an increase in the hourly wage of 100(0.041 − 2 × 0.00071 exper)%.


Example 1 (cont.)

We see that this effect is not constant, since it depends on the years of experience. An additional year of experience has a larger effect for individuals with few years of experience than for those with many years of experience: there are decreasing returns to experience.

For example, for individuals with one year of labour experience, an additional year predicts an increase in the hourly wage of 3.96% (100(0.041 − 2 × 0.00071) = 3.958).

For an individual with 20 years of experience, an additional year of experience predicts an increase in the hourly wage of 1.26% (100(0.041 − 2 × 0.00071 × 20) = 1.26).
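A tiny sketch of this calculation in Python, plugging the estimated coefficients above into the derivative formula:

```python
def marginal_effect_pct(exper, b_exper=0.041, b_expersq=-0.00071):
    """Approximate % change in hourly wage for one more year of experience."""
    return 100 * (b_exper + 2 * b_expersq * exper)

print(marginal_effect_pct(1))    # roughly 3.96% at 1 year of experience
print(marginal_effect_pct(20))   # roughly 1.26% at 20 years of experience
```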


Comparison between the estimates of the simple and multiple regression

The estimates of the simple regression model are not the same as the estimates of the multiple regression model.

In the multiple regression model, the estimated parameter associated with a variable captures the partial effect of that variable on the dependent variable, holding fixed the rest of the explanatory variables in the model.

If we drop those other variables and consider a simple regression model, the estimated slope will capture not only the direct effect of the variable of interest on the dependent variable, but also an indirect effect of the other variables that affect y and are correlated with the variable of interest.


Example 1 (cont.)
If we compare the results obtained from the multiple regression model of the log of wage on the years of education and the years of labour experience,

\widehat{\log(wage)} = 0.217 + 0.098\, educ + 0.010\, exper

with the results obtained in Chapter 6 for the simple regression model of the log of wage on the level of education,

\widehat{\log(wage)} = 0.584 + 0.083\, educ

we see that the multiple regression model estimates the return to education at 9.8%, while in the simple regression model it is 8.3%.


Example 1 (cont.)
The reason for this discrepancy is that in these two models the estimated parameters capture different things:

In the simple regression model, the estimated coefficient of educ (multiplied by 100) captures the percentage change in wages given an increase of one year of education, without holding experience fixed.
In the multiple regression model, the estimated coefficient of educ (multiplied by 100) captures the percentage change in wages given an increase of one year of education, holding labour experience fixed.

Since there is a negative correlation between experience and education, and in the simple regression model we do not control for experience, an increase of one year of education goes together with a decrease in labour experience; since less labour experience negatively affects wages, the estimated parameter of the simple regression model is smaller than in the multiple regression model.


Fitted Values and residuals. Goodness of fit.

The fitted value for observation i is defined as

\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_{1i} + \dots + \hat{\beta}_k x_{ki}

The residual for observation i is defined as the difference between the observed value yi and the fitted value ŷi:

\hat{u}_i = y_i - \hat{y}_i

The fitted values and the OLS residuals in the multiple regression model verify certain algebraic properties that are immediate extensions of those of the simple regression model.


Algebraic properties of the OLS regression

1. The sum, and therefore the sample mean, of the residuals is zero:

\frac{1}{n} \sum_{i=1}^{n} \hat{u}_i = 0 \quad (5)

2. The sample covariance between the observed values of each explanatory variable and the residuals is zero, that is,

\frac{1}{n-1} \sum_{i=1}^{n} x_{ji} \hat{u}_i = 0, \quad j = 1, \dots, k

Note that \frac{1}{n-1} \sum_{i=1}^{n} x_{ji} \hat{u}_i is the sample covariance between the observed values of x_j and the residuals, since the mean of the residuals is zero.

3. The OLS regression line goes through the point of sample means (\bar{x}_1, \bar{x}_2, \dots, \bar{x}_k, \bar{y}).


4. The mean of the fitted values coincides with the mean of the observed values, that is,

\bar{\hat{y}} = \frac{1}{n} \sum_{i=1}^{n} \hat{y}_i = \frac{1}{n} \sum_{i=1}^{n} y_i = \bar{y}

5. The sample covariance between the fitted values and the residuals is zero, that is,

\frac{1}{n-1} \sum_{i=1}^{n} \hat{y}_i \hat{u}_i = 0

Note that \frac{1}{n-1} \sum_{i=1}^{n} \hat{y}_i \hat{u}_i is the sample covariance between the fitted values and the residuals, since the mean of the residuals is zero.
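An illustrative check of these algebraic properties on simulated data (a sketch; the data and names are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 200, 2
X = rng.normal(size=(n, k))
y = 0.5 + X @ np.array([1.0, -2.0]) + rng.normal(size=n)

Xmat = np.column_stack([np.ones(n), X])
beta_hat = np.linalg.lstsq(Xmat, y, rcond=None)[0]

y_hat = Xmat @ beta_hat          # fitted values
u_hat = y - y_hat                # residuals

print(np.isclose(u_hat.sum(), 0))          # property 1: residuals sum to zero
print(np.allclose(X.T @ u_hat, 0))         # property 2: zero covariance with each x_j
print(np.isclose(y_hat.mean(), y.mean()))  # property 4: means coincide
print(np.isclose(y_hat @ u_hat, 0))        # property 5: zero covariance with fitted values
```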


Goodness of fit
If we define, as in the simple regression model, the Total Sum of Squares (SST), the Explained Sum of Squares (SSE) and the Sum of Squared Residuals (SSR) as

SST = \sum_{i=1}^{n} (y_i - \bar{y})^2

SSE = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2

SSR = \sum_{i=1}^{n} \hat{u}_i^2

it can be shown that

SST = SSE + SSR

and therefore

1 = \frac{SSE}{SST} + \frac{SSR}{SST}


As in the simple regression, the R-squared is defined as

R^2 = \frac{SSE}{SST} = 1 - \frac{SSR}{SST}

R² captures the proportion of the variability of the dependent variable explained by the explanatory variables of the model, and the following condition always holds:

0 \leq R^2 \leq 1

The value of R² cannot decrease (and in general increases) when we introduce an additional explanatory variable in the model. The reason is that including a new variable does not change the SST (since it only depends on the dependent variable) while the SSR decreases (or stays constant).

It is important to emphasise that in the social sciences the values of R² are very often low. This is usually the case when we work with cross-sectional data.

Obtaining a small value of R² does not mean that the OLS estimation is not useful: the OLS estimators can deliver good estimates of the partial effects of the different variables even if R² is low.
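Continuing the simulated sketch above (reusing the hypothetical y, y_hat and u_hat defined there), the goodness-of-fit decomposition can be computed directly:

```python
SST = ((y - y.mean()) ** 2).sum()
SSE = ((y_hat - y.mean()) ** 2).sum()
SSR = (u_hat ** 2).sum()

R2 = 1 - SSR / SST
print(np.isclose(SST, SSE + SSR))   # SST = SSE + SSR
print(R2)                           # proportion of the variability of y explained
```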


Statistical Properties of the OLS estimators

Up to now we have studied the algebraic properties of the OLS estimation. In this section we go back to the population model in order to study the statistical properties of the OLS estimators in the multiple regression model.

We now consider β̂0, β̂1, ..., β̂k as random variables, that is, as estimators of the population parameters β0, β1, ..., βk, and we study some properties of their distributions.


Unbiasedness of the OLS estimators

We study under which assumptions the OLS estimators are unbiased.

Assumption MLR.1 (linearity in parameters)
The dependent variable y is related in the population to the explanatory variables and the error term through the following population model:

y = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + u \quad (6)

Assumption MLR.2 (random sampling)
The data arise from a random sample of size n, {(x1i, ..., xki, yi) : i = 1, 2, ..., n}, from the population model.

Assumption MLR.3 (zero conditional mean)

E(u \mid x_1, \dots, x_k) = 0

Assumption MLR.4 (no perfect collinearity)
In the sample, none of the explanatory variables is constant and there are no exact linear relationships between the explanatory variables.


Comments on the assumptions
Assumption MLR.1 formally establishes the population model and, as we have mentioned, the main characteristic of the model is that it is linear in parameters.

The population model (6) is quite flexible, since both the dependent variable and the explanatory variables can be arbitrary functions of the underlying variables of interest.

Assumption MLR.2 is appropriate in many applications (although not in all of them) when we work with cross-sectional data. When we work with time series, the observations are generally not independent and assumption MLR.2 is not satisfied. Nonetheless, for time series it is also possible to establish other assumptions that guarantee the unbiasedness of the OLS estimators.

Assumptions MLR.1 and MLR.2 imply that, for each observation randomly drawn from the population, we have

y_i = \beta_0 + \beta_1 x_{1i} + \dots + \beta_k x_{ki} + u_i

where ui is the error term of observation i and contains the unobservables affecting yi. Note that the error term ui is different from the residual ûi.


Comments on the assumptions (cont.)

Assumption MLR.3 can fail if the relationship between the dependent variable and the explanatory variables is not correctly specified. For example, if the population model includes a variable and its square as regressors and we do not include the square, assumption MLR.3 fails.

Assumption MLR.3 also fails if we omit an important factor which is correlated with any of the explanatory variables in the model.

With the multiple regression model we can include more factors as explanatory variables, so omitted variables are less likely to be a problem in the multiple regression model than in the simple regression model.
However, in any application there will always be factors that we cannot include, given the limitations of the data set, or factors that we simply ignore.
If these factors not included in the model are correlated with one or more independent variables, assumption MLR.3 fails.


Comments on the assumptions (cont.)

Assumption MLR.4 is more complex in the multiple regressionmodel than in the simple regression model.

Recall that, in the simple regression model, assumption MLR.4 required that the observed values of the explanatory variable are not all the same. In the multiple regression model we need this condition for each of the explanatory variables and, additionally, we need that there is no exact linear relationship between the explanatory variables. This guarantees that the system of equations (2) has a unique solution, so the OLS estimators can be computed.

When one of the regressors in the model is a linear function of the other regressors, we say that there is a problem of perfect collinearity.

Perfect collinearity is a problem of wrong specification of the model, which can arise if, for example, we include two variables and their sum. We come back to this problem in Chapter 9, when binary variables are discussed.


Assumption MLR.3 is equivalent to

E(y \mid x_1, \dots, x_k) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k

Therefore, under assumption MLR.3:

β0 is the mean of y when all the explanatory variables are zero.
β1 captures the variation in the mean of y given a one-unit increase in x1, holding x2, ..., xk constant. The interpretation of the rest of the slope parameters is analogous.

Since β̂1 is an estimator of β1, one can interpret β̂1 as the estimated change in the mean of y given a one-unit increase in x1, holding x2, ..., xk fixed. The interpretation of the estimates of the rest of the slope parameters is analogous.


Property

Under assumptions MLR.1 to MLR.4, β̂0, β̂1, ..., β̂k are unbiased estimators of the parameters β0, β1, ..., βk, that is,

E(\hat{\beta}_j) = \beta_j, \quad j = 0, 1, 2, \dots, k

Notes

In general, if any of the four assumptions considered above is not satisfied, the estimators are not unbiased.
As mentioned above, if assumption MLR.4 fails we cannot even obtain the OLS estimates.
Finally, assumption MLR.3 is the crucial assumption for the unbiasedness of the OLS estimators. If this assumption fails, the estimators are in general biased.


Variances of the OLS estimators
As in the simple regression model, we compute the variance of the OLS estimators under the additional assumption of homoskedasticity. This assumption establishes that the variance of the error term u conditional on x1, ..., xk is constant, that is, it does not depend on x1, ..., xk.

Assumption MLR.5 (homoskedasticity)

Var(u \mid x_1, \dots, x_k) = \sigma^2

When Var(u | x1, ..., xk) depends on some of the variables xj, the errors are said to be heteroskedastic.

It is convenient to point out that assumption MLR.5 does not play any role in the unbiasedness of the OLS estimators.


Assumptions MLR.1 to MLR.5 are known as the Gauss-Markov assumptions, and we see below that under these assumptions the OLS estimators verify certain efficiency properties.

We obtain the expression of the variance of the OLS estimators under assumptions MLR.1 to MLR.5.

Nonetheless, it is important to emphasise that assumption MLR.5 is not crucial in order to obtain the variances of the OLS estimators. When the homoskedasticity assumption is not satisfied, it is still possible to obtain the variances of the OLS estimators, although the expressions are more complicated.


Variance of the sampling distribution of β̂j, j = 1, 2, ..., k.

Under assumptions MLR.1 to MLR.5,

Var(\hat{\beta}_j) = \frac{\sigma^2}{(n-1) S^2_{x_j} (1 - R^2_j)} \quad (7)

where S^2_{x_j} is the sample variance of variable xj and R^2_j is the R-squared of the regression of xj on the rest of the explanatory variables of the model (including the constant term). Note that the variance is conditional on the observed values of the explanatory variables.
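An illustrative numerical sketch of formula (7) on simulated data (all names are hypothetical): the formula-based variance coincides with the corresponding diagonal element of σ²(X'X)⁻¹.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)          # x2 is correlated with x1
sigma2 = 1.0                                # error variance, taken as known here
y = 1 + 2 * x1 - x2 + rng.normal(scale=np.sqrt(sigma2), size=n)

X = np.column_stack([np.ones(n), x1, x2])

# R^2_1: R-squared of the regression of x1 on the other regressors (here x2)
Z = np.column_stack([np.ones(n), x2])
x1_hat = Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]
R2_1 = 1 - ((x1 - x1_hat) ** 2).sum() / ((x1 - x1.mean()) ** 2).sum()

S2_x1 = x1.var(ddof=1)                      # sample variance of x1
var_formula = sigma2 / ((n - 1) * S2_x1 * (1 - R2_1))

# Same quantity as element (1, 1) of sigma^2 (X'X)^{-1}
var_matrix = sigma2 * np.linalg.inv(X.T @ X)[1, 1]
print(var_formula, var_matrix)              # the two values coincide
```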


Parts of the variance of β̂j

As in the simple regression model, the variance of β̂j depends on the variance of the error, the sample variance of xj and the sample size:

The larger the variance of the error term, σ², the larger the variance of every β̂j; therefore, if the variance of the unobservables affecting y is very large, it is very difficult to estimate the parameters precisely.
The larger the variance of xj, the smaller the variance of β̂j; therefore, if variable xj has low dispersion it is very difficult to estimate βj precisely.
The larger the sample size, the smaller the variance of every β̂j.

In the multiple regression model, the variance of β̂j also depends on R²j. This term did not appear in the variance of β̂j in the simple regression model because in that case there was only one explanatory variable. It is important to distinguish R²j from the R-squared of the regression of y on x1, ..., xk.


Parts of the variance of β̂j (cont.)

We now analyse the effect of R²j on the variance of β̂j, starting with the simplest model with two explanatory variables:

y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u

In this case

Var(\hat{\beta}_1) = \frac{\sigma^2}{(n-1) S^2_{x_1} (1 - R^2_1)}

where R²1 is the R-squared of the regression of x1 on x2 (including a constant term).

Since the R-squared captures goodness of fit, a large value of R²1 shows that x2 explains a large part of the variability of x1, and thus that x1 and x2 are highly correlated.

In the expression for Var(β̂1) we see that the larger R²1, the larger the variance; therefore, a high degree of correlation between the explanatory variables x1 and x2 can lead to large variances of the OLS estimators.


Parts of the variance of β̂j (cont.)

Effect of R²j on the variance of β̂j (cont.)

In the general case, R²j is the proportion of the variance of xj explained by the rest of the variables in the model. The most favourable case is when xj is not correlated with any of the other explanatory variables in the model, since then R²j = 0. This is the best case for estimating βj, but it is rarely found in practice.

Assumption MLR.4 rules out the other extreme case (i.e. R²j = 1), because R²j = 1 means that xj is an exact linear combination of the rest of the variables in the model.

What happens if R²j is "close" to 1? From the expression for Var(β̂j) we deduce that the closer R²j is to 1, the larger Var(β̂j) is. The situation in which there is a high correlation between the explanatory variables is known as multicollinearity.

It is important to point out that R²j "close" to 1, but different from 1, does not imply a violation of assumption MLR.4. In fact, multicollinearity does not violate any of our assumptions, but in this situation the coefficient estimates are very imprecise (they have large variances).
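A simulated sketch of this point (hypothetical data): as the correlation between x1 and x2 approaches 1, the sampling variability of β̂1 blows up, even though no assumption is violated.

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 200, 1000

def sd_beta1(rho):
    """Simulated standard deviation of beta1_hat for a given corr(x1, x2)."""
    estimates = []
    for _ in range(reps):
        x1 = rng.normal(size=n)
        x2 = rho * x1 + np.sqrt(1 - rho ** 2) * rng.normal(size=n)
        y = 1 + 0.5 * x1 + 0.5 * x2 + rng.normal(size=n)
        X = np.column_stack([np.ones(n), x1, x2])
        estimates.append(np.linalg.lstsq(X, y, rcond=None)[0][1])
    return np.std(estimates)

for rho in (0.0, 0.5, 0.9, 0.99):
    print(rho, sd_beta1(rho))   # the standard deviation rises sharply as rho -> 1
```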


Estimation of the variance of the error term
As in the simple regression model, the estimator of the variance of the error term, σ², is based on the sum of squared residuals.

In the multiple regression model with k explanatory variables, the n residuals satisfy the k+1 linear restrictions given by the first order conditions (2). Therefore, the residuals have n − (k+1) = n − k − 1 degrees of freedom, and the unbiased estimator of σ² is

\hat{\sigma}^2 = \frac{1}{n-k-1} \sum_{i=1}^{n} \hat{u}_i^2

Using this estimator of σ², the estimated variance of β̂j, j = 1, 2, ..., k, is defined as

\widehat{Var}(\hat{\beta}_j) = \frac{\hat{\sigma}^2}{(n-1) S^2_{x_j} (1 - R^2_j)}


The standard error of the regression (SER) is defined as

\hat{\sigma} = \sqrt{\hat{\sigma}^2}

σ̂ is an estimator of the standard deviation of the error term, σ.

The standard error of β̂j, j = 1, 2, ..., k, denoted by se(β̂j), is the square root of the estimated variance of β̂j:

se(\hat{\beta}_j) = \sqrt{\widehat{Var}(\hat{\beta}_j)} = \frac{\hat{\sigma}}{\sqrt{(n-1) S^2_{x_j} (1 - R^2_j)}}

se(β̂j) is an estimator of the standard deviation of β̂j and is therefore a measure of the dispersion of β̂j.
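A minimal sketch of these estimators on simulated data (hypothetical names); the standard errors are computed from the diagonal of σ̂²(X'X)⁻¹, which matches the formula above for each slope.

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 300, 2
X = rng.normal(size=(n, k))
y = 1 + X @ np.array([0.7, -0.3]) + rng.normal(size=n)

Xmat = np.column_stack([np.ones(n), X])
beta_hat = np.linalg.lstsq(Xmat, y, rcond=None)[0]
u_hat = y - Xmat @ beta_hat

sigma2_hat = (u_hat @ u_hat) / (n - k - 1)   # unbiased estimator of sigma^2
ser = np.sqrt(sigma2_hat)                    # standard error of the regression

se = np.sqrt(sigma2_hat * np.diag(np.linalg.inv(Xmat.T @ Xmat)))
print(ser)
print(beta_hat, se)                          # estimates and their standard errors
```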


As discussed in the simple regression model, the standard errors play a crucial role when making inference, that is, when testing hypotheses on the parameters of the model or when confidence intervals have to be provided.

Example 1 (cont.)
Using the data in Example 1, the following multiple regression model has been estimated,

\log(wage) = \beta_0 + \beta_1 educ + \beta_2 exper + u

and the standard errors have been computed. The results of the estimation, with the standard errors in parentheses below the coefficients, are usually presented in the following way:

\widehat{\log(wage)} = \underset{(0.109)}{0.217} + \underset{(0.0076)}{0.098}\, educ + \underset{(0.0016)}{0.010}\, exper


Standard errors and measurement units

If we change the units of measurement of one or more of the variables in the model, the standard errors change in exactly the same way as the estimated coefficients.

Example 2 (cont.)
Using the data in Example 2, the following results have been obtained:

\widehat{salary} = \underset{(223.91)}{830.63} + \underset{(11.08)}{19.63}\, roe + \underset{(0.0089)}{0.016}\, sales

where salary is measured in thousands of dollars, roe as a percentage and sales in millions of dollars. If salary is measured in hundreds of dollars (salary100), we have to multiply all the estimated coefficients and all the standard errors by 10:

\widehat{salary100} = \underset{(2239.1)}{8306.3} + \underset{(110.8)}{196.3}\, roe + \underset{(0.089)}{0.16}\, sales


Efficiency of the OLS estimator: the Gauss-Markov theorem

Under assumptions MLR.1 to MLR.4 the OLS estimators are unbiased. However, under these assumptions there are many other unbiased estimators of βj.

The question is: can there be other unbiased estimators with smaller variances than the OLS estimators?

In what follows we see that, if assumptions MLR.1 to MLR.5 are satisfied and we delimit in an appropriate way the class of estimators competing with OLS, the OLS estimators are the best within this class.

Gauss-Markov Theorem
Under assumptions MLR.1 to MLR.5, β̂0, β̂1, ..., β̂k are the best linear unbiased estimators (BLUE) of the parameters β0, β1, ..., βk, respectively.


Meaning of this theorem:

What does it mean that an estimator is linear?
An estimator β̃j is linear if it is a linear function of the observed values of the dependent variable, that is, if it can be written as

\tilde{\beta}_j = \sum_{i=1}^{n} w_{ji} y_i

where the wji are functions of the observed values of the explanatory variables. It can be shown that the OLS estimators are linear.

What does it mean that an estimator is unbiased?
An estimator is unbiased if its mean coincides with the true value of the parameter.

What does it mean that the OLS estimators are the best (optimal)?
The estimators are the best (optimal) when each β̂j has the smallest variance within the group of linear and unbiased estimators of βj. That is, if β̃j is another linear and unbiased estimator of βj, then

Var(\tilde{\beta}_j) \geq Var(\hat{\beta}_j)
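As a sketch of the linearity property, the weights wji for the OLS estimators can be read off the rows of (X'X)⁻¹X' (simulated, hypothetical data):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.5, -0.5]) + rng.normal(size=n)

# Row j of W holds the weights w_ji, so that beta_hat_j = sum_i w_ji * y_i
W = np.linalg.inv(X.T @ X) @ X.T
beta_hat = W @ y

print(np.allclose(beta_hat, np.linalg.lstsq(X, y, rcond=None)[0]))  # True
```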


Specification Problems

Inclusion of irrelevant variables
The inclusion of irrelevant variables means that one (or more) of the variables included in the model has a zero partial effect on the dependent variable, that is, its coefficient in the population is zero.

For example, consider the model

y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + u \quad (8)

satisfying assumptions MLR.1 to MLR.4, and assume that variable x3 is irrelevant, that is, that β3 = 0.


Since we do not know the value of β3 in the population, and therefore do not know that β3 = 0, we estimate the model including variable x3. What are the consequences for the OLS estimators of including an irrelevant variable in the model?

Including an irrelevant variable has no consequence for the unbiasedness of the OLS estimators: if assumptions MLR.1 to MLR.4 are verified, the OLS estimators are unbiased for any value of the parameters, which includes the case where one of the parameters is zero.


Regarding the variance of the OLS estimators, we will see that, under assumption MLR.5, the variances of the OLS estimators in model (8) are larger than the variances of the OLS estimators in the model that does not include the irrelevant variable, that is, the model

y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u \quad (9)

Let β̂1 be the OLS estimator of β1 in model (9) and let β̃1 be the OLS estimator of β1 in model (8). The variances of these estimators are

Var(\hat{\beta}_1) = \frac{\sigma^2}{(n-1) S^2_{x_1} (1 - R^2_2)}, \qquad Var(\tilde{\beta}_1) = \frac{\sigma^2}{(n-1) S^2_{x_1} (1 - R^2_{23})}

where R²2 is the R-squared of a regression of x1 on x2 and R²23 is the R-squared of a regression of x1 on x2 and x3.

Therefore, since R²2 ≤ R²23, we have that Var(β̂1) ≤ Var(β̃1).

Analogously, if β̂2 is the OLS estimator of β2 in model (9) and β̃2 is the OLS estimator of β2 in model (8), we have that Var(β̂2) ≤ Var(β̃2).


In summary, including an irrelevant variable in a regression model has no consequence for the unbiasedness of the OLS estimators. The cost is that the estimates are less precise than those obtained when this variable is not included.

If we add an irrelevant variable to a regression model with more than two explanatory variables, the consequences are the same as in the case we have just seen with two explanatory variables. Likewise, if we add more than one irrelevant variable, the consequences are the same as when a single irrelevant variable is added.


Omitted relevant variables
We analyse now the case where we omit a relevant variable from the model, that is, we omit a variable whose population coefficient is different from zero.

Consider the case where the true population model has two explanatory variables,

y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u

and assume that assumptions MLR.1 to MLR.4 are satisfied.

Assume that, either because we ignore that variable x2 has an effect on y or because we do not observe x2 in our sample, we estimate a simple regression of y on x1 and obtain the estimated model

\tilde{y} = \tilde{\beta}_0 + \tilde{\beta}_1 x_1

where the symbol ~ is used instead of the usual ^ in order to emphasise that the estimation corresponds to a misspecified model.


It can be shown that

E(\tilde{\beta}_1) = \beta_1 + \beta_2 \frac{S_{x_1 x_2}}{S^2_{x_1}}

where S_{x_1 x_2} is the sample covariance between x1 and x2 and S^2_{x_1} is the sample variance of x1.

Since in general E(β̃1) ≠ β1, the OLS estimator β̃1 is biased.

Is there any case where the bias is zero?

The first case, β2 = 0, is trivial: in that case we are not omitting a relevant variable, because the partial effect of x2 on y is zero.
The second case is when Sx1x2 = 0, that is, when the variables x1 and x2 are uncorrelated in the sample.
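An illustrative simulation sketch of this bias formula (all values below are chosen arbitrarily for the illustration):

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps = 500, 2000
beta1, beta2 = 1.0, 2.0

short_estimates = []
for _ in range(reps):
    x1 = rng.normal(size=n)
    x2 = 0.5 * x1 + rng.normal(size=n)        # x2 is correlated with x1
    y = 0.3 + beta1 * x1 + beta2 * x2 + rng.normal(size=n)
    # "Short" regression of y on x1 only, omitting the relevant variable x2
    X = np.column_stack([np.ones(n), x1])
    short_estimates.append(np.linalg.lstsq(X, y, rcond=None)[0][1])

# On average the estimate is about beta1 + beta2 * Cov(x1, x2) / Var(x1) = 1 + 2 * 0.5
print(np.mean(short_estimates))
```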


In general, the sign of the bias depends on the sign of β2 and on the sign of Sx1x2, as the following table shows:

          Corr(x1, x2) > 0    Corr(x1, x2) < 0
β2 > 0    Positive bias       Negative bias
β2 < 0    Negative bias       Positive bias

The main reason why a variable is omitted in practice is that there is no information about it, as the following example illustrates.

Example 1 (cont.) Consider the model

\log(wage) = \beta_0 + \beta_1 educ + \beta_2 abil + u

where abil is "innate ability", which is unobserved.

In this model, by definition, the higher the ability, the higher the productivity and therefore the higher the wage, so that β2 > 0. Additionally, there are reasons to believe that education and innate ability are positively correlated, so the estimated coefficient from the regression of log(wage) on a constant and years of education will, in general, overestimate the return to schooling.


Consider now the general case of a model with k explanatory variables,

y = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + u

where βk ≠ 0.

If variable xk is omitted and we estimate the regression of y on x1, ..., xk−1, the OLS estimators of all the coefficients are in general biased, unless the correlations of xk with the rest of the variables in the model are all equal to zero.

Even if xk is correlated with only one of the other variables, the OLS estimators of all the coefficients are, in general, biased.

In this general case, it is difficult to establish the sign of the bias.
