Applied Econometrics (QEM)
Prediction, Testing Joint Hypotheses, Model Specification and Collinearity
based on Principles of Econometrics
Jakub Mućk
Department of Quantitative Economics
Jakub Mućk Applied Econometrics (QEM) Meeting #4 Prediction, Joint Hypotheses, Model Specification and Collinearity
Outline
1 Prediction
2 Testing Joint Hypotheses: Restricted least squares estimator; Wald test; Non-sample information
3 Model Specification: Omitted variables; Irrelevant variables; RESET test
4 Collinearity
Prediction
Prediction involves using the regression model to compute fitted (predicted) values of the dependent variable, either within the sample or for observations outside the sample.
Key assumption:
y_0 = \beta_0 + \beta_1 x_{10} + \ldots + \beta_k x_{k0} + e_0, (1)
where e_0 is a random error.
The least squares predictor of y_0:
\hat{y}_0 = \hat{\beta}_0^{LS} + \hat{\beta}_1^{LS} x_{10} + \ldots + \hat{\beta}_k^{LS} x_{k0}. (2)
The forecast error (for the least squares predictor):
f = y_0 - \hat{y}_0 = (\beta_0 + \beta_1 x_{10} + \ldots + \beta_k x_{k0} + e_0) - (\hat{\beta}_0^{LS} + \hat{\beta}_1^{LS} x_{10} + \ldots + \hat{\beta}_k^{LS} x_{k0}). (3)
BLUP
The expected value of the forecast error:
E(f) = E(y_0) - E(\hat{y}_0)
= (\beta_0 + \beta_1 x_{10} + \ldots + \beta_k x_{k0} + E(e_0)) - (E(\hat{\beta}_0^{LS}) + E(\hat{\beta}_1^{LS}) x_{10} + \ldots + E(\hat{\beta}_k^{LS}) x_{k0})
= \beta_0 + \beta_1 x_{10} + \ldots + \beta_k x_{k0} - \beta_0 - \beta_1 x_{10} - \ldots - \beta_k x_{k0} = 0,
since the least squares estimators are unbiased and E(e_0) = 0.
Therefore \hat{y}_0 is an unbiased predictor of y_0.
If assumptions A#1-A#6 (without normality of the error term) hold, then \hat{y}_0 is the best linear unbiased predictor (BLUP).
The variance of the forecast – simple regression
The variance of the forecast/prediction (simple linear regression):
var(f) = \sigma^2 \left[ 1 + \frac{1}{N} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{N} (x_i - \bar{x})^2} \right]. (4)
The variance of the forecast is smaller when:
- the variance of the random errors (\sigma^2) is smaller,
- the sample size (N) is larger,
- the variation in the explanatory variable (\sum_{i=1}^{N} (x_i - \bar{x})^2) is larger,
- the squared deviation of the explanatory variable at the forecast point from its sample average ((x_0 - \bar{x})^2) is smaller.
The confidence intervals – simple regression
The estimate of the variance:
\hat{var}(f) = \hat{\sigma}^2 \left[ 1 + \frac{1}{N} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{N} (x_i - \bar{x})^2} \right], (5)
where \hat{\sigma}^2 is the estimated error term variance.
The standard error of the forecast/prediction:
se(f) = \sqrt{\hat{var}(f)}. (6)
The 100 \times (1 - \alpha)\% prediction interval:
(\hat{y}_0 - t_c \, se(f), \; \hat{y}_0 + t_c \, se(f)), (7)
where t_c is the Student's t critical value.
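As a sketch of equations (4)-(7), the snippet below computes the forecast, its standard error, and a 95% prediction interval for a simple regression. The sample, seed, and forecast point x0 are illustrative assumptions, not from the lecture.

```python
import numpy as np
from scipy import stats

# Simulated sample (illustrative assumption)
rng = np.random.default_rng(42)
N = 50
x = rng.uniform(0, 10, N)
y = 2.0 + 0.5 * x + rng.normal(0, 1.0, N)

# Least squares estimates of beta_0 and beta_1
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()

# Estimated error variance (df = N - 2 in simple regression)
resid = y - (b0 + b1 * x)
sigma2_hat = resid @ resid / (N - 2)

# Forecast at x0 and its estimated variance, eq. (4)-(5)
x0 = 7.0
y0_hat = b0 + b1 * x0
var_f = sigma2_hat * (1 + 1 / N + (x0 - x.mean()) ** 2 / ((x - x.mean()) ** 2).sum())
se_f = np.sqrt(var_f)

# 95% prediction interval, eq. (7)
tc = stats.t.ppf(0.975, df=N - 2)
lower, upper = y0_hat - tc * se_f, y0_hat + tc * se_f
print(f"forecast {y0_hat:.3f}, 95% PI ({lower:.3f}, {upper:.3f})")
```

Note that var(f) always exceeds \hat{\sigma}^2: the prediction interval accounts for both the error e_0 and the estimation uncertainty in the coefficients.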
The variance of the forecast – multiple regression
The variance of the prediction in multiple regression is more complicated, since we have to take into account the covariances between the estimates.
Example with two explanatory variables:
var(f) = var(y_0 - \hat{y}_0)
= var[\beta_0 + \beta_1 x_{10} + \beta_2 x_{20} + e_0 - \hat{\beta}_0^{LS} - \hat{\beta}_1^{LS} x_{10} - \hat{\beta}_2^{LS} x_{20}]
= var[e_0 - \hat{\beta}_0^{LS} - \hat{\beta}_1^{LS} x_{10} - \hat{\beta}_2^{LS} x_{20}]
= var(e_0) + var(\hat{\beta}_0^{LS}) + x_{10}^2 var(\hat{\beta}_1^{LS}) + x_{20}^2 var(\hat{\beta}_2^{LS})
+ 2 x_{10} cov(\hat{\beta}_0^{LS}, \hat{\beta}_1^{LS}) + 2 x_{20} cov(\hat{\beta}_0^{LS}, \hat{\beta}_2^{LS}) + 2 x_{10} x_{20} cov(\hat{\beta}_1^{LS}, \hat{\beta}_2^{LS}).
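The term-by-term expansion above is the scalar form of var(f) = x_0' Cov(\hat{\beta}) x_0 + \sigma^2, where x_0 = (1, x_{10}, x_{20})'. A minimal numpy sketch on simulated data (the DGP and forecast point are illustrative assumptions) confirms the two forms agree:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200
X = np.column_stack([np.ones(N), rng.normal(size=N), rng.normal(size=N)])
beta = np.array([1.0, 0.5, -0.3])
y = X @ beta + rng.normal(0, 1.0, N)

# OLS estimates and estimated error variance
b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b
K = X.shape[1]
sigma2_hat = resid @ resid / (N - K)

# Estimated covariance matrix of the coefficient estimates
cov_b = sigma2_hat * np.linalg.inv(X.T @ X)

# Matrix form: var(f) = x0' Cov(b) x0 + sigma^2
x0 = np.array([1.0, 0.8, -1.2])
var_f = x0 @ cov_b @ x0 + sigma2_hat

# The same quantity written out term by term, as on the slide
v = (sigma2_hat
     + cov_b[0, 0]
     + x0[1] ** 2 * cov_b[1, 1]
     + x0[2] ** 2 * cov_b[2, 2]
     + 2 * x0[1] * cov_b[0, 1]
     + 2 * x0[2] * cov_b[0, 2]
     + 2 * x0[1] * x0[2] * cov_b[1, 2])
print(var_f, v)
```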
Testing Joint Hypotheses
A null hypothesis with multiple conjectures, expressed with more than one equality sign, is called a joint hypothesis.
[Example] Wages (w) and experience (exper):
w = \beta_0 + \beta_1 exper + \beta_2 exper^2 + \varepsilon. (8)
Are wages related to experience? To answer this question we should test H_0: \beta_1 = 0 and H_0: \beta_2 = 0 jointly. The joint null is H_0: \beta_1 = \beta_2 = 0.
The test of H_0 is a joint test of whether both conjectures hold simultaneously.
Restricted least squares estimator
The restricted least squares estimator is obtained by minimizing the sum of squared errors (SSE), viewed as a function of the unknown parameters given the data, subject to a set of restrictions:
SSE(\beta_0, \beta_1, \ldots, \beta_K) = \sum_{i=1}^{N} [y_i - \beta_0 - \beta_1 x_{i1} - \ldots - \beta_K x_{iK}]^2
subject to the restrictions.
Examples of restrictions:
- \beta_1 = \beta_2,
- \beta_1 = 2.
Wald test I
The Wald test allows us to test a set of linear restrictions.
The F-statistic determines what constitutes a large or a small reduction in the sum of squared errors:
F = \frac{(SSE_R - SSE_U)/J}{SSE_U/(N - K)}, (9)
where:
- J is the number of restrictions,
- N is the number of observations,
- K is the number of coefficients in the unrestricted model,
- SSE_R is the sum of squared errors in the restricted model,
- SSE_U is the sum of squared errors in the unrestricted model.
If the null is true, the F-statistic has an F-distribution with J numerator degrees of freedom and N - K denominator degrees of freedom.
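A minimal sketch of the F-statistic in (9), computed from the restricted and unrestricted sums of squared errors. The simulated data are an illustrative assumption, constructed so that the joint null \beta_1 = \beta_2 = 0 is true:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
N = 120
x1, x2 = rng.normal(size=N), rng.normal(size=N)
y = 1.0 + rng.normal(size=N)  # H0: beta1 = beta2 = 0 holds by construction

def sse(X, y):
    """Sum of squared errors from an OLS fit of y on X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ b
    return r @ r

X_u = np.column_stack([np.ones(N), x1, x2])   # unrestricted model
X_r = np.ones((N, 1))                         # restricted: intercept only
sse_u, sse_r = sse(X_u, y), sse(X_r, y)

J, K = 2, 3                                   # 2 restrictions, 3 coefficients
F = ((sse_r - sse_u) / J) / (sse_u / (N - K))
p_value = stats.f.sf(F, J, N - K)
print(f"F = {F:.3f}, p-value = {p_value:.3f}")
```

Since the restricted model is nested in the unrestricted one, SSE_R is never smaller than SSE_U, so F is never negative.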
Wald test II
When the null is false, the difference between the sum of squared errors in the restricted model (SSE_R) and in the unrestricted model (SSE_U) becomes large, and the null can be rejected.
In other words, the imposed restrictions significantly reduce the ability of the model to fit the data.
The F-test can be used in many applications:
- testing economic hypotheses,
- testing the overall significance of the model,
- excluding/including a set of explanatory variables.
Testing the significance of the model
Multiple regression model with K explanatory variables:
y = \beta_0 + \beta_1 x_1 + \ldots + \beta_K x_K + \varepsilon. (10)
Test of the overall significance of the regression model. The null hypothesis:
H_0: \beta_1 = \beta_2 = \ldots = \beta_K = 0, (11)
while the alternative is that at least one coefficient is different from 0.
In this test the restricted model is:
y = \beta_0 + \varepsilon, (12)
which implies that SSE_R = SST.
Thus, the F-statistic in the overall significance test can be written as:
F = \frac{(SST - SSE)/K}{SSE/(N - K - 1)}. (13)
t and F statistics
If a single restriction is considered, both the t and the F statistic can be used, and the results will be identical.
This is due to an exact relationship between the t- and F-distributions: the square of a t random variable with df degrees of freedom is an F random variable with 1 degree of freedom in the numerator and df degrees of freedom in the denominator.
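This relationship is easy to verify numerically with scipy; the degrees of freedom and significance level below are arbitrary illustrative choices:

```python
from scipy import stats

# For one restriction: the square of the two-sided t critical value
# equals the F(1, df) critical value at the same significance level.
df = 30
alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df)   # two-sided t critical value
f_crit = stats.f.ppf(1 - alpha, 1, df)    # F critical value, 1 numerator df
print(t_crit ** 2, f_crit)
```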
Non-sample information
In many cases we have information over and above the information contained in the sample observations.
This non-sample information can be taken, e.g., from economic theory.
[Example] Production function. Consider the regression of logged output (y) on logged capital (k) and logged labor input (l):
y = \beta_0 + \beta_1 k + \beta_2 l + \varepsilon. (14)
The natural assumption to verify is constant returns to scale (CRS). In this case:
\beta_1 + \beta_2 = 1. (15)
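One common way to impose the single restriction \beta_1 + \beta_2 = 1 is substitution: setting \beta_2 = 1 - \beta_1 turns (14) into y - l = \beta_0 + \beta_1 (k - l) + \varepsilon, which can be estimated by ordinary least squares. A sketch on simulated data (the DGP and its parameters are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 100
k = rng.normal(size=N)
l = rng.normal(size=N)
y = 0.5 + 0.3 * k + 0.7 * l + rng.normal(0, 0.1, N)  # CRS holds: 0.3 + 0.7 = 1

# Impose beta1 + beta2 = 1 by substitution:
# y - l = beta0 + beta1 * (k - l) + e
z = y - l
w = k - l
b1 = np.cov(w, z, ddof=1)[0, 1] / np.var(w, ddof=1)
b0 = z.mean() - b1 * w.mean()
b2 = 1.0 - b1  # recovered from the restriction
print(b0, b1, b2)
```

By construction the estimates satisfy the restriction exactly, whatever the data say.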
Model Specification
A model could be misspecified when:
- important explanatory variables are omitted,
- irrelevant explanatory variables are included,
- a wrong functional form is chosen,
- the assumptions of the multiple regression model are not satisfied.
Omitted variables I
Omission of a relevant variable (defined as one whose coefficient is nonzero) might lead to an estimator that is biased. This bias is known as omitted-variable bias.
Let us assume the true DGP (data generating process):
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon. (16)
Consider the case when we do not have data on x_2. Equivalently, we impose the restriction \beta_2 = 0. According to the true DGP this restriction is invalid.
Then the expected value of the least squares estimator of \beta_1 is:
E(\hat{\beta}_1^{LS}) = \beta_1 + \beta_2 \frac{cov(x_1, x_2)}{var(x_1)}, (17)
and the omitted-variable bias is:
bias(\hat{\beta}_1^{LS}) = E(\hat{\beta}_1^{LS}) - \beta_1 = \beta_2 \frac{cov(x_1, x_2)}{var(x_1)}. (18)
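A small Monte Carlo sketch of (17)-(18), with an illustrative DGP: regressing y on x_1 alone, the average estimated slope drifts from \beta_1 = 1 toward \beta_1 + \beta_2 \, cov(x_1, x_2)/var(x_1) = 1 + 2 \cdot 0.6 = 2.2.

```python
import numpy as np

rng = np.random.default_rng(3)
N, reps = 500, 200
beta1, beta2 = 1.0, 2.0
slopes = []
for _ in range(reps):
    x1 = rng.normal(size=N)
    x2 = 0.6 * x1 + rng.normal(size=N)   # omitted variable, correlated with x1
    y = beta1 * x1 + beta2 * x2 + rng.normal(size=N)
    # Regress y on x1 only, omitting x2
    b1 = np.cov(x1, y, ddof=1)[0, 1] / np.var(x1, ddof=1)
    slopes.append(b1)

# Theoretical bias: beta2 * cov(x1, x2) / var(x1) = 2.0 * 0.6 = 1.2
print(np.mean(slopes))   # close to beta1 + 1.2 = 2.2
```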
Omitted variables II
The omitted-variable bias is larger if:
- the true slope on the omitted variable (\beta_2) is larger in absolute value,
- the omitted variable (x_2) is more correlated with the included variable (x_1).
However, there is no bias when the omitted variable is uncorrelated with the included explanatory variables.
Irrelevant variables I
Because of omitted-variable bias, one might be tempted to include as many variables as possible.
However, doing so may inflate the variance of the estimates.
The inclusion of irrelevant variables may reduce the precision of the estimated coefficients for the other variables in the equation.
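A short simulation of this precision loss (the DGP is an illustrative assumption): x_2 is irrelevant in the DGP but highly correlated with x_1, and including it inflates the standard error of \hat{\beta}_1.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 100
x1 = rng.normal(size=N)
x2 = 0.9 * x1 + np.sqrt(1 - 0.9 ** 2) * rng.normal(size=N)  # irrelevant but correlated
y = 1.0 + 0.5 * x1 + rng.normal(size=N)                     # x2 plays no role in the DGP

def coef_se(X, y, j):
    """OLS standard error of the j-th coefficient."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ b
    s2 = r @ r / (N - X.shape[1])
    return np.sqrt(s2 * np.linalg.inv(X.T @ X)[j, j])

se_small = coef_se(np.column_stack([np.ones(N), x1]), y, 1)    # correct model
se_big = coef_se(np.column_stack([np.ones(N), x1, x2]), y, 1)  # with irrelevant x2
print(se_small, se_big)
```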
RESET test I
RESET (REgression Specification Error Test) is designed to detect omitted variables and incorrect functional form.
Consider the multiple linear regression:
y = \beta_0 + \beta_1 x_1 + \ldots + \beta_k x_k + \varepsilon. (19)
[Step #1] Obtain the least squares estimates and calculate the fitted values:
\hat{y} = \hat{\beta}_0^{LS} + \hat{\beta}_1^{LS} x_1 + \ldots + \hat{\beta}_k^{LS} x_k. (20)
[Step #2] Consider the following auxiliary regressions:
Model 1: y = \beta_0 + \beta_1 x_1 + \ldots + \beta_k x_k + \gamma_1 \hat{y}^2 + \varepsilon,
Model 2: y = \beta_0 + \beta_1 x_1 + \ldots + \beta_k x_k + \gamma_1 \hat{y}^2 + \gamma_2 \hat{y}^3 + \varepsilon.
Obtain the least squares estimates of \gamma_1 in Model 1 and/or of \gamma_1 and \gamma_2 in Model 2.
RESET test II
[Step #3] Test the following null hypotheses:
Model 1: H_0: \gamma_1 = 0,
Model 2: H_0: \gamma_1 = \gamma_2 = 0.
In both cases the null hypothesis is that the model is correctly specified.
The RESET test is a very general test of functional form. However, if we reject the null, we do not learn what the source of the misspecification is.
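The three steps can be sketched by hand (common econometrics libraries also ship a RESET test, but the manual version makes the mechanics explicit). The quadratic DGP below is an illustrative assumption, chosen so that the linear specification is wrong:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
N = 200
x = rng.uniform(0, 5, N)
y = 1.0 + x ** 2 + rng.normal(size=N)   # true relation is quadratic

def ols(X, y):
    """OLS coefficients and sum of squared errors."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ b
    return b, r @ r

# Step 1: fit the (misspecified) linear model and get fitted values
X = np.column_stack([np.ones(N), x])
b, sse_r = ols(X, y)
y_hat = X @ b

# Step 2: auxiliary regression with y_hat^2 added (Model 1)
X_aux = np.column_stack([X, y_hat ** 2])
_, sse_u = ols(X_aux, y)

# Step 3: F-test of H0: gamma_1 = 0 (correct specification)
J, K = 1, X_aux.shape[1]
F = ((sse_r - sse_u) / J) / (sse_u / (N - K))
p_value = stats.f.sf(F, J, N - K)
print(f"RESET F = {F:.2f}, p = {p_value:.4f}")
```

With this DGP the null is rejected, flagging the linear functional form as inadequate.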
Collinearity I
When data are the result of an uncontrolled experiment, many of the economic variables may move together in systematic ways.
This problem is labeled collinearity, and the explanatory variables are said to be collinear.
Example: multiple regression with two explanatory variables:
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon. (21)
The variance of the least squares estimator of \beta_2:
var(\hat{\beta}_2^{LS}) = \frac{\sigma^2}{(1 - r_{12}^2) \sum_{i=1}^{N} (x_{i2} - \bar{x}_2)^2}, (22)
where r_{12} is the correlation between x_1 and x_2.
Collinearity II
Extreme case: if r_{12} = 1, then x_1 and x_2 are perfectly collinear. In this case the least squares estimator is not defined and we cannot obtain the least squares estimates.
If r_{12}^2 is large, then:
- the standard errors are large, implying small (in absolute value) t statistics; typically this leads to the conclusion that the parameter estimates are not significantly different from zero,
- estimates may be very sensitive to the inclusion or exclusion of a few observations,
- estimates may be very sensitive to the exclusion of insignificant variables.
Identifying and mitigating collinearity
Detecting collinearity:
- pairwise correlations between explanatory variables,
- the variance inflation factor (VIF), calculated for each explanatory variable. The VIF is a function of the R^2 from the auxiliary regression of the selected explanatory variable on the remaining explanatory variables:
VIF_i = \frac{1}{1 - R_i^2}. (23)
Values above 10 suggest collinearity.
Dealing with collinearity:
- obtaining more information,
- using non-sample information, i.e., restrictions on parameters.
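A sketch of (23) via the auxiliary regressions. The data are an illustrative assumption: x_2 is built to be nearly collinear with x_1, while x_3 is unrelated, so the first two VIFs should be large and the third close to 1.

```python
import numpy as np

def vif(X):
    """VIF for each column of X (regressors without the intercept)."""
    n, k = X.shape
    out = []
    for j in range(k):
        target = X[:, j]
        # Auxiliary regression of column j on the remaining columns
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        b, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ b
        r2 = 1 - resid @ resid / ((target - target.mean()) ** 2).sum()
        out.append(1 / (1 - r2))
    return np.array(out)

rng = np.random.default_rng(6)
N = 300
x1 = rng.normal(size=N)
x2 = 0.95 * x1 + 0.05 * rng.normal(size=N)   # nearly collinear with x1
x3 = rng.normal(size=N)                       # unrelated regressor
vifs = vif(np.column_stack([x1, x2, x3]))
print(vifs)
```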