PRACTITIONER'S CORNER: Logarithmic Dependent Variables and Prediction Bias



PRACTITIONER'S CORNER*

Logarithmic Dependent Variables and Prediction Bias

Peter Kennedy

It is common in econometrics to find estimation undertaken via a regression in which the regressand is the logarithm of the original dependent variable. In many such studies interest focuses exclusively on the slope coefficients, and under the usual assumptions regarding how the data were generated, the slope coefficient estimates are best linear unbiased. But often attention is also paid to prediction of $Y$, the original dependent variable. In such cases the usual method of estimating $Y$, taking the exponential of the estimate $\hat{y}$ of $y = \ln Y$, produces biased predictions. Although this result has been known for some time, its application to the prediction context has not been stressed, and an examination of recent econometric studies suggests that practitioners do not realize that this bias can in some circumstances be large enough to warrant an adjustment. Such circumstances are most likely to arise in the context of simulation studies, since the variance of the predicted $y$ can be large when prediction is undertaken outside the bounds of the data set.

Consider for expository purposes the estimating problem:

$$Y = \alpha X^{\beta} \exp(\varepsilon) \qquad (1)$$

where $X$ is an independent variable and $\alpha$ and $\beta$ are fixed parameters. (Generalization to the case of $Y = \alpha X_1^{\beta_1} X_2^{\beta_2} \cdots X_K^{\beta_K} \exp(\varepsilon)$ is straightforward. It should also be clear that a similar bias exists for the semi-logarithmic estimating form.) $\exp(\varepsilon)$ is assumed to be a lognormally distributed multiplicative error term with mean one, so that the mean of $Y$ conditional on $X$ is $\alpha X^{\beta}$, the usual interpretation attached to such algebraic specifications. (See, e.g., Goldberger (1968a), p. 3.) It is often the case that researchers instead assume that $E\varepsilon = 0$, implying that the conditional expectation of $Y$ is $EY = \alpha X^{\beta} \exp(\tfrac{1}{2}\sigma^2)$, where $\sigma^2$ is the variance of $\varepsilon$. However, the results to follow do not depend on which of these two specifications is adopted. The model (1) gives rise to the log-linear estimating form:

$$y = \gamma + \beta x + \varepsilon$$

* The purpose of Practitioner's Corner is to publish brief methodological notes of interest to applied economists. The Editors welcome submissions of this sort.


390 BULLETIN

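where $y = \ln Y$, $x = \ln X$ and $\gamma = \ln \alpha$. Here $\varepsilon$ is normally distributed with variance $\sigma^2$ and mean $-\tfrac{1}{2}\sigma^2$. This follows from the well-known fact (see Aitchison and Brown (1957)) that if $w$ is distributed normally with mean $\mu$ and variance $\sigma^2$, then $\exp(w)$ is distributed lognormally with mean $\exp(\mu + \tfrac{1}{2}\sigma^2)$; setting this mean equal to one requires $\mu = -\tfrac{1}{2}\sigma^2$.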
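The lognormal mean formula can be checked numerically. The sketch below is a Monte Carlo illustration in Python (standard library only); the function name `lognormal_mean_mc`, the value $\sigma^2 = 0.3$ and the sample size are illustrative assumptions, not part of the original analysis. It covers both error specifications discussed above: mean $-\tfrac{1}{2}\sigma^2$ (so that $\exp(\varepsilon)$ has mean one) and mean zero (so that $\exp(\varepsilon)$ has mean $\exp(\tfrac{1}{2}\sigma^2)$).

```python
import math
import random

def lognormal_mean_mc(mu, sigma2, n=100_000, seed=1):
    """Monte Carlo estimate of E[exp(w)] for w ~ N(mu, sigma2)."""
    rng = random.Random(seed)
    sd = math.sqrt(sigma2)
    return sum(math.exp(rng.gauss(mu, sd)) for _ in range(n)) / n

sigma2 = 0.3  # illustrative error variance (assumption)

# Mean-one specification: epsilon ~ N(-sigma2/2, sigma2), so E[exp(epsilon)] = 1.
print(lognormal_mean_mc(-0.5 * sigma2, sigma2))
# Zero-mean specification: epsilon ~ N(0, sigma2), so E[exp(epsilon)] = exp(sigma2/2).
print(lognormal_mean_mc(0.0, sigma2), math.exp(0.5 * sigma2))
```

The two printed averages should sit close to 1 and to $\exp(0.15) \approx 1.16$ respectively, differing only by simulation noise.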

Regressing $y$ on a constant and $x$ produces an unbiased estimate $\hat{\beta}$ of $\beta$ and a biased estimate $\hat{\gamma}$ of $\gamma$, where the 'bias' of $\hat{\gamma}$ is $-\tfrac{1}{2}\sigma^2$. In most estimation contexts $\sigma^2$ is expected to be quite small, so this bias may be of little consequence, particularly when compared to the potential bias discussed below. Moreover, $\gamma - \tfrac{1}{2}\sigma^2 = \ln \alpha - \tfrac{1}{2}\sigma^2$ is estimated unbiasedly.
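A small simulation can make the intercept result concrete. In this sketch (Python, standard library only) the helpers `ols_simple` and `average_estimates` are hypothetical, and the parameter values $\alpha = 2$, $\beta = 0.8$, $\sigma^2 = 0.4$ and the fixed regressor values are arbitrary choices for illustration. Averaged over replications, $\hat{\beta}$ should centre on $\beta$, while $\hat{\gamma}$ centres on $\gamma - \tfrac{1}{2}\sigma^2$ rather than on $\gamma$.

```python
import math
import random

def ols_simple(x, y):
    """OLS of y on a constant and x; returns (intercept, slope)."""
    T = len(x)
    xbar = sum(x) / T
    ybar = sum(y) / T
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

def average_estimates(alpha=2.0, beta=0.8, sigma2=0.4, T=50, reps=2000, seed=7):
    """Average OLS estimates over replications of the log-linear model,
    with epsilon ~ N(-sigma2/2, sigma2) so that E[exp(epsilon)] = 1."""
    rng = random.Random(seed)
    gamma = math.log(alpha)
    x = [math.log(1.0 + t) for t in range(T)]  # fixed regressor values (illustrative)
    sd = math.sqrt(sigma2)
    sum_g = sum_b = 0.0
    for _ in range(reps):
        y = [gamma + beta * xi + rng.gauss(-0.5 * sigma2, sd) for xi in x]
        g_hat, b_hat = ols_simple(x, y)
        sum_g += g_hat
        sum_b += b_hat
    # Return mean intercept, mean slope, and the value the intercept should centre on.
    return sum_g / reps, sum_b / reps, gamma - 0.5 * sigma2

g_bar, b_bar, target = average_estimates()
print(g_bar, target)  # mean intercept vs gamma - sigma2/2
print(b_bar)          # mean slope vs beta = 0.8
```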

Because $\varepsilon$ is distributed normally, $\hat{\beta}$ and $\hat{\gamma}$ are jointly normally distributed, and any prediction $\hat{y} = \hat{\gamma} + \hat{\beta}x$ is normally distributed. The usual way of predicting $Y$ is via the transformation $\hat{Y} = \exp(\hat{y})$. The bias that this traditional method introduces into $\hat{Y}$ as a prediction of $EY = \alpha X^{\beta}$ is revealed by:

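$$
\begin{aligned}
E\hat{Y} &= \exp\left[E\hat{y} + \tfrac{1}{2}V(\hat{y})\right] \\
&= \exp\left[\gamma - \tfrac{1}{2}\sigma^2 + \beta x + \tfrac{1}{2}V(\hat{y})\right] = EY \exp\left[-\tfrac{1}{2}\sigma^2 + \tfrac{1}{2}V(\hat{y})\right] \\
&= EY \exp\left[-\tfrac{1}{2}\sigma^2\left(1 - z'(Z'Z)^{-1}z\right)\right]
\end{aligned}
$$

where $V(\hat{y}) = \sigma^2 z'(Z'Z)^{-1}z$ is the variance of $\hat{y}$, $z$ is a vector of values of the independent variables in the log-linear estimating equation for the prediction in question, and $(Z'Z)^{-1}$ is the inverse of the relevant second-moment matrix $Z'Z$ of the data.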
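For the two-regressor case of model (1), the quadratic form $z'(Z'Z)^{-1}z$ and the resulting bias factor can be computed directly from the regressor values. The sketch below assumes $z' = (1, x_0)$ and writes out the $2 \times 2$ inverse explicitly; the function names `leverage` and `bias_factor` and the illustrative data are hypothetical. It treats $\sigma^2$ as known, so it shows the population bias factor rather than an estimate of it.

```python
import math

def leverage(x, x0):
    """z'(Z'Z)^{-1}z for a regression on a constant and x, with z = (1, x0).

    Z'Z = [[T, sum x], [sum x, sum x^2]]; its 2x2 inverse is written out directly.
    """
    T = len(x)
    sx = sum(x)
    sxx = sum(xi * xi for xi in x)
    det = T * sxx - sx * sx
    # (Z'Z)^{-1} = (1/det) * [[sxx, -sx], [-sx, T]], sandwiched between z' and z
    return (sxx - 2.0 * sx * x0 + T * x0 * x0) / det

def bias_factor(sigma2, x, x0):
    """E[Y_hat]/E[Y] = exp[-sigma2/2 * (1 - z'(Z'Z)^{-1}z)] for the naive Y_hat = exp(y_hat)."""
    return math.exp(-0.5 * sigma2 * (1.0 - leverage(x, x0)))

x = [0.1 * t for t in range(20)]               # illustrative log-regressor values, 0.0 to 1.9
print(bias_factor(0.2, x, sum(x) / len(x)))  # at the sample mean
print(bias_factor(0.2, x, 6.0))              # far outside the data
```

At the sample mean the factor is below one (the median effect dominates); at $x_0 = 6$ the quadratic form exceeds one and the factor rises above one.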

The first term in the exponent of the multiplicative bias expression, $-\tfrac{1}{2}\sigma^2$, does not become zero asymptotically, but is likely to be close to zero if the fit of the log-linear equation is good. As noted by Goldberger (1968), it reflects the fact that the traditional formula estimates the conditional median of $Y$ rather than the conditional mean. The second term, $\tfrac{1}{2}V(\hat{y})$, does approach zero asymptotically, but can differ markedly from zero, even in large samples, if prediction is attempted outside the data set. As every student of econometrics knows, the variance of prediction is smallest at the sample mean of the independent variables and grows with the square of the distance from this mean. (See, e.g., Johnston (1972), p. 40.)
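For the two-variable model (1), with $z' = (1, x_0)$, the quadratic form reduces to the familiar simple-regression expression, which makes the dependence on distance from the sample mean explicit. This is a standard algebraic result, shown here as a worked sketch rather than taken from the original note:

```latex
z'(Z'Z)^{-1}z = \frac{1}{T} + \frac{(x_0 - \bar{x})^2}{\sum_{t=1}^{T} (x_t - \bar{x})^2}

E\hat{Y} = EY \exp\left[-\tfrac{1}{2}\sigma^2\left(1 - \frac{1}{T} - \frac{(x_0 - \bar{x})^2}{\sum_{t=1}^{T} (x_t - \bar{x})^2}\right)\right]
```

At $x_0 = \bar{x}$ the factor is $\exp[-\tfrac{1}{2}\sigma^2(1 - 1/T)] < 1$, so $\hat{Y}$ understates $EY$; the traditional predictor overstates $EY$ exactly when $(x_0 - \bar{x})^2 > (1 - 1/T)\sum_t (x_t - \bar{x})^2$, that is, when $z'(Z'Z)^{-1}z > 1$.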

Predicting inside the data set, for example to construct a table comparing actual and predicted $Y$'s or to graph actual against predicted $Y$'s, is probably not misleading. Outside the data set, however, the variances attached to predictions of $y$ can become quite large, so that the usual prediction method overestimates $Y$. Thus results such as the world trade simulations of Ripley (1980), the demand for money forecasts of Laumas and Spencer (1980), the calculation by Bond (1980) of the impact on wage inflation of a 25 per cent change in unemployment, and the estimation by Tanzi (1980) of the impact on the demand for cash of a zero tax rate could be misleading. In cases such as these, where the variance of $\hat{y}$ could be expected to be large, an alternative estimator of $Y$, designed to reduce the resulting bias, may be appropriate. It must be noted, though, that correcting for bias may worsen mean square error, providing a rationale for ignoring the adjustment suggested below.

The expression derived earlier for $E\hat{Y}$ suggests the alternative estimator¹

$$Y^{*} = \hat{Y} \exp\left[\tfrac{1}{2}\hat{\sigma}^2 - \tfrac{1}{2}\hat{V}(\hat{y})\right]$$

where $\hat{\sigma}^2$ is the usual estimate of $\sigma^2$ and $\hat{V}(\hat{y})$ is the usual estimate of $V(\hat{y})$. ($\hat{\sigma}^2 = SSE/(T - K)$, where $SSE$ is the sum of squared errors from the log-linear regression, $T$ is the sample size, and $K$ is the number of regressors (including the constant); $\hat{V}(\hat{y}) = \hat{\sigma}^2 z'(Z'Z)^{-1}z$. In the example of model (1), $z' = (1, x_0)$, where $x_0$ is the specific value of $x$ corresponding to the $Y$ to be forecast.) Although $Y^{*}$ is straightforward to calculate from the original data, it cannot easily be computed from published results, given the usual reporting standards. The factor $\hat{V}(\hat{y})$ differs for each prediction, since the values of the independent variables differ. Furthermore, knowledge of the entire estimated variance-covariance matrix of the parameter estimates is required, information that is not usually reported when presenting the results of empirical studies. The onus should therefore be on the researcher to indicate the magnitude of the upward bias of predictions generated by the traditional method.
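The estimator $Y^{*}$ is straightforward to compute when the original data are at hand. The sketch below (Python, standard library only) handles the two-variable case of model (1), using $\hat{\sigma}^2 = SSE/(T - K)$ with $K = 2$ and $\hat{V}(\hat{y}) = \hat{\sigma}^2 z'(Z'Z)^{-1}z$ with $z' = (1, x_0)$; the function name `log_linear_predictions` and the toy data are hypothetical. With more regressors, the full estimated variance-covariance matrix would be needed.

```python
import math

def log_linear_predictions(x, y, x0):
    """Fit y = gamma + beta*x + e by OLS (y = ln Y, x = ln X) and predict Y at x = x0.

    Returns (Y_hat, Y_star): the naive exp(y_hat) and the adjusted
    Y_star = Y_hat * exp[sigma2_hat/2 - V_hat(y_hat)/2].
    """
    T, K = len(x), 2                      # K regressors: constant and x
    xbar = sum(x) / T
    ybar = sum(y) / T
    sxx = sum((xi - xbar) ** 2 for xi in x)
    beta_hat = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    gamma_hat = ybar - beta_hat * xbar
    sse = sum((yi - gamma_hat - beta_hat * xi) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = sse / (T - K)            # SSE/(T - K)
    h = 1.0 / T + (x0 - xbar) ** 2 / sxx  # z'(Z'Z)^{-1}z with z' = (1, x0)
    v_hat = sigma2_hat * h                # estimated V(y_hat)
    Y_hat = math.exp(gamma_hat + beta_hat * x0)
    Y_star = Y_hat * math.exp(0.5 * sigma2_hat - 0.5 * v_hat)
    return Y_hat, Y_star

# Hypothetical log-scale data, for illustration only.
x = [0.0, 0.5, 1.0, 1.5, 2.0]
y = [0.7, 1.1, 1.4, 2.0, 2.3]
print(log_linear_predictions(x, y, 1.0))  # at the sample mean of x
print(log_linear_predictions(x, y, 6.0))  # well outside the data
```

Near the sample mean $z'(Z'Z)^{-1}z < 1$, so $Y^{*}$ exceeds $\hat{Y}$; far outside the data the quadratic form exceeds one and $Y^{*}$ is scaled below $\hat{Y}$.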

The magnitude of the bias noted above depends on the size of $\sigma^2$ and of the prediction variance, both of which can vary considerably from study to study. For the example of Goldberger (1968), in which income velocity is predicted by the short-term rate of interest, an interest rate comparable to today's rate generates a net upward bias of around 2 per cent. For the study of Maki and Christensen (1980), in which the industry wage rate is determined by degree of unionism, worker education, proportion of female workers, industry profit and geographical dispersion, prediction of the wage for a zero-profit, geographically concentrated industry employing non-union, all-male, educated workers suffers from a bias of about 2 per cent. These are cases of well-fitting equations with reasonably precise coefficient estimates. When coefficient estimates are less precise, prediction well outside the range of the data, such as is done in the examples cited earlier, could easily produce a bias large enough to cause considerable concern. This is especially true when this lack of precision arises from factors other than a large $\hat{\sigma}^2$; an example could be the presence of multicollinearity, although it is possible for the variance of $\hat{y}$ to be small if $z$ is as 'collinear' as $Z$. (See, e.g., Conlisk (1971).) Prudent researchers should check that their predictions are not subject to a large bias from this source. Reporting the associated prediction standard errors would also be useful.

¹ This alternative, $Y^{*}$, corresponds to that proposed by Meulenberg (1965) for a special case. Although $Y^{*}$ is consistent, it is not unbiased; it is proposed here in preference to the unbiased estimator suggested by Goldberger (1968) on the grounds of its relative computational ease. Goldberger's alternative involves an infinite sum of ratios of gamma functions for which $V(\hat{y})$ is an argument. Giles (1982) suggests that a similar bias correction in another context is quite satisfactory.
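A Monte Carlo sketch can illustrate how far off the traditional predictor may be when prediction is made well outside the range of the data, and how much $Y^{*}$ helps. Everything here is an illustrative assumption: Python's standard library, the hypothetical function name `simulate`, $\alpha = \beta = 1$, $\sigma^2 = 0.5$, $T = 20$ fixed regressor values spread over $[0, 1]$, and a prediction point $x_0 = 3$. Both predictors are reported as ratios to the true conditional mean $EY$.

```python
import math
import random

def simulate(alpha=1.0, beta=1.0, sigma2=0.5, T=20, x0=3.0, reps=2000, seed=42):
    """Average naive and adjusted predictions of Y at ln X = x0, relative to E[Y|X].

    Data follow model (1) in logs: y = ln(alpha) + beta*x + e, e ~ N(-sigma2/2, sigma2),
    so E[Y|X] = alpha * X**beta. Regressor values are fixed on [0, 1] (illustrative).
    """
    rng = random.Random(seed)
    x = [t / (T - 1) for t in range(T)]
    xbar = sum(x) / T
    sxx = sum((xi - xbar) ** 2 for xi in x)
    h = 1.0 / T + (x0 - xbar) ** 2 / sxx          # z'(Z'Z)^{-1}z, same every replication
    true_mean = alpha * math.exp(beta * x0)       # E[Y | ln X = x0]
    sd = math.sqrt(sigma2)
    naive_sum = adj_sum = 0.0
    for _ in range(reps):
        y = [math.log(alpha) + beta * xi + rng.gauss(-0.5 * sigma2, sd) for xi in x]
        ybar = sum(y) / T
        b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
        g = ybar - b * xbar
        s2 = sum((yi - g - b * xi) ** 2 for xi, yi in zip(x, y)) / (T - 2)
        y_hat = math.exp(g + b * x0)              # traditional predictor exp(y_hat)
        naive_sum += y_hat
        # Y* = Y_hat * exp[s2/2 - V_hat/2], with V_hat = s2 * h
        adj_sum += y_hat * math.exp(0.5 * s2 * (1.0 - h))
    return naive_sum / reps / true_mean, adj_sum / reps / true_mean

print(simulate())         # far outside the data: (naive ratio, adjusted ratio)
print(simulate(x0=0.5))   # at the sample mean of x
```

With these settings $z'(Z'Z)^{-1}z$ is roughly 3.4 at $x_0 = 3$, so the bias expression implies a naive ratio of about $\exp(0.61) \approx 1.8$, while $Y^{*}$, being consistent but not unbiased, should land near but not exactly at one. At the sample mean the naive ratio falls below one instead.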

Simon Fraser University, Burnaby, B.C.

REFERENCES

Aitchison, J. and Brown, J. A. C. (1957). The Lognormal Distribution, London, Cambridge University Press.

Bond, Marian (1980). 'Exchange Rates, Inflation and Vicious Circles', IMF Staff Papers, Vol. 27, pp. 679-711.

Conlisk, John (1971). 'When Collinearity is Desirable', Western Economic Journal, Vol. 9, pp. 393-407.

Giles, D. E. A. (1982). 'The Interpretation of Dummy Variables in Semi-logarithmic Equations: Unbiased Estimation', Economics Letters, Vol. 10, pp. 77-79.

Goldberger, A. S. (1968). 'The Interpretation and Estimation of Cobb-Douglas Functions', Econometrica, Vol. 35, pp. 464-72.

Goldberger, A. S. (1968a). Topics in Regression Analysis, New York, Macmillan.

Johnston, J. (1972). Econometric Methods, 2nd edition, New York, McGraw-Hill.

Laumas, G. S. and Spencer, E. (1980). 'The Stability of the Demand for Money: Evidence from the Post-1973 Period', Review of Economics and Statistics, Vol. 62, pp. 455-59.

Maki, Dennis and Christensen, Sandra (1980). 'The Union Wage Effect Re-Examined', Relations Industrielles, Vol. 35, pp. 210-30.

Meulenberg, M. T. G. (1965). 'On the Estimation of an Exponential Function', Econometrica, Vol. 33, pp. 863-68.

Ripley, Duncan (1980). 'The World Model of Merchandise Trade: Simulation Applications', IMF Staff Papers, Vol. 27, pp. 285-319.

Tanzi, Vito (1980). 'The Underground Economy in the United States: Estimates and Implications', Banca Nazionale del Lavoro Quarterly Review, Vol. 135, pp. 427-53.