
Forecast Encompassing Tests and Probability Forecasts

Michael P. Clements∗

Department of Economics, University of Warwick

David I. Harvey

School of Economics, University of Nottingham

June 25, 2004

Abstract

This paper analyses the large and small-sample properties of tests of forecast encompassing when applied to probability forecasts. We consider correlated forecasts, using the distribution described by Cook and Johnson (1981) in both the analytical derivations and the Monte Carlo simulations. The usefulness of the tests is demonstrated with an application to the evaluation of recession probability forecasts obtained from logit models with different leading indicators as explanatory variables.

JEL classification: C12, C15, C53.

Keywords: Probability forecasts, encompassing tests, recession probabilities.

∗Corresponding author. Department of Economics, University of Warwick, Coventry, CV4 7AL, UK. Tel.: +44 (0)24 7652 2990. Email: [email protected]


1 Introduction

There is an extensive literature in economic and management science¹ attesting to the usefulness of forecast combination. That is, a linear combination of two or more forecasts may often yield more accurate forecasts than using a single forecast. The forecast combination literature typically assesses the out-of-sample accuracy of combinations whose weights have been determined in-sample, although a number of studies (e.g., Newbold and Granger (1974), Stock and Watson (1999) and Fildes and Ord (2002)) cite evidence that suggests that forms of combination that do not take account of the relative past performances of the forecasts are often competitive with ‘optimal’ schemes.

Tests of forecast encompassing assess whether ex post a linear combination of forecasts results in a statistically significant reduction in the mean squared forecast error (MSFE) relative to using a particular forecast, and can be viewed as an indicator of when combinations might be useful ex ante, although Chong and Hendry (1986) originally presented such tests as model-specification tests for large-scale models where more conventional tests may be infeasible. Forecast encompassing is formally equivalent to ‘conditional efficiency’ due to Nelson (1972) and Granger and Newbold (1973), whereby a forecast is conditionally efficient if the variance of the forecast error from a combination of that forecast and a rival forecast is not significantly less than that of the original forecast alone.

Tests of forecast encompassing and the closely-related tests of equal forecast accuracy of Diebold and Mariano (1995) are routinely calculated in forecast comparison exercises in applied work. Recent developments focus on the effects of the non-normality of the forecast errors on the test statistics (Harvey, Leybourne and Newbold (1998, 2000)), on the impact of parameter estimation uncertainty on tests of forecast encompassing and other tests of predictive accuracy when the forecasts are model-based (West (1996) and West and McCracken (1998), and especially West (2001)), and on the limiting distributions of the tests when the forecasts are from models which are nested (Clark and McCracken (2001)).

In this paper, we investigate the use of tests of forecast encompassing for evaluating rival probability forecasts. Probability forecasts assign a probability to an event occurring, and so are defined on the open interval (0, 1). Given the historical preoccupation of economic forecasting with point forecasts of economic variables which (often at least in principle) can take any value on the real line, the papers we have cited implicitly adopt this framework. In the recent forecasting literature there has been considerable interest in interval and density forecasts.² These give, respectively, the range within which the outcome is expected to fall with a pre-set probability, and present a complete description of the probabilities assigned to each sub-interval. Continuing in this vein, the forecast user may be primarily interested in whether the probabilities assigned to a variable falling in a certain key range are reasonably accurate.³

¹ See inter alia Diebold and Lopez (1996) and Newbold and Harvey (2002) for recent surveys, and Clemen (1989) for an annotated bibliography.

² See, inter alia, Granger, White and Kamstra (1989), Chatfield (1993), Christoffersen (1998), Engle and Manganelli (1999), Clements and Taylor (2003), Diebold, Gunther and Tay (1998), Tay and Wallis (2000), Wallis (2003).

³ For example, since the Spring of 1997, the Monetary Policy Committee in the UK has been charged with delivering an inflation rate of ±1 percentage points around a specified target, currently 2% according to the consumer price index inflation measure. Whether formalised or not, considerations of this type are an integral part of most Western countries’ macroeconomic stabilisation policies.


Having accurate forecast probabilities of events of particular interest may be more important than the forecast density being correctly calibrated throughout its range.⁴

The forecast accuracy metric for assessing forecast encompassing in the context of point forecasts is usually squared-error loss, with the mean squared forecast error (MSFE) as its empirical counterpart: see Clements and Hendry (1993a, 1993b) for a critique of the use of this loss function. Probability forecasts are commonly evaluated using the quadratic probability score (QPS: see Brier (1950)), defined as:

$$\mathrm{QPS} = \frac{1}{k} \sum_{q=1}^{k} 2\,(c_q - v_q)^2 \qquad (1)$$

where $c_q$ is the forecast probability, and $v_q$ is a binary variable denoting whether or not the event occurs in period $q$, for a sequence of forecasts and outcomes for $q = 1, \ldots, k$. As QPS is proportional to squared-error loss, the adoption of this measure leads to tests of forecast encompassing based on whether the combination of forecasts results in a statistically significant reduction in the expected squared error, as in the point forecast case. We follow the point forecasting literature in considering linear combinations of the forecasts. These take the form $c_{cq} = \alpha + \beta_1 c_{1q} + \beta_2 c_{2q}$, where $c_{1q}$ and $c_{2q}$ are two rival sets of forecasts, and $c_{cq}$ is the combined forecast.

An alternative loss function for evaluating probability forecasts is the log probability score (LPS):

$$\mathrm{LPS} = -\frac{1}{k} \sum_{q=1}^{k} \left[ v_q \ln c_q + (1 - v_q) \ln (1 - c_q) \right] \qquad (2)$$

which penalizes large mistakes more heavily than QPS. The weights $\beta_1$ and $\beta_2$ that define the optimal combined predictor for LPS will not in general match the weights that minimize QPS (or MSFE). In this paper we consider forecast encompassing tests for QPS as it is the most commonly used loss function for probability forecasts. As the same loss function underpins the standard point forecast encompassing tests, this choice also ensures our findings are comparable to those in the point forecasting literature.
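As a minimal sketch of the two scores in equations (1) and (2), the following Python functions compute QPS and LPS for a list of forecast probabilities and binary outcomes. The function names and the small data set are hypothetical and serve only as illustration; the forecasts passed to `lps` are assumed to lie strictly inside (0, 1) so that the logarithms exist.

```python
import math

def qps(forecasts, outcomes):
    """Quadratic probability score, equation (1): mean of 2 * (c_q - v_q)^2."""
    k = len(forecasts)
    return sum(2.0 * (c - v) ** 2 for c, v in zip(forecasts, outcomes)) / k

def lps(forecasts, outcomes):
    """Log probability score, equation (2); needs 0 < c_q < 1 for every q."""
    k = len(forecasts)
    total = sum(v * math.log(c) + (1 - v) * math.log(1.0 - c)
                for c, v in zip(forecasts, outcomes))
    return -total / k

# Hypothetical data: the third forecast is a confident miss (0.95 when v_q = 0).
c = [0.9, 0.2, 0.95, 0.6]
v = [1, 0, 0, 1]
print("QPS:", qps(c, v))
print("LPS:", lps(c, v))
```

QPS is bounded between 0 and 2, whereas LPS grows without limit as a forecast approaches certainty about the wrong outcome, which is the sense in which it penalizes large mistakes more heavily.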

The plan of the paper is as follows. Section 2 outlines tests of forecast encompassing for point forecasts, and Section 3 considers their validity for DGPs involving probability forecasts. Section 4 derives the limiting distributions of the test statistics for probability forecasts, and Section 5 looks at small-sample size and power considerations. An empirical application is presented in Section 6, while Section 7 concludes.

⁴ Event probability forecasts are intimately related to interval forecasts. Event probability forecasts assign a probability to a variable falling in a pre-specified range. Interval forecasts specify a range for a given probability. Both focus on forecasting a particular aspect of the underlying distribution of the random variable. A density forecast can be viewed as being comprised of a sequence of interval forecasts generated by allowing the nominal coverage rate to vary over all values in the unit interval, or alternatively, a sequence of event probability forecasts where the events exhaust the possible set of outcomes. The evaluation of probability forecasts for a specific event therefore assesses one aspect of the underlying distribution.


2 Forecast encompassing tests and point forecasts

A set of forecasts $\{c_{1q}\}_{q=1}^{k}$ is said to forecast encompass another $\{c_{2q}\}_{q=1}^{k}$ if the latter contains no useful information not already present in the former for forecasting $\{v_q\}_{q=1}^{k}$. $c_{fq}$ is the forecast of period $q$ made at period $q-1$, so the forecasts are one-step ahead. ‘Useful’ is to be understood in the mean squared error sense that a linear combination of $c_{1q}$ and $c_{2q}$ has an MSFE not significantly smaller than that of $c_{1q}$ alone. We define the forecast errors as $b_{fq} = v_q - c_{fq}$, $f = 1, 2$.

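Forecast encompassing tests are based on one of the following three regressions:

FE (1): $v_q = \alpha + \beta_1 c_{1q} + \beta_2 c_{2q} + \varepsilon_q$

FE (2): $v_q = \alpha' + (1 - \beta_2')\, c_{1q} + \beta_2' c_{2q} + \varepsilon_q$

or, equivalently: $b_{1q} = \alpha' + \beta_2' (b_{1q} - b_{2q}) + \varepsilon_q$

FE (3): $v_q = \alpha'' + c_{1q} + \beta_2'' c_{2q} + \varepsilon_q$

or, equivalently: $b_{1q} = \alpha'' + \beta_2'' c_{2q} + \varepsilon_q$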

The null hypothesis that $c_{1q}$ forecast encompasses $c_{2q}$ is $\beta_2 = 0$ in FE (1), and similarly $\beta_2' = 0$ and $\beta_2'' = 0$ for FE (2) and FE (3) respectively. Regression FE (1) is the most general formulation, in that the coefficient on $c_{1q}$ is left unrestricted under both the null and alternative. The alternative is generally that $\beta_2 \neq 0$, so that the null would be rejected if $c_{2q}$ entered the combination with a (significantly) negative weight. Relative to FE (1), the second regression tests $\beta_2' = 0$ with $\beta_1 + \beta_2 = 1$ as part of the maintained hypothesis, so that only convex combinations are allowed, and the alternative is often specified as $\beta_2' > 0$. FE (3) tests $\beta_2'' = 0$ but with a maintained hypothesis of $\beta_1 = 1$. FE (3) is the form of the original Chong and Hendry (1986) forecast encompassing test. Each of the above regressions allows the individual forecasts to be biased. Under the null $\beta_2 = 0$, $c_{1q}$ is unbiased when $\alpha = (1 - \beta_1) B(c_{1q})$, where $B(\cdot)$ denotes expectation; this condition simplifies to $\alpha = 0$ when $\beta_1 = 1$, as assumed in FE (2) and FE (3) under the null.
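The three regressions can be sketched in code as follows. This is an illustrative implementation that assumes NumPy is available; the helper names `t_ratio_last` and `encompassing_t_ratios` are hypothetical, and the statistic returned is the conventional least squares t-ratio on the coefficient attached to the rival forecast (or to the error differential in FE (2)).

```python
import numpy as np

def t_ratio_last(y, X):
    """OLS of y on X; return the conventional t-ratio on the last column of X."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    k, p = X.shape
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (k - p)            # usual least squares error variance
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return coef[-1] / np.sqrt(cov[-1, -1])

def encompassing_t_ratios(v, c1, c2):
    """t-ratios for the null that c1 forecast encompasses c2 in FE (1)-(3)."""
    v, c1, c2 = (np.asarray(a, dtype=float) for a in (v, c1, c2))
    ones = np.ones_like(v)
    b1, b2 = v - c1, v - c2                      # forecast errors b_fq = v_q - c_fq
    return {
        "FE(1)": t_ratio_last(v, np.column_stack([ones, c1, c2])),
        "FE(2)": t_ratio_last(b1, np.column_stack([ones, b1 - b2])),
        "FE(3)": t_ratio_last(b1, np.column_stack([ones, c2])),
    }

# Hypothetical illustration: outcomes driven by c1 only, c2 independent noise.
rng = np.random.default_rng(0)
c1, c2 = rng.uniform(size=500), rng.uniform(size=500)
v = (c1 > rng.uniform(size=500)).astype(float)
print(encompassing_t_ratios(v, c1, c2))
```

FE (2) and FE (3) are run in their forecast-error forms, so each has a constant and a single regressor, while FE (1) leaves the coefficient on $c_{1q}$ unrestricted.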

When applied to testing forecast encompassing of point forecasts, FE (2) is sometimes preferred over FE (3). If $v_q$ is integrated of order 1 ($v_q \sim F(1)$), then provided both $c_{1q}$ and $c_{2q}$ cointegrate with $v_q$ with cointegrating vector $[1 : -1]$, i.e., $v_q - c_{1q} \sim F(0)$, $v_q - c_{2q} \sim F(0)$, FE (2) is balanced in terms of the orders of integration of the dependent variable and the regressor matching under both the null and alternative. However, in these circumstances FE (3) regresses an $F(0)$ variable on an $F(1)$ regressor.

Apart from such considerations arising from the time series properties of the data, if we assume $v_q \sim F(0)$, then under standard assumptions about the forecast errors the t-ratios on $\beta_2$, $\beta_2'$ and $\beta_2''$ have limiting null distributions which are standard normal in all three regressions. Harvey et al. (1998) consider the effects of conditionally heteroscedastic forecast errors in FE (2) resulting from non-normal forecast errors, and show that the limiting distribution of the standard test of $\beta_2' = 0$ will no longer be standard normal, and that using standard normal $K(0, 1)$ critical values will lead to over-rejection of the null. They propose the use of modified tests using heteroscedasticity-robust standard errors, as well as procedures adapting the Diebold and Mariano (1995) test for equal forecast accuracy to the problem of testing for forecast encompassing.
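To illustrate the idea of a heteroscedasticity-robust t-ratio, the sketch below computes the conventional and a White-type robust t-ratio on the slope of FE (2) in its forecast-error form. The robust variance used here is the basic sandwich form for a regression with a constant and one regressor; this is an assumption made for illustration and is not claimed to be the exact modification proposed by Harvey et al. (1998). The function name is hypothetical.

```python
import math
import random

def fe2_t_ratios(v, c1, c2):
    """Return (conventional, White-robust) t-ratios on the FE (2) slope."""
    k = len(v)
    y = [vq - a for vq, a in zip(v, c1)]         # b_1q = v_q - c_1q
    x = [b - a for a, b in zip(c1, c2)]          # b_1q - b_2q = c_2q - c_1q
    xbar, ybar = sum(x) / k, sum(y) / k
    xd = [xi - xbar for xi in x]
    sxx = sum(d * d for d in xd)
    slope = sum(d * yi for d, yi in zip(xd, y)) / sxx
    resid = [yi - ybar - slope * d for yi, d in zip(y, xd)]
    # Conventional standard error: s^2 / sum of squared deviations of x.
    se_ols = math.sqrt(sum(r * r for r in resid) / (k - 2) / sxx)
    # Sandwich standard error: sum of (x deviation * residual)^2 over sxx^2.
    se_white = math.sqrt(sum((d * r) ** 2 for d, r in zip(xd, resid))) / sxx
    return slope / se_ols, slope / se_white

# Hypothetical illustration with independent uniform forecasts.
rng = random.Random(3)
r1 = [rng.random() for _ in range(2000)]
r2 = [rng.random() for _ in range(2000)]
v = [1.0 if a > rng.random() else 0.0 for a in r1]
print(fe2_t_ratios(v, r1, r2))
```

The two statistics share the same slope estimate and differ only in the standard error, so any difference between them reflects conditional heteroscedasticity in the regression errors.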

3 Forecast encompassing tests and probability forecasts

In this section we consider the validity of the forecast encompassing testing approaches described above when applied to probability forecasts. The testing approaches are analysed for a number of DGP specifications for probability forecasts, and in each case the legitimacy of the three frameworks is considered. Because we begin with the QPS loss function and define forecast encompassing as the condition that $\beta_2 = 0$ in the regression FE (1), the first step is to establish whether the DGPs we consider possess the property of one set of forecasts (taken to be $c_{1q}$) encompassing the other.

3.1 Preliminary data generating processes

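Consider the following DGP:

DGP (1): $v_q = 1(r_{1q} > s_q + \delta), \quad c_{1q} = r_{1q}, \quad c_{2q} = r_{2q}$

where $\delta$ is a constant ($|\delta| < 1$), $r_{1q}$ and $r_{2q}$ are drawn from a bivariate distribution with $R(0, 1)$ marginal distributions (that is, uniform on the unit interval), and $s_q$ is $R(0, 1)$, drawn independently of $r_{1q}$ and $r_{2q}$ for all $q$.⁵ More precisely, we consider $r_{1q}$ and $r_{2q}$ to be drawn from the distribution described by Cook and Johnson (1981), with the joint density given by:

$$c(r_{1q}, r_{2q}) = \frac{\theta + 1}{\theta}\, (r_{1q} r_{2q})^{-\left(\frac{1}{\theta} + 1\right)} \left( r_{1q}^{-1/\theta} + r_{2q}^{-1/\theta} - 1 \right)^{-(\theta + 2)} \qquad (3)$$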

The degree of correlation between $r_{1q}$ and $r_{2q}$, $0 \leq \rho \leq 1$, is determined by the parameter $\theta$, with $\rho \to 0$ as $\theta \to \infty$ and $\rho \to 1$ as $\theta \to 0$. The DGP is chosen to be of this form so that the forecasts have the characteristics of probability forecasts (i.e., they are defined on the unit interval), while also allowing for contemporaneous correlation among the predictions (a property frequently observed in practice). Moreover, $v_q$ depends only on $c_{1q}$, suggesting that this is a candidate for a forecast-encompassing DGP, although that conjecture proves to be incorrect in general, as shown below.
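One way to draw correlated uniform pairs with the joint density in (3) is the gamma-mixture construction sketched below: a shared gamma variate with shape equal to the dependence parameter is combined with two independent unit exponentials. This construction is an assumption of the sketch (it reproduces the Cook-Johnson joint distribution function, but the paper's own simulation code is not shown in the text), and the function names are hypothetical.

```python
import random

def cook_johnson_pair(theta, rng=random):
    """One (r1, r2) draw with uniform margins and Cook-Johnson dependence."""
    g = rng.gammavariate(theta, 1.0)             # shared mixing variable, shape theta
    e1, e2 = rng.expovariate(1.0), rng.expovariate(1.0)
    return (1.0 + e1 / g) ** (-theta), (1.0 + e2 / g) ** (-theta)

def sample_correlation(pairs):
    """Plain sample correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(0)
for theta in (0.1, 0.33, 5.0, 50.0):             # small theta gives strong dependence
    draws = [cook_johnson_pair(theta, rng) for _ in range(5000)]
    print(theta, round(sample_correlation(draws), 3))
```

Small values of the dependence parameter give correlations close to one and large values give nearly independent draws, in line with the limits stated above.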

Drawing on the results in the Appendix, it is apparent that the probability forecasts $c_{1q}$ are in general biased for the actual probabilities (of the event $v_q = 1$). The true probability of $v_q = 1$ for DGP (1) is:

$$B(v_q) = \begin{cases} \frac{1}{2}(1 - 2\delta + \delta^2) & \text{when } \delta > 0 \\ \frac{1}{2} & \text{when } \delta = 0 \\ \frac{1}{2}(1 - 2\delta - \delta^2) & \text{when } \delta < 0 \end{cases}$$

compared with the expected or average forecast probability $B(c_{1q}) = \frac{1}{2}$. Thus $B(v_q) \neq B(c_{1q})$ unless $\delta = 0$. The rival forecasts $c_{2q}$ are similarly biased for DGP (1) when $\delta \neq 0$.

⁵ Note that $v_q$ can also be written as $v_q = 1(r_{1q} > t_q)$, where $t_q \sim R(\delta, 1 + \delta)$.

A reasonable expectation of well conceived forecasts is that they should be unbiased. This property can clearly be imposed on DGP (1) by setting $\delta = 0$, and we denote the resulting simplified DGP, appropriate for unbiased or bias-corrected forecasts, as:

DGP (2): $v_q = 1(r_{1q} > s_q), \quad c_{1q} = r_{1q}, \quad c_{2q} = r_{2q}$

Our analysis continues with this intuitively appealing DGP in the first instance; more general specifications with weaker assumptions (e.g. DGP (1)) are considered subsequently.
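For the case of a positive shift constant in DGP (1), the stated event probability can be checked directly. The short derivation below is a sketch that uses only the independence of $r_{1q}$ and $s_q$ and their uniform distributions on the unit interval; $r$ is a dummy variable of integration.

```latex
\begin{aligned}
B(v_q) &= \Pr(r_{1q} > s_q + \delta)
        = \int_{\delta}^{1} \Pr(s_q < r - \delta)\, dr \\
       &= \int_{\delta}^{1} (r - \delta)\, dr
        = \tfrac{1}{2}(1 - \delta)^2
        = \tfrac{1}{2}\left(1 - 2\delta + \delta^2\right),
        \qquad 0 < \delta < 1 .
\end{aligned}
```

Setting the shift constant to zero gives one half, the mean of the forecasts, which is the unbiased case labelled DGP (2).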

3.2 Forecast encompassing tests under DGP (2)

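For the most general forecast encompassing framework, FE (1):

$$v_q = \alpha + \beta_1 c_{1q} + \beta_2 c_{2q} + \varepsilon_q$$

the population values of the parameters are:

$$\alpha = B(v_q) - \beta_1 B(c_{1q}) - \beta_2 B(c_{2q}) \qquad (4)$$

$$\beta_1 = \frac{S(c_{2q})\,\partial(v_q, c_{1q}) - \partial(c_{1q}, c_{2q})\,\partial(v_q, c_{2q})}{S(c_{1q})\,S(c_{2q}) - [\partial(c_{1q}, c_{2q})]^2} \qquad (5)$$

$$\beta_2 = \frac{S(c_{1q})\,\partial(v_q, c_{2q}) - \partial(c_{1q}, c_{2q})\,\partial(v_q, c_{1q})}{S(c_{1q})\,S(c_{2q}) - [\partial(c_{1q}, c_{2q})]^2} \qquad (6)$$

where $B(\cdot)$, $S(\cdot)$ and $\partial(\cdot, \cdot)$ denote expectation, variance and covariance respectively. Under DGP (2), using results derived in the Appendix:

$$S(c_{1q}) = \partial(v_q, c_{1q}), \qquad \partial(v_q, c_{2q}) = \partial(c_{1q}, c_{2q}) \qquad (7)$$

and substituting into (4)-(6) gives:

$$\alpha = 0, \qquad \beta_1 = 1, \qquad \beta_2 = 0$$

Therefore, DGP (2) constitutes a valid forecast-encompassing DGP, as $\beta_2 = 0$ under FE (1), as required. The value of the QPS for any linear combination of forecasts that attaches a non-zero weight to $c_{2q}$ is no less than for combinations that exclude $c_{2q}$. Moreover, under DGP (2) the $c_{1q}$ forecasts are unbiased and efficient.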
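The population result can be illustrated by simulation: the sketch below draws a long sample from DGP (2) with correlated forecasts and evaluates the sample analogues of (4)-(6). The gamma-mixture sampler for the Cook-Johnson distribution, the sample size, the seed and the function names are all assumptions made for this illustration.

```python
import random

def simulate_dgp2(k, theta, rng):
    """Draw k observations (v, c1, c2) from DGP (2) with Cook-Johnson forecasts."""
    v, c1, c2 = [], [], []
    for _ in range(k):
        g = rng.gammavariate(theta, 1.0)          # shared gamma mixing variable
        r1 = (1.0 + rng.expovariate(1.0) / g) ** (-theta)
        r2 = (1.0 + rng.expovariate(1.0) / g) ** (-theta)
        s = rng.random()                          # independent uniform threshold
        v.append(1.0 if r1 > s else 0.0)
        c1.append(r1)
        c2.append(r2)
    return v, c1, c2

def sample_moment_weights(v, c1, c2):
    """Sample analogues of the population parameters in equations (4)-(6)."""
    k = len(v)
    def mean(x):
        return sum(x) / k
    def cov(x, y):
        mx, my = mean(x), mean(y)
        return sum((a - mx) * (b - my) for a, b in zip(x, y)) / k
    den = cov(c1, c1) * cov(c2, c2) - cov(c1, c2) ** 2
    beta1 = (cov(c2, c2) * cov(v, c1) - cov(c1, c2) * cov(v, c2)) / den
    beta2 = (cov(c1, c1) * cov(v, c2) - cov(c1, c2) * cov(v, c1)) / den
    alpha = mean(v) - beta1 * mean(c1) - beta2 * mean(c2)
    return alpha, beta1, beta2

v, c1, c2 = simulate_dgp2(20000, 0.33, random.Random(42))
print(sample_moment_weights(v, c1, c2))          # estimates of (alpha, beta1, beta2)
```

With a sample of this length the three estimates should lie close to the population values of zero, one and zero, even though the two forecasts are strongly correlated.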

Turning to FE (2):

$$b_{1q} = \alpha' + \beta_2' (b_{1q} - b_{2q}) + \varepsilon_q$$

it is apparent that relative to FE (1) this approach imposes the condition that $\beta_1 + \beta_2 = 1$. This holds under DGP (2), thus FE (2) should also be a valid approach to testing encompassing for this DGP, a result confirmed by examining the population parameters for the FE (2) regression:

$$\alpha' = B(b_{1q}) - \beta_2' B(b_{1q} - b_{2q})$$

$$\beta_2' = \frac{\partial(b_{1q}, b_{1q} - b_{2q})}{S(b_{1q} - b_{2q})} = \frac{S(c_{1q}) - \partial(v_q, c_{1q}) + \partial(v_q, c_{2q}) - \partial(c_{1q}, c_{2q})}{S(c_{1q}) + S(c_{2q}) - 2\,\partial(c_{1q}, c_{2q})}$$

Given (7), we have $\beta_2' = 0$ as required under the null of forecast encompassing; also, since $B(b_{1q}) = 0$, $\alpha' = 0$.

The third testing approach, FE (3):

$$b_{1q} = \alpha'' + \beta_2'' c_{2q} + \varepsilon_q$$

has regression population parameters defined by:

$$\alpha'' = B(b_{1q}) - \beta_2'' B(c_{2q})$$

$$\beta_2'' = \frac{\partial(b_{1q}, c_{2q})}{S(c_{2q})} = \frac{\partial(v_q, c_{2q}) - \partial(c_{1q}, c_{2q})}{S(c_{2q})}$$

As with FE (1) and FE (2), we have $\beta_2'' = 0$ under DGP (2), as required for encompassing, and $\alpha'' = 0$; this is consistent with the fact that FE (3) imposes $\beta_1 = 1$, a restriction that holds under DGP (2).

In the preliminary case considered thus far, the specification given by DGP (2) is a forecast encompassing specification, in the sense that $\beta_2 = 0$ in FE (1). Moreover, the other two testing frameworks are also valid. However, the assumptions underlying DGP (2) may be too restrictive, and we also consider more general DGPs. There are a number of ways in which DGP (2) could be generalized. The most obvious is to return to DGP (1), relaxing the assumption of forecast unbiasedness/bias-correction. Another possibility is to allow the forecasts to follow distributions other than $R(0, 1)$. For example, we might replace $r_{1q}$, $r_{2q}$, $s_q$ in DGP (2) with powers of $R(0, 1)$ random variables. We consider these generalisations next, and find that the equivalence of the three approaches no longer holds in general.

3.3 Forecast encompassing tests under DGP (1)

For the most general forecast encompassing regression, FE (1), the population values of the parameters are given by (4)-(6). The requirement for encompassing is that $\beta_2 = 0$, which holds when:

$$S(c_{1q})\,\partial(v_q, c_{2q}) - \partial(c_{1q}, c_{2q})\,\partial(v_q, c_{1q}) = 0 \qquad (8)$$


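Now under DGP (1), focusing on the case $0 < \delta < 1$ for simplicity, results derived in the Appendix show that:

$$S(c_{1q}) = \frac{1}{12}$$

$$\partial(c_{1q}, c_{2q}) = \frac{\theta + 1}{\theta} \int_0^1 \!\! \int_0^1 r_{1q}^{-1/\theta}\, r_{2q}^{-1/\theta} \left( r_{1q}^{-1/\theta} + r_{2q}^{-1/\theta} - 1 \right)^{-(\theta + 2)} dr_{1q}\, dr_{2q} - \frac{1}{4}$$

$$\partial(v_q, c_{1q}) = \frac{1}{12}(1 - \delta)^2 (1 + 2\delta)$$

$$\partial(v_q, c_{2q}) = \frac{1}{4}(1 - \delta^2) - \int_{\delta}^{1} \!\! \int_0^1 r_{2q}^{-1/\theta} \left( t_q^{-1/\theta} + r_{2q}^{-1/\theta} - 1 \right)^{-(\theta + 1)} dr_{2q}\, dt_q$$

Evaluation of the above integrals for different $\delta$ and $\theta$ shows that (8) does not hold in general, i.e., DGP (1) is not necessarily a forecast encompassing specification. In fact, the only cases where DGP (1) is such that $c_{1q}$ encompasses $c_{2q}$ are when $r_{1q}$ and $r_{2q}$ are independent (i.e., the forecasts are independent so $\partial(v_q, c_{2q}) = \partial(c_{1q}, c_{2q}) = 0$, giving $\beta_2 = 0$), or when $\delta = 0$ (in which case we have DGP (2) and forecast encompassing as examined above).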

For the cases where DGP (1) is a forecast encompassing specification, it is important to examine the validity of the three test approaches. We now consider, therefore, the situation where the forecasts and actuals are given by DGP (1), but with $r_{1q}$ and $r_{2q}$ independent. FE (1) is valid by construction, given that the corresponding regression is used in the encompassing definition.

With regard to FE (2), the test will be valid if $\beta_2' = 0$, i.e., if:

$$S(c_{1q}) - \partial(v_q, c_{1q}) + \partial(v_q, c_{2q}) - \partial(c_{1q}, c_{2q}) = 0 \qquad (9)$$

Given independence of $r_{1q}$ and $r_{2q}$, this simplifies to:

$$S(c_{1q}) - \partial(v_q, c_{1q}) = 0 \qquad (10)$$

which, given the above results, does not hold when $\delta \neq 0$. Tests based on FE (2) are therefore not valid in these circumstances. The reason for this test failure is that the assumption underlying the FE (2) approach, $\beta_1 + \beta_2 = 1$, is violated for DGP (1). This can be seen from the fact that, given independence of $r_{1q}$ and $r_{2q}$, $\beta_1$ can easily be shown to be:

$$\beta_1 = \frac{\partial(v_q, c_{1q})}{S(c_{1q})}$$

Since (10) does not hold when $\delta \neq 0$, we have $\beta_1 \neq 1$, and given $\beta_2 = 0$, the assumption that $\beta_1 + \beta_2 = 1$ is violated.

The FE (3) approach is valid if $\beta_2'' = 0$, i.e., if:

$$\partial(v_q, c_{2q}) - \partial(c_{1q}, c_{2q}) = 0 \qquad (11)$$

It is clear from the independence of $r_{1q}$ and $r_{2q}$ that (11) holds, ensuring the validity of FE (3) in this case. This is interesting given that FE (3) assumes $\beta_1 = 1$, a condition that is not met here; it appears that the effect of incorrectly imposing $\beta_1 = 1$ is absorbed by the constant term in the regression, with $\alpha = -\delta(1 - \delta)^2$ but $\alpha'' = \delta\left(\frac{\delta}{2} - 1\right)$.


To summarise the findings for DGP (1), and its simplified version DGP (2), we can consider four possible scenarios, where the forecasts are (i) independent and unbiased, (ii) correlated and unbiased, (iii) independent and biased, and (iv) correlated and biased. Cases (i)-(iii) give rise to forecast encompassing specifications, while (iv) does not. For the three encompassing specifications, FE (1) and FE (3) are valid testing approaches in each case; however, FE (2) is not valid for case (iii) since the bias in $c_{1q}$ is such that $\beta_1 + \beta_2 \neq 1$.
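Case (iii), independent but biased forecasts, can be made concrete with exact arithmetic. The sketch below evaluates the population quantities for DGP (1) with independent forecasts and a shift constant between zero and one, using the closed-form moments quoted in this sub-section; the function name and dictionary keys are hypothetical.

```python
from fractions import Fraction

def dgp1_independent_population(delta):
    """Population values under DGP (1), independent forecasts, 0 < delta < 1."""
    d = Fraction(delta)
    var_c = Fraction(1, 12)                           # S(c_1q) = S(c_2q)
    cov_vc1 = var_c * (1 - d) ** 2 * (1 + 2 * d)      # covariance of v_q and c_1q
    mean_v = (1 - d) ** 2 / 2                          # B(v_q) for delta > 0
    beta1 = cov_vc1 / var_c                            # FE (1) weight on c_1q
    return {
        "beta1": beta1,
        "alpha": mean_v - beta1 / 2,                   # FE (1) constant, from (4)
        "fe2_slope": (var_c - cov_vc1) / (2 * var_c),  # left side of (10) over S(c1)+S(c2)
        "fe3_slope": Fraction(0),                      # (11) holds under independence
        "fe3_const": mean_v - Fraction(1, 2),          # B(b_1q)
    }

for key, value in dgp1_independent_population(Fraction(1, 5)).items():
    print(key, value)
```

For a shift of one fifth the FE (2) slope is 13/250 rather than zero, whereas the FE (3) slope is exactly zero, with the bias absorbed by the FE (3) constant of -9/50.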

3.4 Forecast encompassing tests under an alternative DGP

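As discussed above, we now consider an alternative generalisation of DGP (2) involving powers of $R(0, 1)$ random variables:

DGP (3): $v_q = 1(r_{1q}^{d} > s_q^{e}), \quad c_{1q} = r_{1q}^{g}, \quad c_{2q} = r_{2q}^{h}$

Under this DGP, results derived in the Appendix give:

$$B(v_q) = \frac{e}{d + e}, \qquad B(c_{1q}) = \frac{1}{g + 1}, \qquad B(c_{2q}) = \frac{1}{h + 1}$$

It can easily be shown that $c_{1q}$ and $c_{2q}$ are unbiased if and only if $g = \frac{d}{e}$ and $h = \frac{d}{e}$ respectively. The Appendix also shows that under DGP (3):

$$S(c_{1q}) = \frac{g^2}{(2g + 1)(g + 1)^2}$$

$$\partial(c_{1q}, c_{2q}) = B(r_{1q}^{g} r_{2q}^{h}) - \frac{1}{(g + 1)(h + 1)}$$

$$\partial(v_q, c_{1q}) = \frac{deg}{(d + ge + e)(d + e)(g + 1)}$$

$$\partial(v_q, c_{2q}) = B(r_{1q}^{d/e} r_{2q}^{h}) - \frac{e}{(d + e)(h + 1)}$$

where

$$B(r_{1q}^{\kappa} r_{2q}^{\lambda}) = \frac{\theta + 1}{\theta} \int_0^1 \!\! \int_0^1 r_{1q}^{\kappa - \left(\frac{1}{\theta} + 1\right)}\, r_{2q}^{\lambda - \left(\frac{1}{\theta} + 1\right)} \left( r_{1q}^{-1/\theta} + r_{2q}^{-1/\theta} - 1 \right)^{-(\theta + 2)} dr_{1q}\, dr_{2q} \qquad (12)$$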

Recall that the condition required for forecast encompassing is (8). There are three general cases for which encompassing holds. First, if the $c_{1q}$ forecasts are unbiased, i.e., $g = \frac{d}{e}$, then it follows directly that both $S(c_{1q}) = \partial(v_q, c_{1q})$ and $\partial(v_q, c_{2q}) = \partial(c_{1q}, c_{2q})$, establishing that (7) holds. This is a sufficient condition for (8), $\beta_2 = 0$ (for all degrees of correlation between $r_{1q}$ and $r_{2q}$). The second general case is when $r_{1q}$ and $r_{2q}$ (and hence the forecasts) are independent, regardless of the values of $d$, $e$, $g$ and $h$. Finally, there also exist configurations of $d$, $e$, $g$, $h$ and $\theta$ which result in (8) being satisfied, but where the forecasts are biased and correlated. As in the previous sub-section, we now consider the validity of FE (2) and FE (3) for these three situations.


First consider the case where $g = \frac{d}{e}$. Since (7) is satisfied, it is clear that $\beta_2' = 0$ and $\beta_2'' = 0$, validating the FE (2) and FE (3) tests. This follows since we also have $\alpha = 0$ and $\beta_1 = 1$ here, so $c_{1q}$ is efficient and the assumptions underpinning FE (2) and FE (3) are appropriate.

With regard to the second case of $r_{1q}$, $r_{2q}$ independence, FE (2) will not always be valid. Given forecast independence, $\beta_2' = 0$ when (10) is true, a condition which holds for some combinations of $d$, $e$ and $g$ but not others; note that the value of $h$ is irrelevant here. For example, as outlined above, if $g = \frac{d}{e}$, (10) holds, we have $\alpha = 0$ and $\beta_1 = 1$, and FE (2) is valid. Similarly, if $d = g = 1$ and $e = \frac{1}{2}$, (10) again holds and FE (2) can be used; here $\alpha \neq 0$ but $\beta_1 = 1$, so although the forecasts are biased, the crucial condition $\beta_1 + \beta_2 = 1$ is maintained. On the other hand, there are many parameter configurations for which $\beta_1 + \beta_2 \neq 1$, leading to FE (2) failure. For example, suppose $d = e = g = h = 2$, so that $r_{1q}$, $r_{2q}$ and $s_q$ in DGP (3) are squares of independent $R(0, 1)$ random variables; in this case (10) is violated and $\beta_2' \neq 0$.

In contrast, whenever $r_{1q}$ and $r_{2q}$ are independent, (11) must be true for all $d$, $e$, $g$ and $h$. This implies $\beta_2'' = 0$, with FE (3) again being a valid approach to testing for forecast encompassing, even if $\beta_1 \neq 1$.
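The parameter configurations discussed for the independent case can be checked with the closed-form moments of DGP (3). The sketch below uses exact rational arithmetic to evaluate the left-hand side of (10), the variance of $c_{1q}$ minus its covariance with $v_q$; the function names are hypothetical.

```python
from fractions import Fraction

def var_c1(g):
    """S(c_1q) under DGP (3): g^2 / ((2g + 1)(g + 1)^2)."""
    g = Fraction(g)
    return g ** 2 / ((2 * g + 1) * (g + 1) ** 2)

def cov_v_c1(d, e, g):
    """Covariance of v_q and c_1q under DGP (3)."""
    d, e, g = Fraction(d), Fraction(e), Fraction(g)
    return d * e * g / ((d + g * e + e) * (d + e) * (g + 1))

def condition_10_gap(d, e, g):
    """Left-hand side of (10); zero means FE (2) is valid under independence."""
    return var_c1(g) - cov_v_c1(d, e, g)

print(condition_10_gap(3, 2, Fraction(3, 2)))    # unbiased case, g = d/e
print(condition_10_gap(1, Fraction(1, 2), 1))    # biased, but beta1 = 1
print(condition_10_gap(2, 2, 2))                 # squares of uniforms
```

The first two configurations give a gap of exactly zero, while the squared-uniform configuration gives 1/180, so (10) fails and the FE (2) slope is non-zero there.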

Lastly, consider the remaining class of encompassing possibilities, where (8) holds but the forecasts are correlated and biased. Such cases cannot be examined analytically, but can be investigated using numerical integration of the above moments. An example is when $d = g = 1$, $e = h = 2$, and $\theta = 0.330$ (giving $\rho = 0.724$). In these circumstances, neither FE (2) nor FE (3) is an appropriate forecast encompassing test, because $S(c_{1q}) \neq \partial(v_q, c_{1q})$ and $\partial(v_q, c_{2q}) \neq \partial(c_{1q}, c_{2q})$, violating (9) and (11) so that neither $\beta_2'$ nor $\beta_2''$ is zero. It also follows that $\alpha \neq 0$ and $\beta_1 \neq 1$.

To conclude this section, we find that forecast encompassing holds for DGP (2), while for more general specifications such as DGP (1) and DGP (3), supplemental conditions must also be satisfied for $c_{1q}$ to encompass $c_{2q}$. Given a forecast encompassing DGP, FE (1) is a legitimate approach to testing for encompassing by definition, while FE (2) and FE (3) can be invalid, in the sense that the population value of the parameters $\beta_2'$ and $\beta_2''$ can differ from the hypothesized value of zero under the null. The question posed in this section considers which forecast encompassing tests are valid approaches for DGPs involving probability forecasts; the answer essentially lies in the extent to which the required assumptions for each approach are met for typical probability forecast DGPs, and the degree of robustness to these assumptions being violated. Of the three testing frameworks, FE (1) is clearly the most reliable. An appealing feature of FE (3) relative to FE (2) is that in some of the encompassing situations considered where $\beta_1 \neq 1$, the former remains valid while the latter fails, suggesting greater reliability of FE (3). To this end, the preference for FE (2) over FE (3) in the point forecast evaluation literature may be reversed for probability forecasts, while FE (1) might be considered superior overall. Equally, if $\beta_1 = 1$ is ensured, perhaps through prior bias-correction, then all approaches will constitute valid testing procedures in the sense in which we are using that term in this section. That said, an evaluation of the approaches also depends on the limiting distributions of the test statistics, as well as finite-sample issues of size and power. We turn to these issues in the next section.


4 Test statistics and asymptotic distributions

In this section we present forecast encompassing test statistics for use with FE (1), FE (2) and FE (3), and examine their asymptotic distributions. The standard test for forecast encompassing is the t-ratio on $\hat\beta_2$, $\hat\beta_2'$ or $\hat\beta_2''$ in the FE (1), FE (2) or FE (3) regression respectively. We begin by deriving the asymptotic properties of this test, denoted $O$, under DGP (2). This DGP is chosen as it is an intuitively appealing probability forecast encompassing specification for which FE (1), FE (2) and FE (3) are valid, as well as being representative of other encompassing DGPs in terms of the results for $O$.

4.1 Limiting distribution of FE (1) O test under DGP (2)

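Consider denoting the FE (1) regression in the general form:

$$v_q = U_q' \beta + \varepsilon_q, \qquad q = 1, \ldots, k \qquad (13)$$

where $U_q' = [1, c_{1q}, c_{2q}]$ and $\beta' = [\alpha, \beta_1, \beta_2]$, and in matrix form:

$$v = U\beta + \varepsilon \qquad (14)$$

Under standard assumptions (see, for example, White (1984, Theorem 5.3, p.109)), which hold here:

$$k^{1/2}(\hat\beta - \beta) \xrightarrow{d} K(0, A)$$

where

$$A = J^{-1} N J^{-1}, \qquad J = B(U_q U_q'), \qquad N = S\!\left( k^{-1/2} \sum_{q=1}^{k} U_q \varepsilon_q \right) \qquad (15)$$

Further, under the forecast encompassing null implied by DGP (2), $\beta_2 = 0$, i.e.:

$$k^{1/2} \hat\beta_2 \xrightarrow{d} K(0, a_{33})$$

where $a_{33}$ is the (3, 3) element of $A$.

The standard t-ratio on $\hat\beta_2$ is given by:

$$O = \hat\sigma^{-1} \left( [(U'U)^{-1}]_{33} \right)^{-1/2} \hat\beta_2$$

where $[(U'U)^{-1}]_{33}$ is the (3, 3) element of $(U'U)^{-1}$ and $\hat\sigma^2$ is the usual least squares estimator of $\sigma^2 = S(\varepsilon_q)$. Since $\hat\sigma^2 \xrightarrow{p} S(\varepsilon_q) = B(\varepsilon_q^2)$ and $k^{-1} U'U \xrightarrow{p} J$, the limit distribution of $O$ is:

$$O \xrightarrow{d} K(0, \varphi^2), \qquad \varphi^2 = \left[ B(\varepsilon_q^2)\, [J^{-1}]_{33} \right]^{-1} a_{33} \qquad (16)$$

where $[J^{-1}]_{33}$ is the (3, 3) element of $J^{-1}$.

Now under the DGP (2) null hypothesis, $\alpha = 0$, $\beta_1 = 1$ and $\beta_2 = 0$, so:

$$\varepsilon_q = v_q - c_{1q}$$

Let the moments $B(r_{1q}^{\kappa} r_{2q}^{\lambda})$, as defined in (12), be denoted by $j_{\kappa\lambda}$. Drawing on results derived in the Appendix, and assuming, as is reasonable for one-step ahead prediction, that the elements of $U_q \varepsilon_q$ are uncorrelated with those of $U_p \varepsilon_p$ for all $p \neq q$, allows us to write: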


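$$J = \begin{bmatrix} 1 & B(c_{1q}) & B(c_{2q}) \\ B(c_{1q}) & B(c_{1q}^2) & B(c_{1q} c_{2q}) \\ B(c_{2q}) & B(c_{1q} c_{2q}) & B(c_{2q}^2) \end{bmatrix} = \begin{bmatrix} 1 & \frac{1}{2} & \frac{1}{2} \\ \frac{1}{2} & \frac{1}{3} & j_{11} \\ \frac{1}{2} & j_{11} & \frac{1}{3} \end{bmatrix}$$

$$N = \begin{bmatrix} S(\varepsilon_q) & \partial(c_{1q}\varepsilon_q, \varepsilon_q) & \partial(c_{2q}\varepsilon_q, \varepsilon_q) \\ \partial(c_{1q}\varepsilon_q, \varepsilon_q) & S(c_{1q}\varepsilon_q) & \partial(c_{1q}\varepsilon_q, c_{2q}\varepsilon_q) \\ \partial(c_{2q}\varepsilon_q, \varepsilon_q) & \partial(c_{1q}\varepsilon_q, c_{2q}\varepsilon_q) & S(c_{2q}\varepsilon_q) \end{bmatrix} = \begin{bmatrix} \frac{1}{6} & \frac{1}{12} & j_{11} - j_{21} \\ \frac{1}{12} & \frac{1}{20} & j_{21} - j_{31} \\ j_{11} - j_{21} & j_{21} - j_{31} & j_{21} - j_{22} \end{bmatrix}$$

Expanding for $J^{-1}$ and $A$, and substituting into (16) gives:

$$\begin{aligned} \varphi^2 = 72\,|J|^{-1} \Big\{ & \left( \tfrac{1}{2} j_{11} - \tfrac{1}{6} \right) \left[ \left( \tfrac{1}{2} j_{11} - \tfrac{1}{6} \right) \tfrac{1}{6} + \left( \tfrac{1}{4} - j_{11} \right) \tfrac{1}{12} + \tfrac{1}{12} (j_{11} - j_{21}) \right] \\ & + \left( \tfrac{1}{4} - j_{11} \right) \left[ \left( \tfrac{1}{2} j_{11} - \tfrac{1}{6} \right) \tfrac{1}{12} + \left( \tfrac{1}{4} - j_{11} \right) \tfrac{1}{20} + \tfrac{1}{12} (j_{21} - j_{31}) \right] \\ & + \tfrac{1}{12} \left[ \left( \tfrac{1}{2} j_{11} - \tfrac{1}{6} \right) (j_{11} - j_{21}) + \left( \tfrac{1}{4} - j_{11} \right) (j_{21} - j_{31}) + \tfrac{1}{12} (j_{21} - j_{22}) \right] \Big\} \end{aligned}$$

where

$$|J| = \tfrac{1}{2} j_{11} - j_{11}^2 - \tfrac{1}{18}$$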

Values of $\varphi^2$ can be evaluated using numerical integration, and for non-zero correlations $\rho$ (as determined by $\theta$), $\varphi^2 \neq 1$. The standard forecast encompassing test based on FE (1) does not therefore have a limiting standard normal distribution under DGP (2), and application of $K(0, 1)$ critical values will result in an asymptotically size-distorted test. This results from the error term being conditionally heteroscedastic with regard to the regressors $c_{1q}$ and $c_{2q}$. Note that an exception is when the forecasts are independent; when independence holds, $j_{\kappa\lambda} = B(r_{1q}^{\kappa}) B(r_{2q}^{\lambda})$, and substitution of this result into the above expression yields $\varphi^2 = 1$, with $O$ consequently being correctly sized in the limit.

Limiting test sizes can be evaluated by numerical integration for a given forecast dependence parameter $\theta$ and significance level. Table 1 reports asymptotic sizes for a variety of correlations $\rho$, for two-sided nominal 5%-level tests. It can be seen that although the asymptotic sizes differ from nominal size, the degree of the size distortion is mild.
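The limiting variance of the FE (1) statistic can be evaluated from the four cross moments of the underlying uniforms. The sketch below implements the expression as 72 times a quadratic form divided by the determinant of $J$, using exact rational arithmetic; the function names are hypothetical, and the moments are supplied by the caller (for correlated forecasts they would come from numerical integration of (12), which is not attempted here).

```python
from fractions import Fraction as F

def fe1_limit_variance(m11, m21, m31, m22):
    """Limiting variance of the FE (1) t-ratio under DGP (2), given moments j_kl."""
    det_j = m11 / 2 - m11 ** 2 - F(1, 18)
    # Cofactors of the third column of J: |J| times the third row of J^{-1}.
    c = (m11 / 2 - F(1, 6), F(1, 4) - m11, F(1, 12))
    n = ((F(1, 6), F(1, 12), m11 - m21),
         (F(1, 12), F(1, 20), m21 - m31),
         (m11 - m21, m21 - m31, m21 - m22))
    quad = sum(c[i] * n[i][j] * c[j] for i in range(3) for j in range(3))
    return 72 * quad / det_j

def independent_moment(kappa, lam):
    """Cross moment of two independent unit uniforms raised to integer powers."""
    return F(1, (kappa + 1) * (lam + 1))

moments = [independent_moment(1, 1), independent_moment(2, 1),
           independent_moment(3, 1), independent_moment(2, 2)]
print(fe1_limit_variance(*moments))
```

With the independent-forecast moments the function returns exactly one, reproducing the exception noted above in which the standard test is correctly sized in the limit.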

4.2 Limiting distribution of FE (2) O test under DGP (2)

Consider now denoting the FE (2) regression in the form of (13) and (14). Under the same standard assumptions as considered for FE (1), we have:

Oa−→ K(0 '2) '2 = [B(�2

q )(22]−1a22 (17)

where O = ˆ−1�−1.222 �

02 and where (22, �22 and a22 are the (2 2) elements ofJ−1, (U 0U)−1 and

A respectively.


Given that 0 = 0 and �02 = 0 under the DGP (2) null hypothesis, we can note that:

�q = b1q = vq − c1q

as before. Using results from the Appendix, and again assuming uncorrelatedness among the elements of Uq�q and Up�p for all p ≠ q, we have:

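\[
J = \begin{bmatrix}
1 & B(b_{1q} - b_{2q}) \\
B(b_{1q} - b_{2q}) & B[(b_{1q} - b_{2q})^{2}]
\end{bmatrix}
= \begin{bmatrix}
1 & 0 \\
0 & \tfrac{2}{3} - 2j_{11}
\end{bmatrix}
\]

\[
N = \begin{bmatrix}
S(�_q) & \partial[(b_{1q} - b_{2q})�_q, �_q] \\
\partial[(b_{1q} - b_{2q})�_q, �_q] & S[(b_{1q} - b_{2q})�_q]
\end{bmatrix}
= \begin{bmatrix}
\tfrac{1}{6} & j_{11} - j_{21} - \tfrac{1}{12} \\
j_{11} - j_{21} - \tfrac{1}{12} & 2j_{31} - j_{21} - j_{22} + \tfrac{1}{20}
\end{bmatrix}
\]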

Expanding for J−1 and A, and substituting into (17) gives:

\[
'^{2} = \frac{3\left(2j_{31} - j_{21} - j_{22} + \tfrac{1}{20}\right)}{\tfrac{1}{3} - j_{11}}
\]

Values of '2 can again be evaluated using numerical integration, and for FE (2), '2 ≠ 1 for all correlations ,. Unlike FE (1), this result also holds when the forecasts are independent, with '2 = 4/5 in that case. The standard forecast encompassing test based on FE (2) will therefore be asymptotically incorrectly-sized, and the extent of this distortion can be seen in Table 1, where values of '2 and the corresponding limiting test sizes are given for two-sided nominal 5%-level tests. The FE (2)-based test is undersized in the limit for low correlations and oversized for higher correlations, although, as with FE (1), the effects are not too severe.
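To illustrate how a limiting variance different from one translates into a size distortion, the following sketch evaluates the FE (2) expression for independent forecasts and the implied limiting size of a two-sided nominal 5% test. It is a hedged illustration rather than the computation behind Table 1: the function names are ours, the independent-case moments are assumed to factorise as 1/((a+1)(b+1)), and the size is obtained by comparing a normal variate with variance '2 against standard normal critical values.

```python
from fractions import Fraction as F
from math import sqrt
from statistics import NormalDist


def indep_moment(a, b):
    """Cross moment of two independent uniforms raised to powers a and b."""
    return F(1, (a + 1) * (b + 1))


def fe2_variance(j11, j21, j31, j22):
    """Limiting variance of the standard FE (2) statistic under DGP (2)."""
    return 3 * (2 * j31 - j21 - j22 + F(1, 20)) / (F(1, 3) - j11)


fe2_value = fe2_variance(
    indep_moment(1, 1), indep_moment(2, 1), indep_moment(3, 1), indep_moment(2, 2)
)
print(fe2_value)  # 4/5

# Rejection probability when a variate with variance fe2_value is compared
# with the two-sided 5% standard normal critical value.
critical = NormalDist().inv_cdf(0.975)
limiting_size = 2 * (1 - NormalDist().cdf(critical / sqrt(fe2_value)))
print(round(limiting_size, 4))
```

A variance below one gives a limiting size below the nominal 5%, which is the undersizing described for low correlations.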

4.3 Limiting distribution of FE (3) R test under DGP (2)

Lastly, we examine the test based on the FE (3) regression. Consider again denoting the regression in the form of (13) and (14). Under the same standard assumptions as before:

Oa−→ K(0 '2) '2 = [B(�2

q )(22]−1a22 (18)

where O = ˆ−1�−1.222 �

002 and where (22, �22 and a22 are the (2 2) elements ofJ−1, (U 0U)−1 and

A respectively.

Given that 00 = 0 and �002 = 0 under the DGP (2) null hypothesis, we again have �q = vq − c1q. Following the analysis for the other tests, results from the Appendix combined with the uncorrelatedness assumption yield:

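\[
J = \begin{bmatrix}
1 & B(c_{2q}) \\
B(c_{2q}) & B(c_{2q}^{2})
\end{bmatrix}
= \begin{bmatrix}
1 & \tfrac{1}{2} \\
\tfrac{1}{2} & \tfrac{1}{3}
\end{bmatrix}
\qquad
N = \begin{bmatrix}
S(�_q) & \partial(c_{2q}�_q, �_q) \\
\partial(c_{2q}�_q, �_q) & S(c_{2q}�_q)
\end{bmatrix}
= \begin{bmatrix}
\tfrac{1}{6} & j_{11} - j_{21} \\
j_{11} - j_{21} & j_{21} - j_{22}
\end{bmatrix}
\]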


Expanding for J−1 and A, and substituting into (18) gives:

\[
'^{2} = 72\,(2j_{21} - j_{11} - j_{22}) + 3
\]

Numerical integration shows that '2 ≠ 1 for all non-zero correlations ,, while the special case of forecast independence gives '2 = 1. Thus, as with FE (1), the standard forecast encompassing test based on FE (3) is incorrectly sized in the limit, unless the forecasts are independent. Table 1 highlights the extent of the problem for a range of correlations, and, in line with the other tests, the resulting undersize is generally of a relatively small magnitude.
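As an alternative to numerical integration, the moments entering the FE (3) expression can be approximated by simulation. The sketch below is an assumption-laden illustration, not the procedure used for Table 1: it replaces the population moments by sample averages over simulated forecast pairs, uses NumPy, and checks the independent case, where the result should be close to one. The function names and the sample size are ours.

```python
import numpy as np


def sample_moment(r1, r2, a, b):
    """Sample analogue of the cross moment with powers a (first) and b (second)."""
    return float(np.mean(r1 ** a * r2 ** b))


def fe3_variance(r1, r2):
    """Plug sample moments into the FE (3) limiting-variance expression."""
    j11 = sample_moment(r1, r2, 1, 1)
    j21 = sample_moment(r1, r2, 2, 1)
    j22 = sample_moment(r1, r2, 2, 2)
    return 72.0 * (2.0 * j21 - j11 - j22) + 3.0


rng = np.random.default_rng(0)
draws = 400_000
r1 = rng.uniform(size=draws)  # independent uniform forecast probabilities
r2 = rng.uniform(size=draws)
fe3_estimate = fe3_variance(r1, r2)
print(fe3_estimate)  # close to 1 under independence, up to simulation noise
```

Passing correlated uniform draws instead of independent ones gives a simulation-based approximation for the dependent case.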

4.4 Modified test statistics and limiting distributions

Since O is not an asymptotically correctly sized test in general under any of the three testing approaches, it is important to consider alternative tests that are correctly sized. Although the asymptotic size distortions are not severe under DGP (2), it is desirable to use tests which follow standard asymptotic distributions; also, more general DGP specifications may well result in more severe asymptotic mis-sizing for the O statistic. Moreover, optimal e-steps ahead forecast errors follow an J>(e−1) process, and a simple test such as O does not admit such dependence, implicitly assuming e = 1. To achieve robustness to the conditional heteroscedasticity present in the encompassing regressions, and allow for forecast horizons greater than one, we can apply the tests proposed by Harvey et al. (1998), derived in the context of point forecast evaluation (using FE (2) with = 0).

First, we can consider two modified regression-based tests which employ consistent estimators of N in (15). Newbold and Harvey (2002) extend such tests to the FE (1) framework and also introduce a modification to allow for multiple step ahead prediction. For FE (1) the modified tests are given as follows:

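\[
O_1 = \frac{�_2}{\sqrt{a_{1,33}/k}} \qquad O_2 = \frac{�_2}{\sqrt{a_{2,33}/k}}
\]

where a_{f,33} is the (3, 3) element of A_f = J^{-1} N_f J^{-1}, with:

\[
J = k^{-1} U'U
\]

\[
N_1 = k^{-1}\left[\sum_{q=1}^{k} U_q U_q' �_q^{2}
+ \sum_{\tau=1}^{e-1}\sum_{q=\tau+1}^{k} U_{q-\tau}�_{q-\tau} U_q' �_q
+ \sum_{\tau=1}^{e-1}\sum_{q=\tau+1}^{k} U_q �_q U_{q-\tau}' �_{q-\tau}\right]
\]

\[
N_2 = k^{-1}\left[\sum_{q=1}^{k} U_q U_q' r_q^{2}
+ \sum_{\tau=1}^{e-1}\sum_{q=\tau+1}^{k} U_{q-\tau} r_{q-\tau} U_q' r_q
+ \sum_{\tau=1}^{e-1}\sum_{q=\tau+1}^{k} U_q r_q U_{q-\tau}' r_{q-\tau}\right]
\]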

and where rq are the residuals from least squares estimation of the FE (1) regression with the null restriction �2 = 0 imposed. Test statistics for FE (2) and FE (3) can be similarly defined by making the obvious substitutions.
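For one-step ahead forecasts (e = 1) the autocovariance sums vanish and the modified regression-based statistics reduce to a heteroscedasticity-robust ratio. The following is a minimal sketch under that assumption, using NumPy; the function name, its arguments and the optional `resid` argument (for supplying restricted residuals, as in the second statistic) are ours, and the statistic is taken to be the coefficient estimate divided by the square root of the corresponding diagonal element of A over k.

```python
import numpy as np


def robust_statistic(v, U, col, resid=None):
    """Heteroscedasticity-robust statistic for one coefficient, case e = 1.

    v     : (k,) array of outcomes
    U     : (k, m) regressor matrix (include a column of ones for a constant)
    col   : index of the coefficient being tested
    resid : optional residuals to use in N (e.g. from the restricted regression);
            the unrestricted least squares residuals are used if omitted
    """
    k = U.shape[0]
    coef, *_ = np.linalg.lstsq(U, v, rcond=None)
    errors = v - U @ coef if resid is None else np.asarray(resid, dtype=float)
    J = U.T @ U / k
    N = (U * errors[:, None] ** 2).T @ U / k   # k^{-1} * sum of U_q U_q' e_q^2
    J_inv = np.linalg.inv(J)
    A = J_inv @ N @ J_inv
    return coef[col] / np.sqrt(A[col, col] / k)


# Sanity check on a constant-only regression, where the statistic collapses to
# sqrt(k) * mean / standard deviation (with divisor k).
rng = np.random.default_rng(3)
sample = rng.normal(loc=0.2, size=50)
constant_only = robust_statistic(sample, np.ones((50, 1)), 0)
print(constant_only)
```

For e greater than one the two double sums in N would need to be added before forming A.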

Harvey et al. (1998) also propose tests based on a Diebold and Mariano (1995)-type approach for FE (2). The statistic can be modified to allow for ≠ 0 by working with deviations from means, and it is straightforward to define an equivalent test based on FE (3):

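\[
AJ = k\left[\sum_{\tau=-(e-1)}^{e-1}\ \sum_{q=|\tau|+1}^{k} (a_q - \bar{a})(a_{q-|\tau|} - \bar{a})\right]^{-1/2} \bar{a}
\]

\[
JAJ = k^{-1/2}\left[k + 1 - 2e + k^{-1}e(e-1)\right]^{1/2} AJ
\]

where

\[
a_q = (b_{1q} - \bar{b}_1)\left[(b_{1q} - \bar{b}_1) - (b_{2q} - \bar{b}_2)\right] \quad \text{for FE (2)}
\]

\[
a_q = (b_{1q} - \bar{b}_1)(c_{2q} - \bar{c}_2) \quad \text{for FE (3)}
\]

and \bar{a} = k^{-1}\sum_{q=1}^{k} a_q. It is also possible to define AJ and JAJ tests for FE (1) by making use of the Frisch-Waugh theorem. The population parameter �2 in FE (1) is identical to �2 in: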

.1q = �2.2q

where .1q and .2q are the errors from the regression of vq and c2q, respectively, on a constant and c1q. The null of �2 = 0 therefore holds when B(.1q.2q) = 0, allowing use of the AJ and JAJ tests with:

aq = .1q.2q for FE (1)

where .1q and .2q denote the residuals corresponding to .1q and .2q.

Provided forecast encompassing holds and conditions are such that the relevant testing framework (FE (1), FE (2) or FE (3)) is valid, all these tests will be asymptotically distributed standard normal. This occurs for O1 and O2 because A in (15) is consistently estimated when the heteroscedasticity-consistent estimators of N are employed; for AJ and JAJ, asymptotic normality follows from standard results.
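The Diebold-Mariano-type statistic and its modified version defined above are straightforward to compute from the series aq. The sketch below, using NumPy, follows the two formulas directly; the function names are ours, `a` is the series, `e` the forecast horizon, and the double sum over positive and negative lags is computed once and doubled by symmetry. Note that for e greater than one the bracketed sum is not guaranteed to be positive in small samples, in which case the square root is undefined.

```python
import numpy as np


def dm_statistic(a, e=1):
    """Diebold-Mariano-type statistic for the series a at horizon e."""
    a = np.asarray(a, dtype=float)
    k = a.size
    dev = a - a.mean()
    total = np.sum(dev * dev)                  # lag-zero term
    for lag in range(1, e):                    # lags 1, ..., e-1 (and their mirrors)
        total += 2.0 * np.sum(dev[lag:] * dev[:-lag])
    return k * a.mean() / np.sqrt(total)


def modified_dm_statistic(a, e=1):
    """Modified statistic: small-sample rescaling of dm_statistic."""
    k = len(a)
    scale = np.sqrt((k + 1 - 2 * e + e * (e - 1) / k) / k)
    return scale * dm_statistic(a, e)


series = np.array([0.2, -0.1, 0.4, 0.3, 0.0, 0.5])
dm_value = dm_statistic(series)
mdm_value = modified_dm_statistic(series)
print(dm_value, mdm_value)
```

For e = 1 the modified statistic coincides with the ordinary one-sample t-ratio of the series, which is one way to see why critical values from the qk−1 distribution are natural for it.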

5 Monte Carlo

5.1 Size and power experiments

We estimate by Monte Carlo simulation the small-sample properties of the tests of forecast encompassing for FE (1), FE (2) and FE (3) based on the DGPs discussed in Section 3. The limiting distributions are recorded in Section 4. In addition to the O test for each of these regressions, we also examine the four modified test statistics discussed in Section 4.4, namely, the O1, O2, AJ and JAJ statistics. Hence we examine the finite-sample performance of five tests for each of the three forecast encompassing regression approaches. In keeping with the recommendations of Harvey et al. (1998) for point forecast encompassing tests, critical values from the qk−1 distribution are employed for all except the AJ test (for which asymptotic K(0, 1) critical values are used). Two-sided versions of the tests are used throughout, constants are always included in the regressions (with the appropriate modifications for the Diebold-Mariano-type tests described above) and the size and power estimates are based on 50,000 replications.

For the correlated forecast DGPs, we use the method suggested by Cook and Johnson (1981) for obtaining drawings of dependent uniform random variables. That is, we draw V1 and V2 independently from a gamma (1,1) distribution, and draw U as a gamma ( 1). We then calculate rf = (1 + Vf/U)^(−�), f = 1, 2. r1 and r2 are then marginally R(0, 1) but with the joint density function given by (3); r1 and r2 are then the forecast probabilities.

Our results are presented in a number of tables. Tables 2 and 3 record size estimates for DGP (2) when , = 0 and , = 0.8, respectively, corresponding to cases of independent and correlated forecasts. We consider sample sizes ranging from 8 to 512. Some investigation indicated that k = 512 could be viewed as the large-sample case, as the results changed little if k was increased up to 10,000.
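The Cook and Johnson (1981) drawing scheme described above can be sketched in a few lines with NumPy. This is an illustration under stated assumptions: we take the dependence parameter (called `alpha` here, a name of ours) to be both the shape of the common gamma variate and the negative exponent in the transformation, which is the standard form of the Cook-Johnson construction; the function name and sample size are hypothetical.

```python
import numpy as np


def cook_johnson_uniforms(k, alpha, rng):
    """Draw k pairs of dependent uniforms via the Cook-Johnson construction.

    V1, V2 are independent gamma(1, 1) variates, U is a common gamma(alpha, 1)
    variate, and each uniform is (1 + V / U) ** (-alpha).
    """
    V = rng.gamma(shape=1.0, scale=1.0, size=(k, 2))
    U = rng.gamma(shape=alpha, scale=1.0, size=(k, 1))  # shared across the pair
    return (1.0 + V / U) ** (-alpha)


rng = np.random.default_rng(42)
pairs = cook_johnson_uniforms(200_000, alpha=1.0, rng=rng)
r1, r2 = pairs[:, 0], pairs[:, 1]
print(r1.mean(), r1.var(), np.corrcoef(r1, r2)[0, 1])
```

Each margin should have mean near 1/2 and variance near 1/12, while the correlation between the two columns is positive and shrinks towards zero as `alpha` grows.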

The results presented in these two tables can be summarised as follows. Under independence O is correctly sized for FE (1) and FE (3) in large samples, but not for FE (2), in line with the results in Section 4; the modified tests achieve correct size for all approaches as expected. O1 and AJ tend to inflate the size for small k, whereas the opposite is true of O2, but JAJ appears to have good size throughout (except perhaps when k is small for FE (2)). When we dispense with the assumption of independence of forecasts, O is not correctly sized for any of the three approaches even in the limit, although as previously noted in Table 1, and as is apparent in Table 3, the size distortions are typically small. The four modified statistics are correctly sized in the limit, as shown in the table, but the tendency for O1 and AJ to be oversized, and for O2 to be undersized, for small k, is generally exacerbated relative to the case of independent forecasts. JAJ again has the best size behaviour overall (apart from some oversize for small k for FE (3)).

Table 4 presents results for independent forecasts but when the DGP is DGP (1) so that the forecasts are biased. From Section 3 we have that FE (1) and FE (3) are valid but not FE (2): this is clearly borne out by the table, where for the specific parameter values chosen the size of the FE (2) tests is 100% for k = 512. Table 5 records results for an instance of the alternative DGP of Section 3.4. Using the notation in that section, d = g = 1, e = h = 2, and the forecasts are generated by powering up correlated uniforms, with , = 0.724. Recall that in this case, FE (1) is valid but FE (2) and FE (3) are not. The results confirm the theory in this respect, with only modified FE (1)-based tests achieving correct size in large samples.

The power estimates are reported in Tables 6 and 7. The DGP is:

vq = 1(0 r1q + (1 − 0) r2q > sq),  c1q = r1q,  c2q = r2q

and we consider 0 = 0.75 and 0 = 0.5, to illustrate the increase in power as the relative weight of c2q increases, as well as considering , = 0 and , = 0.5. Thus the DGP corresponds to DGP (2) in that � = 0 and both forecasts are unbiased. All the tests are seen to be consistent, with power increasing in k and 0. For , = 0, we find that test statistics based on FE (2) have large power advantages over the other two approaches. For , = 0.5, the performance of tests based on FE (1) is much closer to that of FE (2), whilst the FE (3) statistics have significantly lower power. For a given forecast encompassing regression approach, there is generally little variation in power across the testing procedures, particularly for the independent forecast results. The powers of the reliably sized JAJ statistics are competitive with the other tests in general, especially at all but the smallest values of k when the other statistics are on occasion subject to significant size distortions.

Given the results of these experiments and those of Section 3, it appears that the FE (1)-based JAJ test constitutes the most robust procedure of the forecast encompassing tests we have considered, with, on balance, the most desirable size and power properties across a range of DGPs. However, if one was confident that the assumptions required for the validity of FE (2) were satisfied, more power might be achieved by adopting the FE (2)-based JAJ test.

5.2 Parameter estimation uncertainty

When forecasts are derived from models with estimated parameters (as in the empirical example), it has recently been shown that the asymptotic distributions of tests of predictive accuracy may be affected: see West (1996), West and McCracken (1998) and West (2001). West (2001) establishes that forecast encompassing tests of the sort we describe as FE (2) will be affected by the additional source of uncertainty from estimating the models' parameters. We estimate by Monte Carlo the effects of parameter estimation uncertainty on tests of forecast encompassing for probability forecasts.

A useful way to incorporate parameter estimation uncertainty is to posit the following DGP:

DGP (4)

\[
v_q = 1(c_{1q} > s_q) \qquad c_{1q} = \frac{\exp(u_{1q})}{1 + \exp(u_{1q})} \qquad c_{2q} = \frac{\exp(u_{2q})}{1 + \exp(u_{2q})}
\]

where u1q, u2q are independent K(0, 1) and sq ∼ R(0, 1), independent of u1q, u2q. Given that the forecasts are independent, ∂(c1q, c2q) = ∂(vq, c2q) = 0, satisfying (8) and (11) and ensuring �2 = �002 = 0. Further, results in the Appendix establish that (10) is also true, so that �02 = 0. Thus, in the absence of parameter estimation uncertainty, DGP (4) satisfies the property of forecast encompassing (�2 = 0), and tests based on FE (2) and FE (3) are valid (�02 = �002 = 0).
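The moment properties claimed for DGP (4) are easy to confirm by simulation. The sketch below, using NumPy, generates a long sample from DGP (4) as defined above (without the estimation step) and compares sample covariances; the variable names follow the text, while the seed and sample size are arbitrary choices of ours.

```python
import numpy as np

rng = np.random.default_rng(1)
k = 500_000

u1 = rng.standard_normal(k)
u2 = rng.standard_normal(k)          # independent of u1
s = rng.uniform(size=k)              # independent uniform threshold

c1 = 1.0 / (1.0 + np.exp(-u1))       # equals exp(u1) / (1 + exp(u1))
c2 = 1.0 / (1.0 + np.exp(-u2))
v = (c1 > s).astype(float)           # binary outcome

cov_v_c2 = np.cov(v, c2)[0, 1]       # should be near zero
cov_v_c1 = np.cov(v, c1)[0, 1]       # should be near the variance of c1
var_c1 = c1.var(ddof=1)
print(cov_v_c2, cov_v_c1, var_c1)
```

The second comparison mirrors the Appendix result that the covariance between the outcome and the first forecast equals the variance of that forecast.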

The forecast probabilities c1q and c2q that incorporate estimation uncertainty are generated from in-sample logit regressions of vq on a constant and u1q, and vq on a constant and u2q, respectively. Data is generated from DGP (4) for q = 1, ..., O, O + 1, ..., O + k. The models are estimated on observations 1 through O, and forecasts are generated of O + 1 to O + k.

Table 8 presents Monte Carlo size estimates for a range of values of k/O for the JAJ test for all three test approaches (FE (1), FE (2) and FE (3)), again based on 50,000 replications, as well as results for experiments based on DGP (4) without prior parameter estimation. Matching the findings of West (2001), with the exception of the somewhat anomalous k = 8 results, the JAJ test statistic for FE (2) becomes increasingly oversized as k/O increases, although for the statistics based on FE (1) and FE (3) there is no evidence of size distortion. In the empirical illustration k/O is approximately 1/2, so we conjecture that parameter estimation uncertainty will play a relatively minor role, even for the FE (2) statistics. A full analysis of the effects of parameter estimation uncertainty for tests of predictive accuracy of probability forecasts is left for future research.

6 Evaluating US recession forecast probabilities

Our empirical application considers the information value of two leading indicators for post War US recessions using simple logit models. Specifically, we compare forecasts of the probabilities of recessions from logit models that use, respectively, interest rate spreads, and some transformation of the oil price. The forecast probabilities are tested for encompassing. Recessions are taken as those defined by the NBER's Business Cycle Dating Committee6, and the motivation for focusing on these two sets of explanatory variables is now explained. The usefulness of financial variables for predicting US recessions has been established by Estrella and Mishkin (1998), Anderson and Vahid (2001) and Hamilton and Kim (2000), inter alia. The slope of the yield curve tends to dominate other financial variables as a predictor of recessions, in that the yield curve is often the single best choice, and the incremental benefit of including other financial variables is often small. Chauvet and Potter (2002) allow for structural breaks in probit models that use the yield curve to predict recessions, but we use simple constant-parameter models as our aim is simply to illustrate the use of the forecast encompassing tests.

6 See http://www.nber.org/cycles.html.

Oil prices may also serve as a useful leading indicator for output growth (e.g., Hamilton (1983), Mork (1989), Hooker (1996), Hamilton (1996), Carruth, Hooker and Oswald (1998) and Hamilton (2000)) although there is some debate over the form of the relationship. Hamilton (1996) suggests that output should be related to the net increase in oil prices over the previous year, and constructs a variable that is the excess of the oil price in the current quarter over their peak value during the previous year (or zero when they remain below the local peak). Thus, increases in the price of oil which simply reverse previous (within the preceding year) declines do not depress output growth. Lee, Ni and Ratti (1995) suggest that increases in oil prices at times of high volatility should be downweighted relative to those occurring during tranquil periods.

We use quarterly spread and oil price data for the period 1960:1–1999:4.7 We divide the period into an in-sample estimation period (1960:1–1979:4) and a forecast period (1980:1–1999:4). Preliminary work suggested that the logit models of the recession indicator on the spread (and an intercept) appeared to work best if the spread is lagged three or four periods, and we chose a three period lag. For the oil price model, the recession indicator was regressed on the increase in current quarter oil prices over the peak attained in the previous three years (when positive), this variable being lagged two periods. The in-sample fits of the two logit regression models are summarized in Table 9. The superior individual performance of the spread is apparent from the QPS, LPS and pseudo-O2 statistics.8
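The Estrella and Mishkin (1998) goodness-of-fit measure reported alongside QPS and LPS (and defined in footnote 8) depends only on the two maximised log likelihoods and the sample size. A minimal sketch follows; the function and argument names are ours, and the numerical inputs are made-up values used purely to exercise the formula, not figures from Table 9.

```python
def estrella_pseudo_r2(ll_unrestricted, ll_restricted, k):
    """Pseudo-R-squared of Estrella and Mishkin (1998).

    ll_unrestricted : maximised log likelihood of the fitted model (<= 0)
    ll_restricted   : log likelihood with all slope coefficients set to zero (< 0)
    k               : number of observations
    """
    ratio = ll_unrestricted / ll_restricted
    return 1.0 - ratio ** (-(2.0 / k) * ll_restricted)


# Hypothetical log likelihoods for a sample of 80 observations.
no_fit = estrella_pseudo_r2(-50.0, -50.0, 80)       # slopes add nothing
partial_fit = estrella_pseudo_r2(-30.0, -50.0, 80)  # some improvement
perfect_fit = estrella_pseudo_r2(0.0, -50.0, 80)    # likelihood of one
print(no_fit, partial_fit, perfect_fit)
```

The measure is zero when the explanatory variable adds nothing to the likelihood and one when the fit is perfect, which is what makes it comparable to the linear regression analogue.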

Nevertheless, the forecast encompassing tests consider whether the oil price-logit model forecast probabilities contain useful information in terms of delivering a statistically-significant reduction in QPS when used in combination with the yield curve predicted probabilities. The forecast probabilities underlying Table 10 are one-step ahead, and based on fixed coefficient estimates. For example, the spread-logit model regresses vq on an intercept and the spread lagged three periods for the period 1960:4 to 1979:4. The forecast probability for 1980:1 is generated using the estimated parameters and in principle information up to and including 1979:4: given the model form, the forecast simply uses the value of the spread in 1979:2. The forecast probability for 1980:2 uses the same estimated parameters and the value of the spread in 1979:3, and so on. Forecast probabilities are generated for all forecast period dates, so k = 80.

7 The spread data is the 10-year treasury constant maturity rate less the 3-month treasury bill at secondary market rate. These are popular choices of long and short rates. Quarterly data is obtained from the monthly data by averaging. The data were taken from the FRED website http://research.stlouisfed.org/fred2. The oil price data are freely available from James Hamilton's web page http://weber.ucsd.edu/~jhamilto.

8 Various O2 measures have been proposed in the literature for discrete choice models. We follow Estrella and Mishkin (1998) and report a measure defined as:

\[
O^{2} = 1 - \left(\frac{iir}{iio}\right)^{-(2/k)\,iio}
\]

where iir and iio are the unrestricted maximised value of the log likelihood, and the value imposing the restriction that all slope coefficients are zero. The m-value is of the standard likelihood ratio test compared to a χ²(1) distribution.

From Table 10 it is apparent that we always reject the null that the oil price-model forecasts encompass the spread-model forecasts when testing �2 = 0 versus the one-sided alternative of �2 > 0 at the 5% level. This is as expected given the superior in-sample performance of the spread-model probability forecasts. The evidence for whether the spread-model probability forecasts encompass the oil-model forecasts is perhaps less clear, but with the exception of the less reliable O and O1 statistics there is little evidence against the null at the 5% level. This is particularly true for the preferred JAJ tests for all three forecast encompassing approaches.

7 Conclusions

We have investigated the use of tests of forecast encompassing for evaluating probability forecasts. Forecast encompassing tests are a commonly used test of predictive ability of point forecasts, and we follow the point forecast literature quite closely. We adopt the quadratic probability score (QPS) as the loss function, noting that the QPS is the most commonly used loss function for probability forecasts. In common with the point forecast literature, we also consider linear combinations of probability forecasts. In many respects our findings parallel those for point forecast encompassing tests. In particular, while the most general version of the forecast encompassing test is valid by definition, care must be taken when adopting more restricted forms of the encompassing regression (as frequently used in the point forecast context). This follows since tests based on the restricted regressions can be invalid for certain plausible DGPs, where the implicit assumptions underpinning the regressions are violated. Further, even when a particular testing approach is valid, standard q-tests are not asymptotically correctly sized in general.

We have analysed analytically and via Monte Carlo simulation the limiting and small-sample properties of the standard tests, as well as modifications to take care of the aforementioned size distortions, and are able to suggest a statistic that appears to be reasonably robust and exhibit good power properties across a range of DGPs, namely, the modified Diebold-Mariano version of the most general version of the forecast encompassing tests (referred to as JAJ for FE (1) in the paper). A caveat to this recommendation is if one is working with forecasts which satisfy �1 = 1 in the FE (1) regression (e.g. bias-corrected forecasts); more power might then be achieved from using the JAJ test based on the convex-combination version of the encompassing regression (FE (2)). Note, however, that while the FE (1)-based procedure appears robust to the effects of parameter estimation uncertainty when the forecasts are derived from estimated models, the test based on FE (2) does not.

Our analysis allows for correlated forecasts. We use the Cook-Johnson family of joint uniform distributions in our analytical derivations, as well as to generate random variables with the desired properties in the Monte Carlo. We have not explicitly addressed the problem of serial correlation in the forecast errors induced by the overlapping nature of multi-step forecasts, as we have adopted a one-step ahead framework. It is well known that optimal e-steps ahead forecast errors follow an J>(e − 1) process. But we have argued that the modifications described in Section 4.4 should allow for forecast horizons greater than one, as suggested by Harvey et al. (1998) in the context of point forecast evaluation.

The tests are illustrated with an application to the evaluation of the recession probability forecasts from two logit models based on alternative leading indicators of recession. The results of assessing the information content of yield curves and oil prices, within a logit framework, are fairly clear: using the more reliable forecast encompassing tests, the predictions using the yield curve are found to forecast encompass those based on the oil price. Thus the test statistics and supporting analysis have proved useful in determining the relative importance of the two leading indicators in generating recession probabilities.

Appendix

Appendix to Section 3

We first require moments of random variables drawn from the Cook-Johnson distribution. The random variables have marginal R(0, 1) distributions, and the following results are derived from the properties of the R(0, 1) and the Cook-Johnson joint density given in (3). They are relevant for the calculations based on DGP (1), DGP (2) and DGP (3):

c(rfq) =

½1 0 � rfq � 10 otherwise

f = 1 2

B(r�fq) =

R 10 r�

fq/rfq =1

2 + 1 f = 1 2

B(r�1qr

�2q) =

� + 1

Z 1

0

Z 1

0r�−( +1)1q r

�−( +1)2q

µr− 1

1q + r− 1

2q − 1

¶−(�+2)

/r1q/r2q

For DGP (3) we have:

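\[
S(c_{1q}) = B(r_{1q}^{2g}) - B(r_{1q}^{g})^{2} = \frac{g^{2}}{(2g + 1)(g + 1)^{2}}
\]

\[
\begin{aligned}
\partial(c_{1q}, c_{2q}) &= B(r_{1q}^{g} r_{2q}^{h}) - B(r_{1q}^{g})B(r_{2q}^{h}) \\
&= B(r_{1q}^{g} r_{2q}^{h}) - \frac{1}{(g + 1)(h + 1)}
\end{aligned}
\]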

For the forecasts implied by DGP (1) and DGP (2) the corresponding moments can be obtained by setting g = h = 1 in the above formulae, viz., S(cfq) = 1/12, f = 1, 2, and ∂(c1q, c2q) = B(r1q r2q) − 1/4.

Now consider moments involving vq. For DGP (1), given that vq = 1(r1q > tq) with tq ∼ R(�, 1 + �):

B (vq) =

Z 1+�

Z 1

01(r1q , tq)/r1q/tq

=

R 1�

R 1tq

1/r1q/tq when � , 0R 10

R 1tq

1/r1/t when � = 0R 0�

R 10 1/r1q/tq +

R 1+�0

R 1tq

1/r1q/tq when � � 0

=

12(1− 2� + �2) when � , 012 when � = 012(1− 2�− �2) when � � 0


Focusing on � ≥ 0:

∂(vq c1q) = B(vqr1q)−B(vq)B(r1q)

=

Z 1

Z 1

tq

r1q/r1q/tq − 1

4(1− 2� + �2)

=1

12(1− �)2(1 + 2�)

∂(vq c2q) = B(vqr2q)−B(vq)B(r2q)

B(vqr2q) =

Z 1

Z 1

0

Z 1

tq

r2qc(r1q r2q)/r1q/r2q/tq

=� + 1

Z 1

Z 1

0r− 1

2q

Z 1

tq

r−( +1 )1q

µr− 1

1q + r− 1

2q − 1

¶−(�+2)

/r1q/r2q/tq

=� + 1

Z 1

Z 1

0r− 1

2q

(�

� + 1

"r

+1

2q −µt− 1

q + r− 1

2q − 1

¶−(�+1)#)

/r2q/tq

=

Z 1

Z 1

0

(r2q − r

−1

2q

µt−1

q + r− 1

2q − 1

¶−(�+1))

/r2q/tq

so that:

∂(vq c2q) =1

4(1− �2)−

Z 1

Z 1

0r− 1

2q

µt− 1

q + r−1

2q − 1

¶−(�+1)

/r2q/tq

Under DGP (2), vq = 1(r1q > sq) with sq ∼ R(0, 1), so results for this DGP are as above with � = 0. Note that the first result gives B(vq) = 1/2, and the second yields ∂(vq, c1q) = 1/12 = S(c1q). Moreover, we can also derive a useful result for ∂(vq, c2q) under DGP (2):

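\[
\begin{aligned}
\partial(v_q, c_{2q}) &= B(v_q r_{2q}) - B(v_q)B(r_{2q}) \\
&= \int_0^1\!\int_0^1\!\int_0^1 1(r_{1q} > s_q)\, r_{2q}\, c(r_{1q}, r_{2q})\, dr_{1q}\, dr_{2q}\, ds_q - \tfrac{1}{4} \\
&= \int_0^1\!\int_0^1 r_{2q}\, c(r_{1q}, r_{2q}) \int_0^1 1(s_q < r_{1q})\, ds_q\, dr_{1q}\, dr_{2q} - \tfrac{1}{4} \\
&= \int_0^1\!\int_0^1 r_{2q}\, c(r_{1q}, r_{2q}) \int_0^{r_{1q}} 1\, ds_q\, dr_{1q}\, dr_{2q} - \tfrac{1}{4} \\
&= \int_0^1\!\int_0^1 r_{1q} r_{2q}\, c(r_{1q}, r_{2q})\, dr_{1q}\, dr_{2q} - \tfrac{1}{4} \\
&= \partial(c_{1q}, c_{2q})
\end{aligned}
\]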

Finally, for DGP (3):

\[
\begin{aligned}
B(v_q) &= \int_0^1\!\int_0^1 1(r_{1q}^{d} > s_q^{e})\, dr_{1q}\, ds_q \\
&= \int_0^1\!\int_{s_q^{e/d}}^1 1\, dr_{1q}\, ds_q \\
&= \frac{e}{d + e}
\end{aligned}
\]

\[
\begin{aligned}
\partial(v_q, c_{1q}) &= B(v_q r_{1q}^{g}) - B(v_q)B(r_{1q}^{g}) \\
&= \int_0^1\!\int_{s_q^{e/d}}^1 r_{1q}^{g}\, dr_{1q}\, ds_q - \frac{e}{(d + e)(g + 1)} \\
&= \frac{deg}{(d + ge + e)(d + e)(g + 1)}
\end{aligned}
\]

\[
\begin{aligned}
\partial(v_q, c_{2q}) &= B(v_q r_{2q}^{h}) - B(v_q)B(r_{2q}^{h}) \\
&= \int_0^1\!\int_0^1\!\int_0^1 1(r_{1q}^{d} > s_q^{e})\, r_{2q}^{h}\, c(r_{1q}, r_{2q})\, dr_{1q}\, dr_{2q}\, ds_q - \frac{e}{(d + e)(h + 1)} \\
&= \int_0^1\!\int_0^1 r_{2q}^{h}\, c(r_{1q}, r_{2q}) \int_0^{r_{1q}^{d/e}} 1\, ds_q\, dr_{1q}\, dr_{2q} - \frac{e}{(d + e)(h + 1)} \\
&= \int_0^1\!\int_0^1 r_{1q}^{d/e} r_{2q}^{h}\, c(r_{1q}, r_{2q})\, dr_{1q}\, dr_{2q} - \frac{e}{(d + e)(h + 1)} \\
&= B(r_{1q}^{d/e} r_{2q}^{h}) - \frac{e}{(d + e)(h + 1)}
\end{aligned}
\]

Appendix to Section 4

We require the moments involved in the expressions for J and N for FE (1), FE (2) and FE (3) under DGP (2). Note that �q = vq − c1q in each case, and that v�q = vq for all 2 , 0. We make use of results for DGP (2) in the Appendix to Section 3. First we note the following equality for all 2 (:

B(vqr�1qr

�2q) =

Z 1

0

Z 1

0

Z 1

01(r1q , sq)r

�1qr

�2q/r1q/r2q/sq

=

Z 1

0

Z 1

0r�

1qr�2q

Z r1q

01/sq/r1q/r2q

=

Z 1

0

Z 1

0r�+1

1q r�2q/r1q/r2q

= B(r�+11q r�

2q)

Using these results and the fact that B(r�1qr

�2q) = B(r�

1qr�2q), the required moments can be

derived:

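\[
\begin{aligned}
S(�_q) &= B[(v_q - c_{1q})^{2}] \\
&= B(v_q^{2}) + B(r_{1q}^{2}) - 2B(r_{1q}^{2}) \\
&= \tfrac{1}{6}
\end{aligned}
\]

\[
\begin{aligned}
S(c_{1q}�_q) &= B[r_{1q}^{2}(v_q - r_{1q})^{2}] - B[r_{1q}(v_q - r_{1q})]^{2} \\
&= B(r_{1q}^{3}) - B(r_{1q}^{4}) \\
&= \tfrac{1}{20}
\end{aligned}
\]

\[
\begin{aligned}
S(c_{2q}�_q) &= B[r_{2q}^{2}(v_q - r_{1q})^{2}] - B[r_{2q}(v_q - r_{1q})]^{2} \\
&= B(r_{1q} r_{2q}^{2}) - B(r_{1q}^{2} r_{2q}^{2}) \\
&= j_{21} - j_{22}
\end{aligned}
\]

\[
\begin{aligned}
\partial(c_{1q}�_q, �_q) &= B[r_{1q}(v_q - r_{1q})^{2}] - B[r_{1q}(v_q - r_{1q})]B(v_q - r_{1q}) \\
&= B(r_{1q}^{2}) - B(r_{1q}^{3}) \\
&= \tfrac{1}{12}
\end{aligned}
\]

\[
\begin{aligned}
\partial(c_{2q}�_q, �_q) &= B[r_{2q}(v_q - r_{1q})^{2}] - B[r_{2q}(v_q - r_{1q})]B(v_q - r_{1q}) \\
&= B(r_{1q} r_{2q}) - B(r_{1q}^{2} r_{2q}) \\
&= j_{11} - j_{21}
\end{aligned}
\]

\[
\begin{aligned}
\partial(c_{1q}�_q, c_{2q}�_q) &= B[r_{1q} r_{2q}(v_q - r_{1q})^{2}] - B[r_{1q}(v_q - r_{1q})]B[r_{2q}(v_q - r_{1q})] \\
&= B(r_{1q}^{2} r_{2q}) - B(r_{1q}^{3} r_{2q}) \\
&= j_{21} - j_{31}
\end{aligned}
\]

\[
\begin{aligned}
B[(b_{1q} - b_{2q})^{2}] &= B[(c_{2q} - c_{1q})^{2}] \\
&= B(r_{2q}^{2}) + B(r_{1q}^{2}) - 2B(r_{1q} r_{2q}) \\
&= \tfrac{2}{3} - 2j_{11}
\end{aligned}
\]

\[
\begin{aligned}
S[(b_{1q} - b_{2q})�_q] &= B[(r_{2q} - r_{1q})^{2}(v_q - r_{1q})^{2}] - B[(r_{2q} - r_{1q})(v_q - r_{1q})]^{2} \\
&= B(r_{1q} r_{2q}^{2}) + B(r_{1q}^{3}) - 2B(r_{1q}^{2} r_{2q}) - B(r_{1q}^{2} r_{2q}^{2}) - B(r_{1q}^{4}) + 2B(r_{1q}^{3} r_{2q}) \\
&= 2j_{31} - j_{21} - j_{22} + \tfrac{1}{20}
\end{aligned}
\]

\[
\begin{aligned}
\partial[(b_{1q} - b_{2q})�_q, �_q] &= B[(r_{2q} - r_{1q})(v_q - r_{1q})^{2}] - B[(r_{2q} - r_{1q})(v_q - r_{1q})]B(v_q - r_{1q}) \\
&= B(r_{1q} r_{2q}) - B(r_{1q}^{2} r_{2q}) - B(r_{1q}^{2}) + B(r_{1q}^{3}) \\
&= j_{11} - j_{21} - \tfrac{1}{12}
\end{aligned}
\]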

Appendix to Section 5

Under DGP (4), denoting the probability density function of c1q by c(c1q):

\[
\begin{aligned}
B(v_q) &= \int_0^1\!\int_0^1 1(c_{1q} > s_q)\, c(c_{1q})\, ds_q\, dc_{1q} \\
&= \int_0^1 c(c_{1q}) \int_0^{c_{1q}} 1\, ds_q\, dc_{1q} \\
&= \int_0^1 c_{1q}\, c(c_{1q})\, dc_{1q} \\
&= B(c_{1q})
\end{aligned}
\]

\[
\begin{aligned}
B(v_q c_{1q}) &= \int_0^1\!\int_0^1 1(c_{1q} > s_q)\, c_{1q}\, c(c_{1q})\, ds_q\, dc_{1q} \\
&= \int_0^1 c_{1q}\, c(c_{1q}) \int_0^{c_{1q}} 1\, ds_q\, dc_{1q} \\
&= \int_0^1 c_{1q}^{2}\, c(c_{1q})\, dc_{1q} \\
&= B(c_{1q}^{2})
\end{aligned}
\]

giving:

\[
\begin{aligned}
\partial(v_q, c_{1q}) &= B(v_q c_{1q}) - B(v_q)B(c_{1q}) \\
&= B(c_{1q}^{2}) - B(c_{1q})B(c_{1q}) \\
&= S(c_{1q})
\end{aligned}
\]


References

Anderson, H. M., and Vahid, F. (2001). Predicting the probability of a recession with nonlinearautoregressive leading indicator models. Macroeconomic Dynamics, 5, 482—505.

Brier, G. W. (1950). Verification of forecasts expressed in terms of probability. Monthly WeatherReview, 75, 1—3.

Carruth, A. A., Hooker, M. A., and Oswald, A. J. (1998). Unemployment equilibria and inputprices: Theory and evidence from the United States. Review of Economics and Statistics,80, 621—628.

Chatfield, C. (1993). Calculating interval forecasts. Journal of Business & Economic Statistics,11, 121—135.

Chauvet, C., and Potter, S. (2002). Predicting a recession: evidence from the yield curve inthe presence of structural breaks. Economic Letters, 77, 245—253.

Chong, Y. Y., and Hendry, D. F. (1986). Econometric evaluation of linear macro-economicmodels. Review of Economic Studies, 53, 671—690. Reprinted in Granger, C. W. J. (ed.)(1990), Modelling Economic Series. Oxford: Clarendon Press.

Christoffersen, P. F. (1998). Evaluating interval forecasts. International Economic Review, 39,841—862.

Clark, T. E., and McCracken, M. W. (2001). Tests of equal forecast accuracy and encompassingfor nested models. Journal of Econometrics, 105, 85—110.

Clemen, R. T. (1989). Combining forecasts: A review and annotated bibliography. InternationalJournal of Forecasting, 5, 559—583. Reprinted in Mills, T. C. (ed.) (1999), EconomicForecasting. The International Library of Critical Writings in Economics. Cheltenham:Edward Elgar.

Clements, M. P., and Hendry, D. F. (1993a). On the limitations of comparing mean squaredforecast errors. Journal of Forecasting, 12, 617—637. With discussion. Reprinted in Mills,T. C. (ed.) (1999), Economic Forecasting. The International Library of Critical Writingsin Economics. Cheltenham: Edward Elgar.

Clements, M. P., and Hendry, D. F. (1993b). On the limitations of comparing mean squaredforecast errors: A reply. Journal of Forecasting, 12, 669—676.

Clements, M. P., and Taylor, N. (2003). Evaluating prediction intervals for high-frequency data.Journal of Applied Econometrics, 18, 445 — 456.

Cook, R. D., and Johnson, M. E. (1981). A family of distributions for modelling non-elliptically symmetric multivariate data. Journal of the Royal Statistical Society, Series B, 43, 210–218.

Diebold, F. X., Gunther, T. A., and Tay, A. S. (1998). Evaluating density forecasts: With applications to financial risk management. International Economic Review, 39, 863–883.

Diebold, F. X., and Lopez, J. A. (1996). Forecast evaluation and combination. In Maddala, G. S., and Rao, C. R. (eds.), Handbook of Statistics, Vol. 14, pp. 241–268. Amsterdam: North-Holland.

Diebold, F. X., and Mariano, R. S. (1995). Comparing predictive accuracy. Journal of Business and Economic Statistics, 13, 253–263. Reprinted in Mills, T. C. (ed.) (1999), Economic Forecasting. The International Library of Critical Writings in Economics. Cheltenham: Edward Elgar.

Engle, R. F., and Manganelli, S. (1999). CAViaR: Conditional Autoregressive Value-at-Risk by regression quantiles. UCSD Discussion Paper 99-20, Department of Economics, UCSD.

Estrella, A., and Mishkin, F. S. (1998). Predicting US recessions: Financial variables as leading indicators. Review of Economics and Statistics, 80, 45–61.

Fildes, R., and Ord, K. (2002). Forecasting competitions – their role in improving forecasting practice and research. In Clements, M. P., and Hendry, D. F. (eds.), A Companion to Economic Forecasting, pp. 322–353. Oxford: Blackwells.

Granger, C. W. J., and Newbold, P. (1973). Some comments on the evaluation of economic forecasts. Applied Economics, 5, 35–47. Reprinted in Mills, T. C. (ed.) (1999), Economic Forecasting. The International Library of Critical Writings in Economics. Cheltenham: Edward Elgar.

Granger, C. W. J., White, H., and Kamstra, M. (1989). Interval forecasting: An analysis based upon ARCH-quantile estimators. Journal of Econometrics, 40, 87–96.

Hamilton, J. D. (1983). Oil and the macroeconomy since World War II. Journal of Political Economy, 91, 228–248.

Hamilton, J. D. (1996). This is what happened to the oil price-macroeconomy relationship. Journal of Monetary Economics, 38, 215–220.

Hamilton, J. D. (2000). What is an oil shock? Journal of Econometrics, 113, 363–398.

Hamilton, J. D., and Kim, D. H. (2000). A re-examination of the predictability of economic activity using the yield spread. NBER Working Papers, 7954.

Harvey, D. I., Leybourne, S., and Newbold, P. (1998). Tests for forecast encompassing. Journal of Business and Economic Statistics, 16, 254–259. Reprinted in Mills, T. C. (ed.) (1999), Economic Forecasting. The International Library of Critical Writings in Economics. Cheltenham: Edward Elgar.

Harvey, D. I., Leybourne, S., and Newbold, P. (2000). Tests for multiple forecast encompassing. Journal of Applied Econometrics, 15, 471–482.

Hooker, M. A. (1996). Whatever happened to the oil price-macroeconomy relationship? Journal of Monetary Economics, 38, 195–213.

Lee, K., Ni, S., and Ratti, R. A. (1995). Oil shocks and the macroeconomy: The role of price variability. Energy Journal, 16, 39–56.

Mork, K. A. (1989). Oil and the Macroeconomy when prices go up and down: An extension of Hamilton's results. Journal of Political Economy, 97, 740–744.

Nelson, C. R. (1972). The prediction performance of the FRB-MIT-PENN model of the US economy. American Economic Review, 62, 902–917. Reprinted in Mills, T. C. (ed.) (1999), Economic Forecasting. The International Library of Critical Writings in Economics. Cheltenham: Edward Elgar.

Newbold, P., and Granger, C. W. J. (1974). Experience with forecasting univariate time series and the combination of forecasts. Journal of the Royal Statistical Society, A, 137, 131–146. Reprinted in Mills, T. C. (ed.) (1999), Economic Forecasting. The International Library of Critical Writings in Economics. Cheltenham: Edward Elgar.

Newbold, P., and Harvey, D. I. (2002). Forecast combination and encompassing. In Clements, M. P., and Hendry, D. F. (eds.), A Companion to Economic Forecasting, pp. 268–283. Oxford: Blackwells.

Stock, J. H., and Watson, M. W. (1999). A comparison of linear and nonlinear univariate models for forecasting macroeconomic time series. In Engle, R. F., and White, H. (eds.), Cointegration, Causality and Forecasting: A Festschrift in Honour of Clive Granger, pp. 1–44. Oxford: Oxford University Press.

Tay, A. S., and Wallis, K. F. (2000). Density forecasting: A survey. Journal of Forecasting, 19, 235–254. Reprinted in Clements, M. P., and Hendry, D. F. (eds.) (2002), A Companion to Economic Forecasting. Oxford: Blackwells.

Wallis, K. F. (2003). Chi-squared tests of interval and density forecasts, and the Bank of England's fan charts. International Journal of Forecasting, 19, 165–176.

West, K. D. (1996). Asymptotic inference about predictive ability. Econometrica, 64, 1067–1084.

West, K. D. (2001). Tests for forecast encompassing when forecasts depend on estimated regression parameters. Journal of Business and Economic Statistics, 19, 29–33.

West, K. D., and McCracken, M. W. (1998). Regression-based tests of predictive ability. International Economic Review, 39, 817–840.

White, H. (1984). Asymptotic Theory for Econometricians. London: Academic Press.

Table 1. Asymptotic variances and sizes of nominal 5%-level standard tests for forecast encompassing; DGP (2)

                          FE (1)                FE (2)                FE (3)
                    Variance  Size (%)    Variance  Size (%)    Variance  Size (%)
      ∞    0.000      1.000     5.000       0.800     2.843       1.000     5.000
  6.985    0.100      0.995     4.946       0.827     3.116       0.989     4.879
  3.220    0.200      0.997     4.960       0.856     3.417       0.973     4.696
  1.956    0.300      1.004     5.046       0.887     3.745       0.953     4.466
  1.318    0.400      1.018     5.202       0.920     4.104       0.928     4.193
  0.929    0.500      1.037     5.425       0.955     4.493       0.900     3.883
  0.664    0.600      1.061     5.706       0.992     4.909       0.867     3.531
  0.469    0.700      1.088     6.026       1.030     5.341       0.828     3.129
  0.314    0.800      1.114     6.329       1.065     5.754       0.781     2.654
  0.180    0.900      1.123     6.441       1.088     6.022       0.716     2.054
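As a rough cross-check on Table 1, the tabulated sizes appear consistent with a statistic that is asymptotically normal with the reported variance being compared with two-sided standard normal 5% critical values, so that size = 2[1 − Φ(z_{0.975}/√variance)]. The Python sketch below is illustrative only: the function name `asymptotic_size` is hypothetical, and the two-sided reading is an inference from the table entries rather than something stated in the table itself.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and standard deviation 1


def asymptotic_size(variance, nominal=0.05):
    """Two-sided rejection rate (in %) when a N(0, variance) statistic
    is compared with N(0, 1) critical values at the given nominal level.

    Assumption: the test is two-sided, which is inferred from Table 1.
    """
    crit = STD_NORMAL.inv_cdf(1 - nominal / 2)           # about 1.96 for 5%
    tail = 1 - STD_NORMAL.cdf(crit / sqrt(variance))     # one tail of N(0, variance)
    return 100 * 2 * tail


if __name__ == "__main__":
    # Variances taken from the first and last rows of Table 1
    for v in (1.000, 0.800, 1.123):
        print(f"variance = {v:.3f}  size = {asymptotic_size(v):.3f}%")
```

With the variances rounded to three decimals as tabulated, this agrees with the reported sizes (for example 2.843 for a variance of 0.800) to within about 0.01 percentage points.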

Table 2. Empirical sizes of nominal 5%-level tests; DGP (2), = 0

             n        R       R1       R2       DM      MDM
FE (1)       8     6.49    17.42     0.23    13.66     4.72
            16     5.31    10.23     2.85     9.07     5.67
            32     4.86     7.06     4.07     6.71     5.23
            64     4.97     5.97     4.64     5.79     5.14
           128     5.00     5.53     4.89     5.44     5.14
           256     4.91     5.21     4.84     5.16     4.99
           512     4.88     4.96     4.82     4.96     4.89

FE (2)       8     4.65    15.98     0.77    15.98     8.27
            16     3.57    10.99     5.57    11.20     8.21
            32     3.12     8.28     5.90     8.39     7.07
            64     2.96     6.68     5.62     6.75     6.13
           128     2.80     5.73     5.22     5.78     5.46
           256     2.81     5.40     5.17     5.44     5.30
           512     2.85     5.24     5.09     5.24     5.14

FE (3)       8     5.73    14.40     0.17    12.62     4.84
            16     5.17     9.39     2.78     8.60     5.43
            32     4.86     6.70     3.87     6.34     4.99
            64     4.91     5.91     4.52     5.78     5.13
           128     5.00     5.48     4.79     5.43     5.04
           256     4.94     5.21     4.83     5.18     4.99
           512     4.88     4.98     4.82     4.98     4.88

Table 3. Empirical sizes of nominal 5%-level tests; DGP (2), = 0.8

             n        R       R1       R2       DM      MDM
FE (1)       8     8.40    23.05     0.20    12.77     4.11
            16     6.86    14.80     2.53     8.71     5.21
            32     6.57     9.88     4.20     6.81     5.37
            64     6.26     7.18     4.64     5.89     5.20
           128     6.34     6.14     4.91     5.47     5.17
           256     6.38     5.60     4.96     5.24     5.10
           512     6.24     5.16     4.82     5.00     4.92

FE (2)       8     7.39    21.06     0.14    12.48     4.45
            16     6.38    14.20     2.85     8.75     5.57
            32     6.08     9.81     4.35     6.94     5.57
            64     5.78     7.28     4.82     6.00     5.37
           128     5.82     6.15     4.85     5.47     5.12
           256     5.83     5.67     5.03     5.32     5.14
           512     5.74     5.30     5.01     5.14     5.06

FE (3)       8     4.61    16.03     1.73    17.39    10.10
            16     3.46    10.94     5.83    11.52     8.62
            32     3.01     7.76     5.71     8.01     6.80
            64     2.74     6.37     5.38     6.50     5.88
           128     2.79     5.79     5.30     5.90     5.59
           256     2.69     5.31     5.09     5.34     5.19
           512     2.61     5.00     4.88     5.03     4.96

Table 4. Empirical sizes of nominal 5%-level tests; DGP (1), = 0, � = 0.5

             n        R       R1       R2       DM      MDM
FE (1)       8     3.76     7.58     0.04     5.66     1.93
            16     3.95     5.15     1.07     4.70     2.46
            32     4.65     5.34     2.74     5.27     3.82
            64     4.84     5.81     4.37     5.74     4.96
           128     5.04     5.61     4.93     5.52     5.18
           256     4.88     5.17     4.83     5.14     5.00
           512     5.01     5.07     4.93     5.07     4.98

FE (2)       8    21.11    39.00     0.75    28.01    13.85
            16    27.26    36.50    17.08    29.78    23.79
            32    43.25    43.95    33.24    40.71    36.94
            64    69.23    66.42    61.39    65.13    63.28
           128    93.00    91.32    90.24    91.04    90.62
           256    99.82    99.70    99.67    99.70    99.68
           512   100.00   100.00   100.00   100.00   100.00

FE (3)       8     5.63    14.64     0.21    12.61     4.90
            16     5.25     9.56     2.91     8.61     5.43
            32     5.03     7.07     4.20     6.66     5.31
            64     5.00     6.05     4.60     5.84     5.14
           128     5.04     5.49     4.84     5.42     5.10
           256     5.03     5.30     4.97     5.26     5.10
           512     4.94     4.99     4.83     4.98     4.90

Table 5. Empirical sizes of nominal 5%-level tests; DGP (3), = 0.724, $y_t = 1(u_{1t} > v_t^2)$, $f_{1t} = u_{1t}$, $f_{2t} = u_{2t}^2$

             n        R       R1       R2       DM      MDM
FE (1)       8     4.95    16.41     0.94    10.06     3.41
            16     3.56    10.62     1.66     6.60     3.66
            32     3.40     8.75     3.16     5.90     4.38
            64     3.41     7.51     4.35     5.71     4.98
           128     3.39     6.24     4.71     5.28     4.97
           256     3.54     5.71     4.94     5.22     5.06
           512     3.46     5.33     4.96     5.10     5.01

FE (2)       8     4.11    14.41     0.25    11.17     4.29
            16     3.23    10.59     2.03     7.66     4.47
            32     3.46     9.80     3.98     6.86     5.26
            64     4.40     9.57     6.00     7.57     6.72
           128     5.95    10.30     8.25     9.03     8.59
           256     9.34    13.95    12.38    12.94    12.63
           512    16.02    21.64    20.46    20.87    20.69

FE (3)       8     6.00    14.20     0.64    13.53     6.82
            16     4.90     8.77     3.23     8.80     5.76
            32     5.78     8.17     5.06     8.33     6.52
            64     9.44    12.25     9.84    12.20    10.96
           128    17.08    21.30    19.66    21.18    20.34
           256    33.95    40.36    39.22    40.28    39.69
           512    63.56    70.34    69.86    70.30    70.06

Table 6. Estimated powers of nominal 5%-level tests; DGP (2), = 0

                              � = 0.75                                      � = 0.5
             n        R       R1       R2       DM      MDM        R       R1       R2       DM      MDM
FE (1)       8     7.97    20.60     0.49    16.19     6.27    12.62    28.45     1.08    22.86    10.20
            16     9.23    16.22     5.80    14.52     9.91    20.57    31.37    14.64    28.49    21.78
            32    13.92    18.07    12.36    17.36    14.71    39.00    45.92    36.63    44.60    40.41
            64    24.06    26.68    23.15    26.24    24.58    67.94    71.48    67.31    70.91    69.05
           128    43.91    45.73    43.55    45.52    44.47    93.62    94.31    93.64    94.22    93.91
           256    73.05    73.94    73.12    73.85    73.44    99.90    99.91    99.90    99.91    99.90
           512    95.67    95.84    95.70    95.82    95.75   100.00   100.00   100.00   100.00   100.00

FE (2)       8     8.04    18.50     0.36    14.38     5.90    17.65    30.25     0.37    21.94     8.72
            16    11.64    14.87     3.69    12.29     7.56    34.77    37.86    12.47    31.78    22.14
            32    21.58    21.38    12.47    19.46    15.75    64.55    63.91    48.83    60.17    54.61
            64    41.80    41.02    34.65    39.24    36.80    92.22    91.72    88.68    90.78    89.76
           128    72.36    72.47    69.52    71.56    70.46    99.83    99.82    99.75    99.79    99.77
           256    95.66    95.97    95.61    95.84    95.72   100.00   100.00   100.00   100.00   100.00
           512    99.95    99.97    99.97    99.97    99.97   100.00   100.00   100.00   100.00   100.00

FE (3)       8     7.24    17.12     0.21    15.01     6.09    11.50    24.42     0.54    21.40     9.63
            16     9.23    14.68     5.19    13.66     9.24    19.67    28.87    12.95    26.92    19.99
            32    14.05    17.34    11.91    16.85    14.22    36.95    43.48    34.24    42.51    38.33
            64    23.95    26.13    22.66    25.76    24.11    65.22    68.90    64.55    68.30    66.35
           128    43.20    44.94    42.78    44.67    43.64    92.04    92.93    92.21    92.85    92.50
           256    72.27    73.17    72.29    73.06    72.66    99.82    99.87    99.85    99.86    99.85
           512    95.26    95.46    95.34    95.47    95.38   100.00   100.00   100.00   100.00   100.00

Table 7. Estimated powers of nominal 5%-level tests; DGP (2), = 0.5

                              � = 0.75                                      � = 0.5
             n        R       R1       R2       DM      MDM        R       R1       R2       DM      MDM
FE (1)       8     8.77    21.59     0.33    15.01     5.49    12.70    27.94     0.61    19.78     8.02
            16     9.72    15.76     4.23    12.01     7.92    19.66    27.84     9.99    22.38    16.22
            32    13.60    15.25     8.93    13.32    11.09    34.36    37.13    25.94    33.91    29.91
            64    21.58    20.67    16.60    19.54    17.96    59.78    59.30    53.40    57.68    55.46
           128    37.30    35.14    32.35    34.30    33.24    87.88    86.99    85.37    86.51    85.89
           256    63.82    61.24    59.78    60.76    60.26    99.29    99.20    99.13    99.18    99.15
           512    89.83    88.69    88.38    88.64    88.49   100.00   100.00   100.00   100.00   100.00

FE (2)       8     8.46    19.99     0.19    13.09     4.79    14.00    26.90     0.21    16.39     5.90
            16    10.42    14.70     2.94     9.91     5.96    23.91    27.67     6.04    18.84    11.94
            32    15.48    14.85     7.10    11.73     9.19    42.76    40.04    24.10    33.73    28.80
            64    26.10    22.54    16.72    20.02    18.26    71.07    66.91    58.66    63.38    61.03
           128    46.57    41.83    38.02    40.21    39.04    94.44    92.98    91.48    92.26    91.88
           256    75.74    72.63    70.83    71.76    71.30    99.87    99.83    99.79    99.80    99.79
           512    96.15    95.55    95.31    95.45    95.37   100.00   100.00   100.00   100.00   100.00

FE (3)       8     5.52    15.85     0.59    15.82     7.65     6.73    18.73     0.83    18.92     9.73
            16     5.44    12.49     5.60    12.49     9.02     8.43    18.43     9.31    18.42    13.93
            32     6.54    12.02     8.71    12.04    10.21    12.97    22.17    17.03    22.25    19.45
            64     8.95    13.97    12.04    14.01    12.96    22.73    32.15    29.00    32.20    30.53
           128    14.17    19.64    18.36    19.67    18.98    42.91    52.53    50.65    52.47    51.52
           256    25.45    31.75    30.94    31.79    31.32    74.07    80.44    79.73    80.38    80.05
           512    47.27    54.07    53.63    54.09    53.82    96.38    97.68    97.63    97.68    97.65

Table 8. Empirical sizes of nominal 5%-level MDM tests with parameter estimation uncertainty; DGP (4)

                                        n/R
             n        2        1      0.5     0.25    0.125   0.0625   No estimation
FE (1)       8     3.21     4.51     4.61     4.79     4.82     4.97       5.31
            16     5.28     5.63     5.43     5.60     5.63     5.88       5.82
            32     5.46     5.36     5.49     5.36     5.38     5.40       5.48
            64     5.21     5.19     5.40     5.20     5.39     5.24       5.38
           128     5.14     5.15     5.12     5.05     5.25     5.18       5.08
           256     5.16     5.05     4.86     5.10     4.97     5.10       5.12
           512     5.01     5.05     5.14     4.85     5.04     5.10       4.95

FE (2)       8     7.69     8.36     8.79     9.74    10.51    11.17       7.71
            16    16.56    11.27    10.04     9.67     9.55     9.31       7.30
            32    18.56    13.08    10.75     9.22     8.52     7.89       6.36
            64    21.21    14.81    11.02     8.37     7.55     6.98       5.81
           128    23.30    15.57    10.84     8.11     7.25     6.30       5.36
           256    24.61    16.50    10.94     8.01     6.53     5.97       5.21
           512    25.27    16.36    10.80     8.05     6.57     5.84       5.01

FE (3)       8     2.68     3.84     4.28     4.45     4.61     4.62       5.10
            16     4.86     5.10     5.18     5.33     5.27     5.41       5.34
            32     5.05     5.11     5.25     5.04     5.18     5.18       5.30
            64     5.06     5.08     5.26     5.03     5.25     5.01       5.22
           128     5.08     5.08     5.04     4.94     5.15     5.13       5.01
           256     5.06     5.05     4.89     5.09     4.93     5.12       5.13
           512     5.01     5.03     5.14     4.86     5.03     5.07       4.95

Table 9. In-sample fits of logit models

                 QPS      LPS      R²    Slope p-value
Spread         0.154    0.258   0.316        0.000
Oil price      0.232    0.393   0.074        0.017
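For readers unfamiliar with the QPS and LPS columns of Table 9, the Python sketch below computes the quadratic and log probability scores under the definitions commonly used for recession probability forecasts, QPS = (1/n) Σ 2(p_t − y_t)² and LPS = −(1/n) Σ [y_t ln p_t + (1 − y_t) ln(1 − p_t)]. It assumes that Table 9 follows these conventions; the function names and the toy data are hypothetical and are not taken from the paper.

```python
from math import log


def qps(probs, outcomes):
    """Quadratic probability score: mean of 2 * (p - y)^2.

    Ranges from 0 (perfect) to 2 (always certain and always wrong).
    """
    return sum(2.0 * (p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def lps(probs, outcomes):
    """Log probability score: minus the average log-likelihood of the
    binary outcomes. Requires every probability to lie strictly in (0, 1).
    """
    total = sum(y * log(p) + (1 - y) * log(1 - p) for p, y in zip(probs, outcomes))
    return -total / len(probs)


if __name__ == "__main__":
    # Hypothetical forecast probabilities and 0/1 outcomes (illustration only)
    p = [0.9, 0.2, 0.6, 0.1]
    y = [1, 0, 1, 0]
    print(f"QPS = {qps(p, y):.3f}, LPS = {lps(p, y):.3f}")
```

Both scores are smaller for better forecasts, which matches the ordering in Table 9, where the spread model has the lower QPS and LPS.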

Table 10. Forecast encompassing tests of probability forecasts of recession

Panel A. H0: Spread forecast encompasses oil price

               R       R1       R2       DM      MDM
FE (1)     0.003    0.016    0.112    0.108    0.111
FE (2)     0.001    0.042    0.083    0.079    0.082
FE (3)     0.045    0.135    0.198    0.196    0.199

Panel B. H0: Oil price forecast encompasses spread

               R       R1       R2       DM      MDM
FE (1)     0.000    0.010    0.029    0.025    0.027
FE (2)     0.000    0.011    0.031    0.027    0.029
FE (3)     0.000    0.020    0.030    0.026    0.028

Note: The entries are p-values for the null of forecast encompassing, so an entry less than 0.05 indicates rejection at the 5% level in a one-sided test of the null that �2 = 0 versus �2 > 0. The R, R1, R2 and MDM statistics are compared with the $t_{n-1}$ distribution, while the DM statistics are compared with the $N(0, 1)$ distribution.
