An evaluation of simple vs. complex selection rules for forecasting … · 2017-04-07 · the best...

An evaluation of simple vs. complex selection rules for forecasting many time series

Fotios Petropoulos

Prof. Robert Fildes

Outline

• Introduction

• Research Questions

• Experimental set-up: methods, data & rules

• Empirical investigation

• Implications & limitations

• Conclusions & perspectives

Introduction

• Forecasters regularly face the question of choosing from a

set of alternative forecasting methods.

• The forecasting methods usually considered are simple.

• Two distinct approaches have been proposed for dealing

with this problem (Fildes, 1989):

• aggregate selection

• individual selection

• The main objective of the current research is to investigate

the conditions under which individual model selection may

be beneficial.

Why is individual method selection important?

• If individual selection could be done perfectly, the gains over

the best ex-post aggregate selection would be substantial.

• Improvements in univariate forecasting accuracy are financially valuable.

• Results of various competitions:

• no single best method

• wide divergence in comparative performance by series

• within homogeneous data sets unexpected methods

dominate

• The alternative: to develop a new general method. Is this

feasible?

Research Questions (1)

• Segmenting the data:

• Predictable/Unpredictable: Is the performance of the non-seasonal Random Walk forecasting method better than the median performance of all other methods under investigation, as measured by Mean Absolute Error in the validation data?

• Trended/Non-Trended: Cox-Stuart test over a 12-period centered moving average

• Seasonal/Non-Seasonal: Friedman’s non-parametric test

RQ1. Is individual model selection more effective when

applied to groups of time series with specific characteristics?
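
The trended/non-trended split could be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the 12-period centered moving average is the usual 2x12 average (13 terms, half weight on the end points), that the Cox-Stuart test is the two-sided sign test on paired halves, and that a 5% significance level is used. The function names and the `alpha` parameter are hypothetical.

```python
from math import comb

def centered_ma12(x):
    """2x12 centered moving average: 13 terms, half weight on the two end points."""
    out = []
    for i in range(6, len(x) - 6):
        window = x[i - 6:i + 7]
        out.append((0.5 * window[0] + sum(window[1:12]) + 0.5 * window[12]) / 12.0)
    return out

def cox_stuart_pvalue(x):
    """Two-sided Cox-Stuart sign test for a monotonic trend."""
    n = len(x)
    half = n // 2
    # Pair each point of the first half with its counterpart in the second half;
    # the middle point is dropped when n is odd.
    first, second = x[:half], x[n - half:]
    diffs = [b - a for a, b in zip(first, second) if b != a]  # ties are discarded
    if not diffs:
        return 1.0
    pos = sum(d > 0 for d in diffs)
    k = min(pos, len(diffs) - pos)
    # Binomial(N, 0.5) lower tail, doubled for the two-sided test.
    tail = sum(comb(len(diffs), j) for j in range(k + 1)) / 2 ** len(diffs)
    return min(1.0, 2.0 * tail)

def is_trended(series, alpha=0.05):
    """Assumed decision rule: trended if the test rejects 'no trend' at level alpha."""
    return cox_stuart_pvalue(centered_ma12(series)) < alpha
```

Smoothing first removes the seasonal pattern of monthly data, so the sign test reacts to the underlying level rather than to within-year swings.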


RQ2. What are the effects on individual selection of including

more methods in the pool under consideration?

• Number of models included in the pool of alternatives.

• Effectively a variant of over-fitting: the more models included, the higher the probability that the wrong model is chosen due to the randomness in the data.

• Every possible combination of smaller pools of two (2) up to

twelve (12) methods is also examined.

• For example, for pools of four (4) methods, all 495 possible pools are checked, 495 being the number of 4-combinations of a set of 12.
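
The pool counts can be checked with a few lines of Python. This is only an illustrative sketch: the method labels `M1` to `M12` are placeholders, not the names of the methods used in the study.

```python
from itertools import combinations
from math import comb

N_METHODS = 12
methods = [f"M{i}" for i in range(1, N_METHODS + 1)]  # placeholder labels

# Every pool of exactly four methods drawn from the twelve candidates.
pools_of_four = list(combinations(methods, 4))

# Number of pools for every size from 2 up to 12, and their total.
pools_per_size = {k: comb(N_METHODS, k) for k in range(2, N_METHODS + 1)}
total_pools = sum(pools_per_size.values())

print(len(pools_of_four), total_pools)  # 495 4083
```

The total of 4083 pools equals 2^12 minus the empty pool and the twelve single-method pools.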


• Many of the methods included in typical extrapolative

competitions produce similar forecasts.

• Selection between methods that produce similar forecasts

cannot prove valuable.

• Selecting among methods with low to medium correlated

outputs and similar levels of accuracy is intuitively more

promising.

RQ3. Do pools of methods with low correlation, in terms of

forecast error, provide better forecasting performance when

individual selection rules are considered compared to more highly

correlated pools?


• Individual selection would be unlikely to be beneficial when a

single method is dominant.

• Identify sub-populations where a dominant method exists (or

not).

• Divide the data series into two groups in terms of the

performance of one of the best methods.

• 1st Group (Dominant method): Theta’s performance in the top

three

• 2nd Group (Non-dominant method): Theta’s performance

outside the top three

RQ4. Individual method selection is of most value when no

dominant method is identified across the population.


• The performance of the different methods is also analysed

for their stability.

• Stability in a specific series can be measured by the average

(across time origins) Spearman’s rank correlation coefficient.

• A series is defined as stable when its Spearman’s rho falls in

the top quartile of the data set.
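
One possible reading of this stability measure is sketched below. It assumes that, at each forecast origin, the ranking of the methods by error is correlated with the ranking of their average error across all origins, and that the resulting coefficients are then averaged. The slide does not spell out which rankings are correlated, so this pairing is an assumption, and the function names are hypothetical.

```python
def ranks(values):
    """Average ranks (1 = smallest); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return r

def spearman_rho(a, b):
    """Spearman's rho computed as the Pearson correlation of the rank vectors."""
    ra, rb = ranks(a), ranks(b)
    n = len(a)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    # Return 0.0 when one ranking is constant and rho is undefined.
    return cov / (va * vb) ** 0.5 if va > 0 and vb > 0 else 0.0

def stability(errors_by_origin):
    """errors_by_origin[t][i]: error of method i at forecast origin t."""
    n_methods = len(errors_by_origin[0])
    n_origins = len(errors_by_origin)
    avg = [sum(row[i] for row in errors_by_origin) / n_origins for i in range(n_methods)]
    rhos = [spearman_rho(row, avg) for row in errors_by_origin]
    return sum(rhos) / len(rhos)
```

A series whose methods keep the same ordering at every origin scores 1.0; series would then be flagged as stable when their score falls in the top quartile of the data set.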

RQ5. Individual selection is only effective compared to aggregate

selection when relative performance in the pool of methods under

consideration is stable.

How to choose a ‘best’ method?

• In-sample fit rules

• MSE, AIC, BIC etc

• Makes intuitive sense, but on its own has poor

predictive properties (Pant & Starbuck, 1990).

• Out-of-sample rules

• Makes even more sense: what has forecast the most

accurately, will forecast the most accurately.

• Complex selection rules

• Based on numerous data characteristics & combination

approaches (for example, Collopy & Armstrong, 1992;

Shah, 1997; Meade, 2000).

Selection Rules (individual selection)

Rule 1. Use the method with the best fit, as measured by the minimum one-step-ahead in-sample Mean Squared Error.

Rule 2. Use the method with the best out-of-sample one-step-ahead forecast error, in terms of Mean Absolute Percentage Error.

Rule 3. Use the method with the best out-of-sample h-step-ahead forecast error, in terms of Mean Absolute Percentage Error, and apply this method to forecast that same lead time only.

Rule 4. Use the method with the best out-of-sample 1- to 18-steps-ahead forecast error, in terms of Mean Absolute Percentage Error, to forecast all lead times.
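
The four rules can be sketched for a single series as below. This is a hedged illustration under stated assumptions: `ape[m][t][h]` is a hypothetical rectangular array holding the absolute percentage error of method `m` at validation origin `t` for lead time `h + 1`, and `in_sample_mse[m]` is the one-step-ahead in-sample MSE of method `m`. All function names are made up for the example.

```python
def mean(xs):
    xs = list(xs)
    return sum(xs) / len(xs)

def argmin(scores):
    """Index of the smallest score (first one wins on ties)."""
    return min(range(len(scores)), key=lambda i: scores[i])

def rule1(in_sample_mse):
    """Best in-sample fit: minimum one-step-ahead in-sample MSE."""
    return argmin(in_sample_mse)

def rule2(ape):
    """Best out-of-sample one-step-ahead MAPE; used for every lead time."""
    return argmin([mean(origin[0] for origin in method) for method in ape])

def rule3(ape):
    """Best out-of-sample MAPE per lead time; returns one method index per lead time."""
    n_horizons = len(ape[0][0])
    return [
        argmin([mean(origin[h] for origin in method) for method in ape])
        for h in range(n_horizons)
    ]

def rule4(ape):
    """Best out-of-sample MAPE averaged over all lead times; used for every lead time."""
    return argmin([mean(e for origin in method for e in origin) for method in ape])

# Two methods, two origins, two lead times (toy numbers, not results from the study).
ape = [
    [[1.0, 9.0], [1.0, 9.0]],  # method 0: good at h=1, poor at h=2
    [[2.0, 2.0], [2.0, 2.0]],  # method 1: uniform
]
print(rule2(ape), rule3(ape), rule4(ape))  # 0 [0, 1] 1
```

The toy data shows how the rules can disagree: method 0 wins on one-step-ahead error, yet method 1 wins once all lead times are averaged.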

Benchmarks

The empirical results focus on comparing the performance of individual

selection versus the two simple benchmarks:

• Aggregate selection: this uses the single best method based on the

one-step-ahead out-of-sample performance on the validation sample

• The combination of methods applying equal weights to each method

included in the selection pool.
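
The two benchmarks might be sketched as follows. The slide does not state how one-step-ahead performance is summarised across series for aggregate selection; summing each method's validation error over the series is an assumption made here, and the function and argument names are hypothetical.

```python
def aggregate_selection(one_step_mape):
    """one_step_mape[s][i]: validation one-step-ahead MAPE of method i on series s.

    Returns the index of the single method applied to every series
    (assumed criterion: lowest total error across the series).
    """
    n_methods = len(one_step_mape[0])
    totals = [sum(row[i] for row in one_step_mape) for i in range(n_methods)]
    return min(range(n_methods), key=lambda i: totals[i])

def equal_weight_combination(forecasts):
    """forecasts[i][h]: forecast of pool method i for lead time h + 1.

    Returns the simple average of the pool's forecasts at each lead time.
    """
    n = len(forecasts)
    return [sum(f[h] for f in forecasts) / n for h in range(len(forecasts[0]))]
```

For example, `equal_weight_combination([[10, 20], [20, 40]])` averages the two methods' forecasts into `[15.0, 30.0]`.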

The set-up: data & methods

• M3 data

• monthly frequency

• 126 or more observations

• n = 998 series

• Each series is split into three consecutive samples (timeline markers T1 and T2):

• Initialization: 48 obs

• Selection: 42 obs

• Evaluation: 36 obs

• Rolling forecasting: 18 forecasts ahead are calculated for each origin

• Methods: Naïve, SES, Holt, Damped, Holt-Winters,

Theta, ARIMA
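
A rolling-origin evaluation of this kind can be sketched with the Naïve method. This is an illustration under assumptions rather than the study's code: origins are taken to run from 48 to 89 observations (42 origins), the error measure is the absolute percentage error, and the function name and parameters are hypothetical.

```python
def rolling_origin_ape(series, first_origin=48, last_origin=89, horizon=18):
    """APE (%) of the naive forecast for each origin and lead time.

    Returns a dict mapping origin (number of observations used) to a list
    of APEs for lead times 1..horizon. Assumes strictly non-zero actuals.
    """
    table = {}
    for origin in range(first_origin, last_origin + 1):
        forecast = series[origin - 1]  # naive: repeat the last observed value
        row = []
        for h in range(1, horizon + 1):
            idx = origin - 1 + h
            if idx >= len(series):
                break  # no actual value left to compare against
            actual = series[idx]
            row.append(100.0 * abs(actual - forecast) / abs(actual))
        table[origin] = row
    return table
```

Each origin adds one observation to the fitting sample and produces a fresh set of up to 18 forecasts, so every method accumulates an error record per lead time that a selection rule can later use.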

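Individual selection approaches: a graphical illustration

[Figure: timeline of 48 obs (initialization), 42 obs (selection) and 36 obs (evaluation). The forecast origin rolls forward one observation at a time (48 obs, 49 obs, 50 obs, …, 89 obs, 90 obs). At the later origins (89 obs, 90 obs) the best method is selected as the arg min, over the candidate methods i = 1, …, m, of the out-of-sample APE, and is used for all lead times.]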

Individual Selection vs Aggregate Selection & Combination

% of cases improved when all series are considered

                                      % of cases Individual Selection performed better
Methods in  Average      Number       vs. Aggregate                     vs. Combination
selection   Correlation  of cases     Rule 1  Rule 2  Rule 3  Rule 4    Rule 1  Rule 2  Rule 3  Rule 4
pool
2-4         Low            611         31.3    65.0    44.4    86.4      33.4    75.0    83.1    90.0
2-4         High           170         13.5    65.3    55.3    75.3      18.8    60.6    70.6    80.6
5-8         Low           2712         17.4    51.5    47.7    92.7      38.6    80.7    89.5    97.5
5-8         High           291          4.8    78.7    62.2    90.7       9.6    84.5    82.1    95.9
9-12        Low            295          3.7    46.4    41.7    97.6      45.1    91.2    95.3   100.0
9-12        High             4          0.0   100.0    25.0   100.0      25.0   100.0   100.0   100.0

Best Practices

Segment              Best Practice
Entire Data Set      Individual selection (Rule 4) applied on a high number of low correlated methods
Predictable          Combination applied on a medium number of high correlated methods
Unpredictable        Individual selection (Rule 4) applied on a high number of high correlated methods
Trended              Individual selection (Rule 4) applied on a high number of high correlated methods
Non-Trended          Combination applied on a medium number of high correlated methods
Seasonal             Individual selection (Rule 3) applied on a medium number of high correlated methods
Non-Seasonal         Combination applied on a medium number of high correlated methods
Dominant method      Aggregate selection applied on a small number of high correlated methods
Non-dominant method  Aggregate selection applied on a medium number of high correlated methods
Stable               Individual selection (Rule 4) applied on a high number of low correlated methods
Unstable             Combination applied on a high number of high correlated methods

Accuracy improvements (MdAPE %)

                     Aggregate  Combination  Individual  Median Performance  DDamp        Improvement
                     Selection               Selection   of Best Practice    (Benchmark)
Entire Data Set       7.4%       7.6%         7.2%        7.1%                7.5%         4.8%
Predictable           6.5%       6.9%         6.6%        6.4%                6.6%         2.7%
Unpredictable         9.1%       8.8%         8.7%        8.4%                9.5%        11.4%
Trended               6.8%       7.0%         6.8%        6.7%                6.9%         2.4%
Non-Trended          14.9%      15.8%        15.2%       14.6%               15.1%         3.3%
Seasonal              9.2%       9.5%         9.0%        8.9%                9.2%         3.0%
Non-Seasonal          4.5%       4.4%         4.6%        4.4%                4.5%         2.1%
Dominant method       7.8%       8.4%         7.9%        8.0%                8.2%         2.5%
Non-dominant method   6.7%       6.9%         6.8%        6.6%                6.8%         3.0%
Stable                6.9%       7.8%         6.8%        6.6%                6.9%         4.1%
Unstable              7.6%       7.6%         7.6%        7.5%                8.1%         7.7%

Implications & Limitations

• Enhance the integrated automatic selection procedures.

• Best practices indicate a mix of simple, easy-to-apply approaches with more complicated selection schemes.

• Computationally intensive → specialized system design.

• Limited data history available.

• Data coming from a specific industry might appear more

homogeneous.

• Adjustments in the design → defining useful segments.

Conclusions

• Segmenting the series enables identifying suitable sub-populations of

data, where the application of individual selection is more effective.

• Improvements from individual selection over aggregate selection are

recorded when small to medium pools of methods are considered.

• Aggregate selection produces the best results when a single method

displays dominant performance across a specific sample of series.

• Individual selection produces more accurate forecasts, when series are

identified as stable. However, for the unstable series, the combination of

methods is the most robust choice.

• The selection rule relying on forecast performance over all horizons (Rule 4) proved better, while simply relying on past in-sample performance over the fitted data proved inadequate.

• Next step: Enhance the selection rules by using a large number of

variables proposed in the literature to characterise a time series.

Thank you for your attention
