MORE RAPID TOURISM STATISTICS USING...


Page 1: MORE RAPID TOURISM STATISTICS USING …sa-ijas.stat.unipd.it/sites/sa-ijas.stat.unipd.it/files/...available in Tam (1987), Rao, Srinath and Quenneville (1989), Yansaneh and Fuller

Statistica Applicata Vol. 18, n. 3, 2006 445

MORE RAPID TOURISM STATISTICS USING AUXILIARY VARIABLES

Roberto Gismondi¹

ISTAT, Italian National Statistical Institute, Business Short-term Statistics Directorate. Via Tuscolana 1788, 00173 Roma, Italy, [email protected]

Abstract

The ISTAT monthly survey on arrivals and nights spent in Italian tourist establishments produces final estimates 180 days after the end of the reference month, while national users need more rapid short-term indicators, namely data within 30-60 days. In this context, we propose and compare some quick estimation methods aimed at improving the timeliness and quality of provisional estimates. Particular attention is paid to the potential self-selection bias affecting natural quick respondent units. An empirical application, referred to the period 2002-2004, has been carried out, based on random replications of theoretical quick respondents and on a subset of Italian provinces whose data are normally available in advance.

Keywords: Provisional estimate, Late respondent, Nights spent, Self-selection bias, Tourism.

1. THE TRADE-OFF BETWEEN TIMELINESS AND PRECISION OF ESTIMATES

Among the main components on which the EU statistical definition of quality for short-term statistics is founded (EUROSTAT, 2000), accuracy and timeliness seem to be the most relevant for both producers and users of statistical data. While accuracy is normally measured by the percent difference between provisional and final estimates, timeliness is measured as the time lag between the reference time point (or the end of the reference period) and the date of data dissemination.

¹ The opinions expressed herein do not involve ISTAT and must be attributed to the author only, as must any errors or omissions. All tables and graphs derive from elaborations on ISTAT data.


In particular, the EU Council Directive 95/57/EC on tourism statistics (Council of the European Union, 1995) requires the statistical institutes of all EU Member States to collect and transmit to EUROSTAT monthly data concerning internal tourism. A very sensitive variable is the number of nights spent in tourist accommodations located inside the national territory, broken down by nationality (Italians and foreigners) and kind of establishment (hotels and other collective accommodations, “o.c.a.”).

In this context, the source of data is the ISTAT monthly census survey on tourist establishments (ISTAT, Anni vari). Data are not collected directly by ISTAT, but by local agencies for tourist promotion; they are then aggregated by the 103 Italian provinces and sent to ISTAT in order to obtain national figures. Even though provisional estimates are released after 3 months (as required by the Directive), complete final data are available only after 6 months. The delay depends on how differently tourist accommodations value short-term tourism analysis, and on differences in territorial organisation and data transmission tools.

However, national and international users, asking for quicker data in order to better analyse short-term developments, identified 45 days as a reasonable time benchmark for provisional estimates for the tourist sector.

As a matter of fact, monthly data for some provinces are currently available in a relatively short time; in particular, in 2004 8 provinces were almost always able to send data within 45 days from the end of the reference month, as will be seen in paragraph 6.

The basic idea is that quick complete data of some provinces can be used to get, with a certain degree of error, quick estimates for national amounts as well. On the other hand, the main theoretical problem faced in the paper concerns the possible self-selection of quick provinces, so that using whatever quick respondents are available to estimate late responses could lead to seriously biased provisional estimates (Royall, 1988; Drudi and Filippucci, 2000).

As underlined by Bolfarine and Zacks (1992), the question of robustness of predictors of population quantities can be faced using 3 strategies: 1) imposing restrictions on the possible super-population models adopted; 2) imposing restrictions on the samples to be selected; 3) using Bayes predictors that adaptively consider the possibility that each one of a series of alternative models is the correct model. In this context, we propose and compare various quick estimation methods (paragraphs 3, 4, 5 and 6), mostly based on strategy 1) and, to a lesser extent, on strategy 2). In particular, in paragraph 4 we propose an approach aimed at evaluating and reducing the possible bias due to the non-random selection of quick respondents, while paragraph 7 contains an application to real tourism data, where some model-based estimation techniques are compared with the ratio estimator currently used in the survey in terms of MAPE (mean of the absolute percent errors, assessing empirical efficiency).
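The MAPE criterion mentioned above can be illustrated with a minimal sketch. It assumes that each percent error is taken relative to the final estimate, in line with the definition of accuracy given at the start of this paragraph; the function name `mape` and the figures are hypothetical and are not ISTAT data.

```python
def mape(provisional, final):
    """Mean of the absolute percent errors of provisional vs. final estimates.

    Assumption: each error is expressed as a percentage of the final estimate.
    """
    if len(provisional) != len(final) or not final:
        raise ValueError("need two non-empty series of equal length")
    errors = [abs(p - f) / abs(f) * 100.0 for p, f in zip(provisional, final)]
    return sum(errors) / len(errors)


# Hypothetical monthly estimates (illustrative only).
provisional_estimates = [102.0, 95.0, 110.0]
final_estimates = [100.0, 100.0, 100.0]
mape_value = mape(provisional_estimates, final_estimates)
print(f"MAPE = {mape_value:.2f}%")
```

A lower value indicates that, on average, provisional estimates are revised less when final data become available.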

A serious technical constraint, which also influences the choice of methodological proposals, is the shortness of the available time series, since monthly data at the province level have been officially released by ISTAT only since 2002; at the moment, time series are only 36 months long, so that for each province only 3 observations related to the same month are available². Moreover, the sub-sample of quick respondent provinces cannot be defined a priori or steered in any way, so that methods relying on properties of particular quick samples cannot be used either³.

In what follows, by “provisional quick estimate” we mean the estimate of a parameter of interest, in the frame of a given statistical survey, obtained on the basis of a quick sub-sample available at a time t’ before the time t corresponding to the “final estimate”, which will be based on a final sample including both quick and late respondents. Revisions can be calculated as the difference between provisional and final estimates. In the particular case of tourism, the final estimate refers to the complete population (Italy as a whole). Two main methodological approaches could be used:

• Approach based on the sampling design: the efficiency of a quick estimation strategy depends on the probability distribution derived from the particular sampling design adopted both for final and (if any) preliminary estimates. Of course, this approach may not be reliable enough in a context where the available (quick) sample is relatively small and the response process cannot be modelled on the basis of the original sampling design only.

• Approach based on a super-population model (Cassel, Särndal and Wretman, 1983), in which quality evaluations and the choice of the units belonging to the quick sub-sample are carried out on the basis of the mean squared error with respect to the particular model underlying the observed data.

For both approaches, the availability of additional information external to the survey, or related to the survey in terms of historical micro-data, as happens in many longitudinal surveys, can be extremely helpful.

The need to explore tourist data is further motivated by clear evidence: even though the literature on forecasting tourist demand is very wide, the use of quick respondents to predict final estimates that also include late respondents is quite uncommon in the tourist field. In particular, a summary of methodologies for forecasting tourism demand is given in Song and Witt (2000), while traditional linear regression models are proposed by Costa and Manente (2000, 129-202) and Divisekera (2003). ARIMA models and exponential smoothing (widely discussed in Harvey, 1984) are used in Lim and Mc Aleer (2001), while recent attempts to apply genetic algorithms to real tourist populations for better decision making are given by Hernández-López (2004).

² That is the main reason for not resorting to a time series approach, which requires long time series and a regular error profile over time. Useful theoretical suggestions are available in Tam (1987), Rao, Srinath and Quenneville (1989), Yansaneh and Fuller (1998) and Battaglia and Fenga (2003).

³ A relevant example is given by balanced sampling (Royall, 1992), which will be considered in paragraph 3.

On the other hand, applications concerning other economic sectors, but closely linked with the problem discussed here, can be found in Maravalle, Politi and Iafolla (1993), Aelen (2003), Ullberg (2003), Falorsi, Alleva, Bacchini and Iannaccone (2005).

2. A SHORT OVERVIEW OF ITALIAN INTERNAL TOURISM

A brief preliminary overview of recent trends in Italian internal tourism and of the role played by the above-mentioned 8 quick provinces can be helpful (table 2.1), before introducing the theoretical working models.

Tab. 2.1: Tourist nights spent (million) in Italy in 2004 and 2003 (total and 8 “quick” provinces).

                                      Total                           Hotels                           O.c.a.
                              Total   Italians Foreigners      Total   Italians Foreigners      Total   Italians Foreigners
2004
  Total                       345,3      204,2      141,2      233,8      136,6       97,2      111,5       67,5       44,0
  8 provinces                  53,8       30,3       23,5       39,0       20,8       18,2       14,8        9,5        5,3
  Share 8 provinces (%)        15,6       14,8       16,6       16,7       15,2       18,7       13,3       14,1       12,0
2003
  Total                       344,4      204,8      139,7      229,2      135,2       93,9      115,3       69,5       45,7
  8 provinces                  54,9       31,1       23,8       39,4       21,3       18,2       15,5        9,8        5,6
  Share 8 provinces (%)        15,9       15,2       17,0       17,2       15,7       19,3       13,4       14,2       12,3
Var.% 2004/2003
  Total                         0,3       -0,3        1,1        2,0        1,0        3,4       -3,2       -2,9       -3,8
  8 provinces                  -2,0       -2,6       -1,2       -1,1       -2,3        0,3       -4,3       -3,2       -6,1
% weight on Total
  Total                       100,0       59,1       40,9       67,7       39,6       28,1       32,3       19,6       12,7
  8 provinces                 100,0       56,3       43,7       72,5       38,6       33,9       27,5       17,7        9,8
Correlation 04/03             0,996      0,994      0,997      0,993      0,990      0,995      0,999      0,997      0,999


In 2004 internal tourism in Italy reached 345,3 million nights spent. Italians account for 59,1% of total nights spent and about 2 nights out of 3 are spent in hotels (by Italians or foreigners). In particular, nights spent by Italians in hotels represent the largest of the four components of internal tourism considered in what follows (39,6%), while the lowest relative weight characterises foreigners in o.c.a. (12,7%).

Growth with respect to 2003 was 0,3%, the result of a decrease for Italians (-0,3%) and an increase for foreigners (1,1%); further, completely different trends characterise tourism in hotels (+2,0%) and in o.c.a. (-3,2%).

In the above-mentioned 8 quick provinces, tourists spent 53,8 million nights, which is only 15,6% of the total. The share ranges from 12,0% for foreigners in o.c.a. to 18,7% for foreigners in hotels. As regards trends, the dynamic is quite different from that of Italy as a whole, since nights spent decreased by 2,0% with respect to 2003. This evidence, of course, does not favour the quality of provisional estimates based on the quick panel only.

Finally, clear evidence is given by the very large linear correlation between data referred to 2 consecutive years: calculations based on data at the province level for 2003 and 2004 led to correlations ranging from 0,990 (Italians in hotels) up to 0,999 (o.c.a., both for foreigners and for the total), with an average overall level of 0,996. This empirical result justifies the introduction of models such as those in paragraphs 3 and 4, where the auxiliary variables can reasonably be given by the same y-variable lagged by one and/or two years. On the other hand, analyses on a monthly basis could be more problematic, also because Italian tourism is quite seasonal and inside the same province different kinds of localities can coexist (sea, hills, countryside, mountains, lakes, historical sites).
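The shares and growth rates quoted in this paragraph can be re-derived from the annual totals of table 2.1. The short sketch below does so for the overall totals; the helper names `pct_share` and `pct_change` are illustrative, and decimal points replace the decimal commas used in the table.

```python
# Annual totals of nights spent (million), taken from table 2.1.
italy_2004, italy_2003 = 345.3, 344.4
quick8_2004, quick8_2003 = 53.8, 54.9


def pct_share(part, whole):
    """Percentage weight of `part` on `whole`."""
    return 100.0 * part / whole


def pct_change(new, old):
    """Percentage change from `old` to `new`."""
    return 100.0 * (new / old - 1.0)


share_2004 = pct_share(quick8_2004, italy_2004)    # share of the 8 quick provinces
growth_italy = pct_change(italy_2004, italy_2003)  # Italy as a whole
growth_quick = pct_change(quick8_2004, quick8_2003)  # 8 quick provinces

print(round(share_2004, 1), round(growth_italy, 1), round(growth_quick, 1))
```

The opposite signs of the two growth rates are what makes a naive projection of the quick provinces' trend onto the national total risky.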

3. SUPER-POPULATION MODEL WITH A SINGLE AUXILIARY VARIABLE

From now on U will indicate the target population with size N, n is the size of a sample S and the main purpose of the sample survey is the estimation of the population mean $\bar{y}_U$. S can indicate both the provisional quick sample and the final sample including late respondents as well. For each population unit we will suppose the following regression model to be true:

$$
y_i = \alpha + \beta x_i + \varepsilon_i
\quad \text{where} \quad
\begin{cases}
E(\varepsilon_i) = 0 & \forall i \\
VAR(\varepsilon_i) = \sigma^2 v_i & \forall i \\
COV(\varepsilon_i, \varepsilon_j) = 0 & \text{if } i \neq j
\end{cases}
\tag{3.1}
$$


where expected values, variances and covariances refer to the model and not to any sampling design, x is an additional variable strongly correlated with y and to be specified, as is the function $v_i$, while $\alpha$, $\beta$ and $\sigma^2$ are fixed but generally unknown parameters. Even though a specific model (3.1) should be defined for each reference period t, at the moment no time labels are used. Under model (3.1), a sample $S \subseteq U$ is supposed to be available, with $S \cup \bar{S} = U$.

3.1 THE CASE α = 0

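If in model (3.1) we put $\alpha = 0$, we get the widely used regression model R. As is well known (Cicchitelli, Herzel and Montanari, 1992), the optimal linear predictor, i.e. the one minimising the MSE with respect to the model, $E(T - \bar{y}_U)^2$, is given by:

$$
T^* = \frac{n}{N}\,\bar{y}_S + \frac{N-n}{N}\,\hat{\beta}^*\,\bar{x}_{\bar{S}}
\quad \text{where} \quad
\hat{\beta}^* = \left(\sum_{S} x_i y_i / v_i\right)\left(\sum_{S} x_i^2 / v_i\right)^{-1}
\tag{3.2}
$$

and its variance with respect to the model (equal to the MSE under model (3.1)) will be equal to:

$$
MSE(T^*) = \left[\left(\sum_{\bar{S}} x_i\right)^2 \left(\sum_{S} x_i^2 / v_i\right)^{-1} + \sum_{\bar{S}} v_i\right]\frac{\sigma^2}{N^2}.
\tag{3.3}
$$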

Relevant particular cases are obtained if v=1, when (3.2) reduces to the regression estimator through the origin, and if v=x, when (3.2) is the common ratio estimator (the one currently used in the survey on tourist establishments for calculating provisional estimates within 90 days); the corresponding model variances follow straightforwardly. Moreover, under model (3.1) the sample mean is optimal if and only if x=v=1. Note that the case v=1 translates into a model-based context the common hypothesis of homoscedasticity. While (3.2) gives the optimal estimator formula, (3.3) suggests that the best choice of the sample simply consists, when possible, in selecting the n units in the universe having the largest x-values.
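The two particular cases of predictor (3.2) can be sketched numerically as follows. This is only an illustration under stated assumptions: the function name `blu_predictor_mean` and the data are hypothetical, and the x-mean of the non-sampled units is taken as known.

```python
def blu_predictor_mean(y_s, x_s, v_s, x_rest_mean, N):
    """Predictor (3.2) of the population mean under model (3.1) with alpha = 0.

    y_s, x_s, v_s: sample values of y, x and of the variance function v.
    x_rest_mean:   mean of x over the N - n non-sampled units (assumed known).
    Returns the predictor and the estimated slope.
    """
    n = len(y_s)
    slope = (sum(x * y / v for x, y, v in zip(x_s, y_s, v_s))
             / sum(x * x / v for x, v in zip(x_s, v_s)))
    y_bar = sum(y_s) / n
    return (n / N) * y_bar + ((N - n) / N) * slope * x_rest_mean, slope


# Hypothetical data: 3 sampled units out of N = 5.
x_s = [1.0, 2.0, 3.0]
y_s = [2.2, 3.9, 6.1]
x_rest = [4.0, 4.0]
N = 5
x_rest_mean = sum(x_rest) / len(x_rest)

# Case v = x: the predictor coincides with the common ratio estimator.
t_ratio, _ = blu_predictor_mean(y_s, x_s, x_s, x_rest_mean, N)
x_pop_mean = (sum(x_s) + sum(x_rest)) / N
classic_ratio = (sum(y_s) / len(y_s)) * x_pop_mean / (sum(x_s) / len(x_s))

# Case v = 1: regression through the origin.
t_origin, slope_origin = blu_predictor_mean(y_s, x_s, [1.0] * 3, x_rest_mean, N)
print(t_ratio, classic_ratio, t_origin)
```

The first two printed values agree up to rounding, which is the equivalence between (3.2) with v=x and the ratio estimator stated in the text.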

A consequence of (3.3) is that whatever sample S is available (in particular, when $S=S_P$ it is the sample including the provisional quick respondents) the best strategy consists, according to predictor (3.2), in using all the n available units. However, this strategy could be dangerous, for two main reasons:

1. The quality of estimates strongly depends on the validity of all assumptions in model (3.1). In particular, an estimator such as (3.2) could be seriously biased when model (3.1) is wrong.

2. The choice of the n biggest units does not guarantee a low variance, because it depends on the relative weight of the sample on the overall x-amount: generally speaking, when this weight is lower than 50% other estimators and/or sample selection rules could perform better.

A way to reduce the potential bias mentioned in point 1 above consists in using balanced samples. Under model (3.1), i.e. when only one auxiliary x-variable is taken into account, a sample S of size n is balanced with respect to the weights root(v) if it satisfies the condition:

$$
\sum_{S} \frac{x_i}{\sqrt{v_i}} = n\,\frac{\sum_{U} x_i}{\sum_{U} \sqrt{v_i}}.
\tag{3.4}
$$

It could be chosen among all the possible samples of size n using various algorithms, such as those proposed by Valliant, Dorfman and Royall (2000), Gismondi (2002) and Deville and Tillé (2004). Royall (1992) showed that, if the previous linear model R holds and a balanced sample can be found, the best linear unbiased predictor under the model is:

$$
\hat{T}_{bal,v} = n^{-1}\left(\frac{1}{N}\sum_{U} \sqrt{v_i}\right)\left(\sum_{S} \frac{y_i}{\sqrt{v_i}}\right)
\tag{3.5}
$$

having, among all samples satisfying (3.4), the lowest mean squared error, given by:

$$
MSE(\hat{T}_{bal,v}) = \left[\left(\sum_{U} \sqrt{v_i}\right)^2 n^{-1} + \sum_{U} v_i - 2\,n^{-1} \sum_{S} \sqrt{v_i}\,\sum_{U} \sqrt{v_i}\right]\frac{\sigma^2}{N^2}.
\tag{3.6}
$$

Under the statement v=1, the balance condition (3.4) becomes $\bar{x}_S = \bar{x}_U$ (sample and population means must be equal), while the optimal predictor derived from (3.5) is the sample mean $\hat{T}_{bal,1} = \bar{y}_S$, whose MSE is $(N n^{-1} - 1)\,\sigma^2 / N$. So, if the sample is balanced the sample mean is still optimal even when x≠1. The great advantage of using a balanced sample is that it protects against bias if model (3.1) is wrong⁴. More generally, it reduces the negative effects of the respondents' self-selection process on parameter estimation (as for the estimation of $\beta$ in (3.2) and of $\sigma^2$ in (3.3)), especially if the true model formalisation is unknown (Drudi and Ferrante, 2003).

⁴ For instance, if $y_i = \theta + \beta x_i + \varepsilon_i$, then the mean squared error (3.3) increases by a constant equal to $\theta^2$.


Properties of balanced samples represent a useful theoretical result that has so far been little exploited in current sampling practice, with some exceptions given by ISTAT (2005) and Gismondi (2003; 2007). As a matter of fact, given U and model (3.1), balanced samples may not exist, or in any case a quick balanced sample cannot be planned in advance, as in the case of statistics on internal tourism. In most practical situations when provisional estimates are needed, the available natural sub-sample of quick respondents must be taken as given and is generally not balanced, because of the mentioned self-selection bias. However, a simple ex-post strategy consists in selecting from the whole available quick sample a sub-sample, as large as possible, that minimises the imbalance, i.e. brings the unbalancing ratio as close as possible to one.

In symbols, when v=1, instead of (3.4) one could have $\bar{x}_S = k\,\bar{x}_U$, where $k = \bar{x}_S / \bar{x}_U$ is the unbalancing ratio. Then the predictor to be used would turn out to be $\bar{y}_S / k$, which, using the whole sample S, is equivalent to the ratio estimator derived from (3.2) when v=x.

However, levels of k nearer to one (i.e., almost balanced sub-samples) could be obtained by using for provisional estimates only a sub-sample $s \subseteq U$ with size $n_s \le n$. The (quick) estimator will be given by:

$$
\bar{y}_s\,\frac{\bar{x}_U}{\bar{x}_s}
\quad \text{where } s \subseteq U.
\tag{3.7}
$$

In theory, one could select a very small sub-sample s in order to have $\bar{x}_s \approx \bar{x}_U$ and to reduce the original bias to nearly zero, so that (3.7) becomes the simple sub-sample mean; on the other hand, too small a sub-sample size $n_s$ could dangerously increase the sub-sample variance. An empirical rule could therefore consist in imposing a priori that $n_s \ge n(1-\gamma)$, where $\gamma = 0,05$ or $\gamma = 0,10$. Finally, it is worthwhile to underline that this strategy leads to a modified ratio estimator, given by:

$$
\bar{y}_S\,\frac{\bar{x}_U}{\bar{x}_S}\left[\frac{\bar{y}_s}{\bar{y}_S}\,\frac{\bar{x}_S}{\bar{x}_s}\right]
\tag{3.8}
$$

where the term in square brackets is the coefficient that modifies the original ratio estimator based on the whole sample S with the aim of reducing its original bias.
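The ex-post selection of an almost balanced sub-sample is described above only in general terms. One possible way to implement it, shown here purely as a hedged sketch, is a greedy rule that drops one unit at a time while the unbalancing ratio k moves closer to one and the constraint on the sub-sample size still holds. The greedy rule, the function names and the data are assumptions made for illustration and are not taken from the paper.

```python
def unbalancing_ratio(x_values, x_pop_mean):
    """k = (sub-sample mean of x) / (population mean of x)."""
    return (sum(x_values) / len(x_values)) / x_pop_mean


def trim_to_balance(x_sample, x_pop_mean, gamma=0.10):
    """Greedy sketch: drop at most gamma * n units so that k gets closer to 1.

    Returns the indices of the retained units (n_s >= n * (1 - gamma)).
    """
    n = len(x_sample)
    min_size = n - int(gamma * n + 1e-9)  # small tolerance against float error
    keep = list(range(n))
    while len(keep) > min_size:
        current = [x_sample[i] for i in keep]
        best_gap = abs(unbalancing_ratio(current, x_pop_mean) - 1.0)
        best_drop = None
        for j in keep:
            trial = [x_sample[i] for i in keep if i != j]
            gap = abs(unbalancing_ratio(trial, x_pop_mean) - 1.0)
            if gap < best_gap:
                best_gap, best_drop = gap, j
        if best_drop is None:  # no single removal improves the balance
            break
        keep.remove(best_drop)
    return keep


def quick_estimate(y_sample, x_sample, keep, x_pop_mean):
    """Estimator (3.7) computed on the retained sub-sample."""
    y_bar = sum(y_sample[i] for i in keep) / len(keep)
    x_bar = sum(x_sample[i] for i in keep) / len(keep)
    return y_bar * x_pop_mean / x_bar


# Hypothetical quick sample of 10 units; the population x-mean is assumed known.
x_quick = [10.0, 12.0, 9.0, 11.0, 30.0, 8.0, 10.0, 13.0, 9.0, 40.0]
y_quick = [11.0, 12.5, 9.5, 11.0, 33.0, 8.0, 10.5, 14.0, 9.0, 43.0]
x_pop_mean = 12.0

kept = trim_to_balance(x_quick, x_pop_mean, gamma=0.2)
k_before = unbalancing_ratio(x_quick, x_pop_mean)
k_after = unbalancing_ratio([x_quick[i] for i in kept], x_pop_mean)
print(kept, k_before, k_after, quick_estimate(y_quick, x_quick, kept, x_pop_mean))
```

A greedy search does not guarantee the best possible sub-sample of a given size; it simply keeps the example short while respecting the size constraint.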

3.2 THE CASE α ≠ 0

If we consider only the case with the common statement $v_i = 1$ for each unit i, we get the usual linear homoscedastic regression model with one auxiliary variable. As a particular case of the general solution (9.2), the optimal predictor is given by:

$$
T^* = \frac{n}{N}\,\bar{y}_S + \frac{N-n}{N}\left(\hat{\alpha}^* + \hat{\beta}^*\,\bar{x}_{\bar{S}}\right)
\;\Leftrightarrow\;
T^* = \bar{y}_S + \hat{\beta}^*\left(\bar{x}_U - \bar{x}_S\right)
\tag{3.9}
$$

where:

$$
\hat{\beta}^* = \left[\sum_{S} (x_i - \bar{x}_S)\,y_i\right]\left[\sum_{S} (x_i - \bar{x}_S)^2\right]^{-1}
\quad \text{and} \quad
\hat{\alpha}^* = \bar{y}_S - \hat{\beta}^*\,\bar{x}_S
\tag{3.10}
$$

and its mean squared error will be (see general formula (9.4)):

$$
MSE(T^*) = \left[N\left(\bar{x}_U - \bar{x}_S\right)^2\left(\sum_{S} (x_i - \bar{x}_S)^2\right)^{-1} + \frac{N-n}{n}\right]\frac{\sigma^2}{N}.
\tag{3.11}
$$

Note that under a model-based approach the optimal sampling strategy will be purposive, namely the one that selects with probability one the sample S consisting of those units whose x-values minimise the first quantity in square brackets in (3.11). In a provisional estimation context, one should use a provisional quick sample as balanced as possible with respect to the variable x (the quick sample and the population x-means should be approximately the same), or in any case a quick sample characterised by a very large x-variance (Cassel, Särndal and Wretman, 1977, 128). This strategy does not correspond to the one minimising (3.3) when a regression model through the origin is supposed to be true (choice of the n biggest units).
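The regression predictor of this paragraph and its model mean squared error can be sketched as follows, assuming $v_i = 1$ and a known population x-mean. Function names and data are hypothetical; the second call illustrates why a balanced sample removes the first term of the mean squared error.

```python
def regression_predictor(y_s, x_s, x_pop_mean):
    """Predictor of the population mean under the homoscedastic regression model.

    Returns (predictor, alpha_hat, beta_hat), with the slope and intercept
    estimated by ordinary least squares on the sample.
    """
    n = len(y_s)
    x_bar = sum(x_s) / n
    y_bar = sum(y_s) / n
    sxx = sum((x - x_bar) ** 2 for x in x_s)
    beta_hat = sum((x - x_bar) * y for x, y in zip(x_s, y_s)) / sxx
    alpha_hat = y_bar - beta_hat * x_bar
    return y_bar + beta_hat * (x_pop_mean - x_bar), alpha_hat, beta_hat


def regression_mse(x_s, x_pop_mean, N, sigma2):
    """Model mean squared error of the predictor for a given error variance."""
    n = len(x_s)
    x_bar = sum(x_s) / n
    sxx = sum((x - x_bar) ** 2 for x in x_s)
    return (N * (x_pop_mean - x_bar) ** 2 / sxx + (N - n) / n) * sigma2 / N


# Hypothetical, exactly linear data: y = 1 + 2x.
x_s = [1.0, 2.0, 3.0]
y_s = [3.0, 5.0, 7.0]
prediction, alpha_hat, beta_hat = regression_predictor(y_s, x_s, x_pop_mean=4.0)

# With a balanced sample (sample x-mean equal to the population x-mean)
# only the (N - n) / n term of the mean squared error remains.
mse_balanced = regression_mse(x_s, x_pop_mean=2.0, N=10, sigma2=1.0)
mse_unbalanced = regression_mse(x_s, x_pop_mean=4.0, N=10, sigma2=1.0)
print(prediction, mse_balanced, mse_unbalanced)
```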

4. A MODEL FOR EVALUATING SELF-SELECTION BIAS

It is possible to model potential structural differences between provisional and late respondents. We can suppose that population U can be split into 2 separate sub-populations $U_P$ and $U_L$, including respectively $N_P$ units (all the potential Provisional quick respondents) and $N_L$ units (all the potential Late respondents), with $U = U_P \cup U_L$ and $N = N_P + N_L$. These sub-populations do not derive from any preliminary stratification, but depend on some latent factor underlying the units under observation. For each of the 2 sub-populations (labelled with h, where h=P,L) this model can be supposed true:

$$
y_{hi} = \beta_h x_{hi} + \varepsilon_{hi}
\quad \text{where} \quad
\begin{cases}
E(\varepsilon_{hi}) = 0 & \forall h, i \\
VAR(\varepsilon_{hi}) = \sigma_h^2 v_{hi} & \forall h, i \\
COV(\varepsilon_{hi}, \varepsilon_{hj}) = 0 & \text{if } i \neq j
\end{cases}
\quad \text{for } h = P, L
\tag{4.1}
$$

where all symbols keep the same meaning as for model (3.1). The main difference is that we suppose a priori that provisional and late respondents could have different model means and variances⁵. Supposing no final non-response, the provisional and late samples $S_P$ and $S_L$ will include respectively $n_P$ and $n_L = (n - n_P)$ units, where $S = S_P \cup S_L$, $U_P = S_P \cup \bar{S}_P$, $U_L = S_L \cup \bar{S}_L$, $\bar{S} = \bar{S}_P \cup \bar{S}_L$. Not observed units will be, respectively, $N_P - n_P$ and $N_L - n_L$. Prospect 4.1 gives an overall summary scheme.

Prospect 4.1: Different patterns for preliminary and late respondents.

                                          STRUCTURE                                          SIZE
                          Total units  Provisional units  Late units     Total units  Provisional units  Late units
Population                $U$          $U_P$              $U_L$          $N$          $N_P$              $N_L$
Sample                    $S$          $S_P$              $S_L$          $n$          $n_P$              $n_L$
Not observed population   $\bar{S}$    $\bar{S}_P$        $\bar{S}_L$    $N-n$        $N_P-n_P$          $N_L-n_L$

If the main purpose is the estimation of the overall total $y_U$, and $x_U$ is the x-total in the whole population U, the unknown total to be estimated will be given by:

$$
y_U = \sum_{i=1}^{N_P} y_{Pi} + \sum_{i=1}^{N_L} y_{Li} = y_P + y_L
\quad \text{where} \quad
E(y_U) = \beta_P x_P + \beta_L x_L.
\tag{4.2}
$$

4.1 OPTIMAL FINAL PREDICTION UNDER MODEL (4.1)

If $y_S$ is the sample y-total, a linear predictor of $y_U$ can be written as:

$$
T_{(PL)} = y_S + \hat{y}_{(PL)\bar{S}} = y_S + \sum_{S_P} c_{Pi}\,y_{Pi} + \sum_{S_L} c_{Li}\,y_{Li}
\tag{4.3}
$$

where $\hat{y}_{(PL)\bar{S}}$ is the predictor of the unknown amount $(y_U - y_S)$ and the coefficients $c_{Pi}$ and $c_{Li}$ must be determined. Under model (4.1), the unbiasedness condition is equivalent to $E(T_{(PL)} - y_U) = 0$. We can show that under model (4.1) the final best linear unbiased predictor will be given by⁶:

$$
T^*_{(PL)} = y_{S_P} + x_{\bar{S}_P}\left(\sum_{S_P} \frac{x_{Pi}\,y_{Pi}}{v_{Pi}}\right)\left(\sum_{S_P} \frac{x_{Pi}^2}{v_{Pi}}\right)^{-1}
+ y_{S_L} + x_{\bar{S}_L}\left(\sum_{S_L} \frac{x_{Li}\,y_{Li}}{v_{Li}}\right)\left(\sum_{S_L} \frac{x_{Li}^2}{v_{Li}}\right)^{-1}
\tag{4.4}
$$

where $x_{\bar{S}_P}$ and $x_{\bar{S}_L}$ are the x-totals over the non-observed units of the two sub-populations, and its mean squared error will be given by:

$$
MSE(T^*_{(PL)}) = \sigma_P^2\left[x_{\bar{S}_P}^2\left(\sum_{S_P} \frac{x_{Pi}^2}{v_{Pi}}\right)^{-1} + \sum_{\bar{S}_P} v_{Pi}\right]
+ \sigma_L^2\left[x_{\bar{S}_L}^2\left(\sum_{S_L} \frac{x_{Li}^2}{v_{Li}}\right)^{-1} + \sum_{\bar{S}_L} v_{Li}\right].
\tag{4.5}
$$

⁵ In this context we also suppose that belonging to one of the two sub-populations is a deterministic (even though often unknown) feature of each unit and does not depend on any probabilistic mechanism.

⁶ See appendix 9.2. Formulas (4.4) and (4.5) reduce to forms analogous to (3.2) and (3.3) if model (4.1) reduces to (3.1).

One can verify the consequences of assuming model (3.1) instead of the true model (4.1). In this case no distinction between provisional and late respondents would occur in the final estimation process, so that the predictor used would have the form $T = y_S + \sum_{S} c_i y_i$ and, in particular, would be given by predictor $T^*$ in (3.2) apart from the multiplying factor N. Under the true model (4.1), $NT^*$ is still unbiased if:

$$
\beta_P\left[\left(\sum_{S_P} \frac{x_{Pi}^2}{v_i}\right)\left(\sum_{S} \frac{x_i^2}{v_i}\right)^{-1} x_{\bar{S}} - x_{\bar{S}_P}\right]
+ \beta_L\left[\left(\sum_{S_L} \frac{x_{Li}^2}{v_i}\right)\left(\sum_{S} \frac{x_i^2}{v_i}\right)^{-1} x_{\bar{S}} - x_{\bar{S}_L}\right] = 0.
\tag{4.6}
$$

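If v=x, condition (4.6) implies that we should have:

$$
\frac{\sum_{S_P} x_i}{\sum_{\bar{S}_P} x_i} = \frac{\sum_{S_L} x_i}{\sum_{\bar{S}_L} x_i}
\;\Leftrightarrow\;
\frac{\sum_{S_P} x_i}{\sum_{U_P} x_i} = \frac{\sum_{S_L} x_i}{\sum_{U_L} x_i}
\tag{4.7}
$$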

and in this case the predictor (4.4) is equal to $NT^*$. The identity (4.7) is satisfied if provisional respondents account for a share of the x-total in the provisional respondents sub-population equal to the corresponding share for late respondents. However, in general the recourse to predictor $NT^*$ under a true model such as (4.1) leads to a non-null bias, given by:

$$
Bias = E(NT^* - y_U) = x_{\bar{S}}\left(\sum_{S} \frac{x_i^2}{v_i}\right)^{-1}\left[\beta_P \sum_{S_P} \frac{x_{Pi}^2}{v_i} + \beta_L \sum_{S_L} \frac{x_{Li}^2}{v_i}\right] - \left(\beta_P\,x_{\bar{S}_P} + \beta_L\,x_{\bar{S}_L}\right).
\tag{4.8}
$$

Moreover, its mean squared error will be given by⁷:

$$
MSE(NT^*) = E(NT^* - y_U)^2 = VAR\left(\sum_{S} c_i^*\,y_i\right) + VAR(y_{\bar{S}}) + Bias^2
\tag{4.9}
$$

where:

$$
\begin{aligned}
VAR\left(\sum_{S} c_i^*\,y_i\right) &= x_{\bar{S}}^2\left(\sum_{S} \frac{x_i^2}{v_i}\right)^{-2}\left[\sigma_P^2 \sum_{S_P} \frac{x_{Pi}^2}{v_i} + \sigma_L^2 \sum_{S_L} \frac{x_{Li}^2}{v_i}\right] \\
VAR(y_{\bar{S}}) &= \sigma_P^2 \sum_{\bar{S}_P} v_i + \sigma_L^2 \sum_{\bar{S}_L} v_i
\end{aligned}
\tag{4.10}
$$

⁷ Compare formula (9.7), which is formally similar to (4.9) once the squared bias term is added.
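The behaviour of the bias of $NT^*$ when the two sub-populations follow different slopes can be checked numerically. The sketch below evaluates the expected prediction error directly from the model expectations, taking v=x; all names and figures are hypothetical. It shows the two situations in which the bias vanishes (equal slopes, or equal x-shares as in (4.7)) and one in which it does not.

```python
def bias_single_model(x_sp, x_sl, v_sp, v_sl, x_rest_p, x_rest_l, beta_p, beta_l):
    """Expected error of N*T* when the data follow the two-group model.

    x_sp, x_sl:          x-values of sampled provisional / late units.
    v_sp, v_sl:          corresponding values of the variance function.
    x_rest_p, x_rest_l:  x-totals of the non-observed units of each group.
    """
    a_p = sum(x * x / v for x, v in zip(x_sp, v_sp))
    a_l = sum(x * x / v for x, v in zip(x_sl, v_sl))
    x_rest = x_rest_p + x_rest_l
    # Expected value of the predicted non-sample total under the true model.
    expected_prediction = x_rest * (beta_p * a_p + beta_l * a_l) / (a_p + a_l)
    # Expected value of the true non-sample total.
    expected_truth = beta_p * x_rest_p + beta_l * x_rest_l
    return expected_prediction - expected_truth


# Hypothetical x-values; v = x as in the ratio estimator case.
x_sp, x_sl = [2.0, 4.0], [4.0, 8.0]

# Equal x-shares in both groups (6/9 and 12/18): no bias even if slopes differ.
bias_equal_shares = bias_single_model(x_sp, x_sl, x_sp, x_sl, 3.0, 6.0, 1.0, 1.5)

# Unequal shares and different slopes: the bias no longer vanishes.
bias_unequal = bias_single_model(x_sp, x_sl, x_sp, x_sl, 10.0, 6.0, 1.0, 1.5)

# Equal slopes: no bias whatever the shares.
bias_same_slope = bias_single_model(x_sp, x_sl, x_sp, x_sl, 10.0, 6.0, 1.2, 1.2)
print(bias_equal_shares, bias_unequal, bias_same_slope)
```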

4.2 PROVISIONAL ESTIMATION UNDER MODEL (4.1)

If only provisional quick respondents can be used for estimation, we can define the general linear predictor of the unknown amount $y_U$, given by:

$$
T_{(P)} = y_{S_P} + \hat{y}_{\bar{S}_P} = y_{S_P} + \sum_{S_P} c_{Pi}\,y_{Pi}.
\tag{4.11}
$$

The unbiasedness condition under model (4.1) implies:

$$
E(T_{(P)} - y_U) = E\left[(y_{S_P} + \hat{y}_{\bar{S}_P}) - (y_{S_P} + y_{S_L} + y_{\bar{S}_P} + y_{\bar{S}_L})\right] = 0
\tag{4.12}
$$

which, after some steps, leads to:

$$
\sum_{S_P} c_{Pi}\,x_{Pi} = \frac{\beta_L}{\beta_P}\,x_L + x_{\bar{S}_P}
\tag{4.13}
$$

where $x_L = x_{S_L} + x_{\bar{S}_L}$. We will also have:

$$
MSE(T_{(P)}) = E(T_{(P)} - y_U)^2 = VAR\left(\sum_{S_P} c_{Pi}\,y_{Pi}\right) + VAR(y_{\bar{S}_P} + y_L)
\tag{4.14}
$$

where the main difference with respect to (9.7) is that the second variance term refers to the y-amount concerning all the non-observable units (those belonging to $U_P$ but not included in the sample and those belonging to $U_L$). Under model (4.1) we easily obtain:

$$
MSE(T_{(P)}) = \sigma_P^2 \sum_{S_P} c_{Pi}^2\,v_{Pi} + \sigma_P^2 \sum_{\bar{S}_P} v_{Pi} + \sigma_L^2 \sum_{U_L} v_{Li}
\tag{4.15}
$$

and minimisation of (4.15) under constraint (4.13) leads to the optimal solutions:

$$
c^*_{Pi} = \frac{x_{Pi}}{v_{Pi}}\left[\frac{\beta_L}{\beta_P}\,x_L + x_{\bar{S}_P}\right]\left(\sum_{S_P} \frac{x_{Pi}^2}{v_{Pi}}\right)^{-1}
\tag{4.16}
$$

$$
T^*_{(P)} = y_{S_P} + \left[\frac{\beta_L}{\beta_P}\,x_L + x_{\bar{S}_P}\right]\left(\sum_{S_P} \frac{x_{Pi}\,y_{Pi}}{v_{Pi}}\right)\left(\sum_{S_P} \frac{x_{Pi}^2}{v_{Pi}}\right)^{-1}.
\tag{4.17}
$$

We could verify that predictor (4.17) is also the optimal predictor under the constraint $E(T_{(P)} - T^*_{(PL)}) = 0$ instead of (4.12), so that in a provisional estimation context minimising the mean squared error with respect to the overall amount $y_U$ is equivalent to minimising the expected difference between provisional and final estimates. Finally, under the true model (4.1) the predictor (3.2), optimal under the false model (3.1), can be written as:

$$
T_{(P)} = y_{S_P} + \left(x_L + x_{\bar{S}_P}\right)\left(\sum_{S_P} \frac{x_{Pi}\,y_{Pi}}{v_{Pi}}\right)\left(\sum_{S_P} \frac{x_{Pi}^2}{v_{Pi}}\right)^{-1}
\tag{4.18}
$$

which turns out to be equal to (4.17) if $\beta_L = \beta_P$.
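The difference between the optimal provisional predictor and the one that ignores the distinction between quick and late respondents can be illustrated with a small sketch. It assumes that the ratio between the late and provisional slopes is known, which in practice it is not; function names and data are hypothetical.

```python
def slope_through_origin(y, x, v):
    """Weighted slope estimate for a regression through the origin."""
    return (sum(xi * yi / vi for xi, yi, vi in zip(x, y, v))
            / sum(xi * xi / vi for xi, vi in zip(x, v)))


def provisional_total(y_sp, x_sp, v_sp, x_rest_p, x_late, beta_ratio=1.0):
    """Provisional predictor of the total based on quick respondents only.

    beta_ratio is beta_L / beta_P (assumed known here); the default 1.0
    corresponds to ignoring any difference between the two groups.
    x_rest_p: x-total of non-observed provisional units.
    x_late:   x-total of all late units.
    """
    slope = slope_through_origin(y_sp, x_sp, v_sp)
    return sum(y_sp) + (beta_ratio * x_late + x_rest_p) * slope


# Hypothetical quick respondents with y = 2x exactly, and v = x.
x_sp = [1.0, 2.0, 3.0]
y_sp = [2.0, 4.0, 6.0]
x_rest_p, x_late = 5.0, 10.0

optimal = provisional_total(y_sp, x_sp, x_sp, x_rest_p, x_late, beta_ratio=3.0 / 2.0)
naive = provisional_total(y_sp, x_sp, x_sp, x_rest_p, x_late)

# Expected true total when beta_P = 2 and beta_L = 3.
expected_total = 2.0 * (sum(x_sp) + x_rest_p) + 3.0 * x_late
print(optimal, naive, expected_total)
```

In this example the predictor that accounts for the slope ratio reproduces the expected total, while the one assuming equal slopes falls short of it.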

4.3 IMPLEMENTATION OF PROVISIONAL PREDICTION

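In order to implement the optimal solution (4.17), one should know the true values $\beta_P$ and $\beta_L$. Since they are generally unknown, we can use a sub-optimal provisional predictor derived from (4.17), which for the estimation of the mean can be written as:

$$
\frac{\hat{T}^*_{(P)}}{N} = \frac{1}{N}\left[y_{S_P} + \left(\frac{\hat{\beta}_L}{\hat{\beta}_P}\,\hat{x}_L + \hat{x}_{\bar{S}_P}\right)\beta^*_P\right]
\tag{4.19}
$$

where $\beta^*_P$ is given by the second relation in (3.2) when only the provisional sample $S_P$ is considered, while $\hat{\beta}_L$, $\hat{\beta}_P$, $\hat{x}_L$ and $\hat{x}_{\bar{S}_P}$ are estimates for $\beta_L$, $\beta_P$, $x_L$ and $x_{\bar{S}_P}$. One can reasonably put:

$$
\hat{x}_L = \bar{x}_{S_L}\,\hat{N}_L = \bar{x}_{S_L}\,\frac{n_L}{n}\,N
\tag{4.20}
$$

$$
\hat{x}_{\bar{S}_P} = \bar{x}_{S_P}\left(\hat{N}_P - n_P\right) = \bar{x}_{S_P}\left(\frac{n_P}{n}\,N - n_P\right) = \bar{x}_{S_P}\,n_P\left(\frac{N}{n} - 1\right).
\tag{4.21}
$$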


As for $\beta_P$ and $\beta_L$, there are 2 options. In the first case, one could use the optimal estimates derived from the second relation (3.2) applied separately to provisional and late respondents, both available with reference to a period (t-1) before the reference time t,⁸ so that:

$$\hat{\beta}_L = \beta^{*}_{L(t-1)} \quad \text{and} \quad \hat{\beta}_P = \beta^{*}_{P(t-1)} . \qquad (4.22)$$

In the second case, one could put:

$$\hat{\beta}_L = \beta^{*}_{L(t-1)} \quad \text{and} \quad \hat{\beta}_P = \beta^{*}_{P(t)} \qquad (4.23)$$

where the theoretical advantage with respect to (4.22) is that in this case we use the actual optimal estimate of $\beta_P$, i.e. that based on provisional quick respondents available with reference to time t. In both cases, given that in (4.19) we can substitute $\beta^{*}_P = \beta^{*}_{P(t)}$, the final formula of the sub-optimal predictor of the population mean is given by:

$$\frac{\hat{T}^{*}_{(P)}}{N} = \frac{1}{N} \left[ y_{S_P} + \left( \frac{\hat{\beta}_L}{\hat{\beta}_P}\, \frac{n_L}{n}\, N\, \bar{x}_{S_L} + n_P \left( \frac{N}{n} - 1 \right) \bar{x}_{S_P} \right) \beta^{*}_P \right] . \qquad (4.24)$$
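A minimal Python sketch of how (4.24) could be evaluated is given below. It is an illustration rather than the procedure used in the survey: it reads (4.20) and (4.21) as estimated x-totals built from sample means, assumes that the complete sample size is $n = n_P + n_L$, and uses hypothetical argument names and toy figures.

```python
def provisional_mean(y_total_sp, xbar_sp, xbar_sl, n_p, n_l, pop_size,
                     beta_l_hat, beta_p_hat, beta_p_star):
    """Illustrative evaluation of the sub-optimal mean predictor (4.24).

    y_total_sp  : y-total observed on the provisional sample S_P
    xbar_sp     : x-mean of the provisional sample S_P
    xbar_sl     : x-mean of the late respondents S_L
    n_p, n_l    : numbers of quick and late sample units
    pop_size    : population size N
    beta_*      : slope estimates chosen as in (4.22) or (4.23)
    """
    n = n_p + n_l  # assumption: S is the union of S_P and S_L
    x_hat_late = (n_l / n) * pop_size * xbar_sl           # reading of (4.20)
    x_hat_rest_p = n_p * (pop_size / n - 1.0) * xbar_sp   # reading of (4.21)
    correction = (beta_l_hat / beta_p_hat) * x_hat_late + x_hat_rest_p
    return (y_total_sp + correction * beta_p_star) / pop_size


# Hypothetical toy figures: 2 quick and 2 late units out of N = 8.
mean_hat = provisional_mean(
    y_total_sp=30.0, xbar_sp=12.0, xbar_sl=8.0,
    n_p=2, n_l=2, pop_size=8,
    beta_l_hat=0.9, beta_p_hat=1.0, beta_p_star=1.25,
)
print(mean_hat)
```

Switching between options (4.22) and (4.23) only changes which period the value passed as beta_p_hat comes from; the structure of the computation is the same.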

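If we estimate the optimal coefficients (4.16) on the basis of (4.20), (4.21) and either (4.22) or (4.23), and estimates of the model variances are available as well, an estimate of the mean squared error of predictor (4.24) is given by:

$$\widehat{MSE}\left( \frac{\hat{T}^{*}_{(P)}}{N} \right) = \frac{1}{N^{2}} \left[ \sum_{S_P} \hat{c}^{*2}_{Pi}\, v_{Pi}\, \hat{\sigma}^{2}_P + n_P \left( \frac{N}{n} - 1 \right) \bar{v}_{S_P}\, \hat{\sigma}^{2}_P + \frac{n_L}{n}\, N\, \bar{v}_{S_L}\, \hat{\sigma}^{2}_L \right] . \qquad (4.25)$$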

5. THE MODEL PROPOSED BY FULLER

Fuller (1990) analysed the general form of the BLU predictor of the population mean in a generalised least squares context. Supposing the simple case of only one auxiliary variable x, the underlying model is given by:

8 Generally speaking, in a short-term survey context, if we refer to a month t of the year A, period (t-1) is given by the same month t of the previous year (A-1), in order to properly take into account the seasonality of the coefficients in model (3.1).


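$$y_i = \mu_y + \theta\,(x_i - \mu_x) + \varepsilon_i \quad \text{where} \quad \begin{cases} E(\varepsilon_i) = 0 & \forall i \\ VAR(\varepsilon_i) = \sigma^{2} & \forall i \\ COV(\varepsilon_i; \varepsilon_j) = 0 & \text{if } i \neq j \end{cases} \qquad (5.1)$$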

where in this case the super-population means $\mu_y$ and $\mu_x$ are explicitly formalised in the linear model and represent, together with $\theta$ and $\sigma^{2}$, unknown parameters to be estimated.

Supposing a simple random sampling design, and denoting as $\hat{\mu}_x$ the GLS estimator of $\mu_x$ and as $\hat{\theta}$ the regression coefficient obtained in the regression of y on x using the complete set S of n observations, the estimator for the mean is given by:

$$\hat{\mu}_y = \bar{y}_S + \hat{\theta}\,(\hat{\mu}_x - \bar{x}_S) \qquad (5.2)$$

with variance given approximately by:

$$VAR(\hat{\mu}_y) = n^{-1} \sigma^{2} + \theta^{2}\, VAR(\hat{\mu}_x) . \qquad (5.3)$$

The use of result (5.2) in a provisional estimation context follows straightforwardly. The optimal prediction of the actual unknown population mean $\bar{y}_U$ can be based on (5.2) as well, taking into account that for provisional estimation only the provisional sample $S_P$ is available; according to the symbols introduced in paragraph 3 we can write (5.2) as:

$$\hat{\bar{y}}_U = \bar{y}_{S_P} + \hat{\theta}\,(\bar{x}_U - \bar{x}_{S_P}) \qquad (5.4)$$

where the population x-mean is supposed known. The main difference between (5.4) and the standard regression estimator (3.9) is that in (5.4) the estimation of $\hat{\theta}$ should be based on all the n units belonging to S and not only on the $n_P$ units belonging to the provisional quick sample $S_P$. A particular, relevant case is when $x = y_{(t-1)}$, because procedure (5.2) is normally very efficient when the correlation between y and x is very high (Fuller, 1990, 173). It follows that (5.4) becomes:

$$\hat{\bar{y}}_{U(t)} = \bar{y}_{S_P(t)} + \hat{\theta}\,(\bar{y}_{U(t-1)} - \bar{y}_{S_P(t-1)}) . \qquad (5.5)$$

Formula (5.5) shows that, at time t, the mean calculated on provisional quick respondents must be added to the difference between the true and estimated y-means at time (t-1), weighted on the basis of a regression coefficient estimated using both provisional and late respondents. That can be done, for instance, by estimating this coefficient with reference to the previous time (t-1), in a way similar to that described in paragraph 4.3. As a matter of fact, procedure (5.5) can be another way


to reduce the effects of self-selection bias.

A generalisation of formula (5.5) simply consists in supposing a multiplicative instead of an additive super-population model. In this case the new predictor will be:

$$\hat{\bar{y}}_{U(t)} = \bar{y}_{S_P(t)} \left( \frac{\bar{y}_{U(t-1)}}{\bar{y}_{S_P(t-1)}} \right)^{\hat{\theta}} \qquad (5.6)$$

that can be viewed as another form of modified ratio estimator (compare (3.8)).
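The two Fuller-type corrections can be compared on a toy case. The Python sketch below is only an illustration: it reads (5.6) as the provisional mean multiplied by the previous-period ratio between population and quick-sample means raised to $\hat{\theta}$, and all function names and figures are hypothetical.

```python
def fuller_additive(ybar_sp_t, ybar_u_prev, ybar_sp_prev, theta_hat):
    """Additive correction as in (5.5)."""
    return ybar_sp_t + theta_hat * (ybar_u_prev - ybar_sp_prev)


def fuller_multiplicative(ybar_sp_t, ybar_u_prev, ybar_sp_prev, theta_hat):
    """Multiplicative (log-linear) correction, one reading of (5.6)."""
    return ybar_sp_t * (ybar_u_prev / ybar_sp_prev) ** theta_hat


# Hypothetical situation: quick respondents are on average twice as large
# as the whole population, both in the previous period and at time t.
additive = fuller_additive(210.0, 100.0, 200.0, theta_hat=1.0)
multiplicative = fuller_multiplicative(210.0, 100.0, 200.0, theta_hat=1.0)
print(additive, multiplicative)
```

With $\hat{\theta} = 1$ the additive form removes the previous-period gap in absolute terms, while the multiplicative form rescales the current quick mean by the previous-period ratio, which behaves like a ratio estimator.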

6. A MODEL WITH TWO AUXILIARY VARIABLES

At the moment, provisional estimates of tourism nights spent are based on one auxiliary variable, given by nights spent in the same month of the previous year. As we will see in paragraph 7, provisional estimates could be based on two auxiliary variables. In this case the theoretical reference is given by the general multi-regression model GMR, briefly commented on in appendix 1. The general idea is that the use of more than one auxiliary variable could increase the precision of provisional estimates, which on the other hand could still be affected by a self-selection bias. In this context the symbol S will still indicate (as in paragraph 3) a general available sample (final or provisional). The bivariate regression model, which is a particular case of model (9.1), is given by:

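$$y_i = \alpha + \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i \quad \text{where} \quad \begin{cases} E(\varepsilon_i) = 0 \\ V(\varepsilon_i) = \sigma^{2} v_i \\ C(\varepsilon_i, \varepsilon_j) = 0 & \text{if } i \neq j \end{cases} \qquad (6.1)$$

with $\alpha$, $\beta_1$, $\beta_2$ and $\sigma^{2}$ unknown parameters.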

6.1 THE CASE α = 0

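If in model (6.1) we put $\alpha = 0$, we get a bivariate regression model through the origin. According to (9.2) and (9.3), we can obtain these explicit formulas for the optimal unbiased linear predictor:

$$T^{*} = \frac{n}{N}\, \bar{y}_S + \left( 1 - \frac{n}{N} \right) \left( \hat{\beta}^{*}_1\, \bar{x}_{1\bar{S}} + \hat{\beta}^{*}_2\, \bar{x}_{2\bar{S}} \right) \qquad (6.2)$$

where:

$$\hat{\beta}^{*}_1 = \frac{\displaystyle \sum_S \frac{x_{2i}^{2}}{v_i} \sum_S \frac{x_{1i}\, y_i}{v_i} - \sum_S \frac{x_{1i}\, x_{2i}}{v_i} \sum_S \frac{x_{2i}\, y_i}{v_i}}{\displaystyle \sum_S \frac{x_{1i}^{2}}{v_i} \sum_S \frac{x_{2i}^{2}}{v_i} - \left( \sum_S \frac{x_{1i}\, x_{2i}}{v_i} \right)^{2}} \qquad (6.3a)$$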


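$$\hat{\beta}^{*}_2 = \frac{\displaystyle \sum_S \frac{x_{1i}^{2}}{v_i} \sum_S \frac{x_{2i}\, y_i}{v_i} - \sum_S \frac{x_{1i}\, x_{2i}}{v_i} \sum_S \frac{x_{1i}\, y_i}{v_i}}{\displaystyle \sum_S \frac{x_{1i}^{2}}{v_i} \sum_S \frac{x_{2i}^{2}}{v_i} - \left( \sum_S \frac{x_{1i}\, x_{2i}}{v_i} \right)^{2}} \qquad (6.3b)$$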

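and the MSE formula derives directly from (9.4). In particular, if $v_i = 1$ for each unit i, (6.2) reduces to the common regression estimator without constant term; in the case of heteroscedasticity, one can put $v_i = x_{1i}\, x_{2i}$ for each unit i, so that (6.2) leads to a double ratio estimator (an extension of the univariate ratio estimator derived from (3.1) when $v = x$) where:

$$\hat{\beta}^{*}_1 = \frac{\displaystyle \sum_S \frac{x_{2i}}{x_{1i}} \sum_S \frac{y_i}{x_{2i}} - n \sum_S \frac{y_i}{x_{1i}}}{\displaystyle \sum_S \frac{x_{1i}}{x_{2i}} \sum_S \frac{x_{2i}}{x_{1i}} - n^{2}} \qquad (6.4a)$$

$$\hat{\beta}^{*}_2 = \frac{\displaystyle \sum_S \frac{x_{1i}}{x_{2i}} \sum_S \frac{y_i}{x_{1i}} - n \sum_S \frac{y_i}{x_{2i}}}{\displaystyle \sum_S \frac{x_{1i}}{x_{2i}} \sum_S \frac{x_{2i}}{x_{1i}} - n^{2}} . \qquad (6.4b)$$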
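The link between the general coefficients (6.3a)-(6.3b) and the double ratio form (6.4a)-(6.4b) can be verified numerically. The Python sketch below is an illustration with hypothetical function names and toy data; the y-values are built to follow the no-intercept model exactly, so both versions should return the slopes used to generate them.

```python
def weighted_origin_coefficients(y, x1, x2, v):
    """Coefficients (6.3a)-(6.3b): weighted regression through the origin."""
    s11 = sum(a * a / w for a, w in zip(x1, v))
    s22 = sum(b * b / w for b, w in zip(x2, v))
    s12 = sum(a * b / w for a, b, w in zip(x1, x2, v))
    s1y = sum(a * t / w for a, t, w in zip(x1, y, v))
    s2y = sum(b * t / w for b, t, w in zip(x2, y, v))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def double_ratio_coefficients(y, x1, x2):
    """Coefficients (6.4a)-(6.4b): the special case v = x1 * x2."""
    n = len(y)
    r21 = sum(b / a for a, b in zip(x1, x2))
    r12 = sum(a / b for a, b in zip(x1, x2))
    y_over_x1 = sum(t / a for t, a in zip(y, x1))
    y_over_x2 = sum(t / b for t, b in zip(y, x2))
    den = r12 * r21 - n ** 2
    return (r21 * y_over_x2 - n * y_over_x1) / den, (r12 * y_over_x1 - n * y_over_x2) / den


# Hypothetical toy data generated with beta_1 = 0.6 and beta_2 = 0.5.
x1 = [10.0, 20.0, 30.0, 45.0]
x2 = [12.0, 18.0, 33.0, 40.0]
y = [0.6 * a + 0.5 * b for a, b in zip(x1, x2)]

general = weighted_origin_coefficients(y, x1, x2, [a * b for a, b in zip(x1, x2)])
special = double_ratio_coefficients(y, x1, x2)
print(general, special)
```

Note that the denominator of (6.4) vanishes when the ratio between the two auxiliary variables is the same for every unit, so the double ratio form needs the two variables not to be proportional.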

Empirical results related to tourism data showed that predictor (6.2) performed better than the bivariate ratio-cum-product type estimators proposed by Singh (1965) and revisited by Perri (2005), whose precision can be seriously affected by the presence of some anomalous ratios.

6.2 THE CASE α ≠ 0

As is well known, under an ordinary bivariate regression model the optimal solution can be written as:

$$T^{*} = \frac{n}{N}\, \bar{y}_S + \left( 1 - \frac{n}{N} \right) \left( \hat{\alpha}^{*} + \hat{\beta}^{*}_1\, \bar{x}_{1\bar{S}} + \hat{\beta}^{*}_2\, \bar{x}_{2\bar{S}} \right) \qquad (6.5)$$

where, supposing that $v_i = 1$ for each unit i and denoting as $r_{Szw}$ the sample correlation coefficient between variables z and w and as $Std_{Sz}$ the sample standard deviation of variable z:

$$\hat{\beta}^{*}_1 = \frac{r_{Syx_1} - r_{Syx_2}\, r_{Sx_1x_2}}{1 - r^{2}_{Sx_1x_2}}\, \frac{Std_{Sy}}{Std_{Sx_1}} ; \quad \hat{\beta}^{*}_2 = \frac{r_{Syx_2} - r_{Syx_1}\, r_{Sx_1x_2}}{1 - r^{2}_{Sx_1x_2}}\, \frac{Std_{Sy}}{Std_{Sx_2}} ; \quad \hat{\alpha}^{*} = \bar{y}_S - \hat{\beta}^{*}_1\, \bar{x}_{1S} - \hat{\beta}^{*}_2\, \bar{x}_{2S} \qquad (6.6)$$

and also in this case the MSE formula derives directly from (9.4). In particular, one


could verify that, as in the case $\alpha = 0$ and in the univariate case $\alpha \neq 0$, the best choice of the sample does not necessarily lead to the selection of the n largest units.
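Formulas (6.6) are the usual least squares coefficients written in terms of correlations and standard deviations, and this can be checked on a small example. The Python sketch below is an illustration only: helper names and data are hypothetical, and the y-values follow a bivariate linear model with constant exactly, so the formulas should recover the generating parameters.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def std(values):
    m = mean(values)
    return sqrt(sum((a - m) ** 2 for a in values) / (len(values) - 1))


def corr(a, b):
    ma, mb = mean(a), mean(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)
    return cov / (std(a) * std(b))


def bivariate_coefficients(y, x1, x2):
    """Coefficients (6.6) for the case v_i = 1."""
    r_y1, r_y2, r_12 = corr(y, x1), corr(y, x2), corr(x1, x2)
    beta1 = (r_y1 - r_y2 * r_12) / (1.0 - r_12 ** 2) * std(y) / std(x1)
    beta2 = (r_y2 - r_y1 * r_12) / (1.0 - r_12 ** 2) * std(y) / std(x2)
    alpha = mean(y) - beta1 * mean(x1) - beta2 * mean(x2)
    return alpha, beta1, beta2


# Hypothetical toy data generated with alpha = 5, beta_1 = 0.6, beta_2 = 0.5.
x1 = [10.0, 20.0, 30.0, 45.0]
x2 = [12.0, 18.0, 33.0, 40.0]
y = [5.0 + 0.6 * a + 0.5 * b for a, b in zip(x1, x2)]

alpha_hat, beta1_hat, beta2_hat = bivariate_coefficients(y, x1, x2)
print(alpha_hat, beta1_hat, beta2_hat)
```

When the two auxiliary variables are almost perfectly correlated, as can happen with nights spent one and two years before, the term $1 - r^{2}_{Sx_1x_2}$ becomes small and the two slopes are estimated less stably.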

Recently Dalabehera and Sahoo (1999) analysed the conditions under which a linear estimator based on two auxiliary variables and stratified sampling should perform better than a single-variable regression or ratio estimator. A more general theoretical proposal for large samples was also given by Montanari (1987).

A bivariate regression model was recently applied in the tourism statistics context by Gismondi, Mirto and Salamone (2003), where the estimation of each late respondent y-value was based on a constant term, the late respondent y-value in the same period of the previous year ($x_1$) and the current y-mean calculated on quick provisional respondents ($x_2$).⁹

7. EMPIRICAL RESULTS

Recent experience in the Italian monthly survey on internal tourism showed that the available quick responses almost always concern the same 8 provinces: complete data are normally supplied within 45 days from the end of the reference month by Torino, Savona, Bolzano, Bologna, Siena, Ascoli Piceno, Foggia and Cagliari. These provinces are assumed to be the available quick respondents used to estimate the late responses (non-responses at the moment of the quick estimation) of the missing provinces. Up to now, quick data concerning these provinces have been considered as the basic input for the estimator of late responses currently used in the survey, which is a simple ratio estimator (see formula (3.2) when x indicates nights spent in month (m-12), where m is the reference month, and v=x).

It must be stressed that alternative strategies could be based on the use of more detailed data (at the single local agency level) or of more detailed models (specific models according to the kind of locality, such as mountain, lakes, sea, etc.). However, in the former case one would face the serious problem of the demography of local agencies (splits, mergers, inclusion of new municipalities, etc.); in the latter, single models could be based on too small a number of “quick” units.

The choice of auxiliary variables derives from the high correlation between pairs of historical data (m, m-12), as already seen in paragraph 2, and from the objective lack of good alternatives. One must note that poorer results could be obtained when a significant change in the number of tourist accommodations

9 The same approach was applied to the current monthly dataset used for the empirical application described in the next paragraph. Since the quality of the preliminary estimates turned out to be almost always worse than that obtained using the other compared methods, these results have been omitted.


occurred from one year to another (which produces a lower correlation among historical data), as can more frequently happen for the other collective accommodations.

The quality of the compared estimators has also been assessed on a wider basis, through an additional simulation applied to real data. The reference year was 2004, while data concerning 2002 and 2003 were used as the source of auxiliary information.¹⁰

In detail, 4 separate domains have been taken into account: Italians in hotels, Italians in the other collective accommodations (“o.c.a.”), foreigners in hotels and foreigners in “o.c.a.”. Estimates for the “total” are always obtained by summing up the separate domain estimates, as is currently done in every provisional estimation carried out in the real survey. Then, we selected at random,¹¹ 100 times, 20% of the 103 provinces (10 provinces) and, in a second step, 50% of the provinces (51 provinces). In each of the 100 cases, a particular simulated subset of provisional quick respondents has been supposed to be available, on the basis of which a provisional estimation of the simulated non-respondents was carried out. In each random replication, the same subset of quick respondents was adopted in each domain and for each month.
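The replication scheme just described can be sketched in a few lines of Python. This is an illustration and not the code used for the paper: the function name and the seeds are hypothetical, provinces are represented simply by the integers 0 to 102, and the sample sizes are those quoted in the text (10 and 51).

```python
import random


def draw_quick_subsets(n_provinces, sample_size, replications, seed):
    """Draw simple random samples of provinces without replacement.

    Each replication yields one subset of simulated quick respondents,
    to be reused for every domain and every month of that replication.
    """
    rng = random.Random(seed)  # hypothetical seed, only for reproducibility
    return [sorted(rng.sample(range(n_provinces), sample_size))
            for _ in range(replications)]


subsets_low = draw_quick_subsets(103, 10, 100, seed=2004)
subsets_half = draw_quick_subsets(103, 51, 100, seed=2005)
print(subsets_low[0])
```

For each subset, the provinces not drawn play the role of late respondents, and every estimator is applied to the same subset so that differences in error reflect the estimators rather than the draw.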

This kind of experiment can give a clearer idea of the robustness of the 13 alternative provisional estimation techniques summarised in table 7.3, even though the only way to evaluate the presence of a real self-selection bias consists in referring to the results derived from the application of these quick estimators to the data of the above-mentioned 8 provinces.

The frequency distribution of the 103 Italian provinces by class of total nights spent is quite asymmetric, like most output distributions concerning Italian service activities. The size of the quick provinces in terms of yearly nights spent is medium-high, with the highest level reached by Bolzano, the second Italian tourist province after Venezia. Referring to total nights spent in 2004, the average ratio

10 Monthly tourism data at the province level have been officially released by ISTAT only for these years. When simulations were carried out, the last available year was 2004, since the delay of their publication is normally about one year. Older data were used by Gismondi (2006), but they were partially based on unofficial ad hoc extrapolations.

11 Simple random sampling with size n=10 and n=51 was replicated 100 times, through selection from the 103 “random numbers” implicitly associated with each province. A more detailed simulation referred to the most relevant domain (Italians in hotels), based on 1000 random samples, was carried out as well. Comparative results were quite similar to those obtained using only 100 replications, mainly because the effect of the presence or absence of some large units in the “quick” sample remains fundamentally the same for each estimator.


between the quick sample and the whole population means is 2,01, so that the natural quick sample is heavily over-balanced with respect to the whole population (see balance condition (3.4)).

As already seen in table 2.1, the overall percent weight of the quick sample in terms of nights spent (table 7.1) ranges from 12,5% (foreigners in “o.c.a.”) up to 18,9% (foreigners in hotels).

Tourism in hotels represents almost 70% of total internal tourism in Italy, so that in order to guarantee good overall provisional estimates a predictor should perform well especially for the first 2 domains. Even though the coverage of random replications referred to the number of provinces and not to the amount of nights spent, the final coverage of the y-variable of interest was very similar and equal, on average, to 19,6% for the first simulation and to 51,8% for the second one.

Tab. 7.1: Main features of tourism in Italy and % nights spent coverage.

                      103 provinces                         8 provinces        Coverage=20%       Coverage=50%
Domain                Nights spent  Nights   % change       Nights    C.v.     Nights    C.v.     Nights    C.v.
                      (million)     spent %  2004/2003      spent %            spent %            spent %
Italians hotels       136,6         39,6     1,0            15,4      -        20,4      0,27     51,4      0,14
Foreigners hotels     97,2          28,1     3,4            18,9      -        21,3      0,47     51,3      0,25
Italians “o.c.a.”     67,5          19,6     -2,9           14,3      -        18,0      0,27     52,4      0,12
Foreigners “o.c.a.”   44,0          12,7     -3,8           12,5      -        17,6      0,52     52,8      0,27
Total                 345,3         100,0    0,3            15,8      -        19,6      0,30     51,8      0,15

C.v.: coefficient of variation of nights spent coverage measured on 100 random replications.

Because of the high seasonality of tourism, nights spent in a given month identify a specific variable characterised by levels and dynamics that could be very different month by month. Accordingly, and on the basis of preliminary analyses, the variable x used for univariate predictors was the number of nights spent in the same month m of the previous year 2003, that is $x_m = y_{(m-12)}$, so that the y-values can be estimated according to an autoregressive first order linear model (Fuller, 1990).

A preliminary empirical validation of a model such as (3.1) was carried out according to tests applied in similar contexts by Gismondi (2002; 2007). As for expected values, a simple technique consists in evaluating the results of the regression model referred to the province i:

$$y_i = \alpha + \beta x_i + \varepsilon_i \qquad (7.1)$$

verifying the level of the adjusted $R^{2}$ and the statistical significance of the usual t test. As for heteroscedasticity, it can be tested using the well-known


White test (1980). We can estimate the parameters of the following regression model:

$$e_i^{2} = \gamma_0 + \gamma_1 x_i + \gamma_2 x_i^{2} + \tau_i \qquad (7.2)$$

where $e_i$ are the residuals obtained from regression (7.1) and $\tau_i$ is a common random error, and then verify the significance of the statistic $nR^{2}$, where n is the number of observations (n=103), which in this case is distributed as a $\chi^{2}$ with 1 degree of freedom. If $nR^{2}$ is lower than the chi-square threshold for a given probability level, model (7.2) is not significant and the hypothesis of heteroscedasticity can be rejected, so that the position v=1 can be accepted.¹²

An application to the yearly data referred to 2003 (x) and 2004 (y) showed that in model (3.1) the constant parameter is never significant, while a regression through the origin leads to good results (table 7.2). The White test was always significant at the 95% level and at the 99% level as well, with the exception of the domain “Italians in hotels”. As a consequence, model (7.1) should be affected by heteroscedasticity, so that v≠1.

Tab. 7.2: Model validation using tests (7.1) and (7.2).

                         Model mean - Test (7.1)             Model variance - Test (7.2)
Domain                   Adjusted R2  β estimate  Sign. Tβ   Adjusted R2  White test (nR2)  Sign. 95%  Sign. 99%
Italians in hotels       0,991        1,011       0,000      0,045        6,562             Yes        No
Foreigners in hotels     0,990        1,036       0,000      0,275        29,811            Yes        Yes
Italians in “o.c.a.”     0,989        0,971       0,000      0,067        8,773             Yes        Yes
Foreigners in “o.c.a.”   0,990        0,964       0,000      0,804        83,218            Yes        Yes
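The decisions reported in the last two columns of table 7.2 follow from comparing each $nR^{2}$ statistic with the chi-square thresholds quoted in footnote 12. The Python sketch below simply reproduces this comparison; the dictionary and function names are hypothetical, while the figures are those of the table and of the footnote (written with decimal points).

```python
# Thresholds of the chi-square with 1 degree of freedom (footnote 12).
CHI2_THRESHOLDS = {"95%": 3.84, "99%": 6.63}

# White test statistics nR2 taken from table 7.2.
WHITE_STATISTICS = {
    "Italians in hotels": 6.562,
    "Foreigners in hotels": 29.811,
    "Italians in o.c.a.": 8.773,
    "Foreigners in o.c.a.": 83.218,
}


def white_decisions(statistics, thresholds):
    """Return, for each domain, whether nR2 exceeds each threshold."""
    return {
        domain: {level: stat > cut for level, cut in thresholds.items()}
        for domain, stat in statistics.items()
    }


decisions = white_decisions(WHITE_STATISTICS, CHI2_THRESHOLDS)
for domain, result in decisions.items():
    print(domain, result)
```

Only “Italians in hotels” falls between the two thresholds, which is why it is the single domain marked as not significant at the 99% level.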

In each of the 4 domains, for the estimation of the level in a given month ($y_m$ or $\bar{y}_m$) the main synthetic quality measure used is the percent difference between estimated and true values, taken in absolute value. The overall yearly percent error is obtained by averaging monthly errors, using simple or weighted means, where the weights are given by the percent incidence of tourism in each month on the whole yearly amount in the domain. While simple means are more related to the general precision of estimates, weighted means refer more strictly to the effective impact of errors on the overall estimate. In order to synthesise the results derived from the 100 random replications, we considered the mean of absolute percent errors (MAPE), calculated

12 The thresholds of the chi-square with 1 degree of freedom leaving 5% and 1% of probability on the right are, respectively, 3,84 and 6,63. Yearly results are confirmed for most of the monthly data as well.


as the mean of the 100 percent errors: as is well known, MAPE is an unbiased estimate of the mean squared error of a quick predictor conditioned on the final data; then, the final synthesis of results is obtained by averaging MAPE over the 4 domains, using as weights the relative incidence of tourism in each domain on the whole yearly amount.

For the estimation of percent changes (namely, $(y_m / y_{(m-12)} - 1) \cdot 100$), the procedure was similar to that for levels, but the main synthetic quality measure used is the difference (not percent) between estimated and true changes, taken in absolute value.
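The quality measures described above can be summarised in a short Python sketch. It is only an illustration: function names are hypothetical and the three "monthly" figures are invented toy values, used instead of twelve real months to keep the arithmetic easy to follow.

```python
def abs_percent_error(estimate, true_value):
    """Absolute percent difference between estimated and true level."""
    return abs(estimate - true_value) / true_value * 100.0


def yearly_errors(estimates, true_values):
    """Simple and weighted means of the monthly absolute percent errors.

    Weights are the shares of each month in the yearly true total.
    """
    errors = [abs_percent_error(e, t) for e, t in zip(estimates, true_values)]
    simple = sum(errors) / len(errors)
    total = sum(true_values)
    weighted = sum(err * t / total for err, t in zip(errors, true_values))
    return simple, weighted


def abs_change_error(estimate, true_value, previous_year_value):
    """Absolute difference between estimated and true percent changes."""
    est_change = (estimate / previous_year_value - 1.0) * 100.0
    true_change = (true_value / previous_year_value - 1.0) * 100.0
    return abs(est_change - true_change)


# Hypothetical toy data: a low-season month with a large relative error
# and two high-season months with small relative errors.
true_nights = [100.0, 300.0, 600.0]
estimated_nights = [110.0, 294.0, 606.0]

simple_error, weighted_error = yearly_errors(estimated_nights, true_nights)
change_error = abs_change_error(110.0, 100.0, 95.0)
print(simple_error, weighted_error, change_error)
```

In this toy case the weighted mean is much lower than the simple mean because the largest relative error falls in the month with the smallest share of nights spent, which is the distinction between general precision and effective impact drawn in the text.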

Moreover, the role played by the bias component in the overall mean squared error of each estimator can be estimated on the basis of the empirical random replications carried out. The limit of this approach is that it is not possible to separate the bias due to the particular estimator considered (which under the unknown underlying model may or may not be biased) from the bias properly due to a self-selection effect. If h indicates a random replication (h=1,2,…,100), the empirical mean of predictor T for month m will be given by:

$$\bar{T}_m = \sum_{h=1}^{100} T_{mh} / 100 \qquad (7.3)$$

while, denoting as $\bar{y}_m$ the true mean and using the label E to indicate “empirical” variances, biases and mean squared errors, we can put:

$$MSE_{Em} = \sum_{h=1}^{100} (T_{mh} - \bar{y}_m)^{2} / 100 = \sum_{h=1}^{100} (T_{mh} - \bar{T}_m)^{2} / 100 + (\bar{T}_m - \bar{y}_m)^{2} = VAR_{Em} + BIAS^{2}_{Em} \qquad (7.4)$$

from which, denoting as $w_m$ the relative y-weight of month m on the whole year, we can derive the percent incidence of squared bias on the whole MSE:

$$100 \sum_{m=1}^{12} \frac{BIAS^{2}_{Em}}{MSE_{Em}}\, w_m . \qquad (7.5)$$
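The decomposition (7.4) and the summary measure (7.5) can be illustrated with a few lines of Python. The sketch is hypothetical in its names and data: it uses a handful of invented replications and two "months" instead of 100 replications and 12 months.

```python
def empirical_decomposition(replicated_estimates, true_mean):
    """Empirical MSE, variance and squared bias as in (7.3)-(7.4)."""
    h = len(replicated_estimates)
    t_bar = sum(replicated_estimates) / h
    mse = sum((t - true_mean) ** 2 for t in replicated_estimates) / h
    variance = sum((t - t_bar) ** 2 for t in replicated_estimates) / h
    squared_bias = (t_bar - true_mean) ** 2
    return mse, variance, squared_bias


def squared_bias_share(monthly_results, weights):
    """Percent incidence of squared bias on the MSE, as in (7.5)."""
    return 100.0 * sum(w * bias2 / mse
                       for (mse, _, bias2), w in zip(monthly_results, weights))


# Hypothetical replications: the first month is biased, the second is not.
month_a = empirical_decomposition([9.0, 11.0, 13.0], true_mean=10.0)
month_b = empirical_decomposition([19.0, 21.0], true_mean=20.0)
share = squared_bias_share([month_a, month_b], weights=[0.4, 0.6])
print(month_a, month_b, share)
```

As noted in the text, a high share computed in this way cannot by itself tell whether the bias comes from the estimator or from self-selection of the quick respondents.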

All the 13 predictors used and compared in the empirical application have already been defined in paragraphs 3 to 6 and are summarised in table 7.3.¹³ They can be divided into 4 groups. The first group includes classical estimators based on one auxiliary variable:

1) Ratio (3.2 with v=x). It is the estimator currently used in the survey for the imputation of non-responses. It derives from the linear regression model (3.1) without

13 Recourse to estimation techniques based on the re-weighting of respondents, such as calibration (Rizzo, Kalton and Brick, 1996), instead of the imputation of non-responses produced worse estimates, so these techniques and the related results have been dropped.


constant and with heteroscedasticity, with $x_m = y_{(m-12)}$.

2) Regression (3.2 with v=1). It derives from the linear regression model (3.1) without constant and with homoscedasticity, with $x_m = y_{(m-12)}$.

3) Regression with constant (3.9). It derives from the linear regression model (3.1) with constant and with homoscedasticity, with $x_m = y_{(m-12)}$.

A second group includes estimators that, by definition, are focused on the reduction of the possible self-selection bias:

4) Ratio based on a balanced sub-sample (3.8). It is a sort of modified ratio estimator based on the post-sampling selection of an almost balanced sub-sample; same hypotheses as ratio estimator 1), with $x_m = y_{(m-12)}$.

5) Separate regression (4.24 and 4.22). Correction of self-selection bias supposing separate populations for provisional and late respondents, with $x_m = y_{(m-12)}$.

6) Separate regression (4.24 and 4.23). The same features as estimator 5), with $x_m = y_{(m-12)}$.

7) Modified Fuller (5.6). Correction of self-selection bias supposing a log-linear model¹⁴ where the unobserved y-mean is modelled through the product between the mean of provisional respondents and the estimation error in a previous period, with $x_m = y_{(m-12)}$.

A third group includes 3 estimators that, even though not built in order to reduce bias, exploit the information of 2 auxiliary variables instead of 1:

8) Ratio bivariate (6.2 with $v = x_1 x_2$). Linear bivariate regression model without constant and with heteroscedasticity, with $x_1 = y_{(m-12)}$, $x_2 = y_{(m-24)}$.

9) Regression bivariate (6.2 with v=1). Linear bivariate regression model without constant and with homoscedasticity, with $x_1 = y_{(m-12)}$, $x_2 = y_{(m-24)}$.

10) Regression bivariate with constant (6.5). Linear bivariate regression model with constant and with homoscedasticity, with $x_1 = y_{(m-12)}$, $x_2 = y_{(m-24)}$.

A fourth group includes the same estimators as group 3, but with a different second auxiliary variable, which in this case is given by the y-value 6 months before. As a matter of fact, almost final data are available, for all the provinces, only about 6 months after the end of the reference month. The idea is that, even though affected by a different seasonal pattern, more recent data (compared with data referring to 2 years before) could improve estimates. We have:

11) Ratio bivariate (6.2 with $v = x_1 x_2$). As estimator 8), but with $x_1 = y_{(m-12)}$, $x_2 = y_{(m-6)}$.

14 Preliminary analyses showed that an estimator based on a log-linear model such as (5.6) leads to better results than an estimator based on a linear model such as (5.5).


12) Regression bivariate (6.2 with v=1). As estimator 9), but with $x_1 = y_{(m-12)}$, $x_2 = y_{(m-6)}$.

13) Regression bivariate with constant (6.5). As estimator 10), but with $x_1 = y_{(m-12)}$, $x_2 = y_{(m-6)}$.

Even though all the previous predictors have been introduced according to a super-population model, in what follows the terms “predictor” and “estimator” will be used interchangeably.

Tab. 7.3: Provisional predictors for the number of monthly nights spent in tourist establishments.

Code  Definition                                  General remarks
1     Ratio (3.2 with v=x)                        Linear regression model without constant and with heteroscedasticity ($x_m = y_{(m-12)}$)
2     Regression (3.2 with v=1)                   Linear regression model without constant and with homoscedasticity ($x_m = y_{(m-12)}$)
3     Regression with constant (3.9)              Linear regression model with constant and with homoscedasticity ($x_m = y_{(m-12)}$)
4     Balanced sub-sample (3.8)                   Modified ratio estimator based on an almost balanced sub-sample; same hypotheses as the ratio estimator ($x_m = y_{(m-12)}$)
5     Separate regression (4.24 and 4.22)         Correction of self-selection bias supposing separate populations for provisional and late respondents ($x_m = y_{(m-12)}$)
6     Separate regression (4.24 and 4.23)         Correction of self-selection bias supposing separate populations for provisional and late respondents ($x_m = y_{(m-12)}$)
7     Modified Fuller (5.6)                       Correction of self-selection bias supposing a log-linear model ($x_m = y_{(m-12)}$)
8     Ratio bivariate (6.2 with v=x1x2)           Linear bivariate regression model without constant and with heteroscedasticity ($x_{1m} = y_{(m-12)}$, $x_{2m} = y_{(m-24)}$)
9     Regression bivariate (6.2 with v=1)         Linear bivariate regression model without constant and with homoscedasticity ($x_{1m} = y_{(m-12)}$, $x_{2m} = y_{(m-24)}$)
10    Regression bivariate with constant (6.5)    Linear bivariate regression model with constant and with homoscedasticity ($x_{1m} = y_{(m-12)}$, $x_{2m} = y_{(m-24)}$)
11    Ratio bivariate (6.2 with v=x1x2)           Linear bivariate regression model without constant and with heteroscedasticity ($x_{1m} = y_{(m-12)}$, $x_{2m} = y_{(m-6)}$)
12    Regression bivariate (6.2 with v=1)         Linear bivariate regression model without constant and with homoscedasticity ($x_{1m} = y_{(m-12)}$, $x_{2m} = y_{(m-6)}$)
13    Regression bivariate with constant (6.5)    Linear bivariate regression model with constant and with homoscedasticity ($x_{1m} = y_{(m-12)}$, $x_{2m} = y_{(m-6)}$)


Results have been synthesised in tables 7.4 to 7.8, where figures in bold indicate the best performance and underlined figures indicate the second best.

Results obtained using data concerning the 8 quick provinces (table 7.4) show a satisfactory performance of the provisional estimator currently used, the ratio estimator 1): the total error (average of the 4 domains) is the third best using a simple mean (the estimate error is 5,43%) and the second best using a weighted mean (4,20%). A similar result was obtained for the estimation of changes.

However, a further aspect is the gain in precision that could be achieved using estimator 5), which aims at reducing self-selection bias: it is the best estimator both for levels and changes using weighted means (errors are 3,96% and 3,97%) and the second best both for levels and changes using simple means (5,18% and 5,44%). The performance of estimator 5) is rather better than that of estimator 6), where regression coefficients are estimated with reference to different times (see formula (4.23)).

That confirms the idea underlying the modified Fuller estimator (5.6), where the regression coefficient is estimated for month (m-12) using both quick and late respondents. In particular, this estimator led to very good results with the only exception of “Foreigners in o.c.a.”, both for levels and changes; in detail, it was always the best estimator for “Foreigners in hotels” (weighted errors are respectively 3,96% and 4,20%) and, for “Italians in o.c.a.”, it was the second best for levels (3,77%) and the best for changes (3,72%).

On the whole, univariate regression with constant gives better results than regression through the origin, but this strategy can always be improved upon by another alternative estimator. The use of 2 auxiliary variables can be helpful if the second auxiliary variable is given by nights spent in the same month 2 years before (2002), while it is not useful at all when the second auxiliary variable is given by nights spent in month (m-6); this peculiarity characterises the results derived from random replications as well. In particular, estimator 8), the ratio bivariate, achieved the best average performance using simple means both for levels (5,16%) and for changes (5,29%) and turned out to be the best estimator for “Foreigners in o.c.a.” (weighted average errors are 4,20% for levels and 3,97% for changes): this is the most difficult domain to estimate, because nights spent by foreigners in the other accommodation structures are quite heterogeneous across the Italian territory and may undergo very large and unpredictable changes from year to year.

Finally, estimator 4), based on a balanced sub-sample strategy, gave the worst results, but that can obviously be due to the small size and the intrinsic lack of balance of the available quick sample.

The main result deriving from the 100 random replications guaranteeing a 20% coverage (table 7.5) is that, in a situation characterised by a low level of


coverage (that is about the same coverage available when provisional estimationsare carried after 45 days form the reference month), the actual ratio estimator canbe always improved: on the average, for estimation of levels there are 2 betterestimators using simple means and 5 better estimators using weighted means; forestimation of changes, there are 1 better estimator using simple means and 3 betterestimators using weighted means.

The overall best estimator is always given by the modified Fuller 7); on theaverage, gains respect to ratio estimator 1) are relevant: for weighted means erroris 4,35% with 7) against 4,72%) with 1) for levels, while is 5,57% with 7) against6,53% with 1) for changes.

The second best estimator is clearly estimator 10) for levels (regressionbivariate with constant), improving the ratio bivariate estimator 8) that was the bestestimator based on 2 auxiliary variables in the case of 8 provinces seen in table 7.4.For changes, the second best is the ratio estimator using simple means and theseparate regression estimator 5) using weighted means.

Moreover, while for changes the modified Fuller estimator 7) is quite alwaysthe best for each domain (with the exception of “Italians in hotels”, for which it isthe third best after the 2 ratio bivariate estimators 8) and 11)), for levels the recourseto alternative estimators for “Italians in hotels” (estimator 11) and estimators 5) or8) for “Foreigners in hotels”) can improve estimator 7).

In synthesis, when coverage of quick respondents is low, a self-selection biascan often be present, so that methods aiming at reducing this bias (in particular,estimator 7), but also the balanced sub-sample technique 4) for “Italians in o.c.a.”)should be used at least for what concerns nights spent in the other collectiveaccommodations, while for “Italians in hotels” a ratio bivariate strategy can behelpful (using as second auxiliary variable nights spent in month (m-6) or, on asecond extent, nights spent in (m-24)).

When the coverage of quick respondents is about 50% (table 7.6), results show a clear trade-off between the estimation of levels and the estimation of changes.

It is well known that optimal strategies for estimating levels or changes could be different (Rao, Srinath and Quenneville, 1989, 458-460); in this case, for estimating changes the modified Fuller estimator 7) is always the best in each domain and leads to an average error equal to 4,67% using a simple mean and to 3,87% using weighted means. Gains with respect to the ratio estimator 1) are quite high, since the ratio's average errors are respectively equal to 6,38% and 5,06%. This means that, for the estimation of changes, a 50% coverage is not enough to guarantee a significant reduction of self-selection bias in the available quick sample.

On the other hand, for the estimation of levels the recourse to the current ratio estimator 1) seems justified, even though slightly better results could be achieved using a regression bivariate with constant estimator 10) – as for “Foreigners in hotels” and “Italians in o.c.a.”, even though the gain in precision is much higher in the first case than in the second – or a ratio bivariate estimator 8) – as for “Italians in hotels”.

From the point of view of a “minimax” strategy, table 7.7 confirms that, for level estimation, the ratio estimator 1) should be preferred to the modified Fuller estimator 7) only with a 50% coverage (mostly because of the bad performance of estimator 7) for the domain “Foreigners in o.c.a.”), while the converse holds when coverage is low (20%): in the first case the highest estimate error is 3,91% for 1) and 5,05% for 7); in the second one it is, respectively, 9,82% against 9,21%.

Note that when coverage is 50%, several estimators based on 2 auxiliary variables (mainly estimator 12)) tend to limit the highest percent estimate error more than univariate estimators.
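The minimax comparison above can be sketched in a few lines. The figures are the highest percent errors on levels quoted in the text from table 7.7; the dictionary layout and the function name are illustrative assumptions, not part of the survey procedure.

```python
# Highest percent errors on levels quoted in the text (table 7.7),
# by estimator and by coverage of quick respondents.
highest_error = {
    "ratio 1)": {"20%": 9.82, "50%": 3.91},
    "modified Fuller 7)": {"20%": 9.21, "50%": 5.05},
}


def minimax_choice(errors, coverage):
    """Return the estimator whose worst-case error is the smallest."""
    return min(errors, key=lambda name: errors[name][coverage])


for coverage in ("20%", "50%"):
    print(coverage, "->", minimax_choice(highest_error, coverage))
```

With these figures the rule selects the modified Fuller estimator at 20% coverage and the ratio estimator at 50% coverage, in line with the text.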

Finally, results in the second part of the table confirm that, when level estimation is considered, the availability of the largest provinces (in this case, the first ten) implies efficiency of the ratio estimator 1), according to the MSE formula (3.3). Of course, this situation represents only one of the $\binom{103}{10}$ quick samples that could be effectively available. Moreover, even though in this case coverage in terms of nights spent is 45,3%¹⁵ and so not very far from 50%, the error obtained with the ratio estimator 1) is considerably higher than the average error achieved with the same estimator over the 100 random replications of table 7.6, where this level of coverage was guaranteed using a much larger number of provinces (51 instead of 10). This means that it is better to base quick estimates on a large number of units (even though their coverage in terms of the y-variable is not particularly high), rather than trying to induce only a small subset of large units to respond in advance.
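To give an idea of how many quick samples of this kind exist, the sketch below reads the count as the number of ways of choosing 10 provinces out of 103. Both figures are assumptions about the notation used in the text, and the variable names are hypothetical.

```python
import math

N_PROVINCES = 103  # assumption: number of provinces in the population
N_QUICK = 10       # size of the quick sample discussed in the text

# Number of distinct quick samples of N_QUICK provinces.
n_samples = math.comb(N_PROVINCES, N_QUICK)

print(f"{n_samples:,} possible quick samples")
print(f"share covered by 100 random replications: {100 / n_samples:.2e}")
```

Under these assumptions, any single sample, and even 100 random replications, covers only a negligible share of the possible quick samples.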

15 The main 10 provinces in terms of overall nights spent in 2004 are: Venezia, Bolzano-Bozen, Roma, Rimini, Trento, Milano, Verona, Napoli, Firenze, Salerno.

The empirical percent incidence of squared bias on the global MSE, evaluated according to (7.3), reaches its lowest level with the ratio estimator only for nights spent in hotels and a coverage equal to 20% (table 7.8). In all the other cases, it can be reduced using mostly estimators belonging to the second group. On average, for a coverage equal to 20%, the lowest incidence is 8,01%, obtained using estimator 4) based on a balanced sub-sample, while for a coverage equal to 50% it is 14,62%, obtained with estimator 9) (regression bivariate). It is worthwhile to note that the incidence of bias is almost always higher with a higher coverage, meaning that an increase of coverage reduces the estimators' variance more than proportionally with respect to the reduction of bias.
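The incidence of squared bias on the MSE can be illustrated with a minimal sketch. It assumes the usual empirical definitions over a set of replications (bias as the mean error, MSE as the mean squared error); formula (7.3) is not reproduced here and may differ in detail. The function name and the data are hypothetical.

```python
def bias_incidence(estimates, true_value):
    """Percent share of squared bias in the empirical MSE of `estimates`.

    Assumes at least one estimate differs from `true_value` (MSE > 0).
    """
    errors = [estimate - true_value for estimate in estimates]
    bias = sum(errors) / len(errors)
    mse = sum(error ** 2 for error in errors) / len(errors)
    return 100.0 * bias ** 2 / mse


# Hypothetical provisional estimates from four replications, final value 100.
replications = [102.0, 98.0, 104.0, 100.0]
print(bias_incidence(replications, true_value=100.0))
```

Since the MSE is the sum of squared bias and variance, the incidence grows whenever variance falls faster than squared bias, which is the mechanism described above for higher coverage.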

The empirical evidence suggests the possibility of introducing into the survey some mixed quick estimation strategies, in order to achieve a higher efficiency (table 7.9). The use of estimation strategies based on one or two estimators can be considered realistic if one considers the 4 domains separately, without evaluating different monthly performances¹⁶. In detail:

• On the basis of the current 8 quick provinces, the best strategy consists in using estimator 5) – separate regression with (3.22) – for “Italians in hotels” and “Foreigners in o.c.a.”, and estimator 7) – modified Fuller – for “Italians in o.c.a.” and “Foreigners in hotels”. In this way, we get an advantage with respect to the current ratio estimator for each domain, both for levels and changes: the average improvement of precision is 0,75 percent points for levels and 0,78 points for changes.

• With a 20% coverage (low coverage), it is convenient to always use predictor 7) – modified Fuller – except for “Italians in hotels”, for which estimator 11) – ratio bivariate with x2=y(m-6) – should be used. This strategy would produce a gain with respect to the ratio estimator in each case except the level estimation of “Foreigners in hotels”. The average gain is 1,09 percent points for levels and 1,15 points for changes.

• The most controversial situation concerns a 50% coverage. In this case there is a contrast between the optimality of estimator 7) for changes and its clear sub-optimality for levels, so that a conservative option could be in favour of retaining the ratio estimator. However, an improvement could still be reached using an alternative strategy based on the ratio bivariate estimator 8) for “Italians in hotels” and estimator 10) (regression bivariate with constant) in all the other domains. The average gain would be 0,25 percent points for levels and 0,02 for changes.
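The gains reported for the mixed strategies are simple differences between the error of the ratio estimator and the error of the estimator chosen for each domain. A minimal sketch for the case of the 8 quick provinces, using the weighted percent errors on levels from tables 7.4 and 7.9 (variable names are illustrative):

```python
# Weighted percent errors on levels with the 8 quick provinces.
ratio_error = {
    "Italians in hotels": 3.68,
    "Foreigners in hotels": 4.28,
    "Italians in o.c.a.": 4.41,
    "Foreigners in o.c.a.": 5.30,
}
# Errors of the mixed strategy: estimators 5), 7), 7) and 5) respectively.
strategy_error = {
    "Italians in hotels": 2.69,
    "Foreigners in hotels": 3.96,
    "Italians in o.c.a.": 3.77,
    "Foreigners in o.c.a.": 4.23,
}

# Gain in precision (percent points) of the mixed strategy in each domain.
gain = {
    domain: round(ratio_error[domain] - strategy_error[domain], 2)
    for domain in ratio_error
}

for domain, value in gain.items():
    print(f"{domain}: {value:.2f}")
```

The resulting gains match the "Gain in precision" column of table 7.9 for the 8 provinces.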

16 At the moment, the possibility to use different quick estimators for different months is not realistic and it is not developed in any ISTAT survey.


Tab. 7.4: Provisional estimate errors using the 8 quick provinces (average of 12 months)

Predictor                                               Italians        Foreigners          Italians        Foreigners             Total
                                                       in hotels         in hotels       in “o.c.a.”       in “o.c.a.”
                                                 Simple Weighted   Simple Weighted   Simple Weighted   Simple Weighted   Simple Weighted

Percent errors (Levels) - MAPE
Ratio (3.2 with v=x)                               3,62     3,68     4,46     4,28     6,54     4,41     7,12     5,30     5,43     4,20
Regression (3.2 with v=1)                          4,30     4,19     5,72     5,51     8,72     6,84     8,54     5,16     6,82     5,20
Regression with constant (3.9)                     4,81     4,77     4,60     4,08     6,32     3,84     6,42     5,32     5,54     4,46
Balanced sub-sample (3.8)                          7,95     6,78     7,15     7,64     8,13     3,96    26,98    29,21    12,55     9,33
Separate regression (4.24 with 4.22)               2,74     2,69     4,35     4,38     9,02     5,77     4,62     4,23     5,18     3,96
Separate regression (4.24 with 4.23)               3,08     2,53     5,99     5,70    10,53     7,71     5,94     4,24     6,39     4,65
Modified Fuller (5.6)                              3,35     3,38     4,20     3,96     5,91     3,77    13,48    18,85     6,73     5,59
Ratio bivariate (6.2 with v=x1x2)                  4,27     4,44     5,35     4,94     4,75     4,57     6,25     4,20     5,16     4,58
Regression bivariate (6.2 with v=1)                4,70     4,51     6,62     6,67     8,68     6,78     9,10     6,92     7,27     5,87
Regression bivariate with constant (6.5)           4,46     4,65     5,70     4,90     4,86     3,21     8,75     8,08     5,95     4,88
Ratio bivariate (6.2 with v=x1x2) (*)              7,09     6,64     8,22     8,33     9,87     6,97    13,15     8,97     9,58     7,48
Regression bivariate (6.2 with v=1) (*)            5,26     5,03     6,70     6,39     7,38     4,43    11,0?     7,93     7,59     5,67
Regression bivariate with constant (6.5) (*)       5,14     3,98     8,10     6,69    11,4?     6,71     9,99     7,57     8,66     5,73

Absolute differences (Changes)
Ratio (3.2 with v=x)                               3,70     3,72     4,73     4,52     6,92     3,90     7,74     6,27     5,77     4,31
Regression (3.2 with v=1)                          4,37     4,22     5,99     5,72     9,03     6,26     8,87     5,61     7,07     5,22
Regression with constant (3.9)                     4,95     4,84     4,84     4,27     6,99     4,63     7,08     6,34     5,97     4,83
Balanced sub-sample (3.8)                          8,11     6,79     7,44     7,89     9,10     4,85    26,90    27,51    12,89     9,36
Separate regression (4.24 with 4.22)               2,79     2,71     4,54     4,53     9,33     5,46     5,10     4,31     5,44     3,97
Separate regression (4.24 with 4.23)               3,15     2,56     6,24     5,87    10,81     7,12     6,27     4,26     6,62     4,60
Modified Fuller (5.6)                              3,42     3,43     4,47     4,20     5,89     3,72    13,85    18,76     6,91     5,65
Ratio bivariate (6.2 with v=x1x2)                  4,37     4,47     5,52     5,03     4,87     5,20     6,40     3,97     5,29     4,71
Regression bivariate (6.2 with v=1)                4,81     4,56     6,93     6,92     8,81     6,17     9,28     7,28     7,45     5,89
Regression bivariate with constant (6.5)           4,54     4,68     5,95     5,03     5,45     3,88     9,23     8,58     6,29     5,12
Ratio bivariate (6.2 with v=x1x2) (*)              9,65     9,47    10,89    10,37    14,12     7,19    22,65    25,25    14,33    11,29
Regression bivariate (6.2 with v=1) (*)            5,42     5,13     7,08     6,74     7,38     4,37    11,8?     8,62     7,92     5,88
Regression bivariate with constant (6.5) (*)       6,09     5,01     7,18     5,88    12,48     8,49     9,18     8,26     8,73     6,35

(*) The variable x2 is the variable y(m-6). In the original table the “best” predictor is in bold and the “second best” predictor is underlined.


Tab. 7.5: Provisional estimate errors with a 20% coverage (average of 12 months and 100 random replications).

Predictor                                               Italians        Foreigners          Italians        Foreigners             Total
                                                       in hotels         in hotels       in “o.c.a.”       in “o.c.a.”
                                                 Simple Weighted   Simple Weighted   Simple Weighted   Simple Weighted   Simple Weighted

Percent errors (Levels) - MAPE
Ratio (3.2 with v=x)                               5,57     4,72     5,42     5,94     3,96     2,84     6,73     4,86     5,42     4,72
Regression (3.2 with v=1)                          7,68     6,26     8,29     9,40     4,95     3,67     9,23     6,56     7,54     6,68
Regression with constant (3.9)                     5,42     4,71     6,15     6,97     3,99     3,00     7,37     5,74     5,73     5,15
Balanced sub-sample (3.8)                          5,54     4,65     5,43     5,84     3,85     2,66     8,63     4,98     5,86     4,64
Separate regression (4.24 with 4.22)               6,18     5,14     4,48     4,47     4,80     3,37     9,38     5,79     6,21     4,69
Separate regression (4.24 with 4.23)               8,20     6,55     6,46     6,61     6,16     4,15    12,02     6,53     8,21     6,09
Modified Fuller (5.6)                              5,67     4,74     5,51     6,00     3,97     2,75     2,88     1,95     4,51     4,35
Ratio bivariate (6.2 with v=x1x2)                  3,52     3,33     4,72     4,41    10,02     7,59    10,69     7,66     7,24     5,02
Regression bivariate (6.2 with v=1)                7,57     6,21     6,23     6,63     4,92     3,75     7,92     5,80     6,66     5,79
Regression bivariate with constant (6.5)           5,37     4,71     4,83     5,08     4,13     3,19     6,76     5,38     5,27     4,60
Ratio bivariate (6.2 with v=x1x2) (*)              2,92     2,92     5,15     5,36    11,30     9,55    14,87    10,66     8,56     5,89
Regression bivariate (6.2 with v=1) (*)            7,83     6,37     7,03     7,59     4,62     3,19     8,09     5,77     6,89     6,01
Regression bivariate with constant (6.5) (*)       5,35     4,52     5,83     6,42     4,21     2,93     7,44     5,24     5,71     4,84

Absolute differences (Changes)
Ratio (3.2 with v=x)                               6,47     5,49     7,39     7,61     5,89     3,89    12,59    11,47     8,09     6,53
Regression (3.2 with v=1)                          8,41     6,82     9,38    10,16     6,82     4,44    14,33    12,88     9,73     8,07
Regression with constant (3.9)                     6,13     5,31     8,55     8,95     5,94     4,00    13,56    12,01     8,55     6,93
Balanced sub-sample (3.8)                          6,85     5,78     7,61     7,78     6,26     3,78    14,31    11,50     8,76     6,68
Separate regression (4.24 with 4.22)               6,72     5,56     6,47     6,23     7,06     4,45    13,80    10,76     8,51     6,20
Separate regression (4.24 with 4.23)               8,80     7,14     7,76     7,60     8,27     5,00    16,45    11,90    10,32     7,46
Modified Fuller (5.6)                              5,78     4,81     5,69     6,17     3,97     2,71    11,95    10,97     6,85     5,57
Ratio bivariate (6.2 with v=x1x2)                  5,47     4,65     7,00     6,21    10,82     8,67    15,84    13,65     9,78     7,02
Regression bivariate (6.2 with v=1)                8,41     6,93     7,26     7,38     6,83     4,54    13,21    12,05     8,93     7,24
Regression bivariate with constant (6.5)           6,40     5,51     7,01     6,93     6,17     4,12    12,80    11,61     8,10     6,41
Ratio bivariate (6.2 with v=x1x2) (*)              4,71     4,36     8,04     7,43    11,63    10,29    15,77    13,02    10,04     7,48
Regression bivariate (6.2 with v=1) (*)            9,05     7,15     8,41     8,85     6,48     4,15    13,06    10,83     9,25     7,51
Regression bivariate with constant (6.5) (*)       6,36     5,29     8,36     8,66     6,08     4,00    13,09    10,72     8,47     6,68

(*) The variable x2 is the variable y(m-6).


Tab. 7.6: Provisional estimate errors with a 50% coverage (average of 12 months and 100 random replications).

Predictor                                               Italians        Foreigners          Italians        Foreigners             Total
                                                       in hotels         in hotels       in “o.c.a.”       in “o.c.a.”
                                                 Simple Weighted   Simple Weighted   Simple Weighted   Simple Weighted   Simple Weighted

Percent errors (Levels) - MAPE
Ratio (3.2 with v=x)                               2,38     1,98     2,32     2,46     1,79     1,31     3,11     2,12     2,40     2,00
Regression (3.2 with v=1)                          2,89     2,33     2,64     2,74     2,14     1,57     3,73     2,69     2,85     2,34
Regression with constant (3.9)                     2,25     1,91     2,26     2,39     1,86     1,34     3,58     2,52     2,49     2,01
Balanced sub-sample (3.8)                          2,56     2,19     2,61     2,72     2,07     1,43     3,83     2,63     2,77     2,25
Separate regression (4.24 with 4.22)               2,66     2,22     2,20     2,20     2,17     1,41     4,19     2,56     2,80     2,10
Separate regression (4.24 with 4.23)               3,24     2,68     2,45     2,52     2,67     1,70     4,47     3,13     3,21     2,50
Modified Fuller (5.6)                              2,41     1,98     2,37     2,50     1,98     1,58     4,56     4,85     2,83     2,42
Ratio bivariate (6.2 with v=x1x2)                  1,80     1,56     2,74     2,37     6,33     4,34     6,87     4,72     4,44     2,74
Regression bivariate (6.2 with v=1)                2,92     2,41     2,35     2,40     2,03     1,47     3,81     2,72     2,78     2,26
Regression bivariate with constant (6.5)           2,35     2,01     2,10     2,12     1,74     1,23     3,42     2,33     2,41     1,93
Ratio bivariate (6.2 with v=x1x2) (*)              1,78     1,62     2,89     2,67     4,31     3,87     8,25     5,82     4,31     2,89
Regression bivariate (6.2 with v=1) (*)            2,68     2,16     2,69     2,80     1,98     1,89     3,65     2,89     2,75     2,38
Regression bivariate with constant (6.5) (*)       4,72     4,15     2,39     2,51     2,15     1,53     3,60     3,01     3,21     3,03

Absolute errors (Changes)
Ratio (3.2 with v=x)                               4,63     3,82     6,67     6,18     4,57     2,80     9,67     9,85     6,38     5,06
Regression (3.2 with v=1)                          4,61     3,77     6,31     5,92     5,31     2,98     9,08     9,68     6,33     4,97
Regression with constant (3.9)                     4,69     3,89     6,70     6,18     4,53     2,86    10,38     9,99     6,57     5,11
Balanced sub-sample (3.8)                          4,88     4,00     7,01     6,54     4,63     2,93    10,52     9,86     6,76     5,25
Separate regression (4.24 with 4.22)               4,73     3,93     6,49     5,96     4,92     2,89    10,46     9,78     6,65     5,04
Separate regression (4.24 with 4.23)               4,61     3,85     6,29     5,90     5,45     3,03    10,84    10,45     6,80     5,11
Modified Fuller (5.6)                              3,46     3,02     4,46     4,58     2,99     2,56     7,77     6,97     4,67     3,87
Ratio bivariate (6.2 with v=x1x2)                  4,79     3,94     6,30     5,34     6,81     5,43    64,82    19,07    20,68     6,55
Regression bivariate (6.2 with v=1)                4,70     3,88     6,33     5,91     5,20     2,96     9,25     9,98     6,37     5,05
Regression bivariate with constant (6.5)           4,72     3,91     6,66     6,07     4,51     2,81     9,60     9,67     6,37     5,03
Ratio bivariate (6.2 with v=x1x2) (*)              4,66     3,85     6,66     5,65     4,43     4,15    10,01    10,09     6,44     5,21
Regression bivariate (6.2 with v=1) (*)            4,55     3,71     6,42     6,01     4,77     2,60     9,63     9,85     6,34     4,92
Regression bivariate with constant (6.5) (*)       6,33     5,39     6,73     6,17     4,81     3,13     9,71     9,86     6,89     5,74

(*) The variable x2 is the variable y(m-6).


Tab. 7.7: Provisional estimate highest errors when using the 10 largest provinces (average of 12 months).

Predictor                                               Italians        Foreigners          Italians        Foreigners             Total
                                                       in hotels         in hotels       in “o.c.a.”       in “o.c.a.”

The highest percent error (Levels) on 100 random replications
                                                    20%      50%      20%      50%      20%      50%      20%      50% 20% (**) 50% (**)
Ratio (3.2 with v=x)                              10,20     4,06     8,52     4,34     7,55     2,05    13,00     5,21     9,82     3,91
Regression (3.2 with v=1)                         19,29     4,37    13,55     4,11     8,46     2,92    18,74     5,13    15,01     4,13
Regression with constant (3.9)                     8,08     4,12    12,48     4,31     6,86     3,18    16,05     5,26    10,87     4,22
Balanced sub-sample (3.8)                          9,85     3,75     9,17     4,41     8,13     3,15    16,17     5,14    10,83     4,12
Separate regression (4.24 with 4.22)               9,41     3,87     8,17     4,24     8,13     2,95    22,39     6,93    12,03     4,50
Separate regression (4.24 with 4.23)              19,46     5,77     9,33     4,91    11,16     3,84    32,61     7,76    18,14     5,57
Modified Fuller (5.6)                             10,25     4,10     8,40     4,35     6,60     2,54    11,60     9,21     9,21     5,05
Ratio bivariate (6.2 with v=x1x2)                  5,35     3,08     8,58     5,19    21,97     9,50    18,82    10,71    13,68     7,12
Regression bivariate (6.2 with v=1)               17,61     4,31     9,43     3,73     8,64     2,92    17,70     6,61    13,34     4,39
Regression bivariate with constant (6.5)           7,52     4,00     8,90     3,57     7,22     3,14    15,81     5,70     9,86     4,11
Ratio bivariate (6.2 with v=x1x2) (*)              6,34     3,36     6,67     6,15    23,07     4,78    33,13    10,25    17,30     6,13
Regression bivariate (6.2 with v=1) (*)           16,59     4,22     9,76     4,04     9,01     2,77     9,01     4,35    11,09     3,85
Regression bivariate with constant (6.5) (*)      13,19    11,45    10,42     4,34     8,26    10,30     8,26     4,44    10,03     7,63

Percent errors (Levels) using the 10 main provinces – MAPE
                                                 Simple Weighted   Simple Weighted   Simple Weighted   Simple Weighted   Simple Weighted
Ratio (3.2 with v=x)                               4,74     3,84     2,97     3,41     3,66     2,36     2,50     2,00     3,47     3,19
Regression (3.2 with v=1)                          4,56     3,21     2,80     2,48     5,44     4,22     5,29     3,22     4,52     3,20
Regression with constant (3.9)                     5,52     5,50     3,63     3,11     5,28     5,12     5,56     4,57     5,00     4,63
Balanced sub-sample (3.8)                          8,87     7,71     2,43     2,31     8,13     3,96    15,98    15,21     8,85     6,41
Separate regression (4.24 with 4.22)               5,10     4,44     2,39     2,57     5,19     3,71     4,91     4,18     4,40     3,74
Separate regression (4.24 with 4.23)               5,08     4,77     2,75     2,63     6,66     4,78     5,71     4,65     5,05     4,15
Modified Fuller (5.6)                              5,46     4,42     3,03     3,68     3,45     2,45     2,88     1,64     3,70     3,47
Ratio bivariate (6.2 with v=x1x2)                  4,52     4,34     2,53     2,30     5,54     3,55     3,93     3,07     4,13     3,45
Regression bivariate (6.2 with v=1)                5,83     4,00     2,86     2,56     5,95     4,48     5,32     2,79     4,99     3,53
Regression bivariate with constant (6.5)          15,41    15,36     9,59     9,62     7,86     7,36    14,46    13,68    11,83    11,96
Ratio bivariate (6.2 with v=x1x2) (*)              3,84     3,19     2,77     2,46     4,90     3,41     4,69     3,75     4,05     3,10
Regression bivariate (6.2 with v=1) (*)            4,42     3,11     2,97     2,74     4,71     3,91     4,56     3,06     4,17     3,16
Regression bivariate with constant (6.5) (*)      16,30    16,54    11,35    10,61     6,65     6,02    14,11    13,79    12,10    12,46

(*) The variable x2 is the variable y(m-6). (**) Weighted means.


Tab. 7.8: Empirical percent incidence of squared bias on the global MSE (average of 12 months and 100 random replications).

Predictor                         Italians        Foreigners          Italians        Foreigners             Total
                                 in hotels         in hotels       in “o.c.a.”       in “o.c.a.”
                              20%      50%      20%      50%      20%      50%      20%      50%  20% (*)  50% (*)

(3.2) with v=x               5,17    19,42     4,78    34,94    18,01     3,42     9,82     8,98     8,17    19,32
(3.2) with v=1               8,18    18,54     9,20    24,04    16,86    16,78     7,75     5,73    10,11    18,11
(3.9)                        5,30    19,04     6,32    34,86    17,24     3,06     8,98    17,89     8,39    20,21
(3.8)                        7,05    19,14     9,45    38,37    10,38     6,28     4,17     4,76     8,01    20,20
(4.24) with (4.22)           6,72    13,01     8,81    32,07    13,91     3,06    11,11     4,73     9,27    15,36
(4.24) with (4.23)          15,29    16,19    12,77    26,74    17,16    17,22     7,18     8,03    13,92    18,32
(5.6)                        6,81    15,00     6,30    23,87     8,03    23,88    18,51     9,67     8,39    18,56
(6.2) with v=x1x2           11,77    27,08    18,50    19,06    19,86    20,29    21,16    25,48    16,44    23,29
(6.2) with v=1               8,93    15,14    12,68    16,77    16,68    14,81     9,00     7,92    11,51    14,62
(6.5)                        5,24    18,59    11,41    30,28    17,59     1,75    11,01    12,34    10,13    17,78
(6.2) with v=x1x2 (**)      20,52    23,97    15,88    22,99    27,85    28,15    22,17    24,26    20,86    24,55
(6.2) with v=1 (**)         20,49    17,24    10,94    15,02    19,43    18,22    17,25    14,23    17,19    16,43
(6.5) (**)                  11,88    17,46    15,17    19,04    20,45     9,34    19,26    20,04    15,42    16,64

(*) All errors have been obtained using weighted means. (**) The variable x2 is the variable y(m-6).

Tab. 7.9: Some optimal strategies for provisional estimates (at most 2 estimators are used).

Domain                     Optimal strategy (*)                   Gain in precision vs ratio estimator 1) (*)
                           Estimator   Error on    Error on       Estimator   Error on    Error on
                                       levels      changes                    levels      changes

8 provinces
Italians in hotels              5        2,69        2,71              1        0,99        1,01
Foreigners in hotels            7        3,96        4,20              1        0,32        0,32
Italians in “o.c.a.”            7        3,77        3,72              1        0,64        0,18
Foreigners in “o.c.a.”          5        4,23        4,31              1        1,07        1,96
Total                                    3,45        3,53                       0,75        0,78

Coverage = 20%
Italians in hotels             11        2,92        4,36              1        1,80        1,13
Foreigners in hotels            7        6,00        6,17              1       -0,06        1,44
Italians in “o.c.a.”            7        2,75        2,71              1        0,09        1,18
Foreigners in “o.c.a.”          7        1,95       10,97              1        2,91        0,50
Total                                    3,63        5,38                       1,09        1,15

Coverage = 50%
Italians in hotels              8        1,56        3,94              1        0,42       -0,12
Foreigners in hotels           10        2,12        6,07              1        0,34        0,11
Italians in “o.c.a.”           10        1,23        2,81              1        0,08       -0,01
Foreigners in “o.c.a.”         10        2,33        9,67              1       -0,21        0,18
Total                                    1,75        5,04                       0,25        0,02

(*) Weighted average errors are used.


8. CONCLUDING REMARKS

The availability of historical data – playing the role of auxiliary information in a model based prediction context where the sample of quick respondents is given – led to an in-depth comparison among different prediction techniques, where the need for an ex-post correction for self-selection bias was strongly emphasized.

Even though the shortness of the time series did not allow for definitive conclusions on the robustness of results, some final remarks summarise the clearest evidence emerging from the application:

• The ratio predictor used in the current survey on tourist nights spent, even though quite efficient, can be improved upon by other estimators, especially when coverage is low. It could still be used when coverage is considerably higher (at least 50%).

• Non-random self-selection of quick respondents can be reduced using particular regression techniques such as those given by formulas (4.24) and (4.6). On the other hand, post-sampling balancing procedures do not produce significant bias reductions.

• The use of estimation techniques based on 2 auxiliary variables can be helpful only with a coverage equal to 50%, but efficiency gains with respect to the ratio estimator are low.

• A bias reduction can be achieved using estimators different from the ratio when coverage is 50%, mainly for nights spent in the other collective accommodations.

• Generally speaking, for the four domains taken into account (Italians in hotels, foreigners in hotels, Italians in other collective accommodations, foreigners in other collective accommodations) a mixture of different quick predictors could be used.

9. APPENDIX

9.1 APPENDIX 1: OPTIMAL PREDICTION UNDER MODEL GMR

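The GMR model can be defined as follows:

$$
\begin{cases}
y_i = \mathbf{x}_i' \boldsymbol{\beta} + \varepsilon_i \\
E(\varepsilon_i) = 0 \\
V(\varepsilon_i) = \sigma^2 v_i \\
C(\varepsilon_i, \varepsilon_j) = 0 \quad \text{if } i \neq j
\end{cases}
\tag{9.1}
$$

where $\mathbf{x}_i' = (x_{1i}, x_{2i}, \ldots, x_{ki})$ and $\boldsymbol{\beta}' = (\beta_1, \beta_2, \ldots, \beta_k)$. If $\mathbf{y}' = (y_1, y_2, \ldots, y_n)$, $\mathbf{X}$ is the (n×k) matrix containing the n row-vectors $\mathbf{x}_i'$ (the first column could contain n 1s) and $\boldsymbol{\Sigma}$ is the (n×n) diagonal matrix containing the general i-th term $v_i$, it will also follow that: $E(\mathbf{y}_S) = \mathbf{X}_S \boldsymbol{\beta}$, $E(\mathbf{y}_{\bar S}) = \mathbf{X}_{\bar S} \boldsymbol{\beta}$, $V(\mathbf{y}_S) = \sigma^2 \boldsymbol{\Sigma}_S$, $V(\mathbf{y}_{\bar S}) = \sigma^2 \boldsymbol{\Sigma}_{\bar S}$, $C(\mathbf{y}_S, \mathbf{y}_{\bar S}) = \mathbf{0}$, where $S$ and $\bar S$ refer, respectively, to the sample and non sample units. For any given sampling design, the BLU predictor of the unknown mean $\bar y$ is given by (Cassel, Särndal, Wretman, 1993, 127):

$$
T_{BLU} = \frac{n}{N} \bar y_S + \left(1 - \frac{n}{N}\right) \bar{\mathbf{x}}_{\bar S}' \hat{\boldsymbol{\beta}}_{BLU}
\tag{9.2}
$$

where $\bar{\mathbf{x}}_{\bar S}' = (\bar x_{\bar S 1}, \bar x_{\bar S 2}, \ldots, \bar x_{\bar S k})$ and $\hat{\boldsymbol{\beta}}' = (\hat\beta_1, \hat\beta_2, \ldots, \hat\beta_k)$. In particular, we will have:

$$
\hat{\boldsymbol{\beta}}_{BLU} = \left(\mathbf{X}_S' \boldsymbol{\Sigma}_S^{-1} \mathbf{X}_S\right)^{-1} \mathbf{X}_S' \boldsymbol{\Sigma}_S^{-1} \mathbf{y}_S
\tag{9.3}
$$

$$
E(T_{BLU} - \bar y)^2 = \frac{\sigma^2}{N^2} \left[ \sum_{\bar S} v_i + (N - n)^2 \, \bar{\mathbf{x}}_{\bar S}' \left(\mathbf{X}_S' \boldsymbol{\Sigma}_S^{-1} \mathbf{X}_S\right)^{-1} \bar{\mathbf{x}}_{\bar S} \right]
\tag{9.4}
$$

This statement can be obtained as follows: the desired predictor must be of the form $N^{-1}[n \bar y_S + (N - n) U]$, where $E(U) = \bar{\mathbf{x}}_{\bar S}' \boldsymbol{\beta}$. The (model) unbiased estimator $\hat{\boldsymbol{\beta}}_{BLU}$ of $\boldsymbol{\beta}$ is the well known generalised least squares estimator (Johnston, 1972, 233). The expression (9.3) follows easily.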
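Formulas (9.2) and (9.3) can be illustrated with a short numerical sketch. It is written in plain Python, with a small hypothetical helper `solve` for the linear system, and the data are invented for illustration only. With a single auxiliary variable and v = x, the generalised least squares coefficient reduces to the ratio between the sample totals of y and x, as expected for the ratio predictor.

```python
def solve(A, b):
    """Solve the linear system A z = b by Gauss-Jordan elimination."""
    k = len(b)
    M = [[float(a) for a in A[i]] + [float(b[i])] for i in range(k)]
    for col in range(k):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(k):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][k] / M[i][i] for i in range(k)]


def blu_predictor(y_s, X_s, v_s, x_bar_ns, N):
    """BLU predictor (9.2) of the population mean and GLS coefficients (9.3).

    y_s: sample values; X_s: sample rows x_i'; v_s: variance factors v_i;
    x_bar_ns: means of the auxiliary variables over non-sample units.
    """
    n, k = len(y_s), len(X_s[0])
    # X_S' Sigma_S^{-1} X_S and X_S' Sigma_S^{-1} y_S, with Sigma_S = diag(v_i).
    A = [[sum(X_s[i][a] * X_s[i][b] / v_s[i] for i in range(n))
          for b in range(k)] for a in range(k)]
    c = [sum(X_s[i][a] * y_s[i] / v_s[i] for i in range(n)) for a in range(k)]
    beta = solve(A, c)
    y_bar_s = sum(y_s) / n
    predicted_mean_ns = sum(x * b for x, b in zip(x_bar_ns, beta))
    t_blu = (n / N) * y_bar_s + (1 - n / N) * predicted_mean_ns
    return t_blu, beta


# Hypothetical example: 3 quick respondents out of N = 5 units, v = x.
t_blu, beta = blu_predictor(
    y_s=[3.0, 5.0, 10.0],
    X_s=[[2.0], [4.0], [6.0]],
    v_s=[2.0, 4.0, 6.0],
    x_bar_ns=[5.0],
    N=5,
)
print(t_blu, beta)
```

Passing a column of 1s as first regressor and v = 1 gives the ordinary regression predictor with constant.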

9.2 APPENDIX 2: OPTIMAL PREDICTION UNDER MODEL (4.1) (SELF-SELECTION BIAS)

The unbiasedness condition $E(T_{(PL)} - y_U) = 0$ under model (4.1) leads to:

$$
\beta_P \left( \sum_{S_P} c_{Pi} x_{Pi} - x_{\bar S_P} \right) + \beta_L \left( \sum_{S_L} c_{Li} x_{Li} - x_{\bar S_L} \right) = 0
\tag{9.5}
$$

where $x_{\bar S_P} = \sum_{\bar S_P} x_{Pi}$ and $x_{\bar S_L} = \sum_{\bar S_L} x_{Li}$. A general condition that guarantees (9.5) is:

$$
\sum_{S_P} c_{Pi} x_{Pi} = x_{\bar S_P}, \qquad \sum_{S_L} c_{Li} x_{Li} = x_{\bar S_L}.
\tag{9.6}
$$

Following Cicchitelli, Herzel and Montanari (1992, 394) and on the basis of (4.3) we can also write:

$$
MSE(T_{(PL)}) = E(T_{(PL)} - y_U)^2 = VAR\left( \sum_{S_P} c_{Pi} y_{Pi} + \sum_{S_L} c_{Li} y_{Li} \right) + VAR(y_{\bar S})
\tag{9.7}
$$

where:

$$
VAR\left( \sum_{S_P} c_{Pi} y_{Pi} + \sum_{S_L} c_{Li} y_{Li} \right) = \sigma_P^2 \sum_{S_P} c_{Pi}^2 v_{Pi} + \sigma_L^2 \sum_{S_L} c_{Li}^2 v_{Li}
\quad \text{and} \quad
VAR(y_{\bar S}) = \sigma_O^2 \sum_{\bar S_O} v_{Oi} + \sigma_L^2 \sum_{\bar S_L} v_{Li}.
\tag{9.8}
$$

Minimisation of (9.8) under the constraints given by (9.6) leads straightforwardly to (4.4), from which we can also derive (4.5). If we write (4.3) as $T_{(PL)} = y_S + \hat y_{\bar S (PL)} = (y_{S_P} + \hat y_{\bar S_P}) + (y_{S_L} + \hat y_{\bar S_L})$, we can easily verify that the optimal predictor (4.4) can also be written as $T^*_{(PL)} = (y_{S_P} + x_{\bar S_P} \hat\beta_P) + (y_{S_L} + x_{\bar S_L} \hat\beta_L)$, where $x_{\bar S_P} \hat\beta_P$ is the optimal linear predictor of $y_{\bar S_P} = (y_P - y_{S_P})$ and $x_{\bar S_L} \hat\beta_L$ is the optimal linear predictor of $y_{\bar S_L} = (y_L - y_{S_L})$.
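The constrained minimisation described above can be checked numerically for one group at a time. The sketch below is only an illustration under stated assumptions: it uses the standard Lagrange multiplier solution for minimising the sum of c_i² v_i subject to the sum of c_i x_i being equal to the non-sample total of x, and it does not claim to reproduce formula (4.4) exactly. Function names and data are hypothetical.

```python
def optimal_coefficients(x_s, v_s, x_nonsample_total):
    """Coefficients c_i minimising sum(c_i**2 * v_i) subject to
    sum(c_i * x_i) == x_nonsample_total (Lagrange multiplier solution)."""
    denom = sum(x * x / v for x, v in zip(x_s, v_s))
    return [x_nonsample_total * (x / v) / denom for x, v in zip(x_s, v_s)]


def gls_slope(x_s, y_s, v_s):
    """Generalised least squares slope of y on x without intercept."""
    num = sum(x * y / v for x, y, v in zip(x_s, y_s, v_s))
    denom = sum(x * x / v for x, v in zip(x_s, v_s))
    return num / denom


# Hypothetical data for one group of quick respondents (say group P).
x_s = [2.0, 4.0, 6.0]    # auxiliary variable on the sampled units
y_s = [3.0, 5.0, 10.0]   # target variable on the sampled units
v_s = [2.0, 4.0, 6.0]    # assumption: variance proportional to x
x_ns = 10.0              # total of x over the non-sampled units of the group

c = optimal_coefficients(x_s, v_s, x_ns)
constraint = sum(ci * xi for ci, xi in zip(c, x_s))
predicted = sum(ci * yi for ci, yi in zip(c, y_s))

print(constraint)                             # equals x_ns, as required by (9.6)
print(predicted, x_ns * gls_slope(x_s, y_s, v_s))
```

The weighted sum of the sampled y values coincides with the non-sample total of x multiplied by the group's least squares slope, which is the form in which the optimal predictor is rewritten at the end of the appendix. Repeating the computation for group L and adding the observed totals gives the overall predictor.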

REFERENCES

AELEN F. (2003), “Improving Timeliness of Industrial Short-term Statistics using Time Series Analysis”, Statistics Netherlands Working Paper, available on www.oecd.org/dataoecd/23/12/30044343.pdf.

BATTAGLIA F., FENGA L. (2003), “Forecasting Composite Indicators with Anticipated Information: an Application to the Industrial Production Index”, Statistica Applicata, 52, 3.

BOLFARINE H., ZACKS S. (1992), Prediction Theory for Finite Populations, Springer-Verlag.

CAPPUCCIO N., ORSI R. (1991), Econometria, Il Mulino, Bologna.

CASSEL C., SÄRNDAL C.E., WRETMAN J. (1977), Foundations of Inference in Survey Sampling, J. Wiley & Sons, New York.

CASSEL C., SÄRNDAL C.E., WRETMAN J. (1983), “Some Uses of Statistical Models in Connection with the Nonresponse Problem”, in: Madow W.G., Olkin I., Rubin D. (eds.), Incomplete Data in Sample Surveys, vol. 3, 143-160, Academic Press, New York.

CICCHITELLI G., HERZEL A., MONTANARI G.E. (1992), Il campionamento statistico, Il Mulino, Bologna.

COCHRAN W.G. (1977), Sampling Techniques, J. Wiley & Sons, New York.

COSTA P., MANENTE M. (2000), Economia del turismo, Touring Club Italiano, Milano.

COUNCIL OF THE EUROPEAN UNION (1995), Council Directive 95/57/CE on the Collection of Statistical Information in the Field of Tourism, 23rd November, Bruxelles.

DALABEHERA M., SAHOO L.N. (1999), “A New Estimator with Two Auxiliary Variables for Stratified Sampling”, Statistica, anno LIX, 1, 101-107, Clueb, Bologna.


DEVILLE J.C., TILLÉ Y. (2004), “Efficient Balanced Sampling: the Cube Method”, Biometrika, 91,4, 893-912.

DIVISEKERA S. (2003), “A Model of Demand for International Tourism”, Annals of TourismResearch, 30, 31-49.

DRUDI I., FERRANTE M.R. (2003), “Stima da fonti amministrative longitudinali con parzialesovrapposizione delle unità”, in: Falorsi P.D., Pallara A., Russo A. (eds.), Temi di ricerca edesperienze sull’utilizzo a fini statistici di dati di fonte amministrativa, 115-132, Franco Angeli,Milano.

DRUDI I., FILIPPUCCI C. (2000), “Inferenza da campioni longitudinali affetti da selezione noncasuale”, in: Filippucci C. (ed.), Tecnologie informatiche e fonti amministrative nella produ-zione di dati, 415-432, Franco Angeli, Milano.

EUROSTAT (2000), Short-term Statistics Manual, Eurostat, Luxembourg.

FALORSI P., ALLEVA G., BACCHINI F., IANNACCONE R. (2005), “Estimates Based onPreliminary Data from a Specific Subsample and from Respondents not Included in theSubsample”, Statistical methods and applications, 14, 1, 83-99, Physica-Verlag.

FULLER W. (1990), Analysis of Repeated Surveys, Survey Methodology, 16,167-180.

GISMONDI R. (2002), “Model Based Sample Selection using Balanced Sampling”, Rivista diStatistica Ufficiale, 3, 81-109, Franco Angeli, Milano.

GISMONDI R. (2003), “Optimal Provisional Estimation of Monthly Retail Trade Data”, Proceedingsof the Annual Meeting of the Statistical Society of Canada – Survey Methods Session. June 8-11, Halifax, Nova Scotia, Canada.

GISMONDI R. (2007), “Quick Estimation of Tourist Nights Spent in Italy”, Statistical Methods andApplications, on-line publication: http://dx.doi.org/10.1007/s10260-006-0035-3, Springer &Verlag.

GISMONDI R., MIRTO A.P.M., SALAMONE N. (2003), “Una stima “rapida” delle presenze turistiche in Italia: un approccio multivariata”, Proceedings of the conference CLADAG 2003, 185-188.

HARVEY A.C. (1984), “A Unified View of Statistical Forecasting Procedures”, Journal of Forecasting, 3, 245-275.

HERNÁNDEZ-LÓPEZ M. (2004), “Future Tourists’ Characteristics and Decisions: the Use of Genetic Algorithms as a Forecasting Method”, Tourism Economics, vol. 10, 3, 245-262.

ISTAT (2005), Rapporto sulle stime anticipate, report finale del progetto “Stime anticipate per le indagini congiunturali sulle imprese” (a cura di Falorsi S. e Gismondi R.), Istat, Roma.

ISTAT (Anni vari), Statistiche del turismo, Collana Informazioni Istat, Roma.

JOHNSTON J. (1983), Econometria, Franco Angeli, Milano.

LIM C., MCALEER M. (2001), “Forecasting Tourist Arrivals”, Annals of Tourism Research, vol. 28, 4, 965-977.

MARAVALLE M., POLITI M., IAFOLLA P. (1993), “Scelta di indicatori per la stima rapida di un indice provvisorio della produzione industriale”, Quaderni di ricerca Istat, 6.

MONTANARI G.E. (1987), “Post-sampling Efficient QR-Prediction in Large Sample Surveys”, International Statistical Review, 55, 191-202.

PERRI P.F. (2005), “Improved Ratio-cum-product Type Estimators in Simple Random Sampling using Two Auxiliary Variables”, Atti del convegno Cladag 2005, 473-476, MUP editore, Parma.

RAO J.N.K., SRINATH K.P., QUENNEVILLE B. (1989), “Estimation of Level and Change Using Current Preliminary Data”, in: Kasprzyk D., Duncan G., Kalton G., Singh M.P. (eds.), Panel Surveys, 457-485, J. Wiley & Sons, New York.

RIZZO L., KALTON G., BRICK M.J. (1996), “A Comparison of some Weighting Adjustment Methods for Panel Non-response”, Survey Methodology, 22, 1, 43-53.

ROYALL R.M. (1988), “The Prediction Approach to Sampling Theory”, Handbook of Statistics, vol. 6, North Holland.

ROYALL R.M. (1992), “Robustness and Optimal Design Under Prediction Models for Finite Populations”, Survey Methodology, 18, 179-185.

SÄRNDAL C.E., SWENSSON B., WRETMAN J. (1993), Model Assisted Survey Sampling, Springer Verlag.

SINGH M.P. (1965), “On the Estimation of Ratio and Product of the Population Parameters”, Sankhya, B, 27, 321-328.

SONG H., WITT S.F. (2000), Tourism Demand Modelling and Forecasting: Modern Econometric Approaches, Pergamon, Oxford.

TAM S.M. (1987), “Analysis of Repeated Surveys Using a Dynamic Linear Model”, International Statistical Review, 55, 1, 63-73.

ULLBERG A. (2003), “More Rapid Retail Trade Statistics in Sweden”, Statistics Sweden Working Paper, available on www.oecd.org/dataoecd/2/62/2956932.pdf.

VALLIANT R., DORFMAN A.H., ROYALL R.M. (2000), Finite Population Sampling and Inference – A Prediction Approach, J. Wiley & Sons, New York.

WHITE H. (1980), “A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity”, Econometrica, 48, 817-838.

YANSANEH I.S., FULLER W.A. (1998), “Optimal Recursive Estimation for Repeated Surveys”, Survey Methodology, vol. 24, 1, 31-40.

ADVANCE ESTIMATES OF DOMESTIC TOURISM IN ITALY USING AUXILIARY VARIABLES

Summary

ISTAT currently releases final data on arrivals and nights spent in Italian tourist establishments 180 days after the end of the reference month, although most users require preliminary estimates within 30-60 days. In this paper, we propose and compare some estimators that can be used to compute and release advance estimates, highlighting the problem of the self-selection bias that may characterise units that tend to respond early. A simulation referring to the period 2002-2004 was carried out, based on random replications of possible subsets of early respondents and on the actual availability of early responses for some Italian provinces.