Research Article
Received: 28 December 2007, Accepted: 28 September 2009, Published online in Wiley Online Library: 9 December 2009
A robust algorithm for the sequential linear analysis of environmental radiological data with imprecise observations

Carlos Rivero^a and Teofilo Valdes^b,*

(wileyonlinelibrary.com) DOI: 10.1002/env.1034

In this paper we present an algorithm suitable for analysing linear models under the following robust conditions: the data are not received in batch but sequentially; the dependent variables may be either non-grouped or grouped, that is, imprecisely observed; the distribution of the errors may be general, thus not necessarily normal; and the variance of the errors is unknown. As a consequence of the sequential data reception, the algorithm focuses on updating the current estimation and inference of the model parameters (slopes and error variance) as soon as a new datum is received. The update of the current estimate is simple and has scanty computational requirements. The same holds for the inference processes, which are based on asymptotics. The algorithm, unlike its natural competitors, has some memory; therefore, the storage of the complete up-to-date data set is not needed. This fact is essential in terms of computational complexity, reducing both the computing time and the storage requirements of our algorithm compared with other alternatives. Copyright © 2009 John Wiley & Sons, Ltd.

Keywords: algorithmic estimation; stochastic approximation; linear model under robust conditions; conditional imputation techniques; asymptotics and simulation studies

AMS subject classifications: 62F10; 62F15
1. INTRODUCTION
This paper focuses on the statistical analysis of the linear model
y_i = x_i'β + σε_i     (1)

where β is the slope vector parameter, x_i and y_i are, respectively, the independent variable vector of order m and the dependent variable of the individual i (i = 1,…,n), the ε_i's denote the standardized random errors and σ > 0 is a scale parameter. The following robust conditions will be
assumed in the sequel:
(a) The data are received sequentially instead of in batch. This usually occurs in the context of on-line transactions, continuous sampling or quality control, among others. Under these circumstances, it is desirable to advance partial analyses and to update them as soon as a new observation (or a group of them) is received. The great majority of the usual statistical procedures are completely memory-less with respect to the sequential data reception. This implies that, when a new observation is received, we must throw away the former estimations and inferences and all of the computations need to be repeated from the complete up-to-date data set; therefore, a certain degree of wastefulness remains. For its part, our procedure has some memory in the sense that it does not need the complete storage of the individual observations, but only a small part of them, and the update of the current estimates and inferences is computationally done in a rather effective way.
(b) Each dependent observation y_i may be either ungrouped (with probability p_0 > 0) or grouped (with probability p_1 = 1 − p_0 > 0) with different classification intervals. This situation is typical of laterally censored data; however, interval-censored data (that is, grouped data) appear, as will be seen, in many situations related to the precision of the measuring apparatus. For simplicity of notation, we will assume that there exists a unique set of known classification intervals given by their extremes

−∞ = c_0 < c_1 < … < c_r = ∞.     (2)
* Correspondence to: T. Valdes, Departamento de Estadística e I.O. I, Facultad de Matemáticas, Universidad Complutense de Madrid, 28040 Madrid, Spain.
a C. Rivero, Departamento de Estadística e I.O. II, Facultad de Ciencias Económicas y Empresariales, Universidad Complutense de Madrid, 28223 Pozuelo de Alarcón, Spain
b T. Valdes, Departamento de Estadística e I.O. I, Facultad de Matemáticas, Universidad Complutense de Madrid, 28040 Madrid, Spain

Environmetrics 2011; 22: 132–151. Copyright © 2009 John Wiley & Sons, Ltd.
When a grouped observation is within the interval (c_{h−1}, c_h], its value is lost and only this interval is known. In spite of this simplification, the proposed algorithm is also capable of handling the following grouping mechanisms: (a) the set of classification intervals could, as was said, vary from one grouped observation to another, and (b) it could also be possible that the value y_i is only lost if it falls within some known subset of the intervals (c_{h−1}, c_h]. Thus, some common cases of incomplete data are within the scope of this paper. Missing data, for example, is a particular case of grouped data for which there exists a unique classification interval equal to (−∞, ∞). Also, right (or left) censored data can be visualized as a grouping process with two classification intervals, (−∞, c] and (c, ∞), in which each observation is only lost if it falls within one of them.
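The grouping mechanism just described can be sketched in code: each response is either observed exactly (with probability p_0) or seen only through the index h of the classification interval (c_{h−1}, c_h] containing it. A minimal illustration, assuming the cut points used later in the simulation study of Section 5; the function name and return layout are ours, not the paper's.

```python
import random

# Illustrative cut points (those of Section 5); c_0 = -inf, c_r = +inf as in (2).
CUTS = [float("-inf"), -7.0, -2.0, 3.0, 6.0, float("inf")]

def observe(y, p0=0.4, rng=random):
    """Return ('exact', y) with probability p0, else ('interval', h)
    where h indexes the interval (c_{h-1}, c_h] that contains y."""
    if rng.random() < p0:
        return ("exact", y)
    for h in range(1, len(CUTS)):
        if CUTS[h - 1] < y <= CUTS[h]:
            return ("interval", h)
    raise ValueError("cut points do not cover the real line")
```

Missing data and one-sided censoring are recovered as special cases by choosing the cut points accordingly.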
(c) The distribution of the error components σε_i may be general, with the sole restriction of being within the general class of the strongly unimodal distributions (see An (1998)) centred on zero (either symmetrical or non-symmetrical). Let f > 0 denote the density function of the standardized errors.
(d) The scale parameter σ > 0 is unknown and needs to be estimated jointly with the vector slope parameter β.

Under these conditions it is clear that (1) the existence of grouped data makes the OLS estimation and inference inapplicable even if the errors are assumed to be normally distributed, and (2) the non-normality of the errors reinforces the non-applicability of the OLS estimation and inference mentioned above. For its part, the algorithm proposed here is operative and easy to implement computationally under the robust conditions mentioned above. The maximum likelihood methods (implemented either by the EM algorithm or by direct optimization) are the natural alternative to our procedure. However, these methods have a much greater computational complexity with sequential data in the memory-less sense mentioned above, which is critical with massive data sets. For example, if the maximum likelihood estimation and inference is implemented through the EM algorithm, this, unlike ours, not only demands the complete storage of the individual data and the re-execution of its loops, but additionally has the disadvantage, compared with our algorithm, of the awkward computations included in each loop. These affect both the quadrature and maximization processes (involved, respectively, in the E and M steps of the EM) by which β and σ are updated. For its part, the direct maximum likelihood estimation not only has no memory with respect to the sequential data reception but, for a fixed sample size, it also needs to tackle the numerical maximization of the integrated likelihood function of the incomplete data described, which our algorithm avoids. Finally, it is important to highlight that the maximum likelihood β-inferences are based on the asymptotic distribution of the ML estimates as the sample size tends to infinity. This is typically normal, centred on the true value of β and with a covariance matrix which needs to be estimated. The same occurs with the proposed algorithm, although ours allows us to estimate the asymptotic covariance matrix of the slope parameter estimates more easily than when we employ either the direct maximum likelihood method or the EM algorithm (see Louis (1982) and Meng and Rubin (1991) in this respect).
The paper is organized as follows. Section 2 presents a motivating real-life environmental case study with which we sum up the potentialities of the proposed algorithm. This has a direct antecedent on which Section 3 is focussed. Section 4 includes the rationale, the final loops and the convergence properties of the proposed algorithm. In Section 5 we present several exhaustive simulation studies, the intention being to analyse the performance and sensitivity of the algorithm estimates. The sequential analysis of the radiological measurements of the aforementioned case study will be addressed in Section 6. Also in this section the proposed algorithm is compared to the two natural maximum likelihood alternatives: the EM algorithm and the direct optimization of the likelihood function. In this respect some technicalities have been annexed at the end of the paper for more interested readers. Finally, the comments and remarks of Section 7 will bring the paper to a close.
2. A MOTIVATING REAL ENVIRONMENTAL CASE STUDY
In the following we show a real-life case study in which the potentiality and effectiveness of the algorithm become evident. The data presented in Table 1 originally motivated our study and were provided by the Nuclear Security Council of Spain, which has funded a research contract tender to mitigate biases in the estimated levels of low environmental radiological contamination. Cell values of Table 1 are composed of two elements. On one hand, the first cell elements show several radiological gamma emissions (in units of Bq/kg) recorded from samples of vegetables taken from sites around the Spanish nuclear power stations of Almaraz (A) and Trillo (T) at distances of 100, 600 and 1000 m. In compliance with the law, these samples are periodically taken and sent to different laboratories, each working with different exposure periods and with apparatus from several manufacturers, which may be calibrated differently. As a consequence, some of the recorded levels of radioactivity have, on the one hand, been submitted to upper and lower limits of detection and, on the other hand, some rough measures have been registered as interval censored (their extreme values written in square brackets). The signs + and − indicate, respectively, that their corresponding measurements are above or below the upper or lower limits of detection, which are written to the left of the sign. The laboratories send their measurements on-line and, since the sampling is continuous, they are sequentially received. The second cell elements are shown, in circular brackets, to the right of the first, and represent the order of reception of their corresponding radiological measurement. Thus, these second elements vary from 1 to 48, as this latter number agrees with the total number of radiological measurements received throughout the time period under study. In these circumstances, we focus, for simplicity, on the two following aspects:
(a) To determine, first, the extent to which the distance affects the gamma emissions through the simple covariance linear relation

y_i = μ_A u_i + μ_T w_i + γ z_i + σε_i,     (3)

where y_i represents the emission, z_i the distance to the central station, u_i is the dummy variable I(y_i belongs to station A) and, similarly, w_i = 1 − u_i = I(y_i belongs to station T) and, finally, (μ_A, μ_T, γ) is the parameter vector. With double sub-indices this model is
Table 1. Radiological measurements (with their order of reception in circular brackets)

Central power station | Distance 100 m | Distance 600 m | Distance 1000 m
A | 110 (1)        | 97 (4)       | 38 (5)
  | 108 (3)        | [92,98] (9)  | 22− (12)
  | [109,115] (13) | 88 (11)      | [32,37] (16)
  | 120+ (20)      | 87 (28)      | 43 (24)
  | 101 (21)       | 90 (33)      | 39 (25)
  | 102 (22)       | 94 (34)      | 20− (39)
  | 115+ (32)      | [93,96] (42) | 17− (40)
  | 102 (38)       | 90 (44)      | 29 (48)
T | 103 (2)        | 91 (6)       | 27 (8)
  | [95,99] (10)   | 79 (7)       | [21,26] (15)
  | 115+ (14)      | 85 (19)      | 20− (18)
  | 98 (17)        | [79,87] (23) | 33 (26)
  | 110+ (30)      | 84 (29)      | 17− (27)
  | 99 (31)        | [83,88] (41) | [19,23] (35)
  | [92,100] (37)  | 81 (45)      | [21,25] (36)
  | 89 (43)        | [82,88] (47) | 21 (46)
equivalent to

y_ij = μ_j + γ z_i + σε_i,

where i = 1,…,24, j = A, T and the rest of the elements are clearly identifiable from Equation (3). The common slope γ and the different intercepts μ_A and μ_T are plausible given that both central stations have similar characteristics and are submitted to similar security controls, although their operating periods differ. Secondly, to test the hypothesis μ_A = μ_T.
(b) To appraise whether or not the hypothesis that the measurements of the two central stations are similar is statistically plausible. With this intention we pose a 3 × 2 factorial experiment to determine the effects that the three distances (D) and the two power stations (P) have on the environmental radiological measurements. We will analyse the complete model with interactions

y_ijh = η + D_i + P_j + DP_ij + σε_ijh     (4)

(i = 100, 600, 1000 and j = A, T), with the usual constraints Σ_i D_i = 0, Σ_j P_j = 0, Σ_i DP_ij = 0 and Σ_j DP_ij = 0, where y_ijh denote the radiological gamma emissions and ε_ijh are the error terms.
From a general linear model perspective, the errors ε_i of Equation (3) and ε_ijh of Equation (4) are usually assumed to follow a normal distribution. However, analysts of environmental radiological measurements prefer to assume that the error distribution is double exponential, due to its exponential decay. Additionally, as was said, some values y_i or y_ijh are unknown, since they were registered as interval censored.

Let us observe that, even assuming that the error distribution is normal and that the data were received in batch, the parameters of models (3) and (4) cannot be inferred by ordinary least squares from the data of Table 1 unless we assign particular values to all the interval-censored data. However, the greater the length of the censoring interval of a grouped observation, the more unclear assigning a value to it becomes; thus, any assigned value may be questioned and may also have a determinant influence on the results. Although this influence becomes evident when the censoring interval is unbounded, contradictory upshots may be derived from assigning different values to the grouped observations with finite censoring intervals. The following cases show how contradictory statistical inferences can be obtained after assigning different values to the censored data of Table 1. For simplicity's sake, attention will be focussed on testing the hypotheses H_0: μ_A = μ_T (in model (3)) and H_0′: P_A = P_T = 0 (in model (4)) against their opposite alternatives, at the 5 per cent α-level.
Case 1: H_0 and H_0′ are accepted and rejected, respectively, when each grouped observation is given a value equal to the finite extreme of its censoring interval, if this is unbounded; otherwise, the value is equal to the lower or upper extreme of its censoring interval, depending on whether the observation is from station A or T, respectively.

Case 2: Both H_0 and H_0′ result in being rejected if we assign the values mentioned above to the infinite interval-censored data, and the assignations given to the finite interval grouped data of stations A and T are the other way round compared with Case 1, that is, the upper/lower extreme of the grouping intervals to the data of station A/T.

Case 3: Pivoting again from the assignation rules of Case 1, we have given the values 12, 10 and 7 to the grouped data 22−, 22− and 17− of station A, and the values 130 and 125 to the grouped observations 115+ and 110+ of station T. Now we find that μ_A = μ_T and P_A = P_T = 0 are both accepted.
Case 4: Finally, as an extreme case of this sensitivity analysis, it can be shown that a single assignation to an infinite interval-censored datum may completely modify the results. If we maintain the assignations of Case 1 with the sole exception of giving the value 200 to the grouped datum 115+ of station T at distance 100, the hypotheses H_0 and H_0′ result in being rejected and accepted, respectively.
Aside from the assumption that the data were recorded in batch, it is important to highlight that the former standard statistical analyses were also made assuming that the errors follow a normal distribution; otherwise, it is well known that the t–F statistics are not applicable. Thus, if we assume, for instance, that the errors follow a Laplace distribution, as usually happens with radiological measures, a new problem arises, which adds to the statistical inconclusiveness mentioned above that derives from the existence of grouped data. Finally, let the sequential data reception come into play. If, as said, we wish to put forward estimations and inferences after receiving a certain batch of data and update them as a new observation (or group of them) is received, then new conflictive elements appear in the sequential statistical analyses and their consistency. This joint situation is tackled in Section 6, where the data of Table 1 are analysed in detail, once our algorithm is stated. At this moment it is important to highlight that, for a fixed sample size, the asymptotic covariance matrix of the slope parameter estimate is easy to estimate with our algorithm, whereas its computation using ML techniques (either directly or through the EM algorithm) is far more complicated. Although this point will be clarified in Section 6, let us advance that in the first case only first derivatives are involved, while with the ML techniques the second derivatives that form part of the Hessian of the log-likelihood do not admit an explicit expression and need to be numerically evaluated. These comments sum up the potentialities of the algorithm proposed in this paper, which has a direct antecedent in Rivero and Valdes (2004), as will be explained in the next section.
3. REMOTE AND DIRECT ORIGINS OF OUR ALGORITHM
The remote precedents of the proposed algorithm for treating the type of data and models described above can be found, in chronological order, in (1) the procedure given in Healy and Westmacott (1956), (2) the missing information principle of Orchard and Woodbury (1972) and (3) the EM algorithm (Dempster et al., 1977) when the error distribution of the linear model is normal. More recent bibliographical antecedents are James and Smith (1984), Ritov (1990) and Anido et al. (2000). However, the direct precursor must undoubtedly be sought in Rivero and Valdes (2004) (Section 3, p. 471), where the authors suggest an estimating algorithm useful when the data are sequentially received and the scale parameter σ of model (1) is assumed to be known. This algorithm only iterates as the sample size, n, increases. It starts from an arbitrary size, say n_0, and an associated guess of β (let us denote this by β^{n_0}; in the absence of any other rational criterion, we suggest equating it to the OLS estimate of β based on the ungrouped data of the initial sample). Strictly speaking, the algorithm is formalized as follows:
Basic algorithm assuming that the scale parameter is known.

Initialization: Let n_0 and β^{n_0} be the initial algorithm values (mentioned above).

Iteration: Assuming that β^{n−1} is given, the iteration process runs through the following steps once the nth new datum is recorded:

(1) Conditional mean imputation step: For i = 1,…,n, compute y_i(β^{n−1}) = y_i, if y_i is an ungrouped observation; otherwise,
y_i(β^{n−1}) = x_i'β^{n−1} + E(σε_i | −x_i'β^{n−1} + c_{h−1} < σε_i ≤ −x_i'β^{n−1} + c_h), if y_i is within the grouping interval (c_{h−1}, c_h].
Then define y^n(β^{n−1}) = (y_1(β^{n−1}),…, y_n(β^{n−1}))'.

(2) Updating step (OLS projections): β^n = (X_n'X_n)^{−1} X_n' y^n(β^{n−1}).

(3) β^{n−1} ← β^n, and return to Step 1.

In this algorithm, E(σε_i | −x_i'β^n + c_{h−1} < σε_i ≤ −x_i'β^n + c_h) denotes the conditional expectation of the error term σε_i given that its corresponding grouped datum y_i is within the classification interval (c_{h−1}, c_h] and, as usual, X_n'X_n = Σ_{i=1}^n x_i x_i'. The point β^n = β^n(σ) defines the estimate of the slope parameter of the linear model (1) which corresponds to the sample size n.
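As an illustration of the two steps above, the following sketch implements one loop of the basic algorithm under the additional assumption of standard normal errors, for which the conditional mean of a truncated error has a closed form. The data layout and helper names are ours, not the paper's.

```python
import math
import numpy as np

# One loop of the basic algorithm (conditional mean imputation + OLS
# projection), assuming standard normal errors so the truncated mean is
# available in closed form.

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def trunc_mean(lo, hi):
    """E(eps | lo < eps <= hi) for eps ~ N(0, 1)."""
    return (phi(lo) - phi(hi)) / (Phi(hi) - Phi(lo))

def basic_update(X, obs, beta, sigma, cuts):
    """Steps (1)-(2): impute grouped responses at beta, then OLS-project.
    obs[i] is ('exact', y_i) or ('interval', h) with y_i in (c_{h-1}, c_h]."""
    y_imp = np.empty(len(obs))
    for i, (kind, val) in enumerate(obs):
        xb = float(X[i] @ beta)
        if kind == "exact":
            y_imp[i] = val
        else:
            lo = (cuts[val - 1] - xb) / sigma
            hi = (cuts[val] - xb) / sigma
            y_imp[i] = xb + sigma * trunc_mean(lo, hi)
    return np.linalg.solve(X.T @ X, X.T @ y_imp)
```

With no grouped observations the loop reduces, as it should, to a single ordinary least squares fit.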
Remark 1: Although the mean imputation step resembles the expectation step of the EM algorithm, the basic algorithm and the EM are not comparable, since the EM iterates under the assumption that the sample size is fixed. This means, as was said, that if the data are sequentially recorded, the EM algorithm (as well as the direct numerical optimization of the likelihood function) has to be rerun when new data are received. In this sense, the iterations of either the EM algorithm or the direct ML procedures are, in fact, nested in the sample-size loops. This point will be addressed in detail later on.

Remark 2: Equally, in spite of the incomparability between the basic algorithm and the EM, the updating step of the basic algorithm looks like the maximization step of the EM if the error terms are normally distributed. Otherwise, this resemblance completely disappears.

Remark 3: For a fixed sample size n, the estimates β^n and the maximum likelihood estimates differ, independently of the error distribution. In spite of this, if the error distribution is strongly unimodal (thus, not necessarily normal) and we assume some additional weak conditions, the asymptotic properties of both estimates are similar. Strictly speaking, this fact will be synthesized in Theorem 1.
Let us partition the set of indices I_n = {1,…,n} into the two subsets I_n^g and I_n^u which correspond to the indices of the grouped and ungrouped data, respectively, when the sample size is n. Let us assume that X_n'X_n = Σ_{i∈I_n} x_i x_i' is a full-rank matrix and let us decompose it into the two summands

X_n'X_n = (X_n'X_n)^g + (X_n'X_n)^u,

where (X_n'X_n)^g = Σ_{i∈I_n^g} x_i x_i' and (X_n'X_n)^u is defined in a similar way.
Theorem 1 (β^n-asymptotics, as n → ∞). Let us assume that the error distribution is strongly unimodal. If, for all n, (X_n'X_n)^u and (X_n'X_n)^g are positive definite matrices, ‖x_n‖ ≤ K < ∞, and the minimum eigenvalue of n^{−1}(X_n'X_n)^u is greater than λ > 0, then β^n is a consistent estimate of the slope parameter β and

√n (β^n − β) →_D N(0, Λ) as n → ∞.     (5)

Additionally, the asymptotic covariance matrix Λ can be consistently estimated by

Λ_n = n (X_n'M_nX_n)^{−1} (X_n'R_nX_n) (X_n'M_nX_n)^{−1}.     (6)

The diagonal matrices M_n = diag(m_i^n) and R_n = diag(r_i^n), both of order n, are, respectively, given by m_i^n = 1 if i ∈ I_n^u; otherwise,

m_i^n = ∂/∂a E(σε_i | a + c_{h−1} < σε_i ≤ a + c_h) |_{a = −x_i'β^n}  if i ∈ I_n^g and y_i ∈ (c_{h−1}, c_h],     (7)

and

r_i^n = σ² if i ∈ I_n^u; otherwise, r_i^n = Var(ε_i*) if i ∈ I_n^g and y_i ∈ (c_{h−1}, c_h],     (8)

where ε_i* is a discrete random variable which takes the values

E(σε_i | −x_i'β^n + c_{h−1} < σε_i ≤ −x_i'β^n + c_h)

with probabilities

Pr(−x_i'β^n + c_{h−1} < σε_i ≤ −x_i'β^n + c_h),

for h = 1,…,r. (Proof: see Rivero and Valdes, 2004, pp. 482–486.)
Observations about this theorem:

(1) It is clear that

E(σε_i | −x_i'β^n + c_{h−1} < σε_i ≤ −x_i'β^n + c_h) = σ ∫_{(−x_i'β^n + c_{h−1})σ^{−1}}^{(−x_i'β^n + c_h)σ^{−1}} x f(x) dx / Prob(−x_i'β^n + c_{h−1} < σε_i ≤ −x_i'β^n + c_h)     (9)

and

Prob(−x_i'β^n + c_{h−1} < σε_i ≤ −x_i'β^n + c_h) = ∫_{(−x_i'β^n + c_{h−1})σ^{−1}}^{(−x_i'β^n + c_h)σ^{−1}} f(x) dx = F((−x_i'β^n + c_h)σ^{−1}) − F((−x_i'β^n + c_{h−1})σ^{−1}),     (10)

where F denotes the distribution function of the standardized errors.
(2) Clearly

∂/∂a [σ ∫_{(a + c_{h−1})σ^{−1}}^{(a + c_h)σ^{−1}} x f(x) dx] |_{a = −x_i'β^n} = σ^{−1} [(−x_i'β^n + c_h) f((−x_i'β^n + c_h)σ^{−1}) − (−x_i'β^n + c_{h−1}) f((−x_i'β^n + c_{h−1})σ^{−1})]

and

∂/∂a Prob(a + c_{h−1} < σε_i ≤ a + c_h) |_{a = −x_i'β^n} = σ^{−1} [f((−x_i'β^n + c_h)σ^{−1}) − f((−x_i'β^n + c_{h−1})σ^{−1})].

Thus, the first derivative involved in (7) admits an explicit expression in terms of the density and distribution functions of the standardized errors.
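These closed forms are easy to sanity-check numerically. The sketch below, assuming standard normal errors and arbitrary test constants of our own choosing, compares the quoted derivative of the numerator of (9) with a finite-difference approximation.

```python
import math

# With N(a) = sigma * int_{(a+c1)/sigma}^{(a+c2)/sigma} x f(x) dx and
# standard normal f, the closed form above gives
# dN/da = sigma^{-1} [(a+c2) f((a+c2)/sigma) - (a+c1) f((a+c1)/sigma)].

def f(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def N(a, c1, c2, sigma, steps=40000):
    """Midpoint-rule quadrature of sigma * int x f(x) dx over the interval."""
    lo, hi = (a + c1) / sigma, (a + c2) / sigma
    h = (hi - lo) / steps
    return sigma * sum((lo + (k + 0.5) * h) * f(lo + (k + 0.5) * h) * h
                       for k in range(steps))

def dN_explicit(a, c1, c2, sigma):
    """The closed-form derivative quoted in the text."""
    return ((a + c2) * f((a + c2) / sigma)
            - (a + c1) * f((a + c1) / sigma)) / sigma
```

A central finite difference of `N` agrees with `dN_explicit` to within the quadrature and differencing error.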
(3) It follows from (5) and (6) that, if n is sufficiently large, β^n approximately follows the multivariate normal distribution N(β, n^{−1}Λ_n), which allows the use of standard procedures to carry out statistical inferences (in terms of either confidence regions or hypothesis testing) on the true slope parameter.
4. THE RATIONALE OF THE PROPOSED ALGORITHM: RESULTING LOOPS AND PROPERTIES

In model (1) with a fixed sample size and grouped and ungrouped data such as was mentioned in Section 1, if the true values of the parameters β and σ were known (which is not the case), the scale parameter could be consistently approximated from the data sample of size n, using standard variance decomposition techniques, by means of
s_n²(β, σ) = s_{wn}²(β, σ) + s_{bn}²(β, σ),     (11)

where the within and between variances s_{wn}²(β, σ) and s_{bn}²(β, σ) satisfy, respectively,

n s_{bn}²(β, σ) = Σ_{i∈I_n^u} (y_i − x_i'β)² + Σ_{i∈I_n^g} Σ_{h=1}^r I(c_{h−1} < y_i ≤ c_h) E²(σε_i | −x_i'β + c_{h−1} < σε_i ≤ −x_i'β + c_h)

and

n s_{wn}²(β, σ) = Σ_{i∈I_n^g} Σ_{h=1}^r I(c_{h−1} < y_i ≤ c_h) Var(σε_i | −x_i'β + c_{h−1} < σε_i ≤ −x_i'β + c_h),

with E(σε_i | −x_i'β + c_{h−1} < σε_i ≤ −x_i'β + c_h) computed in a similar way to (9) and

Var(σε_i | −x_i'β + c_{h−1} < σε_i ≤ −x_i'β + c_h) = σ² ∫_{(−x_i'β + c_{h−1})σ^{−1}}^{(−x_i'β + c_h)σ^{−1}} x² f(x) dx / Prob(−x_i'β + c_{h−1} < σε_i ≤ −x_i'β + c_h) − E²(σε_i | −x_i'β + c_{h−1} < σε_i ≤ −x_i'β + c_h).     (12)
Briefly, we can write

n s_n²(β, σ) = Σ_{i∈I_n^u} (y_i − x_i'β)² + σ² Σ_{i∈I_n^g} Σ_{h=1}^r I(c_{h−1} < y_i ≤ c_h) [∫_{(−x_i'β + c_{h−1})σ^{−1}}^{(−x_i'β + c_h)σ^{−1}} x² f(x) dx / ∫_{(−x_i'β + c_{h−1})σ^{−1}}^{(−x_i'β + c_h)σ^{−1}} f(x) dx].     (13)
Clearly, E((y_i − x_i'β)²) = σ² if i ∈ I_n^u, whereas

E( Σ_{h=1}^r I(c_{h−1} < y_i ≤ c_h) [∫_{(−x_i'β + c_{h−1})σ^{−1}}^{(−x_i'β + c_h)σ^{−1}} x² f(x) dx / ∫_{(−x_i'β + c_{h−1})σ^{−1}}^{(−x_i'β + c_h)σ^{−1}} f(x) dx] ) = 1

if y_i is grouped. Thus, s_n²(β, σ) → σ² a.e. as n → ∞; therefore, with probability 1, the following equality holds in the limit:

s_∞(β, σ) = σ.     (14)
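The reason the grouped terms contribute one on average is that, summed over the full partition (2), the integrals of x²f(x) reassemble E(ε_i²) = 1. A quick numerical check, assuming standard normal errors and the cut points later used in Section 5:

```python
import math

# Sum over the partition of int_{c_{h-1}}^{c_h} x^2 f(x) dx, which should
# equal E(eps^2) = 1 for standardized errors.

def f(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def second_moment(lo, hi, steps=50000):
    """Midpoint-rule quadrature of int x^2 f(x) dx over (lo, hi]."""
    lo, hi = max(lo, -12.0), min(hi, 12.0)  # negligible mass beyond +-12
    h = (hi - lo) / steps
    return sum((lo + (k + 0.5) * h) ** 2 * f(lo + (k + 0.5) * h) * h
               for k in range(steps))

cuts = [float("-inf"), -7.0, -2.0, 3.0, 6.0, float("inf")]
total = sum(second_moment(cuts[h - 1], cuts[h]) for h in range(1, len(cuts)))
```

The same identity holds for any standardized strongly unimodal density, since the partition is exhaustive.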
However, since the true slope and scale parameters are unknown, s_n(β, σ) is incomputable. Nevertheless, these expressions together with the basic algorithm have induced us to extend expression (11) to any pair of possible values (β*, σ*) of the parameters by means of

n s_n²(β*, σ*) = Σ_{i∈I_n^u} (y_i − x_i'β*)² + σ*² Σ_{i∈I_n^g} Σ_{h=1}^r I(c_{h−1} < y_i ≤ c_h) [∫_{(−x_i'β* + c_{h−1})σ*^{−1}}^{(−x_i'β* + c_h)σ*^{−1}} x² f(x) dx / ∫_{(−x_i'β* + c_{h−1})σ*^{−1}}^{(−x_i'β* + c_h)σ*^{−1}} f(x) dx].     (15)
As

Σ_{i∈I_n^u} (y_i − x_i'β*)² = Σ_{i∈I_n^u} (y_i − x_i'β)² + (β* − β)' [Σ_{i∈I_n^u} x_i x_i'] (β* − β) + o(n) a.e.,

reasoning as in (13), it is clear that

s_n²(β*, σ*) → σ² + p_0 (β* − β)' P (β* − β) + p_1 (σ*² − σ²) as n → ∞     (16)
with probability 1, where P denotes the limit of the mean product matrix

P = lim_{n→∞} n_u^{−1} (X_n'X_n)^u

and n_u denotes the cardinal of I_n^u. Taking this into account, we propose the following estimating algorithm which, like the basic algorithm, starts with an initial sample size n_0 and whose loops are then executed as soon as a new observation is received:
Proposed robust estimating algorithm.

Initialization: Let β^{n_0} and σ^{n_0} be two arbitrary starting guesses of the slope and scale parameters, respectively. In the absence of any other external information, we suggest the use of the OLS estimates based on the ungrouped data of the initial sample of size n_0.

Iteration: Assuming that β^{n−1} and σ^{n−1} are known, let us update them through the following steps:

(1) Assuming that the scale parameter agrees with σ^{n−1}, run one loop of the basic algorithm given in Section 3 to update β^{n−1}. Therefore, with a notation similar to that used in the basic algorithm:

(a) First, compute the vector of conditional imputations y^n(β^{n−1}, σ^{n−1}) = (y_1(β^{n−1}, σ^{n−1}),…, y_n(β^{n−1}, σ^{n−1}))', where y_i(β^{n−1}, σ^{n−1}) is equal to y_i if this datum is ungrouped; otherwise, it agrees with

x_i'β^{n−1} + E(σ^{n−1}ε_i | −x_i'β^{n−1} + c_{h−1} < σ^{n−1}ε_i ≤ −x_i'β^{n−1} + c_h)     (17)

if the grouped datum y_i is within the interval (c_{h−1}, c_h].

(b) Secondly, use the OLS projections to update the current slope parameter, that is,

β^n = (X_n'X_n)^{−1} X_n' y^n(β^{n−1}, σ^{n−1}).     (18)

(2) Update the scale parameter by

σ^n = s_n(β^n, σ^{n−1}),     (19)

using (15).

(3) β^{n−1} ← β^n, σ^{n−1} ← σ^n, and return to Step 1.

The point (β^n, σ^n) will define our slope/scale estimates of model (1) for the sample size n. It is important to observe that, in harmony with the limit equality (14), this estimate fulfils

σ_∞ = s_∞(β_∞, σ_∞).     (20)
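Putting steps (1) and (2) together, the following sketch runs one loop of the proposed algorithm under the extra assumption of standard normal errors, for which the truncated first and second moments have closed forms; the data containers and helper names are illustrative, not the paper's.

```python
import math
import numpy as np

SQ2PI = math.sqrt(2.0 * math.pi)

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / SQ2PI

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def trunc_m1(lo, hi):
    """E(eps | lo < eps <= hi) for standard normal eps, cf. (9)."""
    return (phi(lo) - phi(hi)) / (Phi(hi) - Phi(lo))

def trunc_m2(lo, hi):
    """E(eps^2 | lo < eps <= hi), the bracketed ratio of (15)."""
    lt = lo * phi(lo) if math.isfinite(lo) else 0.0
    ut = hi * phi(hi) if math.isfinite(hi) else 0.0
    return 1.0 + (lt - ut) / (Phi(hi) - Phi(lo))

def proposed_loop(X, obs, beta, sigma, cuts):
    """One loop: impute at (beta, sigma) as in (17), OLS-project as in (18),
    then update the scale via the empirical decomposition (15), cf. (19).
    obs[i] is ('exact', y_i) or ('interval', h) with y_i in (c_{h-1}, c_h]."""
    n = len(obs)
    y_imp = np.empty(n)
    for i, (kind, val) in enumerate(obs):
        xb = float(X[i] @ beta)
        if kind == "exact":
            y_imp[i] = val
        else:
            lo, hi = (cuts[val - 1] - xb) / sigma, (cuts[val] - xb) / sigma
            y_imp[i] = xb + sigma * trunc_m1(lo, hi)
    beta_new = np.linalg.solve(X.T @ X, X.T @ y_imp)
    ss = 0.0
    for i, (kind, val) in enumerate(obs):
        xb = float(X[i] @ beta_new)
        if kind == "exact":
            ss += (y_imp[i] - xb) ** 2
        else:
            lo, hi = (cuts[val - 1] - xb) / sigma, (cuts[val] - xb) / sigma
            ss += sigma ** 2 * trunc_m2(lo, hi)
    return beta_new, math.sqrt(ss / n)
```

With no grouped observations the loop reduces to an OLS fit plus the root mean squared residual, as the decomposition (13) suggests.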
The following remarks synthesize the most important points about the proposed algorithm:

(1) After the enormous number of simulations included in the next section we are in a position to maintain that the proposed algorithm certainly stabilizes towards a point (β_∞, σ_∞) which does depend on {(x_i, y_i)}_{i=1,2,…}, although it is independent of both the starting values (β^{n_0}, σ^{n_0}) and the initial sample size n_0. This is in spite of the fact that the two sequences {β^p} and {σ^p} are inter-linked and differ when some of the initial values vary. This will be justified in terms of stochastic convergence below.

(2) The algorithm estimate (β_∞, σ_∞) is, in fact, an M-estimator, since it fulfils the implicit relation (16). This means that we can expect its asymptotic distribution to be multivariate normal.

(3) The consistency of the estimates (β^n, σ^n) can be proven under certain conditions. Although their strict formulations (see Fahrmeir and Kaufmann, 1985) are far from the scope of the present paper, the following arguments sketch the proof. As before, let (β, σ) denote the true values of the model parameters. From (17) and (16) we can write the equality chain

σ_∞² = s_∞²(β_∞, σ_∞) = p_0 (β_∞ − β)' P (β_∞ − β) + p_0 σ² + p_1 σ_∞².

As the first term does not depend on β_∞, necessarily (β_∞ − β)' P (β_∞ − β) = 0. Thus, if we assume that P is a positive definite matrix, we can conclude that β_∞ = β and σ_∞ = σ with probability 1, in complete accordance with our empirical experience commented on in Point 1.
(4) From the last two points and taking into account the asymptotic distribution given in (5), our proposal is to use the following natural distributional approximation:

β^n ≈ N(β, n^{−1}Λ_n),     (21)

where the covariance matrix Λ_n = (λ_ij^n) is

Λ_n = n (X_n'M_nX_n)^{−1} (X_n'R_nX_n) (X_n'M_nX_n)^{−1}     (22)

and the diagonal matrices M_n = diag(m_i^n) and R_n = diag(r_i^n) are defined as in (7) and (8), save that β^n now stands for the slope estimate of the proposed algorithm (instead of Section 3's) and σ^n substitutes σ. Finally, from (21), it is straightforward to make standard inferences on the true slope parameters. For example, at the 95% level, the approximate confidence interval that is derived from the proposed algorithm is given by

β_j^n ± (1.96/√n) √(λ_jj^n).     (23)

Equally, tests of the hypothesis that some set of linear combinations of β are all zero are carried out by the following standard procedure based on asymptotics. Let us suppose that A is the p × m matrix specifying the p linear combinations of β that are to be tested and that n is the current sample size. Under the null hypothesis H_0: Aβ = 0 (against H_1: not H_0), nβ^n'A'(AΛ_nA')^{−1}Aβ^n approximately follows a
(central) χ² distribution with p degrees of freedom. If χ_p²(α) is the value such that

Pr{χ_p² > χ_p²(α)} = α,

then, approximately under H_0,

Pr{nβ^n'A'(AΛ_nA')^{−1}Aβ^n > χ_p²(α)} = α.

Thus, to test the null hypothesis, we use as our critical region

R_α^n = {β^n : nβ^n'A'(AΛ_nA')^{−1}Aβ^n > χ_p²(α)}.     (24)
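As a sketch of how (24) might be applied, testing μ_A = μ_T in model (3) corresponds to A = (1, −1, 0). The estimate Λ_n is taken as given (e.g. from (22)); all numbers below are made-up illustrations, not results from the paper.

```python
import numpy as np

# Upper 5% point of chi^2 with 1 degree of freedom (textbook constant).
CHI2_1_05 = 3.8415

def wald_stat(beta_n, Lambda_n, A, n):
    """The statistic of (24): n * (A b)' (A Lambda_n A')^{-1} (A b)."""
    Ab = A @ beta_n
    return float(n * Ab @ np.linalg.solve(A @ Lambda_n @ A.T, Ab))

# Hypothetical estimates for model (3): (mu_A, mu_T, gamma) and a toy Lambda_n.
A = np.array([[1.0, -1.0, 0.0]])
beta_n = np.array([112.0, 108.0, -0.08])
Lambda_n = np.diag([40.0, 40.0, 0.001])

stat = wald_stat(beta_n, Lambda_n, A, n=48)
reject = stat > CHI2_1_05  # critical region (24) with p = 1, alpha = 0.05
```

For several simultaneous linear combinations one simply stacks the rows of A and uses the corresponding χ_p² critical value.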
(5) With respect to the partial memory capacity of the proposed estimating algorithm, the following point merits comment: only the individual values (y_i, x_i) corresponding to the grouped y_i-values need to be stored, since for the rest there exist recurrent formulas by which the update of the current estimates can be implemented when a new observation is received. This is a relevant fact, since the natural opponents of our algorithm are completely memory-less. A detailed analysis of this fact is included in the Annexe at the end of this paper.
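The recurrent formulas alluded to can be illustrated as follows: for the ungrouped part, the running sums X'X and X'y are updated by a rank-one recurrence when a new observation arrives, so the raw ungrouped rows themselves need not be kept. A sketch with our own class name, not code from the paper:

```python
import numpy as np

class RunningOLS:
    """Keeps only the sufficient statistics X'X and X'y for ungrouped data."""

    def __init__(self, m):
        self.XtX = np.zeros((m, m))
        self.Xty = np.zeros(m)

    def add(self, x, y):
        """Rank-one update: X'X <- X'X + x x' and X'y <- X'y + y x."""
        x = np.asarray(x, dtype=float)
        self.XtX += np.outer(x, x)
        self.Xty += y * x

    def beta(self):
        """Current OLS projection (X'X)^{-1} X'y."""
        return np.linalg.solve(self.XtX, self.Xty)
```

Only the grouped observations, whose imputations change with (β^n, σ^n), must be revisited at each loop.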
5. SIMULATION STUDY ON THE PERFORMANCE OF THE ALGORITHM
In this section we present a large number of simulations made with the intention of analysing the performance of the proposed algorithm. We have considered the model

$$y_i = b_0 + b_1 x_{1i} + b_2 x_{2i} + s\,\varepsilon_i, \qquad (25)$$

in which (a) we fixed the slope parameter $b = (1, -4, 3)'$; (b) 200 independent variables $x_i = (x_{i1}, x_{i2})'$, $i = 1, \ldots, 200$, were selected from a uniform distribution on the square $[-5, 5]^2$; (c) the values $s = 1, 2, 4$ were assigned to the scale parameter; (d) the errors $\varepsilon_i$ were generated from the following distributions, duly standardized: (i) Laplace(1), that is, with density function $f(x) = 2^{-1}\exp(-|x|)$, (ii) Logistic, that is, $f(x) = \exp(-x)(1 + \exp(-x))^{-2}$, and (iii) Standard Normal; and (e) we computed the corresponding dependent variables $y_i$, after which they were grouped with probabilities $p_0$ equal to 0.2, 0.4 and 0.6 using the grouping intervals

$$(-\infty, -7],\quad (-7, -2],\quad (-2, 3],\quad (3, 6]\quad \text{and}\quad (6, \infty).$$
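The sampling design in (a)-(e) can be sketched in a few lines. This is our own reconstruction of one replication (not the authors' code), assuming $s = 1$ and grouping probability $p_0 = 0.4$:

```python
import numpy as np

rng = np.random.default_rng(0)

# One replication of the Section 5 design (a sketch under stated assumptions).
b_true = np.array([1.0, -4.0, 3.0])
x = rng.uniform(-5, 5, size=(200, 2))                  # (b): uniform on [-5,5]^2
X = np.column_stack([np.ones(200), x])
# (d): Laplace errors standardized to unit variance (Var = 2*scale^2).
y = X @ b_true + rng.laplace(0.0, 1.0 / np.sqrt(2.0), size=200)

# (e): each y_i is grouped (interval-censored) with probability p0 = 0.4.
cuts = np.array([-np.inf, -7.0, -2.0, 3.0, 6.0, np.inf])
grouped = rng.random(200) < 0.4
k = np.searchsorted(cuts, y, side="left")              # interval index of y_i
lower, upper = cuts[k - 1], cuts[k]                    # observed (lower, upper]
print(grouped.sum(), lower[grouped][:3], upper[grouped][:3])
```

When `grouped[i]` is true, only the interval `(lower[i], upper[i]]` would be passed to the estimating algorithm; otherwise `y[i]` itself is observed.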
For each combination of the values $s$ and $p_0$ and each standardized error distribution, we made 250 replications of the data $(\varepsilon_i, y_i)$ and of the grouping process. Then, for each replication $r$, we ran the proposed algorithm from the following starting values: $n_0 = 25$ and $b^{25}$ equal to the OLS estimate calculated from the ungrouped data of the initial sample. In this way, we obtained the parameter estimates $(b^{n(r)}, s^{n(r)})$ and the covariance matrix estimate $\Lambda_n(r)$ given in (21) for $n = 25, \ldots, 200$. With these replicated values we calculated:
(1) The empirical biases, variances and mean square errors of the estimates of the slope and scale parameters, given by

$$B(b^n_j) = \big|E(b^n_j) - b_j\big| = \Big|\,250^{-1}\sum_{r=1}^{250}\big(b^{n(r)}_j - b_j\big)\Big|,$$

$$\widehat{\mathrm{Var}}(b^n_j) = E\big[(b^n_j - E(b^n_j))^2\big] = 250^{-1}\sum_{r=1}^{250}\big(b^{n(r)}_j - E(b^n_j)\big)^2 \qquad (26)$$

and

$$\widehat{\mathrm{MSE}}(b^n_j) = E\big[(b^n_j - b_j)^2\big] = \widehat{\mathrm{Var}}(b^n_j) + B^2(b^n_j)$$
($j = 1, 2, 3$), and similarly for $s^n$. With the objective of making comparisons, for each replication we also computed two types of ordinary least squares parameter estimates. The first was based on the complete data, that is, without being submitted to the grouping process explained above; these estimates are denoted by $b^{OLS,n}$ and $s^{OLS,n}$, respectively. The second type was based only on the non-grouped data once the grouping process was implemented and the resulting grouped data discarded; the estimates obtained are denoted by $b^{ols,n}$ and $s^{ols,n}$ (with the super-index in lower-case letters). The empirical biases, variances and mean square errors of the OLS and ols estimates were also calculated as in (26). For the different value combinations of $s$ and $p_0$ mentioned above, we have assumed Laplacian errors in Table 2a, which shows the empirical biases and mean square errors of the slope and scale parameter estimates based on the complete sample of size 200 and on their sequential precedents corresponding to the sample sizes 50 and 100. The equivalent results for logistic and normal errors are of similar orders to those shown in Table 2a. By way of illustration we have partially included them in Table 2b, which has an easily recognizable structure once Table 2a has been seen, and shows biases and mean square errors only for $p_0 = 0.4$ (complete results are available from the authors upon request).
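The empirical summaries in (26) reduce to a few lines of code. In the sketch below, the 250 replicated estimates are synthetic placeholders (Gaussian noise around the true value), not output of the algorithm:

```python
import numpy as np

# Synthetic replicated estimates b_j^{n(r)}, r = 1..250, for illustration.
rng = np.random.default_rng(1)
b_true = -4.0
reps = b_true + 0.05 * rng.standard_normal(250)

bias = abs(reps.mean() - b_true)                  # B(b_j^n) in (26)
var = ((reps - reps.mean()) ** 2).mean()          # Var-hat(b_j^n) in (26)
mse = var + bias ** 2                             # MSE-hat = Var-hat + B^2
print(round(bias, 4), round(var, 4), round(mse, 4))
```

The identity $\widehat{\mathrm{MSE}} = \widehat{\mathrm{Var}} + B^2$ holds exactly because the cross term vanishes when centering at the replication mean.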
On seeing these tables, the following general remarks can be made:
• Although the three slope estimates $b^n$, $b^{OLS,n}$ and $b^{ols,n}$ are consistent for the error distributions considered, only the first one is asymptotically unbiased (and normally distributed) independently of the distributions mentioned above. This last property can only be
Table 2a. Empirical biases and mean square errors of the model parameter estimates

Laplacian errors

Values | Estimates of the model parameters
p0 s n | bn0 bn1 bn2 sn | bOLS,n0 bOLS,n1 bOLS,n2 sOLS,n | bols,n0 bols,n1 bols,n2 sols,n

Empirical biases
0.2 1 50 | -0.016 0.032 -0.005 -0.019 | -0.010 0.026 -0.003 -0.017 | -0.018 0.040 -0.005 -0.024
100 | -0.003 0.015 -0.010 -0.012 | -0.002 0.014 -0.015 -0.009 | -0.006 0.011 -0.005 -0.013
200 | 0.016 -0.006 -0.003 -0.013 | 0.015 -0.003 -0.009 -0.009 | 0.012 -0.005 -0.001 -0.015
2 50 | 0.029 -0.048 0.014 -0.017 | 0.028 -0.051 0.017 -0.004 | 0.048 -0.017 -0.013 -0.025
100 | 0.016 0.005 -0.003 -0.005 | 0.009 0.003 -0.007 0.007 | 0.027 -0.013 -0.005 -0.006
200 | 0.007 -0.002 0.016 -0.011 | 0.005 -0.004 0.018 -0.006 | -0.001 -0.001 0.015 -0.009
4 50 | -0.042 0.049 -0.017 -0.062 | -0.059 0.052 -0.002 -0.046 | -0.011 0.074 -0.047 -0.059
100 | 0.061 -0.039 -0.007 -0.044 | 0.067 -0.041 -0.007 -0.034 | 0.061 -0.050 -0.005 -0.057
200 | 0.033 -0.007 -0.025 -0.055 | 0.030 -0.005 -0.014 -0.028 | 0.054 -0.012 -0.031 -0.042
0.4 1 50 | 0.012 -0.025 -0.003 -0.031 | 0.016 -0.025 -0.013 -0.026 | 0.011 -0.020 0.003 -0.034
100 | -0.002 -0.006 0.012 -0.006 | 0.002 -0.003 0.004 -0.003 | -0.007 -0.001 0.012 -0.001
200 | 0.013 -0.001 -0.017 -0.010 | 0.017 -0.004 -0.020 -0.003 | 0.019 -0.003 -0.019 -0.005
2 50 | -0.015 -0.020 0.043 -0.046 | -0.011 -0.012 0.042 -0.034 | -0.012 -0.027 0.025 -0.048
100 | -0.029 -0.010 0.031 0.006 | -0.036 0.003 0.024 0.012 | -0.007 -0.026 0.039 0.002
200 | 0.015 0.001 -0.012 0.003 | 0.017 -0.005 -0.017 0.013 | 0.013 0.001 -0.010 0.014
4 50 | -0.059 0.064 -0.075 -0.002 | -0.049 0.065 -0.074 0.009 | 0.018 -0.045 -0.055 -0.014
100 | -0.068 -0.039 0.095 -0.103 | -0.067 -0.028 0.106 -0.077 | -0.739 -0.049 0.102 -0.089
200 | -0.052 0.058 0.056 0.003 | -0.057 0.050 0.044 -0.018 | -0.078 0.064 0.066 0.022
0.6 1 50 | -0.015 0.036 0.012 -0.032 | -0.015 0.029 0.014 -0.022 | -0.016 0.015 0.013 -0.036
100 | 0.003 0.009 -0.010 -0.006 | 0.007 0.003 -0.009 -0.002 | -0.014 0.017 0.014 0.005
200 | 0.011 -0.002 -0.007 -0.020 | 0.014 -0.002 -0.005 -0.002 | 0.025 -0.011 -0.018 -0.014
2 50 | -0.079 0.060 0.035 -0.022 | -0.098 0.066 0.032 -0.016 | -0.058 0.068 -0.014 -0.057
100 | -0.013 -0.002 0.012 -0.029 | -0.013 0.001 0.012 -0.016 | -0.002 -0.004 -0.006 -0.043
200 | 0.016 -0.010 0.002 0.017 | 0.015 -0.009 -0.006 0.033 | -0.007 0.022 0.001 0.028
4 50 | 0.017 0.005 -0.034 -0.011 | -0.005 0.013 -0.015 -0.010 | -0.017 -0.011 -0.110 -0.057
100 | -0.039 -0.016 0.047 -0.065 | -0.031 -0.014 0.048 -0.003 | 0.026 -0.091 0.127 -0.086
200 | -0.009 -0.007 0.017 -0.064 | -0.010 -0.009 0.020 -0.032 | -0.052 0.030 0.036 -0.067
Empirical mean square errors
0.2 1 50 0.040 0.028 0.033 0.028 0.035 0.024 0.030 0.025 0.049 0.033 0.039 0.031
100 0.028 0.018 0.015 0.013 0.024 0.016 0.013 0.013 0.029 0.019 0.017 0.015
200 0.011 0.009 0.009 0.007 0.010 0.007 0.007 0.007 0.012 0.010 0.011 0.009
2 50 0.161 0.132 0.118 0.102 0.157 0.129 0.112 0.097 0.196 0.159 0.150 0.128
100 0.081 0.058 0.057 0.049 0.075 0.056 0.052 0.043 0.098 0.063 0.068 0.062
200 0.040 0.033 0.030 0.029 0.036 0.032 0.029 0.028 0.046 0.034 0.036 0.035
4 50 0.582 0.399 0.448 0.525 0.565 0.393 0.429 0.519 0.714 0.529 0.555 0.622
100 0.337 0.265 0.255 0.226 0.338 0.265 0.257 0.240 0.399 0.314 0.315 0.288
200 0.135 0.116 0.122 0.106 0.137 0.118 0.120 0.098 0.165 0.154 0.142 0.141
0.4 1 50 0.041 0.039 0.037 0.032 0.033 0.032 0.028 0.026 0.059 0.048 0.049 0.044
100 0.019 0.019 0.017 0.017 0.016 0.016 0.014 0.014 0.029 0.029 0.024 0.024
200 0.010 0.010 0.009 0.009 0.007 0.007 0.009 0.006 0.013 0.012 0.012 0.013
2 50 0.205 0.144 0.154 0.101 0.186 0.121 0.140 0.086 0.347 0.255 0.224 0.142
100 0.077 0.065 0.077 0.062 0.067 0.057 0.068 0.054 0.109 0.095 0.126 0.087
200 0.047 0.033 0.034 0.027 0.041 0.030 0.031 0.026 0.070 0.048 0.050 0.044
4 50 0.678 0.552 0.429 0.446 0.605 0.531 0.407 0.443 1.115 0.820 0.744 0.733
100 0.302 0.208 0.217 0.213 0.312 0.201 0.234 0.205 0.548 0.326 0.388 0.304
200 0.156 0.140 0.124 0.109 0.150 0.128 0.119 0.097 0.286 0.230 0.184 0.187
0.6 1 50 0.054 0.044 0.041 0.035 0.036 0.028 0.031 0.022 0.112 0.098 0.078 0.068
100 0.024 0.020 0.024 0.019 0.016 0.014 0.017 0.014 0.049 0.040 0.043 0.037
200 0.013 0.013 0.009 0.011 0.009 0.007 0.006 0.007 0.027 0.020 0.017 0.017
2 50 0.183 0.152 0.151 0.133 0.171 0.122 0.124 0.118 0.379 0.302 0.307 0.264
100 0.076 0.059 0.073 0.064 0.065 0.051 0.065 0.051 0.197 0.148 0.161 0.125
(Continues)
C. RIVERO AND T. VALDES
Table 2a. (Continued)

Laplacian errors

Values | Estimates of the model parameters
p0 s n | bn0 bn1 bn2 sn | bOLS,n0 bOLS,n1 bOLS,n2 sOLS,n | bols,n0 bols,n1 bols,n2 sols,n
200 0.046 0.031 0.042 0.029 0.039 0.029 0.034 0.024 0.097 0.063 0.080 0.060
4 50 0.623 0.465 0.492 0.541 0.625 0.438 0.493 0.494 1.435 1.310 1.263 1.238
100 0.315 0.227 0.212 0.229 0.296 0.236 0.220 0.216 0.722 0.516 0.621 0.471
200 0.165 0.107 0.114 0.112 0.159 0.098 0.109 0.090 0.350 0.292 0.292 0.260
assured for $b^{OLS,n}$ and $b^{ols,n}$ if the error distribution is normal, in which case these estimates agree with the maximum likelihood estimates based on the complete data and on the non-discarded data, respectively. Despite the theoretical basis mentioned above, the similarity between the biases obtained when the errors follow a Laplace distribution and those obtained with the other error distributions points to the fact that the asymptotic unbiasedness of the slope estimates under normal errors seems to extend to the rest of the error distributions considered in our simulation study.
• Also, the mean square errors of the slope estimates are of similar orders independently of the error distributions considered. For all of these, our proposed slope estimates have an efficiency similar to that of the OLS estimates. This means that the proposed algorithm is useful to avoid the negative consequences derived from the existence of grouped or missing data. In all of the cases, the empirical mean square errors increase as the proportion of grouped data grows and as the sample size decreases, in agreement with what could be expected in advance.
• The former conclusions (in terms of biases and MSEs) about the slope parameter estimates can be straightforwardly extended to the scale parameter estimate.
(2) The empirical covariance matrices of the slope parameter estimates $b^n$, $b^{OLS,n}$ and $b^{ols,n}$. These are denoted by $G^n$, $G^{OLS,n}$ and $G^{ols,n}$, respectively, and in harmony with (22) they were computed by

$$G^n = 250^{-1}\sum_{r=1}^{250}\big(b^{n(r)} - E(b^n)\big)\big(b^{n(r)} - E(b^n)\big)' \qquad (27)$$

and similarly for $G^{OLS,n}$ and $G^{ols,n}$. For each replication $r$, the empirical matrix $G^n$ should be compared with $n^{-1}\Lambda_n(r)$, since $\Lambda_n(r)$ approximates the asymptotic covariance matrix of $\sqrt{n}\,(b^n - b)$, as was indicated. For its part, the comparison between the mean matrix

$$G^n_* = n^{-1}E(\Lambda_n) = n^{-1}\,250^{-1}\sum_{r=1}^{250}\Lambda_n(r) \qquad (28)$$

and $G^n$ allows us to evaluate the biases of the different elements of the covariance matrix estimate of the slope parameter estimates obtained with the proposed algorithm. Table 3a includes the distinct elements of the matrices $G^n$, $G^n_*$, $G^{OLS,n}$ and $G^{ols,n}$, respectively, when the errors are Laplacian. With logistic and normal errors the equivalent matrices are rather similar; again, these have been partly included in Table 3b (which is organized like Table 2b) as evidence of this.

Briefly speaking, it can be stressed that the matrices $G^n$, $G^n_*$ and $G^{OLS,n}$ are quite similar, independently of the percentage of grouped data, the true scale parameter, the sample size and, also, of the error distributions considered in this study. This fact seems to indicate that the proposed algorithm tends to eliminate the negative consequences that spring from the existence of grouped or missing data. Additionally, as could be foreseen, the empirical variances of the matrices $G^{ols,n}$, which are based on the non-grouped data only, are larger than those of the former matrices $G^n$, $G^n_*$ and $G^{OLS,n}$, and the observed differences increase as the percentage of grouped data and the true scale parameter increase and, also, as the sample size decreases.
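The empirical covariance matrix in (27) is the replication-mean outer product of the centered estimates. A minimal sketch, with a synthetic 250 x 3 array of replicated estimates standing in for the real simulation output:

```python
import numpy as np

# Synthetic replicated slope estimates b^{n(r)}, r = 1..250 (illustration only).
rng = np.random.default_rng(2)
B = rng.multivariate_normal([1.0, -4.0, 3.0], 0.01 * np.eye(3), size=250)

# (27): G^n = 250^{-1} sum_r (b^{n(r)} - mean)(b^{n(r)} - mean)'
centered = B - B.mean(axis=0)
G = centered.T @ centered / 250
print(np.round(G, 4))
```

Note the divisor is the number of replications (250), not 249, matching the definition in (27).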
(3) Empirical efficiencies of the algorithm confidence interval estimates. At the 95% level, the coverage probability of the confidence intervals given in (23) can be empirically assessed by the expression

$$C(b_j) = 250^{-1}\sum_{r=1}^{250} I\Big\{\, b^{n(r)}_j - \frac{1.96}{\sqrt{n}}\sqrt{l^n_{jj}} \;\le\; b_j \;\le\; b^{n(r)}_j + \frac{1.96}{\sqrt{n}}\sqrt{l^n_{jj}} \,\Big\}. \qquad (29)$$

These empirical coverage probabilities are included in Table 4 for the three error distributions simulated in this study.

As can be seen, the orders of these empirical coverage probabilities are rather similar for the error distributions mentioned above. In all of the cases the most sensitive element seems to be the true scale parameter $s$ and, in second place, the percentage of grouped data. Nevertheless, as these probabilities are close to 0.95, it can again be said that the large empirical efficiency of the proposed algorithm confidence interval estimates is encouraging.
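The coverage check in (29) simply counts how often the interval traps the true $b_j$. A sketch with synthetic inputs (the replicated estimates and the fixed value of $l^n_{jj}$ are illustrative assumptions, not simulation output):

```python
import numpy as np

# Synthetic setup: true b_j, an assumed l_jj, and 250 replicated estimates.
rng = np.random.default_rng(3)
n, l_jj, b_j = 200, 1.8, -4.0
se = np.sqrt(l_jj / n)                         # asymptotic s.e. of b_j^n
reps = b_j + se * rng.standard_normal(250)     # b_j^{n(r)}, r = 1..250

# (29): indicator that b_j lies in b_j^{n(r)} +/- 1.96 sqrt(l_jj)/sqrt(n)
half = 1.96 * np.sqrt(l_jj) / np.sqrt(n)
covered = (reps - half <= b_j) & (b_j <= reps + half)
coverage = covered.mean()
print(round(coverage, 3))
```

With exactly normal replicates, the empirical coverage should fluctuate around 0.95, as in Table 4.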
Table 2b. Empirical biases and mean square errors of the model parameter estimates

Logistic errors

Values | Estimates of the model parameters
p0 s n | bn0 bn1 bn2 sn | bOLS,n0 bOLS,n1 bOLS,n2 sOLS,n | bols,n0 bols,n1 bols,n2 sols,n

Empirical biases
0.4 1 50 | 0.026 -0.013 -0.018 -0.027 | 0.025 -0.005 -0.026 -0.024 | 0.019 0.002 -0.006 -0.036
100 | 0.002 0.005 0.000 -0.004 | 0.000 0.009 0.001 -0.006 | -0.007 0.009 0.002 -0.001
200 | -0.012 0.010 0.000 -0.018 | -0.012 0.010 0.003 -0.011 | -0.003 0.006 -0.012 -0.018
2 50 | 0.022 -0.007 -0.002 -0.020 | 0.024 0.002 -0.002 -0.006 | 0.080 -0.027 -0.036 -0.024
100 | -0.036 -0.003 0.025 0.004 | -0.046 0.004 0.020 0.007 | -0.003 -0.026 0.020 0.001
200 | 0.009 -0.005 0.009 -0.036 | 0.010 -0.010 0.003 -0.030 | 0.006 -0.002 0.014 -0.022
4 50 | -0.063 0.075 -0.005 0.032 | -0.070 0.093 -0.041 0.010 | -0.113 0.055 0.050 0.006
100 | -0.072 -0.039 0.067 -0.057 | -0.085 -0.028 0.075 -0.046 | -0.120 0.007 0.087 -0.063
200 | -0.031 0.029 0.012 -0.046 | -0.035 0.036 0.004 -0.049 | -0.039 0.036 0.003 -0.032
Empirical mean square errors
0.4 1 50 0.047 0.034 0.042 0.021 0.035 0.029 0.029 0.016 0.064 0.050 0.054 0.027
100 0.021 0.019 0.016 0.011 0.018 0.017 0.014 0.010 0.028 0.027 0.024 0.016
200 0.011 0.010 0.010 0.005 0.009 0.007 0.007 0.004 0.015 0.012 0.014 0.006
2 50 0.181 0.134 0.129 0.074 0.164 0.120 0.110 0.072 0.310 0.239 0.196 0.109
100 0.070 0.054 0.074 0.042 0.060 0.045 0.065 0.039 0.107 0.092 0.106 0.058
200 0.042 0.040 0.031 0.019 0.039 0.036 0.028 0.017 0.065 0.059 0.040 0.026
4 50 0.627 0.520 0.479 0.309 0.589 0.513 0.453 0.287 1.044 0.873 0.811 0.570
100 0.326 0.250 0.218 0.143 0.327 0.244 0.217 0.131 0.620 0.377 0.403 0.184
200 0.122 0.118 0.113 0.076 0.121 0.116 0.110 0.068 0.204 0.205 0.202 0.120
Standard normal errors
Empirical biases
0.4 1 50 0.050 0.037 0.045 0.023 0.038 0.031 0.031 0.017 0.069 0.054 0.057 0.029
100 0.023 0.021 0.017 0.011 0.019 0.018 0.015 0.010 0.030 0.029 0.025 0.017
200 0.011 0.010 0.010 0.006 0.009 0.008 0.008 0.005 0.016 0.013 0.015 0.007
2 50 0.193 0.143 0.139 0.079 0.175 0.128 0.118 0.077 0.332 0.255 0.210 0.117
100 0.074 0.057 0.079 0.045 0.064 0.048 0.070 0.041 0.114 0.098 0.113 0.062
200 0.045 0.042 0.033 0.021 0.041 0.039 0.030 0.018 0.070 0.063 0.042 0.027
4 50 0.671 0.556 0.513 0.331 0.630 0.548 0.484 0.307 1.117 0.934 0.868 0.610
100 0.349 0.268 0.234 0.153 0.350 0.261 0.232 0.140 0.663 0.403 0.432 0.197
200 0.131 0.126 0.121 0.081 0.129 0.124 0.118 0.073 0.219 0.220 0.216 0.128
Empirical mean square errors
0.4 1 50 0.048 0.032 0.043 0.016 0.040 0.025 0.030 0.012 0.059 0.048 0.058 0.021
100 0.029 0.022 0.017 0.009 0.022 0.017 0.014 0.006 0.041 0.027 0.024 0.010
200 0.011 0.010 0.009 0.004 0.009 0.007 0.006 0.003 0.014 0.012 0.012 0.005
2 50 0.165 0.113 0.127 0.059 0.150 0.103 0.108 0.050 0.266 0.190 0.225 0.077
100 0.074 0.057 0.072 0.031 0.068 0.055 0.064 0.028 0.096 0.091 0.101 0.037
200 0.040 0.034 0.029 0.014 0.034 0.031 0.026 0.012 0.063 0.055 0.044 0.020
4 50 0.534 0.563 0.432 0.250 0.522 0.533 0.422 0.230 0.977 0.910 0.749 0.365
100 0.274 0.228 0.200 0.087 0.269 0.225 0.198 0.075 0.486 0.392 0.319 0.155
200 0.150 0.141 0.118 0.059 0.147 0.140 0.116 0.048 0.233 0.197 0.219 0.068
6. CASE STUDY: THE PROPOSED ALGORITHM VERSUS ITS ALTERNATIVES

With the real-life data of Table 1, we used the proposed algorithm to estimate, first, the parameters $b = (\mu_A, \mu_T, \gamma)$ and $s$ of model (3) and, secondly, the parameters $\eta$, $D_i$, $P_j$, $DP_{ij}$ and $s$ of the ANOVA model (4), among other models not included in this paper. Although the final analyses should refer to the complete data, we were required to provide updated forecasts that could be supplied on request. The partial analyses were initiated as soon as the current sample size was $n_0 = 16$ (the first third of the complete sample received). In this section we show the results obtained with the sample sizes 16, 32 and 48 (one, two and three thirds of the total dataset).

With respect to model (3), Table 5 exhibits the sequences $(\mu^n_A, \mu^n_T, \gamma^n, s^n)$, for $n = 16, \ldots, 48$, generated by the proposed algorithm from three different starting points $(b^{16}, s^{16})$. The first of them, $(123.919, 117.788, -74.415, 14.788)$, agrees with the OLS estimates based on the
Table 3a. Covariance matrices G*n, Gn, GOLS,n and Gols,n given in Section 5

Laplacian errors

Values | G*n | Gn
p0 s n | g00 g11 g22 g01 g02 g12 | g00 g11 g22 g01 g02 g12
0.2 1 50 0.040 0.033 0.033 �0.016 �0.016 0.000 0.039 0.027 0.033 �0.012 �0.017 0.000
100 0.020 0.016 0.016 �0.008 �0.008 0.000 0.028 0.017 0.015 �0.011 �0.010 0.000
200 0.010 0.007 0.007 �0.004 �0.004 0.000 0.010 0.008 0.008 �0.003 �0.005 0.000
2 50 0.149 0.125 0.121 -0.063 -0.057 0.000 0.159 0.129 0.117 -0.047 -0.065 -0.008
100 0.073 0.059 0.059 -0.030 -0.029 0.000 0.081 0.058 0.056 -0.023 -0.025 -0.008
200 0.036 0.030 0.029 -0.015 -0.014 0.000 0.039 0.033 0.030 -0.018 -0.017 0.002
4 50 0.568 0.463 0.463 �0.233 �0.225 �0.001 0.578 0.395 0.445 �0.193 �0.248 0.017
100 0.271 0.218 0.220 -0.109 -0.108 0.000 0.332 0.263 0.253 -0.121 -0.164 -0.005
200 0.134 0.108 0.108 -0.055 -0.053 0.000 0.134 0.116 0.121 -0.048 -0.048 -0.010
0.4 1 50 0.045 0.037 0.036 �0.018 �0.017 0.000 0.040 0.037 0.037 �0.015 �0.018 0.000
100 0.022 0.019 0.018 �0.010 �0.008 0.000 0.019 0.019 0.017 �0.008 �0.007 0.000
200 0.011 0.008 0.008 �0.004 �0.004 0.000 0.008 0.010 0.008 �0.004 �0.004 0.001
2 50 0.148 0.122 0.123 �0.061 �0.057 0.000 0.204 0.143 0.152 �0.091 �0.085 0.007
100 0.075 0.061 0.061 -0.032 -0.029 0.000 0.076 0.066 0.076 -0.023 -0.029 -0.012
200 0.037 0.031 0.030 -0.015 -0.015 0.000 0.047 0.033 0.034 -0.019 -0.018 -0.001
4 50 0.535 0.445 0.446 -0.215 -0.212 -0.011 0.672 0.546 0.422 -0.319 -0.173 -0.034
100 0.250 0.205 0.204 -0.105 -0.098 0.002 0.288 0.206 0.208 -0.092 -0.122 0.005
200 0.127 0.105 0.105 �0.052 �0.050 �0.001 0.153 0.136 0.121 �0.081 �0.064 0.007
0.6 1 50 0.052 0.043 0.042 -0.022 -0.020 0.001 0.053 0.042 0.040 -0.018 -0.020 -0.001
100 0.025 0.021 0.021 -0.011 -0.010 0.000 0.023 0.020 0.023 -0.012 -0.012 0.002
200 0.012 0.011 0.010 -0.005 -0.004 0.000 0.013 0.013 0.008 -0.005 -0.004 -0.001
2 50 0.154 0.128 0.127 -0.064 -0.058 -0.001 0.176 0.147 0.148 -0.070 -0.092 0.004
100 0.075 0.061 0.061 �0.032 �0.029 0.000 0.075 0.058 0.072 �0.028 �0.039 0.002
200 0.037 0.031 0.031 �0.016 �0.014 0.000 0.046 0.031 0.041 �0.018 �0.023 0.004
4 50 0.505 0.417 0.421 -0.210 -0.195 -0.003 0.620 0.463 0.489 -0.232 -0.228 -0.024
100 0.232 0.192 0.193 -0.096 -0.088 -0.001 0.312 0.226 0.209 -0.128 -0.103 0.002
200 0.116 0.094 0.094 �0.048 �0.043 �0.001 0.164 0.107 0.113 �0.056 �0.061 �0.004
Values | GOLS,n | Gols,n
p0 s n | g00 g11 g22 g01 g02 g12 | g00 g11 g22 g01 g02 g12
0.2 1 50 0.035 0.022 0.030 �0.012 �0.015 0.000 0.049 0.032 0.038 �0.016 �0.022 0.003
100 0.023 0.016 0.013 �0.008 �0.008 0.000 0.029 0.019 0.017 �0.012 �0.011 0.001
200 0.008 0.007 0.007 �0.003 �0.004 0.000 0.012 0.010 0.011 �0.004 �0.006 0.001
2 50 0.156 0.127 0.111 -0.049 -0.063 -0.004 0.193 0.158 0.149 -0.064 -0.082 -0.005
100 0.074 0.055 0.052 -0.024 -0.023 -0.007 0.098 0.063 0.068 -0.027 -0.032 -0.007
200 0.037 0.032 0.029 -0.017 -0.017 0.002 0.046 0.034 0.036 -0.019 -0.021 0.002
4 50 0.559 0.388 0.427 �0.184 �0.223 0.011 0.710 0.520 0.551 �0.236 �0.292 0.016
100 0.332 0.263 0.255 -0.128 -0.161 -0.008 0.394 0.310 0.314 -0.134 -0.198 -0.013
200 0.136 0.118 0.120 -0.049 -0.049 -0.007 0.161 0.154 0.140 -0.059 -0.054 -0.019
0.4 1 50 0.033 0.032 0.028 �0.013 �0.015 0.001 0.059 0.048 0.049 �0.023 �0.025 0.002
100 0.016 0.016 0.014 �0.008 �0.005 0.000 0.029 0.029 0.022 �0.015 �0.010 0.000
200 0.007 0.007 0.007 �0.003 �0.003 0.000 0.013 0.012 0.012 �0.004 �0.006 0.000
2 50 0.184 0.121 0.138 �0.075 �0.080 0.004 0.345 0.252 0.222 �0.166 �0.131 0.020
100 0.066 0.056 0.067 -0.018 -0.027 -0.012 0.109 0.094 0.124 -0.033 -0.056 -0.007
200 0.040 0.031 0.031 -0.016 -0.015 -0.002 0.069 0.048 0.051 -0.029 -0.023 -0.001
4 50 0.599 0.524 0.400 -0.280 -0.144 -0.040 1.110 0.814 0.737 -0.404 -0.401 -0.032
100 0.296 0.199 0.224 -0.092 -0.130 0.007 0.530 0.323 0.376 -0.163 -0.209 -0.002
200 0.145 0.126 0.117 -0.074 -0.058 0.002 0.279 0.226 0.179 -0.139 -0.100 0.013
0.6 1 50 0.036 0.027 0.031 -0.013 -0.017 0.001 0.111 0.098 0.077 -0.052 -0.033 -0.002
100 0.016 0.014 0.017 -0.007 -0.008 0.000 0.049 0.039 0.042 -0.020 -0.021 0.000
200 0.007 0.007 0.006 -0.003 -0.002 0.000 0.025 0.020 0.017 -0.012 -0.008 -0.001
2 50 0.161 0.118 0.122 -0.061 -0.085 0.003 0.374 0.296 0.305 -0.144 -0.167 0.006
100 0.065 0.051 0.065 -0.021 -0.037 0.001 0.195 0.147 0.160 -0.067 -0.101 -0.001
(Continues)
Table 3a. (Continued)

Laplacian errors

Values | GOLS,n | Gols,n
p0 s n | g00 g11 g22 g01 g02 g12 | g00 g11 g22 g01 g02 g12
200 0.038 0.029 0.034 �0.016 �0.020 0.004 0.096 0.063 0.080 �0.036 �0.046 0.008
4 50 0.622 0.436 0.491 -0.234 -0.234 -0.005 1.429 1.304 1.247 -0.656 -0.358 -0.148
100 0.295 0.235 0.217 -0.125 -0.104 0.002 0.719 0.506 0.603 -0.255 -0.316 0.007
200 0.159 0.099 0.108 �0.058 �0.057 0.001 0.346 0.290 0.289 �0.143 �0.138 �0.013
Table 3b. Covariance matrices G*n, Gn, GOLS,n and Gols,n given in Section 5

Values | G*n | Gn
p0 s n | g00 g11 g22 g01 g02 g12 | g00 g11 g22 g01 g02 g12
Logistic errors
0.4 1 50 0.043 0.036 0.037 -0.018 -0.017 0.000 0.047 0.034 0.041 -0.019 -0.015 -0.001
100 0.022 0.018 0.018 -0.010 -0.008 0.000 0.021 0.019 0.016 -0.010 -0.008 0.001
200 0.011 0.008 0.008 �0.004 �0.004 0.000 0.011 0.010 0.010 �0.004 �0.004 0.000
2 50 0.149 0.124 0.127 �0.063 �0.056 �0.001 0.179 0.133 0.129 �0.081 �0.061 0.004
100 0.074 0.060 0.060 -0.031 -0.028 0.000 0.068 0.053 0.072 -0.022 -0.031 -0.005
200 0.035 0.029 0.029 -0.015 -0.014 0.000 0.042 0.040 0.031 -0.020 -0.016 -0.002
4 50 0.545 0.448 0.469 �0.219 �0.215 �0.008 0.621 0.512 0.477 �0.305 �0.211 0.001
100 0.257 0.209 0.213 -0.106 -0.101 -0.001 0.320 0.248 0.213 -0.126 -0.122 -0.002
200 0.126 0.103 0.104 -0.053 -0.048 -0.001 0.121 0.117 0.112 -0.050 -0.042 -0.014
Standard normal errors
0.4 1 50 0.045 0.037 0.037 �0.018 �0.018 0.000 0.048 0.032 0.042 �0.019 �0.020 0.003
100 0.022 0.018 0.018 -0.010 -0.008 0.000 0.029 0.022 0.016 -0.012 -0.010 -0.001
200 0.011 0.008 0.008 -0.004 -0.004 0.000 0.011 0.008 0.008 -0.005 -0.004 0.001
2 50 0.155 0.129 0.130 �0.065 �0.058 �0.002 0.164 0.112 0.126 �0.068 �0.077 0.013
100 0.074 0.060 0.061 �0.031 �0.029 �0.001 0.074 0.056 0.071 �0.027 �0.038 0.005
200 0.036 0.030 0.030 �0.015 �0.014 0.000 0.039 0.035 0.029 �0.018 �0.014 0.000
4 50 0.538 0.441 0.452 -0.228 -0.202 -0.013 0.527 0.561 0.430 -0.250 -0.129 -0.076
100 0.270 0.222 0.223 -0.113 -0.104 -0.001 0.262 0.227 0.199 -0.087 -0.090 -0.013
200 0.133 0.109 0.111 -0.055 -0.050 -0.001 0.147 0.139 0.117 -0.068 -0.051 -0.004
Values | GOLS,n | Gols,n

Logistic errors
0.4 1 50 0.034 0.029 0.029 -0.014 -0.011 -0.002 0.064 0.051 0.054 -0.029 -0.021 -0.001
100 0.018 0.017 0.014 -0.008 -0.007 0.001 0.028 0.027 0.023 -0.014 -0.008 0.001
200 0.008 0.007 0.007 �0.004 �0.003 0.000 0.015 0.012 0.013 �0.005 �0.006 0.000
2 50 0.162 0.120 0.110 �0.074 �0.051 0.004 0.303 0.237 0.194 �0.123 �0.101 0.011
100 0.057 0.045 0.065 -0.017 -0.028 -0.006 0.107 0.090 0.105 -0.039 -0.043 -0.013
200 0.038 0.036 0.028 -0.018 -0.016 -0.001 0.065 0.058 0.039 -0.027 -0.023 -0.005
4 50 0.582 0.502 0.449 -0.279 -0.205 0.000 1.028 0.866 0.806 -0.401 -0.393 -0.041
100 0.319 0.243 0.211 -0.118 -0.125 0.001 0.604 0.374 0.394 -0.226 -0.216 0.011
200 0.120 0.113 0.109 -0.050 -0.045 -0.010 0.201 0.204 0.201 -0.071 -0.077 -0.022

Standard normal errors
0.4 1 50 0.039 0.024 0.030 �0.014 �0.018 0.002 0.058 0.048 0.057 �0.023 �0.028 0.004
100 0.022 0.017 0.014 -0.008 -0.007 -0.001 0.040 0.027 0.023 -0.015 -0.016 -0.002
200 0.008 0.007 0.006 -0.004 -0.003 0.001 0.014 0.012 0.012 -0.007 -0.005 0.001
2 50 0.148 0.101 0.107 �0.060 �0.065 0.010 0.265 0.189 0.224 �0.127 �0.118 0.022
100 0.068 0.054 0.064 -0.027 -0.033 0.004 0.096 0.090 0.100 -0.032 -0.050 -0.007
200 0.034 0.031 0.025 -0.015 -0.012 0.001 0.063 0.054 0.043 -0.031 -0.024 0.004
4 50 0.515 0.530 0.420 -0.248 -0.152 -0.042 0.970 0.904 0.745 -0.393 -0.287 -0.060
100 0.259 0.224 0.197 -0.082 -0.086 -0.023 0.475 0.389 0.318 -0.179 -0.156 0.001
200 0.145 0.138 0.114 �0.071 �0.047 �0.006 0.231 0.196 0.216 �0.096 �0.098 0.002
wileyonlinelibrary.com/journal/environmetrics Copyright � 2009 John Wiley & Sons, Ltd. Environmetrics 2011; 22: 132–151
C. RIVERO AND T. VALDES
144
Table 4. Empirical coverage probabilities of the confidence intervals at the level of 95%
Coverage probabilities of the algorithm confidence intervals of the slope estimates

Values | Laplacian errors | Logistic errors | Standard normal errors
p0 s n | bn0 bn1 bn2 | bn0 bn1 bn2 | bn0 bn1 bn2
0.2 1 50 0.960 0.980 0.954 0.965 0.965 0.949 0.934 0.939 0.954
100 0.919 0.949 0.965 0.919 0.934 0.939 0.970 0.960 0.960
200 0.970 0.954 0.939 0.960 0.965 0.965 0.929 0.954 0.980
2 50 0.960 0.960 0.975 0.975 0.939 0.954 0.954 0.960 0.960
100 0.954 0.965 0.970 0.949 0.970 0.944 0.954 0.939 0.975
200 0.949 0.944 0.965 0.954 0.965 0.954 0.929 0.970 0.944
4 50 0.960 0.990 0.944 0.949 0.924 0.965 0.934 0.965 0.929
100 0.929 0.929 0.919 0.934 0.944 0.919 0.949 0.944 0.924
200 0.985 0.975 0.949 0.929 0.949 0.954 0.975 0.954 0.954
0.4 1 50 0.980 0.960 0.954 0.949 0.965 0.939 0.934 0.980 0.939
100 0.960 0.965 0.980 0.949 0.970 0.970 0.919 0.939 0.975
200 0.965 0.954 0.970 0.960 0.944 0.944 0.970 0.949 0.954
2 50 0.924 0.929 0.929 0.909 0.934 0.965 0.949 0.965 0.970
100 0.960 0.934 0.914 0.980 0.970 0.939 0.939 0.975 0.934
200 0.919 0.949 0.939 0.929 0.919 0.960 0.960 0.939 0.949
4 50 0.924 0.929 0.949 0.934 0.960 0.944 0.944 0.939 0.970
100 0.934 0.960 0.934 0.954 0.929 0.934 0.939 0.975 0.975
200 0.909 0.914 0.944 0.975 0.949 0.954 0.965 0.899 0.960
0.6 1 50 0.949 0.934 0.965 0.960 0.944 0.965 0.944 0.944 0.954
100 0.954 0.970 0.934 0.975 0.975 0.924 0.980 0.944 0.985
200 0.960 0.929 0.970 0.954 0.949 0.960 0.975 0.939 0.949
2 50 0.909 0.939 0.939 0.929 0.944 0.934 0.929 0.975 0.944
100 0.954 0.975 0.965 0.975 0.944 0.919 0.960 0.944 0.960
200 0.904 0.960 0.919 0.919 0.944 0.949 0.939 0.965 0.909
4 50 0.934 0.944 0.914 0.919 0.929 0.949 0.939 0.985 0.904
100 0.914 0.944 0.960 0.894 0.944 0.939 0.894 0.939 0.939
200 0.889 0.949 0.939 0.924 0.949 0.929 0.914 0.954 0.929
nine ungrouped data that form part of the initial sub-sample of size 16, as suggested in the description of the algorithm. For their part, the remaining initial points were chosen rather far from the first.

Table 5 shows, as was indicated, that all of the sequences mentioned above quickly stabilize towards a limit point which is independent of the algorithm's initial values. It can be seen that the $(\mu_A, \mu_T, \gamma, s)$-estimate based on the complete 48 observations (grouped and ungrouped) was

$$\big(\mu^{48}_A, \mu^{48}_T, \gamma^{48}, s^{48}\big) = (128.37,\ 120.66,\ -89.72,\ 16.52).$$
Also, we have used (22) to compute

$$48^{-1}\Lambda_{48} = \begin{pmatrix} 21.33 & 11.56 & -19.84 \\ & 19.86 & -19.23 \\ & & 32.97 \end{pmatrix},$$

the approximate covariance matrix of $(\mu^{48}_A, \mu^{48}_T, \gamma^{48})$ given in (21). Finally, we used (23) to calculate the 95% confidence intervals for $\mu_A$, $\mu_T$ and $\gamma$, which turned out to be

$$[119.32,\ 137.42], \quad [111.93,\ 129.40] \quad \text{and} \quad [-100.97,\ -78.46],$$

respectively, from which individual hypothesis tests on these parameters follow straightaway. For its part, to test null linear hypotheses of the
form $H_0\colon Ab = 0$ at a given $\alpha$-level, it is sufficient to use the critical region shown in (24). In particular, the null hypothesis $H_0\colon \mu_A = \mu_T$ corresponds to the matrix $A = (1, -1, 0)$ of rank one and, at the significance level 0.05, the critical region (24) based on the complete sample is

$$R^{48}_{0.05} = \Big\{\, b^{48} : 48\,\big(\mu^{48}_A - \mu^{48}_T\big)\big(l^{48}_{11} + l^{48}_{22} - 2\,l^{48}_{12}\big)^{-1}\big(\mu^{48}_A - \mu^{48}_T\big) > 3.84 \,\Big\},$$

since $\chi^2_1(0.05) = 3.84$; therefore, $H_0$ is accepted at the level cited above (the value of the statistic within $R^{48}_{0.05}$ being 3.29). The similar analysis based on the suggested initial guess $(b^{16}, s^{16}) = (123.919, 117.788, -74.415, 14.788)$ and the sample of size 32 yields the following
Table 5. Sequences $(\mu^n_A, \mu^n_T, \gamma^n, s^n)$ generated by the proposed algorithm from different starting points and n0 = 16 (with data of Table 1)

Initial values $(\mu^{16}_A, \mu^{16}_T, \gamma^{16}, s^{16})$: (123.919, 117.885, -74.415, 14.788) | (0, 0, 0, 1) | (-40, 40, -40, 320)

n | start 1 | start 2 | start 3
17 | 127.218 121.929 -87.777 17.430 | 125.234 120.210 -88.706 92.936 | 148.342 171.933 -127.502 162.072
18 | 126.095 119.416 -85.897 16.247 | 130.550 129.833 -94.277 30.004 | 135.613 141.413 -103.272 50.612
19 | 128.213 118.909 -89.593 16.462 | 129.259 121.052 -91.740 18.974 | 130.911 124.570 -94.831 23.936
20 | 127.888 120.611 -89.027 16.551 | 128.073 120.963 -89.435 16.783 | 128.424 121.646 -90.175 17.365
21 | 130.146 121.604 -90.928 16.532 | 130.190 121.648 -90.989 16.563 | 130.300 121.759 -91.141 16.649
22 | 127.100 120.275 -88.375 16.703 | 127.106 120.281 -88.383 16.710 | 127.123 120.298 -88.406 16.730
23 | 124.939 119.353 -86.584 16.613 | 124.940 119.354 -86.585 16.614 | 124.943 119.358 -86.590 16.618
24 | 124.768 120.529 -86.211 16.489 | 124.768 120.529 -86.212 16.490 | 124.768 120.530 -86.212 16.490
25 | 124.766 120.160 -85.523 16.073 | 124.766 120.160 -85.523 16.073 | 124.766 120.160 -85.523 16.073

For n = 26, ..., 48 the three sequences coincide:
26 | 124.704 120.109 -85.468 15.642
27 | 124.739 120.007 -85.564 15.244
28 | 125.931 119.529 -87.846 15.673
29 | 126.792 119.462 -87.711 15.571
30 | 126.777 120.672 -87.689 15.607
31 | 127.330 121.886 -88.724 15.547
32 | 126.606 120.300 -87.371 15.481
33 | 127.498 120.645 -88.070 15.390
34 | 128.267 120.490 -87.785 15.374
35 | 129.149 120.322 -87.479 15.469
36 | 129.754 120.265 -88.591 15.367
37 | 130.121 120.187 -89.280 15.183
38 | 129.431 118.646 -87.946 15.147
39 | 127.735 117.848 -86.544 15.224
40 | 127.512 119.177 -88.897 15.729
41 | 127.436 120.498 -91.189 16.207
42 | 127.459 121.517 -91.206 16.315
43 | 128.368 121.450 -91.055 16.478
44 | 127.505 119.463 -89.427 16.692
45 | 128.132 119.387 -89.297 16.666
46 | 128.074 120.008 -89.173 16.624
47 | 128.440 119.961 -89.845 16.486
48 | 128.371 120.662 -89.719 16.522
wileyonlinelibrary.com/journal/environmetrics Copyright � 2009 John Wiley & Sons, Ltd. Environmetrics 2011; 22: 132–151
estimates:

$$\left(m_A^{16}, m_T^{16}, g^{16}, s^{16}\right) = (123.92,\ 117.79,\ -74.41,\ 14.79), \qquad \left(m_A^{32}, m_T^{32}, g^{32}, s^{32}\right) = (126.61,\ 120.30,\ -87.37,\ 15.48),$$

$$16^{-1}L^{16} = \begin{pmatrix} 45.41 & 20.96 & -43.69 \\ 20.96 & 39.69 & -33.19 \\ -43.69 & -33.19 & 69.48 \end{pmatrix} \quad\text{and}\quad 32^{-1}L^{32} = \begin{pmatrix} 23.87 & 11.91 & -21.42 \\ 11.91 & 24.61 & -22.79 \\ -21.42 & -22.79 & 40.97 \end{pmatrix}$$

(the entries below the diagonal follow by symmetry). From these it can be verified, for example, that at the significance level 0.05 the hypothesis $H_0\colon m_A = m_T$ is accepted in both cases, since the values of the statistics that form part of $R_{0.05}^{16}$ and $R_{0.05}^{32}$ are 0.87 and 1.61, respectively.
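As an illustration, the two statistic values just quoted can be recovered directly from the reported estimates and covariance matrices. The sketch below assumes the full covariance matrices are the symmetric completions of the upper triangles reported above, and computes the Wald statistic for the contrast $m_A - m_T$:

```python
import numpy as np

# Reported estimates of (mA, mT, g) for n = 16 and n = 32, and the
# approximate covariance matrices n^{-1} L^n (symmetric completions).
est16 = np.array([123.92, 117.79, -74.41])
cov16 = np.array([[45.41, 20.96, -43.69],
                  [20.96, 39.69, -33.19],
                  [-43.69, -33.19, 69.48]])
est32 = np.array([126.61, 120.30, -87.37])
cov32 = np.array([[23.87, 11.91, -21.42],
                  [11.91, 24.61, -22.79],
                  [-21.42, -22.79, 40.97]])

a = np.array([1.0, -1.0, 0.0])  # contrast encoding H0: mA = mT

def wald_stat(est, cov, a):
    """Wald statistic (a'b)^2 / (a' Cov a), asymptotically chi2(1) under H0."""
    return float((a @ est) ** 2 / (a @ cov @ a))

print(round(wald_stat(est16, cov16, a), 2))  # 0.87
print(round(wald_stat(est32, cov32, a), 2))  # 1.61
```

Both values fall below the chi-square critical value 3.84, which is why $H_0$ is accepted for both sample sizes.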
For its part, the analysis of model (4) was tackled by means of its auxiliary reformulation $y_{ijh} = b_{ij} + s\,e_{ijh}$ ($i = 100, 600, 1000$ and $j = A, T$), the slope parameters of which determine the free slope parameters of (4) through the equations

$$h = b_{\cdot\cdot} = 6^{-1}\sum_{ij} b_{ij}, \quad D_{100} = b_{100\cdot} - b_{\cdot\cdot} = 2^{-1}(b_{100A} + b_{100T}) - b_{\cdot\cdot}, \quad D_{600} = b_{600\cdot} - b_{\cdot\cdot}, \quad P_A = b_{\cdot A} - b_{\cdot\cdot},$$
$$DP_{100A} = b_{100A} - b_{100\cdot} - b_{\cdot A} + b_{\cdot\cdot} \quad\text{and}\quad DP_{600T} = b_{600T} - b_{600\cdot} - b_{\cdot T} + b_{\cdot\cdot}.$$

For $n = 16, \ldots, 48$, the proposed algorithm first supplies the estimates $s^n$, $b_{ij}^n$ and $n^{-1}L^n$, this last element being the approximate covariance matrix of the $b_{ij}$-estimates. We can then use this matrix and (24) to test the typical ANOVA null hypotheses which identify the absence of main effects or interactions; in particular, (1) distance main effects, $H_0^{(1)}\colon D_{100} = D_{600} = D_{1000} = 0$; (2) power-station main effects, $H_0^{(2)}\colon P_A = P_T = 0$; and (3) interaction between distance and power station, $H_0^{(3)}\colon DP_{ij} = 0$ for all possible values of $(i, j)$. It suffices to note that, after denoting $b = (b_{100A}, b_{600A}, b_{1000A}, b_{100T}, b_{600T}, b_{1000T})'$, these three null hypotheses are linear and equivalent to $H_0^{(1)}\colon A_1 b = 0$, $H_0^{(2)}\colon A_2 b = 0$ and $H_0^{(3)}\colon A_3 b = 0$ for the particular matrices

$$A_1 = 6^{-1}\begin{pmatrix} 2 & -1 & -1 & 2 & -1 & -1 \\ -1 & 2 & -1 & -1 & 2 & -1 \end{pmatrix}, \quad A_2 = (1, 1, 1, -1, -1, -1) \quad\text{and}\quad A_3 = 6^{-1}\begin{pmatrix} 2 & -1 & -1 & -2 & 1 & 1 \\ -1 & 2 & -1 & 1 & -2 & 1 \end{pmatrix}.$$
With the complete sample, the auxiliary parameter estimates were $s^{48} = 8.78$ and $b^{48} = (110.26, 91.86, 28.28, 102.56, 84.18, 21.85)'$. The linear combinations of $b^{48}$ which form part of the null hypotheses $H_0^{(1)}$, $H_0^{(2)}$ and $H_0^{(3)}$ were $A_1 b^{48} = (33.25, 14.86)'$, $A_2 b^{48} = 21.81$ and $A_3 b^{48} = (0.21, 0.20)'$.

Finally, the observed values of $48\,b^{48\prime} A'\!\left(A L^{48} A'\right)^{-1}\! A b^{48}$ which, in accordance with (24), need to be computed to find out whether $b^{48}$ belongs to the critical regions associated with the hypotheses $H_0^{(1)}$, $H_0^{(2)}$ and $H_0^{(3)}$ were 957.13, 9.75 and 0.07, respectively. At the $\alpha$-level of 0.05, these values must be compared with $\chi_2^2(0.05) = 5.99$, $\chi_1^2(0.05) = 3.84$ and $\chi_2^2(0.05) = 5.99$, respectively. Therefore, at this significance level, the hypotheses of null distance and power-station main effects are rejected and the hypothesis of null interactions is accepted. These conclusions agree completely with those derived from the sub-sample of size 32, for which the values of the statistic $32\,b^{32\prime} A'\!\left(A L^{32} A'\right)^{-1}\! A b^{32}$ were 542.67, 7.24 and 0.90 for $H_0^{(1)}$, $H_0^{(2)}$ and $H_0^{(3)}$, respectively. For its part, the equivalent values for the sample size 16 were 290.96, 1.42 and 0.29, so the conclusion about $H_0^{(2)}$ differs when using the suggested initial guesses $b^{16}$ and $s^{16}$ (the OLS estimates based on the non-grouped data of the first third of the sample, as was indicated).
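The reported linear combinations can be checked directly from $b^{48}$ and the contrast matrices; a minimal numpy sketch:

```python
import numpy as np

# Auxiliary slope estimates b^48 = (b100A, b600A, b1000A, b100T, b600T, b1000T)'
b48 = np.array([110.26, 91.86, 28.28, 102.56, 84.18, 21.85])

# Contrast matrices of the three ANOVA null hypotheses
A1 = np.array([[2, -1, -1, 2, -1, -1],
               [-1, 2, -1, -1, 2, -1]]) / 6.0      # distance main effects
A2 = np.array([1, 1, 1, -1, -1, -1], dtype=float)  # power-station main effect
A3 = np.array([[2, -1, -1, -2, 1, 1],
               [-1, 2, -1, 1, -2, 1]]) / 6.0       # interactions

print(A1 @ b48)   # approx (33.25, 14.86), as reported
print(A2 @ b48)   # approx 21.81
print(A3 @ b48)   # approx (0.21, 0.20)
```

The quadratic-form statistics themselves would additionally require the matrix $L^{48}$, which is not reproduced here.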
The natural alternatives to the proposed algorithm are the maximum likelihood procedures. Thinking, for simplicity's sake, in terms of model (3) and denoting $\phi = (b, s)$ with $b = (m_A, m_T, g)$, it holds that, for a given sample size $n$, the log-likelihood function (under the robust conditions considered in this paper) agrees with the integral function

$$l(\phi) = \sum_{i \in I_n^u} \log\!\left[s^{-1} f\!\left(\frac{y_i - x_i'b}{s}\right)\right] + \sum_{i \in I_n^g} \log \sum_{h=1}^{r} I(c_{h-1} < y_i \le c_h) \int_{(-x_i'b + c_{h-1})s^{-1}}^{(-x_i'b + c_h)s^{-1}} f(x)\,dx. \tag{30}$$

Its maximization can be carried out either directly, typically using the Newton-Raphson algorithm because of its quadratic convergence, or via the EM algorithm, given its widespread use with incomplete data. If the Newton-Raphson procedure is employed, it starts from an initial guess of $\phi$ (let $\phi^{(0)}$ denote this) and then, assuming that the current $\phi$-estimate is $\phi^{(k)}$, the iteration equation is defined by

$$\phi^{(k+1)} = \phi^{(k)} + I^{-1}\!\left(\phi^{(k)}\right) S\!\left(\phi^{(k)}\right), \tag{31}$$

where the vector $S(\phi)$ and the matrix $I(\phi)$ are, respectively, the gradient and the negative of the matrix of second-order partial derivatives of the log-likelihood function with respect to the elements of $\phi$, that is,

$$S(\phi) = \frac{\partial l(\phi)}{\partial \phi}$$
and

$$I(\phi) = -\frac{\partial^2 l(\phi)}{\partial \phi\,\partial \phi'}.$$
According to this, if the data are received sequentially and we want to advance partial estimations, as maintained in this paper, the iteration (31) needs to be conceived as nested in a primary iteration over the sample size. Therefore, the sequential-data-reception version of the Newton-Raphson procedure is formalized as follows (from a given initial sample size $n_0$):

The Newton-Raphson algorithm adapted to sequential data reception.

Initialization: Let $n_0$ and $\phi^{n_0}$ be a starting sample size and a starting guess of $\phi$, respectively.

Iteration:

Step 1: Assuming that the current sample size, $n-1$, and $\phi$-value, $\phi^{n-1}$, are known, update $\phi^{n-1}$ by the following iterative process nested in the former one (as a new observation is received):

    Initialization: Take an initial $\phi^{(0),n}$ (a plausible option is $\phi^{(0),n} = \phi^{n-1}$, to take advantage of the previous estimate).

    Iteration: Assuming that $\phi^{(k-1),n}$ is known, update it through the following steps:

        Step 1.1: $\phi^{(k),n} = \phi^{(k-1),n} + I^{-1}\!\left(\phi^{(k-1),n}\right) S\!\left(\phi^{(k-1),n}\right)$ (32)

        Step 1.2: $k-1 \leftarrow k$, and return to Step 1.1 until convergence is achieved; let $\phi^{(\infty),n}$ denote the limit point.

Step 2: Define $\phi^{n} = \phi^{(\infty),n}$.

Step 3: $n-1 \leftarrow n$, and return to Step 1.
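The nested structure of this scheme can be sketched in a few lines. The block below is a generic illustration, not the paper's implementation: the function names are hypothetical, and the toy score and information correspond to maximum likelihood for the mean of a unit-variance normal (chosen only so that the sketch is self-checking), not to the likelihood (30).

```python
import numpy as np

def sequential_newton_raphson(stream, score, info, phi0, tol=1e-8, max_iter=100):
    """Newton-Raphson nested in a primary iteration over the sample size:
    each time an observation arrives, the inner loop re-maximizes the full
    log-likelihood, warm-started at the previous estimate phi^{n-1}."""
    data, phi = [], np.atleast_1d(np.asarray(phi0, dtype=float))
    estimates = []
    for obs in stream:                      # primary iteration: n-1 -> n
        data.append(obs)
        for _ in range(max_iter):           # secondary (inner) iteration
            step = np.linalg.solve(np.atleast_2d(info(phi, data)),
                                   np.atleast_1d(score(phi, data)))
            phi = phi + step                # Step 1.1: phi + I^{-1} S
            if np.max(np.abs(step)) < tol:  # Step 1.2: stop at convergence
                break
        estimates.append(phi.copy())        # phi^n = phi^(infinity),n
    return estimates

# Toy case: score S(mu) = sum(y - mu), observed information I(mu) = n.
score = lambda phi, data: np.array([sum(y - phi[0] for y in data)])
info = lambda phi, data: np.array([[float(len(data))]])
ests = sequential_newton_raphson([1.0, 3.0, 2.0, 2.0], score, info, phi0=[0.0])
print(ests[-1][0])   # 2.0, the sample mean
```

Note that every arrival triggers a full inner maximization over the complete data set, which is precisely the computational burden the proposed algorithm avoids.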
The analogous version of the EM algorithm simply substitutes the former Step 1.1 with the concatenation of the E and M steps of the EM (see, for example, McLachlan and Krishnan, 1997: p. 22). The E step requires us to compute

$$Q\!\left(\phi, \phi^{(k-1),n}\right) = -n \log s + \sum_{i \in I_n^u} \log f\!\left(\frac{y_i - x_i'b}{s}\right) + \sum_{i \in I_n^g} \sum_{h=1}^{r} I(c_{h-1} < y_i \le c_h)\, E\!\left[\log f\!\left(\frac{y_i - x_i'b}{s}\right) \,\Big|\, \phi^{(k-1),n},\, c_{h-1} < y_i \le c_h\right] \tag{33}$$

where

$$E\!\left[\log f\!\left(\frac{y_i - x_i'b}{s}\right) \,\Big|\, \phi^{(k-1),n},\, c_{h-1} < y_i \le c_h\right] = \frac{\displaystyle\int_{c_{h-1}}^{c_h} \log f\!\left(\frac{t - x_i'b}{s}\right) f\!\left(\frac{t - x_i'b^{(k-1),n}}{s^{(k-1),n}}\right) dt}{\displaystyle\int_{c_{h-1}}^{c_h} f\!\left(\frac{t - x_i'b^{(k-1),n}}{s^{(k-1),n}}\right) dt},$$
whereas the M step updates the current $\phi$-values of the secondary iteration with the maximizer of $Q(\phi, \phi^{(k-1),n})$ as a function of $\phi$. Therefore, Step 1.1 of the EM algorithm adapted to sequential data reception is

$$\text{Step 1.1:}\quad \phi^{(k),n} = \operatorname*{argmax}_{\phi}\, Q\!\left(\phi, \phi^{(k-1),n}\right). \tag{34}$$
The proposed algorithm avoids the secondary iteration (defined by the former Steps 1.1 and 1.2), which is replaced by a single computation of the simple expressions (17), (18) and (19). This by itself represents a clear computational advantage of the proposed algorithm over the sequential adaptations of the Newton-Raphson and EM algorithms. There exist many other technical advantages supporting our algorithm; we have annexed them at the end of the paper for interested readers.
7. CONCLUSIONS AND FINAL COMMENTS
This paper presents an algorithm for linear model estimation and inference under robust conditions, which include the sequential reception of the data and the need to advance partial statistical analyses, the existence of grouped or missing data, and the possibility of general errors, not necessarily normal or symmetrical, with null means and unknown variances. As the sample size increases, the parameter estimate sequences generated by the algorithm stabilize towards a point which does not depend on the starting point. Additionally, for a fixed sample size, the asymptotic covariance matrix of the slope estimates can be consistently estimated by means of an explicit expression which is easy to implement computationally. In this sense the proposed algorithm presents a clear advantage over the maximum likelihood procedures, its natural competitors. With those procedures, implemented either directly or through the EM algorithm, the computation of the asymptotic covariance matrix of the estimates entails the evaluation of the Hessian matrix of the log-likelihood integral function associated with the robust conditions mentioned above, which can only be done numerically through quite unstable methods. This advantage, and others regarding the computational complexity and the memory capacity of the procedures, justifies our proposal. A last point, in addition to its simplicity, merits emphasis: the large variety of simulations carried out corroborates that, if the linear model is well specified, the capacity of the proposed algorithm for treating the incomplete data situations considered is remarkable. In fact, the statistical accuracy of our estimates (in terms of biases and mean square errors) is similar to that of the OLS estimates with complete data. As can be seen from Table 2, this occurs for all of the combinations of error distributions, proportions of grouped data, values of the scale parameter and sample sizes displayed there.
Finally, a brief computational observation brings the paper to a close, concerning our computer implementation of the algorithm. It was written in MATLAB and its source code is available from the authors on request.
Acknowledgements
This paper springs from research partially funded by MEC under grant MTM2004-05776.
REFERENCES

An, MY. 1998. Logconcavity versus logconvexity: a complete characterization. Journal of Economic Theory 80: 350-369.
Anido, C, Rivero, C, Valdes, T. 2000. Modal iterative estimation in linear models with unihumped errors and non-grouped and grouped data collected from different sources. Test 9: 393-416.
Dempster, AP, Laird, NM, Rubin, DB. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society B 39: 1-22.
Fahrmeir, L, Kaufmann, H. 1985. Consistency and asymptotic normality of the maximum likelihood estimator in generalized linear models. Annals of Statistics 13: 342-368.
Healy, MJR, Westmacott, M. 1956. Missing values in experiments analysed on automatic computers. Applied Statistics 5: 203-206.
James, IR, Smith, PJ. 1984. Consistency results for linear regression with censored data. Annals of Statistics 12: 590-600.
Louis, TA. 1982. Finding observed information using the EM algorithm. Journal of the Royal Statistical Society B 44: 98-130.
McLachlan, GJ, Krishnan, T. 1997. The EM Algorithm and Extensions. Wiley: New York.
Meilijson, I. 1989. A fast improvement of the EM algorithm on its own terms. Journal of the Royal Statistical Society B 51: 127-138.
Meng, XL, Rubin, DB. 1991. Using EM to obtain asymptotic variance-covariance matrices. Journal of the American Statistical Association 86: 899-909.
Orchard, T, Woodbury, MA. 1972. A missing information principle: theory and applications. Proceedings of the 6th Berkeley Symposium on Mathematical Statistics, Vol. I: 697-715.
Ritov, Y. 1990. Estimation in a linear regression model with censored data. Annals of Statistics 18: 303-328.
Rivero, C, Valdes, T. 2004. Mean based iterative procedures in linear models with general errors and grouped data. Scandinavian Journal of Statistics 31: 469-486.
8. ANNEXE: TECHNICAL ADVANTAGES OF THE PROPOSED ALGORITHM COMPARED TO ITS SEQUENTIAL ALTERNATIVES

In addition to the clear computational advantage of the proposed algorithm (over the sequential adaptations of the Newton-Raphson and EM algorithms), which was commented on at the end of Section 6, many other technical reasons support it. In particular:

(a) A sole iteration of Step 1.1 of the Newton-Raphson version is more complicated to compute than the expressions (17), (18) and (19), since it requires the numerical evaluation of the first and second derivatives of the log-likelihood (30).

(b) The same occurs with the EM version. With general error distributions, the computational complexity of a sole iteration of (34) greatly surpasses that of the expressions (17), (18) and (19). Only when the errors are normally distributed does there exist a certain likeness between Step 1.1 of the EM and the expressions cited above. In this respect, if the $e_i$-distribution is standard normal, it can be verified that (33) essentially agrees with
$$Q\!\left(\phi, \phi^{(k-1),n}\right) = -n \log s - \frac{1}{2s^2}\left\{ \sum_{i \in I_n^u} \left(y_i - x_i'b\right)^2 + \sum_{i \in I_n^g} E^*\!\left[\left(y_i - x_i'b\right)^2 \,\big|\, \phi^{(k-1),n}\right] \right\}$$
where $E^*[r(y_i)\,|\,\phi^{(k-1),n}] = E[r(y_i)\,|\,\phi^{(k-1),n},\, c_{h-1} < y_i \le c_h]$ if $(c_{h-1}, c_h]$ is the actual grouping interval which overlaps the grouped observation $y_i$. The first-order optimality condition

$$s^2\,\frac{\partial Q\!\left(\phi, \phi^{(k-1),n}\right)}{\partial b} = \sum_{i \in I_n^u} \left(y_i - x_i'b\right) x_i + \sum_{i \in I_n^g} \left( E^*\!\left[y_i \,\big|\, \phi^{(k-1),n}\right] - x_i'b \right) x_i = 0$$

clearly indicates that, to update the current $b$-values of the secondary iteration, we need first to impute each grouped $y_i$-observation with its conditional expectation $E^*(y_i\,|\,\phi^{(k-1),n})$ and, secondly, to apply least squares to both the ungrouped data and the imputations mentioned above. With the notation used in Section 4 when describing the proposed robust estimating algorithm, this is equivalent to

$$b^{(k),n} = \left(X_n'X_n\right)^{-1} X_n'\, y^n\!\left(b^{(k-1),n}, s^{(k-1),n}\right) \tag{35}$$
which resembles (18). Finally, from (35) and

$$\frac{\partial Q\!\left(\phi, \phi^{(k-1),n}\right)}{\partial s} = 0,$$

one can easily verify that the $s$-updating equation of the secondary loops is

$$s^{(k),n} = \left[ n^{-1}\left( \sum_{i \in I_n^u} \left(y_i - x_i'b^{(k),n}\right)^2 + \sum_{i \in I_n^g} E^*\!\left[\left(y_i - x_i'b^{(k),n}\right)^2 \,\big|\, \phi^{(k-1),n}\right] \right) \right]^{1/2} \tag{36}$$

where the conditional expectation on the right can be calculated from

$$E^*\!\left[\left(y_i - x_i'b^{(k),n}\right)^2 \,\big|\, \phi^{(k-1),n}\right] = E^*\!\left[\left(y_i - x_i'b^{(k-1),n}\right)^2 \,\big|\, \phi^{(k-1),n}\right] + \left[x_i'\!\left(b^{(k),n} - b^{(k-1),n}\right)\right]^2 - 2\,x_i'\!\left(b^{(k),n} - b^{(k-1),n}\right) E^*\!\left[y_i - x_i'b^{(k-1),n} \,\big|\, \phi^{(k-1),n}\right] \tag{37}$$

and the following easily verifiable equalities:

$$E^*\!\left[\left(y_i - x_i'b^{(k-1),n}\right)^2 \,\big|\, \phi^{(k-1),n}\right] = \left(s^{(k-1),n}\right)^2 \left[ 1 - \frac{m_{ih}^{(k-1),n}\,\varphi\!\left(m_{ih}^{(k-1),n}\right) - m_{ih-1}^{(k-1),n}\,\varphi\!\left(m_{ih-1}^{(k-1),n}\right)}{\Phi\!\left(m_{ih}^{(k-1),n}\right) - \Phi\!\left(m_{ih-1}^{(k-1),n}\right)} \right] \tag{38}$$

and

$$E^*\!\left[y_i - x_i'b^{(k-1),n} \,\big|\, \phi^{(k-1),n}\right] = -s^{(k-1),n}\,\frac{\varphi\!\left(m_{ih}^{(k-1),n}\right) - \varphi\!\left(m_{ih-1}^{(k-1),n}\right)}{\Phi\!\left(m_{ih}^{(k-1),n}\right) - \Phi\!\left(m_{ih-1}^{(k-1),n}\right)}, \tag{39}$$

in which $\varphi$ and $\Phi$ denote the density and distribution functions of the standard normal, respectively, $m_{ih}^{(k-1),n} = \left(-x_i'b^{(k-1),n} + c_h\right)/s^{(k-1),n}$, $m_{ih-1}^{(k-1),n}$ is defined similarly with $c_{h-1}$ instead of $c_h$, and $(c_{h-1}, c_h]$ is the actual grouping interval of the grouped observation $y_i$. Let us take into account that
with normal errors (19) adopts the form

$$s^n = \left[ n^{-1}\left( \sum_{i \in I_n^u} \left(y_i - x_i'b^n\right)^2 + \left(s^{n-1}\right)^2 \sum_{i \in I_n^g} \left[ 1 - \frac{l_{ih}^n\,\varphi\!\left(l_{ih}^n\right) - l_{ih-1}^n\,\varphi\!\left(l_{ih-1}^n\right)}{\Phi\!\left(l_{ih}^n\right) - \Phi\!\left(l_{ih-1}^n\right)} \right] \right) \right]^{1/2}, \tag{40}$$

where $l_{ih}^n = \left(-x_i'b^n + c_h\right)/s^{n-1}$, $l_{ih-1}^n$ is defined in a similar way with $c_h$ substituted by $c_{h-1}$, and $(c_{h-1}, c_h]$ is, as before, the actual grouping interval of the grouped observation $y_i$. Therefore, the expressions (36) and (19) admit an explicit form and, although (36) is slightly more complicated than (19), their computational complexities are of similar orders. It follows from (35) and (36), likened with (18) and (19), respectively, that if, in the sequential-data-reception version of the EM with normal errors, one decided to execute a single loop of the secondary iteration, the resulting hypothetical algorithm would have a complexity similar to that of the one proposed in this paper, although the two do not agree. In this sense, if $\phi^{*n} = (b^{*n}, s^{*n})$ denotes the sequence generated by this hypothetical algorithm, its updating equations would be (from (35) and (36))
$$b^{*n} = \left(X_n'X_n\right)^{-1} X_n'\, y^n\!\left(b^{*\,n-1}, s^{*\,n-1}\right), \tag{41}$$

which agrees with (18), and

$$s^{*n} = \left[ n^{-1}\left( \sum_{i \in I_n^u} \left(y_i - x_i'b^{*n}\right)^2 + \sum_{i \in I_n^g} E^*\!\left[\left(y_i - x_i'b^{*n}\right)^2 \,\big|\, \phi^{*\,n-1}\right] \right) \right]^{1/2} \tag{42}$$

where the conditional expectation on the right can be computed from the expressions (37), (38) and (39), conveniently adapted. Although (42) differs from (and is slightly more complicated than) (40), the normal-error version of (19), the most relevant drawback of the hypothetical EM version concerns the asymptotic distribution of its estimates, $b^{*n}$, as $n \to \infty$, which, in our opinion, is inscrutable and without which no inferences about the true slope parameter can be made. For its part, if the error distribution is non-normal, the similarity between Step 1.1 of the sequential-data-reception version of the EM and the expressions (18) and (19) of the proposed algorithm not only disappears but, additionally, the E and M steps of the EM do not usually admit a closed form. In this case, the E step requires the use of quadrature techniques, whereas the maximization involved in the M step has to be carried out numerically (through the Newton-Raphson procedure or alternative algorithms). This last point implies that, besides the secondary iteration of the sequential-data-reception version of the EM, a third nested iteration usually needs to be added to compute Step 1.1. For example, with the Laplacian errors typically assumed for the radiological data of Table 1, as was said, it can be verified that

$$Q\!\left(\phi, \phi^{(k-1),n}\right) = -n \log s - 2^{-1/2} s^{-1} \left[ \sum_{i \in I_n^u} \left|y_i - x_i'b\right| + \sum_{i \in I_n^g} E^*\!\left( \left|y_i - x_i'b\right| \,\big|\, \phi^{(k-1),n} \right) \right],$$

the maximization of which does not admit an explicit form. Thus, the third nested iteration mentioned above is required in this case.
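The explicit normal-error expressions (38)-(40) above rest on the first and second moments of a truncated standard normal. The following sketch (an illustration only, not the authors' MATLAB implementation) checks those closed-form moments against brute-force numerical integration:

```python
import math

def phi(x):   # standard normal density
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):   # standard normal distribution function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def trunc_moments(a, b):
    """Closed-form E[Z | a < Z <= b] and E[Z^2 | a < Z <= b] for Z ~ N(0,1),
    the quantities behind expressions (38) and (39)."""
    p = Phi(b) - Phi(a)
    m1 = -(phi(b) - phi(a)) / p
    m2 = 1.0 - (b * phi(b) - a * phi(a)) / p
    return m1, m2

# Check against midpoint-rule integration over (a, b]
a, b = -0.5, 1.2
n = 100000
xs = [a + (i + 0.5) * (b - a) / n for i in range(n)]
w = (b - a) / n
p = sum(phi(x) for x in xs) * w
num1 = sum(x * phi(x) for x in xs) * w / p
num2 = sum(x * x * phi(x) for x in xs) * w / p
m1, m2 = trunc_moments(a, b)
print(abs(m1 - num1) < 1e-6, abs(m2 - num2) < 1e-6)  # True True
```

With general error densities these conditional moments lack such closed forms, which is exactly why the quadrature and the third nested iteration mentioned above become necessary.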
In addition to the former comments, which concern parameter point estimation, there exists a second substantial statistical advantage of the proposed procedure (compared with the maximum likelihood methods). This affects parameter interval estimation and hypothesis testing which, in both cases, lean on asymptotics. As explained above, our algorithm tackles these inferences by means of the covariance matrix $n^{-1}L^n$, in which only first derivatives are involved. From expressions (8) and (9), as was said, these first derivatives admit an explicit expression in terms of the density and distribution functions of the standardized errors. For their part, the maximum likelihood methods (either direct or through the EM algorithm) need to estimate the Fisher information matrix evaluated at the maximum likelihood estimate. If the direct ML method is employed, this matrix needs to be evaluated numerically, as occurs in each Step 1.1 of the secondary iteration. Within the context of the EM algorithm, the Hessian matrix of the log-likelihood function can be evaluated by several methods, among them direct computation/numerical differentiation (Meilijson, 1989), the Louis method (Louis, 1982) or the use of EM iterates (Meng and Rubin, 1991). Therefore, in none of the former ML methods does the estimate of the asymptotic covariance matrix of the $b$-estimates admit a closed form, and the numerical evaluation of the second derivatives of the log-likelihood is quite unstable. Undoubtedly, the equivalent estimation in our proposed algorithm is simpler if we recall that only first-order derivatives with analytical form are involved.
To finish, let us comment on the partial memory capacity of the proposed algorithm compared with its maximum likelihood alternatives. Neither the Newton-Raphson version nor the EM version (adapted to sequential data reception) has any memory with respect to the primary iteration over the sample size. The reason is that the maximization of the sum of the $n$ functional summands (which form part of (30) or (33) when the sample size is $n$) does not hinge at all on the similar maximization of the sum of their first $n-1$ summands (which are associated with the sample size $n-1$). In fact, this means that the computation of the secondary iteration when the sample size is $n$ (a) requires the complete up-to-date data set of individual grouped or ungrouped observations $(x_i, y_i)$, which therefore has to be stored in full, and (b) is utterly independent of the similar computation based on the previous sample size $n-1$: hence the wastefulness which pervades the maximum likelihood methods, as well as other traditional statistical procedures, when we try to adapt them to sequential data reception, as commented on in Section 1. For its part, the proposed algorithm (1) needs only the storage of the $(x_i, y_i)$-information corresponding to the grouped $y_i$-data, and (2) its computation admits a certain sequential organization. To see this, it suffices to observe, first, that the $(x_i, y_i)$-information regarding the grouped $y_i$-data is needed to compute (17) within the proposed algorithm; secondly, the sequential computation of (18) and (19) can be carried out as follows. Let us rewrite (18) and (19) in the form
sequential computation of (18) and (19) can be made as follows. Let us rewrite (18) and (19) in the form
bn ¼ M�1n þ Nn þXi2Ign
xiyi bn�1; sn�1� �
and
nsn2 ¼ Pn þ b0nQnbn � 2Rnb
n þ sn�12Xi2Ign
Xrh¼1
I ch�1 < yi � ch�1ð Þ
R �x0ibnþchð Þ=sn�1�x0
ibnþch�1ð Þ=sn�1 x
2f ðxÞdxR �x0
ibnþchð Þ=sn�1
�x0ibnþch�1ð Þ=sn�1 f ðxÞdx
;
where
Mn ¼Xni¼1
xix0i; Nn ¼
Xi2Iun
xiyi; Pn ¼Xi2Iun
y2i ; Qn ¼Xi2Iun
xix0i and Rn ¼
Xi2Iun
yix0i:
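The sequential maintenance of these matrices, described next, can be illustrated with a short sketch on simulated data (not the radiological data of Table 1): every observation contributes to $M_n$, while only the ungrouped ones contribute to $N_n$, $P_n$, $Q_n$ and $R_n$, and the running values agree with their batch definitions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated stream of (x_i, y_i, grouped_flag) triples, dimension p = 3
stream = [(rng.normal(size=3), rng.normal(), rng.random() < 0.3)
          for _ in range(50)]

p = 3
M = np.zeros((p, p)); Q = np.zeros((p, p))
N = np.zeros(p); R = np.zeros(p); P = 0.0
for x, y, grouped in stream:
    M += np.outer(x, x)              # M_{n+1} = M_n + x x'
    if not grouped:                  # indicator I(n+1 in I^u_{n+1})
        N += x * y; P += y * y
        Q += np.outer(x, x); R += y * x

# The sequentially maintained matrices equal their batch definitions
ung = [(x, y) for x, y, g in stream if not g]
assert np.allclose(M, sum(np.outer(x, x) for x, y, g in stream))
assert np.allclose(N, sum(x * y for x, y in ung))
assert np.allclose(Q, sum(np.outer(x, x) for x, y in ung))
print("sequential updates match batch sums")
```

The grouped-data sums in (18) and (19), by contrast, depend on the current estimates $(b^{n-1}, s^{n-1})$ and therefore cannot be folded into such running totals, which is why the individual grouped observations must be kept.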
These latter matrices, together with the individual information concerning the grouped $y_i$-observations (that is, $x_i$ and the actual grouping interval $(c_{h-1}, c_h]$ which overlaps $y_i$), are the only elements that need to be stored. As soon as a new observation is received, the matrices can be sequentially updated (meaning that the update depends only on their stored values and the new observation) by means of

$$M_{n+1} = M_n + x_{n+1} x_{n+1}', \qquad N_{n+1} = N_n + I\!\left(n+1 \in I_{n+1}^u\right) x_{n+1}\, y_{n+1}$$

and the analogous updating expressions for $P_{n+1}$, $Q_{n+1}$ and $R_{n+1}$. For their part, it is clear that the updating of the sums

$$\sum_{i \in I_n^g} x_i\, y_i\!\left(b^{n-1}, s^{n-1}\right) \qquad\text{and}\qquad \sum_{i \in I_n^g} \sum_{h=1}^{r} I(c_{h-1} < y_i \le c_h)\, \frac{\displaystyle\int_{(-x_i'b^n + c_{h-1})/s^{n-1}}^{(-x_i'b^n + c_h)/s^{n-1}} x^2 f(x)\,dx}{\displaystyle\int_{(-x_i'b^n + c_{h-1})/s^{n-1}}^{(-x_i'b^n + c_h)/s^{n-1}} f(x)\,dx}$$

does not admit a sequential representation but, on the contrary, requires the use of all of the individual information regarding the grouped data. Although the memory of the proposed algorithm is only partial, as it affects only the ungrouped data, there is no comparison with the memory of its alternatives, which is null, as was said. This represents a further advantage of our proposal in terms of storage requirements, to be added to those regarding its statistical simplicity and computational complexity commented on above.