An algorithm for robust linear estimation with grouped data



Computational Statistics and Data Analysis 53 (2008) 255–271


Carlos Rivero, Teofilo Valdes*
Complutense University of Madrid, Spain

Article history: Received 1 June 2007; received in revised form 30 April 2008; accepted 1 July 2008; available online 9 July 2008.

Abstract

An algorithm to estimate the parameters of linear models under several robust conditions is presented. With respect to the robust conditions, firstly, the dependent variables may be either non-grouped or grouped. Secondly, the distribution of the errors may vary within the wide class of the strongly unimodal distributions, either symmetrical or non-symmetrical. Finally, the variance of the errors is unknown. Under these circumstances the algorithm is capable of estimating not only the parameters (slopes and error variance) of the linear model, but also the asymptotic covariance matrix of the linear parameter estimates. This opens the possibility of making inferences in terms of either multiple confidence regions or hypothesis testing.

© 2008 Elsevier B.V. All rights reserved.

1. Presentation of the problem and an introductory example

Let us consider the fixed effects linear model

yi = x′iβ + σεi, (1)

where β represents its slope vector parameter, xi and yi are, respectively, the independent variable vector of order m and the dependent variable of the individual i (i = 1, . . . , n), the εi's denote the random error terms with mean 0 and variance 1, and σ > 0 is the error scale parameter.

Several robust conditions will be assumed throughout the rest of the paper. Firstly, each dependent observation yi may be either ungrouped (with probability π0 > 0) or grouped (with probability π1 = 1 − π0 > 0) with different classification intervals. For simplicity of notation we will assume that there exists a unique set of known classification intervals given by their extremes

−∞ = c0 < c1 < · · · < cr = ∞; (2)

when a grouped observation is within the interval (ch−1, ch], its value is lost and only this interval is known. In spite of this simplification, on one hand, the set of classification intervals may, as was said, vary from one grouped observation to another; on the other hand, it may also be possible that the value yi is only lost if it falls within some known subset of the intervals (ch−1, ch]. Thus, some common cases of incomplete data are within the scope of this paper. For example, (1) missing data is a particular case of grouped data for which there exists a unique classification interval equal to (−∞, ∞), and (2) right (or left) censored data can be visualized as a grouping process with two classification intervals (−∞, c] and (c, ∞) in which the datum is only lost if it falls within one of them. From this variety of situations it follows that the grouping mechanism may be dependent or independent of the data. The second condition concerns the distribution of the error components σεi, which may be general with the sole restriction of being within the wide class of the strongly unimodal distributions (see An (1998)) centred on zero (either symmetrical or non-symmetrical).
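For concreteness, the grouping mechanism can be pictured in a few lines of code. The sketch below is purely illustrative (the cut points and the function name are ours, not the paper's): a grouped observation retains only the index of the classification interval into which it falls.

```python
# Toy illustration of the grouping mechanism: a grouped y keeps only the
# interval (c_{h-1}, c_h] that contains it (hypothetical cut points).
import bisect
import math

cuts = [-math.inf, -4.0, -1.0, 2.0, math.inf]  # c_0 < c_1 < ... < c_r

def interval_of(y):
    """Return h (1-based) such that y lies in (c_{h-1}, c_h]."""
    return bisect.bisect_left(cuts[1:-1], y) + 1

print(interval_of(-5.2), interval_of(0.3))  # 1 3: (-inf, -4] and (-1, 2]
```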

* Corresponding address: Departamento de Estadística e I.O. I, Facultad de Matemáticas, Universidad Complutense de Madrid, 28040 Madrid, Spain. Fax: +34 91 3944606. E-mail address: [email protected] (T. Valdes).

0167-9473/$ – see front matter © 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.csda.2008.07.009


In the sequel f > 0 will denote the density function of the standardized errors. Finally, it will be assumed that the scale parameter σ > 0 is unknown and needs to be estimated jointly with the vector slope parameter β.

Under these conditions it is clear that (1) the existence of grouped data makes the ordinary least squares (OLS) estimation and inference inapplicable even if the errors are assumed to be normally distributed, and (2) the non-normality of the errors reinforces the non-applicability of the OLS estimation and inference mentioned above. For its part, the algorithm proposed here is operative and easy to implement computationally under these circumstances. It may be conceived as an alternative to the EM algorithm, with the advantage of avoiding the awkward computations of the EM relating to both the quadrature and maximization processes involved, respectively, in the sequential E and M steps by which β and σ are updated when the errors are non-normal. Compared to direct maximum likelihood estimation, our algorithm avoids the numerical maximization of the integral function that defines the likelihood of the incomplete data situation described above. Finally, it is important to highlight that the proposed algorithm allows us to estimate the asymptotic covariance matrix of its slope parameter estimates more easily than the direct maximum likelihood method or the EM algorithm (see Louis (1982) and Meng and Rubin (1991) in this respect).

In the following we will show a simple example with right-censored data in which the potentiality and effectiveness of the algorithm become evident. The following data were initially presented in Schmee and Hahn (1979) and, later, also analysed by Tanner (1996, pp. 67–69). Certain motorettes were tested at temperatures of 150, 170, 190 and 220 °C and their recorded times until failure are given below.

Table 1
Schmee and Hahn data

150°     170°     190°     220°
8064*    1764     408      408
8064*    2772     408      408
8064*    3444     1344     504
8064*    3542     1344     504
8064*    3780     1440     528*
8064*    4860     1680*    528*
8064*    5196     1680*    528*
8064*    5448*    1680*    528*
8064*    5448*    1680*    528*
8064*    5448*    1680*    528*

Hours until failure versus temperature (°C). * indicates that the observation is censored.

A star indicates that the corresponding motorette was taken off study without failing at the indicated event time. For this data Tanner fitted the model

ti = β0 + β1vi + σεi, (3)

where εi ∼ N(0, 1), vi ≡ 1000(temperature + 273.2)−1 and ti = log10(ith failure time). Since each censored datum may be envisaged as grouped on the interval (recorded value, ∞), the proposed algorithm may be applied to the above data, although its potential applicability lies far beyond this setting. The percentage of grouped data is high (60%), and the logarithmic transformation of the failure times simply seeks the normality of the error terms. As was said, this simple linear model and others like it can be analyzed through the algorithm proposed here. It is clear that the direct application of least squares (after merely assigning a representative value to each grouped datum) yields undesirable mean square errors and contradictions in the statistical inferences, due to the information loss. Tanner applies the EM algorithm to estimate the slope parameters and the error variance of the above mentioned model. We will compare his results with those obtained from our algorithm. However, at this moment it is important to highlight that, even assuming the normality of the errors, the implementation of the EM algorithm is more complicated if, as maintained in this paper, the grouped observations are not simply right-censored but interval-censored. In this case the explicit formulas given in Tanner are inapplicable and difficult to extend explicitly. Additionally, if the assumption of error normality is released, the EM algorithm can only be implemented numerically. However, under these circumstances our algorithm is easy to implement, as will be explained later in the paper. Finally, it is also notable that the asymptotic covariance matrix of the linear parameter estimates is easy to estimate with our algorithm, whereas its computation with the EM algorithm is far more complicated. Although this point will be detailed in Section 5, for the moment it is sufficient to indicate that in the first case only first derivatives with closed forms are involved, while with the EM the second derivatives that form part of the Hessian of the log-likelihood do not admit an explicit expression and need to be numerically evaluated. These comments sum up the potentialities of the algorithm proposed in this paper, which has a direct antecedent in Rivero and Valdes (2004), as will be explained in the next section.
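The data preparation behind model (3) is a one-liner per observation; the following sketch (illustrative, with only the first few rows of Table 1 inlined) shows the transformation and how a censored row becomes a grouping interval.

```python
# Sketch of the transformation in model (3): v = 1000/(T + 273.2),
# t = log10(hours); a censored time t becomes the grouping interval (t, inf).
import math

sample = [(150, 8064, True), (170, 1764, False), (190, 408, False)]  # (C, hours, censored?)
for T, hours, censored in sample:
    v = 1000.0 / (T + 273.2)
    t = math.log10(hours)
    print(f"v = {v:.4f}, t = {t:.4f}, grouped on ({t:.4f}, inf): {censored}")
```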

final loops and the convergence properties of the proposed algorithm. In Section 4we present several exhaustive simulationstudies, the intention being to analyze the performance and sensibility of the algorithm estimates. The analysis of the Tanner


censored data will be addressed in Section 5, the intention being to compare the proposed algorithm with the maximum likelihood estimates (and inferences) computed through the EM algorithm. Finally, the comments and remarks of Section 6 will bring the paper to a close.

2. Antecedents and the direct precursor of the algorithm

The remote origins of the proposed algorithm to treat the type of data and models described above can be found, in chronological order, in (1) the procedure given in Healy and Westmacott (1956), (2) the missing information principle of Orchard and Woodbury (1972), and (3) the EM algorithm (Dempster et al., 1977) when the error distribution of the linear model is normal. More recent bibliographical antecedents are James and Smith (1984), Ritov (1990) and Anido et al. (2000). However, the direct precursor must undoubtedly be sought in Rivero and Valdes (2004, p. 470), in which the authors suggest (in the so-called secondary iteration) an estimating algorithm when the sample size, n, is fixed and the scale parameter σ of model (1) is assumed to be known. The algorithm starts from an arbitrary initial point β0 and the iteration process (assuming that βp is given) runs through the steps:

Basic algorithm, assuming that the scale parameter is known:

1. Mean imputation step:
   yi(βp) = yi, if yi is an ungrouped datum;
   yi(βp) = xi′βp + E(σεi | −xi′βp + ch−1 < σεi ≤ −xi′βp + ch), if yi is grouped and yi ∈ (ch−1, ch].
2. Updating step: βp+1 = (X′X)−1X′y(βp).
3. βp ← βp+1, and return to step 1 until convergence is achieved.

Here X′ = (x1, . . . , xn), X′X = Σi∈I xixi′ with I = {1, . . . , n}, and E(σεi | −xi′βp + ch−1 < σεi ≤ −xi′βp + ch) denotes the conditional expectation of the error term σεi given that its corresponding grouped observation yi lies within the classification interval (ch−1, ch].
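As a concrete illustration, the two steps can be coded in a few lines. The sketch below is ours, not the authors': it assumes normal standardized errors so that the conditional expectation has a closed form (for a general strongly unimodal f, the ratio of integrals in (8) and (9) below would be evaluated numerically), and all names are hypothetical.

```python
# Minimal sketch of the basic algorithm with sigma known (normal errors
# assumed). Entries of y at grouped positions are ignored; interval[i]
# holds the pair (c_{h-1}, c_h) of the interval containing y_i.
import numpy as np
from scipy.stats import norm

def trunc_mean(lo, hi, sigma):
    """E(sigma*eps | lo < sigma*eps <= hi) for eps ~ N(0, 1)."""
    a, b = lo / sigma, hi / sigma
    p = norm.cdf(b) - norm.cdf(a)
    return sigma * (norm.pdf(a) - norm.pdf(b)) / p

def basic_algorithm(X, y, grouped, interval, sigma, beta0, tol=1e-8):
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = np.asarray(beta0, dtype=float)
    while True:
        y_imp = np.asarray(y, dtype=float).copy()
        for i in np.where(grouped)[0]:        # 1. mean imputation step
            c_lo, c_hi = interval[i]
            m = X[i] @ beta
            y_imp[i] = m + trunc_mean(c_lo - m, c_hi - m, sigma)
        beta_new = XtX_inv @ (X.T @ y_imp)    # 2. updating step
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new                   # 3. stop at convergence
        beta = beta_new
```

The loop is exactly steps 1–3: impute each grouped yi by its conditional mean given the current βp, then refit by least squares.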

The limit point, β̂ = β̂(σ), of this basic algorithm defines the estimate of the slope parameter of the linear model (1). With respect to this estimating algorithm it is important to highlight that the mean imputation and updating steps agree, respectively, with the expectation and maximization steps of the EM algorithm if the error terms are normally distributed; otherwise, the two steps of the algorithm differ from the E and M steps of the EM. A consequence of this is that the algorithm estimate of β does not necessarily agree with the maximum likelihood estimate under the incomplete data situation alluded to in Section 1 when σ is known. In spite of this, if we simply assume that the error distribution is strongly unimodal (thus, not necessarily normal), under certain weak conditions the basic algorithm estimate is well-defined for a given sample size n, and, as n → ∞, its asymptotic properties are similar to those of the maximum likelihood estimate of β. Strictly speaking, this is synthesized as follows.

Let us partition the set of indices I into the two subsets Ig and Iu corresponding to the indices of the grouped and ungrouped data, respectively. Finally, let us assume that X′X = Σi∈I xixi′ is a full rank matrix and let us decompose it into the two summands

X′X = (X′X)g + (X′X)u,

where (X′X)g = Σi∈Ig xixi′ and (X′X)u is defined in a similar way.

Theorem 1. (a) (For a given sample size, n, the basic algorithm β-estimate is well-defined.) If (X′X)u is a positive definite matrix, the sequence βp generated by the basic algorithm converges, as p → ∞, to a unique point β̂, which is independent of the initial point β0.

(b) (β̂-asymptotics, as n → ∞.) If, for all n, ‖xn‖ ≤ K < ∞ and the minimum eigenvalue of n−1(X′X)u is greater than λ > 0, then β̂ is a consistent estimate of the slope parameter β and

√n (β̂ − β) →D N(0, Λ), as n → ∞.  (4)

Additionally, the asymptotic covariance matrix Λ can be consistently estimated by

Λ̂ = n (X′MX)−1 (X′RX) (X′MX)−1.  (5)

The diagonal matrices M = diag(mi) and R = diag(ri) are, respectively, given by

m_i = 1 \quad \text{if } i \in I_u, \quad \text{and} \quad m_i = \left.\frac{\partial}{\partial a} E(\sigma\varepsilon_i \mid a + c_{h-1} < \sigma\varepsilon_i \le a + c_h)\right|_{a = -x_i'\hat{\beta}} \quad \text{if } i \in I_g \text{ and } y_i \in (c_{h-1}, c_h];  (6)

and

r_i = \sigma^2 \quad \text{if } i \in I_u, \quad \text{and} \quad r_i = \mathrm{Var}(\varepsilon_i^*) \quad \text{if } i \in I_g \text{ and } y_i \in (c_{h-1}, c_h],  (7)

where ε*i is a discrete random variable which takes the values E(σεi | −xi′β + ch−1 < σεi ≤ −xi′β + ch) with probabilities Pr(−xi′β + ch−1 < σεi ≤ −xi′β + ch), for h = 1, . . . , r.

Proof. See Rivero and Valdes (2004, pp. 477–482). □

Remarks. 1. It is clear that

E(\sigma\varepsilon_i \mid -x_i'\beta + c_{h-1} < \sigma\varepsilon_i \le -x_i'\beta + c_h) = \frac{\sigma \int_{(-x_i'\beta + c_{h-1})\sigma^{-1}}^{(-x_i'\beta + c_h)\sigma^{-1}} x f(x)\,dx}{\mathrm{Prob}(-x_i'\beta + c_{h-1} < \sigma\varepsilon_i \le -x_i'\beta + c_h)}  (8)

and

\mathrm{Prob}(-x_i'\beta + c_{h-1} < \sigma\varepsilon_i \le -x_i'\beta + c_h) = \int_{(-x_i'\beta + c_{h-1})\sigma^{-1}}^{(-x_i'\beta + c_h)\sigma^{-1}} f(x)\,dx = F\big((-x_i'\beta + c_h)\sigma^{-1}\big) - F\big((-x_i'\beta + c_{h-1})\sigma^{-1}\big),  (9)

where F denotes the distribution function of the standardized errors.

2. Clearly

\left.\frac{\partial}{\partial a}\left(\sigma \int_{(a + c_{h-1})\sigma^{-1}}^{(a + c_h)\sigma^{-1}} x f(x)\,dx\right)\right|_{a = -x_i'\hat{\beta}} = \sigma^{-1}\Big[(-x_i'\hat{\beta} + c_h)\, f\big((-x_i'\hat{\beta} + c_h)\sigma^{-1}\big) - (-x_i'\hat{\beta} + c_{h-1})\, f\big((-x_i'\hat{\beta} + c_{h-1})\sigma^{-1}\big)\Big]

and

\left.\frac{\partial}{\partial a}\,\mathrm{Prob}(a + c_{h-1} < \sigma\varepsilon_i \le a + c_h)\right|_{a = -x_i'\hat{\beta}} = \sigma^{-1}\Big[f\big((-x_i'\hat{\beta} + c_h)\sigma^{-1}\big) - f\big((-x_i'\hat{\beta} + c_{h-1})\sigma^{-1}\big)\Big].

Thus, the first derivative involved in (6) admits an explicit expression (via the quotient rule applied to (8)) in terms of the density and distribution functions of the standardized errors.

3. It follows from (4) and (5) that β̂ approximately follows the multivariate normal distribution N(β, n−1Λ̂), which allows the use of standard procedures to carry out statistical inferences (in terms of either confidence regions or hypothesis testing) on the true slope parameter.
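Remarks 1–3 translate directly into a computable covariance estimate. The sketch below is illustrative only (our names; normal standardized errors assumed so that the integrals in (8) and (9) have closed forms): it obtains the weight mi of (6) by applying the quotient rule of Remark 2 to (8), builds ri of (7) from the discrete variable ε*i, and assembles Λ̂ of (5).

```python
# Illustrative sketch of (5)-(7), normal standardized errors assumed;
# infinite cut points are handled by dropping the vanishing boundary terms.
import numpy as np
from scipy.stats import norm

def _zf(z):
    """z * f(z), taken as 0 at infinite z."""
    return z * norm.pdf(z) if np.isfinite(z) else 0.0

def m_weight(a, c_lo, c_hi, sigma):
    """(6): d/da E(sigma*eps | a + c_lo < sigma*eps <= a + c_hi)."""
    lo, hi = (a + c_lo) / sigma, (a + c_hi) / sigma
    D = norm.cdf(hi) - norm.cdf(lo)               # Prob(...), cf. (9)
    N = sigma * (norm.pdf(lo) - norm.pdf(hi))     # numerator of (8)
    dN = _zf(hi) - _zf(lo)                        # Remark 2, first formula
    dD = (norm.pdf(hi) - norm.pdf(lo)) / sigma    # Remark 2, second formula
    return (dN * D - N * dD) / D**2               # quotient rule on (8)

def r_weight(a, cuts, sigma):
    """(7): Var(eps*_i), where eps*_i takes the conditional means of
    sigma*eps over the intervals (c_{h-1}, c_h] with their probabilities."""
    means, probs = [], []
    for c_lo, c_hi in zip(cuts[:-1], cuts[1:]):
        lo, hi = (a + c_lo) / sigma, (a + c_hi) / sigma
        p = norm.cdf(hi) - norm.cdf(lo)
        means.append(sigma * (norm.pdf(lo) - norm.pdf(hi)) / p)
        probs.append(p)
    means, probs = np.array(means), np.array(probs)
    return float(probs @ (means - probs @ means) ** 2)

def lambda_hat(X, grouped, interval, cuts, beta_hat, sigma):
    """(5): n (X'MX)^{-1} (X'RX) (X'MX)^{-1}."""
    n = X.shape[0]
    m, r = np.ones(n), np.full(n, sigma**2)
    for i in np.where(grouped)[0]:
        a = -X[i] @ beta_hat
        c_lo, c_hi = interval[i]
        m[i] = m_weight(a, c_lo, c_hi, sigma)
        r[i] = r_weight(a, cuts, sigma)
    XMX_inv = np.linalg.inv(X.T @ (m[:, None] * X))
    return n * XMX_inv @ (X.T @ (r[:, None] * X)) @ XMX_inv
```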

3. The rationale of the proposed algorithm: Resulting loops and properties

In model (1) with a fixed sample size and grouped and ungrouped data such as was mentioned in Section 1, if the true values of the parameters β and σ were known (which is not the case), the scale parameter could be consistently approximated using standard variance decomposition techniques by means of

σ²(β, σ) = σb²(β, σ) + σw²(β, σ),  (10)

where the between variance σb² and the within variance σw², respectively, satisfy

n\sigma_b^2(\beta, \sigma) = \sum_{i \in I_u} (y_i - x_i'\beta)^2 + \sum_{i \in I_g} \sum_{h=1}^{r} I(c_{h-1} < y_i \le c_h)\, E^2(\sigma\varepsilon_i \mid -x_i'\beta + c_{h-1} < \sigma\varepsilon_i \le -x_i'\beta + c_h)


and

n\sigma_w^2(\beta, \sigma) = \sum_{i \in I_g} \sum_{h=1}^{r} I(c_{h-1} < y_i \le c_h)\, \mathrm{Var}(\sigma\varepsilon_i \mid -x_i'\beta + c_{h-1} < \sigma\varepsilon_i \le -x_i'\beta + c_h),

with E(σεi | −xi′β + ch−1 < σεi ≤ −xi′β + ch) computed in a similar way to (8) and

\mathrm{Var}(\sigma\varepsilon_i \mid -x_i'\beta + c_{h-1} < \sigma\varepsilon_i \le -x_i'\beta + c_h) = \frac{\sigma^2 \int_{(-x_i'\beta + c_{h-1})\sigma^{-1}}^{(-x_i'\beta + c_h)\sigma^{-1}} x^2 f(x)\,dx}{\mathrm{Prob}(-x_i'\beta + c_{h-1} < \sigma\varepsilon_i \le -x_i'\beta + c_h)} - E^2(\sigma\varepsilon_i \mid -x_i'\beta + c_{h-1} < \sigma\varepsilon_i \le -x_i'\beta + c_h).  (11)

Briefly, we can write

n\sigma^2(\beta, \sigma) = \sum_{i \in I_u} (y_i - x_i'\beta)^2 + \sigma^2 \sum_{i \in I_g} \sum_{h=1}^{r} I(c_{h-1} < y_i \le c_h)\, \frac{\int_{(-x_i'\beta + c_{h-1})\sigma^{-1}}^{(-x_i'\beta + c_h)\sigma^{-1}} x^2 f(x)\,dx}{\int_{(-x_i'\beta + c_{h-1})\sigma^{-1}}^{(-x_i'\beta + c_h)\sigma^{-1}} f(x)\,dx}.  (12)

Clearly

E\big((y_i - x_i'\beta)^2\big) = E(\sigma^2\varepsilon_i^2) = \sigma^2

if i ∈ Iu, whereas

E\left[\sum_{h=1}^{r} I(c_{h-1} < y_i \le c_h)\, \frac{\int_{(-x_i'\beta + c_{h-1})\sigma^{-1}}^{(-x_i'\beta + c_h)\sigma^{-1}} x^2 f(x)\,dx}{\int_{(-x_i'\beta + c_{h-1})\sigma^{-1}}^{(-x_i'\beta + c_h)\sigma^{-1}} f(x)\,dx}\right] = 1

if yi is grouped (by the tower property, since the ratio is the conditional second moment of the standardized error on the corresponding interval). Thus, σ²(β, σ) → σ² a.e., as n → ∞; therefore, with probability 1, the equality

σ(β, σ) = σ  (13)

holds in the limit. As the true slope and scale parameters are unknown, it is clear that σ(β, σ) is incomputable. However, these expressions, together with the basic algorithm, have induced us to extend expression (12) to any pair of possible values (β*, σ*) of the parameters by means of

n\sigma^2(\beta^*, \sigma^*) = \sum_{i \in I_u} (y_i - x_i'\beta^*)^2 + \sigma^{*2} \sum_{i \in I_g} \sum_{h=1}^{r} I(c_{h-1} < y_i \le c_h)\, \frac{\int_{(-x_i'\beta^* + c_{h-1})\sigma^{*-1}}^{(-x_i'\beta^* + c_h)\sigma^{*-1}} x^2 f(x)\,dx}{\int_{(-x_i'\beta^* + c_{h-1})\sigma^{*-1}}^{(-x_i'\beta^* + c_h)\sigma^{*-1}} f(x)\,dx}.  (14)
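Before moving on, note that the update (14) is directly computable. The sketch below is illustrative only, again under a normal-error assumption so that the conditional second moment has a closed form; for a general f the two integrals would be computed numerically.

```python
# Sketch of the scale update (14) (normal standardized errors assumed).
import numpy as np
from scipy.stats import norm

def _zf(z):
    """z * f(z), taken as 0 at infinite z (the boundary term vanishes)."""
    return z * norm.pdf(z) if np.isfinite(z) else 0.0

def second_moment(lo, hi):
    """E(x^2 | lo < x <= hi) for x ~ N(0, 1)."""
    p = norm.cdf(hi) - norm.cdf(lo)
    return 1.0 + (_zf(lo) - _zf(hi)) / p

def sigma_update(X, y, grouped, interval, beta_s, sigma_s):
    total = 0.0
    for i in range(X.shape[0]):
        m = X[i] @ beta_s
        if grouped[i]:
            c_lo, c_hi = interval[i]
            total += sigma_s**2 * second_moment((c_lo - m) / sigma_s,
                                                (c_hi - m) / sigma_s)
        else:
            total += (y[i] - m) ** 2
    return np.sqrt(total / X.shape[0])
```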

Let us write

\sum_{i \in I_u} (y_i - x_i'\beta^*)^2 = \sum_{i \in I_u} \sigma^2\varepsilon_i^2 + (\beta^* - \beta)'\left[\sum_{i \in I_u} x_i x_i'\right](\beta^* - \beta) + o(n),

where

\frac{o(n)}{n} = \frac{2\sigma}{n} \sum_{i \in I_u} \varepsilon_i x_i'(\beta - \beta^*) \to 0 \quad \text{a.e., as } n \to \infty.

Reasoning as in (12), it is clear that

\sigma^2(\beta^*, \sigma^*) \to \sigma^2 + \pi_0 (\beta^* - \beta)' P (\beta^* - \beta) + \pi_1 (\sigma^{*2} - \sigma^2), \quad \text{as } n \to \infty,  (15)

with probability 1, where P denotes the limit of the mean product matrix

P = \lim_{n \to \infty} n_u^{-1} (X'X)_u,

and nu denotes the cardinal of Iu. Taking this into account, we propose the following estimating algorithm:


Proposed robust estimating algorithm

INITIALIZATION: Let β0 and σ0 be two arbitrary starting values of the slope and scale parameters, respectively.
ITERATION: Assuming that βp and σp are known, update them through the following steps:

1. Assuming that the scale parameter agrees with σp, run the basic algorithm given in Section 2, taking βp as the initial point, and use its limit point to define βp+1. Therefore, with the notation used in the basic algorithm: βp+1 = β̂(σp).
2. Update the scale parameter by σp+1 = σ(βp+1, σp), using (14).
3. βp ← βp+1, σp ← σp+1, and return to step 1 until convergence is achieved.
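Composing the two earlier sketches gives the whole procedure in a handful of lines (illustrative; basic_algorithm and sigma_update are the hypothetical helpers defined above):

```python
# Sketch of the proposed loop: alternate the basic algorithm (step 1)
# with the scale update (14) (step 2) until both estimates stabilize.
import numpy as np

def robust_estimate(X, y, grouped, interval, beta0, sigma0, tol=1e-4):
    beta, sigma = np.asarray(beta0, dtype=float), float(sigma0)
    while True:
        beta_new = basic_algorithm(X, y, grouped, interval, sigma, beta)
        sigma_new = sigma_update(X, y, grouped, interval, beta_new, sigma)
        if max(np.max(np.abs(beta_new - beta)), abs(sigma_new - sigma)) < tol:
            return beta_new, sigma_new
        beta, sigma = beta_new, sigma_new
```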

The algorithm limit point, (β∞, σ∞), will define our slope/scale estimates of model (1). It is important to observe that, in harmony with the limit equality (13), this estimate needs to fulfil

σ∞ = σ(β∞, σ∞),  (16)

where the term on the right is similar to (10). Additionally, it is significant to highlight that:

1. After the enormous number of simulations included in Section 4 (which amount to 20250 different cases), we are in a position to maintain that the proposed algorithm certainly converges to a point (β∞, σ∞) which does not depend on the starting values (β0, σ0), in spite of the fact that both sequences {βp} and {σp} are inter-linked and differ when some of the initial values vary;

2. The algorithm estimate (β∞, σ∞) is, in fact, an M-estimator, since it fulfils the implicit relation (16). This means that we can expect its asymptotic distribution to be multivariate normal;

3. The consistency property of β̂ = β̂(σ), the basic algorithm limit estimate of β when σ is known, can be transferred to the proposed algorithm estimate, (β∞, σ∞), under certain technical conditions. Although their strict formulations are beyond the scope of the present paper, the following arguments sketch the proof. As before, let (β, σ) denote the true values of the model parameters. Since both the algorithm estimates and the functions β̂ (of the basic algorithm) and σ (given in (12) and (14)) depend, in fact, on the sample of size n, it will be necessary (only in this sketch) to denote them by (β∞n, σ∞n), β̂n and σn, respectively. For its part, let (βpn(β0, σ0), σpn(β0, σ0)) denote in this sketch the sequence generated by the algorithm, to clarify its dependency not only on n but also on the chosen initial point (β0, σ0). This latter dependency does not manifest upon the limit point (β∞n, σ∞n) = limp→∞(βpn(β0, σ0), σpn(β0, σ0)), which is unique and independent of (β0, σ0). In particular, if we had run the algorithm from the starting point (β0, σ), which is impossible as σ is unknown, then, equally, (βpn(β0, σ), σpn(β0, σ)) → (β∞n, σ∞n) as p → ∞. We will reason from this that (β∞n, σ∞n) tends to (β, σ) in probability as n → ∞. The first loop of the proposed algorithm leads to

β1n(β0, σ) = β̂n(σ) →P β, as n → ∞,

and from (15)

σ1n(β0, σ) = σn(β1n, σ) →P σ, as n → ∞.

Sequentially, if we repeat the former reasoning in the following loops of the proposed algorithm, it can be concluded that, for any natural p,

(βpn(β0, σ), σpn(β0, σ)) →P (β, σ), as n → ∞,

which, under regularity conditions (see Fahrmeir and Kaufmann (1985)), can also be extended to the limit point (β∞n, σ∞n).

4. From the last two points, and taking into account the asymptotic distribution given in (4), our proposal is to use the following natural distributional approximation:

√n (β∞ − β) ≈ N(0, Λ∞),  (17)

where the covariance matrix Λ∞ = (λ∞ij) is

Λ∞ = n (X′M∞X)−1 (X′R∞X) (X′M∞X)−1,  (18)

and the diagonal matrices M∞ = diag(m∞i) and R∞ = diag(r∞i) are defined as in (6) and (7) after substituting σ and β̂ with σ∞ and β∞, respectively. Finally, from (17), it is straightforward to make standard inferences about the true slope parameters.

4. Simulation study on the performance of the algorithm

In this section we present a large number of simulations made with the intention of analyzing the performance of the proposed algorithm. We have considered the model

yi = β0 + β1x1i + β2x2i + σεi, (19)


in which we have fixed the slope parameter as β = (1, −4, 3)′, the independent variables xi = (xi1, xi2)′, i = 1, . . . , n, were selected from a uniform distribution on the square [−1, 2]², and the values σ = 1, 2, 4 were assigned to the scale parameter. The sample sizes considered in this study were n = 50, 100 and 200, respectively, and the errors were standardized from the following distributions: (i) Laplace, with density function exp(−|x|)/2, (ii) logistic, with density exp(−x)(1 + exp(−x))−2, and (iii) normal. Finally, the dependent variables yi were randomly grouped within the grouping intervals (−∞, −7], (−7, −4], (−4, −1], (−1, 2], (2, 6] and (6, ∞) with probabilities π1 equal to 0.6, 0.4 and 0.2, that is, converting nπ1 observations into grouped data.

For each combination of the values σ, n and π1 and each standardized error distribution, we have made 250 replications of the data (εi, yi) and of the grouping process. The manner in which the nπ1 grouped observations are divided into the different grouping intervals is random; thus, it varies from one replica to another. However, we can easily give it in terms of expectations from the theoretical probabilities of the grouping intervals. The theoretical probability of a particular grouping interval is understood as the probability that a y-value generated from model (19), with the characteristics explained above, falls in the grouping interval in question (which agrees with the probability that a grouped observation be within that interval). As these theoretical probabilities depend only on the error distribution and σ, we have estimated them from the empirical frequencies of the intervals in the most efficient way, which means using as many observations as possible. The maximum number of observations available for a particular pair (error distribution, σ) is 262500, the total number of observations generated in the 2250 replicas simulated with the nine different settings of the pairs n and π1. Table 2 shows, for each of the error distributions and σ-values, the estimates of the theoretical probabilities mentioned above. For a particular setting of error distribution, σ, n and π1, the expected number per replica of grouped observations (out of the existing nπ1 grouped data) within a certain grouping interval is simply equal to nπ1 times the theoretical probability of the interval. The greatest loss of information takes place on the infinite grouping intervals; therefore, the most negative effects of the data incompleteness are owed to them. The probabilities of these infinite intervals depend firstly on the scale parameter and secondly, for a given σ, on the tail weight of the error distribution. The lightest tail weight of the error distributions considered in this simulation study corresponds to the normal, while the Laplace distribution has the heaviest. The directions of the effects of the scale parameter and the tail weight on the probabilities of the infinite grouping intervals are obvious and clearly recognisable from Table 2. On seeing the data of this table, the combined probabilities of the two infinite intervals were 6.15%, 11.29% and 26.21% for σ = 1, 2, 4, respectively, when the errors follow a standardized Laplacian distribution. The equivalent triplets of percentages were (4.48%, 8.53%, 23.81%) and (3.11%, 6.24%, 19.80%) for the standardized logistic and normal errors, respectively. However, let us keep in mind that these percentages, as well as those of Table 2, must still be multiplied by π1 to obtain, in a specific data set, the expected percentage of observations that, over the total set of them, turn out to be grouped within the infinite grouping intervals. The variations of the triplets mentioned above, together with those of n and π1, cover a spectrum of values wide enough to support the assessments made in the following points.

Table 2
Estimated theoretical probabilities (in %) of the grouping intervals of the yi-values

Values of σ   (−∞,−7]  (−7,−4]  (−4,−1]  (−1,2]  (2,6]   (6,∞)

Laplacian error distribution
1              1.99     6.08    23.44    38.67   25.66    4.16
2              3.95    11.27    17.31    33.45   26.68    7.34
4              9.21    15.93    15.69    20.33   21.84   17.00

Logistic error distribution
1              1.07     5.15    24.76    37.43   28.18    3.41
2              2.50     9.53    21.64    32.28   28.02    6.03
4              8.61    12.42    19.49    19.94   24.34   15.20

Standard normal error distribution
1              0.57     4.95    23.34    41.29   27.31    2.54
2              1.45     7.54    22.96    35.09   28.17    4.79
4              6.10    12.95    17.47    23.94   25.84   13.70

For each replication r, we have run the proposed algorithm to obtain the parameter estimates (β∞(r), σ∞(r)) and the covariance matrix estimate Λ∞(r) given in (17). With these replicated values we have calculated:

(1) The empirical biases, variances and mean square errors of the estimates of the slope and scale parameters, given by

B(\beta_j^\infty) = \big|\hat{E}(\beta_j^\infty) - \beta_j\big| = \Big|250^{-1}\sum_{r=1}^{250} \beta_j^{\infty(r)} - \beta_j\Big|,

\widehat{\mathrm{Var}}(\beta_j^\infty) = \hat{E}\big((\beta_j^\infty - \hat{E}(\beta_j^\infty))^2\big) = 250^{-1}\sum_{r=1}^{250}\big(\beta_j^{\infty(r)} - \hat{E}(\beta_j^\infty)\big)^2,  (20)

and

\widehat{\mathrm{MSE}}(\beta_j^\infty) = \hat{E}\big((\beta_j^\infty - \beta_j)^2\big) = \widehat{\mathrm{Var}}(\beta_j^\infty) + B^2(\beta_j^\infty),

(j = 0, 1, 2), and similarly for σ∞.
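These summaries are mechanical to compute; a minimal sketch (ours), with reps holding the 250 replicated estimates row-wise:

```python
# Sketch of the Monte Carlo summaries (20): reps is a (250, k) array of
# replicated estimates, truth the true parameter vector.
import numpy as np

def bias_var_mse(reps, truth):
    mean = reps.mean(axis=0)
    bias = np.abs(mean - truth)
    var = ((reps - mean) ** 2).mean(axis=0)
    return bias, var, var + bias**2   # MSE = Var + B^2
```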

With a comparative aim, for each replication we have also computed the ordinary least squares parameter estimates based on the complete data, that is, without being submitted to the grouping process explained above. These estimates are denoted by βOLS and σOLS, respectively, and their empirical biases, variances and mean square errors are computed as in (20). Tables 3a–3c (depending on the error distribution) show the empirical biases and mean square errors of the slope and scale parameter estimates for the different value combinations of σ, n and π1 mentioned above. On seeing these tables, the following general remarks can be made:

Table 3a
Empirical biases and mean square errors of the model parameter estimates (Laplacian errors)

π1   σ  n      β∞0     β∞1     β∞2     σ∞       βOLS0   βOLS1   βOLS2   σOLS

Empirical biases
0.2  1  50    −0.015   0.030  −0.005  −0.018   −0.009   0.024  −0.003  −0.016
0.2  1  100   −0.003   0.014  −0.009  −0.011   −0.002   0.013  −0.014  −0.008
0.2  1  200    0.015  −0.006  −0.003  −0.012    0.014  −0.003  −0.008  −0.008
0.2  2  50     0.027  −0.045   0.013  −0.016    0.026  −0.048   0.016  −0.004
0.2  2  100    0.015   0.005  −0.003  −0.005    0.008   0.003  −0.007   0.007
0.2  2  200    0.007  −0.002   0.015  −0.010    0.005  −0.004   0.017  −0.006
0.2  4  50    −0.039   0.046  −0.016  −0.058   −0.055   0.049  −0.002  −0.043
0.2  4  100    0.057  −0.036  −0.007  −0.041    0.063  −0.038  −0.007  −0.032
0.2  4  200    0.031  −0.007  −0.023  −0.051    0.028  −0.005  −0.013  −0.026
0.4  1  50     0.011  −0.023  −0.003  −0.029    0.015  −0.023  −0.012  −0.024
0.4  1  100   −0.002  −0.006   0.011  −0.006    0.002  −0.003   0.004  −0.003
0.4  1  200    0.012  −0.001  −0.016  −0.009    0.016  −0.004  −0.019  −0.003
0.4  2  50    −0.014  −0.019   0.040  −0.043   −0.010  −0.011   0.039  −0.032
0.4  2  100   −0.027  −0.009   0.029   0.006   −0.034   0.003   0.022   0.011
0.4  2  200    0.014   0.001  −0.011   0.003    0.016  −0.005  −0.016   0.012
0.4  4  50    −0.055   0.060  −0.070  −0.002   −0.046   0.061  −0.069   0.008
0.4  4  100   −0.108  −0.036   0.089  −0.096   −0.120  −0.026   0.099  −0.072
0.4  4  200   −0.049   0.054   0.052   0.003   −0.053   0.047   0.041  −0.017
0.6  1  50    −0.014   0.034   0.011  −0.030   −0.014   0.027   0.013  −0.021
0.6  1  100    0.003   0.008  −0.009  −0.006    0.007   0.003  −0.008  −0.002
0.6  1  200    0.010  −0.002  −0.007  −0.019    0.013  −0.002  −0.005  −0.002
0.6  2  50    −0.074   0.056   0.033  −0.021   −0.092   0.062   0.030  −0.015
0.6  2  100   −0.012  −0.002   0.011  −0.027   −0.012   0.001   0.011  −0.015
0.6  2  200    0.015  −0.009   0.002   0.016    0.014  −0.008  −0.006   0.031
0.6  4  50     0.016   0.005  −0.032  −0.010   −0.005   0.012  −0.014  −0.009
0.6  4  100   −0.036  −0.015   0.044  −0.061   −0.029  −0.013   0.045  −0.003
0.6  4  200   −0.008  −0.007   0.016  −0.060   −0.009  −0.008   0.019  −0.030

Empirical mean square errors
0.2  1  50     0.037   0.026   0.031   0.026    0.033   0.022   0.028   0.023
0.2  1  100    0.026   0.017   0.014   0.012    0.022   0.015   0.012   0.012
0.2  1  200    0.010   0.008   0.008   0.007    0.009   0.007   0.007   0.007
0.2  2  50     0.150   0.123   0.110   0.095    0.147   0.121   0.105   0.091
0.2  2  100    0.076   0.054   0.053   0.046    0.070   0.052   0.049   0.040
0.2  2  200    0.037   0.031   0.028   0.027    0.034   0.030   0.027   0.026
0.2  4  50     0.544   0.373   0.419   0.491    0.528   0.367   0.401   0.485
0.2  4  100    0.315   0.248   0.238   0.211    0.316   0.248   0.240   0.224
0.2  4  200    0.126   0.108   0.114   0.099    0.128   0.110   0.112   0.092
0.4  1  50     0.038   0.036   0.035   0.030    0.031   0.030   0.026   0.024
0.4  1  100    0.018   0.018   0.016   0.016    0.015   0.015   0.013   0.013
0.4  1  200    0.009   0.009   0.008   0.008    0.007   0.007   0.008   0.006
0.4  2  50     0.192   0.135   0.144   0.094    0.174   0.113   0.131   0.080
0.4  2  100    0.072   0.061   0.072   0.058    0.063   0.053   0.064   0.050
0.4  2  200    0.044   0.031   0.032   0.025    0.038   0.028   0.029   0.024
0.4  4  50     0.634   0.516   0.401   0.417    0.565   0.496   0.380   0.414
0.4  4  100    0.282   0.194   0.203   0.199    0.292   0.188   0.219   0.192
0.4  4  200    0.146   0.131   0.116   0.102    0.140   0.120   0.111   0.091
0.6  1  50     0.050   0.041   0.038   0.033    0.034   0.026   0.029   0.021
0.6  1  100    0.022   0.019   0.022   0.018    0.015   0.013   0.016   0.013
0.6  1  200    0.012   0.012   0.008   0.010    0.008   0.007   0.006   0.007
0.6  2  50     0.171   0.142   0.141   0.124    0.160   0.114   0.116   0.110
0.6  2  100    0.071   0.055   0.068   0.060    0.061   0.048   0.061   0.048
0.6  2  200    0.043   0.029   0.039   0.027    0.036   0.027   0.032   0.022
0.6  4  50     0.582   0.435   0.460   0.506    0.584   0.409   0.461   0.462
0.6  4  100    0.294   0.212   0.198   0.214    0.277   0.221   0.206   0.202
0.6  4  200    0.154   0.100   0.107   0.105    0.149   0.092   0.102   0.084

Table 3b
Empirical biases and mean square errors of the model parameter estimates (Logistic errors)

π1   σ  n      β∞0     β∞1     β∞2     σ∞       βOLS0   βOLS1   βOLS2   σOLS

Empirical biases
0.2  1  50    −0.018   0.030   0.009   0.005   −0.013   0.025   0.010   0.007
0.2  1  100    0.002   0.001  −0.001  −0.006    0.001   0.001  −0.005  −0.008
0.2  1  200    0.000   0.002   0.006  −0.013    0.003   0.001   0.002  −0.012
0.2  2  50     0.020  −0.048   0.002   0.006    0.022  −0.056   0.000   0.007
0.2  2  100   −0.007  −0.004   0.007   0.010   −0.008  −0.002   0.005   0.014
0.2  2  200   −0.008   0.013  −0.012  −0.007   −0.011   0.012  −0.007  −0.005
0.2  4  50    −0.095   0.099   0.053   0.031   −0.104   0.116   0.045   0.028
0.2  4  100    0.069  −0.014  −0.025  −0.001    0.073  −0.012  −0.032  −0.009
0.2  4  200    0.024  −0.042  −0.001  −0.020    0.026  −0.046  −0.003  −0.013
0.4  1  50     0.024  −0.012  −0.017  −0.025    0.023  −0.005  −0.024  −0.022
0.4  1  100    0.002   0.005   0.000  −0.004    0.000   0.008   0.001  −0.006
0.4  1  200   −0.011   0.009   0.000  −0.017   −0.011   0.009   0.003  −0.010
0.4  2  50     0.021  −0.007  −0.002  −0.019    0.022   0.002  −0.002  −0.006
0.4  2  100   −0.034  −0.003   0.023   0.004   −0.043   0.004   0.019   0.007
0.4  2  200    0.008  −0.005   0.008  −0.034    0.009  −0.009   0.003  −0.028
0.4  4  50    −0.059   0.070  −0.005   0.030   −0.065   0.087  −0.038   0.009
0.4  4  100   −0.067  −0.036   0.063  −0.053   −0.079  −0.026   0.070  −0.043
0.4  4  200   −0.029   0.027   0.011  −0.043   −0.033   0.034   0.004  −0.046
0.6  1  50     0.000   0.019   0.016  −0.010   −0.008   0.013   0.026  −0.009
0.6  1  100   −0.007   0.005  −0.002   0.003    0.000   0.004  −0.006   0.001
0.6  1  200    0.012  −0.012  −0.013  −0.011    0.003  −0.008  −0.008  −0.006
0.6  2  50    −0.065   0.041   0.019  −0.021   −0.077   0.045   0.019  −0.021
0.6  2  100    0.022  −0.006   0.003   0.000    0.021  −0.003   0.002  −0.001
0.6  2  200    0.001  −0.001   0.013  −0.017   −0.009   0.002   0.012  −0.007
0.6  4  50     0.048  −0.090  −0.014   0.009    0.050  −0.051  −0.045   0.002
0.6  4  100   −0.019  −0.027   0.023  −0.011   −0.036  −0.017   0.014  −0.012
0.6  4  200    0.031  −0.025   0.021   0.012    0.031  −0.024   0.014   0.024

Empirical mean square errors
0.2  1  50     0.039   0.033   0.029   0.016    0.035   0.030   0.027   0.014
0.2  1  100    0.024   0.017   0.015   0.009    0.021   0.015   0.013   0.008
0.2  1  200    0.010   0.008   0.007   0.004    0.009   0.007   0.006   0.003
0.2  2  50     0.141   0.136   0.117   0.063    0.141   0.128   0.107   0.060
0.2  2  100    0.077   0.056   0.056   0.033    0.074   0.052   0.050   0.030
0.2  2  200    0.033   0.027   0.028   0.015    0.031   0.026   0.026   0.013
0.2  4  50     0.574   0.554   0.334   0.302    0.559   0.542   0.318   0.287
0.2  4  100    0.348   0.233   0.262   0.117    0.356   0.231   0.257   0.124
0.2  4  200    0.150   0.119   0.108   0.059    0.148   0.117   0.103   0.056
0.4  1  50     0.044   0.032   0.039   0.020    0.033   0.027   0.027   0.015
0.4  1  100    0.020   0.018   0.015   0.010    0.017   0.016   0.013   0.009
0.4  1  200    0.010   0.009   0.009   0.005    0.008   0.007   0.007   0.004
0.4  2  50     0.169   0.125   0.121   0.069    0.153   0.112   0.103   0.067
0.4  2  100    0.065   0.050   0.069   0.039    0.056   0.042   0.061   0.036
0.4  2  200    0.039   0.037   0.029   0.018    0.036   0.034   0.026   0.016
0.4  4  50     0.586   0.486   0.448   0.289    0.550   0.479   0.423   0.268
0.4  4  100    0.305   0.234   0.204   0.134    0.306   0.228   0.203   0.122
0.4  4  200    0.114   0.110   0.106   0.071    0.113   0.108   0.103   0.064
0.6  1  50     0.045   0.047   0.036   0.026    0.032   0.030   0.030   0.016
0.6  1  100    0.021   0.017   0.023   0.013    0.017   0.013   0.017   0.009
0.6  1  200    0.012   0.010   0.009   0.007    0.009   0.007   0.007   0.004
0.6  2  50     0.172   0.121   0.147   0.085    0.175   0.113   0.123   0.069
0.6  2  100    0.060   0.059   0.066   0.039    0.053   0.050   0.054   0.032
0.6  2  200    0.041   0.031   0.033   0.018    0.038   0.026   0.027   0.014
0.6  4  50     0.673   0.512   0.426   0.291    0.627   0.467   0.399   0.241
0.6  4  100    0.325   0.232   0.224   0.150    0.296   0.228   0.214   0.129
0.6  4  200    0.142   0.090   0.129   0.075    0.146   0.093   0.123   0.071

Table 3c
Empirical biases and mean square errors of the model parameter estimates (Standard normal errors)

π1   σ  n      β∞0     β∞1     β∞2     σ∞       βOLS0   βOLS1   βOLS2   σOLS

Empirical biases
0.2  1  50    −0.018   0.030   0.009   0.005   −0.013   0.025   0.010   0.007
0.2  1  100    0.002   0.001  −0.001  −0.006    0.001   0.001  −0.005  −0.008
0.2  1  200    0.000   0.002   0.006  −0.013    0.003   0.001   0.002  −0.012
0.2  2  50     0.020  −0.048   0.002   0.006    0.022  −0.056   0.000   0.007
0.2  2  100   −0.007  −0.004   0.007   0.010   −0.008  −0.002   0.005   0.014
0.2  2  200   −0.008   0.013  −0.012  −0.007   −0.011   0.012  −0.007  −0.005
0.2  4  50    −0.095   0.099   0.053   0.031   −0.104   0.116   0.045   0.028
0.2  4  100    0.069  −0.014  −0.025  −0.001    0.073  −0.012  −0.032  −0.009
0.2  4  200    0.024  −0.042  −0.001  −0.020    0.026  −0.046  −0.003  −0.013
0.4  1  50     0.024  −0.012  −0.017  −0.025    0.023  −0.005  −0.024  −0.022
0.4  1  100    0.002   0.005   0.000  −0.004    0.000   0.008   0.001  −0.006
0.4  1  200   −0.011   0.009   0.000  −0.017   −0.011   0.009   0.003  −0.010
0.4  2  50     0.021  −0.007  −0.002  −0.019    0.022   0.002  −0.002  −0.006
0.4  2  100   −0.034  −0.003   0.023   0.004   −0.043   0.004   0.019   0.007
0.4  2  200    0.008  −0.005   0.008  −0.034    0.009  −0.009   0.003  −0.028
0.4  4  50    −0.059   0.070  −0.005   0.030   −0.065   0.087  −0.038   0.009
0.4  4  100   −0.067  −0.036   0.063  −0.053   −0.079  −0.026   0.070  −0.043
0.4  4  200   −0.029   0.027   0.011  −0.043   −0.033   0.034   0.004  −0.046
0.6  1  50     0.000   0.019   0.016  −0.010   −0.008   0.013   0.026  −0.009
0.6  1  100   −0.007   0.005  −0.002   0.003    0.000   0.004  −0.006   0.001
0.6  1  200    0.012  −0.012  −0.013  −0.011    0.003  −0.008  −0.008  −0.006
0.6  2  50    −0.065   0.041   0.019  −0.021   −0.077   0.045   0.019  −0.021
0.6  2  100    0.022  −0.006   0.003   0.000    0.021  −0.003   0.002  −0.001
0.6  2  200    0.001  −0.001   0.013  −0.017   −0.009   0.002   0.012  −0.007
0.6  4  50     0.048  −0.090  −0.014   0.009    0.050  −0.051  −0.045   0.002
0.6  4  100   −0.019  −0.027   0.023  −0.011   −0.036  −0.017   0.014  −0.012
0.6  4  200    0.031  −0.025   0.021   0.012    0.031  −0.024   0.014   0.024

Empirical mean square errors
0.2  1  50     0.039   0.033   0.029   0.016    0.035   0.030   0.027   0.014
0.2  1  100    0.024   0.017   0.015   0.009    0.021   0.015   0.013   0.008
0.2  1  200    0.010   0.008   0.007   0.004    0.009   0.007   0.006   0.003
0.2  2  50     0.141   0.136   0.117   0.063    0.141   0.128   0.107   0.060
0.2  2  100    0.077   0.056   0.056   0.033    0.074   0.052   0.050   0.030
0.2  2  200    0.033   0.027   0.028   0.015    0.031   0.026   0.026   0.013
0.2  4  50     0.574   0.554   0.334   0.302    0.559   0.542   0.318   0.287
0.2  4  100    0.348   0.233   0.262   0.117    0.356   0.231   0.257   0.124
0.2  4  200    0.150   0.119   0.108   0.059    0.148   0.117   0.103   0.056
0.4  1  50     0.044   0.032   0.039   0.020    0.033   0.027   0.027   0.015
0.4  1  100    0.020   0.018   0.015   0.010    0.017   0.016   0.013   0.009
0.4  1  200    0.010   0.009   0.009   0.005    0.008   0.007   0.007   0.004
0.4  2  50     0.169   0.125   0.121   0.069    0.153   0.112   0.103   0.067
0.4  2  100    0.065   0.050   0.069   0.039    0.056   0.042   0.061   0.036
0.4  2  200    0.039   0.037   0.029   0.018    0.036   0.034   0.026   0.016
0.4  4  50     0.586   0.486   0.448   0.289    0.550   0.479   0.423   0.268
0.4  4  100    0.305   0.234   0.204   0.134    0.306   0.228   0.203   0.122
0.4  4  200    0.114   0.110   0.106   0.071    0.113   0.108   0.103   0.064
0.6  1  50     0.045   0.047   0.036   0.026    0.032   0.030   0.030   0.016
0.6  1  100    0.021   0.017   0.023   0.013    0.017   0.013   0.017   0.009
0.6  1  200    0.012   0.010   0.009   0.007    0.009   0.007   0.007   0.004
0.6  2  50     0.172   0.121   0.147   0.085    0.175   0.113   0.123   0.069
0.6  2  100    0.060   0.059   0.066   0.039    0.053   0.050   0.054   0.032
0.6  2  200    0.041   0.031   0.033   0.018    0.038   0.026   0.027   0.014
0.6  4  50     0.673   0.512   0.426   0.291    0.627   0.467   0.399   0.241
0.6  4  100    0.325   0.232   0.224   0.150    0.296   0.228   0.214   0.129
0.6  4  200    0.142   0.090   0.129   0.075    0.146   0.093   0.123   0.071

- Although the slope estimates β∞ and βOLS are consistent for the error distributions considered, only the first one is asymptotically unbiased (and normally distributed) independently of the distributions mentioned above. This last property can only be assured for βOLS if the error distribution is normal, in which case these estimates agree with the maximum likelihood estimates based on the complete data. The asymptotic unbiasedness of the two estimates is patent from the empirical biases of Table 3c. In spite of the theoretical basis mentioned above, the values shown in Tables 3a and 3b are rather similar to those of Table 3c; thus the asymptotic unbiasedness of the slope estimates when the error distribution is normal seems to be extendable to the rest of the error distributions considered in our simulation study.
- The mean square errors of the slope estimates are of similar orders independently of the error distributions considered. For all of these, our proposed slope estimates have a similar efficiency to the OLS estimates. This means that the proposed algorithm is useful to avoid the negative consequences that are derived from the existence of grouped or missing data. In all of the cases, the empirical mean square errors increase as the proportion of grouped data grows and as the sample size decreases, in agreement with what could be expected in advance.
- The former conclusions (in terms of biases and MSEs) about the slope parameter estimates can be extended straightforwardly to the scale parameter estimate.

(2) The empirical covariance matrices of the slope parameter estimates β∞ and βOLS. These are denoted by Γ∞ and ΓOLS, respectively, and in harmony with (20) they were computed by

\Gamma^\infty = 250^{-1} \sum_{r=1}^{250} \big(\beta^{\infty(r)} - \hat{E}(\beta^\infty)\big)\big(\beta^{\infty(r)} - \hat{E}(\beta^\infty)\big)',  (21)

and similarly for ΓOLS. For each replication r, the empirical matrix Γ∞ should be compared with n−1Λ∞(r), since Λ∞(r) approximates the asymptotic covariance matrix of √n(β∞ − β), as was indicated. The comparison between the mean matrix

\Gamma^{*\infty} = n^{-1} \hat{E}(\Lambda^\infty) = n^{-1}\, 250^{-1} \sum_{r=1}^{250} \Lambda^{\infty(r)}  (22)

and Γ∞ will allow us to evaluate the biases of the different elements of the covariance matrix estimate of the slope parameter estimates obtained with the proposed algorithm. Tables 4 (with versions a, b and c, depending on the error distribution considered) include the distinct elements of the matrices Γ∞, Γ*∞ and ΓOLS, respectively.

Briefly speaking, it can be seen from Tables 4 that the matrices Γ∞, Γ*∞ and ΓOLS are quite similar, independently of the percentage of grouped data, the true scale parameter, the sample size and, also, the error distributions considered in this study. This fact reinforces our former comment in the sense that the proposed algorithm tends to eliminate the negative consequences that spring from the existence of grouped or missing data.

(3) Empirical efficiencies of the algorithm confidence interval estimates. From (17) the distribution of β∞ is approximately a multivariate normal N(β, n−1Λ∞); therefore, at the 95% level, the approximate confidence interval that is derived from the proposed algorithm is given by

\left[\beta_j^\infty \pm \frac{1.96}{\sqrt{n}}\sqrt{\lambda_{jj}^\infty}\right]


Table 4a
Covariance matrices Γ*∞, Γ∞ and ΓOLS given in Section 4 (Laplacian errors)
Each row lists, for (π1, σ, n), the elements γ00, γ11, γ22, γ01, γ02, γ12 of Γ*∞, then of Γ∞, then of ΓOLS.

π1  σ  n    | Γ*∞                                          | Γ∞                                           | ΓOLS
0.2 1  50   | 0.038 0.031 0.031 −0.015 −0.015  0.000 | 0.037 0.025 0.031 −0.011 −0.016  0.000 | 0.033 0.021 0.028 −0.011 −0.014  0.000
0.2 1  100  | 0.019 0.015 0.015 −0.008 −0.008  0.000 | 0.026 0.016 0.014 −0.010 −0.009  0.000 | 0.022 0.015 0.012 −0.008 −0.008  0.000
0.2 1  200  | 0.009 0.007 0.007 −0.004 −0.004  0.000 | 0.009 0.008 0.008 −0.003 −0.005  0.000 | 0.008 0.007 0.007 −0.003 −0.004  0.000
0.2 2  50   | 0.141 0.118 0.114 −0.059 −0.054  0.000 | 0.150 0.122 0.110 −0.044 −0.061 −0.008 | 0.147 0.120 0.105 −0.046 −0.059 −0.004
0.2 2  100  | 0.069 0.056 0.056 −0.028 −0.027  0.000 | 0.076 0.055 0.053 −0.022 −0.024 −0.008 | 0.070 0.052 0.049 −0.023 −0.022 −0.007
0.2 2  200  | 0.034 0.028 0.027 −0.014 −0.013  0.000 | 0.037 0.031 0.028 −0.017 −0.016  0.002 | 0.035 0.030 0.027 −0.016 −0.016  0.002
0.2 4  50   | 0.536 0.437 0.437 −0.220 −0.212 −0.001 | 0.545 0.373 0.420 −0.182 −0.234  0.016 | 0.527 0.366 0.403 −0.174 −0.210  0.010
0.2 4  100  | 0.256 0.206 0.208 −0.103 −0.102  0.000 | 0.313 0.248 0.239 −0.114 −0.155 −0.005 | 0.313 0.248 0.241 −0.121 −0.152 −0.008
0.2 4  200  | 0.126 0.102 0.102 −0.052 −0.050  0.000 | 0.126 0.109 0.114 −0.045 −0.045 −0.009 | 0.128 0.111 0.113 −0.046 −0.046 −0.007
0.4 1  50   | 0.042 0.035 0.034 −0.017 −0.016  0.000 | 0.038 0.035 0.035 −0.014 −0.017  0.000 | 0.031 0.030 0.026 −0.012 −0.014  0.001
0.4 1  100  | 0.021 0.018 0.017 −0.009 −0.008  0.000 | 0.018 0.018 0.016 −0.008 −0.007  0.000 | 0.015 0.015 0.013 −0.008 −0.005  0.000
0.4 1  200  | 0.010 0.008 0.008 −0.004 −0.004  0.000 | 0.008 0.009 0.008 −0.004 −0.004  0.001 | 0.007 0.007 0.007 −0.003 −0.003  0.000
0.4 2  50   | 0.140 0.115 0.116 −0.058 −0.054  0.000 | 0.192 0.135 0.143 −0.086 −0.080  0.007 | 0.174 0.114 0.130 −0.071 −0.075  0.004
0.4 2  100  | 0.071 0.058 0.058 −0.030 −0.027  0.000 | 0.072 0.062 0.072 −0.022 −0.027 −0.011 | 0.062 0.053 0.063 −0.017 −0.025 −0.011
0.4 2  200  | 0.035 0.029 0.028 −0.014 −0.014  0.000 | 0.044 0.031 0.032 −0.018 −0.017 −0.001 | 0.038 0.029 0.029 −0.015 −0.014 −0.002
0.4 4  50   | 0.505 0.420 0.421 −0.203 −0.200 −0.010 | 0.634 0.515 0.398 −0.301 −0.163 −0.032 | 0.565 0.494 0.377 −0.264 −0.136 −0.038
0.4 4  100  | 0.236 0.193 0.192 −0.099 −0.092  0.002 | 0.272 0.194 0.196 −0.087 −0.115  0.005 | 0.279 0.188 0.211 −0.087 −0.123  0.007
0.4 4  200  | 0.120 0.099 0.099 −0.049 −0.047 −0.001 | 0.144 0.128 0.114 −0.076 −0.060  0.007 | 0.137 0.119 0.110 −0.070 −0.055  0.002
0.6 1  50   | 0.049 0.041 0.040 −0.021 −0.019  0.001 | 0.050 0.040 0.038 −0.017 −0.019 −0.001 | 0.034 0.025 0.029 −0.012 −0.016  0.001
0.6 1  100  | 0.024 0.020 0.020 −0.010 −0.009  0.000 | 0.022 0.019 0.022 −0.011 −0.011  0.002 | 0.015 0.013 0.016 −0.007 −0.008  0.000
0.6 1  200  | 0.011 0.010 0.009 −0.005 −0.004  0.000 | 0.012 0.012 0.008 −0.005 −0.004 −0.001 | 0.007 0.007 0.006 −0.003 −0.002  0.000
0.6 2  50   | 0.145 0.121 0.120 −0.060 −0.055 −0.001 | 0.166 0.139 0.140 −0.066 −0.087  0.004 | 0.152 0.111 0.115 −0.058 −0.080  0.003
0.6 2  100  | 0.071 0.058 0.058 −0.030 −0.027  0.000 | 0.071 0.055 0.068 −0.026 −0.037  0.002 | 0.061 0.048 0.061 −0.020 −0.035  0.001
0.6 2  200  | 0.035 0.029 0.029 −0.015 −0.013  0.000 | 0.043 0.029 0.039 −0.017 −0.022  0.004 | 0.036 0.027 0.032 −0.015 −0.019  0.004
0.6 4  50   | 0.476 0.393 0.397 −0.198 −0.184 −0.003 | 0.585 0.437 0.461 −0.219 −0.215 −0.023 | 0.587 0.411 0.463 −0.221 −0.221 −0.005
0.6 4  100  | 0.219 0.181 0.182 −0.091 −0.083 −0.001 | 0.294 0.213 0.197 −0.121 −0.097  0.002 | 0.278 0.222 0.205 −0.118 −0.098  0.002
0.6 4  200  | 0.109 0.089 0.089 −0.045 −0.041 −0.001 | 0.155 0.101 0.107 −0.053 −0.058 −0.004 | 0.150 0.093 0.102 −0.055 −0.054  0.001


Table 4b
Covariance matrices Γ*∞, Γ∞ and ΓOLS given in Section 4 (Logistic errors)
Each row lists, for (π1, σ, n), the elements γ00, γ11, γ22, γ01, γ02, γ12 of Γ*∞, then of Γ∞, then of ΓOLS.

π1  σ  n    | Γ*∞                                          | Γ∞                                           | ΓOLS
0.2 1  50   | 0.040 0.032 0.032 −0.016 −0.016  0.000 | 0.039 0.032 0.029 −0.017 −0.013  0.000 | 0.035 0.030 0.027 −0.016 −0.011 −0.001
0.2 1  100  | 0.019 0.015 0.015 −0.008 −0.008  0.000 | 0.024 0.017 0.015 −0.010 −0.008  0.000 | 0.021 0.015 0.013 −0.008 −0.007  0.000
0.2 1  200  | 0.009 0.007 0.007 −0.004 −0.004  0.000 | 0.010 0.008 0.007 −0.004 −0.004  0.000 | 0.009 0.007 0.006 −0.004 −0.004  0.000
0.2 2  50   | 0.143 0.120 0.117 −0.059 −0.057 −0.001 | 0.141 0.134 0.118 −0.057 −0.054  0.006 | 0.141 0.126 0.108 −0.057 −0.054  0.010
0.2 2  100  | 0.070 0.057 0.056 −0.029 −0.027  0.000 | 0.077 0.056 0.057 −0.028 −0.028 −0.005 | 0.074 0.052 0.051 −0.027 −0.026 −0.004
0.2 2  200  | 0.034 0.027 0.028 −0.014 −0.014  0.000 | 0.033 0.027 0.028 −0.014 −0.013  0.001 | 0.031 0.026 0.026 −0.013 −0.012  0.000
0.2 4  50   | 0.559 0.450 0.454 −0.235 −0.217 −0.004 | 0.568 0.547 0.333 −0.278 −0.188  0.008 | 0.551 0.531 0.318 −0.275 −0.185  0.025
0.2 4  100  | 0.259 0.212 0.214 −0.104 −0.103 −0.001 | 0.345 0.234 0.263 −0.110 −0.170 −0.009 | 0.352 0.232 0.258 −0.118 −0.169 −0.005
0.2 4  200  | 0.128 0.104 0.104 −0.053 −0.050  0.000 | 0.150 0.118 0.108 −0.059 −0.064 −0.001 | 0.148 0.115 0.104 −0.056 −0.065  0.003
0.4 1  50   | 0.041 0.034 0.035 −0.017 −0.016  0.000 | 0.044 0.032 0.039 −0.018 −0.014 −0.001 | 0.032 0.027 0.027 −0.013 −0.010 −0.002
0.4 1  100  | 0.021 0.017 0.017 −0.009 −0.008  0.000 | 0.020 0.018 0.015 −0.009 −0.008  0.001 | 0.017 0.016 0.013 −0.008 −0.007  0.001
0.4 1  200  | 0.010 0.008 0.008 −0.004 −0.004  0.000 | 0.010 0.009 0.009 −0.004 −0.004  0.000 | 0.008 0.007 0.007 −0.004 −0.003  0.000
0.4 2  50   | 0.141 0.117 0.120 −0.059 −0.053 −0.001 | 0.169 0.125 0.122 −0.076 −0.058  0.004 | 0.153 0.113 0.104 −0.070 −0.048  0.004
0.4 2  100  | 0.070 0.057 0.057 −0.029 −0.026  0.000 | 0.064 0.050 0.068 −0.021 −0.029 −0.005 | 0.054 0.042 0.061 −0.016 −0.026 −0.006
0.4 2  200  | 0.033 0.027 0.027 −0.014 −0.013  0.000 | 0.040 0.038 0.029 −0.019 −0.015 −0.002 | 0.036 0.034 0.026 −0.017 −0.015 −0.001
0.4 4  50   | 0.514 0.423 0.442 −0.207 −0.203 −0.008 | 0.586 0.483 0.450 −0.288 −0.199  0.001 | 0.549 0.474 0.424 −0.263 −0.193  0.000
0.4 4  100  | 0.242 0.197 0.201 −0.100 −0.095 −0.001 | 0.302 0.234 0.201 −0.119 −0.115 −0.002 | 0.301 0.229 0.199 −0.111 −0.118  0.001
0.4 4  200  | 0.119 0.097 0.098 −0.050 −0.045 −0.001 | 0.114 0.110 0.106 −0.047 −0.040 −0.013 | 0.113 0.107 0.103 −0.047 −0.042 −0.009
0.6 1  50   | 0.049 0.041 0.040 −0.020 −0.019  0.000 | 0.045 0.047 0.036 −0.023 −0.016  0.005 | 0.032 0.030 0.029 −0.015 −0.012  0.001
0.6 1  100  | 0.024 0.020 0.020 −0.010 −0.009  0.000 | 0.021 0.017 0.023 −0.008 −0.012  0.001 | 0.017 0.013 0.017 −0.006 −0.010  0.000
0.6 1  200  | 0.012 0.010 0.009 −0.005 −0.004  0.000 | 0.012 0.010 0.009 −0.005 −0.004  0.001 | 0.009 0.007 0.007 −0.004 −0.003  0.000
0.6 2  50   | 0.144 0.120 0.120 −0.062 −0.054  0.001 | 0.168 0.120 0.148 −0.073 −0.066  0.001 | 0.170 0.112 0.124 −0.076 −0.063  0.001
0.6 2  100  | 0.071 0.058 0.059 −0.030 −0.027  0.000 | 0.059 0.059 0.066 −0.024 −0.028 −0.002 | 0.053 0.050 0.054 −0.019 −0.028  0.000
0.6 2  200  | 0.034 0.028 0.028 −0.014 −0.013  0.000 | 0.042 0.032 0.033 −0.014 −0.020  0.004 | 0.038 0.026 0.027 −0.013 −0.017  0.003
0.6 4  50   | 0.503 0.409 0.420 −0.219 −0.188 −0.007 | 0.674 0.507 0.428 −0.236 −0.275 −0.004 | 0.628 0.466 0.399 −0.213 −0.265  0.012
0.6 4  100  | 0.230 0.190 0.193 −0.097 −0.085 −0.003 | 0.326 0.232 0.225 −0.129 −0.118 −0.007 | 0.296 0.228 0.215 −0.116 −0.112 −0.007
0.6 4  200  | 0.114 0.095 0.097 −0.047 −0.043 −0.003 | 0.141 0.090 0.129 −0.044 −0.057 −0.012 | 0.146 0.093 0.123 −0.051 −0.053 −0.010


Table 4c
Covariance matrices Γ*∞, Γ∞ and ΓOLS given in Section 4 (Standard normal errors)
Each row lists, for (π1, σ, n), the elements γ00, γ11, γ22, γ01, γ02, γ12 of Γ*∞, then of Γ∞, then of ΓOLS.

π1  σ  n    | Γ*∞                                          | Γ∞                                           | ΓOLS
0.2 1  50   | 0.039 0.031 0.031 −0.015 −0.016  0.000 | 0.049 0.034 0.027 −0.023 −0.016  0.003 | 0.044 0.028 0.025 −0.020 −0.015  0.003
0.2 1  100  | 0.019 0.015 0.015 −0.008 −0.007  0.000 | 0.015 0.015 0.014 −0.006 −0.004 −0.001 | 0.014 0.014 0.012 −0.006 −0.004 −0.002
0.2 1  200  | 0.009 0.007 0.007 −0.004 −0.004  0.000 | 0.010 0.008 0.007 −0.004 −0.003  0.001 | 0.009 0.007 0.007 −0.003 −0.003  0.000
0.2 2  50   | 0.145 0.117 0.120 −0.059 −0.059 −0.001 | 0.149 0.117 0.125 −0.055 −0.059 −0.006 | 0.138 0.115 0.115 −0.053 −0.054 −0.004
0.2 2  100  | 0.070 0.056 0.057 −0.029 −0.027  0.000 | 0.069 0.062 0.050 −0.028 −0.025  0.000 | 0.066 0.063 0.048 −0.029 −0.024 −0.001
0.2 2  200  | 0.034 0.028 0.027 −0.014 −0.013  0.000 | 0.038 0.028 0.030 −0.015 −0.016  0.002 | 0.036 0.028 0.028 −0.014 −0.015  0.002
0.2 4  50   | 0.535 0.445 0.446 −0.224 −0.204 −0.009 | 0.550 0.421 0.550 −0.204 −0.224 −0.001 | 0.533 0.406 0.533 −0.202 −0.221  0.002
0.2 4  100  | 0.262 0.214 0.218 −0.106 −0.103 −0.001 | 0.293 0.202 0.258 −0.106 −0.139 −0.002 | 0.286 0.189 0.250 −0.100 −0.133  0.002
0.2 4  200  | 0.131 0.107 0.107 −0.053 −0.052 −0.001 | 0.108 0.108 0.088 −0.041 −0.034 −0.010 | 0.111 0.106 0.087 −0.041 −0.035 −0.011
0.4 1  50   | 0.042 0.035 0.035 −0.017 −0.017  0.000 | 0.045 0.030 0.040 −0.018 −0.019  0.003 | 0.037 0.023 0.028 −0.013 −0.017  0.002
0.4 1  100  | 0.021 0.017 0.017 −0.009 −0.008  0.000 | 0.027 0.021 0.015 −0.011 −0.009 −0.001 | 0.021 0.016 0.013 −0.008 −0.007 −0.001
0.4 1  200  | 0.010 0.008 0.008 −0.004 −0.004  0.000 | 0.010 0.008 0.008 −0.005 −0.004  0.001 | 0.008 0.007 0.006 −0.004 −0.003  0.001
0.4 2  50   | 0.146 0.122 0.123 −0.061 −0.055 −0.002 | 0.155 0.106 0.119 −0.064 −0.073  0.012 | 0.140 0.095 0.101 −0.057 −0.061  0.009
0.4 2  100  | 0.070 0.057 0.058 −0.029 −0.027 −0.001 | 0.070 0.053 0.067 −0.025 −0.036  0.005 | 0.064 0.051 0.060 −0.025 −0.031  0.004
0.4 2  200  | 0.034 0.028 0.028 −0.014 −0.013  0.000 | 0.037 0.033 0.027 −0.017 −0.013  0.000 | 0.032 0.029 0.024 −0.014 −0.011  0.001
0.4 4  50   | 0.508 0.416 0.426 −0.215 −0.191 −0.012 | 0.497 0.529 0.406 −0.236 −0.122 −0.072 | 0.486 0.500 0.396 −0.234 −0.143 −0.040
0.4 4  100  | 0.255 0.209 0.210 −0.107 −0.098 −0.001 | 0.247 0.214 0.188 −0.082 −0.085 −0.012 | 0.244 0.211 0.186 −0.077 −0.081 −0.022
0.4 4  200  | 0.125 0.103 0.105 −0.052 −0.047 −0.001 | 0.139 0.131 0.110 −0.064 −0.048 −0.004 | 0.137 0.130 0.108 −0.067 −0.044 −0.006
0.6 1  50   | 0.050 0.043 0.042 −0.022 −0.019  0.000 | 0.053 0.044 0.040 −0.023 −0.023  0.003 | 0.043 0.026 0.028 −0.017 −0.019  0.003
0.6 1  100  | 0.023 0.020 0.019 −0.010 −0.009  0.000 | 0.021 0.019 0.017 −0.008 −0.009  0.000 | 0.015 0.015 0.013 −0.006 −0.007  0.000
0.6 1  200  | 0.012 0.010 0.010 −0.005 −0.004  0.000 | 0.010 0.010 0.010 −0.004 −0.004  0.000 | 0.008 0.007 0.007 −0.003 −0.003  0.000
0.6 2  50   | 0.146 0.122 0.125 −0.060 −0.056 −0.005 | 0.160 0.099 0.135 −0.064 −0.071  0.013 | 0.144 0.096 0.116 −0.060 −0.067  0.012
0.6 2  100  | 0.070 0.058 0.058 −0.030 −0.025 −0.001 | 0.068 0.055 0.058 −0.026 −0.018 −0.001 | 0.058 0.049 0.046 −0.022 −0.018  0.001
0.6 2  200  | 0.035 0.029 0.029 −0.015 −0.013  0.000 | 0.038 0.027 0.037 −0.014 −0.019  0.000 | 0.035 0.025 0.033 −0.013 −0.019  0.002
0.6 4  50   | 0.496 0.409 0.420 −0.207 −0.187 −0.012 | 0.635 0.396 0.516 −0.227 −0.199 −0.027 | 0.597 0.367 0.437 −0.214 −0.196 −0.019
0.6 4  100  | 0.233 0.196 0.201 −0.099 −0.086 −0.003 | 0.328 0.217 0.219 −0.146 −0.101  0.007 | 0.300 0.202 0.198 −0.123 −0.105  0.001
0.6 4  200  | 0.117 0.098 0.100 −0.050 −0.043 −0.002 | 0.146 0.098 0.125 −0.048 −0.065  0.000 | 0.136 0.091 0.113 −0.048 −0.059 −0.002


for the slope parameter βj (j = 0, 1, 2). The coverage probability of these intervals can be empirically assessed by the expression

C(\beta_j) = 250^{-1} \sum_{r=1}^{250} I\left(\beta_j^{\infty(r)} - \frac{1.96}{\sqrt{n}}\sqrt{\lambda_{jj}^{\infty(r)}} \le \beta_j \le \beta_j^{\infty(r)} + \frac{1.96}{\sqrt{n}}\sqrt{\lambda_{jj}^{\infty(r)}}\right).  (23)
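A minimal sketch (ours) of this empirical coverage, for one slope component j across the 250 replications:

```python
# Sketch of (23): the fraction of replications whose interval
# beta_j +- (1.96/sqrt(n)) sqrt(lambda_jj) covers the true beta_j.
import numpy as np

def coverage(beta_j_reps, lam_jj_reps, n, beta_j_true):
    half = (1.96 / np.sqrt(n)) * np.sqrt(lam_jj_reps)
    inside = (beta_j_reps - half <= beta_j_true) & (beta_j_true <= beta_j_reps + half)
    return inside.mean()
```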

These empirical coverage probabilities are included in Table 5 for the three error distributions simulated in this study.

Table 5
Empirical coverage probabilities of the confidence intervals at the level of 95%

             Laplacian errors        Logistic errors         Standard normal errors
π1   σ  n    β∞0    β∞1    β∞2      β∞0    β∞1    β∞2       β∞0    β∞1    β∞2
0.2  1  50   0.950  0.970  0.945    0.955  0.955  0.940     0.925  0.930  0.945
0.2  1  100  0.910  0.940  0.955    0.910  0.925  0.930     0.960  0.950  0.950
0.2  1  200  0.960  0.945  0.930    0.950  0.955  0.955     0.920  0.945  0.970
0.2  2  50   0.950  0.950  0.965    0.965  0.930  0.945     0.945  0.950  0.950
0.2  2  100  0.945  0.955  0.960    0.940  0.960  0.935     0.945  0.930  0.965
0.2  2  200  0.940  0.935  0.955    0.945  0.955  0.945     0.920  0.960  0.935
0.2  4  50   0.950  0.980  0.935    0.940  0.915  0.955     0.925  0.955  0.920
0.2  4  100  0.920  0.920  0.910    0.925  0.935  0.910     0.940  0.935  0.915
0.2  4  200  0.975  0.965  0.940    0.920  0.940  0.945     0.965  0.945  0.945
0.4  1  50   0.970  0.950  0.945    0.940  0.955  0.930     0.925  0.970  0.930
0.4  1  100  0.950  0.955  0.970    0.940  0.960  0.960     0.910  0.930  0.965
0.4  1  200  0.955  0.945  0.960    0.950  0.935  0.935     0.960  0.940  0.945
0.4  2  50   0.915  0.920  0.920    0.900  0.925  0.955     0.940  0.955  0.960
0.4  2  100  0.950  0.925  0.905    0.970  0.960  0.930     0.930  0.965  0.925
0.4  2  200  0.910  0.940  0.930    0.920  0.910  0.950     0.950  0.930  0.940
0.4  4  50   0.915  0.920  0.940    0.925  0.950  0.935     0.935  0.930  0.960
0.4  4  100  0.925  0.950  0.925    0.945  0.920  0.925     0.930  0.965  0.965
0.4  4  200  0.900  0.905  0.935    0.965  0.940  0.945     0.955  0.890  0.950
0.6  1  50   0.940  0.925  0.955    0.950  0.935  0.955     0.935  0.935  0.945
0.6  1  100  0.945  0.960  0.925    0.965  0.965  0.915     0.970  0.935  0.975
0.6  1  200  0.950  0.920  0.960    0.945  0.940  0.950     0.965  0.930  0.940
0.6  2  50   0.900  0.930  0.930    0.920  0.935  0.925     0.920  0.965  0.935
0.6  2  100  0.945  0.965  0.955    0.965  0.935  0.910     0.950  0.935  0.950
0.6  2  200  0.895  0.950  0.910    0.910  0.935  0.940     0.930  0.955  0.900
0.6  4  50   0.925  0.935  0.905    0.910  0.920  0.940     0.930  0.975  0.895
0.6  4  100  0.905  0.935  0.950    0.885  0.935  0.930     0.885  0.930  0.930
0.6  4  200  0.880  0.940  0.930    0.915  0.940  0.920     0.905  0.945  0.920

As can be seen, the orders of these empirical coverage probabilities are rather similar for the three error distributions mentioned above. In all of the cases the most sensitive element seems to be the true scale parameter σ and, in second place, the percentage of grouped data. Nevertheless, since these probabilities remain close to the nominal 0.95, the empirical efficiency of the confidence interval estimates produced by the algorithm is encouraging.
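To fix ideas, the following sketch shows how (23) can be evaluated from the 250 replicated estimates. It is written in Python/NumPy purely for illustration (the implementation mentioned in Section 6 is in MATLAB); the array names are our own assumptions, and for simplicity λ∞jj is treated as common to all replications, whereas in practice each replication would plug in its own estimate.

```python
import numpy as np

def empirical_coverage(beta_reps, lam_diag, beta_true, n):
    """Empirical coverage C(beta_j) of the 95% intervals, as in (23).

    beta_reps : (250, 3) array, the estimates beta_j^{inf(r)} per replication r
    lam_diag  : (3,) array, diagonal entries lambda_jj^{inf} of Lambda^{inf}
    beta_true : (3,) array, true slope vector (here (1, -4, 3))
    n         : sample size used in each replication
    """
    half_width = 1.96 * np.sqrt(lam_diag / n)     # 1.96 x standard error
    lower = beta_reps - half_width                # interval lower endpoints
    upper = beta_reps + half_width                # interval upper endpoints
    covered = (lower <= beta_true) & (beta_true <= upper)
    return covered.mean(axis=0)                   # average of the indicators

# hypothetical usage with simulated replicates:
# C = empirical_coverage(beta_reps, lam_diag, np.array([1.0, -4.0, 3.0]), n=100)
```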

5. Case study: The proposed algorithm versus the EM

With the data of Table 1, we used the proposed algorithm to estimate the parameters β = (β0, β1) and σ of model (3), using the stop condition

\left| \beta_0^{p} - \beta_0^{p-1} \right| < 0.0001,

together with the analogous inequalities for β1 and σ. The convergence point that defines our point estimate, up to four decimal places in consistency with the stop condition, is

\beta_0^{\infty} = -5.9797, \qquad \beta_1^{\infty} = 4.2874, \qquad \sigma^{\infty} = 0.2651.

Finally, as n = 40, we have used (18) to compute the covariance matrix of the approximate distribution given in (17), from which the approximate bivariate normal distribution of β∞ turned out to be

\beta^{\infty} \approx N\!\left( \beta,\ \begin{pmatrix} 0.8663 & -0.4023 \\ -0.4023 & 0.1872 \end{pmatrix} \right).

Thus, the interval estimates of β0 and β1 at the 0.95 confidence level are

[−7.8039, −4.1555] and [3.4394, 5.1354],

respectively; hence, the null hypotheses β0 = 0 and β1 = 0 must be rejected at the 0.05 α-level.
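As a quick arithmetic check, these interval endpoints follow from the reported convergence point and covariance matrix through the usual Wald construction; a minimal sketch (Python/NumPy, our own illustration rather than the original MATLAB code):

```python
import numpy as np

beta = np.array([-5.9797, 4.2874])      # convergence point (beta_0^inf, beta_1^inf)
cov = np.array([[0.8663, -0.4023],
                [-0.4023, 0.1872]])     # covariance of the approximate normal law

se = np.sqrt(np.diag(cov))              # standard errors of beta_0^inf, beta_1^inf
lower = beta - 1.96 * se
upper = beta + 1.96 * se
# lower ~ (-7.804, 3.439) and upper ~ (-4.155, 5.135): the intervals reported
# above, up to rounding of the displayed covariance entries.
```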


The natural alternatives to the proposed algorithm are the maximum likelihood procedures. Clearly, under the robust conditions considered in this paper the integral function

L(\beta, \sigma) = \left[ \prod_{i \in I_u} f\!\left( \frac{y_i - x_i'\beta}{\sigma} \right) \right] \left[ \prod_{i \in I_g} \sum_{h=1}^{r} I(c_{h-1} < y_i \le c_h) \int_{(c_{h-1} - x_i'\beta)\sigma^{-1}}^{(c_h - x_i'\beta)\sigma^{-1}} f(x)\, dx \right] \quad (24)

agrees with the likelihood function, where I_u and I_g denote the index sets of the ungrouped and the grouped observations, respectively. Its maximization can be carried out either directly (typically using the Newton–Raphson or scoring methods) or through the EM algorithm. Tanner applies the EM algorithm to arrive at the point estimates (see Tanner (1996, p. 69))

β0 = −6.019, β1 = 4.311 and σ = 0.2592,

which are clearly in accordance with those obtained through the proposed algorithm. Although the updating of σ is clearly simpler with the proposed algorithm than with the EM (see Tanner (1996, p. 68)), with the right censored data of this case study the main advantage of the proposed procedure, in comparison to the maximum likelihood methods, concerns parameter interval estimation and hypothesis testing. As explained above, our algorithm tackles these inferences by means of the covariance matrix n−1Λ∞, in which only a first derivative is involved. Additionally, as was said, from expressions (8) and (9) this first derivative admits an explicit expression in terms of the density and distribution functions of the standardized errors. For their part, the maximum likelihood methods need to estimate the Fisher information matrix. Within the context of the EM algorithm, the Hessian matrix of the log-likelihood function can be evaluated by several methods, for example, direct computation/numerical differentiation (Meilijson, 1989), Louis' method (Louis, 1982) or the use of EM iterates (Meng and Rubin, 1991). If the maximization of (24) is carried out by Newton–Raphson or similar methods, then the former Hessian matrix is typically computed by numerical techniques. In none of these maximum likelihood approaches does the estimate of the asymptotic covariance matrix of the β-estimates admit a closed form, since all of them, in fact, evaluate the second derivatives of the log-likelihood numerically and are quite unstable. Without a doubt, the equivalent estimation in our proposed algorithm is simpler if we recall that only first derivatives with analytical form are involved.

6. Conclusions and final comments

This paper has presented an algorithm for linear model estimation and inference under robust conditions which include the existence of grouped or missing data and the possibility of general errors, not necessarily normal or symmetrical, with null means and unknown variances. The algorithm converges to a unique point which does not depend on the starting point and defines our model parameter estimate. The asymptotic covariance matrix of the slope estimates can also be estimated by means of an explicit expression which is easy to implement computationally. In this sense the proposed algorithm presents clear advantages compared to the maximum likelihood procedures, its natural competitors. With these procedures, implemented either directly or through the EM algorithm, the computation of the asymptotic covariance matrix of the estimates entails the evaluation of the Hessian matrix of the log-likelihood integral function associated with the robust conditions mentioned above, which can only be calculated numerically through quite unstable methods. Additionally, the large variety of simulations carried out corroborates that, if the linear model is well specified, the capacity of the proposed algorithm for treating the incomplete data situations considered is remarkable. In fact, the statistical accuracy of our estimates (in terms of biases and mean square errors) is similar to that of the OLS estimates with complete data. As can be seen from Table 3, this occurs for all of the combinations of error distributions, proportions of grouped data, values of the scale parameter and sample sizes displayed. However, the evaluation of the algorithm in more extreme settings with very high proportions of left and right censored data is a topic of further research.

Finally, two brief computational observations will bring the paper to a close. The first concerns the number of algorithm iterations needed to ensure a prefixed precision. This number depends on the initial values of the algorithm for a given case being studied, although, as indicated, the algorithm has a unique limit point. The following comments give an idea about the speed of convergence of the proposed algorithm in the simulations and case study commented on in Sections 4 and 5, respectively. In all of the simulations the algorithm was run from the initial slope and scale values β^0 = (0, 0, 0) and σ^0 = 1 (let us recall that the true β-vector was (1, −4, 3) in all of the cases, while the simulated true values of the scale parameter were σ = 1, 2, 4), and the stop condition was activated at the precision of 10−4. In these circumstances, our experience indicates that the elements that influence the speed of convergence of the algorithm were the true scale parameter and the proportion of grouped data (governed by π0, equal to 0.2, 0.4 and 0.6 in the simulation study), whereas, if the former elements are fixed, the effect of the error distribution is negligible. As a short comment in this respect, let us indicate that in the worst case (which corresponds to Laplacian errors, σ = 4 and π0 = 0.6) the mean number of iterations over the 250 replications was 82.3, with a standard deviation of 12.4. For its part, in the case study of Section 5 we executed the algorithm from the following starting points (β_0^0, β_1^0, σ^0): (0, 0, 1), (10, 0, 1), (10, 0, 4) and (7, −7, 4). In spite of the fact that these initial values are far from the convergence point (−5.9797, 4.2874, 0.2651), the number of iterations varies from 34 to 42 for a precision of 10−3 and from 42 to 47 if we want this precision to be up to 4 decimal places. The second observation refers to our computer implementation of the algorithm. This was made in MATLAB and its source code is available from the authors on request.
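Pending that, the overall iteration skeleton implied by the stop condition of Section 5 can be sketched as follows; update_step is a hypothetical stand-in for the updating equations of the proposed algorithm, which are not reproduced here, and the sketch is ours, not the MATLAB implementation.

```python
import numpy as np

def run_algorithm(update_step, beta0, sigma0, tol=1e-4, max_iter=500):
    """Generic fixed-point loop with the component-wise stop condition of
    Section 5: |theta^p - theta^{p-1}| < tol for every parameter.

    update_step : callable mapping (beta, sigma) to the next iterate; a
                  hypothetical stand-in for the algorithm's updating equations
    """
    beta, sigma = np.asarray(beta0, dtype=float), float(sigma0)
    for p in range(1, max_iter + 1):
        beta_new, sigma_new = update_step(beta, sigma)
        beta_new = np.asarray(beta_new, dtype=float)
        if np.all(np.abs(beta_new - beta) < tol) and abs(sigma_new - sigma) < tol:
            return beta_new, sigma_new, p        # converged after p iterations
        beta, sigma = beta_new, sigma_new
    return beta, sigma, max_iter                 # stop condition not reached
```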


Acknowledgement

This paper springs from research partially funded by MEC under grant MTM2004-05776.

References

An, M.Y., 1998. Logconcavity versus logconvexity: A complete characterization. Journal of Economic Theory 80, 350–369.

Anido, C., Rivero, C., Valdes, T., 2000. Modal iterative estimation in linear models with unihumped errors and non-grouped and grouped data collected from different sources. Test 9 (2), 393–416.

Dempster, A.P., Laird, N.M., Rubin, D.B., 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society B 39, 1–22.

Fahrmeir, L., Kaufmann, H., 1985. Consistency and asymptotic normality of the maximum likelihood estimator in generalized linear models. The Annals of Statistics 13, 342–368.

Healy, M.J.R., Westmacott, M., 1956. Missing values in experiments analysed on automatic computers. Applied Statistics 5, 203–206.

James, I.R., Smith, P.J., 1984. Consistency results for linear regression with censored data. The Annals of Statistics 12, 590–600.

Louis, T.A., 1982. Finding the observed information matrix when using the EM algorithm. Journal of the Royal Statistical Society B 44, 226–233.

Meilijson, I., 1989. A fast improvement of the EM algorithm on its own terms. Journal of the Royal Statistical Society B 51, 127–138.

Meng, X.L., Rubin, D.B., 1991. Using EM to obtain asymptotic variance–covariance matrices: The SEM algorithm. Journal of the American Statistical Association 86, 899–909.

Orchard, T., Woodbury, M.A., 1972. A missing information principle: Theory and applications. In: Proceedings of the 6th Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, pp. 697–715.

Ritov, Y., 1990. Estimation in a linear regression model with censored data. The Annals of Statistics 18, 303–328.

Rivero, C., Valdes, T., 2004. Mean based iterative procedures in linear models with general errors and grouped data. Scandinavian Journal of Statistics 31 (3), 469–486.

Schmee, J., Hahn, G.J., 1979. A simple method for regression analysis with censored data. Technometrics 21, 417–432.

Tanner, M.A., 1996. Tools for Statistical Inference: Methods for the Exploration of Posterior Distributions and Likelihood Functions. Springer, New York.