Prediction of Gamma failure times


400 IEEE TRANSACTIONS ON RELIABILITY, VOL 46, NO. 3, 1997 SEPTEMBER

Prediction of Gamma Failure Times

Olabode Theophilus Ogunyemi

Paul Irwin Nelson

location-scale family. This assumption endows the problem with a rich structure where invariance can be exploited to obtain best linear unbiased and best linear invariant estimates.

This paper considers a different setting, where the underlying distribution is Gamma with unknown scale & shape parameters. Cox & Oakes [3: page 18] note the utility of this class of probability models: ". . . for many statistical purposes the Gamma family is the most important family of continuous distributions taking positive values . . .". However, the use of the Gamma as a failure-time distribution with type-II censored data considered here has been limited by analytic difficulties arising from the form of its likelihoods. The point predictors in this paper use straight-forward simulation in place of complex formulas.

Section 2 develops two effective point predictors for future, ordered, Gamma failure times. Both predictors can be constructed using a parametric bootstrap. Section 3 (of independent interest) deals with some of the problems we encountered in using MLE with type-II censored Gamma data. Section 4 presents a simulation evaluation of the performance of the predictors.

Little previous work has been done on this problem for the Gamma distribution with both parameters unknown. Balasooriya [2] provides simulated maximin-type bounds on the efficiencies of some predictors of Gamma order statistics where the value of the shape parameter must be guessed from among a finite set of assumed values. We believe that this approach requires considerably more study before it can be recommended for practical application.

Notation

Med{·}   median of {·}

N   number of bootstrap samples

Oakland University, Rochester

Kansas State University, Manhattan

Key Words - Prediction, Order statistic, Type-II censoring, Gamma distribution, Maximum likelihood estimation, Conditional mean, Conditional median, Parametric bootstrap.

Summary & Conclusions - Statistically-independent operating components, each of which follows a Gamma failure-law, are simultaneously put into service. Two predictors of later failure times, based on observations of earlier failures, are proposed & investigated. The predictors are in the form of estimated conditional mean & median of the value being predicted. Unknown parameters of the underlying failure law are estimated by the method of maximum likelihood (ML), and the predictors are constructed using a parametric bootstrap. These conditional median & mean predictors provide a relatively easy method to compute predictors of future Gamma order statistics. Simulation indicates that these predictors are effective except when the shape parameter of the Gamma distribution is small. Generally, the larger the fraction of available data and the closer the value being predicted, the more accurate the predictions (as anticipated). The simulation also detected some difficulty in implementing ML for the Gamma based on type-II censored data when the sample ratio of the geometric to the arithmetic mean is very close to 1. This problem warrants further study.

1. INTRODUCTION

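Acronyms/Abbreviations¹

MLE    maximum likelihood estimate/estimator
PN     probability of nearness
MAE    mean absolute error
MSE    mean square error
RMSE   square root of MSE
RMAE, RRMSE   ratio of the [MAE, RMSE] of the conditional median predictor to that of the conditional mean predictor; RMAE $\equiv \mathrm{MAE}\{\hat\nu(y_{(s)})\}/\mathrm{MAE}\{\hat\mu(y_{(s)})\}$

¹ The singular & plural of an acronym are always spelled the same.

Notation

X   $\{x_i,\ i=1,2,\ldots,n\}$: random sample of size n; i.i.d. r.v. from the Gamma family in (2-1)
$y_{(i)}$   order statistics obtained from X, $i=1,\ldots,r$
$w_{(i)}$   order statistic $(s-r)$ in set i, $1 \le i \le N$
r   number of observed order statistics
s   index of the order statistic being predicted
$\hat y_{(s)}$, $\tilde y_{(s)}$   predictors of $y_{(s)}$
$\hat\mu(y_{(s)})$   $\mathrm{E}\{y_{(s)}; \hat\theta \mid y_{(r)}\}$: estimated conditional mean predictor of $y_{(s)}$
$\hat\nu(y_{(s)})$   $\mathrm{Med}\{y_{(s)}; \hat\theta \mid y_{(r)}\}$: estimated conditional median predictor of $y_{(s)}$
$\alpha$, $\beta$   [shape, scale] parameter
$\hat\alpha$, $\hat\beta$   MLE of [$\alpha$, $\beta$]
$\bar\alpha$, $\bar\beta$   simulated means of [$\hat\alpha$, $\hat\beta$]
$\mathrm{PN}\{\hat y_{(s)}, \tilde y_{(s)}\}$   $\Pr_\theta\{|\hat y_{(s)} - y_{(s)}| > |\tilde y_{(s)} - y_{(s)}|\}$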

0018-9529/97/$10.00 ©1997 IEEE


$\theta$   $(\alpha, \beta)$
S, P   [arithmetic, geometric] mean of first r order statistics
$\Phi$   $S/(S - P)$
$p(\Phi,30)$   proportion of iterations where $\Phi > 30$
$Q_1, Q_2, Q_3$   quartiles
$L(\alpha,\beta)$   likelihood of $\alpha$, $\beta$; based on $\{y_{(i)}\}$
$R^2$   coefficient of determination
RB{·}   estimated relative bias of {·}
$\mathrm{Eff}\{\hat y_{(s)}, \tilde y_{(s)}\}$   efficiency of $\hat y_{(s)}$ relative to $\tilde y_{(s)}$

Other, standard notation is given in “Information for Readers & Authors” at the rear of each issue.

Assumption

1. The data are order statistics from a random sample from a Gamma distribution.

2. PREDICTION

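The Gamma pdf with shape $\alpha$ and scale $\beta$ is:

$$\frac{1}{\beta}\,\mathrm{gamd}(x/\beta,\alpha) = \frac{x^{\alpha-1}\exp(-x/\beta)}{\beta^{\alpha}\,\Gamma(\alpha)}, \quad \text{for } x > 0. \tag{2-1}$$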

The goal is to predict $y_{(s)}$ based on $\{y_{(i)},\ i=1,2,\ldots,r\}$, $1 \le r < s \le n$.

We consider 3 criteria for assessing the worth of a predictor $\hat y_{(s)}$:

1. Small $\mathrm{MSE}\{\hat y_{(s)}, \theta\} \equiv \mathrm{E}_\theta\{(\hat y_{(s)} - y_{(s)})^2\}$.
2. Small $\mathrm{MAE}\{\hat y_{(s)}, \theta\} \equiv \mathrm{E}_\theta\{|\hat y_{(s)} - y_{(s)}|\}$.
3. Small PN with respect to another predictor $\tilde y_{(s)}$, in the sense of small probability that $\hat y_{(s)}$ is further from the target $y_{(s)}$ than is $\tilde y_{(s)}$.

$\theta$ is the parameter-set governing the observations.

Criterion #3 was applied to predictors by Nagaraja [9]. Since order statistics constitute a Markov chain [4], criterion #1 is minimized by $\mathrm{E}\{y_{(s)}; \theta \mid y_{(r)}\}$. Criterion #2 is minimized by $\mathrm{Med}\{y_{(s)}; \theta \mid y_{(r)}\}$.
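As a rough illustration of the three criteria, the following Python sketch computes their empirical counterparts from paired predictions and realized values. The function names and the toy numbers are assumptions made for this example only; they are not taken from the paper.

```python
import statistics

def mse(preds, targets):
    """Empirical mean square error of a predictor."""
    return statistics.fmean((p - t) ** 2 for p, t in zip(preds, targets))

def mae(preds, targets):
    """Empirical mean absolute error of a predictor."""
    return statistics.fmean(abs(p - t) for p, t in zip(preds, targets))

def prob_nearness(preds_a, preds_b, targets):
    """Fraction of cases where predictor A is strictly further from the
    target than predictor B (small values favor A)."""
    worse = sum(abs(a - t) > abs(b - t)
                for a, b, t in zip(preds_a, preds_b, targets))
    return worse / len(targets)

# Toy values (hypothetical): realized order statistics and two predictors.
targets = [2.0, 3.0, 5.0, 4.0]
pred_a = [2.5, 2.0, 5.5, 4.0]
pred_b = [2.0, 3.5, 4.0, 4.5]

print(mse(pred_a, targets), mae(pred_a, targets))
print(prob_nearness(pred_a, pred_b, targets))
```

In a simulation study, `targets` would hold the simulated values of the order statistic being predicted, one per iteration.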

Neither of these predictors is available to the decision maker since $\theta$ is unknown. Following Raqab [10], define approximate optimal predictors by: $\hat\mu(y_{(s)}) \equiv \mathrm{E}\{y_{(s)}; \hat\theta \mid y_{(r)}\}$ & $\hat\nu(y_{(s)}) \equiv \mathrm{Med}\{y_{(s)}; \hat\theta \mid y_{(r)}\}$; the MLE $\hat\theta$ is conveniently used in their definitions. Since estimated parameter values are used, neither predictor retains its optimality. Simulation is used to compare $\hat\mu$ & $\hat\nu$ with respect to criteria #1 - #3. The MLE can often be accurately approximated by interpolating the tables of Wilk, et al [12] which are indexed by:

n / r , s-sufficient statistic y(+

(2-2a)

(2-2b)

Also, see the tables in Bain & Engelhardt [1]. For values of the s-sufficient statistics outside the scope of these tables we recommend the iterative procedure (MLE scheme) of Lawless [8: page 208], described in the next paragraph.

Our simulation used the tables in [12] to obtain the initial values of $\alpha$, $\beta$. These initial values were then used in the Lawless MLE scheme to obtain $\hat\alpha$, $\hat\beta$. Specifically, for the MLE method with fixed $\alpha$, use an iterative scheme to find $\hat\beta(\alpha)$ by solving:

$$\frac{\partial \log(L(\alpha,\beta))}{\partial \beta} = 0, \quad \text{[8: page 207, (5.1.15)]}.$$

Perform this procedure for several values of $\alpha$ in a grid form to find, for each, the $\hat\beta(\alpha)$ that satisfies this equation. Then by a second-stage grid search, obtain the value of $\alpha$ that maximizes $\log(L(\alpha,\hat\beta(\alpha)))$, [8: page 207, (5.1.13)]. The finer the grid, the greater the accuracy of this scheme. This scheme avoids solving the other likelihood equation,

$$\frac{\partial \log(L(\alpha,\beta))}{\partial \alpha} = 0,$$

which involves the digamma function (derivative of $\log \mathrm{gamf}(\alpha;\beta)$). Ref [12] also presents an iterative scheme for jointly solving both likelihood equations. It uses finite differences to approximate partial derivatives. We found the Lawless scheme [8] easier to use. Section 3 notes that all of these procedures have difficulties under some conditions.

For a given $\{y_{(i)},\ 1 \le i \le r\}$, both predictors can be obtained in two straight-forward steps:

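i. Find $\hat\alpha$, $\hat\beta$.

ii. Simulate N s-independent sets of s-independent Gamma variates with $\alpha=\hat\alpha$, $\beta=\hat\beta$. For each set, keep only those variates that exceed $y_{(r)}$, and continue until $n - r$ variates have been obtained in each set. Then, with $w_{(i)}$ the order statistic $(s-r)$ in set i,

$$\hat\mu(y_{(s)}) = \frac{1}{N}\sum_{i=1}^{N} w_{(i)}, \qquad \hat\nu(y_{(s)}) = \mathrm{Med}\{w_{(i)},\ 1 \le i \le N\}.$$

Gamma variates can be generated by the IMSL [6] FORTRAN routine DRNGAM.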
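The two steps above can be sketched in Python as follows. This is only a minimal illustration, not the authors' FORTRAN program: it uses the standard-library `random.gammavariate` (shape, scale) in place of DRNGAM, the function name `bootstrap_predictors` is hypothetical, and the MLE values are assumed to have been found already in step i. The usage lines take the r = 6 estimates from table 2 and $y_{(6)} = 0.713$ from table 1; n = 18 is read off the 18 rows of table 1.

```python
import random
import statistics

def bootstrap_predictors(y_r, n, r, s, alpha_hat, beta_hat, n_boot=200, rng=None):
    """Return (conditional-mean, conditional-median) predictors of y_(s).

    y_r is the largest observed order statistic y_(r); alpha_hat and
    beta_hat are the (already computed) shape and scale estimates.
    """
    if not (1 <= r < s <= n):
        raise ValueError("need 1 <= r < s <= n")
    rng = rng or random.Random()
    w = []
    for _ in range(n_boot):
        kept = []
        # Rejection step: keep only variates exceeding y_(r), which samples
        # the Gamma distribution truncated on the left at y_(r).
        while len(kept) < n - r:
            x = rng.gammavariate(alpha_hat, beta_hat)  # second argument is the scale
            if x > y_r:
                kept.append(x)
        kept.sort()
        w.append(kept[s - r - 1])  # order statistic (s - r) of this set
    return statistics.fmean(w), statistics.median(w)

# Usage with the sample data set (r = 6 columns of tables 1 and 2).
mean_pred, median_pred = bootstrap_predictors(
    y_r=0.713, n=18, r=6, s=10,
    alpha_hat=3.263, beta_hat=0.312,
    n_boot=200, rng=random.Random(1),
)
print(mean_pred, median_pred)
```

The rejection loop can become slow when $y_{(r)}$ lies far in the upper tail of the fitted distribution; an inverse-cdf sampler for the truncated distribution would avoid that, at the cost of needing the incomplete gamma function.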

Step ii is justified by the fact [4: page 20, theorem 2.7] that the conditional distribution of $y_{(s)}$, given $y_{(r)}$, is the same as the unconditional distribution of order statistic $(s - r)$ from a random sample of size $n - r$ from a Gamma distribution truncated on the left by $y_{(r)}$. The complicated form of this distribution makes the simulation in the previous paragraph easier to implement than numerical integration, which could also be used to find $\hat\mu(y_{(s)})$ and $\hat\nu(y_{(s)})$. In this sense, step ii can be viewed as a parametric bootstrap since simulation is based on a known distributional form with estimated parameter values. Efron & Tibshirani [5] fully tr...

Table 1. ... values.

                      r = 6                                    r = 9
  i   $y_{(i)}$   $\hat\mu(y_{(i)})$  $\hat\nu(y_{(i)})$   $\hat\mu(y_{(i)})$  $\hat\nu(y_{(i)})$
  1    0.233
  2    0.379
  3    0.436
  4    0.442
  5    0.610
  6    0.713
  7    ...        ...        ...
  8    ...        ...        ...
  9    ...        ...        ...
 10    1.902      1.004      1.023      1.865      1.829
 11    2.618      1.073      1.092      2.119      2.095
 12    2.668      1.151      1.172      2.389      2.355
 13    2.832      1.248      1.247      2.666      2.601
 14    2.945      1.346      1.365      2.937      2.860
 15    3.091      1.513      1.466      3.213      3.211
 16    3.105      1.651      1.642      3.555      3.472
 17    4.728      2.050      1.998      4.061      3.719
 18    5.109      4.053      4.598      5.528      5.012

report on a simulation study of our predictors.

3. PARAMETER ESTIMATION

The $\hat\alpha$, $\hat\beta$ were computed for the s... the above-cited [1] tables do not cover ... we considered. The likelihood function for type-II censored Gamma data can be very irregular with several steep slopes. In our simulation study this is a particular difficulty for data sets where $\Phi$ is large. Within the scope of our simulation, both iterative and grid-search procedures for finding the $\hat\alpha$, $\hat\beta$ sometimes did not converge for data sets with $\Phi > 30$. In addition, the estimates we obtained in these cases tended not to be close to the true parameter values. In table 1,

$\Phi = 17.12$ for r=6,

$\Phi = 6.52$ for r=9.

Both of these values of $\Phi$ are in the region where the iterative scheme converged quickly. $\Phi$ has a distribution free of $\beta$. The largest value of $\Phi$ allowable in the Gamma MLE tables in [12] is 25; and $\Phi = 12$ is the largest entry in the tables in [1].

Table 2. Parameter Estimates for a Sample Data Set

                         Estimate
 Parameter            r=6       r=9
 $\alpha$ = 2.000     3.263     1.445
 $\beta$ = 1.000      0.312     1.382

We simulated data with:

$\alpha$ = ½, 1, 2, 4, 6, 8;

n = 12, 24, 36, 48;

and tallied the proportion, $p(\Phi,30)$, of times out of 1000 iterations that $\Phi > 30$; $p(\Phi,30)$ was small except for the larger values of $\alpha$ and the smaller values of r/n. Table 3 shows those $p(\Phi,30) > 0.05$.
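The statistic $\Phi = S/(S - P)$ and the tally $p(\Phi,30)$ are simple to compute. The sketch below is an assumption-laden illustration (hypothetical function names, Python's `random.gammavariate` as the generator, scale fixed at 1 because $\Phi$ has a distribution free of $\beta$); it is not the authors' code. The first six observations of table 1 are used as a check, for which the paper reports $\Phi = 17.12$.

```python
import math
import random

def phi_statistic(first_r):
    """Phi = S / (S - P), with S and P the arithmetic and geometric means
    of the r smallest observations (needs r >= 2 distinct values)."""
    r = len(first_r)
    s_mean = sum(first_r) / r
    p_mean = math.exp(sum(math.log(y) for y in first_r) / r)
    if s_mean <= p_mean:  # guards against rounding when the values nearly coincide
        return math.inf
    return s_mean / (s_mean - p_mean)

def prop_phi_exceeds(alpha, n, r, threshold=30.0, iters=1000, rng=None):
    """Simulated proportion of type-II censored samples with Phi > threshold."""
    rng = rng or random.Random()
    count = 0
    for _ in range(iters):
        sample = sorted(rng.gammavariate(alpha, 1.0) for _ in range(n))
        if phi_statistic(sample[:r]) > threshold:
            count += 1
    return count / iters

# First six order statistics of the sample data set in table 1.
phi_r6 = phi_statistic([0.233, 0.379, 0.436, 0.442, 0.610, 0.713])
print(round(phi_r6, 2))

# One illustrative setting: shape 8, n = 12, r = 3.
print(prop_phi_exceeds(8.0, 12, 3, rng=random.Random(0)))
```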

Table 3. Simulated Estimates of $p(\Phi,30)$ in Some Problem-Cases

2 12 ?4 0 080 4 12 % 0 218

12 ?h 12 ?4

0.074 0 319

6 12 Yi 0 203 6 24 ?4 0.122 6 24 '/z 0 058 6 36 Yi 0 060 8 12 ?4 0.459 8 12 % 0 337 8 12 % 0 238 8 24 ?4 0 249 8 24 '/z 0.082

Based on table 3, we recommend caution in using MLE with the Gamma model when the sample size is small and censoring is heavy. This issue merits further study. However, even in such cases, the MLE can sometimes be found with the aid of a plot of the likelihood when formal iterative methods fail. Such plots can, for example, be constructed with the statistical package SAS [11].
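To tabulate the likelihood surface for such a plot, or to run a crude two-stage grid search, something like the following Python sketch could be used. It is an illustration under stated assumptions, not the Lawless scheme itself: the inner step picks $\hat\beta(\alpha)$ by direct maximization over a grid instead of solving the likelihood equation iteratively, the incomplete gamma function is a hand-rolled series/continued-fraction routine (to stay within the standard library), and all function names and grid limits are hypothetical. The data are the first six observations of table 1 with n = 18.

```python
import math

def reg_upper_gamma(a, x, eps=1e-12, max_iter=1000):
    """Regularized upper incomplete gamma Q(a, x), for a > 0."""
    if x <= 0.0:
        return 1.0
    log_pref = a * math.log(x) - x - math.lgamma(a)
    if x < a + 1.0:
        # Power series for the lower function P(a, x); Q = 1 - P.
        term = total = 1.0 / a
        ap = a
        for _ in range(max_iter):
            ap += 1.0
            term *= x / ap
            total += term
            if term < total * eps:
                break
        return 1.0 - total * math.exp(log_pref)
    # Continued fraction for Q(a, x), modified Lentz evaluation.
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, max_iter + 1):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return math.exp(log_pref) * h

def censored_loglik(alpha, beta, y, n):
    """Log-likelihood (up to a constant) of the r smallest of n Gamma values."""
    r = len(y)
    dens = sum((alpha - 1.0) * math.log(v) - v / beta for v in y)
    dens -= r * (alpha * math.log(beta) + math.lgamma(alpha))
    surv = max(reg_upper_gamma(alpha, max(y) / beta), 1e-300)
    return dens + (n - r) * math.log(surv)

def grid_mle(y, n, alphas, betas):
    """Two-stage grid search: best beta for each alpha, then best alpha."""
    best = None
    for a in alphas:
        b_hat = max(betas, key=lambda b: censored_loglik(a, b, y, n))
        ll = censored_loglik(a, b_hat, y, n)
        if best is None or ll > best[2]:
            best = (a, b_hat, ll)
    return best

y = [0.233, 0.379, 0.436, 0.442, 0.610, 0.713]
alphas = [0.25 * k for k in range(1, 41)]   # shape grid 0.25 .. 10
betas = [0.02 * k for k in range(1, 151)]   # scale grid 0.02 .. 3
alpha_hat, beta_hat, loglik = grid_mle(y, 18, alphas, betas)
print(alpha_hat, beta_hat, loglik)
```

Table 2 reports $\hat\alpha = 3.263$, $\hat\beta = 0.312$ for this subset; a coarse grid like the one above should not be expected to reproduce those values exactly, and the values of `censored_loglik` over the grid are what one would plot to inspect the surface.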


4. SIMULATION STUDY²

We evaluated, via simulation, the performance of our point predictors. The program was written in FORTRAN, and used the IMSL [6] FORTRAN routine DRNGAM to generate Gamma variates. The simulation covered a range of representative parameter settings:

n = 12, 24, 36, 48,

r = %n, %n,

s = %n, n;

so that % and ½ the data were used respectively to predict the % quantile and maximum failure times. Without loss of generality we set:

$\beta$ = 1,

$\alpha$ = 0.5, 1, 2, 4, 8.

Table 4. Ratios of RMSE & MAE [comparing conditional median to conditional mean]

          RRMSE    RMAE
 Min      0.92     0.91
 $Q_1$    0.97     0.96
 $Q_2$    0.99     0.99
 $Q_3$    1.02     1.02
 Max      1.04     1.04

Thus, we ran a total of 80 (r, s, n, $\alpha$) combinations. Due to the poor performance of the MLE when $\Phi > 30$, we discarded runs where this occurred and continued to generate data sets until a specified number of iterations were obtained. Therefore, our results are conditional on $\Phi \le 30$. Based on sample runs, N=200 bootstrap samples was adequate. Because of the large computer-time required to run the simulations, we limited the number of iterations for each parameter setting. To obtain reasonably precise estimates of the prediction errors of both of our predictors, we calibrated the number of iterations so that the relative error ratios of the estimated standard errors of the estimated MSE & MAE of both predictors were small. In most cases these ratios were less than 0.03. The only important exception to this bound was for $\alpha = 8$, where the ratios for the MSE were about 0.10, which was acceptable for our purposes. The number of iterations per parameter combination was:

4000 for $\alpha$ = 0.5,

3000 for $\alpha$ = 1.0,

1000 for $\alpha$ = 2.0,

500 for $\alpha$ = 4.0, 8.0.

Ratios of the RMSE & MAE values of the two predictors provide a basis for a scale-invariant comparison of our two predictors. Table 4 presents (rounded to 2 places) the Min, $Q_1$, $Q_2$, $Q_3$, Max of the 80 ratios RRMSE & RMAE. The stability of these values across all parameter settings indicates a slightly better (ratios < 1) performance of the conditional median predictor than the conditional mean predictor according to both criteria.

The PN(s) appears to be most heavily influenced by the value of $\alpha$. Table 5 (rounded to 2 decimal places, and collapsed across n, r, s) indicates that the conditional median predictor was mostly better (PN(s) < 0.5) than the conditional mean for our larger values of $\alpha$. This observation was supported by plotting our 80 estimated PN(s) values against $\alpha$, which reveals a quadratic-like downward trend with the median predictor having an increasing advantage as $\alpha$ increases. For $\alpha$=0.5, the median is sometimes better, sometimes worse than the mean predictor. The entry of 0.14 in table 5 for the minimum when $\alpha$=0.5 is anomalous. To analyze the probability of nearness further, we regressed the 80 PN(s) values on:

$\alpha$, $\alpha^2$, r/n, s/n, n.

The fitted surface had $R^2$ = 0.637 with MSE = 0.011. The regression coefficients of $\alpha$, $\alpha^2$, n were s-significant. The coefficient of $\alpha$ was negative; the coefficient of $\alpha^2$ was positive. This confirms the pattern we observed in the plot. The coefficient of n was negative, indicating an increasing advantage of the median predictor as n increases while the other variables remain fixed. Overall, we recommend the median predictor except for small values of $\alpha$.

Table 5. Estimated Probabilities of Nearness

                        $\alpha$
          0.5     1.0     2.0     4.0     8.0
 Min      0.14    0.50    0.41    0.34    0.16
 $Q_1$    0.45    0.56    0.44    0.39    0.25
 $Q_2$    0.71    0.62    0.46    0.41    0.31
 $Q_3$    0.79    0.68    0.48    0.43    0.33
 Max      0.85    0.74    0.52    0.47    0.38

² The number of significant figures is not intended to imply any accuracy in the estimates, but to illustrate the arithmetic.

To help assess the accuracy of our predictors, table 6 presents the ratios (rounded to 2 decimal places) of the RMSE of the conditional median predictor to the s-expected value of the order statistic being predicted (estimated by simulation), arranged by $\alpha$, n, r/n, s/n. The ratios appearing in the table may be viewed as prediction versions of the coefficient of variation - the smaller the better. To analyze these relative error rates, we regressed the 80 values of $\mathrm{RMSE}\{\hat\nu(y_{(s)})\}/\mathrm{E}\{y_{(s)}\}$ on $\alpha$, $\alpha^2$, r/n, s/n, n. The fitted surface had $R^2$ = 0.834 with MSE = 0.003. All the regression coefficients were s-significant. The coefficients of $\alpha^2$, s/n were positive, the others negative. Thus, perfor... For $\alpha$ = 4.0 & 8.0, the entries are ≈0.10 for the smaller s/n, and ≈0.20 for s/n = 1 (when predicting the largest order statistic).

Table 6. $\mathrm{RMSE}\{\hat\nu(y_{(s)})\}/\mathrm{E}\{y_{(s)}\}$

                   r/n (1st setting)       r/n (2nd setting)
 $\alpha$    n     s/n < 1   s/n = 1       s/n < 1   s/n = 1
 0.5        12     ...       ...           ...       ...
            24     ...       ...           ...       ...
            36     ...       ...           ...       ...
            48     0.22      0.32          0.20      0.31
 1.0        12     0.32      0.42          0.22      0.42
            24     0.22      0.32          0.17      0.34
            36     0.17      0.2           0.14      0.31
            48     0.14      0.2           0.12      0.28
 2.0        12     0.20      0.30          0.14      0.26
            24     0.16      0.23          0.10      0.22
            36     0.13      0.21          0.09      0.21
            48     0.11      0.22          0.08      0.19
 4.0        12     0.14      0.23          0.10      0.21
            24     0.11      0.20          0.07      0.17
            36     0.09      0.16          0.06      0.16
            48     0.08      0.17          0.06      0.14
 8.0        12     0.12      0.20          0.07      0.16
            24     0.10      0.20          0.06      0.16
            36     0.09      0.20          0.05      0.16
            48     0.08      0.19          0.05      0.10

The performance of our predictors depends somewhat on how close the MLE are to the true values of $\alpha$, $\beta$. Our simulation indicates that $\hat\alpha$ is positively s-biased for the small $\alpha$, and negatively s-biased for the large $\alpha$. The reverse is true of $\hat\beta$. Table 7 presents summaries (rounded to 2 places) for:

$$\mathrm{RB}\{\hat\alpha\} = (\bar\alpha - \alpha)/\alpha, \qquad \mathrm{RB}\{\hat\beta\} = (\bar\beta - \beta)/\beta.$$

Two regression models were fitted to the absolute relative s-biases, one for each of $\mathrm{RB}\{\hat\alpha\}$ & $\mathrm{RB}\{\hat\beta\}$; the fitted surfaces had $R^2$ = 0.467, MSE = 0.050 and $R^2$ = 0.603, MSE = 0.01. The regression coefficients of $\alpha$, $\alpha^2$ were positive and s-significant for both models. The regression coefficient for n was negative and s-significant for the first model but not for the second. The contribution of r/n was not s-significant for either model. Thus, the regression analysis yields a quadratic trend in $\alpha$ for absolute relative s-bias and decreasing absolute bias for $\hat\alpha$ as n increases.

Table 7. Relative s-Bias of MLE
[the results are shown as a pair of numbers: $\mathrm{RB}\{\hat\alpha\}$, $\mathrm{RB}\{\hat\beta\}$]

                                   $\alpha$
          0.5           1.0           2.0           4.0           8.0
 Min      -.07, -.65    .03, -.45     -.05, -.01    -.11, .11     -.26, .30
 $Q_1$    .12, -.56     .13, -.11     -.04, .02     -.10, .12     -.25, .38
 $Q_2$    .42, -.49     .25, -.24     -.03, .07     -.09, .15     -.24, .47
 $Q_3$    .91, -.43     .48, -.32     .00, .09      -.08, .18     -.22, .50
 Max      1.31, -.39    .85, -.45     .07, .11      -.07, .19     -.26, .52

To assess the effects of using estimates of $\alpha$, $\beta$ in our predictors, table 8 summarizes the s-efficiencies of the median predictor based on $\hat\alpha$, $\hat\beta$ relative to the median predictor based on the true $\alpha$, $\beta$ with respect to estimated MAE:

$$\mathrm{Eff}\{\mathrm{Med}\{\hat\theta\}, \mathrm{Med}\{\theta\}\} = \frac{\mathrm{MAE}\{\mathrm{Med}\{y_{(s)}; \theta \mid y_{(r)}\}\}}{\mathrm{MAE}\{\mathrm{Med}\{y_{(s)}; \hat\theta \mid y_{(r)}\}\}}.$$

Table 8 indicates that the s-efficiencies have a general quadratic configuration with the smallest s-efficiencies at the smallest & largest values of $\alpha$. This pattern was confirmed by regressing the 80 simulated s-efficiencies on $\alpha$, $\alpha^2$, n, r/n, s/n ($R^2$ = 0.535, MSE = 0.007). The fitted surface yielded s-significant positive coefficients for $\alpha$ & r/n, and negative coefficients for ... s/n. Thus, all other variables being held fixed, s-efficiency increases as the available data (measured by r/n) increases, and decreases as predictions are made further into the future (s/n increases). This pattern tracks the behavior of the MLE described in table 7.

Table 8. Estimated s-Efficiencies of Median Predictor

                      $\alpha$
          0.5    1.0    2.0    4.0    8.0
 Min      .60    .70    .76    .73    .53
 $Q_1$    .63    .76    .80    .77    .57
 $Q_2$    .68    .80    .85    .85    .66
 $Q_3$    .82    .88    .90    .91    .75
 Max      .92    .90    .93    .96    .92

REFERENCES

[1] L.J. Bain, M. Engelhardt, Statistical Analysis of Reliability and Life Testing Models, Theory and Methods (2nd ed), 1991; Marcel Dekker.


[2] U. Balasooriya, “A comparison of the prediction of future order statistics for the 2-parameter Gamma distribution”, IEEE Trans. Reliability, vol R-36, 1987 Dec, pp 591-594.

[3] D.R. Cox, D. Oakes, Analysis of Survival Data, 1984; Chapman & Hall.

[4] H.A. David, Order Statistics (2nd ed), 1981; John Wiley & Sons.

[5] B. Efron, R.J. Tibshirani, An Introduction to the Bootstrap, 1993; Chapman & Hall.

[6] IMSL and Stanford Graphics Software (9990 Richmond Ave, Ste 400; Houston, Texas 77042-4548 USA).

[7] K.S. Kaminsky, P.I. Nelson, “Prediction of order statistics”, submitted for publication in Order Statistics (C.R. Rao & N.K. Balakrishnan, Eds); North Holland.

[8] J.F. Lawless, Statistical Models and Methods for Lifetime Data, 1982; John Wiley & Sons.

[9] H.N. Nagaraja, “Comparison of estimators and predictors from two-parameter exponential distribution”, Sankhya, vol 48B, 1986, pp 10-18.

[10] M.Z. Raqab, H.N. Nagaraja, “On some predictors of future order statistics”, Tech. Report 488, 1990; (Dept. of Statistics; Ohio State Univ; Columbus, Ohio USA).

[11] Statistical Analysis System (SAS); POBox 8000; Cary, North Carolina USA.

[12] M.B. Wilk, R. Gnanadesikan, M. Huyett, “Estimation of the Gamma distribution using order statistics”, Biometrika, vol 49, 1962, pp 525-545.

AUTHORS

Dr. Olabode Ogunyemi; Dep’t of Mathematical Sciences; Oakland Univ; Rochester, Michigan 48309 USA. Internet (e-mail): [email protected]

Olabode Ogunyemi received a PhD (1990) in Statistics from Kansas State University. He is Associate Professor of Mathematics at Oakland University. His research interests include statistical reliability, design of experiments, Bayes inference, and stochastic processes.

Dr. Paul Nelson; Dep’t of Statistics, Kansas State Univ; Manhattan, Kansas 66506 USA. Internet (e-mail): [email protected]

Paul Nelson received a PhD (1969) in Statistics from Rutgers University. He has held faculty positions at Bucknell, Pennsylvania State University, and Kansas State University, where he is Professor of Statistics. His main research interests are prediction & inference for stochastic processes.

Manuscript TR95-172 received 1995 November 25; revised 1996 December 9

Responsible editor: H.C. Benski

Publisher Item Identifier S 0018-9529(97)07569-6
