An Empirical Likelihood Ratio Based Goodness-of-Fit Test for Two-parameter Weibull Distributions

Presented by: Ms. Ratchadaporn Meksena

Student ID: 555020227-5

Advisor: Assoc. Prof. Dr. Supunnee Ungpansattawong

Date: 29th November 2013

Department of Statistics, Faculty of Science, Khon Kaen University


OUTLINE

1. Introduction

• Rationale and Background

• Objective of Study

• Scope and Limitation of Study

• Anticipated Outcomes

2. Literature Review

3. Research Methodology

• Empirical Likelihood Method

• Goodness-of-Fit Test Based on Empirical Likelihood Ratio

• Calculation of Critical Values and Evaluation of Type I Error Control

• Evaluation of the Power of the Proposed Test

1. Introduction

Rationale and Background

The Weibull distribution is commonly used in many fields, such as

• Survival Analysis

• Reliability Engineering & Failure Analysis

• Extreme Value Theory

• Weather Forecasting

• General Insurance

• etc.

The two-parameter Weibull distribution is the most widely used distribution for life data analysis.

1. Introduction

Rationale and Background (cont.)

An important part of data analysis is checking that the data come from a particular family of distributions. Goodness-of-fit tests for the Weibull distribution are generally based on the empirical distribution function (EDF), such as the Kolmogorov-Smirnov (KS) test, the Cramér-von Mises (CvM) test, or the Anderson-Darling (AD) test. Recently, several studies have proposed goodness-of-fit tests based on the empirical likelihood ratio, and their results show that such tests are competitive with other available tests. Therefore, in this study, we propose an empirical likelihood ratio based goodness-of-fit test for two-parameter Weibull distributions.

1. Introduction

Objective of Study

The objective of this study is to propose a new goodness-of-fit statistic based on the empirical likelihood ratio for two-parameter Weibull distributions.

1. Introduction

Scope and Limitation of Study

In this study, we will derive an empirical likelihood ratio based goodness-of-fit test for two-parameter Weibull distributions and its asymptotic properties, calculate the critical values for fixed sample sizes using Monte Carlo simulations, and evaluate the performance of the proposed test in controlling the Type I error. Finally, we will compare the power of the proposed test statistic with that of the Kolmogorov-Smirnov, Cramér-von Mises, and Anderson-Darling statistics.

1. Introduction

Anticipated Outcomes

We expect to obtain a new goodness-of-fit test based on the empirical likelihood ratio for two-parameter Weibull distributions.

2. Literature Review

Examples of Goodness-of-Fit Tests for Two-Parameter Weibull Distributions:

• Shapiro and Brain (1987) proposed a test statistic based on principles similar to those used in the derivation of the well-known W-test for normality.

• Coles (1989) proposed a test via the stabilized probability plot, which involves estimating the scale and shape parameters.

• Khamis (1997) proposed the δ-corrected Kolmogorov-Smirnov test, in which the MLEs of the scale and shape parameters are employed.

2. Literature Review

Examples of Goodness-of-Fit Tests for Two-Parameter Weibull Distributions (cont.):

• Cabana and Quiroz (2005) proposed employing the empirical moment generating function together with affine invariant estimators, such as moment estimators, for the scale and shape parameters.

2. Literature Review

Examples of Goodness-of-Fit Tests Based on Empirical Likelihood Ratio:

• Vexler and Gurevich (2010) constructed an empirical likelihood ratio based goodness-of-fit test to approximate the optimal Neyman-Pearson ratio test with an unknown alternative density function.

• Vexler et al. (2011) proposed a similar goodness-of-fit test based on the empirical likelihood method to test the null hypothesis of an inverse Gaussian distribution.

2. Literature Review

Examples of Goodness-of-Fit Tests Based on Empirical Likelihood Ratio (cont.):

• Ning and Ngunkeng (2013) proposed a similar goodness-of-fit test based on the empirical likelihood method to test the null hypothesis of skew normality.

3. Research Methodology

Consider the two-parameter Weibull distribution, whose cumulative distribution function and probability density function are defined as

$$F(x; \beta, \alpha) = 1 - \exp\left\{-\left(\frac{x}{\beta}\right)^{\alpha}\right\} \tag{1}$$

and

$$f(x; \beta, \alpha) = \frac{\alpha}{\beta}\left(\frac{x}{\beta}\right)^{\alpha-1} \exp\left\{-\left(\frac{x}{\beta}\right)^{\alpha}\right\}, \tag{2}$$

respectively, where $x > 0$, $\beta > 0$ is the scale parameter and $\alpha > 0$ is the shape parameter.

3. Research Methodology

Empirical Likelihood Method

Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed observations from an unknown population distribution $F$. The empirical likelihood function of $F$ is defined as

$$L_p(F) = \prod_{i=1}^{n} p_i,$$

where the components $p_i$, $i = 1, 2, \ldots, n$, maximize the likelihood $L_p(F)$ and satisfy empirical constraints corresponding to the hypotheses of interest. For example, suppose a population parameter $\theta$ identified by $E(X) = \theta$ is of interest and the true value of $\theta$ is $\theta_0$, so that the null hypothesis is $H_0 : E(X) = \theta_0$. To maximize $L_p(F)$, the values of $p_i$ in $L_p(F)$ should be chosen subject to the constraints

$$p_i \ge 0, \quad \sum_{i=1}^{n} p_i = 1 \quad \text{and} \quad \sum_{i=1}^{n} p_i X_i = \theta_0,$$

where the constraint $\sum_{i=1}^{n} p_i X_i = \theta_0$ is an empirical version of $E(X) = \theta_0$.

3. Research Methodology

Empirical Likelihood Method (cont.)

The empirical log-likelihood ratio statistic to test $\theta = \theta_0$ is given by

$$R(\theta_0) = \max\left\{ \sum_{i=1}^{n} \log(n p_i) \; ; \; p_i \ge 0, \ \sum_{i=1}^{n} p_i = 1, \ \sum_{i=1}^{n} p_i x_i = \theta_0 \right\},$$

where $R(\theta)$ is the empirical log-likelihood ratio function defined through the empirical likelihood ratio function of Owen (1988).

3. Research Methodology

Goodness-of-Fit Test

A goodness-of-fit test is a statistical test to determine whether the observations are consistent with a particular statistical model. It describes how well the model fits a set of observations. Measures of goodness of fit typically summarize the discrepancy between the observed values and the values expected under the statistical model.

3. Research Methodology

Goodness-of-Fit Test Based on Empirical Likelihood Ratio

The hypotheses to be tested are

$$H_0 : f = f_{H_0} \sim WB(\beta, \alpha)$$

$$H_1 : f = f_{H_1} \nsim WB(\beta, \alpha),$$

where $f_{H_0}$ and $f_{H_1}$ are both unknown.

3. Research Methodology

Goodness-of-Fit Test

When the density functions $f_{H_0}$ and $f_{H_1}$ are completely known, the most powerful test statistic is the likelihood ratio

$$LR = \frac{\prod_{i=1}^{n} f_{H_1}(X_i)}{\prod_{i=1}^{n} f_{H_0}(X_i)} = \frac{\prod_{i=1}^{n} f_{H_1}(X_i)}{\prod_{i=1}^{n} \frac{\alpha}{\beta}\left(\frac{X_i}{\beta}\right)^{\alpha-1} \exp\left\{-\left(\frac{X_i}{\beta}\right)^{\alpha}\right\}}, \tag{3}$$

where, under the null hypothesis, $X_1, X_2, \ldots, X_n$ follow a Weibull distribution with parameters $\beta$ and $\alpha$.

3. Research Methodology

Goodness-of-Fit Test Based on Empirical Likelihood Ratio

In this study, the forms of $f_{H_0}$ and $f_{H_1}$ are both unknown but estimable. We follow the idea of Vexler and Gurevich (2010) and Ning and Ngunkeng (2013) to construct a goodness-of-fit test statistic for the two-parameter Weibull distribution in the form of an estimated likelihood ratio.

We apply the maximum empirical likelihood method to estimate the numerator of the ratio (3). Rewrite the likelihood function in the form

$$L_f = \prod_{i=1}^{n} f_{H_1}(X_i) = \prod_{i=1}^{n} f_{H_1}(X_{(i)}) = \prod_{i=1}^{n} f_i,$$

where $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$ are the order statistics based on the observations $X_1, X_2, \ldots, X_n$.

3. Research Methodology

Goodness-of-Fit Test Based on Empirical Likelihood Ratio

Following the maximum empirical likelihood method, we can derive the values of $f_i$ that maximize $L_f$ and satisfy the empirical constraints under the alternative hypothesis $H_1$. The values of $f_i$ must be restricted by the equation $\int f(s)\,ds = 1$, so we need an empirical form of this constraint. We first state the following lemma by Vexler and Gurevich (2010), from which the empirical constraint is obtained.

Lemma 1 Let $f(x)$ be a density function. Then

$$\sum_{j=1}^{n} \int_{X_{(j-m)}}^{X_{(j+m)}} f(x)\,dx = 2m \int_{X_{(1)}}^{X_{(n)}} f(x)\,dx - \sum_{k=1}^{m-1} (m-k) \int_{X_{(n-k)}}^{X_{(n-k+1)}} f(x)\,dx - \sum_{k=1}^{m-1} (m-k) \int_{X_{(k)}}^{X_{(k+1)}} f(x)\,dx$$

$$\cong 2m \int_{X_{(1)}}^{X_{(n)}} f(x)\,dx - \frac{m(m-1)}{n},$$

where $X_{(j-m)} = X_{(1)}$ if $j-m \le 1$ and $X_{(j+m)} = X_{(n)}$ if $j+m \ge n$.

3. Research Methodology

Goodness-of-Fit Test Based on Empirical Likelihood Ratio

Since

$$\int_{X_{(1)}}^{X_{(n)}} f(x)\,dx \le \int_{-\infty}^{\infty} f(x)\,dx = 1,$$

it is clear that the quantity

$$\delta_m = \frac{1}{2m} \sum_{j=1}^{n} \int_{X_{(j-m)}}^{X_{(j+m)}} f(x)\,dx$$

satisfies $\delta_m \le 1$. Using the empirical approximation to the remainder term in Lemma 1, we have

$$\delta_m \cong \int_{X_{(1)}}^{X_{(n)}} f(x)\,dx - \frac{m-1}{2n} \le 1 - \frac{m-1}{2n}.$$

From Lemma 1, we can empirically estimate $\delta_m$ via

$$\hat{\delta}_m = \int_{F_n(X_{(1)})}^{F_n(X_{(n)})} dx - \frac{m-1}{2n} = 1 - \frac{1}{n} - \frac{m-1}{2n},$$

where $F_n$ is the empirical distribution function. Notice that $\delta_m \to 1$ when $m/n \to 0$ as $m, n \to \infty$.

3. Research Methodology

Goodness-of-Fit Test Based on Empirical Likelihood Ratio

By applying the mean value theorem to the term $\sum_{j=1}^{n} \int_{X_{(j-m)}}^{X_{(j+m)}} f(x)\,dx$, we have

$$\sum_{j=1}^{n} \int_{X_{(j-m)}}^{X_{(j+m)}} f(x)\,dx \cong \sum_{j=1}^{n} \left(X_{(j+m)} - X_{(j-m)}\right) f\left(X_{(j)}\right) = \sum_{j=1}^{n} \left(X_{(j+m)} - X_{(j-m)}\right) f_j.$$

Thus, the empirical constraint under the alternative hypothesis $H_1$ is given by

$$\delta_m = \frac{1}{2m} \sum_{j=1}^{n} \int_{X_{(j-m)}}^{X_{(j+m)}} f(x)\,dx \cong \frac{1}{2m} \sum_{j=1}^{n} \left(X_{(j+m)} - X_{(j-m)}\right) f_j \triangleq \hat{\delta}_m \le 1.$$

3. Research Methodology

Goodness-of-Fit Test Based on Empirical Likelihood Ratio

We apply the Lagrange multiplier method to maximize $\sum_{j=1}^{n} \log f_j$ subject to the constraint $\hat{\delta}_m \le 1$. The Lagrange function is defined by

$$\Lambda(f_1, f_2, \ldots, f_n, \lambda) = \sum_{j=1}^{n} \log f_j + \lambda \left( \frac{1}{2m} \sum_{j=1}^{n} \left(X_{(j+m)} - X_{(j-m)}\right) f_j - 1 \right),$$

where $\lambda$ is a Lagrange multiplier. Taking the derivative of the Lagrange function with respect to each $f_j$, $j = 1, 2, \ldots, n$, and with respect to $\lambda$, we obtain

$$\frac{1}{f_j} + \frac{\lambda}{2m}\left(X_{(j+m)} - X_{(j-m)}\right) = 0 \tag{4}$$

3. Research Methodology

Goodness-of-Fit Test Based on Empirical Likelihood Ratio

and

$$\frac{1}{2m} \sum_{j=1}^{n} \left(X_{(j+m)} - X_{(j-m)}\right) f_j - 1 = 0, \tag{5}$$

respectively. From equation (4), we have

$$f_j = -\frac{2m}{\lambda \left(X_{(j+m)} - X_{(j-m)}\right)}.$$

Multiplying equation (4) by $f_j$ and summing over $j$, we have

$$n + \lambda \, \frac{1}{2m} \sum_{j=1}^{n} \left(X_{(j+m)} - X_{(j-m)}\right) f_j = 0.$$

3. Research Methodology

Goodness-of-Fit Test Based on Empirical Likelihood Ratio

Since the constraint $\frac{1}{2m} \sum_{j=1}^{n} \left(X_{(j+m)} - X_{(j-m)}\right) f_j \le 1$ holds with equality at the maximum, as stated in (5), we have $\lambda = -n$. Finally, the values of $f_j$ that maximize $\sum_{j=1}^{n} \log f_j$, and hence also $\prod_{j=1}^{n} f_j$, are

$$f_j = \frac{2m}{n\left(X_{(j+m)} - X_{(j-m)}\right)}, \tag{6}$$

where $X_{(j-m)} = X_{(1)}$ if $j-m \le 1$ and $X_{(j+m)} = X_{(n)}$ if $j+m \ge n$.

Thus, using the maximum empirical likelihood method, the empirical likelihood ratio based goodness-of-fit test statistic for the two-parameter Weibull distribution can be constructed as

3. Research Methodology

Goodness-of-Fit Test Based on Empirical Likelihood Ratio

$$WB_{nm} = \frac{\prod_{j=1}^{n} \dfrac{2m}{n\left(X_{(j+m)} - X_{(j-m)}\right)}}{\max_{\boldsymbol{\theta}} \prod_{j=1}^{n} f_{H_0}(X_j; \boldsymbol{\theta})}, \tag{7}$$

where $\boldsymbol{\theta} = (\beta, \alpha)'$ is the parameter vector of a two-parameter Weibull distribution. Since the parameters $\beta$ and $\alpha$ are unknown, the denominator is maximized by plugging in their maximum likelihood estimates based on the observations.

The maximum likelihood estimators $\hat{\beta}$ and $\hat{\alpha}$ of $\beta$ and $\alpha$, respectively, are solutions of the equations

$$\frac{1}{\hat{\alpha}} + \frac{1}{n} \sum_{i=1}^{n} \ln X_i - \frac{\sum_{i=1}^{n} X_i^{\hat{\alpha}} \ln X_i}{\sum_{i=1}^{n} X_i^{\hat{\alpha}}} = 0$$

and

$$\hat{\beta} = \left( \frac{1}{n} \sum_{i=1}^{n} X_i^{\hat{\alpha}} \right)^{1/\hat{\alpha}}.$$
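The two likelihood equations can be solved with a one-dimensional root search, because the first equation involves only the shape parameter and its left-hand side is strictly decreasing in it. The sketch below is one possible approach, not necessarily the routine used in the study (which refers to the R package MASS later on); the function name `weibull_mle` and the bisection strategy are assumptions made for illustration.

```python
import math


def weibull_mle(x, iterations=200):
    """Return (beta_hat, alpha_hat) solving the two Weibull likelihood equations."""
    n = len(x)
    logs = [math.log(v) for v in x]          # requires strictly positive data
    mean_log, max_log = sum(logs) / n, max(logs)
    if min(logs) == max_log:
        raise ValueError("all observations are identical")

    def weights(a):
        # x_i^a rescaled by max(x)^a, which avoids overflow for large a.
        return [math.exp(a * (l - max_log)) for l in logs]

    def score(a):
        # 1/a + mean(ln x) - sum(x^a ln x) / sum(x^a); the rescaling cancels in the ratio.
        w = weights(a)
        return 1.0 / a + mean_log - sum(wi * l for wi, l in zip(w, logs)) / sum(w)

    lo, hi = 1e-6, 1.0
    while score(hi) > 0:                     # expand until the sign changes
        hi *= 2.0
    for _ in range(iterations):              # bisection on the shape parameter
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    alpha_hat = 0.5 * (lo + hi)
    # beta_hat = (mean(x^alpha_hat))^(1 / alpha_hat), evaluated on the log scale.
    beta_hat = math.exp(max_log + math.log(sum(weights(alpha_hat)) / n) / alpha_hat)
    return beta_hat, alpha_hat
```

Applied to a large sample generated from a Weibull distribution, the returned pair should be close to the generating scale and shape.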

3. Research Methodology

Goodness-of-Fit Test Based on Empirical Likelihood Ratio

We notice that the distribution of the test statistic $WB_{nm}$ strongly depends on the integer $m$. Thus, the optimal value of $m$ should be determined to make the test more efficient. Following the argument of Vexler and Gurevich (2010), which is based on the properties of the empirical likelihood method, we reconstruct the test statistic in (7) as

$$WB_n = \min_{1 \le m < n^{\delta}} \frac{\prod_{j=1}^{n} \dfrac{2m}{n\left(X_{(j+m)} - X_{(j-m)}\right)}}{\max_{\boldsymbol{\theta}} \prod_{j=1}^{n} f_{H_0}(X_j; \boldsymbol{\theta})}, \tag{8}$$

where $\delta \in (0, 1)$.

3. Research Methodology

Goodness-of-Fit Test Based on Empirical Likelihood Ratio

Following Vexler et al. (2011) and Ning and Ngunkeng (2013), we take $\delta = 0.5$ in equation (8). Thus, the final form of the test statistic is

$$WB_n = \min_{1 \le m < \sqrt{n}} \frac{\prod_{j=1}^{n} \dfrac{2m}{n\left(X_{(j+m)} - X_{(j-m)}\right)}}{\max_{\boldsymbol{\theta}} \prod_{j=1}^{n} f_{H_0}(X_j; \boldsymbol{\theta})}. \tag{9}$$
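Putting the pieces together, the statistic in equation (9) can be evaluated on the log scale, which avoids overflow in the products. This is a hedged sketch rather than the study's implementation: the names `log_wb_n` and `_weibull_max_loglik` are hypothetical, the null fit is obtained by bisection on the shape likelihood equation, and the sample is assumed to be strictly positive without ties.

```python
import math


def _weibull_max_loglik(x, iterations=200):
    """Maximized Weibull log-likelihood (the log of the denominator in (9))."""
    n = len(x)
    logs = [math.log(v) for v in x]
    mean_log, max_log = sum(logs) / n, max(logs)

    def weights(a):
        return [math.exp(a * (l - max_log)) for l in logs]

    def score(a):
        w = weights(a)
        return 1.0 / a + mean_log - sum(wi * l for wi, l in zip(w, logs)) / sum(w)

    lo, hi = 1e-6, 1.0
    while score(hi) > 0:
        hi *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if score(mid) > 0 else (lo, mid)
    a = 0.5 * (lo + hi)
    log_beta = max_log + math.log(sum(weights(a)) / n) / a
    # At the MLE, sum((x_i / beta)^a) = n, so the log-likelihood simplifies to:
    return n * math.log(a) - n * a * log_beta + (a - 1.0) * n * mean_log - n


def log_wb_n(sample):
    """log(WB_n): minimum over 1 <= m < sqrt(n) of the log numerator, minus the null fit."""
    x = sorted(sample)
    n = len(x)

    def at(i):
        return x[min(max(i, 1), n) - 1]      # clamped order statistic X_(i)

    best = math.inf
    m = 1
    while m * m < n:                         # delta = 0.5, i.e. m < sqrt(n)
        log_num = sum(math.log(2.0 * m / (n * (at(j + m) - at(j - m))))
                      for j in range(1, n + 1))
        best = min(best, log_num)
        m += 1
    return best - _weibull_max_loglik(x)
```

Because the denominator does not depend on m, the minimum only needs to be taken over the numerator. The statistic is also unchanged when every observation is multiplied by a positive constant, which is consistent with the scale parameter being fixed at 1 in the simulation designs.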

3. Research Methodology

Asymptotic Properties of the Proposed Test Statistic

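Denote

$$h_i(x, \boldsymbol{\theta}) = \frac{\partial \log f_{H_0}(x; \boldsymbol{\theta})}{\partial \theta_i}, \quad i = 1, 2, \quad \text{and} \quad \boldsymbol{\theta} = (\theta_1, \theta_2) = (\beta, \alpha).$$

We assume the following conditions hold:

(C1) $E\left(\log f(X_1)\right)^2 < \infty$.

(C2) Under the null hypothesis, $\|\hat{\boldsymbol{\theta}} - \boldsymbol{\theta}\| = \max_{1 \le i \le 2} |\hat{\theta}_i - \theta_i| \to 0$ in probability.

(C3) Under the alternative hypothesis, $\hat{\boldsymbol{\theta}} \to \boldsymbol{\theta}_0$ in probability, where $\boldsymbol{\theta}_0$ is a constant vector with finite components.

(C4) There are open intervals $\Theta_0 \subseteq R^3$ and $\Theta_1 \subseteq R^3$ containing $\boldsymbol{\theta}$ and $\boldsymbol{\theta}_0$, respectively. There also exists a function $s(x)$ such that $|h(x, \boldsymbol{\xi})| \le s(x)$ for all $x \in R$ and $\boldsymbol{\xi} \in \Theta_0 \cup \Theta_1$.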

3. Research Methodology

Asymptotic Properties of the Proposed Test Statistic (cont.)

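Proposition 1 Assume that conditions (C1)-(C4) hold. Then, under $H_0$,

$$\frac{1}{n} \log(WB_n) \to 0$$

in probability as $n \to \infty$, while, under $H_1$,

$$\frac{1}{n} \log(WB_n) \to E \log\left( \frac{f_{H_1}(X_1)}{f_{H_0}(X_1; \boldsymbol{\theta}_0)} \right)$$

in probability as $n \to \infty$.

The limit under $H_1$ is a Kullback-Leibler divergence, which is strictly positive whenever $f_{H_1}$ differs from $f_{H_0}(\cdot\,; \boldsymbol{\theta}_0)$. Hence, given conditions (C1)-(C4), Proposition 1 shows that the power of the test goes to 1 as $n \to \infty$ under the alternative hypothesis. Thus, the proposed test is consistent.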

3. Research Methodology

Calculation of Critical Values and Evaluation of Type I Error Control

To calculate the critical values for the fixed sample sizes n = 10, 20, 30, 40, 50, 100, 200, 500, we simulate 5,000 samples from $WB(\beta, \alpha)$ with $(\beta, \alpha)$ = (1, 0.5), (1, 2), (1, 4), (1, 8). For each simulated sample, we use the R package MASS to estimate the parameters $\beta$ and $\alpha$. We then calculate the test statistic for each sample based on equation (9). After obtaining all 5,000 test statistics, we order them and take the 90th, 95th and 99th percentiles as the critical values corresponding to the significance levels 0.1, 0.05 and 0.01, respectively.
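The simulation procedure above can be sketched in a few lines. The slides mention R and the MASS package for the parameter estimation step; the Python version below is only an illustrative analogue. The function name `mc_critical_values` and its arguments are hypothetical, and `statistic` stands for any function that maps a sample to the test statistic, such as an implementation of equation (9).

```python
import math
import random


def mc_critical_values(statistic, n, beta, alpha, reps=5000,
                       levels=(0.10, 0.05, 0.01), seed=0):
    """Monte Carlo critical values: upper percentiles of the statistic under WB(beta, alpha)."""
    rng = random.Random(seed)
    # random.weibullvariate takes (scale, shape), i.e. (beta, alpha) in the slides' notation.
    stats = sorted(
        statistic([rng.weibullvariate(beta, alpha) for _ in range(n)])
        for _ in range(reps)
    )
    critical = {}
    for level in levels:
        # The (1 - level) empirical percentile of the ordered statistics.
        rank = math.ceil(round((1.0 - level) * reps, 9))
        critical[level] = stats[min(reps, max(1, rank)) - 1]
    return critical
```

With `reps=5000` and the default levels, the returned values are the 4500th, 4750th and 4950th ordered statistics, matching the 90th, 95th and 99th percentiles described above.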

3. Research Methodology

Calculation of Critical Values and Evaluation of Type I Error Control (cont.)

Next, to investigate the performance of the proposed test in controlling the Type I error at the significance levels 0.1, 0.05 and 0.01, we conduct 5,000 simulations under $WB(\beta, \alpha)$ with $(\beta, \alpha)$ = (1, 0.5), (1, 2), (1, 4), (1, 8) and sample sizes n = 20, 50, 100, 200, 500, 1000. For each sample, we calculate the test statistic based on equation (9) and compare it with the critical value. The proportion of samples for which the null hypothesis is rejected is the empirical size of the proposed test.

3. Research Methodology

Evaluation of the Power of the Proposed Test

To study the power of the proposed test, we simulate 10,000 samples with sample sizes n = 20, 50, 100, 200, 500, 1000 from Beta(0.25, 0.25), Beta(2, 2), N(0, 1), and TruncN(-1, 1). We then compute the power of the Kolmogorov-Smirnov test, the Cramér-von Mises test, the Anderson-Darling test, and the proposed test $WB_n$ at the nominal level 0.05.
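A rejection-rate loop of the following kind covers both the size study (sampling from the null) and the power study (sampling from an alternative). It is a sketch under stated assumptions: `rejection_rate`, `trunc_normal` and `ALTERNATIVES` are hypothetical names, TruncN(-1, 1) is read here as a standard normal truncated to [-1, 1], and `statistic` is assumed to accept whatever values the chosen sampler produces.

```python
import random


def trunc_normal(rng, lo=-1.0, hi=1.0):
    """Standard normal restricted to [lo, hi], drawn by rejection sampling (assumed reading of TruncN)."""
    while True:
        v = rng.gauss(0.0, 1.0)
        if lo <= v <= hi:
            return v


# Samplers for the alternatives listed above; each takes a random.Random instance.
ALTERNATIVES = {
    "Beta(0.25, 0.25)": lambda rng: rng.betavariate(0.25, 0.25),
    "Beta(2, 2)": lambda rng: rng.betavariate(2.0, 2.0),
    "N(0, 1)": lambda rng: rng.gauss(0.0, 1.0),
    "TruncN(-1, 1)": trunc_normal,
}


def rejection_rate(statistic, critical_value, draw, n, reps=10000, seed=0):
    """Fraction of simulated samples whose statistic exceeds the critical value."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        if statistic(sample) > critical_value:
            rejections += 1
    return rejections / reps
```

Passing a Weibull sampler as `draw` gives an estimate of the empirical size, and passing one of the entries of `ALTERNATIVES` gives an estimate of the power against that alternative.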