On Simon's two-stage design for single-arm phase IIA cancer clinical trials under Beta-binomial...

12
Research Article Received 18 February 2009, Accepted 20 October 2009 Published online 14 January 2010 in Wiley Interscience (www.interscience.wiley.com) DOI: 10.1002/sim.3805 On Simon’s two-stage design for single-arm phase IIA cancer clinical trials under Beta-binomial distribution Junfeng Liu, a,bYong Lin a,b and Weichung Joe Shih a,b Simon (Control. Clin. Trials 1989; 10:1–10)’s two-stage design has been broadly applied to single-arm phase IIA cancer clinical trials in order to minimize either the expected or the maximum sample size under the null hypothesis of drug inefficacy, i.e. when the pre-specified amount of improvement in response rate (RR) is not expected to be observed. This paper studies a realistic scenario where the standard and experimental treatment RRs follow two continuous distributions (e.g. beta distribution) rather than two single values. The binomial probabilities in Simon’s (Control. Clin. Trials 1989; 10:1–10) design are replaced by prior predictive Beta-binomial probabilities that are the ratios of two beta functions and domain-restricted RRs involve incomplete beta functions to induce the null hypothesis acceptance probability. We illustrate that Beta-binomial mixture model based two-stage design retains certain desirable properties for hypothesis testing purpose. However, numerical results show that such designs may not exist under certain hypothesis and error rate (type I and II) setups within maximal sample size 130. Furthermore, we give theoretical conditions for asymptotic two-stage design non-existence (sample size goes to infinity) in order to improve the efficiency of design search and to avoid needless searching. Copyright © 2010 John Wiley & Sons, Ltd. Keywords: (incomplete) beta distribution; (incomplete) beta function; design existence; hypothesis test; prior predictive probability; Simon’s two-stage design 1. Introduction Simon [1]’s two-stage design (optimal and minimax) for phase IIA single-arm cancer clinical trials has gained broad application and many modifications ever since its inception. Chen [2] considered a three-stage design that can potentially reduce sample size by around 10 per cent; Jung et al. [3] proposed using a graphical search method to reach a compromise between the optimal and minimax designs; Panageas et al. [4] discussed a two-stage design with multiple endpoints (complete, partial and no response) using a trinomial model, where the decision boundary is created on a two-dimensional (complete, partial response) discrete region; Ye and Shyr [5] developed a two-stage phase II design with balanced sample sizes in the first and second stages while keeping the total sample size comparable with Simon [1]’s design; Wu and Shih [6] introduced some approaches to handling phase II trials using conditional probabilities when the data collection process deviates from the original design; Chi and Chen [7] brought in a curtailed two-stage design to shorten drug development process and save the expected sample size; and Chen and Shan [8] gave an optimal and minimax three-stage design that allows early termination under both hypotheses (the null and alternative) among many others. Within the Bayesian realm (e.g. Gelman et al. [9]), Thall and Simon [10] created criteria for early trial termination when data indicate inconclusive results and/or experimental treatment inefficacy, Heitjan [11] devised the ‘persuasion’ probability as a rule for stopping trials. For other massive Bayesian literature on phase II cancer clinical trials, readers are referred to Sylvester [12], Thall and Simon [13], Thall and Sung [14], Mayo and Gajewski [15], Wang et al. [16], Tan and Machin [17, 18] and Gajewski and Mayo [19] among many others. Banerjee and Tsiatis [20] used Bayesian decision-theoretic criterion to propose an adaptive two-stage design that allows stage-two sample size to depend on the results from stage one. Focusing on choosing the experimental treatment RR flexibly, Lin and Shih [21] proposed an adaptive two-stage design with two expected RRs (p 1 ,p 2 ) under the alternative hypothesis compared with one RR (p 0 ) under the null hypothesis. Multiple-value RR examples are commonly seen in the medical literature. Loesch et al. [22] reported that taxane therapy can reach a tumor a Biometrics Division at The Cancer Institute of New Jersey, 195 Little Albany Street, New Brunswick, NJ 08901, U.S.A. b Department of Biostatistics at School of Public Health, University of Medicine and Dentistry of New Jersey, Piscataway, NJ 08854, U.S.A. Correspondence to: Junfeng Liu, Biometrics Division at The Cancer Institute of New Jersey, 195 Little Albany Street, New Brunswick, NJ 08901, U.S.A. E-mail: [email protected] 1084 Copyright © 2010 John Wiley & Sons, Ltd. Statist. Med. 2010, 29 1084--1095

Transcript of On Simon's two-stage design for single-arm phase IIA cancer clinical trials under Beta-binomial...

Research Article

Received 18 February 2009, Accepted 20 October 2009 Published online 14 January 2010 in Wiley Interscience

(www.interscience.wiley.com) DOI: 10.1002/sim.3805

On Simon’s two-stage design for single-armphase IIA cancer clinical trials underBeta-binomial distributionJunfeng Liu,a,b∗† Yong Lina,b and Weichung Joe Shiha,b

Simon (Control. Clin. Trials 1989; 10:1–10)’s two-stage design has been broadly applied to single-arm phase IIA cancerclinical trials in order to minimize either the expected or the maximum sample size under the null hypothesis of druginefficacy, i.e. when the pre-specified amount of improvement in response rate (RR) is not expected to be observed. Thispaper studies a realistic scenario where the standard and experimental treatment RRs follow two continuous distributions(e.g. beta distribution) rather than two single values. The binomial probabilities in Simon’s (Control. Clin. Trials 1989;10:1–10) design are replaced by prior predictive Beta-binomial probabilities that are the ratios of two beta functionsand domain-restricted RRs involve incomplete beta functions to induce the null hypothesis acceptance probability. Weillustrate that Beta-binomial mixture model based two-stage design retains certain desirable properties for hypothesistesting purpose. However, numerical results show that such designs may not exist under certain hypothesis and errorrate (type I and II) setups within maximal sample size ∼130. Furthermore, we give theoretical conditions for asymptotictwo-stage design non-existence (sample size goes to infinity) in order to improve the efficiency of design search and toavoid needless searching. Copyright © 2010 John Wiley & Sons, Ltd.

Keywords: (incomplete) beta distribution; (incomplete) beta function; design existence; hypothesis test; prior predictiveprobability; Simon’s two-stage design

1. Introduction

Simon [1]’s two-stage design (optimal and minimax) for phase IIA single-arm cancer clinical trials has gained broad applicationand many modifications ever since its inception. Chen [2] considered a three-stage design that can potentially reduce sample sizeby around 10 per cent; Jung et al. [3] proposed using a graphical search method to reach a compromise between the optimal andminimax designs; Panageas et al. [4] discussed a two-stage design with multiple endpoints (complete, partial and no response)using a trinomial model, where the decision boundary is created on a two-dimensional (complete, partial response) discreteregion; Ye and Shyr [5] developed a two-stage phase II design with balanced sample sizes in the first and second stages whilekeeping the total sample size comparable with Simon [1]’s design; Wu and Shih [6] introduced some approaches to handlingphase II trials using conditional probabilities when the data collection process deviates from the original design; Chi and Chen [7]brought in a curtailed two-stage design to shorten drug development process and save the expected sample size; and Chenand Shan [8] gave an optimal and minimax three-stage design that allows early termination under both hypotheses (the nulland alternative) among many others. Within the Bayesian realm (e.g. Gelman et al. [9]), Thall and Simon [10] created criteriafor early trial termination when data indicate inconclusive results and/or experimental treatment inefficacy, Heitjan [11] devisedthe ‘persuasion’ probability as a rule for stopping trials. For other massive Bayesian literature on phase II cancer clinical trials,readers are referred to Sylvester [12], Thall and Simon [13], Thall and Sung [14], Mayo and Gajewski [15], Wang et al. [16], Tanand Machin [17, 18] and Gajewski and Mayo [19] among many others. Banerjee and Tsiatis [20] used Bayesian decision-theoreticcriterion to propose an adaptive two-stage design that allows stage-two sample size to depend on the results from stage one.Focusing on choosing the experimental treatment RR flexibly, Lin and Shih [21] proposed an adaptive two-stage design withtwo expected RRs (p1, p2) under the alternative hypothesis compared with one RR (p0) under the null hypothesis. Multiple-valueRR examples are commonly seen in the medical literature. Loesch et al. [22] reported that taxane therapy can reach a tumor

aBiometrics Division at The Cancer Institute of New Jersey, 195 Little Albany Street, New Brunswick, NJ 08901, U.S.A.bDepartment of Biostatistics at School of Public Health, University of Medicine and Dentistry of New Jersey, Piscataway, NJ 08854, U.S.A.∗Correspondence to: Junfeng Liu, Biometrics Division at The Cancer Institute of New Jersey, 195 Little Albany Street, New Brunswick, NJ 08901, U.S.A.†E-mail: [email protected]

10

84

Copyright © 2010 John Wiley & Sons, Ltd. Statist. Med. 2010, 29 1084--1095

J. LIU, Y. LIN AND W. J. SHIH

regression rate between 20 and 40 per cent for metastatic breast cancer patients with prior cytotoxic therapy, weekly paclitaxelregimen can achieve a response rate (RR) ranging from 22 to 79 per cent in patients with locally advanced and metastatic breastcancers and the RR to single-agent paclitaxel therapy in patients receiving previous chemotherapy ranges from 53 to 67 per cent.The RR uncertainty from multi-center study perhaps arises from the following factors: patient pathologic heterogeneity, interimtreatment adjustment due to toxicity and small sample sizes among others. As an example of multiple-value hypothesis testing,breast cancer researchers at The Cancer Institute of New Jersey recently proposed two single-arm phase IIA cancer clinical trials:one trial would test the efficacy of combination (experimental) therapy (paclitaxel+pazopanib+carboplatin) over the historic(standard) RR (p0 =43 per cent) on patients with prior cytotoxic treatment, where the combination treatment is targeted at anRR (p1) ranging from 54 to 62 per cent based on meta-analysis. The second trial would test the efficacy of the combinationtherapy ‘paclitaxel+pazopanib+carboplatin’ (40 per cent�target RR�55 per cent) over standard treatment ‘paclitaxel+pazopanib’(RR=37 per cent). The historical RR seems to be easier to be specified than the hypothesized experimental treatment RR thatlacks available information. However, in some situations cancer investigators can only assume the historic (standard) treatment tohave a certain range due to opinion difference and/or diversified results from multiple medical information sources (e.g. Loeschet al. [22]). Lee and Tseng [23] pointed out that the historical control design is subject to patient accrual and outcome evaluationbias so that the observed RR is seldom the true one. Tackling this scenario statistically, the standard and the experimental RRranges may require continuous distribution priors, which put different weights on possible RRs instead of only two (p0 vs p1) orthree values (p0 vs p1 and p2). This paper considers the feasibility of applying Simon [1]’s design under continuous beta priorsfor setting up hypotheses and error rate (type I and II) constraints. The binomial probability used in Simon [1]’s design is to bereplaced by prior predictive Beta-binomial probability to calculate the two-stage critical region and sample size design (r1 / n1, r / n:notations from Simon [1]) under error rate control. Our computational load is comparable to Simon [1]’s two-stage design andlikely less than the adaptive design by Lin and Shih [21]. The conjugate Beta-binomial family streamlines the computationalprocess by calling a subroutine library for computing the (incomplete) beta function. For other recently developed predictiveprobability-based designs, readers are referred to Wang et al. [16], Lee and Liu [24] and Sambucini [25] among others.

The organization of the remainder of the paper is as follows: Section 2 derives the experimental treatment rejection (nullhypothesis acceptance) probability given (incomplete) beta priors; Section 3 discusses the numerical results and design sensitivityto error rate and prior specification with a focus on asymptotic design existence conditions; and Section 4 concludes withdiscussion. Some proofs of the obtained results in Section 3 are placed in the Appendix.

2. Two-stage design under Beta-binomial distribution

After obtaining a two-stage patient accrual plan in terms of sample sizes (n1, n) and a two-dimensional critical region boundary(r1, r) on the response space (y1, y) under error rate (type I: �, type II:�) control, Simon [1]’s two-stage hypothesis test (H0:RR=p0 vsH1 : RR=p1) amounts to the following procedure: if the number of responders (y1) at stage one (n1 patients) is less than or equalto r1, then the experimental treatment is turned down early at stage one, if the trial enters stage two and the total number ofresponders (y) from two stages (n patients) is less than or equal to r, then the experimental treatment is turned down at stagetwo; otherwise, the experimental treatment is taken as more effective than the standard treatment and progresses into phase IIIstudy. After specifying two point-mass hypotheses (p0 vs p1) and test protocol (two-stage design) ,which is denoted by a fractionpair (r1 / n1, r / n), given RR (denoted by p), Simon [1] considered a binomial distribution-based experimental treatment rejectionprobability of

Bi(r1; p, n1)+min[n1 ,r]∑x=r1+1

b(x; p, n1)Bi(r−x; p, n−n1), (1)

where b(·) and Bi(·) are the probability mass and cumulative distribution functions from a binomial distribution. A two-stagedesign solution exists whenever it meets the error rate (type I and II) constraint and we usually search for a design solutionwithin maximal sample size (100–200), since phase II trial is not of a large sample study. To account for hypothesis uncertainty(Section 1), the present work goes one step further by considering two different continuous priors (e.g. beta priors) for thestandard and experimental treatment RRs and conceiving a Beta-binomial version for Simon [1]’s design with the same outputformat of (r1 / n1, r / n). Beyond those arguments for continuous prior beliefs in Section 1, a justification for special beta priors canbe considered in a Bayesian way as follows. We assume that the original two RR prior distributions (H0 and H1) on domain [0,1] arenon-informative: �0(p), �1(p)=Beta(0, 0)∝p−1(1−p)−1, certain elicited pseudo responses (a0 responders and b0 non-respondersfor updating H0, a1 responders and b1 non-responders for updating H1) are used to refine prior beliefs (density curves) ontop of two original non-informative priors, applying a Bayesian rule subsequently leads to two beta priors: H0 : Beta(a0, b0) vsH1 : Beta(a1, b1). This original prior (p−1(1−p)−1) is one of the several non-informative priors including the flat prior (Beta(1,1)) andJeffreys’ prior (Beta( 1

2 , 12 )). This Beta(0,0) prior will simplify our notations throughout this paper and the difference among these

alternatives is not of our concern since Beta(1, 1) can be taken as a posterior distribution (refined prior) after obtaining one eventand one non-event [9]. From the frequentist perspective, the specified beta priors would represent the distribution of the standarddrug RR (e.g. based on the historical data and the meta-analysis) under the null hypothesis and that of the investigator-proposedexperimental drug RR (i.e. investigator opinion distribution as the alternative hypothesis). Wu et al. [26] discussed methods forsoliciting relevant information from clinicians in order to set up reasonable beta priors. For both the standard treatment RR and

Copyright © 2010 John Wiley & Sons, Ltd. Statist. Med. 2010, 29 1084--1095

10

85

J. LIU, Y. LIN AND W. J. SHIH

Figure 1. Beta prior densities for the null and alternative hypotheses. (The dash line is the null hypothesis and the dotted line is the alternative hypothesis.)

the experimental treatment RR restricted on truncated spaces [0,�] or [�, 1] with �∈ (0, 1), we may still use pseudo responses torefine the original priors restricted to [0,�] or [�, 1] and get a right-truncated or left-truncated beta prior.

We now study the experimental treatment rejection probability given beta prior (Beta(a, b)) for RR (p). Each response, zi , is either0 (non-responder) or 1 (responder), i=1,. . . , n1 for stage one and i=n1 +1,. . . , n for stage two. We denote the beta function valuewith arguments (a, b) by B(a, b) :=∫ 1

0 pa−1(1−p)b−1 dp. The prior predictive probability for stage one responses (z1, z2,. . . , zn1 ) is

Pr(z1, z2,. . . , zn1 ) = (1 / B(a, b))×∫ 1

0p∑n1

i=1 zi (1−p)n1−∑n1

i=1 zi pa−1(1−p)b−1 dp

= B

(n1∑

i=1zi +a, n1 −

n1∑i=1

zi +b

)/B(a, b).

Given stage-one responses, the posterior predictive probability for stage-two responses (zn1+1, zn1+2,. . . , zn) is

Pr(zn1+1, zn1+2,. . . , zn|z1, z2,. . . , zn1 )=∫ 1

0Pr(zn1+1, zn1+2,. . . , zn|p)�(p|z1, z2,. . . , zn1 ) dp

=∫ 1

0

[n∏

j=n1+1pzj (1−p)1−zj

][p∑n1

i=1 zi+a−1(1−p)n1−∑n1

i=1 zi+b−1]/

B

(n1∑

i=1zi +a, n1 −

n1∑i=1

zi +b

)dp

=B

(n∑

i=1zi +a, n−

n∑i=1

zi +b

)/B

(n1∑

i=1zi +a, n1 −

n1∑i=1

zi +b

).

Thus, the joint prior predictive probability for two-stage responses (z1, z2,. . . , zn) is

Pr(z1, z2,. . . , zn)=B

(n∑

i=1zi +a, n−

n∑i=1

zi +b

)/B(a, b).

Hereafter, we use notation y1 as the total number of responders at stage one and y as the total number of responders at twostages. The experimental treatment rejection probability (analogous to equation (1)) is

r1∑y1=0

Cy1n1 B(y1 +a; n1 −y1 +b)+

min[n1 ,r]∑y1=r1+1

r−y1∑x=0

Cxn−n1

Cy1n1 B(x+y1 +a, n−x−y1 +b). (2)

divided by B(a, b). Large RR ranges may require complete beta priors, e.g. Beta(50, 50) (the upper-right panel in Figure 1) maybe assumed for the example of ‘22–79 per cent’ (Section 1). Moreover, incomplete beta priors could be used for possible left-truncation and/or right-truncation cases. For the multiple-value RR examples in Section 1, Beta(30, 70) defined over [0.0, 0.6] maybe assumed for ‘20–40 per cent’, Beta(60, 40) defined over [0.5, 0.7] may be assumed for ‘53–67 per cent’. For the incomplete

10

86

Copyright © 2010 John Wiley & Sons, Ltd. Statist. Med. 2010, 29 1084--1095

J. LIU, Y. LIN AND W. J. SHIH

Beta-binomial distribution under right-truncation or left-truncation, p follows an incomplete beta prior (Beta�(a, b) or Beta∗� (a, b))

with density function pa−1(1−p)b−1 defined over domain p∈ [0,�] or [�, 1]. We also denote the values of the incomplete betafunctions as

B�(a, b)=∫ �

0pa−1(1−p)b−1 dp and B∗

� (a, b)=∫ 1

�pa−1(1−p)b−1 dp.

The experimental treatment rejection probability is obtained by replacing the beta functions in equation (2) by correspondingincomplete beta functions. We search for design (r1 / n1, r / n) under error rate (type I:�, type II:�) constraint and sample sizedetermination criteria (optimal or minimax, see Simon [1]). In this paper, we study beta priors subject to up to one-sided truncation.

3. Results

3.1. Design existence and search efficiency

We make the hypothesized beta prior means (H0 : Beta(a0, b0) vs H1 : Beta(a1, b1)) equal to the RR values from Tables I and II inSimon [1] with a normalized total number of pseudo Bernoulli trials for prior construction (see Section 2) such that a0 +b0 =a1 +b1 =100, where we choose 100 considering the practical sample size range of a phase IIA cancer clinical trial and conveniencefor demonstration purpose. For the right-truncation or left-truncation case, we create truncated beta priors by keeping thosedensity curves to the left or right of the middle point ((a0 +a1) / 200) of two means. The expected sample size under H0 (EN0) andthe probability of early trial stopping at stage one (PET0) are displayed in Tables I and II, where the results from Simon [1]’s designare listed in the parentheses (Table I). Under n�130 restriction, when two beta prior means increase up to 50 per cent, theremay not be any two-stage design that meets the error rate constraint where � and � are small. However, there may be designsolution when the tolerated type I and II error rates increase with larger difference (e.g. �=5 per cent and �=20 per cent). For

Table I. For each pair of priors (H0, H1), rows 1, 2 and 3 are for error rate constraints (�,�): (10,10 percent), (5,20 per cent) and (5,10 per cent). The number ‘a’ under H0 or H1 represents the beta prior withshape parameters (a, 100−a).

Optimal design Minimax design

H0, H1 r1 / n1, r / n EN0 PET0 r1 / n1, r / n EN0 PET0

5, 25 0/12, 3/16 13.8 (14.5) 0.56 (0.63) 0/12, 3/16 13.8 (16.4) 0.56 (0.51)0/8, 3/16 10.6 (12.0) 0.67 (0.63) 0/10, 3/13 11.2 (13.8) 0.61 (0.54)

0/11, 4/24 16.4 (16.8) 0.58 (0.63) 0/14, 4/19 16.5 (20.4) 0.51 (0.46)10, 30 1/16, 6/21 18.4 (19.8) 0.53 (0.65) 1/16, 6/21 18.4 (20.4) 0.53 (0.46)

1/13, 6/19 15.2 (15.0) 0.63 (0.74) 0/12, 6/18 16.2 (19.5) 0.30 (0.55)2/17, 26/41 22.9 (22.5) 0.75 (0.71) 1/21, 9/29 25.9 (26.2) 0.39 (0.62)

20, 40 2/12, 14/24 17.3 (26.0) 0.56 (0.55) 1/12, 10/21 18.4 (28.3) 0.29 (0.46)2/10, 17/25 14.9 (20.6) 0.68 (0.75) 2/12, 11/21 15.9 (22.3) 0.56 (0.50)2/12, 14/24 17.3 (30.4) 0.56 (0.67) 2/13, 11/22 17.4 (31.2) 0.51 (0.66)

30, 50 2/9, 13/20 14.9 (29.9) 0.47 (0.67) 1/8, 12/19 16.1 (35.0) 0.27 (0.36)1/5, 18/22 13.0 (23.6) 0.53 (0.72) 3/10, 12/19 13.2 (25.7) 0.65 (0.48)2/9, 13/20 14.9 (34.7) 0.47 (0.73) 1/8, 12/19 16.1 (36.6) 0.27 (0.56)

70, 90 8/11, 28/31 17.5 (14.8) 0.68 (0.58) 8/11, 28/31 17.5 (23.2) 0.68 (0.95)5, 20 0/19, 4/24 22.0 (23.5) 0.41 (0.54) 0/19, 4/24 22.0 (26.4) 0.41 (0.40)

0/15, 4/20 17.6 (17.6) 0.49 (0.60) 0/15, 4/20 17.6 (19.8) 0.49 (0.51)1/23, 7/46 30.3 (26.7) 0.68 (0.72) 0/28, 7/35 33.0 (32.9) 0.28 (0.57)

10, 25 2/22, 31/51 32.9 (31.2) 0.62 (0.65) 2/29, 12/39 34.4 (33.7) 0.46 (0.48)2/17, 41/56 26.6 (24.7) 0.75 (0.73) 2/27, 11/36 31.5 (28.8) 0.50 (0.62)2/22, 31/51 32.9 (36.8) 0.62 (0.65) 2/29, 12/39 34.4 (40.0) 0.46 (0.62)

20, 35 2/16, 12/26 22.3 (43.6) 0.37 (0.54) 2/16, 12/26 22.3 (45.5) 0.37 (0.50)2/12, 19/29 19.5 (35.4) 0.56 (0.73) 2/14, 13/25 20.0 (40.4) 0.46 (0.57)2/15, 15/28 22.7 (51.4) 0.41 (0.69) 1/14, 13/26 23.4 (58.4) 0.22 (0.53)

30, 45 1/8, 14/21 17.5 (51.4) 0.27 (0.59) 1/9, 12/20 17.7 (56.0) 0.21 (0.68)2/9, 14/21 15.4 (41.7) 0.47 (0.73) 1/8, 13/20 16.8 (49.6) 0.27 (0.81)1/8, 14/21 17.5 (60.8) 0.27 (0.70) 1/8, 14/21 17.5 (78.5) 0.27 (0.86)

40, 55 15/26, 90/101 28.7 (54.5) 0.96 (0.67) 15/26, 90/101 28.7 (57.2) 0.96 (0.56)15/26, 90/101 28.7 (44.9) 0.96 (0.67) 15/26, 90/101 28.7 (60.1) 0.96 (0.90)15/26, 90/101 28.7 (64.0) 0.96 (0.68) 15/26, 90/101 28.7 (78.9) 0.96 (0.47)

Copyright © 2010 John Wiley & Sons, Ltd. Statist. Med. 2010, 29 1084--1095

10

87

J. LIU, Y. LIN AND W. J. SHIH

Table II. For each pair of right-truncated and left-truncated priors (H0, H1), rows 1, 2 and 3 are for errorrate constraints (�,�): (10,10 per cent), (5,20 per cent) and (5,10 per cent). The number ‘a’ under H0 or H1represents beta prior with shape parameters (a, 100−a).

Optimal design Minimax design

H0, H1 r1 / n1,r / n EN0 PET0 Error rates (type I, II) r1 / n1, r / n EN0 PET0 Error rates (type I, II)

5, 25 3/27, 4/28 27.1 0.94 (0.06, 0.09) 3/27, 4/28 27.1 0.94 (0.06, 0.09)3/22, 4/23 22.0 0.96 (0.04, 0.19) 3/22, 4/23 22.0 0.96 (0.04, 0.19)4/32, 5/33 32.0 0.96 (0.04, 0.09) 4/32, 5/33 32.0 0.96 (0.04, 0.09)

10, 30 6/35, 7/36 35.1 0.92 (0.08, 0.10) 6/35, 7/36 35.1 0.92 (0.08, 0.10)6/30, 7/31 30.0 0.96 (0.04, 0.19) 6/30, 7/31 30.0 0.96 (0.04, 0.19)

9/48, 10/49 48.0 0.96 (0.04, 0.10) 9/48, 10/49 48.0 0.96 (0.04, 0.10)20, 40 21/31, 24/34 31.0 1.00 (0.00, 0.10) 19/31, 20/32 31.0 1.00 (0.00, 0.10)

21/31, 40/50 31.0 1.00 (0.00, 0.10) 20/31, 21/32 31.0 1.00 (0.00, 0.10)21/31, 40/50 31.0 1.00 (0.00, 0.10) 20/31, 21/32 31.0 1.00 (0.00, 0.10)

30, 50 22/26, 30/34 26.0 1.00 (0.00, 0.01) 20/26, 21/27 26.0 1.00 (0.00, 0.00)15/25, 16/26 25.0 1.00 (0.00, 0.13) 15/25, 16/26 25.0 1.00 (0.00, 0.13)21/26, 33/38 26.0 1.00 (0.00, 0.01) 20/26, 21/27 26.0 1.00 (0.00, 0.00)

70, 90 25/30, 26/31 30.1 0.95 (0.05, 0.20) 25/30, 26/31 30.1 0.95 (0.05, 0.20)5, 20 4/41, 5/42 41.1 0.92 (0.08, 0.10) 4/41, 5/42 41.1 0.92 (0.08, 0.10)

4/34, 5/35 34.0 0.95 (0.05, 0.19) 4/34, 5/35 34.0 0.95 (0.05, 0.19)6/55, 7/56 55.0 0.95 (0.05, 0.10) 6/55, 7/56 55.0 0.95 (0.05, 0.10)

10, 25 9/58, 10/59 58.1 0.90 (0.10, 0.10) 9/58, 10/59 58.1 0.90 (0.10, 0.10)9/50, 10/51 50.1 0.95 (0.05, 0.20) 9/50, 10/51 50.1 0.95 (0.05, 0.20)

12/66, 13/67 66.0 0.96 (0.04, 0.07) 12/66, 13/67 66.0 0.96 (0.04, 0.07)20, 35 15/39, 16/40 39.0 1.00 (0.01, 0.05) 15/39, 16/40 39.0 1.00 (0.01, 0.05)

13/37, 14/38 37.0 0.98 (0.02, 0.13) 13/37, 14/38 37.0 0.98 (0.02, 0.13)15/39, 16/40 39.0 1.00 (0.01, 0.05) 15/39, 16/40 39.0 1.00 (0.01, 0.05)

30, 45 15/27, 16/28 27.0 1.00 (0.00, 0.09) 15/27, 16/28 27.0 1.00 (0.00, 0.09)15/27, 16/28 27.0 1.00 (0.00, 0.09) 15/27, 16/28 27.0 1.00 (0.00, 0.09)15/27, 16/28 27.0 1.00 (0.00, 0.09) 15/27, 16/28 27.0 1.00 (0.00, 0.09)

40, 55 15/26, 16/27 26.0 0.97 (0.03, 0.00) 15/26, 16/27 26.0 0.97 (0.03, 0.00)15/26, 16/27 26.0 0.97 (0.03, 0.00) 15/26, 16/27 26.0 0.97 (0.03, 0.00)15/26, 16/27 26.0 0.97 (0.03, 0.00) 15/26, 16/27 26.0 0.97 (0.03, 0.00)

80, 95 36/41, 37/42 41.1 0.90 (0.10, 0.08) 36/41, 37/42 41.1 0.90 (0.10, 0.08)36/40, 37/41 40.0 0.95 (0.05, 0.16) 36/40, 37/41 40.0 0.95 (0.05, 0.16)

example, ‘Beta(70, 30) vs Beta(90, 10)’ only has a solution under error rate constraint (5, 20 per cent) (Table I). Table II summarizesthe results for two completely separated priors due to truncation at the average of two complete beta prior means, ‘Beta(70, 30)vs Beta(90, 10)’ has a solution under error rate constraint (5, 20 per cent), ‘Beta(80, 20) vs Beta(95, 5)’ has solutions under error rateconstraints (10, 10 per cent) and (5, 20 per cent). Furthermore, from Table II, we find an n1 +1=n phenomenon, which showsthat stage one may play sufficiently well compared with the overall two-stage design, i.e. stage two may not be needed to someextent and the early stopping probability at stage one is close to 1. In most cases where a design solution exists, two designsexactly duplicate under either the optimal or minimax criterion. However, two-stage design may not exist for those beta priorpairs with means greater than 1

2 . Table II also lists the actual type I and II error rates for those obtained design solutions. Undera0 +b0 =a1 +b1 =100,

H0 :�0(p)∝pa0−1(1−p)b0−1 = (p / (1−p))a0−1 ×(1−p)98

vs

H1 :�1(p)∝pa1−1(1−p)b1−1 = (p / (1−p))a1−1 ×(1−p)98.

These have a ratio of �0(p) / �1(p)= (p / (1−p))a0−a1 , which only depends on a0 −a1 and is a decreasing function for p whena0 −a1<0. The range is from +∞ to 0 as p goes from 0 to 1. It seems that the beta prior with a smaller mean (a0 / 100) willalways tend to have smaller responses rate (p) than that with a larger mean (a1 / 100) and there may always exist RR thresholds(r1 and r) that can clearly distinguish the responses under H0 from those under H1 as long as the sample size (n1 and/or n) issufficiently large. However, we now show some conditions for asymptotic design non-existence even if the sample size goes toinfinity.

10

88

Copyright © 2010 John Wiley & Sons, Ltd. Statist. Med. 2010, 29 1084--1095

J. LIU, Y. LIN AND W. J. SHIH

Lemma 1Given the hierarchical model

p ∼ Beta(a, b), 0�p�1,

Yi|p ∼ Bernoulli(p) independently, (i=1,. . . , n).(3)

Yn =∑ni=1 Yi / n converges to the random variable p in distribution, i.e.

limn→∞Pr(Yn�x)=Pr(p�x)=Bx(a, b) / B(a, b) :=Fa,b(x) ∀x ∈ [0, 1].

Lemma 2For the hierarchical model in (3), we have

limn→∞Var(Yn)=Var(p)= (ab) / ((a+b)2(a+b+1)).

Lemma 3The following are some simple facts relating to beta distribution with parameter set (a, b) and probability distribution functionFa,b(x), x ∈ [0, 1].

(1) ∀x ∈ [0, 1], Fa,b(x) is a continuous function of parameters a and b.(2) Given two parameter sets ‘a0,b0’ and ‘a1,b1’, |Fa0 ,b0

(x)−Fa1 ,b1(x)| is a uniformly continuous function on domain x ∈ [0, 1];

thus, it is reasonable to define the global maximum distribution function difference of

�(a0, b0, a1, b1) :=Maxx∈[0,1]{|Fa0 ,b0(x)−Fa1 ,b1

(x)|}.

(3) As a0 →a1 and b0 →b1, |Fa0 ,b0(x)−Fa1 ,b1

(x)|→0 uniformly for x ∈ [0, 1].(4) If a0 +b0 =a1 +b1, a0<a1, a0 =b1 and a1 =b0, then Fa0 ,b0

(x)−Fa1 ,b1(x)>0, ∀ x ∈ [0, 1].

Proposition 1For a single-stage design with critical region boundary r and sample size n, when we specify a small type I error rate tolerance(�) under the null hypothesis (H0) with prior �0(p)=Beta(a0, b0) and a small type II error rate tolerance (�) under the alternativehypothesis (H1) with prior �1(p)=Beta(a1, b1), then the closeness between the two priors (�0(p)≈�1(p)) tends to prohibit theexistence of a single-stage design as n goes to ∞. Specifically,

if �+�+�(a0, b0, a1, b1)�1, then no design solution exists.

Theorem 1For a two-stage design (r1 / n1, r / n), when we specify a small type I error rate tolerance (�) under the null hypothesis (H0) with prior�0(p)=Beta(a0, b0) and a small type II error rate tolerance (�) under the alternative hypothesis (H1) with prior �1(p)=Beta(a1, b1),then the closeness between the two priors (�0(p)≈�1(p)) tends to prohibit the existence of a two-stage design as n1 goes to ∞.Specifically,

if �+2�+2�(a0, b0, a1, b1)�1, then no design solution exists.

The panels from the left column in Figure 2 apply to Theorem 1.

Theorem 2For a two-stage design (r1 / n1, r / n), when we specify a type I error rate tolerance (�) under the null hypothesis (H0) with prior�0(p)=Beta(a0, b0) and a type II error rate tolerance (�) under the alternative hypothesis (H1) with prior �1(p)=Beta(a1, b1), anecessary condition for the existence of a two-stage design when n1 is bounded above and n goes to ∞ is

PrH0 (y1>r1)��+�+�(a0, b0, a1, b1) and PrH1 (y1>r1)�1−�,

where PrH0 (y1>r1) stands for the probability of event {y1>r1} under H0 and PrH1 (y1>r1) stands for the probability of event {y1>r1}under H1.

Panageas et al. [4] discussed computational efficiency for their trinomial model-based design and found that increasing p0 andp1 requires a greater number of regions to search. We have similar findings from Beta-binomial distribution-based design whenthe beta prior means are increased. The algorithm by Lee and Liu [24] is a three-dimensional search that is more computationallyintensive than two-stage or three-stage designs. Theorem 2 suggests searching for (r, n) by increasing n on top of stage-onedesigns (r1, n1), which satisfy the above necessary condition. Since stage-one search is exhaustive with a moderate size (r1,n1) towork on, the overall search workload is likely less than that from the following crude search loop

n : from 1 to ∞n1 : from 1 to (n−1)

Copyright © 2010 John Wiley & Sons, Ltd. Statist. Med. 2010, 29 1084--1095

10

89

J. LIU, Y. LIN AND W. J. SHIH

Den

sity

func

tion

Beta(49,51),Beta(51,49) Beta(45,55),Beta(55,45) Beta(40,60),Beta(60,40)D

istr

ibut

ion

func

tion

Beta(49,51),Beta(51,49) Beta(45,55),Beta(55,45) Beta(40,60),Beta(60,40)

0

0.0 0.2 0.4 0.6 0.8 1.0

2

4

6

8

10

-0.2

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

Dis

trib

utio

n fu

nctio

n

-1.0

0.0 0.2 0.4 0.6 0.8 1.0

-0.5

0.0

0.5

1.0

Dis

trib

utio

n fu

nctio

n

-1.0

0.0 0.2 0.4 0.6 0.8 1.0

-0.5

0.0

0.5

1.0

Den

sity

func

tion

0

0.0 0.2 0.4 0.6 0.8 1.0

2

4

6

8

10

Den

sity

func

tion

0

0.0 0.2 0.4 0.6 0.8 1.0

2

4

6

8

10

Figure 2. Comparison between Beta(a0 , 100−a0) and Beta(a1 , 100−a1) with a0<a1. (The dash curve is for Beta(a0 , 100−a0), the dotted curve is for Beta(a1 , 100−a1)and the solid curve is the difference between Beta(a1 , 100−a1) and Beta(a0 , 100−a0).)

r1 : from 0 to n1

r : from (r1 +1) to (n−n1 +r1)

with the null hypothesis (H0) acceptance probability calculated at each (r1 / n1, r / n) combination within the above 4-layer loop.If we require a high power (1−�≈1) and low type I error rate (�≈0), then the closeness between two priors under the nulland alternative hypotheses prohibits the existence of a two-stage design as n goes to ∞ and n1 is of a moderate size. The�(a0, b0, a1, b1) in these preceding conclusions is a global maximum distribution function difference and is increasing rapidly (upto >0.50) as prior difference increases (see the panels from the middle and the right columns in Figure 2).

3.2. Some issues on hypothesis testing

Lehmann [27] discusses several mathematical properties for hypothesis testing such as invariance, unbiasedness, uniformly mostpowerfulness, robustness, symmetry and many others. Salsburg [28] showed the potential change on the nature of hypothesistesting in clinical trials. Hypothesis testing based on p-value may be insufficient since any p-value less than certain threshold seemsto have the same interpretation. Fisher suggested doing hypothesis testing only for trials with random treatment assignment [28],for which an intuitive explanation may be that unlimitedly accurate and precise parameter (say RR) estimation can be achievedfor two treatments under large sample. Hypotheses like ‘RR(H0)<RR(H1)’ can be easily done after considering parameter (RR)estimation for two treatments. However, the single-arm phase IIA cancer clinical trials studied in Simon [1] are not randomizedtrials with two treatment groups. If we happen to know the standard treatment RR (p0) exactly, then RR (p1) estimation duringthe course of the experimental treatment may be used to solve the hypothesis testing problem by comparing p0 with theestimated p1. The non-existence of two-stage design under Beta-binomial mixture model shows that uncertain beliefs on p0 maymake hypothesis testing untractable under frequentist error rate constraint. Given � :=p1 −p0>0 (p0, p1 ∈ [0, 1]), the single-stagehypothesis testing on ‘H0:p�p0 vs H1 : p�p1’ requires a one-dimensional critical region boundary of �∈ (0, 1). If the estimated RR(e.g. the number of responders y divided by the number of Bernoulli trials n) is less than �, then H0 would be accepted; otherwiseH1 would be accepted. Detecting an effect size (�) suffices for catching any larger size �∗ :=p∗

1 −p∗0>� when p∗

0<p0 and p∗1>p1

under the same type I and II error rate constraints. We now demonstrate similar results for Beta-binomial mixture model-basedhypothesis testing by taking the size (�) to be the difference between two pseudo responses (hyperparameters) in beta priors,i.e. � :=a1 −a0>0(0<a0<a1<100), where the notations and model assumptions are given in Section 3.1. We are spontaneouslyinterested in testing ‘H0:a�a0 vs H1 : a�a1’ for the hypothesized beta prior, namely Beta(a, 100−a).

Proposition 2If p∼Beta(a, m−a), then the cumulative distribution function Fa,m−a(x) is stochastically increasing in a, i.e. ∀0�a0<a1�m,Fa1 ,m−a1 (x)<Fa0 ,m−a0 (x), ∀x ∈ [0, 1].

Thus, detecting the specified size of � suffices for catching any larger size �∗:=a∗1 −a∗

0>� when a∗0<a0 and a∗

1>a1 under thesame type I and II error rate constraints.

10

90

Copyright © 2010 John Wiley & Sons, Ltd. Statist. Med. 2010, 29 1084--1095

J. LIU, Y. LIN AND W. J. SHIH

a

5,25 (0/12,3/16)

a

10,30 (1/16,6/21)

a

20,40 (2/12,14/24)

a

30,50 (2/9,13/20)

a

[R] 5,25 (3/27,4/28)

a

[L] 5,25 (3/27,4/28)

a

[R] 10,30 (6/35,7/36)

a

[L] 10,30 (6/35,7/36)

a

[R] 20,40 (21/31,24/34) [L] 20,40 (21/31,24/34)

a a

[R] 30,50 (22/26,30/34)

a

[L] 30,50 (22/26,30/34)

0.0

0 20 40 60 80

Pro

babi

lity

0.4

0.8

0.0

0 20 40 60 80

Pro

babi

lity

0.4

0.8

0.0

0 20 40 60 80

Pro

babi

lity

0.4

0.8

0.0

0 20 40 60 80

Pro

babi

lity

0.4

0.8

0.0

0 20 40 60 80

Pro

babi

lity

0.4

0.8

0.0

0 20 40 60 80

Pro

babi

lity

0.4

0.8

0.0

0 20 40 60 80

Pro

babi

lity

0.4

0.8

0.0

0 20 40 60 80

Pro

babi

lity

0.4

0.8

0.0

0 20 40 60 80

Pro

babi

lity

0.4

0.8

0.0

0 20 40 60 80P

roba

bilit

y

0.4

0.8

0.0

0 20 40 60 80

Pro

babi

lity

0.4

0.8

0.0

0 20 40 60 80

Pro

babi

lity

0.4

0.8

Figure 3. Rejection region probability vs shape parameter a for Beta(a, 100−a). (In each title, two numbers outside of the parenthesis represent the parameters(a0 , a1) of beta priors (Beta(a0 , 100−a0) and Beta(a1 , 100−a1)) under the null and alternative hypotheses, the two fractions in the parenthesis represent the

two-stage design. [R] or [L] represents right- or left-truncated prior case.)

Now we study a two-stage hypothesis testing where a two-dimensional critical region needs to be considered. Koyama andChen [29] discussed proper inference from Simon [1]’s two-stage design in terms of reporting p-value, point estimation andconfidence intervals. On the other hand, two-stage design outputs a unique two-dimensional critical region with boundary (r1, r)on the (y1, y) sample space. It is worthwhile to study the sensitivity of such a critical region probability to pseudo response(hyperparameter) a in the beta prior. Since we always consider a normalized case of Beta(a, 100−a), the sensitivity study isrepresented by a scatter plot of ‘Pr(critical region) vs a’, where the critical region is fixed as a two-stage design for testinghypothesis of ‘H0: Beta(a0, 100−a0) vs H1 : Beta(a1,100−a1)’. Figure 1 in Lin and Shih [21] plays a similar role for studying Simon[1]’s two-stage design under single-value RRs. From Table I, we extract critical regions from every first obtained design (undererror rate constraint: �=�=10 per cent) under the first four hypothesis (H0, Ha) setups. We let beta prior (Beta(a, 100−a)) havean increasing shape parameter (a) ranging from 1 to 99. The upper-right panel in Figure 3 shows that not all curves are monotonefor complete beta priors, where a ‘bitten-off’ pattern is observed for designs with RR ranges close to those design non-existencecases. The panels from the middle and the bottom rows in Figure 3 are for those cases with left-truncated or right-truncatedbeta priors without overlapping domains between the null and the alternative priors, where varying non-monotone patterns areobserved for designs that have RR ranges close to those design non-existence cases.

Copyright © 2010 John Wiley & Sons, Ltd. Statist. Med. 2010, 29 1084--1095

10

91

J. LIU, Y. LIN AND W. J. SHIH

4. Discussion

Owing to the discreteness of the binomial distribution, Simon [1]’s two-stage design for hypothesis testing (p0 vs p1) may givedifferent solutions when we slightly change RRs under the null and the alternative hypotheses. Beta priors offer some spacefor discussing design issues in a continuous manner due to the convergence of observed RR to the assumed prior distribution.Under two beta priors for the RRs (the standard and the experimental treatments), we demonstrated that Simon [1]’s design maynot exist under maximum sample size (n�130) constraint, certain type I and II error rate control and hypothesis configurations.Lee and Liu [24] used the Bayesian tool to fulfill flexible trial design with desirable frequentist properties and emphasized theimportance of estimation within the hypothesis testing framework. Our results show that sample size determination withoutconsideration for parameter estimation may make the study question unsolvable. Simon [1]’s two-stage design works well underpoint-mass hypothesis, whereas direct applicability of such a two-stage design to a continuous prior hypothesis scenario isseverely restricted due to possible sample size unavailability. However, since cancer drug RRs usually have values much less than40 per cent, thereby excluding design non-existence based on our numerical study, the Beta-binomial two-stage design is stilluseful in phase IIA single-arm cancer clinical trials.

Appendix A: Proofs of main results in Section 3

Proof of Lemma 1For any �, we apply the Chebychev inequality conditional on p.

Pr(|Yn −p|��) =∫ 1

0Pr(|Yn −p|��|p)�(p) dp

=∫ 1

0Pr(|Yn −E(Yn)|��|p)�(p) dp�

∫ 1

0[Var(Yn|p) / �2]�(p) dp

�∫ 1

0(1 / (4n�2))�(p) dp=1 / (4n�2).

Yn converges to the random variable p in probability with a uniform convergence rate of the order 1 / (2√

n�). Convergence inprobability implies convergence in distribution, which is also uniform with a rate of convergence similar to the preceding rate.The proof ends. �

Proof of Lemma 2Conditional on RR=p with prior p∼Beta(a, b), each random variable Yi and/or sample mean Yn has conditional mean p andmarginal mean E(p)=a / (a+b). We have

Var

(n∑

i=1Yi

)=

n∑i

Var(Yi)+2∑

1�i<j�nCov(Yi, Yj),

where

Var(Yi) = Var[E(Yi|p)]+E[Var(Yi|p)]=Var(p)+E(p(1−p))

= E(p)−E2(p)=E(p)(1−E(p))=ab / (a+b)2

and

Cov(Yi, Yj) = E(YiYj)−E(Yi)E(Yj)=E(p2)−E2(p)

= Var(p)= (ab) / ((a+b)2(a+b+1)).

Hence, we have

Var(Yn)= [ab / (n(a+b)2)]×[1+(n−1) / (a+b+1)]→ab / ((a+b)2(a+b+1))=Var(p)>0 when n→∞.

This is an intuitive consequence in view of Lemma 1. The proof ends. �

Proof of Lemma 3(1)–(3) result from elementary calculus. (4) is a special case of Proposition 2 (Section 3.2). �

Proof of Proposition 1Given RR (taken as a random variable) prior p∼Beta(a, b), the response Yi ∼Bernoulli(p) identically and independently. Lemma 1says Yn converges to the random variable p in probability as n goes to ∞, whereas it does not converge to the mean of p,

10

92

Copyright © 2010 John Wiley & Sons, Ltd. Statist. Med. 2010, 29 1084--1095

J. LIU, Y. LIN AND W. J. SHIH

a / (a+b), in probability. Lemma 2 gives the asymptotic variance for sample mean (Yn). We use notation y as the total responsesout of n trials and assume H0 :�0(p)=Beta(a0, b0) and H1 :�1(p)=Beta(a1, b1). The error rate control for design (r, n) with H0 criticalregion ‘y�r’ demands

PrH1 (Accepting H0)=Pr(y�r)��, PrH0 (Accepting H0)=Pr(y�r)�1−�.

When H0 and H1 are very close to each other, as n goes to ∞, these two H0 accepting probabilities given H0 or H1 (PrH0 (y / n�r / n)and PrH1 (y / n�r / n)) are expected to be similar since they converge to Pr�0 (p�r / n) and Pr�1 (p�r / n) uniformly (Lemma 1). Werequire that

� � PrH1 (Accepting H0)=PrH1 (y / n�r / n),

1−� � PrH0 (Accepting H0)=PrH0 (y / n�r / n)

� PrH1 (Accepting H0)+�(a0, b0, a1, b1)

� �+�(a0, b0, a1, b1) as n→∞.

If the beta prior has restricted parameters: a+b=100, the limiting variance (Lemma 2) becomes a(100−a) / 101(100)2—thisparabola has roots at 0 and 100 and increases as a increases from 0 to 50. Since larger variances may create more overlapbetween two priors (the null and alternative hypotheses), it tends to be more difficult to distinguish them under specified errorrate control. The proof ends. �

Proof of Theorem 1Under H1, it demands

� � PrH1 (Accepting H0)=PrH1 (y1�r1)+PrH1 (y1>r1, y�r)

� PrH1 (y1�r1)=PrH1 (y1 / n1�r1 / n1).

When H0 and H1 are close to each other (e.g. Beta(49, 51) vs Beta(51, 49)), as n1 goes to ∞, the small lower bound probability,Pr(y1 / n1�r1 / n1), is expected to be similar under both H0 and H1 and thus r1 / n1 is small (approaches the origin along theprobability distribution function curves in the lower-left panel of Figure 2). On the other hand, conditional on such a constrainton r1 / n1 and letting n1 go to ∞, under H0 or H1, we have

Pr(Accepting H0) = Pr(y1�r1)+Pr(y1>r1, y�r)

� Pr(y1�r1)+Pr(y / n�r / n)��+�(a0, b0, a1, b1)+Pr(y / n�r / n).

Under H0 or H1, we have

Pr(Accepting H0) = Pr(y1�r1)+Pr(y1>r1, y�r)

= Pr(y1 / n1�r1 / n1)+Pr(y / n�r / n)−Pr(y1 / n1�r1 / n1, y / n�r / n)

� Pr(y1 / n1�r1 / n1)+Pr(y / n�r / n)−Pr(y1 / n1�r1 / n1)

= Pr(y / n�r / n).

Thus, under H0 or H1 we have

Pr(y / n�r / n)�Pr(Accepting H0)��+�(a0, b0, a1, b1)+Pr(y / n�r / n).

Note that |Pr(y / n�r / n|H0)−Pr(y / n�r / n|H1)|��(a0, b0, a1, b1) as n→∞, we have

|Pr(Accepting H0|H0)−Pr(Accepting H0|H1)|��+2�(a0, b0, a1, b1).

The proof ends. �

Proof of Theorem 2It demands that

1−� � PrH1 (Rejecting H0)=PrH1 (y1>r1, y / n>r / n)

= PrH1 (y1 / n1>r1 / n1, y / n>r / n)�PrH1 (y / n>r / n)

= 1−PrH1 (y / n�r / n)�⇒PrH1 (y / n�r / n)��.

Copyright © 2010 John Wiley & Sons, Ltd. Statist. Med. 2010, 29 1084--1095

10

93

J. LIU, Y. LIN AND W. J. SHIH

and

� � PrH0 (Rejecting H0)=PrH0 (y1>r1, y / n>r / n)

= PrH0 (y1 / n1>r1 / n1)−PrH0 (y1>r1, y / n�r / n)

� PrH0 (y1 / n1>r1 / n1)−PrH0 (y / n�r / n).

Thus, we have

� � PrH0 (y1 / n1>r1 / n1)−�−�(a0, b0, a1, b1), (n→∞)

�⇒ �+�+�(a0, b0, a1, b1)�PrH0 (y1>r1), (n→∞).

On the other hand, we have

1−��PrH1 (Rejecting H0)=PrH1 (y1>r1, y / n>r / n)�PrH1 (y1>r1).

The proof ends. �

Proof of Proposition 2The probability density function of Beta(a, m−a) is fa,m−a(x)=xa−1(1−x)m−a−1 / B(a, m−a). For any a=0,. . . , m−1,

d(a, x) : = Fa+1,m−a−1(x)−Fa,m−a(x)

= 1

B(a+1, m−a−1)

∫ x

0pa(1−p)m−a−1−1 dp−

∫ x

0

1

B(a, m−a)pa−1(1−p)m−a−1 dp

= 1

B(a, m−a)

∫ x

0

((m−a−1)

apa(1−p)m−a−2 −pa−1(1−p)m−a−1

)dp

= 1

B(a, m−a)

∫ x

0pa−1(1−p)m−a−2

(m−a−1

ap−(1−p)

)dp

= 1

B(a, m−a)

∫ x

0pa−1(1−p)m−a−2

(m−1

ap−1

)dp.

Since

pa−1(1−p)m−a−2(

m−1

ap−1

)⎧⎪⎨⎪⎩

<0, if 0�p<a

m−1,

>0, ifa

m−1<p�1,

d(a, x) is decreasing for x ∈ [0, a / (m−1)] and increasing for x ∈ [a / (m−1), 1]. But d(a, 0)=d(a, 1)=0, therefore d(a, x)�0, ∀x ∈ [0, 1].The proof ends. �

Acknowledgements

We thank Antoinette Tan (Breast Medical Oncology and Phase I Program) and Dirk F. Moore (Biometrics Division) at The CancerInstitute of New Jersey for very helpful discussions on cancer clinical trials.

References

1. Simon R. Optimal two-stage designs for phase II clinical trials. Controlled Clinical Trials 1989; 10:1--10.2. Chen TT. Optimal three-stage designs for phase II cancer clinical trials. Statistics in Medicine 1997; 16:2701--2711.3. Jung SH, Carey M, Kim KM. Graphical search for two-stage designs for phase II clinical trials. Controlled Clinical Trials 2001; 22(4):367--372.4. Panageas KS, Smith A, Gönen M, Chapman PB. An optimal two-stage phase II design utilizing complete and partial response information

separately. Controlled Clinical Trials 2002; 23:367--379.5. Ye F, Shyr Y. Balanced two-stage designs for phase II clinical trials. Clinical Trials 2007; 4:514--524.6. Wu Y, Shih WJ. Approaches to handling data when a phase II trial deviates from the pre-specified Simon’s two-stage design. Statistics in

Medicine 2008; 27:6190--6208.7. Chi Y, Chen CM. Curtailed two-stage designs in phase II clinical trials. Statistics in Medicine 2008; 27:6175--6189.8. Chen K, Shan M. Optimal and minimax three-stage designs for phase II oncology clinical trials. Contemporary Clinical Trials 2008; 29:32--41.9. Gelman A, Carlin JB, Stern HS, Rubin DB. Bayesian Data Analysis (2nd edn). Chapman & Hall/CRC: Boca Raton, 2003.

10

94

Copyright © 2010 John Wiley & Sons, Ltd. Statist. Med. 2010, 29 1084--1095

J. LIU, Y. LIN AND W. J. SHIH

10. Thall PF, Simon R. A Bayesian approach to establishing sample size and monitoring criteria for phase II clinical trials. Controlled Clinical Trials1994; 5(6):463--481.

11. Heitjan DF. Bayesian interim analysis of phase II cancer clinical trials. Statistics in Medicine 1997; 16(16):1791--1802.12. Sylvester RJ. A Bayesian approach to the design of phase II clinical trials. Biometrics 1988; 44(3):823--836.13. Thall PF, Simon R. Practical Bayesian guidelines for phase IIB clinical trials. Biometrics 1994; 50:337--349.14. Thall PF, Sung HG. Some extensions and applications of a Bayesian strategy for monitoring multiple outcomes in clinical trials. Statistics in

Medicine 1998; 17:1563--1580.15. Mayo MS, Gajewski BJ. Bayesian sample size calculations in phase II clinical trials using informative conjugate priors. Controlled Clinical Trials

2004; 25:157--167.16. Wang Y, Leung DH, Li M, Tan SB. Bayesian designs with frequentist and Bayesian error rate considerations. Statistical Methods in Medical

Research 2005; 14:445--456.17. Tan SB, Machin D. Bayesian two-stage designs for phase II clinical trials. Statistics in Medicine 2002; 21:1991--2012.18. Tan SB, Machin D. Bayesian two-stage designs for phase II clinical trials. Statistics in Medicine 2006; 25:3407--3408.19. Gajewski BJ, Mayo MS. Bayesian sample size calculations in phase II clinical trials using a mixture of informative priors. Statistics in Medicine

2006; 25:2554--2566.20. Banerjee A, Tsiatis AA. Adaptive two-stage designs in phase II clinical trials. Statistics in Medicine 2006; 25:3382--3395. DOI: 10.1002/sim.2501.21. Lin Y, Shih WJ. Adaptive two-stage designs for single-arm phase IIA cancer clinical trials. Biometrics 2004; 60(2):482--490.22. Loesch D, Robert N, Asmar L, Gregurich MA, ÓRourke M, Dakhil S, Cox E. Phase II multicenter trial of a weekly paclitaxel and carboplatin

regimen in patients with advanced breast cancer. Journal of Clinical Oncology 2002; 20(18):3857--3864.23. Lee JJ, Tseng C. Uniform power method for sample size calculation in historical control studies with binary response. Controlled Clinical

Trials 2001; 22(4):390--400.24. Lee JJ, Liu DD. Predictive probability design for phase II cancer clinical trials. Clinical Trials 2008; 5:93--106.25. Sambucini V. A Bayesian predictive two-stage design for phase II clinical trials. Statistics in Medicine 2008; 27:1199--1224.26. Wu Y, Shih WJ, Moore DF. Elicitation of a Beta prior for Bayesian inference in clinical trials. Biometrical Journal 2008; 50(2):212--223. DOI:

10.1002/bimj.200710390.27. Lehmann EL. Testing Statistical Hypotheses (2nd edn). Springer: New York, 1986.28. Salsburg D. Hypothesis test. In Encyclopedia of Biostatistics, Armitage P, Colton T (eds in chief). Wiley: Chichester, England, 1999; 1969--1976.29. Koyama T, Chen H. Proper inference from Simon’s two-stage designs. Statistics in Medicine 2008; 27:3145--3154.

Copyright © 2010 John Wiley & Sons, Ltd. Statist. Med. 2010, 29 1084--1095

10

95