Design and analysis of sequential clinical trials using a Markov
chain transition rate model with conditional power
by
GREGORY RUSSELL POND
A thesis submitted in conformity with the requirements
for the degree of Doctor of Philosophy,
Department of Public Health Sciences,
University of Toronto
© Copyright by Gregory Russell Pond (2008)
Design and analysis of sequential clinical trials using a Markov chain transition rate model with
conditional power, Gregory Russell Pond, Department of Public Health Sciences, University of
Toronto, Doctor of Philosophy, 2008
Abstract
Background:
There is a plethora of potential statistical designs which can be used to evaluate the
efficacy of a novel cancer treatment in the phase II clinical trial setting. Unfortunately,
there is no consensus as to which design one should prefer, nor even which definition of
efficacy should be used, and the primary endpoint conclusion can vary depending on which
design is chosen. It would be useful if an all-encompassing methodology were available
which could evaluate all the different designs simultaneously and allow investigators an
understanding of the trial results under the varying scenarios.
Methods:
Finite Markov chain imbedding is a method which can be applied to phase II oncology
clinical trials but has never previously been evaluated in this setting. Simple variations
to the transition matrix or to the end-state probability definitions allow multiple designs
and endpoints to be evaluated for a single trial. A computer program is written in R
which computes p-values and conditional power, two common statistical measures used
for evaluating trial results. A simulation study is performed using data arising from an
actual phase II clinical trial, performed recently, in which the study conclusion regarding
the efficacy of the potential treatment was debatable.
Results:
Finite Markov chain imbedding is shown to be useful for evaluating phase II oncology
clinical trial results. The R code written for the simulation study is demonstrated to be
fast and useful for investigating different trial designs. Further details regarding the
clinical trial results are presented, including the potential prolongation of stable disease
by the treatment, which is a potentially useful marker of efficacy for this cytostatic agent.
Conclusions:
This novel methodology may prove to be a useful investigative technique for the
evaluation of phase II oncology clinical trial data. Future studies which have disputable
conclusions might become less controversial with the aid of finite Markov chain imbed-
ding and the possible multiple evaluations which are now viable. Better understanding
of activity for a given treatment might expedite the drug development process or help
distinguish active from inactive treatments.
Acknowledgement
Completion of any achievement is hollow without incentive. For me that incentive is my
family.
Beverly, you have stuck by me through every adversity, energised me when I was
tired, guided me when I was lost, supported my decisions and endured many tribulations
as I pursued my degree. I can never express my gratitude for your support or my infinite
love for you.
Connor and Carter, I cannot express the joy and pride you bring to my life. I
can already see the remarkably caring, thoughtful and outstanding young men you are
becoming.
I also wish to thank my parents for all your encouragement, love and guidance
throughout the years; my parents-in-law, Gloria and Wilson, for your kindness, gen-
erosity and willingness to help; Dr. Lillian Siu, a truly amazing oncologist and researcher
and wonderful colleague for the opportunities you have given me; and Dr. Wendy Lou,
for your guidance, suggestions and support.
Finally, to all my friends, family and colleagues over the years, without whom I would
not be where I am today, I thank you.
Contents
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i
Acknowledgement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
1 Introduction 1
2 Statistical Issues 5
2.1 Phase II and III Clinical Trials . . . . . . . . . . . . . . . . . . . . . . . 6
2.2 Methods for Adjusting the Type I Error (α) In the Situation of Multiple-
Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2.1 Bonferroni . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2.2 Early Group-Sequential Methods . . . . . . . . . . . . . . . . . . 8
2.2.3 Alpha-Spending Function . . . . . . . . . . . . . . . . . . . . . . 9
2.2.4 Repeated Confidence Intervals . . . . . . . . . . . . . . . . . . . . 10
2.2.5 Bayesian Approach . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2.6 Using a Risk/Loss Function . . . . . . . . . . . . . . . . . . . . . 13
2.2.7 Stochastic Curtailment and Conditional Probability . . . . . . . . 13
2.3 Sample Size Re-adjustment [52] [53] [54] [55] . . . . . . . . . . . . . . . . 14
2.3.1 Variance Spending Approach . . . . . . . . . . . . . . . . . . . . . 16
2.3.2 Fisher Combination Test . . . . . . . . . . . . . . . . . . . . . . . 17
2.3.3 Conditional Power . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.4 Combining Data from Different Analyses . . . . . . . . . . . . . . . . . . 20
2.4.1 Continuous and Categorical Outcomes . . . . . . . . . . . . . . . 20
2.4.2 Survival-Type Outcomes . . . . . . . . . . . . . . . . . . . . . . . 20
2.5 Markov Chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.5.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.5.2 Properties of Markov Chains . . . . . . . . . . . . . . . . . . . . . 23
2.5.3 Markov Chains as a Model for Cancer Phase II Clinical Trials . . 25
3 Potential Trial Designs for Phase II Oncology Clinical Trials 29
3.1 Design Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.2 Phase II Designs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.2.1 Univariate Designs With Response as the Outcome . . . . . . . . 32
3.3 Multinomial Designs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.3.1 Zee Design [11] [82] . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.3.2 Trinomial Design [12] . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.3.3 Dual-Response Design [13] . . . . . . . . . . . . . . . . . . . . . . 51
3.3.4 Weighted Response Design [14] . . . . . . . . . . . . . . . . . . . 54
3.4 Using Finite Markov Chain Imbedding . . . . . . . . . . . . . . . . . . . 59
4 Examples and Simulation Set-up 61
4.1 Phase II Clinical Trial of CCI-779 (temsirolimus) in Neuroendocrine Car-
cinoma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.1.1 Trial Description . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.1.2 Trial Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
4.1.3 Note regarding response rates . . . . . . . . . . . . . . . . . . . . 64
4.2 Implementation of Markov Chain Methods . . . . . . . . . . . . . . . . . 65
4.2.1 RECIST criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
4.3 Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
4.3.1 Estimating H0 and HA . . . . . . . . . . . . . . . . . . . . . . . . 69
4.4 Models Investigated in Simulation . . . . . . . . . . . . . . . . . . . . . . 70
4.4.1 RECIST model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
4.4.2 RECIST model evaluating outcomes at different transition times . 71
4.4.3 Transition Matrices Based on Immediate Changes . . . . . . . . . 72
4.4.4 Transition Matrices with Different Positive Outcomes . . . . . . . 73
4.4.5 Multi-binomial transition matrices . . . . . . . . . . . . . . . . . 75
4.5 Calculation of p-values . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
4.6 Methods Used for Investigating Different Outcomes . . . . . . . . . . . . 77
5 Results 80
5.1 RECIST Criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
5.1.1 Interpretation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
5.1.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
5.1.3 Transition Time-Important RECIST model . . . . . . . . . . . . . 84
5.1.4 Varying away from the RECIST criteria . . . . . . . . . . . . . . 87
5.1.5 Immediate response . . . . . . . . . . . . . . . . . . . . . . . . . . 89
5.1.6 Consecutive states . . . . . . . . . . . . . . . . . . . . . . . . . . 89
5.1.7 Dual-Binomial Outcomes . . . . . . . . . . . . . . . . . . . . . . . 90
5.1.8 Theoretical Versus Simulated Calculations . . . . . . . . . . . . . 91
6 Discussion 93
A Data 98
B State Spaces 100
C Computer Code 102
D Results 108
List of Tables
1 Table of Abbreviations . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiv
2.1 Repeated significance testing on accumulating data, taken from [20] . . . 6
3.1 Potential Phase II Designs Using Response . . . . . . . . . . . . . . . . . 31
3.2 Potential Phase II Designs Using Response & Stable Disease . . . . . . . 32
3.3 Acceptance Region for Hypothetical Trial using Lin and Chen Design [14],
comparing H0 : RR = 0.05 and SD = 0.25 versus HA : RR = 0.15 and
SD = 0.50 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
A.1 Data, in mm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
B.1 Data, in State Spaces According to RECIST Criteria . . . . . . . . . . . 101
D.1 Data input for matrix (2.9) modelling the RECIST criteria . . . . . . . . 108
D.2 Endstate probabilities for (2.9) modelling the RECIST criteria . . . . . . 108
D.3 Outcomes for (2.9) modelling the RECIST criteria and n=36 patients . . 109
D.4 Outcomes for (2.9) modelling the RECIST criteria and n=54 patients . . 110
D.5 Data input for matrix (4.1) modelling the transition-time dependent RE-
CIST criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
D.6 Endstate probabilities for (4.1) modelling the transition-time dependent
RECIST criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
D.7 Outcomes for (4.1) modelling the transition-time dependent RECIST cri-
teria with n=36 patients . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
D.8 Outcomes for (4.1) modelling the transition-time dependent RECIST cri-
teria with n=54 patients . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
D.9 Modified data input (2), slightly better expectations under H0, for matrix
(4.1) modelling the transition-time dependent RECIST criteria . . . . . . 113
D.10 Endstate probabilities for modified data input (2), slightly better expecta-
tions under H0, for (4.1) modelling the transition-time dependent RECIST
criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
D.11 Outcomes for modified data input (2), slightly better expectations under
H0, for (4.1) modelling the transition-time dependent RECIST criteria
with n=36 patients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
D.12 Outcomes for modified data input (2), slightly better expectations under
H0, for matrix (4.1) modelling the transition-time dependent RECIST
criteria with n=54 patients . . . . . . . . . . . . . . . . . . . . . . . . . . 115
D.13 Modified data input (3), extremely better expectations under H0, for ma-
trix (4.1) modelling the transition-time dependent RECIST criteria . . . 115
D.14 Endstate Probabilities for Modified Data Input (3), extremely better ex-
pectations under H0, for (4.1) modelling the transition-time dependent
RECIST criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
D.15 Outcomes for modified data input (3), extremely better expectations under
H0, for (4.1) modelling the transition-time dependent RECIST criteria
with n=36 patients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
D.16 Outcomes for modified data input (3), extremely better expectations under
H0, for matrix (4.1) modelling the transition-time dependent RECIST
criteria with n=54 patients . . . . . . . . . . . . . . . . . . . . . . . . . . 117
D.17 Modified data input (4), hypothesising a cytotoxic treatment with im-
proved immediate response but no durability, for matrix (4.1) modelling
the transition-time dependent RECIST criteria . . . . . . . . . . . . . . . 118
D.18 Endstate probabilities for modified data input (4), hypothesising a cyto-
toxic treatment with improved immediate response but no durability, for
(4.1) modelling the transition-time dependent RECIST criteria . . . . . . 118
D.19 Outcomes for modified data input (4), hypothesising a cytotoxic treatment
with improved immediate response but no durability, for (4.1) modelling
the transition-time dependent RECIST criteria with n=36 patients . . . 119
D.20 Outcomes for modified data input (4), hypothesising a cytotoxic treatment
with improved immediate response but no durability, for matrix (4.1) mod-
elling the transition-time dependent RECIST criteria with n=54 patients 120
D.21 Modified data input (5), an extreme optimist, for matrix (4.1) modelling
the transition-time dependent RECIST criteria . . . . . . . . . . . . . . . 121
D.22 Endstate probabilities for modified data input (5), an extreme optimist,
for matrix (4.1) modelling the transition-time dependent RECIST criteria 121
D.23 Outcomes for modified data input (5), an extreme optimist, for matrix
(4.1) modelling the transition-time dependent RECIST criteria with n=36
patients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
D.24 Outcomes for modified data input (5), an extreme optimist, for matrix
(4.1) modelling the transition-time dependent RECIST criteria with n=54
patients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
D.25 Data input (6), an additional transition, for matrix (4.1) modelling the
transition-time dependent RECIST criteria . . . . . . . . . . . . . . . . . 124
D.26 Endstate probabilities (6), an additional transition, for matrix (4.1) mod-
elling the transition-time dependent RECIST criteria . . . . . . . . . . . 124
D.27 Outcomes (6), an additional transition, for matrix (4.1) modelling the
transition-time dependent RECIST criteria with n=36 patients . . . . . . 125
D.28 Outcomes (6), an additional transition, for matrix (4.1) modelling the
transition-time dependent RECIST criteria with n=54 patients . . . . . . 126
D.29 Endstate Probabilities for matrix (4.1) modelling the transition-time de-
pendent RECIST criteria with only 3 transitions . . . . . . . . . . . . . . 126
D.30 Outcomes for matrix (4.1) modelling the transition-time dependent RE-
CIST criteria with 3 transitions and n=36 patients . . . . . . . . . . . . 127
D.31 Outcomes for matrix (4.1) modelling the transition-time dependent RE-
CIST criteria with 3 transitions and n=54 patients . . . . . . . . . . . . 128
D.32 Data input for matrix (4.2) modelling the transition-time dependent RE-
CIST criteria with response not an absorbing state . . . . . . . . . . . . 129
D.33 Endstate probabilities for matrix (4.2) modelling the transition-time de-
pendent RECIST criteria with response not an absorbing state . . . . . . 130
D.34 Outcomes for matrix (4.2) modelling the transition-time dependent RE-
CIST criteria with response not an absorbing state and n=36 patients . . 131
D.35 Outcomes for matrix (4.2) modelling the transition-time dependent RE-
CIST criteria with response not an absorbing state and n=54 patients . . 132
D.36 Data input for matrix (4.3) modelling the change in response (10%) at
each transition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
D.37 Endstate probabilities for matrix (4.3) modelling the change in response
(10%) at each transition . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
D.38 Outcomes for matrix (4.3) modelling the change in response (10%) at each
transition and n=36 patients . . . . . . . . . . . . . . . . . . . . . . . . . 135
D.39 Outcomes for matrix (4.3) modelling the change in response (10%) at each
transition and n=54 patients . . . . . . . . . . . . . . . . . . . . . . . . . 136
D.40 Data input for matrix (4.3) modelling the change in response (5%) at each
transition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
D.41 Endstate probabilities for matrix (4.3) modelling the change in response
(5%) at each transition . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
D.42 Outcomes for matrix (4.3) modelling the change in response (5%) at each
transition and n=36 patients . . . . . . . . . . . . . . . . . . . . . . . . . 139
D.43 Outcomes for matrix (4.3) modelling the change in response (5%) at each
transition and n=54 patients . . . . . . . . . . . . . . . . . . . . . . . . . 140
D.44 Data input for matrix (4.4) modelling the change in response, with no
stable disease, at each transition . . . . . . . . . . . . . . . . . . . . . . . 140
D.45 Endstate probabilities for matrix (4.4) modelling the change in response,
with no stable disease, at each transition . . . . . . . . . . . . . . . . . . 141
D.46 Outcomes for matrix (4.4) modelling the change in response, with no stable
disease, at each transition and n=36 patients . . . . . . . . . . . . . . . . 141
D.47 Outcomes for matrix (4.4) modelling the change in response, with no stable
disease, at each transition and n=54 patients . . . . . . . . . . . . . . . . 142
D.48 Data input for matrix (4.5) modelling response+3 consecutive stable dis-
ease observations as a good outcome . . . . . . . . . . . . . . . . . . . . 143
D.49 Endstate probabilities for matrix (4.5) modelling response+3 consecutive
stable disease observations as a good outcome . . . . . . . . . . . . . . . 144
D.50 Outcomes for matrix (4.5) modelling response+3 consecutive stable disease
observations as a good outcome and n=36 patients . . . . . . . . . . . . 144
D.51 Outcomes for matrix (4.5) modelling response+3 consecutive stable disease
observations as a good outcome and n=54 patients . . . . . . . . . . . . 145
D.52 Data input for matrix (4.6) modelling response+4 consecutive stable dis-
ease observations as a good outcome . . . . . . . . . . . . . . . . . . . . 146
D.53 Endstate probabilities for matrix (4.6) modelling response+4 consecutive
stable disease observations as a good outcome . . . . . . . . . . . . . . . 147
D.54 Outcomes for matrix (4.6) modelling response+4 consecutive stable disease
observations as a good outcome and n=36 patients . . . . . . . . . . . . 147
D.55 Outcomes for matrix (4.6) modelling response+4 consecutive stable disease
observations as a good outcome and n=54 patients . . . . . . . . . . . . 148
D.56 Data input for matrix (4.7) modelling response+consecutive minor re-
sponses as a good outcome . . . . . . . . . . . . . . . . . . . . . . . . . . 149
D.57 Endstate probabilities for matrix (4.7) modelling response+consecutive
minor responses as a good outcome . . . . . . . . . . . . . . . . . . . . . 150
D.58 Outcomes for matrix (4.7) modelling response+consecutive minor responses
as a good outcome and n=36 patients . . . . . . . . . . . . . . . . . . . . 151
D.59 Outcomes for matrix (4.7) modelling response+consecutive minor responses
as a good outcome and n=54 patients . . . . . . . . . . . . . . . . . . . . 152
D.60 Data input for matrix (4.8) modelling response & toxicity outcomes . . 152
D.61 Endstate probabilities for matrix (4.8) modelling response & toxicity out-
comes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
D.62 Outcomes for matrix (4.8) modelling response & toxicity outcomes and
n=36 patients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
D.63 Outcomes for matrix (4.8) modelling response & toxicity outcomes and
n=54 patients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
List of Figures
3.1 Potential Distribution for Standard Treatment Response Rate . . . . . . 43
3.2 Tumour Shrinkage and Growth for Three Hypothetical Patients Over Time 47
3.3 Decision Process for Zee [82] multinomial design. Figure A is the decision
rule after stage 1 and Figure B is the decision rule after stage 2 . . . . . 50
Abbreviation Meaning
C Censored
CI Confidence Interval
CR Complete Response
DSMB Data Safety Monitoring Board
MTA Molecularly Targeted Agent
PD Progressive Disease
PR Partial Response
RECIST Response Evaluation Criteria In Solid Tumours
RR Response Rate
SD Stable Disease
UR Unconfirmed Response
Table 1: Table of Abbreviations
Chapter 1
Introduction
The National Cancer Institute of Canada enrolled 6626 patients to clinical trials, at a
cost of $64 million, in 2004-05, which is a mean of over $10,000 per patient [1]. In the
United States, the National Cancer Institute is planning a budget of nearly $5.8 billion
for 2008 and supports over 1300 clinical trials a year, treating over 200,000 patients [2].
In addition to the high financial cost of cancer clinical trials, there is an even greater
human cost. Patients who are eligible for clinical trials are generally patients who have
no other options, having failed all standard therapies or having a disease for which no
standard therapy exists. These patients often enter clinical trials as their last hope and
are at the end of their life. Further, the types of treatments studied in cancer clinical
trials can be quite toxic with life-threatening adverse events.
Despite the large numbers of patients accrued to clinical trials, this is only a small
fraction of patients diagnosed with cancer, which numbers around 150,000 yearly in
Canada [3]. Canadians have a lifetime probability of developing cancer of around 44% and
38%, with a lifetime probability of dying of cancer of 28% and 23% in males and females,
respectively [3]. Accrual to clinical trials remains difficult [4], especially amongst minority
and disadvantaged groups [5]. It is necessary that data from patients accrued to cancer
clinical trials be optimally used and clinical trial designs need to be constructed such
that data is used as efficiently as possible. Efficient designs save valuable resources and
funds from being wasted, save patients from receiving toxic and possibly non-efficacious
treatments needlessly, and speed up the drug development process, allowing efficacious
treatments to be available for all cancer patients more promptly.
In the drug development paradigm, there are 4 main clinical trial stages [6]. Phase I
is dose-finding, with the ultimate goal being to determine the optimal dose, which is the
highest dose that has acceptable levels of toxicity. Phase II trials are preliminary inves-
tigations of treatment efficacy in which the ultimate goal is to weed out non-efficacious
treatments and allow treatments with some potential activity to continue on to more
definitive study. This occurs in phase III, when the experimental treatment is studied
in a randomized clinical trial compared with the present standard of care. Phase IV is
the post-marketing stage, which further evaluates the long-term safety and effectiveness
of a new treatment, including determining whether there are subgroups which benefit
from treatment, or alternate doses which are superior.
Phase II cancer clinical trials tend to be single-arm, open-label studies with a small
number of patients, usually no more than 50. As noted above, phase II trials serve
primarily to discriminate between treatments with some potential activity, and those
which are ineffective and do not prevent tumour growth. Since it is unethical to accrue
patients to a treatment which is ineffective, especially if it has significant toxicity, a
number of clinical trial designs have been proposed which attempt to optimize these
trials [7] [8] [9] [10] [11] [12] [13] [14] [15] [16].
There is no consensus as to which design should be used, and the final choice is often
subjective, based on the personal preference of the statistician or the principal
investigator. Yet the final statistical decision as to whether a treatment is worthy of
further study, or ineffective, may differ significantly depending on which design is used.
Further complicating this decision is the choice of which primary outcome measure to
use. The most
frequently used primary outcome is response rate, with definitions for solid tumours of-
ten following the response evaluation criteria in solid tumours, more frequently referred
to as the RECIST criteria [17]. Prior to RECIST, the World Health Organization [18]
criteria were commonly used. Traditionally, the treatments investigated were cytotoxic
and would be deemed effective only if they could shrink the tumour, hence causing
an objective response. Recently, however, molecularly targeted agents [MTA], which are
cytostatic, are more frequently studied. These agents may work by preventing tumour
growth, not necessarily by shrinking it. Non-progression may then be an indicator of
treatment efficacy. Other outcomes might also be indicators of treatment
efficacy, such as prolongation of overall survival, time to progression, or some
multivariate outcome which combines response, toxicity, survival or other outcomes.
Results of a trial may therefore differ from what was expected, and as a result designs
are often amended or disregarded during the trial, even by experienced investigators.
An additional problem with traditional trial designs is that only a single primary
outcome measurement is used for each patient, although there are often multiple mea-
surements of outcomes such as tumour response. Tumour size is often measured at the
end of each treatment cycle, or every second treatment cycle, although in the end, ac-
cording to most response definitions including RECIST, only the best observed response
is used. Despite the desire to use data efficiently, much data is discarded.
Instead of forcing investigators to choose a single design based on a single outcome,
which may in the end not effectively demonstrate treatment efficacy, an alternative idea
is to explore a range of possible designs and a range of possible outcomes. In this
dissertation, finite Markov chain imbedding models are used to explore data from a
single trial under a range of possible design questions and outcomes of interest. Using
Markov chain methodology allows incorporation of multiple outcome measures for each
patient, such as tumour response status at each evaluation.
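As a concrete illustration of this idea, consider a hypothetical three-state chain (response, stable disease, progression) with invented transition probabilities; the sketch below (written in Python for illustration, whereas the software developed in this dissertation is in R) propagates a starting distribution through several evaluation times to obtain end-state probabilities:

```python
# Hypothetical three-state Markov chain, for illustration only:
# state 0 = response, 1 = stable disease (SD), 2 = progression.
# Response and progression are treated here as absorbing states.
P = [
    [1.00, 0.00, 0.00],  # response stays response
    [0.05, 0.75, 0.20],  # SD -> response / SD / progression
    [0.00, 0.00, 1.00],  # progression stays progression
]

def step(dist, P):
    """One transition: left-multiply the state distribution by P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

def end_state_distribution(start, P, n_transitions):
    """Distribution over states after a fixed number of evaluations."""
    dist = list(start)
    for _ in range(n_transitions):
        dist = step(dist, P)
    return dist

# A patient starting in stable disease, evaluated over four cycles:
dist = end_state_distribution([0.0, 1.0, 0.0], P, 4)
```

Varying the matrix P, or the definition of which end states count as a "good" outcome, then corresponds to evaluating different designs and endpoints for the same trial data, which is the approach pursued in later chapters.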
The rest of the thesis is organised as follows. Chapter 2 reviews statistical
methodologies, including finite Markov chain imbedding, which are common in phase II
and III clinical trials or pertinent to this dissertation; Chapter 3 reviews the most
frequently used phase II designs. At the end of Chapter 3, the use of finite Markov
chain imbedding for analysing phase II oncology clinical trials is proposed, along with
the rationale for this approach. An actual clinical trial example is presented in
Chapter 4, together with the implementation of the finite Markov chain imbedding
methods and a description of the simulation analysis performed. Results of the
simulation are presented in Chapter 5. Finally, Chapter 6 summarises the dissertation,
adds some conclusions and discusses areas of future work.
Chapter 2
Statistical Issues
One way researchers have attempted to improve clinical trial efficiency is by using
group-sequential designs, in which interim analyses are performed and a trial is stopped
as soon as the conclusion is definitive. In cancer, this has an increased
ethical importance as it limits the number of patients exposed to an inactive and possibly
toxic treatment, while ensuring personnel and financial resources are not wasted [19].
Additionally, if efficacy is found earlier, the drug development process can be sped up with
quicker approval by regulatory agencies, resulting in more patients being treated with an
active agent. Group-sequential designs are now implemented routinely in most clinical
trials; however, the nature of the design can vary substantially, particularly depending
on the type and phase of the trial. Phase II trials tend to have a single interim analysis,
while phase III trials tend to have many.
Statistically, the effect of performing interim analyses can be quite pronounced
[20] and can give spurious results [21]. Each additional look at the data increases the
probability of falsely rejecting the null hypothesis (H0). An example is shown in Table 2.1,
which shows the false-positive rate when performing repeated significance tests on
accumulating data, each at the α = 0.05 level of significance, using a two-sample t-test.
The false-positive probability increases to 1 as the number of looks at the data increases
to ∞ [20].
Thus, when performing a clinical trial which ethically requires interim analyses, one must
adjust the statistical error rates to account for these extra looks at the data.
No. of tests K    Overall null probability of rejecting H0
      1                           0.05
      2                           0.08
      3                           0.11
      4                           0.13
      5                           0.14
     10                           0.19
     20                           0.25
     50                           0.32
      ∞                           1.00
Table 2.1: Repeated significance testing on accumulating data, taken from [20]
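The inflation shown in Table 2.1 is easy to reproduce by simulation. The sketch below is illustrative only (Python rather than the R used in this dissertation; the group size is arbitrary and a known-variance z-test stands in for the t-test as a simplification): accumulating two-sample data are tested under the null after each accrued group, and the proportion of simulated trials in which any look rejects is recorded.

```python
import math
import random

def simulate_repeated_testing(n_looks, group_size=20, n_sims=10000, seed=1):
    """Estimate the overall null probability of rejecting H0 when a
    two-sided two-sample z-test (alpha = 0.05 per look, known unit
    variance) is repeated on accumulating data after each group."""
    z_crit = 1.96
    random.seed(seed)
    rejections = 0
    for _ in range(n_sims):
        sum_a = sum_b = 0.0
        n_per_arm = 0
        for _ in range(n_looks):
            # accrue one more group of patients per arm, under H0
            for _ in range(group_size):
                sum_a += random.gauss(0.0, 1.0)
                sum_b += random.gauss(0.0, 1.0)
            n_per_arm += group_size
            z = (sum_a - sum_b) / n_per_arm / math.sqrt(2.0 / n_per_arm)
            if abs(z) > z_crit:
                rejections += 1
                break
    return rejections / n_sims
```

With five looks the estimated false-positive rate lands near the 0.14 of Table 2.1, compared with roughly 0.05 for a single analysis.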
It is important to review the statistical literature for existing group-sequential tech-
niques prior to investigating novel methodologies or modifications of existing methods.
Many methods have been proposed, both for phase II and phase III clinical trials and a
review of these methods is presented in the next section.
2.1 Phase II and III Clinical Trials
Before looking at the different group-sequential methods, it is important to note some
differences between phase II and phase III designs. In phase III clinical trials, there is a
comparison between two treatment arms, whereas in phase II trials, there is only a single
treatment arm. The second main difference is that in phase III trials, the experimental
treatment is generally hoped to be superior to, or in the case of non-inferior testing,
about as good as, standard, or the control arm, so interim analyses are often based
on the premise of stopping the trial early for superiority. Phase II trials are more
frequently stopped early for inferiority, owing to the expectation that most treatments
will not be better than the standard of care. Even in the rare situation that a treatment is
superior, one will generally want to accrue additional patients to increase familiarity with
the treatment prior to the expensive, phase III trial. This difference will be important
when deciding on which statistical method is appropriate for use in clinical trial design.
2.2 Methods for Adjusting the Type I Error (α) In
the Situation of Multiple-Testing
2.2.1 Bonferroni
The simplest, and probably best known, statistical adjustment for multiple testing is
the Bonferroni adjustment [22]. Here, the type I error (α) is divided by the number of
looks at the data which will be performed. At each analysis, only data from patients
accrued since the previous test are included, and a test is significant only if its p-value
is less than the Bonferroni-adjusted level of significance. Bonferroni adjustments are well known
to be overly-conservative [23]. Further, the Bonferroni adjustment is correct if each
individual group of data is independent but is inappropriate when the data is correlated,
as is generally the case in clinical trials where patients are accrued group-sequentially.
Thus, one possibility when many analyses are performed is that one might observe a
trend towards significance in the same direction at each analysis, but no test is by itself
sufficiently strong to reject the null hypothesis. However, if one combines each test
together into a single test, one might observe an overwhelmingly significant result, which
is missed when using Bonferroni. Fortunately, newer methods have been proposed which
are more efficient than using a Bonferroni adjustment and allow for one to reject or not
reject the null hypothesis at an interim analysis.
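The mechanics of the adjustment are a one-liner. The sketch below (Python; the look count and p-values are hypothetical) also illustrates the criticism above: a consistent trend across looks can be missed because no single group is extreme enough.

```python
# Bonferroni adjustment: with k planned looks, each look is judged
# against alpha / k rather than alpha.
alpha = 0.05
k = 5                          # number of planned looks (hypothetical)
bonferroni_level = alpha / k   # 0.01

# p-values from the k independent groups of patients (hypothetical):
# every look trends the same way, yet none falls below alpha / k.
p_values = [0.04, 0.03, 0.06, 0.02, 0.05]
significant_looks = [p < bonferroni_level for p in p_values]

print(bonferroni_level, any(significant_looks))
```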
2.2.2 Early Group-Sequential Methods
The first sequential methods, the sequential probability ratio test [24] and the triangular
test [25], were developed during the Second World War, but it was not until the late 1970s
that practical methods were developed. These more practical methods were developed
because of the ability to calculate conditional error rates by using numerical integration
[26]. Since the amount of α error spent at any one test is conditional upon previous
results (i.e. was a previous test significant, thus leading to a termination of the trial, or
not), one must account for previous results when determining total error spent at any
given test. This is possible through numerical integration and one adds up the total α
error spent at each test to obtain the final level of significance.
Two well-known approaches were first developed using numerical integration in the
late 1970’s, whereby patients were accrued in k equally-spaced groups and an interim
analysis performed after each group was accrued. The Pocock [27] and O’Brien-Fleming
[28] methods describe a ’family’ of boundaries which incorporate all previously observed
data at each interim analysis and used numerical integration to ensure the total α used
at the end of the trial is within pre-defined limits. For the first time, when the 2nd (or
later) interim analysis was performed, data from patients accrued between the 1st and
2nd analyses in addition to data from patients accrued prior to the 1st interim analysis
were combined and analysed together. The difference between the two methods was how
the α was used at each analysis. The O’Brien-Fleming design is based on boundaries
which are constant on the test statistic scale as a function of time, i.e. one would reject
at the ith analysis if Zi ≥ Z∗/√(ni/N), where ni is the number of patients at analysis i
of the total N patients in the trial, Z∗ is a constant chosen so that the overall test has
size α, and Zi is the test statistic of interest (based on the
normal distribution). In contrast, the Pocock design gives boundaries which are constant
in terms of the α as a function of time, i.e. one would reject a test Zi if Zi > c at any
analysis i = 1, 2, 3, . . . , k, where c is a constant such that the overall test has size α. The
specific boundary used for a particular analysis from within the ’family’ of boundaries,
for both the Pocock and O’Brien-Fleming methods, depends on the number of interim
looks at the data. The O’Brien-Fleming approach required much stricter requirements
for stopping early on in a trial compared to the Pocock method, however, the boundary
point at the end of the study is much closer to the unadjusted boundary point. In
contrast, the Pocock approach would stop a trial earlier [29].
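The contrast between the two boundary shapes can be sketched numerically. The constants below are approximate two-sided values for five equally spaced looks (the exact constants come from the numerical integration described above); the point is the shape, not the digits.

```python
from math import sqrt
from statistics import NormalDist

alpha = 0.05
k = 5                                          # equally spaced analyses
z_fixed = NormalDist().inv_cdf(1 - alpha / 2)  # unadjusted boundary, ~1.96

c_pocock = 2.41   # approximate Pocock constant for k = 5 (two-sided)
c_of = 2.04       # approximate O'Brien-Fleming constant for k = 5

for i in range(1, k + 1):
    t = i / k                  # information fraction n_i / N
    b_pocock = c_pocock        # constant on the z scale at every look
    b_of = c_of / sqrt(t)      # constant on the B-value scale: strict early
    print(f"t={t:.1f}  Pocock {b_pocock:.2f}  O'Brien-Fleming {b_of:.2f}")
```

At the first look the O'Brien-Fleming boundary is about 4.56 against Pocock's 2.41, while its final boundary (2.04) sits much nearer the unadjusted 1.96.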
2.2.3 Alpha-Spending Function
The major drawback to these methods is that they require interim analyses to occur
at exact time points (i.e. after every ni patients), which may be difficult to achieve in
practice. A major breakthrough occurred with the development of the alpha-spending
function [30]. This development allowed users to specify a function, or a curve, which
represents the amount of α, or type I error, spent at any information time point during
the trial. By only specifying the α-spending function at the start of the trial, the timing
of interim analyses becomes irrelevant, provided the timing of an interim analysis is
not chosen based on the data already observed. Here, information time is defined as
the information available at the interim analysis as a proportion of the total information
which will be available at the trial conclusion. With continuous or categorical endpoints,
this is simply the number of patients enrolled out of the total expected number. With
survival endpoints, this would be the number of deaths which have been observed out of
the total expected number.
The concept behind the α-spending function is based on assuming a Brownian motion
process, that is, a continuous-time stochastic process. If B(t), 0 ≤ t ≤ 1, is
a standard Brownian motion process, and some horizontal boundary point b(t) = z_{α/2}
is set (for a one-sided test), then by defining τ to be the first time that the process B(t)
crosses beyond the point b(t), a known function describes α∗(t) = pr(τ ≤ t), that being:

α∗(t) = pr(τ ≤ t) = 0 if t = 0, and α∗(t) = 2 − 2Φ(z_{α/2}/√t) if 0 < t ≤ 1,   (2.1)
where Φ is the standard normal cumulative distribution function. Lan and DeMets
conjecture that if a process (not necessarily a Brownian motion process) is discretised,
such that one evaluates a test statistic at times t1, t2, . . . and defines boundary points
b1, b2, . . . so that the cumulative probability of crossing the boundary by time ti is
approximately α∗(ti), then the same optimality properties should hold for the discretised
process. In other words, if one calculates the probability of exceeding a particular value
at a given time t, where time is measured as information time, and a rejection region is
constructed so that this probability approximates the value of α∗(t) then one can spend
α at a rate which enjoys the same optimality properties of the Brownian motion process.
At the first evaluation, the probability of being in the rejection region is simply
Pr(Z > z1) ≈ α∗(t1), where Z is the test statistic of interest; however, at any additional
evaluation, the numerical integration methods of Armitage et al [26] must be used. Any
increasing function with α∗(0) = 0 and α∗(1) = α is acceptable as long as one specifies the curve prior
to the first interim look at the data. It is notable that the function α∗(t), where t is the
proportion of information time, gives a rejection region that closely matches the O’Brien-
Fleming boundary region. For this function, one would reject at any analysis, when the
proportion of information time observed is t, if the test is significant at the α∗(t) level
of significance. A couple of curves are described in the Lan-DeMets paper which closely
resemble the Pocock and O'Brien-Fleming boundaries, demonstrating the utility of this approach.
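The two spending curves can be evaluated directly. A Python sketch of the O'Brien-Fleming-like curve α∗(t) = 2 − 2Φ(z_{α/2}/√t) and the Pocock-like curve α ln(1 + (e − 1)t) given by Lan and DeMets (two-sided α = 0.05):

```python
from math import sqrt, log, e
from statistics import NormalDist

nd = NormalDist()
alpha = 0.05
z = nd.inv_cdf(1 - alpha / 2)          # z_{alpha/2}

def alpha_star_of(t):
    """O'Brien-Fleming-like spending function."""
    return 0.0 if t <= 0 else 2.0 - 2.0 * nd.cdf(z / sqrt(t))

def alpha_star_pocock(t):
    """Pocock-like spending function."""
    return alpha * log(1.0 + (e - 1.0) * t)

# The O'Brien-Fleming-like curve spends almost nothing early on,
# while the Pocock-like curve spends alpha much more evenly.
for t in (0.25, 0.5, 0.75, 1.0):
    print(t, round(alpha_star_of(t), 5), round(alpha_star_pocock(t), 5))
```

Both curves spend the full α = 0.05 at t = 1, as required.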
2.2.4 Repeated Confidence Intervals
Other approaches to evaluating whether one should continue a study beyond an interim
analysis have been proposed. One approach is the use of repeated confidence intervals,
advocated by Jennison and Turnbull [31] [19]. Typically at the end of a study, a confidence
interval of size 1 − α can be computed for the mean using
(x̄ − t_{n−1,α/2} σ/√n, x̄ + t_{n−1,α/2} σ/√n). In the repeated confidence interval approach, the confidence interval can
be computed at each analysis using the same calculations, but with α(t) replacing α,
and α(t) calculated using the same α-spending function as defined by Lan-DeMets. The
repeated confidence interval approach has some useful advantages, such as demonstrating
a range of values which are compatible with the data. Knowing a range of compatible
values is often more informative than a simple yes/no decision. Another distinct advantage
occurs when the primary outcome falls outside a confidence interval, but instead of stop-
ping a trial the data safety monitoring committee or trial investigators decide to continue
to the next stage, overruling the statistical design. Subsequent confidence intervals are
not affected by the previous decision to continue, unlike a hypothesis test type I/II error.
This is important since there are often many reasons why one may not strictly follow
the statistical guidelines. Opponents of this method point out that a 95% confidence
interval at the trial conclusion depends on the number of previous looks, and there are
questions as to which confidence interval should be reported: the naive CI or the adjusted,
slightly conservative one. Additionally, the sample sizes calculated based on the repeated
confidence intervals approach can be prohibitively large.
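A minimal sketch of the idea in Python, with a normal quantile standing in for t_{n−1,α/2} and illustrative α(t) values in place of ones computed from a spending function:

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

nd = NormalDist()

def repeated_ci(data, alpha_t):
    """(1 - alpha_t) confidence interval for the mean at an interim look;
    a normal quantile is used in place of t_{n-1, alpha/2} for brevity."""
    n = len(data)
    half = nd.inv_cdf(1 - alpha_t / 2) * stdev(data) / sqrt(n)
    return mean(data) - half, mean(data) + half

# Hypothetical data at an interim look; alpha(1) would return to 0.05.
look1 = [1.2, 0.8, 1.5, 0.9, 1.1, 1.3, 0.7, 1.0]
adjusted = repeated_ci(look1, alpha_t=0.01)   # spending-adjusted, early look
naive = repeated_ci(look1, alpha_t=0.05)      # unadjusted 95% interval
print(adjusted, naive)
```

The adjusted interval is wider than the naive one, which is exactly the conservatism critics point to.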
2.2.5 Bayesian Approach
Another method used with some regularity is the use of Bayesian [32] [33], or likelihood
methods [34]. Bayesians look at the problem from a whole different context, believing
that when one is unsure of the true value of a parameter, one can consider the true value
to be random. A mathematical formula attributed to Rev. Thomas Bayes,

P(A|B) = P(A ∩ B)/P(B) = P(B|A)P(A)/P(B),   (2.2)
shows how one can estimate a (posterior) distribution of this random value (the true
parameter value) based on prior belief and the accumulated data using the equation for
conditional probability shown in equation (2.2).
Many Bayesians consider this to be more in agreement with the approach to research
and the way individuals think than traditional methods [35]; that is, one takes one's
existing knowledge, adds to it the additional information gained from a trial, and adjusts
one's beliefs. Modern computing power has allowed users to apply Bayesian methods
(e.g. using programs such as WinBUGS [http://www.mrc-bsu.cam.ac.uk/bugs/]) and
get usable results, and an additional attribute of these methods is that the posterior
distribution is not affected by the number of previous looks at the data, since, according
to the likelihood principle "if two sample points result in the same likelihood function
then they contain the same information about θ" [36]. As a result, inference about the
parameter of interest is based solely on the data at hand, and is unaffected by the number
of previous looks or what one might have done in future "identical" trials or how the
trial itself is designed [37]. A good overview and arguments promoting this principle can
be found in Royall [34] and for Bayesian methods in medicine in Berry and Stangl [38].
Unfortunately, Bayesian decision theory is not straightforward. The most pressing
question is what prior distribution to use [39]. Some advocate using a range of prior
distributions [40], however, many argue that a single trial should not produce a range of
outcomes. Use of a non-informative [41] or a reference prior [42] [43] is another alternative,
unfortunately, every prior contains some information [44] [45]. One possibility is to make
a conclusion only when the posterior distribution shows evidence which are convincing
beyond a reasonable doubt. However, what one considers beyond a reasonable doubt is
arbitrary, may require unreasonably high sample sizes, and finally, as with frequentist
approaches, the probability of rejecting a hypothesis based on a fixed probability value
will increase with increased looks at the data [39]. Thus, as with Table 2.1, one
will reject a null hypothesis with probability one if one looks at the data infinitely many
times.
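The conjugate Beta-binomial update at the heart of many Bayesian phase II designs can be sketched in a few lines (Python; the prior, the interim data, and the 20% threshold are all hypothetical):

```python
import random

random.seed(1)

# Prior belief about the response rate: Beta(a, b); a = b = 1 is flat.
# The choice of prior is exactly the contested point discussed above.
a, b = 1.0, 1.0

# Hypothetical interim data: 9 responses among 30 patients.
responses, n = 9, 30

# Conjugate update: posterior is Beta(a + responses, b + non-responses).
post_a = a + responses
post_b = b + (n - responses)
post_mean = post_a / (post_a + post_b)

# Posterior probability that the true rate exceeds 20%, by Monte Carlo;
# this quantity is unaffected by how many times the data were examined.
draws = [random.betavariate(post_a, post_b) for _ in range(100_000)]
pr_above_20 = sum(d > 0.20 for d in draws) / len(draws)
print(round(post_mean, 3), round(pr_above_20, 3))
```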
2.2.6 Using a Risk/Loss Function
A final method which has been proposed is based on a risk, or loss function [46] [47].
Here, one must quantify the potential risk of continuing a trial and weigh it against the
potential loss of life by not treating patients with a potentially useful new drug. The
quantification is invariably difficult and arbitrary, and it is also extremely difficult to
attempt to quantify future costs, outcomes, and results. Thus, this method has been
severely criticized [48] and not widely implemented.
2.2.7 Stochastic Curtailment and Conditional Probability
A method called stochastic curtailment was proposed by Lan, Simon and Halperin in
1982 [49], however, by 2003, Sebille and Bellissant suggested this had yet to be widely
used in the medical literature. In stochastic curtailment a study is designed with a fixed
sample size. Interim analyses are then conducted and the probability of rejecting the null
hypothesis (assuming continuation of the study) at the end of the study is calculated using
P (Z > Zα|Z1), (2.3)
where Z1 is the standard normal test statistic at the first interim analysis. This probability
is based on the test statistic at the interim analysis and the amount of information
available out of the total amount expected. If the probability is sufficiently low,
then one would have sufficient evidence to terminate a study early for futility, as there
would be insufficient probability to obtain a statistically significant result. One of the
unique aspects to stochastic curtailment based on conditional power calculations is that
it looks at the futility of continuing, whereas most methods look only at whether clear
superiority has been observed. It is becoming common to design a clinical trial based
on group-sequential methods, perhaps based on an α-spending function resembling the
O’Brien-Fleming method, but using stochastic curtailment to stop a trial when there is
evidence that continuing the trial would be futile.
Unfortunately, the probability in equation (2.3) may be based on an incorrect assump-
tion, the assumption of normality under H0. One generally would not assume the null
to be correct for future data, since one already has observed some data (which typically
would not be equal to H0) and assuming that future data would be distributed as under
H0 would likely underestimate the true value. Alternatively, assuming future data
does not arise under H0 implies that the statistic in equation (2.3) may no longer be
normally distributed, which causes problems for the end-of-trial test statistic. Further, assuming future data occurs as specified
under the alternative hypothesis (HA), or that the data occurs as was observed in the
first part of the study, may be overly-optimistic. Methods have been proposed to account
for this, such as using a range of future data [50] [51], but there is no consensus at this
time.
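The dependence of equation (2.3) on what one assumes about the future data can be made concrete with the usual Brownian-motion formulation of conditional power (a Python sketch; the interim values z1 and t are hypothetical):

```python
from math import sqrt
from statistics import NormalDist

nd = NormalDist()

def conditional_power(z1, t, z_alpha, theta):
    """P(final Z > z_alpha | interim Z = z1 at information time t),
    treating the test statistic as Brownian motion with drift theta."""
    b = z1 * sqrt(t)                      # B-value at information time t
    return 1 - nd.cdf((z_alpha - b - theta * (1 - t)) / sqrt(1 - t))

z_alpha = nd.inv_cdf(0.975)   # two-sided alpha = 0.05
z1, t = 0.8, 0.5              # hypothetical interim result, half information

cp_null = conditional_power(z1, t, z_alpha, theta=0.0)            # future data under H0
cp_trend = conditional_power(z1, t, z_alpha, theta=z1 / sqrt(t))  # current trend continues
print(round(cp_null, 3), round(cp_trend, 3))
```

Both assumptions give low conditional power here, but they differ several-fold, which is precisely the ambiguity described above.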
2.3 Sample Size Re-adjustment [52] [53] [54] [55]
As an alternative to group-sequential designs, another branch of statistical methodology which
investigates what to do at interim analyses is sample size re-adjustment [52] [53]
[54] [55]. Whereas group-sequential designs have a fixed maximum sample size, sample
size re-adjustment methods take into account the data observed up to the interim analysis
and adjust the sample size for the next stage of accrual based on the accumulated
data. This might be useful, for example, if one designed a trial based on overall survival
as the primary outcome, but survival in the control arm was underestimated under the
null hypothesis (i.e. fewer deaths than expected), then the power of the trial may not be
sufficient to detect the effect size in which one is interested. This is because there will be
fewer deaths than expected at trial completion. The trial would be underpowered, and if
the p-value is low, but does not attain statistical significance, then there would be many
questions about whether there is a true treatment effect or not.
For example, one may assume that 20% of patients will respond to standard treat-
ment. Using a two-sided Fisher’s exact test (α = 0.05), one might deem an experimental
treatment of interest if the response rate increased by 50%, i.e. to 30%. To attain 90%
power, one would need 412 patients in each treatment arm (calculations performed using
NCSS-PASS [56]). However, if instead of a 20% response rate, the true response rate
to standard treatment was only 15%, then a 22.5% response rate would be of interest
(increased by 50%) and one would require 594 patients per arm to attain sufficient sta-
tistical power. If the trial was stopped at 412 patients, then one would have only around
76% power and questions might abound if the end of trial p-value was reported as 0.07.
This overestimation of the standard treatment response rate might be observed early on
in the trial, and it might be desirable to the sponsoring agency or company to increase
the trial sample size to maintain 90% power rather than end with an inconclusive result.
However, using standard methods, this is not possible.
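The sample sizes above come from an exact-test calculation in NCSS-PASS; a rough Python sketch with the standard normal-approximation formula for comparing two proportions reproduces the same qualitative jump (the approximation gives somewhat smaller numbers than the exact test):

```python
from math import sqrt, ceil
from statistics import NormalDist

nd = NormalDist()

def n_per_arm(p1, p2, alpha=0.05, power=0.90):
    """Per-arm sample size for a two-sided comparison of two proportions,
    using the pooled normal approximation (not Fisher's exact test)."""
    za = nd.inv_cdf(1 - alpha / 2)
    zb = nd.inv_cdf(power)
    pbar = (p1 + p2) / 2
    num = za * sqrt(2 * pbar * (1 - pbar)) + zb * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    return ceil((num / (p1 - p2)) ** 2)

n_planned = n_per_arm(0.20, 0.30)    # design assumption: 20% vs 30%
n_needed = n_per_arm(0.15, 0.225)    # true rates: 15% vs 22.5%
print(n_planned, n_needed)
```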
A real-life example where sample size re-estimation might have proved helpful is
discussed by Cui, Hung and Wang [57]. They discuss a phase III trial investigating
the effect of a new drug which aimed to prevent myocardial infarction [MI] amongst
patients undergoing coronary artery bypass graft surgery. The original sample size was
600 patients per treatment arm, which had sufficient power to detect a 50% reduction
in the incidence of MI, from 22% in placebo to 11% in the treatment group. A planned
interim analysis was performed after half the population was accrued and the conditional
probability of finding a statistically significant result was very low if the trial continued as
planned. This was because the MI incidence rate in the treatment group was around
16.5%, well above the planned incidence rate, but still below the observed placebo rate of 22%
and thus, still of clinical interest. Unfortunately, at the time of this trial, no valid testing
procedure was available in the statistical literature and the sponsor ultimately decided
not to increase the sample size. The trial eventually failed when the new treatment
did not achieve a large enough decrease in the MI incidence rate to attain statistical
significance. Although the ultimate fate of the new drug was not discussed, one might
presume that if the decrease in MI incidence rate was real, a subsequent clinical trial
would be necessary, at substantial cost to the sponsor and with a considerable delay to
the approval of a potentially beneficial drug.
Proponents of these methods indicate that these methods are flexible enough to ac-
count for situations where the initial hypotheses are misspecified [58], citing that if one
knew H0/HA well, then there would be no need for performing the trial. This may be
a common occurrence where initial estimates for designing a study are based on small
earlier studies, such as designing of a phase III trial based on efficacy estimates from a
small phase II study. Additionally, there may be times when investigators report a trend
towards significance if the p-value is just above the critical value for significance (say 0.05
< p-value < 0.10). It would be desirable if the trial could be extended slightly to attain
statistical significance. By allowing one to re-adjust sample sizes, one can improve the
likelihood that a valid study conclusion is reached [59]. However, if an interim analysis
is performed after n1 patients are accrued, and the total sample size is n = n1 + n2,
then as n2 → ∞, P(Z > Zα|Z1) → P(Z > Zα). It is therefore important to document how
the sample size will be adjusted prior to the start of the trial using one of the methods
discussed below.
Critics point out that the sample sizes arising from re-adjustment methods may be
exceedingly large, that they may be drastically different from what was originally proposed
and for which the trial was budgeted [60], and that the efficiency of a well-designed
group-sequential design far surpasses that of a well-designed trial using sample size
re-adjustment [61] [62].
2.3.1 Variance Spending Approach
Fisher [52] and Shen and Fisher [53] advocate a variance spending approach to sample
size readjustment. In this method, the difference between treatment outcomes at the end
of a trial is compared using a test statistic based on the standard normal distribution. At
an interim analysis, some portion of the total variance is spent and an adjusted normal
test statistic is calculated as a measure of the difference in outcomes between treatment
arms. The stage 2 data is similarly constructed. The two normal statistics are then
summed together, with the total variance equal to 1. Under the null hypothesis of no
difference, regardless of any difference in sample size, with appropriate adjustments, the
final test statistic is a standard normal, N(0,1).
Specifically, if one calculates the difference in outcomes as S1 = Σ_{i=1}^{rn} (XAi − XBi) for
treatment groups A = experimental and B = standard, where X is the statistic of interest, then
at an interim analysis S1 ≈ N(rnθ, rn), where 0 < r < 1. One can then transform this
value by dividing by the square root of the planned sample size, to attain W1 = S1/√n ≈ N(r√n θ, r).
One has then spent a fraction r, 0 < r < 1, of the total variance. Adjusting the total sample size to
n∗ = rn + γ(1 − r)n and calculating S2 = Σ_{i=rn+1}^{n∗} (XAi − XBi) ≈ N(γ(1 − r)nθ, γ(1 − r)n)
and W2 = γ^{−1/2} S2/√n ≈ N(√γ (1 − r)√n θ, 1 − r), one can then calculate at the end of the
trial Z = W1 + W2 = (S1 + γ^{−1/2} S2)/√n, which under the null hypothesis is
≈ N(0, 1), and one would reject if Z > Zα.
One of the major problems with the variance-spending approach is that it does not
allow stopping of the trial at the interim analysis, and one is required to continue on
to stage 2 (or later stages, as this can be easily generalised to >2 stages of accrual).
Additionally, if one performs an interim analysis too late in the trial, the second stage of
accrual might be prohibitively large to maintain the necessary power. However, if one
performs the analysis too early, then there is insufficient data to make an appropriate
estimate of the sample size for stage 2.
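The null behaviour of the combined statistic can be checked by simulation. A Python sketch assuming unit-variance differences XAi − XBi (the values of n, r and γ are hypothetical); whatever γ is chosen at the interim, the combined Z should remain standard normal under the null:

```python
import random
from math import sqrt

random.seed(2)

n = 100        # originally planned number of differences
r = 0.4        # fraction of the total variance spent at the interim
gamma = 2.0    # stage-2 inflation factor chosen at the interim

def final_z(stage1, stage2):
    """Z = (S1 + gamma^{-1/2} S2) / sqrt(n), the variance-spending combination."""
    return (sum(stage1) + sum(stage2) / sqrt(gamma)) / sqrt(n)

n1 = int(r * n)
n2 = int(gamma * (1 - r) * n)    # adjusted stage-2 size

# Monte Carlo under the null: each difference is N(0, 1).
zs = [final_z([random.gauss(0, 1) for _ in range(n1)],
              [random.gauss(0, 1) for _ in range(n2)])
      for _ in range(5_000)]

mean_z = sum(zs) / len(zs)
var_z = sum(z * z for z in zs) / len(zs) - mean_z ** 2
print(round(mean_z, 3), round(var_z, 3))   # close to 0 and 1
```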
2.3.2 Fisher Combination Test
The Fisher combination test is named after R.A. Fisher, who first proposed this methodology
for combining p-values, and not L.D. Fisher of the variance spending approach.
The methodology may be better attributed to Bauer and Kohne [54], who borrowed this
procedure from meta-analysis where it is commonly used.
Simply, under the null hypothesis of no treatment effect, a p-value follows a uniform,
U(0,1), distribution. If one defines pi to be the p-value obtained at the ith interim analysis,
then −2 ln(p1p2) follows a χ2 distribution with 4 degrees of freedom, and one rejects
overall if p1p2 ≤ exp(−χ2_{4,α}/2). So, regardless of the distribution of the data, one can combine
the results from 2, or more, interim analyses using this test statistic.
If the p-value is less than α at stage 1, one can stop the trial at the interim analysis.
Otherwise, one simply constructs the stage 2 data to be sufficiently large to have enough
power to attain a small enough p-value at stage 2 to declare significance in the trial
overall.
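The rule can be sketched directly. For 4 degrees of freedom the χ2 upper tail has the closed form e^{−q/2}(1 + q/2), so no statistical library is needed (the stage p-values below are hypothetical):

```python
from math import log, exp

def fisher_combined_p(p1, p2):
    """Combined p-value for two stagewise p-values: under H0 each p is
    U(0,1), so q = -2*ln(p1*p2) is chi-square with 4 df, whose upper
    tail probability is exp(-q/2) * (1 + q/2)."""
    q = -2.0 * (log(p1) + log(p2))
    return exp(-q / 2) * (1 + q / 2)

# Two individually non-significant stages can combine to significance:
print(round(fisher_combined_p(0.08, 0.08), 4))   # about 0.039
```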
Similar to the variance spending approach, the stage 2 sample size may be prohibitively
large if the p-value for the first interim analysis is not close to significance. An
additional criticism is that the distribution of the data must be correctly specified, else
the p-value itself is inappropriate.
2.3.3 Conditional Power
The last method of sample size re-adjustment is a modification of the group-sequential
stochastic curtailing method which also uses conditional power. Whereas stochastic cur-
tailing assumes a fixed sample size and investigates the conditional power (conditioning
on the data already observed) of getting a significant result if one was to continue, in this
method, the level of conditional power is fixed, and the sample size required to attain
this power is calculated. In this manner, one is able to ensure sufficient statistical power
to obtain a valid study conclusion.
The idea of adjusting the sample size based on conditional power was proposed by
Proschan and Hunsberger [55], who investigate conditional power when comparing the
difference in a continuous outcome between two groups, as would be common in phase III
clinical trials. In this paper, the maximum αmax for two-stages of accrual is calculated
if one indiscriminately selects the stage 2 sample size to maximise the probability of
getting a significant result. For example, if Z1 > Zα then one might choose a stage
2 sample of size 0, ensuring a significant result. Conversely, if Z1 is very small, then
one might choose a stage 2 sample size which approaches ∞, thus making the stage
1 data of almost no importance (i.e. as n2 → ∞, z = z1 + z2 → z2). The total error
from indiscriminately choosing the stage 2 sample size in this manner is increased to
αmax = α + exp(−z²_α/2)/4 and can be more than double the planned error rate.
A simplistic, but inefficient, way of performing a two-stage design is then to select
αmax to be the error rate of interest, and to select a sample size for stage 2 in any way one
chooses, knowing that the total α will remain below αmax. A simple change increases the
efficiency dramatically. For some k and p*, at any interim analysis, one stops the trial if
Z1 > k and rejects H0 or if Z1 < Zp∗ and does not reject H0 where Z1 is the Normally
distributed test statistic at the time of the interim analysis. Thus, one can stop a trial
after the interim analysis if extreme results are seen, and one will continue to stage 2
only if Zp∗ ≤ Z1 ≤ k. This lowers αmax considerably.
A statistically more desirable extension procedure is also proposed based on the calculation
of conditional power, that being

Pδ(Z2 ≥ zα|z1, n2) = 1 − Φ[(zα√(2(n1 + n2)) − z1√(2n1) − n2δ)/√(2n2)],

where Φ is the standard normal cumulative distribution function, n2 is the second stage
sample size, z1 is the observed test statistic after the first stage, Z2 is the test statistic
at the end of the second stage and δ = (µ1 − µ2)/σ. It is shown that the formula for
conditional power can be expressed as CPδ(n2, zα|z1) = 1 − Φ(zA − √(n2/2) δ), where
zA = (zα√(2(n1 + n2)) − z1√(2n1))/√(2n2). One can then plot different
conditional power estimates for different stage 2 sample sizes and choose the sample
size based on these plots. Conversely, one can set CPδ(n2, zα|z1) = 1 − β2 and solve for n2.
In other words, if one wants a defined amount of power at the end of stage 2, conditional
on the data at the interim analysis, one can set n2 = 2(zA + zβ2)²/δ², or
n2 = n1(zA + zβ2)²/z1² if using the empirical estimate δ̂ = (ȳ1 − x̄1)/σ, for which
δ̂ = z1√(2/n1). Since the observed estimate δ̂ may be overly optimistic,
an alternative estimate may be derived from HA, or a mixture between δ̂ and HA. This
can be generalised to more than 2 stages.
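Because the boundary term depends on n2 itself, the sample-size equation is implicit; in practice one can simply scan n2 until the target conditional power is reached. A Python sketch (the interim values n1 and z1 are hypothetical; one-sided rejection at the 0.025 level):

```python
from math import sqrt
from statistics import NormalDist

nd = NormalDist()

def cond_power(n1, n2, z1, delta, z_alpha):
    """Conditional power in the Proschan-Hunsberger form: probability the
    end-of-trial statistic exceeds z_alpha given interim z1, with n1 and
    n2 patients per arm per stage and standardized effect delta."""
    num = z_alpha * sqrt(2 * (n1 + n2)) - z1 * sqrt(2 * n1) - n2 * delta
    return 1 - nd.cdf(num / sqrt(2 * n2))

n1, z1 = 50, 1.2                       # hypothetical interim look
z_alpha = nd.inv_cdf(0.975)
delta_hat = z1 * sqrt(2 / n1)          # empirical effect estimate

# Smallest stage-2 size whose conditional power under delta-hat is 80%.
n2 = next(n for n in range(1, 5000)
          if cond_power(n1, n, z1, delta_hat, z_alpha) >= 0.80)
print(n2)
```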
2.4 Combining Data from Different Analyses
2.4.1 Continuous and Categorical Outcomes
For any of the methods in the previous section to be valid, one must assume that future
data is independent of already observed data. If future data is correlated with already
observed data, then one needs to model this correlation to get valid test statistics or
decision rules, which is not possible without knowing the future data. Fortunately, the
statistical theory has already been provided which demonstrates this independence.
When one has continuous data, the significance test is based on assuming a normal
distribution under the null hypothesis, and one can easily perform multiple tests by
assuming the statistic from all data accumulated prior to the first evaluation follows a normal
distribution Z1, the statistic from all data after the first evaluation but prior to the second
evaluation follows an independent normal distribution Z2, and so on until the end of the
trial. Since the data underlying Z1 have no influence on the data underlying Z2, the
statistics are independent, and it is well known that the sum of two (or more) independent
normal random variables is again normal.
Thus, the accumulated data test statistic can be based on a normal distribution under
the null hypothesis.
Similarly, for categorical data, significance tests are often based on the χ2 test. Inde-
pendence between different test statistics can be assumed as in the continuous case, and
the sum of two (or more) χ2 variables is again a χ2. Thus, the accumulated data test statistic can
again be based on the χ2 distribution under the null hypothesis.
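The χ2 additivity property is easy to confirm by simulation; a small Python sketch with illustrative degrees of freedom:

```python
import random

random.seed(3)

def chisq(df):
    """One chi-square draw, as a sum of df squared standard normals."""
    return sum(random.gauss(0, 1) ** 2 for _ in range(df))

# The sum of independent chi-squares chi2(3) + chi2(5) should behave
# as chi2(8): mean 8 and variance 16.
draws = [chisq(3) + chisq(5) for _ in range(20_000)]
mean_d = sum(draws) / len(draws)
var_d = sum(d * d for d in draws) / len(draws) - mean_d ** 2
print(round(mean_d, 2), round(var_d, 2))
```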
2.4.2 Survival-Type Outcomes
Unfortunately, with survival outcomes, significance tests are not as easily constructed.
The probability of death for a patient at the second and later evaluations is correlated
with the probability of death at the first evaluation: if a patient is dead at the first
evaluation, they must necessarily be dead at all future analyses, while a patient censored at
the first evaluation may be dead or alive at a later analysis, but necessarily has a survival
time at least as long as that observed at the first evaluation. It might be assumed that the log-
rank test at evaluation 1 is correlated with a log-rank test at evaluation 2. Fortunately,
Tsiatis [63] showed that the joint distribution of the test statistics from repeated log-rank
tests evaluated at information time points t1, t2, . . . , tk converges asymptotically to
a multivariate normal with mean 0 and an estimable covariance matrix. From this it
is shown that as long as any weight function used in computing a survival test statistic
has asymptotically independent increments (as is the case in the log-rank test), then the
recursively calculated process used at interim analyses will have independent increments.
Further, this process will depend only on data collected at that time and on the previous
test results, not on future tests or data.
The weight function of the Wilcoxon test (using the approach specified by Gehan [64])
depends upon the number of patients accrued and alive at each time point and thus does
not have asymptotically independent increments. Slud and Wei [65] show how one can
estimate the correlation between repeated significance tests. They note however, that
this requires high dimensional integrals which may deter the practicality of repeated
significance tests based on the Wilcoxon statistic.
The generalization of the Wilcoxon test by Peto and Peto [66] and Prentice [67] does
not depend on the censoring rate and has been recommended as a preferred generalization
of the Wilcoxon test when there is censored data [68] [69]. The Peto-Peto-Prentice
generalization uses a modified Kaplan-Meier estimator in the derivation of the density
function of the survival times. Since the weight function does not depend on the censoring
rate, it has asymptotically independent increments. Thus, the methods
of Tsiatis [63] are applicable, as they were for the log-rank test. The recursive process
at the interim analysis, therefore, has independent increments and the process depends
only on data collected at that time and previously, not on future results.
Practically, the Wilcoxon statistic is rarely used in group-sequential clinical trials,
partly because classical group-sequential methods cannot be used for group-sequential
monitoring when the Gehan generalization is used, but also partly due to the emphasis on
early deaths when using the Wilcoxon test. If one saw many early deaths (by chance), one
would have more reason to stop a trial very early, and this would lead to questions later
regarding whether the trial should have continued. Most trials require sufficient time to
elapse before declaring superiority of one treatment over another to satisfy non-statistical
concerns of investigators, even when a purely statistical conclusion is evident.
2.5 Markov Chains
2.5.1 Definition
A Markov chain is a discrete-time stochastic process which has the Markov property.
Although frequently used in many research fields, including engineering, reliability and
quality control, Markov chains are less common in the biomedical literature. Simplistically, a system
of interest will change from one state to another (including potentially to the state it was
previously in) at discrete time points. Each change of state is called a transition. The
Markov property states that the conditional probability the system will be in a given
state after the next transition depends only on which state the system is in presently.
In other words, knowledge of what state the system was in previously does not give any
information about future states. Mathematically, this is defined as follows:
Let ω = {1, 2, . . . , m}, (m < ∞) be a state space and let {Yt} = Y0, Y1, . . . be a
sequence of random variables defined on ω. Then the sequence will be called a Markov
chain if, for any sequence of states i0, i1, . . . , it and any t = 1, 2, . . . , we have
P (Yt = it|Yt−1 = it−1, . . . , Y0 = i0) = P (Yt = it|Yt−1 = it−1) (2.4)
[70]
We will only be interested in Markov chains with a finite state space ω in this thesis.
The transitions can be described succinctly as P(Yt = j | Yt−1 = i) = pij(t), and over a
finite state space an m × m transition matrix, M, can be written as:
M = (pij(t)) =
p11(t) p12(t) · · · p1m(t)
p21(t) p22(t) · · · p2m(t)
· · · · · · · · · · · ·
pm1(t) pm2(t) · · · pmm(t)
(2.5)
A state is defined as an absorbing state if once the system enters that state, it will
never leave that state.
2.5.2 Properties of Markov Chains
A Markov chain is considered homogeneous if the transition probabilities are identical
at all transitions, and is considered non-homogeneous if the transition probabilities differ
at some time t. The transition from state i to state j at a single time point,
P(Yt = j | Yt−1 = i) = pij(t), is called a one-step transition; a chain whose transition
probabilities depend only on the present state, as in (2.4), is a first-order Markov chain.
The probability of moving from state i to state j over a series of n time periods is the
n-step transition probability p(n)ij(t), obtained by summing, over all intermediate states,
products of one-step probabilities. The n-step probabilities can thus be calculated from
the one-step probabilities as indicated in the Chapman-Kolmogorov
equations, which state that, for any 0 < k < n,
p(n)ij = ∑_{r∈ω} p(k)ir p(n−k)rj. (2.6)
Using the Chapman-Kolmogorov equations, it is possible to calculate many statistics,
although this calculation may still be quite complex.
In certain situations, an approach called finite Markov chain imbedding, which is
summarised by Fu and Koutras [71], can be used to simplify the calculations. They
stated that a nonnegative integer random variable Xn,k can be imbedded into a finite
Markov chain if:
a) there exists a finite Markov chain {Yt : t = 1, 2, . . . , n} defined on the finite
state space ω,
b) there exists a finite partition {Cx}, x = 0, 1, . . . , m, of the state space ω, and
c) for every x = 0, 1, . . . , m, we have P(Xn,k = x) = P(Yn ∈ Cx).
The subscript k was used to represent certain random variables of interest, such as the
number of consecutive positive states in a run, and would depend on the random
variable of interest. It is then shown that if the random variable Xn,k can be imbedded
into a finite Markov chain, then

P(Xn,k = x) = π0 (Λ1 Λ2 · · · Λn) U′(Cx) (2.7)

where π0 is the initial probability vector of the Markov chain, Λt is the transition
matrix at step t, and U′(Cx) denotes the transpose of U(Cx) = ∑_{r: ar∈Cx} Ur,
where Ur is a 1 × m unit vector having 1 at the rth coordinate and 0 elsewhere;
U(Cx) thus indicates the end states of interest.
Simplistically, this means that if one has an imbeddable random variable, then one can
calculate the distribution, moments and probability-generating function of interest. One
only needs to have a proper state space, a proper partition of the state space and the
transition matrix associated with the imbedded Markov chain [71].
Conceptually, if one has a system which can be modelled using a finite-state Markov
chain, and the state space can be partitioned finitely such that the distribution of a
random variable Xn,k matches the probability of the chain ending in the corresponding
partition, then the random variable is said to be imbedded in the finite Markov chain.
When one has a finite imbedded Markov chain, the distribution of the random variable
can be modelled concisely using only the initial probability vector (the starting point),
the transition matrices at each step (the path through the system) and the end-state
partitions (the ending states of interest). Probabilities of interest are thus calculable
exactly, or can be estimated using computer simulation.
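The computation in equation (2.7) is simply a product of matrices applied to the initial distribution. The thesis's software is written in R; as a minimal, language-agnostic illustration, the following Python sketch evaluates π0(Λ1 · · · Λn)U′(Cx) for a hypothetical three-state chain whose probabilities are illustrative only:

```python
def vec_mat(v, A):
    """Multiply a row vector v by a matrix A (list of rows)."""
    return [sum(x * a for x, a in zip(v, col)) for col in zip(*A)]

def fmci_probability(pi0, transition_matrices, partition):
    """Equation (2.7): P(Y_n in C_x) = pi0 * (prod of step matrices) * U'(C_x),
    with the partition given as a 0/1 indicator over the state space."""
    v = pi0
    for L in transition_matrices:
        v = vec_mat(v, L)
    return sum(p for p, in_part in zip(v, partition) if in_part)

# Hypothetical homogeneous 3-state chain; state 2 is absorbing.
L = [[0.5, 0.3, 0.2],
     [0.0, 0.6, 0.4],
     [0.0, 0.0, 1.0]]
pi0 = [1.0, 0.0, 0.0]        # chain starts in state 0
p_absorbed = fmci_probability(pi0, [L] * 4, [0, 0, 1])  # P(absorbed by step 4)
```

For a non-homogeneous chain one simply passes a different matrix for each step; the partition vector plays the role of U′(Cx).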
2.5.3 Markov Chains as a Model for Cancer Phase II Clinical
Trials
It is possible to model a patient receiving treatment in a phase II cancer clinical trial using
Markov chain methodology. At any given time, a patient will be objectively observed
as being in one of a finite number of states, and patients will transition from one state
to another at selected time points (i.e. when tumour measurements occur, which for
phase II clinical trials is generally after every second cycle of treatment). The transition
is independent of what occurred to that patient previously and depends only on which
state the patient is in presently. Thus, tumour response status is a random variable which
can be modelled as a Markov chain; one needs only to show that this random variable is
imbeddable, and one can then calculate its distribution and moments.
One scenario of interest might be to model the RECIST [17] criteria. One can easily
construct a proper state space ω by defining the states ∅, complete response [CR], partial
response [PR], unconfirmed response [UR], stable disease [SD], off-treatment but with a
previous best response of SD [SDoff], progressive disease [PD] and off-study / censored
/ failed treatment [C], which can be partitioned finitely depending on the random variable
of interest (e.g. let Xn,k be the random variable defined as 1 if a patient is in state
k = 2, 3, corresponding to states CR or PR, after n = 4 transitions, and 0 otherwise).
An appropriate transition matrix is given in 2.8. Frequently, complete responders and
partial responders are grouped for simplicity as responders [R], and this is shown in
matrix 2.9. States CR, PR, SDoff, PD and C are absorbing states, with SDoff included
since one takes the best confirmed response of patients while on-treatment.
Thus, the random variable Xn,k defined can be imbedded into a finite Markov chain
and the distribution, moments and probability-generating function can be estimated.
M =
∅ CR PR UR SD SDoff PD C
∅ 0 0 0 p∅−ur p∅−sd 0 p∅−pd p∅−c
CR 0 1 0 0 0 0 0 0
PR 0 0 1 0 0 0 0 0
UR 0 pur−cr pur−pr 0 0 pur−sdo 0 0
SD 0 0 0 psd−ur psd−sd psd−sdo 0 0
SDo 0 0 0 0 0 1 0 0
PD 0 0 0 0 0 0 1 0
C 0 0 0 0 0 0 0 1
(2.8)
The transition matrix 2.9 can be interpreted as follows. Patients enter the system
in state ∅, which is defined because the status of a patient's tumour prior to treatment
is often unknown: although frequently assumed to be progressing, radiological tumour
measurements are usually not taken prior to trial entry. At the first objective tumour
evaluation, a patient can transition to having an unconfirmed response (UR), stable
disease (SD), progressive disease (PD), or being censored (off-treatment due to a reason
other than PD). At the next transition, patients who have had a UR will either have
a confirmed response, with probability pur−r, or will come off-study without having an
observed response, with probability pur−sdo. The UR state is necessary because, according
to the RECIST criteria, one uses the best observed response, and a response occurs only
after confirmation at least 4 weeks after the first observation. If a patient does not have
a confirmation of their response, they are deemed as having a best response of SD.
Patients with SD will either transition to a UR with probability psd−ur, remain in SD
with probability psd−sd, or come off-study with probability psd−sdo. All other transitions
occur either with probability 1 (e.g. patients with PD will remain in PD, as that is their
best observed response) or with probability 0 (e.g. it is not possible to transition from
state CR to SD according to the RECIST criteria); see matrix 2.9.
M =
∅ R UR SD SDoff PD C
∅ 0 0 p∅−ur p∅−sd 0 p∅−pd p∅−c
R 0 1 0 0 0 0 0
UR 0 pur−r 0 0 pur−sdo 0 0
SD 0 0 psd−ur psd−sd psd−sdo 0 0
SDo 0 0 0 0 1 0 0
PD 0 0 0 0 0 1 0
C 0 0 0 0 0 0 1
(2.9)
To calculate the probabilities of interest, or the first moments, one then needs only
to define the random variable of interest in such a way as to properly partition the state
space. Traditionally, one would be interested only in the response rate, so the appropriate
partition is C′(t) = {0100000}. Alternatively, one could partition the state space as
C′(t) = {0100100}, which defines the random variable based on response plus stable
disease at the time patients go off-study. One might instead be interested in the response
+ stable disease rate while some patients are still being treated and on-study, for which
the partition at the time of analysis is C′(t) = {0111100}. Many alternative transition
matrices and outcomes are possible, and some are described in detail in the next chapter.
Although the exact distribution of the random variable can be calculated, one can
also use simulation to obtain probabilities of interest; simulation allows one to obtain
many different probabilities quickly and easily with only minor changes to the input.
Further, since the outcomes at different transitions are correlated, the covariance
structure may not be straightforward, and calculating p-values of interest is often
difficult using theoretical methods. As a result, when calculating probabilities
of interest, the results were simulated rather than strictly calculated.
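The simulation approach can be illustrated briefly. The thesis's actual program is written in R; the Python sketch below pushes patients through a chain with the structure of matrix 2.9 and estimates the probability of a best response of R. All transition probabilities here are hypothetical, chosen only for the example:

```python
import random

# States in the order of matrix (2.9); the probabilities are illustrative only.
EMPTY, R, UR, SD, SDOFF, PD, C = range(7)
M = [
    [0.0, 0.0, 0.15, 0.45, 0.0, 0.30, 0.10],  # from EMPTY
    [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0],      # R is absorbing
    [0.0, 0.80, 0.0, 0.0, 0.20, 0.0, 0.0],    # from UR: confirm or off-study
    [0.0, 0.0, 0.10, 0.70, 0.20, 0.0, 0.0],   # from SD
    [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0],      # SDoff is absorbing
    [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0],      # PD is absorbing
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0],      # C is absorbing
]

def simulate_patient(n_transitions, rng):
    """One patient's path through the chain, starting from the EMPTY state."""
    state = EMPTY
    for _ in range(n_transitions):
        state = rng.choices(range(7), weights=M[state])[0]
    return state

def response_rate(n_patients, n_transitions, seed=1):
    """Monte Carlo estimate of P(best response = R) after n transitions."""
    rng = random.Random(seed)
    hits = sum(simulate_patient(n_transitions, rng) == R
               for _ in range(n_patients))
    return hits / n_patients
```

For these hypothetical probabilities, the exact probability of ending in state R after four transitions, obtained from equation (2.7), is 0.1812, so the simulated estimate should lie close to this; changing the partition (e.g. counting SDoff as well) requires only a change to the final comparison.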
Chapter 3
Potential Trial Designs for Phase II
Oncology Clinical Trials
When designing a phase II oncology clinical trial, there are numerous potential designs
one can use, even after fixing the maximum allowable error rates and the null and
alternative hypotheses. To illustrate the plethora of design alternatives, this chapter will
describe the many options available to a trialist for a given hypothesis test. Some pros
and cons of each design option will be discussed.
As a basis for discussion, an example trial is presented, and designs are illustrated in
a context similar to this trial. Details of this trial are given more explicitly in the next
chapter.
3.1 Design Summary
Recently, a single-arm, open-label phase II study of CCI-779 (temsirolimus) was per-
formed in patients with neuroendocrine carcinoma [72]. Further details of this study will
be discussed in Chapter 4. The original design of the study was based on using response
rate [RR] as the primary outcome, with response defined as per the RECIST criteria [17].
Hypotheses were set at H0: RR = 0.05 versus HA: RR = 0.25, with α = 0.05 and β = 0.10,
and a modification of the Simon minimax design [9] was used, such that a minimum of 30
patients were to be accrued. This modification was added by the investigators to ensure
sufficient numbers of patients were accrued to fully evaluate the treatment clinically.
Accordingly, the design specified that 15 patients were to be accrued in the first stage.
If 2 or more patients had an objective response, 15 additional patients would be accrued
in stage II. At the end of stage II, one would reject H0 and deem the treatment worthy
of further study if 4 or more of 30 patients had an objective response; otherwise, one
would not reject H0 and would deem the treatment inactive if 3 or fewer of 30 patients
had an objective response. The true α = 0.045 and the true β = 0.096 for this design. The
probability of stopping after the first stage assuming H0 is 0.829 and the expected sample
size under H0 is 17.56.
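These operating characteristics can be reproduced from exact binomial calculations. The following Python sketch (helper names are illustrative; the thesis's own code is in R) computes the early-stopping probability, rejection probability and expected sample size of a generic two-stage rule:

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(k, n, p):
    """P(X <= k); returns 0 for k < 0."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

def two_stage_characteristics(n1, n2, c1, r, p):
    """Two-stage design: continue past stage 1 only if responses >= c1;
    reject H0 at the end if total responses >= r.
    Returns (P(early stop), P(reject H0), expected sample size) at rate p."""
    pet = binom_cdf(c1 - 1, n1, p)                 # early termination prob.
    p_reject = sum(binom_pmf(i, n1, p) * (1 - binom_cdf(r - i - 1, n2, p))
                   for i in range(c1, n1 + 1))
    ess = pet * n1 + (1 - pet) * (n1 + n2)
    return pet, p_reject, ess
```

Evaluating the modified design (n1 = n2 = 15, continue if at least 2 responses, reject if at least 4 of 30) at p = 0.05 recovers the stated α, early-stopping probability and expected sample size, and at p = 0.25 the stated β.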
While describing potential trial designs, each design will be constructed using parameters
similar to those of the design described above. It is worth noting as well that many early
phase II designs were based on asymptotic results, which produced designs similar to those
used in phase III trials; however, modern computing power has allowed more recent designs
to be based on exact calculations, such as the frequently cited Simon design [9].
3.2 Phase II Designs
Even after defining α, β,H0 and HA, there are still many different designs one can choose
from when using a simple two-stage design. As an example, Table 3.1 gives a list of
possible, valid and commonly-used designs which can be chosen when one is investigating
H0 : RR = 0.05 versus HA : RR = 0.20, with α = 0.05 and β = 0.20. In the same clinical
scenario, a different investigator might be interested in the response + stable disease
rate [RR+SD] and not just the response rate [RR], setting H0 : RR + SD = 0.40
versus HA : RR + SD = 0.60, with α = 0.05 and β = 0.20. Table 3.2 gives the list
of possible decision rules for this scenario. Any one of the designs listed could be used
Design Primary Outcome Accept H0:stage 1 Accept H0:stage 2
Gehan [7] Response 0/14 not stated
Fleming [8] Response 0/15 ≤ 3/35
Simon optimal [9] Response 0/12 ≤ 3/37
Simon minimax [9] Response 0/18 ≤ 3/32
Jung design 1 [74] Response 0/15 ≤ 3/33
Jung design 2 [74] Response 0/13 ≤ 3/35
Jennison & Turnbull ∗ [31] Response 0/16 ≤ 2/35
Bayesian ∗ [16] Response ≤ 1/16
≤ 4/40 ≤ 6/57
Table 3.1: Potential Phase II Designs Using Response
∗ Other designs are possible
and the decision of which design to use often comes down to personal preference or
investigator familiarity with one of the designs. Uncertainty can develop when the trial
results are borderline. This uncertainty can be compounded if the individual stage targets
are not met exactly, if the trial design is not clearly stated, if there is some question
about the natural history of the disease, or if investigators have different beliefs about
the standard-of-care response rate. Unfortunately, some, if not all, of these uncertainties
are present in most phase II cancer clinical trials.
A review of these commonly used clinical trial designs is performed in this section.
In the following section, multinomial designs are reviewed - a further complication which
occurs when investigators count a patient who has a response differently from a patient
who has stable disease.
Design Primary Outcome Accept H0:stage 1 Accept H0:stage 2
Fleming [8] Response+SD ≤ 7/20 ≤ 22/45
Simon optimal [9] Response+SD ≤ 7/18 ≤ 22/46
Simon minimax [9] Response+SD ≤ 11/28 ≤ 20/41
Jung design 1 [74] Response+SD ≤ 11/27 ≤ 21/43
Jung design 2 [74] Response+SD ≤ 9/23 ≤ 22/45
Jennison & Turnbull ∗ [31] Response+SD ≤ 8/22 ≤ 19/45
Bayesian ∗ [16] Response+SD ≤ 10/23 ≤ 20/43
Table 3.2: Potential Phase II Designs Using Response & Stable Disease
∗ Other designs are possible
3.2.1 Univariate Designs With Response as the Outcome
Single Stage Design [73]
The simplest phase II clinical trial design is one in which all patients are accrued in a
single stage. Sample size calculations can be based on the binomial distribution; exact
calculations are preferred, such as those provided by A'Hern [73]. Based on H0: RR = 0.05
versus HA: RR = 0.25, α = 0.05 and β = 0.10, 25 patients are required. Rejection of H0
occurs if 4 or more patients have an objective response, and non-rejection of H0 occurs if
3 or fewer patients have an objective response. The true α = 0.034 and the true β = 0.096
for this design, and the expected sample size under H0 is 25.
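A single-stage design of this kind can be found by a direct exact search: take the smallest n for which some cut-off r satisfies both error constraints. The Python sketch below is illustrative only and is not A'Hern's published algorithm:

```python
from math import comb

def binom_tail(r, n, p):
    """P(X >= r) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(r, n + 1))

def single_stage_design(p0, p1, alpha, beta, n_max=200):
    """Smallest n (with cut-off r) such that P(X >= r | p0) <= alpha
    and P(X >= r | p1) >= 1 - beta; reject H0 if X >= r."""
    for n in range(1, n_max + 1):
        for r in range(n + 1):
            if binom_tail(r, n, p0) <= alpha:
                if binom_tail(r, n, p1) >= 1 - beta:
                    return n, r
                break  # a larger r at this n only lowers power further
    return None
```

For H0: RR = 0.05 versus HA: RR = 0.25 with α = 0.05 and β = 0.10 this returns the design quoted above (n = 25, reject with 4 or more responses), along with its exact error rates.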
Gehan Design [7]
The Gehan design was formulated to allow for early termination of trials conducted on
inactive agents, and is often thought of as the classical phase II design. It is the first
widely used 2-stage phase II design and it is occasionally still used today, primarily due
to familiarity with this design amongst many experienced trialists. The design was
formulated to reject the treatment as soon as possible when no responses are observed and the results are
are no longer consistent with the assumption that the ’beneficial’ response rate is true. If
one continues beyond the first stage, the desire is to improve estimation of the response
rates, thus, sample size is determined by looking at the precision of the estimates, and
ensuring the standard error is within certain limits.
The Gehan design was formulated at a time when there were very few useful treatments
for patients and the standard of care for most diseases had minimal if any efficacy
(response rates usually < 5-10%); for this design one would define RRH0 = 0.05 and
RRHA = 0.20. Fourteen patients are to be accrued in the first stage, since the probability
of having no responses among the first 14 patients if HA is true would be 0.8^14 = 0.044 < 0.05,
the defined level of significance. As a result, the study would be terminated and the
treatment deemed uninteresting if none of the first 14 patients have a response, since at this
time the results would be inconsistent with the assumption that the beneficial response
rate is true. If 1 or more patients had a response, the trial would continue to stage
2, where the number of patients in stage 2 would depend on the number of responses
observed in stage 1 and the desired level of precision. Assuming 1 patient had a response
and given the specified precision as having a standard error of < 0.10, then one would
need 11 additional patients in stage 2. This is calculated by noting that if 1 of 14 patients
had a response in stage 1, at most 12 of 25 patients could have a response at the end of
stage 2, which gives a standard error of 0.0999. Although different levels of significance
and standard error could be used, it is this basic design that is almost always performed.
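The stage 1 sample size and the standard-error check described above can be verified directly; a small Python sketch (function name is illustrative):

```python
from math import sqrt

def gehan_stage1_size(p_beneficial, alpha=0.05):
    """Smallest n1 with (1 - p)^n1 < alpha: if no responses are seen among
    n1 patients, a true response rate of p_beneficial is implausible."""
    n = 1
    while (1 - p_beneficial) ** n >= alpha:
        n += 1
    return n

n1 = gehan_stage1_size(0.20)     # stage 1 size for the classical parameters

# The stage-2 reasoning in the text: with 1/14 responses and 11 additional
# patients, at most 12 of 25 could respond, giving a standard error of
se = sqrt((12 / 25) * (1 - 12 / 25) / 25)   # just under the 0.10 limit
```

This reproduces the n1 = 14 first stage and the 0.0999 worst-case standard error quoted in the text.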
One of the major criticisms with this design is that the trial is based on estimation and
not hypothesis testing. While some argue that this is a better objective of phase II trials,
most trials are conducted under a hypothesis testing framework to exclude ambiguity in
reporting study results and to allow early termination of ineffective trials. Additional
criticisms of this design include the lack of flexibility if the stage 1 accrual target is not
hit exactly, the fact that the total sample size depends on what is observed at stage 1 (a
serious financial and ethical consideration), and that the actual standard error is likely
quite different from the bound.
Fleming Design [8]
The Fleming design is derived from methodology developed for phase III clinical trials
[28] in which one can reject or not reject the null hypothesis at any one of K interim anal-
yses. Each stage of accrual is identically sized and the total trial error rate is nominally
preserved. A trialist thus defines the response rate under the null hypothesis (RRH0),
the response rate under the alternative hypothesis (RRHA), the type I error rate, α, and
the type II error rate, β, and from this the design can be constructed. The total trial
sample size is calculated based on asymptotic estimation, using the null response rate of
interest and assuming a single stage of accrual, by the formula:
N = [(Z1−β √(RRHA(1 − RRHA)) + Z1−α √(RRH0(1 − RRH0))) / (RRHA − RRH0)]², (3.1)
where Z1−α is the 1 − α quantile of the standard normal distribution. Generally N is
rounded up to the nearest 5th patient (e.g. 5, 10, 15, 20, . . . ) for simplicity's sake. One would
reject H0 in a single stage trial if the number of responses is at least
S ≥ [N · RRH0 + Z1−α √(N · RRH0 (1 − RRH0))] + 1. (3.2)
Alternatively, one could design a trial using only N, α and RRH0, for the situation where
one has a fixed sample size due to practical concerns. In this case, the single-stage RRHA
varies and is equal to
RRHA = pA = [√(N · RRH0) + Z1−α √(1 − RRH0)]² / (N + Z²1−α). (3.3)
We will assume that there is no ceiling on N in designing the hypothetical trial.
For a multiple stage trial, an interim analysis is conducted after half the patients have
been accrued, rounded to the nearest 5th patient. Rejection and acceptance points are
defined using the methods of [28], however, in a phase II trial after stage 1, only the
acceptance point is of interest at the interim analysis. The alternative would be rejected
after stage 1 of accrual if the number of patients with a response is less than or equal to
the smallest integer greater than
a1 = n1 · RRHA − Z1−α √(N · RRHA(1 − RRHA)), (3.4)
where n1 is the sample size after the first stage. At the end of the trial, after n = n1 +n2
patients are accrued, the null hypothesis would be rejected if the number of observed
responses is at least
r2 = (n1 + n2) · RRH0 + Z1−α √(N · RRH0(1 − RRH0)) + 1. (3.5)
In our example, the parameters RRH0 = 0.05, RRHA= 0.20, α = 0.10 and β = 0.10
are defined by the investigators, thus, the total N is calculated as
N = [(Z1−β √(RRHA(1 − RRHA)) + Z1−α √(RRH0(1 − RRH0))) / (RRHA − RRH0)]²
= [(1.28 · √(0.20 · 0.80) + 1.645 · √(0.05 · 0.95)) / (0.20 − 0.05)]²
≈ 33.7.
One might then round up the total sample size to n = 35, with an interim analysis
occurring after, n1 = 15 patients are accrued. At stage 1, one would accept H0 and stop
the trial assuming treatment inactivity if
a1 = n1 · RRHA − Z1−α √(N · RRHA(1 − RRHA))
= 15 · 0.20 − 1.645 · √(35 · 0.20 · 0.80)
< 0,
thus, if 0 responses are observed at the end of stage 1. After the second stage, which is
at the trial conclusion, one would reject H0 if
r2 = (n1 + n2) · RRH0 + Z1−α √(N · RRH0(1 − RRH0)) + 1
= (15 + 20) · 0.05 + 1.645 · √(35 · 0.05 · 0.95) + 1 (3.6)
= 4.87, (3.7)
thus, if 5 or more responses are observed.
Extensions of this design can be made to account for k ≥ 2 interim analyses but, given
the short duration of phase II trials, practical concerns limit the number of interim
analyses to one. These designs are criticised because they are not optimal, and one might
use more patients than needed. Further, the designs are constructed using asymptotic
estimation procedures, yet with modern computing power analysis is conducted using
exact calculations, and accrual targets need to be met exactly.
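The calculations in equations (3.1), (3.4) and (3.5) can be collected into one short function. The Python sketch below uses the same normal quantiles as the worked example (1.645 and 1.28); function and variable names are illustrative:

```python
from math import sqrt, ceil

def fleming_two_stage(p0, pa, z_alpha, z_beta, n1, n_total):
    """Equations (3.1), (3.4), (3.5): asymptotic total N, the stage-1
    acceptance point a1 (accept H0 if responses <= a1), and the final
    rejection point r2 (reject H0 if responses >= r2)."""
    n_exact = ((z_beta * sqrt(pa * (1 - pa)) + z_alpha * sqrt(p0 * (1 - p0)))
               / (pa - p0)) ** 2
    a1 = n1 * pa - z_alpha * sqrt(n_total * pa * (1 - pa))
    r2 = n_total * p0 + z_alpha * sqrt(n_total * p0 * (1 - p0)) + 1
    return n_exact, a1, r2

# Worked example from the text: RR_H0 = 0.05, RR_HA = 0.20, n1 = 15, N = 35.
n_exact, a1, r2 = fleming_two_stage(0.05, 0.20, 1.645, 1.28, n1=15, n_total=35)
```

This reproduces the results above: N ≈ 33.7 (rounded up to 35), a1 < 0 (so one stops for inactivity only with 0 responses at stage 1), and r2 ≈ 4.87 (so H0 is rejected with 5 or more responses).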
Pocock Design [27]
When designing a phase III clinical trial, there are two designs which are frequently used
for setting up the α-spending function, those being the O’Brien-Fleming and Pocock
designs. While the Fleming design in the previous section is a simple modification of the
O’Brien-Fleming design to allow for use in a phase II setting, a similar modification can
be performed to allow the Pocock design to be used similarly.
The main difference between the two designs is the amount of α spent at each analysis,
with rejection of H0 being more difficult early on using the O'Brien-Fleming design.
The O'Brien-Fleming design specifies that one would reject H0 at test g whenever
√((n1 + n2 + · · · + ng)/N) · Yg(p0) > Z1−α, where Yg is the normal approximation test
statistic at test g. Conversely, the Pocock design specifies that one rejects H0 at any
analysis whenever Yg ≥ c, where c = Zαp for a nominal level αp chosen such that the
overall test has size α. For an analysis with two stages, with α = 0.10, the corresponding
value of c is approximately 1.53, corresponding to a nominal αp = 0.062. As a result,
one could propose the following design
for a similarly constructed phase II trial, with 15 patients accrued in stage 1 followed by
another 20 in stage 2:
Accept H0 at the end of stage 1 if P(X ≤ x | RRHA) ≤ 0.062, which corresponds
to accepting H0 if the number of responses is x = 0. Reject H0 at the end of stage 2 if
P(X ≥ x | RRH0) ≤ 0.062, which corresponds to rejecting H0 if the number of responses
is x ≥ 5.
Thus, for these particular design parameters, the Pocock and O'Brien-Fleming designs
are identical, but this is not always the case. (For example, if a trial testing RRH0 = 0.05
vs RRHA = 0.20 were conducted in two stages of 21 patients each, the Pocock trial
would accept H0 if 0 or 1 of 21 patients had a response in stage 1 and reject H0 if
4 or more of 42 had a response, whereas the O'Brien-Fleming design would accept H0 at
stage 1 with 0/21 responses and reject at stage 2 only if 5 or more responses were
seen.)
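The two decision rules above can be recovered from the nominal level 0.062 by exact binomial calculation; a Python sketch (helper names are illustrative):

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); returns 0 for k < 0."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def pocock_rules(n1, n_total, p0, pa, nominal_alpha):
    """Largest x with P(X <= x | pa, n1) <= nominal_alpha (accept H0 at
    stage 1) and smallest x with P(X >= x | p0, n_total) <= nominal_alpha
    (reject H0 at the end of stage 2)."""
    accept = max((x for x in range(n1 + 1)
                  if binom_cdf(x, n1, pa) <= nominal_alpha), default=None)
    reject = min(x for x in range(n_total + 1)
                 if 1 - binom_cdf(x - 1, n_total, p0) <= nominal_alpha)
    return accept, reject
```

With n1 = 15, N = 35, RRH0 = 0.05, RRHA = 0.20 and nominal level 0.062, this yields the rules stated above: accept H0 with 0 responses at stage 1, and reject H0 with 5 or more responses at the end of the trial.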
Simon Design [9]
Simon used a computer program to calculate trial designs that satisfy RRH0 , RRHA, α,
and β and to identify optimal designs using exact calculations. Two designs were defined
as optimal, the two-stage design having the lowest expected sample size under the null
hypothesis, ESS(H0), was called the optimal design, and the two-stage design which had
the smallest total sample size, SSTOT , was called the minimax design. The computer
program started by selecting a starting N using
N = RR(1 − RR) [(Z1−α + Z1−β) / (RRHA − RRH0)]² (3.8)
where RR = (RRH0 + RRHA)/2. By starting at a total sample size just smaller than
N, a search was conducted over all possible stage 1 sample sizes n1 ∈ (0, N − 1) and
rejection regions r1 ∈ (0, n1) and r ∈ (r1, N). The design having the smallest SSTOT
was defined as the minimax design. The process was then repeated for each successively
larger SSTOT until the ESS(H0) was consistently increasing. The design which had the
smallest ESS(H0) was then declared the optimal design.
For RRH0 = 0.05, RRHA= 0.20, α = 0.10, β = 0.10 the optimal design is as follows:
accrue 12 patients in stage 1 and accept H0 (reject treatment as inactive) if 0 responses
are observed. If one or more patients have a response, accrue 25 additional patients.
Accept H0 and deem the treatment inactive if 3 or fewer of the total 37 patients have a
response, but reject H0 and deem the treatment as potentially of interest if 4 or
more of the 37 total patients have a response. The exact β for this design is calculated
by

β = Σ_{i=0}^{r1} C(n1, i) h^i (1 − h)^{n1−i}
  + Σ_{i=r1+1}^{r} C(n1, i) h^i (1 − h)^{n1−i} Σ_{j=0}^{r−i} C(n − n1, j) h^j (1 − h)^{n−n1−j},

where h = RRHA and C(n, i) denotes the binomial coefficient. Numerically,

β = C(12, 0) · 0.8^12
  + C(12, 1) · 0.2 · 0.8^11 · [C(25, 0) · 0.8^25 + C(25, 1) · 0.2 · 0.8^24 + C(25, 2) · 0.2^2 · 0.8^23]
  + C(12, 2) · 0.2^2 · 0.8^10 · [C(25, 0) · 0.8^25 + C(25, 1) · 0.2 · 0.8^24]
  + C(12, 3) · 0.2^3 · 0.8^9 · C(25, 0) · 0.8^25
  = 0.069 + 0.020 + 0.008 + 0.001
  ≈ 0.098.

The exact α for this design is calculated as
α = 1 − Σ_{i=0}^{r1} C(n1, i) p0^i (1 − p0)^{n1−i}
  − Σ_{i=r1+1}^{r} C(n1, i) p0^i (1 − p0)^{n1−i} Σ_{j=0}^{r−i} C(n − n1, j) p0^j (1 − p0)^{n−n1−j},

where p0 = RRH0. Numerically,

α = 1 − C(12, 0) · 0.95^12
  − C(12, 1) · 0.05 · 0.95^11 · [C(25, 0) · 0.95^25 + C(25, 1) · 0.05 · 0.95^24 + C(25, 2) · 0.05^2 · 0.95^23]
  − C(12, 2) · 0.05^2 · 0.95^10 · [C(25, 0) · 0.95^25 + C(25, 1) · 0.05 · 0.95^24]
  − C(12, 3) · 0.05^3 · 0.95^9 · C(25, 0) · 0.95^25
  = 1 − 0.540 − 0.298 − 0.063 − 0.005
  ≈ 0.094.
The minimax design states that one should accrue 18 patients in stage 1 and accept
H0 (reject the treatment as inactive) if 0 responses are observed. If one or more patients
have a response, accrue 14 additional patients. Accept H0 and deem the treatment
inactive if 3 or fewer of the total 32 patients have a response, but reject H0 and deem
the treatment as potentially of interest if 4 or more of the 32 total patients have a response.
The true α for the minimax design is 0.072 and the true β is 0.099.
Simon also gives design optimality characteristics for each design. The probability
of stopping after stage 1, assuming H0 is true, is 0.54 for the optimal design and 0.40
for the minimax design and ESS(H0) is 0.54*12+(1-0.54)*37=23.5 for the optimal design
and 0.40*18+(1-0.40)*32=26.4 for the minimax design. Although these designs are
optimal, some criticisms still remain, notably the requirement to meet accrual targets
exactly and the possibility that neither the optimal nor the minimax design is practically
useful. This latter criticism might occur if both designs have stage 1 sample sizes which
are too small (near 0) or too large (near N).
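Simon's search can be sketched as follows. For speed, the sketch restricts r1 and r to small values and searches a limited range of N, which suffices for this example (a full implementation would, as Simon did, search all values); all helper names are illustrative:

```python
from math import comb
from functools import lru_cache

@lru_cache(maxsize=None)
def pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def cdf(k, n, p):
    """P(X <= k); returns 0 for k < 0."""
    return sum(pmf(i, n, p) for i in range(min(k, n) + 1)) if k >= 0 else 0.0

def reject_prob(n1, n2, r1, r, p):
    """P(reject H0): more than r1 responses in stage 1, more than r in total."""
    return sum(pmf(i, n1, p) * (1 - cdf(r - i, n2, p))
               for i in range(r1 + 1, n1 + 1))

def simon_designs(p0, p1, alpha, beta, n_range, r1_max=3, r_max=6):
    """Enumerate feasible (r1/n1, r/n) designs; return (optimal, minimax)."""
    feasible = []
    for n in n_range:
        for n1 in range(1, n):
            for r1 in range(min(r1_max, n1) + 1):
                for r in range(r1, r_max + 1):
                    if reject_prob(n1, n - n1, r1, r, p0) > alpha:
                        continue
                    if reject_prob(n1, n - n1, r1, r, p1) < 1 - beta:
                        continue
                    pet = cdf(r1, n1, p0)            # early stop under H0
                    ess = n1 + (1 - pet) * (n - n1)  # expected sample size
                    feasible.append((r1, n1, r, n, ess))
    optimal = min(feasible, key=lambda d: d[4])          # smallest ESS(H0)
    minimax = min(feasible, key=lambda d: (d[3], d[4]))  # smallest N
    return optimal, minimax
```

For RRH0 = 0.05, RRHA = 0.20, α = 0.10 and β = 0.10 this recovers the designs quoted above: optimal 0/12 then 3/37 with ESS(H0) ≈ 23.5, and minimax 0/18 then 3/32 with ESS(H0) ≈ 26.4.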
Compromise Designs [10] [74]
Given that neither the optimal nor the minimax design of Simon may be practically useful,
Jung provided a way of selecting from a set of designs which are a compromise between
these two. In the original 2001 paper, Jung et al used graphical methods to select
a design and provided a downloadable JAVA program to assist in this selection. For each
N ∈ [SSTOTmin, SSTOTopt], where SSTOTmin is the total sample size for the minimax
design and SSTOTopt is the total sample size for the optimal design, the design with the
minimum ESS(H0) amongst all designs satisfying the error constraints is selected. The
values are then plotted for each design where the horizontal axis is the SSTOT and the
vertical axis is the ESS(H0). By exploring this plot, one can choose designs which may
have more practically useful design characteristics.
The method of selecting designs was formalised using Bayesian methods in the 2004
paper, and the JAVA program was updated to identify admissible designs. Once possible
designs are plotted, one can think of admissible designs as those which in some way
minimise the two optimality criteria. Graphically, this process was described by connecting
candidate designs between the optimal and minimax designs using a convex hull; any
design on this convex hull would be deemed admissible [75].
The more formal Bayesian framework is as follows. One can draw a straight line
q·SSTOT + (1 − q)·ESS(H0) = ρ on the (SSTOT, ESS(H0)) plane for any q ∈ [0, 1],
where SSTOT is the total sample size and ESS(H0) is the expected sample size under
the null hypothesis. This line has slope −q/(1 − q) and intercept ρ/(1 − q), where ρ is
the Bayes risk. By starting from a small ρ and moving the line upwards, the first design
touched is an admissible, Bayes design, with Bayes risk ρ∗, where ρ∗/(1 − q) is the
intercept of the line. One can weight the optimality criteria according to the relative
merit of each criterion through q ∈ [0, 1], and any design which is a Bayes design for
some q is considered admissible.
For the parameters outlined in this section, this procedure provides two additional
admissible designs. The first design accepts H0 if 0/15 patients have a response in stage
1 or ≤ 3/33 patients at the end of the trial have a response. The second admissible
design would accept H0 if 0/13 patients have a response after stage 1 or ≤ 3/35 patients
at the end of the trial have a response. However, which design to use is then based on
subjective opinion, so it is imperative to define the design prior to starting the trial.
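The Bayes-design selection can be sketched directly. The (SSTOT, ESS(H0)) pairs below are for the four designs discussed: the minimax and optimal values are those given in the text, while the values for the two compromise designs are computed here from their decision rules (0/15 then ≤3/33, and 0/13 then ≤3/35) under H0: RR = 0.05. The sweep over q is a simple grid approximation:

```python
def bayes_design(candidates, q):
    """Design minimising the Bayes risk q * SS_TOT + (1 - q) * ESS(H0)."""
    return min(candidates, key=lambda d: q * d[1] + (1 - q) * d[2])

def admissible_designs(candidates, steps=1000):
    """Sweep the weight q over a grid on [0, 1]; every design that is a
    Bayes design for some q is admissible."""
    return {bayes_design(candidates, i / steps)[0] for i in range(steps + 1)}

# (name, SS_TOT, ESS(H0)); the two compromise ESS values are computed from
# PET(H0) = 0.95^15 and 0.95^13 respectively.
designs = [("minimax", 32, 26.44),
           ("Jung 1",  33, 24.66),
           ("Jung 2",  35, 23.71),
           ("optimal", 37, 23.49)]
```

Sweeping q recovers all four designs as admissible: q near 1 selects the minimax design, q near 0 the optimal design, and intermediate q values select the two compromise designs, matching the convex-hull description above.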
Repeated Confidence Intervals [31]
Jennison and Turnbull suggest the use of confidence intervals in evaluating interim and
final trial results. One could stop a phase II trial early and reject a treatment as unin-
teresting if the upper bound of the associated confidence interval is less than the RRHA.
It is argued that since confidence intervals are an estimation technique, there is no ad-
justment necessary in terms of precision for confidence intervals performed after interim
analyses. While valid, it is also true that when confidence intervals are used for
decision-making purposes, including whether to continue accruing to a study,
later-constructed confidence intervals are affected by prior decisions. An α-spending
function is thus proposed [30], as is done for phase III trials, to adjust the width of
the confidence interval for prior decisions. In this manner, there is considerable
flexibility if the accrual targets are not met.
For the trial described, one might aim to accrue 35 patients with an interim analysis
after 15 patients are accrued. However, it is possible that the accrual target after stage I
was missed, and the interim analysis was performed after the 16th patient was accrued.
One might specify an α-spending function which mimics the Pocock design [27], the
Pocock design being more conducive to stopping early. Thus, at the interim analysis,
45.7% of the data is accrued, and confidence intervals would be constructed at the nominal
α = 0.028985 level of interest. The exact 94.2% confidence interval if 0 responses are
observed would have upper bound at 0.199, thus, one would not continue. If one continued
to stage 2, the upper bound of a confidence interval constructed when 35 patients are
accrued would be 0.145, 0.187 or 0.226 with 1, 2 or 3 responses observed, respectively, and
one would reject the treatment as ineffective if 0, 1 or 2 responses were observed.
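The upper bounds just described can be checked directly, assuming the exact intervals are of the Clopper-Pearson type. A short Python sketch (the thesis's own software is written in R; scipy and the function name here are mine, purely for illustration):

```python
from scipy.stats import beta

def exact_upper_bound(x, n, tail):
    """Exact (Clopper-Pearson) upper confidence bound for a binomial rate:
    the largest rate still consistent with observing <= x responses in n
    patients at one-sided level `tail`."""
    if x == n:
        return 1.0
    return beta.ppf(1.0 - tail, x + 1, n - x)

# Interim analysis: 0/16 responses at the nominal alpha = 0.028985 level
ub = exact_upper_bound(0, 16, 0.028985)
```

The computed bound falls just below the targeted rate of 0.20, matching the decision to stop described above.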
Criticisms of this technique include questions as to whether a 1- or 2-sided confidence
interval should be used. For phase II studies, in accordance with the 1-sided hypothesis
testing designs, 1-sided confidence intervals are generally used for decision making, but
final results are often reported as 2-sided confidence intervals. While adjustments to the
width of a confidence interval due to prior decisions (i.e. the amount of α spent) make sta-
tistical sense, these adjustments are not intuitive to non-statisticians. There are further
uncertainties regarding which confidence intervals to report at the end of a trial when an
interim decision is over-ruled due to practical concerns, or for secondary outcomes. Al-
ternatively, some argue that one should not make decisions based on confidence intervals
at all and that these intervals should be used for estimation purposes only, in which case
the same criticisms of the Gehan method would apply. These unresolved issues are often reasons why
the confidence interval approach is used less frequently than other methods.
Bayesian Designs [16] [76]
Although there are many Bayesian designs which could be used for statistical analysis
of a phase II clinical trial [77] [78] [79], the design as described by Thall and Simon
[16] remains one of the most user-friendly designs. Using this methodology, the outcome
of interest is again response, and the response distribution is defined as a Beta-binomial
with parameters α and β. To elicit priors, the investigators are asked to provide the mean
response rate of standard therapy (µs = αs/(αs + βs)), the width of a 90% probability
interval, W90, for the standard treatment response rate, and a targeted improvement δ.
In essence the investigators must provide their belief of the standard treatment response
rate and the strength of this belief. The statistician must formulate this belief into a
proper prior distribution, in discussion with the investigator.
The distribution of the experimental treatment is defined as πe ∼ β(αe, βe) and
guidelines are suggested for eliciting the prior distribution. Specifically, let ce = αe + βe
and 2 ≤ ce ≤ 10, where ce describes the strength of prior knowledge of the experimental treatment,
and the mean of πe is equal to µs + δ/2. This formulation leads to prior parameters for the
experimental distribution of αe = ce(µs + δ/2) and βe = ce[1 − (µs + δ/2)].
The posterior probability is

λ(x, n; πs, πe, δ0) = Pr(Θs + δ0 < Θe | Xn = x)   (3.9)

= ∫ from 0 to 1−δ0 of [1 − F(p + δ0; αe + x, βe + n − x)] f(p; αs, βs) dp,

where F and f denote the Beta distribution function and density with the indicated parameters.
Since Θe|Xn ∼ β(αe + Xn, βe + n − Xn), a decision rule is defined by the following:
1) If Xn ≥ Un, stop and declare the experimental treatment of interest for further
study, else
2) if Xn ≤ Ln, stop and declare the experimental treatment inactive, else
3) if Ln < Xn < Un treat another patient, where Un is the smallest integer such that
λ(x, n; πs, πe, 0) ≥ pu and Ln is the largest integer such that λ(x, n; πs, πe, δ) ≤ pl and
pu, pl are pre-defined probabilities. Again, if Xn ≥ Un one might practically desire to
continue treating additional patients as is custom.
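Equation (3.9) is a one-dimensional integral and can be evaluated numerically. A Python sketch (the thesis's own implementation is in R; scipy and the helper name are mine), using as defaults the priors derived in the worked example below, β(5, 95) for the standard treatment and β(0.25, 1.75) for the experimental treatment:

```python
from scipy import stats
from scipy.integrate import quad

def post_prob(x, n, a_e=0.25, b_e=1.75, a_s=5.0, b_s=95.0, delta0=0.15):
    """lambda(x, n; pi_s, pi_e, delta0) = Pr(Theta_s + delta0 < Theta_e | X_n = x),
    with Theta_e | X_n ~ Beta(a_e + x, b_e + n - x) and Theta_s ~ Beta(a_s, b_s)."""
    post_e = stats.beta(a_e + x, b_e + n - x)     # posterior for experimental rate
    prior_s = stats.beta(a_s, b_s)                # prior for standard rate
    val, _ = quad(lambda p: post_e.sf(p + delta0) * prior_s.pdf(p),
                  0.0, 1.0 - delta0)
    return val
```

Evaluating `post_prob` at candidate (x, n) pairs is all that is needed to check stopping boundaries such as those quoted for the worked example below.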
Thus, for this trial, the standard treatment may be believed to have a mean response
rate of 0.05 and W90 may extend from 0.02 to 0.10. This would give a distribution
similar to that shown in Figure 3.1 which can be easily generated using most statistical
packages. The figure was generated from a β(5, 95) distribution which gives a W90 of ≈
0.08, ranging from 0.020 to 0.099 and mean µs = α/(α + β) = 5/100 = 0.05.
Figure 3.1: Potential Distribution for Standard Treatment Response Rate
The investigator might also deem that the targeted improvement in response rate is
15% (consistent with the frequentist designs) and, for the first trial with this treatment
combination, that there is little to no prior knowledge of the experimental treatment effect.
Thus, one might reasonably set ce = 2. From this, the experimental treatment would
have prior distribution β(ce(µs + δ/2), ce[1 − (µs + δ/2)]) = β(2(0.05 + 0.15/2), 2[1 − (0.05 + 0.15/2)]) = β(0.25, 1.75).
According to [16], one might set SSTOT arbitrarily, however in [76], two
suggestions are made — to choose SSTOT such that the width of the posterior credible
interval is less than some value, or to use frequentist power-type calculations to make
sure the false-positive or false-negative rates are within certain limits.
Using the first method, one might set the posterior mean to equal the targeted value
since the posterior distribution is not known a priori. In our study, we are targeting
an improvement of 15% for a total targeted response rate of 20%. Then, if we want a
95% credible interval to have width of less than 0.20, we would get the percentiles of a
β(r + 0.25, n− r + 1.75) distribution, where r/n = 0.20. Here, the maximum sample size
would be 57, although one might round to 60 for simplicity.
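The first suggestion can be checked directly: fix the observed rate at the target r/n = 0.20, form the posterior β(r + 0.25, n − r + 1.75), and scan n until the exact equal-tailed 95% credible interval is narrower than 0.20. A Python/scipy sketch (the thesis's own software is R; the function name is mine):

```python
from scipy.stats import beta

def ci_width(n, target=0.20, a0=0.25, b0=1.75, level=0.95):
    """Width of the equal-tailed credible interval for the response rate,
    assuming the observed rate equals the targeted rate r/n = target."""
    r = target * n                     # treated as continuous, may be non-integer
    post = beta(r + a0, n - r + b0)
    lo = (1.0 - level) / 2.0
    return post.ppf(1.0 - lo) - post.ppf(lo)

# smallest n giving a 95% credible interval narrower than 0.20
n_max = next(n for n in range(10, 200) if ci_width(n) < 0.20)
```

The smallest such n is close to the 57 quoted above; the exact value depends on how the non-integer r = 0.20n is handled.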
Given the relatively large targeted improvement, one might postulate that one would
need strong evidence before stopping a trial early, and stop only when λ ≤ 0.04. In other
words, to stop the study early, we require the probability that the experimental treatment
response rate is superior to the standard treatment response rate by at least 0.15 to be
at most 0.04; equivalently, with probability 0.96 the improvement is less than 0.15.
Using simulation to calculate probability estimates, one would then stop the study and
conclude the experimental treatment is uninteresting if one observes 0 of 5 patients with
a response, 1 of 16, 2 of 25, 3 of 33, 4 of 40, 5 of 48 or 6 of 55.
Practically, one issue with this design is the potential for stopping after only 5 patients
are accrued, but this highlights one of the criticisms of phase II clinical trials, notably that
the perceived minimum beneficial response rate is usually overly optimistic and unrealistic.
One might specify a minimum number of patients needed for treatment prior to stopping
based on these practical concerns.
Looking at the design parameters specified, W90 for the standard treatment has an
upper bound on the response rate<0.10. With 57 patients, a 95% credible interval for
the posterior mean response rate for the experimental treatment has width <0.20. If
the targeted minimum beneficial response rate is 0.20, then the lower bound on the
95% credible interval will be >0.10. As a result, if these distributions are correct, then
there would be almost complete separation between the standard treatment response
rate distribution and the posterior experimental treatment response rate distribution
with as few as 57 patients. Since advancements in oncology treatments are generally
quite small, this is unrealistic, and one is generally more interested in smaller, more
difficult to find improvements. However, practical concerns limit the sample size of these
trials and require this improbable targeted improvement.
While Bayesian designs have many proponents, there are some criticisms of these
methods as well. Notably, there is no specific test at the end of the trial, which some
argue is necessary as investigator judgements are likely clouded by their personal obser-
vations of their own patients (usually a subset of the entire trial sample). There are
ways to define a test to decide whether a treatment is worthy of further study or not,
however, multiple testing issues are then the same as in frequentist designs. Addition-
ally, the subjectivity in defining priors, and the rejection probability, has many critics,
although it is noted that the hypotheses and error rates for a frequentist design are also
subjective, and more restrictive, than those of Bayesian designs. Practically, Bayesian designs
are often disliked by non-statisticians who are unfamiliar with the terminology and with the
fact that these designs usually require a larger sample size, so a statistician who favours
Bayesian designs must have a good working relationship with the primary investigator to
approve a design. The supposed main advantage to using Bayesian designs is the explicit
incorporation of prior information, however, this is also the main disadvantage as many
investigators would argue that one should only consider results directly from the trial.
3.3 Multinomial Designs
The previous designs are all univariate, in that they all use only best response as the
primary outcome. This can have major implications, especially with the recent empha-
sis in clinical oncology on molecularly-targeted agents (MTAs) as opposed to cytotoxic
agents. MTAs have a different mechanism of action and these agents may be effective in
preventing tumour growth as opposed to simply shrinking the tumour. The most notable
instance of this occurring is in the use of Sorafenib in renal cell cancer [80]. In this break-
through trial, the response rate of patients treated with Sorafenib was only 4%, which
would ordinarily be clinically uninteresting. However, 70% of patients had stable disease
which lasted 12 weeks or more. While the statistical trial design was based on response
alone, and would recommend accepting H0 and deeming the treatment uninteresting,
the extremely high stable disease rate was deemed noteworthy enough to advance the
agent to phase III confirmatory testing along with other agents showing promise [81],
and it was later approved by health regulatory bodies.
While it is possible to create a single univariate outcome, such as defining a good
outcome as either objective response OR stable disease, this is not always satisfactory.
One problem with using a single outcome measure, like response, is exemplified in Fig-
ure 3.2. This figure shows three hypothetical tumour responses to treatment as per the
RECIST criteria [17]. According to RECIST, the maximum diameters of all measur-
able tumours (≥2mm in diameter) are summed at each response evaluation time. After
treatment, each evaluation is compared with baseline. If the summed value is ≥ 30%
smaller compared to baseline, the patient is defined as having a partial response. A
complete response occurs only when all the lesions have completely disappeared, but due
to the small number of complete responses which occur, partial and complete response
outcomes are almost always combined when evaluating treatment efficacy. Patients who
have a growth of ≥ 20% as compared to the nadir, or the smallest sum of diameters, are
considered to have disease progression, and will almost always be taken off-study
at this point. Patients who are neither in response, nor progressing, are classified as
having stable disease. The best response at any time during the study is generally used
for determining treatment efficacy. Additionally, to have a best response in the study, a
response evaluation must be confirmed with a second measurement of response at least
4 weeks after the first evaluation.
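The rules just described can be summarised in a short classifier. This Python sketch is illustrative only: it ignores complete responses and the 4-week confirmation requirement, the function name is mine, and the cut-offs follow the simplified description above:

```python
def best_response(baseline, sums):
    """Simplified best-response classification from summed tumour diameters.
    PR: sum falls >= 30% below baseline; progression: sum grows >= 20% above
    the nadir (patient comes off study); otherwise SD."""
    nadir = baseline
    for s in sums:
        if s >= 1.2 * nadir:        # disease progression: patient off-study
            break
        if s <= 0.7 * baseline:     # partial response
            return "PR"
        nadir = min(nadir, s)
    return "SD"
```

Under these rules a trajectory like patient 1's in Figure 3.2 (sharp shrinkage followed by regrowth) still records PR as the best response, which is precisely the anomaly criticised below.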
[Figure: tumour shrinkage/growth compared with baseline (y-axis, 0.6–1.4) over evaluations 0–6 for Patient 1 (PR), Patient 2 (PR) and Patient 3 (SD)]
Figure 3.2: Tumour Shrinkage and Growth for Three Hypothetical Patients Over Time
In Figure 3.2, all three patients in this hypothetical example have tumour shrinkage,
but at different speeds. Patient 1 has an immediate shrinkage, followed by substantial
growth. Patient 2 has substantial shrinkage which continues for a lengthy period of time
before stabilising, and patient 3 has slow but steady shrinkage until the 3rd evaluation, at
which time they stabilise. Also note that censoring may occur due to patient withdrawal,
excessive toxicity, because the patient remains on treatment at the time of analysis, or
completion of treatment as per protocol. According to RECIST [17], both patients 1 and
2 have a partial response since they experienced ≥ 30% shrinkage, and both would be
considered superior to patient 3, who has stable disease as the best response. Clearly
this does not agree with clinical practice. The response of patients 2 and 3 to treatment
would generally be considered superior to the response of patient 1. Further, if censoring
occurred at, say, evaluation 3, only patient 1 would be counted as having a PR and
would be thought to have a better outcome than either of the other 2 patients. Thus,
the typical method of analysis, to use a single outcome measure based on best response,
is insufficient.
Alternative statistical designs which use both response and stable disease endpoints
simultaneously have been proposed and are described below. While these designs are an
improvement in situations where one might be interested in both outcomes, there still
remains work to make these designs correspond to the clinical thought process.
3.3.1 Zee Design [11] [82]
Prior to the design proposed by Zee et al., a number of designs were described [83]
[84] [85] [15] which have multiple outcomes, generally toxicity and response; however, they
did not use a multivariate design, but rather a dual-binomial outcome, i.e. a design having two
separate outcomes. Zee et al. proposed a multinomial design based on the belief that
an ineffective treatment would not only produce few responses, but would also produce
many early progressions. Thus, both the response rate and the early progression rate need to be
defined for decision criteria to be constructed. To mimic the design thought process of
the univariate designs as closely as possible, one might compare H0: response=0.05 AND
early progression=0.60 versus HA:response=0.20 AND early progressions=0.40. The use
of AND in both H0: and HA: is purposeful even though Zee et al. used OR in the
definition of HA. This is because the construction of boundaries for statistical testing,
and as a result the error rates, are calculated under the assumption of both response and
early progression hypotheses being true, not one or the other [86]. Error rates were set
at α = 0.10 and β = 0.10. For this design, one must use the program provided by the
authors.
Using this design, one might perform a trial with 30 patients in total, with an interim
analysis after 15 patients are accrued. One would accept H0 and stop the
trial if i) 0 patients respond and 8 or more early progressions are observed, or ii) 13 or
more early progressions are observed regardless of the number of responses at the interim
analysis. After stage 2, with 30 patients, one would reject H0 if i) 1 or 2 responses and
≤ 20 early progressions are observed, if ii) 3 responses and ≤ 21 early progressions are
observed or if iii) 4 responses and any number of early progressions are observed. The
trial-wide α = 0.1116, β = 0.0848 and the expected sample size is 20.118, with the
probability of stopping at stage 1 being 0.6588, assuming H0 is true.
This design is criticised in that if the treatment prevents tumour growth for all pa-
tients, but does not shrink the tumour, the trial design will accept H0 at stage 2 and the
treatment will be deemed inactive. As an example, the Sorafenib in renal cell carcinoma
trial previously discussed [80] had a response rate of 0.04 and a 12-week stable disease
rate of 0.70. With a response rate of 0.04, the probability of observing 0 responses in
30 patients is > 0.29. So, if 0 responses and 21 (70%) stable diseases were observed in
the trial, the Zee design would incorrectly conclude that the treatment was ineffective.
While proponents argue that this is an extreme case, it is precisely this type of extreme
case that one wishes to identify using a multinomial design.
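The probability quoted for the Sorafenib scenario is a one-line binomial calculation; sketched in Python (scipy used purely for illustration):

```python
from scipy.stats import binom

# chance of observing 0 responses among 30 patients when the
# true response rate is 4%, i.e. 0.96**30
p_zero = binom.pmf(0, 30, 0.04)
```

This evaluates to roughly 0.294, consistent with the "> 0.29" quoted above.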
Using a Zee multinomial design, one would need to observe at least a few responses in
addition to the large number of stable diseases to declare efficacy (see Figure 3.3). After
stage 1 (Figure A), one would reject H0, accept H0 or continue to stage 2. After stage 2
(Figure B), one would reject H0, thus accepting the hypothesis of some drug activity,
only if one observed at least some measure of response. Using this design would therefore
result in an incorrect conclusion of drug inactivity if a treatment is cytostatic. The calculation
of the α and β errors in the multinomial design of Zee is based on two exact scenarios,
i.e. the test is based on comparing a response rate of x1 AND an early progressive disease
rate of x2 versus a response rate of y1 AND an early progressive disease rate of y2. What
investigators are often interested in is the situation where the alternative is y1 OR y2.
Figure 3.3: Decision Process for Zee [82] multinomial design. Figure A is the decision
rule after stage 1 and Figure B is the decision rule after stage 2
3.3.2 Trinomial Design [12]
A design similar to the Zee design was proposed by Panageas et al. The difference is that
instead of using response and early progressions, the trial was based on complete response
and partial response rates, which can be easily extended to response rate and stable
disease rate. To replicate the null and alternative hypotheses tested for the Zee paper, one
would then test H0:response rate=0.05 AND stable disease rate=0.35 versus HA:response
rate=0.20 AND stable disease rate=0.40. Note the difference between the good stable
disease rate and poor stable disease rate is only 0.05, and as a result this would require
thousands of patients (my computer crashed when attempting to calculate the exact
sample size due to the large memory needed).
Re-setting H0: response rate=0.05 AND stable disease rate=0.20 versus HA: response
rate=0.15 AND stable disease rate=0.35 results in a trial design where 10 patients are
accrued in stage 1 and a further 17 patients in stage 2, for a total of 27. At stage 1, one
would accept H0 (and declare the treatment as uninteresting) if one observes i) 0 responses
and ≤3 stable diseases, ii) 1 response and ≤1 stable disease, or iii) 2 responses and 0
stable diseases. At the end of stage 2, one would reject H0 and declare the treatment
of interest if one observed i) 0 responses and ≥10 stable diseases, ii) 1 response and ≥8
stable diseases, iii) 2 responses and ≥7 stable diseases, iv) 3 responses and ≥5 stable
diseases, or v) ≥ 4 responses and any number of stable diseases. The expected sample
size and probability of stopping after stage 1 under H0 is 17.19 and 0.59 respectively.
The boundaries for this design are constructed in a manner appropriate for the ques-
tions of interest; however, only optimal designs are provided in the manuscript and the
accompanying computer program. The calculations are complex and not easily computable,
so if accrual targets are not met, there is no easy way to calculate an appropriate bound-
ary. Further, the α and β errors are valid only for the joint multinomial hypothesis, and
might not be accurate for a marginal hypothesis if one of the component alternatives
is true but the other is not. Finally, this design weights response and stable disease rates
equally; however, clinicians may put more emphasis on a patient who has a response.
3.3.3 Dual-Response Design [13]
Lu et al. note that clinicians place different emphasis on patients who have complete
response compared with patients who have partial response. Additionally, they note
that phase II oncology clinical trials have generally been performed using total response
as the outcome of interest, where the number of total responses is the number of par-
tial+complete responses. As a result, they have proposed a design, using exact calcu-
lations, which compares the rate of total responses and the rate of complete responses
simultaneously. They note, similar to Panageas et al, that while the design is proposed
based on total response and complete response outcomes, it can be easily revised to a
design based on total response and total response+stable disease outcomes; in fact, the
Chapter 3. Potential Trial Designs for Phase II Oncology Clinical Trials52
example provided in the paper is based on this revised dual outcomes.
It is important to note that the number of total responses is necessarily ≤ the
number of total responses+stable diseases. As such, the hypotheses to be tested
change from

H0 : RR_{H0} and SD_{H0} versus HA : RR_{HA} or SD_{HA}   (3.10)

to

H0 : RR_{H0} and (RR + SD)_{H0} versus HA : RR_{HA} or (RR + SD)_{HA}.   (3.11)
A rejection region R1 = (X_RR ≥ r_RR or X_{RR+SD} ≥ r_{RR+SD}) is constructed, which corre-
sponds to this dual response hypothesis. Given this rejection region, the type I error can
be calculated as Pr(R1 | RR_{H0} and (RR + SD)_{H0}), and the type II error is calculated
at the value of (RR_{HA}, (RR + SD)_{HA}) which maximises 1 − Pr(R1 | RR_{HA} ∪ (RR + SD)_{HA}).
Since

RR_{HA} ∪ (RR + SD)_{HA} = (RR_{HA} ∩ (RR + SD)_{HA}) + ((RR_{HA})^c ∩ (RR + SD)_{HA}) + (RR_{HA} ∩ ((RR + SD)_{HA})^c),   (3.12)

where H^c denotes the complement of the event H, to calculate the maximum β one can
simply calculate the maximum β over each of the three regions defined on the right side of
equation 3.12. Therefore,

min over (RR_{HA} ∩ ((RR + SD)_{HA})^c) of Pr(R1 | RR, RR + SD) ≥ Pr(X_RR ≥ r_RR | RR = RR_{HA}) = 1 − β_RR.   (3.13)

Similarly,

min over ((RR_{HA})^c ∩ (RR + SD)_{HA}) of Pr(R1 | RR, RR + SD) ≥ Pr(X_{RR+SD} ≥ r_{RR+SD} | RR + SD = (RR + SD)_{HA}) = 1 − β_{RR+SD},   (3.14)

and

min over (RR_{HA} ∩ (RR + SD)_{HA}) of Pr(R1 | RR, RR + SD) ≥ max(1 − β_RR, 1 − β_{RR+SD}).   (3.15)

As a result of equations 3.13–3.15, the minimum power over the alternative region
RR_{HA} ∪ (RR + SD)_{HA} is 1 − max(β_RR, β_{RR+SD}) and the maximum β error is max(β_RR, β_{RR+SD}).
In designing a two-stage study, an early stopping region is formed after n1 patients
are accrued with bounds XRR ≤ sRR and XRR+SD ≤ sRR+SD for two points sRR, sRR+SD.
Thus, for fixed α, β_RR and β_{RR+SD}, a computer program was used to construct acceptable
designs by examining all possible choices of n1, n, s_RR, s_{RR+SD}, r_RR, r_{RR+SD}, with minimax
and optimal designs chosen; the minimax design is the acceptable design with the smallest n
and the optimal design the one with the smallest ESS(H0).
Returning to our example, we wish to test H0: RR_{H0} = 0.05 AND (RR + SD)_{H0} = 0.25
versus HA: RR_{HA} = 0.15 OR (RR + SD)_{HA} = 0.50, and we will set maximum error rates
of α ≤ 0.10 and β = max(βRR, βRR+SD) ≤ 0.20. The β errors are inflated as there are
two possible errors (i.e. if response alone is sufficient, or if response+stable disease is
sufficient) and the sample size would become unreasonably large if one was too restrictive.
It is noted that one could put different β errors on each alternative rejection scenario.
Using the software provided by the authors, one can compute the minimax and optimal
designs under these constraints.
Specifically, the minimax design would be to accrue 29 patients in stage 1 and continue
to stage 2 if one observed either 2 or more responses or 12 or more response+stable
diseases. In stage 2, an additional 15 patients would be accrued for a total of 44 patients,
and one would reject the null hypothesis (i.e. deem the treatment of interest for further
study) if one observed 5 or more responses or 16 or more response + stable diseases. The
expected sample size using this design is 35.6 and the probability of stopping after stage
1 assuming the null is true is 0.56. By contrast, the optimal design would stop after stage
1 if one observed 0 or 1 responses and 8 or less response + stable diseases after accrual
of 22 patients. In stage 2, 27 additional patients would be accrued for a total of 49, and
one would reject the null hypothesis if one observed 5 or more responses or 18 or more
response + stable diseases. This design has an expected sample size under H0 of 31.0
and a probability of stopping after stage 1 of 0.67.
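The stage-1 operating characteristics can be checked by enumerating the trinomial outcomes (response, stable disease, progression). A Python sketch under one reading of the stated minimax rule (stop and accept H0 if at most 1 response and at most 11 responses+stable diseases among the first 29 patients), with H0 rates RR = 0.05 and SD = 0.20; the function name is mine:

```python
from math import comb

def stage1_stop_prob(n1, max_rr, max_tot, p_rr, p_sd):
    """P(X_RR <= max_rr and X_RR + X_SD <= max_tot) when each of n1 patients
    independently responds (p_rr), has stable disease (p_sd), or progresses."""
    p_pd = 1.0 - p_rr - p_sd
    total = 0.0
    for r in range(min(max_rr, n1) + 1):
        for s in range(n1 - r + 1):
            if r + s > max_tot:
                break                      # larger s only increases the total
            total += (comb(n1, r) * comb(n1 - r, s)
                      * p_rr**r * p_sd**s * p_pd**(n1 - r - s))
    return total

p_stop = stage1_stop_prob(29, 1, 11, 0.05, 0.20)
ess = 29 + 15 * (1 - p_stop)   # expected sample size under H0
```

This reading of the boundary reproduces, approximately, the stopping probability of 0.56 and the expected sample size of 35.6 quoted above.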
As can be seen, the total sample size (44 or 49) is quite a bit larger using this
design than in the previous designs. This, however, is largely due to the response rate
comparison (0.05 versus 0.15). Even a Simon minimax two-stage design based on this
comparsion with α = 0.10 and β = 0.20 requires 44 patients at the end of stage 2. If we
increased the response rate of interest (under HA) to be 0.20, then the minimax design
requires a much more feasible 38 total patients and the optimal design 46 patients at the
end of stage 2, even with α = β = 0.10.
3.3.4 Weighted Response Design [14]
An alternative approach to the multinomial design was proposed by Lin and Chen, who
use weighted likelihood methods to design trial parameters and then exact methods to
construct optimal designs. In this situation, patients can have one of 3 possible outcomes,
say response, stable disease or progressive disease. Setting p0i to be the probabilities of
having each outcome i = 1, 2, 3 under H0 and p1i be the probabilities of having each
outcome i = 1, 2, 3 under the alternative, then the trinomial likelihood is defined as in
equation (3.16) at the end of a trial when x1 patients had the first outcome (response),
x2 patients had the second outcome (stable disease) and n − x1 − x2 patients had the
third outcome (progressive disease).
Λ = (p01/p11)^x1 (p02/p12)^x2 [(1 − p01 − p02)/(1 − p11 − p12)]^(n−x1−x2).   (3.16)
With appropriate re-arrangement it can be shown that the trinomial log-likelihood is a mono-
tone function of the number of partial responses plus the number of complete responses
multiplied by some weight, as shown in equation (3.18). Writing p0 = p01 + p02 and
p1 = p11 + p12,

log(Λ) = x1 log(p01/p11) + x2 log(p02/p12) + (n − x1 − x2) log[(1 − p0)/(1 − p1)]

= x1[log(p01/p11) − log((1 − p0)/(1 − p1))] + x2[log(p02/p12) − log((1 − p0)/(1 − p1))] + n log[(1 − p0)/(1 − p1)]

= x1[log(p01 p1/(p11 p0)) − log((1 − p0)p1/((1 − p1)p0))] + x2[log(p02 p1/(p12 p0)) − log((1 − p0)p1/((1 − p1)p0))] + C.   (3.17)

By setting ω = (θ − µ)/(θ − ν), with θ = log[p1(1 − p0)/(p0(1 − p1))], µ = log[p1 p01/(p0 p11)],
ν = log[p1 p02/(p0 p12)] and C = n log[(1 − p0)/(1 − p1)], equation 3.17 becomes

log(Λ) = x1(µ − θ) + x2(ν − θ) + C

= −(θ − ν)[((θ − µ)/(θ − ν))x1 + x2] + C

= −(θ − ν)(ωx1 + x2) + C   (3.18)
and hence, the log-likelihood is just a monotone function of the weighted score ωx1 + x2.
The primary question becomes how to define ω. Note that the use of partial responses and
complete responses can easily be replaced by the number of (prolonged) stable diseases
and responses as needed depending on the tumour and trial requirements.
Interpreting equation (3.18) and defining ω can be further simplified by noting that the
above quantities are simply ratios and proportions of the values of interest. Specifically,
defining r as the odds ratio of having a response under HA relative to H0, r = p1(1 − p0)/(p0(1 − p1)),
and r0 and r1 as the proportion of responses under H0 and HA respectively, r0 = p01/p0 and
r1 = p11/p1, then ω depends on (p0, p1, p01, p02, p11, p12) only through (r, r0, r1). Thus

ω = (θ − µ)/(θ − ν) = [log(r) − log(r0/r1)] / [log(r) − log((1 − r0)/(1 − r1))] = log(r r1/r0) / log[r(1 − r1)/(1 − r0)].   (3.19)
Limiting ω > 1 restricts attention to the situation where a complete response is deemed
more important than a partial response (or equivalently, where a response is deemed more
important than a stable disease). By defining p0, p1, r0 and r1 there is then a unique ω,
which Lin and Chen call the likelihood ratio weight; it is the increased weight associated
with a patient having a complete response compared with a partial response.
To design a clinical trial, one would accrue n1 patients in the first stage and n2 in the
second stage for a total sample size of n = n1 + n2. A weighted score s is calculated
as ω times the number of complete responses plus the number of partial responses ob-
served (i.e. s = ω·x1 + x2). One simply needs to find critical values at the end of
the first stage, s1, and at trial termination, s, such that the error rates α and β are satis-
fied given (n1, n, s1, s, ω, p0, p1, r0, r1). Lin and Chen searched over all possible values of
(n1, n, s1, s, ω) for selected (p0, p1, r0, r1) to find optimal and minimax designs. However,
a more realistic approach may be to fix n1 and n prior to the start of a trial based on
practical concerns.
To use this design in our example, set p0 = 0.25, r0 = 1/5, p1 = 0.50, r1 = 3/10, where
r0 and r1 are chosen following the recommendation of Lin and Chen. Then
r = p1(1 − p0)/(p0(1 − p1)) = 3, and from equation (3.19) we have:

ω = log(r r1/r0) / log[r(1 − r1)/(1 − r0)] = log[3(3/10)/(1/5)] / log[3(7/10)/(4/5)] = log(4.5)/log(2.625) = 1.56.
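Equation (3.19) reduces the weight to the three summaries (r, r0, r1); a small Python sketch (the function name is mine):

```python
from math import log

def lr_weight(p0, p1, r0, r1):
    """Likelihood ratio weight omega of equation (3.19): p0, p1 are the
    good-outcome rates under H0 and HA; r0, r1 the proportions of those
    good outcomes that are responses."""
    r = p1 * (1.0 - p0) / (p0 * (1.0 - p1))   # odds ratio under HA vs H0
    return log(r * r1 / r0) / log(r * (1.0 - r1) / (1.0 - r0))
```

For the example above, `lr_weight(0.25, 0.50, 1/5, 3/10)` gives 1.56; `lr_weight(0.4, 0.6, 1/8, 1/6)` gives the 1.44 quoted later for the slow-growing tumour scenario, and setting r0 = r1 returns ω = 1, recovering the unweighted Simon-type design.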
If n1 = 15, n2 = 15 is fixed, there are still infinitely many valid s1, s, which can be calculated using
a program provided by the authors. One such design which satisfies the α and β errors
is to accept H0 at the end of stage 1, terminate the trial and declare the treatment as
uninteresting if s1 ≤ [5.12, 5.56). This corresponds to accepting H0 if there are 0 responses
and 5 or less stable diseases, 1 response and 3 or less stable disease or 2 responses and
0 stable diseases. At the end of stage 2, one would reject H0 and declare the treatment
of interest if one observes s ≥ (11.12, 11.24], which corresponds to observing 0 responses
and 12 or more stable diseases, 1 response and 10 or more stable diseases, 2 responses
and 9 or more stable diseases, 3 responses and 7 or more stable diseases, 4 responses and
5 or more stable diseases, 5 responses and 4 or more stable diseases, 6 responses and 2 or
more stable diseases, 7 responses and 1 or more stable disease, or 8 or more responses.
This is summarised in Table 3.3.
  Stage 1                           Stage 2
  s1            n1   Response \ SD   s               n    Response \ SD
  [5.12, 5.56)  15   0 \ 5           [11.12, 11.24)  30   0 \ 11
                     1 \ 3                                1 \ 9
                     2 \ 0                                2 \ 8
                                                          3 \ 6
                                                          4 \ 4
                                                          5 \ 3
                                                          6 \ 1
                                                          7 \ 0
Table 3.3: Acceptance Region for Hypothetical Trial using Lin and Chen Design [14],
comparing H0 : RR = 0.05 and SD = 0.25 versus HA : RR = 0.15 and SD = 0.50
While the value of ω is defined based on the values (p0, p1, r0, r1), one could arbitrarily
set ω to be any value identified by the investigators. For example, an investigator may
deem that a prolonged stable disease is equally important as a response, and thus set ω =
1. In this case, the optimal designs are just those as shown by Simon [9]. Alternatively, if
the tumour is a slow-growing tumour and the possibility of a response is extremely rare,
investigators might expect a lot of patients with stable disease. The investigators might
wish to design a study based on p0 = 0.4, r0 = 1/8, p1 = 0.60, r1 = 1/6, which would
result in ω = 1.44. However, the investigators might arbitrarily assign ω = 2, 3, 4 since
they subjectively value a response much more than 1.44 times a stable disease, or,
similarly, they may believe that 2 responses (score s = 2.88 under ω = 1.44) should be
more interesting than 3 stable diseases (score s = 3), which this ω does not reflect.
There are problems with this method. First, the design fails to capture the extreme
case where the experimental treatment slows progression of the tumour without increasing
the number of responses. Here the proportion of responses might be assumed to be less
than what would occur under the HA distribution when the treatment is active, resulting
in r0 > r1 and ω < 1, which is not defined in the design. Second, the treatment may
produce responses amongst a certain subset of the population and have no effect on
others, so that the number of responses changes but the number of stable diseases does
not. Referring again to our example and Table 3.3, one sees that 7 responses and 0
stable diseases would result in accepting H0, thus deeming the treatment uninteresting.
This is a response rate of 23%, which would be of considerable interest since the
hypotheses were based on comparing response rates of 5% with 15%. Thus, the design fails
in the extreme cases, which is precisely why investigators would be interested in a
multinomial design (where one or the other outcome happens, but not both). Third, only
optimal designs are described in the paper, and, although possible, it is not a
straightforward procedure to calculate design parameters if accrual targets are not met.
This is even more pronounced at the end of stage 1, where more than one design might be
possible, and continuation or stopping of the trial might depend on which design one
chooses.
Alternatively, one advantage of this design as compared with the other multinomial
designs is the ease of calculating and interpreting p-values and, to a lesser degree,
confidence intervals. A p-value P(S > s | H0) = 1 − P(S ≤ s | H0) can be calculated with
relative ease, and one would simply find bounds (LL, UL) such that
P(S > s | x) ≥ α ∀ x ∈ (LL, UL) to obtain a (1 − α)100% confidence interval. Due to the
discreteness of the trinomial distribution, this would be an approximate (1 − α)100%
confidence interval.
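The tail probability above can be computed by direct enumeration of the trinomial outcomes. The following is a minimal sketch (the thesis software is written in R; this Python version, and the function and argument names in it, are illustrative only), assuming a score of the form S = ω·(responses) + (stable diseases):

```python
from math import comb

def score_pvalue(n, p_resp, p_sd, omega, s_obs):
    """P(S > s_obs | H0) for the weighted score S = omega*X + Y, where
    (X, Y) = (responses, stable diseases) is trinomial(n; p_resp, p_sd)."""
    p_other = 1.0 - p_resp - p_sd
    total = 0.0
    for x in range(n + 1):               # number of responses
        for y in range(n - x + 1):       # number of stable diseases
            if omega * x + y > s_obs:
                total += (comb(n, x) * comb(n - x, y)
                          * p_resp**x * p_sd**y * p_other**(n - x - y))
    return total
```

For instance, with ω = 1 the score simply counts patients with either outcome, so the tail probability reduces to an ordinary binomial tail.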
3.4 Using Finite Markov Chain Imbedding
While classical group sequential methods can be used for many of these designs, particularly
when the primary outcome is simple and straightforward, Markov chains might be
beneficial for designs in which the primary outcome is more complex. For example,
when the primary outcome is binomial, such as response, exact calculations are fairly
straightforward. Thus, the Simon designs, or even the Fleming design, which is based on
asymptotic outcomes, are easily calculable. When the outcomes become more complicated,
it is more difficult to calculate probabilities; the Zee and Panageas designs
demonstrate this. Determining what is more extreme is not simple, and the available
computer programs only cover the optimal cases. When trials occur and accrual targets
are not met, or the extreme cases are observed, the required statistics cannot be
calculated by these programs. Thus, substantial additional work would be needed to
obtain the values of interest.
Further, in situations where one puts different emphasis on different outcomes, Markov
chain methods would be superior to classical methods. One could weight a subject's
outcomes such that a response is weighted at a certain level, say x; a stable disease is
assigned a weight of y; but stable disease for 3 or more consecutive observations is
assigned a weight of z. At an interim analysis, a subject may have only stable disease,
but later develop a response or stable disease for 3 or more consecutive observations.
Thus, a subject's status at a later analysis might change from their status at an
earlier analysis. This could be difficult for classical methods, but is less difficult
for Markov chain methodology: the probability that a subject will transition from one
state to another at a future evaluation can be estimated from the transitions of the
subjects who are further ahead in their treatment. In situations where one is unsure of
the weights to assign to different states, and where one wants to explore a variety of
weighting schemes, exact calculations could be quite tedious, whereas Markov chain
methods require only a simple modification.
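The weighting scheme just described can be sketched as a simple per-subject scoring rule. The weights and state labels below are illustrative placeholders (the thesis software is in R; this is a hedged Python sketch), with the x, y and z of the text mapped to w_resp, w_sd and w_prolonged_sd:

```python
def weighted_score(history, w_resp=2.0, w_sd=1.0, w_prolonged_sd=1.5):
    """Score one subject's evaluation history (a list of 'R'/'SD'/'PD'/'C').
    A response dominates; 3+ consecutive SD observations earn the
    prolonged-SD weight; any other SD earns the plain SD weight."""
    if "R" in history:
        return w_resp
    run = best = 0
    for state in history:
        run = run + 1 if state == "SD" else 0
        best = max(best, run)
    if best >= 3:
        return w_prolonged_sd
    return w_sd if best >= 1 else 0.0
```

Re-scoring the same histories under a different (w_resp, w_sd, w_prolonged_sd) triple is then a one-line change, which is the flexibility argued for above.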
Chapter 4
Examples and Simulation Set-up
An illustration of the methods proposed in this thesis is crucial for a thorough
understanding of the pros and cons of the use of Markov chains. For illustration purposes,
a previously performed trial is analysed using finite Markov chain imbedding methods.
The trial was chosen because of its controversial nature and the ambiguity of its
final results. Different investigators hold different beliefs about the future drug
development of the agent investigated in this trial, as discussed below, and it is in
this context that finite Markov chain imbedding is felt to be of most benefit.

To illustrate these methods, a simulation study was performed. Although the exact
probabilities can be calculated using distributional theory, the use of a simulation study
allowed for investigation of many different designs, outcomes and assumptions
simultaneously. The primary strength of finite Markov chain imbedding is the flexibility
it allows, in that one is able to investigate multiple designs, outcomes and assumptions
at the same time. This flexibility allows greater understanding of the data and promotes
agreement between investigators by showing the results that would be obtained under
different scenarios. The simulation is facilitated by flexible statistical code which
can be run on most computers in a relatively short time. This chapter describes the
clinical scenario of interest and the simulation that was performed.
4.1 Phase II Clinical Trial of CCI-779 (temsirolimus)
in Neuroendocrine Carcinoma
Thirty-seven patients were accrued to a multi-centre, single-arm phase II clinical trial of
patients with neuroendocrine carcinoma, including both pancreatic islet cell and carcinoid
histologies. Results have recently been published [72]. This study was chosen as an
example particularly because of the complexity of the disease, the controversy surrounding
the determination of efficacy in this trial, highlighted by two letters to the editor
following the trial publication [87] [88], and the failure of the statistical design to
adequately assess potential drug activity.

4.1.1 Trial Description

In this trial, patients were treated with a novel MTA (temsirolimus), which had
previously been studied in a number of phase I trials and in phase II trials in other
disease sites. The safety profile of the MTA was believed to be satisfactory, and there
was promising evidence of anti-tumour activity in neuroendocrine carcinoma. Patients
were treated in an outpatient setting, receiving once-weekly doses of the treatment via
a 30-minute infusion. A cycle of treatment was defined as 28 days; thus, a patient
received four treatments per cycle. Patients were to continue treatment until disease
progression, withdrawal of consent, severe adverse event, or removal from study at
physician discretion. At the time of study publication, 5 patients remained on study and
were still being treated. As is common with this disease, progression is relatively slow,
with 48% of patients progression-free at 6 months and over 70% alive at 1 year after
treatment start. Thus, time to progression and overall survival are generally considered
poor primary endpoints for phase II trials, given the necessity of keeping trials
relatively short (total trial duration from start of accrual to publication for this
study was around 2 years).
The primary efficacy analysis was based on best objective tumour (partial or complete)
response as defined by the RECIST criteria [17]. The study was designed with
the primary endpoint, response rate [RR], using H0: RR = 0.05 versus HA: RR = 0.25, error
limits of α = 0.05 and β = 0.10, and a modified version of the Simon minimax design
[9]. Accrual was conducted in two stages, with an interim analysis planned after 15
patients were accrued. The design specified that if 2 or more objective responses were
observed amongst the first 15 patients, accrual to stage 2 was to be conducted. If 1 or
fewer responses were observed, accrual was to be terminated and the treatment declared
inactive. In stage 2, a total of 30 patients were to be accrued, with non-rejection of H0
(declaring the treatment inactive) if 3 or fewer responses were observed, and rejection
of H0 (declaring the treatment of interest for further study) if 4 or more responses were
observed of the total 30 patients.
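The stated decision rules can be checked with exact binomial calculations. A Python sketch follows (the thesis software is in R; function names here are illustrative, while the design constants n1 = 15, r1 = 1, n = 30, r = 3 are those stated in the protocol):

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def two_stage_reject_prob(p, n1=15, r1=1, n=30, r=3):
    """P(reject H0): continue only if stage-1 responses exceed r1,
    then reject H0 if total responses over n patients exceed r."""
    n2 = n - n1
    total = 0.0
    for x1 in range(r1 + 1, n1 + 1):                    # continue to stage 2
        p_stage2 = sum(binom_pmf(x2, n2, p)
                       for x2 in range(max(0, r + 1 - x1), n2 + 1))
        total += binom_pmf(x1, n1, p) * p_stage2
    return total
```

Evaluating at p = 0.05 gives the design's attained type I error and at p = 0.25 its power, which should be approximately the protocol's α = 0.05 and 1 − β = 0.90.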
4.1.2 Trial Results
After the first 15 patients were accrued, an interim analysis was conducted: 0 patients
had a PR, but 8 patients had SD, a number of whom had prolonged SD (defined
as a patient not having progressed after the completion of cycle 6). It is interesting to
note that the last patient accrued had SD at the time of the interim analysis but later
developed a PR in cycle 6. The number of patients having SD greatly exceeded
expectations, and some of the patients with SD had had severe worsening of disease prior
to trial entry, so the investigators overruled the statistical design and accrual
continued to stage 2. At study completion, 37 patients had been accrued, but one patient
was ineligible due to rapidly progressing disease and died prior to receiving any
treatment. Thus, a total of 36 patients were evaluable, 6 more than initially planned,
of whom 3 patients had PR, 20 patients had SD, 8 patients had PD, and 5 patients were
inevaluable (due to severe adverse events occurring prior to the first objective
post-treatment tumour evaluation).
The authors concluded that "temsirolimus appears to have only modest
activity..." and that "the results of this study do not warrant further investigation of
this drug as a single agent in this patient population." However, they state that
"evaluation of temsirolimus, in combination with other targeted agents ... should be
considered" [72]. In a letter to the editor, O'Donnell and Ratain argue that the results
"...suggest drug activity beyond the natural course of the disease" [87] and state that
it is not the single-agent drug which should be abandoned, but the trial design. In
response, the authors defend the use of single-arm trials [88], and there remains
considerable discussion about the usefulness of single-arm trials.
Statistically, the issue is as follows. The statistical design is based on a single
endpoint, response rate, and on hypotheses believed to be of interest. Using frequentist
methodology, one is not able to deviate from this trial design, which is based on a
single primary outcome; however, when evaluating a trial for potential efficacy,
clinicians and researchers evaluate all outcomes, including secondary outcomes. Thus,
although statistically one must reject the alternative hypothesis and deem the treatment
not of interest for further study, the apparent efficacy of the treatment based on
secondary outcomes may still be intriguing to researchers. That secondary outcomes are
important is demonstrated clearly by the experience with sorafenib in renal cell cancer
[89]: sorafenib was approved by the FDA, Health Canada and other agencies for treatment
of renal cell cancer on the basis of prolonged disease stabilisation even though the
primary outcome, response rate, was very low (< 5%).
4.1.3 Note regarding response rates
It is noted that the number of confirmed responders does not equal the number of
responses reported in the published manuscript [72]. This is because the data used for
this analysis were obtained after the data used in the manuscript, and in the interim
one patient with SD developed a PR. Thus, this PR is included in the thesis results but
not in the manuscript results. Further, in the manuscript, only 3 patients are listed as
censored and 10 patients as having PD, whereas in this analysis there are 5 patients
censored and 8 with PD. The reason is that 2 patients did not have an on-study objective
response evaluation and are thus inevaluable as per the RECIST criteria; however, they
did fail treatment, as one had symptomatic progression and the other died of disease
before their first objective evaluation. Thus, for simplicity, this analysis is
performed using their objective tumour measurements, whereas in the published
manuscript they are considered as having PD.
4.2 Implementation of Markov Chain Methods
One of the key components of phase II clinical trials in oncology is that patients have
their tumour burden measured at regular intervals — in this trial, this was to be
conducted after every 2 cycles of treatment, or approximately every 56 days. Response
was defined as per the RECIST criteria, briefly outlined in subsection 2.5.3. Briefly,
patients are classified based on the growth of the tumour as having either complete
response [CR], partial response [PR], stable disease [SD] or disease progression [PD].
Some patients may be removed from the study for other practical reasons, such as
withdrawal of consent, adverse events, or the discretion of the treating physician.
These patients are thus censored [C] in terms of their response status. According to the
RECIST criteria, one must measure a response, followed by a subsequent confirmation
measurement, to have a declared objective response. In terms of a Markov chain, this
means a patient must transition into an unconfirmed response [UR] state and then
transition into a confirmed response state. Finally, due to the small number of complete
responses observed in cancer clinical trials, the CR and PR states are generally
combined into a single response [R] category.
4.2.1 RECIST criteria
One can design a transition matrix to describe the potential transitions allowable
under the RECIST criteria; this was described earlier and shown in matrix 2.8. Since
complete and partial responders are combined for this study, the transition matrix
reduces to matrix 2.9. Here, all patients commence treatment in the ∅ state. This is
done for practical reasons: for the neuroendocrine cancer trial, pre-treatment response
status was not measured — it is generally assumed most patients will be progressing,
hence the need to enter a clinical trial. However, if one has the data, a generalisation
is possible for trials in which patients may enter in different states, not necessarily
the ∅ state, which would then reduce the transition matrix even further.
At the first tumour measurement (cycle 2), patients transition from the ∅ state to one
of the other states. Since this is the first measurement, patients cannot have a
confirmed response, but can only transition into the unconfirmed response state.
Similarly, patients cannot have SD and be off-treatment simultaneously; thus, the
transition from ∅ to this state has probability 0. At each subsequent tumour
measurement, further transitions are possible. A patient presently in the UR state can
only transition to the confirmed response state, or go off-study with a best response of
SD (a patient whose best objective response is an unconfirmed response is deemed as
having SD only). Patients in the SD state can transition into the UR state, remain in
the SD state, or be removed from treatment with a best response of SD (i.e. state SDoff).
The other states, R, SDoff, PD and C, are all absorbing states: once entered, a patient
can never leave this state since one evaluates the best objective response observed.
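These allowed transitions can be encoded directly. The sketch below simulates a single patient's path; the transition probabilities are illustrative placeholders, not estimates from the trial, and the thesis implementation itself is in R:

```python
import random

# Illustrative transition probabilities (placeholders, not trial estimates).
# On-study states: start (the ∅ state), UR, SD; absorbing: R, SDoff, PD, C.
P = {
    "start": {"UR": 0.05, "SD": 0.55, "PD": 0.25, "C": 0.15},
    "UR":    {"R": 0.70, "SDoff": 0.30},   # confirm, or off-study with SD
    "SD":    {"UR": 0.05, "SD": 0.60, "SDoff": 0.15, "PD": 0.15, "C": 0.05},
    "R":     {"R": 1.0},
    "SDoff": {"SDoff": 1.0},
    "PD":    {"PD": 1.0},
    "C":     {"C": 1.0},
}

def simulate_path(n_evals, rng=random):
    """One patient's sequence of states over n_evals tumour evaluations."""
    state, path = "start", []
    for _ in range(n_evals):
        nxt = rng.choices(list(P[state]), weights=list(P[state].values()))[0]
        path.append(nxt)
        state = nxt
    return path
```

Simulating many such paths and tabulating the final states gives the end-state counts used in the analyses below.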
Patient data for the neuroendocrine trial are given in Appendix A, and are listed by the
appropriate state space under the RECIST criteria in Appendix B. Patient 021-001 had
baseline tumour lesions summing to 457 mm, with subsequent measurements of 456 mm,
429 mm, 426 mm, 423 mm and 444 mm. At the last evaluation, new lesions were discovered,
so the patient was classified as having PD. Thus, this patient started in the ∅ state
and transitioned to state SD, where they remained until the final measurement (5th
evaluation), when they transitioned to state SDoff. Conversely, patient 021-015 had
baseline tumour lesions summing to 194 mm, followed by measurements of 161 mm, 146 mm,
136 mm, 135 mm, 125 mm and so on, until the 20th cycle, when they finally had disease
progression. In terms of Markov chains, this patient transitioned from the ∅ state to
state SD at the first evaluation, since they had only 1 − 161/194 = 17% shrinkage. They
remained in SD at the next transition, with 25% shrinkage, finally achieving an
unconfirmed response at evaluation 3 with 30% shrinkage, thus transitioning from SD to
UR. This response was confirmed at the next evaluation, so the patient transitioned from
UR to R. Having a confirmed response, this patient remains in this absorbing state.
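The per-evaluation arithmetic in these examples (percent change of the lesion sum, with a response at roughly 30% shrinkage from baseline and progression at roughly 20% growth from the nadir) can be sketched as follows. This is a simplified, RECIST-style rule only: confirmation, new lesions and the trial's exact rounding conventions are not handled:

```python
def classify(baseline, sums):
    """RECIST-style label for each follow-up sum of lesion diameters.
    PD: >= 20% growth from the nadir; PR: >= 30% shrinkage from baseline;
    otherwise SD. New lesions and response confirmation are not modelled."""
    states, nadir = [], baseline
    for s in sums:
        if s >= 1.2 * nadir:          # progression takes precedence
            states.append("PD")
        elif s <= 0.7 * baseline:
            states.append("PR")
        else:
            states.append("SD")
        nadir = min(nadir, s)
    return states
```

For patient 021-015's first two follow-up sums (161 mm and 146 mm against a 194 mm baseline), this rule yields SD at both evaluations, matching the description above.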
A summary of transitions for all patients over the first 5 evaluations is found in Table
D.1 in the Appendix. Of all 36 patients, 1 transitioned at the first evaluation from ∅ to
UR, 22 went from ∅ to SD, 8 had immediate PD (∅ to PD) and 5 were censored with no
objective measurement; this is described in the first row of Table D.1. At the next
evaluation, the patient with UR transitioned to a confirmed response, and 6 patients
with SD came off-study, thus transitioning to SDoff (2nd row). The end-state proportions
are described in Table D.2: three patients had a response, 7 patients remained on-study
after the 4th evaluation with stable disease, 12 patients were off-study with stable
disease as their best response, 8 patients had PD and 5 were censored.
Similar results for the first 15 patients, corresponding to the time of the interim
analysis, are found in the same two Tables D.1 and D.2 in the rows titled "Interim". Of
the 15 patients, 8 had an initial evaluation of SD, 6 had PD and 1 was censored. The
only patient with a response amongst the first 15 first had UR at evaluation 3, shown in
the row titled "Interim eval 3", and finally transitioned to a confirmed response at
evaluation 4, as shown in the row titled "Interim eval 4". The end-state probabilities
for states R, UR, SD, SDoff, PD and C are 1/15, 0, 3/15, 4/15, 6/15 and 1/15.
The tabular format of displaying results will be continued throughout the remainder
of this chapter and in the reporting of results.
4.3 Simulation
All statistical analyses in the simulation were performed using R version 2.1.1
(http://www.r-project.org) on a personal computer with an Intel Pentium 4 CPU running at
3.20 GHz (3192 MHz) under the Microsoft Windows XP Professional operating system,
version 5.1 (Microsoft Corporation, Redmond, WA). Each individual calculation (p-value
or conditional power) took less than 20 seconds.
P-values were calculated as if at the time of study completion, that is, assuming all 36
patients had been accrued. One thousand simulations were performed using transition
matrices defined under H0, and for each outcome, the number of simulations in which the
simulated number of patients in the end-states of interest was greater than or equal to
the actual trial number was counted. The p-value was simply this count divided by the
number of simulations. For example, using the RECIST criteria, the end-state of interest
would be confirmed objective response, and the p-value is the proportion of simulations
under H0 with 3 or more objective responses; the number 3 reflects that, at the end of
the actual trial, 3 patients had a confirmed response.
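As a sketch of this procedure, reduced to a single end-state probability under H0 (the thesis code is in R and works with full transition matrices; the Python below and its names are illustrative):

```python
import random

def sim_pvalue(n_patients, observed, p_endstate, n_sims=1000, rng=random):
    """Proportion of H0 simulations with at least `observed` patients in
    the end-state of interest, here reduced to one end-state probability."""
    hits = 0
    for _ in range(n_sims):
        count = sum(rng.random() < p_endstate for _ in range(n_patients))
        if count >= observed:
            hits += 1
    return hits / n_sims

random.seed(42)
pval = sim_pvalue(36, observed=3, p_endstate=0.05)  # e.g. P(R) = 0.05 under H0
```

With 1,000 simulations, the Monte Carlo standard error of a p-value near 0.25 is about 0.014, which motivates the comparison runs with 500 simulations mentioned below.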
Conditional power estimates were calculated as if analysed at the interim analysis, that
is, after 15 patients had been accrued and partially evaluated, with an expected total
of 36 patients. Thus, the conditional power is estimated based on the available data for
15 patients and the assumption that a further 21 patients would be accrued. One thousand
simulations of 36 patients were performed, in which the data comprised the actual
observed data for the first 15 patients and simulated data for the additional 21
patients. The assumed future data were generated in two ways: first, by assuming future
data followed some hypothesised distribution HA, and second, by assuming that future
data followed a distribution similar to the results of the 15 patients already observed.
Since the simulation allowed for different outcome definitions in different scenarios,
both H0 and HA had to be defined individually for each simulation scenario; defining
these distributions is described in the next subsection. For each simulation, a p-value
was calculated, resulting in 1000 p-values, and the conditional power was the proportion
of p-values ≤ 0.05. Additional simulations were performed using 500 simulations for
comparison, and with the sample size changed from 36 to 54.
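A reduced sketch of this conditional power calculation, using a single binomial end-state so that each completed trial's p-value is an exact binomial tail under H0 (illustrative Python; the thesis implementation is in R and simulates full state paths):

```python
import random
from math import comb

def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k, n + 1))

def conditional_power(obs_resp, n_obs, n_total, p_future, p_null,
                      alpha=0.05, n_sims=1000, rng=random):
    """Simulate future patients under an assumed response rate; the exact
    binomial tail under H0 supplies each completed trial's p-value."""
    n_future = n_total - n_obs
    hits = 0
    for _ in range(n_sims):
        future = sum(rng.random() < p_future for _ in range(n_future))
        pval = binom_tail(obs_resp + future, n_total, p_null)
        hits += pval <= alpha
    return hits / n_sims
```

Here conditional_power(1, 15, 36, p_future=0.25, p_null=0.05) corresponds to assuming future patients respond at the HA rate; replacing p_future with the observed interim rate gives the second variant described above.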
4.3.1 Estimating H0 and HA
As in any research study, the definition of H0 and HA is crucial; in practice, however,
these definitions are often based on the subjective beliefs of the investigators. It is
acknowledged that the investigators defining these hypotheses are experts in their
fields and have substantial experience treating the type of patient who will be enrolled
in these studies; nevertheless, the definition of these hypotheses remains subjective,
and different investigators may believe different hypotheses. For example, although a
new treatment might be of clinical interest if its response rate were 10%, where present
treatments have response rates of < 5%, in practice one might use H0: RR = 0.05 vs
HA: RR = 0.20 to keep the sample size reasonable. This reflects practical limitations, a
desire to perform trials within 1-2 years, and the view of some that it is ethical to
restrict the number of patients exposed. Thus, a range of hypotheses is needed to fully
understand the results of this simulation study, particularly since the methods used are
novel and the endpoints are sometimes defined differently than in previous methods (e.g.
response status is assessed at a specific time).
Initial hypotheses for the simulation study were defined based on the hypotheses in
the protocol, aiming to have end-state probabilities approximately equal to the initially
defined hypotheses, and in consultation with a clinical oncologist familiar with the
study. Afterwards, the hypotheses were subjectively varied based on clinical experience
and consultation with oncologists experienced in phase II clinical trial methodology,
with the aim of investigating the effect of minor changes in the transition matrices,
and hence minor changes to the competing hypotheses. For simplicity, models were most
often assumed to have time-independent transition matrices; however, the ease of using
time-dependent transition matrices is demonstrated in some cases.
4.4 Models Investigated in Simulation
4.4.1 RECIST model
The RECIST model, as described by transition matrix 2.9, was investigated. This
analysis corresponds to the best objective response observed while on treatment.
Transitioning between states is very irregular, as most patients do not improve after
the first transition. In fact, in this study only 3 patients (021-015, 021-022 and
022-029) had a transition after the first transition (excepting transitions from an
unconfirmed response to another state) under this model. Patients who have stable
disease thus cannot end up in a worse state than the stable disease state, even when
they progress. Consequently, there is no difference between a patient who has stable
disease and then immediately progresses and one who has prolonged stable disease. Since
molecularly targeted agents are often believed to be cytostatic, there is a major
clinical difference between a patient who progresses early after having stable disease
and one who progresses many months later. This is one area where present designs fail.
To demonstrate the flexibility of finite Markov chain imbedding, it is important to
model the standard RECIST criteria, and this is done here, but the power of finite
Markov chain imbedding can only be demonstrated in the other contexts described below.
4.4.2 RECIST model evaluating outcomes at different transition times
Since the best objective response using RECIST is determined primarily by the first
transition, a transition time-important RECIST transition matrix, shown in matrix 4.1,
was assumed to demonstrate the power of finite Markov chain imbedding. By
time-important, it is meant that the timing (number of transitions) of the evaluation is
important. With the RECIST criteria, the only change in results occurs if patients
transition from SD to R; thus, the time at which the analysis is conducted has little
effect. In other words, if no patient has a late response, say after evaluation 3, then
the results will be the same regardless of the timing of any analysis conducted after
evaluation 3. However, there might be vastly different interpretations of the results
comparing a trial in which all patients with SD had PD by the 4th evaluation with one in
which a number remained on-study with SD for 10 or more evaluations.
M =
          ∅    R        UR        SD        PD        C
    ∅     0    0        p∅−r      p∅−sd     p∅−pd     p∅−c
    R     0    1        0         0         0         0
    UR    0    pur−r    0         0         pur−pd    pur−c
    SD    0    0        psd−ur    psd−sd    psd−pd    psd−c
    PD    0    0        0         0         1         0
    C     0    0        0         0         0         1
                                                          (4.1)
This transition matrix was considered the primary transition matrix of interest;
subsequent evaluations were performed in which the null hypothesis was modified slightly
and the number of evaluations prior to analysis was increased and decreased. These
adjustments allow investigation of the robustness of the methods.

A further modification excludes response as an absorbing state, since it could be argued
that duration of response is also important; the corresponding transition matrix (4.2)
was also constructed and analysed.
M =
          ∅    R        UR        SD        PD        C
    ∅     0    0        p∅−r      p∅−sd     p∅−pd     p∅−c
    R     0    pr−r     0         0         pr−pd     pr−c
    UR    0    pur−r    0         0         pur−pd    pur−c
    SD    0    0        psd−ur    psd−sd    psd−pd    psd−c
    PD    0    0        0         0         1         0
    C     0    0        0         0         0         1
                                                          (4.2)
4.4.3 Transition Matrices Based on Immediate Changes
Given that Markov chain methodologies incorporate time in their evaluations, a transition
matrix was constructed based on immediate changes. This is potentially the most
beneficial use of Markov chains. Patients transitioned based on the change in tumour
size compared with the previous evaluation: a patient with shrinkage of 5% or more
relative to the previous evaluation was considered as having a response [R] at that
transition; a patient with growth of 5% or more was deemed as having tumour progression
[PD]; a patient with less than 5% growth and less than 5% shrinkage was considered as
having stable disease [SD]; and patients removed from the study for any reason entered
the off-study state [Off]. This transition matrix is shown in matrix (4.3). The timing
of the analysis (i.e. after which evaluation) then becomes a concern, and this is
investigated, as is the effect of modifying the definition of response/progression from
5% to, say, 10%. One could also eliminate the stable disease state and define response
as any shrinkage or no change, and progression as any growth whatsoever, as in matrix
(4.4). This type of transition matrix represents immediate changes and demonstrates
whether a treatment remains active at a given time.
M =
          ∅    R        SD        PD        Off
    ∅     0    p∅−r     p∅−sd     p∅−pd     p∅−o
    R     0    pr−r     pr−sd     pr−pd     pr−o
    SD    0    psd−r    psd−sd    psd−pd    psd−o
    PD    0    ppd−r    ppd−sd    ppd−pd    ppd−o
    Off   0    0        0         0         1
                                                          (4.3)

M =
          ∅    R        PD        Off
    ∅     0    p∅−r     p∅−pd     p∅−off
    R     0    pr−r     pr−pd     pr−off
    PD    0    ppd−r    ppd−pd    ppd−off
    Off   0    0        0         1
                                                          (4.4)
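The immediate-change rule is a one-line classification, and making the threshold a parameter gives the 10% variant mentioned above for free. An illustrative Python sketch (the thesis software is in R; names here are hypothetical):

```python
def immediate_state(prev_sum, curr_sum, threshold=0.05):
    """Classify change versus the previous evaluation: R (shrinkage of at
    least `threshold`), PD (growth of at least `threshold`), else SD."""
    change = (curr_sum - prev_sum) / prev_sum
    if change <= -threshold:
        return "R"
    if change >= threshold:
        return "PD"
    return "SD"
```

Setting threshold=0.0 collapses the SD state, corresponding to matrix (4.4).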
4.4.4 Transition Matrices with Different Positive Outcomes
Instead of counting any stable disease, one might define a positive result as prolonged
SD, where prolonged means some set duration of time, e.g. a patient remaining in SD for
3 (see matrix (4.5)) or 4 (see matrix (4.6)) consecutive transitions. Patients can
transition from PD to the first SD state, then to the second SD state, and then to the
third SD state. Patients can also transition into state R at any time upon having a
confirmed response.
M =
          ∅    R        SD3        SD2        SD1       PD
    ∅     0    0        0          0          p∅−sd1    p∅−pd
    R     0    1        0          0          0         0
    SD3   0    psd3−r   psd3−sd3   0          0         psd3−pd
    SD2   0    psd2−r   psd2−sd3   0          0         psd2−pd
    SD1   0    psd1−r   0          psd1−sd2   0         psd1−pd
    PD    0    0        0          0          ppd−sd1   ppd−pd
                                                          (4.5)

M =
          ∅    R        SD4        SD3        SD2        SD1       PD
    ∅     0    0        0          0          0          p∅−sd1    p∅−pd
    R     0    1        0          0          0          0         0
    SD4   0    psd4−r   psd4−sd4   0          0          0         psd4−pd
    SD3   0    psd3−r   psd3−sd4   0          0          0         psd3−pd
    SD2   0    psd2−r   0          psd2−sd3   0          0         psd2−pd
    SD1   0    psd1−r   0          0          psd1−sd2   0         psd1−pd
    PD    0    0        0          0          0          ppd−sd1   ppd−pd
                                                          (4.6)
Alternatively, one might require consecutive minor shrinkages of, say, 5% or more for a
positive result (see matrix (4.7)); patients with prolonged stable disease would then
not be of interest. This model is relevant if one assumes that drug activity corresponds
to some shrinkage, and it eliminates the vagaries which can occur for a very slow-growing
tumour that might not appear to be growing on consecutive evaluations. One might then
regard a response, or 2 consecutive minor shrinkages of, say, 5% or more, as an
indicator of activity.
M =
          ∅    R        MR2        MR1        SD        PD
    ∅     0    0        0          p∅−mr1     p∅−sd     p∅−pd
    R     0    1        0          0          0         0
    MR2   0    pmr2−r   pmr2−mr2   0          0         0
    MR1   0    pmr1−r   pmr1−mr2   0          pmr1−sd   pmr1−pd
    SD    0    0        0          psd−mr1    psd−sd    psd−pd
    PD    0    0        0          0          0         1
                                                          (4.7)
4.4.5 Multi-binomial transition matrices
While multinomial outcomes measuring the same quantity, such as response or stable
disease, are analysed using the previous matrices, one might also be interested in
multiple outcomes which are not measures of the same thing, i.e. multiple binomial
outcomes, such as toxicity and tumour size. For example, a treatment may be of interest
only if the number of responses is high and the number of adverse events is low.
Additional outcomes might include overall survival, time to progression, a molecular
marker (PSA or CA125), or even a quality-of-life indicator. A model which might
represent this is matrix (4.8). For simplicity, the unconfirmed response state is
dropped, and patients are classified as being in one of the following states: response
with no toxicity [R], stable disease with no toxicity [SD], off-study without having had
a response or toxicity [Off], off-study due to toxicity [Tox], or off-study due to
toxicity after a prior response [R & Tox]. The number of patients with response is then
R + (R & Tox), and the number of patients with toxicity is Tox + (R & Tox).
M =
             ∅    R        SD        Off       Tox       R & Tox
    ∅        0    0        p∅−sd     p∅−off    p∅−tox    0
    R        0    pr−r     0         0         0         pr−rtox
    SD       0    psd−r    psd−sd    psd−off   psd−tox   0
    Off      0    0        0         1         0         0
    Tox      0    0        0         0         1         0
    R & Tox  0    0        0         0         0         1
                                                          (4.8)
4.5 Calculation of p-values
The exact distribution of a random variable which can be imbedded into a finite Markov
chain is

    P(Xn,k = x) = π0 ( ∏_{t=1}^{n} Λt ) U′(Cx)                    (4.9)

where π0 is the initial probability vector of the Markov chain, Λt is the transition
matrix for the t-th step, and U′(Cx) defines a proper partition of the state space for
calculating the probability of interest (2.7). A p-value is the probability of
observing, under H0, results as extreme as or more extreme than the observed data. Thus,
in the simulation, a p-value is obtained by calculating ∑_{x=xd}^{∞} P(Xn,k = x | H0)
after n transitions, where xd is the observed number of patients in the states defined
by the partition U′(Cx).
When using a transition matrix as defined by the RECIST criteria (combining
complete + partial responses into a single response category), the state space is
ω = {∅, R, SD, SDoff, PD, C} and, for the classical definition of response, the proper
partition of interest is C(x) = 010000. The initial probability vector is π0 = 100000;
thus, the only thing that differs between the observed data and the data distributed under
H0 is the transition matrix Λt. In other words, at transition time t, if there were 3
observed patients in state R, then the p-value is Σ_{x=3}^{∞} P(X_{t,k} = x | H0). In
the simulation study, the H0 data are generated by the simulation and the p-value is thus
calculated as (Σ_{x=3}^{∞} (X_{t,k} = x | sim))/m, where m is the number of generated
data points in the simulation (500 or 1000).
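This simulation-based p-value calculation can be sketched in code. The thesis's program is written in R; below is an illustrative Python sketch in which the three-state chain, its transition probabilities and the observed counts are invented for illustration, not the trial's actual H0:

```python
import random

def simulate_counts(trans, start, targets, n_patients, n_steps, rng):
    """Simulate one trial under H0: walk each patient through the chain
    for n_steps transitions and count how many end in a target state."""
    states = list(trans)
    count = 0
    for _ in range(n_patients):
        s = start
        for _ in range(n_steps):
            s = rng.choices(states, weights=[trans[s][x] for x in states])[0]
        if s in targets:
            count += 1
    return count

def simulated_p_value(trans, start, targets, n_patients, n_steps,
                      observed, m=1000, seed=1):
    """p-value = fraction of H0-simulated trials whose count of patients
    in the target states is at least the observed count."""
    rng = random.Random(seed)
    hits = sum(simulate_counts(trans, start, targets, n_patients, n_steps, rng)
               >= observed for _ in range(m))
    return hits / m

# Hypothetical null chain: patients start in SD and may respond or progress.
H0 = {"SD": {"SD": 0.6, "R": 0.05, "PD": 0.35},
      "R":  {"SD": 0.0, "R": 1.0,  "PD": 0.0},
      "PD": {"SD": 0.0, "R": 0.0,  "PD": 1.0}}

p = simulated_p_value(H0, "SD", {"R"}, n_patients=36, n_steps=5, observed=3)
```

Varying the transition matrix or the target-state set reproduces the different designs discussed in Section 4.6 without changing this machinery.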
4.6 Methods Used for Investigating Different Outcomes
For each transition matrix of interest, one of 9 possible analytical methods was selected
to determine the primary outcome of interest, and a decision rule was constructed based on
the primary outcome. To evaluate the flexibility of finite Markov chain imbedding,
multiple decision rules were constructed for a single primary outcome.
The first decision rule, method 1, is based on defining the primary outcome using a
single response state. For example, one might define the primary outcome as objective
response as per some criteria, and would construct a decision rule based on whether the
number of observed objective responses is more extreme than expected. Thus, at a given
transition time t, one can calculate the p-value as (Σ_{x=q}^{∞} (X_{t,k} = x | sim))/m,
where m is the number of simulated data points and q is the observed number of patients
who were in state k at transition t. In matrix 2.9, state k = 2. Method 1 coincides with
the classical phase II design and is the most frequently used primary outcome definition
(as is found in the Simon [9], Fleming [8] or Jung [10] [74] designs).
Method 2 is based on defining the primary outcome as the sum of 2 or more response
states. The primary outcome might be the sum of objective responses + unconfirmed
responses, the sum of objective responses + unconfirmed responses + stable diseases, the
sum of patients remaining on-study at some point (for example, patients in state response,
unconfirmed response or stable disease, but not in state stable disease but off-study) and
so on. The difference between this method and method 1 is in the definition of the
end-state probability vector. For method 2 and matrix 2.9, this vector would be
C(x) = 011100, but for matrix 2.8 the vector would be C(x) = 0111100 if one was interested
in response + stable disease. The p-value is calculated as
(Σ_{x=q}^{∞} (X_{t,k} = x | sim))/m; however, in this instance k ∈ {2, 3, 4} for matrix
2.9 and k ∈ {2, 3, 4, 5} for matrix 2.8. Although not explicitly stated as such, these
designs are frequently used at present and would use a slight modification of the designs
as defined in Simon [9], Fleming [8] or Jung [10] [74].
Method 3 is based on defining the primary outcome as any of 2 or more response
states. For example, an investigator might be interested in a novel treatment if the
number of observed objective responses was greater than expected or the number of
stable diseases observed was greater than expected. For matrix 2.9, this means that one
can calculate the p-value as (Σ_{x=q}^{∞} Σ_{y=r}^{∞} (X_{t,k} = x ∪ Y_{t,l} = y | sim))/m,
where k = 2, l ∈ {3, 4}, q is the number of observed responses and r is the number of
observed stable diseases. This design has the same characteristics as the Panageas design
[12] [82].
Method 4 uses a definition of primary outcome based on superiority of one outcome,
or superiority of a second multinomial outcome where the second outcome includes the
first. This is similar to the Lu design [13], where one would be interested in a treatment
if the number of responses is greater than expected, or the number of responses and stable
diseases is greater than expected. The p-value is thus calculated by
(Σ_{x=q+1}^{∞} Σ_{x+y=r+1}^{∞} (X_{t,k} = x ∪ X_{t,k} + Y_{t,l} = x + y | sim))/m, where
r is the observed number of patients with response or stable disease and y is the number
of patients with stable disease (or the second outcome of interest).
A slight modification to method 4 is found in method 5, where the primary outcome
is based on superiority of the first outcome, or superiority of a second outcome while
the first outcome is equivalent. In other words, while one might be interested in a
treatment which has a greater response rate than expected, the treatment might still be
deemed of clinical interest if the response rate is equal to what is expected and there
is superiority of the stable disease rate. The p-value calculation for this method is
(Σ_{x=q+1}^{∞} Σ_{y=r+1−q}^{∞} (X_{t,k} = x ∪ (X_{t,k} = q ∩ Y_{t,l} = y) | sim))/m.
Method 6 uses a definition based on the equivalence or superiority of one outcome and
equivalence or inferiority of a second outcome. This is similar to the Zee design
[11] [82]. The p-value calculation is thus
(Σ_{x=q}^{∞} Σ_{y=0}^{r} (X_{t,k} = x ∩ Y_{t,l} = y | sim))/m. A
slight modification to this is in method 7, which defines the primary outcome to be strict
superiority of one outcome, or equality of the first outcome and strict inferiority of the
second. The p-value for method 7 will always be less than the p-value in method 6 and
is calculated by (Σ_{x=q}^{∞} Σ_{y=0}^{r−1} (X_{t,k} = x ∩ Y_{t,l} = y | sim))/m.
Method 8 uses a weighted response state definition similar to the Lin design [14].
As an example, when evaluating whether a treatment is active, clinicians might deem a
response to be 4 times as important as a patient with stable disease lasting greater than
6 cycles, which in turn is twice as important as a patient having stable disease of less
than 6 cycles. Thus, one might assign weights of 8, 2 and 1 to end states associated with
a patient response, stable disease greater than 6 cycles and stable disease less than 6
cycles, with other states having weight 0. A variety of weighting schemes is shown for
each transition matrix. To calculate the p-value, first one must define
v = w1 x1 + w2 x2 + · · · + wj xj, where wi is the weight associated with response state
xi. Then, the p-value is
(Σ_{v=s}^{∞} (V(w1, w2, · · ·, wj, x1, x2, · · ·, xj) = v | sim))/m, where
s = w1 x1o + w2 x2o + · · · + wj xjo and xio is the number of patients observed to be in
state i.
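Method 8's weighted score reduces to simple arithmetic on state counts. A minimal sketch (the function names, weights and counts below are illustrative, not the trial's):

```python
def weighted_score(counts, weights):
    """v = w1*x1 + ... + wj*xj for one trial's observed state counts."""
    return sum(weights[s] * counts.get(s, 0) for s in weights)

def weighted_p_value(sim_counts, obs_counts, weights):
    """Fraction of simulated trials whose weighted score reaches the
    observed score s = w1*x1o + ... + wj*xjo."""
    s = weighted_score(obs_counts, weights)
    return sum(weighted_score(c, weights) >= s for c in sim_counts) / len(sim_counts)

# Weights 8/2/1 for response, stable disease > 6 cycles, stable disease < 6 cycles:
w = {"R": 8, "SD6+": 2, "SD<6": 1}
s = weighted_score({"R": 3, "SD6+": 4}, w)   # 8*3 + 2*4 = 32
```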
Method 9 is based on superiority of each of multiple different end states. Clinicians
might be interested in a treatment only if the number of responses is greater than
expected and the number of stable diseases is greater than expected. This method is
primarily useful when looking at multiple binomial designs, where one might want the
number of responses to be greater than expected and the number of patients with no
toxicity to be greater than expected, similar to the Bryant-Day method [15]. The
calculation of this p-value is performed using
(Σ_{x=q+1}^{∞} Σ_{y=r+1}^{∞} (X_{t,k} = x ∩ Y_{t,l} = y | sim))/m.
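The method-9 rule is just a bivariate tail count over the simulated trials. A sketch, assuming each simulated trial has been summarised as a (responses, non-toxicities) pair; the pairs below are invented for illustration:

```python
def joint_tail_p_value(sim_pairs, q, r):
    """Method-9-style p-value: fraction of simulated trials in which the
    first count is strictly greater than q AND the second is strictly
    greater than r (superiority of both outcomes)."""
    hits = sum(1 for x, y in sim_pairs if x >= q + 1 and y >= r + 1)
    return hits / len(sim_pairs)

pairs = [(5, 10), (3, 12), (6, 2), (7, 11)]   # (responses, non-toxicities)
p = joint_tail_p_value(pairs, q=4, r=9)       # only (5, 10) and (7, 11) qualify
```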
Chapter 5
Results
Numerical results detailing the p-values and conditional power evaluated by the
simulation study are in Appendix D. A summary of these results is discussed in this
section, including discussion of the statistical issues and the results as they relate to
the original study.
5.1 RECIST Criteria
5.1.1 Interpretation
Table D.1 shows the input data associated with matrix (2.9), the transition matrix asso-
ciated with the RECIST criteria. Interpretation of Table D.1 is as follows. Of 36 patients,
1 transitioned from state ∅ to state unconfirmed response, 22 transitioned to state stable
disease, 8 to state progressive disease and 5 were censored at the first transition period.
This is shown in the first row of this table, titled Data ∅. The subsequent rows, titled
Data eval 2 - Data eval 4, show the observed transitions for the first 4 transition periods.
For example, the patient in state unconfirmed response had a confirmed response at tran-
sition Data eval 2, thus the transition probability is 1 (since no other patient transitioned
from unconfirmed response to any other state during that time period). Sixteen of the 22
patients in state stable disease remained in state stable disease, while 6 came off-study,
during the 2nd on-study transition. At the next transition, 2 patients had unconfirmed
response, where 1 later became a confirmed response and 1 came off-study without having
an observed response. At Data eval 4, an additional patient had a confirmed response,
transitioning to this state in the 5th on-study transition period (data eval 4).
The transition matrix under H0 is shown in matrix (5.1) and under a specified alternative
HA in matrix (5.2). This is summarised in Table D.1 in rows titled H0 and HA. Note
that the main difference between these matrices is at the initial transition, where under
H0 only 2% of patients transition into the unconfirmed response state, compared to 20% of
patients under HA, at the expense of the number of patients with progressive disease.
Also, the proportion of patients in state stable disease transitioning to the off-study
state decreases. This agrees with RECIST comparisons, which are driven primarily by the
first transition, since one takes the best observed response as the primary outcome. Since
a patient in state PD cannot improve, and it is infrequent for a patient to transition
from SD to R, it is usually the first transition which dictates the best observed
response.
The data observed at the interim analysis are shown in the final 5 rows, titled interim
eval.
M =
          ∅    R     UR    SD    SDoff   PD    C
   ∅      0    0     .02   .4    0       .4    .18
   R      0    1     0     0     0       0     0
   UR     0    .85   0     0     .15     0     0
   SD     0    0     .05   .6    .35     0     0
   SDoff  0    0     0     0     1       0     0
   PD     0    0     0     0     0       1     0
   C      0    0     0     0     0       0     1
                                                   (5.1)
M =
          ∅    R     UR    SD    SDoff   PD    C
   ∅      0    0     .2    .42   0       .2    .18
   R      0    1     0     0     0       0     0
   UR     0    .85   0     0     .15     0     0
   SD     0    0     .1    .7    .2      0     0
   SDoff  0    0     0     0     1       0     0
   PD     0    0     0     0     0       1     0
   C      0    0     0     0     0       0     1
                                                   (5.2)
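The endstate probabilities under H0 follow directly from matrix (5.1) by propagating the initial vector π0 through the chain; with five on-study evaluations this reproduces the endstate response probability of ≈ 0.0503 quoted for this model. A plain-Python sketch (the thesis's computations are done in R; the choice of five evaluations reflects the analysis point used in the study):

```python
# States: 0=∅, 1=R, 2=UR, 3=SD, 4=SDoff, 5=PD, 6=C  (matrix (5.1) under H0)
H0 = [
    [0, 0,   .02, .4, 0,   .4, .18],  # ∅
    [0, 1,   0,   0,  0,   0,  0  ],  # R
    [0, .85, 0,   0,  .15, 0,  0  ],  # UR
    [0, 0,   .05, .6, .35, 0,  0  ],  # SD
    [0, 0,   0,   0,  1,   0,  0  ],  # SDoff
    [0, 0,   0,   0,  0,   1,  0  ],  # PD
    [0, 0,   0,   0,  0,   0,  1  ],  # C
]

def step(pi, M):
    """One transition of the state-occupancy vector: pi' = pi * M."""
    return [sum(pi[i] * M[i][j] for i in range(len(M))) for j in range(len(M))]

pi = [1, 0, 0, 0, 0, 0, 0]   # every patient starts in state ∅
for _ in range(5):           # five on-study evaluations
    pi = step(pi, H0)
# pi[1] ≈ 0.0503 is the endstate probability of a confirmed response
```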
Endstate probabilities based on the data in Table D.1 are shown in Table D.2. The
endstate probabilities associated with H0 and HA are similar to the initial statistical
design, with a comparison of response rates of ≈ 0.05 versus 0.25, and non-progression
rates of ≈ 0.4 versus 0.6 (the sum of pR, pur, psd and psdo). P-values and conditional
power estimates based on sample sizes of 36 and 54 are in Tables D.3 and D.4 respectively. In
these tables, the first column indicates the method used and the second is the number
of iterations performed in the simulation. The last 3 columns are the calculated p-value,
conditional power (assuming future data appears similar to data as under HA), and
the conditional power (assuming future data occurs similarly to the data observed at
the interim analysis). The two columns titled "outv" and "outv2" define the states of
interest corresponding to the outcomes defined by each method. With method 1, for
instance, the primary outcome is based on a single response state. Thus, the state of
interest will be listed under "outv". For method 4, where the primary outcome is based
on superiority of a first outcome and superiority of a second outcome, then the first set
of states will be listed under "outv" and the second set of states is listed under "outv2".
5.1.2 Results
The first observation is as expected: results are similar regardless of the number of
iterations used for the simulated data, 1000 or 500. This is a fairly consistent
observation throughout, demonstrating the overall speed of convergence. Additionally, as
expected, the p-value generally decreases and the conditional power generally increases
as one increases the sample size from 36 to 54, with exceptions occurring for method 5.
Looking closely at method 5, one only gets significance if the number of responses is
greater than expected, or the number of responses is equal to what is expected and the
number of responses + stable diseases is greater than expected. There were 3 responses
observed of 36 patients, a rate of 0.0833. With 54 patients, it is impossible to get an
equivalent rate - i.e. 4/54 = .074 and 5/54 = .093. Thus, the only way one can get a
significant result is if the number of simulated patients with response is greater than
the observed number, and this p-value is ≈ equal to that of method 4. The p-value
increases since the probability of getting more extreme results is limited to having
strictly greater numbers when using 54 patients. The conditional power estimate is not
equal, however, since when using conditional power, the observed data are based on the
interim data observed plus generated future data.
The conditional power found using hypothesised data under HA and that found using
generated data based on observed results are strikingly different. Generally, the
conditional power of obtaining a significant result by continuing beyond the interim
analysis is quite high if one assumes future data follow HA, but quite low when using
observed data. This exemplifies the over-optimistic assumptions that are frequently made
when specifying HA in most phase II designs. As discussed earlier, the use of such an
optimistic alternative (i.e. HA: RR = 0.25) is partially done to keep the sample size
feasible.
P-values tended to be moderate to low (between 0.1 and 0.3), indicating a slight, but
only very minor, trend of improvement over what was expected. Significant results occur
only in situations where the sum of response + stable disease states, either unweighted
(method 2, or method 8 with all weights equal to 1) or weighted (method 8), is tested.
This leads to the belief that the improvement is in the number of patients with prolonged
stable disease; however, this is observed in better detail in some of the results to
follow. At this time, one could also hypothesise that the paucity of statistically
significant results could be partially explained by a lack of statistical power.
5.1.3 Transition Time-Important RECIST model
Matrices (4.1) and (4.2) describe modifications to the RECIST model which account
for time, with the primary change being the loss of the stable disease but off-treatment
state. This was performed to investigate the state a patient was in at a given transition,
not necessarily the best transition (which, as mentioned, is primarily the first transition
state). Data input for matrix (4.1) is in Table D.5, with accompanying endstate
probabilities in Table D.6 and results in Tables D.7 and D.35. Endstate probabilities for
this first iteration were designed to be identical to the previous results for matrix
(4.1) for the response, unconfirmed response and stable disease states. The disappearance
of the stable disease but off-treatment state causes an increase in patients in the
progressive disease and censored states. Since the analysis is no longer of best observed
response, the issue of when to analyse (i.e. after which transition) is raised.
The results using the transition time-important RECIST transition matrix tended
to be similar to those of the time-independent RECIST model, especially for methods which
only include response, unconfirmed response, or stable disease but on-study as outcomes
considered indicators of efficacy. However, for those methods which include stable
disease but off-study (see, for example, the difference between row 2 of Table D.3
and row 2 of Table D.7), the p-value tends to become more significant, and
the conditional power increases. Thus, the number of patients with stable disease but
still on-study at the 5th evaluation is less likely under H0 than the number of patients
with stable disease as their best observed response. One hypothesis arising from this
finding is that patients with stable disease do not progress as fast as expected under H0.
This needs further exploration.
Varying H0
One question of interest regarding the use of any new methodology is the robustness of
the models when one misspecifies H0. To demonstrate this, the simulation was re-run with
slight variations to the null hypothesis. One such variation, shown in Table D.9, gave a
patient at the first transition from ∅ a slightly increased chance of having an
unconfirmed response (.02 to .04) at the expense of being censored (.18 to .16). No other
alterations were made. The endstate
probabilities in Table D.10 change with pr going from .0503 to .0673, ppd increasing from
.6206 to .6216 and pc decreasing from .2730 to .2550. The other endstate probabilities
remain the same.
One would expect the p-values and conditional probabilities in Tables D.11 and D.12
to be very similar to those in Tables D.7 and D.8. This does occur, with p-values and
conditional probabilities generally changing by less than .05, which one might expect
from random variation alone. Thus, small changes in transition matrix probabilities do
not greatly affect outcomes, which is reassuring.
A more extreme variation is simulated in Table D.13, where the endstate probability
(see Table D.14) that a patient is in state SD is much higher (.0518 to .2458) and that a
patient is in state PD is much lower (.6206 to .4675). No method would give a
statistically significant result (see Tables D.15 and D.16), and a conditional probability
computation performed at an interim analysis would result in low-to-moderate belief that
statistical significance could be achieved. Large changes in hypotheses do result in large
changes in outcomes, as one would hope. Calculations for H0 must be carefully thought
out prior to any statistical analysis being performed, although this is often one of the
least thought out parts of the trial by some investigators, and often one of the most
disputed post-trial. Since an assumed H0 can often be disputed by different investigators,
one
could use a range of values for H0 and explore at what point the data tend to become
significant. This is not possible under classical statistical methods, but is a reasonable
suggestion given the uncertainty surrounding phase II oncology clinical trials. Even using
typical methods such as Simon’s optimal designs [9], exploring a range of H0 to explore
the results at trial termination might serve to create a better understanding of trial
results, and reduce the frequency of questionable scenarios.
A less severe but more reasonable alteration is shown in Table D.17, with endstate
probabilities in Table D.18. This scenario might be used to describe the belief that a
patient is likely to have a response only if they transition to the unconfirmed response
state immediately at the first transition, with the further belief that a late transition
to response is very unlikely (similar to what might be believed for cytotoxic agents).
The endstate probability of being in state R remains the same; however, the probability
of being in SD has decreased, with more patients being in state PD. As expected, the
level of significance is more extreme (see Tables D.19 and D.20), particularly for
methods which include stable disease as a positive outcome.
A final modification is shown in Table D.21 with endstate probabilities in Table
D.22. This H0 is concordant with a very optimistic view of the drug. The endstate
probability under H0 of being in state R is .1721, much higher than .0503 which was
used in the first model. Since response is considered a good outcome in all methods, the
level of significance is decreased throughout (i.e. higher p-values) and there is a decreased
conditional probability at an interim analysis which would lead investigators to be less
likely to continue to stage II of a study (see Tables D.23 and D.24).
Timing of Evaluations
One issue that arises is when to perform an end-study evaluation. Thus, using the same
H0 and HA as in Table D.5, an extra evaluation was included in Table D.25 and one was
removed in the following simulation, with endstate probabilities in Tables D.26 and D.29,
respectively. Results
are shown in Tables D.27 and D.28 for the extra transition simulation, and in Tables
D.30 and D.31 for the simulation with one less transition.
Under H0, there is a slightly increased probability of being in an absorbing endstate
(R, PD, C) with the extra-transition simulation, and a slightly decreased probability of
being in the same endstates for the simulation with one transition removed. During the
clinical trial, 1 patient transitioned from state UR to R at the 5th transition, 1 patient
went from SD to C at the time of the 5th evaluation, and 1 went from SD to PD at the 6th
evaluation. The outcomes appeared least significant when the endstates were defined after
the 4th evaluation, with an increase in the level of significance (primarily due to the
patient who transitioned to the response state) after the 5th evaluation, and then a
further decrease in significance when evaluations occurred after the 6th transition. The
change in the level of significance remained small, but noticeable.
One might tend to believe that the 5th evaluation, after which the analyses were
actually performed, is an 'optimal' selection point. It is also possible that the increase
in the number of patients with prolonged stable disease ceases around this time, which
clinically means a termination of treatment effect. Further to this, of the 6 patients
remaining in SD after the 6th transition, 1 patient progressed at the next transition, 2
patients progressed and 1 had an adverse event requiring study discontinuation at the
transition following, and 1 patient progressed at the transition following that. Only
one patient remained with stable disease substantially beyond this time point. Thus,
by performing analyses at different transition times, one observes that the duration of
treatment efficacy is around 5 evaluations (10 months). This could assist in understanding
the biological characteristics of temsirolimus or neuroendocrine carcinoma.
5.1.4 Varying away from the RECIST criteria
Although the RECIST criteria are commonly used, there is no consensus that they are
optimal [90] [91] [92]. Particularly with the increased flexibility of finite Markov
chain imbedding methods, there is no reason to limit the simulation to transition
matrices based on the RECIST criteria. A simple and straightforward modification is
described in matrix (4.2). The primary difference in this design is that one is not using
the best objective response status at any time, but is fully incorporating the present
status of each patient. Specifically, patients who transition into state R can
subsequently transition out when they come off study due to disease progression or
censoring.
Input data for this particular design is in Table D.32 and endstate probabilities are
in Table D.33. Of particular note is that one patient (021-025) had a partial response,
but came off treatment after the 3rd evaluation due to an unrelated adverse event, thus
was censored. The other two patients with partial responses remained on-study for a
lengthy period of time (9 evaluations and still receiving treatment, and 10 evaluations
prior to disease progression). Thus, instead of 3/36 patients in state R, only 2 patients
are still in this state. However, the corresponding endstate probabilities under H0 are
also decreased.
Results are in Tables D.34 and D.35 and generally indicate a greater level of
significance than the best-observed-response-at-any-time model. This increased
significance would again lend credence to the possibility of temsirolimus extending the
length of time to progression beyond what is expected, rather than having just an
immediate effect. The 2 patients who responded without censoring did not progress until
much later than even those with stable disease (note also the very long prolonged stable
disease duration of one patient), indicating that the treatment may remain active for
longer in a subset of patients, and that there is a particular subgroup of patients
(possibly based on some as yet unknown molecular characteristic) for which the treatment
is extremely effective. Unfortunately, correlative studies for this trial did not yield
impressive results, as frozen tissue samples were only available for 1 of the 3 patients
who had a partial response, and only 1 of the 3 had paired tumour biopsies with usable
data.
5.1.5 Immediate response
A natural extension when incorporating multiple evaluations would be to investigate how
each patient is doing at each individual transition, compared to the previous transition
time. This is represented by the transition matrix (4.3). For this analysis, tumour
shrinkage (state R) was defined as a reduction in tumour size of 5% or more, tumour
growth (state PD) was defined as an increase in tumour size of 5% or more, and disease
stabilisation (state SD) was defined as growth or shrinkage of less than 5% compared to
the measurement at the previous transition time. One could also be off-study. This is
represented in Tables D.36-D.39. A second definition was used where 10% growth and
shrinkage thresholds were used instead of 5%, shown in Tables D.40-D.43. Finally, one
might consider not having an SD state, considering only shrinkage or non-shrinkage; this
is in transition matrix (4.4) and Tables D.44-D.47.
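The per-transition classification described above is a one-line rule on consecutive tumour measurements. A sketch (the function name and the inclusive handling of the boundary are assumptions):

```python
def classify(prev_size, cur_size, threshold=0.05):
    """Classify one evaluation relative to the previous measurement:
    R  = shrinkage of `threshold` (5%) or more,
    PD = growth of `threshold` or more,
    SD = change smaller than the threshold in either direction."""
    change = (cur_size - prev_size) / prev_size
    if change <= -threshold:
        return "R"
    if change >= threshold:
        return "PD"
    return "SD"

# With a 10% threshold (the second definition), a 6% shrinkage is SD, not R:
classify(100, 94)                  # "R" under the 5% definition
classify(100, 94, threshold=0.10)  # "SD" under the 10% definition
```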
The level of statistical significance decreases as the definition of response goes from
any shrinkage, to > 5%, to > 10%. In other words, the number of patients having
impressive activity compared to what is expected is small. In addition, a number of
patients still show slight treatment-related activity, relative to H0, by the end of the
4th and 5th evaluations. This is consistent with the theory that treatment-related
activity is slowing down or stopping around the 4th to 5th evaluation, and also with the
slow-growing nature of neuroendocrine carcinoma.
5.1.6 Consecutive states
Since the primary purpose of phase II clinical trials in oncology is to determine whether
a treatment has any potential activity, and since with MTAs activity might be observed
as the prevention of future growth over a number of consecutive evaluations instead of
simply as tumour shrinkage, it might be of interest to explore models in which a good
outcome is defined as a patient having tumour shrinkage (treatment response) or
consecutive (2 or more) evaluations with no tumour growth. This can be modelled using a
transition matrix as in matrix (4.5), in which a patient having 3 consecutive evaluations
with no tumour growth would be considered as having a good outcome, or by transition
matrix (4.6), in which 4 consecutive evaluations are required. Alternatively, one might
define a good outcome as a major response, or consecutive evaluations with minor
response, such as in transition matrix (4.7). One additional twist, as shown in Table
D.56, is
that the transition matrix under H0 or HA does not necessarily have to remain constant
throughout the trial. If one has a reason to believe that treatment effects might change
during the course of the trial (possibly due to changes in dosage from adverse events, or
noting that certain treatments, like some chemotherapy and hormonal treatments, can
only be used for certain lengths of time and a patient may have completed that portion
of the treatment), then one can model this by modifying the input transition matrices
accordingly.
Results are similar to those seen in other analyses; however, requiring 3 or more
consecutive observations of stable disease tends to give slightly more significant
results. This accords with the belief that the number of stable disease patients
observed at cycle 2 might not be greater than expected; however, the long-term effect of
the treatment might persist, such that patients who do have stable disease tend not to
progress as quickly as expected.
5.1.7 Dual-Binomial Outcomes
Trial designs have been suggested which incorporate multiple endpoints, such as response
and toxicity, where a patient could have either endpoint, neither endpoint, or both. For
example, a particular agent might be deemed of interest only if the response rate is
sufficiently high and the level of toxicity is adequately low. One might also be willing
to accept higher risk if there was correspondingly higher efficacy or, conversely, if
lower toxicity was observed, one might deem acceptable a treatment with less efficacy.
This latter case
might also indicate a treatment which has the potential to be part of a multi-agent
combination therapy. A transition matrix to represent this type of analysis is defined in
matrix (4.8). The definition of a toxic event would need to be specified (e.g. any grade
3 adverse event of any attribution). A patient might have a toxic event and remain on
study, might come off study without having ever experienced the toxicity, might respond
and have toxicity or might have neither outcome.
Results for this analysis are in Tables D.60-D.63. The p-values for those analyses
which use both response + stable disease and toxicity outcomes show a great deal of
significance - see method 2, second analysis, and method 8, 2nd, 3rd and 4th analyses.
This is a result of the observed toxicity level being less than expected, so that when
combined with the increased rate of stable disease which was observed, a highly
significant result is obtained. Thus, although there was only a slight improvement in
efficacy alone, this treatment might be of considerably greater interest given the
relative lack of toxicity observed. Further, one might find temsirolimus suitable as
part of a combination therapy, since the toxicity is low and a synergistic relationship
might be possible without putting patients at excessive risk.
5.1.8 Theoretical Versus Simulated Calculations
The results observed using simulation were similar to those which would be observed
had one used theoretical calculations. As an example, consider matrix (2.9), based on
the RECIST criteria. The end-state probability of response under H0 is 0.0503, as
defined by the transition matrices. If we calculated the p-value in this instance using
theoretical calculations, we would calculate the probability of observing 3 or more
responses out of 36 patients, given that the probability of having a response is 0.0503.
This is 0.2709. Using the simulated data, the probability was 0.245. If the primary
outcome was determined to be the response + stable disease rate, then the probability is
0.0503 + 0.0043 + 0.0518 + 0.3135 = 0.4199, and the probability of observing 23 or more
patients in response or stable disease is 0.0066. This is not very different than the
simulated calculation of 0.029. Finally, if the primary outcome is response + stable
disease but on-study, the probability is 0.1034 and the theoretical p-value is 0.003,
compared to the simulated p-value of 0.005.
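The theoretical figures above are binomial tail probabilities, which can be checked directly (the 0.0503, 0.4199, 0.2709 and 0.0066 values are from the text; `comb` is Python's standard-library binomial coefficient):

```python
from math import comb

def binom_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k, n + 1))

# P(3 or more responses out of 36 | p = 0.0503):
p_resp = binom_tail(36, 0.0503, 3)        # ≈ 0.2709
# P(23 or more response/stable-disease patients out of 36 | p = 0.4199):
p_nonprog = binom_tail(36, 0.4199, 23)    # ≈ 0.0066
```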
The theoretical conditional probability of obtaining a statistically significant result,
were one to continue the trial after 15 patients had been accrued, is 0.048 if the primary
outcome is response and future data is assumed to occur similarly to the interim data.
This compares to the simulated value of 0.049. If future data was instead assumed to
follow the distribution under HA, the conditional probability is 0.803 using theoretical
calculations and 0.820 using simulated data. When the primary outcome is response or
stable disease, the conditional probability would be 0.599, compared to the simulated
value of 0.591, if future data was assumed to follow the distribution defined under HA,
that is, a probability of 0.62 that a future patient has a response or stable disease. If
future data was assumed to occur similarly to the data up to the interim analysis, the
conditional probability is 0.286 using theory and 0.278 using the simulation. When the
primary outcome is response + stable disease but on-study, the conditional probability
is 0.851 (theoretical) and 0.867 (simulated) when future data is assumed to occur the
same as the present data, and 0.975 (theoretical) and 0.971 (simulated) when future
data is assumed to follow HA. The simulated results are thus similar to the theoretical
results.
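The theoretical conditional power figures above reduce to a short binomial calculation: find the smallest total response count that would be significant at trial completion, then ask how likely the remaining patients are to supply the missing responses. The sketch below is illustrative (the variable names are mine, not the thesis code's); it assumes the one-sided 0.05 threshold used by cond.prob.fn in Appendix C.

```r
## Hedged sketch: theoretical conditional power for the binary response
## outcome at the interim analysis (1 response among 15 patients).
n.total   <- 36      # planned sample size
n.interim <- 15      # patients accrued at the interim analysis
r.interim <- 1       # responses observed by the interim analysis
p0 <- 0.0503         # end-state response probability under H0 (Table D.2)

## Smallest total response count with a one-sided p-value <= 0.05 under H0
r.crit <- qbinom(0.05, n.total, p0, lower.tail = FALSE) + 1

## Probability that the remaining patients supply the missing responses,
## for a given assumed future response probability
cond.power <- function(p.future) {
  pbinom(r.crit - r.interim - 1, n.total - n.interim, p.future,
         lower.tail = FALSE)
}

cond.power(0.2482)              # future data follows HA: approx 0.803
cond.power(r.interim/n.interim) # future data like interim data: approx 0.048
```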
Chapter 6
Discussion
There are a multitude of phase II clinical trial designs which can be used to investigate
whether an experimental cancer treatment has potential efficacy. The choice of design
for any individual trial is frequently subjective, often based on a statistician's personal
preferences. Even after a particular design is selected, the specification of hypotheses is
often driven by practical issues rather than solely by clinical efficacy rates. Different
investigators may therefore choose different hypotheses, and conflicting conclusions can
easily arise once a trial is complete. In the absence of consensus, the ability to evaluate
different designs and different hypotheses simultaneously would be advantageous.
In this dissertation, it is shown that finite Markov chain imbedding can be used to
evaluate multiple designs and hypotheses at one time. It is possible to test the many
hypotheses which different investigators might hold, including optimistic and pessimistic
hypotheses, as well as hypotheses using different outcomes of interest. In this manner,
one can reduce post-trial conflicts by extracting more information from the same data
and better understanding whether the experimental treatment is truly efficacious. To
do this, one needs only to set up the transition matrix appropriately and to arrange the
data to correspond with the transition matrix. Given that oncology clinical trials are
naturally divided into sections of time by the cycles of treatment, and that states are
already defined for the most commonly used efficacy outcome, namely response, these
trials fit easily within Markov chain methodology. These states can be modified easily.
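The core operation is the propagation of a start vector through an array of per-cycle transition matrices, exactly as the loops in the Appendix C programs do. The following sketch illustrates it with a hypothetical three-state chain (this toy chain is mine; it is not matrix (2.9)).

```r
## Hypothetical three-state illustration: propagating a start vector
## through per-cycle transition matrices, the core operation of the
## pval.fn and cond.prob.fn programs in Appendix C.
states <- c("SD", "PD", "R")
P1 <- matrix(c(0.6, 0.3, 0.1,   # from SD: stay stable, progress, or respond
               0.0, 1.0, 0.0,   # PD is absorbing in this toy example
               0.0, 0.0, 1.0),  # R is absorbing in this toy example
             nrow = 3, byrow = TRUE, dimnames = list(states, states))
H0array <- array(rep(P1, 2), dim = c(3, 3, 2))  # two identical cycles
start <- c(1, 0, 0)                # all patients begin in stable disease
update <- t(start)
for (i in 1:dim(H0array)[3]) update <- update %*% H0array[, , i]
as.numeric(update)                 # end-state probabilities: 0.36 0.48 0.16
```

Time-dependent behaviour is obtained simply by placing a different matrix in each slice of the array, as in the "Data eval" rows of the tables in Appendix D.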
Two computer programs, written in R, are provided, one which calculates p-values
at trial termination, and one which calculates conditional power at an interim analysis.
Both programs require only a few seconds to complete the necessary simulations and
can calculate outcomes from any of 9 different methods. Actual trial data is used to
demonstrate the utility of this method and the ease of use of these programs. It is also
shown how the same data can provide additional information about the treatment using
finite Markov chain imbedding, as evidenced by the ability to detect that the treatment
appears to be effective for around 5-6 evaluations (approximately 10 months) amongst
most patients with disease stabilization, but somewhat longer for those patients having
a tumour response. Additionally, a subset of patients appears to demonstrate activity,
which leads to the presumption that some unknown biological factor (e.g. a molecular
marker), present in only a proportion of patients or tumours, may be affected by the
treatment.
For binary primary outcomes, either simulated or theoretical calculations are possible;
for more complex primary outcomes, however, it is easier to work with the simulated
data. Particularly when investigating small modifications to the assumed distributions
H0 or HA, for example slight differences in the response rate, the work required to
compute the simulated outcome is minimal, whereas the theoretical calculations must
be performed anew each time and can be quite complicated. The program itself takes
only a few seconds to compute the statistics of interest; thus, a wide variety of results
can be computed with minimal work and time.
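As a concrete illustration, the binary response endpoint can be re-examined under a slightly modified null with a single changed argument. The sketch below is illustrative only (the seed and iteration count are arbitrary choices, not taken from the thesis).

```r
## Simulated p-value for the binary response outcome (method 1) under the
## original and a slightly modified H0 end-state response probability.
set.seed(1)                        # arbitrary seed, for reproducibility
iterations <- 10000
n <- 36; obs <- 3                  # sample size; responses observed in the data
sim.pval <- function(p0) mean(rbinom(iterations, n, p0) >= obs)
sim.pval(0.0503)   # original H0 (Table D.2); theoretical value is 0.2709
sim.pval(0.0673)   # slightly better expectations under H0 (Table D.10)
```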
Finite Markov chain imbedding is an additional, valuable tool of which statisticians
might avail themselves when analyzing studies that incorporate an outcome measured
repeatedly over time. Presently, in most situations, investigators choose a single mea-
surement evaluated at a single time point. The use of finite Markov chain imbedding
allows investigators to study results as a pattern of outcomes over time. It is this
transition from a single measurement to a pattern which could prove extremely important
to future researchers. By examining the pattern of results from observed data instead of
focusing on a single outcome measurement, one can more clearly understand the effect
of a treatment which could otherwise be obscured. This is particularly true when the
effect being measured is not well understood and is difficult to pinpoint prior to initiating
a study.
The computer code provided is, to my knowledge, the first which allows the user to
easily apply finite Markov chain imbedding for analysis in a common statistical software
package. The code is relatively simple to use, flexible, and efficient. It is intended
that this code will be published and made freely available to other statisticians and
investigators, such that they can implement finite Markov chain imbedding methods
with relative ease. The simplicity of the code allows users to investigate a multitude of
possibilities with only minor modifications to the input parameters. Conflicting
conclusions which could arise from subjective implementation of different possible designs
can be clarified by understanding the discrepancies between designs. An improved
understanding of the true treatment effect results, which should allow more effective use
of limited money and resources and enhance decision-making.
Nevertheless, despite these advantages, finite Markov chain imbedding is not a panacea.
Regardless of the results seen, a confirmatory randomized phase III trial would still need
to be conducted to compare a presumed effective treatment with the standard of care.
Phase II trials tend to be small, single-arm trials which aim to determine whether a
treatment has potential efficacy and should be studied further. In addition, even though
finite Markov chain imbedding provides the ability to investigate a range of hypotheses,
it does not guarantee that all hypotheses of interest will be investigated. An investigator,
or investigators, who are overly optimistic or pessimistic might remain so over the entire
range of hypotheses investigated. Finally, one must evaluate the trial results across all
simulations, not individually, and it takes time both to set up all the simulations and to
understand all the analyses undertaken.
Even though all frequentist hypothesis tests incorporate a priori information in the
framing of the hypotheses and the type I and II errors, the individual analyses are not
Bayesian and do not formally combine prior information with the trial data. P-values
are probabilities of observing data as extreme as, or more extreme than, the data actually
observed, under the null hypothesis. Each probability has to be interpreted in the context
of the null hypothesis and does not directly measure whether the treatment is effective.
Given that these results are based on simulations, one must fully understand how the
calculations are performed to ensure proper interpretation of the results. The user must
possess sufficient statistical knowledge to avoid incorrect conclusions. This is partially
enforced, as the user must have the statistical coding ability to construct arrays and use
R code; however, there is no guarantee of valid inference.
Further work is still needed on finite Markov chain imbedding. First, time-to-event
outcomes, such as survival or progression-free survival, were not discussed in this disser-
tation, although these are very important efficacy outcomes. While adding an additional
state to the state space would account for this partially, it does not wholly account for
time-to-event outcomes. Second, there is no accounting for known predictors of effi-
cacy. In breast cancer alone, Her2 status and nodal involvement are two significant
predictors of efficacy which are not always known prior to trial recruitment. Thus,
including known predictors is important, and will become more so as additional disease
markers become known. Third, finite Markov chain imbedding methods could be
valuable for understanding treatment effects in ways other than strict efficacy. There are
often questions about the treatment regimen to use: how long a cycle should be (e.g. 21,
28, or 35 days); how the dosing schedule should be set within a cycle (e.g. daily, 3 weeks
on-treatment followed by 1 week of rest, or twice daily); whether there is a maximum
acceptable number of cycles; whether there should be a lead-in period for one agent in
a multi-agent treatment; in which order consecutive treatments should be structured;
and so on. Fourth, there is a need to evaluate finite Markov chain imbedding as a
tool for study design and sample size calculation. Fifth, although 9 methods for calculat-
ing statistics are available, there are potentially other methods which could be used, and
these should be investigated. Additional methods for calculating statistics could become
apparent if this design were used in other therapeutic areas for which finite Markov chain
imbedding might prove useful, such as central nervous system or pain studies. These
therapeutic areas have outcomes which are measured on each subject repeatedly over
time, and the outcome (or state) in which a subject resides changes constantly
throughout the study period. Thus, they may prove to be natural therapeutic areas in
which finite Markov chain imbedding methods should be studied further.
In summary, this dissertation has applied finite Markov chain imbedding, a method not
previously used in this setting, to evaluate phase II oncology clinical trial data. The
ability to investigate a range of designs and hypotheses simultaneously, with relative
ease, has been demonstrated. This powerful tool has the ability to increase trial efficiency
and improve our statistical knowledge.
Appendix A
Data
ID Baseline 1 2 3 4 5 6 7 8 9 Off-Study Best Response
021-001 457 456 426 429 423 444 Progression SD
021-002 124 127 126 111 Physician Discretion SD
021-003 110 102 119 Progression SD
021-004 265 282 268 321 Progression SD
021-005 223 257 Progression PD
021-006 114 101 102 100 91 90 90 79 84 88 Progression SD
021-007 140 138 133 134 Adverse Event SD
021-008 104 Adverse Event IE
021-009 66 95 Adverse Event PD
021-010 25 50 Progression PD
021-011 240 340 Progression PD
021-012 212 312 Progression PD
021-013 187 268 Progression PD
021-014 208 190 180 185 172 174 183 182 196 169 Still On-Treatment SD
021-015 194 161 146 136 135 125 128 130 135 133 Still On-Treatment PR
021-016 207 213 226 229 223 242 244 248 250 Progression SD
021-017 107 Death IE
021-018 70 72 76 69 79 75 75 81 Adverse Event SD
021-019 33 Never Treated Ineligible
021-020 114 137 Progression PD
021-021 324 318 Symptomatic Progression PD
021-022 294 239 217 167 Completed 8 Cycles SD (uPR)
021-023 429 368 385 Symptomatic Progression SD
021-024 25 24 Withdrew Consent SD
021-025 268 188 154 Unrelated Disease Complications PR
021-026 225 263 Symptomatic Progression SD
021-027 298 319 Symptomatic Progression SD
021-028 77 Adverse Event IE
021-029 34 27 33 27 25 22 28 27 12 14 Still On-Treatment PR
021-030 227 Adverse Event IE
021-031 58 48 53 48 46 43 42 Still On-Treatment SD
021-032 179 161 160 157 166 164 157 136 138 Still On-Treatment SD
021-033 165 163 174 182 Progression SD
021-034 79 84 84 72 71 64 37 59 Still On-Treatment SD
021-035 158 240 Progression PD
021-036 314 328 339 371 367 Symptomatic Progression SD
021-037 189 167 Withdrew Consent SD
*PR=Partial Response, SD=Stable Disease, PD=Progressive Disease, IE=Inevaluable, uPR=Unconfirmed Partial Response
Table A.1: Data, in mm
Appendix B
State Spaces
ID Baseline 1 2 3 4 5 6 7 8 9 Off-Study Best Response
021-001 ∅ SD SD SD SD PD Progression SD
021-002 ∅ SD SD SD C Physician Discretion SD
021-003 ∅ SD PD Progression SD
021-004 ∅ SD SD PD Progression SD
021-005 ∅ PD Progression PD
021-006 ∅ SD SD SD SD SD SD SD SD PD Progression SD
021-007 ∅ SD SD SD Adverse Event SD
021-008 ∅ C Adverse Event IE
021-009 ∅ PD Adverse Event PD
021-010 ∅ PD Progression PD
021-011 ∅ PD Progression PD
021-012 ∅ PD Progression PD
021-013 ∅ PD Progression PD
021-014 ∅ SD SD SD SD SD SD SD SD SD Still On-Treatment SD
021-015 ∅ SD SD UR R R R R R R Still On-Treatment R
021-016 ∅ SD SD SD SD SD SD SD SD Progression SD
021-017 ∅ C Death IE
021-018 ∅ SD SD SD SD SD SD SD Adverse Event SD
021-019 ∅ Never Treated Ineligible
021-020 ∅ PD Progression PD
021-021 ∅ PD Symptomatic Progression PD
021-022 ∅ SD SD UR C Completed 8 Cycles SD (uPR)
021-023 ∅ SD PD Symptomatic Progression SD
021-024 ∅ SD Withdrew Consent SD
021-025 ∅ UR R C Unrelated Disease Complications R
021-026 ∅ SD PD Symptomatic Progression SD
021-027 ∅ SD PD Symptomatic Progression SD
021-028 ∅ C Adverse Event IE
021-029 ∅ SD SD SD SD UR R R R R Still On-Treatment R
021-030 ∅ C Adverse Event IE
021-031 ∅ SD SD SD SD SD SD Still On-Treatment SD
021-032 ∅ SD SD SD SD SD SD SD SD Still On-Treatment SD
021-033 ∅ SD SD SD Progression SD
021-034 ∅ SD SD SD SD SD SD SD Still On-Treatment SD
021-035 ∅ PD Progression PD
021-036 ∅ SD SD SD PD Symptomatic Progression SD
021-037 ∅ SD Withdrew Consent SD
*R=Partial Response, SD=Stable Disease, PD=Progressive Disease, IE=Inevaluable, UR=Unconfirmed Partial Response, C=Censored
Table B.1: Data, in State Spaces According to RECIST Criteria
Appendix C
Computer Code
pval.fn<-function(startvector,H0array,dataarray,sampsize,iterations,method,outv,outv2){
##### startvector - starting positions #####
##### H0array - array of transition matrices under H0 #####
##### dataarray - array of transition matrices as given by data ######
l<-dim(H0array)[2]
ntransitions<-dim(H0array)[3]
update<-t(startvector)
data.update<-t(startvector)
#### Create data and H0 matrices #####
for (i in 1:ntransitions){
endstate.mat<-update%*%H0array[,,i]
update<-endstate.mat
data.endstate.mat<-data.update%*%dataarray[,,i]
data.update<-data.endstate.mat
}
#### Generate random data under H0 #####
nulldata<- matrix(sample(1:l,sampsize*iterations,prob=endstate.mat,replace=T),nrow=iterations,byrow=T)
##### Fake data added to get identical number of outcomes using summary.factor ####
fakedat<-matrix(rep(1:l,iterations),nrow=iterations,byrow=T)
endstatesuse<-cbind(nulldata,fakedat)
endstate<-t(as.matrix(apply(endstatesuse,1,summary.factor))-rep(1,l))/sampsize
#### Calculate p-value depending on method #####
#### Method 1 - Superiority of one outcome (i.e. CR) #####
if (method==1) {pval<-sum(endstate[,outv]>=data.update[outv])}
#### Method 2 - Superiority of the sum of multiple outcomes (i.e. CR+PR) #####
if (method==2) {pval<-sum(apply(endstate[,outv],1,sum)>=sum(data.update[outv]))}
#### Method 3 - Superiority of any one of many multiple outcomes (CR or PR) #####
if (method==3) {
v<-matrix(NA,nrow=iterations,ncol=length(outv))
for (i in 1:length(outv)){
v[,i]<-(endstate[,outv[i]]>=data.update[outv[i]])
}
pval<-sum(apply(v,1,max))
}
#### Method 4 - Either A or sum of B (CR or CR+PR) #####
if (method==4){
temp<-rep(0,iterations)
for (i in 1:iterations){temp[i]<-(sum(endstate[i,outv]>=data.update[outv] ||
sum(endstate[i,outv2])>=sum(data.update[outv2])))}
pval<-sum(temp)
}
##### Method 5 - Either superiority of A or equivalence of A and superiority of B ####
if (method==5){
temp<-rep(0,iterations)
for (i in 1:iterations){temp[i]<-(sum(endstate[i,outv]>data.update[outv] ||
(endstate[i,outv]==data.update[outv] && sum(endstate[i,outv2])>=sum(data.update[outv2]))))
}
pval<-sum(temp)
}
##### Method 6 - Superiority of A and inferiority of B ####
if (method==6){
temp<-rep(0,iterations)
for (i in 1:iterations){temp[i]<-(sum(endstate[i,outv]>=data.update[outv] ||
sum(endstate[i,outv2])<=sum(data.update[outv2])))}
pval<-sum(temp)
}
##### Method 7 - Strict superiority of A and strict inferiority of B ####
if (method==7){
temp<-rep(0,iterations)
for (i in 1:iterations){temp[i]<-(sum(endstate[i,outv]>data.update[outv] ||
sum(endstate[i,outv2])<sum(data.update[outv2])))}
pval<-sum(temp)
}
###### Method 8 - Weighted model ######
if (method==8){
temp<-endstate%*%outv
temp1<-data.update%*%outv
temp2<-rep(0,iterations)
for (i in 1:iterations){
temp2[i]<-temp[i]>=temp1
}
pval<-sum(temp2)
}
#### Method 9 - Superiority of each one of many multiple outcomes (CR and PR) #####
if (method==9) {
v<-matrix(NA,nrow=iterations,ncol=length(outv))
for (i in 1:length(outv)){
v[,i]<-(endstate[,outv[i]]>=data.update[outv[i]])
}
pval<-sum(apply(v,1,prod))
}
print(pval/iterations)
}
cond.prob.fn<-function(startvector,H0array,dataarray,h1array,sampsize,interimss,iterations,method,outv,outv2){
##### startvector - starting positions #####
##### H0array - array of transition matrices under H0 #####
##### dataarray - array of transition matrices as given by data ######
l<-dim(H0array)[2]
ntransitions<-dim(H0array)[3]
update<-t(startvector)
data.update<-t(startvector)
alt.update<-t(startvector)
#### Create data, HA and H0 matrices #####
for (i in 1:ntransitions){
endstate.mat<-update%*%H0array[,,i]
update<-endstate.mat
data.endstate.mat<-data.update%*%dataarray[,,i]
data.update<-data.endstate.mat
alt.endstate.mat<-alt.update%*%h1array[,,i]
alt.update<-alt.endstate.mat
}
#### Generate random data under H0 #####
nulldata<- matrix(sample(1:l,sampsize*iterations,prob=endstate.mat,replace=T),nrow=iterations,byrow=T)
##### Fake data added to get identical number of outcomes using summary.factor ####
fakedat<-matrix(rep(1:l,iterations),nrow=iterations,byrow=T)
endstatesuse<-cbind(nulldata,fakedat)
endstate<-t(as.matrix(apply(endstatesuse,1,summary.factor))-rep(1,l))
##### Data at interim analysis ######
int.data<-matrix(rep(data.update*interimss,iterations),nrow=iterations,byrow=T)
##### Generate future data under HA ######
futdata<- matrix(sample(1:l,(sampsize-interimss)*iterations,prob=alt.endstate.mat,replace=T),nrow=iterations,byrow=T)
##### Fake data added to get identical number of outcomes using summary.factor ####
fakedath1<-matrix(rep(1:l,iterations),nrow=iterations,byrow=T)
alt.endstatesuse<-cbind(futdata,fakedath1)
alt.endstate<-t(as.matrix(apply(alt.endstatesuse,1,summary.factor))-rep(1,l))+int.data
cond.prob<-rep(NA,iterations)
#### Calculate conditional probability depending on method #####
#### Method 1 - Superiority of one outcome (i.e. CR) #####
if (method==1) {for (i in 1:iterations){
cond.prob[i]<-sum(endstate[,outv]>=alt.endstate[i,outv])/iterations
}
}
#### Method 2 - Superiority of the sum of multiple outcomes (i.e. CR+PR) #####
if (method==2) {for (i in 1:iterations){
cond.prob[i]<-sum(apply(endstate[,outv],1,sum)>=sum(alt.endstate[i,outv]))/iterations
}
}
#### Method 3 - Superiority of any one of many multiple outcomes (CR or PR) #####
if (method==3) {
for (j in 1:iterations){
v<-matrix(NA,nrow=iterations,ncol=length(outv))
for (i in 1:length(outv)){
v[,i]<-(endstate[,outv[i]]>=alt.endstate[j,outv[i]])
}
cond.prob[j]<-sum(apply(v,1,max))/iterations
}
}
#### Method 4 - Either A or sum of B (CR or CR+PR) #####
if (method==4){
for (j in 1:iterations){
temp<-rep(0,iterations)
for (i in 1:iterations){temp[i]<-(sum(endstate[i,outv]>=alt.endstate[j,outv] ||
sum(endstate[i,outv2])>=sum(alt.endstate[j,outv2])))}
cond.prob[j]<-sum(temp)/iterations
}
}
##### Method 5 - Either superiority of A or equivalence of A and superiority of B ####
if (method==5){
for (j in 1:iterations){
temp<-rep(0,iterations)
for (i in 1:iterations){temp[i]<-(sum(endstate[i,outv]>alt.endstate[j,outv] ||
(endstate[i,outv]==alt.endstate[j,outv] && sum(endstate[i,outv2])>=sum(alt.endstate[j,outv2]))))
}
cond.prob[j]<-sum(temp)/iterations
}
}
##### Method 6 - Superiority of A and inferiority of B ####
if (method==6){
for (j in 1:iterations){
temp<-rep(0,iterations)
for (i in 1:iterations){temp[i]<-(sum(endstate[i,outv]>=alt.endstate[j,outv] ||
sum(endstate[i,outv2])<=sum(alt.endstate[j,outv2])))}
cond.prob[j]<-sum(temp)/iterations
}}
##### Method 7 - Strict superiority of A and strict inferiority of B ####
if (method==7){
for (j in 1:iterations){
temp<-rep(0,iterations)
for (i in 1:iterations){temp[i]<-(sum(endstate[i,outv]>alt.endstate[j,outv] ||
sum(endstate[i,outv2])<sum(alt.endstate[j,outv2])))}
cond.prob[j]<-sum(temp)/iterations
}}
###### Method 8 - Weighted model ######
if (method==8){
for (j in 1:iterations){
temp<-endstate%*%outv
temp1<-alt.endstate[j,]%*%outv
temp2<-rep(0,iterations)
for (i in 1:iterations){
temp2[i]<-sum(temp[i])>=sum(temp1)
}
cond.prob[j]<-sum(temp2)/iterations
}}
#### Method 9 - Superiority of each one of many multiple outcomes (CR and PR) #####
if (method==9) {
for (j in 1:iterations){
v<-matrix(NA,nrow=iterations,ncol=length(outv))
for (i in 1:length(outv)){
v[,i]<-(endstate[,outv[i]]>=alt.endstate[j,outv[i]])
}
cond.prob[j]<-sum(apply(v,1,prod))/iterations
}
}
condprob<-sum(cond.prob<=.05)/iterations
print(condprob)
}
Appendix D
Results
Matrix p∅−ur p∅−sd p∅−pd p∅−c pur−cr pur−sdo psd−ur psd−sd psd−sdo
Data ∅ 1/36 22/36 8/36 5/36 0 0 0 0 0
Data eval 2 0 0 0 0 1 0 0 16/22 6/22
Data eval 3 0 0 0 0 0 0 2/16 12/16 2/16
Data eval 4 0 0 0 0 1/2 1/2 1/12 8/12 3/12
Data eval 5 0 0 0 0 1 0 0 7/8 1/8
H0 .02 .4 .4 .18 .85 .15 .05 .6 .35
Interim ∅ 0 8/15 6/15 1/15 0 0 0 0 0
Interim eval 2 0 0 0 0 0 0 0 7/8 1/8
Interim eval 3 0 0 0 0 0 0 1/7 5/7 1/7
Interim eval 4 0 0 0 0 1 0 0 3/5 2/5
Interim eval 5 0 0 0 0 0 0 0 3/3 0
HA .2 .42 .2 .18 .85 .15 .1 .7 .2
Table D.1: Data input for matrix (2.9) modelling the RECIST criteria
Endstates pR pur psd psdo ppd pc
Data 3/36 0 7/36 13/36 8/36 5/36
H0 .0503 .0043 .0518 .3135 .4000 .1800
Interim 1/15 0 3/15 4/15 6/15 1/15
HA .2482 .0144 .1008 .2566 .2000 .1800
Table D.2: Endstate probabilities for (2.9) modelling the RECIST criteria
Method iterations outv outv2 p-value cond. prob. (HA) cond. prob. (data)
1 1000 2 0.245 0.820 0.049
2 1000 2,3,4,5 0.029 0.591 0.278
2 1000 2,3,4 0.005 0.971 0.867
3 1000 2,4 0.278 0.449 0.032
4 1000 2 3,4,5 0.372 0.002 0.002
4 1000 2 2,3,4,5 0.285 0.431 0.017
4 1000 2 2,3,4 0.277 0.826 0.056
5 1000 2 3,4,5 0.109 0.870 0.121
5 1000 2 2,3,4,5 0.107 0.902 0.137
6 1000 2 6 0.343 0.278 0.001
6 1000 2 6,7 0.299 0.486 0.023
7 1000 2 6 0.138 0.309 0.004
7 1000 2 6,7 0.112 0.714 0.074
8 1000 0,1,1,1,1,0,0 0.016 0.585 0.144
8 1000 0,4,1,1,1,0,0 0.050 0.868 0.131
8 1000 0,4,2,2,1,0,0 0.018 0.961 0.533
9 1000 2,5 0.143 0.813 0.073
1 500 2 0.256 0.798 0.056
2 500 2,3,4,5 0.028 0.586 0.306
2 500 2,3,4 0.002 0.974 0.870
3 500 2,4 0.272 0.468 0.038
4 500 2 3,4,5 0.328 0.008 0.000
4 500 2 2,3,4,5 0.280 0.476 0.008
4 500 2 2,3,4 0.306 0.798 0.012
5 500 2 3,4,5 0.094 0.862 0.150
5 500 2 2,3,4,5 0.128 0.868 0.130
6 500 2 6 0.294 0.304 0.000
6 500 2 6,7 0.296 0.466 0.022
7 500 2 6 0.152 0.544 0.004
7 500 2 6,7 0.116 0.730 0.050
8 500 0,1,1,1,1,0,0 0.028 0.596 0.290
8 500 0,4,1,1,1,0,0 0.046 0.934 0.194
8 500 0,4,2,2,1,0,0 0.012 0.962 0.502
9 500 2,5 0.152 0.848 0.090
Table D.3: Outcomes for (2.9) modelling the RECIST criteria and n=36 patients
n=54 iterations outv outv2 p-value cond. prob. (HA) cond. prob. (data)
1 1000 2 0.142 0.934 0.035
2 1000 2,3,4,5 0.011 0.819 0.415
2 1000 2,3,4 0.000 0.999 0.930
3 1000 2,4 0.139 0.543 0.093
4 1000 2 3,4,5 0.161 0.038 0.010
4 1000 2 2,3,4,5 0.149 0.791 0.026
4 1000 2 2,3,4 0.116 0.941 0.043
5 1000 2 3,4,5 0.148 0.974 0.186
5 1000 2 2,3,4,5 0.141 0.981 0.118
6 1000 2 6 0.167 0.720 0.001
6 1000 2 6,7 0.148 0.773 0.029
7 1000 2 6 0.160 0.853 0.007
7 1000 2 6,7 0.152 0.874 0.111
8 1000 0,1,1,1,1,0,0 0.006 0.874 0.405
8 1000 0,4,1,1,1,0,0 0.027 0.986 0.298
8 1000 0,4,2,2,1,0,0 0.005 0.992 0.700
9 1000 2,5 0.044 0.984 0.123
1 500 2 0.122 0.942 0.024
2 500 2,3,4,5 0.008 0.802 0.376
2 500 2,3,4 0.000 0.998 0.918
3 500 2,4 0.148 0.540 0.056
4 500 2 3,4,5 0.174 0.024 0.000
4 500 2 2,3,4,5 0.148 0.754 0.026
4 500 2 2,3,4 0.122 0.944 0.034
5 500 2 3,4,5 0.136 0.986 0.106
5 500 2 2,3,4,5 0.156 0.980 0.108
6 500 2 6 0.140 0.734 0.006
6 500 2 6,7 0.176 0.844 0.034
7 500 2 6 0.136 0.836 0.006
7 500 2 6,7 0.114 0.920 0.138
8 500 0,1,1,1,1,0,0 0.004 0.852 0.420
8 500 0,4,1,1,1,0,0 0.030 0.992 0.294
8 500 0,4,2,2,1,0,0 0.000 0.994 0.730
9 500 2,5 0.060 0.978 0.108
Table D.4: Outcomes for (2.9) modelling the RECIST criteria and n=54 patients
Matrix p∅−r p∅−sd p∅−pd p∅−c pur−cr pur−pd pur−c psd−ur psd−sd psd−pd psd−c
Data ∅ 1/36 20/36 10/36 5/36 0 0 0 0 0 0 0
Data eval 2 0 0 0 0 1 0 0 0 16/20 3/20 1/20
Data eval 3 0 0 0 0 0 0 0 2/16 12/16 2/16 0
Data eval 4 0 0 0 0 1/2 0 1/2 1/12 8/12 1/12 2/12
Data eval 5 0 0 0 0 1 0 0 0 7/8 0 1/8
H0 .02 .4 .4 .18 .85 .05 .1 .05 .6 .25 .1
Interim ∅ 0 8/15 6/15 1/15 0 0 0 0 0 0 0
Interim eval 2 0 0 0 0 0 0 0 0 7/8 1/8 0
Interim eval 3 0 0 0 0 0 0 0 1/7 5/7 1/7 0
Interim eval 4 0 0 0 0 1 0 0 0 3/5 0 2/5
Interim eval 5 0 0 0 0 0 0 0 0 3/3 0 0
HA .2 .42 .2 .18 .85 .05 .1 .1 .7 .15 .05
Table D.5: Data input for matrix (4.1) modelling the transition-time dependent RECIST
criteria
Endstates pR pur psd ppd pc
Data 3/36 0 7/36 16/36 10/36
H0 .0503 .0043 .0518 .6206 .2730
Interim 1/15 0 3/15 8/15 3/15
HA .2482 .0144 .1008 .3742 .2624
Table D.6: Endstate probabilities for (4.1) modelling the transition-time dependent RE-
CIST criteria
Method iterations outv outv2 p-value cond. prob. (HA) cond. prob. (data)
1 1000 2 0.288 0.810 0.051
2 1000 2,3,4 0.008 0.983 0.847
2 1000 2,3 0.311 0.830 0.011
3 1000 2,4 0.279 0.439 0.044
4 1000 2 3,4 0.282 0.309 0.029
4 1000 2 2,3,4 0.283 0.814 0.052
4 1000 2,3 2,3,4 0.273 0.798 0.058
5 1000 2 3,4 0.273 0.915 0.159
5 1000 2 2,3,4 0.275 0.934 0.144
6 1000 2 5 0.278 0.560 0.010
6 1000 2 5,6 0.285 0.802 0.043
7 1000 2 5 0.270 0.824 0.042
7 1000 2 5,6 0.281 0.914 0.155
8 1000 0,1,1,1,0,0 0.002 0.967 0.855
8 1000 0,4,1,1,0,0 0.058 0.930 0.243
8 1000 0,8,1,1,0,0 0.096 0.943 0.165
9 1000 2,4 0.001 0.991 0.966
1 500 2 0.256 0.820 0.060
2 500 2,3,4 0.006 0.962 0.852
2 500 2,3 0.306 0.852 0.036
3 500 2,4 0.270 0.464 0.044
4 500 2 3,4 0.240 0.340 0.028
4 500 2 2,3,4 0.284 0.772 0.050
4 500 2,3 2,3,4 0.274 0.798 0.050
5 500 2 3,4 0.104 0.914 0.168
5 500 2 2,3,4 0.126 0.906 0.152
6 500 2 5 0.272 0.608 0.014
6 500 2 5,6 0.282 0.778 0.058
7 500 2 5 0.134 0.716 0.026
7 500 2 5,6 0.100 0.910 0.176
8 500 0,1,1,1,0,0 0.000 0.970 0.860
8 500 0,4,1,1,0,0 0.028 0.922 0.232
8 500 0,8,1,1,0,0 0.128 0.910 0.162
9 500 2,4 0.000 0.992 0.970
Table D.7: Outcomes for (4.1) modelling the transition-time dependent RECIST criteria
with n=36 patients
n=54 iterations outv outv2 p-value cond. prob. (HA) cond. prob. (data)
1 1000 2 0.141 0.982 0.112
2 1000 2,3,4 0.000 0.997 0.919
2 1000 2,3 0.191 0.956 0.038
3 1000 2,4 0.129 0.511 0.031
4 1000 2 3,4 0.126 0.631 0.037
4 1000 2 2,3,4 0.120 0.956 0.120
4 1000 2,3 2,3,4 0.147 0.977 0.028
5 1000 2 3,4 0.129 0.994 0.115
5 1000 2 2,3,4 0.130 0.992 0.257
6 1000 2 5 0.145 0.890 0.027
6 1000 2 5,6 0.119 0.977 0.045
7 1000 2 5 0.142 0.918 0.109
7 1000 2 5,6 0.124 0.982 0.237
8 1000 0,1,1,1,0,0 0.001 0.996 0.961
8 1000 0,4,1,1,0,0 0.029 0.989 0.453
8 1000 0,8,1,1,0,0 0.057 0.996 0.274
9 1000 2,4 0.000 0.999 0.998
1 500 2 0.120 0.934 0.098
2 500 2,3,4 0.000 0.998 0.976
2 500 2,3 0.192 0.966 0.044
3 500 2,4 0.132 0.542 0.038
4 500 2 3,4 0.146 0.622 0.096
4 500 2 2,3,4 0.122 0.960 0.094
4 500 2,3 2,3,4 0.144 0.954 0.034
5 500 2 3,4 0.144 0.986 0.108
5 500 2 2,3,4 0.166 0.990 0.260
6 500 2 5 0.150 0.900 0.032
6 500 2 5,6 0.128 0.960 0.042
7 500 2 5 0.148 0.938 0.076
7 500 2 5,6 0.118 0.994 0.276
8 500 0,1,1,1,0,0 0.000 0.990 0.926
8 500 0,4,1,1,0,0 0.026 0.984 0.514
8 500 0,8,1,1,0,0 0.048 0.990 0.234
9 500 2,4 0.000 0.998 0.998
Table D.8: Outcomes for (4.1) modelling the transition-time dependent RECIST criteria
with n=54 patients
Matrix p∅−ur p∅−sd p∅−pd p∅−c pur−cr pur−pd pur−c psd−ur psd−sd psd−pd psd−c
H0 .04 .4 .4 .16 .85 .05 .1 .05 .6 .25 .1
Table D.9: Modified data input (2), slightly better expectations under H0, for matrix
(4.1) modelling the transition-time dependent RECIST criteria
Endstates pR pur psd ppd pc
H0 .0673 .0043 .0518 .6216 .2550
Table D.10: Endstate probabilities for modified data input (2), slightly better expecta-
tions under H0, for (4.1) modelling the transition-time dependent RECIST criteria
Method iterations outv outv2 p-value cond. prob. (HA) cond. prob. (data)
1 1000 2 0.243 0.801 0.053
2 1000 2,3,4 0.005 0.980 0.841
2 1000 2,3 0.321 0.695 0.047
3 1000 2,4 0.288 0.392 0.036
4 1000 2 3,4 0.273 0.428 0.034
4 1000 2 2,3,4 0.275 0.797 0.039
4 1000 2,3 2,3,4 0.261 0.816 0.036
5 1000 2 3,4 0.111 0.925 0.164
5 1000 2 2,3,4 0.119 0.917 0.148
6 1000 2 5 0.272 0.528 0.009
6 1000 2 5,6 0.279 0.791 0.039
7 1000 2 5 0.124 0.743 0.042
7 1000 2 5,6 0.117 0.909 0.162
8 1000 0,1,1,1,0,0 0.001 0.978 0.874
8 1000 0,4,1,1,0,0 0.064 0.919 0.260
8 1000 0,8,1,1,0,0 0.109 0.906 0.202
9 1000 2,4 0.000 0.991 0.970
1 500 2 0.280 0.820 0.044
2 500 2,3,4 0.000 0.966 0.872
2 500 2,3 0.318 0.832 0.012
3 500 2,4 0.270 0.416 0.030
4 500 2 3,4 0.284 0.316 0.030
4 500 2 2,3,4 0.248 0.810 0.044
4 500 2,3 2,3,4 0.266 0.806 0.050
5 500 2 3,4 0.104 0.946 0.134
5 500 2 2,3,4 0.094 0.922 0.156
6 500 2 5 0.294 0.556 0.010
6 500 2 5,6 0.258 0.814 0.064
7 500 2 5 0.112 0.696 0.074
7 500 2 5,6 0.122 0.916 0.118
8 500 0,1,1,1,0,0 0.006 0.956 0.856
8 500 0,4,1,1,0,0 0.048 0.920 0.324
8 500 0,8,1,1,0,0 0.102 0.926 0.162
9 500 2,4 0.000 0.992 0.966
Table D.11: Outcomes for modified data input (2), slightly better expectations under H0,
for (4.1) modelling the transition-time dependent RECIST criteria with n=36 patients
n=54 iterations outv outv2 p-value cond. prob. (HA) cond. prob. (data)
1 1000 2 0.139 0.952 0.053
2 1000 2,3,4 0.001 1.000 0.940
2 1000 2,3 0.165 0.969 0.037
3 1000 2,4 0.143 0.722 0.047
4 1000 2 3,4 0.117 0.613 0.044
4 1000 2 2,3,4 0.151 0.940 0.037
4 1000 2,3 2,3,4 0.147 0.983 0.110
5 1000 2 3,4 0.141 0.979 0.268
5 1000 2 2,3,4 0.139 0.988 0.241
6 1000 2 5 0.127 0.883 0.017
6 1000 2 5,6 0.132 0.947 0.096
7 1000 2 5 0.139 0.955 0.058
7 1000 2 5,6 0.140 0.989 0.134
8 1000 0,1,1,1,0,0 0.000 0.997 0.933
8 1000 0,4,1,1,0,0 0.032 0.994 0.327
8 1000 0,8,1,1,0,0 0.051 0.976 0.245
9 1000 2,4 0.000 0.999 0.997
1 500 2 0.146 0.954 0.050
2 500 2,3,4 0.002 1.000 0.966
2 500 2,3 0.176 0.968 0.130
3 500 2,4 0.132 0.728 0.092
4 500 2 3,4 0.124 0.644 0.106
4 500 2 2,3,4 0.124 0.978 0.138
4 500 2,3 2,3,4 0.154 0.964 0.106
5 500 2 3,4 0.142 0.994 0.132
5 500 2 2,3,4 0.142 0.982 0.276
6 500 2 5 0.122 0.898 0.036
6 500 2 5,6 0.128 0.972 0.108
7 500 2 5 0.148 0.940 0.108
7 500 2 5,6 0.154 0.994 0.110
8 500 0,1,1,1,0,0 0.002 0.994 0.952
8 500 0,4,1,1,0,0 0.038 0.998 0.476
8 500 0,8,1,1,0,0 0.050 0.984 0.174
9 500 2,4 0.000 1.000 0.996
Table D.12: Outcomes for modified data input (2), slightly better expectations under
H0, for matrix (4.1) modelling the transition-time dependent RECIST criteria with n=54
patients
Matrix p∅−r p∅−sd p∅−pd p∅−c pur−cr pur−pd pur−c psd−ur psd−sd psd−pd psd−c
H0 .06 .6 .2 .16 .95 .025 .025 .01 .8 .15 .04
Table D.13: Modified data input (3), extremely better expectations under H0, for matrix
(4.1) modelling the transition-time dependent RECIST criteria
Endstates pR pur psd ppd pc
H0 .0709 .0031 .2458 .4675 .2327
Table D.14: Endstate probabilities for modified data input (3), extremely better expectations
under H0, for matrix (4.1) modelling the transition-time dependent RECIST criteria
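The end-state probabilities of this kind of table follow mechanically from the corresponding transition rates by propagating the state distribution through the evaluation schedule. The sketch below is illustrative Python (the thesis's own program is written in R); the four post-baseline transitions and the reading of the states (ur = unconfirmed response, sd = stable disease, pd = progression, c = off-study, R = confirmed response) are assumptions taken from the surrounding tables.

```python
# Illustrative sketch: end-state probabilities for matrix (4.1) under H0,
# using the H0 transition rates listed in Table D.13.
# Assumed states: ur = unconfirmed response, sd = stable disease,
# pd = progressive disease, c = off-study, R = confirmed response (absorbing).
ur, sd, pd, c, R = 0.06, 0.60, 0.20, 0.16, 0.0   # distribution after evaluation 1

for _ in range(4):   # assumed evaluations 2 through 5
    # tuple assignment so the old values of ur and sd feed every update
    R, pd, c = (R + 0.95 * ur,
                pd + 0.025 * ur + 0.15 * sd,
                c + 0.025 * ur + 0.04 * sd)
    ur, sd = 0.01 * sd, 0.80 * sd

print(round(R, 4), round(ur, 4), round(sd, 4), round(pd, 4), round(c, 4))
# → 0.0709 0.0031 0.2458 0.4675 0.2327, the H0 row of Table D.14
```

The same loop, seeded with the HA or observed rates instead of the H0 row, yields the other end-state rows.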
Method iterations outv outv2 p-value cond. prob. (HA) cond. prob. (data)
1 1000 2 0.465 0.630 0.007
2 1000 2,3,4 0.708 0.019 0.002
2 1000 2,3 0.511 0.497 0.002
3 1000 2,4 0.914 0.000 0.000
4 1000 2 3,4 0.919 0.000 0.000
4 1000 2 2,3,4 0.775 0.016 0.000
4 1000 2,3 2,3,4 0.814 0.017 0.000
5 1000 2 3,4 0.444 0.631 0.006
5 1000 2 2,3,4 0.417 0.635 0.014
6 1000 2 5 0.705 0.016 0.000
6 1000 2 5,6 0.791 0.017 0.000
7 1000 2 5 0.601 0.027 0.000
7 1000 2 5,6 0.716 0.039 0.002
8 1000 0,1,1,1,0,0 0.743 0.014 0.000
8 1000 0,4,1,1,0,0 0.493 0.480 0.006
8 1000 0,8,1,1,0,0 0.429 0.624 0.009
9 1000 2,4 0.386 0.638 0.033
1 500 2 0.438 0.646 0.010
2 500 2,3,4 0.734 0.010 0.000
2 500 2,3 0.482 0.672 0.008
3 500 2,4 0.878 0.000 0.000
4 500 2 3,4 0.908 0.000 0.000
4 500 2 2,3,4 0.798 0.016 0.000
4 500 2,3 2,3,4 0.796 0.014 0.000
5 500 2 3,4 0.414 0.610 0.014
5 500 2 2,3,4 0.424 0.624 0.012
6 500 2 5 0.672 0.026 0.000
6 500 2 5,6 0.794 0.004 0.000
7 500 2 5 0.546 0.022 0.000
7 500 2 5,6 0.752 0.038 0.002
8 500 0,1,1,1,0,0 0.752 0.020 0.002
8 500 0,4,1,1,0,0 0.500 0.522 0.008
8 500 0,8,1,1,0,0 0.396 0.608 0.014
9 500 2,4 0.320 0.666 0.012
Table D.15: Outcomes for modified data input (3), extremely better expectations under
H0, for matrix (4.1) modelling the transition-time dependent RECIST criteria with n=36
patients
Method iterations outv outv2 p-value cond. prob. (HA) cond. prob. (data)
1 1000 2 0.341 0.899 0.012
2 1000 2,3,4 0.774 0.033 0.000
2 1000 2,3 0.338 0.922 0.016
3 1000 2,4 0.871 0.000 0.000
4 1000 2 3,4 0.876 0.000 0.000
4 1000 2 2,3,4 0.791 0.054 0.000
4 1000 2,3 2,3,4 0.797 0.048 0.000
5 1000 2 3,4 0.339 0.873 0.015
5 1000 2 2,3,4 0.352 0.886 0.024
6 1000 2 5 0.607 0.074 0.000
6 1000 2 5,6 0.781 0.036 0.000
7 1000 2 5 0.591 0.139 0.000
7 1000 2 5,6 0.790 0.144 0.002
8 1000 0,1,1,1,0,0 0.769 0.080 0.002
8 1000 0,4,1,1,0,0 0.453 0.778 0.013
8 1000 0,8,1,1,0,0 0.393 0.863 0.022
9 1000 2,4 0.238 0.888 0.026
1 500 2 0.344 0.778 0.018
2 500 2,3,4 0.746 0.044 0.000
2 500 2,3 0.366 0.944 0.016
3 500 2,4 0.860 0.000 0.000
4 500 2 3,4 0.886 0.000 0.000
4 500 2 2,3,4 0.818 0.072 0.000
4 500 2,3 2,3,4 0.768 0.086 0.000
5 500 2 3,4 0.340 0.880 0.012
5 500 2 2,3,4 0.336 0.872 0.020
6 500 2 5 0.608 0.102 0.000
6 500 2 5,6 0.796 0.044 0.000
7 500 2 5 0.612 0.138 0.000
7 500 2 5,6 0.758 0.064 0.000
8 500 0,1,1,1,0,0 0.718 0.038 0.002
8 500 0,4,1,1,0,0 0.470 0.746 0.010
8 500 0,8,1,1,0,0 0.382 0.896 0.012
9 500 2,4 0.212 0.880 0.034
Table D.16: Outcomes for modified data input (3), extremely better expectations under
H0, for matrix (4.1) modelling the transition-time dependent RECIST criteria with n=54
patients
Matrix p∅−ur p∅−sd p∅−pd p∅−c pur−cr pur−pd pur−c psd−ur psd−sd psd−pd psd−c
H0 .05 .4 .4 .15 .7 .2 .1 .03 .5 .4 .07
Table D.17: Modified data input (4), hypothesizing a cytotoxic treatment with improved
immediate response but no durability, for matrix (4.1) modelling the transition-time
dependent RECIST criteria
Endstates pR pur psd ppd pc
H0 .0497 .0015 .0250 .7142 .2096
Table D.18: Endstate probabilities for modified data input (4), hypothesizing a cytotoxic
treatment with improved immediate response but no durability, for matrix (4.1) modelling
the transition-time dependent RECIST criteria
Method iterations outv outv2 p-value cond. prob. (HA) cond. prob. (data)
1 1000 2 0.248 0.787 0.036
2 1000 2,3,4 0.000 1.000 0.987
2 1000 2,3 0.308 0.824 0.043
3 1000 2,4 0.235 0.716 0.049
4 1000 2 3,4 0.253 0.731 0.038
4 1000 2 2,3,4 0.279 0.794 0.040
4 1000 2,3 2,3,4 0.237 0.804 0.039
5 1000 2 3,4 0.117 0.921 0.166
5 1000 2 2,3,4 0.103 0.921 0.143
6 1000 2 5 0.242 0.782 0.034
6 1000 2 5,6 0.261 0.818 0.044
7 1000 2 5 0.098 0.923 0.145
7 1000 2 5,6 0.105 0.923 0.145
8 1000 0,1,1,1,0,0 0.000 1.000 0.993
8 1000 0,4,1,1,0,0 0.037 0.944 0.346
8 1000 0,8,1,1,0,0 0.085 0.932 0.185
9 1000 2,4 0.000 1.000 1.000
1 500 2 0.298 0.786 0.036
2 500 2,3,4 0.000 0.994 0.954
2 500 2,3 0.244 0.800 0.046
3 500 2,4 0.290 0.720 0.060
4 500 2 3,4 0.260 0.688 0.062
4 500 2 2,3,4 0.262 0.808 0.052
4 500 2,3 2,3,4 0.232 0.566 0.056
5 500 2 3,4 0.088 0.916 0.162
5 500 2 2,3,4 0.118 0.906 0.152
6 500 2 5 0.252 0.760 0.036
6 500 2 5,6 0.248 0.762 0.056
7 500 2 5 0.106 0.918 0.144
7 500 2 5,6 0.112 0.912 0.152
8 500 0,1,1,1,0,0 0.000 0.998 0.988
8 500 0,4,1,1,0,0 0.024 0.964 0.460
8 500 0,8,1,1,0,0 0.118 0.916 0.190
9 500 2,4 0.000 1.000 0.998
Table D.19: Outcomes for modified data input (4), hypothesizing a cytotoxic treatment
with improved immediate response but no durability, for matrix (4.1) modelling the
transition-time dependent RECIST criteria with n=36 patients
Method iterations outv outv2 p-value cond. prob. (HA) cond. prob. (data)
1 1000 2 0.119 0.944 0.057
2 1000 2,3,4 0.000 1.000 0.995
2 1000 2,3 0.131 0.958 0.036
3 1000 2,4 0.134 0.891 0.098
4 1000 2 3,4 0.133 0.902 0.037
4 1000 2 2,3,4 0.116 0.943 0.109
4 1000 2,3 2,3,4 0.150 0.983 0.040
5 1000 2 3,4 0.129 0.982 0.131
5 1000 2 2,3,4 0.139 0.994 0.117
6 1000 2 5 0.131 0.976 0.046
6 1000 2 5,6 0.129 0.935 0.139
7 1000 2 5 0.124 0.995 0.101
7 1000 2 5,6 0.135 0.978 0.130
8 1000 0,1,1,1,0,0 0.000 1.000 0.999
8 1000 0,4,1,1,0,0 0.019 0.994 0.552
8 1000 0,8,1,1,0,0 0.057 0.990 0.241
9 1000 2,4 0.000 1.000 1.000
1 500 2 0.122 0.938 0.056
2 500 2,3,4 0.000 1.000 0.988
2 500 2,3 0.138 0.962 0.036
3 500 2,4 0.130 0.944 0.034
4 500 2 3,4 0.130 0.876 0.054
4 500 2 2,3,4 0.138 0.978 0.028
4 500 2,3 2,3,4 0.142 0.982 0.114
5 500 2 3,4 0.104 0.994 0.114
5 500 2 2,3,4 0.138 0.990 0.264
6 500 2 5 0.130 0.918 0.102
6 500 2 5,6 0.126 0.958 0.040
7 500 2 5 0.092 0.982 0.276
7 500 2 5,6 0.118 0.998 0.252
8 500 0,1,1,1,0,0 0.000 1.000 0.990
8 500 0,4,1,1,0,0 0.010 0.998 0.468
8 500 0,8,1,1,0,0 0.044 0.996 0.280
9 500 2,4 0.000 1.000 1.000
Table D.20: Outcomes for modified data input (4), hypothesizing a cytotoxic treatment
with improved immediate response but no durability, for matrix (4.1) modelling the
transition-time dependent RECIST criteria with n=54 patients
Matrix p∅−ur p∅−sd p∅−pd p∅−c pur−cr pur−pd pur−c psd−ur psd−sd psd−pd psd−c
H0 .15 .6 .15 .1 .85 .05 .1 .05 .5 .3 .15
Table D.21: Modified data input (5), an extreme optimist, for matrix (4.1) modelling the
transition-time dependent RECIST criteria
Endstates pR pur psd ppd pc
H0 .1721 .0038 .0375 .4976 .2890
Table D.22: Endstate probabilities for modified data input (5), an extreme optimist, for
matrix (4.1) modelling the transition-time dependent RECIST criteria
Method iterations outv outv2 p-value cond. prob. (HA) cond. prob. (data)
1 1000 2 0.952 0.021 0.000
2 1000 2,3,4 0.224 0.335 0.084
2 1000 2,3 0.966 0.042 0.000
3 1000 2,4 0.961 0.009 0.000
4 1000 2 3,4 0.960 0.013 0.000
4 1000 2 2,3,4 0.967 0.024 0.000
4 1000 2,3 2,3,4 0.954 0.022 0.000
5 1000 2 3,4 0.878 0.062 0.000
5 1000 2 2,3,4 0.884 0.046 0.000
6 1000 2 5 0.954 0.008 0.000
6 1000 2 5,6 0.960 0.025 0.000
7 1000 2 5 0.891 0.007 0.000
7 1000 2 5,6 0.895 0.058 0.000
8 1000 0,1,1,1,0,0 0.221 0.347 0.074
8 1000 0,4,1,1,0,0 0.799 0.085 0.000
8 1000 0,8,1,1,0,0 0.891 0.054 0.000
9 1000 2,4 0.000 0.939 0.988
1 500 2 0.954 0.024 0.000
2 500 2,3,4 0.204 0.520 0.092
2 500 2,3 0.980 0.042 0.000
3 500 2,4 0.956 0.012 0.000
4 500 2 3,4 0.944 0.000 0.000
4 500 2 2,3,4 0.966 0.010 0.000
4 500 2,3 2,3,4 0.950 0.022 0.000
5 500 2 3,4 0.922 0.048 0.000
5 500 2 2,3,4 0.876 0.058 0.000
6 500 2 5 0.972 0.010 0.000
6 500 2 5,6 0.966 0.016 0.000
7 500 2 5 0.920 0.000 0.000
7 500 2 5,6 0.886 0.034 0.000
8 500 0,1,1,1,0,0 0.224 0.346 0.088
8 500 0,4,1,1,0,0 0.796 0.080 0.000
8 500 0,8,1,1,0,0 0.880 0.054 0.000
9 500 2,4 0.000 0.960 0.990
Table D.23: Outcomes for modified data input (5), an extreme optimist, for matrix (4.1)
modelling the transition-time dependent RECIST criteria with n=36 patients
Method iterations outv outv2 p-value cond. prob. (HA) cond. prob. (data)
1 1000 2 0.962 0.081 0.000
2 1000 2,3,4 0.163 0.569 0.134
2 1000 2,3 0.969 0.113 0.000
3 1000 2,4 0.967 0.048 0.000
4 1000 2 3,4 0.964 0.037 0.000
4 1000 2 2,3,4 0.964 0.085 0.000
4 1000 2,3 2,3,4 0.975 0.071 0.000
5 1000 2 3,4 0.966 0.153 0.000
5 1000 2 2,3,4 0.957 0.146 0.000
6 1000 2 5 0.968 0.045 0.000
6 1000 2 5,6 0.959 0.086 0.000
7 1000 2 5 0.967 0.080 0.000
7 1000 2 5,6 0.966 0.155 0.000
8 1000 0,1,1,1,0,0 0.175 0.593 0.136
8 1000 0,4,1,1,0,0 0.823 0.264 0.000
8 1000 0,8,1,1,0,0 0.919 0.157 0.000
9 1000 2,4 0.000 0.962 0.991
1 500 2 0.974 0.090 0.000
2 500 2,3,4 0.186 0.570 0.198
2 500 2,3 0.966 0.106 0.000
3 500 2,4 0.964 0.054 0.000
4 500 2 3,4 0.974 0.048 0.000
4 500 2 2,3,4 0.978 0.028 0.000
4 500 2,3 2,3,4 0.958 0.066 0.000
5 500 2 3,4 0.972 0.138 0.000
5 500 2 2,3,4 0.974 0.142 0.000
6 500 2 5 0.978 0.024 0.000
6 500 2 5,6 0.978 0.076 0.000
7 500 2 5 0.978 0.076 0.000
7 500 2 5,6 0.980 0.166 0.000
8 500 0,1,1,1,0,0 0.142 0.570 0.122
8 500 0,4,1,1,0,0 0.866 0.316 0.000
8 500 0,8,1,1,0,0 0.924 0.152 0.000
9 500 2,4 0.000 0.958 0.998
Table D.24: Outcomes for modified data input (5), an extreme optimist, for matrix (4.1)
modelling the transition-time dependent RECIST criteria with n=54 patients
Matrix p∅−ur p∅−sd p∅−pd p∅−c pur−cr pur−pd pur−c psd−ur psd−sd psd−pd psd−c
Data ∅ 1/36 20/36 10/36 5/36 0 0 0 0 0 0 0
Data eval 5 0 0 0 0 1 0 0 0 6/7 1/7 0
H0 .02 .4 .4 .18 .85 .05 .1 .05 .6 .25 .1
Interim eval 5 0 0 0 0 0 0 0 0 2/3 1/3 0
Table D.25: Data input (6), an additional transition, for matrix (4.1) modelling the
transition-time dependent RECIST criteria
Endstates pR pur psd ppd pc
H0 .0540 .0026 .0311 .6337 .2786
data .0833 .0000 .1667 .4722 .2778
Interim data .0667 .0000 .1333 .6000 .2000
HA .2604 .0101 .0706 .3900 .2689
Table D.26: Endstate probabilities for data input (6), an additional transition, for matrix
(4.1) modelling the transition-time dependent RECIST criteria
Method iterations outv outv2 p-value cond. prob. (HA) cond. prob. (data)
1 1000 2 0.321 0.854 0.053
2 1000 2,3,4 0.002 0.964 0.623
2 1000 2,3 0.333 0.842 0.050
3 1000 2,4 0.321 0.275 0.006
4 1000 2 3,4 0.299 0.348 0.021
4 1000 2 2,3,4 0.309 0.683 0.043
4 1000 2,3 2,3,4 0.289 0.809 0.043
5 1000 2 3,4 0.126 0.813 0.047
5 1000 2 2,3,4 0.125 0.852 0.116
6 1000 2 5 0.318 0.457 0.001
6 1000 2 5,6 0.315 0.844 0.047
7 1000 2 5 0.143 0.662 0.015
7 1000 2 5,6 0.129 0.931 0.143
8 1000 0,1,1,1,0,0 0.002 0.960 0.651
8 1000 0,4,1,1,0,0 0.083 0.934 0.128
8 1000 0,8,1,1,0,0 0.116 0.927 0.112
9 1000 2,4 0.000 0.987 0.876
1 500 2 0.306 0.662 0.040
2 500 2,3,4 0.010 0.962 0.648
2 500 2,3 0.360 0.744 0.040
3 500 2,4 0.294 0.282 0.034
4 500 2 3,4 0.294 0.314 0.018
4 500 2 2,3,4 0.316 0.778 0.038
4 500 2,3 2,3,4 0.280 0.756 0.064
5 500 2 3,4 0.148 0.896 0.134
5 500 2 2,3,4 0.144 0.890 0.042
6 500 2 5 0.350 0.496 0.006
6 500 2 5,6 0.302 0.834 0.008
7 500 2 5 0.150 0.582 0.014
7 500 2 5,6 0.108 0.914 0.146
8 500 0,1,1,1,0,0 0.000 0.922 0.644
8 500 0,4,1,1,0,0 0.066 0.874 0.198
8 500 0,8,1,1,0,0 0.110 0.916 0.206
9 500 2,4 0.000 0.994 0.876
Table D.27: Outcomes for data input (6), an additional transition, for matrix (4.1)
modelling the transition-time dependent RECIST criteria with n=36 patients
Method iterations outv outv2 p-value cond. prob. (HA) cond. prob. (data)
1 1000 2 0.163 0.965 0.049
2 1000 2,3,4 0.000 0.999 0.680
2 1000 2,3 0.188 0.982 0.036
3 1000 2,4 0.150 0.500 0.031
4 1000 2 3,4 0.183 0.585 0.032
4 1000 2 2,3,4 0.183 0.963 0.039
4 1000 2,3 2,3,4 0.169 0.966 0.048
5 1000 2 3,4 0.169 0.990 0.123
5 1000 2 2,3,4 0.166 0.986 0.135
6 1000 2 5 0.162 0.815 0.005
6 1000 2 5,6 0.165 0.961 0.032
7 1000 2 5 0.170 0.926 0.025
7 1000 2 5,6 0.149 0.990 0.105
8 1000 0,1,1,1,0,0 0.001 0.995 0.819
8 1000 0,4,1,1,0,0 0.037 0.981 0.269
8 1000 0,8,1,1,0,0 0.065 0.984 0.137
9 1000 2,4 0.000 0.999 0.955
1 500 2 0.620 0.964 0.030
2 500 2,3,4 0.000 0.988 0.826
2 500 2,3 0.172 0.968 0.048
3 500 2,4 0.170 0.490 0.014
4 500 2 3,4 0.172 0.374 0.038
4 500 2 2,3,4 0.160 0.976 0.030
4 500 2,3 2,3,4 0.142 0.964 0.042
5 500 2 3,4 0.178 0.984 0.110
5 500 2 2,3,4 0.122 0.976 0.130
6 500 2 5 0.176 0.892 0.010
6 500 2 5,6 0.172 0.972 0.052
7 500 2 5 0.158 0.924 0.018
7 500 2 5,6 0.172 0.980 0.124
8 500 0,1,1,1,0,0 0.000 1.000 0.820
8 500 0,4,1,1,0,0 0.054 0.984 0.252
8 500 0,8,1,1,0,0 0.084 0.988 0.168
9 500 2,4 0.000 1.000 0.970
Table D.28: Outcomes for data input (6), an additional transition, for matrix (4.1)
modelling the transition-time dependent RECIST criteria with n=54 patients
Endstates pR pur psd ppd pc
H0 .0442 .0072 .0864 .5986 .2636
data .0556 .0278 .2222 .4444 .2500
Interim data .0667 .0000 .2000 .5333 .2000
HA .2307 .0206 .1441 .3515 .2531
Table D.29: Endstate probabilities for matrix (4.1) modelling the transition-time
dependent RECIST criteria with only 3 transitions
Method iterations outv outv2 p-value cond. prob. (HA) cond. prob. (data)
1 1000 2 0.493 0.748 0.049
2 1000 2,3,4 0.006 0.903 0.502
2 1000 2,3 0.280 0.831 0.036
3 1000 2,4 0.478 0.238 0.021
4 1000 2 3,4 0.471 0.246 0.005
4 1000 2 2,3,4 0.503 0.733 0.042
4 1000 2,3 2,3,4 0.474 0.714 0.039
5 1000 2 3,4 0.245 0.887 0.158
5 1000 2 2,3,4 0.230 0.888 0.159
6 1000 2 5 0.495 0.521 0.010
6 1000 2 5,6 0.465 0.702 0.040
7 1000 2 5 0.265 0.663 0.049
7 1000 2 5,6 0.234 0.881 0.154
8 1000 0,1,1,1,0,0 0.006 0.961 0.509
8 1000 0,4,1,1,0,0 0.097 0.914 0.216
8 1000 0,8,1,1,0,0 0.210 0.884 0.165
9 1000 2,4 0.000 0.984 0.872
1 500 2 0.470 0.750 0.064
2 500 2,3,4 0.002 0.908 0.480
2 500 2,3 0.308 0.836 0.056
3 500 2,4 0.478 0.206 0.030
4 500 2 3,4 0.484 0.292 0.010
4 500 2 2,3,4 0.464 0.738 0.026
4 500 2,3 2,3,4 0.494 0.732 0.058
5 500 2 3,4 0.208 0.890 0.156
5 500 2 2,3,4 0.190 0.882 0.162
6 500 2 5 0.514 0.536 0.012
6 500 2 5,6 0.450 0.732 0.050
7 500 2 5 0.232 0.694 0.048
7 500 2 5,6 0.176 0.860 0.144
8 500 0,1,1,1,0,0 0.004 0.912 0.682
8 500 0,4,1,1,0,0 0.110 0.940 0.332
8 500 0,8,1,1,0,0 0.202 0.908 0.168
9 500 2,4 0.000 0.986 0.856
Table D.30: Outcomes for matrix (4.1) modelling the transition-time dependent RECIST
criteria with 3 transitions and n=36 patients
Method iterations outv outv2 p-value cond. prob. (HA) cond. prob. (data)
1 1000 2 0.423 0.965 0.117
2 1000 2,3,4 0.000 0.995 0.736
2 1000 2,3 0.139 0.946 0.038
3 1000 2,4 0.421 0.476 0.059
4 1000 2 3,4 0.437 0.397 0.058
4 1000 2 2,3,4 0.449 0.968 0.096
4 1000 2,3 2,3,4 0.422 0.959 0.109
5 1000 2 3,4 0.223 0.986 0.257
5 1000 2 2,3,4 0.216 0.991 0.269
6 1000 2 5 0.432 0.864 0.019
6 1000 2 5,6 0.411 0.960 0.093
7 1000 2 5 0.218 0.959 0.075
7 1000 2 5,6 0.233 0.981 0.232
8 1000 0,1,1,1,0,0 0.002 0.991 0.734
8 1000 0,4,1,1,0,0 0.059 0.992 0.450
8 1000 0,8,1,1,0,0 0.146 0.992 0.254
9 1000 2,4 0.001 0.999 0.934
1 500 2 0.456 0.968 0.094
2 500 2,3,4 0.002 0.990 0.736
2 500 2,3 0.124 0.948 0.122
3 500 2,4 0.392 0.458 0.068
4 500 2 3,4 0.446 0.448 0.064
4 500 2 2,3,4 0.412 0.960 0.108
4 500 2,3 2,3,4 0.420 0.958 0.094
5 500 2 3,4 0.240 0.988 0.278
5 500 2 2,3,4 0.218 0.994 0.250
6 500 2 5 0.446 0.892 0.014
6 500 2 5,6 0.400 0.956 0.102
7 500 2 5 0.240 0.930 0.068
7 500 2 5,6 0.192 0.976 0.230
8 500 0,1,1,1,0,0 0.000 0.990 0.734
8 500 0,4,1,1,0,0 0.044 0.998 0.358
8 500 0,8,1,1,0,0 0.112 0.992 0.296
9 500 2,4 0.000 0.998 0.928
Table D.31: Outcomes for matrix (4.1) modelling the transition-time dependent RECIST
criteria with 3 transitions and n=54 patients
Matrix pr−r pr−pd pr−c p∅−r p∅−sd p∅−pd p∅−c pur−r pur−pd pur−c psd−r psd−sd psd−pd psd−c
Data ∅ 0 0 0 1/36 20/36 10/36 5/36 0 0 0 0 0 0 0
Data eval 2 0 0 0 0 0 0 0 1/1 0 0 0 16/20 3/20 1/20
Data eval 3 0 0 1/1 0 0 0 0 0 0 0 2/16 12/16 2/16 0
Data eval 4 0 0 0 0 0 0 0 1/2 0 1/2 1/12 8/12 1/12 2/12
Data eval 5 1/1 0 0 0 0 0 0 1/1 0 0 0 7/8 0 1/8
H0 .8 .15 .05 .02 .4 .4 .18 .85 .05 .1 .05 .6 .25 .1
Interim ∅ 0 0 0 0 8/15 6/15 1/15 0 0 0 0 0 0 0
Interim eval 2 0 0 0 0 0 0 0 0 0 0 0 7/8 1/8 0
Interim eval 3 0 0 0 0 0 0 0 0 0 0 1/7 5/7 1/7 0
Interim eval 4 0 0 0 0 0 0 0 1 0 0 0 3/5 0 2/5
Interim eval 5 0 0 0 0 0 0 0 0 0 0 0 3/3 0 0
HA .85 .1 .05 .2 .42 .2 .18 .85 .05 .1 .1 .7 .15 .05
Table D.32: Data input for matrix (4.2) modelling the transition-time dependent RECIST criteria with response not an absorbing
state
Endstates pR pur psd ppd pc
Data 2/36 0 7/36 16/36 11/36
H0 .0338 .0043 .0518 .6329 .2771
Interim 1/15 0 3/15 8/15 3/15
HA .1689 .0144 .1008 .4270 .2888
Table D.33: Endstate probabilities for matrix (4.2) modelling the transition-time depen-
dent RECIST criteria with response not an absorbing state
Method iterations outv outv2 p-value cond. prob. (HA) cond. prob. (data)
1 1000 2 0.133 0.705 0.175
2 1000 2,3,4 0.002 0.952 0.945
2 1000 2,3 0.159 0.772 0.159
3 1000 2,4 0.139 0.316 0.146
4 1000 2 3,4 0.134 0.291 0.124
4 1000 2 2,3,4 0.150 0.697 0.172
4 1000 2,3 2,3,4 0.109 0.695 0.166
5 1000 2 3,4 0.121 0.858 0.414
5 1000 2 2,3,4 0.107 0.810 0.415
6 1000 2 5 0.132 0.435 0.048
6 1000 2 5,6 0.146 0.697 0.159
7 1000 2 5 0.140 0.596 0.178
7 1000 2 5,6 0.135 0.857 0.420
8 1000 0,1,1,1,0,0 0.001 0.968 0.953
8 1000 0,4,1,1,0,0 0.040 0.888 0.552
8 1000 0,8,1,1,0,0 0.114 0.912 0.438
9 1000 2,4 0.000 0.986 0.991
1 500 2 0.110 0.730 0.146
2 500 2,3,4 0.004 0.890 0.940
2 500 2,3 0.130 0.536 0.190
3 500 2,4 0.134 0.414 0.128
4 500 2 3,4 0.126 0.402 0.098
4 500 2 2,3,4 0.104 0.732 0.168
4 500 2,3 2,3,4 0.088 0.702 0.176
5 500 2 3,4 0.126 0.906 0.398
5 500 2 2,3,4 0.128 0.874 0.374
6 500 2 5 0.096 0.452 0.020
6 500 2 5,6 0.128 0.696 0.186
7 500 2 5 0.118 0.666 0.152
7 500 2 5,6 0.146 0.914 0.428
8 500 0,1,1,1,0,0 0.004 0.954 0.962
8 500 0,4,1,1,0,0 0.046 0.896 0.468
8 500 0,8,1,1,0,0 0.116 0.782 0.438
9 500 2,4 0.000 0.990 0.984
Table D.34: Outcomes for matrix (4.2) modelling the transition-time dependent RECIST
criteria with response not an absorbing state and n=36 patients
Method iterations outv outv2 p-value cond. prob. (HA) cond. prob. (data)
1 1000 2 0.109 0.907 0.266
2 1000 2,3,4 0.000 0.980 0.969
2 1000 2,3 0.143 0.959 0.119
3 1000 2,4 0.101 0.487 0.229
4 1000 2 3,4 0.100 0.596 0.236
4 1000 2 2,3,4 0.108 0.910 0.254
4 1000 2,3 2,3,4 0.093 0.914 0.249
5 1000 2 3,4 0.110 0.974 0.474
5 1000 2 2,3,4 0.116 0.977 0.506
6 1000 2 5 0.121 0.716 0.081
6 1000 2 5,6 0.128 0.911 0.264
7 1000 2 5 0.120 0.863 0.235
7 1000 2 5,6 0.111 0.969 0.504
8 1000 0,1,1,1,0,0 0.000 0.982 0.984
8 1000 0,4,1,1,0,0 0.029 0.972 0.766
8 1000 0,8,1,1,0,0 0.073 0.975 0.532
9 1000 2,4 0.000 0.994 0.999
1 500 2 0.124 0.930 0.256
2 500 2,3,4 0.000 0.988 0.996
2 500 2,3 0.124 0.934 0.104
3 500 2,4 0.118 0.482 0.246
4 500 2 3,4 0.096 0.610 0.238
4 500 2 2,3,4 0.104 0.902 0.272
4 500 2,3 2,3,4 0.102 0.896 0.280
5 500 2 3,4 0.098 0.956 0.544
5 500 2 2,3,4 0.124 0.972 0.478
6 500 2 5 0.106 0.778 0.122
6 500 2 5,6 0.114 0.914 0.264
7 500 2 5 0.124 0.876 0.230
7 500 2 5,6 0.106 0.952 0.500
8 500 0,1,1,1,0,0 0.002 0.980 0.974
8 500 0,4,1,1,0,0 0.044 0.978 0.748
8 500 0,8,1,1,0,0 0.084 0.966 0.508
9 500 2,4 0.000 0.996 0.994
Table D.35: Outcomes for matrix (4.2) modelling the transition-time dependent RECIST
criteria with response not an absorbing state and n=54 patients
Matrix p∅−r p∅−sd p∅−pd p∅−o pr−r pr−sd pr−pd pr−o psd−r psd−sd psd−pd psd−o ppd−r ppd−sd ppd−pd ppd−o
Data ∅ 11/36 9/36 12/36 4/36 0 0 0 0 0 0 0 0 0 0 0 0
Data eval 2 0 0 0 0 4/11 3/11 3/11 1/11 0/9 3/9 4/9 2/9 0/12 2/12 0/12 10/12
Data eval 3 0 0 0 0 2/4 1/4 0/4 1/4 2/8 3/8 2/8 1/8 3/7 3/7 0/7 1/7
Data eval 4 0 0 0 0 1/7 3/7 1/7 2/7 2/7 2/7 1/7 2/7 0/2 1/2 0/2 1/2
Data eval 5 0 0 0 0 1/3 2/3 0/3 0/3 3/6 1/6 1/6 1/6 0/2 2/2 0/2 0/2
H0 .2 .35 .35 .1 .5 .3 .1 .1 .1 .4 .3 .2 .05 .2 .1 .65
Interim ∅ 4/15 3/15 7/15 1/15 0 0 0 0 0 0 0 0 0 0 0 0
Interim eval 2 0 0 0 0 2/4 1/4 1/4 0/4 0/3 2/3 1/3 0/3 0/7 1/7 0/7 6/7
Interim eval 3 0 0 0 0 1/2 1/2 0/2 0/2 1/4 2/4 1/4 0/4 0/2 1/2 0/2 1/2
Interim eval 4 0 0 0 0 0/2 1/2 0/2 1/2 2/4 1/4 0/4 1/4 0/1 0/1 0/1 1/1
Interim eval 5 0 0 0 0 0/2 2/2 0/2 0/2 1/2 1/2 0/2 0/2 0 0 0 0
HA .4 .4 .1 .1 .7 .2 .05 .05 .1 .6 .2 .1 .1 .2 .3 .4
Table D.36: Data input for matrix (4.3) modelling the change in response (10%) at each transition
Endstates pR psd ppd poff
Data 4/36 5/36 1/36 26/36
H0 .0568 .0916 .0548 .7967
Interim 1/15 3/15 0/15 11/15
HA .2024 .2241 .0965 .4771
Table D.37: Endstate probabilities for matrix (4.3) modelling the change in response
(10%) at each transition
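Because every row of Table D.36 gives the full set of rates out of a current state, the end-state probabilities in Table D.37 can be recovered by treating those rates as a transition matrix and applying it over the remaining evaluations. Below is an illustrative Python sketch (the thesis's program is written in R); the four-step horizon, inferred from evaluations 2 through 5, is an assumption.

```python
import numpy as np

# H0 transition rates from Table D.36 over the states (r, sd, pd, off);
# off-study is treated as absorbing.
P = np.array([
    [0.50, 0.30, 0.10, 0.10],   # from r
    [0.10, 0.40, 0.30, 0.20],   # from sd
    [0.05, 0.20, 0.10, 0.65],   # from pd
    [0.00, 0.00, 0.00, 1.00],   # from off (absorbing)
])
start = np.array([0.20, 0.35, 0.35, 0.10])   # distribution after evaluation 1
end = start @ np.linalg.matrix_power(P, 4)   # assumed evaluations 2 through 5

print(end.round(4))   # → [0.0568 0.0916 0.0548 0.7967], the H0 row of Table D.37
```

Writing the update as a single matrix power makes the dependence on the number of transitions explicit: changing the exponent is all that is needed to examine shorter or longer evaluation schedules.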
Method iterations outv outv2 p-value cond. prob. (HA) cond. prob. (data)
1 1000 2 0.121 0.632 0.016
2 1000 2,3,4 0.196 0.860 0.066
2 1000 2,3 0.072 0.955 0.500
3 1000 2,4 0.900 0.014 0.000
4 1000 2 3,4 0.549 0.161 0.000
4 1000 2 2,3,4 0.253 0.591 0.010
4 1000 2,3 2,3,4 0.265 0.411 0.005
5 1000 2 3,4 0.149 0.628 0.070
5 1000 2 2,3,4 0.149 0.671 0.044
6 1000 2 5 0.258 0.590 0.004
7 1000 2 5 0.190 0.781 0.031
8 1000 0,1,1,1,0 0.185 0.865 0.183
8 1000 0,4,1,1,0 0.084 0.822 0.064
8 1000 0,8,1,1,0 0.092 0.679 0.049
9 1000 2,4 0.130 0.743 0.017
1 500 2 0.106 0.442 0.054
2 500 2,3,4 0.156 0.864 0.150
2 500 2,3 0.092 0.946 0.504
3 500 2,4 0.896 0.004 0.000
4 500 2 3,4 0.518 0.182 0.000
4 500 2 2,3,4 0.256 0.504 0.010
4 500 2,3 2,3,4 0.232 0.434 0.004
5 500 2 3,4 0.146 0.790 0.048
5 500 2 2,3,4 0.158 0.648 0.060
6 500 2 5 0.262 0.386 0.004
7 500 2 5 0.206 0.780 0.042
8 500 0,1,1,1,0 0.168 0.866 0.174
8 500 0,4,1,1,0 0.084 0.762 0.100
8 500 0,8,1,1,0 0.062 0.800 0.048
9 500 2,4 0.100 0.724 0.046
Table D.38: Outcomes for matrix (4.3) modelling the change in response (10%) at each
transition and n=36 patients
Method iterations outv outv2 p-value cond. prob. (HA) cond. prob. (data)
1 1000 2 0.071 0.839 0.034
2 1000 2,3,4 0.107 0.994 0.216
2 1000 2,3 0.027 0.995 0.620
3 1000 2,4 0.815 0.037 0.000
4 1000 2 3,4 0.460 0.670 0.001
4 1000 2 2,3,4 0.178 0.826 0.025
4 1000 2,3 2,3,4 0.154 0.800 0.025
5 1000 2 3,4 0.088 0.910 0.076
5 1000 2 2,3,4 0.093 0.922 0.113
6 1000 2 5 0.165 0.831 0.022
7 1000 2 5 0.107 0.922 0.048
8 1000 0,1,1,1,0 0.122 0.991 0.223
8 1000 0,4,1,1,0 0.043 0.983 0.135
8 1000 0,8,1,1,0 0.046 0.927 0.065
9 1000 2,4 0.061 0.922 0.044
1 500 2 0.086 0.834 0.048
2 500 2,3,4 0.158 0.998 0.222
2 500 2,3 0.022 0.998 0.740
3 500 2,4 0.818 0.052 0.000
4 500 2 3,4 0.460 0.634 0.004
4 500 2 2,3,4 0.176 0.830 0.020
4 500 2,3 2,3,4 0.168 0.790 0.022
5 500 2 3,4 0.078 0.910 0.038
5 500 2 2,3,4 0.082 0.934 0.092
6 500 2 5 0.146 0.842 0.024
7 500 2 5 0.120 0.940 0.048
8 500 0,1,1,1,0 0.126 0.994 0.214
8 500 0,4,1,1,0 0.048 0.980 0.114
8 500 0,8,1,1,0 0.048 0.950 0.090
9 500 2,4 0.070 0.934 0.062
Table D.39: Outcomes for matrix (4.3) modelling the change in response (10%) at each
transition and n=54 patients
Matrix p∅−r p∅−sd p∅−pd p∅−o pr−r pr−sd pr−pd pr−o psd−r psd−sd psd−pd psd−o ppd−r ppd−sd ppd−pd ppd−o
Data ∅ 9/36 14/36 9/36 4/36 0 0 0 0 0 0 0 0 0 0 0 0
Data eval 2 0 0 0 0 1/9 5/9 2/9 1/9 0/14 10/14 1/14 3/14 0/9 0/9 0/9 9/9
Data eval 3 0 0 0 0 0/1 0/1 0/1 1/1 3/15 10/15 1/15 1/15 1/3 1/3 0/3 1/3
Data eval 4 0 0 0 0 0/4 2/4 0/4 2/4 0/11 8/11 1/11 2/11 0/1 0/1 0/1 1/1
Data eval 5 0 0 0 0 0 0 0 0 1/10 8/10 0/10 1/10 0/1 1/1 0/1 0/1
H0 .25 .6 .05 .1 .25 .6 .05 .1 .05 .6 .15 .2 .05 .15 .05 .75
Interim ∅ 2/15 6/15 6/15 1/15 0 0 0 0 0 0 0 0 0 0 0 0
Interim eval 2 0 0 0 0 0/2 2/2 0/2 0/2 0/6 5/6 1/6 0/6 0/6 0/6 0/6 6/6
Interim eval 3 0 0 0 0 0 0 0 0 1/7 5/7 1/7 0/7 0/1 0/1 0/1 1/1
Interim eval 4 0 0 0 0 0/1 0/1 0/1 1/1 0/5 4/5 0/5 1/5 0/1 0/1 0/1 1/1
Interim eval 5 0 0 0 0 0 0 0 0 0/4 4/4 0/4 0/4 0 0 0 0
HA .25 .6 .05 .1 .5 .4 .05 .05 .05 .7 .15 .1 .05 .4 .15 .4
Table D.40: Data input for matrix (4.3) modelling the change in response (5%) at each transition
Endstates pR psd ppd poff
Data 1/36 9/36 0/36 26/36
H0 .02168 .1627 .0383 .7773
Interim 0/15 4/15 0/15 11/15
HA .06859 .3700 .0823 .4791
Table D.41: Endstate probabilities for matrix (4.3) modelling the change in response
(5%) at each transition
Method iterations outv outv2 p-value cond. prob. (HA) cond. prob. (data)
1 1000 2 0.170 0.183 0.000
2 1000 2,3,4 0.255 0.861 0.086
2 1000 2,3 0.121 0.773 0.166
3 1000 2,4 1.000 0.002 0.000
4 1000 2 3,4 0.421 0.028 0.000
4 1000 2 2,3,4 0.369 0.123 0.000
4 1000 2,3 2,3,4 0.353 0.109 0.000
5 1000 2 3,4 0.186 0.366 0.000
5 1000 2 2,3,4 0.175 0.419 0.000
6 1000 2 5 0.391 0.115 0.000
7 1000 2 5 0.310 0.283 0.000
8 1000 0,1,1,1,0 0.269 0.863 0.081
8 1000 0,4,1,1,0 0.288 0.526 0.000
8 1000 0,8,1,1,0 0.274 0.343 0.000
9 1000 2,4 0.171 0.278 0.000
9 1000 2,3 0.025 0.917 0.312
1 500 2 0.214 0.182 0.000
2 500 2,3,4 0.264 0.838 0.036
2 500 2,3 0.132 0.762 0.312
3 500 2,4 1.000 0.002 0.000
4 500 2 3,4 0.450 0.032 0.000
4 500 2 2,3,4 0.376 0.180 0.000
4 500 2,3 2,3,4 0.382 0.164 0.000
5 500 2 3,4 0.194 0.410 0.000
5 500 2 2,3,4 0.186 0.418 0.000
6 500 2 5 0.404 0.046 0.000
7 500 2 5 0.300 0.418 0.000
8 500 0,1,1,1,0 0.256 0.854 0.090
8 500 0,4,1,1,0 0.282 0.462 0.000
8 500 0,8,1,1,0 0.290 0.378 0.000
9 500 2,4 0.194 0.262 0.000
9 500 2,3 0.014 0.906 0.338
Table D.42: Outcomes for matrix (4.3) modelling the change in response (5%) at each
transition and n=36 patients
Method iterations outv outv2 p-value cond. prob. (HA) cond. prob. (data)
1 1000 2 0.322 0.290 0.000
2 1000 2,3,4 0.200 0.992 0.130
2 1000 2,3 0.049 0.961 0.326
3 1000 2,4 1.000 0.044 0.000
4 1000 2 3,4 0.462 0.226 0.000
4 1000 2 2,3,4 0.430 0.275 0.000
4 1000 2,3 2,3,4 0.447 0.255 0.000
5 1000 2 3,4 0.320 0.512 0.000
5 1000 2 2,3,4 0.299 0.524 0.000
6 1000 2 5 0.452 0.254 0.000
7 1000 2 5 0.391 0.493 0.000
8 1000 0,1,1,1,0 0.186 0.982 0.063
8 1000 0,4,1,1,0 0.203 0.914 0.000
8 1000 0,8,1,1,0 0.259 0.646 0.000
9 1000 2,4 0.323 0.566 0.000
9 1000 2,3 0.009 0.989 0.468
1 500 2 0.280 0.288 0.000
2 500 2,3,4 0.188 0.986 0.122
2 500 2,3 0.060 0.982 0.470
3 500 2,4 1.000 0.012 0.000
4 500 2 3,4 0.444 0.208 0.000
4 500 2 2,3,4 0.406 0.270 0.000
4 500 2,3 2,3,4 0.428 0.246 0.000
5 500 2 3,4 0.338 0.514 0.000
5 500 2 2,3,4 0.326 0.506 0.000
6 500 2 5 0.462 0.296 0.000
7 500 2 5 0.410 0.478 0.000
8 500 0,1,1,1,0 0.220 0.984 0.224
8 500 0,4,1,1,0 0.202 0.898 0.000
8 500 0,8,1,1,0 0.236 0.690 0.000
9 500 2,4 0.374 0.568 0.000
9 500 2,3 0.004 0.988 0.598
Table D.43: Outcomes for matrix (4.3) modelling the change in response (5%) at each
transition and n=54 patients
Matrix p∅−r p∅−pd p∅−off pr−r pr−pd pr−off ppd−r ppd−pd ppd−off
Data ∅ 16/36 16/36 4/36 0 0 0 0 0 0
Data eval 2 0 0 0 11/16 2/16 3/16 0/16 6/16 10/16
Data eval 3 0 0 0 9/11 0/11 2/11 3/8 4/8 1/8
Data eval 4 0 0 0 8/12 1/12 3/12 0/4 2/4 2/4
Data eval 5 0 0 0 8/8 0/8 0/8 0/3 2/3 1/3
H0 .4 .4 .2 .6 .2 .2 .2 .4 .4
Interim Data ∅ 6/15 8/15 1/15 0 0 0 0 0 0
Interim Data eval 2 0 0 0 5/6 1/6 0/6 0/8 2/8 6/8
Interim Data eval 3 0 0 0 5/5 0/5 0/5 1/3 2/3 0/3
Interim Data eval 4 0 0 0 4/6 0/6 2/6 0/1 0/1 1/1
Interim Data eval 5 0 0 0 4/4 0/4 0/4 0 0 0
HA .6 .2 .2 .65 .2 .15 .65 .2 .15
Table D.44: Data input for matrix (4.4) modelling the change in response, with no stable
disease, at each transition
Endstates pR ppd poff
Data 8/36 2/36 26/36
H0 .1280 .0800 .7920
Interim 4/15 0/15 11/15
HA .2090 .1739 .6172
Table D.45: Endstate probabilities for matrix (4.4) modelling the change in response,
with no stable disease, at each transition
Method iterations outv outv2 p-value cond. prob. (HA) cond. prob. (data)
1 1000 2 0.073 0.470 0.714
2 1000 2,3 0.173 0.415 0.078
3 1000 2,3 0.813 0.011 0.000
4 1000 2 3 0.835 0.005 0.000
5 1000 2 3 0.071 0.567 0.693
6 1000 2 4 0.221 0.293 0.088
7 1000 2 4 0.116 0.492 0.190
8 1000 0,1,1,0 0.196 0.422 0.169
8 1000 0,4,1,0 0.074 0.583 0.537
8 1000 0,8,1,0 0.070 0.548 0.702
9 1000 2,3 0.054 0.773 0.710
1 500 2 0.096 0.482 0.716
2 500 2,3 0.242 0.618 0.162
3 500 2,3 0.806 0.006 0.000
4 500 2 3 0.792 0.008 0.000
5 500 2 3 0.058 0.482 0.702
6 500 2 4 0.210 0.344 0.182
7 500 2 4 0.114 0.512 0.174
8 500 0,1,1,0 0.204 0.624 0.070
8 500 0,4,1,0 0.070 0.580 0.464
8 500 0,8,1,0 0.056 0.442 0.702
9 500 2,3 0.052 0.802 0.696
Table D.46: Outcomes for matrix (4.4) modelling the change in response, with no stable
disease, at each transition and n=36 patients
Method iterations outv outv2 p-value cond. prob. (HA) cond. prob. (data)
1 1000 2 0.031 0.614 0.864
2 1000 2,3 0.138 0.811 0.238
3 1000 2,3 0.802 0.074 0.000
4 1000 2 3 0.824 0.088 0.000
5 1000 2 3 0.026 0.697 0.868
6 1000 2 4 0.145 0.466 0.219
7 1000 2 4 0.084 0.671 0.229
8 1000 0,1,1,0 0.136 0.673 0.196
8 1000 0,4,1,0 0.039 0.757 0.754
8 1000 0,8,1,0 0.030 0.714 0.849
9 1000 2,3 0.019 0.918 0.851
1 500 2 0.026 0.574 0.840
2 500 2,3 0.134 0.774 0.136
3 500 2,3 0.840 0.088 0.000
4 500 2 3 0.828 0.090 0.000
5 500 2 3 0.026 0.734 0.844
6 500 2 4 0.178 0.396 0.220
7 500 2 4 0.072 0.666 0.244
8 500 0,1,1,0 0.118 0.808 0.108
8 500 0,4,1,0 0.036 0.772 0.848
8 500 0,8,1,0 0.034 0.694 0.852
9 500 2,3 0.032 0.940 0.838
Table D.47: Outcomes for matrix (4.4) modelling the change in response, with no stable
disease, at each transition and n=54 patients
Matrix p∅−sd1 p∅−pd psd1−r psd1−sd2 psd1−pd psd2−r psd2−sd3 psd2−pd psd3−r psd3−sd3 psd3−pd
Data ∅ 21/36 15/36 0 0 0 0 0 0 0 0 0
Data eval 2 0 0 1/21 16/21 4/21 0 0 0 0 0 0
Data eval 3 0 0 0 0 0 0/16 14/16 2/16 0 0 0
Data eval 4 0 0 0 0 0 0 0 0 1/14 9/14 4/14
H0 .4 .6 .05 .7 .25 .05 .7 .25 .05 .7 .25
Interim Data ∅ 8/15 7/15 0 0 0 0 0 0 0 0 0
Interim Data eval 2 0 0 0/8 7/8 1/8 0 0 0 0 0 0
Interim Data eval 3 0 0 0 0 0 0/7 6/7 1/7 0 0 0
Interim Data eval 4 0 0 0 0 0 0 0 0 1/6 3/6 2/6
HA .6 .4 .15 .65 .2 .15 .65 .2 .15 .65 .2
Table D.48: Data input for matrix (4.5) modelling response+3 consecutive stable disease observations as a good outcome
Endstates pR psd3 ppd
Data 2/36 9/36 25/36
H0 .0438 .1372 .8190
Interim 1/15 3/15 11/15
HA .1865 .1648 .6487
Table D.49: Endstate probabilities for matrix (4.5) modelling response+3 consecutive
stable disease observations as a good outcome
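Under matrix (4.5) a good outcome is either a response at some evaluation or remaining stable through the sd3 state, so the H0 end-state row of Table D.49 follows directly from the rates in Table D.48. An illustrative Python sketch (the thesis's program is written in R; the three stable-disease transitions after evaluation 1 are read off the data rows of Table D.48):

```python
# H0 rates from Table D.48: after evaluation 1 a patient is stable (0.4) or
# has progressed (0.6); at each later evaluation a stable patient responds
# (0.05), stays stable (0.7), or progresses (0.25).
pR, p_pd, p_sd = 0.0, 0.6, 0.4

for _ in range(3):   # transitions sd1 -> sd2, sd2 -> sd3, sd3 -> sd3
    pR, p_pd, p_sd = pR + 0.05 * p_sd, p_pd + 0.25 * p_sd, 0.7 * p_sd

print(round(pR, 4), round(p_sd, 4), round(p_pd, 4))
# → 0.0438 0.1372 0.819, the H0 row of Table D.49
```

The matrix (4.6) variant below (Table D.52) is the same calculation with one more pass through the loop, requiring a fourth consecutive stable-disease observation.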
Method iterations outv outv2 p-value cond. prob. (HA) cond. prob. (data)
1 1000 2 0.217 0.585 0.045
2 1000 2,3 0.040 0.454 0.312
3 1000 2,3 0.262 0.036 0.000
4 1000 2 3 0.235 0.039 0.000
5 1000 2 3 0.198 0.754 0.154
6 1000 2 6 0.220 0.426 0.024
7 1000 2 6 0.222 0.633 0.121
8 1000 0,1,1,0,0,0 0.042 0.502 0.325
8 1000 0,4,1,0,0,0 0.152 0.779 0.211
8 1000 0,8,1,0,0,0 0.217 0.741 0.155
9 1000 2,3 0.004 0.883 0.511
1 500 2 0.224 0.570 0.050
2 500 2,3 0.024 0.476 0.166
3 500 2,3 0.224 0.022 0.010
4 500 2 3 0.242 0.042 0.002
5 500 2 3 0.184 0.706 0.178
6 500 2 6 0.242 0.410 0.022
7 500 2 6 0.212 0.722 0.108
8 500 0,1,1,0,0,0 0.048 0.464 0.322
8 500 0,4,1,0,0,0 0.144 0.806 0.148
8 500 0,8,1,0,0,0 0.226 0.716 0.140
9 500 2,3 0.002 0.902 0.510
Table D.50: Outcomes for matrix (4.5) modelling response+3 consecutive stable disease
observations as a good outcome and n=36 patients
Method iterations outv outv2 p-value cond. prob. (HA) cond. prob. (data)
1 1000 2 0.199 0.883 0.109
2 1000 2,3 0.011 0.781 0.335
3 1000 2,3 0.213 0.079 0.014
4 1000 2 3 0.230 0.111 0.011
5 1000 2 3 0.208 0.926 0.217
6 1000 2 6 0.222 0.723 0.068
7 1000 2 6 0.211 0.841 0.161
8 1000 0,1,1,0,0,0 0.009 0.859 0.353
8 1000 0,4,1,0,0,0 0.091 0.962 0.275
8 1000 0,8,1,0,0,0 0.159 0.944 0.226
9 1000 2,3 0.001 0.971 0.597
1 500 2 0.180 0.878 0.128
2 500 2,3 0.012 0.784 0.352
3 500 2,3 0.218 0.094 0.012
4 500 2 3 0.198 0.142 0.008
5 500 2 3 0.216 0.930 0.178
6 500 2 6 0.194 0.710 0.064
7 500 2 6 0.244 0.866 0.206
8 500 0,1,1,0,0,0 0.010 0.852 0.460
8 500 0,4,1,0,0,0 0.088 0.938 0.294
8 500 0,8,1,0,0,0 0.180 0.920 0.238
9 500 2,3 0.002 0.970 0.636
Table D.51: Outcomes for matrix (4.5) modelling response+3 consecutive stable disease
observations as a good outcome and n=54 patients
Matrix p∅−sd1 p∅−pd psd1−r psd1−sd2 psd1−pd psd2−r psd2−sd3 psd2−pd psd3−r psd3−sd4 psd3−pd psd4−r psd4−sd4 psd4−pd
Data ∅ 21/36 15/36 0 0 0 0 0 0 0 0 0 0 0 0
Data eval 2 0 0 1/21 16/21 4/21 0 0 0 0 0 0 0 0 0
Data eval 3 0 0 0 0 0 0/16 14/16 2/16 0 0 0 0 0 0
Data eval 4 0 0 0 0 0 0 0 0 1/14 9/14 4/14 0 0 0
Data eval 5 0 0 0 0 0 0 0 0 0 0 0 1/9 8/9 0/9
H0 .4 .6 .05 .7 .25 .05 .7 .25 .05 .7 .25 .05 .7 .25
Interim Data ∅ 8/15 7/15 0 0 0 0 0 0 0 0 0 0 0 0
Interim Data eval 2 0 0 0/8 7/8 1/8 0 0 0 0 0 0 0 0 0
Interim Data eval 3 0 0 0 0 0 0/7 6/7 1/7 0 0 0 0 0 0
Interim Data eval 4 0 0 0 0 0 0 0 0 1/6 3/6 2/6 0 0 0
Interim Data eval 5 0 0 0 0 0 0 0 0 0 0 0 0/3 3/3 0/3
HA .6 .4 .15 .65 .2 .15 .65 .2 .15 .65 .2 .15 .65 .2
Table D.52: Data input for matrix (4.6) modelling response+4 consecutive stable disease observations as a good outcome
Endstates pR psd4 ppd
Data 3/36 8/36 25/36
H0 .0507 .0960 .8533
Interim 1/15 3/15 11/15
HA .2112 .1071 .6817
Table D.53: Endstate probabilities for matrix (4.6) modelling response+4 consecutive
stable disease observations as a good outcome
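The endstate probabilities in Table D.53 can be reproduced by propagating an initial state vector through the transition matrix, which is the core computation of the finite Markov chain imbedding approach. The thesis program is written in R; the sketch below is an illustrative Python translation using only the H0 rates of Table D.52 (five evaluation steps; the state indexing is mine):

```python
import numpy as np

# States: 0 = entry, 1-4 = sd1..sd4, 5 = response (absorbing), 6 = pd (absorbing).
# H0 transition rates taken from Table D.52 (matrix 4.6).
P = np.zeros((7, 7))
P[0, [1, 6]] = [0.40, 0.60]           # entry -> sd1 or pd
P[1, [5, 2, 6]] = [0.05, 0.70, 0.25]  # sd1 -> r, sd2, pd
P[2, [5, 3, 6]] = [0.05, 0.70, 0.25]  # sd2 -> r, sd3, pd
P[3, [5, 4, 6]] = [0.05, 0.70, 0.25]  # sd3 -> r, sd4, pd
P[4, [5, 4, 6]] = [0.05, 0.70, 0.25]  # sd4 -> r, sd4 (self-loop), pd
P[5, 5] = P[6, 6] = 1.0               # absorbing endstates

v = np.zeros(7)
v[0] = 1.0
for _ in range(5):                    # five evaluations per patient
    v = v @ P

pR, psd4, ppd = v[5], v[4], v[6]
print(round(pR, 4), round(psd4, 4), round(ppd, 4))  # 0.0507 0.096 0.8533
```

The three printed values match the H0 row of Table D.53, confirming that the endstate probabilities are simply the absorbing-state masses after the final evaluation.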
Method iterations outv outv2 p-value cond. prob. (HA) cond. prob. (data)
1 1000 2 0.113 0.655 0.051
2 1000 2,3 0.012 0.697 0.496
3 1000 2,3 0.095 0.022 0.016
4 1000 2 3 0.137 0.035 0.008
5 1000 2 3 0.105 0.764 0.133
6 1000 2 7 0.116 0.576 0.051
7 1000 2 7 0.120 0.745 0.137
8 1000 0,1,1,0,0,0,0 0.013 0.686 0.521
8 1000 0,4,1,0,0,0,0 0.065 0.760 0.209
8 1000 0,8,1,0,0,0,0 0.112 0.784 0.144
9 1000 2,3 0.000 0.894 0.688
1 500 2 0.094 0.686 0.042
2 500 2,3 0.010 0.858 0.508
3 500 2,3 0.144 0.028 0.014
4 500 2 3 0.138 0.018 0.012
5 500 2 3 0.104 0.844 0.174
6 500 2 7 0.142 0.584 0.036
7 500 2 7 0.076 0.736 0.156
8 500 0,1,1,0,0,0,0 0.010 0.674 0.500
8 500 0,4,1,0,0,0,0 0.074 0.808 0.166
8 500 0,8,1,0,0,0,0 0.106 0.822 0.144
9 500 2,3 0.002 0.896 0.732
Table D.54: Outcomes for matrix (4.6) modelling response+4 consecutive stable disease
observations as a good outcome and n=36 patients
n=54 iterations outv outv2 p-value cond. prob. (HA) cond. prob. (data)
1 1000 2 0.126 0.865 0.111
2 1000 2,3 0.003 0.919 0.778
3 1000 2,3 0.130 0.086 0.026
4 1000 2 3 0.141 0.094 0.026
5 1000 2 3 0.155 0.938 0.117
6 1000 2 7 0.142 0.816 0.042
7 1000 2 7 0.151 0.905 0.109
8 1000 0,1,1,0,0,0,0 0.002 0.908 0.595
8 1000 0,4,1,0,0,0,0 0.036 0.940 0.298
8 1000 0,8,1,0,0,0,0 0.065 0.934 0.126
9 1000 2,3 0.000 0.974 0.900
1 500 2 0.152 0.868 0.118
2 500 2,3 0.002 0.840 0.766
3 500 2,3 0.166 0.076 0.034
4 500 2 3 0.124 0.096 0.038
5 500 2 3 0.122 0.968 0.112
6 500 2 7 0.132 0.816 0.090
7 500 2 7 0.152 0.914 0.258
8 500 0,1,1,0,0,0,0 0.004 0.930 0.714
8 500 0,4,1,0,0,0,0 0.034 0.948 0.272
8 500 0,8,1,0,0,0,0 0.072 0.938 0.152
9 500 2,3 0.000 0.986 0.866
Table D.55: Outcomes for matrix (4.6) modelling response+4 consecutive stable disease
observations as a good outcome and n=54 patients
Matrix p∅−mr1 p∅−sd p∅−pd pmr2−r pmr2−mr2 pmr1−r pmr1−mr2 pmr1−sd pmr1−pd psd−mr1 psd−sd psd−pd
Data ∅ 11/36 12/36 13/36 0 0 0 0 0 0 0 0 0
Data eval 2 0 0 0 0 0 1/11 3/11 5/11 2/11 1/12 8/12 3/12
Data eval 3 0 0 0 0/3 3/3 0/1 0/1 1/1 0/1 5/13 6/13 2/13
Data eval 4 0 0 0 1/3 2/3 0/5 1/5 3/5 1/5 1/7 4/7 2/7
Data eval 5 0 0 0 1/3 2/3 0/1 0/1 1/1 0/1 2/7 4/7 1/7
Data eval 6 0 0 0 0/2 2/2 0/2 1/2 1/2 0/2 1/5 3/5 1/5
H0 .15 .2 .55 .05 .95 .15 .2 .55 .1 .2 .7 .1
H0 transition 5 0 0 0 0 1 0 .1 .55 .35 .1 .7 .2
Interim Data ∅ 4/15 4/15 7/15 0 0 0 0 0 0 0 0 0
Interim Data eval 2 0 0 0 0 0 0/4 2/4 1/4 1/4 1/4 3/4 0/4
Interim Data eval 3 0 0 0 0/2 2/2 0/1 0/1 1/1 0/1 1/5 2/5 2/5
Interim Data eval 4 0 0 0 1/2 1/2 0/1 0/1 0/1 1/1 1/3 1/3 1/3
Interim Data eval 5 0 0 0 0/1 1/1 0/1 0/1 1/1 0/1 0/1 1/1 0/1
Interim Data eval 6 0 0 0 0/1 1/1 0/0 0/0 0/0 0/0 0/2 1/2 1/2
HA transition ∅ 1 .3 .3 .4 0 0 .4 .25 .25 .1 .3 .5 .2
HA transition 2 0 0 0 .1 .9 .35 .3 .25 .1 .2 .7 .1
HA transition 3 0 0 0 .1 .9 .2 .2 .45 .15 .2 .7 .1
HA transition 4 0 0 0 .05 .95 .1 .2 .5 .2 .2 .7 .1
HA transition 5 0 0 0 0 1 0 .2 .5 .3 .1 .7 .2
Table D.56: Data input for matrix (4.7) modelling response+consecutive minor responses as a good outcome
Endstates pR pMR2 pMR1 psd ppd
Data 3/36 3/36 1/36 4/36 25/36
H0 .0497 .0563 .0144 .1202 .7593
Interim 1/15 1/15 0/15 1/15 12/15
HA .1905 .0977 .0120 .0989 .6009
Table D.57: Endstate probabilities for matrix (4.7) modelling response+consecutive minor responses as a good outcome
Method iterations outv outv2 p-value cond. prob. (HA) cond. prob. (data)
1 1000 2 0.271 0.587 0.053
2 1000 2,3,4,5 0.215 0.154 0.001
2 1000 2,3,4 0.064 0.432 0.015
2 1000 2,3 0.186 0.594 0.059
3 1000 2,3 0.508 0.045 0.000
4 1000 2 3,4,5 0.574 0.002 0.000
4 1000 2 2,3 0.321 0.501 0.008
4 1000 2,3 2,3,4,5 0.396 0.143 0.000
5 1000 2 3,4,5 0.146 0.638 0.045
5 1000 2 2,3 0.147 0.727 0.062
6 1000 2 6 0.382 0.141 0.000
7 1000 2 6 0.270 0.290 0.002
8 1000 0,1,1,1,1,0 0.228 0.160 0.001
8 1000 0,4,2,1,1,0 0.128 0.585 0.046
8 1000 0,8,6,1,1,0 0.111 0.671 0.060
9 1000 2,3 0.082 0.812 0.152
1 500 2 0.298 0.602 0.046
2 500 2,3,4,5 0.236 0.136 0.000
2 500 2,3,4 0.072 0.432 0.014
2 500 2,3 0.178 0.594 0.054
3 500 2,3 0.532 0.016 0.004
4 500 2 3,4,5 0.574 0.002 0.000
4 500 2 2,3 0.330 0.452 0.010
4 500 2,3 2,3,4,5 0.354 0.154 0.000
5 500 2 3,4,5 0.148 0.666 0.046
5 500 2 2,3 0.184 0.742 0.058
6 500 2 6 0.412 0.136 0.002
7 500 2 6 0.284 0.278 0.002
8 500 0,1,1,1,1,0 0.206 0.180 0.000
8 500 0,4,2,1,1,0 0.174 0.648 0.034
8 500 0,8,6,1,1,0 0.124 0.630 0.054
9 500 2,3 0.092 0.802 0.154
Table D.58: Outcomes for matrix (4.7) modelling response+consecutive minor responses
as a good outcome and n=36 patients
n=54 iterations outv outv2 p-value cond. prob. (HA) cond. prob. (data)
1 1000 2 0.149 0.785 0.124
2 1000 2,3,4,5 0.132 0.377 0.003
2 1000 2,3,4 0.047 0.767 0.063
2 1000 2,3 0.132 0.910 0.079
3 1000 2,3 0.297 0.118 0.000
4 1000 2 3,4,5 0.431 0.009 0.000
4 1000 2 2,3 0.174 0.794 0.024
4 1000 2,3 2,3,4,5 0.239 0.465 0.002
5 1000 2 3,4,5 0.138 0.897 0.116
5 1000 2 2,3 0.147 0.875 0.113
6 1000 2 6 0.233 0.500 0.001
7 1000 2 6 0.206 0.583 0.004
8 1000 0,1,1,1,1,0 0.142 0.514 0.000
8 1000 0,4,2,1,1,0 0.097 0.870 0.068
8 1000 0,8,6,1,1,0 0.073 0.878 0.125
9 1000 2,3 0.025 0.940 0.255
1 500 2 0.152 0.794 0.112
2 500 2,3,4,5 0.150 0.372 0.000
2 500 2,3,4 0.048 0.768 0.026
2 500 2,3 0.122 0.816 0.156
3 500 2,3 0.314 0.164 0.000
4 500 2 3,4,5 0.446 0.008 0.000
4 500 2 2,3 0.164 0.742 0.040
4 500 2,3 2,3,4,5 0.190 0.458 0.002
5 500 2 3,4,5 0.136 0.914 0.122
5 500 2 2,3 0.122 0.900 0.096
6 500 2 6 0.218 0.440 0.000
7 500 2 6 0.214 0.558 0.010
8 500 0,1,1,1,1,0 0.138 0.366 0.004
8 500 0,4,2,1,1,0 0.074 0.850 0.036
8 500 0,8,6,1,1,0 0.084 0.920 0.142
9 500 2,3 0.020 0.964 0.222
Table D.59: Outcomes for matrix (4.7) modelling response+consecutive minor responses
as a good outcome and n=54 patients
Matrix p∅−sd p∅−off p∅−tox pr−r pr−Rtox psd−r psd−sd psd−off psd−tox
Data ∅ 21/36 11/36 4/36 0 0 0 0 0 0
Data eval 2 0 0 0 0 0 1/21 16/21 4/21 0/21
Data eval 3 0 0 0 1/1 0/1 0/16 14/16 2/16 0/16
Data eval 4 0 0 0 1/1 0/1 1/14 9/14 3/14 1/14
Data eval 5 0 0 0 1/2 1/2 1/9 8/9 0/9 0/9
H0 .4 .4 .2 .9 .1 .05 .7 .2 .05
Interim Data ∅ 8/15 5/15 2/15 0 0 0 0 0 0
Interim Data eval 2 0 0 0 0 0 0/8 7/8 1/8 0/8
Interim Data eval 3 0 0 0 0 0 0/7 6/7 1/7 0/7
Interim Data eval 4 0 0 0 0 0 1/6 4/6 0/6 1/6
Interim Data eval 5 0 0 0 1/1 0/1 0/4 4/4 0/4 0/4
HA .6 .35 .05 .95 .05 .15 .7 .1 .05
Table D.60: Data input for matrix (4.8) modelling response & toxicity outcomes
Endstates pR psd poff pTox pR&Tox
Data 2/36 8/36 20/36 5/36 1/36
H0 .0416 .0960 .6026 .2507 .0091
Interim 1/15 4/15 7/15 3/15 0/15
HA .2068 .1441 .5020 .1260 .0212
Table D.61: Endstate probabilities for matrix (4.8) modelling response & toxicity outcomes
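The endstate probabilities in Table D.61 feed the conditional power columns of Tables D.62-D.63. If the "good" endstates (R and R&Tox) are pooled, each remaining patient reduces to a Bernoulli trial, and conditional power given the interim data becomes a binomial tail over the patients still to accrue. The sketch below illustrates that reduction only: the success rule (at least 6 good outcomes in n=36) is hypothetical, not the thesis outv/outv2 criteria, and pooling discards the within-patient run structure that the full method retains.

```python
from math import comb

def cond_power(n_total, n_interim, good_interim, p_good, k_success):
    """P(total good outcomes >= k_success | interim data), with each
    remaining patient independently 'good' with probability p_good."""
    m = n_total - n_interim                  # patients still to accrue
    need = max(0, k_success - good_interim)  # further good outcomes required
    return sum(comb(m, j) * p_good**j * (1 - p_good)**(m - j)
               for j in range(need, m + 1))

# Interim data from Table D.61: 1 R and 0 R&Tox among the first 15 patients.
# Pooled good-outcome probabilities: HA = .2068 + .0212, H0 = .0416 + .0091.
cp_ha = cond_power(36, 15, 1, 0.2068 + 0.0212, 6)
cp_h0 = cond_power(36, 15, 1, 0.0416 + 0.0091, 6)
print(cp_ha > cp_h0)  # True: the HA rates give strictly higher conditional power
```

Because the binomial upper tail is increasing in p_good, conditional power under the alternative always dominates conditional power under the null for the same interim data, mirroring the gap between the "cond. prob. (HA)" and "cond. prob. (data)" columns above.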
Method iterations outv outv2 p-value cond. prob. (HA) cond. prob. (data)
1 1000 2 0.428 0.650 0.048
2 1000 2,6 0.280 0.752 0.047
2 1000 2,3,6 0.015 0.941 0.869
2 1000 2,3 0.012 0.914 0.947
3 1000 2,3 0.471 0.310 0.039
4 1000 2 2,3 0.451 0.665 0.053
5 1000 2 2,3 0.178 0.842 0.162
6 1000 2,6 5,6 0.499 0.129 0.003
7 1000 2,6 5,6 0.275 0.383 0.031
8 1000 0,1,0,0,0,1 0.280 0.738 0.038
8 1000 0,2,1,0,0,2 0.016 0.948 0.625
8 1000 0,8,3,1,0,4 0.041 0.934 0.489
8 1000 0,8,1,0,0,2 0.180 0.866 0.341
1 500 2 0.432 0.828 0.154
2 500 2,6 0.266 0.762 0.062
2 500 2,3,6 0.008 0.948 0.882
2 500 2,3 0.034 0.872 0.898
3 500 2,3 0.454 0.410 0.034
4 500 2 2,3 0.428 0.658 0.042
5 500 2 2,3 0.204 0.850 0.180
6 500 2,6 5,6 0.520 0.126 0.004
7 500 2,6 5,6 0.276 0.326 0.022
8 500 0,1,0,0,0,1 0.320 0.744 0.052
8 500 0,2,1,0,0,2 0.020 0.952 0.736
8 500 0,8,3,1,0,4 0.034 0.936 0.642
8 500 0,8,1,0,0,2 0.164 0.848 0.380
Table D.62: Outcomes for matrix (4.8) modelling response & toxicity outcomes and n=36
patients
n=54 iterations outv outv2 p-value cond. prob. (HA) cond. prob. (data)
1 1000 2 0.390 0.916 0.117
2 1000 2,6 0.114 0.929 0.042
2 1000 2,3,6 0.001 0.991 0.975
2 1000 2,3 0.004 0.992 0.990
3 1000 2,3 0.410 0.448 0.113
4 1000 2 2,3 0.387 0.925 0.113
5 1000 2 2,3 0.177 0.982 0.275
6 1000 2,6 5,6 0.435 0.415 0.024
7 1000 2,6 5,6 0.254 0.635 0.077
8 1000 0,1,0,0,0,1 0.149 0.892 0.047
8 1000 0,2,1,0,0,2 0.005 0.992 0.849
8 1000 0,8,3,1,0,4 0.015 0.994 0.690
8 1000 0,8,1,0,0,2 0.115 0.983 0.429
1 500 2 0.418 0.914 0.110
2 500 2,6 0.124 0.972 0.044
2 500 2,3,6 0.000 0.996 0.982
2 500 2,3 0.008 0.990 0.960
3 500 2,3 0.394 0.414 0.110
4 500 2 2,3 0.384 0.928 0.108
5 500 2 2,3 0.210 0.976 0.254
6 500 2,6 5,6 0.398 0.428 0.016
7 500 2,6 5,6 0.248 0.622 0.064
8 500 0,1,0,0,0,1 0.142 0.910 0.050
8 500 0,2,1,0,0,2 0.006 0.988 0.864
8 500 0,8,3,1,0,4 0.014 0.996 0.726
8 500 0,8,1,0,0,2 0.110 0.984 0.396
Table D.63: Outcomes for matrix (4.8) modelling response & toxicity outcomes and n=54
patients