Sample size calculations for pilot randomized trials: a confidence interval approach

Kim Cocks, David J. Torgerson*
York Trials Unit, Department of Health Sciences, University of York, Heslington, York YO10 5DD, UK

Accepted 5 September 2012; Published online 27 November 2012

Abstract

Objectives: To describe a method using confidence intervals (CIs) to estimate the sample size for a pilot randomized trial.

Study Design: Using one-sided CIs and the estimated effect size that would be sought in a large trial, we calculated the sample size needed for pilot trials.

Results: Using an 80% one-sided CI, we estimated that a pilot trial should have at least 9% of the sample size of the main planned trial.

Conclusion: Using the estimated effect size difference for the main trial and a one-sided CI, we can calculate a sample size for a pilot trial that will make its results more useful than at present. © 2013 Elsevier Inc. All rights reserved.

Keywords: Sample size; Pilot trials; Confidence intervals; Statistical power; Review; Randomised trials

1. Background

Randomized controlled trials (RCTs) are often complex, time consuming, and expensive. Ideally, before a large RCT is undertaken, a pilot or feasibility study that informs the design of the main trial should be conducted. It is useful, at this stage, to distinguish between a pilot trial and a feasibility study. A pilot trial replicates, in miniature, a planned larger study [1], whereas a feasibility study may help in the development of the intervention and/or outcome measures. Consequently, in the definitive trial, the intervention may be quite different, and the outcomes may have changed. In this article, we are only discussing pilot trials: studies that mimic, in all the major essentials, the future definitive trial. Our arguments apply to both pilot trials run before the main study (external pilot trials) and those run as the first stage of the main trial (internal pilot studies).
Often, researchers justify a pilot study as helping with the calculation of the sample size for the main trial. Estimates of treatment effects and their variance from pilot studies may be used to generate possible sample size requirements, but there is a problem with this approach. Effect sizes from any small trial will be surrounded by a high degree of uncertainty. Consequently, if one were to plan a definitive trial's sample size based on the estimate obtained from a small pilot, then it is likely that the main trial will be underpowered, as many published small trials overestimate treatment effects. For example, in the design of a trial of yoga for low back pain, three small previously published pilot trials returned an average difference in back pain scores of 0.98 of a standard deviation [2]. However, this difference was not used to plan the main trial, as the researchers judged it to be unexpectedly large. Indeed, the main trial showed a much smaller difference of around 0.5 of a standard deviation [3]. Therefore, although pilot trials are useful in many areas of trial design, they should be used with caution for determining the sample size of the main trial, and the clinical relevance of the targeted difference must also be considered. Often the difference to be detected in the main trial can be reasonably informed by known clinical or economic importance; however, a pilot study may still be desirable to assess whether such a difference is likely. We propose that if careful attention is paid to the sample size calculation in these pilot trials, they could be much more informative, resulting in cost savings and more efficient use of patients in trials.

Sample size calculations for pilot trials are sometimes not undertaken. Indeed, many journal editors publishing pilot trials either do not expect them or suggest that they should not be done [1,4]. In our experience, this view is widely held, but we will argue that it is mistaken.
Conflict of interest statement: We confirm that we have no conflict of interest.
Competing interests: Both authors have no competing interests.
* Corresponding author. E-mail address: [email protected] (D.J. Torgerson).
http://dx.doi.org/10.1016/j.jclinepi.2012.09.002
Journal of Clinical Epidemiology 66 (2013) 197–201

The argument against undertaking a sample size calculation for pilot trials hinges on the problem of a type II error.


What is new?

• Many randomized controlled pilot trials do not have an a priori sample size calculation. In this article, we argue that sample size calculations are beneficial.

• In this study, we suggest a novel approach of using the anticipated main study to inform the pilot trial's sample size using a confidence interval approach.

• We argue that using a sample size such that it gives a one-sided 80% confidence interval which excludes the minimum important clinical effect size for the main study enhances a pilot study's utility.

A type II error occurs when one concludes, because of the small sample size, that there is no worthwhile difference between the groups when in fact there is. By definition, a pilot trial is small and not intended to be large enough to identify a meaningful difference between the treatment groups that can be statistically significant. Consequently, how would one undertake a meaningful sample size calculation? Nevertheless, some authors have made suggestions of sample sizes for pilot trials, with figures of 12 [5], 10 [6], and 15 [7] per group, 32 in total [8] for a two-arm trial, or 50% of the total main trial's sample size [9] (see Table 1 for fuller details). Justifications for these figures include precision around the mean and variance or having the power to show a large difference (1 SD) between groups if it were present.

In this article, we suggest an alternative approach, whereby the sample size calculation for the pilot trial is driven by the proposed sample size of the main trial. We suggest that we can use objective criteria for establishing a sample size for pilot studies by using a confidence interval (CI) approach [10] rather than the more usual power and statistical significance method.

2. The role of CIs for informing more research

Most medical journals nowadays insist that reports of trials (and other quantitative studies) include CIs or their Bayesian counterparts, credibility intervals. A 95% interval, the most commonly used, gives an estimate of where the true, but unknown, clinical difference lies. For large trials, the uncertainty surrounding any estimate is relatively narrow, and we can be confident that the true clinical difference does not depart very much from the observed difference seen in the trial. The most important use of measures of uncertainty is to inform the need for further research. Very narrow intervals, in the context of a robust study, suggest that further replication of the study is not required, whereas wider intervals mean greater uncertainty and act as a pointer for further research. Usually, two-sided intervals are generated because, for most treatments, we are interested in whether the intervention is harmful or beneficial.

We can use CIs, or the Bayesian equivalent, credibility intervals, in pilot trials to inform the decision to go forward with the main study. Because uncertainty intervals (i.e., confidence or credibility) are used to inform the need for further research, we would argue that they should be produced in the analysis of pilot trials.

3. Aim and rationale for the use of the 80% one-sided CI

What we suggest is that we identify a sample size, for our pilot trial, of sufficient size such that if the observed difference between the two groups in the pilot trial is zero, then the upper confidence limit will exclude the estimate that is considered clinically significant in the planned definitive trial. If we are to use uncertainty estimates in the analysis of pilot trials, then it follows that we should calculate sample sizes for pilot studies to optimize their utility. This would give us a statistical basis for our sample size calculation, which will ensure that we have a sufficient sample size to aid our decision as to whether we should move forward into the main trial.

We want to identify a sample size that gives us reasonable confidence that our pilot trial is big enough to enable us to be confident that we are making the right decision in proceeding to a larger trial or not. However, we would not require too large a sample size, as this increases the cost, the time taken to conduct the pilot, and the potential for more patients to be exposed to an ineffective treatment, all of which negate some of the advantages of undertaking a pilot before the main trial. Consequently, we would not use a 95% interval; rather, we would use a smaller interval, and we propose an 80% interval, which will satisfy the need for reasonable certainty for trial decision making but be small enough to deliver a study within a reasonable budget and timeframe, although some may feel more comfortable using a 90% interval. Furthermore, we propose to use a one-sided CI, as we are only interested in proceeding toward the main trial if there is some evidence of effectiveness. If the intervention appeared to be harmful, even if this were not statistically significant, we would not proceed.

4. Sample size estimation

In the following examples, CIs for the standardized effect size have been calculated using the inversion CI principle via SAS software (SAS Institute Inc., Cary, NC, USA) (NONCT function) [11]. Sample sizes for the main trial have been calculated using PS Power and Sample Size software (Vanderbilt University) [12]. Table 2 and Fig. 1 show the recommended pilot sample sizes for various standardized effect sizes (using one-sided 80% and 90% confidence limits).
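As a concrete illustration of the inversion CI principle described above, the one-sided upper confidence limit for a standardized effect size can be reproduced with open-source tools rather than SAS. The sketch below is an assumption-laden re-derivation, not the authors' code: it assumes two equal groups and an observed pilot difference of zero, and numerically inverts SciPy's noncentral t distribution (`scipy.stats.nct`), which plays the role of the NONCT function.

```python
from math import sqrt

from scipy.optimize import brentq
from scipy.stats import nct


def upper_effect_size_limit(n_per_group, alpha=0.20, t_obs=0.0):
    """Upper one-sided (1 - alpha) confidence limit for the standardized
    effect size d, obtained by inverting the noncentral t distribution:
    the limit is the d whose noncentrality parameter places the observed
    t statistic at the alpha quantile."""
    df = 2 * n_per_group - 2
    # Noncentrality parameter: nc = d * sqrt(n1*n2/(n1+n2)) = d * sqrt(n/2).
    scale = sqrt(n_per_group / 2)
    f = lambda d: nct.cdf(t_obs, df, d * scale) - alpha
    return brentq(f, 0.0, 5.0)


# With 16 per group (32 in total) and a pilot estimate of zero, the
# one-sided 80% upper limit lands just below 0.3 (0.2976 in Table 2).
print(round(upper_effect_size_limit(16), 4))
```

Because the observed t statistic is zero, the inversion here has a closed form (the limit is z_{0.80} * sqrt(2/n)), but the numerical inversion above also handles nonzero pilot estimates.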

Table 3 indicates the sample size requirements for binary outcomes. Note that the sample size calculations here make no allowance for attrition.

Table 1. Summary of recommendations for sample size calculations for pilot trials

| Study | Recommended sample size | Justification for sample size | Internal/external pilot | Method of analysis and findings |
| Wittes and Brittain [9] | 50% of total sample size of main trial. | To confirm estimates of variance used to power main study. The study should be large enough to be confident in the variance estimates. | Internal | Estimate of variance. If larger than expected, increase sample for main trial, and if lower, retain original sample size. No estimate of treatment effect. |
| Birkett and Day [6] | 10 patients per group, 40 maximum. Larger sample sizes for larger main studies. | Extension to Wittes and Brittain [9]; argued that good estimates of study variance can be obtained with a smaller sample size. | Internal | Estimate of variance. No estimate of treatment effect. |
| Browne [7] | Not clear. | Sample size rule of thumb of 30 is too small if one uses the sample estimate of variance. Need to be more conservative and use upper confidence intervals. | Internal/external | Calculate 80% and 90% confidence interval of standard deviation, and use this to plan main study's sample size. |
| Sandvik et al.(a) | Minimum of 20 patients in total. | Sample size depends on original study that provided estimate of standard deviation. | Internal | Estimate of a new standard deviation. |
| Julious [5] | 12 patients per group. | Number is divisible by 2, 3, 4, and 6, which can be used as block sizes in restricted randomization. Further reductions in variance decline after 12. | Internal/external | Use point estimate of variance. |
| Sim and Lewis [13] | At least 55. | Smaller sample sizes are likely to underestimate the variance and lead to an underpowered main trial. | Internal/external | Use upper 95% one-sided confidence limit of pilot standard deviation. |
| Present study | At least 9% of main trial's sample size. | This allows a one-sided 80% confidence interval to exclude a clinically important difference. | Internal/external | Obtain treatment estimate; if above zero, proceed to main trial. No hypothesis tests. |

(a) Sandvik L, Erikssen J, Mowinckel P, Rodland EA. A method for determining size of internal pilot studies. Stat Med 1996;15:1587–90.

5. Some worked examples

How might this work? Suppose we wanted to undertake a study in which we felt that 0.3 of a standard deviation between two groups was worthwhile. Such a study would require about 350 participants (assuming 80% power and a two-sided alpha of 5%) in the final analysis (Table 2). However, the funding agency recommends that the researchers undertake a pilot trial to test the recruitment rate and assess whether such a difference is likely to be realistic. Using a CI approach, we would calculate the pilot sample size required to produce an upper limit of a one-sided 80% CI which excludes 0.3, assuming that the treatment estimate from the pilot was zero or less. Thirty-two participants (approximately 9% of the main sample size) would be required to produce a one-sided 80% confidence limit which would exclude this estimate (i.e., upper 80% CI = 0.2976). Consequently, if we undertook such a pilot with that sample size and found an estimate larger than zero, and the pilot also showed that it was feasible to recruit and retain the participants, and so forth, then the recommendation would be to move forward with the main study. In Table 2, we show some examples of sample size calculations for continuous outcome measures.

An example of using this approach might be in the field of low back pain. The main outcome measure used in this area is the Roland and Morris disability questionnaire (RMDQ), which has a standard deviation of about 4 points and an average score of 8 [3]. Let us suppose that we want to evaluate an inexpensive intervention such that a modest difference of 1 point is considered worthwhile (i.e., a difference of a quarter of a standard deviation). To have 80% power to detect such a difference (alpha = 0.05), we would require 504 participants in the analysis. If we recruited, randomized, and analyzed 46 participants (i.e., 23 in each group), we could produce a one-sided 80% confidence limit which would exclude a 1-point difference on the RMDQ, if the point estimate from the pilot study were 0.

Similarly, let us suppose that we want to undertake a trial to reduce the proportion of older people who are at risk of falling from 50% down to 40%. To show this difference with 80% power (alpha = 0.05), we would need to randomize and analyze about 800 participants. However, if we wish to undertake a pilot, then recruiting, randomizing, and analyzing 72 participants, and assuming that 18 fell in each of the two groups (i.e., 50%), would produce a one-sided 80% confidence limit that would exclude us finding a 10 percentage point difference that would be statistically significant in a larger trial.
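The arithmetic in the continuous-outcome examples above can be sketched under a normal approximation (the paper's Table 2 uses the t distribution, so its main-trial figures differ by a handful of participants). The helper names below are hypothetical, chosen for this sketch.

```python
from math import ceil, sqrt

from scipy.stats import norm


def main_trial_total_n(d, alpha=0.05, power=0.80):
    """Total sample size for a two-arm trial to detect standardized effect
    size d with the given power and two-sided alpha (normal approximation)."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return 2 * ceil(2 * (z / d) ** 2)


def pilot_total_n(d, level=0.80):
    """Smallest total pilot size (two equal arms) whose one-sided `level`
    upper confidence limit excludes d when the observed effect is zero:
    the limit is z_level * sqrt(2/n) per group, so solve for n."""
    z = norm.ppf(level)
    n = ceil(2 * (z / d) ** 2)
    while z * sqrt(2 / n) >= d:  # nudge past any boundary rounding
        n += 1
    return 2 * n


# The 0.3 SD example: about 350 in the main trial, 32 in the pilot (~9%).
print(main_trial_total_n(0.3), pilot_total_n(0.3))
```

Running this for the RMDQ example (d = 0.25) likewise gives a pilot of 46 participants, matching the text.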

Table 2. Recommended pilot sample size for continuous outcome measures

| Standardized effect size for main trial | Sample size for main trial(a) | Pilot sample size (80% level) | Upper 80% one-sided confidence limit | Pilot sample size (90% level) | Upper 90% one-sided confidence limit |
| 0.50 | 128 | 12 | 0.4859 | 28 | 0.4844 |
| 0.45 | 158 | 14 | 0.4499 | 34 | 0.4397 |
| 0.40 | 198 | 18 | 0.3967 | 42 | 0.3955 |
| 0.35 | 258 | 24 | 0.3436 | 54 | 0.3488 |
| 0.30 | 344 | 32 | 0.2976 | 74 | 0.2980 |
| 0.25 | 504 | 46 | 0.2482 | 106 | 0.2490 |
| 0.20 | 786 | 72 | 0.1984 | 166 | 0.1989 |
| 0.15 | 1398 | 126 | 0.14995 | 292 | 0.14999 |
| 0.10 | 3142 | 284 | 0.0999 | 658 | 0.0999 |

NB: Sample sizes not corrected for attrition.
(a) Assuming 80% power and 5% significance level using t-test.
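A back-of-the-envelope calculation, under a normal approximation rather than the t distribution used for Table 2, shows why the pilot-to-main ratio in the table hovers around 9% in every row. With two equal arms and an observed pilot effect of zero, the one-sided 80% upper limit on the standardized effect size is $z_{0.80}\sqrt{4/N}$ for total size $N$; setting this equal to the target effect $d$, and comparing with the usual total sample size for 80% power at two-sided 5% alpha, gives

```latex
N_{\mathrm{pilot}} = \frac{4\,z_{0.80}^{2}}{d^{2}},
\qquad
N_{\mathrm{main}} = \frac{4\,(z_{0.975}+z_{0.80})^{2}}{d^{2}},
\qquad
\frac{N_{\mathrm{pilot}}}{N_{\mathrm{main}}}
  = \frac{z_{0.80}^{2}}{(z_{0.975}+z_{0.80})^{2}}
  = \left(\frac{0.8416}{1.9600+0.8416}\right)^{2}
  \approx 0.090 .
```

The ratio does not depend on $d$, which is why the 9% figure is stable across effect sizes; repeating the argument with $z_{0.90}=1.2816$ gives approximately 0.21, matching the roughly 21% ratios in the 90% columns.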

In terms of the analysis of the pilot study, we are only interested in whether the treatment estimate is larger or smaller than zero. Consequently, it is not necessary to formally undertake a hypothesis test of the results.

[Fig. 1. Sample size requirements for one-sided 80% confidence interval to exclude required standardized effect size.]

6. Recommendations for pilot study sample size

There are alternative approaches to estimating the sample sizes for pilot studies, and some of these are summarized in Table 1. As Table 1 shows, most previous approaches to sample size estimation hinge on trying to estimate an unknown variance. In contrast, our approach assumes that the variance is known; it is the likelihood of the main study finding a minimum clinically important effect size that drives the sample size in our approach. If the researchers are unsure of the variance, then they should consider using one of the other approaches listed in the table.

Using our methods, if we choose to use a one-sided 80% confidence limit, then we will need to recruit, retain, and analyze about 9% of the total sample size needed for the main trial (for continuous or dichotomous outcomes). Clearly, in some instances, larger pilots may be warranted. If we wanted to estimate the main study's standard deviation, for instance, we should probably seek a sample size of at least 50 [13]. Furthermore, recruiting only 9% of the total estimated sample size of the main trial may be insufficient to be sure that recruitment targets are achievable. Therefore, we would see 9% of the main study's sample size as potentially the minimum size of a pilot rather than the maximum. Indeed, we would suggest that as a minimum, at least 20 participants should be included in a pilot study, as this seems to be the smallest number that is reasonable from statistical modeling studies (Table 1). For pilot studies that want both to estimate the value of a parameter, such as a standard deviation, and to assess whether the main trial is worthwhile, we suggest using the largest sample size estimate, if these are different.

7. Discussion

In our experience, there is a current belief among some journal editors, researchers, and funders that it is not necessary or desirable to undertake sample size calculations for pilot trials. We disagree and have argued that formally undertaking an a priori sample size calculation for a pilot study will enhance its utility. However, we believe that the use of our suggested approach, or one advocated by other authors, is better than not doing a sample size estimation for pilot trials.

It is not the case that if an appropriately sized pilot study showed a zero or negative effect size, this would automatically preclude the main trial going forward. It may be that the study, although planned as a pilot, actually behaved more like a feasibility study, in that during the study the elements of the intervention were found to require change, which may have explained a lack of effect.

Similarly, even if the pilot trial identified a positive effect, it may also have found that recruitment was simply too difficult to make it possible to successfully recruit to the main trial. Therefore, all the other reasons for doing a pilot trial remain. However, undertaking a formal sample size calculation will ensure that the pilot produces much more added value than is commonly the case.

Table 3. Recommended pilot sample size for binary outcomes

| Control group proportion | Difference to be detected (%) | Sample size for main trial | Pilot sample size (80% level) | Upper one-sided 80% confidence limit | Pilot sample size (90% level) | Upper one-sided 90% confidence limit |
| 0.50 | 5 | 3,130 | 284 | 0.0499 | 658 | 0.0500 |
| 0.50 | 10 | 774 | 72 | 0.0992 | 166 | 0.0995 |
| 0.50 | 15 | 338 | 32 | 0.1488 | 74 | 0.1490 |
| 0.30 | 5 | 2,754 | 238 | 0.0500 | 552 | 0.0500 |
| 0.30 | 10 | 712 | 60 | 0.0996 | 138 | 0.09999 |
| 0.30 | 15 | 324 | 28 | 0.1458 | 62 | 0.1492 |
| 0.20 | 5 | 2,188 | 182 | 0.0499 | 422 | 0.0499 |
| 0.20 | 10 | 586 | 46 | 0.0993 | 106 | 0.0996 |
| 0.20 | 15 | 276 | 22 | 0.1435 | 48 | 0.1480 |
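The pilot sizes in Table 3 can be reproduced with a short calculation: assuming the pilot observes the control-group proportion in both arms (so the observed difference is zero), the upper one-sided limit on the difference in proportions is z times the standard error under a normal approximation, and we seek the smallest total size pushing that limit below the target difference. This is a re-derivation for illustration, not the authors' code, and the main-trial column (which uses standard two-proportion power formulas) is not computed here.

```python
from math import ceil, sqrt

from scipy.stats import norm


def binary_pilot_total_n(p_control, diff, level=0.80):
    """Smallest total pilot size (two equal arms) such that, when the
    observed proportions in both arms equal p_control (zero observed
    difference), the one-sided `level` upper confidence limit on the
    difference in proportions excludes `diff`."""
    z = norm.ppf(level)
    var = 2 * p_control * (1 - p_control)  # variance term with both arms at p_control
    n = ceil(var * (z / diff) ** 2)  # per-group size from z*sqrt(var/n) < diff
    while z * sqrt(var / n) >= diff:  # nudge past any boundary rounding
        n += 1
    return 2 * n


# Falls example: control risk 50%, target reduction of 10 percentage points.
print(binary_pilot_total_n(0.50, 0.10))
```

For the falls example this returns the 72 participants quoted in the text, and the other Table 3 pilot columns follow the same way.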

Small trials on average appear to generate larger estimates of treatment effects than bigger trials, and if this phenomenon were true, then pilot trials using our approach would tend to recommend going ahead with the main trial more often than is optimum. However, this depends on why small trials seem to show larger effects. Large effects may be because of either publication bias (small positive trials are more likely to be published than small negative studies) or quality differences. In a meta-analysis of small and large trials, Kjaergard et al. [14] found that the exaggerated effect sizes of small trials disappeared when the analysis adjusted for the quality of the randomization. Thus, small, or pilot, trials will not exaggerate treatment effects if undertaken rigorously. Therefore, it is unlikely, as long as the pilot study pays good attention to rigorous randomization procedures and other quality criteria, that it will overestimate the need for definitive trials. In conclusion, we believe that it is helpful to undertake a priori sample size calculations for pilot trials and that they should be encouraged.

Acknowledgment

The authors would like to thank Julie Oates and Andrew Thorpe for assisting with the programming.

References

[1] Arain M, Campbell MJ, Cooper CL, Lancaster GA. What is a pilot or feasibility study? A review of current practice and editorial policy. BMC Med Res Methodol 2010;10:67.
[2] Cox H, Tilbrook H, Aplin J, Chuang LH, Hewitt C, Jayakody S, et al. A pragmatic multi-centred randomised controlled trial of yoga for chronic low back pain: trial protocol. Complement Ther Clin Pract 2010;16:76–80.
[3] Tilbrook HE, Cox H, Hewitt CE, Kangombe AR, Chuang LH, Jayakody S, et al. Yoga for chronic low back pain: a randomized trial. Ann Intern Med 2011;155:569–78.
[4] Thabane L, Ma J, Chu R, Cheng J, Ismaila A, Rios LP, et al. A tutorial on pilot studies: the what, why and how. BMC Med Res Methodol 2010;10:1.
[5] Julious SA. Sample size of 12 per group rule of thumb for a pilot study. Pharm Stat 2005;4:287–91.
[6] Birkett MA, Day SJ. Internal pilot studies for estimating sample size. Stat Med 1994;13:2455–63.
[7] Browne RH. On the use of a pilot sample for sample size determination. Stat Med 1995;14:1933–40.
[8] Torgerson DJ, Torgerson CJ. Designing randomised trials in health, education and the social sciences. Basingstoke, UK: Palgrave Macmillan; 2008.
[9] Wittes J, Brittain E. The role of internal pilot studies in increasing the efficiency of clinical trials. Stat Med 1990;9:65–72.
[10] Bland JM. The tyranny of power: is there a better way to calculate sample size? BMJ 2009;339:b3985.
[11] Smithson M. Confidence intervals. Thousand Oaks, CA: Sage Publications, Inc; 2003.
[12] Dupont WD, Plummer WD. Power and sample size calculations: a review and computer program. Control Clin Trials 1990;11:116–28.
[13] Sim J, Lewis M. The size of a pilot study for a clinical trial should be calculated in relation to considerations of precision and efficiency. J Clin Epidemiol 2012;65:301–8.
[14] Kjaergard LL, Villumsen J, Gluud C. Reported methodologic quality and discrepancies between large and small randomized trials in meta-analyses. Ann Intern Med 2001;135:982–9.
