Susan Stewart, Ph.D. UC Davis School of Medicine November ...

Susan Stewart, Ph.D. UC Davis School of Medicine November, 2014

Susan Stewart, Ph.D. UC Davis School of Medicine

November, 2014

Intro to sample size determination
Basic concepts
Estimating sample size parameters
Response variables
  ◦ Continuous
  ◦ Categorical
  ◦ Time-to-event

Components of sample size estimation

Why is it a good idea to do a sample size calculation?

Why shouldn’t you just pick a size that’s convenient?

Because the sample might be too small to help you answer your research question,

Or the sample might be much larger than you need.

Primary objective of a clinical trial: to evaluate the efficacy and safety of an intervention.

Efficacy evaluation
  ◦ Compare the average response in the intervention and control groups in the study sample.
  ◦ Decide whether the difference between the groups indicates a true difference between treatments.

Usually the efficacy evaluation is performed in the context of a hypothesis test.

Problem: Determine whether or not the population means of the intervention and control groups truly differ with respect to the outcome of interest.
  ◦ We regard the intervention and control samples as being drawn from the target population.

Solution: Assume that the two groups do not differ, and see if the sample data disagree with this assumption. That is, perform a hypothesis test.

The null hypothesis (H0) assumes that there is no difference in outcome between the two groups.

The alternative hypothesis (HA) assumes that one group has a more favorable outcome than the other.

The research hypothesis is usually the alternative hypothesis.

To do a hypothesis test:
  ◦ Calculate a test statistic from the data.
  ◦ Determine whether the value of the test statistic is likely or unlikely under the null hypothesis.
  ◦ If the value is very unlikely, reject the null hypothesis.

Problem: we might reject the null hypothesis when it is true.
  ◦ That is, we might commit Type I error.

Solution: Construct the test so that there is only a 5% chance of incorrectly rejecting the null hypothesis.
  ◦ That is, the level of the test (alpha) is 0.05.

Hypothesis tests can be 1-sided or 2-sided
  ◦ 1-sided: tests for differences in one direction only, e.g., higher response rate in the intervention group than in the control group
  ◦ 2-sided: tests for differences in both directions, e.g., either higher or lower response rate in the intervention group than in the control group

Even if you are primarily interested in one direction, it is customary to do a 2-sided test

The p-value is the probability, under the null hypothesis, of obtaining data at least as extreme as those observed in the sample.
  ◦ That is, the p-value measures the strength of the evidence against the null hypothesis: the smaller the p-value, the stronger the evidence.

For a level 0.05 test, we reject the null hypothesis if the p-value is 0.05 or less.
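
The decision rule can be sketched in a few lines of Python. This is only an illustration, assuming the test statistic is a z statistic that follows a standard normal distribution under the null hypothesis; the function names `two_sided_p_value` and `reject_null` are made up for this example and do not come from the slides.

```python
from statistics import NormalDist


def two_sided_p_value(z: float) -> float:
    """Return P(|Z| >= |z|) for a standard normal Z (the 2-sided p-value)."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def reject_null(z: float, alpha: float = 0.05) -> bool:
    """Reject H0 when the p-value is at or below the level of the test."""
    return two_sided_p_value(z) <= alpha


if __name__ == "__main__":
    # Illustrative z values only; they are not taken from any study.
    for z in (0.5, 1.96, 2.8):
        p = two_sided_p_value(z)
        print(f"z = {z:4.2f}  p = {p:.4f}  reject H0: {reject_null(z)}")
```

A z statistic near 1.96 sits right at the boundary of a 2-sided level 0.05 test, which is why 1.96 appears as the critical value in the sample size formulas.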

Problem: we might fail to reject the null hypothesis when the alternative is true.
  ◦ That is, we might commit Type II error.

Solution: Select a large enough sample so that there is an 80% chance of rejecting the null hypothesis if the alternative is true.
  ◦ Then the power to detect the alternative is 80%.
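
To see how power depends on sample size, here is a minimal sketch for a 2-sided two-sample z-test with equal group sizes and a known common standard deviation. The function name and its parameters are assumptions made for illustration. The numbers in the demo (SD 2.37, difference 1.4, 45 per group) are borrowed from the bone-loss example in these slides.

```python
from math import sqrt
from statistics import NormalDist


def power_two_sample_z(n: float, sigma: float, delta: float, alpha: float = 0.05) -> float:
    """Power of a 2-sided, level-alpha, two-sample z-test with n per group.

    sigma is the common standard deviation, delta the true difference in means.
    """
    std = NormalDist()
    z_crit = std.inv_cdf(1.0 - alpha / 2.0)
    # Under the alternative, z is normal with this mean and unit variance.
    shift = abs(delta) / (sigma * sqrt(2.0 / n))
    # Probability that z falls in either rejection region.
    return std.cdf(shift - z_crit) + std.cdf(-shift - z_crit)


if __name__ == "__main__":
    for n in (20, 45, 90):
        print(f"n = {n:3d} per group  power = {power_two_sample_z(n, 2.37, 1.4):.3f}")
```

With 45 per group the power comes out close to 80%, and it rises as the groups get larger.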

Specify null and alternative hypotheses, type I error rate, and power.

Define the population under study.
Gather information relevant to parameters.
If measuring time to failure, model recruitment process and choose length of follow-up period.
Calculate sample size over range of parameters.
Select sample size to use.

Epidemiol Rev, 2002; 24(1):39-53

Parameters include
  ◦ Variability of the response
  ◦ Level of the response variable in the control group
  ◦ Difference anticipated or judged clinically relevant

May also need to consider
  ◦ Loss to follow-up
  ◦ Noncompliance

Sources of information
  ◦ Pilot studies: external or internal
  ◦ Literature: what others have found in similar studies

When a response variable is normally distributed, the difference between the means of two independent samples is assessed with a 2-sample t-test.
  ◦ The t-test is robust to departures from normality.
  ◦ May need to transform the response variable (e.g., log transform) to obtain approximate normality.

The sample size for a z-test usually can be used to estimate the sample size for a t-test.
  ◦ A z-test assumes that the population standard deviation is known.

𝑧𝑧 =οΏ½Μ…οΏ½π‘₯ βˆ’ π‘¦π‘¦οΏ½πœŽπœŽ 2/𝑛𝑛

οΏ½Μ…οΏ½π‘₯ = intervention group mean 𝑦𝑦� = control group mean 𝜎𝜎2 = common variance in each group 𝑛𝑛 = sample size in each group

Epidemiol Rev, 2002; 24(1):39-53 (eq. 1)

𝑛𝑛 = 2𝜎𝜎2 𝑧𝑧1βˆ’π›Όπ›Ό/2 + 𝑧𝑧1βˆ’π›½π›½ /βˆ†π΄π΄

2

𝜎𝜎2 = common variance in each group 𝑧𝑧1βˆ’π›Όπ›Ό/2 = critical value for 2-sided level 𝛼𝛼 test 𝑧𝑧1βˆ’π›½π›½ = value of a standard normal variable with

cumulative probability equal to 1 βˆ’ 𝛽𝛽 (power) βˆ†π΄π΄ = difference corresponding to alternative

hypothesis

Epidemiol Rev, 2002; 24(1):39-53 (eq. 2)
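
Equation 2 translates directly into code. The sketch below assumes equal group sizes and takes the normal quantiles from Python's standard library; the function name `n_per_group_continuous` is hypothetical and chosen only for this example.

```python
from statistics import NormalDist


def n_per_group_continuous(sigma: float, delta: float,
                           alpha: float = 0.05, power: float = 0.80) -> float:
    """Per-group sample size from eq. 2 (unrounded).

    sigma: common standard deviation in each group
    delta: difference in means under the alternative hypothesis
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # 2-sided critical value
    z_beta = NormalDist().inv_cdf(power)               # quantile for power 1 - beta
    return 2.0 * sigma ** 2 * ((z_alpha + z_beta) / delta) ** 2


if __name__ == "__main__":
    # Illustrative inputs: a difference of one standard deviation.
    print(n_per_group_continuous(sigma=1.0, delta=1.0))
```

Because the difference enters squared in the denominator, halving the difference to detect quadruples the required sample size.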

Randomized, age-matched
Healthy post-menopausal Chinese women within 10 years of menopause onset
Exclusion criteria
  ◦ Regular participation in exercise
  ◦ Hormone replacement therapy or drug treatment affecting bone density
  ◦ Hypo- or hyper-parathyroidism, hypo- or hyper-thyroidism, renal or liver disease
  ◦ History of fractures
  ◦ BMI over 30

Arch Phys Med Rehabil 2004; 85:717-22

Intervention: Supervised TCC exercise (Yang style) 50 minutes a day, 5 times a week, for 12 months
Control: Retained sedentary lifestyle
Primary outcome: Change in bone mineral density over 12 months
  ◦ Areal BMD at lumbar spine and proximal femur measured by dual x-ray absorptiometry (DXA)
  ◦ Volumetric BMD in distal tibia measured by multislice peripheral quantitative computed tomography (pQCT)

Null hypothesis
  ◦ Rate of bone mineral loss is the same in both study arms.
Alternative hypothesis
  ◦ Rate of bone mineral loss is different (i.e., lower) in the intervention (TCC) group.

Level of the test: 0.05 (2-sided)
Power: 80%
Mean bone loss in control group: 2.8%
  ◦ Average annual trabecular bone loss in previous study in same population
Mean bone loss in intervention group: 1.4%
  ◦ 50% reduction
Standard deviation in each group
  ◦ Based on previous study, ~same as mean: 3.0% in control group, 1.5% in intervention group (say)
  ◦ Compute pooled SD = 2.37%
Dropout: 25% in one year

𝜎𝜎2 = common variance in each group = 2.372=5.62 𝑧𝑧1βˆ’π›Όπ›Ό/2 = critical value for 2-sided level 𝛼𝛼 test = 1.96 𝑧𝑧1βˆ’π›½π›½ = value of a standard normal variable with

cumulative probability equal to 1 βˆ’ 𝛽𝛽 (power) = 0.842 βˆ†π΄π΄ = difference corresponding to alternative

hypothesis = 1.4 𝑛𝑛 = 2𝜎𝜎2 𝑧𝑧1βˆ’π›Όπ›Ό/2 + 𝑧𝑧1βˆ’π›½π›½ /βˆ†π΄π΄

2 = 2(5.62) 1.96 + 0.842 /1.4 2 =45 per group =0.75 (60 per group), accounting for dropouts Actual enrollment n=132 total
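
The arithmetic of this example can be checked with a short script. It is a sketch under two assumptions that the slides do not spell out: the pooled SD is computed for equal group sizes as the square root of the average of the two variances, and the per-group result is rounded to the nearest integer (which matches the 45 reported here) before inflating for dropout. All variable names are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist

# Design inputs taken from the slides (percent bone loss per year).
sd_control, sd_intervention = 3.0, 1.5
mean_loss_control, mean_loss_intervention = 2.8, 1.4
alpha, power, dropout = 0.05, 0.80, 0.25

# Assumption: equal group sizes, so the pooled variance is the simple average.
pooled_sd = sqrt((sd_control ** 2 + sd_intervention ** 2) / 2.0)
delta = mean_loss_control - mean_loss_intervention

z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
z_beta = NormalDist().inv_cdf(power)

# Eq. 2: evaluable participants needed per group.
n_evaluable = 2.0 * pooled_sd ** 2 * ((z_alpha + z_beta) / delta) ** 2
n_rounded = round(n_evaluable)  # assumption: round to nearest, as on the slide

# Inflate so that the evaluable number remains after 25% dropout.
n_enrolled = ceil(n_rounded / (1.0 - dropout))

if __name__ == "__main__":
    print(f"pooled SD = {pooled_sd:.2f}")
    print(f"n per group (evaluable) = {n_rounded}")
    print(f"n per group (enrolled)  = {n_enrolled}")
```

Rounding up instead of to the nearest integer is the more conservative convention and would give a slightly larger number.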

https://stattools.crab.org/

When a response variable is categorical, a chi-square test of independence is often used to compare two groups.

When there are only 2 categories, this is the same as testing for a difference in proportions.

Need to specify the response proportion in the control group and
  ◦ The response proportion in the intervention group, or
  ◦ The odds ratio

𝑛𝑛 =𝑧𝑧1βˆ’π›Όπ›Ό/2 2πœ‹πœ‹οΏ½ 1 βˆ’ πœ‹πœ‹οΏ½ + 𝑧𝑧1βˆ’π›½π›½ πœ‹πœ‹π‘π‘ 1 βˆ’ πœ‹πœ‹π‘π‘ + πœ‹πœ‹π‘‘π‘‘ 1 βˆ’ πœ‹πœ‹π‘‘π‘‘

2

πœ‹πœ‹π‘π‘ βˆ’ πœ‹πœ‹π‘‘π‘‘ 2

𝑛𝑛′ = 𝑛𝑛4 1 + 1 + 4

𝑛𝑛 πœ‹πœ‹π‘π‘ βˆ’ πœ‹πœ‹π‘‘π‘‘

2

πœ‹πœ‹π‘π‘ = probability of event in control group πœ‹πœ‹π‘‘π‘‘ = probability of event in intervention group πœ‹πœ‹οΏ½ = average probability of event 𝑛𝑛′= number needed in each group

Epidemiol Rev, 2002; 24(1):39-53 (eq. 7B, 7C)
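
A minimal Python sketch of equations 7B and 7C follows. It assumes equal allocation, so the average probability is the simple mean of the two group probabilities; the function names are invented for this example. The demo uses the 60% versus 75% follow-up rates from the Pap test example in these slides.

```python
from math import sqrt
from statistics import NormalDist


def n_two_proportions(pi_c: float, pi_t: float,
                      alpha: float = 0.05, power: float = 0.80) -> float:
    """Eq. 7B: per-group sample size before continuity correction (unrounded)."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    pi_bar = (pi_c + pi_t) / 2.0  # assumption: equal group sizes
    numerator = (z_alpha * sqrt(2.0 * pi_bar * (1.0 - pi_bar))
                 + z_beta * sqrt(pi_c * (1.0 - pi_c) + pi_t * (1.0 - pi_t))) ** 2
    return numerator / (pi_c - pi_t) ** 2


def continuity_corrected(n: float, pi_c: float, pi_t: float) -> float:
    """Eq. 7C: per-group sample size with continuity correction (unrounded)."""
    return n / 4.0 * (1.0 + sqrt(1.0 + 4.0 / (n * abs(pi_c - pi_t)))) ** 2


if __name__ == "__main__":
    n = n_two_proportions(0.60, 0.75)
    n_prime = continuity_corrected(n, 0.60, 0.75)
    print(f"n  = {n:.1f} per group")
    print(f"n' = {n_prime:.1f} per group")
```

The correction always increases the sample size, and it matters most when the uncorrected n is small.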

Study aim: test an outreach and counseling intervention to reduce cervical cancer incidence & mortality in low income women

Setting: Highland General Hospital (HGH)
Time frame: 3 years
Outcome measure: proportion of women who received initial follow-up at Highland within 6 months of an abnormal Pap test

Prev Med 2005; 41: 741-8

Null hypothesis
  ◦ Rate of follow-up of abnormal Pap tests is the same in both study arms.
Alternative hypothesis
  ◦ Rate of follow-up of abnormal Pap tests is different (i.e., greater) in the intervention group.

Assume 60% follow-up in control group based on previous research

Assume 75% follow-up in intervention group, a clinically important difference achieved in similar interventions

To detect this difference at the 0.05 level (2-sided) with 80% power: n=165 per arm

No loss to follow-up: outcome ascertained through medical records

𝑛𝑛 =1.96 2(0.675) 0.325 +0.842 0.6 0.4 +0.75 0.25

2

0.60βˆ’0.75 2 =152

𝑛𝑛′ = 1524

1 + 1 + 4152 0.60βˆ’0.75

2

= 165

πœ‹πœ‹π‘π‘ = probability of event in control group = 0.60 πœ‹πœ‹π‘‘π‘‘ = probability of event in intervention group = 0.75 πœ‹πœ‹οΏ½ = average probability of event = 0.675 𝑛𝑛′= number needed in each group = 165

https://stattools.crab.org/

The log rank test is often used to compare two survival curves.

Most sample size calculations assume an exponential survival distribution.

𝑆𝑆 𝑑𝑑 = π‘’π‘’βˆ’Ξ»π‘‘π‘‘, where 𝑑𝑑 = time, 𝑆𝑆 𝑑𝑑 = probability of survival to time 𝑑𝑑, and Ξ» = hazard rate = risk of an event per time

unit

Hazard rate: number of events per 100 person years

Median survival time = $\log_e(2)$/(hazard rate)
Hazard rate = $\log_e(2)$/(median survival time)
Hazard rate = $-\log_e(S(t))/t$, where $S(t)$ = probability of surviving to time $t$ = expected proportion without an event by $t$
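
These conversions are easy to get wrong by a unit, so a small sketch may help. It assumes exponential survival, as stated above, and the helper names are hypothetical. The demo converts the 13-month and 19-month medians from the colorectal cancer example in these slides into hazard rates per year.

```python
from math import exp, log


def hazard_from_median(median_survival: float) -> float:
    """Hazard rate implied by a median survival time (same time unit)."""
    return log(2.0) / median_survival


def median_from_hazard(hazard: float) -> float:
    """Median survival time implied by a constant hazard rate."""
    return log(2.0) / hazard


def hazard_from_survival(s_t: float, t: float) -> float:
    """Hazard rate implied by the proportion s_t still event-free at time t."""
    return -log(s_t) / t


def survival(hazard: float, t: float) -> float:
    """Exponential survival function S(t) = exp(-hazard * t)."""
    return exp(-hazard * t)


if __name__ == "__main__":
    # Medians given in months, converted to years so the hazard is per year.
    for months in (13, 19):
        lam = hazard_from_median(months / 12.0)
        print(f"median {months} months -> hazard {lam:.3f} per year")
```

The time unit of the hazard follows the time unit of the median, so months must be converted to years before comparing with rates quoted per person-year.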

𝑛𝑛 =(𝑧𝑧1βˆ’Ξ±/2 + 𝑧𝑧1βˆ’Ξ²)2[Ο• λ𝐢𝐢 + Ο• λ𝐼𝐼 ]

(λ𝐼𝐼 βˆ’ λ𝐢𝐢)2

where Ο•(Ξ») = Ξ»2

1βˆ’[π‘’π‘’βˆ’πœ†πœ† π‘‡π‘‡βˆ’π‘‡π‘‡0 βˆ’π‘’π‘’βˆ’Ξ»π‘‡π‘‡] λ𝑇𝑇0οΏ½

𝑛𝑛 =number per group λ𝐼𝐼=hazard rate in intervention group λ𝐢𝐢=hazard rate in control group 𝑇𝑇 =total time of trial (first entry to end of study) 𝑇𝑇0=recruitment time (first entry to last entry)

𝐷𝐷 =(𝑧𝑧1βˆ’Ξ±/2 + 𝑧𝑧1βˆ’Ξ²)2

𝑝𝑝(1 βˆ’ 𝑝𝑝)(ln (πœ†πœ†πΆπΆ/λ𝐼𝐼))2

where 𝐷𝐷 =number of events required to detect the hazard ratio with power 1-Ξ² at level Ξ± (2-sided) λ𝐼𝐼=hazard rate in intervention group λ𝐢𝐢=hazard rate in control group 𝑝𝑝 =proportion of participants in the control group
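
The events formula can be sketched the same way. The default `p = 0.5` below is an assumption (equal allocation), and the function name is hypothetical. The demo uses a hazard ratio of 19/13, which corresponds to the median survival times in the colorectal cancer example in these slides.

```python
from math import log
from statistics import NormalDist


def required_events(hazard_ratio: float, p: float = 0.5,
                    alpha: float = 0.05, power: float = 0.80) -> float:
    """Number of events D needed to detect a hazard ratio (unrounded).

    hazard_ratio: lambda_C / lambda_I
    p: proportion of participants in the control group (assumed 0.5 by default)
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    return (z_alpha + z_beta) ** 2 / (p * (1.0 - p) * log(hazard_ratio) ** 2)


if __name__ == "__main__":
    print(f"events required: {required_events(19.0 / 13.0):.1f}")
```

Note that D counts events, not participants: the number enrolled must be large enough, and follow-up long enough, for that many events to occur.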

Primary research goal: Determine whether performing surgery of the primary tumor followed by systemic therapy improves survival in a certain patient population, compared with systemic therapy only.

Patient population: Patients with synchronous unresectable metastases of colorectal cancer and few or absent symptoms

Primary outcome: Overall survival
Study design: Multi-center randomized phase III trial.

BMC Cancer 2014; 14:741

Null hypothesis
  ◦ Overall survival is not affected by surgery of the primary tumor before systemic therapy in this patient population.

Alternative hypothesis
  ◦ Surgery of the primary tumor improves overall survival in this patient population.

Level of the test: 0.05 (2-sided)
Power: 80%
Median survival in control group: 13 months
Median survival in intervention group: 19 months
  ◦ Minimal difference to justify a surgical procedure
Recruitment period: 30 months
Minimum follow-up: 8 months
Total sample size: 360

𝑛𝑛 =(𝑧𝑧1βˆ’Ξ±/2 + 𝑧𝑧1βˆ’Ξ²)2[Ο• λ𝐢𝐢 + Ο• λ𝐼𝐼 ]

(λ𝐼𝐼 βˆ’ λ𝐢𝐢)2

where Ο•(Ξ») = Ξ»2

1βˆ’[π‘’π‘’βˆ’πœ†πœ† π‘‡π‘‡βˆ’π‘‡π‘‡0 βˆ’π‘’π‘’βˆ’Ξ»π‘‡π‘‡] λ𝑇𝑇0οΏ½

Ξ±=0.05; 𝑧𝑧1βˆ’Ξ±/2 =1.96; Ξ²=0.20; 𝑧𝑧1βˆ’Ξ² =0.842 λ𝐼𝐼=hazard rate in intervention group = ln(2)/(19/12)=0.438 λ𝐢𝐢=hazard rate in control group = ln(2)/(13/12)=0.640 hazard ratio = 19/13=1.46 𝑇𝑇 =total time of trial (first entry to end of study) =38/12=3.167 𝑇𝑇0=recruitment time (first entry to last entry) = 2.5 𝝓𝝓 𝝀𝝀π‘ͺπ‘ͺ =0.607; 𝝓𝝓 𝝀𝝀𝑰𝑰 =0.351; πŸπŸπ’π’ =368; required # of events = 218

https://stattools.crab.org/

𝛼𝛼(level): larger β†’ smaller sample size 1-𝛽𝛽 (power): larger β†’ larger sample size Variance: larger β†’ larger sample size β—¦ Binary variable: πœ‹πœ‹ (probability of event) = 0.5 has

largest variance Difference to detect: larger β†’ smaller sample

size

Problem: Sometimes the sample size required is too large.

Solutions:
  ◦ Be content to detect with less power (allow more type II error).
  ◦ Increase the level of the test (allow more type I error).
  ◦ Pick a more extreme alternative.

                 % Response in Intervention Group
Level   Power    60%     65%
5%      90%      538     239
5%      80%      407     182
10%     80%      325     146
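
The table can be explored with the two-proportion formulas (eq. 7B and 7C). The sketch below rests on an assumption that is not stated on this slide: a 50% response rate in the control group. That value is inferred only because it yields per-group sizes within one participant of every table entry, so treat it as a guess rather than as the design actually used.

```python
from math import sqrt
from statistics import NormalDist

ASSUMED_CONTROL_RATE = 0.50  # assumption: not given on the slide, inferred from the table


def n_per_group(pi_c: float, pi_t: float, alpha: float, power: float) -> float:
    """Per-group size from eq. 7B followed by the continuity correction in eq. 7C."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    pi_bar = (pi_c + pi_t) / 2.0
    n = (z_alpha * sqrt(2.0 * pi_bar * (1.0 - pi_bar))
         + z_beta * sqrt(pi_c * (1.0 - pi_c) + pi_t * (1.0 - pi_t))) ** 2 / (pi_c - pi_t) ** 2
    return n / 4.0 * (1.0 + sqrt(1.0 + 4.0 / (n * abs(pi_c - pi_t)))) ** 2


if __name__ == "__main__":
    print("Level  Power   60%    65%")
    for alpha, power in ((0.05, 0.90), (0.05, 0.80), (0.10, 0.80)):
        sizes = [n_per_group(ASSUMED_CONTROL_RATE, pi_t, alpha, power) for pi_t in (0.60, 0.65)]
        print(f"{alpha:5.0%}  {power:5.0%}  {sizes[0]:5.0f}  {sizes[1]:5.0f}")
```

Reading across a row shows the effect of a larger difference to detect, and reading down a column shows the effect of relaxing power and then the level of the test.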

Parameters used to estimate sample size are estimates
  ◦ Often based on small studies
Effectiveness of the intervention
  ◦ May be based on a different population
  ◦ May be overestimated
Inclusion and exclusion criteria may change
Control group participants may do better than expected
Mathematical models for sample size calculations are approximate

www.statpages.org
www.swogstat.org/statoolsout.html
https://stattools.crab.org/