Transcript of: Sample size and power analysis (PowerPoint presentation, Flinders University, 14/06/2012)

Page 1

Sample size and power analysis

Your study should have at least 80% power.

(Anonymous)

Sample size conversation

Source: http://www.xtranormal.com/watch/6871831/biostatistics-vs-lab-research

Two scenarios

A researcher conducted a study comparing the effect of an intervention vs. placebo on reducing body weight, and found a 5 lb reduction in the intervention group with P = 0.01.

Another researcher conducted a similar study comparing the effect of the same intervention vs. the same placebo on reducing body weight, and found the same 5 lb reduction in the intervention group, but could not claim that the intervention was effective because P = 0.35.

What do you think the crying researcher did differently from the smiling one?

Question (1) to Statistician

Question: How can I make my P-value smaller?

Answer: Enroll as many as you can.

Answer (1) to Researcher

You almost always need to estimate the required sample size, or to estimate statistical power given a sample size, when you are planning a study. The only exception may be a pilot study (a smaller study to show feasibility, or to collect data to plan a larger study). Through this process, you can avoid wasting your efforts and resources on studies that are hopeless to begin with.

Page 2

Question (2) to Statistician

Question: Can I keep enrolling participants into my study until I observe P < 0.05?

Answer (2) to Researcher: Absolutely NOT.

Question (3) to Statistician

If only I had a cent for every time I was asked: "How many participants do I need for my study?"

Answer (3) to Researcher

The purpose of sample size formulae ‘is not to give an exact number…but rather to subject the study design to scrutiny, including an assessment of the validity and reliability of data collection, and to give an estimate to distinguish whether tens, hundreds, or thousands of participants are required’

Williamson et al. (2000) JRSSA 163(1): 5-13

Answer (3) to Researcher

It is not an easy question. It is like asking: "How much money should I take on my holidays?"

Statistics lecture

"There is no such thing as a sample size problem. Sample size is but one aspect of study design. When you are asked to help determine the sample size, a lot of questions must be asked and answered before you get to that one… You may often end up never discussing sample size because there are other matters that override it in importance."

Russell Lenth (2001)

Sample size depends not only on the desired power, but also on the true variability in the population and on the specification of a practically significant effect size.

Question (4) to Statistician

Question: How do I play with these terms?

Page 3

Session

Start of the 1-hour session

Ingredients

α, significance level (Type I error): the probability of erroneously rejecting H0 (concluding that there is an effect, when in fact there is no effect).

β, Type II error: the probability of erroneously failing to reject H0 (concluding that there is no effect, when in fact there is an effect).

1 − β, power: the chance of correctly identifying H1 (concluding that there is an effect, when in fact there is an effect).

Δ, effect: a significant difference in body weight between the intervention and placebo groups.

Significance level (α)

• First type of error: concluding that there is an effect, when in fact there is no effect.

The level of your test is the probability that you will falsely conclude that the program has an effect, when in fact it does not. So with a level of 5%, you can be 95% confident in the validity of your conclusion that the program had an effect.

For policy purposes, you want to be very confident in the answer you give: the level will be set fairly low. Common levels of α: 5%, 10%, 1%.

1 − β, power

Purpose of power analysis

Power analyses need to be conducted to ensure an adequate sample size to detect a meaningful effect of your intervention.

Page 4

Interpretation

A power of 80% tells us that, in 80% of experiments of this sample size conducted in this population, if there is indeed an effect in the population, we will be able to say in our sample that there is an effect, with the desired level of confidence. The larger the sample, the larger the power. Common power values used: 80%, 90%.
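This interpretation can be checked by simulation. Below is a minimal sketch (not from the slides) that repeats a two-group experiment many times and counts how often p < 0.05; the effect size, SD, and group size are illustrative assumptions borrowed from the HbA1c example later in the deck.

```python
# Monte Carlo sketch: the fraction of simulated trials reaching
# p < 0.05 approximates the power of the design.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_sims, n_per_group = 5000, 77   # 77 per group, as in the later example
true_effect, sd = 1.0, 2.2       # assumed true difference and common SD

hits = 0
for _ in range(n_sims):
    control = rng.normal(0.0, sd, n_per_group)
    treated = rng.normal(true_effect, sd, n_per_group)
    _, p = stats.ttest_ind(treated, control)
    if p < 0.05:
        hits += 1

print(f"Estimated power: {hits / n_sims:.2f}")  # close to 0.80
```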

Visual concept of power

[Figure: null and alternative sampling distributions]
Null distribution: difference = 0.
Clinically relevant alternative: difference = 10%.
Rejection region: any value ≥ 6.5 (0 + 3.3 × 1.96).
For a 5% significance level, the one-tail area = 2.5% (Zα/2 = 1.96).
Power = the chance of being in the rejection region if the alternative is true = the area to the right of this cut-off (shown in yellow).

Visual concept of power (continued)

Rejection region: any value ≥ 6.5 (0 + 3.3 × 1.96).
Power = the chance of being in the rejection region if the alternative is true = the area to the right of the cut-off (shown in yellow).
Power here: P(Z > (6.5 − 10)/3.3) = P(Z > −1.06) = 85%
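The slide's arithmetic can be reproduced in a few lines; this is a sketch using scipy, with the slide's numbers (true difference = 10, standard error = 3.3, two-sided α = 0.05).

```python
# Recomputing the slide's power example from the normal distribution.
from scipy.stats import norm

se, alpha, alt = 3.3, 0.05, 10.0
cutoff = norm.ppf(1 - alpha / 2) * se        # 1.96 * 3.3 ≈ 6.5
power = 1 - norm.cdf((cutoff - alt) / se)    # P(Z > -1.06)
print(f"cut-off = {cutoff:.1f}, power = {power:.1%}")  # ≈ 86%; slide rounds to 85%
```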

Is power analysis always needed?

Needed when:

• Designing a study

• Applying for a grant

Less needed when:

• Secondary data analysis

• Pilot study to assess effect

A priori power analysis

You want to find how many cases you will need to achieve a specified amount of power, given a specified effect size and the criterion of significance to be employed.

A posteriori power analysis

You want to find out what the power would be for a specified effect size, sample size, and criterion of significance to be employed. (Both directions are sketched in code below.)
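Most power software supports both directions. Here is a minimal sketch using statsmodels (an assumed tool, not one named on this slide); the effect size, α, and n are illustrative.

```python
# A priori (solve for n) and a posteriori (solve for power) analyses
# for a two-group independent-sample t-test.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# A priori: n per group for d = 0.45, alpha = 0.05, power = 0.80
n = analysis.solve_power(effect_size=0.45, alpha=0.05, power=0.80)
print(f"a priori: n per group = {n:.0f}")   # roughly 79

# A posteriori: power for d = 0.45, alpha = 0.05, 60 per group
power = analysis.solve_power(effect_size=0.45, alpha=0.05, nobs1=60)
print(f"a posteriori: power = {power:.2f}")
```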

Page 5

Δ, effect sizes

Effect size

• A descriptive metric that characterizes the standardized difference (in SD units) between the mean of a control group and the mean of a treatment (intervention) group.
• Can also be calculated from correlational data derived from pre-experimental designs or from repeated-measures designs.

Sources for finding an effect size

On the basis of previous research: meta-analysis, i.e., reviewing the previous literature and calculating the previously observed effect size (in the same and/or similar situations).

Pilot study: when no prior studies exist from which one can extrapolate an ES, it is often appropriate to conduct a small study with 10-20 participants in order to get an initial estimate of the effect size.

On the basis of theoretical importance: deciding whether a small, medium, or large effect is required; the smallest size that would be clinically meaningful.

Unstandardized effect size

GB = 38.07 s, USA = 38.08 s, D = 0.01 s

Standardized effect size

• The standard deviation captures the variability in the outcome: the more variability, the higher the standard deviation.
• The standardized effect size is the effect size divided by the standard deviation of the outcome:

standardized effect size = effect size / standard deviation
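In code this is one line once a pooled SD is available. A minimal sketch follows; the helper name and the sample data are made up for illustration.

```python
# Standardized effect size (Cohen's d) using a pooled standard deviation.
import numpy as np

def cohens_d(treatment, control):
    """Difference in means divided by the pooled SD of the two groups."""
    nt, nc = len(treatment), len(control)
    pooled_var = ((nt - 1) * np.var(treatment, ddof=1) +
                  (nc - 1) * np.var(control, ddof=1)) / (nt + nc - 2)
    return (np.mean(treatment) - np.mean(control)) / np.sqrt(pooled_var)

treated = np.array([7.9, 8.1, 7.4, 8.3, 7.6])   # toy outcome data
control = np.array([8.6, 8.9, 8.4, 9.1, 8.7])
print(f"d = {cohens_d(treated, control):.2f}")
```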

Zero effect size

d = 0.00

[Figure: control group and intervention group with fully overlapping distributions]

d = 0.00 means that the average treatment participant outperformed 50% of the control participants.

Page 6

Moderate effect size

d = 0.40

[Figure: control group and treatment group distributions, partially overlapping]

d = 0.40 means that the average treatment participant outperformed 65% of the control participants.

Large effect size

d = 0.85

[Figure: control group and intervention condition distributions, mostly separated]

d = 0.85 means that the average treatment participant outperformed 80% of the control participants.
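These percentages are the normal CDF evaluated at d (Cohen's U3, assuming normal outcomes with equal SDs); a quick check with scipy:

```python
# "Outperformed X% of controls" = Phi(d) under normality with equal SDs.
from scipy.stats import norm

for d in (0.00, 0.40, 0.85):
    print(f"d = {d:.2f}: outperforms {norm.cdf(d):.1%} of controls")
# d = 0.00: 50.0%, d = 0.40: 65.5%, d = 0.85: 80.2%
```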

Attrition rate

Study design

Measurement of outcome

Attrition rate

If the study is longitudinal or an intervention study, you need to adjust the sample size for the attrition rate. Get attrition estimates from pilot studies or from the literature on studies in the same population. A default estimate would be 20%.

Do the power calculation and then adjust the sample size:

Final N = (N from power estimate) / (1 − attrition rate)

Example: with a 20% attrition rate, a power analysis that yields a total sample size of 100 gives a targeted N = ???
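Working through the example with the formula above: targeted N = 100 / (1 − 0.20) = 125 participants.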

Study design

Different designs have different power distributions and considerations:
- A regression-type design is different from a 2 × 2 ANOVA
- Longitudinal vs. cross-sectional designs
- For some designs it is harder to find power programs than for others:
  - Longitudinal designs
  - Nested/clustered designs
  - Dichotomous and categorical outcomes

Keep in mind the aim of the study, not just the design.

Measurement of outcome

The level of measurement of the outcome can have some influence on power estimates:
- Differences in means (e.g., an intervention study looking at differences in depression using the CES-D)
- Differences in proportions (e.g., an intervention study looking at differences in depression diagnosis)

Power is calculated for the primary outcome. If there are several important outcomes, conduct the power analysis for all of them and select the sample size so that power is at least .80 for all outcomes.

Page 7

Inter-relationship

Factors needed for sample size:
- Power
- Size of the effect
  - Study design
  - Measurement of outcome
- Significance level desired
- Attrition

Inter-relationship

n, sample size
α, significance level
Δ, effect size
1 − β, power

Standard case

[Figure: sampling distributions of the test statistic T under H0 and under HA, with alpha = 0.05; the effect size is the distance between the two distributions, and POWER = 1 − β is the area under the HA distribution beyond the critical value.]

Increased α

[Figure: the same distributions with alpha = 0.10; the rejection region widens, so power increases.]

Decreased α

[Figure: the same distributions with alpha = 0.01; the rejection region narrows, so power decreases.]

Page 8

Increased n

[Figure: sampling distributions of T under H0 and HA with alpha = 0.05; a larger n narrows both distributions, so POWER = 1 − β increases.]

Increased effect size

[Figure: sampling distributions of T under H0 and HA with alpha = 0.05; a larger effect size moves the HA distribution further from H0, so power increases.]

Key points

The power of a statistical test is influenced by:
• Sample size (n): ↑ n → power ↑
• Significance level (α): ↑ α → power ↑
• Difference (effect) to be detected (Δ): ↑ Δ → power ↑
• Variation in the outcome (σ²): ↓ σ² → power ↑

(A numerical check of these four relationships is sketched below.)
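A minimal sketch verifying the four directions with a two-sided z-test power formula for comparing two group means (a normal approximation; all numbers are illustrative assumptions):

```python
# Power of a two-sided z-test comparing two group means,
# used to check the four directional relationships above.
from scipy.stats import norm

def power(n, alpha, delta, sigma):
    se = sigma * (2 / n) ** 0.5        # SE of the difference in means
    z_crit = norm.ppf(1 - alpha / 2)
    return 1 - norm.cdf(z_crit - delta / se)

print(f"baseline:           {power(50, 0.05, 1.0, 2.2):.2f}")
print(f"n 50 -> 100:        {power(100, 0.05, 1.0, 2.2):.2f}")  # power up
print(f"alpha 0.05 -> 0.10: {power(50, 0.10, 1.0, 2.2):.2f}")   # power up
print(f"delta 1.0 -> 1.5:   {power(50, 0.05, 1.5, 2.2):.2f}")   # power up
print(f"sigma 2.2 -> 1.5:   {power(50, 0.05, 1.0, 1.5):.2f}")   # power up
```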

Key points

What we need, and where we get it:

Significance level: often conventionally set at 5%. The lower it is, the larger the sample size needed for a given power.

The mean and the variability of the outcome in the comparison group: from previous surveys conducted in similar settings. The larger the variability, the larger the sample needed for a given power.

The effect size that we want to detect: what is the smallest effect that should prompt a policy response? The smaller the effect size, the larger the sample size we need for a given power.

Reporting power and sample size

CONSORT 22-point checklist (paper section, item number, description)

TITLE & ABSTRACT
1. How participants were allocated to interventions (e.g., "random allocation", "randomized", or "randomly assigned").

INTRODUCTION
2. Background: scientific background and explanation of rationale.

METHODS
3. Participants: eligibility criteria for participants and the settings and locations where the data were collected.
4. Interventions: precise details of the interventions intended for each group and how and when they were actually administered.
5. Objectives: specific objectives and hypotheses.
6. Outcomes: clearly defined primary and secondary outcome measures and, when applicable, any methods used to enhance the quality of measurements (e.g., multiple observations, training of assessors).
7. Sample size: how sample size was determined and, when applicable, explanation of any interim analyses and stopping rules.
8. Randomization, sequence generation: method used to generate the random allocation sequence, including details of any restriction (e.g., blocking, stratification).
9. Randomization, allocation concealment: method used to implement the random allocation sequence (e.g., numbered containers or central telephone), clarifying whether the sequence was concealed until interventions were assigned.
10. Randomization, implementation: who generated the allocation sequence, who enrolled participants, and who assigned participants to their groups.
11. Blinding (masking): whether or not participants, those administering the interventions, and those assessing the outcomes were blinded to group assignment. When relevant, how the success of blinding was evaluated.
12. Statistical methods: statistical methods used to compare groups for primary outcome(s); methods for additional analyses, such as subgroup analyses and adjusted analyses.

RESULTS
13. Participant flow: flow of participants through each stage (a diagram is strongly recommended). Specifically, for each group report the numbers of participants randomly assigned, receiving intended treatment, completing the study protocol, and analyzed for the primary outcome. Describe deviations from the study as planned, together with reasons.
14. Recruitment: dates defining the periods of recruitment and follow-up.
15. Baseline data: baseline demographic and clinical characteristics of each group.
16. Numbers analyzed: number of participants (denominator) in each group included in each analysis and whether the analysis was by "intention-to-treat". State the results in absolute numbers when feasible (e.g., 10/20, not 50%).
17. Outcomes and estimation: for each primary and secondary outcome, a summary of results for each group, and the estimated effect size and its precision (e.g., 95% confidence interval).
18. Ancillary analyses: address multiplicity by reporting any other analyses performed, including subgroup analyses and adjusted analyses, indicating those pre-specified and those exploratory.
19. Adverse events: all important adverse events or side effects in each intervention group.

DISCUSSION
20. Interpretation: interpretation of the results, taking into account study hypotheses, sources of potential bias or imprecision, and the dangers associated with multiplicity of analyses and outcomes.
21. Generalizability: generalizability (external validity) of the trial findings.
22. Overall evidence: general interpretation of the results in the context of current evidence.

Scientific rationale

Patient population

Sample size

Study designs & methods

Patient flow

Statistical analysis & results

Interpretation

Page 9

Reporting power

Reality vs. scientific validity

[Diagram: the final sample size n balances reality (available resources) against scientific validity (sample size formulae).]

Resources

• Number of available participants
• Laboratory resources: diagnostic tests, training program, etc., if needed
• Time you have available: set by the funding agency, or by your career trajectory
• Funds and personnel

Example and software

Example

A clinician wants to conduct an RCT to assess the effect of an intervention to reduce HbA1c levels among patients with type 2 diabetes. Pilot data suggest that the mean HbA1c level among patients without this intervention is 8.7%, with a standard deviation of 2.2%. We believe that the intervention will decrease patients' HbA1c levels by 1%. A total of 154 patients (77 patients in each group) are needed to achieve 80% power at a two-sided 5% significance level.

Estimation of sample size comparing two group means (independent-sample t-test): comparing post-trial values.
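The slide's calculation can be reproduced outside G*Power. Here is a sketch using statsmodels (an assumed alternative to the tools listed below), which should land on roughly 77 patients per group:

```python
# HbA1c example: detect a 1-unit drop, SD = 2.2, two-sided alpha = 0.05,
# power = 0.80, two equal groups (independent-sample t-test).
from math import ceil
from statsmodels.stats.power import TTestIndPower

d = 1.0 / 2.2   # standardized effect size = effect / SD
n = TTestIndPower().solve_power(effect_size=d, alpha=0.05,
                                power=0.80, alternative='two-sided')
print(f"n per group = {ceil(n)}, total = {2 * ceil(n)}")  # expect 77 and 154
```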

Page 10

Example

• Select the appropriate statistical test, based on the type of outcome measure.
• Determine the minimum effect size.
• For continuous outcomes, estimate the standard deviation. For dichotomous outcomes, estimate the baseline risk or the incidence/prevalence of the event.
• Set limits for Type I (α) and Type II (β) error.
• Specify your null hypothesis and alternative hypothesis (1-tailed or 2-tailed).

Parameters needed for sample size computation

α, significance level = 5% (two-sided)
1 − β, power = 80%
Δ, effect size = 1
σ, variability = 2.2
m, sample size ratio between the two groups = 1

Software

G*Power: http://www.psycho.uni-duesseldorf.de/aap/projects/gpower/

PS: http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/PowerSampleSize

Russ Lenth: http://www.stat.uiowa.edu/~rlenth/Power/

Epi Info: http://www.cdc.gov/Epiinfo/

WinPepi: http://www.brixtonhealth.com/pepi4windows.html

PASS: http://www.ncss.com/pass.html

G*Power 3

Determine the effect size:
• Click on Determine
• Select n1 = n2 for equal sample sizes
• Calculate and transfer to the main window

Determine the sample size: 154 patients (77 in each group).

Page 11

Determine power

Achieved 80% power for 154 patients.

Sample sizes vs. power

[Figure: power plotted against sample size.]

Conclusion

The crying researcher understood 80% (CI 70%-90%) of what he needs for a smaller p-value:

• Better understanding of study design
• Good knowledge of the outcome measure
• Good statistical approach
• Optimum sample size

→ Greater power to detect a true difference!

If in doubt…

Call the biostatisticians!!!!

FCEB (Flinders Centre for Epidemiology and Biostatistics), Discipline of General Practice
Level 3, Health Sciences Building
Flinders Medical Centre

Don’t miss…

FCEB Launch!!!!

3:00 PM today

Rooms 3.06-3.09, Health Science Lecture Theatre Complex