Biostatistics in Practice Session 4: Study Size for Precision or Power Peter D. Christenson...

Biostatistics in Practice

Session 4: Study Size for Precision or Power

Peter D. Christenson, Biostatistician

http://research.LABioMed.org/Biostat

Session 4 Issue

How many subjects?

Session 4 Preparation

Throughout this course we have used a recent study of hyperactivity in children given diets with various amounts of food additives to illustrate the concepts. The questions below, based on this paper, are intended to prepare you for Session 4, which covers determining the size of a study.

1. How many children were deemed necessary to complete the entire study? Use the second column on the 4th page of the paper.

Session 4 Preparation #1

Session 4 Preparation #2

2. The authors allowed for some children to start, but not complete, the study. What percentage of "dropouts" did they build into their calculations?

The statistical requirements call for 80 “evaluable” subjects. The authors chose a study size of 120, so they were allowing up to 40/120 = 33% of subjects not to complete the study.
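The dropout allowance is simple arithmetic: divide the number of evaluable subjects by the expected completion rate and round up. A minimal Python sketch, assuming the dropout rate is fixed in advance (both function names are hypothetical):

```python
import math


def enrolled_needed(evaluable: int, dropout_rate: float) -> int:
    """Subjects to enroll so that `evaluable` are expected to complete."""
    if not 0 <= dropout_rate < 1:
        raise ValueError("dropout_rate must be in [0, 1)")
    # Round to 9 decimals first so floating-point noise just above an
    # integer (e.g. 120.00000000000001) is not rounded up to 121.
    return math.ceil(round(evaluable / (1 - dropout_rate), 9))


def allowed_dropout(evaluable: int, enrolled: int) -> float:
    """Fraction of enrolled subjects who may fail to complete."""
    return (enrolled - evaluable) / enrolled


print(enrolled_needed(80, 1 / 3))   # 120
print(allowed_dropout(80, 120))     # 0.3333333333333333
```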

Session 4 Preparation #3

3. The authors will perform a test similar to the t-test we discussed last week to decide whether there is evidence that hyperactivity differs under Mix A from placebo. There are two mistakes that they may make in this decision. What are they?

I. Conclude Mix A ≠ Placebo, but Mix A = Placebo

II. Conclude Mix A = Placebo, but Mix A ≠ Placebo

Session 4 Preparation #4 and #5

4. How large a difference between Mix A and placebo do they want to detect?

5. Does the value of 0.32 in the study size description (second column on the 4th page) refer to a difference? They seem to imply it is an SD. Based on what we have said about tests comparing "signal" to "noise", do you think both a difference and an SD are relevant for determining the study size?

Session 4 Preparation: #4 and #5

Session 4 Preparation #4 and #5

They want to detect a difference Δ of 0.32 in GHA. [Smallest clinically relevant Δ?]

Both the Δ and SD need to be accounted for.

Effect size = Δ / SD = “# of SDs”.

Remember, a reference range spans about 4 to 6 SDs.

In this study (unusually), GHA is scaled to have an SD of 1, so Δ = effect size = 0.32.
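Effect size is just the ratio Δ/SD. A small Python sketch (the helper name is hypothetical) comparing this study with the local MAP protocol example discussed later, where the pilot SD was 8.16 mm Hg and the detectable difference 5.2 mm Hg:

```python
def effect_size(delta: float, sd: float) -> float:
    """Standardized effect: the difference expressed in SD units."""
    return delta / sd


# Hyperactivity study: GHA is scaled so that SD = 1, so Δ is the effect size.
print(effect_size(0.32, 1.0))            # 0.32
# Local MAP example: pilot SD = 8.16 mm Hg, detectable difference 5.2 mm Hg.
print(round(effect_size(5.2, 8.16), 2))  # 0.64
```

Both studies target effects well under one SD, which is small relative to a reference range of 4 to 6 SDs.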

Session 4 Goals

Review estimating and testing

Δ, SD and N in estimating and testing

False positive and false negative conclusions from tests

What is needed to determine study size

Software for study size

Review Estimation

Typically:

1. Have sample of N representing “all”.

2. Find mean and SD from the N units.

3. Expect new unit to be within mean ± 2SD.

4. Be 95% confident that the mean of “all” is within mean ± 2SD/√N.

May have this info for one or multiple groups.
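Steps 2 to 4 can be sketched in Python with the standard library. The data are made up for illustration, and the multiplier 2 is the slide's rounded value (1.96, or a t quantile for small N, is more exact):

```python
import math
import statistics


def summarize(values: list[float]) -> dict[str, tuple[float, float]]:
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)          # sample SD (divides by N - 1)
    # Range expected for a single new unit.
    new_unit = (mean - 2 * sd, mean + 2 * sd)
    # Approximate 95% CI for the mean of "all": mean ± 2SD/√N.
    half_width = 2 * sd / math.sqrt(n)
    mean_of_all = (mean - half_width, mean + half_width)
    return {"new_unit_range": new_unit, "ci_95_for_mean": mean_of_all}


sample = [4.0, 6.0, 5.0, 7.0, 3.0]         # hypothetical measurements
print(summarize(sample))
```

The CI for the mean narrows as √N grows, while the range for a new unit does not.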

Study Size to Achieve Precision

Precision refers to how well a measure is estimated.

Margin of error = the ± value (half-width) of the 95% confidence interval.

Lower margin of error ↔ greater precision.

To achieve a specified margin of error, say d, solve the CI formula for N:

For a mean, d = 2SD/√N, so N = (2SD/d)².

For a proportion p, d = 2√[p(1 − p)/N] ≤ 1/√N, since p(1 − p) ≤ 1/4.

Most polls use N ≈ 1000, so the margin of error on a percentage is at most about 1/√1000 ≈ 3%.
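Solving the CI formulas for N fits in a few lines of Python. A sketch using the slide's rounded multiplier of 2; the SD and margin values in the example are hypothetical, and the function names are illustrative:

```python
import math


def n_for_mean(sd: float, margin: float) -> int:
    """Smallest N with 2*SD/sqrt(N) <= margin, i.e. N = (2*SD/d)^2 rounded up."""
    return math.ceil((2 * sd / margin) ** 2)


def n_for_proportion(margin: float, p: float = 0.5) -> int:
    """Smallest N with 2*sqrt(p(1-p)/N) <= margin; p = 0.5 is the worst case."""
    return math.ceil(4 * p * (1 - p) / margin ** 2)


print(n_for_mean(sd=8.16, margin=2.0))   # 67
print(n_for_proportion(margin=0.03))     # 1112
print(2 * math.sqrt(0.25 / 1000))        # about 0.0316, i.e. roughly 3%
```

Using p = 0.5 gives the largest N, so it is the safe choice when the proportion is unknown.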

Review Statistical Tests

1. Calculate a standardized quantity for the particular test, a “test statistic”:

• Often: t = (Mean – Expected) / SE(Mean)

If 1 group, Mean may be a change score.

If 2 groups, Mean may be the difference between means for two groups.

Expected = 0 if no effect.

Looking for evidence to contradict “no effect”.

Review Statistical Tests

2. Compare the test statistic to the range of values it should fall within if expectations are correct.

Often: this range follows an approximately normal bell curve.

3. Declare “effect” if the test statistic is too extreme relative to this range.

Often: |test statistic| > ~2 → declare effect.
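A minimal Python sketch of steps 1 to 3 for one group of change scores (Expected = 0), using the slide's rule of thumb |t| > ~2 instead of an exact t critical value. The data and function name are hypothetical:

```python
import math
import statistics


def one_sample_t(values: list[float], expected: float = 0.0) -> float:
    """t = (mean - expected) / SE(mean), with SE(mean) = SD / sqrt(N)."""
    se = statistics.stdev(values) / math.sqrt(len(values))
    return (statistics.mean(values) - expected) / se


changes = [1.2, 0.8, 1.5, 0.3, 1.1, 0.9, 1.4, 0.6]   # hypothetical change scores
t = one_sample_t(changes)
declare_effect = abs(t) > 2                          # rough cutoff from the slide
print(round(t, 2), declare_effect)
```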

t-Test

[Figure: bell curve of t expected under no effect, with a 95% chance of falling in the central region (Declare: No Effect) and 2.5% in each tail (Declare: Effect).]

Declare effect if test statistic is “too extreme”.

How extreme?

Convention:

“Too extreme” means < 5% chance of wrongly declaring an effect.

t = (mean – expected) / (SD/√N)

t-Test

[Figure: bell curve of t expected under no effect, with a 95% chance of falling in the central region (Declare: No Effect) and 2.5% in each tail (Declare: Effect).]

Declare effect if test statistic is “too extreme”.

Convention:

“Too extreme” means < 5% chance of wrongly declaring an effect.

But what are the chances of wrongly declaring no effect?

t-Test

[Figure: bell curve of t expected under no effect, with a 95% chance of falling in the central region (Declare: No Effect) and 2.5% in each tail (Declare: Effect).]

Declare effect if test statistic is “too extreme”.

But what are the chances of wrongly declaring no effect?

To answer, we need a similar curve for the range of values expected when there is an effect.

Two Possible Errors from t-test

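[Figure: curves of the observed Δ = effect (difference between group means) expected under no real effect (centered at 0) and under a real effect of 3, with an effect in the study of 1.13 marked. The axis shows just Δ, not t = Δ/SE(Δ); values past the cutoff lead to “Conclude effect”. Shaded areas: 5% and 41%.]

\\\ = Probability of concluding an effect when there is no real effect (5%).

/// = Probability of concluding no effect when there is a real effect (41%).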

Consider just one possible real effect, the value 3.

Graphical Representation of t-test

[Figure: the same two curves of Δ (no real effect at 0, real effect of 3, effect in study 1.13; shaded areas 5% and 41%), with the axis showing just Δ, not t = Δ/SE(Δ).]

Suppose we require stronger proof, i.e., shift the cutoff to the right.

Then the chance of a false positive drops to ~1%, but the chance of a false negative rises to ~60%.
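The trade-off can be sketched in Python with `statistics.NormalDist`, treating the observed Δ as normal with a known SE. This is a simplifying assumption, and the SE of 1.4 below is hypothetical, not read from the figure. Raising the bar from α = 5% to 1% moves the cutoff outward, and the false-negative rate rises:

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal


def error_rates(alpha: float, true_delta: float, se: float) -> tuple[float, float]:
    """Return (false-positive rate, false-negative rate) for a two-sided z test.

    The cutoff is set so that P(|Δ/SE| > cutoff | no real effect) = alpha.
    """
    z_cut = Z.inv_cdf(1 - alpha / 2)
    shift = true_delta / se
    # Probability the test statistic lands inside (-z_cut, z_cut) when the
    # real effect is true_delta: we then (wrongly) conclude "no effect".
    beta = Z.cdf(z_cut - shift) - Z.cdf(-z_cut - shift)
    return alpha, beta


for a in (0.05, 0.01):
    fp, fn = error_rates(a, true_delta=3.0, se=1.4)
    print(f"alpha={fp:.2f}  false negative={fn:.2f}  power={1 - fn:.2f}")
```

Power is 1 minus the false-negative rate, so it falls as the cutoff moves out.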

Power of a Study

Statistical power is the sensitivity of a study to detect real effects, if they exist.

In the example two slides back, it is 100% − 41% = 59%.

Two Possible Errors in a Diagnostic Test

                        Truth: No Disease       Truth: Disease
Diagnosis: No Disease   Correct (Specificity)   Error
Diagnosis: Disease      Error                   Correct (Sensitivity)

Sensitivity: want high for a screening test.

Specificity: need high in a follow-up test.

Specificity ↓ as Sensitivity ↑

Analogy with Diagnostic Testing

                          Truth: No Effect        Truth: Effect
Study Claims: No Effect   Correct (Specificity)   Error (Type II)
Study Claims: Effect      Error (Type I)          Correct (Sensitivity)

Typical choices:

Specificity: set α = 0.05, so specificity = 95%.

Sensitivity = power: maximize; choose N for 80%.

Summary: Factors Related to Study Size

Five factors are inter-related. Fixing four of these specifies the fifth:

1. Study size, N.

2. Power (often 80% is desirable).

3. p-value cutoff (level of significance, e.g., 0.05).

4. Magnitude of the effect to be detected (Δ).

5. Heterogeneity among subjects (SD).

The next slide shows how these factors (except SD) are typically presented in a study protocol.
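Under a normal approximation for two equal groups, one formula ties the five factors together, so any four determine the fifth. A Python sketch that solves for power (the function name is hypothetical; t-based study-size software gives slightly different numbers, especially at small N), using the local protocol values: 40 per group, Δ = 5.2 mm Hg, and the pilot SD of 8.16 mm Hg reported for that example:

```python
import math
from statistics import NormalDist

Z = NormalDist()


def power_two_sample(n_per_group: int, delta: float, sd: float,
                     alpha: float = 0.05) -> float:
    """Approximate power of a two-sided test comparing two group means."""
    se = sd * math.sqrt(2 / n_per_group)   # SE of the difference in means
    z_cut = Z.inv_cdf(1 - alpha / 2)
    shift = delta / se
    # Reject in either tail; the lower tail contributes almost nothing here.
    return (1 - Z.cdf(z_cut - shift)) + Z.cdf(-z_cut - shift)


print(round(power_two_sample(40, 5.2, 8.16), 3))
```

This prints about 0.81, consistent with the 80% power quoted for 80 total subjects.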

Quote from Local Protocol Example

The following table presents detectable differences, with p=0.05 and 80% power, for different study sizes.

Total Number  Detectable Difference in       Detectable Difference in Change in
of Subjects   Change in Mean MAP (mm Hg)(1)  Mean Number of Vasopressors(2)
20            10.9                           0.77
40            7.4                            0.49
60            6.0                            0.39
80            5.2                            0.34
100           4.6                            0.30
120           4.2                            0.27

Thus, with a total of the planned 80 subjects, we are 80% sure to detect (p<0.05) group differences if treatments actually differ by at least 5.2 mm Hg in MAP change, or by a mean 0.34 change in number of vasopressors.

Comments on the Previous Table

• Typically power=80% and almost always p<0.05.

• SD was not mentioned. There may be several estimates from other studies (different populations, intervention characteristics such as dosage, time, etc.). Here, a pilot study exactly like the trial was performed by the same investigators.

• Detectable difference refers to the unknown true difference for “all”, not the difference that will be seen eventually in the N study subjects.

• N ↑ as detectable difference ↓.

• So, the major consideration is usually a tradeoff between N and the detectable difference.

Free Study Size Software

www.stat.uiowa.edu/~rlenth/Power

Local Protocol Example: Calculations

Pilot data: SD = 8.16 for ΔMAP in 36 subjects.

For p-value<0.05, power=80%, N=40/group, the detectable Δ of 5.2 in the previous table is found as:
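As a rough cross-check, the detectable Δ can be approximated in Python. The plain normal formula Δ = (z₀.₉₇₅ + z₀.₈₀)·SD·√(2/n) runs slightly low. The default variant below also applies a small-sample adjustment (subtracting z²/4 from the per-group n), an approximation assumed here as a stand-in for a t-based calculation, not necessarily what the software does:

```python
import math
from statistics import NormalDist

Z = NormalDist()


def detectable_delta(n_per_group: int, sd: float, alpha: float = 0.05,
                     power: float = 0.80, adjust: bool = True) -> float:
    """Smallest true difference detectable with the given power (two groups)."""
    z_a = Z.inv_cdf(1 - alpha / 2)
    z_b = Z.inv_cdf(power)
    # Optional small-sample adjustment: treat n as if it were z_a**2 / 4
    # smaller, a rough stand-in for using the t distribution.
    n_eff = n_per_group - z_a ** 2 / 4 if adjust else n_per_group
    return (z_a + z_b) * sd * math.sqrt(2 / n_eff)


# Total N, unadjusted normal value, adjusted value (SD = 8.16 mm Hg).
for total in (20, 40, 60, 80, 100, 120):
    n = total // 2
    print(total, round(detectable_delta(n, 8.16, adjust=False), 1),
          round(detectable_delta(n, 8.16), 1))
```

With the adjustment, the sketch reproduces the table's 5.2 mm Hg for 80 total subjects; the gap from the plain normal value is largest at small N.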

Hyperactivity Study Size

The study is 1-sample or paired (for each age group).

SD = 1, Δ = 0.32.

Use p-value<0.05. Want power=80%.

Solve for N in software to get N=79.
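A rough Python cross-check of N = 79 (SD = 1, Δ = 0.32, two-sided p < 0.05, 80% power). The normal formula alone gives 77; adding z²/2, a small-sample adjustment assumed here as a stand-in for the software's t-based calculation, gives 79. The function name is hypothetical:

```python
import math
from statistics import NormalDist

Z = NormalDist()


def n_paired(delta: float, sd: float, alpha: float = 0.05,
             power: float = 0.80, adjust: bool = True) -> int:
    """Approximate N for a one-sample or paired t-test (rounded up)."""
    z_a = Z.inv_cdf(1 - alpha / 2)
    z_b = Z.inv_cdf(power)
    n = ((z_a + z_b) * sd / delta) ** 2
    if adjust:
        n += z_a ** 2 / 2   # rough correction for using t instead of normal
    return math.ceil(n)


print(n_paired(0.32, 1.0, adjust=False), n_paired(0.32, 1.0))
```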

Study Size for Some Other Study Types

1. Phase I: Dose escalation. Safety, not efficacy, so no power calculation. Use N = 3 at a low dose; if safe, N = 3 at a higher dose, etc.

2. Phase II: Small, primarily safety; look for enough evidence of efficacy to go on to Phase III. Often staged: e.g., if 3/10 respond, test 10 more, etc.

3. Mortality studies: Patterns of deaths over time can be used in sample size calculations. Software for these is not in the online package.

Approximate Formulas for Study Size

1. Two-sample t-test:

Total N ≈ 4 × 7.85 × (SD/Δ)²

MAP Example: 4 × 7.85 × (8.16/5.2)² = 77 ≈ 80

2. Paired t-test:

N ≈ 7.85 × (SD/Δ)²

Hyperactivity Example: 7.85 × (1/0.32)² = 77 ≈ 80
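The constant 7.85 is (z₀.₉₇₅ + z₀.₈₀)², the squared sum of the standard normal quantiles for a two-sided p < 0.05 and 80% power. A Python sketch reproducing it and both examples (function names are illustrative):

```python
from statistics import NormalDist

Z = NormalDist()
K = (Z.inv_cdf(0.975) + Z.inv_cdf(0.80)) ** 2   # about 7.85


def total_n_two_sample(sd: float, delta: float) -> float:
    """Total N (both groups combined) for a two-sample comparison of means."""
    return 4 * K * (sd / delta) ** 2


def paired_n(sd: float, delta: float) -> float:
    """N for a paired (or one-sample) comparison."""
    return K * (sd / delta) ** 2


print(round(K, 2))                                # 7.85
print(round(total_n_two_sample(8.16, 5.2), 1))    # 77.3 (MAP example)
print(round(paired_n(1.0, 0.32), 1))              # 76.6 (hyperactivity example)
```

Other power or significance levels only change K; for example, 90% power uses Z.inv_cdf(0.90) in place of Z.inv_cdf(0.80).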

Summary: Study Size and Power

1. Power analysis ensures a specified chance (e.g., 80%) of detecting effects of a specified magnitude.

2. Five factors including power are inter-related. Fixing four of these specifies the fifth.

3. For comparing means, you need pilot data or data from other studies to estimate the SD of the outcome measure. Comparing %s does not require a separate SD, since the variability is determined by the percentages themselves.

4. A power analysis helps support the believability of a study if its conclusions turn out to be negative.