Sample Size Determination

Assoc Prof Dr Sarimah binti Abdullah Unit of Biostatistics & Research Methodology Universiti Sains Malaysia Sample Size Determination



Why do we need to calculate the sample size?

Sample Size Calculation

A) for Estimation

i. a mean

ii. a proportion

B) for Hypothesis Testing

i. compare means

ii. compare proportions

iii. correlation

C) Diagnostic test

D) Validation

SINGLE MEAN FORMULA

SINGLE PROPORTION FORMULA

Sample Size Calculation for Estimation

1. To estimate the duration of exercise among adults in Kampong (Kg) X.

2. To estimate the prevalence of obesity in Kg X.

Estimating a mean

• A study is planned to estimate the duration of exercise among adults in Kampong X.

• The result should be reported as "mean duration of exercise (DEx.) and its 95% CI", e.g. mean DEx. 16.5 mins/day (95% CI: 15.5, 17.5).

The value of a study can be judged by the width of its confidence interval (CI).

A wide CI means an imprecise estimate, and therefore a poor study.

If we plan for 95% confidence (5% error), Z = 1.96, and the SD (σ) of the duration of exercise is estimated as 4.3, either from a previous study or from a pilot study (if from a previous study, state the reference).

To estimate duration of exercise

Impossible to check the normality assumption

Now, it is the researcher's decision to select which sample size will be appropriate for the study.

How to report? (in Methodology)

• Sample size was determined as follows.

• The following formula (Daniel, 1999) is used to calculate the sample size:

$$ n = \left(\frac{Z\,\sigma}{\Delta}\right)^{2} $$

Z = 1.96 for 95% confidence

σ = SD of DEx. = 4.3 (Brian, 2002??)

∆ = Precision = 1 min/day

• We need 72 people in order to estimate the mean DEx. with the precision of 1 min/day.

• We decided to take 90 people (additional 20%) to allow for anticipated non-response cases.
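The calculation reported above can be reproduced with a few lines of Python. This is a minimal sketch, assuming the single-mean formula n = (Zσ/∆)², rounded up. The function names `n_for_mean` and `inflate` are illustrative, not from the slides. The slide's figure of 90 is consistent with dividing 72 by (1 − 0.20); that reading of "additional 20%" is an assumption.

```python
import math

def n_for_mean(sd, precision, z=1.96):
    """Sample size to estimate a mean within +/- precision: (z * sd / precision)^2, rounded up."""
    return math.ceil((z * sd / precision) ** 2)

def inflate(n, non_response_pct):
    """Inflate n so that n responders remain after the given % of non-response.

    Integer ceiling division avoids floating-point surprises at exact boundaries.
    """
    return -(-n * 100 // (100 - non_response_pct))

n = n_for_mean(sd=4.3, precision=1.0)
print(n)               # 72
print(inflate(n, 20))  # 90

# How the required sample size changes with the chosen precision (mins/day)
for precision in (0.5, 1.0, 1.5, 2.0):
    print(precision, n_for_mean(4.3, precision))
```

Halving the precision value (a narrower CI) roughly quadruples the required sample size, which is why the choice of precision is left to the researcher.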


Estimating a Proportion

• A study is planned to estimate the prevalence of obesity in Kampong X.

• The result should be reported as "prevalence (proportion) of obesity and its 95% CI".

In our example data, we get 37% (95% CI: 27%, 47%).

If we plan for 95% confidence (5% error), Z = 1.96, and P is estimated as 40% (prevalence of obesity, from the literature or a pilot study).

Setting the level of confidence is conventional at 95% (Z = 1.96).

P or SD is estimated from the literature or a pilot study.

The remaining question is "HOW TO DECIDE THE PRECISION?".

(1) Generally, a smaller precision value (a narrower CI) is better.

(2) However, researchers are commonly limited by the availability of resources.

(3) It may depend on previous studies: previous studies have reported CIs of a certain width. If we want to repeat the study, we should aim for a narrower CI (added value).

Now, it is the researcher's decision to select which sample size will be appropriate for the study.
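To make the trade-off between precision and sample size concrete, here is a minimal sketch that assumes the usual single-proportion formula n = Z² P(1 − P) / ∆², rounded up. The function name `n_for_proportion` and the precision values tried are illustrative choices, not taken from the slides.

```python
import math

def n_for_proportion(p, precision, z=1.96):
    """Sample size to estimate a proportion p within +/- precision (both as fractions)."""
    return math.ceil(z ** 2 * p * (1 - p) / precision ** 2)

# P = 40% (prevalence of obesity), 95% confidence
for precision in (0.03, 0.05, 0.10):
    print(precision, n_for_proportion(0.40, precision))
# 0.03 -> 1025, 0.05 -> 369, 0.10 -> 93
```

The researcher then picks the row that balances the desired CI width against the available resources.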

[Figure: Relationship between P & Sample Size. x-axis: P (0.00 to 1.00); y-axis: Sample Size (0 to 800)]
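The shape of this relationship can be worked out directly, assuming the figure plots the usual single-proportion formula (an assumption, since only the axes of the figure survive):

```latex
n(P) = \frac{Z^{2}\,P(1-P)}{\Delta^{2}},
\qquad
\frac{dn}{dP} = \frac{Z^{2}\,(1-2P)}{\Delta^{2}} = 0
\;\Longrightarrow\; P = 0.5
```

Under that assumption the required sample size is largest at P = 0.5 and falls off symmetrically towards P = 0 and P = 1, so P = 0.5 is the most conservative choice when no estimate of P is available.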

Sample Size Calculation for Hypothesis Testing

3. To compare duration of exercise between male and female.

4. To compare prevalence of obesity between male and female.

Important Concepts

1. Type I (α error)

2. Type II (β error) / Power of the Study (1-β)

3. Detectable Difference (Detectable Alternative)

Important Concepts

                      Null Hypothesis (Ho)
Test decision         False                      True
Reject Ho             Correct (power = 80%)      Type I error (α = 0.05)
Do not reject Ho      Type II error (β = 0.2)    Correct

Power of the study: the "power to reject the false null hypothesis".

Detectable Difference

1. What is Detectable Difference?

2. How to decide the Detectable Difference?

What is Detectable Difference (Detectable Alternative)?

• The “minimum size of the difference between groups” that the study could detect.

What is Detectable Difference (Detectable Alternative)?

60.0 kg ≈ 60.1 kg

60.0 kg < 60.5 kg

• "The study could detect" means ...

➢ Let's say you are comparing the means of 2 groups and, in reality, the 2 means are truly different.

➢ If, at the end of the study, you get the result "the two group means are significantly different" (one is more than the other), it means that "you detect the difference".

➢ If you get the result "the difference is not significant", it means that "you fail to detect it".

How to decide Detectable Difference?

It should reflect the “Clinically Significant Difference” (CSD).

We should be able to detect the CSD. In other words, we should design a study to detect the CSD.

Comparing means of two (2) POPULATIONS

• Comparing 2 means

• Comparing 2 proportions

Using PS software ….

http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/PowerSampleSize


• Comparing 2 means

Using PS software ….

Researchers want to compare duration of exercise between male and female college students.

Objective: To compare mean duration of exercise between male and female students

$$ n = \frac{2\sigma^{2}\,(Z_{\alpha/2} + Z_{\beta})^{2}}{\Delta^{2}} $$

Alpha (0.05)

Power (start with 80%)

σ = SD (within-group SD of duration of exercise)

∆ = Detectable Difference (clinically important difference)
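The formula above can be evaluated directly. The sketch below is a normal-approximation version written for illustration; it is not how PS software computes its answer, and software that uses the t distribution may return a slightly larger n. The function name `n_per_group_two_means` is hypothetical. The inputs SD = 4.3 and ∆ = 3 mins/day are the values used in the duration-of-exercise example in these slides.

```python
import math
from statistics import NormalDist

def n_per_group_two_means(sd, delta, alpha=0.05, power=0.80):
    """Per-group sample size for comparing 2 means with equal group sizes (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    return math.ceil(2 * sd ** 2 * (z_alpha + z_beta) ** 2 / delta ** 2)

print(n_per_group_two_means(sd=4.3, delta=3))  # 33
```

Because ∆ is squared in the denominator, halving the detectable difference roughly quadruples the number needed per group.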

[Screenshot: PS software, with numbered steps 1 to 5]

4. Fill all 5 inputs, including:

• Detectable Difference

• SD from other study or pilot study

• Ratio between 2 groups (m=1 means 1:1)


With the sample size 33 in each group, we will achieve 80% power to detect the difference of 3 mins/day (DEx.) with the Alpha at 0.05.

Example:

Say, in reality, the difference is 5 mins/day between male and female. With this sample size, you have at least 80% chance to get the ‘significant’ or ‘positive’ result. (You have at least 80% power to reject the Null.)

Say the difference is 1 min/day. This sample size will probably fail to detect this difference. But it’s OK, we don’t want to detect a difference this small. It is not clinically/practically important.
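The example above can be checked numerically. This is a rough sketch using a normal approximation to the two-sample test (it ignores the negligible chance of rejecting in the wrong direction), so the figures are approximate. The function name `power_two_means` is illustrative.

```python
from statistics import NormalDist

def power_two_means(n_per_group, sd, true_diff, alpha=0.05):
    """Approximate power of a two-sided comparison of 2 means with equal group sizes."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    se = sd * (2 / n_per_group) ** 0.5  # standard error of the difference in means
    return NormalDist().cdf(abs(true_diff) / se - z_alpha)

# n = 33 per group, SD = 4.3, for true differences of 1, 3 and 5 mins/day
for diff in (1, 3, 5):
    print(diff, round(power_two_means(33, 4.3, diff), 2))
```

Under these assumptions the power is about 0.81 at the design difference of 3 mins/day, above 0.99 for a true difference of 5 mins/day, and only about 0.15 for a true difference of 1 min/day.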

IN SUMMARY, for comparing 2 means

We need to decide ….

Alpha (0.05 by consensus)

Power (80% = 0.8)

SD (variable of interest, from previous study or pilot study)

Detectable Difference (should reflect clinical/practical importance)

Ratio of sample size between 2 groups (m = 1 “1:1”; m = 2 “2:1”)

--------------------------------------------------------------------------

How to report?

We use PS software (Dupont & Plummer, 1997) to calculate the sample size based on comparing 2 means.

To detect the difference of 3 mins/day (Duration of Exercise) with 80% power and alpha 0.05, we need 33 students in each study group (SD was estimated as 4.3, reference?).

We have decided to take 40 male and 40 female students (additional 20%) in anticipation of some non-responses.

Exercise

• Researchers want to compare the SBP between treated and untreated hypertension patients.

• A recent study revealed that the SD of SBP among hypertension patients was 10 mmHg (state ref.??).

• The researchers feel that it is important to detect the difference of 5 mmHg between 2 study groups.

• They plan to take equal sample size (1:1) for 2 study groups (m=1).

• Set alpha at 0.05 as usual.

• Calculate the sample size to achieve the power of 80%.

Answer: 64 in each group

• Researchers want to compare the SBP between treated and untreated hypertension patients.

• SD was 10 mmHg (state reference??).

• DD is set at 5 mmHg.

• They plan to take 1:2 ratio for untreated: treated (m=2), because ‘untreated’ patients are difficult to find.

• Set alpha at 0.05 as usual.

• Calculate the sample size to achieve the power of 80%.

Answer: 48 untreated and 96 treated hypertensive patients
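The effect of the allocation ratio m in the two SBP exercises can be sketched as follows. This uses a normal approximation with the variance of the difference taken as σ²(1 + 1/m)/n for the smaller group; it is an illustration, not the PS software algorithm, and the function name `n_two_means_ratio` is hypothetical.

```python
import math
from statistics import NormalDist

def n_two_means_ratio(sd, delta, m=1, alpha=0.05, power=0.80):
    """Return (n_smaller_group, n_larger_group) for a 1:m allocation (normal approximation)."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    n_small = math.ceil((1 + 1 / m) * sd ** 2 * z ** 2 / delta ** 2)
    return n_small, m * n_small

print(n_two_means_ratio(sd=10, delta=5, m=1))  # (63, 63)
print(n_two_means_ratio(sd=10, delta=5, m=2))  # (48, 96)
```

The 1:2 result matches the answer of 48 untreated and 96 treated patients. For the 1:1 case this approximation gives 63 per group, one fewer than the 64 reported from PS, which is the kind of small gap expected when software works with the t distribution instead. An unequal ratio also raises the total from about 126 to 144 patients.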

• Comparing 2 proportions

Using PS software ….

Researchers want to compare prevalence of obesity between male and female students.

Objective: To compare the prevalence of obesity between male and female students.

Remember … We have to set Alpha, Power and Detectable Difference.

Alpha = 0.05

Power = 80% (0.8)

Detectable Difference = ???

The inputs for comparing 2 proportions:

Alpha (0.05)

Power (80% = 0.8)

∆ = Detectable Difference (clinically important difference) (P1-P0)

P0 = Prevalence of obesity among male (get from literature)

P1 = Prevalence of obesity among female (set based on desired DD)

m = 1 (equal ratio between male and female)

In this example:

P0 = 0.27 (say, we get this from the literature)

P1 = 0.37 (this is our decision, based on the DD; putting 0.37 here means that we are setting the DD in this study as 0.10 or 10%)

m = 1 (equal ratio between male and female)

[Screenshot: PS software, with numbered steps 1 to 5]

4. Fill all 5 inputs, including:

• P0: from previous or pilot study

• (P1-P0) is Detectable Difference.

• Ratio between 2 groups (m=1 means 1:1)


IN SUMMARY, for comparing 2 proportions

We need to decide ….

Alpha (0.05 by consensus)

Power (80% = 0.8)

P0 (prevalence of obesity among male, 27%, reference?)

Detectable difference (should reflect clinical/practical importance; in this example, a 10% difference is decided, therefore P1 = 37%)

Ratio between 2 groups (m=1 “1:1”; m=2 “2:1”)

--------------------------------------------------------------------------

How to report?

We use PS software (Dupont & Plummer, 1997) to calculate the sample size based on comparing 2 proportions.

To detect the difference of 10% in prevalence of obesity (P0 27% versus P1 37%) between the 2 study groups with 80% power and alpha 0.05, we need 340 male and 340 female students. (P0, the prevalence of obesity among male, was estimated as 27%, ref?) (You may add some extra, e.g. 10%.)
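As a cross-check on the figure of 340 per group, here is a minimal sketch using a common textbook normal-approximation formula for 2 independent proportions with equal group sizes (pooled variance under the null, unpooled under the alternative, no continuity correction). Whether PS software uses exactly this formula is an assumption, and the function name `n_per_group_two_proportions` is illustrative.

```python
import math
from statistics import NormalDist

def n_per_group_two_proportions(p0, p1, alpha=0.05, power=0.80):
    """Unrounded per-group sample size for comparing 2 proportions (equal groups)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p0 + p1) / 2  # pooled proportion under the null hypothesis
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p0 * (1 - p0) + p1 * (1 - p1))) ** 2
    return numerator / (p1 - p0) ** 2

raw = n_per_group_two_proportions(0.27, 0.37)
print(round(raw, 1), math.ceil(raw))  # 340.4 341
```

The unrounded value of about 340.4 agrees closely with the 340 reported from PS. Rounding up, as is usual when planning, gives 341 per group.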

Exercise!

• Calculate the sample size for the detectable difference of prevalence 20%. It means P0 = 27% and P1 = 47%.

Answer: 90 students in each group.

• Calculate the sample size for the detectable difference of prevalence 20% (as above), with male: female ratio as 2:1 (m=2).

Answer: 66 female & 132 male students.

Comparing 3 means

Using G*Power ….

Correlation … using the online tool at http://www.medic.usm.my/biostat/

C) DIAGNOSTIC TESTS – e.g. sensitivity and specificity

http://www.medic.usm.my/biostat/

C) DIAGNOSTIC TESTS – e.g. kappa agreement

http://www.medic.usm.my/biostat/

D) Validation – e.g. Cronbach alpha

http://www.medic.usm.my/biostat/

D) Validation – e.g. intraclass correlation

Final COMMENTS

• For each specific objective, we should calculate the sample size.

• Sometimes, one objective has more than one variable of interest (e.g. multiple linear regression). In this case, we need to calculate the sample size for each variable of interest.

• Then, the biggest sample size will be “the sample size of the study”.

• We need to add 10-20% because we may get non-response, loss to follow-up, or any other loss.
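The rule of taking the largest requirement and then adding an allowance can be sketched as follows. The objective names and totals are illustrative, borrowed from the worked examples in these slides, and the allowance is applied as a simple percentage increase (an assumption; dividing by the expected response rate is another common convention).

```python
def study_sample_size(sizes_by_objective, extra_pct=20):
    """Take the largest total across objectives, then add extra_pct % (integer ceiling)."""
    largest = max(sizes_by_objective.values())
    return -(-largest * (100 + extra_pct) // 100)

# Hypothetical totals per objective (both groups combined where relevant)
sizes = {
    "estimate mean DEx.": 72,
    "compare mean DEx. (2 x 33)": 66,
    "compare prevalence of obesity (2 x 340)": 680,
}
print(study_sample_size(sizes, extra_pct=10))  # 748
```

Here the comparison of proportions drives the study size, so the other objectives are automatically covered.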

Acknowledgement

Special thanks to:

Assoc Prof Dr Mohd Ayub Saddiq

Thank You.