Lecture 4 – Statistics: Hypothesis Testing and Estimation

Michael Brown MD, MSc

Professor Epidemiology and Emergency Medicine

Credit to Roger J. Lewis, MD, PhD
Department of Emergency Medicine

Harbor-UCLA Medical Center

and

Jeff Jones, Grand Rapids MERC / MSU Program in Emergency Medicine

EPI-546 Block I

Dr. Michael Brown © Epidemiology Dept., Michigan State Univ.

Today’s Topics

• Classical Hypothesis Testing
• Type I Error
• Type II Error, Power, Sample Size
• Point Estimates and Confidence Intervals
• Multiple Comparisons


Classical Hypothesis Testing: Steps

1. Define the null hypothesis
2. Define the alternative hypothesis
3. Calculate a p value
4. Accept or reject the null hypothesis based on the p value
5. If the null hypothesis is rejected, then accept the alternative hypothesis


Classical Hypothesis Testing:

• The Null Hypothesis: no difference between the two groups to be compared

• The Alternative Hypothesis: there is a difference between the two groups to be compared


Classical Hypothesis Testing: Defining the Alternative Hypothesis

• The size of the expected difference should be defined prior to data collection (a priori)

• The difference defined by the alternative hypothesis should be clinically significant

• Example: Difference in Pain Score on 100 mm VAS of 13 mm or greater

• The p value: probability of obtaining results at least as extreme as those observed, if the null hypothesis were true


Classical Hypothesis Testing: p value

• If p = 0.01, then the chance of obtaining results at least as extreme as those of the experiment, if the null hypothesis were true, is 1%
  – Very unlikely due to chance!
  – So we reject the null hypothesis

• If p = 0.7, then the chance of obtaining results at least as extreme as those of the experiment, if the null hypothesis were true, is 70%
  – accept the null hypothesis
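One way to see what a p value measures is a permutation test: if the null hypothesis is true, group labels are interchangeable, so we can shuffle them and count how often the shuffled difference is at least as large as the observed one. The sketch below is only an illustration; the function name and the pain scores are made up and do not come from the lecture.

```python
import random

def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation p value for a difference in group means."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n_a = len(group_a)
    pooled = list(group_a) + list(group_b)

    def mean_diff(values):
        a, b = values[:n_a], values[n_a:]
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = mean_diff(pooled)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # relabel the patients at random (null is true)
        if mean_diff(pooled) >= observed - 1e-12:
            extreme += 1
    return extreme / n_permutations

# Hypothetical VAS pain scores (mm), invented for illustration only.
clearly_different = permutation_p_value(
    [60, 65, 70, 72, 68, 75, 80, 66], [40, 45, 50, 42, 48, 55, 38, 47]
)
nearly_identical = permutation_p_value([50, 55, 60, 45], [52, 54, 58, 47])
print(clearly_different, nearly_identical)
```

The first comparison yields a very small p value (reject the null hypothesis); the second yields a large one (accept the null hypothesis, in the lecture's terminology).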


Classical Hypothesis Testing: Rejecting the Null Hypothesis

• The cut-point for rejecting the null hypothesis is arbitrary (α)

• Typically, α = 0.05

• If the null hypothesis is rejected, then the alternative hypothesis is accepted as true


Clinical Trial (statistical testing)                           Jury Trial (criminal law)
Assume the null hypothesis                                     Presume innocent
Goal: detect a true difference (reject the null hypothesis)    Goal: convict the guilty
“Level of significance” p < .05                                “Beyond reasonable doubt”
Requires: adequate sample size                                 Requires: convincing testimony


Similar to a Trial by Jury…

• A Clinical Trial has only 1 of 4 possible outcomes:
  – 2 are correct: TP, TN
  – 2 are errors: FP, FN


                                    TRUTH
TEST SIGNF.               Guilty        Innocent
REJECT Ho (P < 0.05)      TP            FP
ACCEPT Ho (P > 0.05)      FN            TN



Clinical Trial                                            Jury Trial
Correct inference: reject the null hypothesis (TP)        Correct verdict: convict a guilty person
Correct inference: accept the null hypothesis (TN)        Correct verdict: acquit the innocent
Incorrect inference (FP): Type I error                    Incorrect verdict: hang innocent person
Incorrect inference (FN): Type II error                   Incorrect verdict: guilty skates free


                                    TRUTH
TEST SIGNF.               Guilty                  Innocent
REJECT Ho (P < 0.05)      TP                      FP = Type I (alpha)
ACCEPT Ho (P > 0.05)      FN = Type II (beta)     TN


Classical Hypothesis Testing: Type II Error

• A false-negative result

• p value > .05 is obtained, yet the two groups are different

• The risk of a type II error = β


Type II Error

• Although trend toward benefit, p value > .05

• Null hypothesis accepted

• Truth: larger study demonstrated that the two groups were actually different

• Committed a Type II Error

• Typical pilot study has low power to detect a difference


Classical Hypothesis Testing: Power

• Power = 1 - β

• If Power = 80%:
  – 80% probability of detecting a true difference if it exists

• Power is determined by sample size, the magnitude of the difference sought, and by α

• Pilot study had small sample size, therefore “low” power
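To make the link between sample size and power concrete, the sketch below approximates the power of a two-sided z-test comparing two means. It reuses the 13 mm VAS difference from the earlier example; the 25 mm standard deviation and the function name are assumptions chosen for illustration, not values from the lecture.

```python
from math import sqrt
from statistics import NormalDist

def power_two_means(difference, sd, n_per_group, alpha=0.05):
    """Approximate power (1 - beta) of a two-sided z-test on two group means."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)   # critical value for the chosen alpha
    se = sd * sqrt(2 / n_per_group)     # standard error of the mean difference
    # Ignores the negligible chance of rejecting in the wrong direction.
    return z.cdf(abs(difference) / se - z_crit)

# Assumed inputs: 13 mm difference sought, hypothetical SD of 25 mm.
for n in (10, 30, 59, 120):
    print(n, round(power_two_means(13, 25, n), 2))
```

Under these assumptions a "pilot" with 10 patients per group has well under 50% power, while roughly 59 per group are needed to reach 80%.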


Steps in Sample Size Determination

1. Define the type of data (continuous, ordinal, categorical, etc.)


A Few Examples of Statistical Tests

Test                 Comparison               Principal Assumptions
Student's t test     Means of two groups      Continuous variable, normally distributed, equal variance
Wilcoxon rank sum    Medians of two groups    Continuous variable
Chi-square           Proportions              Categorical variable, more than 5 patients in any particular "cell"
Fisher's exact       Proportions              Categorical variable


Steps in Sample Size Determination

1. Define the type of data (continuous, ordinal, categorical, etc.)

2. Define the size of the difference sought

3. Define α (usually 0.05)

4. Determine power desired (often 0.80)

5. Look up the sample size: tables, formulas or software
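The steps above can be sketched for the simplest case, a continuous outcome compared between two equal-sized groups, using the standard normal-approximation formula n = 2(z_(1-α/2) + z_(1-β))² σ² / δ² per group. The function name and the standard deviation of 25 mm are assumptions for illustration; real studies usually rely on validated software or a statistician.

```python
from math import ceil
from statistics import NormalDist

def n_per_group_two_means(difference, sd, alpha=0.05, power=0.80):
    """Patients needed per group to detect `difference` (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # step 3: two-sided alpha
    z_beta = z.inv_cdf(power)           # step 4: desired power
    return ceil(2 * ((z_alpha + z_beta) * sd / difference) ** 2)

# Step 2: difference sought = 13 mm on the VAS; SD of 25 mm is hypothetical.
print(n_per_group_two_means(13, 25))
# Halving the difference sought roughly quadruples the required sample size.
print(n_per_group_two_means(6.5, 25))
```

Note how strongly the answer depends on the size of the difference sought, which is why that difference should be fixed a priori.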


Limitations of the p Value

• p < 0.05 tells us that the observed treatment difference is “statistically significant”

• p < 0.05 does not tell us:
  – The uncertainty around the point estimate
  – The likelihood that the true treatment effect is clinically important


Confidence Intervals: Example

Purpose: to compare the effects of vasopressor A (V_A) and vasopressor B (V_B) based on post-treatment SBP in hypotensive patients

Endpoint: post-treatment SBP

Null hypothesis: mean SBP_A = mean SBP_B

Results:
  mean SBP_A = 70 mm Hg (after V_A)
  mean SBP_B = 95 mm Hg (after V_B)
  Observed difference = 25 mm Hg (p < 0.05)
  25 mm Hg difference is the “point estimate”


The Point Estimate and the CI

When using CIs, we report the point estimate and the limits of the CI surrounding the point estimate:

25 mm Hg (95% CI: 5 to 44 mm Hg)


Interpretation of the CI

Consider the comparison of vasopressor A and vasopressor B

Since the 95% CI, 5 to 44 mm Hg, doesn’t include 0, this is equivalent to p < 0.05

[Number line: 5, 25 (point estimate), 44 mm Hg]
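The equivalence between a 95% CI that excludes 0 and p < 0.05 can be checked numerically. The sketch below assumes the interval is a symmetric normal-approximation CI, backs out the standard error from its width, and converts the point estimate to a two-sided p value. The function name is hypothetical, and the slide's rounded limits make the result approximate.

```python
from statistics import NormalDist

def p_value_from_ci(estimate, lower, upper, level=0.95):
    """Approximate two-sided p value (vs. a true difference of 0) from a CI."""
    z = NormalDist()
    z_crit = z.inv_cdf(0.5 + level / 2)    # 1.96 for a 95% CI
    se = (upper - lower) / (2 * z_crit)    # CI half-width = z_crit * SE
    z_stat = estimate / se
    return 2 * (1 - z.cdf(abs(z_stat)))

# Vasopressor example: 25 mm Hg (95% CI: 5 to 44 mm Hg), CI excludes 0.
print(p_value_from_ci(25, 5, 44))
# A hypothetical wider interval that includes 0 gives p > 0.05.
print(p_value_from_ci(25, -5, 55))
```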


Interpretation of the CI

Although the point estimate for the difference is 25 mm Hg, the results are consistent with the true difference being anywhere between 5 and 44 mm Hg

[Number line: 5, 25 (point estimate), 44 mm Hg]


Why a 95% CI?

• The selection of 95% CIs (as opposed to 99% CIs, for example) is arbitrary
  – like the selection of 0.05 as the cutoff for a statistically significant p value


Middle Ear Squeeze Study

• For a power of 80%, we needed a sample size of approximately 120 subjects

• N = 116
  – 60 treatment
  – 56 control

Ann Emerg Med July 1992; 21:849-852.


Middle Ear Squeeze Study: Using p value

• For a power of 80%, we needed a sample size of approximately 120 subjects

• N = 116
  – 60 treatment
  – 56 control

• Outcome - ear discomfort:
  – Treatment group 8%
  – Control group 32%

• p = .001. Sudafed works!

Ann Emerg Med July 1992; 21:849-852.


Middle Ear Squeeze Study: Using Point Estimate and 95% CI

• Ear discomfort:
  – Treatment group 8%
  – Control group 32%
  – Absolute Risk Reduction 24% (95% CI: 9.9 to 38.3%)
  – NNT 4.2 (95% CI: 2.6 to 10.1)
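The absolute risk reduction (ARR), number needed to treat (NNT), and their confidence intervals can be approximated from the group sizes and event rates. The sketch below uses a simple Wald interval; the event counts (5 of 60 and 18 of 56) are assumptions inferred from the rounded percentages, so the output only approximately matches the published figures. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def arr_and_nnt(events_treat, n_treat, events_ctrl, n_ctrl, level=0.95):
    """ARR and NNT with Wald confidence limits for two proportions."""
    p_t = events_treat / n_treat
    p_c = events_ctrl / n_ctrl
    arr = p_c - p_t
    se = sqrt(p_t * (1 - p_t) / n_treat + p_c * (1 - p_c) / n_ctrl)
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    lower, upper = arr - z_crit * se, arr + z_crit * se
    # Inverting the limits for NNT is only meaningful when the ARR CI excludes 0.
    return {
        "arr": arr, "arr_ci": (lower, upper),
        "nnt": 1 / arr, "nnt_ci": (1 / upper, 1 / lower),
    }

# Assumed counts: 5/60 (about 8%) treatment, 18/56 (about 32%) control.
result = arr_and_nnt(5, 60, 18, 56)
print(result)
```

Unlike the p value alone, this output shows both the size of the effect and the uncertainty around it.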


Cochrane Library. Wood-Baker, RR; Gibson, PG; Hannay, M; Walters, EH; Walters, JAE. Date of Most Recent Update: 26-July-2005.


Clinical vs. Statistical Significance

• Oral ondansetron vs. placebo
• 215 children with gastroenteritis
• Primary outcome: vomiting during oral hydration
  – RR = 0.4 (95% CI: 0.26 to 0.61)
  – NNT = 4.9 (95% CI: 3.1 to 10.3)
• Both clinically significant and statistically significant

N Engl J Med 2006; 354:1698-705


Clinical vs. Statistical Significance

• Secondary outcome: oral intake in ED
  – 239 ml vs. 196 ml
  – p = 0.001 (statistically significant)
• But is a difference of 9 tsp clinically significant?


Multiple Comparisons

• When two identical groups of patients are compared, there is a chance (α) that a statistically significant p value will be obtained (type I error)

• When multiple comparisons are performed, the risk of one or more false-positive p values increases

• Multiple comparisons include:
  – Pair-wise comparisons of more than two groups
  – The comparison of multiple characteristics between two groups (e.g., sub-group analyses)
  – The comparison of two groups at multiple time points


Multiple Comparisons: Risk of ≥ 1 False Positive

Number of Comparisons    Probability of at Least One Type I Error
 1                       0.05
 2                       0.10
 3                       0.14
 4                       0.19
 5                       0.23
10                       0.40
20                       0.64
30                       0.79

Assumes α = 0.05, uncorrelated comparisons
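For uncorrelated comparisons, the probability of at least one type I error across k tests is 1 - (1 - α)^k. A minimal sketch of that calculation follows; the function name is chosen here for illustration.

```python
def familywise_error(n_comparisons, alpha=0.05):
    """Probability of at least one type I error across independent tests."""
    # Chance that every test avoids a false positive is (1 - alpha) ** k.
    return 1 - (1 - alpha) ** n_comparisons

for k in (1, 2, 3, 4, 5, 10, 20, 30):
    print(f"{k:>2}  {familywise_error(k):.3f}")
```

The values agree with the table to within rounding.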


Multiple Comparisons: Bonferroni Correction

• A method for reducing the overall risk of a type I error when making multiple comparisons

• The overall (study-wise) type I error risk desired (e.g., 0.05) is divided by the number of tests, and this new value is used as the α for each individual test

• Controls the type I error risk, but reduces the power (increased type II error risk)


Results: We tested these 24 associations in the independent validation cohort. Residents born under Leo had a higher probability of gastrointestinal hemorrhage (P =.04), while Sagittarians had a higher probability of humerus fracture (P =.01) compared to all other signs combined. After adjusting the significance level to account for multiple comparisons, none of the identified associations remained significant in either the derivation or validation cohort.

Bonferroni correction: .05/24 = 0.002 for statistical significance
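Applying that threshold to the two p values quoted above shows why neither association survives correction. This is a minimal sketch; the function name is hypothetical, and only the two reported p values are checked, although all 24 tests count toward the correction.

```python
def bonferroni_significant(p_values, n_tests, overall_alpha=0.05):
    """Return the per-test threshold and whether each p value passes it."""
    threshold = overall_alpha / n_tests
    return threshold, [p < threshold for p in p_values]

# Leo / gastrointestinal hemorrhage (P = .04), Sagittarius / humerus fracture (P = .01)
threshold, flags = bonferroni_significant([0.04, 0.01], n_tests=24)
print(round(threshold, 4), flags)
```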


Statistical Issues to Consider if Planning a Study

• Define the most important question to be answered – the “primary objective”

• Define the size of the difference you wish to detect

• Get as much information as possible about what you expect to see in the control group


Statistical Issues to Consider if Planning a Study

• Define values for α and power, and the maximum sample size that is realistic

• Define clinically important subgroups of the population (a priori sub-group analyses)

• Determine whether there are important multiple comparisons


When You Visit the Statistician:

• Bring examples of published studies that illustrate the type of analysis you would like to perform at the end of the study
