Lecture 4 – Statistics: Hypothesis Testing and Estimation
Michael Brown MD, MSc
Professor, Epidemiology and Emergency Medicine
Credit to Roger J. Lewis, MD, PhD, Department of Emergency Medicine, Harbor-UCLA Medical Center
and
Jeff Jones, Grand Rapids MERC / MSU Program in Emergency Medicine
EPI-546 Block I
2 Dr. Michael Brown© Epidemiology Dept., Michigan State Univ.
Today’s Topics
• Classical Hypothesis Testing
• Type I Error
• Type II Error, Power, Sample Size
• Point Estimates and Confidence Intervals
• Multiple Comparisons

Classical Hypothesis Testing: Steps
1. Define the null hypothesis
2. Define the alternative hypothesis
3. Calculate a p value
4. Accept or reject the null hypothesis based on the p value
5. If the null hypothesis is rejected, then accept the alternative hypothesis

Classical Hypothesis Testing:
• The Null Hypothesis: no difference between the two groups to be compared
• The Alternative Hypothesis: there is a difference between the two groups to be compared

Classical Hypothesis Testing: Defining the Alternative Hypothesis
• The size of the expected difference should be defined prior to data collection (a priori)
• The difference defined by the alternative hypothesis should be clinically significant
• Example: Difference in Pain Score on 100 mm VAS of 13 mm or greater

Classical Hypothesis Testing:
• The p value: probability of obtaining results at least as extreme as those observed, if the null hypothesis were true

Classical Hypothesis Testing: p value
• If p = 0.01, then, if the null hypothesis were true, the chance of obtaining results at least as extreme as those of the experiment is 1%
– Very unlikely due to chance!
– So we reject the null hypothesis
• If p = 0.7, then the corresponding chance is 70%
– Accept (fail to reject) the null hypothesis

Classical Hypothesis Testing: Rejecting the Null Hypothesis
• The cut-point for rejecting the null hypothesis (α) is arbitrary
• Typically, α = 0.05
• If the null hypothesis is rejected, then the alternative hypothesis is accepted as true

Clinical Trial (statistical testing)                          Jury Trial (criminal law)
Assume the null hypothesis                                    Presume innocent
Goal: detect a true difference (reject the null hypothesis)   Goal: convict the guilty
“Level of significance”: p < .05                              “Beyond reasonable doubt”
Requires: adequate sample size                                Requires: convincing testimony

Similar to a Trial by Jury…
• A Clinical Trial ends in exactly 1 of 4 possible outcomes:
– 2 are correct: TP, TN
– 2 are errors: FP, FN

SIGNF. TEST              TRUTH: Guilty    TRUTH: Innocent
REJECT Ho (P < 0.05)     TP               FP
ACCEPT Ho (P > 0.05)     FN               TN

Clinical Trial (statistical testing)                     Jury Trial (criminal law)
Correct inference (TP): reject the null hypothesis       Correct verdict: convict a guilty person
Correct inference (TN): accept the null hypothesis       Correct verdict: acquit the innocent
Incorrect inference (FP): Type I error                   Incorrect verdict: hang innocent person
Incorrect inference (FN): Type II error                  Incorrect verdict: guilty skates free

SIGNF. TEST              TRUTH: Guilty          TRUTH: Innocent
REJECT Ho (P < 0.05)     TP                     FP: Type I (alpha)
ACCEPT Ho (P > 0.05)     FN: Type II (beta)     TN

Classical Hypothesis Testing: Type II Error
• A false-negative result
• p value > .05 is obtained, yet the two groups are different
• The risk of a type II error = β

Type II Error
• Although trend toward benefit, p value > .05
• Null hypothesis accepted
• Truth: larger study demonstrated that the two groups were actually different
• Committed a Type II Error
• Typical pilot study has low power to detect a difference

Classical Hypothesis Testing: Power
• Power = 1 - β
• If Power = 80%:
– 80% probability of detecting a true difference if it exists
• Power is determined by sample size, the magnitude of the difference sought, and by α
• Pilot study had small sample size, therefore “low” power
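
To make the link between sample size and power concrete, here is a minimal sketch using a normal approximation for a two-sided comparison of two means. The function name `power_two_means` is made up for illustration. The 13 mm difference is the VAS example from the lecture; the standard deviation of 30 mm is an assumption chosen only so the example runs, not a value from the lecture.

```python
from math import sqrt
from statistics import NormalDist


def power_two_means(n_per_group, delta, sd, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two group means.

    Normal approximation; assumes equal group sizes and a common, known SD.
    """
    std_normal = NormalDist()
    se = sd * sqrt(2.0 / n_per_group)           # standard error of the difference
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    # Chance the test statistic clears the critical value in the direction of
    # the true difference (the tiny opposite-tail contribution is ignored).
    return std_normal.cdf(abs(delta) / se - z_crit)


# delta = 13 mm comes from the lecture's VAS example;
# sd = 30 mm is an illustrative assumption.
for n in (20, 40, 84):
    print(n, round(power_two_means(n, delta=13, sd=30), 2))
```

Under these assumptions a pilot study with 20 patients per group has power of roughly 28%, while about 84 per group are needed to reach roughly 80%.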

Steps in Sample Size Determination
1. Define the type of data (continuous, ordinal, categorical, etc.)

A Few Examples of Statistical Tests
Test                 Comparison               Principal Assumptions
Student's t test     Means of two groups      Continuous variable, normally distributed, equal variance
Wilcoxon rank sum    Medians of two groups    Continuous variable
Chi-square           Proportions              Categorical variable, more than 5 patients in any particular "cell"
Fisher's exact       Proportions              Categorical variable

Steps in Sample Size Determination
1. Define the type of data (continuous, ordinal, categorical, etc.)
2. Define the size of the difference sought
3. Define α (usually 0.05)
4. Determine power desired (often 0.80)
5. Look up the sample size: tables, formulas or software
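
The five steps above can be sketched in code for the simplest case: continuous data compared between two equal-sized groups, using the standard normal-approximation formula n = 2 (z_{1-α/2} + z_{power})² (SD / difference)² per group. The function name `sample_size_two_means` is hypothetical, and the SD of 30 mm is an assumption for illustration; only the 13 mm difference, α = 0.05, and power = 0.80 come from the lecture.

```python
from math import ceil
from statistics import NormalDist


def sample_size_two_means(delta, sd, alpha=0.05, power=0.80):
    """Patients needed per group for a two-sided comparison of two means.

    Normal approximation; assumes equal group sizes and a common SD.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # step 3: alpha
    z_beta = std_normal.inv_cdf(power)           # step 4: desired power
    n = 2 * ((z_alpha + z_beta) * sd / delta) ** 2
    return ceil(n)                               # round up to whole patients


# Step 1: continuous data (VAS pain score in mm).
# Step 2: difference sought = 13 mm (lecture example); sd = 30 mm is assumed.
n_per_group = sample_size_two_means(delta=13, sd=30)
print(n_per_group)
```

Halving the difference sought roughly quadruples the required sample size, which is why the size of the difference must be fixed before data collection.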

Limitations of the p Value
• p < 0.05 tells us that the observed treatment difference is “statistically significantly” different
• p < 0.05 does not tell us:
– The uncertainty around the point estimate
– The likelihood that the true treatment effect is clinically important

Confidence Intervals: Example
Purpose: to compare the effects of vasopressor A (V_A) and vasopressor B (V_B) based on post-treatment SBP in hypotensive patients
Endpoint: post-treatment SBP
Null hypothesis: mean SBP_A = mean SBP_B
Results:
– mean SBP_A = 70 mm Hg (after V_A)
– mean SBP_B = 95 mm Hg (after V_B)
– Observed difference = 25 mm Hg (p < 0.05)
– 25 mm Hg difference is the “point estimate”

The Point Estimate and the CI
When using CIs, we report the point estimate and the limits of the CI surrounding the point estimate:
25 mm Hg (95% CI: 5 to 44 mm Hg)

Interpretation of the CI
Consider the comparison of vasopressor A and vasopressor B
Since the 95% CI, 5 to 44 mm Hg, doesn’t include 0, this is equivalent to p < 0.05
[Figure: interval from 5 to 44 mm Hg with the point estimate at 25 mm Hg]
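
The equivalence between "the 95% CI excludes 0" and "p < 0.05" can be checked numerically. The sketch below back-calculates a standard error from the reported CI and converts it to a two-sided p value. It assumes the interval was built with a normal approximation and is symmetric around the estimate; the lecture does not say how the interval was computed. The function name and the second, wider interval are hypothetical.

```python
from statistics import NormalDist


def p_value_from_ci(estimate, lower, upper, level=0.95):
    """Two-sided p value implied by a symmetric normal-theory CI."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% CI
    se = (upper - lower) / (2 * z_crit)           # back-calculated standard error
    z_stat = estimate / se                        # test of "true difference = 0"
    return 2 * (1 - std_normal.cdf(abs(z_stat)))


# Lecture example: 25 mm Hg (95% CI: 5 to 44 mm Hg); interval excludes 0.
p_lecture = p_value_from_ci(25, 5, 44)

# Hypothetical wider interval that includes 0 (not from the lecture).
p_wide = p_value_from_ci(25, -5, 55)

print(round(p_lecture, 3), round(p_wide, 3))
```

The first p value falls below 0.05 and the second does not, matching whether each interval contains 0.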

Interpretation of the CI
Although the point estimate for the difference is 25 mm Hg, the results are consistent with the true difference being anywhere between 5 and 44 mm Hg
[Figure: interval from 5 to 44 mm Hg with the point estimate at 25 mm Hg]

Why a 95% CI?
• The selection of 95% CIs (as opposed to 99% CIs, for example) is arbitrary, like the selection of 0.05 as the cutoff for a statistically significant p value

Middle Ear Squeeze Study
• For a power of 80%, we needed a sample size of approximately 120 subjects
• N = 116
– 60 treatment
– 56 control
Ann Emerg Med July 1992; 21:849-852.

Middle Ear Squeeze Study: Using p value
• Outcome - ear discomfort:
– Treatment group 8%
– Control group 32%
• p = .001
• Sudafed works!
Ann Emerg Med July 1992; 21:849-852.
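
As an illustration of where a p value like this comes from, the sketch below runs an uncorrected chi-square test on a 2×2 table. The lecture gives only rounded percentages, so the cell counts (5 of 60 treated and 18 of 56 controls with ear discomfort) are reconstructed assumptions, and the lecture does not state which test the study actually used. The function name is hypothetical.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns the statistic and its p value on 1 degree of freedom.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    p = erfc(sqrt(chi2 / 2))
    return chi2, p


# Assumed counts reconstructed from the rounded percentages in the lecture:
# treatment 5/60 (about 8%), control 18/56 (about 32%).
chi2, p = chi_square_2x2(5, 55, 18, 38)
print(round(chi2, 2), round(p, 4))
```

With these assumed counts the p value comes out near 0.001, in line with the reported result.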

Middle Ear Squeeze Study: Using Point Estimate and 95% CI
Ear discomfort:
• Treatment group 8%
• Control group 32%
• Absolute Risk Reduction 24% (95% CI: 9.9 to 38.3%)
• NNT 4.2 (95% CI: 2.6 to 10.1)
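
A rough sketch of how the absolute risk reduction (ARR), the number needed to treat (NNT), and their 95% CIs can be computed. It uses a simple Wald interval and assumed cell counts (5 of 60 treated, 18 of 56 controls) reconstructed from the rounded percentages, so the output is close to, but not identical with, the figures on the slide. The method the study itself used is not stated in the lecture, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def arr_with_ci(events_control, n_control, events_treated, n_treated, level=0.95):
    """Absolute risk reduction with a Wald (normal-approximation) CI."""
    p_c = events_control / n_control
    p_t = events_treated / n_treated
    arr = p_c - p_t
    se = sqrt(p_c * (1 - p_c) / n_control + p_t * (1 - p_t) / n_treated)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return arr, arr - z * se, arr + z * se


# Assumed counts reconstructed from the rounded percentages in the lecture.
arr, arr_low, arr_high = arr_with_ci(18, 56, 5, 60)

# NNT is the reciprocal of the ARR; its limits are the reciprocals of the
# ARR limits in reverse order. This only works when the ARR CI excludes 0.
nnt = 1 / arr
nnt_low, nnt_high = 1 / arr_high, 1 / arr_low

print(f"ARR {arr:.1%} (95% CI: {arr_low:.1%} to {arr_high:.1%})")
print(f"NNT {nnt:.1f} (95% CI: {nnt_low:.1f} to {nnt_high:.1f})")
```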

Cochrane Library. Wood-Baker, RR; Gibson, PG; Hannay, M; Walters, EH; Walters, JAE. Date of Most Recent Update: 26-July-2005.

Clinical vs. Statistical Significance
Oral ondansetron vs. placebo
• 215 children with gastroenteritis
• Primary outcome: vomiting during oral hydration
• RR = 0.4 (95% CI: 0.26 to 0.61)
• NNT = 4.9 (95% CI: 3.1 to 10.3)
• Both clinically significant and statistically significant
N Engl J Med 2006; 354:1698-705

Clinical vs. Statistical Significance
Secondary outcome: oral intake in ED
• 239 ml vs. 196 ml
• p = 0.001 (statistically significant)
• But is a difference of 9 tsp clinically significant?

Multiple Comparisons
• When two identical groups of patients are compared, there is a chance (α) that a statistically significant p value will be obtained (type I error)
• When multiple comparisons are performed, the risk of one or more false-positive p values is increased
• Multiple comparisons include:
– Pair-wise comparisons of more than two groups
– The comparison of multiple characteristics between two groups (e.g., sub-group analyses)
– The comparison of two groups at multiple time points

Multiple Comparisons: Risk of ≥ 1 False Positive
Number of Comparisons    Probability of at Least One Type I Error
1                        0.05
2                        0.10
3                        0.14
4                        0.19
5                        0.23
10                       0.40
20                       0.64
30                       0.79
Assumes α = 0.05, uncorrelated comparisons
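
The table values follow from the formula 1 - (1 - α)^k for k independent comparisons: each test has a (1 - α) chance of avoiding a false positive, so all k avoid one with probability (1 - α)^k. A minimal sketch that reproduces the table (the function name is made up for illustration):

```python
def familywise_error(k, alpha=0.05):
    """Probability of at least one type I error across k independent tests."""
    return 1 - (1 - alpha) ** k


comparisons = [1, 2, 3, 4, 5, 10, 20, 30]
risks = [round(familywise_error(k), 2) for k in comparisons]

for k, risk in zip(comparisons, risks):
    print(f"{k:>2} comparisons: {risk:.2f}")
```

If the comparisons are correlated, as repeated measurements on the same patients usually are, the independence assumption does not hold and these values are only a rough guide.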

Multiple Comparisons: Bonferroni Correction
• A method for reducing the overall risk of a type I error when making multiple comparisons
• The overall (study-wise) type I error risk desired (e.g., 0.05) is divided by the number of tests, and this new value is used as the α for each individual test
• Controls the type I error risk, but reduces the power (increased type II error risk)

Results: We tested these 24 associations in the independent validation cohort. Residents born under Leo had a higher probability of gastrointestinal hemorrhage (P = .04), while Sagittarians had a higher probability of humerus fracture (P = .01) compared to all other signs combined. After adjusting the significance level to account for multiple comparisons, none of the identified associations remained significant in either the derivation or validation cohort.
Bonferroni correction: .05/24 ≈ 0.002 is the threshold each P value must fall below for statistical significance
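
The arithmetic can be sketched in a few lines. Only the two P values quoted above are checked, since the other 22 are not given here; the function and variable names are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level under the Bonferroni correction."""
    return alpha / n_tests


# The two P values quoted in the abstract above.
p_values = {
    "Leo / gastrointestinal hemorrhage": 0.04,
    "Sagittarius / humerus fracture": 0.01,
}

threshold = bonferroni_threshold(0.05, 24)
still_significant = {name: p < threshold for name, p in p_values.items()}

print(round(threshold, 4))
for name, flag in still_significant.items():
    print(name, "significant" if flag else "not significant")
```

Both associations pass the uncorrected 0.05 cutoff but fail the corrected one, which is the point of the example.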

Statistical Issues to Consider if Planning a Study
• Define the most important question to be answered – the “primary objective”
• Define the size of the difference you wish to detect
• Get as much information as possible about what you expect to see in the control group
• Define values for α and power, and the maximum sample size that is realistic
• Define clinically important subgroups of the population (a priori sub-group analyses)
• Determine whether there are important multiple comparisons

When You Visit the Statistician:
• Bring examples of published studies that illustrate the type of analysis you would like to perform at the end of the study