TOPICS IN BIOSTATISTICS: PART 1
Susan S. Ellenberg, Ph.D.
Center for Clinical Epidemiology and Biostatistics
U Penn School of Medicine
OUTLINE
• Study design issues
– Constructing hypotheses
– Power and significance level
– Sample size determination
• Descriptive statistics
– Averages
– Variability
GENERAL APPROACH
• Concepts, not equations
• Goal is to increase awareness of statistical considerations
• Statistical software widely available to do basic calculations
• Good basic reference: Altman DG, Practical Statistics for Medical Research, Chapman & Hall/CRC
ESTIMATION VS TESTING
• Sometimes primary goal is to describe data. Then we are interested in estimation. We estimate parameters such as
– Means
– Variances
– Correlations
• When primary goal is to draw a conclusion about a state of nature or the result of an experiment, we are interested in statistical testing
EXPLORATORY VS CONFIRMATORY STUDIES
• Exploring patterns in data can be very useful, even if specific hypotheses have not been set up beforehand
• Such analyses can generate interesting new hypotheses; can’t generate final conclusions
• If you want to be able to make a definitive statement, important to specify hypothesis in very specific terms, in advance of experiment
TYPES OF DATA
• Independent: each observation from a different subject
• Paired: two observations (eg, before and after some intervention, left and right eyes) in same subject, or in closely related subjects (eg, siblings for genetics studies)
• Clustered: multiple observations on each subject
• When designing study and conducting analyses, need to use methods appropriate to data type
DESIGNING A (CONFIRMATORY) STUDY
• First requirement: a specific hypothesis to be tested
– What will be measured
– Criteria for “success”
• Usual convention: establish a “null hypothesis,” then attempt to disprove it
• Need to be specific: should be no ambiguity about primary hypothesis
• Possible ambiguities not always obvious
EXAMPLE: STUDYING “LIQUID STITCHES”
• New material to apply to wound, stop bleeding
• Need to study how quickly and effectively bleeding is stopped
• Possible outcomes of interest:
– time to cessation of bleeding
– whether bleeding stopped within X sec
– total amount of blood loss
– whether further effort was needed to stop bleeding
– whether blood loss greater than Y ml
OTHER AMBIGUITIES
• Given a specific hypothesis, what statistical test will be used to evaluate that hypothesis?
• What potentially confounding variables will be accounted for in the analysis?
– size of subject
– size of wound
– Other demographics: age, gender, …
• How will missing data be handled?
COMMON PHRASES RELATED TO “MULTIPLE TESTING”
• Testing to a foregone conclusion
• Data dredging
• Torturing the data until they confess
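
The reason these phrases are warnings can be made concrete with a small calculation. The sketch below assumes that every test is independent and is run at the same significance level (both are simplifying assumptions, and the function name is illustrative only). It shows how quickly the chance of at least one false positive grows as more tests are run.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across n_tests independent tests,
    each run at significance level alpha, when no real effect exists."""
    return 1.0 - (1.0 - alpha) ** n_tests


for k in (1, 5, 10, 20):
    rate = familywise_error_rate(0.05, k)
    print(f"{k:>2} tests at alpha=0.05: chance of a false positive = {rate:.2f}")
```

With 20 such tests the chance of at least one spurious "significant" result is roughly 64%, which is why hypotheses need to be fixed in advance.
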
SIGNIFICANCE LEVEL
• The significance level is also commonly referred to as
– The p-value (loosely; strictly, the p-value is calculated from the observed data and compared with the significance level)
– The alpha level
– The false positive rate
– The Type I error rate
• It is defined as the probability of seeing an effect of at least a specified size just by chance, that is, if there really were no effect at all (the “null hypothesis”)
• We must specify a significance level when we design our experiment
[Figure: Bell-Shaped (Normal) Curve]
ONE-SIDED OR TWO-SIDED
• A one-sided (or one-tailed) test is one that looks for effects in only one direction
– “side” or “tail” refers to extreme end of normal (bell-shaped) curve
• If all of the false positive error is put into one of the tails, requirement for significance is less stringent
• No wide consensus among statisticians as to when one-sided tests are OK or when two-sided tests are needed
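
To see why a one-sided test is "less stringent", compare the cutoffs on the normal curve. This is a minimal sketch, assuming a normally distributed test statistic; the function name `critical_z` is illustrative.

```python
from statistics import NormalDist


def critical_z(alpha: float, two_sided: bool = True) -> float:
    """z-value a test statistic must exceed to be declared significant."""
    # A two-sided test splits alpha between both tails; a one-sided test
    # places all of it in a single tail.
    tail_area = alpha / 2 if two_sided else alpha
    return NormalDist().inv_cdf(1 - tail_area)


print(round(critical_z(0.05, two_sided=True), 2))   # 1.96
print(round(critical_z(0.05, two_sided=False), 2))  # 1.64
```

At the same 0.05 level, the one-sided cutoff is lower, so a smaller observed effect in the chosen direction is enough to reach significance.
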
VIEWS ON ONE-SIDED TESTING
• One view: one-sided tests OK when an effect is possible in one direction only
– Example: a treatment to increase height
• Another view: one-sided tests OK anytime an effect is only of interest in one direction
– Example: when evaluating a new drug for regulatory approval, action is taken only if there is a positive effect: negative effects are possible but treated like zero effects
POWER
• The power of a study is the probability that it will yield a statistically significant result if there truly is an effect
• 1 minus power is the Type II error, or false negative rate, or beta error
• Power depends on
– The size of the effect
– The size of the study
– The false positive rate you can live with (if we declare all experiments a success, we will have 100% power but a very high false positive rate)
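
The three dependencies listed above can be explored numerically. The sketch below assumes a comparison of two equal-sized groups on a continuous outcome, a two-sided test, and the usual normal approximation (ignoring the negligible chance of significance in the wrong direction). Here `effect_size` is the difference in means divided by the SD, and the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-group comparison of means."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)               # cutoff set by the false positive rate
    shift = effect_size * sqrt(n_per_group / 2)     # expected test statistic if the effect is real
    return z.cdf(shift - z_crit)


for n in (20, 64, 200):
    print(f"effect 0.5 SD, n={n:>3} per group: power = {approx_power(0.5, n):.2f}")
```

Increasing the effect size, the sample size, or the tolerated false positive rate each raises the power.
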
FACTS ABOUT POWER
• There is always some effect for which power is high, even with a small sample size
• For a given sample size, the power to detect an effect is higher when the effect is measured by a continuous variable (eg, lab value) than a yes-no variable (eg, mortality at day 10)
• One typically wants a hypothesis-testing study to have power of 80-90%
DETERMINING SAMPLE SIZE
• A study should be large enough so that if there is an effect of a size worth knowing about, the study will demonstrate the effect
• To calculate the sample size, need
– Effect size of interest
– Error rates we will tolerate
– Variability of outcome measure
COMPARISON OF CONTINUOUS OUTCOMES
• “Standardize” the effect size by dividing the effect size we want to confirm by the expected SD
• For given power and significance level, sample size increases rapidly as the desired effect size gets smaller
SAMPLE SIZE BY EFFECT SIZE
(two-sided 0.05 significance level)
Effect size    0.2    0.4    0.6    0.8
80% power      800    200     92     52
90% power     1100    260    120     68
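
The slides do not give the formula behind these figures. As a sketch only, the widely used normal-approximation formula for comparing two means, n per group = 2(z_alpha + z_beta)^2 / d^2, gives totals close to (slightly below) the tabulated ones when doubled. This suggests, as an inference rather than something the slides state, that the table shows total sample sizes across two equal groups. The function name is illustrative.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_means(effect_size: float, power: float = 0.80, alpha: float = 0.05) -> int:
    """Subjects per group for a two-sided comparison of two means.

    effect_size is the standardized effect: difference of interest / expected SD.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


for d in (0.2, 0.4, 0.6, 0.8):
    totals = [2 * n_per_group_means(d, power=p) for p in (0.80, 0.90)]
    print(f"effect size {d}: total n = {totals[0]} (80% power), {totals[1]} (90% power)")
```

Because the effect size enters as 1/d^2, halving the effect size of interest roughly quadruples the required sample size. For example, an effect size of 0.2 at 80% power needs 393 subjects per group (786 in total).
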
COMPARISON OF RATES/PROPORTIONS
• Need larger sample sizes when trying to detect differences between proportions
• Reason: 0-1 data are less informative than continuous data
• Use binomial distribution rather than normal distribution
• For calculation need to specify difference of interest, expected proportion in control group, and error probabilities
SAMPLE SIZE BY POWER AND RATES OF INTEREST
(two-sided 0.05 significance level)
Event/success rates    pwr=0.80    pwr=0.90
0.20 vs 0.40              182         236
0.40 vs 0.60              214         278
0.10 vs 0.20              438         572
0.20 vs 0.30              626         824
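
The slides do not say how the table was calculated. One common textbook approach is the normal approximation to the binomial for two proportions, with a continuity correction. The sketch below assumes that approach, and the doubling assumes two equal groups; the formula choice, the correction, and the function name are all assumptions, not the author's stated method. It comes within a subject or two of each tabulated value.

```python
from math import sqrt
from statistics import NormalDist


def n_per_group_two_proportions(p1: float, p2: float, power: float = 0.80,
                                alpha: float = 0.05, continuity: bool = True) -> float:
    """Approximate subjects per group for a two-sided comparison of two proportions."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    p_bar = (p1 + p2) / 2          # average rate, used for the variance under no difference
    diff = abs(p1 - p2)            # difference of interest
    n = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
         + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / diff ** 2
    if continuity:
        # Continuity correction: inflates n to allow for the discreteness of counts.
        n = n / 4 * (1 + sqrt(1 + 4 / (n * diff))) ** 2
    return n


for p1, p2 in ((0.20, 0.40), (0.40, 0.60), (0.10, 0.20), (0.20, 0.30)):
    totals = [round(2 * n_per_group_two_proportions(p1, p2, power=p)) for p in (0.80, 0.90)]
    print(f"{p1:.2f} vs {p2:.2f}: total n = {totals[0]} (80% power), {totals[1]} (90% power)")
```

The same absolute difference of 0.20 needs more subjects when the rates sit near 0.5 (0.40 vs 0.60) than when they are lower (0.20 vs 0.40), because binary outcomes are most variable near 0.5.
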
DESCRIBING DATA
• Two basic aspects of data
– Centrality
– Variability
• Different measures for each
• Optimal measure depends on type of data being described
CENTRALITY
• Mean
– Sum of observed values divided by number of observations
– Most common measure of centrality
– Most informative when data follow normal distribution (bell-shaped curve)
• Median
– “middle” value: half of all observed values are smaller, half are larger
– Best centrality measure when data are skewed
• Mode
– Most frequently observed value
MEAN CAN MISLEAD
• Group 1 data: 1,1,1,2,3,3,5,8,20
– Mean: 4.9 Median: 3 Mode: 1
• Group 2 data: 1,1,1,2,3,3,5,8,10
– Mean: 3.8 Median: 3 Mode: 1
• When data sets are small, a single extreme observation will have great influence on mean, little or no influence on median
• In such cases, median is usually a more informative measure of centrality
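
The figures on this slide can be reproduced with Python's standard `statistics` module. This short sketch uses the two groups exactly as listed.

```python
from statistics import mean, median, mode

group_1 = [1, 1, 1, 2, 3, 3, 5, 8, 20]
group_2 = [1, 1, 1, 2, 3, 3, 5, 8, 10]  # identical except for the largest value

for name, data in (("Group 1", group_1), ("Group 2", group_2)):
    print(f"{name}: mean={mean(data):.1f} median={median(data)} mode={mode(data)}")
```

Changing only the largest observation from 10 to 20 moves the mean from 3.8 to 4.9 while the median and mode stay put.
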
TIME-TO-EVENT DATA
• In many experiments, outcome of interest is time to some event
– Death
– Resolution of disease/symptom
– First symptom manifestation
• Such data are typically not normally distributed; tend to follow an exponential distribution
• Data may be truncated (censored), eg, all animals sacrificed at day X, so X is longest observable time
• Medians typically used for such data
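
One reason medians suit such data: cutting observation off at day X distorts the mean but leaves the median untouched, provided fewer than half of the subjects are still event-free at X. The sketch below uses made-up event times and a made-up cutoff, both hypothetical.

```python
from statistics import mean, median

# Hypothetical days to event if every subject could be followed indefinitely.
true_times = [2, 3, 4, 5, 7, 9, 14, 25, 40]
cutoff_day = 10  # hypothetical day X at which observation stops

# What is actually recorded: times beyond the cutoff are only known to be >= X.
observed_times = [min(t, cutoff_day) for t in true_times]

print("median:", median(true_times), "->", median(observed_times))
print("mean:  ", round(mean(true_times), 1), "->", round(mean(observed_times), 1))
```

The median is the same before and after the cutoff is applied, whereas the mean of the recorded times understates the true mean.
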
VARIABILITY
• Most commonly used measure to describe variability is standard deviation (SD)
• SD is a function of the squared differences of each observation from the mean
• If the mean is influenced by a single extreme observation, the SD will overstate the actual variability
ALTERNATIVE TO SD
• When using median as centrality measure, can describe variability by providing range (min, max) and interquartile range (25th and 75th percentiles)
• Graphical presentation often provides best sense of variability
EXAMPLE
• Group 1 data: 1,1,1,2,3,3,5,8,20
– Mean: 4.9 Median: 3
• Group 2 data: 1,1,1,2,3,3,5,8,10
– Mean: 3.8 Median: 3
• SDs: group 1: 6.1 group 2: 3.3
• Interquartile range (both groups): 1, 5
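
The variability measures can be computed the same way. This sketch uses the Group 2 values from the MEAN CAN MISLEAD slide (1,1,1,2,3,3,5,8,10), the sample (n - 1) form of the SD, and the "inclusive" quartile convention; the latter two are assumptions, since the slides do not say which conventions were used, and other quartile rules can give slightly different percentiles.

```python
from statistics import quantiles, stdev

group_1 = [1, 1, 1, 2, 3, 3, 5, 8, 20]
group_2 = [1, 1, 1, 2, 3, 3, 5, 8, 10]


def describe_spread(data):
    """Return sample SD, range (min, max) and interquartile range (25th, 75th percentiles)."""
    q1, _median, q3 = quantiles(data, n=4, method="inclusive")
    return {
        "sd": round(stdev(data), 1),
        "range": (min(data), max(data)),
        "iqr": (q1, q3),
    }


print("Group 1:", describe_spread(group_1))
print("Group 2:", describe_spread(group_2))
```

The single extreme value nearly doubles the SD of Group 1, while the interquartile range is identical for the two groups.
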
CONFIDENCE INTERVALS
• A confidence interval is intended to provide a sense of the variability of an estimated mean
• Can be defined as the set of possible values that includes, with specified probability, the true mean
• Confidence intervals can be constructed for any type of variable, but here we consider the most common case of a normally distributed variable
CONSTRUCTING A CONFIDENCE INTERVAL
• First, determine what level of probability should define the interval
• Second, find the normal value (or z-value) that corresponds to that probability
– 99%: 2.58
– 95%: 1.96
– 90%: 1.64
• Third, multiply the z-value by the standard error of the mean
• Finally, add this product to and subtract it from the estimated mean to obtain the upper and lower limits of the interval
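
The steps above can be sketched in a few lines. The measurements are hypothetical and the function name is illustrative. The sketch follows the z-value recipe as given; for samples this small a t-value is commonly substituted for z, which widens the interval somewhat.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def normal_ci(data, level=0.95):
    """Confidence interval for a mean: estimate +/- z * standard error."""
    z_value = NormalDist().inv_cdf(0.5 + level / 2)  # e.g. about 1.96 for 95%
    standard_error = stdev(data) / sqrt(len(data))   # SD divided by sqrt(n)
    margin = z_value * standard_error
    centre = mean(data)
    return centre - margin, centre + margin


sample = [4.1, 5.0, 5.6, 4.8, 5.2, 4.9, 5.3, 4.7]  # hypothetical measurements

for level in (0.90, 0.95, 0.99):
    low, high = normal_ci(sample, level)
    print(f"{level:.0%} CI: {low:.2f} to {high:.2f}")
```
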
[Figure: Bell-Shaped (Normal) Curve]
FACTS ABOUT CONFIDENCE INTERVALS
• The more sure you want to be that the true value is included in your interval, the wider the interval will be
– A 99% confidence interval will be wider than a 95% confidence interval
• Most common size confidence interval is 95%, but 90% and even 80% confidence intervals are sometimes used
VALUE OF CONFIDENCE INTERVALS
• Two data sets may have the same mean; but if one data set has 5 observations and the second has 500 observations, the two means convey very different amounts of information
• Confidence intervals remind us how uncertain our estimate really is
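
The 5-versus-500 comparison can be put in numbers. This sketch assumes both data sets share the same SD (a hypothetical value of 10) and uses the normal z-value of about 1.96 for a 95% interval. Only the sample size differs.

```python
from math import sqrt
from statistics import NormalDist


def ci_half_width(sd: float, n: int, level: float = 0.95) -> float:
    """Distance from the estimated mean to either end of the confidence interval."""
    z_value = NormalDist().inv_cdf(0.5 + level / 2)
    return z_value * sd / sqrt(n)


assumed_sd = 10.0  # hypothetical SD, taken to be the same in both data sets
for n in (5, 500):
    print(f"n={n:>3}: mean +/- {ci_half_width(assumed_sd, n):.2f}")
```

With 100 times as many observations, the interval is 10 times narrower, since its width shrinks with the square root of the sample size.
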
FINAL COMMENTS
• Statistics are only helpful if the approach taken is appropriate to the problem at hand
• Most statistical procedures are based on some assumptions about the characteristics of the data; these need to be checked
• Remember GIGO (garbage in, garbage out)