TOPICS IN BIOSTATISTICS: PART 1

Susan S. Ellenberg, Ph.D.
Center for Clinical Epidemiology and Biostatistics
U Penn School of Medicine

OUTLINE

• Study design issues
  – Constructing hypotheses
  – Power and significance level
  – Sample size determination
• Descriptive statistics
  – Averages
  – Variability

GENERAL APPROACH

• Concepts, not equations
• Goal is to increase awareness of statistical considerations
• Statistical software widely available to do basic calculations
• Good basic reference: Altman DG, Practical Statistics for Medical Research, Chapman & Hall/CRC

ESTIMATION VS TESTING

• Sometimes primary goal is to describe data. Then we are interested in estimation. We estimate parameters such as
  – Means
  – Variances
  – Correlations
• When primary goal is to draw a conclusion about a state of nature or the result of an experiment, we are interested in statistical testing

EXPLORATORY VS CONFIRMATORY STUDIES

• Exploring patterns in data can be very useful, even if specific hypotheses have not been set up beforehand
• Such analyses can generate interesting new hypotheses; can’t generate final conclusions
• If you want to be able to make a definitive statement, important to specify hypothesis in very specific terms, in advance of experiment

TYPES OF DATA

• Independent: each observation from a different subject
• Paired: two observations (eg, before and after some intervention, left and right eyes) in same subject, or in closely related subjects (eg, siblings for genetics studies)
• Clustered: multiple observations on each subject
• When designing study and conducting analyses, need to use methods appropriate to data type

DESIGNING A (CONFIRMATORY) STUDY

• First requirement: a specific hypothesis to be tested
  – What will be measured
  – Criteria for “success”
• Usual convention: establish a “null hypothesis,” then attempt to disprove
• Need to be specific: should be no ambiguity about primary hypothesis
• Possible ambiguities not always obvious

EXAMPLE: STUDYING “LIQUID STITCHES”

• New material to apply to wound, stop bleeding
• Need to study how quickly and effectively bleeding is stopped
• Possible outcomes of interest:
  – time to cessation of bleeding
  – whether bleeding stopped within X sec
  – total amount of blood loss
  – whether further effort was needed to stop bleeding
  – whether blood loss greater than Y ml

OTHER AMBIGUITIES

• Given a specific hypothesis, what statistical test will be used to evaluate that hypothesis?
• What potentially confounding variables will be accounted for in the analysis?
  – size of subject
  – size of wound
  – Other demographics: age, gender, …
• How will missing data be handled?

COMMON PHRASES RELATED TO “MULTIPLE TESTING”

• Testing to a foregone conclusion

• Data dredging

• Torturing the data until they confess
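
One way to see why these phrases are warnings: if every test is run at a 0.05 significance level and there is truly no effect anywhere, the chance of at least one false positive grows quickly with the number of tests. The sketch below assumes the tests are independent, which is a simplification not stated in the slides, and the function name is only illustrative.

```python
def chance_of_any_false_positive(n_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive among independent null tests."""
    return 1 - (1 - alpha) ** n_tests

# Assumed scenario: every null hypothesis is true and tests are independent.
for n_tests in (1, 5, 10, 20):
    rate = chance_of_any_false_positive(n_tests)
    print(f"{n_tests:2d} tests: chance of a false positive = {rate:.2f}")
```

With 20 such tests the chance of a spurious "finding" is well over one half, which is why a hypothesis chosen after looking at many comparisons cannot be treated as confirmed.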

SIGNIFICANCE LEVEL

• The significance level is also commonly referred to as
  – The p-value threshold (the cutoff against which an observed p-value is compared)
  – The alpha level
  – The false positive rate
  – The Type I error rate
• It is defined as the probability of seeing an effect of at least a specified size just by chance, that is, if there really were no effect at all (the “null hypothesis”)
• We must specify a significance level when we design our experiment
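
As a rough illustration of how a pre-specified significance level is used, the sketch below converts a standardized test statistic (a z-value) into a two-sided p-value under a normal approximation and compares it with alpha. The z-values and the function name are made up for illustration and do not come from the slides.

```python
from statistics import NormalDist

ALPHA = 0.05  # significance level, fixed when the study is designed

def two_sided_p_value(z_statistic: float) -> float:
    """Chance of a result at least this extreme if the null hypothesis is true."""
    return 2 * (1 - NormalDist().cdf(abs(z_statistic)))

# Hypothetical test statistics, not results from any real study.
for z_statistic in (1.0, 1.96, 2.5):
    p = two_sided_p_value(z_statistic)
    verdict = "significant" if p < ALPHA else "not significant"
    print(f"z = {z_statistic:.2f}: p = {p:.3f} ({verdict})")
```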

Bell-Shaped (Normal) Curve

ONE-SIDED OR TWO-SIDED

• A one-sided (or one-tailed) test is one that looks for effects in only one direction
  – “side” or “tail” refers to extreme end of normal (bell-shaped) curve
• If all of the false positive error is put into one of the tails, requirement for significance is less stringent
• No wide consensus among statisticians as to when one-sided tests are OK or when two-sided tests are needed
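
The "less stringent" point can be made concrete by comparing the cutoff a standardized test statistic must exceed under each choice. This is a minimal sketch assuming a normally distributed test statistic; the function name is illustrative.

```python
from statistics import NormalDist

def critical_z(alpha: float, two_sided: bool = True) -> float:
    """Cutoff the test statistic must exceed to be declared significant."""
    # A two-sided test splits alpha across both tails of the normal curve.
    tail_area = alpha / 2 if two_sided else alpha
    return NormalDist().inv_cdf(1 - tail_area)

two_sided_cutoff = critical_z(0.05, two_sided=True)
one_sided_cutoff = critical_z(0.05, two_sided=False)
print(f"two-sided 0.05 cutoff: {two_sided_cutoff:.2f}")
print(f"one-sided 0.05 cutoff: {one_sided_cutoff:.2f}")
```

At the same 0.05 level, the one-sided cutoff is the lower of the two, so a smaller observed effect in the chosen direction is enough to reach significance.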

VIEWS ON ONE-SIDED TESTING

• One view: one-sided tests OK when an effect is possible in one direction only
  – Example: a treatment to increase height
• Another view: one-sided tests OK anytime an effect is only of interest in one direction
  – Example: when evaluating a new drug for regulatory approval, action is taken only if there is a positive effect: negative effects are possible but treated like zero effects

POWER

• The power of a study is the probability that it will yield a statistically significant result if there truly is an effect
• 1 minus power is the Type II error, or false negative rate, or beta error
• Power depends on
  – The size of the effect
  – The size of the study
  – The false positive rate you can live with (if we declare all experiments a success, we will have 100% power but a very high false positive rate)
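
The three dependencies can be seen in a back-of-the-envelope calculation. The sketch below assumes a two-group comparison of means with equal group sizes, a standardized effect size (difference divided by SD), and a normal approximation that ignores the negligible chance of significance in the wrong direction. None of these specifics come from the slides, and the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(effect_size: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-group comparison of means."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic when the effect is real.
    expected_z = effect_size * sqrt(n_per_group / 2)
    return z.cdf(expected_z - z_alpha)

print(f"baseline:         {approx_power(0.4, 100):.2f}")
print(f"larger effect:    {approx_power(0.6, 100):.2f}")
print(f"larger study:     {approx_power(0.4, 200):.2f}")
print(f"looser alpha 0.2: {approx_power(0.4, 100, alpha=0.20):.2f}")
```

Each change (bigger effect, bigger study, more tolerated false positives) raises power relative to the baseline.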

FACTS ABOUT POWER

• There is always some effect for which power is high, even with a small sample size
• For a given sample size, the power to detect an effect is higher when the effect is measured by a continuous variable (eg, lab value) than a yes-no variable (eg, mortality at day 10)
• One typically wants a hypothesis-testing study to have power of 80-90%

DETERMINING SAMPLE SIZE

• A study should be large enough so that if there is an effect of a size worth knowing about, the study will demonstrate the effect
• To calculate the sample size, need
  – Effect size of interest
  – Error rates we will tolerate
  – Variability of outcome measure

COMPARISON OF CONTINUOUS OUTCOMES

• “Standardize” the effect size by dividing the effect size of interest by the expected SD
• For given power and significance level, sample size increases rapidly as the desired effect size gets smaller

TWO-SIDED 0.05 SIGNIFICANCE LEVEL

SAMPLE SIZE BY EFFECT SIZE

Effect size    0.2    0.4    0.6    0.8
80% power      800    200     92     52
90% power     1100    260    120     68
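
The pattern in these sample sizes can be approximated with a standard normal-approximation formula for comparing two means: per-group size = 2 × (z for alpha + z for power)² / (standardized effect size)². The sketch below is an illustration under that assumption, not necessarily the method used to produce the slide, and the function name is made up. Doubling its per-group result gives totals within a few percent of the slide's figures, which suggests (as an assumption) that the slide reports total sample size.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size: float, power: float = 0.80, alpha: float = 0.05) -> int:
    """Approximate subjects per group for a two-sided comparison of two means."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # 1.96 for two-sided 0.05
    z_beta = z.inv_cdf(power)           # 0.84 for 80% power
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)

for power in (0.80, 0.90):
    totals = [2 * n_per_group(d, power) for d in (0.2, 0.4, 0.6, 0.8)]
    print(f"{power:.0%} power, total n for effect sizes 0.2 to 0.8: {totals}")
```

Because effect size enters as a square in the denominator, halving the effect size roughly quadruples the required sample.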

COMPARISON OF RATES/PROPORTIONS

• Need larger sample sizes when trying to detect differences between proportions
• Reason: 0-1 data are less informative than continuous data
• Use binomial distribution rather than normal distribution
• For calculation need to specify difference of interest, expected proportion in control group, and error probabilities

two-sided 0.05 significance level

SAMPLE SIZE BY POWER AND RATES OF INTEREST

Event/success rates    pwr=0.80    pwr=0.90
0.20 vs 0.40              182         236
0.40 vs 0.60              214         278
0.10 vs 0.20              438         572
0.20 vs 0.30              626         824
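
For readers who want to see how the three inputs (difference of interest, control-group proportion, error probabilities) combine, here is a sketch using a normal approximation to the binomial with a continuity correction. This is a swapped-in textbook approximation, not an exact binomial calculation and not necessarily the slide's method. Reporting the total across two equal groups and rounding to the nearest subject are assumptions chosen because they land on or within a couple of subjects of the slide's entries; rounding up is the more conservative convention.

```python
from math import sqrt
from statistics import NormalDist

def total_sample_size(p1: float, p2: float, power: float = 0.80, alpha: float = 0.05) -> int:
    """Approximate total subjects (two equal groups) to compare two proportions."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    diff = abs(p1 - p2)
    p_bar = (p1 + p2) / 2
    # Uncorrected per-group size from the normal approximation.
    n = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
         + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / diff ** 2
    # Continuity correction for approximating discrete counts with a smooth curve.
    n_corrected = n / 4 * (1 + sqrt(1 + 4 / (n * diff))) ** 2
    return 2 * round(n_corrected)

for p1, p2 in ((0.20, 0.40), (0.40, 0.60), (0.10, 0.20), (0.20, 0.30)):
    print(f"{p1:.2f} vs {p2:.2f}: "
          f"{total_sample_size(p1, p2, 0.80)} (80% power), "
          f"{total_sample_size(p1, p2, 0.90)} (90% power)")
```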

DESCRIBING DATA

• Two basic aspects of data
  – Centrality
  – Variability
• Different measures for each
• Optimal measure depends on type of data being described

CENTRALITY

• Mean
  – Sum of observed values divided by number of observations
  – Most common measure of centrality
  – Most informative when data follow normal distribution (bell-shaped curve)
• Median
  – “middle” value: half of all observed values are smaller, half are larger
  – Best centrality measure when data are skewed
• Mode
  – Most frequently observed value

MEAN CAN MISLEAD

• Group 1 data: 1,1,1,2,3,3,5,8,20
  – Mean: 4.9 Median: 3 Mode: 1
• Group 2 data: 1,1,1,2,3,3,5,8,10
  – Mean: 3.8 Median: 3 Mode: 1
• When data sets are small, a single extreme observation will have great influence on mean, little or no influence on median
• In such cases, median is usually a more informative measure of centrality
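
The two groups above can be checked with Python's standard library. This is just a quick sketch of the arithmetic; the two groups differ only in their largest value.

```python
from statistics import mean, median, mode

group1 = [1, 1, 1, 2, 3, 3, 5, 8, 20]
group2 = [1, 1, 1, 2, 3, 3, 5, 8, 10]

for name, data in (("Group 1", group1), ("Group 2", group2)):
    print(f"{name}: mean={mean(data):.1f} median={median(data)} mode={mode(data)}")
```

Changing the single largest observation from 10 to 20 moves the mean by about one unit but leaves the median and mode where they were.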

TIME-TO-EVENT DATA

• In many experiments, outcome of interest is time to some event
  – Death
  – Resolution of disease/symptom
  – First symptom manifestation
• Such data are typically not normally distributed; tend to follow an exponential distribution
• Data may be truncated (eg, all animals sacrificed at day X, so X is longest observable time)
• Medians typically used for such data

VARIABILITY

• Most commonly used measure to describe variability is standard deviation (SD)
• SD is a function of the squared differences of each observation from the mean
• If the mean is influenced by a single extreme observation, the SD will overstate the actual variability

ALTERNATIVE TO SD

• When using median as centrality measure, can describe variability by providing range (min, max) and interquartile range (25th and 75th percentiles)
• Graphical presentation often provides best sense of variability

EXAMPLE

• Group 1 data: 1,1,1,2,3,3,5,8,20
  – Mean: 4.9 Median: 3
• Group 2 data: 1,1,1,2,3,3,5,8,10
  – Mean: 3.8 Median: 3
• SDs: group 1: 6.1 group 2: 3.3
• Interquartile range: 1, 5
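
A sketch of how these spread measures can be computed with Python's standard library. Group 2 is taken to be 1,1,1,2,3,3,5,8,10, the values for which the stated mean of 3.8 holds. Quartile conventions differ between software packages; the "inclusive" method is used here as an assumption because it gives 1 and 5 for these data, and the helper name is illustrative.

```python
from statistics import quantiles, stdev

group1 = [1, 1, 1, 2, 3, 3, 5, 8, 20]
group2 = [1, 1, 1, 2, 3, 3, 5, 8, 10]

def describe_spread(data):
    """Return sample SD, (min, max), and (25th, 75th) percentiles."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    return stdev(data), (min(data), max(data)), (q1, q3)

for name, data in (("Group 1", group1), ("Group 2", group2)):
    sd, data_range, iqr = describe_spread(data)
    print(f"{name}: SD={sd:.1f} range={data_range} IQR={iqr}")
```

The single extreme value in Group 1 nearly doubles the SD, while the interquartile range is the same for both groups.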

CONFIDENCE INTERVALS

• A confidence interval is intended to provide a sense of the variability of an estimated mean
• Can be defined as the set of possible values that includes, with specified probability, the true mean
• Confidence intervals can be constructed for any type of variable, but here we consider the most common case of a normally distributed variable

CONSTRUCTING A CONFIDENCE INTERVAL

• First, determine what level of probability should define the interval
• Second, find the normal value (or z-value) that corresponds to that probability
  – 99%: 2.58
  – 95%: 1.96
  – 90%: 1.64
• Third, multiply the z-value by the standard error of the mean
• The interval runs from the mean minus this product to the mean plus this product
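
The steps above can be sketched in a few lines. The measurements are hypothetical values invented for illustration, the standard error is taken as SD divided by the square root of the sample size, and the function name is made up. For small samples, software typically substitutes a t-value for the z-value, which this sketch does not do.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def normal_ci(data, level=0.95):
    """Confidence interval for the mean using a z-value."""
    # Steps 1 and 2: confidence level -> z-value (1.96 for 95%).
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    # Step 3: multiply the z-value by the standard error of the mean.
    margin = z * stdev(data) / sqrt(len(data))
    centre = mean(data)
    return centre - margin, centre + margin

lab_values = [5.1, 4.8, 5.6, 5.0, 4.7, 5.3, 5.2, 4.9]  # hypothetical data
for level in (0.90, 0.95, 0.99):
    low, high = normal_ci(lab_values, level)
    print(f"{level:.0%} CI: {low:.2f} to {high:.2f}")
```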

Bell-Shaped (Normal) Curve

FACTS ABOUT CONFIDENCE INTERVALS

• The more sure you want to be that the true value is included in your interval, the wider the interval will be
  – A 99% confidence interval will be wider than a 95% confidence interval
• Most common size confidence interval is 95%, but 90% and even 80% confidence intervals are sometimes used

VALUE OF CONFIDENCE INTERVALS

• Two data sets may have the same mean; but if one data set has 5 observations and the second has 500 observations, the two means convey very different amounts of information
• Confidence intervals remind us how uncertain our estimate really is
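
To put a number on the 5 versus 500 comparison, the sketch below computes the half-width of a 95% interval for each sample size. It assumes both data sets have the same SD (set to 1.0 purely for illustration) and uses a z-value for both; a t-value would make the small-sample interval wider still.

```python
from math import sqrt
from statistics import NormalDist

def ci_half_width(sd: float, n: int, level: float = 0.95) -> float:
    """Distance from the mean to either end of the confidence interval."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return z * sd / sqrt(n)

small = ci_half_width(sd=1.0, n=5)    # assumed SD, 5 observations
large = ci_half_width(sd=1.0, n=500)  # same assumed SD, 500 observations
print(f"n=5:   mean +/- {small:.2f}")
print(f"n=500: mean +/- {large:.2f}")
```

A hundredfold increase in sample size narrows the interval by a factor of ten, since the width shrinks with the square root of the number of observations.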

FINAL COMMENTS

• Statistics are only helpful if the approach taken is appropriate to the problem at hand
• Most statistical procedures are based on some assumptions about the characteristics of the data; these need to be checked
• Remember GIGO (garbage in, garbage out)
