Lecture 3 – Data Summary Measures and Graphical Display of Results
![Page 1: Lecture 3 – Data Summary Measures and Graphical Display of Results](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813acb550346895da2e737/html5/thumbnails/1.jpg)
Lecture 3 – Data Summary Measures and Graphical Display of Results
Univariate Data – Analysis of one variable at a time
![Page 2: Lecture 3 – Data Summary Measures and Graphical Display of Results](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813acb550346895da2e737/html5/thumbnails/2.jpg)
Why Think About/Explore Data?
• Done to accomplish:
– Checking for data entry errors
– Describing demographic and study characteristics
– Examining distributions of outcomes
  • Central tendency
  • Variability
– Checking for outliers
– Checking assumptions for subsequent analyses
– Giving a picture of your sample
![Page 3: Lecture 3 – Data Summary Measures and Graphical Display of Results](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813acb550346895da2e737/html5/thumbnails/3.jpg)
• To choose appropriate statistics, it is essential to know the measurement level of the outcome(s) and predictor(s).
Dependent variable = outcome
Independent variable = predictor
![Page 4: Lecture 3 – Data Summary Measures and Graphical Display of Results](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813acb550346895da2e737/html5/thumbnails/4.jpg)
Types of Data
Nominal – Qualitative Data: measured in unordered categories (summarize with %’s)
Ordinal – Qualitative Data: measured in ordered categories (summarize with %’s)
Continuous – Quantitative Data: measured on a continuum (summarize with many summary measures)
![Page 5: Lecture 3 – Data Summary Measures and Graphical Display of Results](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813acb550346895da2e737/html5/thumbnails/5.jpg)
Types of Data
Nominal – Qualitative Data: measured in unordered categories
– Race
– Blood Type
– Dead/Alive
– Gender
– On Dialysis/Not on Dialysis
Ordinal – Qualitative Data: measured in ordered categories
– Cancer Stages
– Socio-economic Status (low, med, hi)
– Likert (unlikely, somewhat unlikely, neutral, likely, very likely)
Continuous – Quantitative Data: measured on a continuum
– Serum Creatinine
– Height/Weight/BMI
– Systolic Blood Pressure
– Diastolic Blood Pressure
– Others???
![Page 6: Lecture 3 – Data Summary Measures and Graphical Display of Results](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813acb550346895da2e737/html5/thumbnails/6.jpg)
Continuous (Numerical): Measures of Location
Mean
– Arithmetic average
– Sum of values / number of values
– Nice mathematical/statistical properties
Median (a.k.a. 50th percentile)
– Value where half the sample is above and half the sample is below
– Better measure for skewed data; robust to extreme values
Mode
– Most frequently occurring value in the sample
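As a rough illustration of the three measures of location, the sketch below uses Python's standard `statistics` module. The values are made up for illustration (they are not the lecture's data), and the last value plays the role of an extreme observation.

```python
import statistics

# Hypothetical sample (not from the lecture); 50.7 acts as an extreme value.
values = [28.2, 29.0, 29.0, 31.8, 32.5, 35.9, 50.7]

mean = statistics.mean(values)      # sum of values / number of values
median = statistics.median(values)  # middle value of the sorted sample
mode = statistics.mode(values)      # most frequently occurring value

# Drop the extreme value to see how far each measure moves.
trimmed = values[:-1]
mean_shift = abs(mean - statistics.mean(trimmed))
median_shift = abs(median - statistics.median(trimmed))

print(f"mean={mean:.2f} median={median:.2f} mode={mode:.2f}")
print(f"shift after dropping 50.7: mean {mean_shift:.2f}, median {median_shift:.2f}")
```

In this sample the mean moves farther than the median when the extreme value is dropped, which is the sense in which the median is robust.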
![Page 7: Lecture 3 – Data Summary Measures and Graphical Display of Results](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813acb550346895da2e737/html5/thumbnails/7.jpg)
Continuous (Numerical): NORMAL DISTRIBUTION
![Page 8: Lecture 3 – Data Summary Measures and Graphical Display of Results](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813acb550346895da2e737/html5/thumbnails/8.jpg)
Continuous (Numerical): Measures of Variability
• Range = (maximum - minimum)
• Interquartile range = (Q3 – Q1) always covers half the sample (75th - 25th percentile)
• Variance = average of the squares of the deviations of the observations from their mean

$$\mathrm{var} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$$

• Standard deviation = $\sqrt{\text{Variance}}$
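The measures of variability above can be sketched in a few lines of Python. The data are hypothetical, and the quartile method is an assumption: statistical packages use several percentile conventions, so Q1 and Q3 may differ slightly from other software.

```python
import statistics

# Hypothetical sample, chosen only for illustration.
values = [16.0, 28.2, 29.0, 31.8, 35.9, 50.7]

data_range = max(values) - min(values)

# statistics.quantiles defaults to the "exclusive" method; other software
# may use a different convention and report slightly different quartiles.
q1, q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

# Variance written out term by term, with the n - 1 denominator.
n = len(values)
xbar = sum(values) / n
variance = sum((x - xbar) ** 2 for x in values) / (n - 1)
std_dev = variance ** 0.5

print(f"range={data_range:.1f} IQR={iqr:.2f} var={variance:.2f} sd={std_dev:.2f}")
```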
![Page 9: Lecture 3 – Data Summary Measures and Graphical Display of Results](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813acb550346895da2e737/html5/thumbnails/9.jpg)
Continuous (Numerical): NORMAL DISTRIBUTION
http://www.stattucino.com/berrie/dsl/index.html
![Page 10: Lecture 3 – Data Summary Measures and Graphical Display of Results](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813acb550346895da2e737/html5/thumbnails/10.jpg)
Describing Data using Numerical Summaries
Descriptive statistics:
Explore data in order to describe their main features
Get an initial picture of data sample
![Page 11: Lecture 3 – Data Summary Measures and Graphical Display of Results](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813acb550346895da2e737/html5/thumbnails/11.jpg)
Let’s Talk Data…
![Page 12: Lecture 3 – Data Summary Measures and Graphical Display of Results](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813acb550346895da2e737/html5/thumbnails/12.jpg)
Categorical

| Gender | N | % |
|---|---|---|
| Female | 6163 | 61.6% |
| Male | 3837 | 38.4% |

| Dialysis | N | % |
|---|---|---|
| No | 8093 | 80.9% |
| Yes | 1907 | 19.1% |

[Bar charts: Gender (Female, Male) and Dialysis (No, Yes), shown as percentages]
![Page 13: Lecture 3 – Data Summary Measures and Graphical Display of Results](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813acb550346895da2e737/html5/thumbnails/13.jpg)
Categorical

| Race | N | % |
|---|---|---|
| Black | 1942 | 19.4% |
| Hispanic | 723 | 7.2% |
| Other | 1068 | 10.7% |
| White | 6267 | 62.7% |

| Education | N | % |
|---|---|---|
| Elementary | 1491 | 14.9% |
| High School Grad | 2640 | 26.4% |
| College Grad | 3246 | 32.5% |
| Post Graduate | 2616 | 26.2% |

[Bar charts: Education (Elementary, High School Grad, College Grad, Post Graduate) and Race/Ethnicity (Black, Hispanic, Other, White), shown as percentages]
![Page 14: Lecture 3 – Data Summary Measures and Graphical Display of Results](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813acb550346895da2e737/html5/thumbnails/14.jpg)
![Page 15: Lecture 3 – Data Summary Measures and Graphical Display of Results](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813acb550346895da2e737/html5/thumbnails/15.jpg)
Continuous
![Page 16: Lecture 3 – Data Summary Measures and Graphical Display of Results](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813acb550346895da2e737/html5/thumbnails/16.jpg)
| Measure | BMI |
|---|---|
| Mean | 32.2 |
| Std Dev | 5.46 |
| Median | 31.8 |
| Minimum | 16.0 |
| Maximum | 50.7 |
| 25th Percentile | 28.2 |
| 75th Percentile | 35.9 |
| Mode | 29.0 |
![Page 17: Lecture 3 – Data Summary Measures and Graphical Display of Results](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813acb550346895da2e737/html5/thumbnails/17.jpg)
N = 115

| Measure | BMI |
|---|---|
| Mean | 32.0 |
| Std Dev | 5.34 |
| Median | 31.2 |
| Minimum | 21.8 |
| Maximum | 44.5 |
| 25th Percentile | 28.5 |
| 75th Percentile | 34.8 |
| Mode | . |
![Page 18: Lecture 3 – Data Summary Measures and Graphical Display of Results](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813acb550346895da2e737/html5/thumbnails/18.jpg)
BMI
Mean: 32.2
Std: 5.4
Median: 31.8
![Page 19: Lecture 3 – Data Summary Measures and Graphical Display of Results](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813acb550346895da2e737/html5/thumbnails/19.jpg)
Mean: 136.3
Std: 17.1
Median: 135
![Page 20: Lecture 3 – Data Summary Measures and Graphical Display of Results](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813acb550346895da2e737/html5/thumbnails/20.jpg)
Mean: 189.77
Std: 148.9
Median: 154.11
![Page 21: Lecture 3 – Data Summary Measures and Graphical Display of Results](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813acb550346895da2e737/html5/thumbnails/21.jpg)
Shape of a distribution
[Figure: three histograms (y-axis: Fraction) showing a symmetric, a right-skewed, and a left-skewed distribution]
– Symmetric
– Skewed to the right: mean greater than median (positively skewed)
– Skewed to the left: mean less than median (negatively skewed)
![Page 22: Lecture 3 – Data Summary Measures and Graphical Display of Results](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813acb550346895da2e737/html5/thumbnails/22.jpg)
Mean: 136.3
Std: 17.1
Median: 135
Skewness: 0.38
![Page 23: Lecture 3 – Data Summary Measures and Graphical Display of Results](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813acb550346895da2e737/html5/thumbnails/23.jpg)
Mean: 189.77
Std: 148.9
Median: 154.11
Skewness: 5.63
![Page 24: Lecture 3 – Data Summary Measures and Graphical Display of Results](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813acb550346895da2e737/html5/thumbnails/24.jpg)
NORMAL DISTRIBUTION
Normal Distribution – Has Excellent Statistical Properties
Many Statistical techniques require normal distributions
If the data do not follow a Normal distribution, consider alternative techniques appropriate for the data
![Page 25: Lecture 3 – Data Summary Measures and Graphical Display of Results](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813acb550346895da2e737/html5/thumbnails/25.jpg)
Box (and Whisker) Plots
• A graph of the 5 number summary with suspected outliers plotted individually
• 5 number summary: Min, Q1, Median, Q3, Max
• A line somewhere inside the box marks the Median
• IQR = Q3 – Q1
• Cases more than 1.5*IQR below Q1 or above Q3 are plotted individually (possible outliers)
• Lines from the box extend to the smallest and largest values that are not more than 1.5*IQR beyond the box
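The box plot rules can be checked numerically. The sketch below is a minimal illustration with hypothetical values; the "inclusive" quartile method is an assumption, since plotting software differs in how it computes Q1 and Q3.

```python
import statistics

# Hypothetical triglyceride-like values; 640 is a deliberately extreme case.
values = [90, 110, 120, 135, 150, 154, 160, 175, 190, 210, 640]

# Quartile conventions vary between packages; "inclusive" is one choice.
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

# Fences sit 1.5 * IQR beyond the edges of the box.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Points beyond the fences are plotted individually as possible outliers.
outliers = [v for v in values if v < lower_fence or v > upper_fence]

# Whiskers reach the most extreme values that are still inside the fences.
inside = [v for v in values if lower_fence <= v <= upper_fence]
lower_whisker, upper_whisker = min(inside), max(inside)

print(f"Q1={q1} median={median} Q3={q3} IQR={iqr}")
print(f"whiskers: {lower_whisker} to {upper_whisker}, outliers: {outliers}")
```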
![Page 26: Lecture 3 – Data Summary Measures and Graphical Display of Results](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813acb550346895da2e737/html5/thumbnails/26.jpg)
[Figure: annotated box plot labeling the median, the mean, the 25th and 75th percentiles, the 1.5 x IQR whisker limit, and an outlier]
![Page 27: Lecture 3 – Data Summary Measures and Graphical Display of Results](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813acb550346895da2e737/html5/thumbnails/27.jpg)
![Page 28: Lecture 3 – Data Summary Measures and Graphical Display of Results](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813acb550346895da2e737/html5/thumbnails/28.jpg)
[Figure: three box plots labeled Skewed to the right, Skewed to the left, and Symmetric]
![Page 29: Lecture 3 – Data Summary Measures and Graphical Display of Results](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813acb550346895da2e737/html5/thumbnails/29.jpg)
Normal Probability Plot
• Plot that can help assess normality.
• Idea: plot the observed levels of the variable against the expected levels corresponding to a Normal distribution.
• If data lie in a reasonably straight diagonal line, then assumption of Normality is reasonable.
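The idea of plotting observed against expected levels can be sketched as follows. This is an illustrative outline, not the lecture's software: the function name `normal_probability_points`, the plotting position (i - 0.5)/n, and the sample values are all assumptions.

```python
from statistics import NormalDist, mean, stdev

def normal_probability_points(values):
    """Return (expected, observed) pairs for a normal probability plot."""
    observed = sorted(values)
    n = len(observed)
    # Normal distribution with the sample's own mean and standard deviation.
    fitted = NormalDist(mean(observed), stdev(observed))
    # Plotting position (i - 0.5) / n is one common convention, assumed here.
    expected = [fitted.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(expected, observed))

# Hypothetical blood-pressure-like sample, for illustration only.
sample = [131.7, 132.5, 133.2, 135.6, 135.7, 140.2, 140.8, 141.2, 143.6]
points = normal_probability_points(sample)

for expected, observed in points:
    print(f"expected {expected:7.2f}   observed {observed:7.2f}")
```

Plotting the pairs and checking whether they fall near a straight diagonal line gives the visual assessment described above.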
![Page 30: Lecture 3 – Data Summary Measures and Graphical Display of Results](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813acb550346895da2e737/html5/thumbnails/30.jpg)
Normal Probability Plots
[Figure: normal probability plots for BMI and Triglycerides]
![Page 31: Lecture 3 – Data Summary Measures and Graphical Display of Results](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813acb550346895da2e737/html5/thumbnails/31.jpg)
Error Bar Plots
Circle denotes the mean and the bars denote the standard deviation (in this case).
![Page 32: Lecture 3 – Data Summary Measures and Graphical Display of Results](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813acb550346895da2e737/html5/thumbnails/32.jpg)
Part II – Measures of Association
(plus a little more)
![Page 33: Lecture 3 – Data Summary Measures and Graphical Display of Results](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813acb550346895da2e737/html5/thumbnails/33.jpg)
Measures of Association
• Continuous Variables
– Correlation
– Agreement (reliability)
• Categorical Variables
– Two-way layout (2×2 tables)
– “Risk” measures
– Agreement
– Others
![Page 34: Lecture 3 – Data Summary Measures and Graphical Display of Results](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813acb550346895da2e737/html5/thumbnails/34.jpg)
Two Continuous Variables
Correlation
– General sense: the relationship between two variables (quantitative or qualitative)
– Narrow (statistical) sense: measure of interdependence between two continuous random variables
• The degree to which increases or decreases in Y occur with increases or decreases in X
• Values range between -1 (perfect discordance) and 1 (perfect concordance)
• A value of 0 indicates no linear association
![Page 35: Lecture 3 – Data Summary Measures and Graphical Display of Results](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813acb550346895da2e737/html5/thumbnails/35.jpg)
Pearson Correlation
Data

| Subject # | X | Y |
|---|---|---|
| 1 | x1 | y1 |
| 2 | x2 | y2 |
| ⋮ | ⋮ | ⋮ |
| n | xn | yn |

Purpose - measures linear association between two continuous variables X and Y
![Page 36: Lecture 3 – Data Summary Measures and Graphical Display of Results](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813acb550346895da2e737/html5/thumbnails/36.jpg)
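Pearson Correlation
The Pearson (product-moment) correlation coefficient can be calculated for 2 continuous variables in a sample (regardless of distribution) using the formula:

$$\hat{\rho} = r = r_{xy} = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{N} (X_i - \bar{X})^2 \, \sum_{i=1}^{N} (Y_i - \bar{Y})^2}}$$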
![Page 37: Lecture 3 – Data Summary Measures and Graphical Display of Results](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813acb550346895da2e737/html5/thumbnails/37.jpg)
Correlation Figures
[Figure: five scatter plots of Y against X]
A. No relationship (ρ = 0)
B. Perfect positive relationship (ρ = 1)
C. Perfect negative relationship (ρ = -1)
D. Moderate positive relationship (ρ = 0.5)
E. Strong negative relationship (ρ = -0.8)
![Page 38: Lecture 3 – Data Summary Measures and Graphical Display of Results](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813acb550346895da2e737/html5/thumbnails/38.jpg)
Correlation Inference
• Easy “large sample” test for H0: ρ=0
For n ≥ 25, compute

$$\frac{1}{2}\log_e\!\left(\frac{1+\hat{\rho}}{1-\hat{\rho}}\right)$$

which has a $N\!\left(0, \frac{1}{n-3}\right)$ distribution under H0
• This test assumes $X, Y \sim N_{Biv}(\mu_X, \mu_Y, \sigma_X^2, \sigma_Y^2, \rho)$
Many times a tenuous assumption!
• Beware positive skewness & outliers
• Beware data not truly continuous
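A minimal sketch of this large-sample test follows. The function name `fisher_z_test`, the two-sided p-value, and the example inputs (r = 0.42, n = 30) are assumptions made for illustration; they are not taken from the lecture.

```python
import math
from statistics import NormalDist

def fisher_z_test(r, n):
    """Large-sample test of H0: rho = 0 using the transform above."""
    z = 0.5 * math.log((1 + r) / (1 - r))  # transformed correlation
    std_error = 1 / math.sqrt(n - 3)       # sqrt of the null variance 1/(n-3)
    statistic = z / std_error              # approximately N(0, 1) under H0
    p_value = 2 * (1 - NormalDist().cdf(abs(statistic)))  # two-sided
    return z, statistic, p_value

# Hypothetical inputs: a sample correlation of 0.42 from n = 30 pairs.
z, statistic, p_value = fisher_z_test(0.42, 30)
print(f"z={z:.3f} statistic={statistic:.2f} p={p_value:.3f}")
```

The result is only as trustworthy as the bivariate Normal assumption, so it is worth checking for skewness and outliers first.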
![Page 39: Lecture 3 – Data Summary Measures and Graphical Display of Results](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813acb550346895da2e737/html5/thumbnails/39.jpg)
Timeout: ASSUMPTIONS
• As with any mathematical or physical model, model assumptions are critical to making the correct inference
• Dealing with assumptions has led to the development of:
– Nonparametric statistics: techniques that reduce or eliminate dependence on the underlying distribution of the data
– Robust statistics: techniques that are affected little by departures from assumptions
![Page 40: Lecture 3 – Data Summary Measures and Graphical Display of Results](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813acb550346895da2e737/html5/thumbnails/40.jpg)
Correlation (resumed)
• A nonparametric version of the correlation coefficient: Spearman’s Rank Correlation
• Like ρ, rs:
– ranges from -1 to 1
– 0 no correlation, 1 perfect agreement
– only requires ordinal data

$$r_s = 1 - \frac{6 \sum_{i=1}^{n} [R(X_i) - R(Y_i)]^2}{n(n^2 - 1)}$$

where R(·) is the rank of the variable
![Page 41: Lecture 3 – Data Summary Measures and Graphical Display of Results](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813acb550346895da2e737/html5/thumbnails/41.jpg)
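Correlation Example: SBP and DBP

| SBP | DBP | R(SBP) | R(DBP) |
|---|---|---|---|
| 141.8 | 89.7 | 12 | 14 |
| 140.2 | 74.4 | 8.5 | 1 |
| 131.8 | 83.5 | 3 | 4 |
| 132.5 | 77.8 | 4 | 2 |
| 135.7 | 85.8 | 7 | 7 |
| 141.2 | 86.5 | 11 | 10 |
| 143.9 | 89.4 | 14 | 13 |
| 140.2 | 89.3 | 8.5 | 12 |
| 140.8 | 88.0 | 10 | 11 |
| 131.7 | 82.2 | 2 | 3 |
| 130.8 | 84.6 | 1 | 6 |
| 135.6 | 84.4 | 6 | 5 |
| 143.6 | 86.3 | 13 | 9 |
| 133.2 | 85.9 | 5 | 8 |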
![Page 42: Lecture 3 – Data Summary Measures and Graphical Display of Results](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813acb550346895da2e737/html5/thumbnails/42.jpg)
Correlation Example: SBP and DBP
[Figure: scatter plot of SBP (125 to 145) against DBP (70 to 90)]
• All Data: ρ = 0.42; rs = 0.71
• Outlier deleted: ρ = 0.75; rs = 0.82
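The four values above can be reproduced from the SBP/DBP table with a short sketch. The helper names are illustrative, and one point is an assumption: the slide does not say which observation is the outlier, so the code treats the second subject (SBP 140.2, DBP 74.4) as the deleted point. Spearman's coefficient uses the rank-difference formula with average ranks for ties.

```python
sbp = [141.8, 140.2, 131.8, 132.5, 135.7, 141.2, 143.9,
       140.2, 140.8, 131.7, 130.8, 135.6, 143.6, 133.2]
dbp = [89.7, 74.4, 83.5, 77.8, 85.8, 86.5, 89.4,
       89.3, 88.0, 82.2, 84.6, 84.4, 86.3, 85.9]

def pearson(x, y):
    """Product-moment correlation from sums of squares and cross-products."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def average_ranks(values):
    """Rank from 1 to n; tied values share the average of their positions."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]

def spearman(x, y):
    """Spearman's rank correlation via the 6 * sum(d^2) formula."""
    n = len(x)
    d2 = sum((rx - ry) ** 2
             for rx, ry in zip(average_ranks(x), average_ranks(y)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Assumption: the outlier is the second subject (SBP 140.2, DBP 74.4).
sbp_trim = sbp[:1] + sbp[2:]
dbp_trim = dbp[:1] + dbp[2:]

print(f"all data:        r = {pearson(sbp, dbp):.2f}, rs = {spearman(sbp, dbp):.2f}")
print(f"outlier deleted: r = {pearson(sbp_trim, dbp_trim):.2f}, "
      f"rs = {spearman(sbp_trim, dbp_trim):.2f}")
```

The rank-based coefficient changes far less than the Pearson coefficient when the single point is removed.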
![Page 43: Lecture 3 – Data Summary Measures and Graphical Display of Results](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813acb550346895da2e737/html5/thumbnails/43.jpg)
Questions – Correlation Coefficient
1. Can we calculate a correlation coefficient between the incomes of a group of people and what city they live in?
No, we cannot, since city is a categorical variable. Correlation requires that both variables be quantitative.
![Page 44: Lecture 3 – Data Summary Measures and Graphical Display of Results](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813acb550346895da2e737/html5/thumbnails/44.jpg)
Questions – Correlation Coefficient
2. Does it change the correlation between height and weight if we measure height in inches rather than centimeters and weight in pounds rather than kilograms?
No. Because ρ (and r) uses the standardized values of the observations, ρ does not change when we change the units of measurement of x, y, or both. The correlation ρ itself has no unit of measure; it is just a number.
![Page 45: Lecture 3 – Data Summary Measures and Graphical Display of Results](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813acb550346895da2e737/html5/thumbnails/45.jpg)
Question – Correlation Coefficient
3. Does ρ = 0 mean there is no relationship between X and Y?
Not necessarily. Correlation only measures the strength of the linear relationship between two variables. Correlation does not describe nonlinear relationships between two variables, no matter how strong they are.
[Figure: scatter plot of y against x]
![Page 46: Lecture 3 – Data Summary Measures and Graphical Display of Results](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813acb550346895da2e737/html5/thumbnails/46.jpg)
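Correlation and Regression
[Figure: two scatter plots of Y against X: moderate positive relationship (ρ = 0.5) and strong negative relationship (ρ = -0.8)]
Y = α+βX

$$\hat{\beta} = \hat{\rho}\,\frac{\sigma_Y}{\sigma_X} = \hat{\rho}\,\frac{\sqrt{\sum_i (Y_i - \bar{Y})^2 / (n-1)}}{\sqrt{\sum_i (X_i - \bar{X})^2 / (n-1)}}$$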
![Page 47: Lecture 3 – Data Summary Measures and Graphical Display of Results](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813acb550346895da2e737/html5/thumbnails/47.jpg)
Correlation and Regression
SBP and DBP example (continued)
[Figure: scatter plot of SBP (125 to 145) against DBP (70 to 90)]
σSBP = 4.9 (mmHg)
σDBP = 3.3 (mmHg)
ρ = 0.75
SBP = 40.1 + 1.12×DBP, with slope ≈ 0.75 × 4.9/3.3
DBP = 16.3 + 0.51×SBP, with slope ≈ 0.75 × 3.3/4.9
![Page 48: Lecture 3 – Data Summary Measures and Graphical Display of Results](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813acb550346895da2e737/html5/thumbnails/48.jpg)
Correlation and Covariance
• Suppose two random variables, X and Y: $E(X) = \mu_X$, $V(X) = \sigma_X^2$; $E(Y) = \mu_Y$, $V(Y) = \sigma_Y^2$; and Corr(X,Y) = ρ
• Define $\mathrm{Cov}(X,Y) = E[(X-\mu_X)(Y-\mu_Y)]$
Note: $\mathrm{Cov}(X,X) = E[(X-\mu_X)(X-\mu_X)] = E(X-\mu_X)^2 = \sigma_X^2$
• Population correlation (ρ) is defined as:

$$\rho = \frac{E[(X-\mu_X)(Y-\mu_Y)]}{\sigma_X \sigma_Y} = \frac{\mathrm{Cov}(X,Y)}{\sigma_X \sigma_Y}$$

• Thus $\mathrm{Cov}(X,Y) = \rho\,\sigma_X \sigma_Y$
![Page 49: Lecture 3 – Data Summary Measures and Graphical Display of Results](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813acb550346895da2e737/html5/thumbnails/49.jpg)
Correlation and Covariance
What’s the big deal about covariance?
Use it to find the variance of functions of random variables, e.g.:

$$V(X+Y) = \sigma_X^2 + \sigma_Y^2 + 2\,\mathrm{Cov}(X,Y)$$

$$V(X-Y) = \sigma_X^2 + \sigma_Y^2 - 2\,\mathrm{Cov}(X,Y)$$

In general:

$$V(aX+bY) = a^2\sigma_X^2 + b^2\sigma_Y^2 + 2ab\,\mathrm{Cov}(X,Y)$$
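These identities also hold exactly for sample variances and covariances, which makes them easy to check numerically. The sketch below uses the first five SBP/DBP pairs from the earlier example; the helper `covariance` and the constants a = 2, b = -3 are arbitrary choices for illustration.

```python
from statistics import mean, variance

def covariance(x, y):
    """Sample covariance with the same n - 1 denominator as variance()."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

x = [141.8, 140.2, 131.8, 132.5, 135.7]
y = [89.7, 74.4, 83.5, 77.8, 85.8]
a, b = 2.0, -3.0  # arbitrary constants for the general formula

var_x, var_y, cov_xy = variance(x), variance(y), covariance(x, y)

# Left-hand sides: variance of the combined variable computed directly.
lhs_sum = variance([xi + yi for xi, yi in zip(x, y)])
lhs_diff = variance([xi - yi for xi, yi in zip(x, y)])
lhs_general = variance([a * xi + b * yi for xi, yi in zip(x, y)])

# Right-hand sides: the covariance formulas.
rhs_sum = var_x + var_y + 2 * cov_xy
rhs_diff = var_x + var_y - 2 * cov_xy
rhs_general = a ** 2 * var_x + b ** 2 * var_y + 2 * a * b * cov_xy

print(f"V(X+Y):   {lhs_sum:.4f} vs {rhs_sum:.4f}")
print(f"V(X-Y):   {lhs_diff:.4f} vs {rhs_diff:.4f}")
print(f"V(aX+bY): {lhs_general:.4f} vs {rhs_general:.4f}")
```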
![Page 50: Lecture 3 – Data Summary Measures and Graphical Display of Results](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813acb550346895da2e737/html5/thumbnails/50.jpg)
Correlation as Agreement

| SBP1 | SBP2 |
|---|---|
| 141.8 | 139.7 |
| 140.2 | 144.4 |
| 131.8 | 133.5 |
| 132.5 | 127.8 |
| 135.7 | 135.8 |
| 141.2 | 146.5 |
| 143.9 | 139.4 |
| 140.2 | 139.3 |
| 140.8 | 138.0 |
| 131.7 | 132.2 |
| 130.8 | 134.6 |
| 135.6 | 134.4 |
| 143.6 | 146.3 |
| 133.2 | 135.9 |

Suppose two nurses are measuring SBP in the same patients and each nurse measures SBP 3 times in each patient.
![Page 51: Lecture 3 – Data Summary Measures and Graphical Display of Results](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813acb550346895da2e737/html5/thumbnails/51.jpg)
Correlation as Agreement
• Could use Pearson correlation
• Another measure, intraclass correlation
– Can separate the variance into two sources: between-subject and within-subject
– The intraclass correlation is the ratio of the between-subject variance to the total (i.e., within + between)
– By definition, intraclass correlation ranges from 0 to 1
– Best measure of the “individual” touch
• In SBP example:
ρ(Pearson) = 0.809
ρ(Intraclass) = 0.814
![Page 52: Lecture 3 – Data Summary Measures and Graphical Display of Results](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813acb550346895da2e737/html5/thumbnails/52.jpg)
Things to Remember About Correlation
• 5 warnings (adapted from Huck):
1. Does not speak to cause-and-effect
2. Beware outliers
3. Assumes linear relationship
4. Correlation vs. Independence: zero correlation implies independence for the (bivariate) Normal distribution only
5. Strength of relationship WRT trend
![Page 53: Lecture 3 – Data Summary Measures and Graphical Display of Results](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813acb550346895da2e737/html5/thumbnails/53.jpg)
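Categorical Outcomes: Two-way Tables
• Prospective Design: Relative Risk (RR)

$$RR = \frac{P(\text{Disease in Exposed Group})}{P(\text{Disease in Unexposed Group})} = \frac{P(D|E)}{P(D|\bar{E})}$$

• Retrospective Design: Odds Ratio (OR)

$$OR = \frac{P(\text{Exposure in Cases}) \,/\, [1 - P(\text{Exposure in Cases})]}{P(\text{Exposure in Controls}) \,/\, [1 - P(\text{Exposure in Controls})]} = \frac{P(E|D) \,/\, P(\bar{E}|D)}{P(E|\bar{D}) \,/\, P(\bar{E}|\bar{D})} = \frac{P(E|D)\,P(\bar{E}|\bar{D})}{P(\bar{E}|D)\,P(E|\bar{D})}$$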
![Page 54: Lecture 3 – Data Summary Measures and Graphical Display of Results](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813acb550346895da2e737/html5/thumbnails/54.jpg)
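Two-way Tables

| | Disease: Yes | Disease: No | Total |
|---|---|---|---|
| Exposure: Yes | a | b | a+b |
| Exposure: No | c | d | c+d |
| Total | a+c | b+d | n=a+b+c+d |

Prospective:
$P(D|E) = a/(a+b)$
$P(D|\bar{E}) = c/(c+d)$

$$RR = \frac{a/(a+b)}{c/(c+d)} = \frac{ac+ad}{ac+bc}$$

Retrospective:
$P(E|D) = a/(a+c)$
$P(E|\bar{D}) = b/(b+d)$

$$OR_E = \frac{\dfrac{a}{a+c} \cdot \dfrac{d}{b+d}}{\dfrac{c}{a+c} \cdot \dfrac{b}{b+d}} = \frac{ad}{bc}$$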
![Page 55: Lecture 3 – Data Summary Measures and Graphical Display of Results](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813acb550346895da2e737/html5/thumbnails/55.jpg)
Two-way Tables
• Prospective design and relative risk (RR) are optimal
• Retrospective designs and odds ratio (OR) are easiest (cheapest)
• Can compute OR for prospective design

$$OR_D = \frac{\dfrac{a}{a+b} \cdot \dfrac{d}{c+d}}{\dfrac{b}{a+b} \cdot \dfrac{c}{c+d}} = \frac{ad}{bc}$$
![Page 56: Lecture 3 – Data Summary Measures and Graphical Display of Results](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813acb550346895da2e737/html5/thumbnails/56.jpg)
Two-way Table
• Why we like the odds ratio…
The exposure odds ratio is equivalent to the disease odds ratio!

$$OR_D = \frac{\dfrac{a}{a+b} \cdot \dfrac{d}{c+d}}{\dfrac{b}{a+b} \cdot \dfrac{c}{c+d}} = \frac{ad}{bc} = \frac{\dfrac{a}{a+c} \cdot \dfrac{d}{b+d}}{\dfrac{b}{b+d} \cdot \dfrac{c}{a+c}} = OR_E$$

• Regardless of study design (i.e., which margin is fixed) the estimate of the OR is the same
![Page 57: Lecture 3 – Data Summary Measures and Graphical Display of Results](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813acb550346895da2e737/html5/thumbnails/57.jpg)
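Two-way Tables

| | Cancer: Yes | Cancer: No | Total |
|---|---|---|---|
| Smoke: Yes | 35 | 25 | 60 |
| Smoke: No | 5 | 35 | 40 |
| Total | 40 | 60 | 100 |

$$RR = \frac{35 \cdot 5 + 35 \cdot 35}{35 \cdot 5 + 25 \cdot 5} = 4.7$$

$$OR = \frac{35 \cdot 35}{25 \cdot 5} = 9.8$$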
![Page 58: Lecture 3 – Data Summary Measures and Graphical Display of Results](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813acb550346895da2e737/html5/thumbnails/58.jpg)
Two-way Table
Why we like the odds ratio – Part II
• For retrospective design, if…
– Cases are representative of the population of all cases
– Controls are representative of the population of all controls
– The disease is “rare” (i.e., prevalence <20%)
Then OR ≈ RR
![Page 59: Lecture 3 – Data Summary Measures and Graphical Display of Results](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813acb550346895da2e737/html5/thumbnails/59.jpg)
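Two-way Tables

| | Cancer: Yes | Cancer: No | Total |
|---|---|---|---|
| Smoke: Yes | 75 | 325 | 400 |
| Smoke: No | 25 | 575 | 600 |
| Total | 100 | 900 | 1000 |

$$RR = \frac{75 \cdot 25 + 75 \cdot 575}{75 \cdot 25 + 325 \cdot 25} = 4.5$$

$$OR = \frac{75 \cdot 575}{325 \cdot 25} = 5.3$$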
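Both smoking and cancer tables can be run through the same two formulas to see how the gap between OR and RR narrows as the disease becomes less common. This is a minimal sketch; the function names are illustrative and the cell order (a, b, c, d) follows the two-way table layout with exposure in rows.

```python
def relative_risk(a, b, c, d):
    """RR = P(D|E) / P(D|not E) for a 2x2 table with exposure in rows."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """OR = ad / bc, the same whichever margin is fixed."""
    return (a * d) / (b * c)

tables = {
    "40 cancers per 100 subjects": (35, 25, 5, 35),
    "100 cancers per 1000 subjects": (75, 325, 25, 575),
}

for label, cells in tables.items():
    rr = relative_risk(*cells)
    odds = odds_ratio(*cells)
    print(f"{label}: RR = {rr:.1f}, OR = {odds:.1f}")
```

With the disease in 40% of subjects the OR is roughly double the RR; at 10% the two are much closer.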
![Page 60: Lecture 3 – Data Summary Measures and Graphical Display of Results](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813acb550346895da2e737/html5/thumbnails/60.jpg)
Other Measures From Clinical Trials

| Treatment | Outcome: Yes | Outcome: No | Total |
|---|---|---|---|
| Experimental | 15 | 135 | 150 |
| Control | 100 | 150 | 250 |
| Total | 115 | 285 | 400 |

P(O|E) = 15/150 = 0.1
P(O|C) = 100/250 = 0.4
RR = P(O|E)/P(O|C) = 0.25
• Absolute Risk Reduction (ARR) = P(O|C) - P(O|E) = 0.3
• Relative Risk Reduction (RRR) = 1 – RR = 0.75
• Number Needed to Treat (NNT) = 1/ARR = 3.33 (number needed to treat in the population to prevent 1 outcome event)
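The clinical trial measures follow directly from the two event proportions. The sketch below recomputes them from the table counts; the variable names are illustrative, and rounding NNT up to a whole patient is a common reporting convention rather than something stated on the slide.

```python
import math

# Counts from the clinical trial table above.
events_experimental, n_experimental = 15, 150
events_control, n_control = 100, 250

p_experimental = events_experimental / n_experimental  # P(O|E)
p_control = events_control / n_control                 # P(O|C)

rr = p_experimental / p_control   # relative risk
arr = p_control - p_experimental  # absolute risk reduction
rrr = 1 - rr                      # relative risk reduction
nnt = 1 / arr                     # number needed to treat

print(f"RR={rr:.2f} ARR={arr:.2f} RRR={rrr:.2f} NNT={nnt:.2f}")
# Assumed convention: report NNT rounded up to a whole number of patients.
print(f"NNT rounded up: {math.ceil(nnt)}")
```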
![Page 61: Lecture 3 – Data Summary Measures and Graphical Display of Results](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813acb550346895da2e737/html5/thumbnails/61.jpg)
Things to Remember About Measures of Association
1. Beware: some sources use “odds ratio” and “relative risk” interchangeably
– In most settings, OR overestimates RR
2. Be on guard when considering ARR, RRR, and NNT
– Almost never see a SE or CI estimate
– Should be based on large, well planned, prospective studies
![Page 62: Lecture 3 – Data Summary Measures and Graphical Display of Results](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813acb550346895da2e737/html5/thumbnails/62.jpg)
Categorical Measures of Agreement
• The “kappa” coefficient or κ
• Example: two physicians diagnosing a disease

| | Doctor B: Disease | Doctor B: No Disease | Total |
|---|---|---|---|
| Doctor A: Disease | pa | pb | pA |
| Doctor A: No Disease | pc | pd | qA |
| Total | pB | qB | 1 |

Here pa, pb, pc, pd are the proportions of subjects, not the number of subjects.

$$\hat{\kappa} = \frac{2(p_a p_d - p_b p_c)}{p_A q_B + p_B q_A}$$
![Page 63: Lecture 3 – Data Summary Measures and Graphical Display of Results](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813acb550346895da2e737/html5/thumbnails/63.jpg)
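Kappa Example

| | Psychiatrist B: Neurosis | Psychiatrist B: Normal | Total |
|---|---|---|---|
| Psychiatrist A: Neurosis | 0.04 | 0.06 | 0.10 |
| Psychiatrist A: Normal | 0.01 | 0.89 | 0.90 |
| Total | 0.05 | 0.95 | 1.00 |

$$\hat{\kappa} = \frac{2(0.04 \times 0.89 - 0.06 \times 0.01)}{0.10 \times 0.95 + 0.05 \times 0.90} = 0.50$$

• Kappa is a categorical analog of the intraclass correlation
• Kappa can be computed for any “square” (k×k) tables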
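The psychiatrist example can be recomputed in two ways: with the 2×2 formula from the previous slide, and with the observed-versus-chance-agreement form of kappa, (observed - expected) / (1 - expected). The second form is not shown in the lecture; it is included here as a cross-check, and the function names are illustrative.

```python
def kappa_2x2(pa, pb, pc, pd):
    """Kappa from cell proportions using the 2x2 formula above."""
    p_a_margin, q_a_margin = pa + pb, pc + pd  # rater A: disease, no disease
    p_b_margin, q_b_margin = pa + pc, pb + pd  # rater B: disease, no disease
    return 2 * (pa * pd - pb * pc) / (
        p_a_margin * q_b_margin + p_b_margin * q_a_margin)

def kappa_from_agreement(pa, pb, pc, pd):
    """Kappa as (observed - chance agreement) / (1 - chance agreement)."""
    observed = pa + pd
    expected = (pa + pb) * (pa + pc) + (pc + pd) * (pb + pd)
    return (observed - expected) / (1 - expected)

cells = (0.04, 0.06, 0.01, 0.89)  # proportions from the psychiatrist table
print(f"2x2 formula:    {kappa_2x2(*cells):.2f}")
print(f"agreement form: {kappa_from_agreement(*cells):.2f}")
```

The raters agree on 93% of subjects, yet kappa is only 0.50 because most of that agreement is expected by chance when one category is so common.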
![Page 64: Lecture 3 – Data Summary Measures and Graphical Display of Results](https://reader036.fdocuments.us/reader036/viewer/2022062314/56813acb550346895da2e737/html5/thumbnails/64.jpg)
Schedule