Ch2. Contingency Tables 2 - KOCW (contents.kocw.net/KOCW/document/2015/gachon/kimnamhyoung1/3.pdf)
-
Ch2. Contingency Tables_2
Namhyoung Kim
Dept. of Applied Statistics
Gachon University
1
-
2.3 The Odds Ratio
• For a probability of success π, odds = π/(1 − π) = prob. of success / prob. of failure
• The odds are nonnegative
• π = odds / (odds + 1)
• In 2x2 tables, odds1 = π1/(1 − π1) and odds2 = π2/(1 − π2)
• The odds ratio θ: another measure of association
θ = odds1/odds2 = [π1/(1 − π1)] / [π2/(1 − π2)]
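These definitions are easy to verify numerically. A minimal Python sketch (the probabilities 0.2 and 0.3 are illustrative, not from the text):

```python
def odds(p):
    """Odds of success: p / (1 - p)."""
    return p / (1 - p)

def odds_ratio(p1, p2):
    """Odds ratio theta comparing group 1 to group 2 in a 2x2 table."""
    return odds(p1) / odds(p2)

# Recovering the probability from the odds: p = odds / (odds + 1)
p = 0.2
assert abs(odds(p) / (odds(p) + 1) - p) < 1e-12

# Equal success probabilities give odds ratio 1 (independence)
print(odds_ratio(0.3, 0.3))  # 1.0
```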
2
-
Properties of the Odds Ratio
3
• The odds ratio can equal any nonnegative number.
• When X and Y are independent, π1 = π2, so odds1 = odds2 and θ = 1
• When θ > 1, the odds of success are higher in row 1 than in row 2.
• Values of θ farther from 1.0 in a given direction represent stronger association.
-
Properties of the Odds Ratio
4
• Two values of θ that are reciprocals of each other represent the same strength of association, but in opposite directions: θ = 0.25 is equivalent to θ = 1/0.25 = 4
• The odds ratio does not change value when the table orientation reverses: it is unnecessary to identify one classification as a response variable in order to estimate θ (cf. the relative risk requires this)
-
Properties of the Odds Ratio
5
• When both variables are response variables,
θ = (π11/π12) / (π21/π22) = π11π22 / (π12π21)
• The odds ratio is also called the cross-product ratio.
• The sample odds ratio
θ̂ = [p1/(1 − p1)] / [p2/(1 − p2)] = (n11/n12) / (n21/n22) = n11n22 / (n12n21)
• This is the ML estimator of θ
-
Example: Odds Ratio for Aspirin Use and Heart Attacks
• For the physicians taking placebo, the estimated odds of MI: n11/n12 = 189/10845 = 0.0174
• For those taking aspirin: 104/10933 = 0.0095
• The sample odds ratio θ̂ = 0.0174/0.0095 = 1.832
• The estimated odds of MI were 83% higher for the placebo group
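The computation can be checked directly from the counts quoted on this slide (a sketch; the variable names are mine):

```python
# 2x2 counts: rows = (placebo, aspirin), columns = (MI yes, MI no)
n11, n12 = 189, 10845   # placebo group
n21, n22 = 104, 10933   # aspirin group

odds_placebo = n11 / n12                  # ~0.0174
odds_aspirin = n21 / n22                  # ~0.0095
theta_hat = (n11 * n22) / (n12 * n21)     # cross-product form

print(round(theta_hat, 3))  # 1.832
```

The cross-product form gives the same value as the ratio of the two odds, which is the identity shown on the previous slide.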
6
-
Inference for Odds Ratios and Log Odds Ratios
• Unless the sample size is extremely large, the sampling distribution of the odds ratio is highly skewed (positively skewed, i.e., skewed to the right)
• Because of this skewness, use an alternative but equivalent measure, log(θ)
• Independence corresponds to log(θ) = 0
• The log odds ratio is symmetric about zero
7
-
Inference for Odds Ratios and Log Odds Ratios
• Its approximating normal dist. has a mean of log(θ) and a standard error
SE = √(1/n11 + 1/n12 + 1/n21 + 1/n22)
• C.I. for log(θ): log θ̂ ± z_{α/2} (SE)
• Exponentiating the endpoints of this C.I. yields one for θ
8
-
Inference for Odds Ratios and Log Odds Ratios
• For Table 2.3, log(1.832) = 0.605
• SE = √(1/189 + 1/10933 + 1/104 + 1/10845) = 0.123
• A 95% C.I. for log θ equals 0.605 ± 1.96(0.123), or (0.365, 0.846)
• The corresponding C.I. for θ is [exp(0.365), exp(0.846)] = (1.44, 2.33)
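The interval can be reproduced step by step with only the standard library (a sketch; 1.96 is the familiar 97.5th normal percentile):

```python
import math

n11, n12, n21, n22 = 189, 10845, 104, 10933   # Table 2.3 counts

theta_hat = (n11 * n22) / (n12 * n21)
log_theta = math.log(theta_hat)                   # ~0.605
se = math.sqrt(1/n11 + 1/n12 + 1/n21 + 1/n22)     # ~0.123

lo = log_theta - 1.96 * se                        # CI endpoints for log(theta)
hi = log_theta + 1.96 * se
ci = (math.exp(lo), math.exp(hi))                 # exponentiate to get CI for theta
print(round(se, 3), round(ci[0], 2), round(ci[1], 2))  # 0.123 1.44 2.33
```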
9
-
Inference for Odds Ratios and Log Odds Ratios
• The sample odds ratio θ̂ equals 0 or ∞ if any nij = 0, and it is undefined if both entries in a row or column are zero.
• The slightly amended estimator
θ̃ = (n11 + 0.5)(n22 + 0.5) / [(n12 + 0.5)(n21 + 0.5)]
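A small sketch of this correction (the zero-cell counts below are hypothetical):

```python
def amended_odds_ratio(n11, n12, n21, n22):
    """Add 0.5 to every cell so the estimate stays finite when a cell is 0."""
    return ((n11 + 0.5) * (n22 + 0.5)) / ((n12 + 0.5) * (n21 + 0.5))

# With n11 = 0 the plain estimator n11*n22/(n12*n21) would be 0;
# the amended estimator is still a finite positive number
print(amended_odds_ratio(0, 10, 5, 10) > 0)  # True
```

For large counts the 0.5 correction barely changes the estimate; the aspirin counts give about 1.83 either way.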
10
-
Relationship Between Odds Ratio and Relative Risk
• Odds ratio = [π1/(1 − π1)] / [π2/(1 − π2)] = relative risk × (1 − π2)/(1 − π1)
• When π1 and π2 are both close to zero, the fraction in the last term of this expression equals approximately 1.0, so the odds ratio and relative risk take similar values
• For Table 2.3, the sample odds ratio of 1.83 is similar to the sample relative risk of 1.82
• In such a case, an odds ratio of 1.83 does mean that π1 is approximately 1.83 times π2
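The closeness of the two measures for the aspirin data can be checked directly (a sketch using the Table 2.3 counts):

```python
n11, n12, n21, n22 = 189, 10845, 104, 10933   # placebo row, aspirin row

p1 = n11 / (n11 + n12)          # P(MI | placebo), close to zero
p2 = n21 / (n21 + n22)          # P(MI | aspirin), close to zero

relative_risk = p1 / p2
odds_ratio = (p1 / (1 - p1)) / (p2 / (1 - p2))
print(round(relative_risk, 2), round(odds_ratio, 2))  # 1.82 1.83
```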
11
-
The Odds Ratio Applies in Case-Control Studies
• The marginal dist. of MI is fixed by the sampling design (each case was matched with two control patients)
• The outcome measured for each subject is whether she was a smoker
• The study, which uses a retrospective design to look into the past, is called a case-control study; such studies are common in health-related applications
12
-
The Odds Ratio Applies in Case-Control Studies
• Estimate the conditional distribution of smoking status, given MI status:
– for women suffering MI, 172/262 = 0.656
– for women who had not suffered MI, 173/519 = 0.333
• The sample odds ratio is [0.656/(1 − 0.656)] / [0.333/(1 − 0.333)] = (172 × 346)/(173 × 90) = 3.8
• If we expect P(Y = 1 | X) to be small, we can use the sample odds ratio as a rough indication of the relative risk: women who had ever smoked were about four times as likely to suffer MI as women who had never smoked.
13
-
Types of Observational Studies
• retrospective design: case-control study
• prospective design: cohort study, clinical trials
• cross-sectional design
• Observational study: case-control, cohort, and cross-sectional designs
• Experimental study: a clinical trial
14
-
2.4 Chi-Squared Tests of Independence
• Consider the null hypothesis (H0) that the cell probabilities equal certain fixed values {πij}
• For a sample size n with cell counts {nij}, the values {μij = nπij} are expected frequencies.
• To judge whether the data contradict H0, we compare {nij} to {μij}
• The larger the differences {nij − μij}, the stronger the evidence against H0.
15
-
Pearson Statistics and the Chi-squared Distribution
• The Pearson chi-squared statistic for testing H0:
X² = Σ (nij − μij)² / μij
• This statistic takes its minimum value of zero when all nij = μij
• For a fixed sample size, greater differences {nij − μij} produce larger X² values and stronger evidence against H0
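As a minimal sketch of the statistic for a single multinomial with hypothesized probabilities (the counts and probabilities below are illustrative, not from the text):

```python
counts = [30, 50, 20]        # observed cell counts, n = 100 (hypothetical)
pi0 = [0.25, 0.50, 0.25]     # fixed cell probabilities under H0 (hypothetical)
n = sum(counts)

expected = [n * p for p in pi0]                    # mu_j = n * pi_j
x2 = sum((o - e) ** 2 / e for o, e in zip(counts, expected))
print(x2)  # 2.0
```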
16
-
Pearson Statistics and the Chi-squared Distribution
• The X² statistic has approximately a chi-squared distribution, for large n.
17
-
Pearson Statistics and the Chi-squared Distribution
• The chi-squared approximation improves as {μij} increase, and {μij ≥ 5} is usually sufficient
• The chi-squared dist. is concentrated over nonnegative values.
• It has mean equal to its degrees of freedom (df), and its standard deviation equals √(2·df)
• The distribution is skewed to the right, but it becomes more bell-shaped (normal) as df increases.
• The df value equals the difference between the number of parameters in the alternative hypothesis and in the null hypothesis.
18
-
Likelihood-Ratio Statistic
• likelihood function: the probability of the data, viewed as a function of the parameter once the data are observed
• The likelihood-ratio method for significance tests uses as test statistic the ratio of the maximized likelihoods:
−2 log(maximum likelihood when parameters satisfy H0 / maximum likelihood when parameters are unrestricted)
• For two-way contingency tables with the multinomial dist., the likelihood-ratio statistic simplifies to
G² = 2 Σ nij log(nij / μij)
• This statistic is called the likelihood-ratio chi-squared statistic.
19
-
Tests of Independence
• The null hypothesis of statistical independence is H0: πij = πi+ π+j for all i and j
• The expected frequency μij = nπij = nπi+ π+j
• Estimated expected frequencies: μ̂ij = n pi+ p+j = n (ni+/n)(n+j/n) = ni+ n+j / n
20
-
Tests of Independence
• For testing independence in I×J contingency tables, the Pearson and likelihood-ratio statistics equal
X² = Σ (nij − μ̂ij)² / μ̂ij,  G² = 2 Σ nij log(nij / μ̂ij)
• Their large-sample chi-squared dists. have df = (I − 1)(J − 1)
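A compact sketch of both statistics for a hypothetical 2×3 table (the counts are illustrative, not from the text):

```python
import math

table = [[40, 30, 30],
         [20, 30, 50]]        # hypothetical 2x3 counts

row = [sum(r) for r in table]              # row totals n_{i+}
col = [sum(c) for c in zip(*table)]        # column totals n_{+j}
n = sum(row)

# Estimated expected frequencies under independence: n_{i+} n_{+j} / n
mu = [[ri * cj / n for cj in col] for ri in row]

cells = [(i, j) for i in range(2) for j in range(3)]
x2 = sum((table[i][j] - mu[i][j]) ** 2 / mu[i][j] for i, j in cells)
g2 = 2 * sum(table[i][j] * math.log(table[i][j] / mu[i][j]) for i, j in cells)
df = (2 - 1) * (3 - 1)
print(round(x2, 2), round(g2, 2), df)  # 11.67 11.85 2
```

As the text notes, X² and G² are close but not identical in finite samples; both are referred to the chi-squared distribution with df = (I − 1)(J − 1).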
21
-
Example: Gender Gap in Political Affiliation
22
-
Example: Gender Gap in Political Affiliation
23
-
Residuals for Cells in a Contingency Table
• For the test of independence, a useful cell residual is
(nij − μ̂ij) / √( μ̂ij (1 − pi+)(1 − p+j) )
• The ratio is called a standardized residual.
• When H0 is true, each standardized residual has a large-sample standard normal distribution.
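A sketch of the standardized residual for one cell of a hypothetical table (the counts are illustrative; under H0 a |residual| above about 2 flags a cell that deviates from independence):

```python
import math

table = [[40, 30, 30],
         [20, 30, 50]]        # hypothetical counts
row = [sum(r) for r in table]
col = [sum(c) for c in zip(*table)]
n = sum(row)

def std_residual(i, j):
    """(n_ij - mu_ij) / sqrt(mu_ij (1 - p_{i+}) (1 - p_{+j}))."""
    mu = row[i] * col[j] / n
    return (table[i][j] - mu) / math.sqrt(
        mu * (1 - row[i] / n) * (1 - col[j] / n))

print(round(std_residual(0, 0), 2))  # 3.09
```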
24
-
Residuals for Cells in a Contingency Table
• Positive residuals for female Democrats and male Republicans indicate more female Democrats and male Republicans than the hypothesis of independence predicts
25
-
Partitioning Chi-Squared
• One chi-squared statistic with df1, plus a separate, independent chi-squared statistic with df2, sums to a chi-squared statistic with df1 + df2
– For example, suppose we have two 2x3 tables; then the sum of the X² or G² values from the two tables is a chi-squared statistic with df = 2 + 2 = 4
26
-
Partitioning Chi-Squared
• Chi-squared statistics having df > 1 can be broken into components with fewer degrees of freedom.
– For testing independence in 2xJ tables, df = (J − 1), and the chi-squared statistic can partition into J − 1 components
27
-
Comments About Chi-Squared Tests
• Limitations:
– merely indicate the degree of evidence for an association
– require large samples
– treat both classifications as nominal
28