Post on 03-Jun-2018
8/12/2019 Inferences on Two-Way Contingency Tables
INFERENCES ON TWO-WAY
CONTINGENCY TABLES
DIFFERENCE OF PROPORTIONS
Suppose $\pi_i$ denotes the (conditional) probability
of success for row $i$. Then the difference of
proportions ($\pi_i - \pi_j$) compares the success
probabilities in the two rows, $i$ and $j$.
Note: $-1 \le \pi_i - \pi_j \le 1$
DIFFERENCE OF PROPORTIONS
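$p_i - p_j$ estimates the true difference $\pi_i - \pi_j$.
$SE(p_i - p_j) = \sqrt{\dfrac{p_i(1 - p_i)}{n_{i+}} + \dfrac{p_j(1 - p_j)}{n_{j+}}}$, where $n_{i+}$ is the total count in row $i$
[Large Sample] $(1 - \alpha)100\%$ CI: $(p_i - p_j) \pm z_{\alpha/2}\, SE(p_i - p_j)$ [due to Wald]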
EXAMPLE # 1
The following table is from a report on the relationship between
aspirin use and myocardial infarction (heart attacks) by the
Physicians' Health Study Research Group at Harvard Medical School.
The Physicians' Health Study was a five-year randomized study
testing whether regular intake of aspirin
reduces mortality from cardiovascular disease. Every other day,
the male physicians participating in the study took either one
aspirin tablet or a placebo. The study was blind: the physicians
in the study did not know which type of pill they were taking.
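EXAMPLE # 1
[Table: group (placebo, aspirin) cross-classified by myocardial infarction (yes, no); not reproduced in this text.]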
EXAMPLE # 1
(a) Estimate the probability of suffering myocardial
infarction (MI) for both placebo and aspirin groups.
(b) Construct a 95% CI for the true difference of
probabilities of heart attack between male physicians who
took placebo and those who took aspirin. From this,
determine whether aspirin is effective in reducing the risk of
heart attack.
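A minimal sketch of parts (a) and (b), using the large-sample Wald interval for a difference of proportions. The counts below (189 heart attacks among 11,034 placebo takers, 104 among 11,037 aspirin takers) are an assumption taken from the commonly cited summary of this study, since the slide's table is not reproduced in this text. The function name is illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def diff_of_proportions_ci(x1, n1, x2, n2, level=0.95):
    """Wald CI for p1 - p2 from two independent binomial samples."""
    p1, p2 = x1 / n1, x2 / n2
    # Standard error of p1 - p2 under independent binomial sampling.
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    diff = p1 - p2
    return p1, p2, diff, (diff - z * se, diff + z * se)


# ASSUMED counts (the slide's table is not in the text): MI cases, group size.
p_placebo, p_aspirin, diff, (lo, hi) = diff_of_proportions_ci(189, 11034, 104, 11037)
print(f"placebo: {p_placebo:.4f}, aspirin: {p_aspirin:.4f}")
print(f"difference: {diff:.4f}, 95% CI: ({lo:.4f}, {hi:.4f})")
```

If the interval lies entirely above 0, the data suggest a lower heart attack probability in the aspirin group.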
RELATIVE RISK
For 2-by-2 tables, the relative risk (RR) is the ratio
$RR = \pi_1 / \pi_2$
where it can be any non-negative number. RR = 1.0 iff
$\pi_1 = \pi_2$.
$p_1 / p_2$ estimates the true ratio (RR) $\pi_1 / \pi_2$.
RELATIVE RISK
RR matters because a difference of a given fixed size is more important when the proportions of
success (in all levels of $X$) are close to 0 or 1. That is, while the same difference of 0.009 is observed for (a) 0.010 and 0.001 and (b) 0.410 and 0.401, (a) is more striking since one proportion is 10 times the other. This goes to show that RR may
give a better interpretation for public health implications than relying on the differences of proportions
alone (which may be misleading if $\pi_i$ is close to 0 or 1).
RELATIVE RISK
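The sampling distribution of the sample RR ($p_1 / p_2$) is highly skewed
unless the sample sizes are quite large. For large samples, an approximate [due to Wald] $(1 - \alpha)100\%$ CI
for the true log RR is given by:
$\log(p_1 / p_2) \pm z_{\alpha/2} \sqrt{\dfrac{1 - p_1}{n_{1+}\, p_1} + \dfrac{1 - p_2}{n_{2+}\, p_2}}$
Exponentiating the endpoints gives a CI for the true RR.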
EXAMPLE # 2
Refer to the aspirin use and myocardial infarction (heart attacks) study by the Physicians' Health Study Research
Group at Harvard Medical School.
(a) Estimate and interpret the RR of heart attack between male physicians who took placebo and those who took aspirin.
(b) Construct a 95% CI for the true RR of heart attack between male physicians who took placebo and those who took aspirin.
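One way to work this example is sketched below: build the Wald interval on the log scale, then exponentiate. The counts are again an assumption (189 of 11,034 for placebo, 104 of 11,037 for aspirin) because the slide's table is not reproduced here, and the function name is hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def relative_risk_ci(x1, n1, x2, n2, level=0.95):
    """Sample relative risk p1/p2 with a Wald CI built on the log scale."""
    p1, p2 = x1 / n1, x2 / n2
    rr = p1 / p2
    # Large-sample standard error of log(p1/p2).
    se_log = sqrt((1 - p1) / (n1 * p1) + (1 - p2) / (n2 * p2))
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return rr, (exp(log(rr) - z * se_log), exp(log(rr) + z * se_log))


# ASSUMED counts (not in the slide text): placebo row first, aspirin row second.
rr, (rr_lo, rr_hi) = relative_risk_ci(189, 11034, 104, 11037)
print(f"RR = {rr:.2f}, 95% CI: ({rr_lo:.2f}, {rr_hi:.2f})")
```

An RR above 1 with an interval excluding 1 would indicate a higher estimated heart attack risk in the placebo group.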
ODDS RATIO
For a probability of success $\pi$, the odds (of success)
are defined to be
$odds = \pi / (1 - \pi)$
from which we can get
$\pi = odds / (odds + 1)$
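As a quick illustration of the two conversions, the sketch below uses a hypothetical success probability of 0.75 (odds of 3 to 1). The function names are illustrative.

```python
def odds_from_prob(pi):
    """Odds of success for a success probability pi (0 <= pi < 1)."""
    return pi / (1 - pi)


def prob_from_odds(odds):
    """Success probability recovered from the odds of success."""
    return odds / (odds + 1)


example_odds = odds_from_prob(0.75)          # hypothetical probability
example_prob = prob_from_odds(example_odds)  # round trip back to 0.75
print(example_odds, example_prob)
```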
ODDS RATIO
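For 2-by-2 tables, the odds ratio ($\theta$) is the ratio
$\theta = \dfrac{odds_1}{odds_2} = \dfrac{\pi_1 / (1 - \pi_1)}{\pi_2 / (1 - \pi_2)}$
where it can be any non-negative number.
Sample odds ratio ($\hat\theta$) [through ML under multinomial
assumption, or independent binomial assumption]:
$\hat\theta = \dfrac{n_{11}\, n_{22}}{n_{12}\, n_{21}}$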
ODDS RATIO
$X$ and $Y$ independent $\Leftrightarrow$ $\theta = 1.0$.
$\theta > 1.0$: higher success rate for row [$X$ level] 1
$\theta < 1.0$: higher success rate for row [$X$ level] 2
Values of $\theta$ farther from 1.0 in any direction represent stronger association between $X$ and $Y$.
$\theta$ is orientation invariant (unlike RR).
$\theta$ may be viewed as a cross-product ratio of joint probabilities, $\theta = \pi_{11}\pi_{22} / (\pi_{12}\pi_{21})$, if interdependence is desired.
ODDS RATIO
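The sampling distribution of $\hat\theta$ is highly skewed unless
the sample sizes are quite large. For large samples, an approximate $(1 - \alpha)100\%$ CI for the
true $\log\theta$ [which is symmetric about 0] is given by:
$\log\hat\theta \pm z_{\alpha/2} \sqrt{\dfrac{1}{n_{11}} + \dfrac{1}{n_{12}} + \dfrac{1}{n_{21}} + \dfrac{1}{n_{22}}}$
Exponentiating the endpoints gives a CI for the true $\theta$.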
ODDS RATIO
If some cell counts ($n_{ij}$) are 0, then $\hat\theta$ can either be 0 or $\infty$,
or even undefined if both entries in a row or column are 0. To
adjust for this, an amended estimator is given by
$\tilde\theta = \dfrac{(n_{11} + 0.5)(n_{22} + 0.5)}{(n_{12} + 0.5)(n_{21} + 0.5)}$
i.e., an adjustment of 0.5 is made to each cell count (this also
applies to $SE(\log\tilde\theta)$ when constructing a $(1 - \alpha)100\%$ CI).
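The adjustment can be sketched as below. The 2-by-2 table with a zero cell is hypothetical, chosen only so that the unadjusted sample odds ratio would involve division by zero. The function name and the `add` parameter are illustrative.

```python
from math import sqrt


def amended_odds_ratio(table, add=0.5):
    """Odds ratio and SE of its log after adding `add` to every cell count."""
    (n11, n12), (n21, n22) = table
    a, b, c, d = n11 + add, n12 + add, n21 + add, n22 + add
    theta = (a * d) / (b * c)
    # The same adjusted counts are used in the standard error of log(theta).
    se_log = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return theta, se_log


# HYPOTHETICAL table with n12 = 0; the unadjusted ratio n11*n22/(n12*n21) is undefined.
zero_cell_table = [[5, 0], [3, 4]]
theta_tilde, se_log_tilde = amended_odds_ratio(zero_cell_table)
print(f"amended odds ratio = {theta_tilde:.3f}, SE(log) = {se_log_tilde:.3f}")
```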
EXAMPLE # 3
Refer to the aspirin use and myocardial infarction (heart attacks) study by the Physicians' Health Study Research
Group at Harvard Medical School.
(a) Estimate and interpret the odds ratio $\theta$ of heart attack between male physicians who took placebo and those who took aspirin.
(b) Construct a 95% CI for the true $\theta$ of heart attack between male physicians who took placebo and those who took aspirin.
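A possible way to work this example is sketched below, using the sample odds ratio and the large-sample Wald interval on the log scale. The cell counts are an assumption (placebo: 189 with MI, 10,845 without; aspirin: 104 with MI, 10,933 without), since the slide's table is not reproduced in this text.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def odds_ratio_ci(table, level=0.95):
    """Sample odds ratio n11*n22/(n12*n21) with a Wald CI on the log scale."""
    (n11, n12), (n21, n22) = table
    theta = (n11 * n22) / (n12 * n21)
    se_log = sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return theta, (exp(log(theta) - z * se_log), exp(log(theta) + z * se_log))


# ASSUMED counts (not in the slide text): rows = placebo, aspirin; columns = MI yes, no.
aspirin_table = [[189, 10845], [104, 10933]]
theta_hat, (or_lo, or_hi) = odds_ratio_ci(aspirin_table)
print(f"odds ratio = {theta_hat:.2f}, 95% CI: ({or_lo:.2f}, {or_hi:.2f})")
```

Because heart attacks are rare in both groups, the estimate is close to the sample relative risk.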
RELATIONSHIP BETWEEN
ODDS RATIO AND RELATIVE RISK
$\theta = RR \times \dfrac{1 - \pi_2}{1 - \pi_1}$
Hence, whenever direct estimation of RR is not
possible, one can estimate $\theta$ instead, and use it to
approximate RR, as long as $\pi_1 \approx 0$ and $\pi_2 \approx 0$.
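The identity and the quality of the approximation can be checked numerically. The sketch below uses two illustrative pairs of probabilities, one pair near 0 and one pair far from 0. The function name is hypothetical.

```python
def odds_ratio_from_probs(pi1, pi2):
    """Odds ratio of two success probabilities."""
    return (pi1 / (1 - pi1)) / (pi2 / (1 - pi2))


comparison = []
for pi1, pi2 in [(0.010, 0.001), (0.410, 0.401)]:  # illustrative probabilities
    rr = pi1 / pi2
    theta = odds_ratio_from_probs(pi1, pi2)
    factor = (1 - pi2) / (1 - pi1)  # theta = RR * factor
    comparison.append((pi1, pi2, rr, theta, factor))
    print(f"pi1={pi1}, pi2={pi2}: RR={rr:.4f}, theta={theta:.4f}, factor={factor:.4f}")
```

The correction factor tends to 1 only when both probabilities are near 0, which is the condition under which $\theta$ approximates RR.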
ODDS RATIO AND
CASE-CONTROL STUDIES
In most case-control studies, the marginal distribution of the
response variable is fixed by the sampling design.
Because such a study is retrospective, one can construct conditional
distributions for the explanatory variable within levels of
the response outcome of interest. In this case, only $\theta$ can
be estimated, due to its symmetric orientation (invariance).
Thus, for relatively rare successes [usually rare diseases],
RR is usually approximated by $\theta$.
TESTS OF INDEPENDENCE
Consider
$H_0$: the cell probabilities equal certain fixed values $\{\pi_{ij}\}$
For a sample of size $n$ with cell counts $\{n_{ij}\}$, the values $\{\mu_{ij} = n\pi_{ij}\}$ are expected frequencies, i.e. $\{E(n_{ij})\}$ when $H_0$ is true.
To arrive at a decision, $\{n_{ij}\}$ is compared with $\{\mu_{ij}\}$: if
$H_0$ is true, $n_{ij} - \mu_{ij}$ must be small, i.e. larger differences provide
stronger evidence against $H_0$.
Test statistics used to make such comparisons have large-sample
$\chi^2$ distributions.
TESTS OF INDEPENDENCE
$\chi^2(df)$
Mean: $df$
Variance: $2\,df$
$\chi^2(df) = \text{Gamma}(df/2,\ 2)$ [shape $df/2$, scale 2]
PEARSON STATISTIC
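$X^2 = \sum \dfrac{(n_{ij} - \mu_{ij})^2}{\mu_{ij}}$
score statistic
minimum at 0 if all $n_{ij} = \mu_{ij}$
p-value: right-tail probability of the $\chi^2$ distribution above the observed $X^2$
$\mu_{ij} > 5$ for decent approximation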
LIKELIHOOD-RATIO STATISTIC
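$G^2 = 2 \sum n_{ij} \log\left(\dfrac{n_{ij}}{\mu_{ij}}\right)$
likelihood-ratio statistic [based on multinomial assumption]
minimum at 0 if all $n_{ij} = \mu_{ij}$
p-value: right-tail probability of the $\chi^2$ distribution above the observed $G^2$
$\mu_{ij} > 5$ for decent approximation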
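TESTS OF INDEPENDENCE
In two-way tables, the null hypothesis of statistical independence
has the form
$H_0: \pi_{ij} = \pi_{i+}\pi_{+j}$ for all $i$ and $j$
$H_0: \mu_{ij} = n\,\pi_{i+}\pi_{+j}$
Note: $\mu_{ij}$ is estimated by the estimated expected frequencies
$\hat\mu_{ij} = \dfrac{n_{i+}\, n_{+j}}{n}$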
TESTS OF INDEPENDENCE
For testing independence in $I \times J$ contingency tables,
the $X^2$ and $G^2$ statistics are used, with both having a
large-sample $\chi^2$ distribution with degrees of
freedom $df = (I - 1)(J - 1)$.
$X^2$ converges in distribution more quickly than $G^2$.
TESTS OF INDEPENDENCE
Recall:
The degrees of freedom are obtained by taking the difference between the number of parameters [cell probabilities] under the alternative
[for which there are $IJ - 1$ non-redundant parameters] and null
[for which there are $(I - 1) + (J - 1)$ non-redundant parameters]
hypotheses, i.e.,
$df = (IJ - 1) - [(I - 1) + (J - 1)] = (I - 1)(J - 1)$
EXAMPLE # 4
The following table, from the 2000 General Social Survey, cross-classifies gender and political party
identification. Subjects indicated whether they identified more strongly with the Democratic or Republican party or as Independents. This also contains estimated
expected frequencies for $H_0$: Independence between Gender and Political Party Identification.
Determine if a significant association between gender and political party identification exists or not.
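EXAMPLE # 4
[Table: gender by political party identification (Democrat, Independent, Republican), observed counts with estimated expected frequencies; not reproduced in this text.]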
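A sketch of how the test could be carried out with both statistics follows. The counts are an assumption (females: 762, 327, 468; males: 484, 239, 477 for Democrat, Independent, Republican), taken from the commonly cited version of this 2000 GSS table because the slide's table is not reproduced here. The right-tail helper uses the closed form that holds only for even degrees of freedom, which covers $df = 2$ in this example.

```python
from math import exp, factorial, log


def expected_frequencies(table):
    """Estimated expected frequencies n_i+ * n_+j / n under independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def pearson_x2(table):
    mu = expected_frequencies(table)
    return sum((obs - m) ** 2 / m
               for row, mu_row in zip(table, mu)
               for obs, m in zip(row, mu_row))


def likelihood_ratio_g2(table):
    mu = expected_frequencies(table)
    # Cells with a zero count contribute 0 to the sum.
    return 2 * sum(obs * log(obs / m)
                   for row, mu_row in zip(table, mu)
                   for obs, m in zip(row, mu_row) if obs > 0)


def chi2_right_tail_even_df(x, df):
    """P(chi-squared_df > x); closed form valid only when df is even."""
    if df % 2 != 0:
        raise ValueError("closed form requires even df")
    half = x / 2
    return exp(-half) * sum(half ** j / factorial(j) for j in range(df // 2))


# ASSUMED counts (not in the slide text): rows = female, male.
gss = [[762, 327, 468], [484, 239, 477]]
df = (len(gss) - 1) * (len(gss[0]) - 1)
x2, g2 = pearson_x2(gss), likelihood_ratio_g2(gss)
p_x2 = chi2_right_tail_even_df(x2, df)
print(f"X^2 = {x2:.2f}, G^2 = {g2:.2f}, df = {df}, p-value (X^2) = {p_x2:.2e}")
```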
RESIDUALS FOR CELLS
A cell-by-cell comparison of observed and estimated expected
frequencies helps us better understand the nature of the evidence.
However, it is rather insufficient to simply rely on the
raw cell differences $n_{ij} - \hat\mu_{ij}$ [due to the inherent
magnitude of the counts].
STANDARDIZED RESIDUAL
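$\dfrac{n_{ij} - \hat\mu_{ij}}{\sqrt{\hat\mu_{ij}\,(1 - p_{i+})(1 - p_{+j})}}$, where $p_{i+} = n_{i+}/n$ and $p_{+j} = n_{+j}/n$
follows a [large-sample] standard normal distribution under $H_0$
large absolute values (as compared to 0): evidence towards lack of fit of $H_0$ in that cell
i.e., with $\alpha = 0.05$, one expects about 5% of the
standardized residuals to be beyond $\pm 2$ (use $\pm 3$ if there are many cells) by chance
alone under $H_0$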
EXAMPLE # 5
Refer to the gender and political party identification
example. The following table shows the standardized
residuals for testing independence in the previous
example. Try to make sense of the computed standardized
residuals in relation to the observed global result for
testing independence between gender and political party
identification.
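EXAMPLE # 5
[Table: standardized residuals for gender by political party identification; not reproduced in this text.]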
STANDARDIZED RESIDUALS
Notice that residuals for the females are the negative
of those of males. In general, the differences $n_{ij} - \hat\mu_{ij}$ in each
column must sum to 0, as the observed counts and the
expected frequencies are constrained by the same row
and column totals. In particular, for $2 \times J$ tables,
$n_{1j} - \hat\mu_{1j} = -(n_{2j} - \hat\mu_{2j})$
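The standardized residuals and the sign reversal between the two rows can be sketched as follows. The gender by party counts are an assumption (females: 762, 327, 468; males: 484, 239, 477), since the slide's table is not reproduced in this text.

```python
from math import sqrt


def standardized_residuals(table):
    """(n_ij - mu_ij) / sqrt(mu_ij * (1 - p_i+) * (1 - p_+j)) for each cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    residuals = []
    for row, r in zip(table, row_totals):
        out_row = []
        for obs, c in zip(row, col_totals):
            mu = r * c / n  # estimated expected frequency under independence
            out_row.append((obs - mu) / sqrt(mu * (1 - r / n) * (1 - c / n)))
        residuals.append(out_row)
    return residuals


# ASSUMED counts (not in the slide text): rows = female, male.
gss = [[762, 327, 468], [484, 239, 477]]
res = standardized_residuals(gss)
for label, row in zip(["female", "male"], res):
    print(label, [round(value, 2) for value in row])
```

In a $2 \times J$ table the two rows of standardized residuals are exact negatives of each other, so only one row needs to be inspected.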
PARTITIONING
Recall: Let $X_1^2$ and $X_2^2$ be independent $\chi^2$ RVs w/ degrees of
freedom $df_1$ and $df_2$, respectively. Then
$X^2 = X_1^2 + X_2^2 \sim \chi^2(df_1 + df_2)$
In essence, this enables one to separate/collapse rows or columns
of $I \times J$ tables into several sub-tables, and obtain $X^2$ or $G^2$ statistics for
which the sum of the partitioned statistics is the
global statistic.
PARTITIONING
Consider: For a test of independence in a $2 \times J$ table, a
$\chi^2$ statistic can be broken down into $J - 1$ components: [1] the first two columns; [2] collapsing of the first two columns, then
compared with the 3rd column; [3] collapsing of the first three
columns, then compared with the 4th column, etc. until the $J$th
column is considered. In particular, this is true for $G^2$.
PARTITIONING
While it might seem more natural to obtain a statistic for each
$2 \times 2$ pairing of columns, note that the sum of these individual statistics will not total the global statistic. [Issues due to non-independence]
$G^2$ has exact partitionings; $X^2$ does not (at least, algebraically).
Nevertheless, partitioning $\chi^2$ is valid for both statistics as long as
independence of the partitions is met.
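The exactness of the $G^2$ partition, and its failure for $X^2$, can be checked numerically. The sketch below partitions an assumed $2 \times 3$ gender by party table (females: 762, 327, 468; males: 484, 239, 477; counts not taken from the slide text) into [1] the first two columns and [2] the first two columns collapsed versus the third.

```python
from math import log


def _expected(table):
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    n = sum(rows)
    return [[r * c / n for c in cols] for r in rows]


def g2(table):
    """Likelihood-ratio statistic for independence."""
    return 2 * sum(obs * log(obs / m)
                   for row, mu_row in zip(table, _expected(table))
                   for obs, m in zip(row, mu_row) if obs > 0)


def x2(table):
    """Pearson statistic for independence."""
    return sum((obs - m) ** 2 / m
               for row, mu_row in zip(table, _expected(table))
               for obs, m in zip(row, mu_row))


# ASSUMED counts: rows = female, male; columns = Democrat, Independent, Republican.
full = [[762, 327, 468], [484, 239, 477]]
first_two = [row[:2] for row in full]                    # component [1]
collapsed = [[row[0] + row[1], row[2]] for row in full]  # component [2]

g2_sum = g2(first_two) + g2(collapsed)
x2_sum = x2(first_two) + x2(collapsed)
print(f"G^2: {g2(first_two):.3f} + {g2(collapsed):.3f} = {g2_sum:.3f} (global {g2(full):.3f})")
print(f"X^2: {x2(first_two):.3f} + {x2(collapsed):.3f} = {x2_sum:.3f} (global {x2(full):.3f})")
```

The $G^2$ components add up to the global value up to floating-point error, while the $X^2$ components come close but do not match exactly.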
SOME COMMENTS ON TESTS
These tests likewise require a large sample size $n$
relative to $IJ$. Moreover, $G^2$ converges poorly as compared
to $X^2$ for very small sample sizes. For large $I$ or $J$, $X^2$
still provides a decent approximation even if some expected
frequencies are as small as 1.
SOME COMMENTS ON TESTS
$\chi^2$ tests merely indicate the degree of evidence for an association;
they do not say anything about the strength and the nature of the
association.
Both $X^2$ and $G^2$ are orientation invariant, i.e. they do not change
values with reorderings of rows or columns. However, both are only
powerful when associations regarding nominal variables are of concern.
For ordinal variables, more powerful tests exist.
FISHER'S EXACT TEST
Recall: For $2 \times 2$ tables, independence $\Leftrightarrow$ $\theta = 1$.
Consider the cell counts $\{n_{11}, n_{12}, n_{21}, n_{22}\}$. A small-sample null probability distribution for the cell counts that does not
depend on unknown parameters results from considering the
set of tables having the same row and column totals. Under this
condition, each $n_{ij}$ then has the hypergeometric
distribution.
FISHER'S EXACT TEST
It is sufficient to know $n_{11}$ alone to determine all other cell
counts. Under the null hypothesis of independence $H_0: \theta = 1$,
$n_{11}$ is hypergeometric with
$P(n_{11} = t) = \dfrac{\binom{n_{1+}}{t}\binom{n_{2+}}{n_{+1} - t}}{\binom{n}{n_{+1}}}$
Hence, the p-value equals the sum of hypergeometric probabilities for outcomes at least as favorable to $H_a$ as the observed outcome.
EXAMPLE # 6
In his 1935 book, The Design of Experiments, Fisher described
the following experiment: When drinking tea, a colleague of
Fisher's at Rothamsted Experiment Station near London
claimed she could distinguish whether milk or tea was added
to the cup first. To test her claim, Fisher designed an
experiment in which she tasted eight cups of tea. Four cups
had milk added first, and the other four had tea added first.
EXAMPLE # 6
She was told there were four cups of each type and she
should try to select the four that had milk added first. The
cups were presented to her in random order. The following
table shows a potential result of the experiment. Perform a
test to check whether there is evidence of a positive
association between the true order of the pouring and her
guess. Compute the exact p-value of the test.
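EXAMPLE # 6
[Table: actual pouring order (milk first, tea first) by guess (milk first, tea first); not reproduced in this text.]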
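A minimal sketch of the one-sided exact test for this example is given below. The table is an assumption (three of the four milk-first cups guessed correctly, i.e. counts 3, 1, 1, 3), which is the outcome usually shown for this experiment; the slide's own table is not reproduced in this text.

```python
from math import comb


def hypergeom_pmf(t, row1, row2, col1):
    """P(n11 = t) under independence with all margins fixed."""
    return comb(row1, t) * comb(row2, col1 - t) / comb(row1 + row2, col1)


def fisher_exact_greater(table):
    """One-sided p-value: P(n11 >= observed) for the alternative theta > 1."""
    (n11, n12), (n21, n22) = table
    row1, row2, col1 = n11 + n12, n21 + n22, n11 + n21
    t_max = min(row1, col1)
    return sum(hypergeom_pmf(t, row1, row2, col1) for t in range(n11, t_max + 1))


# ASSUMED table: rows = actual (milk first, tea first), columns = guess.
tea = [[3, 1], [1, 3]]
p_value = fisher_exact_greater(tea)
print(f"exact one-sided p-value = {p_value:.4f}")
```

With only eight cups the set of possible tables is tiny, so the p-value can take only a handful of values.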
CONSERVATISM OF
FISHER'S EXACT TEST
Being an exact test based on a discrete distribution, the test is very conservative, i.e. the
actual error rate when the null hypothesis of independence is true is much smaller than the intended one. This is essentially
true for one-sided alternatives. Hence, the mid p-value is
preferred as an alternative to diminish the conservativeness.
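The mid p-value counts only half of the probability of the observed outcome. A sketch for the one-sided alternative follows, using the same assumed tea-tasting table (3, 1, 1, 3) that is not taken from the slide text.

```python
from math import comb


def hypergeom_pmf(t, row1, row2, col1):
    """P(n11 = t) under independence with all margins fixed."""
    return comb(row1, t) * comb(row2, col1 - t) / comb(row1 + row2, col1)


def fisher_mid_p_greater(table):
    """Mid p-value: 0.5 * P(n11 = observed) + P(n11 > observed)."""
    (n11, n12), (n21, n22) = table
    row1, row2, col1 = n11 + n12, n21 + n22, n11 + n21
    t_max = min(row1, col1)
    above = sum(hypergeom_pmf(t, row1, row2, col1) for t in range(n11 + 1, t_max + 1))
    return 0.5 * hypergeom_pmf(n11, row1, row2, col1) + above


# ASSUMED table: rows = actual (milk first, tea first), columns = guess.
tea = [[3, 1], [1, 3]]
mid_p = fisher_mid_p_greater(tea)
print(f"mid p-value = {mid_p:.4f}")
```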
SMALL-SAMPLE
CONFIDENCE INTERVAL FOR $\theta$
It is also possible to construct small-sample confidence
intervals for the odds ratio. The procedure involved is a
generalization of Fisher's exact test that tests an arbitrary value,
$H_0: \theta = \theta_0$. Hence, a 95% CI would then
contain all values of $\theta_0$ for which the exact p-value of
$H_0: \theta = \theta_0$ is greater than 0.05. This can also be constructed
using the mid p-value to reduce conservatism.
SMALL-SAMPLE
CONFIDENCE INTERVAL FOR $\theta$
For the tea-taste experiment, a 95% CI for $\theta$ can be
computed as follows:
Exact p-value: (0.21, 626.17)
Mid p-value: (0.31, 308.55)
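One way such limits might be obtained is sketched below, assuming the interval is built by inverting two one-sided conditional tests, each at level 0.025, and assuming the tea-tasting table 3, 1, 1, 3 (not reproduced in the slide text). The conditional distribution of $n_{11}$ for a given $\theta$ is the noncentral hypergeometric, and each limit is found by bisection on the log scale. All function names are illustrative, and the sketch does not handle an observed count at the edge of its support, where one limit is 0 or infinite.

```python
from math import comb


def tail_probs(t_obs, row1, row2, col1, theta, mid=False):
    """Lower and upper tail probabilities of n11 given odds ratio theta."""
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    weights = {t: comb(row1, t) * comb(row2, col1 - t) * theta ** t for t in support}
    total = sum(weights.values())
    lower = sum(w for t, w in weights.items() if t <= t_obs) / total
    upper = sum(w for t, w in weights.items() if t >= t_obs) / total
    if mid:  # mid p-value: count only half of the observed outcome
        half_obs = 0.5 * weights[t_obs] / total
        lower, upper = lower - half_obs, upper - half_obs
    return lower, upper


def solve_increasing(g, target, lo=1e-8, hi=1e8, steps=200):
    """Bisection on the log scale for g(theta) = target, g increasing in theta."""
    for _ in range(steps):
        middle = (lo * hi) ** 0.5
        if g(middle) < target:
            lo = middle
        else:
            hi = middle
    return (lo * hi) ** 0.5


def exact_odds_ratio_ci(table, level=0.95, mid=False):
    (n11, n12), (n21, n22) = table
    row1, row2, col1 = n11 + n12, n21 + n22, n11 + n21
    alpha = 1 - level
    # Upper-tail probability increases with theta; it reaches alpha/2 at the lower limit.
    lower = solve_increasing(
        lambda th: tail_probs(n11, row1, row2, col1, th, mid)[1], alpha / 2)
    # Lower-tail probability decreases with theta; it reaches alpha/2 at the upper limit.
    upper = solve_increasing(
        lambda th: -tail_probs(n11, row1, row2, col1, th, mid)[0], -alpha / 2)
    return lower, upper


# ASSUMED table: rows = actual (milk first, tea first), columns = guess.
tea = [[3, 1], [1, 3]]
exact_lo, exact_hi = exact_odds_ratio_ci(tea)
mid_lo, mid_hi = exact_odds_ratio_ci(tea, mid=True)
print(f"exact: ({exact_lo:.2f}, {exact_hi:.2f})")
print(f"mid-p: ({mid_lo:.2f}, {mid_hi:.2f})")
```

Under these assumptions the limits land close to the values quoted above; small discrepancies in the last digits can come from rounding or from the numerical method used.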