G89.2228 Lect 7a
G89.2228 Lecture 7a
• Comparing proportions from independent samples
• Analysis of matched samples
• Small samples and 2×2 tables
• Strength of association: odds ratios
Difference in proportions from independent samples
• Henderson-King & Nisbett had a binary outcome that is most appropriately analyzed using methods for categorical data. Consider their question: does the choice to sit next to a Black person differ between groups exposed to a disruptive Black vs. a disruptive White confederate?
• In their study, $\hat{p}_1 = 11/37$ and $\hat{p}_2 = 16/35$. Are these numbers consistent with the same population proportion? (Is the difference zero?)
• Consider the general large-sample test statistic, which is distributed N(0,1) when the sample sizes are large:
$$z = \frac{\hat{D} - D}{\hat{\sigma}_{\hat{D}}}, \qquad \hat{D} = \hat{p}_2 - \hat{p}_1, \quad D = p_2 - p_1$$
Differences in proportions
• Under the null hypothesis, the standard errors of the means are $\sigma/\sqrt{n_1}$ and $\sigma/\sqrt{n_2}$, where $\sigma$ is the population standard deviation.
• Under the null hypothesis, the common proportion is estimated by pooling the data:
$$\hat{p} = \frac{11 + 16}{37 + 35} = \frac{27}{72} = .375$$
The common variance is
$$\hat{p}\hat{q} = .375(1 - .375) = .234$$
• The z statistic is then
$$z = \frac{.457 - .297}{\sqrt{(.234/37) + (.234/35)}} = \frac{.16}{.114} = 1.4$$
The two-tailed p-value is .16.
• A 95% CI on the difference is .160 ± (1.96)(.114) = (-.06, .38). It includes the H0 value of zero.
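The pooled z test on this slide can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name `two_proportion_z` is hypothetical, and the difference is taken as p2 - p1 so that it matches the positive .160 used in the slide.

```python
import math

def two_proportion_z(x1, n1, x2, n2, z_crit=1.96):
    """Pooled large-sample z test of H0: p1 == p2 (difference taken as p2 - p1)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)                # common proportion under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    diff = p2 - p1
    z = diff / se
    # Two-tailed p-value from N(0,1): 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    ci = (diff - z_crit * se, diff + z_crit * se)
    return diff, se, z, p_value, ci

diff, se, z, p_value, ci = two_proportion_z(11, 37, 16, 35)
print(round(diff, 3), round(se, 3), round(z, 2), round(p_value, 2))
print(tuple(round(bound, 2) for bound in ci))
```

Like the slide, this sketch uses the pooled standard error for the interval as well as for the test; an unpooled standard error is another common choice for the CI.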
Pearson Chi Square for 2×2 Tables
• The z test statistic has a standard normal N(0,1) distribution for large samples.
• $z^2$ is distributed as $\chi^2$ with 1 degree of freedom for large samples.
• From the example, $1.4^2 = 1.96$. Howell's table for Chi Square shows $\Pr(\chi^2 > 1.96)$ to be in the range .10 to .25.
• Pearson’s calculation for this test statistic is:
$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$
where $O_i$ is an observed frequency and $E_i$ is the expected frequency given the null hypothesis of equal proportions. The sum runs over the cells of the table (here n counts cells, not subjects).
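Because a chi-square variable with 1 degree of freedom is a squared standard normal, its upper-tail probability can be obtained from the normal distribution instead of a printed table. A small sketch, assuming only the standard library (the helper name `chi2_1df_upper_tail` is made up for illustration):

```python
import math

def chi2_1df_upper_tail(x):
    """Pr(chi-square with 1 d.o.f. > x), using chi2_1 = z**2 so that
    Pr(chi2 > x) = Pr(|z| > sqrt(x)) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2))

z = 1.4
p = chi2_1df_upper_tail(z ** 2)
print(round(p, 3))
```

The result falls inside the .10 to .25 range read from the table and agrees with the two-tailed p-value of the z test.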
Expected values for no association
• From the example: p1 = 11/37 = .297, and p2 = 16/35 = .457
• The expected frequencies are based on a pooled p = (11+16)/(37+35) = .375
Chi square test of association, continued
Each cell shows: Observed Frequency / Expected Frequency / Joint probability|H0
                 Black Confederate       White Confederate       Total
Chose Black      11 / 13.875 / .193      16 / 13.125 / .182      27 (.375)
Avoided Black    26 / 23.125 / .321      19 / 21.875 / .304      45 (.625)
Total            37 (.514)               35 (.486)               72
• Marginal probabilities are computed from the pooled data (e.g. .375 = 27/72)
• Expected joint probabilities|H0 = product of marginals (e.g. .193 = .375*.514)
• Ei = expected joint probability * n (e.g. 13.875 = .193*72 = 27*37/72)
• We use these values in Pearson's formula
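The expected frequencies and the Pearson statistic for this table can be reproduced in a few lines. This is a sketch only, with a hypothetical helper `pearson_chi_square`; the row and column layout (rows: chose or avoided, columns: Black or White confederate) is an assumption about how to arrange the slide's counts.

```python
# Observed counts. Rows: chose Black, avoided Black.
# Columns: Black confederate, White confederate.
observed = [[11, 16],
            [26, 19]]

def pearson_chi_square(table):
    """Return (expected frequencies, Pearson chi-square) for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Under H0 each expected count is row total * column total / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum((o - e) ** 2 / e
               for o_row, e_row in zip(table, expected)
               for o, e in zip(o_row, e_row))
    return expected, chi2

expected, chi2 = pearson_chi_square(observed)
print(expected)
print(round(chi2, 2))
```

The statistic matches the square of the z statistic from the two-proportion test (1.4 squared is 1.96), as the lecture states it should.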
Analysis of 2×2 tables for small samples
• The z and $\chi^2$ tests are justified on the basis of the central limit theorem, and will be approximately correct for fairly small n’s. What if the sample is ridiculously small?
– Rule of thumb: if expected frequencies are less than 2.5, the sample is small
• For small n's, Fisher recommended using a randomization test
– Suppose we have N subjects, and g1 are in group 1 and r1 overall respond positively
– Under H0, response and group are independent
– Consider this thought experiment:
• Put all N subjects in an urn.
• Randomly draw r1 subjects and pretend that they are the positive responders.
• How often would the original pattern of data emerge from such a random process?
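The urn thought experiment can be simulated directly. The sketch below is illustrative only: the counts (N = 10, g1 = 5, r1 = 5, with 4 of the responders observed in group 1) are assumed example values, the function name is hypothetical, and the seed is fixed so that the run is repeatable.

```python
import random

def simulate_urn(n_subjects, g1, r1, observed_g1_responders,
                 draws=100_000, seed=0):
    """Estimate how often random draws put at least the observed number
    of responders in group 1 (a one-tailed randomization p-value)."""
    rng = random.Random(seed)
    # Label each subject by group: 1 for group 1, 2 for everyone else
    urn = [1] * g1 + [2] * (n_subjects - g1)
    hits = 0
    for _ in range(draws):
        responders = rng.sample(urn, r1)       # draw r1 subjects without replacement
        if responders.count(1) >= observed_g1_responders:
            hits += 1
    return hits / draws

estimate = simulate_urn(n_subjects=10, g1=5, r1=5, observed_g1_responders=4)
print(estimate)
```

With these assumed counts the estimate should land near 26/252, the exact hypergeometric probability of 4 or 5 group-1 responders.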
Fisher’s Exact test
• Suppose we have the following table
              Grp1       Grp2       Total
Respond       a = 4      b = 1      a+b = 5
Not Respond   c = 1      d = 4      c+d = 5
Total         a+c = 5    b+d = 5    N = 10
• Pearson Chi Square would be 3.6, and the two-tailed p is .058
• The hypergeometric probability of getting 1 or fewer Grp2 responses (given that 5 people responded) is:
$$\frac{\binom{5}{0}\binom{5}{5}}{\binom{10}{5}} + \frac{\binom{5}{1}\binom{5}{4}}{\binom{10}{5}} = .004 + .099 = .103$$
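The hypergeometric calculation is easy to verify with exact binomial coefficients. A minimal sketch using `math.comb` from the standard library; the helper name `hypergeom_pmf` and its argument names are hypothetical.

```python
from math import comb

def hypergeom_pmf(k, group_size, responders, total):
    """P(exactly k of the responders fall in a group of `group_size`)
    when `responders` subjects are drawn at random from `total`."""
    return (comb(group_size, k)
            * comb(total - group_size, responders - k)
            / comb(total, responders))

# Grp2 holds 5 of the N = 10 subjects, and 5 subjects responded overall.
p_zero = hypergeom_pmf(0, group_size=5, responders=5, total=10)
p_one = hypergeom_pmf(1, group_size=5, responders=5, total=10)
one_tailed = p_zero + p_one
print(round(p_zero, 3), round(p_one, 3), round(one_tailed, 3))
```

This is the one-tailed probability. It is noticeably larger than half of the chi-square p-value, which is the kind of discrepancy that motivates the exact test for very small samples.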
Analysis of Matched Samples
• Many research questions involve comparing proportions computed from related observations: the analogue of the paired t-test.
– Analysis of change
– Within-subjects designs
– Analysis of siblings, spouses, supervisor-employee pairs, …
– Samples constructed by matching on confounding variables
• When the outcome is binary, display the data showing the numbers of pairs (joint distribution)
Example (Howell Ex. 6.21-22)
• Is the proportion pro the same at the two time points?
• Note that the marginals (30/40 and 15/40) are not independent
• Instead of comparing those proportions, examine those whose opinions change
• Compare (5,20) to the expected (12.5,12.5) as a Chi Square test
         Pro 2          Con 2          Total
Pro 1    10 (cell a)    20 (cell b)    30
Con 1    5 (cell c)     5 (cell d)     10
Total    15             25             40
McNemar’s test
• McNemar showed that this test [whether (5,20) is significantly different from (12.5,12.5)] may be computed, with the Yates correction for continuity, as:
$$\chi^2 = \frac{(|b - c| - 1)^2}{b + c}$$
• For the example, $(|20 - 5| - 1)^2/25 = 7.84$ is unusual for a 1 d.o.f. $\chi^2$, yielding p = .005
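McNemar's continuity-corrected statistic depends only on the two discordant cells, which makes it a one-line computation. A sketch with a hypothetical function name; the p-value again uses the 1 d.o.f. chi-square and standard normal relationship rather than a table.

```python
import math

def mcnemar_corrected(b, c):
    """McNemar chi-square with the Yates continuity correction,
    computed from the discordant cell counts b and c."""
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    # Upper tail of chi-square with 1 d.o.f.: erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

chi2, p_value = mcnemar_corrected(b=20, c=5)
print(round(chi2, 2), round(p_value, 3))
```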
Confidence interval for matched proportion difference
• Fleiss (1981) recommends using the general form of the symmetric CI for testing the difference between p1 and p2
$$(p_1 - p_2) \pm 1.96\,\hat{\sigma}_{p_1 - p_2}$$
where the standard error is estimated using
$$\hat{\sigma}_{p_1 - p_2} = \frac{\sqrt{(a + d)(b + c) + 4bc}}{n\sqrt{n}}$$
• E.g. the 95% CI for the difference (30/40)-(15/40)=.375 requires the SE
$$\frac{\sqrt{(10 + 5)(20 + 5) + 4(20)(5)}}{40\sqrt{40}} = .11$$
The CI is thus .375 ± (1.96)(.11) = (.16, .59)
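The matched-pairs interval can be computed from the four cell counts. The sketch below assumes the cell labelling of the Howell example (a and d concordant, b and c discordant) and the standard-error form sqrt((a+d)(b+c) + 4bc) / (n sqrt(n)); the function name is hypothetical.

```python
import math

def matched_difference_ci(a, b, c, d, z_crit=1.96):
    """Symmetric CI for p1 - p2 from a 2x2 table of matched pairs."""
    n = a + b + c + d
    p1 = (a + b) / n                 # proportion "pro" at time 1
    p2 = (a + c) / n                 # proportion "pro" at time 2
    diff = p1 - p2                   # equals (b - c) / n
    se = math.sqrt((a + d) * (b + c) + 4 * b * c) / (n * math.sqrt(n))
    return diff, se, (diff - z_crit * se, diff + z_crit * se)

diff, se, ci = matched_difference_ci(a=10, b=20, c=5, d=5)
print(diff, round(se, 2), tuple(round(bound, 2) for bound in ci))
```

The interval excludes zero, in line with the significant McNemar test on the same data.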
Measures of association
• Consider two tables:
Table 1:
     D     ~D
A    9     1
B    45    45
Table 2:
     D     ~D
A    9     2
B    45    90
• The proportions with D in groups A and B are .90 vs .50 in the first table (.9-.5=.4) and .82 vs .33 (.82-.33=.49) in the second.
• Is the association stronger in the second table?
Odds ratios as an alternative to differences in proportions
• Within levels D and ~D, the proportions in group A do not differ across tables (9/54 for D in both tables; 1/46 = 2/92 for ~D).
• Which way we look at the table (by rows or by columns) gives different answers.
• The odds of D (vs ~D) are 9 to 1 in group A and 1 to 1 in group B in the first table. The odds ratio is 9: the odds are 9 times greater for group A than B.
• The odds ratio is also 9 in the second table.
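A quick numerical comparison shows why the two summaries disagree. This sketch uses hypothetical helper names and enters each table as (a, b, c, d) = (A with D, A without D, B with D, B without D).

```python
def odds_ratio(a, b, c, d):
    """Odds of D in group A divided by odds of D in group B: (a/b) / (c/d)."""
    return (a * d) / (b * c)

def proportion_difference(a, b, c, d):
    """Proportion with D in group A minus proportion with D in group B."""
    return a / (a + b) - c / (c + d)

table1 = (9, 1, 45, 45)
table2 = (9, 2, 45, 90)

for name, table in (("table 1", table1), ("table 2", table2)):
    print(name,
          "difference:", round(proportion_difference(*table), 3),
          "odds ratio:", odds_ratio(*table))
```

The difference in proportions changes between the tables while the odds ratio stays at 9. The second difference comes out slightly below .49 here because the proportions are not rounded before subtracting.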
Properties of odds ratios
• Invariant to multiplying rows or columns by a constant
• Equal to one for equal odds
• Approaches infinity when off-diagonal cells approach zero
• Approaches zero when diagonal cells approach zero
• Easily computed as OR = ad/bc
• ln(OR) (the log odds ratio, a difference of two “logits”) has a less obvious interpretation, but nicer scale features: it is symmetric about the equal-odds point ln(1) = 0
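Two of these properties, invariance to rescaling and the symmetry of the log scale, can be demonstrated on the first example table. The scaling constants (10 and 3) and the equal-odds table below are arbitrary illustrative choices, not values from the lecture.

```python
import math

def odds_ratio(a, b, c, d):
    return (a * d) / (b * c)

base = odds_ratio(9, 1, 45, 45)
row_scaled = odds_ratio(9 * 10, 1 * 10, 45, 45)     # first row multiplied by 10
col_scaled = odds_ratio(9, 1 * 3, 45, 45 * 3)       # second column multiplied by 3
equal_odds = odds_ratio(5, 5, 20, 20)               # 1-to-1 odds in both groups

# Swapping the groups inverts the odds ratio but only flips the sign of its log
log_or = math.log(base)
log_inverse = math.log(1 / base)
print(base, row_scaled, col_scaled, equal_odds)
print(round(log_or, 3), round(log_inverse, 3))
```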
Confidence interval on odds ratio
• Like other bounded parameters, confidence intervals for the OR are difficult (a symmetric bound does not work well)
• An approximate, but improved, CI on ln(OR) = ln(ad/bc) uses
$$SE[\ln(\widehat{OR})] = \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}$$
$$\ln(\widehat{OR}) \pm 1.96 \cdot SE$$
• Compute the CI on ln(OR), then take the antilog (i.e., $e^x$) of each bound.
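Example
• From the first table above,
$$\widehat{OR} = 9, \qquad \ln(\widehat{OR}) = 2.2$$
$$SE[\ln(\widehat{OR})] = \sqrt{\frac{1}{9} + \frac{1}{1} + \frac{1}{45} + \frac{1}{45}} = 1.075$$
• The 95% CI on ln(OR) is
$$2.2 \pm 1.96 \times 1.075 = (.093, 4.307)$$
• The 95% CI on the OR is thus
$$(e^{.093}, e^{4.307}) = (1.10, 74.2),$$
an asymmetric confidence interval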
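The log-scale interval and its antilog can be computed in one pass. This is a sketch under the formulas stated in the lecture (SE of the log odds ratio as the square root of the summed reciprocal cell counts); the function name is hypothetical.

```python
import math

def odds_ratio_ci(a, b, c, d, z_crit=1.96):
    """Approximate CI for the odds ratio, built on the log scale."""
    or_hat = (a * d) / (b * c)
    log_or = math.log(or_hat)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_ci = (log_or - z_crit * se, log_or + z_crit * se)
    # Back-transform each bound; the result is asymmetric around or_hat
    ci = (math.exp(log_ci[0]), math.exp(log_ci[1]))
    return or_hat, se, log_ci, ci

or_hat, se, log_ci, ci = odds_ratio_ci(9, 1, 45, 45)
print(or_hat, round(se, 3))
print(tuple(round(bound, 3) for bound in log_ci))
print(tuple(round(bound, 2) for bound in ci))
```

Because the code keeps ln(9) unrounded rather than using 2.2, its bounds differ slightly from a hand calculation: the upper limit comes out near 74.0.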