
G89.2228 Lect 7a

G89.2228 Lecture 7a

• Comparing proportions from independent samples

• Analysis of matched samples

• Small samples and 2×2 Tables

• Strength of Association: Odds Ratios


Difference in proportions from independent samples

• Henderson-King & Nisbett had a binary outcome that is most appropriately analyzed using methods for categorical data. Consider their question: does the choice to sit next to a Black person differ between groups exposed to a disruptive Black vs. White confederate?

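• In their study $\hat{p}_1 = 11/37$ and $\hat{p}_2 = 16/35$. Are these numbers consistent with the same population mean? (Is their difference zero?)

• Consider the general large sample test statistic, which will have a N(0,1) distribution when the sample sizes are large:

$$z = \frac{\hat{D}}{\hat{\sigma}_{\hat{D}}} = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{\sigma}^2_{\hat{p}_1} + \hat{\sigma}^2_{\hat{p}_2}}}$$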


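Differences in proportions

• Under the null hypothesis, the standard errors of the means are $\sigma/\sqrt{n_1}$ and $\sigma/\sqrt{n_2}$, where $\sigma$ is the population standard deviation.

• Under the null hypothesis, the common proportion is estimated by pooling the data:

$$\hat{p} = \frac{11 + 16}{37 + 35} = \frac{27}{72} = .375$$

The common variance is

$$\hat{p}\hat{q} = .375(1 - .375) = .234$$

• Taking the difference as $\hat{p}_2 - \hat{p}_1$ so that it is positive (the sign does not matter for a two-tailed test), the z statistic is then

$$z = \frac{.457 - .297}{\sqrt{(.234/37) + (.234/35)}} = \frac{.16}{.114} = 1.4$$

The two-tailed p-value is .16.

• A 95% CI bound on the difference is .160±(1.96)(.114) = (-.06, .38). It includes the H0 value of zero.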
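The pooled z test and the interval for the difference can be checked numerically. The following is a minimal sketch using only Python's standard library; the function name `two_proportion_z` and its arguments are illustrative assumptions, not part of the lecture, and the normal tail probability is taken from `math.erfc`.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Large-sample z test for two independent proportions (pooled SE)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)        # common proportion under H0
    var = pooled * (1 - pooled)           # common variance p*q
    se = math.sqrt(var / n1 + var / n2)   # SE of the difference under H0
    diff = p2 - p1                        # taken as p2 - p1 so it is positive here
    z = diff / se
    p_two_tailed = math.erfc(abs(z) / math.sqrt(2))  # 2 * P(Z > |z|)
    ci = (diff - 1.96 * se, diff + 1.96 * se)        # same SE as the slide uses
    return z, p_two_tailed, ci

# Counts from the example: 11 of 37 and 16 of 35.
z, p_value, ci = two_proportion_z(11, 37, 16, 35)
print(round(z, 2), round(p_value, 2), [round(b, 2) for b in ci])
```

With the example counts, the results round to z = 1.4, p = .16, and an interval of (-.06, .38).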


Pearson Chi Square for 2×2 Tables

• The z test statistic has a standard normal N(0,1) distribution for large samples.

• $z^2$ is distributed as $\chi^2$ with 1 degree of freedom for large samples.

• From the example, $1.4^2 = 1.96$. Howell's table for Chi Square shows $\Pr(\chi^2 > 1.96)$ to be in the range .10 to .25.

• Pearson’s calculation for this test statistic is:

$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$

where $O_i$ is an observed frequency and $E_i$ is the expected frequency given the null hypothesis of equal proportions.
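The link between the z test and the 1 d.f. chi-square test can be illustrated with a short sketch. It relies on the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal; the helper name `chi2_sf_1df` is an assumption made for this example.

```python
import math

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 d.f."""
    # P(chi2 > x) = P(|Z| > sqrt(x)) for a standard normal Z
    return math.erfc(math.sqrt(x / 2))

z = 1.4
p_from_z = math.erfc(abs(z) / math.sqrt(2))   # two-tailed normal p-value
p_from_chi2 = chi2_sf_1df(z ** 2)             # tail area beyond z squared
print(round(p_from_z, 3), round(p_from_chi2, 3))
```

Both routes give the same probability, which falls inside the .10 to .25 range read from the table.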


Expected values for no association

• From the example: p1 = 11/37 = .297, and p2 = 16/35 = .457

• The expected frequencies are based on a pooled p = (11+16)/(37+35) = .375


Chi square test of association, continued

Each cell shows: Observed Frequency / Expected Frequency / Joint probability|H0

                 Black Confederate       White Confederate       Total
Chose Black      11 / 13.875 / .193      16 / 13.125 / .182      27 (.375)
Avoided Black    26 / 23.125 / .321      19 / 21.875 / .304      45 (.625)
Total            37 (.514)               35 (.486)               72

• Marginal probabilities = pooled

• Expected joint probabilities|H0 = product of marginals (e.g. .193 =.375*.514)

• Ei = expected joint probability * n (e.g. 13.875 = .193*72=27*37/72)

• We use these values in Pearson's formula
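The expected frequencies and the resulting Pearson statistic can be reproduced with a few lines of Python. This is a sketch under the assumption that the 2×2 counts are stored as a nested list; the variable names are illustrative.

```python
import math

observed = [[11, 16],   # chose Black:   Black confederate, White confederate
            [26, 19]]   # avoided Black: Black confederate, White confederate

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count = row total * column total / n (product of marginals times n)
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi2 = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))
p_value = math.erfc(math.sqrt(chi2 / 2))   # upper tail for 1 d.f.
print(expected, round(chi2, 2), round(p_value, 2))
```

The chi-square value rounds to 1.96, the square of the z statistic of 1.4 from the test of the difference in proportions.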


Analysis of 2×2 tables for small samples

• The z and $\chi^2$ tests are justified on the basis of the central limit theorem, and will be approximately correct for fairly small n’s. What if the sample is ridiculously small?

– Rule of thumb: if expected frequencies are less than 2.5, the sample is small

• For small n's, Fisher recommended using a randomization test

– Suppose we have N subjects, and g1 are in group 1 and r1 overall respond positively

– Under H0, response and group are independent

– Consider this thought experiment:

• Put all N subjects in an urn.

• Randomly draw r1 subjects and pretend that they are positive responders.

• How often would the original pattern of data emerge from such a random process?
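The urn thought experiment can be simulated directly. The sketch below is only an illustration: the function name `randomization_p`, the number of draws, the seed, and the counts (N = 10, g1 = 5, r1 = 5, with 4 responders observed in group 1) are all assumptions chosen for the example.

```python
import random

def randomization_p(n_total, g1, r1, observed_in_g1, draws=20000, seed=0):
    """Estimate how often a random draw of r1 'responders' puts at least
    observed_in_g1 of them in group 1."""
    rng = random.Random(seed)
    subjects = [1] * g1 + [2] * (n_total - g1)   # group label of each subject in the urn
    hits = 0
    for _ in range(draws):
        drawn = rng.sample(subjects, r1)         # draw r1 subjects without replacement
        if drawn.count(1) >= observed_in_g1:     # pattern at least as extreme as observed
            hits += 1
    return hits / draws

estimate = randomization_p(n_total=10, g1=5, r1=5, observed_in_g1=4)
print(estimate)
```

The estimate is a Monte Carlo approximation; with these counts the exact answer from the hypergeometric distribution is 26/252, about .103, and the simulated proportion should land close to it.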


Fisher’s Exact test

• Suppose we have the following table

              Grp1        Grp2        Total
Respond       a = 4       b = 1       a+b = 5
Not Respond   c = 1       d = 4       c+d = 5
Total         a+c = 5     b+d = 5     N = 10

• Pearson Chi Square would be 3.6, and the two-tailed p is .058

• Hypergeometric probability of getting 1 or fewer Grp2 responses (given that 5 people responded) is:

$$\frac{\binom{5}{0}\binom{5}{5}}{\binom{10}{5}} + \frac{\binom{5}{1}\binom{5}{4}}{\binom{10}{5}} = .004 + .099 = .103$$
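The hypergeometric probabilities can be computed exactly with `math.comb`. This is a minimal sketch; the helper name `hypergeom_pmf` and its argument names are assumptions introduced for the example.

```python
from math import comb

def hypergeom_pmf(k, group_size, other_size, responders):
    """P(exactly k of the responders fall in the group of size group_size)."""
    return (comb(group_size, k) * comb(other_size, responders - k)
            / comb(group_size + other_size, responders))

# Grp2 has 5 members, Grp1 has 5 members, and 5 people responded in total.
p0 = hypergeom_pmf(0, 5, 5, 5)    # no Grp2 responders
p1 = hypergeom_pmf(1, 5, 5, 5)    # exactly one Grp2 responder (the observed table)
one_tailed = p0 + p1
print(round(p0, 3), round(p1, 3), round(one_tailed, 3))
```

The two terms are 1/252 and 25/252, which round to .004 and .099 and sum to .103.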


Analysis of Matched Samples

• Many research questions involve comparing proportions computed from related observations: the analogue of the paired t-test.

– Analysis of change

– Within-subjects designs

– Analysis of siblings, spouses, supervisor-employee pairs, …

– Samples constructed by matching on confounding variables

• When the outcome is binary, display the data showing the numbers of pairs (joint dist.)


Example (Howell Ex. 6.21-22)

• Is the proportion pro the same at the two time points?

• Note that the marginals (30/40 and 15/40) are not independent

• Instead of comparing those proportions, examine those whose opinions change

• Compare (5,20) to the expected (12.5,12.5) as a Chi Square test

          Pro 2          Con 2          Total
Pro 1     10 (cell a)    20 (cell b)    30
Con 1     5 (cell c)     5 (cell d)     10
Total     15             25             40


McNemar’s test

• McNemar showed that this test [whether (5,20) is significantly different from (12.5,12.5)] may be computed, with the Yates correction for continuity, as:

$$\chi^2 = \frac{(|b - c| - 1)^2}{b + c}$$

• For the example, $(20-5-1)^2/25 = 7.84$ is unusual for a $\chi^2$ with 1 d.o.f., yielding p=.005
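McNemar's statistic uses only the two discordant cells, so it is simple to compute. The sketch below assumes the cell labels b and c from the example table; the function name `mcnemar_chi2` is illustrative, and the p-value uses the 1 d.f. chi-square tail via `math.erfc`.

```python
import math

def mcnemar_chi2(b, c):
    """McNemar chi-square with the Yates continuity correction (1 d.f.)."""
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(chi2 / 2))   # upper tail of chi-square, 1 d.f.
    return chi2, p_value

# Discordant cells from the example: b = 20 (Pro then Con), c = 5 (Con then Pro).
chi2, p_value = mcnemar_chi2(b=20, c=5)
print(chi2, round(p_value, 3))
```

For the example this gives 7.84 and a p-value that rounds to .005.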


Confidence interval for matched proportion difference

• Fleiss (1981) recommends using the general form of the symmetric CI for testing the difference between p1 and p2

$$(p_1 - p_2) \pm 1.96\,\hat{\sigma}_{p_1 - p_2}$$

where the standard error is estimated using

$$\hat{\sigma}_{p_1 - p_2} = \frac{\sqrt{(a+d)(b+c) + 4bc}}{n\sqrt{n}}$$

• E.g. the 95% CI for the difference (30/40)-(15/40)=.375 requires the SE

$$\frac{\sqrt{(10+5)(20+5) + 4(20)(5)}}{40\sqrt{40}} = .11$$

The CI is thus $.375 \pm (1.96)(.11) = (.16, .59)$
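The matched-pairs interval can be sketched as a small function. The cell labels a, b, c, d follow the example table, with p1 = (a+b)/n and p2 = (a+c)/n; the function name `matched_diff_ci` and the default critical value are assumptions for illustration.

```python
import math

def matched_diff_ci(a, b, c, d, z_crit=1.96):
    """Symmetric CI for the difference between two matched proportions."""
    n = a + b + c + d
    diff = (a + b) / n - (a + c) / n     # p1 - p2, which equals (b - c) / n
    se = math.sqrt((a + d) * (b + c) + 4 * b * c) / (n * math.sqrt(n))
    return diff, se, (diff - z_crit * se, diff + z_crit * se)

# Cells from the Howell example: a = 10, b = 20, c = 5, d = 5.
diff, se, ci = matched_diff_ci(10, 20, 5, 5)
print(diff, round(se, 2), [round(bound, 2) for bound in ci])
```

With the example counts the difference is .375, the SE rounds to .11, and the interval rounds to (.16, .59).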


Measures of association

• Consider two tables:

First table:
      D    ~D
A     9     1
B    45    45

Second table:
      D    ~D
A     9     2
B    45    90

• The proportions with D in groups A and B are .90 vs .50 in the first table (.9-.5=.4) and .82 vs .33 (.82-.33=.49) in the second.

• Is the difference stronger in the second table?


Odds ratios as an alternative to differences in proportions

• The proportions in group A within levels D and ~D (9/54 among D, and 1/46 = 2/92 among ~D) do not differ across tables.

• Which way we look at the table gives different answers.

• The odds of D (vs ~D) are 9 to 1 in group A and 1 to 1 in group B in the first table. The odds ratio is 9: the odds are 9 times greater for group A than B.

• The odds ratio is also 9 in the second table.
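The contrast between the two measures can be checked numerically. In this sketch each table is passed as (a, b, c, d) = (A and D, A and ~D, B and D, B and ~D); the function names are illustrative assumptions.

```python
def proportion_difference(a, b, c, d):
    """Difference between the proportions with D in groups A and B."""
    return a / (a + b) - c / (c + d)

def odds_ratio(a, b, c, d):
    """Odds of D in group A divided by the odds of D in group B."""
    return (a * d) / (b * c)

table1 = (9, 1, 45, 45)
table2 = (9, 2, 45, 90)

diff1, or1 = proportion_difference(*table1), odds_ratio(*table1)
diff2, or2 = proportion_difference(*table2), odds_ratio(*table2)
print(diff1, or1)
print(diff2, or2)
```

The difference in proportions changes from .40 to about .48 (the slide's .49 subtracts the already rounded .82 and .33), while the odds ratio is 9 in both tables.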


Properties of odds ratios

• Invariant to multiplying rows or columns by a constant

• Equal to one for equal odds

• Approaches infinity when off-diagonal cells approach zero

• Approaches zero when diagonal cells approach zero

• Easily computed as OR = ad/bc

• ln(OR) (a “logit”) has a less obvious interpretation, but nicer scale features, with the equal odds point at ln(1)=0
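Two of these properties, invariance to scaling a row or column and the behaviour of the log scale around zero, can be illustrated with a short sketch. The counts reuse the first table from the measures of association slide; the scaling factors and the function name are arbitrary assumptions.

```python
import math

def odds_ratio(a, b, c, d):
    return (a * d) / (b * c)

base = odds_ratio(9, 1, 45, 45)
scaled_row = odds_ratio(9 * 3, 1 * 3, 45, 45)   # multiply row A by 3
scaled_col = odds_ratio(9, 1 * 2, 45, 45 * 2)   # multiply column ~D by 2
swapped = odds_ratio(45, 45, 9, 1)              # exchange groups A and B

print(base, scaled_row, scaled_col)
print(round(math.log(base), 3), round(math.log(swapped), 3))
```

Scaling leaves the odds ratio at 9, and exchanging the groups turns 9 into 1/9, whose logs are equal in size and opposite in sign around ln(1) = 0.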


Confidence interval on odds ratio

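• Like other bounded parameters, confidence intervals for the odds ratio (OR) are difficult (a symmetric bound does not work well)

• Approximate, but improved CI on ln(OR)=ln(ad/bc) uses

$$SE[\ln(\widehat{OR})] = \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}$$

$$\ln(\widehat{OR}) \pm 1.96 \cdot SE$$

• Compute CI on ln(OR), then take antilog (i.e., $e^x$) of each bound.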


Example

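• From the first table above,

$$\widehat{OR} = 9, \quad \ln(\widehat{OR}) = 2.2$$

$$SE[\ln(\widehat{OR})] = \sqrt{\frac{1}{9} + \frac{1}{1} + \frac{1}{45} + \frac{1}{45}} = 1.075$$

• 95% CI on ln(OR) is

$$2.2 \pm 1.96 \times 1.075 = (.093, 4.307)$$

• 95% CI on OR is thus:

$$(e^{.093}, e^{4.307}) = (1.10, 74.2),$$

an asymmetric confidence interval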
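The log-scale interval and its back-transformation can be sketched as follows. The function name `odds_ratio_ci` and the default critical value are assumptions; the counts come from the first table (a = 9, b = 1, c = 45, d = 45).

```python
import math

def odds_ratio_ci(a, b, c, d, z_crit=1.96):
    """Approximate CI for an odds ratio, built on the log scale."""
    or_hat = (a * d) / (b * c)
    log_or = math.log(or_hat)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_ci = (log_or - z_crit * se, log_or + z_crit * se)
    or_ci = (math.exp(log_ci[0]), math.exp(log_ci[1]))   # antilog of each bound
    return or_hat, se, log_ci, or_ci

or_hat, se, log_ci, or_ci = odds_ratio_ci(9, 1, 45, 45)
print(or_hat, round(se, 3), log_ci, or_ci)
```

Because the code carries ln(9) at full precision instead of the rounded 2.2, its bounds differ slightly from the hand calculation (roughly 1.09 and 74.0), but the interval is asymmetric around 9 in the same way.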
