
Post on 13-May-2020


Comparing Two Proportions

Case Study

Recall the question that was actually asked in the CPR study reported in the NEJM.

• Do we need to give mouth-to-mouth ventilation and chest compression?

• Or will just doing chest compression alone be just as effective?

Summary

• In the Seattle study, heart-attack victims were randomly assigned to two groups: full CPR or chest compression alone.

• They found a 10.4% survival rate for those receiving full CPR (x = 29, n = 278) and a 14.6% survival rate for those receiving chest compression alone (x = 35, n = 240).

• The trial was designed to detect a 3.5% improvement of chest compression alone over full CPR.

Question

• Is there any difference in the survival proportions of dispatcher-instructed bystander administered CPR depending on whether mouth-to-mouth ventilation is used or not?

Steps for Hypothesis Testing (© AMB)

Phase 1: State the Question

1. Evaluate and describe the data

2. Review the assumptions

3. State the question, in the form of hypotheses

Phase 2: Decide How to Answer the Question

4. Decide on a summary number—a statistic—that reflects the question

5. How could random variation affect that statistic?

6. State a decision rule, using p-values, to answer the question

Phase 3: Answer the Question

7. Calculate the statistic

8. Make a statistical decision

9. State the substantive conclusion

Phase 4: Communicate the Answer to the Question

10. Document our understanding with text, tables, or figures

The NEJM CPR Results

How do these steps get applied in the case of comparing two proportions?

Phase 1: State the Question

1. Evaluate and describe the data

Contingency Table

• The number of patients in each group, and the number of survivors (but not the non-survivors) is shown in Table 4.

• This form of tabular display is somewhat like a contingency table.

• The contingency table corresponding to these results is:

This form of tabular display is called:

• a contingency table

• a cross-tabulation table

• a two-way classification

• a 2 x 2 table

• We observed n = 278 CPR patients who received instructions by phone, of whom x = 29 survived to hospital discharge.

• We observed n = 240 chest-compression alone patients, of whom x = 35 survived. Overall (ignoring group membership), there were 64 survivors out of a total of 518.

Observations

Histogram

A histogram visually compares two things that should be compared (proportions)

Tabular Display of Proportions

• Notice the columns sum to 1

• Each proportion was calculated separately for each population or treatment group

Questions?

• Of those receiving chest compression plus mouth-to-mouth ventilation, what proportion survived to hospital discharge?

• Of those receiving chest compression alone, what proportion survived to discharge?

• Of those receiving chest compression plus mouth-to-mouth ventilation, what proportion did not survive?

• Of those receiving chest compression alone, what proportion did not survive?

Which display?

The intent is to compare the two survival proportions. So which display(s) are best?

Use a display(s) that describes the sample and the statistic being compared.

Example

Population proportion (N)

Variable          Chest compression and mouth-to-mouth   Chest compression alone   Row total
Survived          0.104 (29)                             0.146 (35)                0.124 (64)
Did not survive   0.896 (249)                            0.854 (205)               0.876 (454)
Column total      1.00 (278)                             1.00 (240)                1.00 (518)
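The column proportions in this table can be reproduced from the raw counts. The sketch below is only an illustration: the dictionary layout and the helper name `column_proportions` are assumptions made for this example, not part of the original slides.

```python
# Observed counts from the CPR study, keyed by treatment group.
counts = {
    "full CPR": {"survived": 29, "did not survive": 249},
    "chest compression alone": {"survived": 35, "did not survive": 205},
}


def column_proportions(group_counts):
    """Return each outcome's share of the group's total (one table column)."""
    total = sum(group_counts.values())
    return {outcome: n / total for outcome, n in group_counts.items()}


for group, group_counts in counts.items():
    props = column_proportions(group_counts)
    # Each column sums to 1 because it is computed within a single group.
    print(group, {outcome: round(p, 3) for outcome, p in props.items()})
```

Because each proportion is computed within its own group, the two survival proportions can be compared directly even though the group sizes differ.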

2. Review assumptions

As in the case where we’re interested in a single proportion, with two proportions we must also meet three assumptions:

• representativeness,

• independence, and

• sample size

Representativeness

Are the subjects in each group representative of some population of interest?

If the study subjects were chosen as a simple-random sample from a larger population and if these subjects were randomly assigned to the two groups, then we can be comfortable that the information in this sample is representative of the population of interest.

Independence

Does the response of one subject depend on the response of another?

If not, then the subjects are independent.

Sufficient size

In order for the test statistic to follow the normal distribution, n must be large enough to expect at least 5 survivals and at least 5 non-survivals in each group.

As in the single population case, we are not asking whether you observed at least 5 subjects in each cell.

To check this, we must calculate the expected number of subjects under the null hypothesis. But we have not yet stated the hypotheses. Let’s do that and then come back.

3. State the question

Are the proportions in the two groups the same?

The alternative is that the two groups have a different survival proportion.

H0: pCPR = pchest

HA: pCPR ≠ pchest

If Null is True

• The two groups are said to be homogeneous (of uniform nature, similar in kind).

• The two proportions are the same.

• If they are the same, it’s convenient to think of the proportion as a single number, p.

• So, another way to think of the null hypothesis is:

H0: pCPR = pchest = p

What is p?

What is the best estimate of p, the survival proportion under the null hypothesis? We observed a total of 64 survivors out of 518 people so we’ll use this, called p-bar:

$$\bar{p} = \frac{x_1 + x_2}{n_1 + n_2}$$

Revisit sample size

If the true proportion is the same for both groups, we should use p-bar to determine if there is sufficient size.

If the null hypothesis is true, how many people do we expect to see in each of the four cells?

We keep the number of subjects in each group fixed and use p-bar

Survival groups

If you have 278 people and a proportion of 0.124 survive, how many do you expect to survive?

$$\bar{p} \times n_{CPR} = 0.124 \times 278 \approx 34.3$$

If you have 240 people and a proportion of 0.124 survive, how many do you expect to survive?

$$\bar{p} \times n_{chest} = 0.124 \times 240 \approx 29.7$$

(Both results use the unrounded p-bar = 64/518; multiplying by the rounded 0.124 gives slightly larger values.)

Non-survival groups?

Variable          Chest compression and mouth-to-mouth   Chest compression alone   Row total
Survived          34.3                                   29.7                      64
Did not survive   243.7                                  210.3                     454
Column total      278                                    240                       518

• Is this assumption for our statistical test met? (Are the expected counts in all cells greater than 5?)

• If it is, then we can trust that the sample proportion will be normally distributed. If we can trust that the sample proportion is normally distributed, then we can calculate a p-value.

• If we can calculate a p-value we trust, then we can make a decision with understandable risk.
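The expected-count check can be written out as a short calculation. This is a minimal sketch: the function name `expected_counts` and the group labels are hypothetical, and the threshold of "at least 5" per cell follows the sample-size assumption stated earlier.

```python
def expected_counts(x1, n1, x2, n2):
    """Expected (survivors, non-survivors) per group if H0 is true."""
    # Pooled estimate of the common survival proportion under H0.
    p_bar = (x1 + x2) / (n1 + n2)
    # Group sizes are held fixed; only the split within each group is expected.
    return {
        "group 1": (n1 * p_bar, n1 * (1 - p_bar)),
        "group 2": (n2 * p_bar, n2 * (1 - p_bar)),
    }


# Group 1 = full CPR (x = 29, n = 278); group 2 = chest compression alone (x = 35, n = 240).
expected = expected_counts(29, 278, 35, 240)
all_large_enough = all(cell >= 5 for pair in expected.values() for cell in pair)

for group, (survive, not_survive) in expected.items():
    print(group, round(survive, 1), round(not_survive, 1))
print("sample-size assumption met:", all_large_enough)
```

Note that the check is applied to the expected counts, not the observed ones, which is why the pooled proportion is needed first.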

Phase 2: Decide How to Answer the Question

4. Decide on a summary statistic that reflects the question

• We want to know if the two proportions are the same:

H0: pCPR = pchest = p

• This is equivalent to asking if the difference between the two is zero:

H0: pCPR - pchest = 0

One versus Two Proportions

• Recall that when looking at one proportion there were three possibilities for null hypotheses.

• In the case when we’re looking at two proportions we’re almost always interested in the null hypothesis: “same proportions” and the alternative hypothesis: “different proportions.”

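Generic Test Statistic

• From our earlier discussion, recall that the generic test statistic is:

$$z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0 (1 - p_0)}{n}}}$$

H0: pCPR - pchest = 0

With group 1 as full CPR and group 2 as chest compression alone:

$$z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{SE} = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\dfrac{\bar{p} (1 - \bar{p})}{n_1} + \dfrac{\bar{p} (1 - \bar{p})}{n_2}}}$$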

5. How could random variation affect that statistic?

• If the null hypothesis is true, then we expect z to be close to zero. Since the assumptions are met, z is approximately normally distributed.

• Extreme values of z reflect larger differences and thus favor the alternative hypothesis.

6. State a decision rule, using the statistic, to answer the question

• Just like in the first case study, if we are willing to reject a true null hypothesis 5% of the time, our decision rule is to choose to believe:

H0: pCPR – pchest = 0. Choose this if p-value ≥ α (usually 0.05)

HA: pCPR – pchest ≠ 0. Choose this if p-value < α (usually 0.05)

Phase 3: Answer the Question

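7. Calculate the statistic

We’ve already calculated the sample proportion for the full CPR group as 0.104. For the chest compression alone group and the pooled estimate:

$$n_{chest} = 240$$

$$\hat{p}_{chest} = \frac{35}{240} = 0.146$$

$$\bar{p} = \frac{x_1 + x_2}{n_1 + n_2} = 0.124$$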

Z-score

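$$z = \frac{(0.104 - 0.146) - 0}{\sqrt{\dfrac{0.124 (1 - 0.124)}{278} + \dfrac{0.124 (1 - 0.124)}{240}}} = \frac{-0.042}{0.029} = -1.432$$

(The final value is computed from unrounded intermediate values.)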

8. Make a Statistical Decision

• Determine the p-value

• To calculate a p-value, use the “two-tail” method where we are interested in calculating the probability of differences between the two proportions as large or larger than we observed.

Using p-value Calculator

For z = -1.43

In words

• The p-value = 0.1521.

• Since p-value > α = 0.05, we will fail to reject the null hypothesis.
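In place of a p-value calculator, the two-tailed p-value can be obtained from the standard normal distribution. This is a minimal sketch using only the Python standard library; the function name `two_tailed_p_value` is hypothetical.

```python
import math


def two_tailed_p_value(z):
    """Two-tailed p-value for a standard normal test statistic z."""
    # For Z ~ N(0, 1): P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


alpha = 0.05
p_value = two_tailed_p_value(-1.432)
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(round(p_value, 4), decision)
```

Taking the absolute value of z means the sign of the difference does not affect the result, which matches the two-tail method described above.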

9. State a Substantive Conclusion

There is insufficient evidence to conclude that the two survival proportions are different.

Phase 4: Communicate the Answer to the Question

10. Document our understanding with text, tables, or figures

For a dispatcher-instructed bystander-administered intervention after a cardiac arrest, is the survival proportion for full CPR different from the survival proportion with chest compression alone? In this study, n = 278 patients were randomized to the chest-compression and mouth-to-mouth ventilation group, of whom a proportion of 0.104 (x = 29) survived until hospital discharge.

Step 10 (cont)

And n = 240 patients were randomized to the chest-compression alone group, of whom a proportion of 0.146 (x = 35) survived until hospital discharge. Thus, there was a nominal improvement in survival of 4.2 percentage points, but the two proportions were compared and found to be not significantly different (z = 1.4, p-value = 0.1521).

Question: Why did we report a positive z value?

Because all we are doing is testing “is A different from B?”, we could just as well have phrased the question as “is B different from A?”.

Thus, the sign does not matter. So, by convention, we report the positive value.

Question: Why is our p-value different than the one reported in the NEJM paper?

On page 1547 of the paper, in the last paragraph of methods it says:

“The primary analysis consisted of a simple comparison of proportions by Fisher’s exact test.”

Fisher’s Exact Test

Fisher’s exact test determines the exact probability of obtaining the observed results, or results that are more extreme, if the null hypothesis is true.

The z-test gives an asymptotic (approximate) p-value, based on large samples and requiring that the normality assumption is met.

Advantage to Fisher’s

• We can use it even if the sample sizes are too small for the normal approximation assumptions to be met.

• That is, we can use it when we don’t expect to see at least 5 responses in each cell.

Fisher’s method

• Fisher’s idea was that with small samples we don’t have to approximate the distribution with z to calculate p-values.

• We can enumerate (count) all the possible outcomes and calculate p-values exactly.

Enumeration

Let’s look at a simple example. Fisher used an example of a woman tasting tea.

• A British woman claimed to be able to distinguish whether milk or tea was added to the cup first.

• The null hypothesis is that there is no ability.

• Let’s use a more up-to-date question. Can you tell the difference between Coke and Pepsi?

Two cups

• I pour, hidden from you, two soft-drink cups: one with Coke and one with Pepsi.

• Then I ask you: “Which is Coke? And which is Pepsi?”

• What are the possible outcomes of this experiment?

Possible Results

• And we can look at the exact distribution of the number of correct answers.

• Thus we can determine the p-value we’d obtain for each of the possibilities.

Four Cups

Assuming an equal number of Cokes and Pepsis, the next larger experiment would be 4 cups:

Results?

If someone is guessing randomly, these 6 possibilities are equally likely.

Conclusion

• So if someone got all 4 right, we would be able to conclude that this person could “… tell the difference between Coke and Pepsi, p-value = 0.1667.”

• Would this be convincing?
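The four-cup enumeration is small enough to list by computer. The sketch below assumes, purely for illustration, that cups 0 and 1 hold Coke and that the taster knows exactly two cups are Coke; the variable names are hypothetical.

```python
from itertools import combinations

cups = set(range(4))
truth = {0, 1}  # hypothetical: cups 0 and 1 actually hold Coke

# Every way a taster could label exactly two of the four cups as Coke.
guesses = [set(g) for g in combinations(sorted(cups), 2)]


def n_correct(guess):
    """Count cups labeled correctly (Coke called Coke, Pepsi called Pepsi)."""
    coke_right = len(guess & truth)
    pepsi_right = len((cups - guess) & (cups - truth))
    return coke_right + pepsi_right


# Exact distribution of the number correct under random guessing.
distribution = {}
for guess in guesses:
    k = n_correct(guess)
    distribution[k] = distribution.get(k, 0) + 1

p_all_correct = distribution[4] / len(guesses)
print(len(guesses), distribution, round(p_all_correct, 4))
```

Only one of the six equally likely labelings gets all four cups right, which is where the p-value of 1/6 comes from.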

Calculation of Fisher’s exact p-values

• How are we going to use this exact test in practice?

• Fortunately, software can calculate these p-values easily.

• So how do you interpret the output?

Reports all p-values

Which one?

• The most conservative p-value to report is the “2-tail” one.

• In this case that’s what they did in the NEJM paper.
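For readers without statistical software at hand, a two-sided Fisher's exact p-value can be sketched with the standard library alone. Two assumptions are made here: the function name `fisher_exact_two_sided` is hypothetical, and "two-sided" is taken to mean summing the probabilities of all tables no more likely than the observed one, which is one common convention and may not be the one used by any particular package.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(k):
        # Hypergeometric probability of k in the top-left cell, margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum over every table that is no more likely than the observed table.
    # The small tolerance guards against floating-point ties.
    return sum(
        prob(k)
        for k in range(lowest, highest + 1)
        if prob(k) <= p_observed * (1 + 1e-9)
    )


# Rows: survived / did not survive; columns: full CPR / chest compression alone.
p_fisher = fisher_exact_two_sided(29, 35, 249, 205)
print(round(p_fisher, 4))
```

Applied to the four-cup example with every cup identified correctly, `fisher_exact_two_sided(2, 0, 0, 2)` gives 1/3, twice the one-tail value of 1/6 from the enumeration.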

Short cut: Comparing Two Proportions

We start by labeling the four cells with the letters a through d:

The Statistic

It’s actually the square of the z statistic we have already seen:

$$\chi^2 = \frac{n (ad - bc)^2}{(a + c)(b + d)(a + b)(c + d)}$$

CPR Example

$$\chi^2 = \frac{518 (29 \cdot 205 - 35 \cdot 249)^2}{(29 + 249)(35 + 205)(29 + 35)(249 + 205)} = 2.05$$

Notice that $\sqrt{2.05} = 1.43$.
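The shortcut can be checked numerically. This sketch assumes the cell labeling implied by the worked example (a = 29, b = 35, c = 249, d = 205); the function name `chi_square_2x2` is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Shortcut chi-square statistic for a 2 x 2 table with cells a, b, c, d."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + c) * (b + d) * (a + b) * (c + d))


chi2 = chi_square_2x2(29, 35, 249, 205)
# The square root recovers the size of the z statistic, without its sign.
z_abs = math.sqrt(chi2)
print(round(chi2, 2), round(z_abs, 2))
```

Because the statistic is a square, it carries no information about the direction of the difference, which is consistent with a two-tailed test.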

Decision Rule

• The decision rule is straightforward.

• Take the square root of the χ2 value (it is the absolute value of z) and look up the two-tailed p-value.

Confidence Interval

• Similar to the one-proportion CI, but use both observed proportions and an “average” SE:

$$(\hat{p}_1 - \hat{p}_2) \pm z_{1 - \alpha/2} \sqrt{\frac{\hat{p}_1 (1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2 (1 - \hat{p}_2)}{n_2}}$$

CPR Example

$$(0.104 - 0.146) \pm 1.96 \sqrt{\frac{0.104 (1 - 0.104)}{278} + \frac{0.146 (1 - 0.146)}{240}}$$

$$-0.042 \pm 1.96 (0.029)$$

$$[-0.099, 0.016]$$

Interpretation

We’re 95% confident that the interval –9.9% to 1.6% covers the true difference in the population survival proportion from full CPR versus chest compression alone.
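The interval can be reproduced from the raw counts. In this sketch the function name `diff_proportion_ci` is hypothetical, and 1.96 is hard-coded as the critical value for a 95% interval. Unlike the test statistic, the standard error here uses each group's own observed proportion rather than the pooled p-bar.

```python
import math


def diff_proportion_ci(x1, n1, x2, n2, z_crit=1.96):
    """Confidence interval for p1 - p2 using the unpooled standard error."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z_crit * se, diff + z_crit * se


# Group 1 = full CPR, group 2 = chest compression alone.
lower, upper = diff_proportion_ci(29, 278, 35, 240)
print(round(lower, 3), round(upper, 3))
print("interval includes zero:", lower <= 0 <= upper)
```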

Notice

Note: The 95% CI includes zero, meaning that using a confidence interval alone to test the difference, we would fail to reject the hypothesis that the difference is zero. That is, we lack sufficient evidence of a difference between the treatment groups.

If you find a significant difference, you should add the confidence interval about the observed difference to step 10 of the hypothesis testing steps.

Review

We have applied the ten steps of hypothesis testing to comparing a single observed proportion to an assumed proportion and to comparing two observed proportions.

We tested the two observed proportions by actually testing whether the difference between them is equal to zero.

We will continue to apply the 10 steps of hypothesis testing to other types of hypothesis tests, such as comparing a single mean to an assumed mean, comparing two means, and comparing several means.