Topic 8 - Comparing two samples

Page 1:

Topic 8 - Comparing two samples

• Confidence intervals/hypothesis tests for two means

• Hypothesis test for two variances

Page 2:

Comparing two populations

• Sometimes we want to compare two populations rather than making decisions about a single population.

• For example, we might want to compare two population means or two population proportions to see if they are equal.

– Is the expected drying time for one type of paint lower than that of another type of paint?

– Is a new drug more effective? This could mean an increased or decreased mean versus the “established” drug, or an increased or decreased percentage versus a control.

– Does the new method actually result in increased crop yields or percentages, or a decrease in tons lost to insects, etc.?

Page 3:

Behind the scenes. What do the distributions look like?

Page 4:

Comparing two population means

• Suppose we have two independent samples, $X_1,\dots,X_m$ and $Y_1,\dots,Y_n$, from two separate populations.

• A natural statistic for comparing the two population means, $\mu_X$ and $\mu_Y$, is $\bar{X} - \bar{Y}$.

• The distribution of $\bar{X} - \bar{Y}$ is also approximately Normal for m and n both large.

$$E(\bar{X} - \bar{Y}) = E(\bar{X}) - E(\bar{Y}) = \mu_X - \mu_Y \quad \text{(from chapter 5)}$$

$$\mathrm{Var}(\bar{X} - \bar{Y}) = \mathrm{Var}(\bar{X}) + \mathrm{Var}(\bar{Y}) = \frac{\sigma_X^2}{m} + \frac{\sigma_Y^2}{n}$$

Page 5:

Large samples test for comparing population means

To test $H_0: \mu_X - \mu_Y = 0$, use the test statistic

$$Z = \frac{\bar{X} - \bar{Y} - 0}{\sqrt{s_X^2/m + s_Y^2/n}}$$

HA                      Reject H0 if
$\mu_X - \mu_Y < 0$     $Z < -z_{\alpha}$
$\mu_X - \mu_Y > 0$     $Z > z_{\alpha}$
$\mu_X - \mu_Y \neq 0$  $|Z| > z_{\alpha/2}$

Page 6:

Home sales data

A realtor in Albuquerque wants to argue that houses in the Northeast are more expensive on average than those in the rest of town.

NE = 0 indicates a home was not in the Northeast.

Test the appropriate hypotheses with α = 0.01.

Page 7:

This is what the StatCrunch data looks like.

Summary statistics for PRICE:
Group by: NE

NE   n    Mean      Variance        Std. Dev.   Std. Err.   Median
0    39   97,282    1,026,531,010   32,040      5,130       94,000
1    78   110,769   1,612,360,830   40,154      4,547       98,500

Page 8:

Here’s the output in StatCrunch

Hypothesis test results:
μ1 : mean of PRICE where NE=1 (Std. Dev. not specified)
μ2 : mean of PRICE where NE=0 (Std. Dev. not specified)
μ1 - μ2 : mean difference
H0 : μ1 - μ2 = 0
HA : μ1 - μ2 > 0

Difference   n1   n2   Sample Mean   Std. Err.   Z-Stat     P-value
μ1 - μ2      78   39   13487.18      6855.115    1.967462   0.0246

Page 9:

What does it look like?

$$Z_{test} = \frac{110769 - 97282}{\sqrt{\dfrac{32040^2}{39} + \dfrac{40154^2}{78}}} = \frac{13487}{6855.1147} = 1.967$$
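The home sales calculation can be checked with a short script. This is a minimal sketch using only the Python standard library rather than StatCrunch; the function name `two_sample_z` is made up for illustration, and the inputs are the summary statistics from the StatCrunch table (X is NE = 1, Y is NE = 0).

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean_x, var_x, m, mean_y, var_y, n):
    """Large-sample Z statistic for H0: mu_X - mu_Y = 0 (hypothetical helper)."""
    std_err = sqrt(var_x / m + var_y / n)
    return (mean_x - mean_y) / std_err, std_err


# Summary statistics from the StatCrunch table (X: NE = 1, Y: NE = 0).
z, std_err = two_sample_z(110769, 1612360830, 78, 97282, 1026531010, 39)

# Upper-tail p-value, because HA is mu_X - mu_Y > 0.
p_value = 1 - NormalDist().cdf(z)

alpha = 0.01
reject_h0 = p_value < alpha

print(f"Z = {z:.3f}, std. err. = {std_err:.1f}, p-value = {p_value:.4f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

With α = 0.01 the p-value of about 0.0246 is not small enough to reject H0, even though it would be at α = 0.05.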

Page 10:

Large samples confidence interval for the difference between two population means

• A large sample $(1-\alpha)100\%$ confidence interval for $\mu_X - \mu_Y$ is

$$\bar{X} - \bar{Y} \pm z_{\alpha/2}\sqrt{s_X^2/m + s_Y^2/n}$$

• For the home sales data, what is a 99% confidence interval for the difference between sale prices in the Northeast and the rest of town?
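One way to work the home sales question is sketched below, assuming the summary statistics from the StatCrunch table and using only the Python standard library. The helper name `z_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(diff, std_err, confidence):
    """Large-sample interval: diff +/- z_{alpha/2} * std_err (hypothetical helper)."""
    alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z_crit * std_err
    return diff - margin, diff + margin


# Difference in sample means (NE = 1 minus NE = 0) and its standard error.
diff = 110769 - 97282
std_err = sqrt(1612360830 / 78 + 1026531010 / 39)

lower, upper = z_interval(diff, std_err, 0.99)
print(f"99% CI for mu_X - mu_Y: ({lower:.0f}, {upper:.0f})")
```

The interval runs from roughly -4,171 to 31,145 dollars. It contains 0, which is in line with failing to reject H0 at α = 0.01.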

Page 11:

Equal population variances

• Suppose we assume that the two populations have a common variance $\sigma^2$.

• We can then estimate this common variance using the pooled sample variance:

$$s_p^2 = \frac{(m-1)s_X^2 + (n-1)s_Y^2}{n + m - 2}$$

$$\mathrm{Var}(\bar{X} - \bar{Y}) = \frac{\sigma^2}{m} + \frac{\sigma^2}{n} = \sigma^2\left(\frac{1}{m} + \frac{1}{n}\right)$$

Page 12:

Small samples test for comparing population means from Normal distributions with equal variances

To test $H_0: \mu_X - \mu_Y = 0$, use the test statistic

$$T = \frac{\bar{X} - \bar{Y} - 0}{s_p\sqrt{1/m + 1/n}}$$

HA                      Reject H0 if
$\mu_X - \mu_Y < 0$     $T < -t_{\alpha,\,n+m-2}$
$\mu_X - \mu_Y > 0$     $T > t_{\alpha,\,n+m-2}$
$\mu_X - \mu_Y \neq 0$  $|T| > t_{\alpha/2,\,n+m-2}$

Page 13:

THC example with equal variances

The active component in marijuana is THC. An experiment was conducted to compare two slightly different configurations of this substance.

The THC data set contains the time until the effect was perceived for 6 subjects exposed to each configuration.

Is there any evidence that the mean time to perception is different between the two configurations using α = 0.01?

Page 14:

Here’s what the calculations look like.

Summary statistics:
Column   n   Mean        Variance    Std. Dev.   Std. Err.
THC1     6   18.786667   34.908108   5.908309    2.412057
THC2     6   18.011667   19.519497   4.418088    1.803677

Pooled standard deviation

$$s_p^2 = \frac{(6-1)34.9081 + (6-1)19.5195}{6 + 6 - 2} = 27.2138$$

$$s_p = 5.216685$$

$$s_p\sqrt{\frac{1}{m} + \frac{1}{n}} = 5.216685\sqrt{\frac{1}{6} + \frac{1}{6}} = 3.01185$$

Page 15:

What does it look like?

$$T_{test} = \frac{18.78667 - 18.01167}{3.01185} = 0.2573$$

$$p\text{-value} = 2 \times (\text{one-tail area}) = 2 \times 0.4011 = 0.8022$$

Twice the one tail value.
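The pooled calculation for the THC data can be reproduced as follows. This is a sketch under stated assumptions: `pooled_t` is a hypothetical helper, the inputs are the summary statistics listed for THC1 and THC2, and the critical value $t_{0.005,10} = 3.169$ is taken from a standard t table because the Python standard library has no t distribution.

```python
from math import sqrt


def pooled_t(mean_x, var_x, m, mean_y, var_y, n):
    """Pooled two-sample T statistic for H0: mu_X - mu_Y = 0 (hypothetical helper)."""
    pooled_var = ((m - 1) * var_x + (n - 1) * var_y) / (m + n - 2)
    std_err = sqrt(pooled_var) * sqrt(1 / m + 1 / n)
    return (mean_x - mean_y) / std_err, m + n - 2


# Summary statistics for THC1 (X) and THC2 (Y).
t_stat, df = pooled_t(18.786667, 34.908108, 6, 18.011667, 19.519497, 6)

# Two-sided test at alpha = 0.01: compare |T| with t_{0.005, 10} from a t table.
T_CRIT = 3.169
reject_h0 = abs(t_stat) > T_CRIT

print(f"T = {t_stat:.4f} on {df} df, reject H0: {reject_h0}")
```

Since |T| is about 0.26, far below 3.169, H0 is not rejected, which agrees with the large p-value of 0.8022.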

Page 16:

Small samples confidence interval for the difference between two population means

• Assuming equal variances, a small sample $(1-\alpha)100\%$ confidence interval for $\mu_X - \mu_Y$ is

$$\bar{X} - \bar{Y} \pm t_{\alpha/2,\,n+m-2}\; s_p\sqrt{1/m + 1/n}$$

• For the THC data, what is a 99% confidence interval for the mean difference between the detection times for the two configurations?
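A worked version of the THC interval, using the pooled standard error 3.01185 computed earlier and assuming the table value $t_{0.005,10} = 3.169$:

```latex
\begin{align*}
\bar{X} - \bar{Y} \pm t_{0.005,\,10}\; s_p\sqrt{1/m + 1/n}
  &= (18.78667 - 18.01167) \pm 3.169\,(3.01185) \\
  &= 0.775 \pm 9.545 \\
  &\approx (-8.770,\ 10.320)
\end{align*}
```

The interval contains 0, so it gives the same message as the two-sided test at α = 0.01.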

Page 17:

Unequal population variances

• The pooled procedures we have discussed previously are fairly robust to departures from the assumption of equal variances.

• In other words if the two population variances are relatively close, the procedures perform well:

– The level of significance for the hypothesis test is close to what it should be

– The coverage probability for the confidence interval is close to what it should be

• If the variances are quite different, then we need a different procedure.

Page 18:

Small samples test for comparing population means from Normal distributions with unequal variances

To test $H_0: \mu_X - \mu_Y = 0$, use the test statistic

$$T = \frac{\bar{X} - \bar{Y} - 0}{\sqrt{s_X^2/m + s_Y^2/n}}$$

with degrees of freedom

$$v = \frac{\left(s_X^2/m + s_Y^2/n\right)^2}{\dfrac{(s_X^2/m)^2}{m-1} + \dfrac{(s_Y^2/n)^2}{n-1}}$$

HA                      Reject H0 if
$\mu_X - \mu_Y < 0$     $T < -t_{\alpha,\,v}$
$\mu_X - \mu_Y > 0$     $T > t_{\alpha,\,v}$
$\mu_X - \mu_Y \neq 0$  $|T| > t_{\alpha/2,\,v}$
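The degrees-of-freedom formula is easy to get wrong by hand, so here is a small sketch that evaluates it for the THC summary statistics. The helper name `welch_t` is hypothetical, and the THC variances are reused here only to illustrate the arithmetic.

```python
from math import sqrt


def welch_t(mean_x, var_x, m, mean_y, var_y, n):
    """Unequal-variance T statistic and its approximate degrees of freedom v."""
    a = var_x / m  # s_X^2 / m
    b = var_y / n  # s_Y^2 / n
    t_stat = (mean_x - mean_y) / sqrt(a + b)
    v = (a + b) ** 2 / (a ** 2 / (m - 1) + b ** 2 / (n - 1))
    return t_stat, v


t_stat, v = welch_t(18.786667, 34.908108, 6, 18.011667, 19.519497, 6)
print(f"T = {t_stat:.4f}, v = {v:.2f}")
```

Because m = n here, T matches the pooled value of about 0.2573; only the degrees of freedom change, from 10 to roughly 9.26. When using a printed t table, v is usually rounded down to a whole number.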

Page 19:

Small samples confidence interval for the difference between two population means…with unequal variances.

• Assuming unequal variances, a small sample $(1-\alpha)100\%$ confidence interval for $\mu_X - \mu_Y$ is

$$\bar{X} - \bar{Y} \pm t_{\alpha/2,\,v}\sqrt{s_X^2/m + s_Y^2/n}$$

• For the THC data, what is a 99% confidence interval for the mean difference between the detection times for the two configurations?

Page 20:

Comparing two population variances

• Suppose two chemical companies can supply a raw material, but we suspect the variability in concentration may differ between the two.

• The standard deviation of concentration in a random sample of 15 batches from company 1 was found to be 4.7 g/l (variance 22.09). A sample of 21 batches from company 2 yielded a standard deviation of 5.8 g/l (variance 33.64).

• Is there sufficient evidence to conclude that the variability in concentration differs for the two companies?

Page 21:

Test for comparing population variances from Normal distributions

To test $H_0: \sigma_X^2 = \sigma_Y^2$, use the test statistic

$$F = \frac{s_X^2}{s_Y^2}$$

HA                             Reject H0 if
$\sigma_X^2 > \sigma_Y^2$      $F > F_{\alpha,\,m-1,\,n-1}$
$\sigma_X^2 < \sigma_Y^2$      $F < F_{1-\alpha,\,m-1,\,n-1}$
$\sigma_X^2 \neq \sigma_Y^2$   $F > F_{\alpha/2,\,m-1,\,n-1}$ or $F < F_{1-\alpha/2,\,m-1,\,n-1}$

Page 22:

Chemical example

• Is there sufficient evidence to conclude that the variability in concentration differs for the two companies with α = 0.05?

• Demonstrate the F calculator.
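As a stand-in for the F calculator, the sketch below computes the F statistic for the chemical example and approximates the tail area by integrating the F density numerically with Simpson's rule. The helpers `f_pdf` and `f_cdf` are hypothetical and assume a numerator degrees of freedom greater than 2 (so the density is 0 at x = 0); a statistics package would normally be used instead.

```python
import math


def f_pdf(x, d1, d2):
    """Density of the F distribution with d1 and d2 degrees of freedom (d1 > 2)."""
    if x <= 0:
        return 0.0
    log_beta = math.lgamma(d1 / 2) + math.lgamma(d2 / 2) - math.lgamma((d1 + d2) / 2)
    log_pdf = (
        0.5 * d1 * math.log(d1)
        + 0.5 * d2 * math.log(d2)
        + (0.5 * d1 - 1) * math.log(x)
        - 0.5 * (d1 + d2) * math.log(d1 * x + d2)
        - log_beta
    )
    return math.exp(log_pdf)


def f_cdf(x, d1, d2, steps=2000):
    """P(F <= x), approximated with Simpson's rule on [0, x] (steps must be even)."""
    h = x / steps
    total = f_pdf(0.0, d1, d2) + f_pdf(x, d1, d2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f_pdf(i * h, d1, d2)
    return total * h / 3


# Company 1: s = 4.7 from 15 batches; company 2: s = 5.8 from 21 batches.
f_stat = 4.7 ** 2 / 5.8 ** 2
d1, d2 = 15 - 1, 21 - 1

lower_tail = f_cdf(f_stat, d1, d2)
p_value = 2 * min(lower_tail, 1 - lower_tail)  # two-sided alternative

print(f"F = {f_stat:.4f} on ({d1}, {d2}) df, two-sided p-value = {p_value:.3f}")
```

F is about 0.657. The approximate two-sided p-value is well above 0.05, so under these assumptions there is not enough evidence that the two companies differ in variability.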

Page 23:

Confidence interval for the ratio of two Normal population variances

• A $(1-\alpha)100\%$ confidence interval for $\sigma_X^2/\sigma_Y^2$ is

• For the chemical example, what is a 95% confidence interval for the ratio of concentration variances?

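$$\left(\frac{s_X^2/s_Y^2}{F_{\alpha/2,\,m-1,\,n-1}},\ \frac{s_X^2/s_Y^2}{F_{1-\alpha/2,\,m-1,\,n-1}}\right)$$

Here $F_{\alpha,\,m-1,\,n-1}$ is the upper-tail critical value, so the first endpoint is the smaller one.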

The additional file for Topic 8 contains examples of large and small scale tests on the differences in population means and proportions.

Page 24:

Paired data

• Sometimes we have a third variable that connects elements from the X and Y samples.

• In this case, the assumption of independence between the two samples may be violated.

• Is there any evidence that the first twin and the second twin have different average weights among boy-boy twins?

• In this case, the twins are clearly connected by the mother.

• It might be better to base our test on the n pairwise differences, Di = Xi – Yi.

Page 25:

Paired test for comparing population means

To test $H_0: \mu_X - \mu_Y = 0$, use the test statistic

$$T = \frac{\bar{D} - 0}{s_D/\sqrt{n}}$$

HA                      Reject H0 if
$\mu_X - \mu_Y < 0$     $T < -t_{\alpha,\,n-1}$
$\mu_X - \mu_Y > 0$     $T > t_{\alpha,\,n-1}$
$\mu_X - \mu_Y \neq 0$  $|T| > t_{\alpha/2,\,n-1}$
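The paired statistic is just a one-sample T computed on the differences. The sketch below shows the mechanics; the helper `paired_t` is hypothetical, and the four pairs are made-up numbers for illustration only (they are not the twins or car price data).

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(x, y):
    """Paired T statistic for H0: mu_X - mu_Y = 0, based on D_i = X_i - Y_i."""
    diffs = [xi - yi for xi, yi in zip(x, y)]
    n = len(diffs)
    t_stat = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t_stat, n - 1


# Made-up illustrative pairs (NOT data from the slides).
x = [10, 12, 9, 11]
y = [8, 11, 9, 8]

t_stat, df = paired_t(x, y)
print(f"T = {t_stat:.4f} on {df} df")
```

The differences are 2, 1, 0, 3, so the mean difference is 1.5 and T is about 2.32 on 3 degrees of freedom. The value would then be compared with the appropriate t critical value from the rejection table.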

Page 26:

Twins example

• Load the Twins data from StatCrunch sample data sets.

• Is there any evidence that Twin A and Twin B have different average weights among boy-boy twins with α = 0.1?

Page 27:

Additional pooled vs. paired

• Example: The article “Sex and Race Discrimination in the New Car Showroom: A Fact or Myth” (J. Consumer Affairs, 1977, pp 107-113) reports the results of an experiment in which individuals of different races and sexes visited 9 car dealerships to request the best possible deal on a certain car. The actual car prices obtained are shown below:

Page 28:

The standard deviations are relatively close, so we could consider this as a pooled test of differences, with the following results:

Summary data:

$$\bar{x} = 4476.778, \quad s_x^2 = 40118.69, \quad s_x = 200.2965$$

$$\bar{y} = 4388.444, \quad s_y^2 = 18405.28, \quad s_y = 135.6661$$

Is there sufficient evidence at α = 0.05 to conclude that the dealerships are quoting different prices for the black woman and the white man?
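The pooled test on the summary data can be sketched as below. Assumptions: each sample has n = 9 prices (one per dealership), `pooled_t` is a hypothetical helper, and the critical value $t_{0.025,16} = 2.120$ comes from a standard t table.

```python
from math import sqrt


def pooled_t(mean_x, var_x, m, mean_y, var_y, n):
    """Pooled two-sample T statistic for H0: mu_X - mu_Y = 0 (hypothetical helper)."""
    pooled_var = ((m - 1) * var_x + (n - 1) * var_y) / (m + n - 2)
    std_err = sqrt(pooled_var) * sqrt(1 / m + 1 / n)
    return (mean_x - mean_y) / std_err, m + n - 2


# Summary data for the two buyers, 9 dealerships each (assumed sample sizes).
t_stat, df = pooled_t(4476.778, 40118.69, 9, 4388.444, 18405.28, 9)

# Two-sided test at alpha = 0.05: compare |T| with t_{0.025, 16} from a t table.
T_CRIT = 2.120
reject_h0 = abs(t_stat) > T_CRIT

print(f"T = {t_stat:.3f} on {df} df, reject H0: {reject_h0}")
```

T is about 1.10, so the pooled test does not reject H0 at α = 0.05. This test treats the two sets of prices as independent samples.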

Page 30:

Two ways to look at the situation

Why did we get such poor results from our test?

The assumption in a pooled test is that there’s independence of data. In other words, any values from the woman’s distribution of prices are independent of values from the man’s distribution.

A valid comparison in that situation looks like this….

Page 31:

However, we know that’s not the case.

Prices from dealership 1 can be compared to each other (M to W), likewise for dealership 2, etc. There’s a relationship between the prices, a “pairing variable”. They are not independent, and when viewed correctly, the data shows something completely different.

Page 34:

Paired confidence interval for the difference between two population means

• A small sample $(1-\alpha)100\%$ confidence interval for $\mu_X - \mu_Y$ is

$$\bar{D} \pm t_{\alpha/2,\,n-1}\; s_D/\sqrt{n}$$

• For the car price example, what is a 90% confidence interval for the mean difference between the prices quoted to the black woman vs. the white man?

• CarData

Page 35:

Comparing two population proportions

• A natural statistic for comparing the two population proportions, $p_X$ and $p_Y$, is $\hat{p}_X - \hat{p}_Y$.

• The distribution of $\hat{p}_X - \hat{p}_Y$ is also approximately Normal for m and n both large.

$$E(\hat{p}_X - \hat{p}_Y) = E(\hat{p}_X) - E(\hat{p}_Y) = p_X - p_Y$$

$$\mathrm{Var}(\hat{p}_X - \hat{p}_Y) = \frac{p_X(1-p_X)}{m} + \frac{p_Y(1-p_Y)}{n} = p(1-p)\left(\frac{1}{m} + \frac{1}{n}\right), \text{ with common } p$$

In practice the variance is estimated by replacing each proportion with its sample estimate.

Page 36:

Large samples test for comparing population proportions

To test $H_0: p_X - p_Y = 0$, use the test statistic

$$Z = \frac{\hat{p}_X - \hat{p}_Y - 0}{\sqrt{\hat{p}(1-\hat{p})\left(\dfrac{1}{m} + \dfrac{1}{n}\right)}}$$

HA                  Reject H0 if
$p_X - p_Y < 0$     $Z < -z_{\alpha}$
$p_X - p_Y > 0$     $Z > z_{\alpha}$
$p_X - p_Y \neq 0$  $|Z| > z_{\alpha/2}$

Please note that the common $\hat{p}$ listed above is calculated as the total number of successes overall in the study, divided by the total number of observations.

Page 37:

Polio example

• The following table summarizes a study of the efficacy of the Salk vaccine. (Please note that I changed the actual percentages who got polio in this example to make the numbers MUCH more workable… don’t panic.)

• Was the vaccine effective? Test at α = 0.05.

Treatment   Total Patients   Polio
Vaccine     2,000            30
Placebo     2,000            100
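The polio test can be worked through as below. This sketch assumes X is the vaccine group and Y is the placebo group, and that "effective" means the one-sided alternative $p_X - p_Y < 0$; the helper name `two_proportion_z` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(successes_x, m, successes_y, n):
    """Large-sample Z statistic for H0: p_X - p_Y = 0 using the common (pooled) p."""
    p_x = successes_x / m
    p_y = successes_y / n
    # Common p: total successes divided by total observations.
    p_pool = (successes_x + successes_y) / (m + n)
    std_err = sqrt(p_pool * (1 - p_pool) * (1 / m + 1 / n))
    return (p_x - p_y) / std_err


# Vaccine: 30 of 2,000 contracted polio; placebo: 100 of 2,000.
z = two_proportion_z(30, 2000, 100, 2000)

# Lower-tail p-value, assuming HA: p_X - p_Y < 0 (fewer cases with the vaccine).
p_value = NormalDist().cdf(z)
reject_h0 = p_value < 0.05

print(f"Z = {z:.3f}, p-value = {p_value:.2e}, reject H0: {reject_h0}")
```

Z is about -6.24, far below $-z_{0.05} = -1.645$, so under these assumptions H0 is rejected and the data support the vaccine being effective.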

Page 38:

Large samples confidence interval for the difference between two population proportions

• A large sample $(1-\alpha)100\%$ confidence interval for $p_X - p_Y$ is

$$\hat{p}_X - \hat{p}_Y \pm z_{\alpha/2}\sqrt{\hat{p}_X(1-\hat{p}_X)/m + \hat{p}_Y(1-\hat{p}_Y)/n}$$

• For the Polio data, what is a 95% confidence interval for the difference between the proportions who contract the disease under each treatment?

$$(0.015 - 0.05) \pm 1.96\sqrt{[0.015(0.985)]/2000 + [0.05(0.95)]/2000}$$

$$-0.035 \pm 0.01093$$

$$(-0.0459;\ -0.0241)$$
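The interval arithmetic can be double-checked with a few lines of Python. This is only a sketch: `proportion_diff_interval` is a hypothetical helper, and X is taken to be the vaccine group.

```python
from math import sqrt
from statistics import NormalDist


def proportion_diff_interval(p_x, m, p_y, n, confidence):
    """Large-sample interval for p_X - p_Y using unpooled standard errors."""
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_crit * sqrt(p_x * (1 - p_x) / m + p_y * (1 - p_y) / n)
    diff = p_x - p_y
    return diff - margin, diff + margin


# Vaccine: 30/2000 = 0.015; placebo: 100/2000 = 0.05.
lower, upper = proportion_diff_interval(30 / 2000, 2000, 100 / 2000, 2000, 0.95)
print(f"95% CI for p_X - p_Y: ({lower:.4f}, {upper:.4f})")
```

Both endpoints are negative, so the interval excludes 0 and points to a lower polio rate in the vaccine group.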