Chapter 24: Comparing Means. Comparing Two Means Population model parameter of interest is the...

Chapter 24: Comparing Means

Comparing Two Means

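The population model parameter of interest is the difference between the means, $\mu_1 - \mu_2$.

The statistic of interest is the difference in the two observed means, $\bar{y}_1 - \bar{y}_2$.

For independent random variables, variances add.

If we know the population standard deviations:

$$SD(\bar{y}_1 - \bar{y}_2) = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

If we estimate them using the sample standard deviations:

$$SE(\bar{y}_1 - \bar{y}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$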
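As a minimal sketch in Python (standard library only), the two formulas translate directly into code. The function names `sd_diff` and `se_diff` are hypothetical, chosen for illustration. `statistics.stdev` uses the n − 1 divisor, matching the sample standard deviations $s_1$ and $s_2$.

```python
import math
from statistics import stdev


def sd_diff(sigma1, n1, sigma2, n2):
    """SD of ybar1 - ybar2 when the population SDs are known (variances add)."""
    return math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)


def se_diff(sample1, sample2):
    """Estimated SE of ybar1 - ybar2 from two independent samples."""
    s1, s2 = stdev(sample1), stdev(sample2)  # sample SDs (n - 1 divisor)
    return math.sqrt(s1 ** 2 / len(sample1) + s2 ** 2 / len(sample2))


# Hypothetical numbers, for illustration only
print(sd_diff(3, 9, 4, 16))            # sqrt(9/9 + 16/16) = sqrt(2)
print(se_diff([1, 2, 3], [2, 4, 6]))   # sqrt(1/3 + 4/3) = sqrt(5/3)
```

In the first call each sample mean has SD 1, yet the SD of their difference is $\sqrt{2} \approx 1.41$, not 2: the variances add, the standard deviations do not.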

Comparing Two Means

The confidence interval is called a two-sample t-interval.

The hypothesis test is called a two-sample t-test.

$$(\bar{y}_1 - \bar{y}_2) \pm ME, \qquad ME = t^* \times SE(\bar{y}_1 - \bar{y}_2)$$
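A small sketch of the interval's arithmetic: the point estimate $\bar{y}_1 - \bar{y}_2$ plus or minus the margin of error. The critical value $t^*$ is passed in as an argument because it has to be looked up from a t-table or calculator; the function name and the summary numbers below are hypothetical.

```python
def two_sample_interval(ybar1, ybar2, se, t_star):
    """Return (lower, upper, ME) for (ybar1 - ybar2) +/- t* x SE."""
    me = t_star * se          # margin of error
    diff = ybar1 - ybar2      # point estimate
    return diff - me, diff + me, me


# Hypothetical summary values; t* must come from a t-table or calculator
low, high, me = two_sample_interval(10.0, 7.0, 1.5, 2.0)
print(low, high, me)  # 0.0 6.0 3.0
```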

A Sampling Distribution for the Difference Between Two Means

When the conditions are met, the standardized sample difference between the means of two independent groups,

$$t = \frac{(\bar{y}_1 - \bar{y}_2) - (\mu_1 - \mu_2)}{SE(\bar{y}_1 - \bar{y}_2)},$$

can be modeled by a Student’s t-model with a number of degrees of freedom found with a special formula.

We estimate the standard error with

$$SE(\bar{y}_1 - \bar{y}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
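The standardized difference can be sketched in a few lines of Python. Here the hypothetical parameter `delta0` stands in for the hypothesized value of $\mu_1 - \mu_2$; the data are made up for illustration.

```python
import math
from statistics import mean, stdev


def two_sample_t(sample1, sample2, delta0=0.0):
    """Standardized difference ((ybar1 - ybar2) - delta0) / SE(ybar1 - ybar2)."""
    se = math.sqrt(stdev(sample1) ** 2 / len(sample1)
                   + stdev(sample2) ** 2 / len(sample2))
    return ((mean(sample1) - mean(sample2)) - delta0) / se


t = two_sample_t([1, 2, 3], [2, 4, 6])
print(round(t, 4))  # -1.5492
```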

Assumptions and Conditions

Independence Assumption: Randomization Condition

Surveys: representative random samples. Experiments: randomized.

10% Condition

Normal Population Assumption: Nearly Normal Condition

Check both samples. Draw pictures!

Independent Groups Assumption: Think about how the data were collected.

Two-sample t-interval

When the conditions are met, find the confidence interval for the difference between means of two independent groups.

The critical value $t^*_{df}$ depends on the particular confidence level C that you specify and on the number of degrees of freedom, which we get from the sample sizes and a special formula.

Since the standard error of the difference is

$$SE(\bar{y}_1 - \bar{y}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}},$$

the interval is

$$(\bar{y}_1 - \bar{y}_2) \pm t^*_{df} \times SE(\bar{y}_1 - \bar{y}_2).$$
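The slides leave the degrees-of-freedom formula unstated. Assuming it is the Welch-Satterthwaite approximation, the usual choice for unpooled two-sample t procedures, it reads

$$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{s_2^2}{n_2}\right)^2}$$

and can be sketched as follows (function name and data hypothetical):

```python
from statistics import stdev


def welch_df(sample1, sample2):
    """Welch-Satterthwaite approximate df for the unpooled two-sample t (assumed formula)."""
    n1, n2 = len(sample1), len(sample2)
    v1 = stdev(sample1) ** 2 / n1  # s1^2 / n1
    v2 = stdev(sample2) ** 2 / n2  # s2^2 / n2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))


df = welch_df([1, 2, 3], [2, 4, 6])
print(round(df, 3))  # 2.941
```

The result is usually not a whole number, and it always lies between $\min(n_1, n_2) - 1$ and $n_1 + n_2 - 2$ (here between 2 and 4).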

Comparing Brand Name & Generic Batteries

[Plot: Brand Name (L1) and Generic (L2)]

Find the interval that is likely, with 95% confidence, to contain the true difference $\mu_G - \mu_B$ between the mean lifetime of the generic brand AA batteries and the mean lifetime of the brand-name batteries.

Comparing Brand Name & Generic Batteries

Check the conditions: Independent groups assumption: batteries manufactured by two different companies and sold in separate packages should be independent.

Randomization: the batteries were selected at random from those available for sale. This is not exactly an SRS, but it is a reasonably representative random sample. Because batteries come in packs, the batteries within a pack may not be independent, so we might want to repeat the experiment for several packages of batteries.

Comparing Brand Name & Generic Batteries

Check the conditions: 10% condition: the sampled batteries are certainly fewer than 10% of all AA batteries manufactured by the companies.

Nearly Normal condition: the samples are small, but the histograms look unimodal and symmetric.

[Histograms: Brand Name (L1) and Generic (L2)]

Comparing Brand Name & Generic Batteries

State the sampling distribution model for the statistic: Under these conditions, the sampling distribution of the difference in the sample means can be modeled by a Student’s t-model with about 9 degrees of freedom.

Choose your method: We will use a two-sample t-interval.

Calculator: STAT > TESTS > 2-SampTInt

Comparing Brand Name & Generic Batteries

Interpretation (tell what the confidence interval means): We are 95% confident that the mean useful life of the generic batteries is between 2.1 minutes and 35.1 minutes longer than the mean useful life of the brand-name batteries for this task.

If generic batteries are cheaper, there seems little reason not to use them. If it is more trouble or costs more to buy them, then you should consider whether the additional performance is worth it.
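Because a two-sample t-interval is symmetric about the observed difference $\bar{y}_G - \bar{y}_B$, its endpoints reveal both the point estimate and the margin of error. A small arithmetic sketch using only the 2.1 and 35.1 minute endpoints reported above (the helper name is hypothetical):

```python
def center_and_margin(lower, upper):
    """Point estimate and margin of error of a symmetric interval."""
    return (lower + upper) / 2, (upper - lower) / 2


estimate, me = center_and_margin(2.1, 35.1)
print(f"estimate = {estimate:.1f} min, ME = {me:.1f} min")
# estimate = 18.6 min, ME = 16.5 min
```

The observed difference of about 18.6 minutes exceeds the margin of error, so the whole interval lies above 0.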

Testing the Difference Between Two Means

Two-sample t-test for the difference between the means of two independent groups:

The conditions for the two-sample t-test for the difference between the means of two independent groups are the same as for the two-sample t-interval.

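We test the hypothesis $H_0\colon \mu_1 - \mu_2 = \Delta_0$, where the hypothesized difference $\Delta_0$ is almost always 0, using the statistic

$$t = \frac{(\bar{y}_1 - \bar{y}_2) - \Delta_0}{SE(\bar{y}_1 - \bar{y}_2)}.$$

The standard error is

$$SE(\bar{y}_1 - \bar{y}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}.$$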
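The slides run this test on a calculator. As a from-scratch sketch using only the Python standard library, the statistic, the degrees of freedom, and a two-sided P-value can be computed as shown here. Two assumptions to flag: the degrees of freedom use the Welch-Satterthwaite formula (assumed to be the "special formula"), and, because the standard library has no t-distribution, the P-value integrates the Student’s t density numerically with Simpson’s rule, an illustrative substitute for a statistics package. All function names and the data are hypothetical.

```python
import math
from statistics import mean, stdev


def t_pdf(x, df):
    """Student's t density with df degrees of freedom (df may be fractional)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|) via Simpson's rule on [0, |t|]; steps must be even."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3          # approximates P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * area)


def welch_t_test(sample1, sample2, delta0=0.0):
    """Unpooled two-sample t-test of H0: mu1 - mu2 = delta0 (two-sided)."""
    n1, n2 = len(sample1), len(sample2)
    v1 = stdev(sample1) ** 2 / n1
    v2 = stdev(sample2) ** 2 / n2
    se = math.sqrt(v1 + v2)
    t = ((mean(sample1) - mean(sample2)) - delta0) / se
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df, two_sided_p(t, df)


# Hypothetical data, for illustration only
t, df, p = welch_t_test([1, 2, 3], [2, 4, 6])
print(round(t, 4), round(df, 3))  # -1.5492 2.941
print(round(p, 3))
```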

Camera Price Offers

State the hypotheses: We want to know whether people offer a different amount for a used camera when buying from a friend than when buying from a stranger.

$H_0$: The difference between the mean price offered to friends and the mean price offered to strangers is zero: $\mu_F - \mu_S = 0$.

$H_A$: The difference in mean prices is not zero: $\mu_F - \mu_S \neq 0$.

Check the plots: Friend (L1), Stranger (L2).

Camera Price Offers

Check the conditions: Independent groups assumption: randomizing the experiment gives us independent groups.

Randomization condition: the experiment was randomized. Subjects were assigned to treatment groups at random.

10% condition: this is a randomized experiment, so this condition does not apply.

Check the conditions: Nearly Normal condition: histograms of the two sets of prices are unimodal and symmetric.

[Histograms: Friend (L1) and Stranger (L2)]

Camera Price Offers

State the sampling distribution model of the statistic: Because the conditions are satisfied, it is appropriate to model the sampling distribution of the difference in the means with a Student’s t-model.

Choose your method: We will perform a two-sample t-test.

Camera Price Offers

Calculate: STAT > TESTS > 2-SampTTest

Draw:

Camera Price Offers

Conclusion: The P-value tells us that if there were no difference in the mean prices, a difference at least as large as the one we observed would occur only about 0.6% of the time. That’s too rare for most people to believe, so we reject the null hypothesis and conclude that people are likely to offer a different amount for a used camera when buying from a friend than when buying from a stranger.

We may want to take special care not to pay too much when buying an item such as this from a friend.

Pooled t-test

If we are willing to assume that the variances of the two groups are equal, we can pool the data from the two groups to estimate the common variance and make the degrees of freedom formula much simpler.

We are still estimating the pooled standard deviation from the data, so we use Student’s t-model, and the test is called a pooled t-test.

Pooled Variance t-test for the Difference Between Two Independent Means

The conditions for the pooled t-test for the difference between two independent means are the same as for the two-sample t-test with the additional assumption that the variances of the two groups are the same.

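We test the hypothesis $H_0\colon \mu_1 - \mu_2 = \Delta_0$, where the hypothesized difference $\Delta_0$ is almost always 0, using the statistic

$$t = \frac{(\bar{y}_1 - \bar{y}_2) - \Delta_0}{SE_{\text{pooled}}(\bar{y}_1 - \bar{y}_2)}.$$

The standard error is

$$SE_{\text{pooled}}(\bar{y}_1 - \bar{y}_2) = \sqrt{\frac{s_{\text{pooled}}^2}{n_1} + \frac{s_{\text{pooled}}^2}{n_2}}.$$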

Pooled Variance t-test for the Difference Between Two Independent Means

The pooled variance is:

$$s_{\text{pooled}}^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{(n_1 - 1) + (n_2 - 1)}.$$

When the conditions are met and the null hypothesis is true, this statistic follows a Student’s t-model with $(n_1 - 1) + (n_2 - 1)$ degrees of freedom.

The corresponding interval is

$$(\bar{y}_1 - \bar{y}_2) \pm t^*_{df} \times SE_{\text{pooled}}(\bar{y}_1 - \bar{y}_2),$$

where the critical value $t^*$ depends on the confidence level and is found with $(n_1 - 1) + (n_2 - 1)$ degrees of freedom.
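A minimal Python sketch of the pooled computation: the pooled variance is the degrees-of-freedom-weighted average of the two sample variances, and the df is simply $(n_1 - 1) + (n_2 - 1)$. The function name and data are hypothetical.

```python
import math
from statistics import mean, stdev


def pooled_t(sample1, sample2, delta0=0.0):
    """Pooled two-sample t statistic, its df, and the pooled SD."""
    n1, n2 = len(sample1), len(sample2)
    df = (n1 - 1) + (n2 - 1)
    # Pooled variance: df-weighted average of the two sample variances
    sp2 = ((n1 - 1) * stdev(sample1) ** 2 + (n2 - 1) * stdev(sample2) ** 2) / df
    se = math.sqrt(sp2 / n1 + sp2 / n2)  # SE_pooled(ybar1 - ybar2)
    t = ((mean(sample1) - mean(sample2)) - delta0) / se
    return t, df, math.sqrt(sp2)


t, df, sp = pooled_t([1, 2, 3], [2, 4, 6])
print(round(t, 4), df, round(sp, 4))  # -1.5492 4 1.5811
```

With equal sample sizes, as here, the pooled SE equals the unpooled $\sqrt{s_1^2/n_1 + s_2^2/n_2}$, so t is unchanged; pooling only raises the df to 4, whereas the Welch-Satterthwaite formula gives about 2.94 for these data.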

When to Pool?

The advantage of the pooled method is greatest when the samples are small.

But this is when it’s hardest to check conditions. When the choice between the two-sample t and pooled-t methods makes a difference (when the samples are small), the test for whether the variances are equal hardly works at all.

In a randomized comparative experiment, we know that each treatment group is a random sample from the same population.

So each treatment group begins with the same population variance. In this case, assuming equal variances is the same as assuming that the treatment doesn’t change the variance.

Check the conditions: Boxplots, Boxplots, Boxplots!!!

When to Pool?

Because the advantages of pooling are small, and you are allowed to pool only rarely (when the equal variances assumption is met):

DON’T! It is never wrong NOT to pool!!

CAUTION!!!

Watch out for paired data. If the samples are not independent, you cannot use the two-sample methods. Two-sample methods can only be used if the observations in the two groups are independent. Look at the plots!

Check for outliers and non-normal distributions. Make and examine boxplots.