AP Statistics Chapter 25 Paired Samples and Blocks.

AP Statistics Chapter 25

Paired Samples and Blocks

Objectives:

• Paired Data
• Paired t-Test
• Paired t-Confidence Interval

Paired Data

• Data are paired when the observations are collected in pairs or the observations in one group are naturally related to observations in the other group.

• Paired data arise in a number of ways. Perhaps the most common is to compare subjects with themselves before and after a treatment.
– When pairs arise from an experiment, the pairing is a type of blocking.
– When they arise from an observational study, it is a form of matching.

Paired Data (cont.)

• If you know the data are paired, you can (and must!) take advantage of it.
– To decide if the data are paired, consider how they were collected and what they mean (check the W’s).
– There is no test to determine whether the data are paired.

• Once we know the data are paired, we can examine the pairwise differences.
– Because it is the differences we care about, we treat them as if they were the data and ignore the original two sets of data.

Paired Data (cont.)

• Now that we have only one set of data to consider, we can return to the simple one-sample t-test.

• Mechanically, a paired t-test is just a one-sample t-test for the mean of the pairwise differences.
– The sample size is the number of pairs.
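As a minimal sketch of this reduction, the snippet below differences the pairs first and then applies the ordinary one-sample formula. The function name `paired_t_statistic` and the before/after measurements are hypothetical, made up for illustration and not taken from the chapter.

```python
import math
import statistics

def paired_t_statistic(first, second, delta0=0.0):
    """Return (t, df) for a paired t-test of H0: mean difference = delta0."""
    if len(first) != len(second):
        raise ValueError("paired data need the same number of observations")
    # Reduce the two samples to a single sample of pairwise differences.
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)                     # sample size = number of pairs
    d_bar = statistics.mean(diffs)     # mean of the differences
    s_d = statistics.stdev(diffs)      # sample SD (n - 1 in the denominator)
    se = s_d / math.sqrt(n)            # standard error of the mean difference
    return (d_bar - delta0) / se, n - 1

# Hypothetical data: the same five subjects measured after and before a treatment.
after = [12.0, 15.0, 11.0, 14.0, 13.0]
before = [10.0, 14.0, 11.0, 11.0, 12.0]
t, df = paired_t_statistic(after, before)
print(f"t = {t:.3f} on {df} degrees of freedom")
```

Once the differences are formed, the original two lists play no further role in the calculation.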

Assumptions and Conditions

• Paired Data Assumption: The data must be paired.

• Independence Assumption: The differences must be independent of each other.
– Randomization Condition: Randomness can arise in many ways. What we want to know usually focuses our attention on where the randomness should be.
– 10% Condition: When a sample is obviously small, we may not explicitly check this condition.

• Normal Population Assumption: We need to assume that the population of differences follows a Normal model.
– Nearly Normal Condition: Check this with a histogram or Normal probability plot of the differences.

The Paired t-Test

• When the conditions are met, we are ready to test whether the paired differences differ significantly from zero.

• We test the hypothesis H0: μd = Δ0, where the d’s are the pairwise differences and Δ0 is almost always 0.

The Paired t-Test (cont.)

• We use the statistic

$$t_{n-1} = \frac{\bar{d} - \Delta_0}{SE(\bar{d})}$$

where n is the number of pairs and $\Delta_0$ is the hypothesized mean difference.

• $SE(\bar{d}) = \frac{s_d}{\sqrt{n}}$ is the ordinary standard error for the mean applied to the differences.

• When the conditions are met and the null hypothesis is true, this statistic follows a Student’s t-model on n – 1 degrees of freedom, so we can use that model to obtain a P-value.

Example: Matched Pairs t Hypothesis Test

• The following chart shows the number of men and women employed in randomly selected professions. Do these data suggest that there is a significant difference in gender in the workplace?

Solution:

• Choose method
– Matched pairs t hypothesis test
– The differences are (males – females): 1681, 837, 182, 466, -206, 133, -723, 399.

• Check conditions
– Independence: the two groups are not independent of each other; the data are paired because both counts come from the same occupations.
– Randomization: the occupations were randomly selected.
– Population Size: there are more than 80 occupations, so the 8 sampled are less than 10% of the population.
– Sample Size: n = 8 is a small sample, so we must check the differences for skew and outliers.

Solution:

• Hypotheses:
– H0: μd = 0. The mean gender difference in the workplace is 0.
– Ha: μd ≠ 0. The mean gender difference in the workplace is not 0.

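• Find the t-statistic: from the listed differences, $\bar{d} = 346.125$ and $s_d \approx 713.17$ with n = 8, so $t = \frac{346.125 - 0}{713.17/\sqrt{8}} \approx 1.373$ on 7 degrees of freedom.

• Find the p-value: p-value = 2P(t ≥ 1.373) = .212

• Conclusion: The p-value is too large to reject the null hypothesis at any commonly accepted significance level. There is insufficient evidence to support a difference in gender in the workplace.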
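The calculation in this solution can be checked with a short sketch that uses only the listed differences. The P-value here comes from numerically integrating the Student's t density with Simpson's rule, which is an approach chosen for this illustration; the helper names `t_pdf` and `two_sided_p` are hypothetical, and a statistics package or calculator would normally do this step.

```python
import math
import statistics

# Differences (males - females) as listed in the solution.
diffs = [1681, 837, 182, 466, -206, 133, -723, 399]

n = len(diffs)                  # number of pairs
d_bar = statistics.mean(diffs)  # mean difference
s_d = statistics.stdev(diffs)   # sample SD of the differences
se = s_d / math.sqrt(n)         # standard error of the mean difference
t_stat = (d_bar - 0) / se       # H0: mu_d = 0
df = n - 1

def t_pdf(x, nu):
    """Density of Student's t with nu degrees of freedom."""
    coef = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return coef * (1 + x * x / nu) ** (-(nu + 1) / 2)

def two_sided_p(t, nu, steps=2000):
    """Two-sided P-value via Simpson's rule on [0, |t|]; steps must be even."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, nu) + t_pdf(b, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    central = total * h / 3     # P(0 <= T <= |t|)
    return 2 * (0.5 - central)  # both tails

p_value = two_sided_p(t_stat, df)
print(f"d_bar = {d_bar:.3f}, s_d = {s_d:.2f}, SE = {se:.2f}")
print(f"t = {t_stat:.3f}, df = {df}, P-value = {p_value:.3f}")
```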

Confidence Intervals for Matched Pairs

• When the conditions are met, we are ready to find the confidence interval for the mean of the paired differences.

• The confidence interval is

$$\bar{d} \pm t^*_{n-1} \times SE(\bar{d})$$

where the standard error of the mean difference is

$$SE(\bar{d}) = \frac{s_d}{\sqrt{n}}$$

The critical value t* depends on the particular confidence level, C, that you specify and on the degrees of freedom, n – 1, which is based on the number of pairs, n.

Example: Confidence Intervals for Matched Pairs

• Over the ten-year period 1990-2000, the unemployment rates for Australia and the United Kingdom were reported as follows.

Find a 90% confidence interval for the mean difference in unemployment rates for Australia and the United Kingdom.

Solution:

• Choose method.
– Matched pairs t confidence interval.
– The paired differences (AU – UK) are: .4, .8, .6, .4, 0, -.2, .4, 1.6, 1.6, 1.2, .8.

• Check conditions
– Independence: The two samples are not independent of each other (the rates are matched by year), so matched pairs is the method to use.
– Randomization: Yes, there is no reason to believe that this sequence of years is not representative of the mean of the differences in unemployment rates of these 2 countries.
– Sample Size: n < 15, check for skew and outliers.

• There is some skewness and no outliers, so use the t-distribution.

• Calculate the interval
– df = n – 1 = 10, t* = 1.812, $\bar{d}$ = .691, s = .589
– 90% CI is (.369, 1.013)

• Conclusion
– We are 90% confident that the mean difference in the unemployment rates between AU and UK is between .369 and 1.013.
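The interval can be reproduced from the listed differences with a few lines. This is a sketch, not the only way to do it: the critical value t* = 1.812 is copied from the solution (a t-table lookup) rather than computed.

```python
import math
import statistics

# Paired differences (AU - UK) as listed in the solution.
diffs = [0.4, 0.8, 0.6, 0.4, 0.0, -0.2, 0.4, 1.6, 1.6, 1.2, 0.8]

n = len(diffs)                  # number of pairs
d_bar = statistics.mean(diffs)  # mean difference
s_d = statistics.stdev(diffs)   # sample SD of the differences
se = s_d / math.sqrt(n)         # standard error of the mean difference

# Critical value for 90% confidence on n - 1 = 10 degrees of freedom,
# taken from the solution rather than computed here.
t_star = 1.812

margin = t_star * se
lower, upper = d_bar - margin, d_bar + margin
print(f"d_bar = {d_bar:.3f}, s_d = {s_d:.3f}, SE = {se:.3f}")
print(f"90% CI: ({lower:.3f}, {upper:.3f})")
```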

Blocking

• Consider estimating the mean difference in age between husbands and wives.

• The following display is worthless. It does no good to compare all the wives as a group with all the husbands—we care about the paired differences.

Blocking (cont.)

• In this case, we have paired data—each husband is paired with his respective wife. The display we are interested in is the difference in ages:

Blocking (cont.)

• Pairing removes the extra variation that we saw in the side-by-side boxplots and allows us to concentrate on the variation associated with the difference in age for each pair.

• A paired design is an example of blocking.
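To see how much variation pairing removes, the sketch below compares the standard error from treating the two groups as independent with the standard error of the paired differences. The six couples' ages are hypothetical, invented only for illustration.

```python
import math
import statistics

# Hypothetical ages for six couples (illustration only, not from the chapter).
husbands = [25, 34, 41, 52, 63, 70]
wives = [23, 33, 38, 50, 60, 69]

n = len(husbands)
diffs = [h - w for h, w in zip(husbands, wives)]

# Standard error if the groups are (wrongly) treated as independent samples:
# it carries all of the couple-to-couple variation in age.
se_two_sample = math.sqrt(
    statistics.variance(husbands) / n + statistics.variance(wives) / n
)

# Standard error of the mean paired difference: only the variation
# in the within-couple differences remains.
se_paired = statistics.stdev(diffs) / math.sqrt(n)

print(f"mean difference    = {statistics.mean(diffs):.2f}")
print(f"two-sample SE      = {se_two_sample:.2f}")
print(f"paired SE          = {se_paired:.2f}")
```

The estimated mean difference is the same either way; with these made-up numbers, only its standard error shrinks when the pairing is used.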

What Can Go Wrong?

• Don’t use a two-sample t-test for paired data.
• Don’t use a paired-t method when the samples aren’t paired.
• Don’t forget outliers: the outliers we care about now are in the differences.
• Don’t look for the difference between means of paired groups with side-by-side boxplots.

What have we learned?

• Pairing can be a very effective strategy.
– Because pairing can help control variability between individual subjects, paired methods are usually more powerful than methods that compare independent groups.

• Analyzing data from matched pairs requires different inference procedures.
– Paired t-methods look at pairwise differences. We test hypotheses and generate confidence intervals based on these differences.
– We learned to Think about the design of the study that collected the data before we proceed with inference.
