AP Statistics Chapter 25 Paired Samples and Blocks.
-
Upload
bryan-pitts -
Category
Documents
-
view
219 -
download
0
Transcript of AP Statistics Chapter 25 Paired Samples and Blocks.
AP StatisticsChapter 25
Paired Samples and Blocks
Objectives:
• Paired Data• Paired t-Test• Paired t-Confidence Interval
Paired Data
• Data are paired when the observations are collected in pairs or the observations in one group are naturally related to observations in the other group.
• Paired data arise in a number of ways. Perhaps the most common is to compare subjects with themselves before and after a treatment.– When pairs arise from an experiment, the pairing is a type
of blocking. – When they arise from an observational study, it is a form of
matching.
Paired Data (cont.)
• If you know the data are paired, you can (and must!) take advantage of it. – To decide if the data are paired, consider how they were
collected and what they mean (check the W’s).– There is no test to determine whether the data are paired.
• Once we know the data are paired, we can examine the pairwise differences. – Because it is the differences we care about, we treat them as
if they were the data and ignore the original two sets of data.
Paired Data (cont.)
• Now that we have only one set of data to consider, we can return to the simple one-sample t-test.
• Mechanically, a paired t-test is just a one-sample t-test for the means of the pairwise differences.– The sample size is the number of pairs.
Assumptions and Conditions
• Paired Data Assumption: – Paired data Assumption: The data must be paired.
• Independence Assumption: – Independence Assumption: The differences must be independent of
each other. – Randomization Condition: Randomness can arise in many ways. What
we want to know usually focuses our attention on where the randomness should be.
– 10% Condition: When a sample is obviously small, we may not explicitly check this condition.
• Normal Population Assumption: We need to assume that the population of differences follows a Normal model.– Nearly Normal Condition: Check this with a histogram or Normal
probability plot of the differences.
The Paired t-Test
• When the conditions are met, we are ready to test whether the paired differences differ significantly from zero.
• We test the hypothesis H0: d = 0, where the d’s are the pairwise differences and 0 is almost always 0.
The Paired t-Test (cont.)
• We use the statistic
where n is the number of pairs.
• is the ordinary standard error for the
mean applied to the differences.
• When the conditions are met and the null hypothesis is true, this statistic follows a Student’s t-model on n – 1 degrees of freedom, so we can use that model to obtain a P-value.
0
1n
dt
SE d
dsSE d
n
Example: Matched Pairs t Hypothesis Test
• The following chart shows the number of men and women employed in randomly selected professions. Do these data suggest that there is a significant difference in gender in the workplace?
Solution:
• Choose method– Matched pairs t hypothesis test– The differences are (males – females): 1681, 837, 182, 466, -206,
133, -723, 399.
• Check conditions– Independence: data paired because they count employees in the
same occupations.– Randomization: occupations were randomly selected– Population Size: there are more than 80 occupations.– Sample Size: n=8, a small sample size, must check for skew and
outliers
Solution:
• Hypotheses: H0: μd=0 The mean gender difference in the workplace is 0.
Ha: μd≠0 The mean gender difference in the workplace is not 0.
• Find the t-statistic:
• Find the p-value: p-value = 2P(t ≥ 1.373) = .212• Conclusion: The p-value is too large at any commonly accepted
levels to be able to reject the null hypothesis. There is insufficient evidence to support a difference in gender in the workplace.
Confidence Intervals for Matched Pairs
• When the conditions are met, we are ready to find the confidence interval for the mean of the paired differences.
• The confidence interval is
where the standard error of the mean difference is
The critical value t* depends on the particular confidence level, C, that you specify and on the degrees of freedom, n – 1, which is based on the number of pairs, n.
dsSE d
n
1nd t SE d
Example: Confidence Intervals for Matched Pairs
• Over the ten-year period 1990-2000, the unemployment rates for Australia and the United Kingdom were reported as follows.
Find a 90% confidence interval for the mean difference in unemployment rates for Australia and the united Kingdom.
Solution:
• Choose method.– Matched pairs t confidence interval.– The paired differences are: (AU-UK) .4, .8, .6, .4, 0, -.2, .4, 1.6, 1.6, 1.2, .8.
• Check conditions– Independence: Not, matched pairs is the method to use.– Randomization: Yes, no reason to believe that this sequence of years is not representative of the
mean of the differences in unemployment rates of these 2 countries.– Sample Size: n < 15, check for skew and outliers.
• There is some skewness and no outliers, use t-distribution.
• Calculation the interval– df = n-1 = 10, t* = 1.812, , s = .589– 90% CI is (.369, 1.013)
• Conclusion– We are 90% confident that the mean difference in the unemployment rates between AU and UK is
between .369 and 1.013.
Blocking
• Consider estimating the mean difference in age between husbands and wives.
• The following display is worthless. It does no good to compare all the wives as a group with all the husbands—we care about the paired differences.
Blocking (cont.)
• In this case, we have paired data—each husband is paired with his respective wife. The display we are interested in is the difference in ages:
Blocking (cont.)
• Pairing removes the extra variation that we saw in the side-by-side boxplots and allows us to concentrate on the variation associated with the difference in age for each pair.
• A paired design is an example of blocking.
What Can Go Wrong?
• Don’t use a two-sample t-test for paired data.• Don’t use a paired-t method when the samples
aren’t paired.• Don’t forget outliers—the outliers we care about
now are in the differences.• Don’t look for the difference between means of
paired groups with side-by-side boxplots.
What have we learned?
• Pairing can be a very effective strategy.– Because pairing can help control variability between
individual subjects, paired methods are usually more powerful than methods that compare independent groups.
• Analyzing data from matched pairs requires different inference procedures.– Paired t-methods look at pairwise differences.
• We test hypotheses and generate confidence intervals based on these differences.
– We learned to Think about the design of the study that collected the data before we proceed with inference.