Instructor’s Guide with Solutions - Macmillan Learning · Instructor’s Guide with Solutions for...
Instructor’s Guide with Solutions for
Daniel T. Larose’s
Discovering Statistics
Third Edition
Christina Morian Lincoln University
© 2016 by W. H. Freeman and Company
All rights reserved.
Printed in the United States of America
ISBN-10: 1-4641-8863-7
ISBN-13: 978-1-4641-8863-3
First printing
W. H. Freeman and Company
1 New York Plaza Suite 4500
New York, NY 10004
www.whfreeman.com
Contents
1 To the Instructor
2 Chapter Comments
Chapter 1 The Nature of Statistics
Chapter 2 Describing Data Using Graphs and Tables
Chapter 3 Describing Data Numerically
Chapter 4 Correlation and Regression
Chapter 5 Probability
Chapter 6 Probability Distributions
Chapter 7 Sampling Distributions
Chapter 8 Confidence Intervals
Chapter 9 Hypothesis Testing
Chapter 10 Two-Sample Inference
Chapter 11 Categorical Data Analysis
Chapter 12 Analysis of Variance
Chapter 13 Inference in Regression
Chapter 14 Nonparametric Statistics
Solutions to Exercises
Chapter 1 The Nature of Statistics
Chapter 2 Describing Data Using Graphs and Tables
Chapter 3 Describing Data Numerically
Chapter 4 Correlation and Regression
Chapter 5 Probability
Chapter 6 Probability Distributions
Chapter 7 Sampling Distributions
Chapter 8 Confidence Intervals
Chapter 9 Hypothesis Testing
Chapter 10 Two-Sample Inference
Chapter 11 Categorical Data Analysis
Chapter 12 Analysis of Variance
Chapter 13 Inference in Regression
Chapter 14 Nonparametric Statistics
1 TO THE INSTRUCTOR
Discovering Statistics, Third Edition, is intended for an algebra-based, undergraduate, one-
semester course in general introductory statistics for non-majors. The only prerequisite is
Intermediate Algebra. Discovering Statistics will prepare students to work with data in fields
such as psychology, business, nursing, education, and liberal arts, to name a few.
In 2005, the American Statistical Association endorsed the GAISE guidelines, which included
the following recommendations:
1. Emphasize statistical literacy and develop statistical thinking.
2. Use real data.
3. Stress conceptual understanding rather than mere knowledge of procedures.
4. Foster active learning in the classroom.
5. Use technology for developing conceptual understanding and analyzing data.
6. Use assessments to improve and evaluate student learning.
Discovering Statistics adopts these guidelines verbatim as the course pedagogical objectives,
with the following single adjustment: (3) Stress conceptual understanding in addition to
knowledge of procedures. To these, the text adds two course pedagogical objectives:
7. Use case studies to show how newly acquired analytic tools may be applied to a familiar
problem.
8. Encourage student motivation.
The text integrates data interpretation and discovery-based methods with complete
computational coverage of introductory statistics topics. The text helps students develop their
“statistical sense”—understanding the meaning behind the numbers. The text also includes many
step-by-step solutions within examples. Some examples include screen shots and computer output
from TI-83/84, Excel, and Minitab, with keystroke instructions in the Step-by-Step Technology
Guides at the ends of sections. Discovering Statistics emphasizes how students will be able
to explain statistical results to people who are unfamiliar with statistics. Discovering
Statistics contains many current examples with real data. Students frequently ask
“When will I ever use this?” in class. Seeing examples from many different areas
helps the students see the relevance of statistics to everyday life.
Sample course outline for a one-semester course:
Week Material to be covered
1 Chapter 1
2 Chapter 2
3 Chapter 2, Chapter 3
4 Chapter 3
5 Chapter 3, Exam I
6 Chapter 4
7 Chapter 5
8 Chapter 5, Chapter 6
9 Chapter 6
10 Chapter 6, Exam II
11 Chapter 7
12 Chapter 8
13 Chapter 8, Chapter 9
14 Chapter 9, Exam III
15 Review for final and extended examples
Comprehensive final examination
You may need to adjust this outline depending on the level and the needs of the students.
You might want to consider letting the students use a formula sheet. Many students are
overwhelmed by the number of formulas in statistics.
2 CHAPTER COMMENTS
CHAPTER 1 The Nature of Statistics
Chapter 1 introduces the basic ideas of the field of statistics and the methods for gath-
ering data.
Section 1.1 Data Stories: The People Behind the Numbers
The objective for this section is for the students to be able to realize that behind each
data set lies a story about real people undergoing real-life experiences. This section shares
three data stories: Declining murder rate in New York City, UFO sightings, and California
wildfires.
Section 1.2 An Introduction to Statistics
This section has three main objectives. The first main objective is to describe what
statistics is. Several examples of statistics are presented in this section from a variety of
areas, including an example about batting average leaders in major league baseball. Several
examples of how statistics is both relevant and useful in several different majors are also
provided. The case study titled “Does Friday the 13th Change Human Behavior?” outlines the
process of how researchers conducted a study in support of the hypothesis that Friday the
13th does change human behavior.
The second objective of this section is for the students to be able to state the meaning
of descriptive statistics. The concepts of elements, variables, and observations are intro-
duced. The difference between a qualitative and a quantitative variable and the difference
between quantitative variables that are discrete and quantitative variables that are continuous
are also introduced. The difference in how the different types of variables are analyzed is
discussed in later chapters. The levels of measurement are also discussed.
The third objective of this section is for the students to understand that inferential
statistics refers to learning about a population by studying a sample from that population. The
idea that a population is the collection of all elements in a particular study and that a sample
is a subset of the population from which information is collected is discussed, as well as the
concepts of parameters (characteristics of populations) and statistics (characteristics of
samples). The main idea here is that there are many situations in which studying the entire
population is impractical or impossible, and therefore a sample is taken from the population
and inferences about the population are made based on the sample.
Section 1.3 Gathering Data
There are four main objectives for this section. The first is to explain what a random
sample is and why we need one. A random sample is an inexpensive way to eliminate many
types of bias. How the Gallup Organization obtains a random sample and how to generate a
random sample using technology are discussed.
Since, in certain circumstances, random sampling can have shortcomings, other types
of sampling are needed. Thus the second objective is to identify other sampling methods.
The other types of sampling discussed are stratified sampling, systematic sampling, cluster
sampling, and convenience sampling.
One way to illustrate the different types of sampling is to draw samples of the students
in the classroom using the different types of sampling. To illustrate random sampling, write
each person’s name on a slip of paper and place all of the slips of paper in a box. Then
randomly select a certain number of names from the box. To show stratified sampling, divide
the names into males and females or freshmen, sophomores, juniors, and seniors. Then select
a random sample from each group. For systematic sampling, have the class count off to some
number, such as 1, 2, 3, 1, 2, 3, and so forth. Then all of one of the numbers, such as all of
the 3’s, may be sampled. Cluster sampling can be demonstrated by randomly selecting a
sample of rows or a sample of tables, and then sampling everyone in the selected row or
table.
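The classroom demonstrations above can also be mimicked with software. The following Python
sketch is only an illustration: the roster of 12 students, their class years, and their seating
rows are made-up assumptions, and the fixed seed is there only so that the run is repeatable.

```python
import random

rng = random.Random(1)  # fixed seed so the demonstration is repeatable

# Hypothetical roster: (name, class year, seating row)
roster = [(f"Student{i}", year, row)
          for i, (year, row) in enumerate(
              [("freshman", 1), ("freshman", 1), ("freshman", 2),
               ("sophomore", 2), ("sophomore", 3), ("sophomore", 3),
               ("junior", 1), ("junior", 2), ("junior", 3),
               ("senior", 1), ("senior", 2), ("senior", 3)], start=1)]

# Simple random sample: every student has the same chance of selection.
simple = rng.sample(roster, 4)

# Stratified sample: one student drawn at random from each class year.
years = sorted({s[1] for s in roster})
stratified = [rng.choice([s for s in roster if s[1] == y]) for y in years]

# Systematic sample: count off 1, 2, 3, 1, 2, 3, ... and keep all of the 3's.
systematic = roster[2::3]

# Cluster sample: pick one row at random and take everyone seated in it.
chosen_row = rng.choice([1, 2, 3])
cluster = [s for s in roster if s[2] == chosen_row]
```

Rerunning with a different seed changes the simple, stratified, and cluster samples but not the
systematic sample, which is a useful point for class discussion.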
The third objective is to explain selection bias and good questionnaire design. The
students could either obtain questionnaires that have been used by other people on campus
and discuss how they meet or fail to meet the five factors for good questionnaire design, or
they could divide up into small groups and construct their own questionnaire and have the
rest of the class discuss how it meets the five factors.
The fourth objective is for the students to be able to understand the difference between
an observational study and an experiment. You may not be able to obtain the information
you require by using survey or sampling methods. In this case, you may have to conduct an
experimental study. In an experimental study, researchers investigate how varying the
predictor variable affects the response variable. The subjects are divided into the
treatment group and the control group. There are three main factors that should be consid-
ered when designing an experimental study: control, randomization, and replication. There
are circumstances where it is either impossible, impractical, or unethical for the researcher to
place subjects into treatment and control groups, so the researcher must use an observational
study.
CHAPTER 2 Describing Data Using Graphs and Tables
In Chapter 2, we apply the adage “A picture is worth a thousand words.” The human
mind can assess information presented pictorially better than it can through words and num-
bers alone. Psychologists sometimes call this innate ability pattern recognition. Statistical
graphs and tables take advantage of this ability to quickly summarize data.
Section 2.1 Graphs and Tables for Categorical Data
The objectives of this section are for the students to be able to construct and interpret a
frequency distribution and a relative frequency distribution for qualitative data, construct and
interpret bar graphs and Pareto charts, construct and interpret pie charts, construct and
interpret crosstabulations (two-way tables) to describe the relationship between two variables,
construct a clustered bar graph to describe the relationship between two variables, and work
with tabular data.
It will be helpful for students to see all of the graphs and tables for categorical data us-
ing one small data set to illustrate constructing the graphs and tables by hand. Examples can
be selected from students by asking for volunteer information, such as gender, year in school,
or major. Using the same data set for both the tables and the graphs gives students an oppor-
tunity to comment on the different information that can be obtained from each type of graph
and table.
It might also be helpful to notice that the Pareto chart function in Minitab constructs
the Pareto charts with the rectangles touching whereas the textbook constructs the Pareto
charts with the rectangles separate.
Section 2.2 Graphs and Tables for Quantitative Data
The objectives of this section are for the students to be able to construct and interpret a
frequency distribution and a relative frequency distribution for discrete and continuous data,
use histograms and frequency polygons to summarize quantitative data, construct and
interpret stem-and-leaf displays and dotplots, recognize distribution shape, symmetry, and
skewness, and obtain information from graphs and tables.
It will be helpful for students to see all of the graphs and tables for quantitative data
using one data set. The data set should be small and can be collected from students. Possible
examples are quiz or exam scores or how long it took them to find a parking space. Using the
same data set for both the tables and the graphs gives students an opportunity to comment on
the different information that can be obtained from each type of graph and table.
Section 2.3 Further Graphs and Tables for Quantitative Data
The objectives of this section are for the students to be able to build cumulative fre-
quency distributions and cumulative relative frequency distributions, create frequency ogives
and relative frequency ogives, and construct and interpret time series graphs.
Section 2.4 Graphical Misrepresentations of Data
This section is important for statistical literacy, the ability to understand the use of
graphs, tables, and statistics in the everyday media. The objective of this section is for stu-
dents to be able to understand what can make a graph misleading, confusing, or deceptive.
The eight common methods for making a graph misleading are discussed.
A possible classroom activity would be for the students to collect graphs from newspa-
pers, magazines, or websites and to discuss any possible methods for making a graph mis-
leading that might have been used.
CHAPTER 3 Describing Data Numerically
In this chapter methods for summarizing and analyzing data sets using descriptive sta-
tistics are shown.
Section 3.1 Measures of Center
The objectives of this section are for the students to be able to calculate the mean for a
given data set, find the median and describe why the median is sometimes preferable to the
mean, find the mode of the data set, and describe how skewness and symmetry affect these
measures of center.
The Mean and Median applet included in the disk gives the students a chance to dis-
cover how a value that is much larger than the rest of the data set affects the mean and medi-
an of that data set.
Discuss the advantages and disadvantages of using the mean, the median, and the
mode. The Mean and Median applet allows the students to see for themselves that the mean
is sensitive to extreme values but the median is not. Thus the median income or the median
expenditure per student of a school district may be reported instead of the mean. The mean
is used for symmetrical, unimodal distributions like the normal distribution and the t distribu-
tion. The mode is often called a “typical case”.
New topics added to this section in the exercises are the trimmed mean, the midrange, the
harmonic mean, and the geometric mean. Because the mean is sensitive to extreme values,
the trimmed mean was developed as another measure of center. A specified small percentage
of the largest values and the smallest values are omitted from the data set and the mean of the
remaining values is calculated. The midrange is the midpoint of the interval with the smallest
data value as the left endpoint and the largest data value as the right endpoint. The harmonic
mean is a measure of center most appropriately used when dealing with rates. The geometric
mean is a measure of center used to calculate growth rates.
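A short Python sketch may help when demonstrating these additional measures of center. The
data set, the 10% trimming proportion, and the speed and growth figures are hypothetical, and
trimming the same proportion from each end is one common convention that is assumed here.

```python
import statistics

data = [2, 4, 4, 5, 6, 7, 8, 9, 10, 45]  # hypothetical; 45 is an extreme value

def trimmed_mean(values, proportion):
    """Mean after dropping the given proportion of values from each end."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # number dropped from each end
    return statistics.mean(ordered[k:len(ordered) - k])

mean = statistics.mean(data)            # pulled upward by 45
trimmed = trimmed_mean(data, 0.10)      # drops the values 2 and 45
midrange = (min(data) + max(data)) / 2

# Harmonic mean for rates: 60 mph out and 40 mph back over the same distance.
average_speed = statistics.harmonic_mean([60, 40])

# Geometric mean for growth: growth factors 1.10 and 1.20 in successive years.
average_growth = statistics.geometric_mean([1.10, 1.20])
```

Here the mean is 10 while the trimmed mean is 6.625, which shows how much a single extreme
value can move the ordinary mean.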
Section 3.2 Measures of Variability
The objectives of this section are for the students to be able to understand and calcu-
late the range of a data set, explain in their own words what a deviation is, and calculate the
variance and standard deviation for a population or sample.
The text stresses that the standard deviation is in the same units as the original data. It
represents a “typical distance” an observation is from the mean.
New topics added to this section are the coefficient of variation, the mean absolute de-
viation, and the coefficient of skewness. The coefficient of variation enables analysts to
compare the variability of two data sets that are measured on different scales. For example,
suppose a box contained a random sample of 10 caterpillars that had a mean of $\bar{x}$ inches
and a standard deviation of $s$ inch and another box contained a random sample of 10
anacondas that had a mean of $\bar{x} = 20$ feet = 240 inches and a standard deviation of $s$ inch.
The coefficient of variation for the caterpillars is $CV = s / \bar{x}$ and the coefficient of
variation for the anacondas is $CV = s / \bar{x} = s / 240$. Notice that the standard deviation is
the same for both samples but the coefficient of variation is much larger for the caterpillars
than it is for the anacondas. Therefore $s$ inch is a lot of variability for the caterpillars but
very little variability for the snakes. The mean absolute deviation (MAD) is a measure of spread that looks at
the average of the absolute deviations. The coefficient of skewness quantifies the skewness
of the distribution. Negative values of skewness are associated with left-skewed distributions
while positive values of skewness are associated with right-skewed distributions. Values
close to zero indicate symmetric distributions.
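The comparison of variability on different scales can be sketched in Python as follows. The
two small data sets are hypothetical stand-ins rather than the numbers used in the text, and
the coefficient of variation is computed here as a plain ratio (some presentations multiply it
by 100 to report a percentage).

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the sample mean."""
    return statistics.stdev(values) / statistics.mean(values)

def mean_absolute_deviation(values):
    """Average absolute distance of the observations from their mean."""
    center = statistics.mean(values)
    return sum(abs(v - center) for v in values) / len(values)

# Hypothetical lengths in inches; the second set is the first shifted by 238,
# so both sets have exactly the same standard deviation.
short_animals = [1, 2, 3]
long_animals = [239, 240, 241]

cv_short = coefficient_of_variation(short_animals)   # 1 / 2
cv_long = coefficient_of_variation(long_animals)     # 1 / 240
mad_short = mean_absolute_deviation(short_animals)   # 2 / 3
```

The standard deviations match, yet the coefficient of variation is 120 times larger for the
shorter animals, which is the point of the comparison.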
Section 3.3 Working with Grouped Data
The objectives for this section are for the students to be able to calculate the weighted
mean, estimate the mean for grouped data, and estimate the variance and standard deviation
for grouped data.
Section 3.4 Measures of Position and Outliers
The objectives for this section are for the students to be able to find percentiles for both
small and large data sets, find the percentile rank, find quartiles and the interquartile range,
calculate z-scores and explain why we use them, find a data value given its z-score, and
use z-scores to detect outliers.
One wonderful advantage of using z-scores is that they can provide us with infor-
mation about data values even when we do not fully understand the original data set. Z-
scores are used later for the normal distribution.
Minitab and graphing calculators determine the quartiles using a different method than
that which is used in the text and therefore may produce different values for these quantities.
Section 3.5 Five-Number Summary and Boxplots
The objectives for this section are for the students to be able to calculate the five-number
summary, construct boxplots, and use the IQR method for detecting outliers. The five-number
summary consists of the minimum value of
the data set, the first quartile, the median, the third quartile, and the maximum value of the
data set. A boxplot is a graphical representation of the five-number summary. It can be used
to assess the skewness of the data set. The IQR method of detecting outliers is not sensitive
to extreme values and is therefore a more robust method of detecting outliers than the z-score
method.
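The difference in robustness can be illustrated with a small Python sketch. The data set is
hypothetical, the cutoff of 3 standard deviations for the z-score method and the 1.5 × IQR
fences are assumed conventions, and `method="inclusive"` is just one of several quartile rules
(software and textbooks may give slightly different quartiles).

```python
import statistics

data = [10, 11, 12, 12, 13, 14, 15, 16, 17, 95]  # hypothetical; 95 looks suspect

q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
five_number = (min(data), q1, median, q3, max(data))

# IQR method: flag values more than 1.5 * IQR beyond the quartiles.
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
iqr_outliers = [x for x in data if x < lower or x > upper]

# z-score method: flag values more than 3 standard deviations from the mean.
mean, sd = statistics.mean(data), statistics.stdev(data)
z_outliers = [x for x in data if abs((x - mean) / sd) > 3]
```

In this example the IQR method flags 95, while the z-score method flags nothing because the
extreme value inflates both the mean and the standard deviation used to judge it.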
CHAPTER 4 Correlation and Regression
In Chapter 4, we learn how to examine the relationship between two quantitative
variables. Correlation and regression are introduced in this chapter and are discussed more
fully in Chapter 13.
Section 4.1 Scatterplots and Correlation
The objectives for this section are for the students to be able to construct and interpret
scatterplots for two quantitative variables, to calculate and interpret the value of the correla-
tion coefficient, and to perform the test for linear correlation.
Section 4.2 Introduction to Regression
The objectives for this section are for the students to be able to calculate the value and
understand the meaning of the slope and the y intercept of the regression line and to predict
values of y for given values of x. The prediction error $y - \hat{y}$ is also calculated.
For Sections 4.1 and 4.2, construct a scatterplot and calculate the correlation coefficient
and the regression equation for the same data set. One possible example is to use high school
GPA (x) and college GPA (y). This example can also be used for Chapter 13.
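As a sketch of how such a classroom example might be checked with software, the Python code
below computes the correlation coefficient, the regression line, and one prediction error. The
five GPA pairs are invented for illustration only.

```python
import math

# Hypothetical high school GPA (x) and college GPA (y) for five students.
x = [2.0, 2.5, 3.0, 3.5, 4.0]
y = [1.8, 2.4, 2.6, 3.4, 3.3]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
sxx = sum((a - x_bar) ** 2 for a in x)
syy = sum((b - y_bar) ** 2 for b in y)

r = sxy / math.sqrt(sxx * syy)      # correlation coefficient
b1 = sxy / sxx                      # slope of the regression line
b0 = y_bar - b1 * x_bar             # y intercept

y_hat = b0 + b1 * 3.0               # predicted college GPA when x = 3.0
prediction_error = 2.6 - y_hat      # observed y minus predicted y
```

For these made-up data the slope is 0.8 and the y intercept is 0.3, so the student with a high
school GPA of 3.0 has a predicted college GPA of 2.7.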
Section 4.3 Further Topics in Regression Analysis
The objectives for this section are for the student to calculate the sum of squares error
(SSE) and use the standard error of the estimate as a measure of a typical prediction error,
to describe how total variability, prediction error, and improvement are measured by the total
sum of squares (SST), the sum of squares error (SSE), and the sum of squares regression
(SSR), and to calculate and explain the meaning of the coefficient of determination as a
measure of the usefulness of the regression.
Note that there are two ways of calculating SST. The first way is the usual method of
summing the $(y - \bar{y})^2$ and the second way is by using the fact that $SST = (n - 1)s_y^2$. This
method underscores the fact that SST measures the variability in $y$.
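The relationships among the sums of squares can be verified numerically. The sketch below uses
invented data; it computes SST directly and also as (n − 1) times the sample variance of y, and
it takes the standard error of the estimate to be the square root of SSE / (n − 2).

```python
import statistics

x = [2.0, 2.5, 3.0, 3.5, 4.0]   # hypothetical data
y = [1.8, 2.4, 2.6, 3.4, 3.3]

n = len(x)
x_bar, y_bar = statistics.mean(x), statistics.mean(y)
b1 = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
      / sum((a - x_bar) ** 2 for a in x))
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * a for a in x]

sst = sum((b - y_bar) ** 2 for b in y)                 # total variability
sse = sum((b - f) ** 2 for b, f in zip(y, fitted))     # prediction error
ssr = sum((f - y_bar) ** 2 for f in fitted)            # improvement

sst_from_variance = (n - 1) * statistics.variance(y)   # second route to SST
s = (sse / (n - 2)) ** 0.5                             # standard error of the estimate
r_squared = ssr / sst                                  # coefficient of determination
```

Both routes give the same SST, and SSR plus SSE adds back up to SST, up to rounding.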
CHAPTER 5 Probability
Chapter 5 explains the tools of probability, which enable data analysts to quantify the
level of uncertainty in statistical inference.
Section 5.1 Introducing Probability
The objectives of this section are for the students to be able to understand the meaning
of an experiment, an outcome, an event, and a sample space, to describe the classical method
of assigning probability, and to explain the Law of Large Numbers and the relative frequency
method of assigning probability.
One example that can be used to explain the difference between the classical method
of assigning probability and the relative frequency method for assigning probability is to first
ask the students a question that uses the classical method of assigning probability, such as “If
you roll a 100-sided die, what is the probability of rolling a 72?” Most students will be able
to answer “1/100” even if they have never rolled a 100-sided die before. Now ask the stu-
dents a question that uses the relative frequency method of assigning probability, such as “If
you randomly select a student from this college or university, what is the probability that the
student is taking Elementary Statistics this semester?” The students will not be able to do
this unless they know how many students are enrolled in their school and how many of them
are taking Elementary Statistics this semester. The three types of probability are just three
different sources for the numbers used as probabilities.
To illustrate the Law of Large Numbers, have students do an experiment in class such
as flipping a coin or rolling a die. Have them divide up into small groups of 2 or 3 students
each and have each group either flip the coin or roll the die the same number of times, say 50
or 100 times, recording the number of heads and tails or the number of 1’s, 2’s, 3’s, 4’s, 5’s,
or 6’s. Then have them pool all of the results together and calculate the relative frequency of
heads and tails or of 1’s, 2’s, 3’s, 4’s, 5’s, and 6’s. Then have them compare the class results
to the theoretical probabilities.
The Law of Large Numbers for Proportions applet allows students to explore the Law
of Large Numbers for themselves.
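If the applet is not available, a few lines of Python can stand in for it. This is only a
sketch: the seed and the numbers of flips are arbitrary choices, and the pooled count is far
larger than a class would realistically collect.

```python
import random

rng = random.Random(2016)  # fixed seed; any seed would do

def proportion_of_heads(flips):
    """Relative frequency of heads in the given number of fair-coin flips."""
    heads = sum(rng.random() < 0.5 for _ in range(flips))
    return heads / flips

# One small group (50 flips) versus a very large pooled total (100,000 flips).
small_group = proportion_of_heads(50)
pooled_class = proportion_of_heads(100_000)

print(f"50 flips:      {small_group:.3f}")
print(f"100,000 flips: {pooled_class:.3f}")
```

The small-group proportion can land noticeably away from 0.5, while the pooled proportion
should sit very close to it.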
Real-life examples that use the Law of Large Numbers are the proportion of girls and
boys born and Punnett squares in biology.
Section 5.2 Combining Events
The objectives of this section are for the students to be able to understand how to com-
bine events using complement, union, and intersection, and to apply the Addition Rule to
events in general and to mutually exclusive events in particular.
Start with a simple example of union and intersection, such as Example 5.12 and Ex-
ample 5.13. Two-way tables are also great examples of using unions and intersections in cal-
culating probabilities. Example 5.14 is one example of a two-way table. Another possible
example is the following:
A college band had the following number of students in each section.
          Woodwind      Brass         Percussion    Total
          Instruments   Instruments   Instruments
Male      13            25            8             46
Female    20            11            7             38
Total     33            36            15            84
If a student from this band is selected at random, find the probability that the
student
(a) is female (Answer: 38/84 ≈ 0.4524)
(b) plays a woodwind instrument (Answer: 33/84 ≈ 0.3929)
(c) is female and plays a woodwind instrument (Answer: 20/84 ≈ 0.2381)
(d) is female or plays a woodwind instrument (Answer: 51/84 ≈ 0.6071)
(e) plays a brass or a percussion instrument (Answer: 36/84 + 15/84 = 51/84 ≈ 0.6071)
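The answers to the band example can be reproduced with exact fractions. The Python sketch
below uses the counts from the band table; the dictionary layout and the helper name
`section_total` are illustrative choices only.

```python
from fractions import Fraction

# Counts from the band table: counts[gender][section]
counts = {
    "Male":   {"Woodwind": 13, "Brass": 25, "Percussion": 8},
    "Female": {"Woodwind": 20, "Brass": 11, "Percussion": 7},
}
total = sum(sum(row.values()) for row in counts.values())

def section_total(section):
    """Column total for one instrument section."""
    return sum(row[section] for row in counts.values())

p_female = Fraction(sum(counts["Female"].values()), total)
p_woodwind = Fraction(section_total("Woodwind"), total)
p_female_and_woodwind = Fraction(counts["Female"]["Woodwind"], total)

# Addition Rule: P(A or B) = P(A) + P(B) - P(A and B)
p_female_or_woodwind = p_female + p_woodwind - p_female_and_woodwind

# Brass and percussion are mutually exclusive, so the probabilities just add.
p_brass_or_percussion = (Fraction(section_total("Brass"), total)
                         + Fraction(section_total("Percussion"), total))
```

Parts (d) and (e) give the same value, 51/84, but for different reasons: (d) needs the
intersection subtracted, while (e) involves mutually exclusive events.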
Section 5.3 Conditional Probability
The objectives of this section are for the students to be able to calculate conditional
probabilities, explain independent and dependent events, solve problems using the Multipli-
cation Rule, and recognize the difference between sampling with replacement and sampling
without replacement.
Two-way tables also provide great examples of conditional probability. Example
5.17 is one example of a two-way table. Another example of a two-way table is the follow-
ing:
A college band had the following number of students in each section.
          Woodwind      Brass         Percussion    Total
          Instruments   Instruments   Instruments
Male      13            25            8             46
Female    20            11            7             38
Total     33            36            15            84
If a student from this band is selected at random, find the probability that the
student
(a) is female, given that the student plays a woodwind instrument (Answer: 20/33 ≈
0.6061. Stress that all of the numbers come from the column labeled “Woodwind
Instruments”, so conditional probability reduces the sample space.)
(b) plays a woodwind instrument, given that the student is female (Answer: 20/38 ≈
0.5263. Stress that all of the numbers come from the row labeled “Female”, so
conditional probability reduces the sample space.)
This example can be used to demonstrate that P(A|B) ≠ P(B|A).
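A quick Python check of the two conditional probabilities, using the counts from the band
table, makes the asymmetry concrete. The variable names are illustrative.

```python
from fractions import Fraction

female_woodwind = 20   # cell count from the band table
female_total = 38      # row total for "Female"
woodwind_total = 33    # column total for "Woodwind Instruments"

# Conditioning shrinks the sample space to the given row or column.
p_female_given_woodwind = Fraction(female_woodwind, woodwind_total)
p_woodwind_given_female = Fraction(female_woodwind, female_total)

print(float(p_female_given_woodwind))
print(float(p_woodwind_given_female))
```

The numerator is the same cell in both cases; only the denominator changes, from the
woodwind column total to the female row total.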
Section 5.4 Counting Methods
The objectives of this section are for the students to be able to apply the Multiplication
Rule for Counting to solve certain counting problems, use permutations and combinations to
solve certain counting problems, and compute probabilities using permutations.
One way to explain how to use the counting rule is to ask the students to first determine
how they would perform the task. Once this is determined, the numbers will usually fall into
place. For example, suppose you wanted to line up 6 people. Ask the students how many
ways can they select a first person in line. There are 6 ways. Have them select a first person
in line. Now ask the students how many ways are there to select a second person in line.
There are 5 ways to do this. Have them choose a second person in line. Now ask them how
many ways are there to choose a third person in line. Continue this process until all 6 of the
people are in line. Explain that there are 6·5·4·3·2·1 = 720 ways to line up 6 people.
The usual way of explaining the difference between permutations and combinations is
that order is important in permutations but not important in combinations. It also helps to
explain that in a permutation each object or person selected is doing something different or
specific whereas in a combination every object or every person is doing the same thing. For
example, calculating how many ways can a club with 10 members select a president, a vice
president, and a secretary is a permutation since each person selected is doing something dif-
ferent or specific. The first person selected is the president, the second person selected is the
vice president, and the third person selected is the secretary. However, calculating the num-
ber of ways that a club with 10 members can select a committee of three people to organize
the spring picnic is a combination. All 3 people selected are working on the same thing and it
does not matter who was picked first, who was picked second and who was picked third. It
only matters who was picked for the committee.
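The counts in these examples can be confirmed in Python. This is a small sketch using the
standard library; the variable names are illustrative.

```python
import math
from itertools import permutations

# Lining up 6 people: 6 choices, then 5, then 4, and so on.
lineups = math.factorial(6)
# The same count by brute force, listing every possible ordering.
listed = sum(1 for _ in permutations(range(6)))

# President, vice president, and secretary from 10 members: order matters.
officer_slates = math.perm(10, 3)

# Committee of three from 10 members: order does not matter.
committees = math.comb(10, 3)
```

There are 720 officer slates but only 120 committees, because each committee of three people
could be arranged into 3·2·1 = 6 different officer slates.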
CHAPTER 6 Probability Distributions
In Chapter 6, students encounter random variables and probability distributions. With
these new tools, they can increase the efficiency of their decision making.
Section 6.1 Discrete Random Variables
The objectives of this section are for the students to be able to identify random varia-
bles, explain what a discrete probability distribution is and construct probability distribution
tables and graphs, and to calculate the mean, variance, and standard deviation of a discrete
random variable.
Describing the number of students in class today is a way to convey the meaning of a
discrete variable. For example, there may be either 20 or 21 students in class but nothing in
between (there are not 20.23 students in the class, for instance). A
continuous variable can be explained by using an example of distance. How far can someone
throw an eraser in the room? If someone throws an eraser towards the back wall of the room,
the eraser can land at any distance between 0 feet from the person and the distance from the
person to the back wall of the room. It is possible for someone to throw an eraser 8.645
feet. Another example of a continuous variable is time.
Section 6.2 Binomial Probability Distributions
The objectives of this section are for the students to be able to explain what constitutes
a binomial experiment, compute probabilities using the binomial probability formula, find
probabilities using the binomial tables, and calculate and interpret the mean, variance, and
standard deviation of the binomial random variable.
Section 6.3 Poisson Probability Distributions
The objectives of this section are for the student to be able to explain the requirements
for the Poisson probability distribution, compute probabilities for a Poisson random variable,
calculate the mean, variance, and standard deviation of a Poisson random variable, and use
the Poisson distribution to approximate the binomial distribution. The Poisson probability
distribution is used when we wish to find the probability of observing a certain number of
occurrences of a particular event within a fixed interval of space or time.
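The Poisson approximation to the binomial can be demonstrated numerically. In the Python
sketch below, the values n = 200 and p = 0.01 are hypothetical and were chosen only because n
is large and p is small; the function names are illustrative.

```python
import math

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable with n trials."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x, mu):
    """P(X = x) for a Poisson random variable with mean mu."""
    return math.exp(-mu) * mu**x / math.factorial(x)

# Hypothetical setting: n = 200 trials, success probability p = 0.01.
n, p = 200, 0.01
mu = n * p                      # Poisson mean used for the approximation

exact = binomial_pmf(3, n, p)   # binomial probability of exactly 3 successes
approx = poisson_pmf(3, mu)     # Poisson approximation to the same probability
```

The two probabilities agree to about two decimal places, which students can confirm by
printing `exact` and `approx` side by side.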
Section 6.4 Continuous Random Variables and the Normal Probability Distribution
The objectives of this section are for the students to be able to identify a continuous
probability distribution and state the requirements, calculate probabilities for the uniform
probability distribution, explain the properties of the normal probability distribution, find
areas under the standard normal curve given a Z-value, find the standard normal Z-value
given an area, and use normal probability plots to assess normality.
The normal probability distribution is sometimes referred to as the bell-shaped curve. It is
also considered to be the most important probability distribution in the world.
Students sometimes have trouble understanding that a negative Z-value does not
correspond to a negative area. Explain that a negative Z-value just means that you are draw-
ing the line left of 0, so at least part of the area is from the left side of the distribution. For
example, if they were to cut a piece off of the left side of their desk, it would still be a posi-
tive amount of desk. Or if they had a cake shaped like the normal distribution or a calzone, if
they were to cut the cake or the calzone on the left side they would still have a positive
amount of cake or calzone.
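The same point can be made with software. The Python sketch below uses the standard library's
`NormalDist`; the Z-value of −1.5 and the area of 0.10 are arbitrary illustrative choices.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

z = -1.5
area_left = standard_normal.cdf(z)        # area to the left of z = -1.5
area_right = 1 - area_left                # area to the right of z = -1.5

# Going the other way: the Z-value with area 0.10 to its left is negative,
# even though the area itself is positive.
z_for_area = standard_normal.inv_cdf(0.10)
```

The area to the left of −1.5 is about 0.0668, a positive number, while the Z-value that cuts
off an area of 0.10 on the left is about −1.28.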
Section 6.5 Applications of the Normal Distribution
The objectives of this section are for the students to be able to compute probabilities
for a given value of any normal random variable, and to find the appropriate value of any
random variable, given an area or a probability.
Section 6.6 Normal Approximation to the Binomial Probability Distribution
The objective of this section is for the student to calculate binomial probabilities using
the normal approximation to the binomial distribution.
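A numerical comparison can show students how close the approximation is. The Python sketch
below assumes a hypothetical binomial experiment with n = 100 and p = 0.4 and applies a
continuity correction, which is an assumption about how the approximation is carried out.

```python
import math
from statistics import NormalDist

n, p = 100, 0.4                      # hypothetical binomial experiment
mu = n * p
sigma = math.sqrt(n * p * (1 - p))

# Exact P(X <= 35) from the binomial probability formula.
exact = sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(36))

# Normal approximation with a continuity correction (35 becomes 35.5).
approx = NormalDist(mu, sigma).cdf(35.5)
```

Printing `exact` and `approx` lets students judge the quality of the approximation for
themselves.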
CHAPTER 7 Sampling Distributions
In Chapter 7, students are introduced to point estimation, sampling distributions, and
one of the most important results in statistical inference, the Central Limit Theorem.
Section 7.1 Central Limit Theorem for Means
The objectives of this section are for the students to be able to describe the sampling
distribution of $\bar{x}$ for skewed and symmetric populations as the sample size increases,
apply the Central Limit Theorem for Means to solve probability questions about the sample
mean, and find percentiles for the sample mean.
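A simulation can make the Central Limit Theorem for Means tangible. The Python sketch below
is an illustration under stated assumptions: the population is taken to be exponential with
mean 1 (a right-skewed choice), and the sample sizes, number of repetitions, and seed are
arbitrary.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of n draws from a right-skewed (exponential) population with mean 1."""
    return statistics.mean(rng.expovariate(1.0) for _ in range(n))

# 2,000 sample means for each sample size.
means_n2 = [sample_mean(2) for _ in range(2000)]
means_n40 = [sample_mean(40) for _ in range(2000)]

# The center stays near the population mean; the spread shrinks like 1/sqrt(n).
spread_n2 = statistics.stdev(means_n2)
spread_n40 = statistics.stdev(means_n40)
```

Histograms of `means_n2` and `means_n40` (drawn with any plotting tool) should show the
skewness fading as the sample size grows.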
Section 7.2 Central Limit Theorem for Proportions
The objectives of this section are for the students to be able to explain the sampling
distribution of the sample proportion $\hat{p}$, describe the sampling distribution of the sample
proportion $\hat{p}$ for extreme and moderate values of $\hat{p}$, and apply the Central Limit
Theorem for Proportions to solve probability questions about the sample proportion.
CHAPTER 8 Confidence Intervals
In Chapter 7, we learned that point estimation cannot determine how close a point es-
timate is to its target parameter. There has to be a better way—and there is: confidence inter-
vals. By studying the patterns implicit in the sampling distribution of a statistic (such as the
sample mean or sample proportion), we can infer with a certain degree of confidence that the
associated population parameter lies within a certain interval.
Section 8.1 Z interval for the Mean
The objectives of this section are for the student to be able to calculate a point estimate
of the population mean, to calculate and interpret a Z interval for the population mean when
the population is normal and when the sample size is large, to reduce the margin of error, and
to calculate the sample size needed to estimate the population mean. The Confidence Interval
applet is used to demonstrate the concept that a confidence level of 90% means that if we
take sample after sample for a very long time, then in the long run, the proportion of intervals
that will contain the parameter will equal 90%. This applet is also used to demonstrate that
99% confidence intervals are wider than 90% confidence intervals. The Normal Density
Curve applet is used to find Zα/2 critical values for several different confidence levels.
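If the applets are not available, the same two ideas can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the applet's method: the normal population with mean 100 and standard deviation 15, the sample size 25, and the number of repetitions are all hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

random.seed(2)  # fixed seed so the demonstration is repeatable

def z_critical(conf_level):
    """Critical value Z(alpha/2) for a confidence level such as 0.90."""
    alpha = 1 - conf_level
    return NormalDist().inv_cdf(1 - alpha / 2)

def coverage(conf_level, mu=100.0, sigma=15.0, n=25, reps=4000):
    """Proportion of Z intervals (sigma known) that contain the true mean mu."""
    margin = z_critical(conf_level) * sigma / sqrt(n)
    hits = 0
    for _ in range(reps):
        xbar = mean([random.gauss(mu, sigma) for _ in range(n)])
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / reps

for level in (0.90, 0.95, 0.99):
    z = z_critical(level)
    print(f"{level:.0%}: Z = {z:.3f}, margin of error = {z * 15 / sqrt(25):.2f}, "
          f"coverage = {coverage(level):.3f}")
```

The long-run coverage should sit near the stated confidence level, and the margin of error grows as the confidence level rises, which is why the 99% intervals are wider than the 90% intervals.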
Section 8.2 t Interval for the Population Mean
The objectives for this section are for the students to be able to describe the characteris-
tics of the t distribution and calculate and interpret a t interval for the population mean.
Section 8.3 Z Interval for the Population Proportion
The objectives of this section are for the students to be able to calculate the point estimate
p̂ of the population proportion p, construct and interpret a Z interval for the population
proportion p, compute and interpret the margin of error for the Z interval for p, and determine
the sample size needed to estimate the population proportion.
Polls reported in the news are examples of confidence intervals for proportions. Na-
tional pollsters almost always use 95% as their confidence level and usually try to select the
sample size necessary to create a margin of error of about 3%.
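The polling example can be tied to the sample size objective with a short calculation. The Python sketch below uses the standard sample size formula for a proportion, n = (Z/E)² · p(1 − p), with p = 0.5 as the conservative guess. The function name and its defaults are hypothetical, introduced only for this illustration.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(margin, conf_level=0.95, p_guess=0.5):
    """Smallest n for which Z * sqrt(p(1 - p)/n) <= margin.
    p_guess = 0.5 is the conservative choice when no prior estimate of p exists."""
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    return ceil((z / margin) ** 2 * p_guess * (1 - p_guess))

# A 3% margin of error at 95% confidence, as in the polling example.
print(sample_size_for_proportion(0.03))  # prints 1068
```

This helps explain why national polls so often report samples of roughly a thousand respondents.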
Section 8.4 Confidence Intervals for the Population Variance and Standard Deviation
The objectives of this section are for the students to be able to describe the properties
of the χ² (chi-square) distribution, find critical values for the χ² (chi-square) distribution,
and construct confidence intervals for the population variance and standard deviation. The χ²
distribution is used in Section 9.6 for hypothesis tests about the population standard deviation
and in Chapter 11 for categorical data analysis.
CHAPTER 9 Hypothesis Testing
Hypothesis testing is the most widely used method for statistical inference in the
world. Hypothesis testing is a process for rendering a decision about the unknown value of a
population parameter.
Discuss how hypothesis testing is similar to a court case. Just as a person is innocent
until proven guilty, the null hypothesis is assumed true unless the sample evidence indicates
that the alternative hypothesis is true instead. Sometimes an innocent person is found guilty,
which is analogous to a Type I error. Sometimes the court fails to convict a guilty person,
which is analogous to making a Type II error.
Section 9.1 Introduction to Hypothesis Testing
The objectives of this section are for the students to be able to construct the null hy-
pothesis and the alternative hypothesis from the statement of the problem and state the two
types of errors made in hypothesis tests: the Type I error, made with probability α, and the
Type II error, made with probability β.
Students have trouble constructing the null hypothesis and the alternative hypothesis
from the statement of the problem. The text gives 3 steps to follow in constructing the hy-
pothesis. Stress looking for key words or phrases and determining what the problem is ask-
ing to be proved.
Section 9.2 Z Test for the Population Mean μ: Critical-Value Method
The objectives of this section are for the student to be able to explain the essential idea
about hypothesis testing for the population mean and to perform the Z test for the mean using
the critical-value method.
Section 9.3 Z Test for the Population Mean μ: p-Value Method
The objectives of this section are for the students to be able to perform the Z test for the
mean using the p-value method, to assess the strength of evidence against the null hypothesis,
describe the relationship between the p-value method and the critical-value method, and to
use the Z confidence interval for the mean to perform the two-tailed Z test for the mean.
Section 9.4 t Test for the Population Mean
The objectives of this section are for the students to be able to perform the t test for the
mean using the critical-value method, carry out the t test for the mean using the p-value
method, and use confidence intervals to perform two-tailed hypothesis tests. Instructors
should stress that, if the population standard deviation is unknown, it is wrong to use the
Z test.
Section 9.5 Z Test for the Population Proportion p
The objectives of this section are for the students to be able to perform the Z test for p
using the critical-value method, carry out the Z test for p using the p-value method, and use
confidence intervals for p to perform two-tailed hypothesis tests about p. The instructor
might want to mention that the p-value and p represent different quantities.
Section 9.6 Chi-Square Test for the Population Standard Deviation
The objectives of this section are for the students to be able to perform the χ² test for σ
using the critical-value method, carry out the χ² test for σ using the p-value method, and to
use confidence intervals for σ to perform two-tailed hypothesis tests about σ.
Section 9.7 Probability of Type II Error and the Power of a Hypothesis Test
The objectives of this section are for the students to be able to calculate the probability of
a Type II error for a Z test for μ, to compute the power of a Z test for μ, and to construct a
power curve.
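The points of a power curve can be tabulated with a short Python sketch. It assumes a right-tailed Z test (Ha: μ > μ0) with σ known. The values μ0 = 100, σ = 15, n = 36, α = 0.05, and the alternative means are hypothetical, and the function name is introduced only for this illustration.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error_right_tailed(mu0, mu_a, sigma, n, alpha=0.05):
    """beta = P(do not reject H0: mu = mu0 when the true mean is mu_a)
    for the right-tailed Z test with sigma known."""
    std_error = sigma / sqrt(n)
    # H0 is rejected when the sample mean exceeds this critical value.
    xbar_crit = mu0 + NormalDist().inv_cdf(1 - alpha) * std_error
    return NormalDist(mu_a, std_error).cdf(xbar_crit)

mu0, sigma, n = 100, 15, 36  # hypothetical values
for mu_a in (101, 103, 105, 107, 109):
    beta = type_ii_error_right_tailed(mu0, mu_a, sigma, n)
    print(f"mu_a = {mu_a}: beta = {beta:.3f}, power = {1 - beta:.3f}")
```

Plotting power against mu_a gives the power curve, and power rises as the true mean moves farther from μ0.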
CHAPTER 10 Two-Sample Inference
In Chapter 10 we examine differences in the characteristics of two populations.
Section 10.1 Inference for Mean Difference—Dependent Samples
The objectives of this section are for the students to be able to distinguish between in-
dependent samples and dependent samples, to perform hypothesis tests for the population
mean difference for dependent samples, to construct and interpret confidence intervals for the
population mean difference for dependent samples, and to use a t interval for μ_d to perform t
tests about μ_d.
Dependent samples are frequently used for studies in which a comparison is made be-
tween a group before some treatment happens and after some treatment happens. For exam-
ple, you may want to compare running times of students on the track team at the beginning of
the season and at the end of the season. You could also compare swimming times of students
on the swimming team at the beginning and end of the season. If teachers give pretests and
post tests to measure student learning in a course, these could also be used as an example of
dependent samples.
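The running-time example can be turned into a small worked computation. In the Python sketch below the times are invented for illustration, and only the test statistic is computed. The p-value or critical value would still come from a t table or software.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical 400 m times (seconds) for the same six runners.
start_of_season = [62.1, 65.4, 59.8, 70.2, 66.0, 63.5]
end_of_season = [60.3, 64.9, 58.1, 68.0, 66.4, 61.7]

# Dependent samples: analyze the within-runner differences,
# not the two columns as separate samples.
differences = [a - b for a, b in zip(start_of_season, end_of_season)]
d_bar = mean(differences)         # sample mean difference
s_d = stdev(differences)          # sample standard deviation of the differences
n = len(differences)
t_stat = d_bar / (s_d / sqrt(n))  # test statistic for H0: mu_d = 0, df = n - 1

print(f"d-bar = {d_bar:.3f}, s_d = {s_d:.3f}, t = {t_stat:.3f}, df = {n - 1}")
```

Pairing each runner with himself or herself removes runner-to-runner variability, which is the point of using dependent samples here.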
Section 10.2 Inference for Two Independent Means
The objectives of this section are for the students to be able to perform and interpret t
tests about μ₁ − μ₂ using Welch’s method, compute and interpret t intervals for μ₁ − μ₂ using
Welch’s method, to use confidence intervals for μ₁ − μ₂ to perform two-tailed t tests about
μ₁ − μ₂, to perform and interpret t tests and t intervals about μ₁ − μ₂ using the pooled
variance method, and to apply Z tests and Z intervals for μ₁ − μ₂ when σ₁ and σ₂ are known.
Examples of independent samples could be to compare the amount of time studying per
week between men and women or to compare the amount of time studying per week between
two class years (for example, freshmen and seniors) using data gathered from the students in class.
Section 10.3 Inference for Two Independent Proportions
The objectives of this section are for the students to be able to perform and interpret Z
tests for p₁ − p₂, compute and interpret confidence intervals for p₁ − p₂, and use Z intervals
for p₁ − p₂ to perform two-tailed Z tests.
Examples could be to compare the difference between the proportion of men and wom-
en in class who are education majors or to compare the proportion of men and women in
class who are engineering majors.
Section 10.4 Inference for Two Independent Standard Deviations
The objectives of this section are for the students to be able to describe the F distribu-
tion and the F test for two population standard deviations, to perform hypothesis tests for two
population standard deviations using the critical-value method, and to perform hypothesis
tests for two population standard deviations using the p-value method.
CHAPTER 11 Categorical Data Analysis
Chapter 11 introduces the multinomial random variable, an extension of the binomial
random variable. Students learn methods for performing hypothesis tests for this type of data
using the χ² distribution.
One possible example is to collect data from the class to determine if there is a differ-
ence in majors between men and women.
Section 11.1 Goodness of Fit Test
The objectives of this section are for the students to be able to explain what a multinomial
random variable is and how to calculate expected frequencies, describe how a χ² goodness
of fit test works, and perform and interpret the results from the χ² goodness of fit test
using the critical value method, the exact p-value method, and the estimated p-value method.
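The expected frequencies and the test statistic can be illustrated with a brief Python sketch. The observed counts and the hypothesized proportions below are invented for illustration, and the function name is hypothetical.

```python
def chi_square_gof(observed, hypothesized_props):
    """Return (expected frequencies, chi-square statistic, degrees of freedom)."""
    n = sum(observed)
    expected = [n * p for p in hypothesized_props]  # E_i = n * p_i
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return expected, statistic, len(observed) - 1   # df = k - 1

# Hypothetical counts of 100 students across four majors,
# tested against equal proportions of 0.25 each.
observed = [18, 22, 20, 40]
expected, chi2, df = chi_square_gof(observed, [0.25, 0.25, 0.25, 0.25])
print(f"expected = {expected}, chi-square = {chi2:.2f}, df = {df}")
```

The statistic is then compared with a χ² critical value for k − 1 degrees of freedom, or converted to a p-value with a table or software.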
Section 11.2 Tests for Independence and for Homogeneity of Proportions
The objectives of this section are for the students to be able to explain what a χ² test
for the independence of two variables is, perform and interpret a χ² test for independence of
two variables using the critical value method and the p-value method, and perform and
interpret the results of a χ² test for the homogeneity of proportions.
CHAPTER 12 Analysis of Variance
Chapters 12 and 13 introduce you to perhaps the most powerful and widespread statistical
procedures in the world. This chapter introduces analysis of variance, a way to compare the
population means of several different groups and determine whether significant differences
exist between these means.
One possible example is to collect data from the class to see whether different frater-
nities and sororities have different GPAs using analysis of variance. Other similar examples
could be to see whether men and women athletes have different GPAs or whether athletes on
different sports teams have different GPAs.
Section 12.1 One-Way Analysis of Variance (ANOVA)
The objectives of this section are for the students to be able to explain how analysis of
variance works and how to perform one-way analysis of variance.
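To show how analysis of variance works, it can help to compute the F statistic by hand on a tiny data set. The Python sketch below does this with the usual between-group and within-group sums of squares. The GPA values are invented, and the variable names are illustrative rather than the text's notation.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df1, df2) for a one-way ANOVA on a list of samples."""
    k = len(groups)                        # number of groups
    n_total = sum(len(g) for g in groups)  # total sample size
    grand_mean = mean([x for g in groups for x in g])
    # Variability between the group means (treatment sum of squares).
    ss_treatment = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variability within the groups (error sum of squares).
    ss_error = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_treatment = ss_treatment / (k - 1)
    ms_error = ss_error / (n_total - k)
    return ms_treatment / ms_error, k - 1, n_total - k

# Hypothetical GPAs for three campus organizations.
gpas = [
    [3.1, 3.4, 2.9, 3.6],
    [2.8, 3.0, 2.7, 3.1],
    [3.5, 3.7, 3.3, 3.9],
]
f_stat, df1, df2 = one_way_anova_f(gpas)
print(f"F = {f_stat:.2f} with df1 = {df1} and df2 = {df2}")
```

A large F means the group means differ by more than the within-group variability would suggest.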
Section 12.2 Multiple Comparisons
The objectives of this section are for the students to be able to perform multiple com-
parison tests using the Bonferroni method, to use Tukey’s test to perform multiple compari-
sons, and to use confidence intervals to perform multiple comparisons for Tukey’s test.
Section 12.3 Randomized Block Design
The objective of this section is to explain the power of the randomized block design
and to perform a randomized block design ANOVA. You could demonstrate randomized
block design by using test scores and blocking by year in school or whether or not the stu-
dents have had calculus.
Section 12.4 Two-Way ANOVA
The objectives of this section are to construct and interpret an interaction graph and to
perform a two-way ANOVA.
CHAPTER 13 Regression Analysis
In this chapter, we learn about regression analysis. Regression analysis develops an
equation that can describe the relationship between two quantitative variables, often for the
purposes of prediction. We were introduced to some of the methods for investigating the re-
lationship between two quantitative variables in Chapter 4. It will be helpful to review these
now as a preparation for this chapter.
The example of high school GPA (x) and college GPA (y) used in Chapter 4 can
also be used for this chapter. All the sums of squares can be calculated for this example as
well as the confidence interval for the slope of the regression line. The t test for the slope of
the regression line may also be performed on this example.
Section 13.1 Inference About the Slope of the Regression Line
The objectives of this section are for the students to be able to explain the regression
model and the regression model assumptions, to perform the hypothesis test for the slope β₁
of the population regression equation, to construct confidence intervals for the slope β₁, and
to use confidence intervals to perform the hypothesis test for the slope β₁.
Section 13.2 Confidence Intervals and Prediction Intervals
The objectives of this section are for the students to be able to construct confidence
intervals for the mean value of y for a given value of x and to construct prediction intervals for
a randomly chosen value of y for a given value of x.
Section 13.3 Multiple Regression
The objectives of this section are for the students to be able to find the multiple regres-
sion equation, interpret the multiple regression coefficients, and use the multiple regression
equation to make predictions, to calculate and interpret the adjusted coefficient of
determination, to perform the F test for the overall significance of the multiple regression, to
conduct t tests for the significance of individual predictor variables, to explain the use and effect of
dummy variables in multiple regression, and to apply the strategy for building a multiple re-
gression model.
CHAPTER 14 Nonparametric Statistics
Chapter 14 introduces nonparametric (distribution-free) statistics.
Section 14.1 Introduction to Nonparametric Statistics
The objectives of this section are for the student to be able to explain what a nonparamet-
ric hypothesis test is, and why we use it and to describe what is meant by the efficiency of a
nonparametric test.
Section 14.2 Sign Test
The objectives of this section are for the students to be able to perform the sign test for a
single population median, to carry out the sign test for matched-pair data from two dependent
samples, and to perform the sign test for binomial data.
Section 14.3 Wilcoxon Signed Ranks Test for Matched-Pair Data
The objectives for this section are for the students to be able to assess whether or not the
data is symmetric, to carry out the Wilcoxon signed ranks test for matched-pair data from two
dependent samples, and to perform the Wilcoxon signed ranks test for a single population
median. A possible example is that you could record a local station’s predicted high tempera-
ture and the actual high temperature for 2 weeks and test whether there is a difference.
Section 14.4 Wilcoxon Rank Sum Test for Two Independent Samples
The objective of this section is to perform the Wilcoxon rank sum test for the difference
in population medians, using two independent samples.
Section 14.5 Kruskal-Wallis Test
The objective of this section is for the student to be able to perform the Kruskal-Wallis
test for equal medians in three or more populations.
Section 14.6 Rank Correlation Test
The objective of this section is for the students to be able to perform the rank correlation
test for paired data. A possible example is that you could test whether there is a correlation
between the students’ individual grades and the number of absences a student has.
Section 14.7 Runs Test for Randomness
The objective of this section is for the students to be able to perform the runs test for ran-
domness. A possible example is to flip a coin 20 times and record the result of each flip.
Test for randomness. You could also record the gender of the next 20 or 30 people that enter
the school’s library, bookstore, or cafeteria and test for randomness.
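The quantities the runs test starts from, namely the count of each outcome and the number of runs, can be tallied with a few lines of Python. The 20-flip sequence below is made up for illustration. The sketch only counts runs and does not carry out the test itself.

```python
from itertools import groupby

def count_runs(sequence):
    """Number of runs, where a run is a maximal block of identical adjacent outcomes."""
    return sum(1 for _ in groupby(sequence))

flips = "HHTHTTTHHHTHTTHHTHHT"  # hypothetical record of 20 coin flips
n_heads = flips.count("H")
n_tails = flips.count("T")
runs = count_runs(flips)
print(f"heads = {n_heads}, tails = {n_tails}, runs = {runs}")
```

Very few runs or very many runs, relative to the two counts, is evidence against randomness.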
Chapter 1: The Nature of Statistics
Section 1.1
1. The steepest negative slope in the graph is between the years 1994 and 1995.
2. The murder rate does not always go down year by year. The murder rate increased
between 1992 and 1993, between 2002 and 2003, between 2005 and 2006, between 2007 and
2008, and between 2009 and 2010.
3.
(a) About 36,000,000. Actual answer: 36,132,147.
(b) About 7600. Actual answer: 7594.
4.
(a) About 23,000,000. Actual answer: 22,859,968.
(b) About 900. Actual answer: 873.
5.
About 5400. Actual answer: 5428.
6.
About 3500. Actual answer: 3493.
7. The Eiler fire is the largest. The Junction fire is the smallest.
8. The Eiler fire is the most contained. The Happy Camp Complex fire is the least
contained.
Section 1.2
1. Statistics is the art and science of collecting, analyzing, presenting, and interpreting
data.
2. We call the entities from which the data are collected elements.
3. A qualitative variable is a variable that does not assume a numerical value, but is
usually classified into categories. A quantitative variable is a variable that takes on numerical
values.
4. Another term for a qualitative variable is a categorical variable.
5. True.
6. A population is the collection of all elements (persons, items, or data) of interest in a
particular study. A sample is a subset of the population from which the information is
collected.
7. A statistic is a characteristic of a sample.
8. A parameter is a characteristic of a population. The value of a parameter is constant
but usually unknown. A statistic is a characteristic of a sample. The value of a statistic may
vary from sample to sample but is usually known.
9. A census is the collection of data from every element in the population.
10. False. Statistical inference consists of methods for drawing conclusions about
population characteristics based on the information contained in a subset (sample) of that
population.
11. The elements are the teams: Dragonborn, Sprites, Enchanters, Trolls
12. The variables are Captain’s gender, Wins, Rank, and Winning percentage.
13. (a) Captain’s gender can take the values male or female.
(b) The observation for the Sprites is captain’s gender is female, 9 wins, rank is 2, and the
winning percentage is 0.600.
14. Quantitative variables: Wins, Winning percentage
Qualitative variables: Captain’s gender, Rank
15. Since the number of wins is counted, the variable Wins is discrete.
Since the winning percentage can be any number between 0 and 1, inclusive, the variable
Winning percentage is continuous.
16. The variable Captain’s gender is qualitative and it takes the values male and female.
There is no natural ordering for these values. Therefore the variable Captain’s gender is
nominal.
The values that the variable Wins takes are 0, 1, 2… These are numerical. There is a natural
0 since a team can have 0 wins. Division can be performed on these values. For example, a
team with 10 wins has 10/5 = 2 times as many wins as a team with 5 wins. Therefore, the
variable Wins is ratio.
The values that the variable Rank takes are 1, 2, 3… These are numerical, so there is a natu-
ral ordering to them. However, there is no natural 0 and arithmetic does not make sense.
Therefore, the variable Rank is ordinal.
The variable Winning percentage takes any number between 0 and 1, inclusive. There is a
natural 0 since a team with 0 wins has a winning percentage of 0. Division can be performed
on the values of winning percentage. For example, a team with a winning percentage of
0.800 has a winning percentage that is 0.800/0.200 = 4 times that of a team with
a winning percentage of 0.200. Therefore, the variable Winning percentage is ratio.
17. The elements are the players: Miguel Cabrera, Michael Cuddyer, Joe Mauer, Michael
Trout, Chris Johnson
18. The variables are Team, Batting average, Hits, Rank, and Year of birth.
19. (a) The variable Team can take the values Detroit Tigers, Colorado Rockies, Minne-
sota Twins, Los Angeles Angels, and Atlanta Braves.
(b) The observation for Miguel Cabrera is his team is the Detroit Tigers, his batting aver-
age is 0.348, the number of hits is 193, his rank is 1, and his year of birth is 1983.
20. Quantitative: Batting average, Hits, Year of birth
Qualitative: Team, Rank
21. The number of hits can be counted. Therefore, the variable Hits is discrete.
A player’s year of birth can either be 1979 or 1980 or 1981, etc., and nothing in between.
Therefore, the variable Year of birth is discrete.
A player’s batting average can take any value between 0 and 1, inclusive. Therefore, the var-
iable Batting average is continuous.
22. The variable Team is qualitative. There is no natural ordering of the teams.
Therefore, the variable Team is nominal.
The variable Batting average takes values between 0 and 1, inclusive. There is a natural 0
and division is possible. For example, a batting average of 0.300 is 0.300/0.100 = 3 times the
batting average of a player with a 0.100 batting average. Therefore, the variable Batting av-
erage is ratio.
The variable Hits takes values 0, 1, 2, … There is a natural 0 since it is possible for a player
to have 0 hits and division is possible. For example, 20 hits is 20/4 = 5 times as many hits as 4
hits. Therefore, the variable Hits is ratio.
The values that the variable Rank can have are 1, 2, 3… These are numerical, so there is a
natural ordering to them. However, there is no natural 0 and arithmetic does not make sense.
Therefore, the variable Rank is ordinal.
The variable Year of birth is numerical so it can be ordered or ranked. Subtraction makes
sense. For example, 2009–1979 = 30 years. However there is no natural 0 and division does
not make sense. Therefore the variable Year of birth is interval.
23. The elements are the schools: University of Phoenix, DeVry University, ITT Technical
Institute, Penn State University, Kaplan University
24. The variables are State, School type, Recipients, and Total loan amount ($ millions).
25. (a) The variable School type can take the values proprietary and public.
(b) The observation for Penn State University is it is in the state of PA, its school type is
public, it had 42,011 federal student loan recipients in the 2013–2014 academic year, and it
had a total federal student loan amount of $151 million in the 2013–2014 academic year.
26. Quantitative variables: Recipients, Total loan amount ($ millions)
Qualitative variables: State, School type
27. The number of recipients is counted, so the variable Recipients is discrete. The total
amount of federal student loans for the school is counted to the nearest million dollars, so the
variable Total loan amount ($ millions) is discrete.
28. The variable State is qualitative. There is no natural order for the values that State
takes. Therefore, the variable State is nominal.
The variable School type is qualitative. There is no natural order for the values that School
type takes. Therefore, the variable School type is nominal.
The variable Recipients takes values that are numerical. There is a natural 0 and division is
possible. For example, a school with 800 recipients has 800/200 = 4 times as many recipients
of federal student loans as a school with 200 recipients. Therefore the variable Recipients is
ratio.
The variable Total loan amount ($ millions) takes values that are numerical. There is a
natural 0 and division is possible. For example, a school with $300 million in total loans has
$300 million/$100 million = 3 times the total amount of federal student loans of a school
with $100 million in total loans. Therefore, the variable Total loan amount ($ millions) is
ratio.
29. (a) The values for the variable year you were born are numbers that can be ranked or
ordered, so the variable is quantitative. Since there are a finite number of years in which you
could have been born, the variable is discrete.
(b) There is no natural zero. Division (2000/1990) does not make sense. However,
subtraction does make sense. For example, someone born in 1993 is 3 years younger than
someone born in 1990 (1993–1990 = 3). Therefore, the variable year you were born
represents interval data.
30. (a) Since the only possible values of the variable are yes or no, the variable is
qualitative.
(b) Since there is no natural ordering for the values yes and no, the data represent nominal
data.
31. (a) Quantitative. Since the price of tea in China is rounded to the nearest whole unit
of currency, it is discrete.
(b) The price of tea in China represents ratio data. There is a natural zero ($0.00 per pound or
$0.00 per box). Here, division does make sense. That is, tea that costs $10.00 per pound costs
twice as much as tea that costs $5.00 per pound.
32. (a) Quantitative. Since the SAT Math scores are whole numbers, the variable SAT
Math score is discrete.
(b) The SAT Math score of the person sitting next to you represents interval data. Since the
lowest possible score is 200, there is no natural zero. Also, division does not make sense
because a score of 400 does not mean that the person had twice as many correct answers as a
person with a score of 200. Hence, the data are not ratio data. However, subtraction does
make sense, because a score of 400 is 400 – 200 = 200 more points than a score of 200.
33. (a) Quantitative. Since the winning score is a whole number, the winning score in
next year’s Super Bowl is discrete.
(b) The winning score in next year’s Super Bowl represents ratio data. There is a natural zero
because it is possible for a team to score 0 points. Here division makes sense because a score
of 28 points represents twice as many points as a score of 14 points.
34. (a) Qualitative.
(b) The winning team in next year’s Super Bowl represents nominal data because there is no
natural or obvious way that the data may be ordered. Also, no arithmetic can be carried out
on the winning team in next year’s Super Bowl.
35. (a) Qualitative.
(b) The rank of the winning Super Bowl team in their division represents ordinal data
because the ranks may be arranged in a particular order and no arithmetic may be performed
on them.
36. (a) Quantitative. Since the number of friends on a student’s Facebook page is a whole
number, the number of friends on a student’s Facebook page is discrete.
(b) The number of friends on a student’s Facebook page represents ratio data. Since it is
possible for a student to have 0 friends on their Facebook page, there is a natural zero. Here,
division makes sense because a person with 12 friends on their Facebook page has twice as
many friends on their Facebook page as a person with 6 friends on their Facebook page.
37. (a) Since the possible values of the variable are not numeric but names such as
“Bones” or “The Big Bang Theory,” the variable is qualitative.
(b) Since there is no natural ordering of the names of television shows the variable represents
nominal data.
38. (a) Quantitative. Since the number of contacts you have on your cell phone is a
whole number, the number of contacts you have on your cell phone is discrete.
(b) How many contacts you have on your cell phone represents ratio data. Since it is possible
for someone to have 0 contacts on their cell phone, there is a natural zero. Here division
makes sense because a person with 20 contacts on his or her cell phone has twice as many
contacts as a person with 10 contacts on that person’s cell phone.
39. (a) Since the possible values of the variable are not numeric, but rather ice cream
flavors such as “rocky road” or “strawberry,” the variable is qualitative.
(b) Since there is no natural ordering of ice cream flavors, the data represent nominal data.
40. (a) Quantitative. Since your credit card balance is given in dollars and cents, your
credit card balance is discrete.
(b) Your credit card balance represents ratio data. Since a person may have a credit card
balance of $0.00, there is a natural zero. Since a credit card balance of $2000 is twice as
much as a credit card balance of $1000, division makes sense here.
41. (a) Quantitative. Since the age of your car can be any real number greater than or
equal to 0, the variable how old your car is is continuous.
(b) How old your car is represents ratio data. Since a car that was just purchased is 0 years
old, there is a natural zero. A car that is 6 years old is twice as old as a car that is 3 years old,
so division makes sense in this case.
42. (a) Qualitative.
(b) The model of your car represents nominal data because there is no natural or obvious way
that the data may be ordered. Also, no arithmetic can be carried out on the models of cars.
43. The 4 teams listed in Table 6 are all of the teams in the intramural league. Therefore,
the 4 teams listed in Table 6 represent a population.
44. There are more than 5 Major League baseball players. Therefore, the 5 Major League
baseball players listed in Table 7 represent a sample.
45. There are more than 5 universities in the United States. Therefore, the 5 universities
in Table 8 represent a sample.
46. Since the data in Table 6 represent a population, the team with the most wins in the
league is a parameter.
47. Since the data in Table 7 represent a sample, the oldest player is a statistic.
48. Since the data in Table 8 represent a sample, the result 4 out of 5 (80%) of the
universities are proprietary is a statistic.
49. Since the data in Table 6 represent a population, descriptive statistics is indicated.
50. The data in Table 7 represent a sample. The average number of hits of these 5
players is a statistic. Since this statistic is used to infer that the average number of hits of all
players in the league is the same as the average number of hits of these 5 players, statistical
inference is indicated.
51. The data in Table 8 represent a sample. The result 4 out of 5 (80%) of the universities
are proprietary is a statistic. Since this statistic is used to infer that 80% of all universities are
proprietary, statistical inference is indicated.
52. The population is all home sales in Tarrant County, Texas. The sample is the 100
home sales selected.
53. The population is all veterans returning from war. The sample is the 20 veterans
selected.
54. Population: all 4-H clubs in Maricopa County, Arizona. Sample: 10 selected 4-H
clubs.
55. The population is all older women. The sample is the 10 patients of the physical
therapist that she selected.
56. The population is all students at Portland Community College. The sample is the 50
Portland Community College students that were selected.
57. The population is all companies that recently underwent a merger. The sample is the
50 companies that recently underwent a merger that were selected.
58. Descriptive statistics. The average price of homes sold in Jacksonville, Florida is a
descriptive statistic because it describes a sample. But no inference is made regarding a larger
population.
59. Statistical inference. A sample of automobile passengers was taken, and the sample
proportion of automobile passengers who wear seat belts was calculated. Then this sample
proportion was used to make an inference about what percentage of automobile passengers
wear seatbelts.
60. Statistical inference. A sample was taken, and the sample average percentage of
people in which the cholesterol level was lowered by daily exercise was calculated. Then this
percentage was used to make an inference about how much daily exercise can lower
everyone’s cholesterol level.
61. Descriptive statistics. The proportion of traffic fatalities in New York that involved
alcohol is a descriptive statistic because it describes a sample. But no inference is made
regarding a larger population.
62. Descriptive statistics. The goal-against average for the Charlestown Chiefs hockey
team is a descriptive statistic because it describes the sample. But no inference was made
regarding a larger population.
63. Statistical inference. A sample of 15- to 18-year-olds was taken, and the sample
percentage of 15- to 18-year-olds who use illicit drugs was calculated. Then this percentage
was used to make an inference about the percentage of 15- to 18-year-olds who use illicit
drugs.
64. Descriptive statistics. The average on the first statistics test in Ms. Reynolds’s class is
a descriptive statistic because it describes a sample. But no inference was made regarding a
larger population.
65. (a) Elements: Endangered species—pygmy rabbit, Florida panther, Red wolf, and
West Indian manatee; variables—year listed as endangered, estimated number remaining,
and range.
(b) Qualitative variables: Since the values of the variable range are not numerical, range is a
qualitative variable. Quantitative variables: Since the values of the variables year listed as
endangered and estimated number remaining are numerical and can be ranked or ordered, the
variables year listed as endangered and estimated number remaining are quantitative
variables.
(c) Since there are a finite number of values for the variable year listed as endangered, the
variable is discrete. Since the values for the variable estimated number remaining can be
counted, the variable is discrete.
(d) There is no natural zero for the variable year listed as endangered. Division of the values
of the variable year listed as endangered is not possible. However subtraction is possible. For
example, a species listed as endangered in 1995 was listed as endangered 7 years before a
species listed as endangered in 2002 (2002 – 1995 = 7). Therefore, the variable year listed as
endangered represents interval data. The variable estimated number remaining has a natural
zero. Division is possible on the values for the variable estimated number remaining. For
example, a species with 15 remaining has 2.5 times as many members left as a species with 6
remaining (15/6 = 2.5). Therefore, the variable estimated number remaining represents ratio
data. There is no natural order for the values of the variable range. Therefore, the variable
range represents nominal data.
(e) 1973, 50, Florida.
66. (a) Elements: Companies—City of Santa Monica, St. John’s Health Center, The
Macerich Company, Fremont General Corp., and Entravision Corp.; variables—employees
and industry.
(b) Qualitative variables: Since the values for the variable industry are not numerical, the
variable industry is qualitative. Quantitative variables: Since the values for the variable
employees are numerical and can be ordered or ranked, the variable employees is quantitative.
(c) Since the values of the variable employees can be counted, the variable is discrete.
(d) The variable employees has a natural zero. Division is possible on the values of the
variable employees. For example, a company with 6 employees has 2/3 as many employees
as a company with 9 employees (6/9 = 2/3). Therefore, the variable employees represents
ratio data. There is no natural ordering for the values of the variable industry. Therefore, the
variable industry represents nominal data.
(e) 1892, government.
67. (a) Elements: States—Texas, Missouri, Minnesota, Ohio, and South Dakota;
variables—proportion of GE corn and most prevalent type.
(b) Qualitative variables: Since the values of the variable most prevalent type are not
numerical, the variable most prevalent type is qualitative. Quantitative variables: Since the
values of the variable proportion of GE corn are numerical and can be ranked or ordered, the
variable proportion of GE corn is quantitative.
(c) Since the variable proportion of GE corn can take on any value between 0 and 1 inclusive,
the variable proportion of GE corn is continuous.
(d) There is a natural zero for the variable proportion of GE corn. Division is also possible
for the values of the variable proportion of GE corn. For example, a state with 16% GE corn
has 2 times the proportion of GE corn as a state with 8% GE corn (16/8 = 2). Therefore, the
variable proportion of GE corn represents ratio data. Since there is no natural order for the
values of the variable most prevalent type, the variable most prevalent type represents
nominal data.
(e) 89%, herbicide-tolerant.
68. (a) Elements: Hospital names—Hardy Wilson, Humphreys County, Jefferson County,
Lackey Memorial, Leake Memorial, Madison County, Monfort Jones, and Rankin Medical
Center; variables—beds, city, and ZIP.
(b) Qualitative variables: Since the values of the variable city are not numerical, city is a
qualitative variable. The values of the variable zip are numerical, but the numbers represent
areas of the country. Therefore, the variable zip is qualitative. Quantitative variables: Since
the values of the variable beds are numbers that can be ranked or ordered, the variable beds is
quantitative.
(c) Since the values of the variable beds are values that were counted, the variable beds is
discrete.
(d) The values of the variable beds have a natural zero. Division is possible on the values of
the variable beds. For example, a hospital with 15 beds has half as many beds as a hospital
with 30 beds (15/30 = 1/2). Therefore, the variable beds represents ratio data. There is no
natural order for the values of the variables city and zip. Therefore, the variables city and zip
represent nominal data.
(e) 134, Brandon, 39042.
69. (a) Elements: Hospitals—Briarcliff Manor, Buchanan, Cortlandt, Croton-on-Hudson,
Mount Pleasant, Ossining 1, Ossining 2, Peekskill, Pleasantville, and Sleepy Hollow;
variables—births and average maternal age.
(b) Qualitative variables: There are no qualitative variables. Quantitative variables: The
values of the variables births and average maternal age are numbers that can be ranked or
ordered, so both of these variables are quantitative.
(c) The values of the variable births represent values that were counted so the variable births
is discrete. The values of the variable average maternal age are calculated from data that
were measured and can be any real number between the youngest mother and the oldest
mother, so the variable average maternal age is continuous.
(d) There is a natural zero for both the births and average maternal age variables. Division is
possible on the values of both the births and average maternal age variables. For example, a
hospital with 80 births had 3.2 times as many births as a hospital that had 25 births (80/25 =
3.2), and a hospital with an average maternal age of 33 has an average maternal age of 1.1
times the average maternal age of a hospital with an average maternal age of 30 (33/30 =
1.1). Therefore, the variables births and average maternal age represent ratio data.
(e) 134, 29.2.
70. (a) Elements: Commodities—oil, gold, and wheat; variables—price per share and
percent change.
(b) Qualitative variables: There are no qualitative variables. Quantitative variables: The
values of the variables price per share and percent change are numbers that can be ranked or
ordered so both variables are quantitative.
(c) The values of the variable price per share can be any nonnegative real number and the
values of the variable percent change can be any real number. Therefore, both of these
variables are continuous.
(d) Both of the variables price per share and percent change have a natural zero. Division is
possible for the values of both variables. For example, a price per share of $120 is 3 times a
price per share of $40 (120/40 = 3), and a percent change of +1.05 is 0.875 of a percent
change of +1.20 (1.05/1.20 = 0.875). Therefore the variables price per share and percent
change represent ratio data.
(e) $1243.62, −0.110%.
71. (a) Elements are the tornado names: Tri-State, Natchez, St. Louis, Tupelo, Gainesville.
Variables: deaths, year.
(b) Qualitative variables: There are no qualitative variables. Quantitative variables: Since the
values of the variables deaths and year are numbers that can be ranked or ordered, the
variables deaths and year are quantitative.
(c) Since the values of the variable deaths are values that are counted, the variable deaths is
discrete. Since the values of the variable year are whole numbers with no numbers in
between, the variable year is discrete.
(d) Since it is possible to have 0 tornado deaths in a year, there is a natural 0 for the variable
deaths. Division is possible on the values of the variable deaths. For example, a year with 35
tornado deaths has 7 times as many tornado deaths as a year with 5 tornado deaths (35/5 = 7).
Therefore the variable deaths represents ratio data. There is an order for the values of the
variable year. Subtraction can be performed on the values of the variable year. However
there is no natural 0 for the variable year and division cannot be performed on the values of
the variable year. Therefore, the variable year represents interval data.
(e) 255, 1896.
72. (a) Sample.
(b) No; these companies are relatively large and there are probably many more small
companies than large companies.
73. (a) Sample.
(b) This sample could not be considered a random sample of the annual number of tornado
deaths of all years. The 5 years selected were not selected randomly, but were selected
according to which 5 years had the most tornado deaths.
74. They tested a sample of their own light bulbs, found the average lifetime of the
sample, compared it to the average lifetimes of other current models of light bulbs, and found
the average lifetime of their sample to be longer than the reported average lifetimes of other
current models of light bulbs.
75. (a) This is a statistic because it came from a sample.
(b) An estimate of the average lifetime of all new light bulbs is the average lifetime of the
sample of 100 light bulbs, which is 2000 hours. Thus the company can claim that “The
average lifetime of this new model of light bulb is 2000 hours.”
76. (a) The elements are the institutions: Ashford University, Arizona State University,
Liberty University, Miami Dade College, Lone Star College System
(b) State, enrollment, and rank.
(c) The values of the variable state are not numbers, so the variable state is qualitative. The
variable rank is also qualitative, even though its values are numerical. The values of the
variable rank are not counting anything or measuring anything.
(d) The values of the variable enrollment are numbers that can be ranked or ordered, so the
variable enrollment is quantitative.
(e) There is no natural order for the values of the variable state, so the values of the variable
state represent nominal data. Since it is possible to have an enrollment of 0, there is a natural
zero for the variable enrollment. Division is possible on the values of the variable enrollment.
For example, a university with an enrollment of 20,000 students has 20,000/30,000 = 2/3 of
the enrollment of a university with an enrollment of 30,000. Therefore the variable
enrollment represents ratio data. The values of the variable rank can be ordered, but no
arithmetic can be performed on them. Therefore, the variable rank represents ordinal data.
77. (a) Sample.
(b) No, they are the five university campuses with the largest enrollment.
(c) Arizona State University is located in Arizona, and in 2014 it had 72,254 students,
making it the university campus with the second-highest enrollment.
78. The qualitative variables are platform, studio, and type.
79. The quantitative variables are sales for week, sales total, and weeks on list.
80. The number of weeks that a video game is on the top 30 list is counted. Therefore,
the variable Weeks on list is discrete.
81. The list in Table 3 represents a sample. Only the 30 best-selling video games are
included.
82. Since the list in Table 3 represents a sample, the number for highest sales for the week
represents a statistic.
83. Since the values for the variables platform, studio, and type are qualitative and there is
no natural ordering for the values of these variables, they are nominal.
84. No.
85. The variables sales for week, sales total, and weeks on list have values that can be
divided and have natural zeros. Therefore, the variables sales for week, sales total, and weeks
on list are ratio.
86. No. The variables platform, studio, and type are qualitative and there is no natural
ordering for their values. Therefore, they are nominal. The variables sales for week, sales
total, and weeks on list take values that can be divided and have natural zeros. Therefore,
they are ratio.
87. Descriptive statistics. No attempt was made to use the fact that the Xbox 360 version
of Grand Theft Auto V outsold the PS3 version of the game during the week of May 17, 2014
to predict that the Xbox 360 version of Grand Theft Auto V will outsell the PS3 version of the
game during any week after the week of May 17, 2014.
88. Statistical inference. We are using the fact that the Xbox 360 version of Grand Theft
Auto V outsold the PS3 version of the game during the week of May 17, 2014 to predict that
the Xbox 360 version of Grand Theft Auto V will outsell the PS3 version of the game during
the next week after the week of May 17, 2014.
Section 1.3
1. Convenience sampling usually includes only a select group of people. For example,
surveying people at a mall on a workday during working hours would probably include few if
any people who work full time.
2. The Literary Digest poll exhibited selection bias. The Literary Digest used lists of
people who owned cars and had telephones, which resulted in the exclusion of millions of
poor and underprivileged people who largely supported Roosevelt. The sample was therefore
highly biased toward the richer people who were more likely to support Alf Landon. Thus,
the results of their poll incorrectly indicated that Alf Landon would win.
3. The Literary Digest could have decreased the bias in their poll by choosing a random
sample of houses and apartments and surveying the people door to door. They would have
been more likely to include people who were poor or underprivileged by using this method
and thus their sample would have been more representative of the population.
4. No, the Literary Digest poll was not a random sample. Only people who had a phone
or a car were sampled. People who did not have a phone or a car had no chance of being
sampled.
5. A random sample is a sample for which every element has an equal chance of being
included.
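The definition above can be illustrated with a minimal Python sketch. The population of 30 numbered elements, the sample size of 5, and the helper name simple_random_sample are assumptions made for illustration and do not come from the exercises.

```python
import random

# Hypothetical population: 30 elements labeled 1 through 30 (not from the text).
population = list(range(1, 31))

def simple_random_sample(elements, n, seed=None):
    """Draw n distinct elements without replacement.

    Every element has the same chance of being included, and every
    possible group of n elements is equally likely to be chosen.
    """
    rng = random.Random(seed)  # a seed makes the draw repeatable
    return rng.sample(elements, n)

sample = simple_random_sample(population, 5, seed=1)
print(sample)
```

Changing or removing the seed gives a different sample, which is why answers to the sample-selection exercises will vary.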
6. In an observational study, the researcher observes whether the subjects’ differences in
the predictor variable are associated with differences in the response variable. No attempt is
made to create differences in the predictor variable. In an experimental study, researchers
investigate how varying the predictor variable affects the response variable. Subjects are
randomly placed into treatment and control groups.
7. Answers will vary.
8. Answers will vary.
9. Answers will vary.
10. Answers will vary.
11. Illinois, Iowa, Michigan State, Nebraska, Ohio State, Purdue
12. Arkansas, Georgia, Mississippi, South Carolina, Vanderbilt
13. Alabama, Georgia, Mississippi State, Texas A&M
14. California, Oregon State, USC, Washington State
15. Answers will vary.
16. Answers will vary.
17. Answers will vary.
18. Answers will vary.
19. Answers will vary.
20. Answers will vary.
21. No. The sample would likely not be a representative sample of the Southeastern
Conference or of all college football teams. This sample will likely not contain at least one of
the best teams, at least one of the worst teams, and at least one team in the middle of either
the Southeastern Conference or college football.
22. No. The 2 colleges in the Pacific 12 Conference in the state of Washington would
likely not be a representative sample of the Pacific 12 Conference or of all college football
teams. Since there are only 2 teams in the sample, it will not contain at least one of the best
teams, at least one of the worst teams, and at least one of the teams in the middle of either the
Pacific 12 Conference or college football. Also, it is hard to get a representative sample with
only 2 teams.
23. This is cluster sampling because (a) the population was divided into clusters (class
ranks), (b) a random sample of the clusters (class ranks) was taken, and (c) all of the students
in that class rank (cluster) were selected.
24. Systematic sampling is represented.
25. This is convenience sampling, since you are choosing a sample that is convenient to
you.
26. This is cluster sampling because (a) the population was divided into clusters (lab
sections), (b) a random sample of the clusters (lab sections) was taken, and (c) all of the
students in those lab sections (clusters) were selected.
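The three steps of cluster sampling described above can be sketched in Python. The section names, student labels, and the helper cluster_sample are hypothetical and serve only to show the procedure.

```python
import random

# Hypothetical clusters: lab sections and the students enrolled in each.
lab_sections = {
    "Section A": ["A1", "A2", "A3"],
    "Section B": ["B1", "B2", "B3", "B4"],
    "Section C": ["C1", "C2"],
    "Section D": ["D1", "D2", "D3"],
}

def cluster_sample(clusters, k, seed=None):
    """Randomly choose k clusters, then keep every member of each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), k)  # step (b): random sample of clusters
    members = [m for name in chosen for m in clusters[name]]  # step (c): take all members
    return chosen, members

chosen_sections, students = cluster_sample(lab_sections, 2, seed=7)
print(chosen_sections, students)
```

Randomness enters only when the sections are chosen; no sampling happens inside a selected section.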
27. Target population: all college students; potential population: all students working out
at the gymnasium on the Monday night Brandon was there.
28. Yes; students working out at the gymnasium are more likely to be physically fit than
the rest of the students.
29. Target population: all small businesses; potential population: small businesses near
the state university.
30. Yes; businesses near the state university are more likely to employ college students
than businesses farther away.
31. What is meant by “sometimes”? This is vague terminology.
32. This is a leading question, which is clearly trying to influence the respondent’s
answer.
33. This question would only be understood by someone who knows about graduated
income taxes, and is neither simple nor clear.
34. This is asking two questions in one. It is possible that respondents support one, the
other, or both of these issues.
35. (a) Observational.
(b) Response variable: how often they attend religious services; predictor variable: whether
or not the family is large (at least four children).
36. (a) Observational.
(b) Response variable: stock price; predictor variable: whether or not the company gives
large bonuses to its CEOs.
37. (a) Experimental.
(b) Response variable: performance of the electronics equipment; predictor variable: whether
or not a piece of equipment has a new computer processor.
38. (a) Experimental.
(b) Response variable: whether or not the person’s blood pressure is lowered; predictor
variable: whether or not the person is taking the new drug.
39. Level of insect damage to crops.
40. Whether or not the new pesticide was used.
41. The new pesticide.
42. The traditional pesticide.
43. LDL cholesterol level in the bloodstream.
44. Whether a person is given new medication or a placebo.
45. New medication.
46. Placebo.
47. (a) Randomization is present for the 100 randomly assigned subjects but not for the
subjects with high LDL cholesterol levels.
(b) The sample of 100 people is probably enough replication.
48. (a) Randomization is present.
(b) Two subjects each is probably insufficient replication to uncover any strong statistical
results.
49. Experiment
50. Observational study
51. (a) Answers will vary
(b) No. Every possible sample of 5 video games has the same chance of being selected.
(c) No. Every possible sample of 5 video games has the same chance of being selected.
Some of the samples will contain the video game and some won’t.
(d) Answers will vary, answers will vary
52. Minecraft for PS3, Titanfall for Xbox One, Titanfall for Xbox 360, Super Luigi U for
Wii U, Battlefield 4 for Xbox 360, Battlefield 4 for PS3, Yoshi’s New Island for 3DS, Mario
Kart 7 for 3DS
53. Answers will vary
54. Answers will vary
55. The poll by Ann Landers was extremely biased. Only people who read Ann Landers’s
column and felt strongly about the poll responded to this poll. Further, there was no
mechanism to guard against people responding more than once or to keep people who don’t
have children from responding. The Newsday poll was done professionally; therefore, the
sample used was more likely to be representative of the population.
56. The target population is all high schools in New England, and the potential population
is all high schools in greater Boston. The potential for selection bias is that the sample is not a
random sample of all high schools in New England. The drop-out rate for all of New England
high school students may be different from the drop-out rate for those 15 high schools in
greater Boston.
57. The target population is all people living in Chicago, and the potential population is
people who have phones and who have their phone number listed in the Chicago phone
directory. The potential for selection bias is that many of the people living below the poverty
level in Chicago may not have phones. Also, many people may have unlisted numbers.
Further, the poverty level is determined by family size as well as income, and this survey
does not take that into consideration.
58. The question may be interpreted in more than one way. Some people might think that
the question is asking for a choice to be made between rap and hip-hop music. Others may
think the answer to the question is either yes or no. This is because it is actually two
questions in one.
59. The survey question is a leading question.
60. No, the researcher would not be justified in reporting “Two-thirds of women support
abortion.” The women responded to the question “Do you support the right of a woman to
terminate a pregnancy when her life is in danger?” and not the question “Do you support
abortion?” The women may have answered each question differently.
61. (a) No, we do not know what the lowest price in the sample will be before we select
the sample. Since the sample is randomly selected, we don’t know which stocks will be
selected before we select our sample. Different samples may contain different lowest stock
prices.
(b) Answers will vary.
(c) No, if we take another sample of size 2, it is not likely to comprise the same two
companies. Since the samples are randomly selected, they will probably contain different
companies.
(d) Answers will vary.
62. (a) No, we don’t know what the lowest price in the sample will be before we select
the sample. No, we don’t know whether our sample will be the same as in the previous
exercise. Since the sample is selected randomly, we don’t know which 2 companies will be
selected before we select the sample. Different samples may contain different lowest stock
prices.
(b) Answers will vary.
(c) Answers will vary.
63. A quantity like “the lowest price in a random sample of stocks” is a variable that may
vary from sample to sample.
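A short simulation can make this sample-to-sample variation concrete. The five company symbols and prices below are invented for illustration and are not the stock data from the exercise.

```python
import random

# Hypothetical stock prices in dollars (not the data from the exercise).
prices = {"AAA": 12.50, "BBB": 48.10, "CCC": 7.25, "DDD": 93.00, "EEE": 30.40}

def lowest_price_in_sample(price_table, n, rng):
    """Select n companies at random and return the lowest price among them."""
    companies = rng.sample(sorted(price_table), n)
    return min(price_table[c] for c in companies)

rng = random.Random(0)
# Repeat the sampling many times and record the lowest price each time.
lows = [lowest_price_in_sample(prices, 2, rng) for _ in range(1000)]
distinct_lows = sorted(set(lows))
print(distinct_lows)
```

Repeated samples of size 2 produce more than one distinct lowest price, so the lowest price cannot be known before the sample is drawn.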
64. The response variable is the risk for a second heart attack and the predictor variable is
whether the patient followed a Mediterranean diet or a Western diet.
65. (a) Forcing the parents of a treatment group to smoke tobacco would increase the
occurrence of respiratory illnesses in their children, which is not very ethical.
(b) Observational study.
66. It is unethical.
67. (a) The control is the placebo bracelet.
(b) The subjects were randomly assigned to wear either the placebo bracelet or the ionized
bracelet.
(c) There is replication of data since there are 305 subjects in both the treatment and the
control group.
68. (a) The predictor variable is whether the subject had the placebo bracelet or the
ionized bracelet.
(b) The treatment is wearing the ionized bracelet.
(c) The response variable is the measure of pain.
69. This study is an experimental study because the subjects were randomly assigned to
either a treatment or a control.
Chapter 1 Review Exercises
1. (a) Make/Models: Chevrolet Corvette, Ferrari 458 Italia, Honda CR-Z, Jaguar F
Convertible, Porsche Boxster S
(b) Cylinders, transmission, combined mileage
2. (a) Transmission
(b) Cylinders, combined mileage
(c) The variable cylinders takes values that are numerical. There is a natural 0 and division
can be performed on these values. Therefore, the variable cylinders is ratio.
The variable transmission is qualitative. There is no natural ordering of the values of the
variable transmission. Therefore, the variable transmission is nominal.
The variable combined mileage takes values that are numerical. There is a natural 0 and
division can be performed on these values. Therefore, the variable combined mileage is ratio.
3. The observation for the Chevrolet Corvette is it has 8 cylinders, it has a manual
transmission, and its combined city/highway gas mileage is 21 mpg.
4. (a) Elements are the states: California, Texas, New York, Florida, Illinois
Variables: Population (1960, in 1000s), Population (2013, in 1000s), increase
(b) Quantitative
(c) The observation for Florida is that its population in 1960 was 4,952 thousand and its
population in 2013 was 19,553 thousand, an increase of 14,601 thousand from 1960.
(d) Largest: California, Texas, Florida. Smallest: New York, Illinois
5. (a) The only way to find out the population average lifetime of all one million light
bulbs in the inventory is to turn on all one million light bulbs and leave them all on until they
burn out, measuring the time it takes for each light bulb to burn out. All of these lifetimes can
then be used to calculate the population average lifetime of all one million light bulbs.
(b) This would require burning out all one million light bulbs that are in stock so that there
would be no good light bulbs left to sell. It would be better to take a random sample of the
light bulbs, find the average lifetime of the sample, and use the sample average lifetime of the
light bulbs to estimate the population average lifetime of the light bulbs.
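The idea of estimating a population average from a random sample can be sketched with simulated data. The lifetimes below are generated from an assumed normal distribution (mean 2000 hours, standard deviation 150 hours), and the inventory is reduced to 100,000 bulbs to keep the run short; none of these figures come from the exercise.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical inventory: simulated lifetimes in hours (assumed distribution).
lifetimes = [rng.gauss(2000, 150) for _ in range(100_000)]

# In practice this value is unknown unless every bulb is burned out.
population_mean = statistics.mean(lifetimes)

# Test only a random sample of 100 bulbs and use its mean as the estimate.
sample = rng.sample(lifetimes, 100)
sample_mean = statistics.mean(sample)

print(round(population_mean, 1), round(sample_mean, 1))
```

The sample mean is close to the population mean but not exactly equal to it, and only 100 bulbs are used up in testing.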
6. (a) The population was all registered voters in the United States.
(b) All people on the lists of people who owned cars and had telephones.
(c) The sample was the people on the lists of people who owned cars and had telephones who
returned the ballots.
(d) The sample was not similar to the population in all characteristics. The sample had fewer
poor and underprivileged people than the population. It also had a smaller proportion of
Roosevelt supporters and a larger proportion of Alf Landon supporters than the population.
7. (a) You would use an observational study.
(b) Since people are already enrolled in their statistics classes, it would be impractical to
randomly reassign people to a statistics class after classes have started.
8. (a) The experimental factor violated is replication.
(b) The larger the sample size is, the more precise is the inference it produces. Surveying
only 4 dentists is not likely to get a sample representative of the population of all dentists.
9. We would use an observational study. It would be impossible to randomly assign a
child to come from a single-parent family or a two-parent family.
Chapter 1 Quiz
1. False. Statistical inference consists of methods for estimating and drawing
conclusions about population characteristics based on the information contained in a sample.
2. False. A parameter is a characteristic of a population.
3. collecting
4. observation
5. sample
6. A sample survey is an example of an observational study.
7. An experimental study is involved.
8. The predictor variable is whether an elderly patient with Alzheimer’s is given the new
drug or the placebo. The response variable is whether the patient’s Alzheimer’s symptoms are
reduced.
9. (a) The population is all statistics students.
(b) The sample is the random sample of students selected from the statistics class.
(c) The variable is whether the student is left-handed. It is a categorical variable.
(d) The sample proportion is not likely to be exactly the same as the population proportion.
But it is not likely to be very far away from the population proportion because which
statistics class a person enrolls in is not based on whether the person is left-handed.
10. Different people have different interpretations of the words often, occasionally,
sometimes, and seldom.