
Instructor’s Guide with Solutions for

Daniel T. Larose’s

Discovering Statistics

Third Edition

Christina Morian Lincoln University

© 2016 by W. H. Freeman and Company

All rights reserved.

Printed in the United States of America

ISBN-10: 1-4641-8863-7

ISBN-13: 978-1-4641-8863-3

First printing

W. H. Freeman and Company

1 New York Plaza Suite 4500

New York, NY 10004

www.whfreeman.com

Contents

1 To the Instructor

2 Chapter Comments

Chapter 1 The Nature of Statistics

Chapter 2 Describing Data Using Graphs and Tables

Chapter 3 Describing Data Numerically

Chapter 4 Correlation and Regression

Chapter 5 Probability

Chapter 6 Probability Distributions

Chapter 7 Sampling Distributions

Chapter 8 Confidence Intervals

Chapter 9 Hypothesis Testing

Chapter 10 Two-Sample Inference

Chapter 11 Categorical Data Analysis

Chapter 12 Analysis of Variance

Chapter 13 Inference in Regression

Chapter 14 Nonparametric Statistics

Solutions to Exercises

Chapter 1 The Nature of Statistics

Chapter 2 Describing Data Using Graphs and Tables

Chapter 3 Describing Data Numerically

Chapter 4 Correlation and Regression

Chapter 5 Probability

Chapter 6 Probability Distributions

Chapter 7 Sampling Distributions

Chapter 8 Confidence Intervals

Chapter 9 Hypothesis Testing

Chapter 10 Two-Sample Inference

Chapter 11 Categorical Data Analysis

Chapter 12 Analysis of Variance

Chapter 13 Inference in Regression

Chapter 14 Nonparametric Statistics


1 TO THE INSTRUCTOR

Discovering Statistics, Third Edition, is intended for an algebra-based, undergraduate, one-

semester course in general introductory statistics for non-majors. The only prerequisite is

Intermediate Algebra. Discovering Statistics will prepare students to work with data in fields

such as psychology, business, nursing, education, and liberal arts, to name a few.

In 2005, the American Statistical Association endorsed the GAISE guidelines, which includ-

ed the following recommendations:

1. Emphasize statistical literacy and develop statistical thinking.

2. Use real data.

3. Stress conceptual understanding rather than mere knowledge of procedures.

4. Foster active learning in the classroom.

5. Use technology for developing conceptual understanding and analyzing data.

6. Use assessments to improve and evaluate student learning.

Discovering Statistics adopts these guidelines verbatim as the course pedagogical objectives, with the following single adjustment: (3) Stress conceptual understanding in addition to knowledge of procedures. To these, the text adds two course pedagogical objectives:

7. Use case studies to show how newly acquired analytic tools may be applied to a fa-

miliar problem.

8. Encourage student motivation.

The text integrates data interpretation and discovery-based methods with complete computational coverage of introductory statistics topics.

The text helps students develop their “statistical sense”—understanding the meaning

behind the numbers. The text also includes many step-by-step solutions within exam-

ples. Some examples include screen shots and computer output from TI-83/84, Ex-

cel, and Minitab, with keystroke instructions in the Step-by-Step Technology Guides

at the ends of sections. Discovering Statistics emphasizes how students will be able


to explain statistical results to people who are unfamiliar with statistics. Discovering

Statistics contains many current examples with real data. Students frequently ask

“When will I ever use this?” in class. Seeing examples from many different areas

helps the students see the relevance of statistics to everyday life.

Sample course outline for a one-semester course:

Week Material to be covered

1 Chapter 1

2 Chapter 2

3 Chapter 2, Chapter 3

4 Chapter 3

5 Chapter 3, Exam I

6 Chapter 4

7 Chapter 5


9 Chapter 6

10 Chapter 6, Exam II

11 Chapter 7

12 Chapter 8


14 Chapter 9, Exam III

15 Review for final and extended examples

Comprehensive final examination

You may need to adjust this outline depending on the level and the needs of the students.

You might want to consider letting the students use a formula sheet. Many students are

overwhelmed by the number of formulas in statistics.


2 CHAPTER COMMENTS

CHAPTER 1 The Nature of Statistics

Chapter 1 introduces the basic ideas of the field of statistics and the methods for gath-

ering data.

Section 1.1 Data Stories: The People Behind the Numbers

The objective for this section is for the students to be able to realize that behind each

data set lies a story about real people undergoing real-life experiences. This section shares

three data stories: Declining murder rate in New York City, UFO sightings, and California

wildfires.

Section 1.2 An Introduction to Statistics

This section has three main objectives. The first main objective is to describe what

statistics is. Several examples of statistics are presented in this section from a variety of are-

as, including an example about batting average leaders in major league baseball. Several ex-

amples of how statistics is both relevant and useful in several different majors are also pro-

vided. The case study titled: “Does Friday the 13th Change Human Behavior?” outlines the

process of how researchers conducted a study in support of the hypothesis that Friday the

13th does change human behavior.

The second objective of this section is for the students to be able to state the meaning

of descriptive statistics. The concepts of elements, variables, and observations are intro-

duced. The difference between a qualitative and a quantitative variable and the difference

between quantitative variables that are discrete and quantitative variables that are continuous

are also introduced. The difference in how the different types of variables are analyzed is

discussed in later chapters. The levels of measurement are also discussed.

The third objective of this section is for the students to understand that inferential statistics refers to learning about a population by studying a sample from that population. The section discusses the idea that a population is the collection of all elements in a particular study and that a sample is a subset of the population from which information is collected, as well as the concepts of parameters (characteristics of populations) and statistics (characteristics of samples). The main idea here is that there are many situations in which studying the entire population is impractical or impossible, and therefore a sample is taken from the population and inferences about the population are made based on the sample.

Section 1.3 Gathering Data

There are four main objectives for this section. The first is to explain what a random

sample is and why we need one. A random sample is an inexpensive way to eliminate many

types of bias. How the Gallup Organization obtains a random sample and how to generate a

random sample using technology are discussed.

Since, in certain circumstances, random sampling can have shortcomings, other types

of sampling are needed. Thus the second objective is to identify other sampling methods.

The other types of sampling discussed are stratified sampling, systematic sampling, cluster

sampling, and convenience sampling.

One way to illustrate the different types of sampling is to draw samples of the students

in the classroom using the different types of sampling. To illustrate random sampling, write

each person’s name on a slip of paper and place all of the slips of paper in a box. Then randomly select a certain number of names from the box. To show stratified sampling, divide the names into males and females or into freshmen, sophomores, juniors, and seniors. Then select a random sample from each group. For systematic sampling, have the class count off to some number, such as 1, 2, 3, 1, 2, 3, and so forth. Then all of one of the numbers, such as all of the 3’s, may be sampled. Cluster sampling can be demonstrated by randomly selecting a sample of rows or a sample of tables, and then sampling everyone in the selected row or table.
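The classroom demonstration above can also be mimicked on a computer. The following is only a sketch under stated assumptions: the roster, the class years, and the seating rows are invented for illustration, the systematic sample uses a random starting position, and Python's standard `random` module stands in for the box of paper slips.

```python
import random

random.seed(1)  # fixed seed so the demonstration is repeatable

# Hypothetical roster: 12 students, each with a class year and a seating row.
roster = [
    {"name": f"Student{i:02d}", "year": year, "row": row}
    for i, (year, row) in enumerate(
        [("freshman", 1), ("freshman", 1), ("freshman", 1), ("sophomore", 1),
         ("sophomore", 2), ("sophomore", 2), ("junior", 2), ("junior", 2),
         ("junior", 3), ("senior", 3), ("senior", 3), ("senior", 3)],
        start=1,
    )
]

# Simple random sample: every student has the same chance of being selected.
simple = random.sample(roster, 4)

# Stratified sample: split the roster by class year, then draw from each group.
strata = {}
for student in roster:
    strata.setdefault(student["year"], []).append(student)
stratified = [random.choice(group) for group in strata.values()]

# Systematic sample: count off in threes and keep every third student.
k = 3
start = random.randrange(k)      # random starting position 0, 1, or 2
systematic = roster[start::k]

# Cluster sample: randomly pick whole rows, then take everyone in those rows.
rows = sorted({student["row"] for student in roster})
chosen_rows = random.sample(rows, 2)
cluster = [student for student in roster if student["row"] in chosen_rows]

for label, sample in [("simple", simple), ("stratified", stratified),
                      ("systematic", systematic), ("cluster", cluster)]:
    print(label, [student["name"] for student in sample])
```

Convenience sampling is left out on purpose, since it involves no random mechanism to simulate.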

The third objective is to explain selection bias and good questionnaire design. The students could either obtain questionnaires that have been used by other people on campus and discuss how they meet or fail to meet the five factors for good questionnaire design, or they could divide up into small groups, construct their own questionnaire, and have the rest of the class discuss how it meets the five factors.

The fourth objective is for the students to be able to understand the difference between

an observational study and an experiment. You may not be able to obtain the information

you require by using survey or sampling methods. In this case, you may have to conduct an

experimental study. In an experimental study, researchers investigate how varying the predictor variable affects the response variable. The subjects are divided into the

treatment group and the control group. There are three main factors that should be consid-

ered when designing an experimental study: control, randomization, and replication. There

are circumstances where it is either impossible, impractical, or unethical for the researcher to

place subjects into treatment and control groups, so the researcher must use an observational

study.

CHAPTER 2 Describing Data Using Graphs and Tables

In Chapter 2, we apply the adage “A picture is worth a thousand words.” The human

mind can assess information presented pictorially better than it can through words and num-

bers alone. Psychologists sometimes call this innate ability pattern recognition. Statistical

graphs and tables take advantage of this ability to quickly summarize data.

Section 2.1 Graphs and Tables for Categorical Data

The objectives of this section are for the students to be able to construct and interpret a

frequency distribution and a relative frequency distribution for qualitative data, construct and

interpret bar graphs and Pareto charts, construct and interpret pie charts, construct and interpret crosstabulations (two-way tables) to describe the relationship between two variables, construct a clustered bar graph to describe the relationship between two variables, and work with tabular data.


It will be helpful for students to see all of the graphs and tables for categorical data us-

ing one small data set to illustrate constructing the graphs and tables by hand. Examples can

be selected from students by asking for volunteer information, such as gender, year in school,

or major. Using the same data set for both the tables and the graphs gives students an oppor-

tunity to comment on the different information that can be obtained from each type of graph

and table.

It might also be helpful to notice that the Pareto chart function in Minitab constructs

the Pareto charts with the rectangles touching whereas the textbook constructs the Pareto

charts with the rectangles separate.

Section 2.2 Graphs and Tables for Quantitative Data

The objectives of this section are for the students to be able to construct and interpret a

frequency distribution and a relative frequency distribution for discrete and continuous data,

use histograms and frequency polygons to summarize quantitative data, construct and interpret stem-and-leaf displays and dotplots, recognize distribution shape, symmetry, and skewness, and obtain information from graphs and tables.

It will be helpful for students to see all of the graphs and tables for quantitative data

using one data set. The data set should be small and can be collected from students. Possible

examples are quiz or exam scores or how long it took them to find a parking space. Using the

same data set for both the tables and the graphs gives students an opportunity to comment on

the different information that can be obtained from each type of graph and table.

Section 2.3 Further Graphs and Tables for Quantitative Data

The objectives of this section are for the students to be able to build cumulative fre-

quency distributions and cumulative relative frequency distributions, create frequency ogives

and relative frequency ogives, and construct and interpret time series graphs.

Section 2.4 Graphical Misrepresentations of Data


This section is important for statistical literacy, the ability to understand the use of

graphs, tables, and statistics in the everyday media. The objective of this section is for stu-

dents to be able to understand what can make a graph misleading, confusing, or deceptive.

The eight common methods for making a graph misleading are discussed.

A possible classroom activity would be for the students to collect graphs from newspa-

pers, magazines, or websites and to discuss any possible methods for making a graph mis-

leading that might have been used.

CHAPTER 3 Describing Data Numerically

In this chapter methods for summarizing and analyzing data sets using descriptive sta-

tistics are shown.

Section 3.1 Measures of Center

The objectives of this section are for the students to be able to calculate the mean for a

given data set, find the median and describe why the median is sometimes preferable to the

mean, find the mode of the data set, and describe how skewness and symmetry affect these

measures of center.

The Mean and Median applet included on the disk gives the students a chance to discover how a value that is much larger than the rest of the data set affects the mean and median of that data set.

Discuss the advantages and disadvantages of using the mean, the median, and the

mode. The Mean and Median applet allows the students to see for themselves that the mean

is sensitive to extreme values but the median is not. Thus the median income or the median

expenditure per student of a school district may be reported instead of the mean. The mean

is used for symmetrical, unimodal distributions like the normal distribution and the t distribu-

tion. The mode is often called a “typical case”.
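For instructors without access to the applet, the same point can be made with a few lines of code. This is a minimal sketch with invented income figures, not data from the text; it adds one very large value and compares how far the mean and the median move.

```python
from statistics import mean, median

# Hypothetical household incomes, in thousands of dollars.
incomes = [32, 35, 38, 41, 44]
with_extreme = incomes + [400]   # add one value far larger than the rest

mean_before, median_before = mean(incomes), median(incomes)
mean_after, median_after = mean(with_extreme), median(with_extreme)

print("before:", mean_before, median_before)
print("after: ", mean_after, median_after)

# The mean is pulled strongly toward the extreme value; the median barely moves.
mean_shift = mean_after - mean_before
median_shift = median_after - median_before
```

Asking students to predict both shifts before running the code tends to make the contrast more memorable.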


New topics added to this section in the exercises are the trimmed mean, the midrange, the

harmonic mean, and the geometric mean. Because the mean is sensitive to extreme values,

the trimmed mean was developed as another measure of center. A specified small percentage

of the largest values and the smallest values are omitted from the data set and the mean of the

remaining values is calculated. The midrange is the midpoint of the interval with the smallest

data value as the left endpoint and the largest data value as the right endpoint. The harmonic

mean is a measure of center most appropriately used when dealing with rates. The geometric

mean is a measure of center used to calculate growth rates.
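The four additional measures of center described above can be sketched as follows. The data values, the 10% trimming proportion, and the helper names `trimmed_mean` and `midrange` are assumptions made for illustration; the exercises in the text may trim or round differently.

```python
from statistics import mean, harmonic_mean, geometric_mean

def trimmed_mean(data, proportion):
    """Mean after dropping `proportion` of the values from each end."""
    values = sorted(data)
    k = int(len(values) * proportion)        # number of values dropped per end
    return mean(values[k:len(values) - k])

def midrange(data):
    """Midpoint of the interval from the smallest to the largest value."""
    return (min(data) + max(data)) / 2

# Hypothetical data set with one extreme value.
data = [2, 4, 5, 6, 7, 8, 9, 10, 11, 98]

print("mean:        ", mean(data))
print("trimmed mean:", trimmed_mean(data, 0.10))
print("midrange:    ", midrange(data))

# Harmonic mean for rates: 60 mph out and 40 mph back over the same distance.
average_speed = harmonic_mean([60, 40])

# Geometric mean for growth: growth factors of 1.10 and 1.20 in two years.
average_growth_factor = geometric_mean([1.10, 1.20])

print("harmonic mean: ", average_speed)
print("geometric mean:", average_growth_factor)
```

Note that the midrange depends only on the two most extreme observations, so it is even more sensitive to extreme values than the mean.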

Section 3.2 Measures of Variability

The objectives of this section are for the students to be able to understand and calcu-

late the range of a data set, explain in their own words what a deviation is, and calculate the

variance and standard deviation for a population or sample.

The text stresses that the standard deviation is in the same units as the original data. It

represents a “typical distance” an observation is from the mean.

New topics added to this section are the coefficient of variation, the mean absolute de-

viation, and the coefficient of skewness. The coefficient of variation enables analysts to

compare the variability of two data sets that are measured on different scales. For example,

suppose a box contained a random sample of 10 caterpillars that had a mean of $\bar{x}$ inches and a standard deviation of $s$ inch, and another box contained a random sample of 10 anacondas that had a mean of $\bar{x} = 20$ feet = 240 inches and a standard deviation of $s$ inch. The coefficient of variation for the caterpillars is $CV = \frac{s}{\bar{x}} \cdot 100\%$, computed with the caterpillar mean, and the coefficient of variation for the anacondas is $CV = \frac{s}{240} \cdot 100\%$. Notice that the standard deviation is the same for both samples but the coefficient of variation is much larger for the caterpillars than it is for the anacondas. Therefore $s$ inch is a lot of variability for the caterpillars but very little variability for the snakes. The mean absolute deviation (MAD) is a measure of spread that looks at

the average of the absolute deviations. The coefficient of skewness quantifies the skewness

of the distribution. Negative values of skewness are associated with left-skewed distributions

while positive values of skewness are associated with right-skewed distributions. Values

close to zero indicate symmetric distributions.
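The three measures in this paragraph can be illustrated with a short sketch. All numbers below are hypothetical and are not the values used in the text's caterpillar and anaconda comparison. The skewness function uses Pearson's formula 3(mean − median)/s, which is an assumption here; the text may define the coefficient of skewness differently.

```python
from statistics import mean, median, stdev

def coefficient_of_variation(data):
    """Sample standard deviation as a percentage of the sample mean."""
    return stdev(data) / mean(data) * 100

def mean_absolute_deviation(data):
    """Average absolute distance of the observations from their mean."""
    center = mean(data)
    return sum(abs(x - center) for x in data) / len(data)

def pearson_skewness(data):
    """3 * (mean - median) / s, one common coefficient of skewness."""
    return 3 * (mean(data) - median(data)) / stdev(data)

# Hypothetical lengths in inches. The second sample is the first shifted by
# 238 inches, so both samples have the same standard deviation.
caterpillars = [1.0, 1.5, 2.0, 2.5, 3.0]
anacondas = [x + 238 for x in caterpillars]

print("s (caterpillars):", stdev(caterpillars))
print("s (anacondas):   ", stdev(anacondas))
print("CV (caterpillars):", coefficient_of_variation(caterpillars), "%")
print("CV (anacondas):   ", coefficient_of_variation(anacondas), "%")
print("MAD (caterpillars):", mean_absolute_deviation(caterpillars))

# A symmetric sample has skewness 0; a long right tail gives a positive value.
right_skewed = [1, 2, 3, 4, 20]
print("skewness:", pearson_skewness(caterpillars), pearson_skewness(right_skewed))
```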

Section 3.3 Working with Grouped Data

The objectives for this section are for the students to be able to calculate the weighted

mean, estimate the mean for grouped data, and estimate the variance and standard deviation

for grouped data.

Section 3.4 Measures of Position and Outliers

The objectives for this section are for the students to be able to find percentiles for both

small and large data sets, find the percentile rank, find quartiles and the interquartile range,

calculate z-scores and explain why we use them, find a data value given its z-score, and

use z-scores to detect outliers.

One wonderful advantage of using z-scores is that they can provide us with infor-

mation about data values even when we do not fully understand the original data set. Z-

scores are used later for the normal distribution.
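A short sketch of the z-score calculations follows. The scores are invented, and the cutoff of 3 standard deviations for flagging an outlier is a commonly used rule that is assumed here rather than quoted from the text.

```python
from statistics import mean, stdev

def z_scores(data):
    """Number of standard deviations each value lies from the sample mean."""
    m, s = mean(data), stdev(data)
    return [(x - m) / s for x in data]

def value_from_z(z, m, s):
    """Invert the z-score: data value = mean + z * standard deviation."""
    return m + z * s

# Hypothetical exam scores with one unusually high value.
scores = [70, 72, 75, 78, 80, 74, 76, 73, 77, 79, 71, 120]
zs = z_scores(scores)

# Assumed rule: flag values whose z-score is at least 3 in absolute value.
flagged = [x for x, z in zip(scores, zs) if abs(z) >= 3]
print("flagged as outliers:", flagged)

# Recover the original data value from its z-score.
recovered = value_from_z(zs[-1], mean(scores), stdev(scores))
print("recovered value:", recovered)
```

Because the extreme value inflates both the mean and the standard deviation, its own z-score is smaller than students usually expect, which is a useful lead-in to the IQR method in Section 3.5.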

Minitab and graphing calculators determine the quartiles using a different method than

that which is used in the text and therefore may produce different values for these quantities.

Section 3.5 Five-Number Summary and Boxplots

The objectives for this section are the five-number summary, boxplots and the IQR

method for detecting outliers. The five-number summary consists of the minimum value of

the data set, the first quartile, the median, the third quartile, and the maximum value of the

data set. A boxplot is a graphical representation of the five-number summary. It can be used


to assess the skewness of the data set. The IQR method of detecting outliers is not sensitive

to extreme values and is therefore a more robust method of detecting outliers than the z-score

method.
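The five-number summary and the IQR method can be sketched as below. The data are hypothetical, the 1.5 × IQR fences are the usual convention and are assumed here, and the quartiles come from Python's `statistics.quantiles`, whose interpolation rules are not necessarily the rule used in the text. The last lines show that two quartile conventions can disagree on the same data, which is the issue noted for Minitab and graphing calculators in Section 3.4.

```python
from statistics import quantiles

def five_number_summary(data):
    """Minimum, Q1, median, Q3, maximum (quartiles by the 'inclusive' rule)."""
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    return min(data), q1, q2, q3, max(data)

def iqr_outliers(data):
    """Values more than 1.5 * IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < lower_fence or x > upper_fence]

# Hypothetical exam scores with one unusually high value.
scores = [70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 120]

summary = five_number_summary(scores)
outliers = iqr_outliers(scores)
print("five-number summary:", summary)
print("outliers by the IQR method:", outliers)

# A different quartile convention gives slightly different quartiles.
q1_exclusive = quantiles(scores, n=4, method="exclusive")[0]
print("Q1 (inclusive):", summary[1], " Q1 (exclusive):", q1_exclusive)
```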

CHAPTER 4 Correlation and Regression

In Chapter 4, we learn how to examine the relationship between two quantitative variables. Correlation and regression are introduced in this chapter and are discussed more fully in Chapter 13.

Section 4.1 Scatterplots and Correlation

The objectives for this section are for the students to be able to construct and interpret

scatterplots for two quantitative variables, to calculate and interpret the value of the correla-

tion coefficient, and to perform the test for linear correlation.

Section 4.2 Introduction to Regression

The objectives for this section are for the students to be able to calculate the value and

understand the meaning of the slope and the y intercept of the regression line and to predict

values of y for given values of x. The prediction error $y - \hat{y}$ is also calculated.

For Sections 4.1 and 4.2, construct a scatterplot and calculate the correlation coefficient

and the regression equation for the same data set. One possible example is to use high school

GPA (x) and college GPA (y). This example can also be used for Chapter 13.
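A worked version of the GPA example might look like the sketch below. The five data pairs are invented for illustration. The computations use the standard formulas for the correlation coefficient, slope, and y intercept written out by hand, so that students can match each line to the hand calculation.

```python
from math import sqrt

# Hypothetical data: high school GPA (x) and college GPA (y) for five students.
x = [2.0, 2.5, 3.0, 3.5, 4.0]
y = [1.9, 2.3, 2.8, 3.0, 3.5]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Sums of squares and cross products about the means.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)

r = s_xy / sqrt(s_xx * s_yy)   # correlation coefficient
b1 = s_xy / s_xx               # slope of the regression line
b0 = y_bar - b1 * x_bar        # y intercept of the regression line

def predict(x_new):
    """Predicted college GPA for a given high school GPA."""
    return b0 + b1 * x_new

# Prediction errors (residuals): observed y minus predicted y.
residuals = [yi - predict(xi) for xi, yi in zip(x, y)]

print("r =", r)
print("regression line: y-hat =", b0, "+", b1, "x")
print("prediction at x = 3.2:", predict(3.2))
```

Predictions should be restricted to x values inside the range of the data, here 2.0 to 4.0.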

Section 4.3 Further Topics in Regression Analysis

The objectives for this section are for the student to calculate the sum of squares error

(SSE) and use the standard error of the estimate as a measure of a typical prediction error,

to describe how total variability, prediction error, and improvement are measured by the total

sum of squares (SST), the sum of squares error (SSE), and the sum of squares regression

(SSR), and to calculate and explain the meaning of the coefficient of determination as a

measure of the usefulness of the regression.


Note that there are two ways of calculating SST. The first way is the usual method of summing the $(y - \bar{y})^2$, and the second way is by using the fact that $SST = (n - 1)s_y^2$. This method underscores the fact that SST measures the variability in $y$.
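The relationships among SST, SSE, and SSR can be checked numerically. The sketch below reuses invented GPA-style data (not from the text) and computes SST both ways, then the coefficient of determination and the standard error of the estimate with n − 2 in the denominator.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical data: high school GPA (x) and college GPA (y).
x = [2.0, 2.5, 3.0, 3.5, 4.0]
y = [1.9, 2.3, 2.8, 3.0, 3.5]
n = len(x)

# Least-squares slope and intercept.
x_bar, y_bar = mean(x), mean(y)
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)               # total variability in y
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # prediction error
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # improvement from regression

sst_from_variance = (n - 1) * variance(y)   # second way of calculating SST

r_squared = ssr / sst            # coefficient of determination
s = sqrt(sse / (n - 2))          # standard error of the estimate

print("SST =", sst, "and also", sst_from_variance)
print("SSR + SSE =", ssr + sse)
print("r^2 =", r_squared, " s =", s)
```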

CHAPTER 5 Probability

Chapter 5 explains the tools of probability, which enable data analysts to quantify the

level of uncertainty in statistical inference.

Section 5.1 Introducing Probability

The objectives of this section are for the students to be able to understand the meaning

of an experiment, an outcome, an event, and a sample space, to describe the classical method

of assigning probability, and to explain the Law of Large Numbers and the relative frequency

method of assigning probability.

One example that can be used to explain the difference between the classical method

of assigning probability and the relative frequency method for assigning probability is to first

ask the students a question that uses the classical method of assigning probability, such as “If

you roll a 100-sided die, what is the probability of rolling a 72?” Most students will be able

to answer “1/100” even if they have never rolled a 100-sided die before. Now ask the stu-

dents a question that uses the relative frequency method of assigning probability, such as “If

you randomly select a student from this college or university, what is the probability that the

student is taking Elementary Statistics this semester?” The students will not be able to do

this unless they know how many students are enrolled in their school and how many of them

are taking Elementary Statistics this semester. The three types of probability are just three

different sources for the numbers used as probabilities.

To illustrate the Law of Large Numbers, have students do an experiment in class such

as flipping a coin or rolling a die. Have them divide up into small groups of 2 or 3 students

each and have each group either flip the coin or roll the die the same number of times, say 50


or 100 times, recording the number of heads and tails or the number of 1’s, 2’s, 3’s, 4’s, 5’s,

or 6’s. Then have them pool all of the results together and calculate the relative frequency of

heads and tails or of 1’s, 2’s, 3’s, 4’s, 5’s, and 6’s. Then have them compare the class results

to the theoretical probabilities.
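If class time is short, the coin-flipping activity can be simulated. The sketch below assumes 40 groups of 50 flips each; those numbers and the seed are arbitrary choices. It compares the group-by-group relative frequencies of heads with the pooled relative frequency.

```python
import random

random.seed(2016)  # fixed seed so the class sees the same results each time

FLIPS_PER_GROUP = 50
NUMBER_OF_GROUPS = 40

# Each group flips a fair coin 50 times and records heads (H) or tails (T).
groups = [[random.choice("HT") for _ in range(FLIPS_PER_GROUP)]
          for _ in range(NUMBER_OF_GROUPS)]

# Relative frequency of heads within each small group.
group_freqs = [group.count("H") / FLIPS_PER_GROUP for group in groups]

# Relative frequency of heads after pooling all of the groups together.
total_heads = sum(group.count("H") for group in groups)
pooled_freq = total_heads / (FLIPS_PER_GROUP * NUMBER_OF_GROUPS)

print("smallest group frequency:", min(group_freqs))
print("largest group frequency: ", max(group_freqs))
print("pooled frequency:        ", pooled_freq)
```

The individual groups typically scatter noticeably around 0.5, while the pooled relative frequency lies much closer to the theoretical probability.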

The Law of Large Numbers for Proportions applet allows students to explore the Law

of Large Numbers for themselves.

Real-life examples that use the Law of Large Numbers are the proportion of girls and

boys born and Punnett squares in biology.

Section 5.2 Combining Events

The objectives of this section are for the students to be able to understand how to com-

bine events using complement, union, and intersection, and to apply the Addition Rule to

events in general and to mutually exclusive events in particular.

Start with a simple example of union and intersection, such as Example 5.12 and Ex-

ample 5.13. Two-way tables are also great examples of using unions and intersections in cal-

culating probabilities. Example 5.14 is one example of a two-way table. Another possible

example is the following:

A college band had the following number of students in each section.

            Woodwind       Brass          Percussion     Total
            Instruments    Instruments    Instruments
Male        13             25             8              46
Female      20             11             7              38
Total       33             36             15             84

If a student from this band is selected at random, find the probability that the student

(a) is female (Answer: 38/84 ≈ 0.4524)

(b) plays a woodwind instrument (Answer: 33/84 ≈ 0.3929)

(c) is female and plays a woodwind instrument (Answer: 20/84 ≈ 0.2381)

(d) is female or plays a woodwind instrument (Answer: 38/84 + 33/84 − 20/84 = 51/84 ≈ 0.6071)

(e) plays a brass or a percussion instrument (Answer: 36/84 + 15/84 = 51/84 ≈ 0.6071)
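The answers to the band example can be verified with exact fractions. In this sketch the counts come from the table above; the dictionary layout and the helper name `prob` are illustrative choices only.

```python
from fractions import Fraction

# Counts from the band table, keyed by (sex, section).
band = {
    ("Male", "Woodwind"): 13, ("Male", "Brass"): 25, ("Male", "Percussion"): 8,
    ("Female", "Woodwind"): 20, ("Female", "Brass"): 11, ("Female", "Percussion"): 7,
}
total = sum(band.values())

def prob(event):
    """Probability that a randomly selected band member satisfies `event`."""
    favorable = sum(count for key, count in band.items() if event(*key))
    return Fraction(favorable, total)

p_female = prob(lambda sex, section: sex == "Female")
p_woodwind = prob(lambda sex, section: section == "Woodwind")
p_both = prob(lambda sex, section: sex == "Female" and section == "Woodwind")

# Addition Rule: P(A or B) = P(A) + P(B) - P(A and B).
p_female_or_woodwind = p_female + p_woodwind - p_both

# Brass and percussion are mutually exclusive, so nothing is subtracted.
p_brass_or_percussion = (prob(lambda sex, section: section == "Brass")
                         + prob(lambda sex, section: section == "Percussion"))

for label, p in [("(a)", p_female), ("(b)", p_woodwind), ("(c)", p_both),
                 ("(d)", p_female_or_woodwind), ("(e)", p_brass_or_percussion)]:
    print(label, p, "=", round(float(p), 4))
```

The fractions print in lowest terms, so 38/84 appears as 19/42, for example.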

Section 5.3 Conditional Probability

The objectives of this section are for the students to be able to calculate conditional

probabilities, explain independent and dependent events, solve problems using the Multipli-

cation Rule, and recognize the difference between sampling with replacement and sampling

without replacement.

Two-way tables also provide great examples of conditional probability. Example

5.17 is one example of a two-way table. Another example of a two-way table is the follow-

ing:

A college band had the following number of students in each section.

            Woodwind       Brass          Percussion     Total
            Instruments    Instruments    Instruments
Male        13             25             8              46
Female      20             11             7              38
Total       33             36             15             84


If a student from this band is selected at random, find the probability that the student

(a) plays a woodwind instrument, given that the student is female (Answer: 20/38 ≈ 0.5263. Stress that all of the numbers come from the row labeled “Female”, so conditional probability reduces the sample space.)

(b) is female, given that the student plays a woodwind instrument (Answer: 20/33 ≈ 0.6061. Stress that all of the numbers come from the column labeled “Woodwind Instruments”, so conditional probability reduces the sample space.)

This example can be used to demonstrate that P(A|B) ≠ P(B|A).
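The two conditional probabilities can be computed directly from the table counts, as in the sketch below. The variable names are illustrative. The last lines compare a conditional probability with the corresponding unconditional one as a quick check of independence.

```python
from fractions import Fraction

# Counts taken from the band table.
female_and_woodwind = 20
female_total = 38       # row total for "Female"
woodwind_total = 33     # column total for "Woodwind Instruments"
band_total = 84

# Conditioning on "female" restricts the sample space to the Female row.
p_woodwind_given_female = Fraction(female_and_woodwind, female_total)

# Conditioning on "woodwind" restricts the sample space to the Woodwind column.
p_female_given_woodwind = Fraction(female_and_woodwind, woodwind_total)

print("P(woodwind | female) =", round(float(p_woodwind_given_female), 4))
print("P(female | woodwind) =", round(float(p_female_given_woodwind), 4))

# Independence check: if the events were independent, P(female | woodwind)
# would equal P(female).
p_female = Fraction(female_total, band_total)
events_are_independent = (p_female_given_woodwind == p_female)
print("independent?", events_are_independent)
```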

Section 5.4 Counting Methods

The objectives of this section are for the students to be able to apply the Multiplication

Rule for Counting to solve certain counting problems, use permutations and combinations to

solve certain counting problems, and compute probabilities using permutations.

One way to explain how to use the counting rule is to ask the students to first determine

how they would perform the task. Once this is determined, the numbers will usually fall into

place. For example, suppose you wanted to line up 6 people. Ask the students how many

ways can they select a first person in line. There are 6 ways. Have them select a first person

in line. Now ask the students how many ways are there to select a second person in line.

There are 5 ways to do this. Have them choose a second person in line. Now ask them how

many ways are there to choose a third person in line. Continue this process until all 6 of the

people are in line. Explain that there are 6·5·4·3·2·1 = 720 ways to line up 6 people.

The usual way of explaining the difference between permutations and combinations is

that order is important in permutations but not important in combinations. It also helps to

explain that in a permutation each object or person selected is doing something different or

specific whereas in a combination every object or every person is doing the same thing. For


example, calculating how many ways can a club with 10 members select a president, a vice

president, and a secretary is a permutation since each person selected is doing something dif-

ferent or specific. The first person selected is the president, the second person selected is the

vice president, and the third person selected is the secretary. However, calculating the num-

ber of ways that a club with 10 members can select a committee of three people to organize

the spring picnic is a combination. All 3 people selected are working on the same thing and it

does not matter who was picked first, who was picked second and who was picked third. It

only matters who was picked for the committee.
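The three counting examples above can be checked with Python's `math` module (`math.perm` and `math.comb` require Python 3.8 or later). This is a sketch for verifying the arithmetic, not a substitute for having students reason through the selection steps.

```python
from math import comb, factorial, perm

# Lining up 6 people: 6 * 5 * 4 * 3 * 2 * 1 orderings.
lineups = factorial(6)

# President, vice president, and secretary from 10 members: order matters.
officer_slates = perm(10, 3)

# Committee of 3 from 10 members: order does not matter.
committees = comb(10, 3)

print("ways to line up 6 people:", lineups)
print("ways to choose 3 officers:", officer_slates)
print("ways to choose a committee of 3:", committees)

# Each committee of 3 can be arranged into officer roles in 3! ways,
# which links the permutation count to the combination count.
arrangements_per_committee = factorial(3)
```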

CHAPTER 6 Probability Distributions

In Chapter 6, students encounter random variables and probability distributions. With

these new tools, they can increase the efficiency of their decision making.

Section 6.1 Discrete Random Variables

The objectives of this section are for the students to be able to identify random varia-

bles, explain what a discrete probability distribution is and construct probability distribution

tables and graphs, and to calculate the mean, variance, and standard deviation of a discrete

random variable.

Describing the number of students in class today is a way to convey the meaning of a discrete variable. For example, there may be either 20 or 21 students in class but nothing in between (there are not 20.23 students in the class, for instance). A continuous variable can be explained by using an example of distance. How far can someone throw an eraser in the room? If someone throws an eraser towards the back wall of the room, the eraser can land at any distance between 0 feet and the distance from the person to the back wall of the room. It is possible for someone to throw an eraser 10 feet or 8.645 feet. Another example of a continuous variable is time.
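For the third objective of this section, the mean, variance, and standard deviation of a discrete random variable can be computed from its probability distribution table. The distribution below is invented for illustration (X = number of students absent from a small class).

```python
from math import sqrt

# Hypothetical probability distribution: value of X -> P(X = x).
distribution = {0: 0.5, 1: 0.3, 2: 0.2}

# Requirement for a discrete probability distribution: probabilities sum to 1.
total_probability = sum(distribution.values())

# Mean: sum of x * P(x).
mu = sum(x * p for x, p in distribution.items())

# Variance: sum of (x - mu)^2 * P(x); standard deviation is its square root.
variance = sum((x - mu) ** 2 * p for x, p in distribution.items())
sigma = sqrt(variance)

print("sum of probabilities:", total_probability)
print("mean:", mu)
print("variance:", variance)
print("standard deviation:", sigma)
```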


Section 6.2 Binomial Probability Distributions

The objectives of this section are for the students to be able to explain what constitutes

a binomial experiment, compute probabilities using the binomial probability formula, find

probabilities using the binomial tables, and calculate and interpret the mean, variance, and

standard deviation of the binomial random variable.
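The binomial probability formula and the formulas for the mean, variance, and standard deviation can be sketched as follows. The values n = 10 and p = 0.5 (ten flips of a fair coin) are assumptions chosen so that the answers are easy to check by hand.

```python
from math import comb, sqrt

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable with n trials and success probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.5   # hypothetical experiment: 10 flips of a fair coin

p_exactly_3 = binomial_pmf(3, n, p)
p_at_most_3 = sum(binomial_pmf(x, n, p) for x in range(0, 4))

mean = n * p
variance = n * p * (1 - p)
standard_deviation = sqrt(variance)

print("P(X = 3)  =", p_exactly_3)
print("P(X <= 3) =", p_at_most_3)
print("mean =", mean, " variance =", variance, " sd =", standard_deviation)
```

The cumulative value can be compared with the entry in the binomial tables as a check.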

Section 6.3 Poisson Probability Distributions

The objectives of this section are for the student to be able to explain the requirements

for the Poisson probability distribution, compute probabilities for a Poisson random variable,

calculate the mean, variance, and standard deviation of a Poisson random variable, and use

the Poisson distribution to approximate the binomial distribution. The Poisson probability

distribution is used when we wish to find the probability of observing a certain number of

occurrences ($X$) of a particular event within a fixed interval of space or time.
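A sketch of the Poisson probability formula and of the Poisson approximation to the binomial follows. The values n = 1000 and p = 0.002 are invented; they are chosen so that n is large and p is small, the situation in which the approximation is generally expected to work well. The exact conditions stated in the text may differ.

```python
from math import comb, exp, factorial

def poisson_pmf(x, mu):
    """P(X = x) for a Poisson random variable with mean mu."""
    return exp(-mu) * mu ** x / factorial(x)

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Hypothetical binomial setting with many trials and a rare event.
n, p = 1000, 0.002
mu = n * p          # mean of the approximating Poisson distribution

# For a Poisson random variable the variance equals the mean.
poisson_variance = mu

differences = []
for x in range(6):
    exact = binomial_pmf(x, n, p)
    approx = poisson_pmf(x, mu)
    differences.append(abs(exact - approx))
    print(x, round(exact, 5), round(approx, 5))
```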

Section 6.4 Continuous Random Variables and the Normal Probability Distribution

The objectives of this section are for the students to be able to identify a continuous

probability distribution and state the requirements, calculate probabilities for the uniform probability distribution, explain the properties of the normal probability distribution, find areas under the standard normal curve given a Z-value, find the standard normal Z-value given an area, and use normal probability plots to assess normality.

The normal probability distribution is sometimes referred to as the bell-shaped curve. It is also considered to be the most important probability distribution in the world.

Students sometimes have trouble understanding that a negative Z-value does not

correspond to a negative area. Explain that a negative Z-value just means that you are draw-

ing the line left of 0, so at least part of the area is from the left side of the distribution. For

example, if they were to cut a piece off of the left side of their desk, it would still be a posi-

tive amount of desk. Or if they had a cake shaped like the normal distribution or a calzone, if


they were to cut the cake or the calzone on the left side they would still have a positive

amount of cake or calzone.
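The point about negative Z-values can also be shown numerically. The sketch below uses `statistics.NormalDist` from the Python standard library in place of the standard normal table; the particular Z-value of −1.5 and the area of 0.025 are arbitrary choices.

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal distribution: mean 0, standard deviation 1

# The area to the left of a negative Z-value is still a positive area.
area_left_of_negative_z = Z.cdf(-1.5)          # about 0.0668

# Area between two Z-values: subtract the two left-tail areas.
area_between = Z.cdf(1.5) - Z.cdf(-1.5)

# Going the other way: find the Z-value that has a given area to its left.
z_for_area = Z.inv_cdf(0.025)                  # about -1.96

print("area left of Z = -1.5:", round(area_left_of_negative_z, 4))
print("area between -1.5 and 1.5:", round(area_between, 4))
print("Z-value with area 0.025 to its left:", round(z_for_area, 2))
```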

Section 6.5 Applications of the Normal Distribution

The objectives of this section are for the students to be able to compute probabilities

for a given value of any normal random variable, and to find the appropriate value of any

random variable, given an area or a probability.

Section 6.6 Normal Approximation to the Binomial Probability Distribution

The objective of this section is for the student to calculate binomial probabilities using

the normal approximation to the binomial distribution.
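The quality of the normal approximation can be checked against the exact binomial probability. In this sketch, n = 50 and p = 0.4 are invented values, and the approximation uses a continuity correction of 0.5, which is the usual practice and is assumed here.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 50, 0.4    # hypothetical binomial experiment

# Mean and standard deviation of the binomial random variable.
mu = n * p
sigma = sqrt(n * p * (1 - p))

# Exact binomial probability P(X <= 20).
exact = sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(0, 21))

# Normal approximation with continuity correction: P(Y <= 20.5).
approx = NormalDist(mu, sigma).cdf(20.5)

print("np =", mu, " n(1 - p) =", n * (1 - p))
print("exact binomial probability:", round(exact, 4))
print("normal approximation:      ", round(approx, 4))
```

Repeating the comparison with a small n or an extreme p shows students why the conditions on np and n(1 − p) matter.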

CHAPTER 7 Sampling Distributions

In Chapter 7, students are introduced to point estimation, sampling distributions, and

one of the most important results in statistical inference, the Central Limit Theorem.

Section 7.1 Central Limit Theorem for Means

The objectives of this section are for the students to be able to describe the sampling distribution of $\bar{x}$ for skewed and symmetric populations as the sample size increases, apply the Central Limit Theorem for Means to solve probability questions about the sample mean, and find percentiles for the sample mean.
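A simulation can make the Central Limit Theorem for Means concrete. The sketch below draws repeated samples from a strongly right-skewed population; an exponential population with mean 1 and standard deviation 1, samples of size 40, and 2000 repetitions are all arbitrary choices made for illustration.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(7)   # fixed seed so the results are repeatable

POPULATION_MEAN = 1.0   # exponential population: mean 1, standard deviation 1
POPULATION_SD = 1.0
SAMPLE_SIZE = 40
REPETITIONS = 2000

def sample_mean(n):
    """Mean of one random sample of size n from the skewed population."""
    return mean(random.expovariate(1.0) for _ in range(n))

sample_means = [sample_mean(SAMPLE_SIZE) for _ in range(REPETITIONS)]

# The sample means should center at the population mean, with standard
# deviation close to sigma / sqrt(n).
center_of_means = mean(sample_means)
spread_of_means = stdev(sample_means)
predicted_spread = POPULATION_SD / sqrt(SAMPLE_SIZE)

print("mean of the sample means:", round(center_of_means, 3))
print("sd of the sample means:  ", round(spread_of_means, 3))
print("sigma / sqrt(n):         ", round(predicted_spread, 3))
```

A histogram of `sample_means` is approximately bell-shaped even though the population itself is far from normal.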

Section 7.2 Central Limit Theorem for Proportions

The objectives of this section are for the students to be able to explain the sampling

distribution of the sample proportion $\hat{p}$, describe the sampling distribution of the sample proportion $\hat{p}$ for extreme and moderate values of $\hat{p}$, and apply the Central Limit Theorem for

Proportions to solve probability questions about the sample proportion.

CHAPTER 8 Confidence Intervals

In Chapter 7, we learned that point estimation cannot determine how close a point es-

timate is to its target parameter. There has to be a better way—and there is: confidence inter-

vals. By studying the patterns implicit in the sampling distribution of a statistic (such as the

sample mean or sample proportion), we can infer with a certain degree of confidence that the

associated population parameter lies within a certain interval.

Section 8.1 Z Interval for the Mean

The objectives of this section are for the student to be able to calculate a point estimate

of the population mean, to calculate and interpret a Z interval for the population mean when

the population is normal and when the sample size is large, to reduce the margin of error, and

to calculate the sample size needed to estimate the population mean. The Confidence Interval

applet is used to demonstrate the concept that a confidence level of 90% means that if we

take sample after sample for a very long time, then in the long run, the proportion of intervals

that will contain the parameter will equal 90%. This applet is also used to demonstrate that

99% confidence intervals are wider than 90% confidence intervals. The Normal Density

Curve applet is used to find Z_α/2 critical values for several different confidence levels.
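
If the applet is unavailable, the long-run interpretation of the confidence level can be simulated directly. The sketch below is not the applet; it assumes a hypothetical normal population with known mean 100 and standard deviation 15 and counts how many 90% Z intervals capture the mean.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

random.seed(2)  # fixed seed so the demonstration is repeatable

MU, SIGMA = 100.0, 15.0   # hypothetical population mean and known standard deviation
N = 25                    # sample size
CONFIDENCE = 0.90
NUM_INTERVALS = 2000

# Critical value for the chosen confidence level (about 1.645 for 90%).
z_crit = NormalDist().inv_cdf(1 - (1 - CONFIDENCE) / 2)
margin = z_crit * SIGMA / sqrt(N)

hits = 0
for _ in range(NUM_INTERVALS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    x_bar = mean(sample)
    if x_bar - margin <= MU <= x_bar + margin:
        hits += 1  # this interval contains the true mean

coverage = hits / NUM_INTERVALS
print(f"Z critical value = {z_crit:.3f}")
print(f"proportion of intervals containing mu = {coverage:.3f}")
```

Changing CONFIDENCE to 0.99 enlarges both z_crit and margin, which mirrors the observation that 99% intervals are wider than 90% intervals.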

Section 8.2 t Interval for the Population Mean

The objectives for this section are for the students to be able to describe the characteris-

tics of the t distribution and calculate and interpret a t interval for the population mean.

Section 8.3 Z Interval for the Population Proportion

The objectives of this section are for the students to be able to calculate the point estimate p̂ of the population proportion p, construct and interpret a Z interval for the population

proportion p, compute and interpret the margin of error for the Z interval for p, and determine

the sample size needed to estimate the population proportion.

Polls reported in the news are examples of confidence intervals for proportions. Na-

tional pollsters almost always use 95% as their confidence level and usually try to select the

sample size necessary to create a margin of error of about 3%.
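
The sample size behind a typical poll can be worked out in a few lines. This sketch assumes the conservative guess p = 0.5 and the standard formula n = p(1 - p)(Z/E)^2, rounded up; the function name is ours.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(margin_of_error, confidence=0.95, p_guess=0.5):
    """Smallest n so that a Z interval for p has at most the given margin of error."""
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(p_guess * (1 - p_guess) * (z_crit / margin_of_error) ** 2)


# 95% confidence with a 3% margin of error, as in many national polls.
n_poll = sample_size_for_proportion(0.03)
print(n_poll)
```

The result, 1068, is consistent with the sample sizes of roughly 1000 that national polls commonly report.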

Section 8.4 Confidence Intervals for the Population Variance and Standard Deviation

The objectives of this section are for the students to be able to describe the properties of the χ² (chi-square) distribution, find critical values for the χ² (chi-square) distribution, and construct confidence intervals for the population variance and standard deviation. The χ² distribution is used in Section 9.6 for hypothesis tests about the population standard deviation and in Chapter 11 for categorical data analysis.

CHAPTER 9 Hypothesis Testing

Hypothesis testing is the most widely used method for statistical inference in the

world. Hypothesis testing is a process for rendering a decision about the unknown value of a

population parameter.

Discuss how hypothesis testing is similar to a court case. Just as a person is innocent

until proven guilty, the null hypothesis is assumed true unless the sample evidence indicates

that the alternative hypothesis is true instead. Sometimes an innocent person is found guilty,

which is analogous to a Type I error. Sometimes the court fails to convict a guilty person,

which is analogous to making a Type II error.
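
To show that "convicting the innocent" happens at a predictable rate, the sketch below repeatedly runs a two-tailed Z test when the null hypothesis is actually true. The population values, sample size, and number of trials are hypothetical choices for the demonstration.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

random.seed(3)  # fixed seed so the demonstration is repeatable

MU_0, SIGMA, N = 50.0, 10.0, 30   # hypothetical null mean, known sigma, sample size
ALPHA = 0.05
NUM_TRIALS = 4000

z_crit = NormalDist().inv_cdf(1 - ALPHA / 2)  # two-tailed critical value

false_convictions = 0
for _ in range(NUM_TRIALS):
    # The null hypothesis is true here: the data really come from mean MU_0.
    sample = [random.gauss(MU_0, SIGMA) for _ in range(N)]
    z_data = (mean(sample) - MU_0) / (SIGMA / sqrt(N))
    if abs(z_data) > z_crit:
        false_convictions += 1  # rejected a true null hypothesis: a Type I error

type_i_rate = false_convictions / NUM_TRIALS
print(f"proportion of Type I errors = {type_i_rate:.3f} (alpha = {ALPHA})")
```

The observed proportion should land close to the chosen level of significance.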

Section 9.1 Introduction to Hypothesis Testing

The objectives of this section are for the students to be able to construct the null hy-

pothesis and the alternative hypothesis from the statement of the problem and state the two

types of errors made in hypothesis tests: the Type I error, made with probability α, and the Type II error, made with probability β.

Students have trouble constructing the null hypothesis and the alternative hypothesis

from the statement of the problem. The text gives three steps to follow in constructing the hypotheses. Stress looking for key words or phrases and determining what the problem is asking to be proved.

Section 9.2 Z Test for the Population Mean μ: Critical-Value Method

The objectives of this section are for the student to be able to explain the essential idea

about hypothesis testing for the population mean and to perform the Z test for the mean using

the critical-value method.

Section 9.3 Z Test for the Population Mean μ: p-Value Method

The objectives of this section are for the students to be able to perform the Z test for the

mean using the p-value method, to assess the strength of evidence against the null hypothesis,

describe the relationship between the p-value method and the critical-value method, and to

use the Z confidence interval for the mean to perform the two-tailed Z test for the mean.

Section 9.4 t Test for the Population Mean

The objectives of this section are for the students to be able to perform the t test for the

mean using the critical-value method, carry out the t test for the mean using the p-value

method, and use confidence intervals to perform two-tailed hypothesis tests. Instructors

should stress that, if the population standard deviation is unknown, it is wrong to use the Z test.

Section 9.5 Z Test for the Population Proportion p

The objectives of this section are for the students to be able to perform the Z test for p

using the critical-value method, carry out the Z test for p using the p-value method, and use

confidence intervals for p to perform two-tailed hypothesis tests about p. The instructor

might want to mention that the p-value and p represent different quantities.

Section 9.6 Chi-Square Test for the Population Standard Deviation

The objectives of this section are for the students to be able to perform the χ² test for σ using the critical-value method, carry out the χ² test for σ using the p-value method, and to use confidence intervals for σ to perform two-tailed hypothesis tests about σ.

Section 9.7 Probability of Type II Error and the Power of a Hypothesis Test

The objectives of this section are for the students to be able to calculate the probability of a Type II error for a Z test for μ, to compute the power of a Z test for μ, and to construct a power curve.

CHAPTER 10 Two-Sample Inference

In Chapter 10 we examine differences in the characteristics of two populations.

Section 10.1 Inference for Mean Difference—Dependent Samples

The objectives of this section are for the students to be able to distinguish between in-

dependent samples and dependent samples, to perform hypothesis tests for the population

mean difference for dependent samples, to construct and interpret confidence intervals for the

population mean difference for dependent samples, and to use a t interval for μ_d to perform t tests about μ_d.

Dependent samples are frequently used for studies in which a comparison is made be-

tween a group before some treatment happens and after some treatment happens. For exam-

ple, you may want to compare running times of students on the track team at the beginning of

the season and at the end of the season. You could also compare swimming times of students

on the swimming team at the beginning and end of the season. If teachers give pretests and

posttests to measure student learning in a course, these could also be used as an example of

dependent samples.
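
The before-and-after idea can be sketched with a small paired data set. The running times below are hypothetical and not taken from the text; the point is that the analysis works on the within-runner differences, not on the two columns separately.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical 400-meter times (seconds) for six runners; not data from the text.
start_of_season = [62.1, 58.4, 65.0, 60.3, 59.8, 63.5]
end_of_season = [60.9, 57.8, 63.2, 60.5, 58.9, 62.0]

# Dependent samples: each difference pairs a runner with the same runner later on.
differences = [before - after for before, after in zip(start_of_season, end_of_season)]

n = len(differences)
d_bar = mean(differences)          # sample mean of the differences
s_d = stdev(differences)           # sample standard deviation of the differences
t_data = d_bar / (s_d / sqrt(n))   # test statistic for H0: mu_d = 0, with n - 1 df

print(f"mean difference = {d_bar:.3f}, s_d = {s_d:.3f}, t = {t_data:.3f}, df = {n - 1}")
```

Students can then compare t_data with the critical value from the t table for n - 1 degrees of freedom.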

Section 10.2 Inference for Two Independent Means

The objectives of this section are for the students to be able to perform and interpret t tests about μ_1 − μ_2 using Welch’s method, compute and interpret t intervals for μ_1 − μ_2 using Welch’s method, to use confidence intervals for μ_1 − μ_2 to perform two-tailed t tests about μ_1 − μ_2, to perform and interpret t tests and t intervals about μ_1 − μ_2 using the pooled variance method, and to apply Z tests and Z intervals for μ_1 − μ_2 when σ_1 and σ_2 are known.

Examples of independent samples could be to compare the amount of time studying per week between men and women, or between two class years (for example, freshmen and seniors), using data gathered from the students in class.

Section 10.3 Inference for Two Independent Proportions

The objectives of this section are for the students to be able to perform and interpret Z tests for p_1 − p_2, compute and interpret confidence intervals for p_1 − p_2, and use Z intervals for p_1 − p_2 to perform two-tailed Z tests.

Examples could be to compare the difference between the proportion of men and wom-

en in class who are education majors or to compare the proportion of men and women in

class who are engineering majors.

Section 10.4 Inference for Two Independent Standard Deviations

The objectives of this section are for the students to be able to describe the F distribu-

tion and the F test for two population standard deviations, to perform hypothesis tests for two

population standard deviations using the critical-value method, and to perform hypothesis

tests for two population standard deviations using the p-value method.

CHAPTER 11 Categorical Data Analysis

Chapter 11 introduces the multinomial random variable, an extension of the binomial

random variable. Students learn methods for performing hypothesis tests for this type of data

using the χ² distribution.

One possible example is to collect data from the class to determine if there is a differ-

ence in majors between men and women.

Section 11.1 Goodness of Fit Test

The objectives of this section are for the students to be able to explain what a multinomial random variable is and how to calculate expected frequencies, describe how a χ² goodness of fit test works, and perform and interpret the results from the χ² goodness of fit test using the critical-value method, the exact p-value method, and the estimated p-value method.

Section 11.2 Tests for Independence and for Homogeneity of Proportions

The objectives of this section are for the students to be able to explain what a χ² test for the independence of two variables is, perform and interpret a χ² test for independence of two variables using the critical-value method and the p-value method, and perform and interpret the results of a χ² test for the homogeneity of proportions.

CHAPTER 12 Analysis of Variance

Chapters 12 and 13 introduce perhaps the most powerful and widespread statistical procedures in the world. This chapter introduces analysis of variance, a way to compare the population means of several different groups and determine whether significant differences exist between these means.

One possible example is to collect data from the class to see whether different frater-

nities and sororities have different GPAs using analysis of variance. Other similar examples

could be to see whether men and women athletes have different GPAs or whether athletes on

different sports teams have different GPAs.

Section 12.1 One-Way Analysis of Variance (ANOVA)

The objectives of this section are for the students to be able to explain how analysis of

variance works and how to perform one-way analysis of variance.

Section 12.2 Multiple Comparisons

The objectives of this section are for the students to be able to perform multiple com-

parison tests using the Bonferroni method, to use Tukey’s test to perform multiple compari-

sons, and to use confidence intervals to perform multiple comparisons for Tukey’s test.

Section 12.3 Randomized Block Design

The objective of this section is to explain the power of the randomized block design

and to perform a randomized block design ANOVA. You could demonstrate randomized

block design by using test scores and blocking by year in school or whether or not the stu-

dents have had calculus.

Section 12.4 Two-Way ANOVA

The objectives of this section are to construct and interpret an interaction graph and to

perform a two-way ANOVA.

CHAPTER 13 Regression Analysis

In this chapter, we learn about regression analysis. Regression analysis develops an

equation that can describe the relationship between two quantitative variables, often for the

purposes of prediction. We were introduced to some of the methods for investigating the re-

lationship between two quantitative variables in Chapter 4. It will be helpful to review these

now as a preparation for this chapter.

The example of high school GPA (x) and college GPA (y) used in Chapter 4 can

also be used for this chapter. All the sums of squares can be calculated for this example as

well as the confidence interval for the slope of the regression line. The t test for the slope of

the regression line may also be performed on this example.

Section 13.1 Inference About the Slope of the Regression Line

The objectives of this section are for the students to be able to explain the regression

model and the regression model assumptions, to perform the hypothesis test for the slope β_1 of the population regression equation, to construct confidence intervals for the slope β_1, and to use confidence intervals to perform the hypothesis test for the slope β_1.

Section 13.2 Confidence Intervals and Prediction Intervals

The objectives of this section are for the students to be able to construct confidence intervals for the mean value of y for a given value of x and to construct prediction intervals for a randomly chosen value of y for a given value of x.

Section 13.3 Multiple Regression

The objectives of this section are for the students to be able to find the multiple regression equation, interpret the multiple regression coefficients, and use the multiple regression equation to make predictions, to calculate and interpret the adjusted coefficient of determination, to perform the F test for the overall significance of the multiple regression, to conduct t tests for the significance of individual predictor variables, to explain the use and effect of dummy variables in multiple regression, and to apply the strategy for building a multiple regression model.

CHAPTER 14 Nonparametric Statistics

Chapter 14 introduces nonparametric (distribution-free) statistics.

Section 14.1 Introduction to Nonparametric Statistics

The objectives of this section are for the student to be able to explain what a nonparametric hypothesis test is and why we use it, and to describe what is meant by the efficiency of a nonparametric test.

Section 14.2 Sign Test

The objectives of this section are for the students to be able to perform the sign test for a

single population median, to carry out the sign test for matched-pair data from two dependent

samples, and to perform the sign test for binomial data.

Section 14.3 Wilcoxon Signed Ranks Test for Matched-Pair Data

The objectives for this section are for the students to be able to assess whether or not the

data are symmetric, to carry out the Wilcoxon signed ranks test for matched-pair data from two

dependent samples, and to perform the Wilcoxon signed ranks test for a single population

median. A possible example is that you could record a local station’s predicted high tempera-

ture and the actual high temperature for 2 weeks and test whether there is a difference.

Section 14.4 Wilcoxon Rank Sum Test for Two Independent Samples

The objective of this section is to perform the Wilcoxon rank sum test for the difference

in population medians, using two independent samples.

Section 14.5 Kruskal-Wallis Test

The objective of this section is for the student to be able to perform the Kruskal-Wallis

test for equal medians in three or more populations.

Section 14.6 Rank Correlation Test

The objective of this section is for the students to be able to perform the rank correlation

test for paired data. A possible example is that you could test whether there is a correlation

between the students’ individual grades and the number of absences a student has.
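
As one way to set up the grades-and-absences example, the sketch below computes a rank correlation for six hypothetical students. It assumes no tied values and uses the shortcut formula 1 - 6Σd²/(n(n² - 1)), which applies only in that no-ties case; the function names are ours.

```python
def ranks(values):
    """Rank the values from 1 (smallest) to n; assumes there are no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def rank_correlation(x, y):
    """Rank correlation via 1 - 6*sum(d^2) / (n(n^2 - 1)), valid without ties."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n**2 - 1))


# Hypothetical class data: absences and course grades for six students.
absences = [0, 2, 5, 1, 8, 3]
grades = [93, 85, 74, 80, 61, 79]

r_s = rank_correlation(absences, grades)
print(f"rank correlation = {r_s:.3f}")
```

For these made-up values the rank correlation is about -0.943, a strong negative association; real class data will usually be less tidy and may contain ties.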

Section 14.7 Runs Test for Randomness

The objective of this section is for the students to be able to perform the runs test for ran-

domness. A possible example is to flip a coin 20 times and record the result of each flip.

Test for randomness. You could also record the gender of the next 20 or 30 people that enter

the school’s library, bookstore, or cafeteria and test for randomness.
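
Counting runs by hand is error-prone, so a small helper may be useful when checking student work. The flip sequence below is hypothetical, and the sketch stops at the number of runs and its expected value under randomness; it does not reproduce the text's critical-value tables.

```python
def count_runs(sequence):
    """Count the runs: maximal stretches of identical consecutive outcomes."""
    if not sequence:
        return 0
    return 1 + sum(1 for prev, cur in zip(sequence, sequence[1:]) if prev != cur)


# Hypothetical record of 20 coin flips (H = heads, T = tails).
flips = "HHTHTTTHHHTHTTHHTHTH"

n_heads = flips.count("H")
n_tails = flips.count("T")
runs = count_runs(flips)

# Expected number of runs if the sequence is random.
expected_runs = 2 * n_heads * n_tails / (n_heads + n_tails) + 1

print(f"heads = {n_heads}, tails = {n_tails}, runs = {runs}, expected = {expected_runs:.1f}")
```

Here the 20 flips contain 11 heads, 9 tails, and 13 runs, against an expected 10.9 runs; far too few or far too many runs would cast doubt on randomness.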

Chapter 1: The Nature of Statistics

Section 1.1

1. The steepest negative slope in the graph is between the years 1994 and 1995.

2. The murder rate does not always go down year by year. The murder rate increased

between 1992 and 1993, between 2002 and 2003, between 2005 and 2006, between 2007 and

2008, and between 2009 and 2010.

3.

(a) About 36,000,000. Actual answer: 36,132,147.

(b) About 7600. Actual answer: 7594.

4.

(a) About 23,000,000. Actual answer: 22,859,968. (b) About 900. Actual answer: 873.

5.

About 5400. Actual answer: 5428.

6.

About 3500. Actual answer: 3493.

7. The Eiler fire is the largest. The Junction fire is the smallest.

8. The Eiler fire is the most contained. The Happy Camp Complex fire is the least

contained.

Section 1.2

1. Statistics is the art and science of collecting, analyzing, presenting, and interpreting

data.

2. We call the entities from which the data are collected elements.

3. A qualitative variable is a variable that does not assume a numerical value, but is

usually classified into categories. A quantitative variable is a variable that takes on numerical

values.

4. Another term for a qualitative variable is a categorical variable.

5. True.

6. A population is the collection of all elements (persons, items, or data) of interest in a

particular study. A sample is a subset of the population from which the information is

collected.

7. A statistic is a characteristic of a sample.

8. A parameter is a characteristic of a population. The value of a parameter is constant

but usually unknown. A statistic is a characteristic of a sample. The value of a statistic may

vary from sample to sample but is usually known.

9. A census is the collection of data from every element in the population.

10. False. Statistical inference consists of methods for drawing conclusions about

population characteristics based on the information contained in a subset (sample) of that

population.

11. The elements are the teams: Dragonborn, Sprites, Enchanters, Trolls

12. The variables are Captain’s gender, Wins, Rank, and Winning percentage.

13. (a) Captain’s gender can take the values male or female.

(b) The observation for the Sprites is captain’s gender is female, 9 wins, rank is 2, and the

winning percentage is 0.600.

14. Quantitative variables: Wins, Winning percentage


Qualitative variables: Captain’s gender, Rank

15. Since the number of wins is counted, the variable Wins is discrete.

Since the winning percentage can be any number between 0 and 1, inclusive, the variable Winning percentage is continuous.

16. The variable Captain’s gender is qualitative and it takes the values male and female.

There is no natural ordering for these values. Therefore the variable Captain’s gender is

nominal.

The values that the variable Wins takes are 0, 1, 2… These are numerical. There is a natural

0 since a team can have 0 wins. Division can be performed on these values. For example, a team with 10 wins has 10/5 = 2 times as many wins as a team with 5 wins. Therefore, the variable Wins is ratio.

The values that the variable Rank takes are 1, 2, 3… These are numerical, so there is a natu-

ral ordering to them. However, there is no natural 0 and arithmetic does not make sense.

Therefore, the variable Rank is ordinal.

The variable Winning percentage takes any number between 0 and 1, inclusive. There is a

natural 0 since a team with 0 wins has a winning percentage of 0. Division can be performed on the values of winning percentage. For example, a winning percentage of 0.800 is 0.800/0.200 = 4 times a winning percentage of 0.200. Therefore, the variable Winning percentage is ratio.

17. The elements are the players: Miguel Cabrera, Michael Cuddyer, Joe Mauer, Michael

Trout, Chris Johnson

18. The variables are Team, Batting average, Hits, Rank, and Year of birth.

19. (a) The variable Team can take the values Detroit Tigers, Colorado Rockies, Minne-

sota Twins, Los Angeles Angels, and Atlanta Braves.

(b) The observation for Miguel Cabrera is his team is the Detroit Tigers, his batting aver-

age is 0.348, the number of hits is 193, his rank is 1, and his year of birth is 1983.

20. Quantitative: Batting average, Hits, Year of birth

Qualitative: Team, Rank

21. The number of hits can be counted. Therefore, the variable Hits is discrete.

A player’s year of birth can either be 1979 or 1980 or 1981, etc., and nothing in between.

Therefore, the variable Year of birth is discrete.

A player’s batting average can take any value between 0 and 1, inclusive. Therefore, the var-

iable Batting average is continuous.

22. The variable Team is qualitative. There is no natural ordering of the teams.

Therefore, the variable Team is nominal.

The variable Batting average takes values between 0 and 1, inclusive. There is a natural 0

and division is possible. For example, a batting average of 0.300 is 0.300/0.100 = 3 times the

batting average of a player with a 0.100 batting average. Therefore, the variable Batting av-

erage is ratio.

The variable Hits takes values 0, 1, 2, … There is a natural 0 since it is possible for a player to have 0 hits, and division is possible. For example, 20 hits is 20/4 = 5 times as many hits as 4 hits. Therefore, the variable Hits is ratio.


The values that the variable Rank can have are 1, 2, 3… These are numerical, so there is a

natural ordering to them. However, there is no natural 0 and arithmetic does not make sense.

Therefore, the variable Rank is ordinal.

The variable Year of birth is numerical so it can be ordered or ranked. Subtraction makes

sense. For example, 2009–1979 = 30 years. However there is no natural 0 and division does

not make sense. Therefore the variable Year of birth is interval.

23. The elements are the schools: University of Phoenix, DeVry University, ITT Technical Institute, Penn State University, Kaplan University

24. The variables are State, School type, Recipients, and Total loan amount ($ millions).

25. (a) The variable School type can take the values proprietary and public.

(b) The observation for Penn State University is it is in the state of PA, its school type is public, it had 42,011 federal student loan recipients in the 2013–2014 academic year, and it had a total federal student loan amount of $151 million in the 2013–2014 academic year.

26. Quantitative variables: Recipients, Total loan amount ($ millions)

Qualitative variables: State, School type

27. The number of recipients is counted, so the variable Recipients is discrete. The total

amount of federal student loans for the school is counted to the nearest million dollars, so the

variable Total loan amount ($ millions) is discrete.

28. The variable State is qualitative. There is no natural order for the values that State

takes. Therefore, the variable State is nominal.

The variable School type is qualitative. There is no natural order for the values that School

type takes. Therefore, the variable School type is nominal.

The variable Recipients takes values that are numerical. There is a natural 0 and division is

possible. For example, a school with 800 recipients has 800/200 = 4 times as many recipients

of federal student loans as a school with 200 recipients. Therefore the variable Recipients is

ratio.

The variable Total loan amount ($ millions) takes values that are numerical. There is a natu-

ral 0 and division is possible. For example, a school with $300 million in total loans has

$300 million/$100 million = 3 times the total loan amount of federal student loans as a school

with $100 million in total loans. Therefore the variable Total loan amount ($ millions) is ra-

tio.

29. (a) The values for the variable year you were born are numbers that can be ranked or

ordered, so the variable is quantitative. Since there are a finite number of years in which you

could have been born, the variable is discrete.

(b) There is no natural zero. Division (2000/1990) does not make sense. However,

subtraction does make sense. For example, someone born in 1993 is 3 years younger than

someone born in 1990 (1993–1990 = 3). Therefore, the variable year you were born

represents interval data.

30. (a) Since the only possible values of the variable are yes or no, the variable is

qualitative.

(b) Since there is no natural ordering for the values yes and no, the data represent nominal

data.

31. (a) Quantitative. Since the price of tea in China is rounded to the nearest whole unit


of currency, it is discrete.

(b) The price of tea in China represents ratio data. There is a natural zero ($0.00 per pound or

$0.00 per box). Here, division does make sense. That is, tea that costs $10.00 per pound costs

twice as much as tea that costs $5.00 per pound.

32. (a) Quantitative. Since the SAT Math scores are whole numbers, the variable SAT

Math score is discrete.

(b) The SAT Math score of the person sitting next to you represents interval data. Since the

lowest possible score is 200, there is no natural zero. Also, division does not make sense

because a score of 400 does not mean that the person had twice as many correct answers as a

person with a score of 200. Hence, the data are not ratio data. However, subtraction does

make sense, because a score of 400 is 400 – 200 = 200 more points than a score of 200.

33. (a) Quantitative. Since the winning score is a whole number, the winning score in

next year’s Super Bowl is discrete.

(b) The winning score in next year’s Super Bowl represents ratio data. There is a natural zero

because it is possible for a team to score 0 points. Here division makes sense because a score

of 28 points represents twice as many points as a score of 14 points.

34. (a) Qualitative.

(b) The winning team in next year’s Super Bowl represents nominal data because there is no

natural or obvious way that the data may be ordered. Also, no arithmetic can be carried out

on the winning team in next year’s Super Bowl.


35. (a) Qualitative.

(b) The rank of the winning Super Bowl team in their division represents ordinal data

because the ranks may be arranged in a particular order and no arithmetic may be performed

on them.

36. (a) Quantitative. Since the number of friends on a student’s Facebook page is a whole

number, the number of friends on a student’s Facebook page is discrete.

(b) The number of friends on a student’s Facebook page represents ratio data. Since it is

possible for a student to have 0 friends on their Facebook page, there is a natural zero. Here,

division makes sense because a person with 12 friends on their Facebook page has twice as

many friends on their Facebook page as a person with 6 friends on their Facebook page.

37. (a) Since the possible values of the variable are not numeric but names such as

“Bones” or “The Big Bang Theory,” the variable is qualitative.

(b) Since there is no natural ordering of the names of television shows the variable represents

nominal data.

38. (a) Quantitative. Since the number of contacts you have on your cell phone is a

whole number, the number of contacts you have on your cell phone is discrete.

(b) How many contacts you have on your cell phone represents ratio data. Since it is possible

for someone to have 0 contacts on their cell phone, there is a natural zero. Here division

makes sense because a person with 20 contacts on his or her cell phone has twice as many

contacts as a person with 10 contacts on that person’s cell phone.

39. (a) Since the possible values of the variable are not numeric, but rather ice cream

flavors such as “rocky road” or “strawberry,” the variable is qualitative.

(b) Since there is no natural ordering of ice cream flavors, the data represent nominal data.


40. (a) Quantitative. Since your credit card balance is given in dollars and cents, your

credit card balance is discrete.

(b) Your credit card balance represents ratio data. Since a person may have a credit card

balance of $0.00, there is a natural zero. Since a credit card balance of $2000 is twice as

much as a credit card balance of $1000, division makes sense here.

41. (a) Quantitative. Since how old your car is can be any real number greater than or

equal to 0, how old your car is continuous.

(b) How old your car is represents ratio data. Since a car that was just purchased is 0 years

old, there is a natural zero. A car that is 6 years old is twice as old as a car that is 3 years old,

so division makes sense in this case.


42. (a) Qualitative.

(b) The model of your car represents nominal data because there is no natural or obvious way

that the data may be ordered. Also, no arithmetic can be carried out on the models of cars.

43. The 4 teams listed in Table 6 are all of the teams in the intramural league. Therefore,

the 4 teams listed in Table 6 represent a population.

44. There are more than 5 Major League baseball players. Therefore, the 5 Major League

baseball players listed in Table 7 represent a sample.

45. There are more than 5 universities in the United States. Therefore, the 5 universities

in Table 8 represent a sample.

46. Since the data in Table 6 represent a population, the team with the most wins in the

league is a parameter.

47. Since the data in Table 7 represent a sample, the oldest player is a statistic.

48. Since the data in Table 8 represent a sample, the result 4 out of 5 (80%) of the

universities are proprietary is a statistic.

49. Since the data in Table 6 represent a population, descriptive statistics is indicated.

50. The data in Table 7 represent a sample. The average number of hits of these 5 players is a statistic. Since this statistic is used to infer that the average number of hits of all players in the league is the same as the average number of hits of these 5 players, statistical inference is indicated.

51. The data in Table 8 represent a sample. The result 4 out of 5 (80%) of the universities

are proprietary is a statistic. Since this statistic is used to infer that 80% of all universities are

proprietary, statistical inference is indicated.

52. The population is all home sales in Tarrant County, Texas. The sample is the 100

home sales selected.

53. The population is all veterans returning from war. The sample is the 20 veterans

selected.

54. Population: all 4-H clubs in Maricopa County, Arizona. Sample: 10 selected 4-H

clubs.

55. The population is all older women. The sample is the 10 patients of the physical

therapist that she selected.

56. The population is all students at Portland Community College. The sample is the 50

Portland Community College students that were selected.


57. The population is all companies that recently underwent a merger. The sample is the

50 companies that recently underwent a merger that were selected.

58. Descriptive statistics. The average price of homes sold in Jacksonville, Florida is a

descriptive statistic because it describes a sample. But no inference is made regarding a larger

population.

59. Statistical inference. A sample of automobile passengers was taken, and the sample

proportion of automobile passengers who wear seat belts was calculated. Then this sample

proportion was used to make an inference about what percentage of automobile passengers

wear seat belts.

60. Statistical inference. A sample was taken, and the sample average percentage of

people in which the cholesterol level was lowered by daily exercise was calculated. Then this

percentage was used to make an inference about how much daily exercise can lower

everyone’s cholesterol level.

61. Descriptive statistics. The proportion of traffic fatalities in New York that involved

alcohol is a descriptive statistic because it describes a sample. But no inference is made

regarding a larger population.

62. Descriptive statistics. The goals-against average for the Charlestown Chiefs hockey

team is a descriptive statistic because it describes the sample. But no inference was made

regarding a larger population.

63. Statistical inference. A sample of 15- to 18-year-olds was taken, and the sample

percentage of 15- to 18-year-olds who use illicit drugs was calculated. Then this percentage

was used to make an inference about the percentage of all 15- to 18-year-olds who use illicit

drugs.

64. Descriptive statistics. The average on the first statistics test in Ms. Reynolds’s class is

a descriptive statistic because it describes a sample. But no inference was made regarding a

larger population.

65. (a) Elements: Endangered species—pygmy rabbit, Florida panther, Red wolf, and

West Indian manatee; variables—year listed as endangered, estimated number remaining,

and range.

(b) Qualitative variables: Since the values of the variable range are not numerical, range is a

qualitative variable. Quantitative variables: Since the values of the variables year listed as

endangered and estimated number remaining are numerical and can be ranked or ordered, the

variables year listed as endangered and estimated number remaining are quantitative

variables.

(c) Since there are a finite number of values for the variable year listed as endangered, the

variable is discrete. Since the values for the variable estimated number remaining can be

counted, the variable is discrete.

(d) There is no natural zero for the variable year listed as endangered. Division of the values

of the variable year listed as endangered is not possible. However, subtraction is possible. For

example, a species listed as endangered in 1995 was listed as endangered 7 years before a

species listed as endangered in 2002 (2002 – 1995 = 7). Therefore, the variable year listed as

endangered represents interval data. The variable estimated number remaining has a natural

zero. Division is possible on the values for the variable estimated number remaining. For

example, a species with 15 remaining has 2.5 times as many members left as a species with 6

remaining (15/6 = 2.5). Therefore, the variable estimated number remaining represents ratio

data. There is no natural order for the values of the variable range. Therefore, the variable


range represents nominal data.

(e) 1973, 50, Florida.

66. (a) Elements: Companies—City of Santa Monica, St. John’s Health Center, The

Macerich Company, Fremont General Corp., and Entravision Corp.; variables—employees

and industry.

(b) Qualitative variables: Since the values for the variable industry are not numerical, the

variable industry is qualitative. Quantitative variables: Since the values for the variable

employees are numerical and can be ordered or ranked, the variable employees is quantitative.

(c) Since the values of the variable employees can be counted, the variable is discrete.

(d) The variable employees has a natural zero. Division is possible on the values of the

variable employees. For example, a company with 6 employees has 2/3 as many employees

as a company with 9 employees (6/9 = 2/3). Therefore, the variable employees represents

ratio data. There is no natural ordering for the values of the variable industry. Therefore, the

variable industry represents nominal data.

(e) 1892, government.

67. (a) Elements: States—Texas, Missouri, Minnesota, Ohio, and South Dakota;

variables—proportion of GE corn and most prevalent type.

(b) Qualitative variables: Since the values of the variable most prevalent type are not

numerical, the variable most prevalent type is qualitative. Quantitative variables: Since the

values of the variable proportion of GE corn are numerical and can be ranked or ordered, the

variable proportion of GE corn is quantitative.

(c) Since the variable proportion of GE corn can take on any value between 0 and 1 inclusive,

the variable proportion of GE corn is continuous.

(d) There is a natural zero for the variable proportion of GE corn. Division is also possible

for the values of the variable proportion of GE corn. For example, a state with 16% GE corn

has 2 times the proportion of GE corn as a state with 8% GE corn (16/8 = 2). Therefore, the

variable proportion of GE corn represents ratio data. Since there is no natural order for the

values of the variable most prevalent type, the variable most prevalent type represents

nominal data.

(e) 89%, herbicide-tolerant.

68. (a) Elements: Hospital names—Hardy Wilson, Humphreys County, Jefferson County,

Lackey Memorial, Leake Memorial, Madison County, Monfort Jones, and Rankin Medical

Center; variables—beds, city, and ZIP.

(b) Qualitative variables: Since the values of the variable city are not numerical, city is a

qualitative variable. The values of the variable zip are numerical, but the numbers represent

areas of the country. Therefore, the variable zip is qualitative. Quantitative variables: Since

the values of the variable beds are numbers that can be ranked or ordered, the variable beds is

quantitative.

(c) Since the values of the variable beds are values that were counted, the variable beds is

discrete.

(d) The values of the variable beds have a natural zero. Division is possible on the values of

the variable beds. For example, a hospital with 15 beds has half as many beds as a hospital

with 30 beds (15/30 = 1/2). Therefore, the variable beds represents ratio data. There is no

natural order for the values of the variables city and zip. Therefore, the variables city and zip


represent nominal data.

(e) 134, Brandon, 39042.

69. (a) Elements: Hospitals—Briarcliff Manor, Buchanan, Cortlandt, Croton-on-Hudson,

Mount Pleasant, Ossining 1, Ossining 2, Peekskill, Pleasantville, and Sleepy Hollow;

variables—births and average maternal age.

(b) Qualitative variables: There are no qualitative variables. Quantitative variables: The

values of the variables births and average maternal age are numbers that can be ranked or

ordered, so both of these variables are quantitative.

(c) The values of the variable births represent values that were counted, so the variable births

is discrete. The values of the variable average maternal age are calculated from data that

were measured and can be any real number between the youngest mother and the oldest

mother, so the variable average maternal age is continuous.

(d) There is a natural zero for both the births and average maternal age variables. Division is

possible on the values of both the births and average maternal age variables. For example, a

hospital with 80 births had 3.2 times as many births as a hospital that had 25 births (80/25 =

3.2), and a hospital with an average maternal age of 33 has an average maternal age of 1.1

times the average maternal age of a hospital with an average maternal age of 30 (33/30 =

1.1). Therefore, the variables births and average maternal age represent ratio data.

(e) 134, 29.2.

70. (a) Elements: Commodities—oil, gold, and wheat; variables—price per share and

percent change.

(b) Qualitative variables: There are no qualitative variables. Quantitative variables: The

values of the variables price per share and percent change are numbers that can be ranked or

ordered so both variables are quantitative.

(c) The values of the variable price per share can be any nonnegative real number and the

values of the variable percent change can be any real number. Therefore, both of these

variables are continuous.

(d) Both of the variables price per share and percent change have a natural zero. Division is

possible for the values of both variables. For example, a price per share of $120 is 3 times a

price per share of $40 (120/40 = 3), and a percent change of +1.05 is 0.875 of a percent

change of +1.20 (1.05/1.20 = 0.875). Therefore the variables price per share and percent

change represent ratio data.

(e) $1243.62, −0.110%.

71. (a) Elements are the tornado names: Tri-State, Natchez, St. Louis, Tupelo, Gainesville.

Variables: deaths, year.

(b) Qualitative variables: There are no qualitative variables. Quantitative variables: Since the

values of the variables deaths and year are numbers that can be ranked or ordered, the

variables deaths and year are quantitative.

(c) Since the values of the variable deaths are values that are counted, the variable deaths is

discrete. Since the values of the variable year are whole numbers with no numbers in

between, the variable year is discrete.

(d) Since it is possible to have 0 tornado deaths in a year, there is a natural 0 for the variable

deaths. Division is possible on the values of the variable deaths. For example, a year with 35

tornado deaths has 7 times as many tornado deaths as a year with 5 tornado deaths (35/5 = 7).


Therefore the variable deaths represents ratio data. There is an order for the values of the

variable year. Subtraction can be performed on the values of the variable year. However,

there is no natural 0 for the variable year and division cannot be performed on the values of

the variable year. Therefore, the variable year represents interval data.

(e) 255, 1896

72. (a) Sample.

(b) No; these companies are relatively large and there are probably many more small

companies than large companies.

73. (a) Sample.

(b) This sample could not be considered a random sample of the annual number of tornado

deaths of all years. The 5 years selected were not selected randomly, but were selected

according to which 5 years had the most tornado deaths.

74. They tested a sample of their own light bulbs, found the average lifetime of the

sample, compared it to the average lifetimes of other current models of light bulbs, and found

the average lifetime of their sample to be longer than the reported average lifetimes of other

current models of light bulbs.

75. (a) This is a statistic because it came from a sample.

(b) An estimate of the average lifetime of all new light bulbs is the average lifetime of the

sample of 100 light bulbs, which is 2000 hours. Thus the company can claim that “The

average lifetime of this new model of light bulb is 2000 hours.”

76. (a) The elements are the institutions: Ashford University, Arizona State University,

Liberty University, Miami Dade College, Lone Star College System

(b) State, enrollment, and rank.

(c) The values of the variable state are not numbers, so the variable state is qualitative. The

variable rank is also qualitative, even though its values are numerical. The values of the

variable rank are not counting anything or measuring anything.

(d) The values of the variable enrollment are numbers that can be ranked or ordered, so the

variable enrollment is quantitative.

(e) There is no natural order for the values of the variable state, so the values of the variable

state represent nominal data. Since it is possible to have an enrollment of 0, there is a natural

zero for the variable enrollment. Division is possible on the values of the variable enrollment.

For example, a university with an enrollment of 20,000 students has 20,000/30,000 = 2/3 of

the enrollment of a university with an enrollment of 30,000. Therefore the variable

enrollment represents ratio data. The values of the variable rank can be ordered, but no

arithmetic can be performed on them. Therefore, the variable rank represents ordinal data.

77. (a) Sample.

(b) No, they are the five university campuses with the largest enrollment.

(c) Arizona State University is located in Arizona, and in 2014 it had 72,254 students, making

it the university campus with the second-highest enrollment.

78. The qualitative variables are platform, studio, and type.

79. The quantitative variables are sales for week, sales total, and weeks on list.


80. The number of weeks that a video game is on the top 30 list is counted. Therefore,

the variable weeks on list is discrete.

81. The list in Table 3 represents a sample. Only the 30 best-selling video games are

included.

82. Since the list in Table 3 represents a sample, the number for highest sales for the week

represents a statistic.

83. Since the values for the variables platform, studio, and type are qualitative and there is

no natural ordering for the values of these variables, they are nominal.

84. No.

85. The variables sales for week, sales total, and weeks on list have values that can be

divided and have natural zeros. Therefore, the variables sales for week, sales total, and weeks

on list are ratio.

86. No. The variables platform, studio, and type are qualitative and there is no natural

ordering for their values. Therefore, they are nominal. The variables sales for week, sales

total, and weeks on list take values that can be divided and have natural zeros. Therefore,

they are ratio.

87. Descriptive statistics. No attempt was made to use the fact that the Xbox 360 version

of Grand Theft Auto V outsold the PS3 version of the game during the week of May 17, 2014

to predict that the Xbox 360 version of Grand Theft Auto V will outsell the PS3 version of the

game during any week after the week of May 17, 2014.

88. Statistical inference. We are using the fact that the Xbox 360 version of Grand Theft

Auto V outsold the PS3 version of the game during the week of May 17, 2014 to predict that

the Xbox 360 version of Grand Theft Auto V will outsell the PS3 version of the game during

the next week after the week of May 17, 2014.

Section 1.3

1. Convenience sampling usually includes only a select group of people. For example,

surveying people at a mall on a workday during working hours would probably include few if

any people who work full time.

2. The Literary Digest poll exhibited selection bias. The Literary Digest used lists of

people who owned cars and had telephones, which resulted in the exclusion of millions of

poor and underprivileged people who largely supported Roosevelt. The sample was therefore

highly biased toward the richer people who were more likely to support Alf Landon. Thus,

the results of their poll incorrectly indicated that Alf Landon would win.

3. The Literary Digest could have decreased the bias in their poll by choosing a random

sample of houses and apartments and surveying the people door to door. They would have

been more likely to include people who were poor or underprivileged by using this method

and thus their sample would have been more representative of the population.

4. No, the Literary Digest poll was not a random sample. Only people who had a phone

or a car were sampled. People who did not have a phone or a car had no chance of being

sampled.

5. A random sample is a sample for which every element has an equal chance of being

included.
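
The definition in Exercise 5 can be illustrated with a minimal Python sketch. The population of 20 ID numbers and the helper name simple_random_sample are assumptions made for illustration; they do not come from the text.

```python
import random

# Hypothetical population: ID numbers for 20 elements (not from the text).
population = list(range(1, 21))

def simple_random_sample(elements, n, seed=None):
    """Draw n distinct elements so that each element is equally likely to be chosen."""
    rng = random.Random(seed)      # a seed is used only to make the example repeatable
    return rng.sample(elements, n)  # sampling without replacement

sample = simple_random_sample(population, 5, seed=1)
print(sample)
```

Changing or omitting the seed gives a different sample, but every element still has the same chance of being included.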

6. In an observational study, the researcher observes whether the subjects’ differences in


the predictor variable are associated with differences in the response variable. No attempt is

made to create differences in the predictor variable. In an experimental study, researchers

investigate how varying the predictor variable affects the response variable. Subjects are

randomly placed into treatment and control groups.
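
The random placement of subjects into treatment and control groups described in Exercise 6 might be sketched as follows. The subject labels and the function name randomly_assign are hypothetical, and splitting into two groups of equal size is one common choice, not something the text prescribes.

```python
import random

def randomly_assign(subjects, seed=None):
    """Shuffle the subjects, then split them into two groups of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(subjects)   # copy so the caller's list is not modified
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical subjects, labeled only for illustration.
subjects = [f"subject_{i}" for i in range(1, 11)]
treatment, control = randomly_assign(subjects, seed=7)
print("Treatment:", treatment)
print("Control:  ", control)
```

Because chance alone decides the group, the researcher does not influence which subjects receive the treatment.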

7. Answers will vary.




11. Illinois, Iowa, Michigan State, Nebraska, Ohio State, Purdue

12. Arkansas, Georgia, Mississippi, South Carolina, Vanderbilt

13. Alabama, Georgia, Mississippi State, Texas A&M

14. California, Oregon State, USC, Washington State







21. No. The sample would likely not be a representative sample of the Southeastern

Conference or of all college football teams. This sample will likely not contain at least one of

the best teams, at least one of the worst teams, and at least one team in the middle of either

the Southeastern Conference or college football.

22. No. The 2 colleges in the Pacific 12 Conference in the state of Washington would

likely not be a representative sample of the Pacific 12 Conference or of all college football

teams. Since there are only 2 teams in the sample, it will not contain at least one of the best

teams, at least one of the worst teams, and at least one of the teams in the middle of either the

Pacific 12 Conference or college football. Also, it is hard to get a representative sample with

only 2 teams.

23. This is cluster sampling because (a) the population was divided into clusters (class

ranks), (b) a random sample of the clusters (class ranks) was taken, and (c) all of the students

in that class rank (cluster) were selected.

24. Systematic sampling is represented.
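
For comparison, a systematic sample takes every kth element after a random starting point. The sketch below assumes a hypothetical roster of 30 numbered elements and k = 5; neither value is taken from the exercise.

```python
import random

def systematic_sample(elements, k, seed=None):
    """Pick a random start among the first k elements, then take every kth element."""
    rng = random.Random(seed)
    start = rng.randrange(k)   # random index from 0 to k - 1
    return elements[start::k]

# Hypothetical roster of 30 numbered elements.
roster = list(range(1, 31))
every_fifth = systematic_sample(roster, 5, seed=3)
print(every_fifth)
```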

25. This is convenience sampling, since you are choosing a sample that is convenient to

you.

26. This is cluster sampling because (a) the population was divided into clusters (lab

sections), (b) a random sample of the clusters (lab sections) was taken, and (c) all of the

students in those lab sections (clusters) were selected.
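
The three steps (a) through (c) named in Exercises 23 and 26 can be sketched in Python. The lab sections, student names, and the function name cluster_sample are invented for illustration.

```python
import random

# (a) Hypothetical population already divided into clusters (lab sections).
lab_sections = {
    "Lab A": ["Ana", "Ben", "Cho"],
    "Lab B": ["Dev", "Eli"],
    "Lab C": ["Fay", "Gus", "Hal"],
    "Lab D": ["Ida", "Jon"],
}

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters and keep every member of each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)             # (b) random sample of clusters
    members = [m for name in chosen for m in clusters[name]]      # (c) all members of those clusters
    return chosen, members

chosen_sections, sampled_students = cluster_sample(lab_sections, 2, seed=4)
print(chosen_sections)
print(sampled_students)
```

The randomness applies to the clusters, not to the individual students: once a section is chosen, everyone in it is in the sample.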

27. Target population: all college students; potential population: all students working out

at the gymnasium on the Monday night Brandon was there.

28. Yes; students working out at the gymnasium are more likely to be physically fit than

the rest of the students.


29. Target population: all small businesses; potential population: small businesses near

the state university.

30. Yes; businesses near the state university are more likely to employ college students

than businesses farther away.

31. What is meant by “sometimes”? This is vague terminology.

32. This is a leading question, which is clearly trying to influence the respondent’s

answer.

33. This question would only be understood by someone who knows about graduated

income taxes, and is neither simple nor clear.

34. This is asking two questions in one. It is possible that respondents support one, the

other, or both of these issues.

35. (a) Observational.

(b) Response variable: how often they attend religious services; predictor variable: whether

or not the family is large (at least four children).

36. (a) Observational.

(b) Response variable: stock price; predictor variable: whether or not the company gives

large bonuses to its CEOs.

37. (a) Experimental.

(b) Response variable: performance of the electronics equipment; predictor variable: whether

or not a piece of equipment has a new computer processor.

38. (a) Experimental.

(b) Response variable: whether or not the person’s blood pressure is lowered; predictor

variable: whether or not the person is taking the new drug.

39. Level of insect damage to crops.

40. Whether or not the new pesticide was used.

41. The new pesticide.

42. The traditional pesticide.

43. LDL cholesterol level in the bloodstream.

44. Whether a person is given new medication or a placebo.

45. New medication.

46. Placebo.

47. (a) Randomization is present for the 100 randomly assigned subjects but not for the

subjects with high LDL cholesterol levels.

(b) The sample of 100 people is probably enough replication.

48. (a) Randomization is present.

(b) Two subjects each is probably insufficient replication to uncover any strong statistical

results.

49. Experiment


50. Observational study

51. (a) Answers will vary

(b) No. Every possible sample of 5 video games has the same chance of being selected.

(c) No. Every possible sample of 5 video games has the same chance of being selected.

Some of the samples will contain the video game and some won’t.

(d) Answers will vary, answers will vary

52. Minecraft for PS3, Titanfall for Xbox One, Titanfall for Xbox 360, Super Luigi U for

Wii U, Battlefield 4 for Xbox 360, Battlefield 4 for PS3, Yoshi’s New Island for 3DS, Mario

Kart 7 for 3DS

53. Answers will vary

54. Answers will vary

55. The poll by Ann Landers was extremely biased. Only people who read Ann Landers’s

column and felt strongly about the poll responded to this poll. Further, there was no

mechanism to guard against people responding more than once or to keep people who don’t

have children from responding. The Newsday poll was done professionally; therefore, the

sample used was more likely to be representative of the population.

56. The target population is all high schools in New England, and the potential population

is all high schools in greater Boston. The potential for selection bias is that the sample is not a

random sample of all high schools in New England. The drop-out rate for all of New England

high school students may be different from the drop-out rate for those 15 high schools in

greater Boston.

57. The target population is all people living in Chicago, and the potential population is

people who have phones and who have their phone number listed in the Chicago phone

directory. The potential for selection bias is that many of the people living below the poverty

level in Chicago may not have phones. Also, many people may have unlisted numbers.

Further, the poverty level is determined by family size as well as income, and this survey

does not take that into consideration.

58. The question may be interpreted in more than one way. Some people might think that

the question is asking for a choice to be made between rap and hip-hop music. Others may

think the answer to the question is either yes or no. This is because it is actually two

questions in one.

59. The survey question is a leading question.

60. No, the researcher would not be justified in reporting “Two-thirds of women support

abortion.” The women responded to the question “Do you support the right of a woman to

terminate a pregnancy when her life is in danger?” and not the question “Do you support

abortion?” The women may have answered each question differently.

61. (a) No, we do not know what the lowest price in the sample will be before we select

the sample. Since the sample is randomly selected, we don’t know which stocks will be

selected before we select our sample. Different samples may contain different lowest stock

prices.

(b) Answers will vary.

(c) No, if we take another sample of size 2, it is not likely to comprise the same two

companies. Since the samples are randomly selected, they will probably contain different

companies.


(d) Answers will vary.

62. (a) No, we don’t know what the lowest price in the sample will be before we select

the sample. No, we don’t know whether our sample will be the same as in the previous

exercise. Since the sample is selected randomly, we don’t know which 2 companies will be

selected before we select the sample. Different samples may contain different lowest stock

prices.

(b) Answers will vary.

(c) Answers will vary.

63. A quantity like “the lowest price in a random sample of stocks” is a variable that may

vary from sample to sample.
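
The point of Exercises 61 through 63 can be seen in a small simulation. The ticker symbols and prices below are made up for illustration and are not the stocks from the exercise.

```python
import random

# Hypothetical stock prices in dollars (not the data from the exercise).
prices = {"AAA": 12.50, "BBB": 48.10, "CCC": 7.25,
          "DDD": 103.40, "EEE": 31.00, "FFF": 19.75}

def lowest_price_in_sample(price_table, n, rng):
    """Randomly select n companies and return the lowest price among them."""
    companies = rng.sample(sorted(price_table), n)
    return min(price_table[c] for c in companies)

rng = random.Random(2016)
# Repeat the sampling 200 times and record the lowest price each time.
minima = [lowest_price_in_sample(prices, 2, rng) for _ in range(200)]
distinct_minima = sorted(set(minima))
print(distinct_minima)
```

The list of distinct values shows that the lowest price is not fixed in advance; it depends on which companies happen to be selected.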

64. The response variable is the risk for a second heart attack and the predictor variable is

whether the patient followed a Mediterranean diet or a Western diet.

65. (a) Forcing the parents of a treatment group to smoke tobacco would increase the

occurrence of respiratory illnesses in their children, which is not very ethical.

(b) Observational study.

66. It is unethical.

67. (a) The control is the placebo bracelet.

(b) The subjects were randomly assigned to wear either the placebo bracelet or the ionized

bracelet.

(c) There is replication of data since there are 305 subjects in both the treatment and the

control group.

68. (a) The predictor variable is whether the subject had the placebo bracelet or the

ionized bracelet.

(b) The treatment is wearing the ionized bracelet.

(c) The response variable is the measure of pain.

69. This study is an experimental study because the subjects were randomly assigned to

either a treatment or a control.

Chapter 1 Review Exercises

1. (a) Make/Models: Chevrolet Corvette, Ferrari 458 Italia, Honda CR-Z, Jaguar F

Convertible, Porsche Boxster S

(b) Cylinders, transmission, combined mileage

2. (a) Transmission

(b) Cylinders, combined mileage

(c) The variable cylinders takes values that are numerical. There is a natural 0 and division

can be performed on these values. Therefore, the variable cylinders is ratio.

The variable transmission is qualitative. There is no natural ordering of the values of the

variable transmission. Therefore, the variable transmission is nominal.

The variable combined mileage takes values that are numerical. There is a natural 0 and

division can be performed on these values. Therefore, the variable combined mileage is ratio.


3. The observation for the Chevrolet Corvette is that it has 8 cylinders, it has a manual

transmission, and its combined city/highway gas mileage is 21 mpg.

4. (a) Elements are the states: California, Texas, New York, Florida, Illinois

Variables: Population (1960, in 1000s), Population (2013, in 1000s), increase

(b) Quantitative

(c) The observation for Florida is that its population in 1960 was 4,952 thousand and its population

in 2013 was 19,553 thousand, which is an increase of 14,601 thousand from 1960.

(d) Largest: California, Texas, Florida. Smallest: New York, Illinois

5. (a) The only way to find out the population average lifetime of all one million light

bulbs in the inventory is to turn on all one million light bulbs and leave them all on until they

burn out, measuring the time it takes for each light bulb to burn out. All of these lifetimes can

then be used to calculate the population average lifetime of all one million light bulbs.

(b) This would require burning out all one million light bulbs that are in stock so that there

would be no good light bulbs left to sell. It would be better to take a random sample of the

light bulbs, find the average lifetime of the sample, and use the sample average lifetime of the

light bulbs to estimate the population average lifetime of the light bulbs.

6. (a) The population was all registered voters in the United States.

(b) All people on the lists of people who owned cars and had telephones.

(c) The sample was the people on the lists of people who owned cars and had telephones who

returned the ballots.

(d) The sample was not similar to the population in all characteristics. The sample had fewer

poor and underprivileged people than the population. It also had a smaller proportion of

Roosevelt supporters and a larger proportion of Alf Landon supporters than the population.

7. (a) You would use an observational study.

(b) Since people are already enrolled in their statistics classes, it would be impractical to

randomly reassign people to a statistics class after classes have started.

8. (a) The experimental factor violated is replication.

(b) The larger the sample size, the more precise the inference it produces. Surveying

only 4 dentists is not likely to get a sample representative of the population of all dentists.
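
The effect of sample size in part (b) can be illustrated with a simulation. The sketch assumes a hypothetical population of 10,000 dentists in which 70% answer "yes"; both numbers are invented for illustration.

```python
import random
import statistics

# Hypothetical population: 10,000 dentists, 70% of whom answer "yes" (coded 1).
population = [1] * 7000 + [0] * 3000

def spread_of_sample_proportions(values, n, repetitions, rng):
    """Standard deviation of the sample proportion over repeated random samples of size n."""
    proportions = [sum(rng.sample(values, n)) / n for _ in range(repetitions)]
    return statistics.stdev(proportions)

rng = random.Random(8)
spread_n4 = spread_of_sample_proportions(population, 4, 500, rng)
spread_n100 = spread_of_sample_proportions(population, 100, 500, rng)
print(f"n = 4:   {spread_n4:.3f}")
print(f"n = 100: {spread_n100:.3f}")
```

Sample proportions based on 4 dentists vary much more from sample to sample than those based on 100, which is why so small a sample gives an imprecise estimate.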

9. We would use an observational study. It would be impossible to randomly assign a

child to come from a single-parent family or a two-parent family.

Chapter 1 Quiz

1. False. Statistical inference consists of methods for estimating and drawing

conclusions about population characteristics based on the information contained in a sample.

2. False. A parameter is a characteristic of a population.

3. collecting

4. observation

5. sample

6. A sample survey is an example of an observational study.


7. An experimental study is involved.

8. The predictor variable is whether an elderly patient with Alzheimer’s is given the new

drug or the placebo. The response variable is whether the patient’s Alzheimer’s symptoms are

reduced.

9. (a) The population is all statistics students.

(b) The sample is the random sample of students selected from the statistics class.

(c) The variable is whether the student is left-handed. It is a categorical variable.

(d) The sample proportion is not likely to be exactly the same as the population proportion.

But it is not likely to be very far away from the population proportion because which

statistics class a person enrolls in is not based on whether the person is left-handed.

10. Different people have different interpretations of the words often, occasionally,

sometimes, and seldom.
