Ch. 4 Describing the Relation between Two Variables
4.1 Scatter Diagrams and Correlation
1 Draw and interpret scatter diagrams.
SHORT ANSWER. Write the word or phrase that best completes each statement or answers the question.
Construct a scatter diagram for the data.
1) The data below are the final exam scores of 10 randomly selected history students and the number of hours
they studied for the exam.
Hours, x     3   5   2   8   2   4   4   5   6   3
Scores, y   65  80  60  88  66  78  85  90  90  71
[Blank coordinate grid with axes x and y]
2) The data below are the temperatures on randomly chosen days during a summer class and the number of
absences on those days.
Temperature, x          72  85  91  90  88  98  75  100  80
Number of absences, y    3   7  10  10   8  15   4   15   5
[Blank coordinate grid with axes x and y]
3) The data below are the ages and systolic blood pressures (measured in millimeters of mercury) of 9 randomly
selected adults.
Age, x        38   41   45   48   51   53   57   61   65
Pressure, y  116  120  123  131  142  145  148  150  152
[Blank coordinate grid with axes x and y]
4) The data below are the number of absences and the final grades of 9 randomly selected students from a
literature class.
Number of absences, x    0   3   6   4   9   2  15   8   5
Final grade, y          98  86  80  82  71  92  55  76  82
[Blank coordinate grid with axes x and y]
5) A manager wishes to determine the relationship between the number of miles (in hundreds of miles) the
manager's sales representatives travel per month and the amount of sales (in thousands of dollars) per month.
Miles traveled, x    2   3  10   7   8  15   3   1   11
Sales, y            31  33  78  62  65  61  48  55  120
[Blank coordinate grid with axes x and y]
6) In order for employees of a company to work in a foreign office, they must take a test in the language of the
country where they plan to work. The data below show the relationship between the number of years that
employees have studied a particular language and the grades they received on the proficiency exam.
Number of years, x    3   4   4   5   3   6   2   7   3
Grades on test, y    61  68  75  82  73  90  58  93  72
[Blank coordinate grid with axes x and y]
7) In an area of the Great Plains, records were kept on the relationship between the rainfall (in inches) and the
yield of wheat (bushels per acre).
Rainfall (in inches), x       10.5   8.8  13.4  12.5  18.8  10.3   7.0  15.6  16.0
Yield (bushels per acre), y   50.5  46.2  58.8  59.0  82.4  49.2  31.9  76.0  78.8
[Blank coordinate grid with axes x and y]
8) Five brands of cigarettes were tested for the amounts of tar and nicotine they contained. All measurements are
in milligrams per cigarette.
Cigarette Tar Nicotine
Brand A 16 1.2
Brand B 13 1.1
Brand C 16 1.3
Brand D 18 1.4
Brand E 6 0.6
[Blank coordinate grid with axes x and y]
9) The scores of nine members of a local community college women's golf team in two rounds of tournament play
are listed below.
Player 1 2 3 4 5 6 7 8 9
Round 1 85 90 87 78 92 85 79 93 86
Round 2 90 87 85 84 86 78 77 91 82
[Blank coordinate grid with axes x and y]
MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.
Make a scatter diagram for the data. Use the scatter diagram to describe how, if at all, the variables are related.
10) x 3 8 6 5 9 4
y 4 7 5 5 6 3
[Scatter diagrams for answer choices A through D: figures not reproduced; x and y axes are marked from 2 to 20.]
A) The variables appear to be positively, linearly related.
B) The variables do not appear to be linearly related.
C) The variables appear to be negatively, linearly related.
D) The variables do not appear to be linearly related.
11) x 6 4 -1 3 2 8 1
y 17 19 15 18 21 17 24
[Scatter diagrams for answer choices A through D: figures not reproduced; the x-axis is marked from -12 to 20 and the y-axis from -4 to 28.]
A) The variables do not appear to be linearly related.
B) The variables appear to be negatively, linearly related.
C) The variables do not appear to be linearly related.
D) The variables appear to be positively, linearly related.
12) Subject A B C D E F G
x Time watching TV 9 5 3 8 8 6 7
y Time on Internet 14 12 8 17 18 9 18
[Scatter diagrams for answer choices A through D: figures not reproduced; x and y axes are marked from 2 to 20.]
A) The variables appear to be positively, linearly related.
B) The variables do not appear to be linearly related.
C) The variables appear to be negatively, linearly related.
D) The variables do not appear to be linearly related.
SHORT ANSWER. Write the word or phrase that best completes each statement or answers the question.
Provide an appropriate response.
13) An agricultural business wants to determine if the rainfall in inches can be used to predict the yield per acre on
a wheat farm. Identify the predictor variable and the response variable.
14) A college counselor wants to determine if the number of hours spent studying for a test can be used to predict
the grades on a test. Identify the predictor variable and the response variable.
MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.
15) The ______ variable is the variable whose value can be explained by the ______ variable.
A) response; predictor B) response; lurking
C) lurking; response D) predictor; response
2 Describe the properties of the linear correlation coefficient.
MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.
Use the scatter diagrams shown, labeled a through f to solve the problem.
16) In which scatter diagram is r = 0.01?
[Scatter diagrams a through f: figures not reproduced; each plots y (marked 2 to 12) against x (marked 1 to 6 or 7).]
A) e B) c C) f D) d
17) In which scatter diagram is r = 1?
[Scatter diagrams a through f: figures not reproduced; each plots y (marked 2 to 12) against x (marked 1 to 6 or 7).]
A) b B) a C) f D) d
18) In which scatter diagram is r = -1?
[Scatter diagrams a through f: figures not reproduced; each plots y (marked 2 to 12) against x (marked 1 to 6 or 7).]
A) a B) b C) f D) d
19) Which scatter diagram indicates a perfect positive correlation?
[Scatter diagrams a through f: figures not reproduced; each plots y (marked 2 to 12) against x (marked 1 to 6 or 7).]
A) b B) a C) c D) f
The scatter diagram shows the relationship between average number of years of education and births per woman of
child bearing age in selected countries. Use the scatter plot to determine whether the statement is true or false.
20) There is a strong positive correlation between years of education and births per woman.
[Scatter plot not reproduced: Births per Woman (vertical axis, 1 to 10) against Average Number of Years of Education of Married Women of Child-Bearing Age (horizontal axis, 2 to 14).]
A) False B) True
21) There is no correlation between years of education and births per woman.
[Scatter plot not reproduced: Births per Woman (vertical axis, 1 to 10) against Average Number of Years of Education of Married Women of Child-Bearing Age (horizontal axis, 2 to 14).]
A) False B) True
22) There is a strong negative correlation between years of education and births per woman.
[Scatter plot not reproduced: Births per Woman (vertical axis, 1 to 10) against Average Number of Years of Education of Married Women of Child-Bearing Age (horizontal axis, 2 to 14).]
A) True B) False
23) There is a causal relationship between years of education and births per woman.
[Scatter plot not reproduced: Births per Woman (vertical axis, 1 to 10) against Average Number of Years of Education of Married Women of Child-Bearing Age (horizontal axis, 2 to 14).]
A) False B) True
SHORT ANSWER. Write the word or phrase that best completes each statement or answers the question.
Provide an appropriate response.
24) Construct a scatter diagram for the given data. Determine whether there is a positive linear correlation,
negative linear correlation, or no linear correlation.
x   -5  -3   4   1  -1  -2   0   2   3  -4
y  -10  -8   9   1  -2  -6  -1   3   6  -8
[Blank coordinate grid: x-axis from -6 to 6, y-axis from -12 to 12]
25) Construct a scatter diagram for the given data. Determine whether there is a positive linear correlation,
negative linear correlation, or no linear correlation.
x   -5  -3   4   1  -1  -2   0   2   3  -4
y   11   6  -6  -1   3   4   1  -4  -5   8
[Blank coordinate grid: x-axis from -6 to 6, y-axis from -12 to 12]
26) Construct a scatter diagram for the given data. Determine whether there is a positive linear correlation,
negative linear correlation, or no linear correlation.
x   -5  -3   4   1  -1  -2   0   2   3  -4
y   11  -6   8  -3  -2   1   5  -5   6   7
[Blank coordinate grid: x-axis from -6 to 6, y-axis from -12 to 12]
27) The numbers of home runs that Mark McGwire hit in the first 13 years of his major league baseball career are
listed below. (Source: Major League Handbook) Construct a scatter diagram for the data. Is there a relationship
between the home runs and the batting averages?
Home Runs         33    39    22    42    9     9     39    52    58    70
Batting Average   .231  .235  .201  .268  .33   .252  .274  .312  .274  .299
[Blank coordinate grid: x-axis from 15 to 75, y-axis from 0.15 to 0.35]
28) The data below represent the numbers of absences and the final grades of 15 randomly selected students from
an astronomy class. Construct a scatter diagram for the data. Do you detect a trend?
Student   Number of Absences   Final Grade as a Percent
1         5                    79
2         6                    78
3         2                    86
4         12                   56
5         9                    75
6         5                    90
7         8                    78
8         15                   48
9         0                    92
10        1                    78
11        9                    81
12        3                    86
13        10                   75
14        3                    89
15        11                   65
[Blank coordinate grid: x-axis from 5 to 15, y-axis from 40 to 100]
MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.
29) A researcher determines that the linear correlation coefficient is 0.85 for a paired data set. This indicates that
there is
A) a strong positive linear correlation.
B) a strong negative linear correlation.
C) no linear correlation but that there may be some other relationship.
D) insufficient evidence to make any decision about the correlation of the data.
30) An instructor wishes to determine if there is a relationship between the number of absences from his class and
a student's final grade in the course. What is the explanatory variable?
A) Absences B) Final Grade
C) The instructor's point scale for attendance D) Student's performance on the final examination
31) A medical researcher wishes to determine if there is a relationship between the number of prescriptions written
by pediatricians and the ages of the children for whom the prescriptions are written. She surveys all the
pediatricians in a geographical region to collect her data. What is the response variable?
A) Number of prescriptions written
B) Pediatricians surveyed
C) Age of the children for whom prescriptions were written
D) Number of children for whom prescriptions were written
32) True or False: A doctor wishes to determine the relationship between a male's age and that male's total
cholesterol level. He tests 200 males and records each male's age and that male's total cholesterol level. The
male's cholesterol level is the explanatory variable.
A) False B) True
33) A scatter diagram locates a point in a two-dimensional plane. The diagram locates the ______
variable on the horizontal axis and the ______ variable on the vertical axis.
A) explanatory; response B) response; explanatory
C) response; study D) study; explanatory
34) A history instructor has given the same pretest and the same final examination each semester. He is interested
in determining if there is a relationship between the scores of the two tests. He computes the linear correlation
coefficient and notes that it is 1.15. What does this correlation coefficient value tell the instructor?
A) The history instructor has made a computational error.
B) There is a strong positive correlation between the tests.
C) There is a strong negative correlation between the tests.
D) The correlation is something other than linear.
35) A traffic officer is compiling information about the relationship between the hour of the day and the speed over
the limit at which the motorist is ticketed. He computes a correlation coefficient of 0.12. What does this tell
the officer?
A) There is a weak positive linear correlation.
B) There is a moderate positive linear correlation.
C) There is a moderate negative linear correlation.
D) There is insufficient evidence to make any conclusions about the relationship between the variables.
3 Compute and interpret the linear correlation coefficient.
MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.
Provide an appropriate response.
36) Calculate the linear correlation coefficient for the data below.
x  -11  -9  -2  -5  -7  -8  -6  -4  -3  -10
y   -5  -3  14   6   3  -1   4   8  11   -3
A) 0.990 B) 0.881 C) 0.819 D) 0.792
37) Calculate the linear correlation coefficient for the data below.
x  -3  -1   6   3   1   0   2   4   5  -2
y  19  14   2   7  11  12   9   4   3  16
A) -0.995 B) -0.671 C) -0.778 D) -0.885
38) Calculate the linear correlation coefficient for the data below.
x  -11   -9  -2  -5  -7  -8  -6  -4  -3  -10
y    7  -10   4  -7  -6  -3   1  -9   2    3
A) -0.104 B) -0.132 C) -0.549 D) -0.581
39) The data below are the final exam scores of 10 randomly selected calculus students and the number of hours
they slept the night before the exam. Calculate the linear correlation coefficient.
Hours, x     4   6   3   9   3   5   5   6   7   4
Scores, y   64  79  59  87  65  77  84  89  89  70
A) 0.847 B) 0.991 C) 0.761 D) 0.654
40) The data below are the average one-way commute times (in minutes) of selected students during a summer
literature class and the number of absences for those students for the term. Calculate the linear correlation
coefficient.
Commute time (min), x   70  83  89  88  86  96  73  98  78
Number of absences, y    8  12  15  15  13  20   9  20  10
A) 0.980 B) 0.890 C) 0.881 D) 0.819
41) The data below are the ages and annual pharmacy bills (in dollars) of 9 randomly selected employees.
Calculate the linear correlation coefficient.
Age, x                  41   44   48   51   54   56   60   64   68
Pharmacy bill ($), y   111  115  118  126  137  140  143  145  147
A) 0.960 B) 0.998 C) 0.890 D) 0.908
42) The data below are the number of hours worked (per week) and the final grades of 9 randomly selected
students from a drama class. Calculate the linear correlation coefficient.
Hours worked, x    2   5   8   6  11   4  17  10   7
Final Grade, y    90  78  72  74  63  84  47  68  74
A) -0.991 B) -0.888 C) -0.918 D) -0.899
43) A manager wishes to determine the relationship between the number of years the manager's sales
representatives have been with the company and their average monthly sales (in thousands of dollars).
Calculate the linear correlation coefficient.
Years with company, x    5   6  13  10  11  18   6   4   14
Sales, y                34  36  81  65  68  64  51  58  123
A) 0.632 B) 0.561 C) 0.717 D) 0.791
44) In order for a company's employees to work in a foreign office, they must take a test in the language of the
country where they plan to work. The data below show the relationship between the number of years that
employees have studied a particular language and the grades they received on the proficiency exam. Calculate
the linear correlation coefficient.
Number of years, x    5   6   6   7   5   8   4   9   5
Grades on test, y    63  70  77  84  75  92  60  95  74
A) 0.934 B) 0.911 C) 0.891 D) 0.902
45) In an area of the Great Plains, records were kept on the relationship between the rainfall (in inches) and the
yield of wheat (bushels per acre). Calculate the linear correlation coefficient.
Rainfall (in inches), x        7.8   6.1  10.7   9.8  16.1   7.6   4.3  12.9  13.3
Yield (bushels per acre), y   48.5  44.2  56.8  57    80.4  47.2  29.9  74    76.8
A) 0.981 B) 0.998 C) 0.900 D) 0.899
SHORT ANSWER. Write the word or phrase that best completes each statement or answers the question.
46) Calculate the coefficient of correlation, r, letting Row 1 represent the x-values and Row 2 represent the
y-values. Now calculate the coefficient of correlation, r, letting Row 2 represent the x-values and Row 1
represent the y-values. What effect does switching the explanatory and response variables have on the linear
correlation coefficient?
Row 1  -4  -2   5   2   0  -1   1   3   4  -3
Row 2  -9   9  10   2  -1  -5   0   4   7   9
4 Determine whether a linear relation exists between two variables.
MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.
Compute the linear correlation coefficient between the two variables and determine whether a linear relation exists.
47) x 2 3 5 5 6
y 1.3 1.6 2.1 2.2 2.7
A) r = 0.983; linear relation exists B) r = 0.983; no linear relation exists
C) r = 0.883; linear relation exists D) r = 0.883; no linear relation exists
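The computation behind these exercises can be sketched in Python as follows. This is a minimal illustration, not a method the test bank prescribes: the function name `pearson_r` is an assumption made here, and the sample values are the ones printed in exercise 47. The sketch also evaluates r with the roles of x and y exchanged, which is the situation exercise 46 asks about.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample linear correlation coefficient r for paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of cross-products and squares of deviations from the means.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

# Data as printed in exercise 47.
x = [2, 3, 5, 5, 6]
y = [1.3, 1.6, 2.1, 2.2, 2.7]

r_xy = pearson_r(x, y)  # x explanatory, y response
r_yx = pearson_r(y, x)  # roles exchanged
print(f"r(x, y) = {r_xy:.3f}")
print(f"r(y, x) = {r_yx:.3f}")
```

Whether a given value of r counts as evidence of a linear relation still has to be judged by the criterion your course uses; the sketch only produces r.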
48) x -1 1 8 5 3 2 4 6 7 0
y -13 -11 6 -2 -5 -9 -4 0 3 -11
A) r = 0.990; linear relation exists B) r = 0.881; no linear relation exists
C) r = 0.819; linear relation exists D) r = 0.792; no linear relation exists
49) x -1 1 8 5 3 2 4 6 7 0
y 18 13 1 6 10 11 8 3 2 15
A) r = -0.995; linear relation exists B) r = -0.995; no linear relation exists
C) r = -0.885; no linear relation exists D) r = -0.885; linear relation exists
50) x 9 2 3 4 2 5 9 10
y 85 52 55 68 67 86 83 73
A) r = 0.708; linear relation exists B) r = 0.235; no linear relation exists
C) r = -0.708; linear relation exists D) r = 0.708; no linear relation exists
51) x 10 11 16 9 7 15 16 10
y 96 51 62 58 89 81 46 51
A) r = -0.335; no linear relation exists B) r = 0.462; linear relation exists
C) r = -0.335; linear relation exists D) r = -0.284; no linear relation exists
52) The table below shows the scores on an end-of-year project of 10 randomly selected architecture students and
the number of days each student spent working on the project.
Days, x     4   6   3   9   3   5   5   6   7   4
Score, y   68  83  63  91  69  81  88  93  93  74
A) r = 0.847; linear relation exists B) r = 0.847; no linear relation exists
C) r = 0.761; linear relation exists D) r = 0.761; no linear relation exists
53) The table below shows the ages and weights (in pounds) of 9 randomly selected tennis coaches.
Age, x                42   45   49   52   55   57   61   65   69
Weight (pounds), y   120  124  127  135  146  149  152  154  156
A) r = 0.960; linear relation exists B) r = 0.960; no linear relation exists
C) r = 0.908; no linear relation exists D) r = 0.908; linear relation exists
54) The table shows the number of days off last year and the earnings for the year (in thousands of dollars) for nine
randomly selected insurance salesmen.
Number of days off, x                             2   5   8   6  11   4  17  10   7
Earnings for the year (thousands of dollars), y  88  76  70  72  61  82  45  66  72
A) r = -0.991; linear relation exists B) r = -0.991; no linear relation exists
C) r = -0.899; linear relation exists D) r = -0.899; no linear relation exists
55) A manager wishes to determine whether there is a relationship between the number of years her sales
representatives have been with the company and their average monthly sales. The table shows the years of
service for each of her sales representatives and their average monthly sales (in thousands of dollars).
Years with company, x 6 7 14 11 12 19 7 5 15
Sales, y 29 31 76 60 63 59 46 53 118
A) r = 0.632; no linear relation exists B) r = 0.632; linear relation exists
C) r = 0.717; linear relation exists D) r = 0.717; no linear relation exists
56) To investigate the relationship between yield of soybeans and the amount of fertilizer used, a researcher
divides a field into eight plots of equal size and applies a different amount of fertilizer to each plot. The table
shows the yield of soybeans and the amount of fertilizer used for each plot.
Amount of fertilizer (pounds), x 1 1.5 2 2.5 3 3.5 4 4.5
Yield of soybeans (pounds), y 25 21 27 28 36 35 32 34
A) r = 0.819; linear relation exists B) r = 0.729; no linear relation exists
C) r = 0.683; linear relation exists D) r = 0.683; no linear relation exists
5 Explain the difference between correlation and causation.
MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.
Provide an appropriate response.
57) A variable that is related to either the response variable or the predictor variable or both, but which is excluded
from the analysis is a
A) lurking variable. B) random variable.
C) discrete variable. D) qualitative variable.
SHORT ANSWER. Write the word or phrase that best completes each statement or answers the question.
58) For a random sample of 100 American cities, the linear correlation coefficient between the number of robberies
last year and the number of schools in the city was found to be r = 0.725. What does this imply? Does this
suggest that building more schools in a city could lead to more robberies? Why or why not? What is a likely
lurking variable?
59) For a random sample of 30 countries, the linear correlation coefficient between the infant mortality rate and the
average number of cars per capita was found to be r = -0.717. What does this imply? Does this suggest that if
people buy more cars, this could lower the infant mortality rate? Why or why not? What is a likely lurking
variable?
60) A random sample of 200 men aged between 20 and 60 was selected from a certain city. The linear correlation
coefficient between income and blood pressure was found to be r = 0.807. What does this imply? Does this
suggest that if a man gets a salary raise his blood pressure is likely to rise? Why or why not? What are likely
lurking variables?
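The idea behind exercises 58 through 60, that a third variable can drive two measured variables at once, can be sketched numerically. Everything in the block below is invented for illustration: the six cities, their populations, school counts, and robbery counts are made-up values, not data from any source, and `pearson_r` is a helper name chosen here. In this toy setup both counts grow with city size, so they correlate strongly with each other even though neither is modeled as causing the other.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample linear correlation coefficient r for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

# Invented values for six hypothetical cities (NOT real data).
population_thousands = [50, 120, 200, 350, 500, 800]  # candidate lurking variable
schools = [6, 11, 21, 34, 52, 79]                      # roughly proportional to population
robberies = [110, 230, 390, 720, 980, 1630]            # also roughly proportional to population

r_schools_robberies = pearson_r(schools, robberies)
r_pop_schools = pearson_r(population_thousands, schools)
r_pop_robberies = pearson_r(population_thousands, robberies)

print(f"schools vs robberies:    r = {r_schools_robberies:.3f}")
print(f"population vs schools:   r = {r_pop_schools:.3f}")
print(f"population vs robberies: r = {r_pop_robberies:.3f}")
```

A high r between schools and robberies here reflects only that larger cities have more of both; it says nothing about what building a school would do.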
4.2 Least-Squares Regression
1 Find the least-squares regression line and use the line to make predictions.
MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.
Provide an appropriate response.
1) Find the equation of the regression line for the given data. Round values to the nearest thousandth.
x   -5  -3   4   1  -1  -2   0   2   3  -4
y  -10  -8   9   1  -2  -6  -1   3   6  -8
A) y^ = 2.097x - 0.552 B) y^ = 0.522x - 2.097
C) y^ = 2.097x + 0.552 D) y^ = -0.552x + 2.097
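A least-squares line of the kind exercise 1 asks for can be computed with the usual slope and intercept formulas, slope = Sxy / Sxx and intercept = (mean of y) - slope * (mean of x). The sketch below is one possible implementation; the function name `least_squares_line` is an assumption made for illustration, and the data are the ten (x, y) pairs of exercise 1.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares regression line of y on x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Data from exercise 1 of this section.
x = [-5, -3, 4, 1, -1, -2, 0, 2, 3, -4]
y = [-10, -8, 9, 1, -2, -6, -1, 3, 6, -8]

slope, intercept = least_squares_line(x, y)
# Rounded to the nearest thousandth, as the exercise requests.
print(f"y^ = {slope:.3f}x + ({intercept:.3f})")
```

The same function applies unchanged to exercises 2 and 3 once their data are substituted.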
2) Find the equation of the regression line for the given data. Round values to the nearest thousandth.
x   -5  -3   4   1  -1  -2   0   2   3  -4
y   11   6  -6  -1   3   4   1  -4  -5   8
A) y^ = -1.885x + 0.758 B) y^ = 0.758x + 1.885 C) y^ = -0.758x - 1.885 D) y^ = 1.885x - 0.758
3) Find the equation of the regression line for the given data. Round values to the nearest thousandth.
x   -5  -3   4   1  -1  -2   0   2   3  -4
y   11  -6   8  -3  -2   1   5  -5   6   7
A) y^ = -0.206x + 2.097 B) y^ = 2.097x - 0.206 C) y^ = 0.206x - 2.097 D) y^ = -2.097x + 0.206
4) The data below are the final exam scores of 10 randomly selected history students and the number of hours
they slept the night before the exam. Find the equation of the regression line for the given data. What would be
the predicted score for a history student who slept 7 hours the previous night? Is this a reasonable question?
Round the regression line values to the nearest hundredth, and round the predicted score to the nearest whole
number.
Hours, x     3   5   2   8   2   4   4   5   6   3
Scores, y   65  80  60  88  66  78  85  90  90  71
A) y^= 5.04x + 56.11; 91; Yes, it is reasonable.
B) y^= 5.04x + 56.11; 91; No, it is not reasonable. 7 hours is well outside the scope of the model.
C) y^= -5.04x + 56.11; 21; No, it is not reasonable. 7 hours is well outside the scope of the model.
D) y^= -5.04x + 56.11; 21; Yes, it is reasonable.
5) The data below are the final exam scores of 10 randomly selected history students and the number of hours
they slept the night before the exam. Find the equation of the regression line for the given data. What would be
the predicted score for a history student who slept 15 hours the previous night? Is this a reasonable question?
Round your predicted score to the nearest whole number. Round the regression line values to the nearest
hundredth.
Hours, x     3   5   2   8   2   4   4   5   6   3
Scores, y   65  80  60  88  66  78  85  90  90  71
A) y^= 5.04x + 56.11; 132; No, it is not reasonable. 15 hours is well outside the scope of the model.
B) y^= 5.04x + 56.11; 132; Yes, it is reasonable.
C) y^= -5.04x + 56.11; -20; No, it is not reasonable.
D) y^= -5.04x + 56.11; -20; Yes, it is reasonable.
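Exercises 4 and 5 pair a prediction with the question of whether the input lies within the scope of the model. A minimal sketch of that check follows. It takes the line y^ = 5.04x + 56.11 as printed in the answer choices (the positive slope agrees with a direct least-squares fit of these data), and it treats "scope of the model" as the range of observed x values, which is an assumption about how the term is being used. The names `predict_score` and `within_observed_range` are illustrative.

```python
# Observed explanatory values from exercises 4 and 5 (hours slept).
hours = [3, 5, 2, 8, 2, 4, 4, 5, 6, 3]

# Rounded coefficients as printed in the answer choices.
SLOPE = 5.04
INTERCEPT = 56.11

def predict_score(h):
    """Predicted exam score for h hours of sleep."""
    return SLOPE * h + INTERCEPT

def within_observed_range(h):
    """True when h lies between the smallest and largest observed x values."""
    return min(hours) <= h <= max(hours)

for h in (7, 15):
    status = "within observed range" if within_observed_range(h) else "extrapolation"
    print(f"{h} hours: predicted score {round(predict_score(h))} ({status})")
```

A prediction flagged as extrapolation can still be computed, but the line gives no support for it because no data were observed near that x value.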
6) The data below are the average one-way commute times (in minutes) for selected students and the number of
absences for those students during the term. Find the equation of the regression line for the given data. What
would be the predicted number of absences if the commute time was 95 minutes? Is this a reasonable question?
Round the predicted number of absences to the nearest whole number. Round the regression line values to the
nearest hundredth.
Commute time (min), x   72  85  91  90  88  98  75  100  80
Number of absences, y    3   7  10  10   8  15   4   15   5
A) y^ = 0.45x - 30.27; 12 absences; Yes, it is reasonable.
B) y^ = 0.45x - 30.27; 12 absences; No, it is not reasonable. 95 minutes is well outside the scope of the model.
C) y^ = 0.45x + 30.27; 73 absences; Yes, it is reasonable.
D) y^ = 0.45x + 30.27; 73 absences; No, it is not reasonable. 95 minutes is well outside the scope of the model.
7) The data below are the average one-way commute times (in minutes) for selected students and the number of
absences for those students during the term. Find the equation of the regression line for the given data. What
would be the predicted number of absences if the commute time was 40 minutes? Is this a reasonable question?
Round the predicted number of absences to the nearest whole number. Round the regression line values to the
nearest hundredth.
Commute time (min), x   72  85  91  90  88  98  75  100  80
Number of absences, y    3   7  10  10   8  15   4   15   5
A) y^ = 0.45x - 30.27; -12 absences; No, it is not reasonable. 40 minutes is well outside the scope of the model.
B) y^ = 0.45x - 30.27; -12 absences; Yes, it is reasonable.
C) y^ = 0.45x + 30.27; 48 absences; Yes, it is reasonable.
D) y^ = 0.45x + 30.27; 48 absences; No, it is not reasonable. 40 minutes is well outside the scope of the model.
8) The data below are ages and systolic blood pressures (measured in millimeters of mercury) of 9 randomly
selected adults. Find the equation of the regression line for the given data. What would be the predicted
pressure if the age was 60? Round the predicted pressure to the nearest whole number. Round the regression
line values to the nearest hundredth.
Age, x        38   41   45   48   51   53   57   61   65
Pressure, y  116  120  123  131  142  145  148  150  152
A) y^ = 1.49x + 60.46; 150 mm B) y^ = 60.46x - 1.49; 3626 mm
C) y^ = 1.49x - 60.46; 29 mm D) y^ = 60.46x + 1.49; 3629 mm
9) The data below are the number of absences and the final grades of 9 randomly selected students from a
literature class. Find the equation of the regression line for the given data. What would be the predicted final
grade if a student was absent 14 times? Round the regression line values to the nearest hundredth. Round the
predicted grade to the nearest whole number.
Number of absences, x    0   3   6   4   9   2  15   8   5
Final grade, y          98  86  80  82  71  92  55  76  82
A) y^ = -2.75x + 96.14; 58 B) y^ = 96.14x - 2.75; 1343
C) y^ = -2.75x - 96.14; 134.64 D) y^ = -96.14x + 2.75; 1343
10) A manager wishes to determine the relationship between the number of miles traveled (in hundreds of miles)
by her sales representatives and their amount of sales (in thousands of dollars) per month. Find the equation of
the regression line for the given data. What would be the predicted sales if the sales representative traveled 0
miles? Is this reasonable? Why or why not? Round the regression line values to the nearest hundredth.
Miles traveled, x    2   3  10   7   8  15   3   1   11
Sales, y            31  33  78  62  65  61  48  55  120
A) y^ = 3.53x + 37.92; $37,920; No; it is not reasonable for a representative to travel 0 miles and have a positive
amount of sales.
B) y^ = 3.53x + 37.92; $3792; No; it is not reasonable for a representative to travel 0 miles and have a positive
amount of sales.
C) y^ = 3.53x + 37.92; $37,920; Yes, it is reasonable.
D) y^= 37.92x + 3.53; $3792; Yes, it is reasonable.
11) A manager wishes to determine the relationship between the number of years her sales representatives have
been employed by the firm and their amount of sales (in thousands of dollars) per month. Find the equation of
the regression line for the given data. What would be the predicted sales if the sales representative was
employed by the firm for 30 years? Is this reasonable? Why or why not? Round the regression line values to the
nearest hundredth.
Years employed, x    2   3  10   7   8  15   3   1   11
Sales, y            31  33  78  62  65  61  48  55  120
A) y^ = 3.53x + 37.92; $143,820; No; it is not reasonable. 30 years of employment is well outside the scope of
the model.
B) y^ = 3.53x + 37.92; $143,820; Yes, it is reasonable.
C) y^ = 3.53x - 37.92; $67,980; No; it is not reasonable. 30 years of employment is well outside the scope of the
model.
D) y^ = 3.53x - 37.92; $67,980; Yes; it is reasonable.
12) In order for a company's employees to work in a foreign office, they must take a test in the language of the
country where they plan to work. The data below show the relationship between the number of years that
employees have studied a particular language and the grades they received on the proficiency exam. Find the
equation of the regression line for the given data. Round the regression line values to the nearest hundredth.
Number of years, x    3   4   4   5   3   6   2   7   3
Grades on test, y    61  68  75  82  73  90  58  93  72
A) y^ = 6.91x + 46.26 B) y^ = 6.91x - 46.26 C) y^ = 46.26x - 6.91 D) y^ = 46.26x + 6.91
13) In an area of the Great Plains, records were kept on the relationship between the rainfall (in inches) and the
yield of wheat (bushels per acre). Find the equation of the regression line for the given data. Round the
regression line values to the nearest thousandth.
Rainfall (in inches), x       10.5   8.8  13.4  12.5  18.8  10.3   7.0  15.6  16.0
Yield (bushels per acre), y   50.5  46.2  58.8  59.0  82.4  49.2  31.9  76.0  78.8
A) y^ = 4.379x + 4.267 B) y^ = -4.379x + 4.267 C) y^ = 4.267x + 4.379 D) y^ = 4.267x - 4.379
SHORT ANSWER. Write the word or phrase that best completes each statement or answers the question.
14) Find the equation of the regression line by letting Row 1 represent the x-values and Row 2 represent the
y-values. Now find the equation of the regression line letting Row 2 represent the x-values and Row 1
represent the y-values. What effect does switching the explanatory and response variables have on the
regression line?
Row 1
Row 2
-5
-10
-3
-8
4
9
1
1
-1
-2
-2
-6
0
-1
2
3
3
6
-4
-8
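One way to explore problem 14 numerically is sketched below: the same least-squares formulas are applied twice, once with Row 1 as x and once with Row 2 as x. The helper name `slope_intercept` is an illustrative assumption. The point of the sketch is that the second line is not simply the first one solved for x.

```python
def slope_intercept(xs, ys):
    """Least-squares slope and intercept for predicting ys from xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    return b1, y_bar - b1 * x_bar

row1 = [-5, -3, 4, 1, -1, -2, 0, 2, 3, -4]
row2 = [-10, -8, 9, 1, -2, -6, -1, 3, 6, -8]

b1_yx, b0_yx = slope_intercept(row1, row2)   # Row 1 explanatory, Row 2 response
b1_xy, b0_xy = slope_intercept(row2, row1)   # roles switched

print(f"Row 2 on Row 1: y^ = {b1_yx:.3f}x + ({b0_yx:.3f})")
print(f"Row 1 on Row 2: y^ = {b1_xy:.3f}x + ({b0_xy:.3f})")
print(f"reciprocal of first slope: {1 / b1_yx:.3f}")
```

Each fit minimizes squared vertical distances for its own response variable, which is why the two slopes are not reciprocals of each other unless the points fall exactly on a line.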
15) Is the number of games won by a major league baseball team in a season related to the teamʹs batting average?
Data from 14 teams were collected and the summary statistics yield:
∑y = 1,134, ∑x = 3.642, ∑y² = 93,110, ∑x² = 0.948622, and ∑xy = 295.54
Find the least squares prediction equation for predicting the number of games won, y, using a straight-line
relationship with the teamʹs batting average, x.
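A minimal sketch of how the line in problem 15 could be obtained from the summary sums alone, assuming the shortcut formulas SSxy = ∑xy - (∑x)(∑y)/n and SSxx = ∑x² - (∑x)²/n. Variable names are illustrative. Because SSxx here is a small difference of two nearly equal numbers, the result is sensitive to how the sums were rounded.

```python
n = 14                    # number of teams
sum_y = 1134              # total games won
sum_x = 3.642             # total of batting averages
sum_x2 = 0.948622         # sum of squared batting averages
sum_xy = 295.54           # sum of products

ss_xy = sum_xy - sum_x * sum_y / n
ss_xx = sum_x2 - sum_x ** 2 / n

b1 = ss_xy / ss_xx                    # slope: change in wins per unit of batting average
b0 = sum_y / n - b1 * (sum_x / n)     # intercept: y-bar minus slope times x-bar

print(f"SSxy = {ss_xy:.4f}, SSxx = {ss_xx:.7f}")
print(f"y^ = {b0:.2f} + {b1:.2f}x")
```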
16) The table shows, for the years 1997-2012, the mean hourly wage for residents of the town of Pity Me and the
mean weekly rent paid by the residents.
Year Mean weekly rent (dollars) Mean hourly wage (dollars) Year Mean weekly rent (dollars) Mean hourly wage (dollars)
1997 57 10.38 2005 116 28.99
1998 59 10.89 2006 113 28.63
1999 62 11.96 2007 112 36.75
2000 63 12.46 2008 86 14.55
2001 86 17.72 2009 90 17.90
2002 119 28.07 2010 90 14.67
2003 131 35.24 2011 100 17.97
2004 122 31.87 2012 115 22.23
Summary statistics yield: SSxx = 1222.2771, SSxy = 3031.7125, SSyy = 9144.9375, x̄ = 21.2675, and
ȳ = 95.0625. Find the least squares line that uses mean hourly wage to predict mean weekly rent. Round values
to the nearest ten-thousandth.
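When SSxx, SSxy, and the two sample means are supplied, as in problem 16, the slope and intercept follow directly. The sketch below assumes x is mean hourly wage and y is mean weekly rent, as the problem states; the variable names are illustrative.

```python
ss_xx = 1222.2771
ss_xy = 3031.7125
x_bar = 21.2675    # sample mean of hourly wage (explanatory variable)
y_bar = 95.0625    # sample mean of weekly rent (response variable)

b1 = ss_xy / ss_xx            # slope
b0 = y_bar - b1 * x_bar       # intercept

print(f"y^ = {b0:.4f} + {b1:.4f}x")
```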
MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.
17) A residual is the difference between
A) the observed value of y and the predicted value of y.
B) the observed value of x and the predicted value of x.
C) the observed value of y and the predicted value of x.
D) the observed value of x and the predicted value of y.
18) The least squares regression line
A) minimizes the sum of the residuals squared.
B) maximizes the sum of the residuals squared.
C) minimizes the mean difference between the residuals squared.
D) maximizes the mean difference between the residuals squared.
19) For a given data set, the equation of the least squares regression line will always pass through
A) (x̄, ȳ). B) every point in the given data set.
C) at least two points in the given data set. D) the y-intercept and the slope.
2 Interpret the slope and the y-intercept of the least-squares regression line.
MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.
Provide an appropriate response.
20) A county real estate appraiser wants to develop a statistical model to predict the appraised value of houses in a
section of the county called East Meadow. One of the many variables thought to be an important predictor of
appraised value is the number of rooms in the house. Consequently, the appraiser decided to fit the simple
linear regression model, y^ = β0 + β1x, where y = appraised value of the house (in $thousands) and x = number
of rooms. Using data collected for a sample of n = 74 houses in East Meadow, the following results were
obtained:
y^= 74.80 + 17.80x
sβ0 = 71.24, t = 1.05 (for testing β0)
sβ1 = 2.63, t = 7.49 (for testing β1)
SSE = 60,775, MSE = 841, s = 29, r2 = .44
Range of the x-values: 5 - 11
Range of the y-values: 160 - 300
Give a practical interpretation of the estimate of the slope of the least squares line.
A) For each additional room in the house, we estimate the appraised value to increase $17,800.
B) For each additional room in the house, we estimate the appraised value to increase $74,800.
C) For each additional dollar of appraised value, we estimate the number of rooms in the house to increase
by 17.80 rooms.
D) For a house with 0 rooms, we estimate the appraised value to be $74,800.
21) A county real estate appraiser wants to develop a statistical model to predict the appraised value of houses in a
section of the county called East Meadow. One of the many variables thought to be an important predictor of
appraised value is the number of rooms in the house. Consequently, the appraiser decided to fit the simple
linear regression model, y^= β0 + β1x, where y = appraised value of the house (in $thousands) and x = number
of rooms. Using data collected for a sample of n = 74 houses in East Meadow, the following results were
obtained:
y^= 74.80 + 19.72x
sβ0 = 71.24, t = 1.05 (for testing β0)
sβ1 = 2.63, t = 7.49 (for testing β1)
SSE = 60,775, MSE = 841, s = 29, r2 = 0.44
Range of the x-values: 5 - 11
Range of the y-values: 160 - 300
Give a practical interpretation of the estimate of the y-intercept of the least squares line.
A) There is no practical interpretation, since a house with 0 rooms is nonsensical.
B) For each additional room in the house, we estimate the appraised value to increase $74,800.
C) For each additional room in the house, we estimate the appraised value to increase $19,720.
D) We estimate the base appraised value for any house to be $74,800.
22) Is there a relationship between the raises administrators at State University receive and their performance on
the job? A faculty group wants to determine whether job rating (x) is a useful linear predictor of raise (y).
Consequently, the group considered the straight-line regression model, y^= β0 + β1x. Using the method of least
squares, the faculty group obtained the following prediction equation, y^= 14,000 + 2,000x. Interpret the
estimated slope of the line.
A) For a 1-point increase in an administratorʹs rating, we estimate the administratorʹs raise to increase
$2,000.
B) For a 1-point increase in an administratorʹs rating, we estimate the administratorʹs raise to decrease
$2,000.
C) For an administrator with a rating of 1.0, we estimate his/her raise to be $2,000.
D) For a $1 increase in an administratorʹs raise, we estimate the administratorʹs rating to decrease 2,000
points.
23) Is there a relationship between the raises administrators at State University receive and their performance on
the job? A faculty group wants to determine whether job rating (x) is a useful linear predictor of raise (y).
Consequently, the group considered the straight-line regression model, y^ = β0 + β1x. Using the method of least
squares, the faculty group obtained the following prediction equation, y^= 14,000 + 2,000x.
Interpret the estimated y-intercept of the line.
A) For an administrator who receives a rating of zero, we estimate his or her raise to be $14,000.
B) The base administrator raise at State University is $14,000.
C) For a 1-point increase in an administratorʹs rating, we estimate the administratorʹs raise to increase
$14,000.
D) There is no practical interpretation, since rating of 0 is nonsensical and outside the range of the sample
data.
24) A large national bank charges local companies for using its services. A bank official reported the results of a
regression analysis designed to predict the bankʹs charges (y), measured in dollars per month, for services
rendered to local companies. One independent variable used to predict service charge to a company is the
companyʹs sales revenue (x), measured in millions of dollars. Data for 21 companies who use the bankʹs
services were used to fit the model, y^= β0 + β1x. The results of the simple linear regression are provided below.
y^= 2,700 + 20x, s = 65, 2-tailed p-value = 0.064 (for testing β1)
Interpret the estimate of β0, the y-intercept of the line.
A) There is no practical interpretation since a sales revenue of $0 is a nonsensical value.
B) All companies will be charged at least $2,700 by the bank.
C) About 95% of the observed service charges fall within $2,700 of the least squares line.
D) For every $1 million increase in sales revenue, we expect a service charge to increase $2,700.
25) Civil engineers often use the straight-line equation, y^= β0 + β1x, to model the relationship between the mean
shear strength of masonry joints and precompression stress, x. To test this theory, a series of stress tests were
performed on solid bricks arranged in triplets and joined with mortar. The precompression stress was varied
for each triplet and the ultimate shear load just before failure (called the shear strength) was recorded. The
stress results for n = 7 triplet tests are shown in the accompanying table followed by a SAS printout of the
regression analysis.
Triplet Test 1 2 3 4 5 6 7
Shear Strength (tons), y 1.00 2.18 2.24 2.41 2.59 2.82 3.06
Precomp. Stress (tons), x 0 0.60 1.20 1.33 1.43 1.75 1.75
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Prob > F
Model 1 2.39555 2.39555 47.732 0.0010
Error 5 0.25094 0.05019
C Total 6 2.64649
Root MSE 0.22403 R-square 0.9052
Dep Mean 2.32857 Adj R-sq 0.8862
C.V. 9.62073
Parameter Estimates
Parameter Standard T for H0:
Variable DF Estimate Error Parameter=0 Prob > |T|
INTERCEP 1 1.191930 0.18503093 6.442 0.0013
X 1 0.987157 0.14288331 6.909 0.0010
Give a practical interpretation of the estimate of the slope of the least squares line.
A) For every 1 ton increase in precompression stress, we estimate the shear strength of the joint to increase
by 0.987 ton.
B) For a triplet test with a precompression stress of 1 ton, we estimate the shear strength of the joint to be
0.987 ton.
C) For every 0.987 ton increase in precompression stress, we estimate the shear strength of the joint to
increase by 1 ton.
D) For a triplet test with a precompression stress of 0 tons, we estimate the shear strength of the joint to be
1.19 tons.
26) Civil engineers often use the straight-line equation, y^ = β0 + β1x, to model the relationship between the mean
shear strength of masonry joints and precompression stress, x. To test this theory, a series of stress tests were
performed on solid bricks arranged in triplets and joined with mortar. The precompression stress was varied
for each triplet and the ultimate shear load just before failure (called the shear strength) was recorded. The
stress results for n = 7 triplet tests are shown in the accompanying table followed by a SAS printout of the
regression analysis.
Triplet Test 1 2 3 4 5 6 7
Shear Strength (tons), y 1.00 2.18 2.24 2.41 2.59 2.82 3.06
Precomp. Stress (tons), x 0 0.60 1.20 1.33 1.43 1.75 1.75
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Prob > F
Model 1 2.39555 2.39555 47.732 0.0010
Error 5 0.25094 0.05019
C Total 6 2.64649
Root MSE 0.22403 R-square 0.9052
Dep Mean 2.32857 Adj R-sq 0.8862
C.V. 9.62073
Parameter Estimates
Parameter Standard T for H0:
Variable DF Estimate Error Parameter=0 Prob > |T|
INTERCEP 1 1.191930 0.18503093 6.442 0.0013
X 1 0.987157 0.14288331 6.909 0.0010
Give a practical interpretation of the estimate of the y-intercept of the least squares line.
A) For a triplet test with a precompression stress of 0 tons, we estimate the shear strength of the joint to be
1.19 tons.
B) For every 1 ton increase in precompression stress, we estimate the shear strength of the joint to increase
by 0.987 ton.
C) There is no practical interpretation since a triplet test with a precompression stress of 0 tons is outside the
range of the sample data.
D) For a triplet test with a precompression stress of 0 tons, we estimate the shear strength of the joint to
increase 1.19 tons.
27) Each year a nationally recognized publication conducts its ʺSurvey of Americaʹs Best Graduate and
Professional Schools.ʺ An academic advisor wants to predict the typical starting salary of a graduate at a top
business school using GMAT score of the school as a predictor variable. Total GMAT scores range from 200 to
800. The results of a simple linear regression of SALARY versus GMAT using 25 data points are shown below.
β0^ = -92040, β1^ = 228, s = 3213, R2 = 0.66, r = 0.81, df = 23, t = 6.67
Give a practical interpretation of β0^ = -92040.
A) The value has no practical interpretation since a GMAT of 0 is nonsensical and outside the range of the
sample data.
B) We expect to predict SALARY to within 2(92040) = $184,080 of its true value using GMAT in a
straight-line model.
C) We estimate SALARY to decrease $92,040 for every 1-point increase in GMAT.
D) We estimate the base SALARY of graduates of a top business school to be -$92,040.
28) Each year a nationally recognized publication conducts its ʺSurvey of Americaʹs Best Graduate and
Professional Schools.ʺ An academic advisor wants to predict the typical starting salary of a graduate at a top
business school using GMAT score of the school as a predictor variable. Total GMAT scores range from 200 to
800. The results of a simple linear regression of SALARY versus GMAT using 25 data points are shown below.
β0^ = -92040, β1^ = 228, s = 3213, R2 = 0.66, r = 0.81, df = 23, t = 6.67
Give a practical interpretation of β1^ = 228.
A) We estimate SALARY to increase $228 for every 1-point increase in GMAT.
B) We expect to predict SALARY to within 2(228) = $456 of its true value using GMAT in a straight-line
model.
C) We estimate GMAT to increase 228 points for every $1 increase in SALARY.
D) The value has no practical interpretation since a GMAT of 0 is nonsensical and outside the range of the
sample data.
29) A real estate magazine reported the results of a regression analysis designed to predict the price (y), measured
in dollars, of residential properties recently sold in a northern Virginia subdivision. One independent variable
used to predict sale price is GLA, gross living area (x), measured in square feet. Data for 157 properties were
used to fit the model, y^= β0 + β1x. The results of the simple linear regression are provided below.
y^ = 96,600 + 22.5x, s = 6500, r2 = 0.77, t = 6.1 (for testing β1)
Interpret the estimate of β0, the y-intercept of the line.
A) There is no practical interpretation, since a gross living area of 0 is a nonsensical value.
B) All residential properties in Virginia will sell for at least $96,600.
C) About 95% of the observed sale prices fall within $96,600 of the least squares line.
D) For every 1-sq ft. increase in GLA, we expect a propertyʹs sale price to increase $96,600.
SHORT ANSWER. Write the word or phrase that best completes each statement or answers the question.
30) In a comprehensive road test on all new car models, one variable measured is the time it takes a car to
accelerate from 0 to 60 miles per hour. To model acceleration time, a regression analysis is conducted on a
random sample of 129 new cars.
TIME60: y = Elapsed time (in seconds) from 0 mph to 60 mph
MAX: x1 = Maximum speed attained (miles per hour)
Initially, the simple linear model E(y) = β0 + β1x1 was fit to the data. Computer printouts for the analysis are
given below:
UNWEIGHTED LEAST SQUARES LINEAR REGRESSION OF TIME60
PREDICTOR
VARIABLES COEFFICIENT STD ERROR STUDENTʹS T P
CONSTANT 18.7171 0.63708 29.38 0.0000
MAX -0.08365 0.00491 -17.05 0.0000
R-SQUARED 0.6960 RESID. MEAN SQUARE (MSE) 1.28695
ADJUSTED R-SQUARED 0.6937 STANDARD DEVIATION 1.13444
SOURCE DF SS MS F P
REGRESSION 1 374.285 374.285 290.83 0.0000
RESIDUAL 127 163.443 1.28695
TOTAL 128 537.728
CASES INCLUDED 129 MISSING CASES 0
Find and interpret the estimate b1 in the printout above.
31) In a study of feeding behavior, zoologists recorded the number of grunts of a warthog feeding by a lake in the
15 minute period following the addition of food. The data showing the weekly number of grunts and the age
of the warthog (in days) are listed below:
Week Number of Grunts Age (days)
1 82 117
2 60 133
3 31 147
4 36 152
5 55 159
6 32 166
7 54 175
8 9 181
9 12 187
a. Write the equation of a straight-line model relating number of grunts (y) to age (x).
b. Give the least squares prediction equation.
c. Give a practical interpretation of the value of β0^ if possible.
d. Give a practical interpretation of the value of β1^ if possible.
MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.
32) Given the following least squares prediction equation, y^ = -173 + 74x, we estimate y to ______ by ______
with each 1-unit increase in x.
A) increase; 74 B) decrease; 74 C) decrease; 173 D) increase; 173
33) Given the equation of a regression line is y^ = 3x - 4, what is the best predicted value for y given x = 2?
A) 2 B) 10 C) 5 D) 1
34) Given the equation of a regression line is y^ = -3.5x + 1.5, what is the best predicted value for y given x = 9.9?
A) -33.15 B) -36.15 C) 36.15 D) 33.15
35) Use the regression equation to predict the value of y for x = 0.2.
x
y
-5
-10
-3
-8
4
9
1
1
-1
-2
-2
-6
0
-1
2
3
3
6
-4
-8
A) -0.133 B) 0.971 C) 1.987 D) 2.207
36) Use the regression equation to predict the value of y for x = 3.3.
x
y
-5
11
-3
6
4
-6
1
-1
-1
3
-2
4
0
1
2
-4
3
-5
-4
8
A) -5.462 B) 6.979 C) 0.616 D) 4.386
37) The data below are the final exam scores of 10 randomly selected chemistry students and the number of hours
they slept the night before the exam. What is the best predicted value for y given x = 8?
Hours, x
Scores, y
3
65
5
80
2
60
8
88
2
66
4
78
4
85
5
90
6
90
3
71
A) 96 B) 95 C) 94 D) 97
38) The data below are the temperatures on randomly chosen days during the summer in one city and the number
of employee absences on those days for a company located in the same city. What is the best predicted value
for y given x = 84?
Temperature, x
Number of absences, y
72
3
85
7
91
10
90
10
88
8
98
15
75
4
100
15
80
5
A) 7 B) 8 C) 9 D) 10
39) The data below are the ages and systolic blood pressures (measured in millimeters of mercury) of 9 randomly
selected adults. What is the best predicted value for y given x = 66?
Age, x
Pressure, y
38
116
41
120
45
123
48
131
51
142
53
145
57
148
61
150
65
152
A) 159 B) 161 C) 157 D) 155
40) The data below are the number of absences and the salaries (in thousands of dollars) of 9 randomly selected
employees from an engineering firm. What is the best predicted value for y given x = 10?
Number of absences, x
Salary, y
0
98
3
86
6
80
4
82
9
71
2
92
15
55
8
76
5
82
A) 69 B) 70 C) 71 D) 68
41) In order for a companyʹs employees to work for the foreign office, they must take a test in the language of the
country where they plan to work. The data below show the relationship between the number of years that
employees have studied a particular language and the grades they received on the proficiency exam. What is
the best predicted value for y given x = 2?
Number of years, x
Grades on test, y
3
61
4
68
4
75
5
82
3
73
6
90
2
58
7
93
3
72
A) 60 B) 58 C) 56 D) 62
42) In an area of the Great Plains, records were kept on the relationship between the rainfall (in inches) and the
yield of wheat (bushels per acre). Which is the best predicted value for y given x = 17.6?
Rainfall (in inches), x
Yield (bushels per acre), y
10.5
50.5
8.8
46.2
13.4
58.8
12.5
59.0
18.8
82.4
10.3
49.2
7.0
31.9
15.6
76.0
16.0
78.8
A) 81.3 B) 81.6 C) 81.1 D) 81.8
SHORT ANSWER. Write the word or phrase that best completes each statement or answers the question.
43) A calculus instructor is interested in finding the strength of a relationship between the final exam grades of
students enrolled in Calculus I and Calculus II at his college. The data (in percentages) are listed below.
Calculus I
Calculus II
88
81
78
80
62
55
75
78
95
90
91
90
83
81
86
80
98
100
a) Graph a scatter diagram of the data.
b) Find an equation of the regression line.
c) Predict a Calculus II exam score for a student who receives an 80 in Calculus I.
MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.
44) In one area of Russia, records were kept on the relationship between the rainfall (in inches) and the yield of
wheat (bushels per acre). The data for a 9-year period are as follows:
Rainfall, x 13.1 11.4 16.0 15.1 21.4 12.9 9.6 18.2 18.6
Yield, y 48.5 44.2 56.8 80.4 47.2 29.9 74.0 74.0 76.8
The equation of the line of least squares is given as y^= -9.12 + 4.38x. How many bushels of wheat per acre can
be predicted if it is expected that there will be 17 inches of rain?
A) 65.34 B) 5.96 C) 61.18 D) 52.06
45) In an area of Russia, records were kept on the relationship between the rainfall (in inches) and the yield of
wheat (bushels per acre). The data for a 9-year period are as follows:
Rainfall, x 13.1 11.4 16.0 15.1 21.4 12.9 9.6 18.2 18.6
Yield, y 48.5 44.2 56.8 80.4 47.2 29.9 74.0 74.0 76.8
The equation of the line of least squares is given as y^= -9.12 + 4.38x. What would be the expected number of
inches of rain if the yield is 60 bushels of wheat per acre?
A) 15.78 B) 253.68 C) 11.62 D) 64.74
46) In an area of Russia, records were kept on the relationship between the rainfall (in inches) and the yield of
wheat (bushels per acre). The data for a 9-year period are as follows:
Rainfall, x 13.1 11.4 16.0 15.1 21.4 12.9 9.6 18.2 18.6
Yield, y 48.5 44.2 56.8 80.4 47.2 29.9 74.0 74.0 76.8
The equation of the line of least squares is given as y^ = -9.12 + 4.38x. How many bushels of wheat per acre can
be predicted if it is expected that there will be 30 inches of rain?
A) Cannot be certain of the result because 30 inches of rain exceeds the observed data.
B) 122.28
C) 140.52
D) 8.93
3 Compute the sum of squared residuals.
MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.
Provide an appropriate response.
47) The regression line for the given data is y^ = 2.097x - 0.552. Determine the residual of a data point for which x =
0 and y = -1.
x
y
-5
-10
-3
-8
4
9
1
1
-1
-2
-2
-6
0
-1
2
3
3
6
-4
-8
A) -0.448 B) -1.552 C) -0.552 D) 2.649
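The residual calculations in problems 47 through 56 all follow one pattern: observed y minus the y predicted by the given line. A minimal sketch for problem 47, with `residual` as an illustrative helper name:

```python
def residual(x, y, slope, intercept):
    """Observed y minus the value predicted by y^ = slope * x + intercept."""
    predicted = slope * x + intercept
    return y - predicted

# Problem 47: y^ = 2.097x - 0.552, data point (0, -1)
r = residual(0, -1, slope=2.097, intercept=-0.552)
print(round(r, 3))
```

A positive residual means the point lies above the line; a negative one means it lies below.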
48) The regression line for the given data is y^ = -1.885x + 0.758. Determine the residual of a data point for which x
= 2 and y = -4.
x
y
-5
11
-3
6
4
-6
1
-1
-1
3
-2
4
0
1
2
-4
3
-5
-4
8
A) -0.988 B) -7.012 C) -3.012 D) -6.298
49) The regression line for the given data is y^ = -0.206x + 2.097. Determine the residual of a data point for which x
= -5 and y = 11.
x
y
-5
11
-3
-6
4
8
1
-3
-1
-2
-2
1
0
5
2
-5
3
6
-4
7
A) 7.873 B) 14.127 C) 3.127 D) -4.831
50) The regression line for the given data is y^ = 5.044x + 56.11. Determine the residual of a data point for which x =
5 and y = 80.
Hours, x
Scores, y
3
65
5
80
2
60
8
88
2
66
4
78
4
85
5
90
6
90
3
71
A) -1.33 B) 161.33 C) 81.33 D) -454.63
51) The regression line for the given data is y^ = 0.449x - 30.27. Determine the residual of a data point for which x =
98 and y = 15.
Temperature, x
Number of absences, y
72
3
85
7
91
10
90
10
88
8
98
15
75
4
100
15
80
5
A) 1.268 B) 28.732 C) 13.732 D) 121.535
52) The regression line for the given data is y^ = 1.488x + 60.46. Determine the residual of a data point for which x =
45 and y = 123.
Age, x
Pressure, y
38
116
41
120
45
123
48
131
51
142
53
145
57
148
61
150
65
152
A) -4.42 B) 250.42 C) 127.42 D) -198.484
53) The regression line for the given data is y^ = -2.75x + 96.14. Determine the residual of a data point for which x =
9 and y = 71.
Number of absences, x
Final grade, y
0
98
3
86
6
80
4
82
9
71
2
92
15
55
8
76
5
82
A) -0.39 B) 142.39 C) 71.39 D) 108.11
54) The regression line for the given data is y^ = 3.53x + 37.92. Determine the residual of a data point for which x =
8 and y = 65.
Years employed, x
Sales, y
2
31
3
33
10
78
7
62
8
65
15
61
3
48
1
55
11
120
A) -1.16 B) 131.16 C) 66.16 D) -259.37
55) The regression line for the given data is y^ = 6.91x + 46.26. Determine the residual of a data point for which x = 3
and y = 72.
Number of years, x
Grades on test, y
3
61
4
68
4
75
5
82
3
73
6
90
2
58
7
93
3
72
A) 5.01 B) 138.99 C) 66.99 D) -540.78
56) The regression line for the given data is y^ = 4.379x + 4.267. Determine the residual of a data point for which x =
16 and y = 78.8.
Rainfall (in inches), x
Yield (bushels per acre), y
10.5
50.5
8.8
46.2
13.4
58.8
12.5
59.0
18.8
82.4
10.3
49.2
7.0
31.9
15.6
76.0
16.0
78.8
A) 4.469 B) 153.131 C) 74.331 D) -333.3322
57) Compute the sum of the squared residuals of the least-squares line for the given data.
x
y
-5
-10
-3
-8
4
9
1
1
-1
-2
-2
-6
0
-1
2
3
3
6
-4
-8
A) 7.624 B) 1.036 C) 2.097 D) 0
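For problems such as 57, the sum of squared residuals can be computed by fitting the line and then adding up the squared differences between observed and predicted y. The sketch below does this for the data in problem 57; the variable names are illustrative, and the fit uses unrounded coefficients, so a hand calculation with rounded coefficients may differ slightly.

```python
xs = [-5, -3, 4, 1, -1, -2, 0, 2, 3, -4]
ys = [-10, -8, 9, 1, -2, -6, -1, 3, 6, -8]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares slope and intercept
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Sum of squared residuals: sum of (observed - predicted)^2
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

print(f"y^ = {b1:.3f}x + ({b0:.3f})")
print(f"sum of squared residuals = {sse:.3f}")
```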
58) The data below are the final exam scores of 10 randomly selected statistics students and the number of hours
they slept the night before the exam. Compute the sum of the squared residuals of the least-squares line for the
given data.
Hours, x
Scores, y
3
65
5
80
2
60
8
88
2
66
4
78
4
85
5
90
6
90
3
71
A) 318.038 B) 804.062 C) 1122.1 D) 39.755
59) In an area of the Great Plains, records were kept on the relationship between the rainfall (in inches) and the
yield of wheat (bushels per acre). Compute the sum of the squared residuals of the least-squares line for the
given data.
Rainfall (in inches), x
Yield (bushels per acre), y
10.5
50.5
8.8
46.2
13.4
58.8
12.5
59.0
18.8
82.4
10.3
49.2
7.0
31.9
15.6
76.0
16.0
78.8
A) 87.192 B) 2207.628 C) 4.379 D) 0
60) In a study of feeding behavior, zoologists recorded the number of grunts of a warthog feeding by a lake in a 15
minute time period following the addition of food. The data showing the weekly number of grunts and the age
of the warthog (in days) are listed below. Compute the sum of the squared residuals of the least-squares line
for the given data.
Week Number of Grunts Age (days)
1 90 125
2 68 141
3 39 155
4 44 160
5 63 167
6 40 174
7 62 183
8 17 189
9 20 195
A) 5533.53 B) 188.84 C) 74.39 D) 13.74
61) The data below are the ages and systolic blood pressures (measured in millimeters of mercury) of 9 randomly
selected adults. Compute the sum of the squared residuals of the least-squares line for the given data.
Age, x Pressure, y
38 116
41 120
45 123
48 131
51 142
53 145
57 148
61 150
65 152
A) 123.63 B) 1.41 C) 1.99 D) 11.11
62) A calculus instructor is interested in the performance of his students from Calculus I who go on to Calculus II.
Their final grades in each course (in percent) are given below. Compute the sum of the squared residuals of the
least-squares line for the given data.
Calculus I 88 78 62 75 95 91 83 86 98
Calculus II 81 80 55 78 90 90 81 80 100
A) 130.14 B) 30.85 C) 11.41 D) 1075.9
4.3 Diagnostics on the Least-Squares Regression Line
1 Compute and interpret the coefficient of determination.
MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.
Choose the coefficient of determination that matches the scatterplot. Assume that the scales on the horizontal and
vertical axes are the same.
1) [Scatterplot of Response (vertical axis) versus Explanatory (horizontal axis); figure not reproduced]
A) R2 = 0.77 B) R2 = 0.38 C) R2 = 0.96 D) R2 = 0.51
2) [Scatterplot of Response (vertical axis) versus Explanatory (horizontal axis); figure not reproduced]
A) R2 = 0.43 B) R2 = -0.43 C) R2 = 0.82 D) R2 = 0.12
3) [Scatterplot of Response (vertical axis) versus Explanatory (horizontal axis); figure not reproduced]
A) R2 = 0.097 B) R2 = -0.31 C) R2 = 0.76 D) R2 = 0.41
Use the linear correlation coefficient given to determine the coefficient of determination, R2.
4) r = 0.16
A) R2 = 2.56% B) R2 = 40.00% C) R2 = 4.00% D) R2 = 0.256%
5) r = -0.71
A) R2 = 50.41% B) R2 = 84.26% C) R2 = -50.41% D) R2 = -84.26%
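For simple linear regression the coefficient of determination is the square of the linear correlation coefficient, so the conversion in problems 4 and 5 is a one-line calculation. A small sketch, with `r_squared_percent` as an illustrative helper name:

```python
def r_squared_percent(r):
    """Coefficient of determination, as a percentage, from the correlation r."""
    return 100 * r ** 2

for r in (0.16, -0.71):
    print(f"r = {r}: R2 = {r_squared_percent(r):.2f}%")
```

Because r is squared, the sign of r is lost: R2 can never be negative.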
SHORT ANSWER. Write the word or phrase that best completes each statement or answers the question.
Provide an appropriate response.
6) Calculate the coefficient of determination to the nearest thousandth, given that the linear correlation coefficient,
r, is 0.837. What does this tell you about the explained variation and the unexplained variation of the data
about the regression line?
7) Calculate the coefficient of determination to the nearest thousandth, given that the linear correlation coefficient,
r, is -0.625. What does this tell you about the explained variation and the unexplained variation of the data
about the regression line?
8) Calculate the coefficient of determination, given that the linear correlation coefficient, r, is 1. What does this tell
you about the explained variation and the unexplained variation of the data about the regression line?
9) In a study of feeding behavior, zoologists recorded the number of grunts of a warthog feeding by a lake in the
15 minute period following the addition of food. The data showing the weekly number of grunts and the age of
the warthog (in days) are listed below. Find and interpret the value of R2. Round R2 to the nearest thousandth.
Number of Grunts Age (days)
81 116
59 132
30 146
35 151
54 158
31 165
53 174
8 180
11 186
MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.
10) In a comprehensive road test on all new car models, one variable measured is the time it takes a car to
accelerate from 0 to 60 miles per hour. To model acceleration time, a regression analysis is conducted on a
random sample of 129 new cars.
TIME60: y = Elapsed time (in seconds) from 0 mph to 60 mph
MAX: x1 = Maximum speed attained (miles per hour)
Initially, the simple linear model E(y) = β0 + β1x1 was fit to the data. Computer printouts for the analysis are
given below:
UNWEIGHTED LEAST SQUARES LINEAR REGRESSION OF TIME60
PREDICTOR
VARIABLES COEFFICIENT STD ERROR STUDENTʹS T P
CONSTANT 18.7171 0.63708 29.38 0.0000
MAX -0.08365 0.00491 -17.05 0.0000
R-SQUARED 0.6960 RESID. MEAN SQUARE (MSE) 1.28695
ADJUSTED R-SQUARED 0.6937 STANDARD DEVIATION 1.13444
SOURCE DF SS MS F P
REGRESSION 1 374.285 374.285 290.83 0.0000
RESIDUAL 127 163.443 1.28695
TOTAL 128 537.728
CASES INCLUDED 129 MISSING CASES 0
Approximately what percentage, rounded to the nearest whole percent, of the sample variation in acceleration
time can be explained by the simple linear model?
A) 70% B) 0% C) -17% D) 8%
11) A manufacturer of boiler drums wants to use regression to predict the number of man-hours needed to erect
drums in the future. The manufacturer collected a random sample of 35 boilers and measured the following
two variables:
MANHRS: y = Number of man-hours required to erect the drum
PRESSURE: x1= Boiler design pressure (pounds per square inch, i.e., psi)
Initially, the simple linear model E(y) = β0 + β1x1 was fit to the data. A printout for the analysis appears below:
UNWEIGHTED LEAST SQUARES LINEAR REGRESSION OF MANHRS
PREDICTOR
VARIABLES COEFFICIENT STD ERROR STUDENTʹS T P
CONSTANT 1.88059 0.58380 3.22 0.0028
PRESSURE 0.00321 0.00163 2.17 0.0300
R-SQUARED 0.4342 RESID. MEAN SQUARE (MSE) 4.25460
ADJUSTED R-SQUARED 0.4176 STANDARD DEVIATION 2.06267
SOURCE DF SS MS F P
REGRESSION 1 111.008 111.008 5.19 0.0300
RESIDUAL 34 144.656 4.25460
TOTAL 35 255.665
Give a practical interpretation of the coefficient of determination, R2. Express R2 to the nearest whole percent.
A) About 43% of the sample variation in number of man-hours can be explained by the simple linear model.
B) y^ = 1.88 + 0.00321x will be correct 43% of the time.
C) Man hours needed to erect drums will be associated with boiler design pressure 43% of the time.
D) About 2.06% of the sample variation in number of man-hours can be explained by the simple linear
model.
12) Civil engineers often use the straight-line equation, E(y) = β0 + β1x, to model the relationship between the
mean shear strength E(y) of masonry joints and precompression stress, x. To test this theory, a series of stress
tests were performed on solid bricks arranged in triplets and joined with mortar. The precompression stress
was varied for each triplet and the ultimate shear load just before failure (called the shear strength) was
recorded. The stress results for n = 7 triplet tests are shown in the accompanying table followed by a SAS
printout of the regression analysis.
Triplet Test 1 2 3 4 5 6 7
Shear Strength (tons), y 1.00 2.18 2.24 2.41 2.59 2.82 3.06
Precomp. Stress (tons), x 0 0.60 1.20 1.33 1.43 1.75 1.75
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Prob > F
Model 1 2.39555 2.39555 47.732 0.0010
Error 5 0.25094 0.05019
C Total 6 2.64649
Root MSE 0.22403 R-square 0.9052
Dep Mean 2.32857 Adj R-sq 0.8862
C.V. 9.62073
Parameter Estimates
Parameter Standard T for H0:
Variable DF Estimate Error Parameter=0 Prob > |T|
INTERCEP 1 1.191930 0.18503093 6.442 0.0013
X 1 0.987157 0.14288331 6.909 0.0010
Give a practical interpretation of R2, the coefficient of determination for the least squares model. Express R2 to
the nearest whole percent.
A) About 91% of the total variation in the sample of y-values can be explained by (or attributed to) the linear
relationship between shear strength and precompression stress.
B) In repeated sampling, approximately 91% of all similarly constructed regression lines will accurately
predict shear strength.
C) We expect to predict the shear strength of a triplet test to within about .91 ton of its true value.
D) We expect about 91% of the observed shear strength values to lie on the least squares line.
13) The dean of the Business School at a small Florida college wishes to determine whether the grade-point
average (GPA) of a graduating student can be used to predict the graduateʹs starting salary. More specifically,
the dean wants to know whether higher GPAʹs lead to higher starting salaries. Records for 23 of last yearʹs
Business School graduates are selected at random, and data on GPA (x) and starting salary (y, in $thousands)
for each graduate were used to fit the model, E(y) = β0 + β1x. The results of the simple linear regression are
provided below.
y^ = 4.25 + 2.75x, SSxy = 5.15, SSxx = 1.87
SSyy = 15.17, SSE = 1.0075
Range of the x-values: 2.23 - 3.85
Range of the y-values: 9.3 - 15.6
Calculate the value of R2, the coefficient of determination.
A) 0.934 B) 0.661 C) 0.872 D) 0.339
14) Each year a nationally recognized publication conducts its ʺSurvey of Americaʹs Best Graduate and
Professional Schools.ʺ An academic advisor wants to predict the typical starting salary of a graduate at a top
business school using the GMAT score of the school as a predictor variable. The results of a simple linear regression of SALARY
versus GMAT using 25 data points are shown below.
b0 = -92040 b1 = 228 s = 3213 R2 = 0.66 r = 0.81 df = 23 t = 6.67
Give a practical interpretation of R2 = 0.66.
A) 66% of the sample variation in SALARY can be explained by using GMAT in a straight-line model.
B) 66% of the differences in SALARY are caused by differences in GMAT scores.
C) We estimate SALARY to increase $.66 for every 1-point increase in GMAT.
D) We can predict SALARY correctly 66% of the time using GMAT in a straight-line model.
15) A real estate magazine reported the results of a regression analysis designed to predict the price (y), measured
in dollars, of residential properties recently sold in a northern Virginia subdivision. One independent variable
used to predict sale price is GLA, gross living area (x), measured in square feet. Data for 157 properties were
used to fit the model, E(y) = β0 + β1x. The results of the simple linear regression are provided below.
y^ = 96,600 + 22.5x s = 6500 R2 = 0.77 t = 6.1 (for testing β1)
Interpret the value of the coefficient of determination, R2.
A) 77% of the total variation in the sample sale prices can be attributed to the linear relationship between
GLA (x) and (y).
B) GLA (x) is linearly related to sale price (y) 77% of the time.
C) 77% of the observed sale prices (yʹs) will fall within 2 standard deviations of the least squares line.
D) There is a moderately strong positive correlation between sale price (y) and GLA (x).
SHORT ANSWER. Write the word or phrase that best completes each statement or answers the question.
16) A company keeps extensive records on its new salespeople on the premise that sales should increase with
experience. A random sample of seven new salespeople produced the data on experience and sales shown in
the table.
Months on Job, x   Monthly Sales, y ($ thousands)
2 2.4
4 7.0
8 11.3
12 15.0
1 0.8
5 3.7
9 12.0
Summary statistics yield SSxx = 94.8571, SSxy = 124.7571, SSyy = 176.5171, x̄ = 5.8571, and ȳ = 7.4571. Find and
interpret the coefficient of determination. Round R2 to the nearest hundredth of a percent.
17) To investigate the relationship between yield of potatoes, y, and level of fertilizer application, x, an
experimenter divides a field into eight plots of equal size and applies differing amounts of fertilizer to each.
The yield of potatoes (in pounds) and the fertilizer application (in pounds) are recorded for each plot. The data
are as follows:
x 1 1.5 2 2.5 3 3.5 4 4.5
y 25 31 27 28 36 35 32 34
Summary statistics yield SSxx = 10.5, SSyy = 112, and SSxy = 25. Calculate the coefficient of determination
rounded to the nearest ten-thousandth.
MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.
18) The coefficient of correlation between x and y is r = 0.59. Calculate the coefficient of determination R2. Round
R2 to the nearest hundredth.
A) 0.35 B) 0.59 C) 0.41 D) 0.65
SHORT ANSWER. Write the word or phrase that best completes each statement or answers the question.
19) The coefficient of determination for a straight-line model relating selling price y to manufacturing cost x for a
particular item is R2 = 0.83. Interpret this value.
MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.
20) The ________ measures the percentage of total variation in the response variable that is explained
by the least squares regression line.
A) coefficient of determination B) coefficient of linear correlation
C) sum of the residuals squared D) slope of the regression line
21) If the coefficient of determination is close to 1, then
A) the least squares regression line equation explains most of the variation in the response variable.
B) the least squares regression line equation has no explanatory value.
C) the sum of the square residuals is large compared to the total variation.
D) the linear correlation coefficient is close to zero.
22) The coefficient of determination is the ________ of the linear correlation coefficient.
A) square B) square root C) opposite D) reciprocal
2 Perform residual analysis on a regression model.
MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.
Analyze the residual plot below. Does it violate any of the conditions for an adequate linear model?
23)
A) No, the plot of residuals is random.
B) Yes, there is a discernable pattern in the residuals.
C) Yes, the residuals do not display constant error variance.
24)
A) Yes, the residuals do not display constant error variance.
B) Yes, there is a discernable pattern in the residuals.
C) No, the plot of residuals is random.
25)
A) Yes, there is a discernable pattern in the residuals.
B) Yes, the residuals do not display constant error variance.
C) No, the plot of residuals is random.
Provide an appropriate response.
26) True or False: Residual analysis cannot be used to check for outliers.
A) False B) True
27) True or False: If a residual plot shows an almost straight line then a linear model is appropriate.
A) False B) True
28) To determine if there are outliers in a least squares regression modelʹs data set, we could construct a boxplot of
the
A) residuals. B) response variables.
C) predictor variables. D) lurking variables.
3 Identify influential observations.
MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.
A scatter diagram is given with one of the points labeled ʺA.ʺ In addition, there are two least-squares regression lines
drawn. The solid line excludes the point A. The dashed line includes the point A. Based on the graph, is the point A
influential?
29) [Figure: scatter diagram with point A labeled and two least-squares regression lines; both axes run from 1 to 10]
A) yes B) no
30) [Figure: scatter diagram with point A labeled and two least-squares regression lines; both axes run from 1 to 10]
A) no B) yes
Provide an appropriate response.
31) An influential observation is an observation that significantly affects the value of
A) the slope of the least squares regression line. B) the mean of the response variable.
C) the median of the predictor variable. D) the median of the response variable.
32) What effect will an influential observation have upon the graph of the least squares regression line?
A) It will pull the graph toward the observation.
B) It will have no effect.
C) It will push the graph away from the observation.
D) It will lower the value of the correlation coefficient to make further analysis meaningless.
4.4 Contingency Tables and Association
1 Compute the marginal distribution of a variable.
MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.
Provide an appropriate response.
1) The following data represent the living situation of newlyweds in a large metropolitan area and their annual
household income. Find the marginal frequency for newlyweds who own their own home.
<$20,000 $20-35,000 $35-50,000 $50-75,000 >$75,000
Own home 31 52 202 355 524
Rent home 67 66 52 23 11
Live w/family 89 69 30 4 2
A) 1164 B) 31 C) 524 D) 202
2) The following data represent the living situation of newlyweds in a large metropolitan area and their annual
household income. Find the marginal frequency for newlyweds who make between $35,000 and $50,000 per
year.
<$20,000 $20-35,000 $35-50,000 $50-75,000 >$75,000
Own home 31 52 202 355 524
Rent home 67 66 52 23 11
Live w/family 89 69 30 4 2
A) 284 B) 30 C) 202 D) 52
3) The following data represent the living situation of newlyweds in a large metropolitan area and their annual
household income. What percent of people who make between $35,000 and $50,000 per year own their own
home? Round to the nearest tenth of a percent.
<$20,000 $20-35,000 $35-50,000 $50-75,000 >$75,000
Own home 31 52 202 355 524
Rent home 67 66 52 23 11
Live w/family 89 69 30 4 2
A) 71.1% B) 18.3% C) 10.6% D) 17.4%
4) The following data represent the living situation of newlyweds in a large metropolitan area and their annual
household income. What percent of people who own their own home make between $35,000 and $50,000 per
year? Round to the nearest tenth of a percent.
<$20,000 $20-35,000 $35-50,000 $50-75,000 >$75,000
Own home 31 52 202 355 524
Rent home 67 66 52 23 11
Live w/family 89 69 30 4 2
A) 17.4% B) 4.5% C) 30.5% D) 71.1%
5) Construct a frequency marginal distribution for the given contingency table.
x1 x2 x3
y1 25 40 40
y2 75 50 55
A)
x1 x2 x3 Marginal Distribution
y1 25 40 40 105
y2 75 50 55 180
Marginal Distribution 100 90 95 285
B)
x1 x2 x3 Marginal Distribution
y1 25 40 40 105
y2 75 50 55 180
Marginal Distribution 100 90 95 570
C)
x1 x2 x3 Marginal Distribution
y1 25 40 40 105
y2 75 50 55 180
Marginal Distribution 50 10 15 570
D)
x1 x2 x3 Marginal Distribution
y1 25 40 40 100
y2 75 50 55 90
Marginal Distribution 105 180 95 570
6) Construct a relative frequency marginal distribution for the given contingency table. Round values to the
nearest thousandth.
     x1   x2   x3
y1   20   25   10
y2   40   35   35
A)
                                           x1      x2      x3      Relative Frequency Marginal Distribution
y1                                         20      25      10      0.333
y2                                         40      35      35      0.667
Relative Frequency Marginal Distribution   0.364   0.364   0.273   1
B)
                                           x1      x2      x3      Relative Frequency Marginal Distribution
y1                                         20      25      10      0.333
y2                                         40      35      35      0.667
Relative Frequency Marginal Distribution   0.121   0.061   0.152   1
C)
                                           x1      x2      x3      Relative Frequency Marginal Distribution
y1                                         20      25      10      0.55
y2                                         40      35      35      1.10
Relative Frequency Marginal Distribution   0.60    0.60    0.45    1
D)
Relative Frequency Marginal Distribution   x1      x2      x3
y1                                         0.121   0.152   0.061
y2                                         0.242   0.212   0.212
7) Construct a conditional distribution by x for the given contingency table. Round values to the nearest
thousandth.
x1 x2 x3
y1 30 40 20
y2 50 65 65
A)
x1 x2 x3
y1 0.375 0.381 0.235
y2 0.625 0.619 0.765
Total 1 1 1
B)
x1 x2 x3
y1 0.375 0.381 0.235
y2 0.278 0.361 0.361
Total 1 1 1
C)
x1 x2 x3
y1 0.111 0.148 0.074
y2 0.185 0.241 0.241
D)
x1 x2 x3 Total
y1 0.333 0.444 0.222 1
y2 0.278 0.361 0.361 1
8) A contingency table relates
A) two categories of data.
B) the difference in the means of two random variables.
C) a particular response with order in which that response should be applied.
D) only continuous random variables.
9) To eliminate the effects of either the row or the column variables in a contingency table, a ________
distribution is created.
A) marginal B) normalized C) χ2 D) Studentʹs t
2 Use the conditional distribution to identify association among categorical data.
SHORT ANSWER. Write the word or phrase that best completes each statement or answers the question.
Provide an appropriate response.
10) The data below show the age and favorite type of music of 779 randomly selected people. Use α = 0.05. What, if
any, association exists between favorite music and age? Discuss the association.
Age Country Rock Pop Classical
15 - 21 21 45 90 33
21 - 30 60 55 42 48
30 - 40 65 47 31 57
40 - 50 68 39 25 53
11) The following data represent the living situation of newlyweds in a large metropolitan area and their annual
household income. What, if any, association exists between living situation and household income? Discuss the
association.
< $20,000 $20-35,000 $35-50,000 $50-75,000 > $75,000
Own home 31 52 202 355 524
Rent home 67 66 52 23 11
Live w/family 89 69 30 4 2
3 Explain Simpsonʹs Paradox.
SHORT ANSWER. Write the word or phrase that best completes each statement or answers the question.
Provide an appropriate response.
12) Researchers conducted a study to determine which of two different treatments, A or B, is more effective in the
treatment of atherosclerosis. The results of their experiment are given in the table. (a) Which treatment appears
to be more effective? Why?
Treatment A Treatment B
Effective 420 435
Not effective 130 140
The data in the table do not take into account the seriousness of the case. The data shown in the next table
show the effectiveness of each treatment for both mild and advanced cases of atherosclerosis.
                Mild atherosclerosis         Advanced atherosclerosis
                Treatment A   Treatment B    Treatment A   Treatment B
Effective       310           95             110           340
Not effective   80            20             50            120
(b) Determine the proportion of mild cases of atherosclerosis that were effectively dealt with using treatment A.
Determine the proportion of mild cases of atherosclerosis that were effectively dealt with using treatment B.
(c) Repeat part (b) for advanced cases of atherosclerosis to create a conditional distribution of effectiveness by
treatment for each category of the disease.
(d) Write a short report detailing and explaining your findings.
13) A company encourages applications from minority groups who they feel are under-represented in the
company. The table shows the number of applications that were accepted last year from people belonging to
minority groups and the number of applications that were accepted from people not belonging to minority
groups. Only applications from well qualified applicants are included in the analysis. (a) Does the acceptance
rate appear to be higher for those belonging to minority groups or for those not belonging to minority groups?
Why?
Minority Not minority
Accepted 70 79
Rejected 460 500
The data in the table do not take into account the department of the company. The data shown in the next table
show the number of applications accepted from each group within each department.
Department A Department B Department C
Minority Not minority Minority Not minority Minority Not minority
Accepted 27 10 22 34 21 35
Rejected 260 110 80 150 120 240
(b) Determine the proportion of minority applications that were accepted within department A. Determine the
proportion of non-minority applications that were accepted within department A.
(c) Repeat part (b) for departments B and C to create a conditional distribution of acceptance rate by group for
each department of the company.
(d) Write a short report detailing and explaining your findings.
4.5 Nonlinear Regression: Transformations (online)
1 Convert between exponential and logarithmic expressions.
MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.
Change the exponential expression to an equivalent expression involving a logarithm.
1) 2^3 = 8
A) log_2 8 = 3 B) log_8 2 = 3 C) log_3 8 = 2 D) log_2 3 = 8
2) 6^3 = x
A) log_6 x = 3 B) log_x 6 = 3 C) log_3 x = 6 D) log_6 3 = x
3) 8^3 = y
A) log_8 y = 3 B) log_3 y = 8 C) log_y 8 = 3 D) log_y 3 = 8
4) 6^x = 216
A) log_6 216 = x B) log_x 216 = 6 C) log_216 6 = x D) log_216 x = 6
5) 3^x = 311
A) log_3 311 = x B) log_311 3 = x C) log_311 x = 3 D) log_3 x = 311
Change the logarithmic expression to an equivalent expression involving an exponent.
6) log_5 125 = 3
A) 5^3 = 125 B) 3^5 = 125 C) 5^125 = 3 D) 125^3 = 5
7) log_5 x = 3
A) 5^3 = x B) 3^5 = x C) 5^x = 3 D) x^3 = 5
8) log_b 256 = 4
A) b^4 = 256 B) 4^b = 256 C) 256^4 = b D) 256^b = 4
9) log_3 81 = x
A) 3^x = 81 B) x^3 = 81 C) 81^x = 3 D) 81^3 = x
2 Simplify logarithmic expressions.
MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.
Write the expression as a sum of logs. Express powers as factors.
10) log_6 (xy)
A) log_6 x + log_6 y B) log_6 x - log_6 y C) log_3 x + log_3 y D) log_3 x - log_3 y
11) log_7 x^5
A) 5 log_7 x B) 7 log_5 x C) 7 log x D) 5 log_7 x^5
12) log_b (yz^8)
A) log_b y + 8 log_b z B) 8 log_b y + 8 log_b z C) 8 log_b (yz) D) log_b y + log_b (8z)
13) log_b (y^9 z^3)
A) 9 log_b y + 3 log_b z B) 27 log_b (yz) C) log_b (yz)^27 D) log_b (27yz)
Use a calculator to evaluate the expression. Round your answer to three decimal places.
14) log 100
A) 2.000 B) 4.605 C) 2.004 D) 1.996
15) log 3.21
A) 0.507 B) 1.166 C) 0.520 D) 0.493
16) 10^1.7
A) 50.119 B) 63.096 C) 5.474 D) 79.433
17) 10^0.7029
A) 5.045 B) 0.198 C) 0.495 D) 1.176
3 Use logarithmic transformations to linearize exponential relations.
SHORT ANSWER. Write the word or phrase that best completes each statement or answers the question.
Provide an appropriate response.
18) The following data represent the bacteria population in a laboratory experiment. The researchers suspect that
the population is growing exponentially. Determine the logarithm of the y-values so that Y = log y.
Day, x Population, y
0 1762
1 4803
2 13,006
3 35,456
4 96,445
5 262,326
MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.
19) The following data represent the bacteria population in a laboratory experiment. The researchers suspect that
the population is growing exponentially. Find the least-squares regression line of the transformed data by
determining the logarithm of the y-values so that Y = log y.
Day, x Population, y
0 1952
1 5319
2 14410
3 39279
4 106845
5 290613
A) Y^ = 3.291 + 0.435x B) Y^ = 3.458 + 1.151x
C) Y^ = -7.537 + 2.301x D) Y^ = -50,222.143 + 50,650.057x
4 Use logarithmic transformations to linearize power relations.
SHORT ANSWER. Write the word or phrase that best completes each statement or answers the question.
Provide an appropriate response.
20) The following data represent the periods (in seconds) of simple pendulums of various lengths (in feet).
Determine the logarithm of both the x- and y-values so that X = log x and Y = log y.
Length, x Period, y
1 1.1
2 1.6
3 1.9
4 2.2
5 2.5
6 2.7
MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.
21) The following data represent the periods (in seconds) of simple pendulums of various lengths (in feet). Find the
power equation of best fit.
Length, x Period, y
1 1.3
2 1.9
3 2.3
4 2.7
5 3.0
6 3.3
A) y^ = 0.1171 + 0.5176x B) y^ = -0.2258 + 1.9309x
C) y^ = 0.0948 + 0.0768x D) y^ = 1.0467 + 0.3914x
22) The following data represent the infection rate for a particular disease in a third world country x years after a
vaccine became widely available. Determine the ʺbestʺ model to describe the relationship between years past
and infection rate.
Years After Vaccine, x Infection Rate (%), y
1 32
2 11
3 5
4 2
5 0.5
A) exponential model: y^ = 1.955 - 0.435x B) power model: y^ = 1.648 - 2.402x
C) linear model: y^ = 31.7 - 7.2x D) linear model: y^ = 31.7 + 7.2x
23) The following data represent the compound yield in grams for a chemical reaction for various temperatures of
the reaction. Determine the ʺbestʺ model to describe the relationship between temperature and compound
yield.
Temperature (° C), x Compound Yield (grams), y
50 4
60 12
70 14
80 21
90 26
A) linear model: y^ = -21.7 + 0.53x B) exponential model: y^ = -0.195 + 0.019x
C) power model: y^ = -4.392 + 2.999x D) power model: y^ = 4.392 + 2.999x
24) The following data represent the height (relative to the ground) of a projectile shot into the air after x seconds
of travel. Determine the ʺbestʺ model to describe the relationship between travel time and height.
Travel Time (seconds), x Height (feet), y
8 890
9 869
10 798
11 708
12 571
A) power model: y^ = 3.930 - 1.056x B) exponential model: y^ = 3.354 - 0.047x
C) linear model: y^ = 1566.2 - 79.9x D) exponential model: y^ = 3.354 + 0.047x
Ch. 4 Describing the Relation between Two Variables
Answer Key
4.1 Scatter Diagrams and Correlation
1 Draw and interpret scatter diagrams.
1) [Scatter diagram: Exam Scores (y) versus Hours Studied (x)]
2) [Scatter diagram: Number of Absences (y) versus Temperature (x)]
3) [Scatter diagram: Blood Pressure in mm of mercury (y) versus Age (x)]
4) [Scatter diagram: Final Grade (y) versus Number of Absences (x)]
5) [Scatter diagram: Sales in thousands (y) versus Miles traveled in hundreds (x)]
6) [Scatter diagram: Final Grade (y) versus Number of years studied (x)]
7) [Scatter diagram: Yield in bushels (y) versus Rainfall in inches (x)]
8) [Scatter diagram: Nicotine in mg (y) versus Tar in mg (x)]
9) [Scatter diagram: Round 2 (y) versus Round 1 (x)]
10) A
11) A
12) A
13) predictor variable: rainfall in inches; response variable: yield per acre
14) predictor variable: hours studying; response variable: grades on the test
15) A
2 Describe the properties of the linear correlation coefficient.
16) A
17) A
18) A
19) A
20) A
21) A
22) A
23) A
24) There appears to be a positive linear correlation.
[Scatter diagram]
25) There appears to be a negative linear correlation.
[Scatter diagram]
26) There appears to be no linear correlation.
[Scatter diagram]
27) In general, there appears to be a relationship between the home runs and batting averages. As the number of home
runs increased, the batting averages increased.
[Scatter diagram: Batting Average (y) versus Home Runs (x)]
28) There appears to be a trend in the data. As the number of absences increases, the final grade decreases.
[Scatter diagram: Final Grade in % (y) versus Number of Absences (x)]
29) A
30) A
31) A
32) A
33) A
34) A
35) A
3 Compute and interpret the linear correlation coefficient.
36) A
37) A
38) A
39) A
40) A
41) A
42) A
43) A
44) A
45) A
46) The linear correlation coefficient remains unchanged.
4 Determine whether a linear relation exists between two variables.
47) A
48) A
49) A
50) A
51) A
52) A
53) A
54) A
55) A
56) A
5 Explain the difference between correlation and causation.
57) A
58) A positive correlation exists between the number of schools in a city and the number of robberies but this is an
example of correlation not causation. Building more schools is unlikely to lead to an increase in robberies. A likely
lurking variable is the population of the city and this lurking variable accounts for the positive correlation. Larger
cities tend to have both more schools and more robberies.
59) A negative correlation exists between the infant mortality rate and the number of cars per capita but this is an example
of correlation not causation. If people buy more cars, this is unlikely to lead to a decrease in the infant mortality rate.
A likely lurking variable is wealth and this lurking variable accounts for the negative correlation. More affluent
countries tend to have both more cars per capita and lower infant mortality rates.
60) A positive correlation exists between income and blood pressure but this is an example of correlation not causation.
An increase in salary is unlikely to lead to an increase in blood pressure. Age and level of job stress are possible
lurking variables and these lurking variables account for the positive correlation. Older men tend to have both higher
blood pressures and higher incomes. Also men in high stress jobs tend to have both higher blood pressures and higher
incomes.
4.2 Least-Squares Regression
1 Find the least-squares regression line and use the line to make predictions.
1) A
2) A
3) A
4) A
5) A
6) A
7) A
8) A
9) A
10) A
11) A
12) A
13) A
14) The regression lines are not necessarily the same.
15) SSxx = Σx^2 - (Σx)^2/n = 0.948622 - (3.642)^2/14 = 0.00118171
SSxy = Σxy - (Σx)(Σy)/n = 295.54 - (3.642)(1,134)/14 = 0.538
ȳ = Σy/n = 1,134/14 = 81
x̄ = Σx/n = 3.642/14 = 0.26014
β^1 = SSxy/SSxx = 0.538/0.00118171 = 455.27
β^0 = ȳ - β^1 x̄ = 81 - 455.27(0.26014) = -37.434
The least squares equation is y^ = -37.434 + 455.27x.
16) β^1 = SSxy/SSxx = 3031.7125/1222.2771 = 2.4804
β^0 = ȳ - β^1 x̄ = 95.0625 - 2.4804(21.2675) = 42.3106
The least squares prediction equation is y^ = 42.3106 + 2.4804x.
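The slope and intercept in answers 15 and 16 come from the same two formulas, slope = SSxy/SSxx and intercept = (mean of y) - slope × (mean of x). The sketch below is only an illustration of that arithmetic using the summary values quoted in answer 16; the function name `least_squares_from_summaries` is a hypothetical helper, not part of the test bank.

```python
def least_squares_from_summaries(ss_xy, ss_xx, x_bar, y_bar):
    """Return (intercept, slope) of the least-squares line from summary statistics."""
    slope = ss_xy / ss_xx                # slope = SSxy / SSxx
    intercept = y_bar - slope * x_bar    # intercept = y-bar - slope * x-bar
    return intercept, slope


# Summary values quoted in answer 16: SSxy, SSxx, mean of x, mean of y.
b0, b1 = least_squares_from_summaries(3031.7125, 1222.2771, 21.2675, 95.0625)
print(f"y^ = {b0:.4f} + {b1:.4f}x")
```

Because this sketch carries the unrounded slope into the intercept, the intercept can differ from the key in the last decimal places; the key rounds the slope to 2.4804 before computing the intercept.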
17) A
18) A
19) A
2 Interpret the slope and the y-intercept of the least-squares regression line.
20) A
21) A
22) A
23) A
24) A
25) A
26) A
27) A
28) A
29) A
30) b1 = -0.08365. For every 1 mile per hour increase in the maximum attained speed of a new car, we estimate the
elapsed 0 to 60 acceleration time to decrease by .08365 seconds.
31) a. E(y) = β0 + β1x
b. y^ = β^0 + β^1x = 170.24 - 0.8195x
c. We would expect approximately 170 grunts after feeding a warthog that was just born. However, since the value 0
is outside the range of the original data set, this estimate is highly unreliable.
d. For each additional day, we estimate the number of grunts will decrease by 0.8195.
32) A
33) A
34) A
35) A
36) A
37) A
38) A
39) A
40) A
41) A
42) A
43) a) [Scatter diagram: Calculus II (y) versus Calculus I (x)]
b) y^= 1.044x - 5.990
c) When x = 80, y^ = 78.
44) A
45) A
46) A
3 Compute the sum of squared residuals.
47) A
48) A
49) A
50) A
51) A
52) A
53) A
54) A
55) A
56) A
57) A
58) A
59) A
60) A
61) A
62) A
4.3 Diagnostics on the Least-Squares Regression Line
1 Compute and interpret the coefficient of determination.
1) A
2) A
3) A
4) A
5) A
6) The coefficient of determination is R2 = 0.701. That is, 70.1% of the variation is explained and 29.9% of the variation is
unexplained.
7) The coefficient of determination is R2 = 0.391. That is, 39.1% of the variation is explained and 60.9% of the variation is
unexplained.
8) The coefficient of determination is R2 = 1. That is, 100% of the variation is explained and none of the variation is
unexplained.
9) R2 = 0.627; Approximately 62.7% of the variation in the number of grunts is explained by age.
10) A
11) A
12) A
13) A
14) A
15) A
16) R2 = 92.96%. That is, 92.96% of the variation in the sample monthly sales values about their mean can be explained by using months
on the job in a linear model.
17) R2 = 0.5315
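For a simple linear regression, the coefficient of determination can be computed directly from the three sums of squares as R2 = SSxy^2 / (SSxx × SSyy), which is also the square of the linear correlation coefficient. The following is a minimal sketch of that calculation with the summary statistics given in questions 16 and 17; the function name `r_squared` is an illustrative choice.

```python
def r_squared(ss_xy, ss_xx, ss_yy):
    """Coefficient of determination for a simple linear regression."""
    return ss_xy ** 2 / (ss_xx * ss_yy)


# Question 17 (potato yield): SSxy = 25, SSxx = 10.5, SSyy = 112.
r2_potatoes = r_squared(25, 10.5, 112)

# Question 16 (monthly sales): SSxy = 124.7571, SSxx = 94.8571, SSyy = 176.5171.
r2_sales = r_squared(124.7571, 94.8571, 176.5171)

print(f"Question 17: R2 = {r2_potatoes:.4f}")
print(f"Question 16: R2 = {r2_sales:.2%}")
```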
18) A
19) The model explains 83% of the sample variation in selling price.
20) A
21) A
22) A
2 Perform residual analysis on a regression model.
23) A
24) A
25) A
26) A
27) A
28) A
3 Identify influential observations.
29) A
30) A
31) A
32) A
4.4 Contingency Tables and Association
1 Compute the marginal distribution of a variable.
1) A
2) A
3) A
4) A
5) A
6) A
7) A
8) A
9) A
2 Use the conditional distribution to identify association among categorical data.
10) The proportion who prefers country music increases as age increases. The proportion who prefers rock is roughly
constant as age increases. The proportion who prefers pop decreases as age increases. The proportion who prefers
classical increases as age increases.
11) The proportion of home owners increases as the household income increases. The proportions of renters and of those
living with family decrease as household income increases.
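The pattern described in answer 11 can be checked by building the conditional distribution of living situation within each income bracket, that is, dividing each cell by its column total. The sketch below uses the counts from the newlywed table in question 11; the variable names are illustrative.

```python
incomes = ["<$20,000", "$20-35,000", "$35-50,000", "$50-75,000", ">$75,000"]
counts = {
    "Own home":      [31, 52, 202, 355, 524],
    "Rent home":     [67, 66, 52, 23, 11],
    "Live w/family": [89, 69, 30, 4, 2],
}

# Marginal (column) totals: number of newlyweds in each income bracket.
column_totals = [sum(column) for column in zip(*counts.values())]

# Conditional distribution of living situation given income bracket.
conditional = {
    situation: [n / total for n, total in zip(row, column_totals)]
    for situation, row in counts.items()
}

print(" " * 14, "  ".join(f"{label:>10}" for label in incomes))
for situation, proportions in conditional.items():
    print(f"{situation:<14}", "  ".join(f"{p:>10.3f}" for p in proportions))
```

Each column of the printed table sums to 1, so the rows can be compared across income brackets even though the brackets contain different numbers of people.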
3 Explain Simpsonʹs Paradox.
12) (a) Treatment A appears to be more effective:
treatment A was effective in 76.4% of cases
treatment B was effective in 75.7% of cases.
(b) Mild cases: treatment A was effective in 79.5% of cases
treatment B was effective in 82.6% of cases.
(c) Advanced cases: treatment A was effective in 68.8% of cases
treatment B was effective in 73.9% of cases.
(d) Within each category of disease, B has a higher rate of effectiveness and yet overall it has a lower rate of
effectiveness.
The initial analysis failed to take into account the lurking variable - seriousness of the case.
Treatment B is used more often in the more serious cases where success rates are lower for both methods.
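The reversal in answer 12 can be reproduced with a few lines of arithmetic. This sketch computes the effectiveness rate of each treatment within each severity group and then for the pooled data, using the counts from question 12; the helper name `effective_rate` and the dictionary layout are illustrative assumptions.

```python
def effective_rate(effective, not_effective):
    """Proportion of cases in which the treatment was effective."""
    return effective / (effective + not_effective)


# (effective, not effective) counts from the second table in question 12.
counts = {
    "mild":     {"A": (310, 80), "B": (95, 20)},
    "advanced": {"A": (110, 50), "B": (340, 120)},
}

rates = {
    severity: {t: effective_rate(*c) for t, c in by_treatment.items()}
    for severity, by_treatment in counts.items()
}

# Pooling the two severity groups recovers the first (overall) table.
rates["overall"] = {
    t: effective_rate(
        sum(counts[s][t][0] for s in counts),
        sum(counts[s][t][1] for s in counts),
    )
    for t in ("A", "B")
}

for group, by_treatment in rates.items():
    print(group, {t: f"{r:.1%}" for t, r in by_treatment.items()})
```

Treatment B has the higher rate in both severity groups, yet treatment A has the higher pooled rate, because most of the treatment B cases are advanced cases, where both treatments succeed less often.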
13) (a) The acceptance rate appears to be higher for those not belonging to minority groups:
13.2% of applications from people belonging to minority groups were accepted
13.6% of applications from people not belonging to minority groups were accepted
(b) Department A: minority applications: 9.4% accepted
non-minority applications: 8.3% accepted
(c) Department B: minority applications: 21.6% accepted
non-minority applications: 18.5% accepted
Department C: minority applications: 14.9% accepted
non-minority applications: 12.7% accepted.
(d) Within each department, the rate of acceptance is higher for people belonging to minority groups and yet overall
the acceptance rate is higher for people not belonging to minority groups.
The initial analysis failed to take into account the lurking variable - the department of the company. In department A,
there are more applications from minorities and this department has the lowest acceptance rates. Department B has
the fewest applications from minorities and this department has the highest acceptance rates.
4.5 Nonlinear Regression: Transformations (online)
1 Convert between exponential and logarithmic expressions.
1) A
2) A
3) A
4) A
5) A
6) A
7) A
8) A
9) A
2 Simplify logarithmic expressions.
10) A
11) A
12) A
13) A
14) A
15) A
16) A
17) A
3 Use logarithmic transformations to linearize exponential relations.
18)
Day, x Population, y Log y
0 1762 3.246
1 4803 3.6815
2 13,006 4.1141
3 35,456 4.5497
4 96,445 4.9843
5 262,326 5.4188
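The transformed column in answer 18 is the common (base 10) logarithm of each population value. As a rough check, the sketch below recomputes Y = log y and then looks at the day-to-day differences in Y; nearly equal differences are what one would expect if the growth is exponential. The variable names are illustrative.

```python
import math

days = [0, 1, 2, 3, 4, 5]
population = [1762, 4803, 13006, 35456, 96445, 262326]

# Transform the response: Y = log10(y).
log_population = [math.log10(y) for y in population]

for x, y, big_y in zip(days, population, log_population):
    print(f"{x}  {y:>7}  {big_y:.4f}")

# If y grows exponentially, Y rises by a roughly constant amount per day.
differences = [b - a for a, b in zip(log_population, log_population[1:])]
print("Daily change in log y:", [round(d, 4) for d in differences])
```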
19) A
4 Use logarithmic transformations to linearize power relations.
20)
Length, x Log x Period, y Log y
1 0 1.1 0.0414
2 0.3010 1.6 0.2041
3 0.4771 1.9 0.2788
4 0.6021 2.2 0.3424
5 0.6990 2.5 0.3979
6 0.7782 2.7 0.4314
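Once both variables are logged, a power relation y = a·x^b becomes the straight line Y = log a + bX, so an ordinary least-squares fit on (X, Y) estimates the exponent b as the slope. The sketch below applies that idea to the pendulum data from question 20; it is an illustration under the assumption of base 10 logarithms, not the test bank's own worked solution.

```python
import math

lengths = [1, 2, 3, 4, 5, 6]               # x, in feet
periods = [1.1, 1.6, 1.9, 2.2, 2.5, 2.7]   # y, in seconds

# Transform both variables: X = log10(x), Y = log10(y).
X = [math.log10(v) for v in lengths]
Y = [math.log10(v) for v in periods]

n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n
ss_xx = sum((x - x_bar) ** 2 for x in X)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))

slope = ss_xy / ss_xx              # estimate of the exponent b
intercept = y_bar - slope * x_bar  # estimate of log10(a)
coefficient = 10 ** intercept      # back-transform to get a

print(f"Y^ = {intercept:.4f} + {slope:.4f}X")
print(f"y^ = {coefficient:.4f} * x^{slope:.4f}")
```

A slope near 0.5 is consistent with the physical relation for a simple pendulum, where the period is proportional to the square root of the length.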
21) A
22) A
23) A
24) A