Wednesday, August 11 (131 minutes)stevewillott.com/19-20 ap stats notes in word/3 and... · Web...


Name _____________________________

Chapter 3 Learning Objectives

Learning objective | Section | Related example on page(s) | Relevant chapter review exercise(s) | Can I do this?
Identify explanatory and response variables in situations where one variable helps to explain or influences the other. | 3.1 | 144 | R3.4 |
Make a scatterplot to display the relationship between two quantitative variables. | 3.1 | 145, 148 | R3.4 |
Describe the direction, form, and strength of a relationship displayed in a scatterplot and recognize outliers in a scatterplot. | 3.1 | 147, 148 | R3.1 |
Interpret the correlation. | 3.1 | 152 | R3.3, R3.4 |
Understand the basic properties of correlation, including how the correlation is influenced by outliers. | 3.1 | 152, 156, 157 | R3.1, R3.2 |
Use technology to calculate correlation. | 3.1 | Activity on 152, 171 | R3.4 |
Explain why association does not imply causation. | 3.1 | Discussion on 156, 190 | R3.6 |
Interpret the slope and y intercept of a least-squares regression line. | 3.2 | 166 | R3.2, R3.4 |
Use the least-squares regression line to predict y for a given x. Explain the dangers of extrapolation. | 3.2 | 167, Discussion on 168 | R3.2, R3.4, R3.5 |
Calculate and interpret residuals. | 3.2 | 169 | R3.3, R3.4 |
Explain the concept of least squares. | 3.2 | Discussion on 169 | R3.5 |
Determine the equation of a least-squares regression line using technology or computer output. | 3.2 | Technology Corner on 171, 181 | R3.3, R3.4 |
Construct and interpret residual plots to assess whether a linear model is appropriate. | 3.2 | Discussion on 175, 180 | R3.3, R3.4 |
Interpret the standard deviation of the residuals and r² and use these values to assess how well the least-squares regression line models the relationship between two variables. | 3.2 | 180 | R3.3, R3.5 |
Describe how the slope, y intercept, standard deviation of the residuals, and r² are influenced by outliers. | 3.2 | Discussion on 188 | R3.1 |
Find the slope and y intercept of the least-squares regression line from the means and standard deviations of x and y and their correlation. | 3.2 | 183 | R3.5 |


Chapter 12 Learning Objectives

Learning objective | Section | Related example on page(s) | Relevant chapter review exercise(s) | Can I do this?
Use transformations involving powers and roots to find a power model that describes the relationship between two variables, and use the model to make predictions. | 12.2 | 768, 770 | R12.5 |
Use transformations involving logarithms to find a power model or an exponential model that describes the relationship between two variables, and use the model to make predictions. | 12.2 | 772, 773, 776 | R12.6 |
Determine which of several transformations does a better job of producing a linear relationship. | 12.2 | 779 | R12.6 |


3.1 Describing Relationships

So far, we’ve mostly explored univariate data, examining one variable at a time. When we looked at two-way tables and mosaic plots, we were looking at bivariate data, examining two variables measured on each individual, often to look for a relationship between the two variables. In this next unit, we look at (mostly) quantitative bivariate data and look for ways to describe any relationship that we find, often by modeling the data with a linear equation, often called a “line of best fit”. The particular method we will use to find our line of best fit is called the “Least Squares” method and our linear equation will be called a “Least Squares Regression Line” or LSRL.

Read 143–144 What is the difference between an explanatory variable and a response variable?

Read 145–149 On scatterplots, the explanatory variable goes on the horizontal axis and we don’t necessarily start each axis at (0,0). Start each axis near the data.

What is the easiest way to lose points when making a scatterplot? (xkcd.com/833)

Track and Field Day! The table below shows data for 13 students in a statistics class. Each member of the class ran a 40-yard sprint and then did a long jump (with a running start). Make a scatterplot of the relationship between sprint time (in seconds) and long jump distance (in inches).

Sprint Time (s) | Long Jump Distance (in)
5.41 | 171
5.05 | 184
9.49 | 48
8.09 | 151
7.01 | 90
7.17 | 65
6.83 | 94
6.73 | 78
8.01 | 71
5.68 | 130
5.78 | 173
6.31 | 143
6.04 | 141


Four characteristics you should consider when interpreting a scatterplot: “DUFS”

Always:

Form – linear or nonlinear (don’t let 1 or 2 points change your answer)

Direction – positive or negative (not absolute; some pair of points may not follow the trend)

Strength – very weak, weak, moderate, strong, very strong, perfect (how scattered?)

Sometimes:

Unusual stuff – outliers (points far, vertically, from the LSRL)

- clusters (might have different direction than the whole data set)

Always in context: include variable names, not merely measurement units!


The following scatterplot shows the lengths (in 1000s of feet) and the speeds (in miles/hour) for the 78 roller coasters I rode from Oct. 2007 to Aug. 2009. Describe the relationship between length and speed.

Caution! A strong association between two variables does not automatically indicate a cause-and-effect relationship! Correlation is NOT causation!

Examples:

$ of ice cream sales/week in South Carolina vs. # of shark attacks each week in South Carolina

# of kids with cell phones vs. # kids who pass AP® Stats exam

monthly sunblock sales vs. # of shark attacks monthly

Ice cream sales vs. drowning deaths

hand span vs. vocabulary size


Read 150: Using technology to create scatterplots

HW #26: page 159 (1, 5, 7, 9, 11, 13, 34)

3.1 Correlation

Just like two distributions can have the same shape and center with different spreads, two associations can have the same direction and form, but very different strengths. These are both linear and positive, but which one is stronger?

Read 150–151

What is the correlation r? It is a numerical measure of direction and strength of linear patterns.

Linear? Use r. Not linear? Don’t use r.

Draw an oval around the points. A near circle indicates nearly a 0 correlation (weak). Long skinny ovals indicate correlation closer to +1 or -1 (strong).

Correlation won't tell you the form of a relationship. Graph first, then you might be able to use correlation to describe strength and direction.

Correlation, r, is NOT a resistant measure of strength; outliers will affect it.

The formula for correlation is on the formula packet, so don't memorize it.
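In class you would use a calculator for this, but a short sketch can show what the technology is doing. The example below computes r for the Track and Field Day data as the average product of z-scores, r = (1/(n − 1)) Σ z_x z_y. The function names are made up for this illustration.

```python
from math import sqrt

# Track and Field Day data from earlier in these notes (13 students).
sprint = [5.41, 5.05, 9.49, 8.09, 7.01, 7.17, 6.83, 6.73, 8.01, 5.68, 5.78, 6.31, 6.04]
jump = [171, 184, 48, 151, 90, 65, 94, 78, 71, 130, 173, 143, 141]

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    # Sample standard deviation (divides by n - 1).
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def correlation(x, y):
    # r = (1 / (n - 1)) * sum of (z-score of x) * (z-score of y)
    mx, my = mean(x), mean(y)
    sx, sy = sample_sd(x), sample_sd(y)
    total = sum(((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y))
    return total / (len(x) - 1)

r = correlation(sprint, jump)
print(f"r = {r:.3f}")
```

The result comes out near −0.75: a negative, moderately strong linear association between sprint time and long jump distance. Remember to graph first; r alone says nothing about form.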


Read 154–157

Read 154-155

Eleven important facts about correlation, r. KNOW THESE!!!!

1. Switching x & y won't change r.
2. x and y must be quantitative (if either variable isn't quantitative, use "association").
3. Units don't affect r & r has no units.
4. Correlation does not imply causation.
5. The sign of r tells direction.
6. -1 ≤ r ≤ 1
7. If the scatterplot is linear, then r near ±1 means strong, near 0 means weak.
8. r measures linear relationships only.
9. Correlation is not resistant; outliers will affect r.
10. r won't tell you whether something is linear.
11. r doesn't tell the whole story; identify means and standard deviations for both x and y.
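Facts 1, 3, and 6 can be checked numerically. The sketch below reuses the Track and Field Day data, swaps x and y, and converts the jump distances from inches to centimeters (the conversion is only there to illustrate a change of units). The helper name is an assumption made for this example.

```python
from math import sqrt, isclose

sprint_s = [5.41, 5.05, 9.49, 8.09, 7.01, 7.17, 6.83, 6.73, 8.01, 5.68, 5.78, 6.31, 6.04]
jump_in = [171, 184, 48, 151, 90, 65, 94, 78, 71, 130, 173, 143, 141]

def correlation(x, y):
    # Equivalent to the z-score formula: r = Sxy / sqrt(Sxx * Syy)
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

r = correlation(sprint_s, jump_in)
r_swapped = correlation(jump_in, sprint_s)      # fact 1: switch x and y
jump_cm = [d * 2.54 for d in jump_in]           # fact 3: change units
r_in_cm = correlation(sprint_s, jump_cm)

print(isclose(r, r_swapped), isclose(r, r_in_cm), -1 <= r <= 1)
```

All three checks print True: r is unchanged by swapping the variables or rescaling the units, and it stays between −1 and 1.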

HW #27: page 161 (15–18, 21, 27–32)

3.2 Least-Squares Regression

Read 164–168

The general form of a regression equation: ŷ = a + bx

The difference between y and ŷ:

y = an observed value, a real data value

ŷ = a predicted value, a value from the equation

To interpret slope, you must address the ideas of: rate, units, prediction, & average.

To interpret the intercept, you must address the ideas of: units, prediction when x is 0 (in context), & average.

To predict a y for some x, you must address the ideas of: units, prediction when x is a # (in context) & average.

The a and b in the equation are estimates of the parameters of our true (population) line.

Used Ford Escapes

The following scatterplot shows the number of miles driven (in thousands) and advertised price (in thousands) for 11 used Ford Escapes from the 2012-2014 model years. The regression line shown on the scatterplot is ŷ = 24600 – 0.114x.

Miles Driven | Price
6000 | 26998
8000 | 20998
12000 | 19599
15000 | 24999
19000 | 25998
24000 | 19599
44000 | 19998
45000 | 18599
47000 | 21599
57000 | 15599
71000 | 17599

a) Interpret the slope and y intercept of the regression line.

b) Predict the price of a used Escape with 50,000 miles.

c) Predict the price of a used Escape with 250,000 miles. How confident are you in this prediction?

Extrapolation can be a bad thing:

Caution! Don’t call x values outside the original data set “outliers”! They are x-values (in context) “that are outside the original interval of data used to create the equation of our line”.
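To see why extrapolation can go wrong, here is a minimal sketch that plugs both mileages into the Ford Escape line ŷ = 24600 – 0.114x, taking x in miles and ŷ in dollars as in the data table. The function name and the range check are illustrative choices, not part of the original notes.

```python
def predicted_price(miles):
    # Predicted asking price in dollars from y-hat = 24600 - 0.114x.
    return 24600 - 0.114 * miles

# Smallest and largest mileage among the 11 Escapes in the data table.
lowest_miles, highest_miles = 6_000, 71_000

for miles in (50_000, 250_000):
    inside = lowest_miles <= miles <= highest_miles
    label = "within the data" if inside else "EXTRAPOLATION"
    print(f"{miles} miles: predicted price ${predicted_price(miles):,.0f} ({label})")
```

The 50,000-mile prediction is about $18,900. The 250,000-mile prediction is negative, which cannot be a real asking price: the linear pattern was only observed between 6,000 and 71,000 miles.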

Using the Track and Field data from earlier, the equation of the least-squares regression line is ŷ = 305 – 27.6x where y = long jump distance and x = sprint time.

a) Interpret the slope.

b) Does it make sense to interpret the y-intercept? Explain.
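For anyone curious where 305 and 27.6 come from, the sketch below fits the least-squares line to the Track and Field Day data using the standard formulas b = Sxy / Sxx and a = ȳ − b·x̄. The function name is an assumption for this example; a calculator's linear regression command does the same job.

```python
sprint = [5.41, 5.05, 9.49, 8.09, 7.01, 7.17, 6.83, 6.73, 8.01, 5.68, 5.78, 6.31, 6.04]
jump = [171, 184, 48, 151, 90, 65, 94, 78, 71, 130, 173, 143, 141]

def least_squares_line(x, y):
    # Return (a, b) for y-hat = a + b*x, the line that minimizes
    # the sum of squared residuals.
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

a, b = least_squares_line(sprint, jump)
print(f"intercept a = {a:.2f} inches, slope b = {b:.2f} inches per second")
```

The fitted values agree with the rounded equation above. Note that x = 0 (a sprint time of 0 seconds) is far outside the data, which matters for part (b).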

HW #28 page 193 (35, 37, 39, 41, 45)

3.2 Residuals

Read 168–172

What is a residual?

residual = Actual y - Predicted y (AP)

Interpreting a residual: The residual is how far the actual y is above or below what we predict. Positive residuals mean points are above the line.

Negative residuals mean points are below the line. Remember! Points with large residuals are called outliers.

[Figure: scatterplot of long jump distance (inches) vs. sprint time (seconds) with the least-squares regression line LongJump = (-27.6 in/s)Sprint + 305 in.]

Calculate and interpret the residual for the Ford Escape with 57,000 miles and an asking price of $15,599.
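A minimal sketch of the residual calculation, using residual = actual y − predicted y and the Ford Escape line ŷ = 24600 – 0.114x (x in miles, ŷ in dollars). The variable names are illustrative.

```python
def predicted_price(miles):
    # y-hat = 24600 - 0.114x, in dollars.
    return 24600 - 0.114 * miles

actual_price = 15_599                       # asking price of this Escape
predicted = predicted_price(57_000)         # what the line predicts at 57,000 miles
residual = actual_price - predicted         # actual minus predicted

print(f"predicted = ${predicted:,.0f}, residual = ${residual:,.0f}")
```

The residual is about −$2,503: this Escape's asking price is roughly $2,503 less than the line predicts for a car with 57,000 miles, so its point sits below the line.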

How can we determine the “best” regression line for a set of data?

Example: 12 of Taco Bell’s chicken menu items:

(a) Calculate the equation of the least-squares regression line using technology. Make sure to define variables! Sketch the scatterplot with the graph of the least-squares regression line.

(b) Interpret “the estimates of the parameters of the regression line”. That means interpret the slope and y-intercept (yes, of course, in context).

(c) Calculate and interpret the residual for the first item listed, the Chicken Burrito Supreme, with 12g of fat and 50g of carbs.


Read 172-178

To know a line is the right model to use, we can examine a residual plot.

What is a residual plot? A scatterplot is a graph of the explanatory variable (x) and the response variable (y). A residual plot is a graph of the explanatory variable (x) and the residuals.

What is the purpose of a residual plot? A residual plot can reveal patterns that were not apparent (not obvious) in the original scatterplot.

What would we see in a residual plot that tells us that our linear model is not a good one?

Bad: A curved pattern, meaning the relationship between x & y is not linear. A line is a bad model to choose.

Bad: Changing vertical spread, meaning predictions will be less accurate for some values of x.

What would we see in a residual plot that tells us that our linear model is appropriate?

Good: A uniform, even, random scatter, meaning any errors in our predictions will be random and about equal for any value of x.

Construct and interpret a residual plot for the Ford Escape data.

HW #29: page 193 (43, 47, 49, 51)

3.2 Standard deviation of the residuals and r²

Read page 177

What is the standard deviation of the residuals? How do you calculate and interpret it?

The standard deviation of the residuals for the Ford Escape data is 971.65 (approximately). Interpret this value.

Suppose that you see a used Ford Escape for sale. Predict the asking price for this Escape, using the average asking price.


Would our predictions be better if we used miles driven to help estimate the sale price?

Read 178-180

What is the coefficient of determination r²? How do you calculate it?

r² is a numerical measure of how much better our predictions are by using the regression equation instead of just using the average of the y values. r² is called the coefficient of determination.

https://www.desmos.com/calculator/zvrc4lg3cr

How do you interpret r²? ___% of the variation in ____ (put the response variable there) is accounted for by the LSRL relating ____ (put the response variable there) to ____ (put the explanatory variable there).

How is r² related to r? How is r² related to s?

If r² = 0.49, then r = ±0.7. To know whether r is +0.7 or -0.7, we will need more information.

r² and s both tell about how well the line fits our data; neither tells about form.

As r² → 1, s → 0.

r² has no units because r has no units; s has the same units as the response variable.
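As a sketch of how s and r² are computed, the example below uses the Track and Field Day data. It assumes the usual formulas s = √(Σ residuals² / (n − 2)) and r² = 1 − (Σ residuals²) / (Σ (y − ȳ)²), so r² compares prediction errors from the line with errors from just using the mean of y. Variable names are illustrative.

```python
from math import sqrt

sprint = [5.41, 5.05, 9.49, 8.09, 7.01, 7.17, 6.83, 6.73, 8.01, 5.68, 5.78, 6.31, 6.04]
jump = [171, 184, 48, 151, 90, 65, 94, 78, 71, 130, 173, 143, 141]

n = len(sprint)
x_bar, y_bar = sum(sprint) / n, sum(jump) / n

# Least-squares slope and intercept.
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(sprint, jump))
     / sum((x - x_bar) ** 2 for x in sprint))
a = y_bar - b * x_bar

# Residual = actual y - predicted y.
residuals = [y - (a + b * x) for x, y in zip(sprint, jump)]

sse = sum(e ** 2 for e in residuals)          # error using the line
sst = sum((y - y_bar) ** 2 for y in jump)     # error using only the mean of y

s = sqrt(sse / (n - 2))                       # standard deviation of the residuals
r_squared = 1 - sse / sst                     # coefficient of determination

print(f"s = {s:.1f} inches, r^2 = {r_squared:.2f}")
```

Here s is in inches (the units of the response variable) while r² has no units. Taking the square root of r² gives the size of r, but the negative sign has to come from the direction of the association.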

HW #30: page 193 (55, 57)


Some Unusual Points in Regression: Outliers, High-Leverage Points, and Influential Points

(newly emphasized information; some are not mentioned in the book or are differently defined)

Points can affect the correlation r, the equation of the least-squares regression line, the coefficient of determination r², and the standard deviation of the residuals s. There are 3 cases (in each scatterplot, the 8 points closest to (0,0) are the same), based on where the additional 9th point is located.

Compare each of these cases to the original data:

 | Case 1 | Case 2 | Case 3
slope of LSRL | | |
intercept of LSRL | | |
correlation | | |
coeff. of determination | | |
std. dev. of residuals | | |

In Case 1 and Case 2, the x-coordinate of the unusual point is much larger than the x-coordinates of the other points. Points with x values that are much smaller or much larger compared to the other points in a scatterplot have high leverage. In Case 3, the unusual point has a very large residual. Points with large residuals are called outliers. All three of these unusual points are considered to be influential points; adding them to (or removing them from) the scatterplot substantially changes a, b, r, r², and/or s.


DEFINITIONS:

Points with high leverage in regression have much larger or much smaller x values than the other points in the data set.

An outlier in regression is a point that does not follow the pattern of the data and has a large residual.

An influential point in regression is any point that, if removed, substantially changes the slope, y intercept, correlation, coefficient of determination, or standard deviation of the residuals.

Outliers and high-leverage points are often influential in regression calculations! The best way to investigate the influence of such points is to do regression calculations with and without them to see how much the results differ.
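The "with and without" check can be sketched in a few lines. The example below reuses the Track and Field Day data and drops one student, the one with an 8.09 s sprint and a 151 in jump. That point is chosen here only for illustration because it sits well above the fitted line; the helper name is also an assumption.

```python
sprint = [5.41, 5.05, 9.49, 8.09, 7.01, 7.17, 6.83, 6.73, 8.01, 5.68, 5.78, 6.31, 6.04]
jump = [171, 184, 48, 151, 90, 65, 94, 78, 71, 130, 173, 143, 141]

def least_squares_line(x, y):
    # Return (intercept, slope) of the least-squares regression line.
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - b * x_bar, b

# Fit with all 13 students.
a_all, b_all = least_squares_line(sprint, jump)

# Fit again without the (8.09, 151) student.
kept = [(x, y) for x, y in zip(sprint, jump) if (x, y) != (8.09, 151)]
a_without, b_without = least_squares_line([x for x, _ in kept], [y for _, y in kept])

print(f"all 13 points:     slope = {b_all:.1f}, intercept = {a_all:.0f}")
print(f"without (8.09,151): slope = {b_without:.1f}, intercept = {a_without:.0f}")
```

Removing that one student makes the slope noticeably steeper (more negative), which is the kind of change that marks a point as influential.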

Interpreting Computer Output

Read 181–182

Several factors contribute to the speed of a roller coaster: largest drop, maximum angle of drop, initial speed at the top of the largest hill, etc. Some parks report both height and drop, others only report the height. To investigate the relationship between height and speed for steel sit-down coasters, output from a regression analysis of 39 such American rides is shown below.

(a) What is the equation of the least-squares regression line? Define any variables you use.

(b) Interpret the slope of the least-squares regression line.

(c) What is the correlation?

(d) Is a linear model appropriate for this data? Explain.

(e) Would you be willing to use the linear model to predict the speed of a roller coaster that is 500 feet tall, if it were built? Explain.

(f) Calculate and interpret the residual for Mr. Freeze, which is 218 ft. tall and has a speed of 70 mph.

Predictor   Coef   SE Coef   T      P
Constant    25     1.13      11.7   0.0000
Height      0.24   0.02      12     0.0000

S = 6.54   R-Sq = 86.0%   R-Sq(adj) = 87.2%
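One way to read this output is sketched below: the "Constant" row gives the y intercept, the "Height" row gives the slope, and r is the square root of R-Sq with the same sign as the slope. The variable and function names are illustrative, and the numbers are copied from the output as printed.

```python
from math import sqrt

intercept = 25      # "Constant" coefficient (mph)
slope = 0.24        # "Height" coefficient (mph per foot of height)
r_squared = 0.860   # R-Sq = 86.0%

def predicted_speed(height_ft):
    # y-hat = 25 + 0.24(height), with height in feet and speed in mph.
    return intercept + slope * height_ft

# The slope is positive, so r takes the positive square root.
r = sqrt(r_squared)

# Mr. Freeze: 218 ft tall, actual speed 70 mph.
residual = 70 - predicted_speed(218)

print(f"r = {r:.3f}")
print(f"predicted speed = {predicted_speed(218):.2f} mph, residual = {residual:.2f} mph")
```

With these coefficients the predicted speed for a 218 ft coaster is about 77.3 mph, so Mr. Freeze is about 7.3 mph slower than predicted.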


(g) Interpret the values of r² and s.

(h) If the speed were measured in km/h instead of mph, how would the values of r² and s be affected? Explain.

HW #31 page 193 (59, 61, 1–9 below)

In addition to other factors affecting climate, temperatures generally tend to decrease as one moves north from the equator to the North Pole. Here are average June high temperatures and latitude for 12 cities along the US east coast.

City | Latitude | Avg. June High (°F)
Portland | 43.4 | 73
Boston | 42.3 | 76
Providence | 41.8 | 78
Philadelphia | 40.0 | 83
Baltimore | 39.3 | 85
Washington, DC | 38.9 | 83
Virginia Beach | 36.8 | 84
Wilmington, NC | 34.1 | 87
Savannah | 32.0 | 90
Jacksonville | 30.2 | 90
Daytona | 29.1 | 88
Miami | 25.8 | 89

1. Interpret the scatterplot for this data.

2. Calculate the equation of the least squares regression line.

3. Interpret the slope and y-intercept in the context of the problem.

4. Calculate and interpret the value of the correlation coefficient.

5. If the temperature were measured in Celsius instead of Fahrenheit, how would the correlation change? Explain.

6. Calculate and interpret the residual for Washington, DC.

7. Interpret the residual plot.

8. Identify and interpret the value of s in the context of the problem.

9. Identify and interpret the value of r² in the context of the problem.


Regression Wisdom, etc.

Read 186-191

Does it matter which variable is x and which is y?

Which of the following has the highest correlation?

[Figure: four scatterplots to compare.]

Examine the hypothetical scatterplot of heights & weights of H. S. students (Form, Direction, Strength, Unusual features?), ignoring the 3 labeled points.

Consider each of these men, one at a time, and identify whether any of them are likely an outlier, a high leverage point, and/or influential.

Basketball player Shaquille O'Neal (7'1", 325 lbs.)

Sumo wrestler Kasugao Katsumasa (6'0", 340 lbs.)

Basketball player Manute Bol (7'6", 200 lbs.)

Read 182-185

To calculate the equation of the least-squares regression line using summary statistics, use these 2 facts:

1. Slope: b = r(s_y / s_x)

2. The LSRL passes through the point (x̄, ȳ).

Notice that for each increase of 1 standard deviation in x, the predicted value of y increases by r times the standard deviation of y.

example: Given: Mean for x=3, mean for y=7, std. dev. for x=1.5, std. dev. for y=2.5, and r = .89, what is the equation of the LSRL?
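The two facts above turn into a two-step calculation: find the slope from r and the standard deviations, then solve for the intercept using the point (x̄, ȳ). A minimal sketch with the numbers from this example (variable names are illustrative):

```python
# Summary statistics given in the example.
x_bar, y_bar = 3, 7
s_x, s_y = 1.5, 2.5
r = 0.89

# Fact 1: slope b = r * (s_y / s_x)
b = r * s_y / s_x

# Fact 2: the line passes through (x_bar, y_bar), so y_bar = a + b * x_bar.
a = y_bar - b * x_bar

print(f"y-hat = {a:.2f} + {b:.3f}x")
```

This gives a slope of about 1.483 and an intercept of about 2.55, so the LSRL is approximately ŷ = 2.55 + 1.483x.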

HW #32 page 193 (65, 71–78)

A note on predictions: consider these two interpretations and notice how the second one is better than the first one.

“The model tells us that a 6th grader with a forearm circumference of 12 inches should have a 1 rep max of 50.8 pounds.”

“The average 1 rep max for a 6th grader (from the sampled population) with a forearm circumference of 12 inches is expected to be approximately 50.8 pounds.”


Plot, statistic, or equation | Form? | Direction? | Strength? | Shows relationship? | Use for predicting or estimating? | Other facts
Scatterplot | | | | | |
Correlation r | | | | | |
Coefficient of determination r² | | | | | |
Least Squares Regression Line ŷ = ax + b or ŷ = b0 + b1x | | | | | |
Residual plot | | | | | |


12.2 Transformations to Achieve Linearity

Read 765–771 (focus on fish example primarily)

When a scatterplot does not appear to be linear, we can transform one or both variables to create transformed data sets that may be more linear than the untransformed data.

A few examples of transformations: evaluating logarithms (be mindful of what base is used), squaring, cubing, or other powers, finding the reciprocals, evaluating square roots, cube roots, etc.

Just so you know, when a nonlinear set of bivariate data is transformed to a linear set of bivariate data by...

... taking the log of both variables, the original underlying relationship is a Power Model.

... taking the log of just the response variable (y), the original underlying relationship is an Exponential Model.
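The reason can be seen by undoing the logarithm. As a sketch, assume base-10 logs and a fitted line with intercept a and slope b (these symbols follow the ŷ = a + bx notation used earlier):

```latex
\begin{align*}
\log y = a + b\,\log x
  &\;\Longrightarrow\; y = 10^{\,a + b\log x} = 10^{a}\,x^{b}
  && \text{(power model)} \\
\log y = a + b\,x
  &\;\Longrightarrow\; y = 10^{\,a + bx} = 10^{a}\,\bigl(10^{b}\bigr)^{x}
  && \text{(exponential model)}
\end{align*}
```

With natural logs the same steps work with e in place of 10.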

12.2 Transformations to Achieve Linearity: Power Models

Kepler’s Third Law of Planetary Motion predicts the time it takes a planet to go around the sun, based on its distance from the sun. Here are the data for the 8 planetary objects closest to the sun.

Distance (AU) | Time (years)
0.39 | 0.24
0.72 | 0.62
1 | 1
1.52 | 1.88
5.20 | 11.86
9.54 | 29.46
19.22 | 84.01
30.06 | 164.80

(a) Based on this scatterplot and residual plot, describe why it would not be appropriate to use the least-squares regression line shown.

(b) We take the logarithm (base 10) of each variable to make the association linear. The resulting scatterplot is shown, along with a portion of the computer output from the analysis of the transformed data. The residual plot is also shown. Write the equation of the resulting least-squares regression line.

Predictor     Coef        SE Coef
Constant      -0.000019   0.0012709
LogDistance   1.50005     0.0020336

(c) Predict the time (in years) for the dwarf planet Ceres, located in the asteroid belt, to travel around the sun. Ceres is located 2.7675 AU from the sun.

Read 771–774

(d) The dwarf planet Pluto, located beyond Neptune, has an actual orbital period of 248.00 years. Even though this is extrapolating, use the equation to predict the time (in years) for it to travel around the sun. Pluto is located 39.54 AU from the sun.
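A sketch of parts (c) and (d), assuming the fitted line from the computer output is log(time) = -0.000019 + 1.50005·log(distance) with base-10 logs, as stated in part (b). The prediction has to be converted back from log(years) to years by raising 10 to that power. The function name is illustrative.

```python
from math import log10

# Coefficients from the computer output for the transformed (log-log) data.
constant = -0.000019
slope = 1.50005

def predicted_period_years(distance_au):
    # Predict log10(time), then undo the log to get years.
    log_time = constant + slope * log10(distance_au)
    return 10 ** log_time

ceres = predicted_period_years(2.7675)
pluto = predicted_period_years(39.54)

print(f"Ceres: about {ceres:.2f} years")
print(f"Pluto: about {pluto:.1f} years (actual: 248.00)")
```

The Ceres prediction is about 4.6 years. The Pluto prediction lands within about a year of the actual 248.00 years, even though 39.54 AU is beyond the data used to fit the line.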

HW #45: page 763 (19–24), page 785 (31–37 odd)

12.2 Transformations to Achieve Linearity: Exponential Models

Read 774–782

The table shows the number of Starbucks coffee shops in existence for several years during a period of rapid growth for the company.

Year | Stores
1988 | 33
1989 | 55
1990 | 84
1991 | 116
1992 | 165
1993 | 272
1994 | 425
1995 | 676
1996 | 1015
1997 | 1412

(a) Examine the scatterplot of the Starbucks data. Does the relationship look linear?

(b) This is a scatterplot of log(Stores) vs. year with the LSRL and residual plot as well. It appears that an exponential model would be a good model for this data. A portion of the corresponding computer output is also given. Write the equation of the least-squares regression line.

Predictor   Coef       SE Coef
Constant    -359.70    6.3574897
Year        0.181708   0.0027673

(c) This scatterplot of log(stores) vs. log(year) looks fairly linear. However, look at the residual plot. What is it about this residual plot, that tells us that a power model is not a good fit for this data?

(d) Use the model from part (b) to predict the number of Starbucks locations in 1998.
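A sketch of part (d), assuming the output describes log(stores) = -359.70 + 0.181708·(year) and that "log" means base 10 (the notes do not state the base for this example). The function name is illustrative.

```python
# Coefficients from the computer output for log(Stores) vs. Year.
constant = -359.70
slope = 0.181708

def predicted_stores(year):
    # Predict log10(stores), then undo the log to get a store count.
    log_stores = constant + slope * year
    return 10 ** log_stores

prediction_1998 = predicted_stores(1998)
print(f"Predicted stores in 1998: about {prediction_1998:.0f}")
```

With the coefficients as printed, the prediction comes out in the neighborhood of 2,250 stores. The key step is the last one: the line predicts log(stores), so the answer must be converted back with 10 raised to that value.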


A student rolled some four-sided dice and removed all the ones that landed with the “1” displayed. When he finished, he picked up the dice and repeated the same process over and over until all the dice were removed. Here is a table showing the number of dice remaining at the end of each “roll”.

Roll | Dice remaining
1 | 46
2 | 32
3 | 22
4 | 17
5 | 11
6 | 9
7 | 7
8 | 6
9 | 3
10 | 2
11 | 1
12 | 0

(a) Here’s a scatterplot of this data. Does the relationship look like it follows a linear model or is it better to try a transformation to make it linear?

(b) Three different combinations of transformations are done. Examine each scatterplot and residual plot. On two of these, the last observation in the table is not included since ln(0) is undefined. Which one appears to do the best job of making the relationship linear? Explain.

(c) Computer output from a linear regression analysis on the 11 transformed data points is shown below. This output uses the transformation we just selected. Give the equation of the least-squares regression line defining any variables you use.

Predictor   Coef        SE Coef    T        P
Constant    4.0403      0.04352    92.84    0.000
Roll        -0.301452   0.006413   -47.01   0.000

S = 0.0638   R-Sq = 99.6%   R-Sq(adj) = 97.6%

(d) Use your model from part (c) to predict the original number of dice the student used.
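A sketch of part (d) under two assumptions that the notes leave to the reader: that the selected transformation is ln(dice remaining) vs. roll (the predictor in the output is Roll, and part (b) mentions ln), and that the original number of dice corresponds to roll = 0. The function name is illustrative.

```python
from math import exp

# Coefficients from the computer output (assumed to model ln(dice remaining)).
constant = 4.0403
slope = -0.301452

def predicted_dice(roll):
    # Predict ln(dice remaining), then undo the natural log.
    return exp(constant + slope * roll)

# Roll 0 means before any dice were removed.
starting_dice = predicted_dice(0)
print(f"Predicted starting number of dice: about {starting_dice:.0f}")
```

Under those assumptions the model predicts roughly 57 dice at the start.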

Review Chapter 3 & 12.2

HW #33 page 202 Chapter 3 Review Exercises

Review / FRAPPY!

FRAPPY: 2005 #3

HW #34 page 203 Chapter 3 & corresponding parts of 12.2 AP Statistics Practice Tests

Chapter 3 & 12.2 Test