Chapter 4 Scatterplots and Correlation. Rating Cereal: 0 to 100 0 = unhealthy 100 = very nutritious.

Chapter 4

Scatterplots and Correlation

Rating Cereal: 0 to 100

0 = unhealthy

100 = very nutritious

•How are you choosing your rating?

•Consumer Reports does this kind of rating and publishes the results for consumers.

•CR rated 77 cereals, but their rating formula is not available to the public.

Your Mission

At the end of this lesson you will design a children’s cereal. You want your cereal to receive an above-average Consumer Reports rating but also to taste good. Of course, what tastes good may not be the most nutritious, so this will be a balancing act.

CLUES: For these 77 cereals we have nutritional information on the ingredients we think might influence the ratings, and we have created scatterplots of these ingredients against the rating.

Your job is to answer the questions below AND write down enough of your reasoning that someone can follow your thinking.

1) Pick a scatterplot with a pattern that fits your expectations and a scatterplot with a pattern that surprises you. Explain why the patterns you see are expected or surprising.

2) Pick one ingredient that you think is influential in determining the Consumer Reports rating. Then pick an ingredient that you think is not influential.

3) How do the patterns in the data support your choices in (2)?

Interpreting Scatterplots

4) Captain Crunch has the lowest Consumer Reports rating of the 77 cereals in the data set. How much fat is in a serving of Captain Crunch?

5) In this set of 77 cereals, Product 19 has the most sodium per serving. What is the rating for Product 19?

6) All-Bran Extra Fiber is the cereal with the highest rating. How much sugar, fat, sodium, and fiber are in a serving of All-Bran Extra Fiber?

Interpreting Scatterplots

When you read a scatterplot, always ask yourself:

What are the individuals described?

What measurements were made, i.e. what are the variables?

Looking for Patterns and Relationships

7) Compare and contrast the sugar-rating scatterplot with the protein-rating scatterplot.

How are the patterns in these two scatterplots similar? How are they different? Are the patterns you see what you expected? Why or why not?

8) Think about variability. There are 3 cereals that have 2 grams of sugar in a serving. Find these 3 cereals in the sugar-rating scatterplot. Do these cereals have the same rating? If the ratings differ, what might explain the variability in the ratings?

Making Predictions

9) A new cereal for children has 18 grams of sugar per serving. Approximately what rating do you think the cereal will receive?

10) Use your prediction to plot this new cereal on the sugar-rating scatterplot. (Label it “#10”)

11) Several popular diets advocate high protein intake. Three new cereals are being developed for this market. All will have 5 grams of protein per serving. What do you think is a reasonable range for the ratings for these three cereals? Explain how you determined a reasonable range.

Making Predictions

12) Use your predictions to plot these three cereals on the protein-rating scatterplot. (Label them “#12”.)

13) In which situation (high-sugar cereals or high-protein cereals) do you feel more confident about your predicted ratings? How is this confidence (or lack of confidence) related to what you see in the scatterplots?

Explanatory and Response Variables

Today we are studying the relationship between pairs of variables by measuring both variables on the same cereals.

a response variable measures an outcome of a study

an explanatory variable explains or influences a response variable

sometimes there is no distinction

Explanatory and Response Variables

What were our explanatory and response variables?

If a distinction exists, plot the explanatory variable on the horizontal (x) axis and plot the response variable on the vertical (y) axis when making a scatterplot.

Analyzing a Scatterplot

Look for an overall pattern, deviations from this pattern, and outliers

Describe the form, direction, and strength of the relationship between the variables

Form

Sometimes the points of a scatterplot tend to fall along a straight line: this indicates a linear relationship.

The pattern may also indicate a curved relationship, some other relationship, or no relationship at all. We will explore the cases where there is a linear relationship in more detail.

Direction

Positive association: higher values of one variable tend to correspond with higher values of the other variable, and lower values tend to occur together

Negative association: higher values of one variable tend to correspond with lower values of the other variable, and vice versa
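As a minimal illustration (the slides contain no code; this is a Python sketch), direction can be read numerically from the point of means (x̄, ȳ): points in the upper-right and lower-left quadrants contribute positive products (x - x̄)(y - ȳ), and points in the other two quadrants contribute negative ones, so the sign of the total matches the direction of a linear trend. The function name `association_direction` and the data are hypothetical toy values, not the cereal data.

```python
from statistics import mean

def association_direction(xs, ys):
    """Classify a linear trend by the sign of the sum of (x - x_bar) * (y - y_bar)."""
    x_bar, y_bar = mean(xs), mean(ys)
    total = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "no linear association"

# Hypothetical toy data (not from the 77-cereal data set).
up = association_direction([1, 2, 3, 4], [2, 3, 5, 6])
down = association_direction([1, 2, 3, 4], [9, 7, 4, 1])
print(up, down)  # positive negative
```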

Direction

Which ingredients are positively associated with ratings?

Negatively associated with ratings?

Does this positive or negative association make sense given your own understanding about the nutritional value of the ingredient?

Direction

Analyze the vitamin-ratings relationship. Why is this relationship surprising?

What could explain the negative relationship? (Perhaps Consumer Reports is not using vitamins in their formula.) Why? (There is very little variability in the vitamin content among the cereals, so it is not a useful way to distinguish cereals.)

Measuring Strength & Direction (when the relationship is linear)

How closely does a non-horizontal straight line fit the points of a scatterplot?

The correlation coefficient (or simply correlation), r:

•measures the strength of the relationship: the stronger the relationship, the larger the magnitude of r

•measures the direction of the relationship: positive r indicates a positive association, negative r indicates a negative association
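The calculation itself is left to MiniTab. For reference, the standard definition of r averages the products of standardized values (dividing by n - 1), where x̄ and ȳ are the sample means and s_x and s_y the sample standard deviations of the n observations:

```latex
r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)
```

Each product is positive when x_i and y_i fall on the same side of their means, which is why the sign of r gives the direction. Swapping x and y leaves every product unchanged, and dividing each deviation by its own standard deviation cancels the units.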

Correlation r

•r always falls between -1 and +1: -1 ≤ r ≤ 1

•unrealistic (extreme) values for r: a perfect positive linear relationship would have r = +1, and a perfect negative linear relationship would have r = -1

•if there is no linear relationship, or if the linear relationship is horizontal, then r = 0

•both variables must be quantitative, but we don’t have to identify the explanatory and response variables

•r has no units (unlike most quantitative variables)
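A short Python sketch, offered only as an illustration (the course itself uses MiniTab), that computes r by summing the products of standardized x and y values and dividing by n - 1, then checks the properties listed above: perfect lines give r = +1 or r = -1, and swapping the variables or changing units leaves r unchanged. The function name `correlation` and the height and weight numbers are hypothetical, made up for illustration.

```python
from math import isclose
from statistics import mean, stdev

def correlation(xs, ys):
    """Pearson's r: sum of products of standardized x and y values, divided by n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (divide by n - 1)
    if s_x == 0 or s_y == 0:
        raise ValueError("r is undefined when a variable has no variability")
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (n - 1)

# Hypothetical heights (inches) and weights (pounds), made up for illustration.
heights_in = [66, 68, 70, 72, 74]
weights_lb = [150, 165, 160, 185, 190]
r = correlation(heights_in, weights_lb)
print(round(r, 2))  # 0.93

# Perfect straight lines give the extreme values.
print(correlation([1, 2, 3], [5, 7, 9]))  # 1.0
print(correlation([1, 2, 3], [9, 7, 5]))  # -1.0

# Swapping the variables or converting inches to centimeters leaves r unchanged.
heights_cm = [h * 2.54 for h in heights_in]
print(isclose(correlation(weights_lb, heights_in), r))  # True
print(isclose(correlation(heights_cm, weights_lb), r))  # True
```

The explicit check for zero standard deviation reflects that r cannot be computed when one variable does not vary at all.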

Examples of Correlations

(image © Bedford, Freeman and Worth Publishing Group)

Examples of Correlations

•Men’s weight vs. height on a basketball team: r = .93

•Men’s weight vs. height in a movie theater: r = .47

•Number of miles traveled vs. tread depth on car tires: r = -.93

Which of our scatterplots shows the strongest correlation? The weakest?
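To answer the strongest/weakest question, compare the magnitude |r|, not the signed value: r = -.93 describes a relationship just as strong as r = .93. A small Python sketch using the three r values quoted above (the dictionary keys are shortened labels chosen for this sketch):

```python
# r values from the three examples on this slide.
examples = {
    "weight vs. height, basketball team": 0.93,
    "weight vs. height, movie theater": 0.47,
    "miles traveled vs. tread depth": -0.93,
}

# Strength is judged by |r|; the sign only gives the direction.
by_strength = sorted(examples, key=lambda name: abs(examples[name]), reverse=True)
for name in by_strength:
    r = examples[name]
    direction = "positive" if r > 0 else "negative"
    print(f"{name}: |r| = {abs(r):.2f} ({direction})")
```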

Potential Problems

•Outliers can greatly influence correlation.

•Separating groups can reveal relationships that are invisible when the groups are combined.
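A hypothetical illustration of the second point, in Python: within each of two made-up groups the points lie on a rising line (r = 1), yet pooling the groups produces a negative r, because the group with larger x values happens to have smaller y values. The data and the helper `pearson_r` are invented for this sketch; `pearson_r` uses the sums-of-deviations form of r, which gives the same value as the standardized-products form.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r from sums of cross-products and squared deviations."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Two hypothetical groups; each rises perfectly on its own.
group_a_x, group_a_y = [1, 2, 3, 4], [11, 12, 13, 14]
group_b_x, group_b_y = [6, 7, 8, 9], [1, 2, 3, 4]

r_a = pearson_r(group_a_x, group_a_y)
r_b = pearson_r(group_b_x, group_b_y)
r_all = pearson_r(group_a_x + group_b_x, group_a_y + group_b_y)
print(round(r_a, 2), round(r_b, 2), round(r_all, 2))  # 1.0 1.0 -0.8
```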

Outliers and Correlation

How do the outliers affect the value of r?

•Panel A: the outlier decreases r (the correlation appears weaker)

•Panel B: the outlier increases r (the correlation appears stronger)

(image © Bedford, Freeman and Worth Publishing Group)
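The two panels can be mimicked with made-up numbers. In this Python sketch (data hypothetical; `pearson_r` is a helper written for the sketch using the sums-of-deviations formula for r), adding a point far below a fairly tight rising pattern pulls r down, while adding a point far out along the trend of a loose pattern pushes r up.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r from sums of cross-products and squared deviations."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# In the spirit of panel A: a fairly tight rising pattern plus one point far below it.
a_x, a_y = [1, 2, 3, 4, 5], [2, 3, 5, 4, 6]
r_a_before = pearson_r(a_x, a_y)
r_a_after = pearson_r(a_x + [6], a_y + [0])

# In the spirit of panel B: a loose pattern plus one point far out along the trend.
b_x, b_y = [1, 2, 3, 4, 5], [3, 1, 4, 2, 5]
r_b_before = pearson_r(b_x, b_y)
r_b_after = pearson_r(b_x + [15], b_y + [15])

print(f"A: {r_a_before:.2f} -> {r_a_after:.2f}")  # A: 0.90 -> -0.05
print(f"B: {r_b_before:.2f} -> {r_b_after:.2f}")  # B: 0.50 -> 0.96
```

A single point can move r a long way, which is why it is worth looking at the scatterplot before trusting r.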

MiniTab (for HW)

You do not need to find r by hand in 4.35 (or ever). Use MiniTab in your homework; on your exam you will estimate r just by looking at a scatterplot. I will accept a range of answers as long as your justification makes sense.

In MiniTab:

•To find r: Stat > Basic Statistics > Correlation

•To make a scatterplot: Graph > Scatterplot > (Simple; With Groups for 4.31 and 4.43)

Double-click the two columns that contain the variables of interest, then click OK.

Works Cited

Moore, David S. The Basic Practice of Statistics, 5th ed. © Bedford, Freeman and Worth Publishing Group.