Chapter 14 Describing Relationships: Scatterplots and Correlation


Thought Question 1

For all cars manufactured in the U.S., there is a positive correlation between the size of the engine and horsepower. There is a negative correlation between the size of the engine and gas mileage. What does it mean for two variables to have a positive correlation or a negative correlation?


Scatterplot

A scatterplot shows the relationship between two quantitative variables measured on the same individuals. The values of one variable appear on the horizontal axis, and the values of the other variable appear on the vertical axis. Each individual in the data appears as the point in the plot fixed by the values of both variables for that individual.


Figure 14.9 Scatterplot of average SAT Mathematics score for each state against the proportion of the state’s high school seniors who took the SAT. The light-colored point corresponds to two states. (This figure was created using the Minitab software package.)


Figure 14.3 Scatterplot of the life expectancy of people in many nations against each nation’s gross domestic product per person. (This figure was created using the Minitab software package.)


Examining a Scatterplot

In any graph of data, look for the overall pattern and for striking deviations from that pattern. You can describe the overall pattern of a scatterplot by the direction, form, and strength of the relationship. An important kind of deviation is an outlier, an individual value that falls outside the overall pattern of the relationship.


Positive association, Negative association

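Two variables are positively associated when above-average values of one tend to accompany above-average values of the other, and below-average values also tend to occur together. The scatterplot then slopes upward as we move from left to right. Two variables are negatively associated when above-average values of one tend to accompany below-average values of the other. The scatterplot then slopes downward as we move from left to right.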
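The definition above can be checked informally by counting individuals. The minimal Python sketch below asks, for each individual, whether x and y fall on the same side of their means or on opposite sides; more "same side" cases suggests a positive association, more "opposite side" cases a negative one. The function name `association_direction` and the car data (engine size, horsepower, gas mileage, echoing Thought Question 1) are hypothetical and invented for illustration. This is a rough counting heuristic, not the correlation coefficient introduced later in the chapter.

```python
from statistics import mean


def association_direction(x, y):
    """Informal check: do above-average x values tend to accompany
    above-average y values (positive) or below-average ones (negative)?"""
    x_bar, y_bar = mean(x), mean(y)
    same_side = opposite_side = 0
    for xi, yi in zip(x, y):
        product = (xi - x_bar) * (yi - y_bar)
        if product > 0:
            same_side += 1      # both above average, or both below average
        elif product < 0:
            opposite_side += 1  # one above average, the other below
    if same_side > opposite_side:
        return "positive"
    if same_side < opposite_side:
        return "negative"
    return "no clear direction"


# Hypothetical data for five cars (not taken from the text).
engine_liters = [1.6, 2.0, 2.5, 3.5, 5.0]
horsepower = [120, 150, 180, 270, 400]
mpg = [34, 30, 28, 24, 18]

print(association_direction(engine_liters, horsepower))  # positive
print(association_direction(engine_liters, mpg))         # negative
```

Individuals sitting exactly at a mean are skipped, so they do not tip the count either way.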


• The SAT scatterplot (Figure 14.9) shows two clusters of states.

• The life expectancy versus GDP scatterplot (Figure 14.3) shows a curved relationship.

• The strength of a relationship in a scatterplot is determined by how closely the points follow a clear form.

• The relationships in both plots are not strong:
  – States with similar percentages of seniors taking the SAT show quite a bit of scatter in their average scores.
  – Nations with similar GDPs can have quite different life expectancies.


A scatterplot with a strong relationship


Figure 14.5 Scatterplot of the lengths of two bones in 5 fossil specimens of the extinct beast Archaeopteryx.


Statistical versus Deterministic Relationships

• Distance versus Speed (when travel time is constant).

• Income (in millions of dollars) versus total assets of banks (in billions of dollars).


Distance versus Speed

• Distance = Speed × Time

• Suppose time = 1.5 hours.

• Each subject drives at a fixed speed for the 1.5 hours.
  – The speed chosen for each subject varies from 10 mph to 50 mph.

• Distance does not vary among subjects who drive at the same fixed speed.

• Deterministic relationship

[Figure: scatterplot of distance (vertical axis, 0 to 80) against speed (horizontal axis, 0 to 60)]
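To make the deterministic case concrete, here is a minimal Python sketch with hypothetical subjects: only the 1.5-hour travel time comes from the slide, and the speeds are invented. The helper `pearson_r` is an illustrative function of our own, built on the standard-library `statistics.mean` and `statistics.stdev`; it computes r as the sum of products of standardized values divided by n - 1. Because distance is fixed exactly by speed, every speed maps to a single distance and r equals 1 (up to floating-point rounding).

```python
from statistics import mean, stdev


def pearson_r(x, y):
    """r = (1/(n-1)) * sum of (standardized x) * (standardized y)."""
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)
    total = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
                for xi, yi in zip(x, y))
    return total / (len(x) - 1)


TIME_HOURS = 1.5
# Hypothetical subjects (mph); some choose the same speed.
speeds = [10, 20, 30, 40, 50, 10, 30, 50]
distances = [s * TIME_HOURS for s in speeds]  # Distance = Speed x Time

# Collect every distance observed at each speed.
distances_at_speed = {}
for s, d in zip(speeds, distances):
    distances_at_speed.setdefault(s, set()).add(d)

print(all(len(ds) == 1 for ds in distances_at_speed.values()))  # True
print(round(pearson_r(speeds, distances), 6))                    # 1.0
```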


Income versus Assets

• Income = a + b × Assets?

• Assets vary from $3.4 billion to $49 billion.

• Income varies from bank to bank, even among banks with similar assets.

• Statistical relationship

[Figure: scatterplot of income in millions of dollars (vertical axis, 0 to 300) against assets in billions of dollars (horizontal axis, 0 to 60)]
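For contrast with the distance example, the sketch below uses hypothetical banks whose assets lie in the stated $3.4 billion to $49 billion range. The incomes are invented and are not the data plotted on the slide. Two banks share each of the assets values 10 and 25 but report different incomes, so no straight line can pass through every point: r is positive but strictly less than 1. As before, `pearson_r` is our own illustrative helper built on Python's `statistics` module.

```python
from statistics import mean, stdev


def pearson_r(x, y):
    """r = (1/(n-1)) * sum of (standardized x) * (standardized y)."""
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)
    total = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
                for xi, yi in zip(x, y))
    return total / (len(x) - 1)


# Hypothetical banks (NOT the data plotted in the text).
assets_billions = [3.4, 10, 10, 25, 25, 49]
income_millions = [20, 45, 70, 110, 150, 260]

# Collect every income observed at each assets value.
incomes_at_assets = {}
for a, inc in zip(assets_billions, income_millions):
    incomes_at_assets.setdefault(a, set()).add(inc)

print(sorted(incomes_at_assets[10]), sorted(incomes_at_assets[25]))  # [45, 70] [110, 150]

r_banks = pearson_r(assets_billions, income_millions)
print(0 < r_banks < 1)  # True: a statistical, not deterministic, relationship
```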


Linear Relationship

In some relationships, the points of a scatterplot tend to fall along a straight line. Such a relationship is called a linear relationship.


Measuring Strength & Direction of a Linear Relationship

• How closely does a non-horizontal straight line fit the points of a scatterplot?

• The correlation coefficient (often referred to as just the correlation) r is a measure of:
  – the strength of the relationship: the stronger the relationship, the larger the magnitude of r;
  – the direction of the relationship: positive r indicates a positive relationship, and negative r indicates a negative relationship.


Correlation Coefficient

• Special values for r: A perfect positive linear relationship would have r = +1. A perfect negative linear relationship would have r = -1. If there is no linear relationship, or if the scatterplot points are best fit by a horizontal line, then r = 0.

• Note: r must be between -1 and +1, inclusive.

• r > 0: as one variable changes, the other variable tends to change in the same direction.

• r < 0: as one variable changes, the other variable tends to change in the opposite direction.
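The special values can be checked with a short Python sketch on small made-up data sets. Points exactly on a rising line give r = +1, and points exactly on a falling line give r = -1. A symmetric "up then down" pattern, whose best-fitting line is horizontal, gives r = 0 up to floating-point rounding. The helper `pearson_r` is our own illustrative function, not a library call.

```python
from statistics import mean, stdev


def pearson_r(x, y):
    """r = (1/(n-1)) * sum of (standardized x) * (standardized y)."""
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)
    total = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
                for xi, yi in zip(x, y))
    return total / (len(x) - 1)


x = [1, 2, 3, 4, 5]
r_perfect_pos = pearson_r(x, [2 * xi + 1 for xi in x])  # y = 2x + 1
r_perfect_neg = pearson_r(x, [4 - 3 * xi for xi in x])  # y = 4 - 3x
# Symmetric pattern: the best-fitting line is horizontal, so r is 0.
r_flat = pearson_r([1, 2, 3, 4], [1, 3, 3, 1])

print(round(r_perfect_pos, 6), round(r_perfect_neg, 6))  # 1.0 -1.0
print(abs(r_flat) < 1e-12)                               # True
```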


Figure 14.7 How correlation measures the strength of a straight-line relationship. Patterns closer to a straight line have correlations closer to 1 or −1.
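The idea behind Figure 14.7 can be reproduced numerically. The Python sketch below starts from points exactly on the line y = x and adds growing multiples k of a hypothetical scatter pattern. The pattern has zero correlation with x, so it only pushes the points further from the line. As the scatter grows, r falls from 1 toward 0. The data and the helper `pearson_r` are illustrative assumptions, not taken from the figure.

```python
from statistics import mean, stdev


def pearson_r(x, y):
    """r = (1/(n-1)) * sum of (standardized x) * (standardized y)."""
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)
    total = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
                for xi, yi in zip(x, y))
    return total / (len(x) - 1)


x = [1, 2, 3, 4, 5]
# Hypothetical scatter pattern, uncorrelated with x.
pattern = [1, -1, 0, -1, 1]

results = []
for k in [0, 1, 3]:
    y = [xi + k * e for xi, e in zip(x, pattern)]  # more scatter as k grows
    results.append(round(pearson_r(x, y), 3))

print(results)  # [1.0, 0.845, 0.466]
```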


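Correlation Calculation

• Suppose we have data on variables X and Y for n individuals: x_1, x_2, …, x_n and y_1, y_2, …, y_n.

• Each variable has a mean and standard deviation: (x̄, s_x) and (ȳ, s_y) (see ch. 12 for s_x and s_y). The correlation is

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$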
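The formula translates almost line for line into code. The minimal Python sketch below standardizes each variable, multiplies the standardized values pair by pair, adds the products, and divides by n - 1. It assumes the sample standard deviation (dividing by n - 1), which is what Python's `statistics.stdev` computes. The function name `correlation` is our own choice for illustration. The guard reflects that r is undefined when either variable does not vary at all.

```python
from statistics import mean, stdev


def correlation(x, y):
    """Compute r as in the formula: the sum of products of standardized
    values, divided by n - 1."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    n = len(x)
    x_bar, s_x = mean(x), stdev(x)   # mean and sample std dev of X
    y_bar, s_y = mean(y), stdev(y)   # mean and sample std dev of Y
    if s_x == 0 or s_y == 0:
        raise ValueError("r is undefined when a variable does not vary")
    z_x = [(xi - x_bar) / s_x for xi in x]   # standardized x values
    z_y = [(yi - y_bar) / s_y for yi in y]   # standardized y values
    return sum(a * b for a, b in zip(z_x, z_y)) / (n - 1)


print(round(correlation([1, 2, 3], [2, 4, 5]), 3))  # 0.982
```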


Case Study

Per Capita Gross Domestic Product and Average Life Expectancy for Countries in Western Europe


Case Study

Country   Per Capita GDP (x)   Life Expectancy (y)

Austria 21.4 77.48

Belgium 23.2 77.53

Finland 20.0 77.32

France 22.7 78.63

Germany 20.8 77.17

Ireland 18.6 76.39

Italy 21.5 78.51

Netherlands 22.0 78.15

Switzerland 23.8 78.99

United Kingdom 21.2 77.37


Case Study

x      y       (x_i − x̄)/s_x   (y_i − ȳ)/s_y   product
21.4   77.48   -0.078          -0.345          0.027
23.2   77.53   1.097           -0.282          -0.309
20.0   77.32   -0.992          -0.546          0.542
22.7   78.63   0.770           1.102           0.849
20.8   77.17   -0.470          -0.735          0.345
18.6   76.39   -1.906          -1.716          3.271
21.5   78.51   -0.013          0.951           -0.012
22.0   78.15   0.313           0.498           0.156
23.8   78.99   1.489           1.555           2.315
21.2   77.37   -0.209          -0.483          0.101

x̄ = 21.52, ȳ = 77.754, sum of products = 7.285
s_x = 1.532, s_y = 0.795
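The table can be recomputed from the ten (x, y) pairs with a short Python sketch using only the standard-library `statistics` module. The variable names are our own. The script computes the means, the sample standard deviations, the products of standardized values, their sum, and r = sum / (n - 1). The products match the table column up to rounding.

```python
from statistics import mean, stdev

# Per capita GDP (x) and life expectancy (y) from the case study table.
gdp = [21.4, 23.2, 20.0, 22.7, 20.8, 18.6, 21.5, 22.0, 23.8, 21.2]
life = [77.48, 77.53, 77.32, 78.63, 77.17, 76.39, 78.51, 78.15, 78.99, 77.37]

x_bar, y_bar = mean(gdp), mean(life)
s_x, s_y = stdev(gdp), stdev(life)   # sample standard deviations

# Product of standardized values for each country.
products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y)
            for x, y in zip(gdp, life)]
total = sum(products)
r = total / (len(gdp) - 1)

print(round(x_bar, 2), round(y_bar, 3))   # 21.52 77.754
print(round(s_x, 3), round(s_y, 3))       # 1.532 0.795
print(round(total, 3), round(r, 3))       # 7.285 0.809
```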


Case Study

The correlation is r = 7.285 / (10 - 1) ≈ 0.809, so there is a strong, positive linear relationship between Per Capita GDP (x) and Life Expectancy (y).


Problems with Correlations

• Outliers can inflate or deflate correlations.

• Groups combined inappropriately may mask relationships (a third variable).
  – The groups may have different relationships when separated.


Figure 14.8 Moving one point reduces the correlation from r = 0.994 to r = 0.640.
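Figure 14.8 shows one point deflating r. The Python sketch below reproduces both effects named above on hypothetical data, not the figure's data. First, ten points lie exactly on a line and one of them is moved far below it, which deflates r. Second, four points have no linear pattern and one far-away point is added, which inflates r. The helper `pearson_r` is our own illustrative function.

```python
from statistics import mean, stdev


def pearson_r(x, y):
    """r = (1/(n-1)) * sum of (standardized x) * (standardized y)."""
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)
    total = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
                for xi, yi in zip(x, y))
    return total / (len(x) - 1)


# Deflating: ten points on a line, then move (10, 10) down to (10, 1).
x = list(range(1, 11))
y = list(range(1, 11))
r_line = pearson_r(x, y)
r_moved = pearson_r(x, y[:-1] + [1])

# Inflating: no linear pattern, then add one far-away point (10, 10).
cloud_x, cloud_y = [1, 2, 3, 4], [1, 3, 3, 1]
r_cloud = pearson_r(cloud_x, cloud_y)          # approximately 0
r_outlier = pearson_r(cloud_x + [10], cloud_y + [10])

print(round(r_line, 3), round(r_moved, 3))     # 1.0 0.536
print(round(r_outlier, 3))                     # 0.914
```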


Not All Relationships Are Linear: Miles per Gallon versus Speed

• Linear relationship? MPG = a + b × Speed?

• Speed chosen for each subject varies from 20 mph to 60 mph.

• MPG varies from trial to trial, even at the same speed.

• Statistical relationship


• Curved relationship (r is misleading, because it measures only the strength of a linear relationship)


[Figure: scatterplot of miles per gallon (vertical axis, 0 to 35) against speed (horizontal axis, 0 to 100)]
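The Python sketch below shows why r misleads for a curved pattern. The mileage figures are hypothetical, with the best mileage at a moderate speed. Gas mileage clearly depends on speed, yet r between speed and mpg is essentially 0 because the rise and the fall cancel out. Measuring each speed's squared distance from 40 mph gives a nearly perfect linear relationship, with r close to -1. The 40 mph peak is an assumption built into the made-up data, and `pearson_r` is our own illustrative helper.

```python
from statistics import mean, stdev


def pearson_r(x, y):
    """r = (1/(n-1)) * sum of (standardized x) * (standardized y)."""
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)
    total = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
                for xi, yi in zip(x, y))
    return total / (len(x) - 1)


speeds = [20, 30, 40, 50, 60]
mpg = [20, 28, 30, 28, 20]   # hypothetical: best mileage at a moderate speed

r_linear = pearson_r(speeds, mpg)
# Squared distance of each speed from 40 mph (the hypothetical peak).
squared_gap = [(s - 40) ** 2 for s in speeds]
r_curve = pearson_r(squared_gap, mpg)

print(abs(r_linear) < 1e-9)   # True: r is essentially 0
print(round(r_curve, 3))      # -0.999
```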


Price of Books versus Size

• Relationship between the price of books and the number of pages?

• Positive?

• Look at paperbacks:

• Look at hardcovers:

• All books together:

• Overall correlation is negative!

[Figure: scatterplot of price in dollars (vertical axis, 0 to 140) against number of pages (horizontal axis, 0 to 400)]
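The book example shows how combining groups can reverse a relationship. The Python sketch below uses invented prices and page counts, not the data in the slide's plot. Within each group, longer books cost more (positive r). The hypothetical paperbacks are longer but much cheaper than the hypothetical hardcovers, so pooling all books produces a negative r. Here the third variable is the book type. The helper `pearson_r` is our own illustrative function.

```python
from statistics import mean, stdev


def pearson_r(x, y):
    """r = (1/(n-1)) * sum of (standardized x) * (standardized y)."""
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)
    total = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
                for xi, yi in zip(x, y))
    return total / (len(x) - 1)


# Hypothetical books (not the data plotted in the text).
paper_pages = [200, 250, 300, 350, 400]
paper_price = [8, 10, 9, 11, 13]
hard_pages = [100, 150, 200, 250, 300]
hard_price = [60, 66, 68, 77, 79]

r_paper = pearson_r(paper_pages, paper_price)   # within paperbacks
r_hard = pearson_r(hard_pages, hard_price)      # within hardcovers
r_all = pearson_r(paper_pages + hard_pages,     # all books pooled
                  paper_price + hard_price)

print(f"{r_paper:.2f} {r_hard:.2f} {r_all:.2f}")  # 0.90 0.98 -0.45
```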


Key Concepts

• Statistical vs. Deterministic Relationships

• Statistically Significant Relationship

• Strength of Linear Relationship

• Direction of Linear Relationship

• Correlation Coefficient

• Problems with Correlations
