Chapter 4 Scatterplots and Correlation. Explanatory and Response Variables u Interested in studying...


Chapter 4

Scatterplots and Correlation


Explanatory and Response Variables

Interested in studying the relationship between two variables by measuring both variables on the same individuals.
– a response variable measures an outcome of a study
– an explanatory variable explains or influences changes in a response variable
– sometimes there is no distinction


Question

In a study to determine whether surgery or chemotherapy results in higher survival rates for a certain type of cancer, whether or not the patient survived is one variable, and whether they received surgery or chemotherapy is the other. Which is the explanatory variable and which is the response variable?


Scatterplot

Graphs the relationship between two quantitative (numerical) variables measured on the same individuals.

If a distinction exists, plot the explanatory variable on the horizontal (x) axis and the response variable on the vertical (y) axis.
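To make the axis convention concrete, here is a minimal text-only sketch in Python (standard library only, so no plotting package is assumed). `text_scatterplot` is a hypothetical helper and its grid size is an arbitrary choice; it puts the explanatory variable on the horizontal axis and the response on the vertical axis, using the per capita GDP (x) and life expectancy (y) values from the Western Europe case study in this chapter.

```python
def text_scatterplot(xs, ys, width=40, height=12):
    """Return the rows of a character-grid scatterplot (hypothetical helper).

    xs holds the explanatory variable (horizontal axis) and ys the response
    variable (vertical axis); each point is drawn as '*'.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("need the same non-zero number of x and y values")
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    x_span = (x_max - x_min) or 1.0  # guard against constant data
    y_span = (y_max - y_min) or 1.0
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y - y_min) / y_span * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 prints first, so larger y sits higher
    rows = ["|" + "".join(cells) for cells in grid]  # '|' marks the y axis
    rows.append("+" + "-" * width)                   # bottom line marks the x axis
    return rows


# Case study data: per capita GDP (explanatory, x) and life expectancy (response, y).
gdp = [21.4, 23.2, 20.0, 22.7, 20.8, 18.6, 21.5, 22.0, 23.8, 21.2]
life = [77.48, 77.53, 77.32, 78.63, 77.17, 76.39, 78.51, 78.15, 78.99, 77.37]

if __name__ == "__main__":
    print("\n".join(text_scatterplot(gdp, life)))
```

Switzerland (largest x and y) lands in the top-right corner and Ireland (smallest x and y) in the bottom-left, which already hints at the positive association the case study measures.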


Scatterplot

[Figure: relationship between mean SAT verbal score and percent of high school graduates taking the SAT]


Scatterplot

– Look for the overall pattern and for deviations from this pattern (linear, curved, clusters, no pattern)
– Describe the pattern by the form, direction, and strength of the relationship
– Look for outliers


Linear Relationship

In some relationships, the points of a scatterplot tend to fall along a straight line; this is called a linear relationship.


Direction

– Positive association: above-average values of one variable tend to accompany above-average values of the other variable, and below-average values tend to occur together
– Negative association: above-average values of one variable tend to accompany below-average values of the other variable, and vice versa


Positive association: high values of one variable tend to occur together with high values of the other variable.

Negative association: high values of one variable tend to occur together with low values of the other variable.
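The "above-average / below-average" wording can be checked point by point: a point follows the positive pattern when its x and y deviations from their means have the same sign, and the negative pattern when the signs differ. The Python sketch below (standard library; `quadrant_counts` is a hypothetical helper) counts both kinds of points. The GDP and life expectancy values come from the case study in this chapter; the used-car ages and prices are made-up numbers for illustration only.

```python
from statistics import mean


def quadrant_counts(xs, ys):
    """Return (same_side, opposite_side) counts relative to the two means.

    same_side: x and y both above or both below their means (positive pattern).
    opposite_side: one above and one below (negative pattern).
    Points lying exactly on either mean are not counted.
    """
    x_bar, y_bar = mean(xs), mean(ys)
    same_side = opposite_side = 0
    for x, y in zip(xs, ys):
        product = (x - x_bar) * (y - y_bar)
        if product > 0:
            same_side += 1
        elif product < 0:
            opposite_side += 1
    return same_side, opposite_side


# Case study data (per capita GDP, life expectancy).
gdp = [21.4, 23.2, 20.0, 22.7, 20.8, 18.6, 21.5, 22.0, 23.8, 21.2]
life = [77.48, 77.53, 77.32, 78.63, 77.17, 76.39, 78.51, 78.15, 78.99, 77.37]

# Hypothetical used cars: age in years and selling price in $1000s (made-up values).
car_age = [1, 3, 5, 7, 9]
car_price = [30, 24, 18, 13, 8]

print(quadrant_counts(gdp, life))           # (8, 2): mostly same-side, positive association
print(quadrant_counts(car_age, car_price))  # (0, 4): all opposite-side, negative association
```

The middle car (age exactly at the mean) is skipped, since it is neither above nor below average in age.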


No relationship: x and y vary independently. Knowing x tells you nothing about y.

One way to remember this: the equation of a horizontal line such as y = 5 does not involve x at all.


Examples

From a scatterplot of college students, there is a positive association between verbal SAT score and GPA.

For used cars, there is a negative association between the age of the car and the selling price.


Examples of Relationships

[Figure: four example scatterplots: Health Status Measure versus Income; Health Status Measure versus Age; Education Level versus Age; Mental Health Score versus Physical Health Score]


Measuring Strength & Direction of a Linear Relationship

How closely does a non-horizontal straight line fit the points of a scatterplot?

The correlation coefficient (often referred to as just the correlation), r:
– measures the strength of the linear relationship: the stronger the relationship, the larger the magnitude of r
– measures the direction of the relationship: positive r indicates a positive relationship, negative r indicates a negative relationship


Correlation Coefficient

Special values for r:
– a perfect positive linear relationship would have r = +1
– a perfect negative linear relationship would have r = -1
– if there is no linear relationship, or if the scatterplot points are best fit by a horizontal line, then r = 0

Note: r must be between -1 and +1, inclusive.
– both variables must be quantitative; r makes no distinction between response and explanatory variables
– r has no units; it does not change when measurement units are changed (e.g., ft. or in.)
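A quick numerical check of the two extreme values, as a Python sketch using only the standard library. `pearson_r` is a hypothetical helper written from the usual sums-of-deviations form of r (algebraically the same as the z-score formula given in the Correlation Calculation slide); the data are made-up points lying exactly on a rising line and a falling line.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Correlation r from sums of squared and cross-product deviations.

    Raises ZeroDivisionError if either variable is constant (r is undefined then).
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
rising = [2 * v + 1 for v in x]    # exactly on the line y = 2x + 1
falling = [10 - 3 * v for v in x]  # exactly on the line y = 10 - 3x

print(pearson_r(x, rising))   # 1.0: perfect positive linear relationship
print(pearson_r(x, falling))  # -1.0: perfect negative linear relationship
```

Any other data set gives a value strictly between these two, which is the "between -1 and +1, inclusive" rule.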


Examples of Correlations


Examples of Correlations
– Husband’s versus wife’s ages: r = .94
– Husband’s versus wife’s heights: r = .36
– Professional golfer’s putting success, distance of putt in feet versus percent success: r = -.94


Not all Relationships are Linear: Miles per Gallon versus Speed

[Figure: scatterplot of miles per gallon versus speed with fitted line y = -0.013x + 26.9; r = -0.06]

Linear relationship? Correlation is close to zero.


Not all Relationships are Linear: Miles per Gallon versus Speed

[Figure: the same scatterplot of miles per gallon versus speed]

Curved relationship. Correlation is misleading.
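To see how a strong but curved pattern can give a correlation of zero, here is a Python sketch with made-up speed and mileage values that rise and then fall symmetrically (these are not the data in the plot). `pearson_r` is a hypothetical helper implementing the usual formula for r.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Correlation r from sums of squared and cross-product deviations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: mileage peaks at a middle speed and falls off on both sides.
speed = [10, 20, 30, 40, 50]
mpg = [15, 25, 30, 25, 15]

r = pearson_r(speed, mpg)
print(r)  # 0.0: no *linear* relationship, even though mpg clearly depends on speed
```

The positive cross-products on one side of the peak exactly cancel the negative ones on the other side, so r reports "no linear relationship" for data that a curve would fit well.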


Problems with Correlations

– Outliers can inflate or deflate correlations (see next slide)
– Groups combined inappropriately may mask relationships (a third variable)
   – groups may have different relationships when separated
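A small made-up example of how combining groups can mislead: within each of two hypothetical groups the relationship is perfectly negative, yet the pooled data show a strong positive correlation because the groups sit in different regions of the plot. The Python sketch below (standard library only; `pearson_r` is a hypothetical helper) computes all three correlations.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Correlation r from sums of squared and cross-product deviations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Two hypothetical groups; within each one, y falls as x rises.
group_a_x, group_a_y = [1, 2, 3], [3, 2, 1]
group_b_x, group_b_y = [6, 7, 8], [8, 7, 6]

# Pooling the groups ignores the third variable (group membership).
pooled_x = group_a_x + group_b_x
pooled_y = group_a_y + group_b_y

print(pearson_r(group_a_x, group_a_y))          # -1.0 within group A
print(pearson_r(group_b_x, group_b_y))          # -1.0 within group B
print(round(pearson_r(pooled_x, pooled_y), 2))  # 0.81 when the groups are combined
```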


Outliers and Correlation

[Figure: scatterplots A and B, each containing one outlier]

For each scatterplot above, how does the outlier affect the correlation?

A: outlier decreases the correlation
B: outlier increases the correlation
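Both outcomes can be reproduced with made-up numbers, as a Python sketch (hypothetical `pearson_r` helper, standard library only): in case A a single point far below an otherwise perfect line pulls r down, and in case B a single distant point added to a shapeless cluster pushes r up.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Correlation r from sums of squared and cross-product deviations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Case A: points exactly on a line, then one point far below the trend.
a_x, a_y = [1, 2, 3, 4, 5], [1, 2, 3, 4, 5]
r_a_without = pearson_r(a_x, a_y)
r_a_with = pearson_r(a_x + [6], a_y + [0])

# Case B: a small cluster with no linear trend, then one distant point.
b_x, b_y = [1, 2, 3, 4], [2, 4, 1, 3]
r_b_without = pearson_r(b_x, b_y)
r_b_with = pearson_r(b_x + [20], b_y + [20])

print(f"A: r = {r_a_without:.2f} without the outlier, {r_a_with:.2f} with it")  # 1.00, 0.14
print(f"B: r = {r_b_without:.2f} without the outlier, {r_b_with:.2f} with it")  # 0.00, 0.98
```

In case B nearly all of the correlation comes from the single distant point, which is why a scatterplot should always be checked before r is reported.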


Example: IQ score and grade point average

a) Describe what this plot shows in words.
b) Describe the direction, shape, and strength. Are there outliers?
c) What is the deal with these people?


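Correlation Calculation

Suppose we have data on variables X and Y for n individuals: $x_1, x_2, \ldots, x_n$ and $y_1, y_2, \ldots, y_n$.

Each variable has a mean and standard deviation, $(\bar{x}, s_x)$ and $(\bar{y}, s_y)$ (see ch. 2 for $\bar{x}$ and $s$).

$$ r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right) $$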
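The formula translates almost line for line into code. The Python sketch below (standard library) standardizes each value with `statistics.mean` and `statistics.stdev` (the sample standard deviation, which divides by n - 1), multiplies the paired z-scores, and divides their sum by n - 1. The function name `correlation_from_z` and the three-point data set are illustrative assumptions.

```python
from statistics import mean, stdev


def correlation_from_z(xs, ys):
    """r = (1 / (n - 1)) * sum of z_x * z_y, using sample standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    x_bar, s_x = mean(xs), stdev(xs)
    y_bar, s_y = mean(ys), stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (n - 1)


# Hand-checkable example: both variables have mean 2 and standard deviation 1,
# so the z-scores equal the deviations: (-1, 0, 1) and (-1, 1, 0).
# The sum of products is 1, so r = 1 / (3 - 1) = 0.5.
print(correlation_from_z([1, 2, 3], [1, 3, 2]))  # 0.5
```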


Case Study

Per Capita Gross Domestic Product and Average Life Expectancy for Countries in Western Europe


Case Study

Country Per Capita GDP (x) Life Expectancy (y)

Austria 21.4 77.48

Belgium 23.2 77.53

Finland 20.0 77.32

France 22.7 78.63

Germany 20.8 77.17

Ireland 18.6 76.39

Italy 21.5 78.51

Netherlands 22.0 78.15

Switzerland 23.8 78.99

United Kingdom 21.2 77.37


Case Study

Columns: x, y, the standardized values $z_x = (x_i-\bar{x})/s_x$ and $z_y = (y_i-\bar{y})/s_y$, and their product.

x      y       z_x      z_y      z_x·z_y
21.4   77.48   -0.078   -0.345    0.027
23.2   77.53    1.097   -0.282   -0.309
20.0   77.32   -0.992   -0.546    0.542
22.7   78.63    0.770    1.102    0.849
20.8   77.17   -0.470   -0.735    0.345
18.6   76.39   -1.906   -1.716    3.271
21.5   78.51   -0.013    0.951   -0.012
22.0   78.15    0.313    0.498    0.156
23.8   78.99    1.489    1.555    2.315
21.2   77.37   -0.209   -0.483    0.101

$\bar{x} = 21.52$, $\bar{y} = 77.754$, sum = 7.285
$s_x = 1.532$, $s_y = 0.795$


Case Study

$$ r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right) = \frac{1}{10-1}(7.285) = 0.809 $$
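The case study arithmetic can be reproduced with a short Python sketch (standard library only) that follows the same steps as the table: compute the means and sample standard deviations, standardize, multiply, sum, and divide by n - 1. Values are rounded only for display, so a last digit may occasionally differ from a hand calculation that rounds the z-scores first.

```python
from statistics import mean, stdev

countries = ["Austria", "Belgium", "Finland", "France", "Germany",
             "Ireland", "Italy", "Netherlands", "Switzerland", "United Kingdom"]
gdp = [21.4, 23.2, 20.0, 22.7, 20.8, 18.6, 21.5, 22.0, 23.8, 21.2]             # x
life = [77.48, 77.53, 77.32, 78.63, 77.17, 76.39, 78.51, 78.15, 78.99, 77.37]  # y

x_bar, s_x = mean(gdp), stdev(gdp)
y_bar, s_y = mean(life), stdev(life)

products = []
for name, x, y in zip(countries, gdp, life):
    z_x = (x - x_bar) / s_x  # standardized GDP
    z_y = (y - y_bar) / s_y  # standardized life expectancy
    products.append(z_x * z_y)
    print(f"{name:15s} {x:5.1f} {y:6.2f} {z_x:7.3f} {z_y:7.3f} {z_x * z_y:7.3f}")

total = sum(products)
r = total / (len(gdp) - 1)
print(f"x_bar = {x_bar:.2f}, y_bar = {y_bar:.3f}, s_x = {s_x:.3f}, s_y = {s_y:.3f}")
print(f"sum = {total:.3f}, r = {r:.3f}")  # sum = 7.285, r = 0.809
```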


“r” doesn’t distinguish explanatory and response variables

The correlation coefficient, r, treats x and y symmetrically.

“Time to swim” is the explanatory variable here and belongs on the x axis. However, in either plot r is the same (r = −0.75).

[Figure: the same data plotted with each variable on the x axis in turn; r = -0.75 in both plots]
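Because the formula multiplies each pair of z-scores, swapping the roles of x and y only swaps the order of each multiplication, so r cannot change. A Python sketch of that check, reusing the case study data (the helper name `correlation` is hypothetical; it follows the z-score formula with sample standard deviations):

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """r = (1 / (n - 1)) * sum of z_x * z_y (sample standard deviations)."""
    x_bar, s_x = mean(xs), stdev(xs)
    y_bar, s_y = mean(ys), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (len(xs) - 1)


gdp = [21.4, 23.2, 20.0, 22.7, 20.8, 18.6, 21.5, 22.0, 23.8, 21.2]
life = [77.48, 77.53, 77.32, 78.63, 77.17, 76.39, 78.51, 78.15, 78.99, 77.37]

r_xy = correlation(gdp, life)  # GDP treated as explanatory (x axis)
r_yx = correlation(life, gdp)  # roles swapped
print(round(r_xy, 3), round(r_yx, 3))  # 0.809 0.809
```

The choice of which variable goes on the x axis matters for how a plot is read, but not for the value of r.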


Changing the units of variables does not change the correlation coefficient “r,” because we get rid of all our units when we standardize (get z-scores).

“r” has no units.
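A Python sketch of the unit argument: convert hypothetical heights from inches to centimeters and weights from pounds to kilograms, then compare the z-scores and r before and after. The data values are made up; the conversion factors (2.54 cm per inch, 0.45359237 kg per pound) are the standard definitions, and `z_scores` and `correlation` are hypothetical helper names.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize: subtract the mean, divide by the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def correlation(xs, ys):
    """r = (1 / (n - 1)) * sum of z_x * z_y."""
    return sum(zx * zy for zx, zy in zip(z_scores(xs), z_scores(ys))) / (len(xs) - 1)


# Hypothetical heights (inches) and weights (pounds).
height_in = [62, 65, 67, 70, 72, 74]
weight_lb = [120, 140, 150, 165, 180, 200]

height_cm = [h * 2.54 for h in height_in]
weight_kg = [w * 0.45359237 for w in weight_lb]

# Standardizing removes the units, so the z-scores agree (up to rounding error)...
print([round(z, 3) for z in z_scores(height_in)])
print([round(z, 3) for z in z_scores(height_cm)])
# ...and therefore so does r.
print(correlation(height_in, weight_lb), correlation(height_cm, weight_kg))
```

Multiplying every value by a positive constant scales both the deviations and the standard deviation by that constant, so each z-score, and hence r, is unchanged.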