Explanatory and Response Variables · Chapter 4: Scatterplots and Correlation (pignottia.faculty.mjc.edu/math134/classnotes/Chapter_04.4.pdf)


Chapter 4 Scatterplots and Correlation

BPS - 5th Ed. Chapter 4 1

Explanatory and Response Variables

Interested in studying the relationship between two variables by measuring both variables on the same individuals.

• a response variable measures an outcome of a study
• an explanatory variable explains or influences changes in a response variable
• sometimes there is no distinction


Question


Consider a study to determine whether surgery or chemotherapy results in higher survival rates for a certain type of cancer.

•  What is the explanatory variable? What is the response variable?

Scatterplot

• Graphs the relationship between two quantitative (numerical) variables measured on the same individuals.
• If a distinction exists, plot the explanatory variable on the horizontal (x) axis and plot the response variable on the vertical (y) axis.


Relationship between mean SAT verbal score and percent of high school grads taking SAT

Scatterplot

To add a categorical variable, use a different plot color or symbol for each category.


(Figure: the scatterplot with Southern states highlighted.)

Scatterplot

• Look for overall pattern and deviations from this pattern
• Describe pattern by form, direction, and strength of the relationship
• Look for outliers


Linear Relationship

In some relationships the points of a scatterplot tend to fall along a straight line. Such a relationship is called linear.


Direction

• Positive association: above-average values of one variable tend to accompany above-average values of the other variable, and below-average values tend to occur together
• Negative association: above-average values of one variable tend to accompany below-average values of the other variable, and vice versa
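The definition of direction can be checked numerically. The sketch below is only an illustration: the function name `association_direction` and the temperature and heating-bill numbers are made up for this example and do not come from the slides. It multiplies each individual's deviation from the mean of one variable by its deviation from the mean of the other, then looks at the sign of the total.

```python
from statistics import mean

def association_direction(xs, ys):
    """Classify the association from the summed products of deviations.

    A product is positive when both values sit on the same side of their
    means, and negative when they sit on opposite sides.
    """
    x_bar, y_bar = mean(xs), mean(ys)
    total = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "none"

# Hypothetical data: outdoor temperature (degrees F) and monthly heating bill ($).
temperature = [30, 40, 50, 60, 70]
heating_bill = [210, 180, 140, 110, 80]

direction = association_direction(temperature, heating_bill)
print(direction)
```

Here above-average temperatures go with below-average bills, so the summed products are negative and the association is reported as negative.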


Positive Association


Negative Association


Examples


From a scatterplot of college students, will the association between verbal SAT score and GPA be negative or positive?

For used cars, is the association between the age of the car and the selling price positive or negative?


Examples of Relationships


Measuring Strength & Direction of a Linear Relationship

• How closely does a non-horizontal straight line fit the points of a scatterplot?
• The correlation coefficient r (often referred to as just correlation) is a:
   • measure of the strength of the relationship: the stronger the relationship, the larger the magnitude of r.
   • measure of the direction of the relationship: positive r indicates a positive relationship, negative r indicates a negative relationship.


Correlation Coefficient

Special values for r:

•  a perfect positive linear relationship would have r = +1
•  a perfect negative linear relationship would have r = -1
•  if there is no linear relationship, or if the scatterplot points are best fit by a horizontal line, then r = 0
•  Note: -1 ≤ r ≤ +1

Both variables must be quantitative; there is no distinction between response and explanatory variables.

r has no units; it does not change when measurement units are changed (ex: ft. or in.)

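The special values and the "no units" property can be checked with a short sketch. It assumes r is computed as the sum of products of standardized values divided by n - 1, using sample standard deviations. The helper name `correlation` and all data values are hypothetical and chosen only for illustration.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sum of products of standardized values, divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys)]
    return sum(products) / (len(xs) - 1)

# Hypothetical points lying exactly on a rising and on a falling line.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
rising = [2 * x + 1 for x in xs]
falling = [10 - 3 * x for x in xs]

r_up = correlation(xs, rising)
r_down = correlation(xs, falling)
print(round(r_up, 6), round(r_down, 6))

# Changing units (feet to inches) rescales x but should leave r unchanged.
heights_ft = [5.0, 5.5, 6.0, 6.2, 5.8]
weights_lb = [120.0, 150.0, 180.0, 200.0, 160.0]
heights_in = [12 * h for h in heights_ft]

r_ft = correlation(heights_ft, weights_lb)
r_in = correlation(heights_in, weights_lb)
print(abs(r_ft - r_in) < 1e-9)
```

The two perfect lines give r equal to +1 and -1 up to floating-point rounding, and the feet and inches versions agree because standardizing removes the units.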

Examples of Correlations

• Husband’s versus Wife’s ages: r = .94
• Husband’s versus Wife’s heights: r = .36
• Professional Golfer’s Putting Success (distance of putt in feet versus percent success): r = -.94


Linear Miles per Gallon versus Speed

• Linear relationship?
• Correlation is close to zero.


Linear Miles per Gallon versus Speed

• Curved relationship.
• Correlation is misleading.

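A rough sketch of why a near-zero correlation can mislead when the pattern is curved. The speed and mileage values below are invented for illustration (they are not the data behind the slide), and `correlation` is a hypothetical helper that divides the sum of standardized products by n - 1.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sum of products of standardized values, divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys)]
    return sum(products) / (len(xs) - 1)

# Hypothetical data: mileage rises with speed, peaks, then falls again.
speed = [20, 30, 40, 50, 60, 70, 80]
mpg = [12, 22, 28, 30, 28, 22, 12]

r_curved = correlation(speed, mpg)
print(abs(r_curved) < 1e-9)
```

Speed determines mileage very closely in this made-up data set, yet r is essentially zero because the rising half and the falling half cancel. The correlation measures only the linear part of a relationship.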

Problems with Correlations

• Outliers can inflate or deflate correlations (see next slide)
• Groups combined inappropriately may mask relationships (a third variable)
   • groups may have different relationships when separated


Outliers and Correlation


(Figure: two scatterplots, A and B, each containing one outlier.)

For each scatterplot above, how does the outlier affect the correlation?

A: outlier decreases the correlation
B: outlier increases the correlation
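The two effects can be reproduced with made-up numbers. This is a sketch under stated assumptions: the data sets below are hypothetical stand-ins for plots A and B (the original figures are not reproduced in this transcript), and `correlation` is an illustrative helper, not something defined in the slides.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sum of products of standardized values, divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys)]
    return sum(products) / (len(xs) - 1)

# Case A (hypothetical): a tight linear pattern, then one point far off the line.
a_x = [1, 2, 3, 4, 5, 6]
a_y = [1.1, 2.0, 2.9, 4.2, 5.0, 6.1]
a_without = correlation(a_x, a_y)
a_with = correlation(a_x + [7], a_y + [1.0])

# Case B (hypothetical): a patternless cluster, then one point far from it.
b_x = [1, 2, 1, 2, 1.5]
b_y = [1, 1, 2, 2, 1.5]
b_without = correlation(b_x, b_y)
b_with = correlation(b_x + [10], b_y + [10])

print(round(a_without, 3), round(a_with, 3))
print(round(b_without, 3), round(b_with, 3))
```

In case A the extra point pulls a nearly perfect correlation down sharply. In case B a cluster with essentially no correlation ends up with r close to 1 because of a single distant point.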


Correlation Calculation

• Suppose we have data on variables X and Y for n individuals: x_1, x_2, …, x_n and y_1, y_2, …, y_n
• Each variable has a mean and std dev: (x̄, s_x) and (ȳ, s_y) (see ch. 2 for s)


$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$

Case Study


Per Capita Gross Domestic Product and Average Life Expectancy for Countries in Western Europe

Case Study

Country           Per Capita GDP (x)   Life Expectancy (y)
Austria           21.4                 77.48
Belgium           23.2                 77.53
Finland           20.0                 77.32
France            22.7                 78.63
Germany           20.8                 77.17
Ireland           18.6                 76.39
Italy             21.5                 78.51
Netherlands       22.0                 78.15
Switzerland       23.8                 78.99
United Kingdom    21.2                 77.37


Case Study

x       y        (x_i − x̄)/s_x   (y_i − ȳ)/s_y   product
21.4    77.48    -0.078           -0.345            0.027
23.2    77.53     1.097           -0.282           -0.309
20.0    77.32    -0.992           -0.546            0.542
22.7    78.63     0.770            1.102            0.849
20.8    77.17    -0.470           -0.735            0.345
18.6    76.39    -1.906           -1.716            3.271
21.5    78.51    -0.013            0.951           -0.012
22.0    78.15     0.313            0.498            0.156
23.8    78.99     1.489            1.555            2.315
21.2    77.37    -0.209           -0.483            0.101

x̄ = 21.52    ȳ = 77.754    sum of products = 7.285
s_x = 1.532   s_y = 0.795


Case Study
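The case-study numbers can be reproduced with a few lines of Python. This is a minimal sketch, assuming the sample standard deviation (dividing by n - 1) is the one intended by s_x and s_y. The variable names are illustrative; the data are the ten GDP and life-expectancy pairs from the case-study table.

```python
from statistics import mean, stdev

# Per capita GDP (x) and life expectancy (y) for the ten countries in the table.
gdp = [21.4, 23.2, 20.0, 22.7, 20.8, 18.6, 21.5, 22.0, 23.8, 21.2]
life = [77.48, 77.53, 77.32, 78.63, 77.17, 76.39, 78.51, 78.15, 78.99, 77.37]

n = len(gdp)
x_bar, y_bar = mean(gdp), mean(life)
s_x, s_y = stdev(gdp), stdev(life)  # sample standard deviations

# Standardize each value, multiply the pairs, and add up the products.
z_products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y)
              for x, y in zip(gdp, life)]
z_sum = sum(z_products)

# Divide the sum of products by n - 1 to get the correlation.
r = z_sum / (n - 1)

print(f"x_bar = {x_bar:.2f}, y_bar = {y_bar:.3f}")
print(f"s_x = {s_x:.3f}, s_y = {s_y:.3f}")
print(f"sum of products = {z_sum:.3f}, r = {r:.3f}")
```

Dividing the sum of products, about 7.285, by n - 1 = 9 gives r of roughly 0.81, a fairly strong positive linear relationship between per capita GDP and life expectancy in this sample.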


Features of the Correlation Coefficient

Suppose you measure the correlation between the temperature in New York and that in Boston during a month. Do you expect to get the same correlation whether you measure it in Celsius or in Fahrenheit?

Recall the process of obtaining the correlation. The first step is to convert the samples of both variables to standard units. This implies that the correlation does not depend on units. Thus, whether you use Celsius or Fahrenheit, you will get the same correlation.

Furthermore, since the correlation depends on the product of the two standardized variables, it does not matter whether you consider the correlation between the temperature in NY and that in Boston or the correlation between the temperature in Boston and that in NY.


Features of the Correlation Coefficient


The correlation coefficient has the following properties:

• The correlation is not affected when the two variables are interchanged.
• The correlation is not changed if the same number is added to all the values of one of the variables.
• The correlation is not changed if all the values of one of the variables are multiplied by the same positive number. It will change sign if the number is negative.
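These properties can be verified on a small example. The sketch below uses hypothetical daily temperatures for New York and Boston (made up for illustration, not taken from the notes) and an illustrative `correlation` helper that divides the sum of standardized products by n - 1.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sum of products of standardized values, divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys)]
    return sum(products) / (len(xs) - 1)

def to_fahrenheit(celsius):
    """Convert a list of Celsius readings to Fahrenheit."""
    return [1.8 * c + 32 for c in celsius]

# Hypothetical daily temperatures in degrees Celsius.
ny_c = [2.0, 5.0, -1.0, 7.0, 3.0, 0.0]
boston_c = [0.0, 4.0, -3.0, 6.0, 1.0, -2.0]

r_c = correlation(ny_c, boston_c)
r_f = correlation(to_fahrenheit(ny_c), to_fahrenheit(boston_c))  # change of units
r_swapped = correlation(boston_c, ny_c)                          # variables interchanged
r_shifted = correlation([c + 10 for c in ny_c], boston_c)        # constant added
r_negated = correlation([-2 * c for c in ny_c], boston_c)        # negative multiplier

print(round(r_c, 4), round(r_f, 4), round(r_swapped, 4))
print(round(r_shifted, 4), round(r_negated, 4))
```

The first four values agree up to rounding error, and the last one has the same magnitude with the opposite sign. The Celsius to Fahrenheit conversion combines two of the properties: multiplying by a positive number and adding a constant.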


Correlation

• If women always married men who were five years older, what would the correlation between their ages be? (Hint: It always helps to visualize the scatterplot.)
• True or False: If the correlation coefficient is -0.80 then below-average values of the dependent variable tend to be associated with below-average values of the independent variable.


Correlation

A teaching assistant gives a quiz to her section. There are ten questions on the quiz, and no part credit is given. After grading the papers the TA writes down for each student the number of questions the student got right and the number wrong. The average number of right answers is 6.4, with an SD of 2.0; the average number of wrong answers is 3.6, with the same SD of 2.0. The correlation between number right and number wrong is:

•  0
•  -0.50
•  +0.50
•  -1
•  +1
•  can’t tell without the data
