Post on 20-Jan-2016
4.2 Correlation
• The Correlation Coefficient r
• Properties of r
Correlation
We can often see the strength of the relationship between two quantitative variables in a scatterplot, but be careful. The two figures here are both scatterplots of the same data, on different scales. The second seems to show a stronger association…
So we need a measure of association independent of the graphics…
A scatterplot displays the strength, direction, and form of the relationship between two quantitative variables. Linear relations are important because a straight line is a simple pattern that is quite common.
Our eyes are not good judges of how strong a relationship is. Therefore, we use a numerical measure to supplement our scatterplot and help us interpret the strength of the linear relationship.
The correlation r measures the strength of the linear relationship between two quantitative variables.
Measuring Linear Association
We say a linear relationship is strong if the points lie close to a straight line and weak if they are widely scattered about a line. The following facts about r help us further interpret the strength of the linear relationship.
Properties of Correlation
• r is always a number between –1 and 1.
• r > 0 indicates a positive association.
• r < 0 indicates a negative association.
• Values of r near 0 indicate a very weak linear relationship.
• The strength of the linear relationship increases as r moves away from 0 toward –1 or 1.
• The extreme values r = –1 and r = 1 occur only in the case of a perfect linear relationship.
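The properties above can be checked numerically. The following is a minimal sketch, assuming r is computed as the sum of products of z-scores divided by n – 1; the function name `correlation` and the small data sets are made up for illustration.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation r: sum of z-score products divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

x = [1, 2, 3, 4, 5]  # illustrative values only

r_up = correlation(x, [2 * v + 1 for v in x])     # points exactly on a rising line
r_down = correlation(x, [10 - 3 * v for v in x])  # points exactly on a falling line
r_weak = correlation(x, [2, 5, 1, 4, 3])          # widely scattered points

print(round(r_up, 3), round(r_down, 3), round(r_weak, 3))
```

The two exact lines give r = 1 and r = –1, while the scattered points give a value close to 0.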
Correlation
The correlation coefficient r
Time to swim: x̄ = 35, s_x = 0.7
Pulse rate: ȳ = 140, s_y = 9.5
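For reference, the usual textbook definition of r as an average product of z-scores is given below. It is stated here as an assumption about the formula this slide refers to, with x̄ and ȳ the sample means, s_x and s_y the sample standard deviations, and n the number of observations.

```latex
r = \frac{1}{n-1} \sum_{i=1}^{n}
    \left( \frac{x_i - \bar{x}}{s_x} \right)
    \left( \frac{y_i - \bar{y}}{s_y} \right)
```

Each factor in parentheses is a z-score, so an observation that is above average in both variables (or below average in both) contributes a positive term, and one that is above average in one variable and below in the other contributes a negative term.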
r does not distinguish between x & y
The correlation coefficient, r, treats x and y symmetrically.
"Time to swim" is the explanatory variable here, and belongs on the x axis. However, in either plot r is the same (r=-0.75).
[Figure: two scatterplots of the same data with x and y exchanged; r = -0.75 in each]
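The symmetry of r in x and y can be illustrated with a short sketch. The swim times and pulse rates below are hypothetical values chosen for the example, not the data behind the slide, so the resulting r is not expected to equal -0.75; `correlation` is an illustrative helper.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation r: sum of z-score products divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data (assumed for illustration only)
swim_time = [34.1, 35.7, 34.7, 35.2, 36.0, 34.4]
pulse_rate = [152, 124, 140, 148, 130, 146]

r_xy = correlation(swim_time, pulse_rate)  # time on the x axis
r_yx = correlation(pulse_rate, swim_time)  # axes exchanged

print(r_xy == r_yx or abs(r_xy - r_yx) < 1e-12)
```

Exchanging the arguments only reorders the two factors in each product, so the value of r cannot change.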
Changing the units of measure of variables does not change the correlation coefficient r, because we "standardize out" the units when getting z-scores.
r has no unit of measure (unlike x and y)
[Figure: scatterplots of the same data in two different units of measure, each with r = -0.75. The z-score plot (axes: z for time, z for pulse) is the same for both.]
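A minimal sketch of why a change of units leaves r alone: the z-scores are identical before and after the conversion. The data are hypothetical, and the choice of minutes and seconds as the two units is an assumption made for the example.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize: subtract the mean, divide by the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def correlation(xs, ys):
    """Sample correlation r: sum of z-score products divided by n - 1."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)

# Hypothetical data (assumed for illustration only)
times_min = [34.1, 35.7, 34.7, 35.2, 36.0, 34.4]
pulse_bpm = [152, 124, 140, 148, 130, 146]
times_sec = [t * 60 for t in times_min]  # same times, different unit

# The standardized values agree up to floating-point rounding
z_same = all(abs(a - b) < 1e-9
             for a, b in zip(z_scores(times_min), z_scores(times_sec)))

r_min = correlation(times_min, pulse_bpm)
r_sec = correlation(times_sec, pulse_bpm)
print(z_same, abs(r_min - r_sec) < 1e-9)
```

Multiplying every time by 60 multiplies both the deviations from the mean and the standard deviation by 60, so the ratio is unchanged.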
Cautions:
Correlation requires that both variables be quantitative.
Correlation does not describe curved relationships between variables, no matter how strong the relationship is.
Correlation is not resistant. r is strongly affected by a few outlying observations.
Correlation is not a complete summary of two-variable data.
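Two of these cautions are easy to see with small made-up data sets. The sketch below assumes the z-score form of r; the helper name and all values are illustrative.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation r: sum of z-score products divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

# Curved relationship: y is determined exactly by x, yet r is essentially 0
x_curve = [-3, -2, -1, 0, 1, 2, 3]
y_curve = [v ** 2 for v in x_curve]
r_curve = correlation(x_curve, y_curve)

# Not resistant: a single outlying point changes r substantially
x_clean = [1, 2, 3, 4, 5]
y_clean = [2, 1, 4, 3, 5]
r_clean = correlation(x_clean, y_clean)

r_outlier = correlation(x_clean + [10], y_clean + [0])  # add the point (10, 0)

print(round(r_curve, 3), round(r_clean, 3), round(r_outlier, 3))
```

Here the perfect parabola has r of about 0, and one added point moves r from 0.8 to a negative value, which is why a scatterplot should accompany any reported correlation.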
HW: Read section 4.2 on the Correlation Coefficient. Pay particular attention to Figure 4.12…
Work the following exercises: #4.36-4.38, 4.41-4.44, 4.47-4.49