8/7/2019 3.3 Power Point
Correlation and Regression Wisdom: Section 3.3

Cautions
Correlation and regression describe only linear relationships! You can do the calculations for any relationship between two variables, but the results are useful only if the scatterplot shows a linear pattern.
Extrapolation often produces unreliable results!
Correlation is not resistant! Always plot your data and look for unusual observations before you interpret correlation.
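
To make the first caution concrete, here is a minimal sketch with made-up numbers (not data from these slides). The y-values are completely determined by x, but along a curve, and the correlation still comes out at zero. The helper name `correlation` is an assumption chosen for this illustration.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation r of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up illustrative data: a perfect relationship, but not a linear one.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r_curve = correlation(xs, ys)
print(f"r for y = x^2: {r_curve:.2f}")
```

A scatterplot of these points shows the pattern immediately; r alone would suggest there is no relationship at all.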

Which Points Have the Influence?
Each point has an influence on the LSR line; some make large contributions and some small. Some make positive contributions and some negative.
Our goal is to learn to recognize the points in a data set that may have an unusually large influence on where the regression line goes or on the size and sign of the correlation.
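
One way to read "contribution" (an interpretation, not something the slides define) is through the standard formula r = (1/(n-1)) Σ z_x z_y: each point adds the product of its standardized x and y values, divided by n-1. The sketch below applies that to Anscombe Set 3, which appears later in these slides; the function name `contributions_to_r` is hypothetical.

```python
from math import sqrt

# Anscombe Set 3, as listed later in these slides.
xs = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ys = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]

def contributions_to_r(xs, ys):
    """Per-point terms z_x * z_y / (n - 1); their sum is the correlation r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample standard deviations (divide by n - 1).
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return [((x - mean_x) / sd_x) * ((y - mean_y) / sd_y) / (n - 1)
            for x, y in zip(xs, ys)]

contribs = contributions_to_r(xs, ys)
r_total = sum(contribs)

for x, y, c in zip(xs, ys, contribs):
    print(f"({x:>2}, {y:>5.2f}) contributes {c:+.3f}")
print(f"sum of contributions = r = {r_total:.2f}")
```

Points in the upper-right or lower-left of the plot (relative to the means) contribute positively, the others negatively, and a single point far from both means can account for a large share of r.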

Outliers and Influential Observations in Regression

Animal Longevity
[Figure: Collection 1 scatter plot of Maximum_Life_Span (vertical axis, 0 to 80) versus Average_Life_Span (horizontal axis, 0 to 45)]
The relationship between maximum and average life span for mammals.
Labeled points:
Beaver: avg. = 5, max = 50
Elephant: avg. = 35, max = 70
Hippo: avg. = 41, max = 54

If you look at the entire sample, the beaver, elephant, and hippo are the oddballs of the bunch.
The LSR line for all the mammals in the sample is:
$\hat{M} = 10.53 + 1.58A$
where $\hat{M}$ is the predicted maximum longevity and A stands for observed average longevity.
The correlation for the relationship is .77.
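
As a quick check of what the line says about the three labeled animals, the sketch below plugs their average life spans into the LSR equation (intercept 10.53, slope 1.58) and computes residuals as observed minus predicted. The function and variable names are illustrative assumptions; the numbers are the ones given on the slides.

```python
def predicted_max(avg_life_span):
    """Predicted maximum longevity from the LSR line on the slide."""
    return 10.53 + 1.58 * avg_life_span

# (average life span, observed maximum life span) for the labeled points.
animals = {"Beaver": (5, 50), "Elephant": (35, 70), "Hippo": (41, 54)}

residuals = {}
for name, (avg, observed_max) in animals.items():
    prediction = predicted_max(avg)
    residuals[name] = observed_max - prediction
    print(f"{name}: predicted {prediction:.2f}, observed {observed_max}, "
          f"residual {residuals[name]:+.2f}")
```

The beaver sits far above the line and the hippo far below it, while the elephant is close to the line despite being far from the other animals in the x direction.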

Are Outliers Always Influential?
In Chapter 1, outliers were influential points, because they were far from the other values AND the mean changes drastically when they are removed. In regression, not all outliers are influential. Influential points often have small residuals because they pull the regression line toward themselves, so just looking at a residual plot is not enough.

How Do We Know?
The surest way to verify that a point is influential is to find the regression line both with and without the suspect point. If the line moves more than a small amount when the point is deleted, the point is influential.
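
The with-and-without comparison can be sketched in a few lines. This example uses Anscombe Set 3, which appears later in these slides, and treats (13, 12.74) as the suspect point. The helper names `fit_line` and `compare_without` are assumptions made for this illustration.

```python
def fit_line(points):
    """Least-squares (slope, intercept) for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def compare_without(points, suspect):
    """Fit the line with all points, then again with the suspect removed."""
    remaining = [p for p in points if p != suspect]
    return fit_line(points), fit_line(remaining)

# Anscombe Set 3, as listed later in these slides.
set3 = [(10, 7.46), (8, 6.77), (13, 12.74), (9, 7.11), (11, 7.81),
        (14, 8.84), (6, 6.08), (4, 5.39), (12, 8.15), (7, 6.42), (5, 5.73)]

with_point, without_point = compare_without(set3, (13, 12.74))
print(f"with point:    slope {with_point[0]:.3f}, intercept {with_point[1]:.2f}")
print(f"without point: slope {without_point[0]:.3f}, intercept {without_point[1]:.2f}")
```

How much movement counts as "more than a small amount" is a judgment call; the sketch only reports the two fits so they can be compared.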

[Figure: four Collection 1 scatter plots of Maximum_Life_Span versus Average_Life_Span, each with its LSR line]
With all mammals: Maximum_Life_Span = 1.58 Average_Life_Span + 10.5; r2 = 0.59; slope = 1.58 and r = .77
Without hippos: Maximum_Life_Span = 1.96 Average_Life_Span + 6.3; r2 = 0.64; slope = 1.96 and r = .80
Without elephant: Maximum_Life_Span = 1.53 Average_Life_Span + 11; r2 = 0.52; slope = 1.53 and r = .72
Without beaver: Maximum_Life_Span = 1.69 Average_Life_Span + 8.1; r2 = 0.69; slope = 1.69 and r = .83
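
Each plot reports both r2 and r. Since r is the square root of r2 carrying the sign of the slope, the reported pairs can be cross-checked with a short sketch. The dictionary layout and names here are illustrative assumptions; the numbers come from the four plots.

```python
from math import sqrt, copysign

# label: (slope, r squared, reported r), as shown on the four plots.
fits = {
    "all mammals": (1.58, 0.59, 0.77),
    "without hippos": (1.96, 0.64, 0.80),
    "without elephant": (1.53, 0.52, 0.72),
    "without beaver": (1.69, 0.69, 0.83),
}

recovered = {}
for label, (slope, r_squared, reported_r) in fits.items():
    # r has the same sign as the slope of the LSR line.
    recovered[label] = copysign(sqrt(r_squared), slope)
    print(f"{label}: sqrt({r_squared}) = {recovered[label]:.3f}, "
          f"reported r = {reported_r}")
```

The comparison also makes the pattern easier to see: dropping the hippos changes the slope the most, while dropping the beaver raises the correlation the most.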

The Anscombe sets:
Make a scatterplot, find the LSR equation, and find the correlation:
Set 1:
 x      y
10   8.04
 8   6.95
13   7.58
 9   8.81
11   8.33
14   9.96
 6   7.24
 4   4.26
12  10.84
 7   4.82
 5   5.68
$\hat{y} = 3 + 0.5x$
r = .82

The Anscombe sets:
Make a scatterplot, find the LSR equation, and find the correlation:
Set 2:
 x      y
10   9.14
 8   8.14
13   8.74
 9   8.77
11   9.26
14   8.10
 6   6.13
 4   3.10
12   9.13
 7   7.26
 5   4.74
$\hat{y} = 3 + 0.5x$
r = .82

The Anscombe sets:
Make a scatterplot, find the LSR equation, and find the correlation:
Set 3:
 x      y
10   7.46
 8   6.77
13  12.74
 9   7.11
11   7.81
14   8.84
 6   6.08
 4   5.39
12   8.15
 7   6.42
 5   5.73
$\hat{y} = 3 + 0.5x$
r = .82

The Anscombe sets:
Make a scatterplot, find the LSR equation, and find the correlation:
Set 4:
 x      y
 8   6.95
 8   5.76
 8   7.71
 8   8.84
 8   8.47
 8   7.04
 8   5.25
19  12.50
 8   5.56
 8   7.91
 8   6.89
$\hat{y} = 3 + 0.5x$
r = .82
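
The summary numbers for the four sets can be reproduced with a short sketch. The data below are typed in exactly as they appear on these slides; the helper name `fit_and_r` is an assumption for this illustration. The code only computes the line and the correlation, so the scatterplots still have to be drawn separately to see how different the four patterns are.

```python
from math import sqrt

def fit_and_r(xs, ys):
    """Return (slope, intercept, r) for the least-squares line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy / sqrt(sxx * syy)

x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # shared by Sets 1 to 3
sets = {
    "Set 1": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                     7.24, 4.26, 10.84, 4.82, 5.68]),
    "Set 2": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                     6.13, 3.10, 9.13, 7.26, 4.74]),
    "Set 3": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                     6.08, 5.39, 8.15, 6.42, 5.73]),
    "Set 4": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
              [6.95, 5.76, 7.71, 8.84, 8.47, 7.04,
               5.25, 12.50, 5.56, 7.91, 6.89]),
}

results = {name: fit_and_r(xs, ys) for name, (xs, ys) in sets.items()}
for name, (slope, intercept, r) in results.items():
    print(f"{name}: y-hat = {intercept:.2f} + {slope:.2f}x, r = {r:.2f}")
```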

Questions
Which plots have a point that is influential with respect to the slope of the LSR line? How would the slope change if the point were removed?
Which plots have a point that is influential with respect to the correlation? How would the correlation change if the point were removed?

Matching:

Beware the Lurking Variable!

Imported Goods and Spending on Health
The explanatory variable is the dollar value of goods imported into the US between 1990 and 2001.
The response variable is private spending on health in these years.

Are You in a Relationship?
There is no economic relationship between these variables. The strong association is due entirely to the fact that both imports and health spending grew rapidly in these years. The common year for each point is a lurking variable. Any two variables that both increase over time will show a strong association. This does not mean that one variable explains or influences the other.
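
The effect of a shared time trend can be sketched with two hypothetical series. These are NOT the real import or health-spending figures; they are made-up numbers that each depend only on the year plus a small wiggle. Comparing year-to-year changes instead of levels is one simple way (an assumption of this sketch, not a method from the slides) to take the common trend out.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation r of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

years = list(range(1990, 2002))

# Hypothetical series: each one is a trend in time plus an unrelated wiggle.
imports = [100 + 10 * i + (3 if i % 2 else -3) for i in range(len(years))]
health = [50 + 5 * i + (2 if i % 3 == 0 else -1) for i in range(len(years))]

r_levels = correlation(imports, health)

# Year-to-year changes remove the trend both series share.
import_changes = [b - a for a, b in zip(imports, imports[1:])]
health_changes = [b - a for a, b in zip(health, health[1:])]
r_changes = correlation(import_changes, health_changes)

print(f"r between levels:  {r_levels:.2f}")
print(f"r between changes: {r_changes:.2f}")
```

The levels are strongly correlated only because both series rise with the year; once that common trend is removed, little association is left.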

Hey, Watch it!
Association does not imply causation!!!