Chapter 7: Correlation Bivariate distribution: a distribution that shows the relation between two...
Uploaded by alaina-glenn
![Page 1: Chapter 7: Correlation](https://reader033.fdocuments.us/reader033/viewer/2022061520/56649da25503460f94a8e830/html5/thumbnails/1.jpg)
Chapter 7: Correlation
Bivariate distribution: a distribution that shows the relation between two variables.
[Figure: scatter plot of Area of primary visual cortex (x) vs. Visual acuity (y), with separate markers for the left and right hemispheres.]
This graph is called a scatter plot, or scatter diagram.
![Page 2](https://reader033.fdocuments.us/reader033/viewer/2022061520/56649da25503460f94a8e830/html5/thumbnails/2.jpg)
How do we quantify the strength of the relationship between the two variables in a bivariate distribution?
![Page 3](https://reader033.fdocuments.us/reader033/viewer/2022061520/56649da25503460f94a8e830/html5/thumbnails/3.jpg)
![Page 4](https://reader033.fdocuments.us/reader033/viewer/2022061520/56649da25503460f94a8e830/html5/thumbnails/4.jpg)
Example from the book: two measures were made for each subject, stress level and eating difficulties (E.D.).

| Stress | Eating difficulties |
|---|---|
| 17 | 9 |
| 8 | 13 |
| 8 | 7 |
| 20 | 18 |
| 14 | 11 |
| 7 | 1 |
| 21 | 5 |
| 22 | 15 |
| 19 | 26 |
| 30 | 28 |

[Figure: scatter plot of Stress (x) vs. Eating Difficulties (y) for the ten subjects.]
![Page 5](https://reader033.fdocuments.us/reader033/viewer/2022061520/56649da25503460f94a8e830/html5/thumbnails/5.jpg)
The most common way to quantify the relation between the two variables in a bivariate distribution is the Pearson correlation coefficient, labeled r. Its value always lies between -1 and 1.
The z-score formula is the most intuitive formula:

$$r = \frac{\sum z_X z_Y}{n}$$

Example: use the z-score formula to calculate r for the stress (X) and eating-difficulties (Y) scores.

| X | Y | $z_X$ | $z_Y$ | $z_X z_Y$ |
|---|---|---|---|---|
| 17 | 9 | 0.06 | -0.52 | -0.03 |
| 8 | 13 | -1.23 | -0.04 | 0.04 |
| 8 | 7 | -1.23 | -0.76 | 0.93 |
| 20 | 18 | 0.48 | 0.57 | 0.27 |
| 14 | 11 | -0.37 | -0.28 | 0.10 |
| 7 | 1 | -1.37 | -1.48 | 2.03 |
| 21 | 5 | 0.63 | -1.00 | -0.63 |
| 22 | 15 | 0.77 | 0.21 | 0.16 |
| 19 | 26 | 0.34 | 1.53 | 0.52 |
| 30 | 28 | 1.91 | 1.77 | 3.39 |

Raw scores: $\bar{X} = 16.60$, $S_X = 7.02$, $\bar{Y} = 13.30$, $S_Y = 8.28$ (these standard deviations divide by n, not n - 1).

$\sum z_X z_Y = 6.80$ (computed from the unrounded z-scores), so $r = \frac{6.80}{10} = 0.68$.
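A minimal Python sketch of the z-score formula, using the ten stress and eating-difficulties pairs from the example. The helper names `z_scores` and `pearson_r_z` are illustrative, not from the text; the standard deviation divides by n to match the slides.

```python
import math

# Stress (X) and eating-difficulties (Y) scores from the example
stress = [17, 8, 8, 20, 14, 7, 21, 22, 19, 30]
eating = [9, 13, 7, 18, 11, 1, 5, 15, 26, 28]


def z_scores(values):
    """Convert raw scores to z-scores, using the SD with n (not n - 1) in the denominator."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def pearson_r_z(x, y):
    """Pearson r as the average product of paired z-scores: sum(zx * zy) / n."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)


r = pearson_r_z(stress, eating)
print(round(r, 2))  # 0.68
```

Dividing by n inside `z_scores` is what makes the average of the products land between -1 and 1; with an n - 1 standard deviation you would divide the sum of products by n - 1 instead.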
![Page 6](https://reader033.fdocuments.us/reader033/viewer/2022061520/56649da25503460f94a8e830/html5/thumbnails/6.jpg)
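How does each data point contribute to the correlation value?
[Figure: scatter plot of Stress (x) vs. Eating Difficulties (y), divided into four quadrants by a vertical line at the mean of X ($m_x$) and a horizontal line at the mean of Y ($m_y$).]
Points in the upper-right or lower-left quadrants have $z_X z_Y > 0$ and add to the correlation value. Points in the upper-left or lower-right quadrants have $z_X z_Y < 0$ and subtract from it.
r = 0.68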
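To see the quadrant rule in numbers, this sketch (same example data; the variable names are illustrative) sorts each point's $z_X z_Y$ product by sign. A positive product means both scores fall on the same side of their means.

```python
import math

stress = [17, 8, 8, 20, 14, 7, 21, 22, 19, 30]
eating = [9, 13, 7, 18, 11, 1, 5, 15, 26, 28]


def z_scores(values):
    """z-scores using the SD with n in the denominator, as in the slides."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


products = [zx * zy for zx, zy in zip(z_scores(stress), z_scores(eating))]

# Same sign (upper-right or lower-left quadrant): positive product, raises r
adds = [p for p in products if p > 0]
# Opposite signs (upper-left or lower-right quadrant): negative product, lowers r
subtracts = [p for p in products if p < 0]

print(len(adds), len(subtracts))  # 8 2
print(round(sum(products) / len(products), 2))  # 0.68
```

Only two points (stress 17 and stress 21) sit in the subtracting quadrants, which is why r ends up clearly positive.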
![Page 7](https://reader033.fdocuments.us/reader033/viewer/2022061520/56649da25503460f94a8e830/html5/thumbnails/7.jpg)
Fun fact about the Pearson correlation statistic
Since z-scores do not change when you add a constant to the raw scores or multiply them by a positive constant, the Pearson correlation $r = \sum z_X z_Y / n$ doesn't change either.
[Figure: two scatter plots of Stress vs. Eating Difficulties: the original scores (r = 0.68) and the same scores after multiplying y by 2 and adding 100 (r = 0.68).]
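A quick check of this fact in Python, assuming the example data. For self-containment it uses the deviation-score form of r, which is equivalent to the z-score formula. The `pearson_r` name is illustrative.

```python
import math

stress = [17, 8, 8, 20, 14, 7, 21, 22, 19, 30]
eating = [9, 13, 7, 18, 11, 1, 5, 15, 26, 28]


def pearson_r(x, y):
    """Pearson r from deviation scores: sum of cross-products / sqrt(SS_X * SS_Y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)


original = pearson_r(stress, eating)
rescaled = pearson_r(stress, [2 * y + 100 for y in eating])  # multiply by 2, add 100
flipped = pearson_r(stress, [-2 * y + 100 for y in eating])  # negative multiplier

print(round(original, 2), round(rescaled, 2), round(flipped, 2))  # 0.68 0.68 -0.68
```

The last case shows why the multiplier must be positive: multiplying by a negative constant flips the sign of every z-score, so r keeps its size but changes sign.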
![Page 8](https://reader033.fdocuments.us/reader033/viewer/2022061520/56649da25503460f94a8e830/html5/thumbnails/8.jpg)
Similarly, the correlation stays the same no matter how you stretch your axes:
As a rule, you should plot your axes with an equal scale.
[Figure: the same Stress vs. Eating Difficulties scatter plot drawn with three different axis scalings; r = 0.68 in each.]
![Page 9](https://reader033.fdocuments.us/reader033/viewer/2022061520/56649da25503460f94a8e830/html5/thumbnails/9.jpg)
Guess that correlation!
[Figure: scatter plot of Average of parent's height (in) (x) vs. Student's height (in) (y).]
n = 90, r = 0.34
![Page 10](https://reader033.fdocuments.us/reader033/viewer/2022061520/56649da25503460f94a8e830/html5/thumbnails/10.jpg)
Guess that correlation!
[Figure: scatter plot of Father's height (in) (x) vs. Male student's height (in) (y).]
n = 21, r = 0.34
![Page 11](https://reader033.fdocuments.us/reader033/viewer/2022061520/56649da25503460f94a8e830/html5/thumbnails/11.jpg)
[Figure: scatter plot of Mother's height (in) (x) vs. Female student's height (in) (y).]
n = 70, r = 0.68
![Page 12](https://reader033.fdocuments.us/reader033/viewer/2022061520/56649da25503460f94a8e830/html5/thumbnails/12.jpg)
Guess that correlation!
[Figure: scatter plot of High School GPA (x) vs. UW GPA (y).]
n = 90, r = 0.19
![Page 13](https://reader033.fdocuments.us/reader033/viewer/2022061520/56649da25503460f94a8e830/html5/thumbnails/13.jpg)
Guess that correlation!
[Figure: scatter plot of Caffeine (cups/day) (x) vs. Sleep (hours/night) (y).]
n = 91, r = -0.12
![Page 14](https://reader033.fdocuments.us/reader033/viewer/2022061520/56649da25503460f94a8e830/html5/thumbnails/14.jpg)
Guess that correlation!
[Figure: scatter plot of Caffeine (cups/day) (x) vs. Drinks (per week) (y).]
n = 91, r = 0.01
![Page 15](https://reader033.fdocuments.us/reader033/viewer/2022061520/56649da25503460f94a8e830/html5/thumbnails/15.jpg)
Guess that correlation!
[Figure: scatter plot of Facebook friends and Drinks (per week).]
n = 91, r = 0.10
![Page 16](https://reader033.fdocuments.us/reader033/viewer/2022061520/56649da25503460f94a8e830/html5/thumbnails/16.jpg)
Guess that correlation!
[Figure: scatter plot of Favorite outdoor temperature (F) (x) vs. Video game playing (hours/week) (y).]
n = 91, r = -0.19
![Page 17](https://reader033.fdocuments.us/reader033/viewer/2022061520/56649da25503460f94a8e830/html5/thumbnails/17.jpg)
Guess that correlation!
[Figure: scatter plot of x vs. y.]
r = -0.56
![Page 18](https://reader033.fdocuments.us/reader033/viewer/2022061520/56649da25503460f94a8e830/html5/thumbnails/18.jpg)
Guess that correlation!
[Figure: scatter plot of x vs. y.]
r = 0.94
![Page 19](https://reader033.fdocuments.us/reader033/viewer/2022061520/56649da25503460f94a8e830/html5/thumbnails/19.jpg)
Guess that correlation!
[Figure: scatter plot of x vs. y.]
r = 0.08
![Page 20](https://reader033.fdocuments.us/reader033/viewer/2022061520/56649da25503460f94a8e830/html5/thumbnails/20.jpg)
Guess that correlation!
[Figure: scatter plot of x vs. y.]
r = -1.00
![Page 21](https://reader033.fdocuments.us/reader033/viewer/2022061520/56649da25503460f94a8e830/html5/thumbnails/21.jpg)
Guess that correlation!
[Figure: scatter plot of x vs. y.]
r = -0.08
![Page 22](https://reader033.fdocuments.us/reader033/viewer/2022061520/56649da25503460f94a8e830/html5/thumbnails/22.jpg)
Guess that correlation!
[Figure: scatter plot of x vs. y.]
r = 0.49
![Page 23](https://reader033.fdocuments.us/reader033/viewer/2022061520/56649da25503460f94a8e830/html5/thumbnails/23.jpg)
Guess that correlation!
[Figure: scatter plot of x vs. y.]
r = -0.92
![Page 24](https://reader033.fdocuments.us/reader033/viewer/2022061520/56649da25503460f94a8e830/html5/thumbnails/24.jpg)
Guess that correlation!
[Figure: scatter plot of x vs. y.]
r = -0.77
![Page 25](https://reader033.fdocuments.us/reader033/viewer/2022061520/56649da25503460f94a8e830/html5/thumbnails/25.jpg)
r is a measure of the linear relation between two variables. A strong relation that is not a straight line can still give an r close to 0.
[Figure: scatter plot of x vs. y.]
r = 0.01
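The plotted data are not listed, so this sketch uses a hypothetical set of points lying exactly on the curve y = x². Y is completely determined by X, yet r is 0 because the relation has no linear trend. The `pearson_r` name is illustrative.

```python
import math


def pearson_r(x, y):
    """Pearson r from deviation scores: sum of cross-products / sqrt(SS_X * SS_Y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)


# Hypothetical points on a perfect, but curved, relation
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]

print(pearson_r(xs, ys))  # 0.0
```

The positive cross-products from the right half of the curve exactly cancel the negative ones from the left half.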
![Page 26](https://reader033.fdocuments.us/reader033/viewer/2022061520/56649da25503460f94a8e830/html5/thumbnails/26.jpg)
Guess that correlation!
[Figure: scatter plot of x vs. y.]
r = 0.00
![Page 27](https://reader033.fdocuments.us/reader033/viewer/2022061520/56649da25503460f94a8e830/html5/thumbnails/27.jpg)
Guess that correlation!
[Figure: scatter plot of x vs. y.]
r = 0.91
![Page 28](https://reader033.fdocuments.us/reader033/viewer/2022061520/56649da25503460f94a8e830/html5/thumbnails/28.jpg)
Z-score formula for calculating r (intuitive, but not very practical):

$$r = \frac{\sum z_X z_Y}{n}$$

Substituting the formula for z,

$$z_X = \frac{X - \bar{X}}{S_X}, \qquad z_Y = \frac{Y - \bar{Y}}{S_Y},$$

gives the deviation-score formula for calculating r (somewhat intuitive, somewhat more practical):

$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{n S_X S_Y}$$

Because $S_X = \sqrt{SS_X / n}$ and $S_Y = \sqrt{SS_Y / n}$, where $SS_X = \sum (X - \bar{X})^2$ and $SS_Y = \sum (Y - \bar{Y})^2$, the denominator $n S_X S_Y$ equals $\sqrt{SS_X SS_Y}$. This gives the computational formula for calculating r (less intuitive, more practical):

$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{SS_X SS_Y}}$$
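A small numerical check that the three formulas give the same r, sketched in Python with the stress and eating-difficulties data. The variable names are illustrative.

```python
import math

stress = [17, 8, 8, 20, 14, 7, 21, 22, 19, 30]
eating = [9, 13, 7, 18, 11, 1, 5, 15, 26, 28]
n = len(stress)

mean_x, mean_y = sum(stress) / n, sum(eating) / n
dev_x = [x - mean_x for x in stress]
dev_y = [y - mean_y for y in eating]
ss_x = sum(d * d for d in dev_x)                      # SS_X
ss_y = sum(d * d for d in dev_y)                      # SS_Y
sp = sum(a * b for a, b in zip(dev_x, dev_y))         # sum of cross-products
s_x, s_y = math.sqrt(ss_x / n), math.sqrt(ss_y / n)   # SDs with n in the denominator

r_z = sum((a / s_x) * (b / s_y) for a, b in zip(dev_x, dev_y)) / n  # z-score formula
r_dev = sp / (n * s_x * s_y)                                        # deviation-score formula
r_comp = sp / math.sqrt(ss_x * ss_y)                                # computational formula

print(round(r_z, 3), round(r_dev, 3), round(r_comp, 3))  # 0.68 0.68 0.68
```

The computational form is handy because it never needs the standard deviations themselves, only the two sums of squares.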
![Page 29](https://reader033.fdocuments.us/reader033/viewer/2022061520/56649da25503460f94a8e830/html5/thumbnails/29.jpg)
A little algebra shows that:

$$\sum (X - \bar{X})(Y - \bar{Y}) = \sum XY - \frac{(\sum X)(\sum Y)}{n}$$

Computational raw-score formula for calculating r (least intuitive, most practical):

$$r = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sqrt{SS_X SS_Y}}$$
![Page 30](https://reader033.fdocuments.us/reader033/viewer/2022061520/56649da25503460f94a8e830/html5/thumbnails/30.jpg)
Using the computational raw-score formula:

$$r = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sqrt{SS_X SS_Y}}, \qquad SS_X = \sum X^2 - \frac{(\sum X)^2}{n}, \qquad SS_Y = \sum Y^2 - \frac{(\sum Y)^2}{n}$$

| | X | Y | X² | Y² | XY |
|---|---|---|---|---|---|
| | 17 | 9 | 289 | 81 | 153 |
| | 8 | 13 | 64 | 169 | 104 |
| | 8 | 7 | 64 | 49 | 56 |
| | 20 | 18 | 400 | 324 | 360 |
| | 14 | 11 | 196 | 121 | 154 |
| | 7 | 1 | 49 | 1 | 7 |
| | 21 | 5 | 441 | 25 | 105 |
| | 22 | 15 | 484 | 225 | 330 |
| | 19 | 26 | 361 | 676 | 494 |
| | 30 | 28 | 900 | 784 | 840 |
| Totals | 166 | 133 | 3248 | 2455 | 2603 |

n = 10, $SS_X = 492.4$, $SS_Y = 686.1$

$$r = \frac{2603 - \frac{(166)(133)}{10}}{\sqrt{(492.4)(686.1)}} = \frac{395.2}{581.24} = 0.680$$
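A sketch of the raw-score formula in Python. It needs only the five column totals, so it could be updated one data pair at a time. It uses the scores as listed in the first table of the example; `pearson_r_raw` is an illustrative name.

```python
import math

stress = [17, 8, 8, 20, 14, 7, 21, 22, 19, 30]
eating = [9, 13, 7, 18, 11, 1, 5, 15, 26, 28]


def pearson_r_raw(x, y):
    """Computational raw-score formula: only needs the sums of X, Y, X^2, Y^2 and XY."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    ss_x = sum_x2 - sum_x ** 2 / n
    ss_y = sum_y2 - sum_y ** 2 / n
    return (sum_xy - sum_x * sum_y / n) / math.sqrt(ss_x * ss_y)


print(sum(stress), sum(eating), sum(a * b for a, b in zip(stress, eating)))  # 166 133 2603
print(round(pearson_r_raw(stress, eating), 3))  # 0.68
```

With very large or very precise scores, subtracting the two big quantities in `ss_x` can lose floating-point accuracy; for hand calculation and small data like this it is not a concern.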
![Page 31](https://reader033.fdocuments.us/reader033/viewer/2022061520/56649da25503460f94a8e830/html5/thumbnails/31.jpg)
A second measure of correlation, called the Spearman rank-order correlation coefficient, is appropriate for ordinal scores. It is calculated by:

$$r_s = 1 - \frac{6 \sum D^2}{n(n^2 - 1)}$$

where D is the difference between each pair of ranks and n is the number of pairs. (This formula assumes there are no tied ranks.)
Most often used when:
a) At least one variable is on an ordinal scale
b) One of the distributions is very skewed or has outliers
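A minimal Python sketch of the formula. The function names are illustrative, and the simple ranking helper assumes no ties, as the formula does. With ties, average ranks and a Pearson r on those ranks would be needed instead.

```python
def ranks(values):
    """Rank scores from 1 (smallest) to n. This sketch assumes there are no ties."""
    if len(set(values)) != len(values):
        raise ValueError("tied values: use average ranks instead")
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman_rs(x, y):
    """Spearman r_s = 1 - 6 * sum(D^2) / (n * (n^2 - 1)), D = difference in ranks."""
    rank_x, rank_y = ranks(x), ranks(y)
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical data: identical vs. reversed orderings
print(spearman_rs([1, 2, 3, 4], [10, 20, 30, 40]))  # 1.0
print(spearman_rs([1, 2, 3, 4], [40, 30, 20, 10]))  # -1.0
```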
![Page 32](https://reader033.fdocuments.us/reader033/viewer/2022061520/56649da25503460f94a8e830/html5/thumbnails/32.jpg)
Fact: (According to Wikipedia anyway)
In 1995, National Pax had planned to replace the "Sir Isaac Lime" flavor with "Scarlett O'Cherry," until a group of Orange County, California fourth-graders created a petition in opposition and picketed the company's headquarters in early 1996. The crusade also included an e-mail campaign, in which a Stanford professor reportedly accused the company of "Otter-cide." After meeting with the children, company executives relented and retained the Sir Isaac Lime flavor.[1]
Example: Is there a correlation between your preference for Otter Pops® flavors and mine?
![Page 33](https://reader033.fdocuments.us/reader033/viewer/2022061520/56649da25503460f94a8e830/html5/thumbnails/33.jpg)
Example: Suppose two wine experts were asked to rank-order their preference for eight wines. How can we measure the similarity of their rankings?

| X | Y | Rank X | Rank Y | D | D² |
|---|---|---|---|---|---|
| 1 | 2 | 1 | 2 | -1 | 1 |
| 2 | 1 | 2 | 1 | 1 | 1 |
| 3 | 5 | 3 | 5 | -2 | 4 |
| 4 | 3 | 4 | 3 | 1 | 1 |
| 5 | 4 | 5 | 4 | 1 | 1 |
| 6 | 7 | 6 | 7 | -1 | 1 |
| 7 | 8 | 7 | 8 | -1 | 1 |
| 8 | 6 | 8 | 6 | 2 | 4 |

n = 8, $\sum D^2 = 14$

$$r_s = 1 - \frac{6 \sum D^2}{n(n^2 - 1)} = 1 - \frac{(6)(14)}{8(8^2 - 1)} = 1 - \frac{84}{504} = .833$$
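The wine calculation can be sketched in a few lines of Python, taking the two experts' ranks from the table. The second part is a cross-check, not from the slides: with no ties, $r_s$ equals the Pearson r computed on the ranks.

```python
import math

rank_x = [1, 2, 3, 4, 5, 6, 7, 8]  # expert X's ranks for the eight wines
rank_y = [2, 1, 5, 3, 4, 7, 8, 6]  # expert Y's ranks for the same wines

n = len(rank_x)
d = [a - b for a, b in zip(rank_x, rank_y)]
sum_d2 = sum(v * v for v in d)
r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
print(sum_d2, round(r_s, 3))  # 14 0.833

# Cross-check: Pearson r on the ranks themselves
mean_rank = (n + 1) / 2
sp = sum((a - mean_rank) * (b - mean_rank) for a, b in zip(rank_x, rank_y))
ss_x = sum((a - mean_rank) ** 2 for a in rank_x)
ss_y = sum((b - mean_rank) ** 2 for b in rank_y)
r_ranks = sp / math.sqrt(ss_x * ss_y)
print(round(r_ranks, 3))  # 0.833
```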
![Page 34](https://reader033.fdocuments.us/reader033/viewer/2022061520/56649da25503460f94a8e830/html5/thumbnails/34.jpg)
Pearson correlation is much more sensitive to outlying values than the Spearman coefficient.
From: http://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient
![Page 35](https://reader033.fdocuments.us/reader033/viewer/2022061520/56649da25503460f94a8e830/html5/thumbnails/35.jpg)
[Figure: two scatter plots of Caffeine (cups/day) (x) vs. Sleep (hours/night) (y).]
Left panel: n = 91, Pearson's r = -0.12, Spearman's $r_s$ = 0.02
Right panel: n = 89, Pearson's r = 0.06, Spearman's $r_s$ = 0.07
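The caffeine data are not listed, so this sketch uses hypothetical numbers: five points with no relation at all, then the same five plus one extreme point. The function names are illustrative, and the ranking helper assumes no ties.

```python
import math


def pearson_r(x, y):
    """Pearson r from deviation scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)


def ranks(values):
    """Ranks 1..n; assumes no tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman_rs(x, y):
    """r_s = 1 - 6 * sum(D^2) / (n * (n^2 - 1))."""
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical data with no relation
x = [1, 2, 3, 4, 5]
y = [2, 5, 3, 1, 4]
# The same data plus one extreme point
x_out = x + [30]
y_out = y + [30]

print(f"{pearson_r(x, y):.2f} {spearman_rs(x, y):.2f}")  # 0.00 0.00
print(f"{pearson_r(x_out, y_out):.2f} {spearman_rs(x_out, y_out):.2f}")  # 0.98 0.43
```

The single extreme point drives Pearson's r almost to 1. For Spearman, that point is simply ranked 6th on both variables, so $r_s$ moves much less.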
![Page 36](https://reader033.fdocuments.us/reader033/viewer/2022061520/56649da25503460f94a8e830/html5/thumbnails/36.jpg)
Only the rank order matters for the Spearman coefficient: whenever Y always increases as X increases, $r_s = 1$, even if the relation is not a straight line.
[Figure: scatter plot of X vs. Y.]
Pearson r = 0.92, Spearman $r_s$ = 1.00
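A hypothetical illustration of this point: points on the curve y = x³ always rise, so their ranks agree perfectly and $r_s = 1$, while Pearson's r falls short of 1 because the points do not lie on a line. The helper names are illustrative, and ranking assumes no ties.

```python
import math


def pearson_r(x, y):
    """Pearson r from deviation scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)


def ranks(values):
    """Ranks 1..n; assumes no tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman_rs(x, y):
    """r_s = 1 - 6 * sum(D^2) / (n * (n^2 - 1))."""
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical monotonic but curved relation
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]

print(round(pearson_r(xs, ys), 2), spearman_rs(xs, ys))  # 0.94 1.0
```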