Correlation and Covariance

19
Statistics Introduce the statistical concepts of Covariance Correlation Investigate invariance properties Develop computational formulas

Transcript of Correlation and Covariance

Page 1: Correlation and Covariance

Statistics

Introduce the statistical concepts of Covariance Correlation

Investigate invariance propertiesDevelop computational formulas

Page 2: Correlation and Covariance

Covariance

So far, we have been analyzing summary statistics that describe aspects of a single list of numbersFrequently, however, we are interested in how variables behave together

Page 3: Correlation and Covariance

Smoking and Lung Capacity

Suppose, for example, we wanted to investigate the relationship between cigarette smoking and lung capacityWe might ask a group of people about their smoking habits, and measure their lung capacities

Page 4: Correlation and Covariance

Smoking and Lung Capacity

Cigarettes (X) Lung Capacity (Y)

0 45

5 42

10 33

15 31

20 29

Page 5: Correlation and Covariance

Smoking and Lung Capacity

With SPSS, we can easily enter these data and produce a scatterplot.

Smoking

3020100-10

Lung

Cap

acity

50

40

30

20

Page 6: Correlation and Covariance

Smoking and Lung Capacity

We can see easily from the graph that as smoking goes up, lung capacity tends to go down.The two variables covary in opposite directions.We now examine two statistics, covariance and correlation, for quantifying how variables covary.

Page 7: Correlation and Covariance

Covariance

When two variables covary in opposite directions, as smoking and lung capacity do, values tend to be on opposite sides of the group mean. That is, when smoking is above its group mean, lung capacity tends to be below its group mean.Consequently, by averaging the product of deviation scores, we can obtain a measure of how the variables vary together.

Page 8: Correlation and Covariance

The Sample Covariance

Instead of averaging by dividing by N, we divide by . The resulting formula is1N

1

11

N

xy i ii

S X X Y YN

Page 9: Correlation and Covariance

Calculating Covariance

Cigarettes (X) dX dXdY dY Lung

Capacity (Y)0 10 +9 45

5 5 +6 42

10 0 0 3 33

15 +5 5 31

20 +10 7 29

215

Page 10: Correlation and Covariance

Calculating Covariance

So we obtain

1 ( 215) 53.754xyS

Page 11: Correlation and Covariance

Invariance Properties of Covariance

The covariance is invariant under listwise addition, but not under listwise multiplication. Hence, it is vulnerable to changes in standard deviation of the variables, and is not scale-invariant.

Page 12: Correlation and Covariance

Invariance Properties of CovarianceIf , theni i

i i

L aX bdl adx

1

1 1

Let , 1Then

11 1

1 1

i i i i

N

LM i ii

N N

i i i i xyi i

L aX b M cY d

S dl dmN

adx cdy ac dx dy acSN N

Page 13: Correlation and Covariance

Invariance Properties of Covariance

Multiplicative constants come straight through in the covariance, so covariance is difficult to interpret – it incorporates information about the scale of the variables.

Page 14: Correlation and Covariance

The (Pearson) Correlation Coefficient

Like covariance, but uses Z-scores instead of deviations scores. Hence, it is invariant under linear transformation of the raw scores.

1

11

N

xy i ii

r zx zyN

Page 15: Correlation and Covariance

Alternative Formula for the Correlation Coefficient

xyxy

x y

sr

s s

Page 16: Correlation and Covariance

Computational Formulas -- Covariance

There is a computational formula for covariance similar to the one for variance. Indeed, the latter is a special case of the former, since variance of a variable is “its covariance with itself.”

1 1

1

11

N N

i iNi i

xy i ii

X Ys X Y

N N

Page 17: Correlation and Covariance

Computational Formula for Correlation

By substituting and rearranging, you obtain a substantial (and not very transparent) formula for

xyr

2 22 2xy

N XY X Yr

N X X N Y Y

Page 18: Correlation and Covariance

Computing a correlation

Cigarettes (X) XY

Lung Capacity (Y)

0 0 0 2025 455 25 210 1764 4210 100 330 1089 3315 225 465 961 3120 400 580 841 2950 750 1585 6680 180

2X 2Y

Page 19: Correlation and Covariance

Computing a Correlation

2 2

(5)(1585) (50)(180)

(5)(750) 50 (5)(6680) 180

7925 9000(3750 2500)(33400 32400)

1075 .96151250 (1000)

xyr