Correlation
• Assume you have two measurements, x and y, on a set of objects, and would like to know if x and y are related.
• If they are directly related, when x is high, y tends to be high, and when x is low, y tends to be low.
• If they are inversely related, when x is high, y tends to be low, and when x is low, y tends to be high.
• This suggests summing the products of the z-scores.
case   x      y
1      7.7    64.2
2      10.9   65.7
3      10.3   64.3
4      8.7    64.3
5      9.6    64.6
6      10.3   65.8
7      13.6   67.9
8      12.8   67.5
9      8.7    63.5
10     13.0   66.6
11     10.7   64.1
12     9.9    63.9
13     10.7   65.4
14     9.8    66.4
15     9.9    64.2
16     11.5   66.1
17     11.4   65.5
18     11.4   66.8
19     10.7   64.2
20     8.8    64.4
[Figure: scatter plot of y vs. x]
case   x      y      x (z-score)   y (z-score)   zx·zy
1      7.7    64.2   -1.9          -0.8           1.55
2      10.9   65.7    0.3           0.3           0.08
3      10.3   64.3   -0.1          -0.8           0.11
4      8.7    64.3   -1.2          -0.8           0.91
5      9.6    64.6   -0.6          -0.5           0.32
6      10.3   65.8   -0.1           0.4          -0.06
7      13.6   67.9    2.0           2.0           4.17
8      12.8   67.5    1.5           1.7           2.62
9      8.7    63.5   -1.2          -1.4           1.66
10     13.0   66.6    1.6           1.0           1.70
11     10.7   64.1    0.1          -0.9          -0.11
12     9.9    63.9   -0.4          -1.1           0.44
13     10.7   65.4    0.1           0.1           0.01
14     9.8    66.4   -0.5           0.9          -0.42
15     9.9    64.2   -0.4          -0.8           0.34
16     11.5   66.1    0.6           0.6           0.42
17     11.4   65.5    0.6           0.2           0.10
18     11.4   66.8    0.6           1.2           0.69
19     10.7   64.2    0.1          -0.8          -0.10
20     8.8    64.4   -1.1          -0.7           0.77
mean   10.5   65.3    0.0           0.0           0.8
(The z-scores shown are rounded; the zx·zy column was computed from unrounded z-scores, which is why e.g. row 3 shows 0.11 rather than -0.1 × -0.8 = 0.08.)
Correlation coefficient (Pearson’s) r
r = Σ (zx · zy) / (n − 1)
For the data above this gives r ≈ 0.80.
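A minimal sketch of this computation (assuming NumPy; the code is illustrative, not part of the original slides), using the 20 cases tabulated above:

```python
import numpy as np

# The 20 (x, y) cases from the table above.
x = np.array([7.7, 10.9, 10.3, 8.7, 9.6, 10.3, 13.6, 12.8, 8.7, 13.0,
              10.7, 9.9, 10.7, 9.8, 9.9, 11.5, 11.4, 11.4, 10.7, 8.8])
y = np.array([64.2, 65.7, 64.3, 64.3, 64.6, 65.8, 67.9, 67.5, 63.5, 66.6,
              64.1, 63.9, 65.4, 66.4, 64.2, 66.1, 65.5, 66.8, 64.2, 64.4])

n = len(x)
# z-scores, using the sample standard deviation (ddof=1)
zx = (x - x.mean()) / x.std(ddof=1)
zy = (y - y.mean()) / y.std(ddof=1)

# Pearson's r as the (n - 1)-normalized sum of z-score products
r = np.sum(zx * zy) / (n - 1)
print(f"r = {r:.2f}")              # should print r = 0.80, matching the table
print(np.corrcoef(x, y)[0, 1])     # NumPy's built-in computation agrees
```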
Properties of r
• r ranges from -1.0 to +1.0
• r = +1 means a perfect linear relationship with positive slope
• r = -1 means a perfect linear relationship with negative slope
• r = 0 means no linear relationship
Example scatterplots
[Figure: example scatterplots with r = 0.80, r = 0.35, r = -.24, r = .41, r = 0, r = -.66, and r = .94]
Correlation and causation
• “Correlation does not imply causation”
• More precisely, x correlated with y does not imply x causes y, because
• correlation could be a type I error
• y could cause x
• z could cause both x and y
Uncorrelated does not mean independent
[Figure: scatterplot of y vs. x in which x is highly predictive of y, but r = 0]
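The slide’s figure is not recoverable, but one classic way to get this behavior is a symmetric nonlinear dependence such as y = x² (an assumed example, not necessarily the slide’s):

```python
import numpy as np

# x symmetric around 0; y is perfectly determined by x, yet r is ~0
x = np.linspace(-3, 3, 201)
y = x ** 2                      # deterministic, nonlinear dependence

print(np.corrcoef(x, y)[0, 1])  # ~0: no *linear* relationship to detect
```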
Significance test for r
• The aim is to test the null hypothesis that the population correlation ρ (rho) is 0.
• The larger n, the less likely a given r will happen under the null hypothesis.
From r and n, we can compute a p-value
From n and α, we can compute a critical r
• Numerical example: see the sketch below.
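A sketch of both directions (assuming SciPy; the helper names are illustrative), using the standard result that under the null, t = r·sqrt((n − 2)/(1 − r²)) follows a t distribution with n − 2 degrees of freedom:

```python
from math import sqrt
from scipy import stats

def p_value_for_r(r, n):
    """Two-sided p-value for H0: rho = 0."""
    t = r * sqrt((n - 2) / (1 - r ** 2))
    return 2 * stats.t.sf(abs(t), df=n - 2)

def critical_r(n, alpha=0.05):
    """Smallest |r| significant at level alpha (two-sided)."""
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
    return t_crit / sqrt(n - 2 + t_crit ** 2)

print(p_value_for_r(0.80, 20))   # far below .05 for the data above
print(critical_r(20))            # ~0.44: the critical r for n = 20
```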
Regression
• Correlation suggests a linear model of y as a function of x
• A linear model is defined by
y = mx + b + e
where mx + b is the equation for a line with slope m and intercept b, e is random error with mean 0, and ŷ = mx + b is the predicted y.
[Figure: left, y vs. x with the regression line; right, residuals e vs. x]
R2 = 0.5336, F = 20.5949, p = 0.0003
Regression line: y = -1.18 x + 18.77
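A minimal least-squares sketch (assuming NumPy and SciPy; fit_line and the toy data are illustrative, and the slide’s own dataset is not reproduced here), computing the quantities reported above:

```python
import numpy as np
from scipy import stats

def fit_line(x, y):
    """Ordinary least squares fit of y = mx + b, plus R^2 and an F test."""
    m, b = np.polyfit(x, y, deg=1)           # least-squares slope and intercept
    y_hat = m * x + b
    ss_total = np.sum((y - np.mean(y)) ** 2)
    ss_error = np.sum((y - y_hat) ** 2)      # residual sum of squares
    r2 = 1.0 - ss_error / ss_total
    n = len(x)
    f = (ss_total - ss_error) / (ss_error / (n - 2))   # df = (1, n - 2)
    p = stats.f.sf(f, 1, n - 2)
    return m, b, r2, f, p

# Hypothetical data for illustration only:
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([17.4, 16.5, 15.1, 14.3, 13.0, 11.6, 10.9, 9.4])
print(fit_line(x, y))   # slope near -1.1 and R^2 near 1 for this toy data
```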
r vs. R2
• R2 is actually the square of r. So why is it capitalized and squared in a regression?
• r ranges from -1 to 1.
• But in a regression, r cannot meaningfully be negative, because it is the correlation between y and ŷ. Since ŷ is the best estimate of y, this correlation is automatically positive.
• The capitalization and squaring reflect this situation.
• It is squared to give it a direct interpretation: the proportion of variance accounted for (next slide).
Interpretation of R2
• R2 can be interpreted as the proportion of the variance accounted for
• R2 = 1 − SSerror/SStotal = SSreg/SStotal
[Figure: data points with the regression line and the mean line, illustrating SSerror, SSreg, and SStotal]
R2 is high when the unexplained (residual) variance is small relative to the total amount of variance
Simpson’s paradox
[Figure: length of ears vs. size of animal, with clusters for rabbits, humans, and whales. Negatively correlated overall, or positively correlated within each group?]
Adding a variable can change the sign of the correlation
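A sketch of the paradox with synthetic data (assuming NumPy; the numbers are hypothetical, not the slide’s animals): each group shows a positive correlation, while the pooled data show a negative one.

```python
import numpy as np

rng = np.random.default_rng(0)

# Three synthetic "species," each with ear length rising with body size,
# but with group centers arranged along a falling diagonal.
groups = []
for center_size, center_ears in [(1.0, 10.0), (5.0, 6.0), (9.0, 2.0)]:
    size = center_size + rng.uniform(-0.5, 0.5, 50)
    ears = center_ears + 2.0 * (size - center_size) + rng.normal(0, 0.2, 50)
    groups.append((size, ears))

for i, (size, ears) in enumerate(groups):
    print(f"group {i}: r = {np.corrcoef(size, ears)[0, 1]:+.2f}")    # positive

size_all = np.concatenate([g[0] for g in groups])
ears_all = np.concatenate([g[1] for g in groups])
print(f"overall: r = {np.corrcoef(size_all, ears_all)[0, 1]:+.2f}")  # negative
```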
Effect size
• Beyond computing significance, we often need an estimate of the magnitude of an effect.
• There are two basic ways of expressing this:
- Normalized mean difference
- Proportion of variance accounted for
The normalized difference between means
• Cohen’s d expresses how large the difference between two means is relative to the spread in the data:
d = (mean1 − mean2) / s
where s is the pooled standard deviation of the two samples. (A sketch follows below.)
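A minimal sketch (assuming NumPy; cohens_d and the data are illustrative):

```python
import numpy as np

def cohens_d(a, b):
    """Cohen's d: the difference between means in units of the pooled SD."""
    na, nb = len(a), len(b)
    s_pooled = np.sqrt(((na - 1) * np.var(a, ddof=1) +
                        (nb - 1) * np.var(b, ddof=1)) / (na + nb - 2))
    return (np.mean(a) - np.mean(b)) / s_pooled

# Hypothetical two-group example:
a = np.array([5.1, 6.0, 5.4, 6.3, 5.8])
b = np.array([4.2, 4.9, 4.4, 5.1, 4.6])
print(cohens_d(a, b))
```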
Proportion of variance accounted for
• R2 can be interpreted as the proportion of all the variance in the data that is predicted by a regression model
• η2 (eta squared) can be interpreted as the proportion of all variance in a factorial design that is accounted for by main effects and interactions (a one-way sketch follows below)
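A one-way sketch of η² (assuming NumPy; eta_squared and the data are illustrative; a full factorial design would contribute one such SSeffect/SStotal term per main effect and interaction):

```python
import numpy as np

def eta_squared(groups):
    """eta^2 = SS_between / SS_total for a one-way design."""
    pooled = np.concatenate(groups)
    grand_mean = pooled.mean()
    ss_total = np.sum((pooled - grand_mean) ** 2)
    ss_between = sum(len(g) * (np.mean(g) - grand_mean) ** 2 for g in groups)
    return ss_between / ss_total

groups = [np.array([4.1, 5.0, 4.6]),     # hypothetical three-group data
          np.array([6.2, 6.8, 6.0]),
          np.array([5.1, 4.9, 5.5])]
print(eta_squared(groups))
```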
Power
• Power is the probability of finding an effect of a given size in your experiment, i.e. the probability of rejecting the null hypothesis when the null hypothesis is actually false. (A simulation sketch follows below.)
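Power can be estimated by simulation: draw many samples with a known true effect and count how often the test rejects. A sketch assuming SciPy (the settings are illustrative):

```python
import numpy as np
from scipy import stats

def power_ttest(d, n, alpha=0.05, sims=2000, seed=0):
    """Monte Carlo power of a two-sample t-test for true effect size d."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(sims):
        a = rng.normal(0.0, 1.0, n)    # group with mean 0
        b = rng.normal(d, 1.0, n)      # group shifted by d standard deviations
        if stats.ttest_ind(a, b).pvalue < alpha:
            rejections += 1
    return rejections / sims

print(power_ttest(d=0.8, n=26))        # roughly .8 for these settings
```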
Outliers
• An outlier is a measurement that is so discrepant from others that it seems “suspicious.”
• If p(xsuspicious | distribution) is low enough, we “reject the null hypothesis” that xsuspicious came from the same distribution as the others, and remove it.
• A common rule of thumb is |z| > 2.5 (or 2, or 3), BUT...
• But also consider transforms that avoid outliers in the first place, like 1/x.
• Removed data is best NOT REPLACED. But if it must be replaced, do so “conservatively,” i.e. in a manner biased towards the null hypothesis.
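A sketch of the |z| > 2.5 rule of thumb (assuming NumPy; flag_outliers and the planted outlier are illustrative):

```python
import numpy as np

def flag_outliers(x, z_cutoff=2.5):
    """Flag points whose |z-score| exceeds the cutoff."""
    z = (x - np.mean(x)) / np.std(x, ddof=1)
    return np.abs(z) > z_cutoff

rng = np.random.default_rng(1)
x = np.append(rng.normal(10, 1, 30), 20.0)   # one planted outlier
print(np.where(flag_outliers(x))[0])         # should flag the last index (30)
```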
Chi squared
• Assume that K mutually exclusive outcomes are predicted to occur E1, E2, ..., EK times...
• ...but are actually observed to occur N1, N2, ..., NK times respectively...
• A chi-square test allows us to evaluate the null hypothesis that the proportions were as expected, with deviations “by chance.”
Performing a chi-squared test
• For each outcome k, compute (Nk − Ek)2 / Ek
• Sum them up over all outcomes
• Then, under the null hypothesis, this total will be distributed as a χ2 distribution with K − 1 degrees of freedom. (A sketch follows below.)
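A minimal sketch with hypothetical counts (assuming SciPy; scipy.stats.chisquare performs the same test directly):

```python
import numpy as np
from scipy import stats

observed = np.array([18, 55, 27])   # N_k: hypothetical observed counts
expected = np.array([25, 50, 25])   # E_k: predicted counts (same total)

chi2 = np.sum((observed - expected) ** 2 / expected)
p = stats.chi2.sf(chi2, df=len(observed) - 1)   # K - 1 degrees of freedom
print(chi2, p)

print(stats.chisquare(observed, expected))      # same result from SciPy
```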
The Bayesian perspective
• Conventional statistics is based on a frequentist definition of probability, which insists that hypotheses do not have “probabilities.”
→ All we can do is “reject” H, or not reject it.
• Bayesian inference is based on a subjectivist definition of probability, which considers p(H) to be the “degree of belief” in hypothesis H, simply expressing our uncertainty about H in light of the data.
→ Instead of accepting or rejecting, we seek p(H|E).
Cartoon 1: Fisher
• Fisher: Given the sampling distribution of the null p(E|H0), consider the likelihood of the null hypothesis, integrated out to the tail. If this probability is low, this tends to contradict the null hypothesis.
• In fact, if it is lower than .05, we informally “reject” the null.
[Figure: probability density of the null sampling distribution p(E|H0) over values of E]
Cartoon 2: Neyman & Pearson
• N&P: There are really two hypotheses, the null H0 and some alternative H1.
• Our main goal is to avoid a Type I error. So set this probability at α, which determines our criterion for rejecting the null.
• Note though that the other possible outcomes include a Type II error (a miss), a hit, and a correct rejection.
• Compute power and set sample size to control the probability of a Type II error.
[Figure: the null density p(E|H0) centered at 0 and the alternative density p(E|H1) centered at μ1, separated by the expected effect size]
Cartoon 3: Bayes (/Laplace/Jeffreys)
• What we really want is to evaluate how strongly our data favors either hypothesis, not just make an accept/reject decision.
• For each H, the degree of belief in it, conditioned on the data, is p(H|E). So to evaluate the relative strength of H1 and H0, consider the posterior ratio
• This expresses how strongly the data and priors favor H1 relative to H0, taking into account everything we know about the situation.
posterior ratio = p(H1|E) / p(H0|E) = (degree of belief in H1) / (degree of belief in H0)
Decomposing the posterior ratio
posterior ratio = prior ratio × likelihood ratio
p(H1|E) / p(H0|E) = [p(H1) / p(H0)] × [p(E|H1) / p(E|H0)]
• If you want to be “unbiased”, set the prior ratio to 1, sometimes called an “uninformative prior.”
• Then your posterior belief about H0 and H1 depends entirely on the likelihood ratio, aka the “Bayes factor.”
Visualizing the Likelihood Ratio
likelihood ratio = p(E|H1) / p(E|H0) = (height of the p(E|H1) density at the observed E) / (height of the p(E|H0) density at the observed E)
[Figure: the densities p(E|H0), centered at 0, and p(E|H1), centered at μ1 at the expected effect size; the likelihood ratio compares their heights at the observed E]
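A minimal sketch (assuming SciPy, with hypothetical normal sampling distributions for H0 and H1; μ1 and the observed E are made-up values):

```python
from scipy import stats

mu1 = 1.5   # assumed expected effect size under H1
E = 1.2     # assumed observed evidence

# Ratio of the two density heights at the observed E
lr = stats.norm.pdf(E, loc=mu1, scale=1) / stats.norm.pdf(E, loc=0, scale=1)
print(f"LR = {lr:.2f}")   # > 1 here, so the data favor H1 over H0
```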
Interpretation of likelihood ratios
• LR = 1 means the evidence was neutral about which hypothesis was correct.
• LR > 1 means the evidence favors H1.
• Jeffreys (1939) suggested rules of thumb, e.g. LR > 3 means “substantial” evidence in favor of H1, LR > 10 means “strong” evidence, etc.
• LR < 1 means the evidence actually favored the null hypothesis.
LRs vs. p-values
• Likelihood ratios and p-values are not at all the same thing.
• But in practice, they are related (Dixon, 1998).