Correlation
• Assume you have two measurements, x and y, on a set of objects, and would like to know if x and y are related.
• If they are directly related, when x is high, y tends to be high, and when x is low, y tends to be low.
• If they are inversely related, when x is high, y tends to be low, and when x is low, y tends to be high.
• This suggests summing the products of the z-scores.
case   x      y
1      7.7    64.2
2      10.9   65.7
3      10.3   64.3
4      8.7    64.3
5      9.6    64.6
6      10.3   65.8
7      13.6   67.9
8      12.8   67.5
9      8.7    63.5
10     13.0   66.6
11     10.7   64.1
12     9.9    63.9
13     10.7   65.4
14     9.8    66.4
15     9.9    64.2
16     11.5   66.1
17     11.4   65.5
18     11.4   66.8
19     10.7   64.2
20     8.8    64.4
[Figure: scatter plot of y vs. x]
case   x      y      x (z-score)   y (z-score)   zx·zy
1      7.7    64.2   -1.9          -0.8           1.55
2      10.9   65.7    0.3           0.3           0.08
3      10.3   64.3   -0.1          -0.8           0.11
4      8.7    64.3   -1.2          -0.8           0.91
5      9.6    64.6   -0.6          -0.5           0.32
6      10.3   65.8   -0.1           0.4          -0.06
7      13.6   67.9    2.0           2.0           4.17
8      12.8   67.5    1.5           1.7           2.62
9      8.7    63.5   -1.2          -1.4           1.66
10     13.0   66.6    1.6           1.0           1.70
11     10.7   64.1    0.1          -0.9          -0.11
12     9.9    63.9   -0.4          -1.1           0.44
13     10.7   65.4    0.1           0.1           0.01
14     9.8    66.4   -0.5           0.9          -0.42
15     9.9    64.2   -0.4          -0.8           0.34
16     11.5   66.1    0.6           0.6           0.42
17     11.4   65.5    0.6           0.2           0.10
18     11.4   66.8    0.6           1.2           0.69
19     10.7   64.2    0.1          -0.8          -0.10
20     8.8    64.4   -1.1          -0.7           0.77
mean   10.5   65.3    0.0           0.0           0.8
(The z-scores shown are rounded; the zx·zy column was computed from unrounded z-scores, which is why e.g. row 3 shows 0.11 rather than -0.1 × -0.8 = 0.08.)
Correlation coefficient (Pearson’s) r
r = Σ (zx · zy) / (n − 1)
For the data above this gives r ≈ 0.80.
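A minimal sketch of this computation (assuming NumPy; the code is illustrative, not part of the original slides), using the 20 cases tabulated above:

```python
import numpy as np

# The 20 (x, y) cases from the table above.
x = np.array([7.7, 10.9, 10.3, 8.7, 9.6, 10.3, 13.6, 12.8, 8.7, 13.0,
              10.7, 9.9, 10.7, 9.8, 9.9, 11.5, 11.4, 11.4, 10.7, 8.8])
y = np.array([64.2, 65.7, 64.3, 64.3, 64.6, 65.8, 67.9, 67.5, 63.5, 66.6,
              64.1, 63.9, 65.4, 66.4, 64.2, 66.1, 65.5, 66.8, 64.2, 64.4])

n = len(x)
# z-scores, using the sample standard deviation (ddof=1)
zx = (x - x.mean()) / x.std(ddof=1)
zy = (y - y.mean()) / y.std(ddof=1)

# Pearson's r as the (n - 1)-normalized sum of z-score products
r = np.sum(zx * zy) / (n - 1)
print(f"r = {r:.2f}")              # should print r = 0.80, matching the table
print(np.corrcoef(x, y)[0, 1])     # NumPy's built-in computation agrees
```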
Properties of r
• r ranges from -1.0 to +1.0
• r = +1 means a perfect linear relationship with positive slope
• r = -1 means a perfect linear relationship with negative slope
• r = 0 means no linear relationship
Example scatterplots
[Figure: example scatterplots with r = 0.80, r = 0.35, r = -.24, r = .41, r = 0, r = -.66, and r = .94]
Correlation and causation
• “Correlation does not imply causation”
• More precisely, x correlated with y does not imply x causes y, because
• correlation could be a type I error
• y could cause x
• z could cause both x and y
Uncorrelated does not mean independent
[Figure: scatterplot of y vs. x in which x is highly predictive of y, but r = 0]
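The slide’s figure is not recoverable, but one classic way to get this behavior is a symmetric nonlinear dependence such as y = x² (an assumed example, not necessarily the slide’s):

```python
import numpy as np

# x symmetric around 0; y is perfectly determined by x, yet r is ~0
x = np.linspace(-3, 3, 201)
y = x ** 2                      # deterministic, nonlinear dependence

print(np.corrcoef(x, y)[0, 1])  # ~0: no *linear* relationship to detect
```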
Significance test for r
• The aim is to test the null hypothesis that the population correlation ρ (rho) is 0.
• The larger n, the less likely a given r will happen under the null hypothesis.
From r and n, we can compute a p-value
From n and α, we can compute a critical r
• Numerical example: see the sketch below.
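A sketch of both directions (assuming SciPy; the helper names are illustrative), using the standard result that under the null, t = r·sqrt((n − 2)/(1 − r²)) follows a t distribution with n − 2 degrees of freedom:

```python
from math import sqrt
from scipy import stats

def p_value_for_r(r, n):
    """Two-sided p-value for H0: rho = 0."""
    t = r * sqrt((n - 2) / (1 - r ** 2))
    return 2 * stats.t.sf(abs(t), df=n - 2)

def critical_r(n, alpha=0.05):
    """Smallest |r| significant at level alpha (two-sided)."""
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
    return t_crit / sqrt(n - 2 + t_crit ** 2)

print(p_value_for_r(0.80, 20))   # far below .05 for the data above
print(critical_r(20))            # ~0.44: the critical r for n = 20
```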
Regression
• Correlation suggests a linear model of y as a function of x
• A linear model is defined by
y = mx + b + e
where mx + b is the equation for a line with slope m and intercept b, e is random error with mean 0, and ŷ = mx + b is the predicted y.
[Figure: left, y vs. x with the regression line; right, residuals e vs. x]
R2 = 0.5336, F = 20.5949, p = 0.0003
Regression line: y = -1.18 x + 18.77
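A minimal least-squares sketch (assuming NumPy and SciPy; fit_line and the toy data are illustrative, and the slide’s own dataset is not reproduced here), computing the quantities reported above:

```python
import numpy as np
from scipy import stats

def fit_line(x, y):
    """Ordinary least squares fit of y = mx + b, plus R^2 and an F test."""
    m, b = np.polyfit(x, y, deg=1)           # least-squares slope and intercept
    y_hat = m * x + b
    ss_total = np.sum((y - np.mean(y)) ** 2)
    ss_error = np.sum((y - y_hat) ** 2)      # residual sum of squares
    r2 = 1.0 - ss_error / ss_total
    n = len(x)
    f = (ss_total - ss_error) / (ss_error / (n - 2))   # df = (1, n - 2)
    p = stats.f.sf(f, 1, n - 2)
    return m, b, r2, f, p

# Hypothetical data for illustration only:
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([17.4, 16.5, 15.1, 14.3, 13.0, 11.6, 10.9, 9.4])
print(fit_line(x, y))   # slope near -1.1 and R^2 near 1 for this toy data
```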
r vs. R2
• R2 is actually the square of r. So why is it capitalized and squared in a regression?
• r ranges from -1 to 1.
• But in a regression, r cannot meaningfully be negative, because it is the correlation between y and ŷ. Since ŷ is the best estimate of y, this correlation is automatically positive.
• The capitalization and squaring reflect this situation.
• It is squared to give it a direct interpretation: the proportion of variance accounted for (next slide).
Interpretation of R2
• R2 can be interpreted as the proportion of the variance accounted for
• R2 = 1 − SSerror/SStotal = SSreg/SStotal
[Figure: data points with the regression line and the mean line, illustrating SSerror, SSreg, and SStotal]
R2 is high when the unexplained (residual) variance is small relative to the total amount of variance
Simpson’s paradox
[Figure: length of ears vs. size of animal, with clusters for rabbits, humans, and whales. Negatively correlated overall, or positively correlated within each group?]
Adding a variable can change the sign of the correlation
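A sketch of the paradox with synthetic data (assuming NumPy; the numbers are hypothetical, not the slide’s animals): each group shows a positive correlation, while the pooled data show a negative one.

```python
import numpy as np

rng = np.random.default_rng(0)

# Three synthetic "species," each with ear length rising with body size,
# but with group centers arranged along a falling diagonal.
groups = []
for center_size, center_ears in [(1.0, 10.0), (5.0, 6.0), (9.0, 2.0)]:
    size = center_size + rng.uniform(-0.5, 0.5, 50)
    ears = center_ears + 2.0 * (size - center_size) + rng.normal(0, 0.2, 50)
    groups.append((size, ears))

for i, (size, ears) in enumerate(groups):
    print(f"group {i}: r = {np.corrcoef(size, ears)[0, 1]:+.2f}")    # positive

size_all = np.concatenate([g[0] for g in groups])
ears_all = np.concatenate([g[1] for g in groups])
print(f"overall: r = {np.corrcoef(size_all, ears_all)[0, 1]:+.2f}")  # negative
```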
Effect size
• Beyond computing significance, we often need an estimate of the magnitude of an effect.
• There are two basic ways of expressing this:
- Normalized mean difference
- Proportion of variance accounted for
The normalized difference between means
• Cohen’s d expresses how large the difference between two means is relative to the spread in the data:
d = (mean1 − mean2) / s
where s is the pooled standard deviation of the two samples. (A sketch follows below.)
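A minimal sketch (assuming NumPy; cohens_d and the data are illustrative):

```python
import numpy as np

def cohens_d(a, b):
    """Cohen's d: the difference between means in units of the pooled SD."""
    na, nb = len(a), len(b)
    s_pooled = np.sqrt(((na - 1) * np.var(a, ddof=1) +
                        (nb - 1) * np.var(b, ddof=1)) / (na + nb - 2))
    return (np.mean(a) - np.mean(b)) / s_pooled

# Hypothetical two-group example:
a = np.array([5.1, 6.0, 5.4, 6.3, 5.8])
b = np.array([4.2, 4.9, 4.4, 5.1, 4.6])
print(cohens_d(a, b))
```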
Proportion of variance accounted for
• R2 can be interpreted as the proportion of all the variance in the data that is predicted by a regression model
• η2 (eta squared) can be interpreted as the proportion of all variance in a factorial design that is accounted for by main effects and interactions (a one-way sketch follows below)
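A one-way sketch of η² (assuming NumPy; eta_squared and the data are illustrative; a full factorial design would contribute one such SSeffect/SStotal term per main effect and interaction):

```python
import numpy as np

def eta_squared(groups):
    """eta^2 = SS_between / SS_total for a one-way design."""
    pooled = np.concatenate(groups)
    grand_mean = pooled.mean()
    ss_total = np.sum((pooled - grand_mean) ** 2)
    ss_between = sum(len(g) * (np.mean(g) - grand_mean) ** 2 for g in groups)
    return ss_between / ss_total

groups = [np.array([4.1, 5.0, 4.6]),     # hypothetical three-group data
          np.array([6.2, 6.8, 6.0]),
          np.array([5.1, 4.9, 5.5])]
print(eta_squared(groups))
```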
Power
• Power is the probability of finding an effect of a given size in your experiment, i.e. the probability of rejecting the null hypothesis when the null hypothesis is actually false. (A simulation sketch follows below.)
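Power can be estimated by simulation: draw many samples with a known true effect and count how often the test rejects. A sketch assuming SciPy (the settings are illustrative):

```python
import numpy as np
from scipy import stats

def power_ttest(d, n, alpha=0.05, sims=2000, seed=0):
    """Monte Carlo power of a two-sample t-test for true effect size d."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(sims):
        a = rng.normal(0.0, 1.0, n)    # group with mean 0
        b = rng.normal(d, 1.0, n)      # group shifted by d standard deviations
        if stats.ttest_ind(a, b).pvalue < alpha:
            rejections += 1
    return rejections / sims

print(power_ttest(d=0.8, n=26))        # roughly .8 for these settings
```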
Outliers
• An outlier is a measurement that is so discrepant from others that it seems “suspicious.”
• If p(xsuspicious | distribution) is low enough, we “reject the null hypothesis” that xsuspicious came from the same distribution as the others, and remove it.
• A common rule of thumb is |z| > 2.5 (or 2, or 3), BUT...
• But also consider transforms that avoid outliers in the first place, like 1/x.
• Removed data is best NOT REPLACED. But if it must be replaced, do so “conservatively,” i.e. in a manner biased towards the null hypothesis.
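A sketch of the |z| > 2.5 rule of thumb (assuming NumPy; flag_outliers and the planted outlier are illustrative):

```python
import numpy as np

def flag_outliers(x, z_cutoff=2.5):
    """Flag points whose |z-score| exceeds the cutoff."""
    z = (x - np.mean(x)) / np.std(x, ddof=1)
    return np.abs(z) > z_cutoff

rng = np.random.default_rng(1)
x = np.append(rng.normal(10, 1, 30), 20.0)   # one planted outlier
print(np.where(flag_outliers(x))[0])         # should flag the last index (30)
```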
Chi squared
• Assume that K mutually exclusive outcomes are predicted to occur E1, E2, ..., EK times...
• ...but are actually observed to occur N1, N2, ..., NK times respectively...
• A chi-square test allows us to evaluate the null hypothesis that the proportions were as expected, with deviations “by chance.”
Performing a chi-squared test
• For each outcome k, compute (Nk − Ek)2 / Ek
• Sum them up over all outcomes
• Then, under the null hypothesis, this total will be distributed as a χ2 distribution with K − 1 degrees of freedom. (A sketch follows below.)
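A minimal sketch with hypothetical counts (assuming SciPy; scipy.stats.chisquare performs the same test directly):

```python
import numpy as np
from scipy import stats

observed = np.array([18, 55, 27])   # N_k: hypothetical observed counts
expected = np.array([25, 50, 25])   # E_k: predicted counts (same total)

chi2 = np.sum((observed - expected) ** 2 / expected)
p = stats.chi2.sf(chi2, df=len(observed) - 1)   # K - 1 degrees of freedom
print(chi2, p)

print(stats.chisquare(observed, expected))      # same result from SciPy
```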
The Bayesian perspective
• Conventional statistics is based on a frequentist definition of probability, which insists that hypotheses do not have “probabilities.”
→ All we can do is “reject” H, or not reject it.
• Bayesian inference is based on a subjectivist definition of probability, which considers p(H) to be the “degree of belief” in hypothesis H, simply expressing our uncertainty about H in light of the data.
→ Instead of accepting or rejecting, we seek p(H|E).
Cartoon 1: Fisher
• Fisher: Given the sampling distribution of the null p(E|H0), consider the likelihood of the null hypothesis, integrated out to the tail. If this probability is low, this tends to contradict the null hypothesis.
• In fact, if it is lower than .05, we informally “reject” the null.
[Figure: probability density of the null sampling distribution p(E|H0) over values of E]
Cartoon 2: Neyman & Pearson
• N&P: There are really two hypotheses, the null H0 and some alternative H1.
• Our main goal is to avoid a Type I error. So set this probability at α, which determines our criterion for rejecting the null.
• Note though that the other possible outcomes include a Type II error (a miss), a hit, and a correct rejection.
• Compute power and set sample size to control the probability of a Type II error.
[Figure: the null density p(E|H0) centered at 0 and the alternative density p(E|H1) centered at μ1, separated by the expected effect size]
Cartoon 3: Bayes (/Laplace/Jeffreys)
• What we really want is to evaluate how strongly our data favors either hypothesis, not just make an accept/reject decision.
• For each H, the degree of belief in it, conditioned on the data, is p(H|E). So to evaluate the relative strength of H1 and H0, consider the posterior ratio
• This expresses how strongly the data and priors favor H1 relative to H0, taking into account everything we know about the situation.
posterior ratio = p(H1|E) / p(H0|E) = (degree of belief in H1) / (degree of belief in H0)
Decomposing the posterior ratio
posterior ratio = prior ratio × likelihood ratio
p(H1|E) / p(H0|E) = [p(H1) / p(H0)] × [p(E|H1) / p(E|H0)]
• If you want to be “unbiased”, set the prior ratio to 1, sometimes called an “uninformative prior.”
• Then your posterior belief about H0 and H1 depends entirely on the likelihood ratio, aka the “Bayes factor.”
Visualizing the Likelihood Ratio
likelihood ratio = p(E|H1) / p(E|H0) = (height of the p(E|H1) density at the observed E) / (height of the p(E|H0) density at the observed E)
[Figure: the densities p(E|H0), centered at 0, and p(E|H1), centered at μ1 at the expected effect size; the likelihood ratio compares their heights at the observed E]
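A minimal sketch (assuming SciPy, with hypothetical normal sampling distributions for H0 and H1; μ1 and the observed E are made-up values):

```python
from scipy import stats

mu1 = 1.5   # assumed expected effect size under H1
E = 1.2     # assumed observed evidence

# Ratio of the two density heights at the observed E
lr = stats.norm.pdf(E, loc=mu1, scale=1) / stats.norm.pdf(E, loc=0, scale=1)
print(f"LR = {lr:.2f}")   # > 1 here, so the data favor H1 over H0
```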
Interpretation of likelihood ratios
• LR = 1 means the evidence was neutral about which hypothesis was correct.
• LR > 1 means the evidence favors H1.
• Jeffreys (1939) suggested rules of thumb, e.g. LR > 3 means “substantial” evidence in favor of H1, LR > 10 means “strong” evidence, etc.
• LR < 1 means the evidence actually favored the null hypothesis.
LRs vs. p-values
• Likelihood ratios and p-values are not at all the same thing.
• But in practice, they are related (Dixon, 1998).