Correlation. Definition The linear correlation coefficient r measures the strength of the linear...

Post on 28-Dec-2015

Correlation

Definition

The linear correlation coefficient r measures the strength of the linear relationship between paired x- and y- quantitative values in a sample.

We can often see a relationship between two variables by constructing a scatterplot.

Scatterplots of Paired Data

Requirements

1. The sample of paired (x, y) data is a random sample.

2. Visual examination of the scatterplot must confirm that the points approximate a straight-line pattern.

3. The outliers must be removed if they are known to be errors.

Notation for the Linear Correlation Coefficient

n represents the number of pairs of data present.

Σ denotes the addition of the items indicated.

Σx denotes the sum of all x-values.

Σx² indicates that each x-value should be squared and then those squares added.

(Σx)² indicates that the x-values should be added and the total then squared.

Σxy indicates that each x-value should be first multiplied by its corresponding y-value. After obtaining all such products, find their sum.

r represents the linear correlation coefficient for a sample.

ρ represents the linear correlation coefficient for a population.
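The notation can be made concrete with a few lines of Python. This is only an illustrative sketch: the variable names are made up for the illustration, and the four (x, y) pairs are the ones used in the worked example in this section.

```python
# Paired sample from the worked example in this section.
x = [3, 1, 3, 5]
y = [5, 8, 6, 4]

n = len(x)                                  # n: number of pairs
sum_x = sum(x)                              # Σx
sum_y = sum(y)                              # Σy
sum_x_sq = sum(v * v for v in x)            # Σx²: square each value, then add
sq_sum_x = sum_x ** 2                       # (Σx)²: add the values, then square
sum_xy = sum(a * b for a, b in zip(x, y))   # Σxy: multiply pairwise, then add

print(n, sum_x, sum_y, sum_x_sq, sq_sum_x, sum_xy)  # 4 12 23 44 144 61
```

Note that Σx² (44) and (Σx)² (144) differ, which is why the order of squaring and adding matters.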

Formula

The linear correlation coefficient r measures the strength of a linear relationship between the paired values in a sample.

$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n(\sum x^2) - (\sum x)^2}\,\sqrt{n(\sum y^2) - (\sum y)^2}}$$

Example: Calculating r

Using the simple random sample of data below, find the value of r.

Data
x    3    1    3    5
y    5    8    6    4

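Example: Calculating r - cont

$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n(\sum x^2) - (\sum x)^2}\,\sqrt{n(\sum y^2) - (\sum y)^2}}$$

$$r = \frac{4(61) - (12)(23)}{\sqrt{4(44) - (12)^2}\,\sqrt{4(141) - (23)^2}}$$

$$r = \frac{-32}{33.466} = -0.956$$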
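The hand calculation can be checked with a short Python sketch. The function name `pearson_r` is chosen here for illustration. It evaluates the sums in the formula directly rather than calling any statistics library.

```python
import math

def pearson_r(x, y):
    """Sample linear correlation coefficient via the computational formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x_sq = sum(a * a for a in x)
    sum_y_sq = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x_sq - sum_x ** 2)
                   * math.sqrt(n * sum_y_sq - sum_y ** 2))
    return numerator / denominator

r = pearson_r([3, 1, 3, 5], [5, 8, 6, 4])
print(round(r, 3))  # -0.956
```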

Properties of the Linear Correlation Coefficient r

1. –1 ≤ r ≤ 1

2. The value of r does not change if all values of either variable are converted to a different scale.

3. The value of r is not affected by the choice of x and y. Interchange all x- and y-values and the value of r will not change.

4. r measures the strength of a linear relationship.
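Properties 1 to 3 can be checked numerically. The sketch below is an illustration under stated assumptions: `pearson_r` is a helper written for this example (it uses deviations from the means, which is algebraically equivalent to the computational formula), and the conversion factor 2.54 with offset 10 is an arbitrary positive change of scale. A negative scale factor would flip the sign of r.

```python
import math

def pearson_r(x, y):
    """Sample correlation computed from deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [3, 1, 3, 5]
y = [5, 8, 6, 4]
r = pearson_r(x, y)

# Property 2: convert x to a different scale (hypothetical factor and offset).
x_rescaled = [2.54 * v + 10 for v in x]
r_rescaled = pearson_r(x_rescaled, y)

# Property 3: interchange the roles of x and y.
r_swapped = pearson_r(y, x)

print(-1 <= r <= 1)                 # property 1
print(math.isclose(r, r_rescaled))  # property 2
print(math.isclose(r, r_swapped))   # property 3
```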

Interpreting r: Explained Variation

The value of r² is the proportion of the variation in y that is explained by the linear relationship between x and y.

For example, if r = 0.926, we get r² = 0.857.

We conclude that 0.857 (or about 86%) of the variation in y can be explained by the linear relationship between x and y. This implies that about 14% of the variation in y cannot be explained by the linear relationship with x.
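One way to see why r² is read as "explained variation" is to fit the least-squares line and compare the leftover (unexplained) variation with the total variation in y. The sketch below does this for the four data pairs of the worked example, not for the r = 0.926 case. The variable names are illustrative, and the least-squares slope and intercept formulas are standard results that are not derived in this section.

```python
x = [3, 1, 3, 5]
y = [5, 8, 6, 4]
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

sxx = sum((a - mean_x) ** 2 for a in x)
syy = sum((b - mean_y) ** 2 for b in y)   # total variation in y
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))

r_squared = sxy ** 2 / (sxx * syy)

# Least-squares line y_hat = b0 + b1 * x.
b1 = sxy / sxx
b0 = mean_y - b1 * mean_x
sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))  # unexplained variation
explained = 1 - sse / syy

print(round(r_squared, 3), round(explained, 3))  # 0.914 0.914
```

For these data about 91% of the variation in y is explained by the line, and the two ways of computing that proportion agree.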

Formal Hypothesis Test

We wish to determine whether there is a significant linear correlation between two variables.

H0: ρ = 0 (no significant linear correlation)

H1: ρ ≠ 0 (significant linear correlation)

Test statistic is t (follows format of earlier chapters):

$$t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$$

Critical values: Use Tables with degrees of freedom = n – 2

P-value: Use Tables with degrees of freedom = n – 2

Conclusion: If the absolute value of t exceeds the critical value, reject H0 and conclude that there is a linear correlation. If the absolute value of t ≤ critical value, fail to reject H0; there is not sufficient evidence to conclude that there is a linear correlation.
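As a rough sketch of the test, the code below applies the t statistic to the worked example (r = -0.956, n = 4). Two things are assumptions made for illustration: a 0.05 significance level, which the slides do not specify, and the corresponding two-tailed critical value 4.303 for 2 degrees of freedom, read from a standard t table. The function name `t_statistic` is likewise hypothetical.

```python
import math

def t_statistic(r, n):
    """Test statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))

r, n = -0.956, 4
t = t_statistic(r, n)

# Assumed for illustration: alpha = 0.05, two-tailed, df = n - 2 = 2.
critical_value = 4.303
reject_h0 = abs(t) > critical_value

print(round(t, 2), reject_h0)
```

With only four pairs the decision is close to the boundary, so a smaller significance level could lead to a different conclusion.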

Covariance

Covariance is a measure of the linear relationship between variables.

If the relationship between the random variables is nonlinear, the covariance might not be sensitive to the relationship.
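A small sketch makes the point about nonlinear relationships concrete. The data below are hypothetical: y is an exact function of x in both cases, yet the covariance is clearly nonzero only for the linear one. The helper `sample_covariance` is written for this illustration and uses the n - 1 divisor.

```python
def sample_covariance(x, y):
    """Sample covariance with the n - 1 divisor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

x = [-2, -1, 0, 1, 2]
y_linear = [2 * v + 1 for v in x]    # exact linear relationship
y_quadratic = [v * v for v in x]     # exact but nonlinear relationship

print(sample_covariance(x, y_linear))     # 5.0
print(sample_covariance(x, y_quadratic))  # 0.0
```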

Pearson's Correlation Coefficient

Pearson's correlation coefficient between two variables is defined as the covariance of the two variables divided by the product of their standard deviations:

$$\rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \, \sigma_Y}$$

The above formula defines the population correlation coefficient, commonly represented by the Greek letter ρ (rho). Substituting estimates of the covariances and variances based on a sample gives the sample correlation coefficient, commonly denoted r:

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$$
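The definition as "covariance divided by the product of the standard deviations" can be sketched directly. The helper names below are illustrative, and both the covariance and the standard deviations use the n - 1 divisor, so the divisors cancel and the result matches the computational formula used earlier in the section.

```python
import math

def sample_covariance(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def sample_std(values):
    # The sample standard deviation is the square root of cov(v, v).
    return math.sqrt(sample_covariance(values, values))

def pearson_r(x, y):
    """Covariance divided by the product of the standard deviations."""
    return sample_covariance(x, y) / (sample_std(x) * sample_std(y))

r = pearson_r([3, 1, 3, 5], [5, 8, 6, 4])
print(round(r, 3))  # -0.956
```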

Pearson correlation coefficient

Copyright © 2007 Pearson Education, Inc Publishing as Pearson Addison-Wesley.

The Spearman correlation coefficient is often thought of as being the Pearson correlation coefficient between the ranked variables. In practice, however, a simpler procedure is normally used to calculate ρ. The n raw scores Xi, Yi are converted to ranks xi, yi, and the differences di = xi − yi between the ranks of each observation on the two variables are calculated.
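The rank-difference procedure can be sketched as follows. The slide stops at computing the differences di. The final step used here, ρ = 1 − 6Σdi² / (n(n² − 1)), is the standard shortcut for data without tied ranks and is an addition to the text. The data and the function names `ranks` and `spearman_rho` are hypothetical.

```python
def ranks(values):
    """Rank from 1 (smallest) to n (largest); assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman_rho(x, y):
    """Spearman correlation via rank differences (valid only without ties)."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical raw scores with no ties.
X = [10, 20, 30, 40, 50]
Y = [1, 3, 2, 5, 4]
print(round(spearman_rho(X, Y), 3))  # 0.8
```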

A Spearman correlation of 1 results when the two variables being compared are monotonically related, even if their relationship is not linear. In contrast, a monotonic but nonlinear relationship does not give a perfect Pearson correlation.
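The contrast can be illustrated with a monotonic but nonlinear relationship. In this sketch the data (y = x³) are hypothetical, and Spearman's coefficient is computed as the Pearson coefficient of the ranks, with ranks assigned under the assumption of no ties.

```python
import math

def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank from 1 to n; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman_rho(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return pearson_r(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]   # monotonically increasing, not linear

print(spearman_rho(x, y))          # perfect rank agreement
print(round(pearson_r(x, y), 3))   # high, but below 1
```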

When the data are roughly elliptically distributed and there are no prominent outliers, the Spearman correlation and Pearson correlation give similar values.

The Spearman correlation is less sensitive than the Pearson correlation to strong outliers that are in the tails of both samples.
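A deliberately extreme, made-up data set shows the difference in sensitivity. Nine points lie on a perfectly decreasing line, and a single outlier at (100, 100) is then added. The helper functions are illustrative, and ranks are assigned assuming no ties.

```python
import math

def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank from 1 to n; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman_rho(x, y):
    return pearson_r(ranks(x), ranks(y))

# Hypothetical data: a perfectly decreasing trend...
x_clean = list(range(1, 10))
y_clean = list(range(9, 0, -1))
# ...plus one outlier far out in the tails of both variables.
x_out = x_clean + [100]
y_out = y_clean + [100]

pearson_shift = abs(pearson_r(x_out, y_out) - pearson_r(x_clean, y_clean))
spearman_shift = abs(spearman_rho(x_out, y_out) - spearman_rho(x_clean, y_clean))

print(round(pearson_r(x_out, y_out), 3), round(spearman_rho(x_out, y_out), 3))
print(pearson_shift > spearman_shift)
```

The single outlier is enough to turn the Pearson coefficient strongly positive, while the Spearman coefficient stays negative because the outlier changes only one pair of ranks.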