Correlation Pearson

Correlation Analysis

Quantitative methods for beginners. Learn statistics.


LEARNING OBJECTIVES

❶ Understand and interpret the terms dependent and independent variable.

❷ Calculate and interpret the coefficient of correlation, the coefficient of determination, and the standard error of estimate.

❸ Calculate and interpret Pearson’s product moment correlation coefficient, rp.

❹ Calculate and interpret Spearman’s rank correlation coefficient, rs.

Correlation Analysis and Scatter Diagram

Correlation Analysis is the study of the relationship between variables. It is also defined as a group of techniques to measure the association between two variables.

A Scatter Diagram is a chart that portrays the relationship between the two variables. It is the usual first step in correlation analysis.

Dependent vs. Independent Variable

DEPENDENT VARIABLE

The variable that is being predicted or estimated. It is scaled on the Y-axis.

INDEPENDENT VARIABLE

The variable that provides the basis for estimation. It is the predictor variable. It is scaled on the X-axis.

Scatter Diagram


The Coefficient of Correlation, r

The Coefficient of Correlation (r) is a measure of the strength of the linear relationship between two variables. It requires interval or ratio-scaled data.

It can range from -1.00 to 1.00.

Values of -1.00 or 1.00 indicate perfect correlation, the strongest possible.

Values close to 0.0 indicate weak correlation.

Negative values indicate an inverse relationship and positive values indicate a direct relationship.
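The range and sign of r can be checked numerically. The following is a minimal Python sketch rather than anything taken from the slides: the helper name `pearson_r` and the three small data sets are illustrative assumptions, and the function uses the deviation-from-the-mean form of the coefficient.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]                        # illustrative values, not from the slides
direct = pearson_r(x, [2, 4, 6, 8, 10])    # y rises in step with x
inverse = pearson_r(x, [10, 8, 6, 4, 2])   # y falls in step with x
weak = pearson_r(x, [3, 1, 4, 1, 5])       # no clear pattern

print(round(direct, 2), round(inverse, 2), round(weak, 2))  # 1.0 -1.0 0.35
```

The first two data sets lie exactly on a straight line, so r reaches its limits of +1 and -1; the third gives a value much closer to 0.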

Perfect Correlation


Scatter Plots and Correlation

A scatter plot (or scatter diagram) is used to show the relationship between two quantitative variables.

The linear relationship can be:

• Positive – as x increases, y increases

» As advertising dollars increase, sales increase

• Negative – as x increases, y decreases

» As expenses increase, net income decreases

Scatter Plot Examples

[Figure: scatter plots of y against x showing linear relationships and curvilinear relationships]

[Figure: scatter plots of y against x showing strong relationships and weak relationships]

[Figure: scatter plots of y against x showing no relationship]

Correlation Coefficient - Interpretation

Examples of Approximate r Values

[Figure: five scatter plots of y against x with r = -1, r = -.6, r = 0, r = +.3, and r = +1]

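Pearson’s Product Moment Correlation Coefficient, rp

Formula,

$$ r_p = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}} $$

where

n = number of paired observations

rp = sample correlation coefficient

X = value of the independent variable

Y = value of the dependent variable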

The duration of the last 9 business trips made by an employee and the corresponding expenses claimed are shown in the following table.

No. of days     3    5    2    1    3    4    1    3    1
Expenses ($)  100  300   90   30  240  200  150  170   60

Calculate the product moment correlation coefficient between the number of days and expenses.

Solution

Let X be the number of days and Y be the expenses.

X       Y      XY     X²    Y²
3       100    300    9     10,000
5       300    1500   25    90,000
2       90     180    4     8,100
1       30     30     1     900
3       240    720    9     57,600
4       200    800    16    40,000
1       150    150    1     22,500
3       170    510    9     28,900
1       60     60     1     3,600
Total 23  1340  4250  75    261,600

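$$ r_p = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}} = \frac{9(4250) - (23)(1340)}{\sqrt{\left[9(75) - 23^2\right]\left[9(261{,}600) - 1340^2\right]}} = 0.8226 $$

The coefficient of correlation, rp = +0.8226, indicates that there is a ___________ _______________ correlation between the number of days and expenses.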
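The business trips calculation can be reproduced with a short Python sketch. It rebuilds the column totals of the working table and substitutes them into the product moment formula; the variable names are illustrative choices, not part of the slides.

```python
from math import sqrt

days = [3, 5, 2, 1, 3, 4, 1, 3, 1]                       # X
expenses = [100, 300, 90, 30, 240, 200, 150, 170, 60]    # Y

n = len(days)
sum_x = sum(days)                                   # 23
sum_y = sum(expenses)                               # 1340
sum_xy = sum(x * y for x, y in zip(days, expenses)) # 4250
sum_x2 = sum(x * x for x in days)                   # 75
sum_y2 = sum(y * y for y in expenses)               # 261600

numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r_p = numerator / denominator

print(round(r_p, 4))  # 0.8226
```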

Question:

The following sample observations were randomly selected.

X: 4 5 3 6 10

Y: 4 6 5 7 7

Compute rp. Interpret your answer.

Spearman’s Rank Correlation Coefficient, rs

Developed in the early 1900s by Charles Spearman (British psychologist).

Based on rank-order scores.

Can be used even if the original scores are nonnumeric, provided they can be ranked.

Much less affected by outliers than Pearson’s coefficient.

Formula,

$$ r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} $$

where

d = r1 – r2

r1 = ranks for x

r2 = ranks for y

n = number of paired observations

Example 1: Rank Correlation Coefficient (Data has already been ranked)

A German language teacher takes a group of 5 students. She ranks them in order of how confident they are when speaking (1 - extremely confident, 5 - not at all confident) and wants to correlate this with performance in the oral examination. A different teacher has given ratings of how well the students spoke in the oral exam (1 - hopeless, 5 - excellent). The following table was obtained as a result. Compute rs and interpret.

Person   Confidence   Oral exam performance
A        5            2
B        4            4
C        1            5
D        3            3
E        2            5

Solution

Let r1 = rankings of confidence

r2 = rankings of oral exam performance

Person   r1   r2   d    d²
A        5    2    3    9
B        4    4    0    0
C        1    5    -4   16
D        3    3    0    0
E        2    5    -3   9
Total                   34

$$ r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6(34)}{5(5^2 - 1)} = 1 - 1.7 = -0.7 $$

The coefficient of rank correlation, rs = -0.7, indicates that there is a ___________ ____________ between the confidence and oral exam performance in their rankings.
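Example 1 can be checked with a few lines of Python. This is a minimal sketch under the same assumption the slides make, namely that both columns are used directly as ranks; the function name `spearman_from_ranks` is an illustrative choice.

```python
def spearman_from_ranks(r1, r2):
    """Spearman's rs from two lists of ranks, using 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(r1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Persons A to E, in the order given in the table
confidence = [5, 4, 1, 3, 2]
oral_exam = [2, 4, 5, 3, 5]

sum_d2 = sum((a - b) ** 2 for a, b in zip(confidence, oral_exam))
r_s = spearman_from_ranks(confidence, oral_exam)

print(sum_d2)          # 34
print(round(r_s, 2))   # -0.7
```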

Example 2: Rank Correlation Coefficient (Data has not yet been ranked)

The following data relates to the marks obtained by 5 students in the Economics and Statistics examinations. Compute rs and interpret.

          Marks
Student   Economics   Statistics
1         36          52
2         98          91
3         75          68
4         65          53
5         82          62

Solution

Let X = marks in Economics

Y = marks in Statistics

r1 = ranks for X

r2 = ranks for Y

Student   X    Y    r1   r2   d    d²
1         36   52   5    5    0    0
2         98   91   1    1    0    0
3         75   68   3    2    1    1
4         65   53   4    4    0    0
5         82   62   2    3    -1   1
Total                              2

$$ r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6(2)}{5(5^2 - 1)} = 1 - 0.1 = 0.9 $$

The coefficient of rank correlation, rs = +0.9, indicates that there is a ___________ _____________ between the rankings in both subjects.

Example 3: Rank Correlation Coefficient (Tied Ranks)

The following data relates to the marks obtained by 5 students in Accounting and Costing examinations. Compute rs and interpret.

          Marks
Student   Accounting   Costing
1         86           91
2         86           82
3         77           68
4         63           77
5         89           77

Solution

Let X = marks in Accounting

Y = marks in Costing

r1 = ranks for X

r2 = ranks for Y

Tied marks share the average of the ranks they would otherwise occupy. For example, the two Accounting marks of 86 occupy ranks 2 and 3, so each receives 2.5.

Student   X    Y    r1    r2    d      d²
1         86   91   2.5   1     1.5    2.25
2         86   82   2.5   2     0.5    0.25
3         77   68   4     5     -1     1
4         63   77   5     3.5   1.5    2.25
5         89   77   1     3.5   -2.5   6.25
Total                                  12

$$ r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6(12)}{5(5^2 - 1)} = 1 - 0.6 = 0.4 $$

The coefficient of rank correlation, rs = +0.4, indicates that there is a ___________ ____________ between the rankings in both subjects.
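When the data has not yet been ranked, the ranking step can also be automated. The sketch below is one possible approach, with hypothetical helper names `average_ranks` and `spearman_rs`: it ranks the highest mark as 1, gives tied marks the mean of the positions they occupy, and then applies the same d² formula used in the examples. Note that with ties this shortcut formula is generally treated as an approximation.

```python
def average_ranks(scores):
    """Rank scores with 1 = highest; tied scores share the mean of their positions."""
    ordered = sorted(scores, reverse=True)
    ranks = []
    for s in scores:
        first = ordered.index(s) + 1            # first position taken by this score
        count = ordered.count(s)                # how many observations share it
        ranks.append(first + (count - 1) / 2)   # mean of the tied positions
    return ranks

def spearman_rs(xs, ys):
    """Rank both variables, then apply 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    r1, r2 = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Example 2: no ties
economics = [36, 98, 75, 65, 82]
statistics_marks = [52, 91, 68, 53, 62]
rs_example2 = spearman_rs(economics, statistics_marks)

# Example 3: tied marks in both subjects
accounting = [86, 86, 77, 63, 89]
costing = [91, 82, 68, 77, 77]
rs_example3 = spearman_rs(accounting, costing)

print(average_ranks(accounting))   # [2.5, 2.5, 4.0, 5.0, 1.0]
print(average_ranks(costing))      # [1.0, 2.0, 5.0, 3.5, 3.5]
print(round(rs_example2, 2), round(rs_example3, 2))  # 0.9 0.4
```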

Question:

The following sample observations were randomly selected.

X: 4 5 3 6 10

Y: 4 6 5 7 7

Compute rs. Interpret your answer.

Coefficient of Determination

The coefficient of determination (r²) is the proportion of the total variation in the dependent variable (Y) that is explained or accounted for by the variation in the independent variable (X).

It is the square of the coefficient of correlation.

It ranges from 0 to 1.

It does not give any information on the direction of the relationship between the variables.


Coefficient of Determination (r²) - Example

Recall the business trips example.

The coefficient of determination, r², is 0.677, found by (0.8226)².

This is a proportion or a percent; we can say that 67.7 percent of the variation in the expenses is explained, or accounted for, by the variation in the number of days.
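The arithmetic is a single squaring step, sketched below in Python with illustrative variable names. The second half shows why the coefficient of determination carries no information about direction: a positive and a negative r of the same size give the same r².

```python
r_p = 0.8226                      # business trips example
r_squared = r_p ** 2

print(round(r_squared, 3))        # 0.677
print(f"{r_squared:.1%}")         # 67.7%

# Squaring discards the sign, so the direction of the relationship is lost.
same_r_squared = (-r_p) ** 2 == r_p ** 2
print(same_r_squared)             # True
```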

Testing the Significance of the Correlation Coefficient

$H_0: \rho = 0$ (the correlation in the population is 0)

$H_1: \rho \neq 0$ (the correlation in the population is not 0)

Reject $H_0$ if:

$t > t_{\alpha/2,\,n-2}$ or $t < -t_{\alpha/2,\,n-2}$

Testing the Significance of the Correlation Coefficient - Example

$H_0: \rho = 0$ (the correlation in the population is 0)

$H_1: \rho \neq 0$ (the correlation in the population is not 0)

Reject $H_0$ if:

$t > t_{\alpha/2,\,n-2}$ or $t < -t_{\alpha/2,\,n-2}$

With α = 0.05 and n = 10:

$t > t_{0.025,\,8}$ or $t < -t_{0.025,\,8}$

t > 2.306 or t < -2.306

The computed t (3.297) is within the rejection region; therefore, we reject $H_0$. This means the correlation in the population is not zero. From a practical standpoint, it indicates to the sales manager that there is a correlation between the number of sales calls made and the number of copiers sold.
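The transcript does not show how the test statistic is computed. A common choice, assumed in the Python sketch below, is t = r√(n − 2) / √(1 − r²) with n − 2 degrees of freedom, which matches the critical value $t_{\alpha/2,\,n-2}$ used in the decision rule. The function names are hypothetical, and the critical value 2.365 for 7 degrees of freedom is taken from a standard t table rather than from the slides.

```python
from math import sqrt

def t_statistic(r, n):
    """Assumed test statistic: t = r * sqrt(n - 2) / sqrt(1 - r^2)."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

def reject_h0(t, t_critical):
    """Two-tailed rule: reject H0 when t lies beyond +t_critical or -t_critical."""
    return abs(t) > t_critical

# Decision from the slides: computed t = 3.297, critical value 2.306 (8 d.f.)
print(reject_h0(3.297, 2.306))        # True

# Same test applied to the business trips example (r = 0.8226, n = 9, 7 d.f.)
t_trips = t_statistic(0.8226, 9)
print(round(t_trips, 2))              # 3.83
print(reject_h0(t_trips, 2.365))      # True
```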