9-2 / 9.3 Correlation and Regression


Page 1: 9-2 / 9.3 Correlation and Regression

Page 2: Definition

Linear Correlation Coefficient r

measures the strength of the linear relationship between paired x and y values in a sample

$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n(\sum x^2) - (\sum x)^2}\,\sqrt{n(\sum y^2) - (\sum y)^2}}$$
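The formula for r can be evaluated directly from the five sums it contains. The following is a minimal sketch, not part of the original slides; the function name `linear_correlation` and the variable names are illustrative, and the data are the Garbage Project pairs that appear later in this deck.

```python
from math import sqrt

def linear_correlation(xs, ys):
    """Linear correlation coefficient r, built from the sums in the slide formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

# Garbage Project data from the slides: plastic (lb) and household size
plastic = [0.27, 1.41, 2.19, 2.83, 2.19, 1.81, 0.85, 3.05]
household = [2, 3, 3, 6, 4, 2, 1, 5]

r = linear_correlation(plastic, household)
print(round(r, 3))  # 0.842, matching the calculator result in the slides
```

The sketch does not guard against a zero denominator, which occurs when all x values or all y values are identical.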

Page 3: Formula for b0 and b1

$$b_0 = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n(\sum x^2) - (\sum x)^2} \quad \text{(y-intercept)}$$

$$b_1 = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2} \quad \text{(slope)}$$
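Both coefficients share the same denominator, so they can be computed together. This is a small sketch under that observation; the function name `regression_coefficients` is an assumption made for illustration, and the four data pairs are the ones used in the residuals example later in the deck.

```python
def regression_coefficients(xs, ys):
    """Return (b0, b1) using the sum-based formulas for the y-intercept and slope."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x ** 2  # shared by b0 and b1
    b0 = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
    b1 = (n * sum_xy - sum_x * sum_y) / denominator
    return b0, b1

# Four pairs from the residuals example in the slides
b0, b1 = regression_coefficients([1, 2, 4, 5], [4, 24, 8, 32])
print(b0, b1)  # 5.0 4.0, i.e. the line y-hat = 5 + 4x
```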

Page 4: Review Calculations

Data from the Garbage Project

x Plastic (lb)      0.27  1.41  2.19  2.83  2.19  1.81  0.85  3.05
y Household size    2     3     3     6     4     2     1     5

Find the Correlation and the Regression Equation (Line of Best Fit)

Page 5: Review Calculations

Data from the Garbage Project: x Plastic (lb), y Household size (the same eight pairs as on Page 4)

Using a calculator:

b0 = 0.549
b1 = 1.48
r = 0.842

$\hat{y} = 0.549 + 1.48x$
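The calculator results can also be reproduced by hand from the formulas for r, b0, and b1. The arithmetic below is a worked check added for illustration; the sums are computed from the eight Garbage Project pairs and are not shown on the slides.

```latex
\begin{align*}
n = 8,\quad \sum x = 14.60,\quad \sum y &= 26,\quad \sum xy = 56.80,\quad \sum x^2 = 32.9632,\quad \sum y^2 = 104 \\
n\sum xy - (\sum x)(\sum y) &= 8(56.80) - (14.60)(26) = 454.40 - 379.60 = 74.80 \\
n\sum x^2 - (\sum x)^2 &= 8(32.9632) - (14.60)^2 = 263.7056 - 213.16 = 50.5456 \\
n\sum y^2 - (\sum y)^2 &= 8(104) - 26^2 = 832 - 676 = 156 \\
b_1 &= \frac{74.80}{50.5456} \approx 1.48 \\
b_0 &= \frac{(26)(32.9632) - (14.60)(56.80)}{50.5456} = \frac{27.7632}{50.5456} \approx 0.549 \\
r &= \frac{74.80}{\sqrt{50.5456}\,\sqrt{156}} \approx 0.842
\end{align*}
```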

Page 6: Notes on correlation

r represents the linear correlation coefficient for a sample
$\rho$ (rho) represents the linear correlation coefficient for a population
$-1 \le r \le 1$
r measures the strength of a linear relationship
-1 is perfect negative correlation and 1 is perfect positive correlation

Page 7: Interpreting the Linear Correlation Coefficient

If the absolute value of r exceeds the value in Table A-6, conclude that there is a significant linear correlation.

Otherwise, there is not sufficient evidence to support the conclusion of significant linear correlation.

Page 8: Formal Hypothesis Test

Two methods

Both methods let
$H_0: \rho = 0$ (no significant linear correlation)
$H_1: \rho \ne 0$ (significant linear correlation)

Page 9: Method 1: Test Statistic is t (follows format of earlier chapters)

Test statistic:

$$t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$$

Critical values:

use Table A-3 with degrees of freedom = n - 2
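As an illustration of Method 1, the test statistic can be computed for the Garbage Project values r = 0.842 and n = 8 that appear elsewhere in the deck. This is a sketch only: the function name `t_statistic` is illustrative, and the critical value 2.447 is the standard two-tailed t value for 6 degrees of freedom at a 0.05 significance level, which would be read from Table A-3 (that table is not reproduced in these slides).

```python
from math import sqrt

def t_statistic(r, n):
    """Test statistic t = r / sqrt((1 - r^2) / (n - 2)) for H0: rho = 0."""
    return r / sqrt((1 - r ** 2) / (n - 2))

t = t_statistic(0.842, 8)

# Assumed critical value: two-tailed, alpha = 0.05, degrees of freedom = 8 - 2 = 6
critical_t = 2.447

print(round(t, 2))          # about 3.82
print(abs(t) > critical_t)  # True, so H0: rho = 0 is rejected
```

Method 1 therefore leads to the same decision as the table lookup on r used in Method 2.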

Page 10: Method 2: Test Statistic is r (uses fewer calculations)

Test statistic: r

Critical values: Refer to Table A-6 (no degrees of freedom)

Much easier

Page 11: TABLE A-6 Critical Values of the Pearson Correlation Coefficient r

n      $\alpha = .05$   $\alpha = .01$
4      .950             .999
5      .878             .959
6      .811             .917
7      .754             .875
8      .707             .834
9      .666             .798
10     .632             .765
11     .602             .735
12     .576             .708
13     .553             .684
14     .532             .661
15     .514             .641
16     .497             .623
17     .482             .606
18     .468             .590
19     .456             .575
20     .444             .561
25     .396             .505
30     .361             .463
35     .335             .430
40     .312             .402
45     .294             .378
50     .279             .361
60     .254             .330
70     .236             .305
80     .220             .286
90     .207             .269
100    .196             .256

Page 12: Is there a significant linear correlation?

Data from the Garbage Project: x Plastic (lb), y Household size (the same eight pairs as on Page 4)

n = 8, $\alpha = 0.05$
$H_0: \rho = 0$
$H_1: \rho \ne 0$

Test statistic is r = 0.842

Page 13: Is there a significant linear correlation?

n = 8, $\alpha = 0.05$
$H_0: \rho = 0$
$H_1: \rho \ne 0$

Test statistic is r = 0.842

Critical values are r = -0.707 and r = 0.707 (Table A-6 with n = 8 and $\alpha = 0.05$)

Page 14: Is there a significant linear correlation?

[Figure: number line for r from -1 to 1 with critical values r = -0.707 and r = 0.707. Reject $\rho = 0$ to the left of -0.707 and to the right of 0.707; fail to reject $\rho = 0$ between them. Sample data: r = 0.842.]

0.842 > 0.707; that is, the test statistic falls within the critical region.

Page 15: Is there a significant linear correlation?

[Figure: the same number line as on Page 14, with the sample value r = 0.842 in the right-hand rejection region.]

0.842 > 0.707; that is, the test statistic falls within the critical region.

Therefore, we REJECT $H_0: \rho = 0$ (no correlation) and conclude there is a significant linear correlation between the weights of discarded plastic and household size.
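The Method 2 decision amounts to a table lookup followed by one comparison. The sketch below assumes a dictionary holding an excerpt of the 0.05 column of Table A-6 as given in the slides; the names `CRITICAL_R_05` and `significant_linear_correlation` are illustrative, not from the source.

```python
# Excerpt of Table A-6, alpha = 0.05 column, keyed by sample size n
CRITICAL_R_05 = {4: 0.950, 5: 0.878, 6: 0.811, 7: 0.754, 8: 0.707, 9: 0.666, 10: 0.632}

def significant_linear_correlation(r, n, table=CRITICAL_R_05):
    """Reject H0: rho = 0 when |r| exceeds the critical value for sample size n."""
    critical_r = table[n]  # raises KeyError if n is not in the excerpt
    return abs(r) > critical_r

# Garbage Project example from the slides: r = 0.842, n = 8
print(significant_linear_correlation(0.842, 8))  # True, since 0.842 > 0.707
```

Taking the absolute value of r covers both tails, so a sample value of -0.842 would lead to the same decision.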

Page 16: Regression

Definition

Regression Equation
$\hat{y} = b_0 + b_1 x$
Given a collection of paired data, the regression equation algebraically describes the relationship between the two variables

Regression Model
$y = \beta_0 + \beta_1 x + \varepsilon$

Page 17: Notation for Regression Equation

                                      Population Parameter                      Sample Statistic
y-intercept of regression equation    $\beta_0$                                 $b_0$
Slope of regression equation          $\beta_1$                                 $b_1$
Equation of the regression line       $y = \beta_0 + \beta_1 x + \varepsilon$   $\hat{y} = b_0 + b_1 x$

Page 18: Regression

Definition

Regression Equation
Given a collection of paired data, the regression equation
$\hat{y} = b_0 + b_1 x$
algebraically describes the relationship between the two variables

Regression Line (line of best fit or least-squares line)
is the graph of the regression equation

Page 19: Assumptions & Observations

1. We are investigating only linear relationships.

2. For each x value, y is a random variable having a normal distribution.

3. There are many methods for determining normality.

4. The regression line goes through $(\bar{x}, \bar{y})$.
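The last observation, that the regression line passes through the point of sample means, follows from the formulas for b0 and b1. A short derivation, using the shorthand $D = n(\sum x^2) - (\sum x)^2$ for the shared denominator (a symbol introduced here, not slide notation):

```latex
\begin{align*}
\bar{y} - b_1\bar{x}
  &= \frac{\sum y}{n} - \frac{n(\sum xy) - (\sum x)(\sum y)}{D}\cdot\frac{\sum x}{n} \\
  &= \frac{(\sum y)\,D - n(\sum x)(\sum xy) + (\sum x)^2(\sum y)}{nD} \\
  &= \frac{n(\sum y)(\sum x^2) - n(\sum x)(\sum xy)}{nD}
   = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{D} = b_0
\end{align*}
```

Since $b_0 = \bar{y} - b_1\bar{x}$, substituting $x = \bar{x}$ into the regression equation gives $\hat{y} = b_0 + b_1\bar{x} = \bar{y}$.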

Page 20: Guidelines for Using the Regression Equation

1. If there is no significant linear correlation, don’t use the regression equation to make predictions.

2. Stay within the scope of the available sample data when making predictions.
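One way to make both guidelines hard to overlook is to build them into the prediction step. The sketch below is an illustration, not a method from the slides: it uses the Garbage Project equation and the smallest and largest plastic weights in that sample, and raising `ValueError` is simply one possible design choice. The names `predict`, `X_MIN`, and `X_MAX` are hypothetical.

```python
B0, B1 = 0.549, 1.48       # regression coefficients from the slides
X_MIN, X_MAX = 0.27, 3.05  # range of plastic weights (lb) in the sample

def predict(x, significant=True):
    """Predicted household size for x lb of plastic, enforcing both guidelines."""
    if not significant:
        raise ValueError("no significant linear correlation: do not use the regression equation")
    if not X_MIN <= x <= X_MAX:
        raise ValueError("x is outside the scope of the available sample data")
    return B0 + B1 * x

print(round(predict(2.0), 3))  # 3.509
```

A call such as `predict(10.0)` raises an error rather than extrapolating far beyond the observed weights.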

Page 21: Definitions

Outlier: a point lying far away from the other data points

Influential Points: points which strongly affect the graph of the regression line

Page 22: Residuals and the Least-Squares Property

Definitions

Residual (error): for a sample of paired (x, y) data, the difference $(y - \hat{y})$ between an observed sample y-value and the value of $\hat{y}$, which is the value of y that is predicted by using the regression equation.

Least-Squares Property: A straight line satisfies this property if the sum of the squares of the residuals is the smallest sum possible.

Page 23: Residuals and the Least-Squares Property

x    1    2    4    5
y    4    24   8    32

$\hat{y} = 5 + 4x$

[Figure: scatterplot of the four points with the line $\hat{y} = 5 + 4x$; x axis from 1 to 5, y axis from 0 to 32. Residuals: -5 at x = 1, 11 at x = 2, -13 at x = 4, 7 at x = 5.]
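The four residuals in this example, and the least-squares property itself, can be checked numerically. The sketch below uses the slide's data and line; the comparison lines (5 + 5x and 4 + 4x) are arbitrary alternatives chosen here for illustration, and the function names are not from the source.

```python
xs = [1, 2, 4, 5]
ys = [4, 24, 8, 32]

def residuals(xs, ys, b0, b1):
    """Observed y minus predicted y-hat = b0 + b1*x, for each pair."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

def sum_squared_residuals(xs, ys, b0, b1):
    """Sum of the squares of the residuals for the line y-hat = b0 + b1*x."""
    return sum(e ** 2 for e in residuals(xs, ys, b0, b1))

print(residuals(xs, ys, 5, 4))              # [-5, 11, -13, 7]
print(sum_squared_residuals(xs, ys, 5, 4))  # 364 for the regression line
print(sum_squared_residuals(xs, ys, 5, 5))  # 410 for the alternative line 5 + 5x
print(sum_squared_residuals(xs, ys, 4, 4))  # 368 for the alternative line 4 + 4x
```

Both alternative lines give a larger sum than 364, which is consistent with the least-squares property, although two comparisons do not prove it.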

Page 24: Definitions

Total Deviation from the mean of the particular point (x, y): the vertical distance $y - \bar{y}$, which is the distance between the point (x, y) and the horizontal line passing through the sample mean $\bar{y}$

Explained Deviation: the vertical distance $\hat{y} - \bar{y}$, which is the distance between the predicted y value and the horizontal line passing through the sample mean $\bar{y}$

Unexplained Deviation: the vertical distance $y - \hat{y}$, which is the vertical distance between the point (x, y) and the regression line. (The distance $y - \hat{y}$ is also called a residual, as defined in Section 9-3.)

Page 25: Unexplained, Explained, and Total Deviation

[Figure: plot with x axis from 0 to 9 and y axis from 0 to 39, showing the regression line $\hat{y} = 5 + 4x$ and the horizontal line $\bar{y} = 17$. At x = 5 it marks the points (5, 32), (5, 25), and (5, 17), with the total deviation $(y - \bar{y})$ from (5, 17) to (5, 32), the explained deviation $(\hat{y} - \bar{y})$ from (5, 17) to (5, 25), and the unexplained deviation $(y - \hat{y})$ from (5, 25) to (5, 32).]

Page 26

(total deviation) = (explained deviation) + (unexplained deviation)

$$(y - \bar{y}) = (\hat{y} - \bar{y}) + (y - \hat{y})$$

(total variation) = (explained variation) + (unexplained variation)

$$\sum (y - \bar{y})^2 = \sum (\hat{y} - \bar{y})^2 + \sum (y - \hat{y})^2$$
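The variation identity can be verified on the four-point example used earlier in the deck (x = 1, 2, 4, 5; y = 4, 24, 8, 32; regression line 5 + 4x). This is a sketch for checking the arithmetic; the variable names are illustrative. Note that the sum-of-squares identity relies on the line being the least-squares line, whereas the deviation identity for a single point holds for any line.

```python
xs = [1, 2, 4, 5]
ys = [4, 24, 8, 32]
b0, b1 = 5, 4  # least-squares line for this data, from the slides

y_bar = sum(ys) / len(ys)          # sample mean of y: 17.0
y_hat = [b0 + b1 * x for x in xs]  # predicted values: 9, 13, 21, 25

total = sum((y - y_bar) ** 2 for y in ys)                     # total variation
explained = sum((yh - y_bar) ** 2 for yh in y_hat)            # explained variation
unexplained = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))  # unexplained variation

print(total, explained, unexplained)     # 524.0 160.0 364
print(total == explained + unexplained)  # True
```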