ES9 Chapter 24 ~ Linear Correlation & Regression Analysis


[Figure: scatter diagram of Waist Size (22 to 30) versus Weight (100 to 160)]

Chapter Goals

• More detailed look at linear correlation and regression analysis

• Develop a hypothesis test to determine the strength of a linear relationship

• Consider the line of best fit, and use it to make confidence interval estimates.

Linear Correlation Analysis

The coefficient of linear correlation, r, is a measure of the strength of a linear relationship.

Consider another measure of dependence: the covariance.

Recall: bivariate data are ordered pairs of numerical values.


Derivation of the Covariance

Goal: a measure of the linear relationship between two variables.

Consider the following set of bivariate data: {(8, 22), (5, 28), (8, 18), (4, 16), (13, 27), (15, 23), (17, 17), (12, 13)}

$\bar{x} = 10.25$ and $\bar{y} = 20.50$

Consider a graph of the data:
1. The point $(\bar{x}, \bar{y})$ is the centroid of the data.
2. A vertical and a horizontal line through the centroid divide the graph into four sections.

Graph of the Data with Centroid

[Figure: scatter diagram of y (12 to 30) versus x (4 to 20) with the centroid (10.25, 20.5) marked and the deviations $(x - \bar{x})$ and $(y - \bar{y})$ indicated]

Notes

1. Each point (x, y) lies a certain distance from each of the two lines.

2. $(x - \bar{x})$: the horizontal distance from (x, y) to the vertical line passing through the centroid

3. $(y - \bar{y})$: the vertical distance from (x, y) to the horizontal line passing through the centroid

4. The distances may be positive, negative, or zero.

5. Consider the product $(x - \bar{x})(y - \bar{y})$:
a. If the graph has many points to the upper right and lower left of the centroid (a positive linear relationship), most products will be positive.
b. If the graph has many points to the upper left and lower right of the centroid (a negative linear relationship), most products will be negative.


Covariance of x and y

The covariance of x and y is defined as the sum of the products of the distances of all values of x and y from the centroid, divided by n − 1:

$$\operatorname{covar}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$

Note: $\sum (x - \bar{x}) = 0$ and $\sum (y - \bar{y}) = 0$ always!
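To make the definition concrete, here is a minimal Python sketch that applies the formula to the eight data points introduced in this section; the helper name `covar` is our own choice for illustration, not notation from the text. It also confirms that the deviations from the centroid sum to zero.

```python
# Sample covariance from the definition (covar is a hypothetical helper name):
# covar(x, y) = sum((x - x_bar) * (y - y_bar)) / (n - 1)
def covar(xs, ys):
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

# The eight ordered pairs used throughout this section
data = [(8, 22), (5, 28), (8, 18), (4, 16), (13, 27), (15, 23), (17, 17), (12, 13)]
xs = [x for x, _ in data]
ys = [y for _, y in data]

# The deviations from the centroid always sum to zero
x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
print(sum(x - x_bar for x in xs), sum(y - y_bar for y in ys))  # 0.0 0.0

print(round(covar(xs, ys), 4))  # -2.2857, that is, -16/7
```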


Calculations for Finding covar(x, y)

Point       x - x̄     y - ȳ    (x - x̄)(y - ȳ)
(8, 22)     -2.25      1.5       -3.375
(5, 28)     -5.25      7.5      -39.375
(8, 18)     -2.25     -2.5        5.625
(4, 16)     -6.25     -4.5       28.125
(13, 27)     2.75      6.5       17.875
(15, 23)     4.75      2.5       11.875
(17, 17)     6.75     -3.5      -23.625
(12, 13)     1.75     -7.5      -13.125
Total        0.00      0.0      -16.000

$$\operatorname{covar}(x, y) = \frac{-16}{7} = -2.2857$$

Data & Covariance

Positive covariance:

[Figure: scatter diagram of y versus x (0 to 8 on both axes) with the centroid $(\bar{x}, \bar{y})$ marked]

Data & Covariance

Negative covariance:

[Figure: scatter diagram of y (0 to 9) versus x (0 to 8) with the centroid $(\bar{x}, \bar{y})$ marked]

Data & Covariance

Covariance near 0:

[Figure: scatter diagram of y versus x (0 to 9 on both axes) with the centroid $(\bar{x}, \bar{y})$ marked]

Problems

1. The covariance does not have a standardized unit of measure.

2. Suppose we multiply each data value in the example in this section by 15. The covariance of the new data set is −514.29, which is 15² = 225 times the original covariance of −2.2857.

3. The dependency between x and y now seems stronger, but the relationship is really the same.

4. We must find a way to eliminate the effect of the spread of the data when we measure the strength of a linear relationship.
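A short Python sketch of point 2, assuming the same eight data points: multiplying every x and y value by 15 scales each deviation by 15, so each product of deviations, and therefore the covariance, is scaled by 15² = 225 even though the pattern of the points is unchanged. The helper `covar` is a hypothetical name used only for this illustration.

```python
# covar is a hypothetical helper: sample covariance from the definition
def covar(xs, ys):
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

xs = [8, 5, 8, 4, 13, 15, 17, 12]
ys = [22, 28, 18, 16, 27, 23, 17, 13]

original = covar(xs, ys)
# Multiply every x and every y by 15: each deviation is scaled by 15,
# so each product of deviations (and the covariance) is scaled by 225.
scaled = covar([15 * x for x in xs], [15 * y for y in ys])

print(round(original, 4))  # -2.2857
print(round(scaled, 2))    # -514.29
```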


Solution

1. Standardize x and y: $x' = \frac{x - \bar{x}}{s_x}$ and $y' = \frac{y - \bar{y}}{s_y}$

2. Compute the covariance of x' and y'.

3. This covariance is not affected by the spread of the data.

4. This is exactly what is accomplished by the coefficient of linear correlation:

$$r = \operatorname{covar}(x', y') = \frac{\operatorname{covar}(x, y)}{s_x s_y}$$

Notes

1. The coefficient of linear correlation standardizes the measure of dependency and allows us to compare the relative strengths of dependency of different sets of data.

2. Also commonly called Pearson's product moment, r.

Calculation of r (for the data presented in this section): $s_x = 4.71$ and $s_y = 5.37$

$$r = \frac{\operatorname{covar}(x, y)}{s_x s_y} = \frac{-2.2857}{(4.71)(5.37)} = -0.0904$$
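The following Python sketch checks both routes to r for the eight data points: the covariance of the standardized values x' and y', and covar(x, y)/(s_x s_y). It also rescales the data by 15 to show that r does not change. The helper names `covar`, `standardize`, and `pearson_r` are hypothetical, chosen for this illustration. With unrounded standard deviations the result is about −0.0903; the slide's −0.0904 comes from using the rounded values 4.71 and 5.37.

```python
import math

# Hypothetical helper: sample covariance from the definition
def covar(xs, ys):
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical helper: x' = (x - x_bar) / s_x using the sample standard deviation
def standardize(v):
    v_bar = sum(v) / len(v)
    s = math.sqrt(covar(v, v))  # covar(v, v) is the sample variance of v
    return [(a - v_bar) / s for a in v]

# Hypothetical helper: r = covar(x, y) / (s_x * s_y)
def pearson_r(xs, ys):
    return covar(xs, ys) / math.sqrt(covar(xs, xs) * covar(ys, ys))

xs = [8, 5, 8, 4, 13, 15, 17, 12]
ys = [22, 28, 18, 16, 27, 23, 17, 13]

r = pearson_r(xs, ys)
r_std = covar(standardize(xs), standardize(ys))  # covariance of x' and y'
r_scaled = pearson_r([15 * x for x in xs], [15 * y for y in ys])

print(round(math.sqrt(covar(xs, xs)), 2), round(math.sqrt(covar(ys, ys)), 2))  # 4.71 5.37
print(round(r, 4))  # -0.0903
```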


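Alternative (Computational) Formula for r

$$r = \frac{\operatorname{covar}(x, y)}{s_x s_y} = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1) s_x s_y} = \frac{\mathrm{SS}(xy)}{\sqrt{\mathrm{SS}(x)\,\mathrm{SS}(y)}}$$

where $\mathrm{SS}(x) = \sum x^2 - \frac{(\sum x)^2}{n}$, $\mathrm{SS}(y) = \sum y^2 - \frac{(\sum y)^2}{n}$, and $\mathrm{SS}(xy) = \sum xy - \frac{(\sum x)(\sum y)}{n}$.

1. This formula avoids the separate calculations of the means, standard deviations, and the deviations from the means.

2. This formula is easier and more accurate: it minimizes round-off error.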
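As a sketch of the computational formula (the helpers `ss_terms` and `r_computational` are hypothetical names), the sums of squares can be accumulated directly from Σx, Σy, Σx², Σy², and Σxy without a separate pass for the means; for the eight data points of this section it reproduces the same r as the definitional formula.

```python
import math

# Hypothetical helper: SS(x), SS(y), SS(xy) from the raw sums
def ss_terms(xs, ys):
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    ss_x = sum(x * x for x in xs) - sx ** 2 / n
    ss_y = sum(y * y for y in ys) - sy ** 2 / n
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sx * sy / n
    return ss_x, ss_y, ss_xy

# Hypothetical helper: r = SS(xy) / sqrt(SS(x) * SS(y))
def r_computational(xs, ys):
    ss_x, ss_y, ss_xy = ss_terms(xs, ys)
    return ss_xy / math.sqrt(ss_x * ss_y)

xs = [8, 5, 8, 4, 13, 15, 17, 12]
ys = [22, 28, 18, 16, 27, 23, 17, 13]

print(ss_terms(xs, ys))                   # (155.5, 202.0, -16.0)
print(round(r_computational(xs, ys), 4))  # -0.0903
```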

Inferences About the Linear Correlation Coefficient

• Use the calculated value of the coefficient of linear correlation, r*, to make an inference about the population correlation coefficient, ρ.

• Consider a confidence interval for ρ and a hypothesis test concerning ρ.

Assumptions...

Assumptions for inferences about the linear correlation coefficient: The set of (x, y) ordered pairs forms a random sample, and the y-values at each x have a normal distribution. Inferences use the t-distribution with n − 2 degrees of freedom.

Caution: The inferences about the linear correlation coefficient are about the pattern of behavior of the two variables involved and the usefulness of one variable in predicting the other. Significance of the linear correlation coefficient does not mean there is a direct cause-and-effect relationship.


Confidence Interval Procedure

1. A confidence interval may be used to estimate the value of the population correlation coefficient, ρ.

2. Use a table showing confidence belts.

3. Table 10, Appendix B: confidence belts for 95% confidence intervals.

4. Table 10 utilizes n, the sample size.


Example: A random sample of 25 ordered pairs of data has a calculated value of r = −0.45. Find a 95% confidence interval for ρ, the population linear correlation coefficient.

Solution:

1. Population Parameter of Concern: The linear correlation coefficient for the population, ρ.

2. The Confidence Interval Criteria
a. Assumptions: The ordered pairs form a random sample, and for each x, the y-values have a mounded distribution.
b. Test statistic: The calculated value of r.
c. Confidence level: 1 − α = 0.95

3. Sample Evidence: n = 25 and r = −0.45

Solution Continued

4. The Confidence Interval: The confidence interval is read from Table 10, Appendix B.
• Find r = −0.45 at the bottom of Table 10.
• Visualize a vertical line through that point.
• Find the two points where the belts marked for the correct sample size cross the vertical line.
• Draw a horizontal line through each point to the vertical scale on the left and read the confidence interval.
The values are −0.68 and −0.12.

5. The Results: −0.68 to −0.12 is the 95% confidence interval for ρ.

Table 10

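[Figure: Table 10 confidence belts. Horizontal axis: scale of r (sample correlation), with r = −0.45 marked; vertical axis: scale of ρ (population correlation coefficient), with −0.68 and −0.12 marked. The numbers on the curves are sample sizes.]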

Hypothesis Testing Solution

1. Null hypothesis: the two variables are linearly unrelated, ρ = 0.

2. Alternative hypothesis: one- or two-tailed, usually ρ ≠ 0.

3. Test statistic: the calculated value of r.

4. Probability bounds or critical values for r: Table 11, Appendix B.

5. Number of degrees of freedom for the r-statistic: n − 2.


Example: In a study of 32 randomly selected ordered pairs, r = 0.421. Is there any evidence to suggest the linear correlation coefficient is different from 0 at the 0.05 level of significance?

Solution:

1. The Set-up
a. Population parameter of concern: The linear correlation coefficient for the population, ρ.
b. The null and the alternative hypotheses:
Ho: ρ = 0
Ha: ρ ≠ 0

Solution Continued

2. The Hypothesis Test Criteria

a. Assumptions: The ordered pairs form a random sample, and we will assume that the y-values at each x have a mounded distribution.

b. Test statistic: r* (the calculated value of r) with df = 32 − 2 = 30

c. Level of significance: α = 0.05

3. The Sample Evidence: n = 32 and r* = r = 0.421

Solution Continued

4. The Probability Distribution (p-Value Approach)
a. The p-value: Use Table 11: 0.01 < P < 0.02.
b. The p-value is smaller than the level of significance, α.

~ or ~

4. The Probability Distribution (Classical Approach)
a. Critical value: The critical value is found at the intersection of the df = 30 row and the two-tailed 0.05 column of Table 11: 0.349.
b. r* is in the critical region.

5. The Results
a. Decision: Reject Ho.
b. Conclusion: At the 0.05 level of significance, there is evidence to suggest x and y are correlated.
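As a rough cross-check on the table lookup, the sketch below relies on the standard equivalence between testing ρ = 0 with r* and a t statistic, $t = r\sqrt{n-2}/\sqrt{1-r^2}$ with n − 2 degrees of freedom. Instead of reading Table 11, it approximates the p-value by integrating the t density numerically with Simpson's rule (a substitute technique, not the book's method). The critical value t(30, 0.025) = 2.042 is taken from a standard t-table and is an assumption of this sketch, as are the helper names `t_pdf` and `two_tailed_p`.

```python
import math

# Hypothetical helper: Student's t density, using log-gamma for stability
def t_pdf(t, df):
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + t * t / df) ** (-(df + 1) / 2)

# Hypothetical helper: P(|T| >= |t*|) = 1 - 2 * (area from 0 to |t*|),
# with the area computed by Simpson's rule (steps must be even)
def two_tailed_p(t_star, df, steps=2000):
    b = abs(t_star)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (total * h / 3)

n, r_star = 32, 0.421
df = n - 2

# Equivalent t statistic for testing rho = 0
t_star = r_star * math.sqrt(df) / math.sqrt(1 - r_star ** 2)

# Critical r implied by the assumed two-tailed critical value t(30, 0.025) = 2.042
t_crit = 2.042
r_crit = t_crit / math.sqrt(t_crit ** 2 + df)

p = two_tailed_p(t_star, df)
print(round(r_crit, 3))                # 0.349
print(abs(r_star) > r_crit, p < 0.05)  # True True
```

The critical r recovered this way matches the Table 11 entry of 0.349, and the numerical p-value falls inside the table bounds 0.01 < P < 0.02.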


Linear Regression Analysis

• The line of best fit results from an analysis of two (or more) related variables.

• We try to predict the value of the dependent, or output, variable.

• The variable we control is the independent, or input, variable.


Method of Least Squares

The line of best fit: $\hat{y} = b_0 + b_1 x$

The slope: $b_1 = \frac{\mathrm{SS}(xy)}{\mathrm{SS}(x)}$

The y-intercept: $b_0 = \frac{\sum y - b_1 \sum x}{n}$

Notes:
1. A scatter diagram may suggest curvilinear regression.
2. If two or more input variables are used: multiple regression.


Linear Model

The Linear Model: This equation represents the linear relationship between the two variables in a population:

$$y = \beta_0 + \beta_1 x + \epsilon$$

β0: The y-intercept, estimated by b0
β1: The slope, estimated by b1
ε: Experimental error, estimated by $e = y - \hat{y}$

• The random variable e is called the residual.
• e is the difference between the observed value of y and the predicted value of y at a given x.
• The sum of the residuals is exactly zero.
• Mean value of experimental error is zero: $\mu_\epsilon = 0$
• Variance of experimental error: $\sigma_\epsilon^2$

Estimating the Variance of the Experimental Error

Assumption: The distribution of y's is approximately normal, and the variances of the distributions of y at all values of x are the same (the standard deviation of the distribution of y about its mean is the same for all values of x).

Consider the sample variance: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$

1. The variance of y involves an additional complication: there is a different mean for y at each value of x.

2. Each "mean" is actually the predicted value, $\hat{y}$.

3. The variance of the error e is estimated by $s_e^2 = \frac{\sum (y - \hat{y})^2}{n - 2}$, with n − 2 degrees of freedom.


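Alternative (Computational) Formula for Variance of Experimental Error

Rewriting $s_e^2$:

$$s_e^2 = \frac{\sum (y - \hat{y})^2}{n - 2} = \frac{\sum (y - b_0 - b_1 x)^2}{n - 2} = \frac{\sum y^2 - b_0 \sum y - b_1 \sum xy}{n - 2} = \frac{\mathrm{SSE}}{n - 2}$$

SSE = sum of squares for error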


Example: A recent study was conducted to determine the relation between advertising expenditures and sales of statistics texts (for the first year in print). The data are given below (in thousands). Find the line of best fit and the variance of y about the line of best fit.

Adv. Costs (x)   Sales (y)   Adv. Costs (x)   Sales (y)
40               289         60               470
55               423         52               408
35               250         39               320
50               400         47               415
43               335         38               389


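Solution

$$\mathrm{SS}(x) = \sum x^2 - \frac{(\sum x)^2}{n} = 21677 - \frac{(459)^2}{10} = 608.9$$

$$\mathrm{SS}(xy) = \sum xy - \frac{(\sum x)(\sum y)}{n} = 174163 - \frac{(459)(3699)}{10} = 4378.9$$

$$b_1 = \frac{\mathrm{SS}(xy)}{\mathrm{SS}(x)} = \frac{4378.9}{608.9} = 7.1915$$

$$b_0 = \frac{\sum y - b_1 \sum x}{n} = \frac{3699 - (7.1915)(459)}{10} = 39.8105$$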
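A minimal Python sketch of the least-squares calculation for the advertising data; the function `least_squares` is a hypothetical name written for this illustration. It reproduces SS(x), SS(xy), b1, and b0 from the raw sums.

```python
# Hypothetical helper: least-squares coefficients from the raw sums
def least_squares(xs, ys):
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    ss_x = sum(x * x for x in xs) - sum_x ** 2 / n
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    b1 = ss_xy / ss_x              # slope: SS(xy) / SS(x)
    b0 = (sum_y - b1 * sum_x) / n  # intercept: (sum y - b1 * sum x) / n
    return b0, b1, ss_x, ss_xy

# Advertising costs (x) and first-year sales (y), in thousands
adv = [40, 55, 35, 50, 43, 60, 52, 39, 47, 38]
sales = [289, 423, 250, 400, 335, 470, 408, 320, 415, 389]

b0, b1, ss_x, ss_xy = least_squares(adv, sales)
print(round(ss_x, 1), round(ss_xy, 1))  # 608.9 4378.9
print(round(b1, 4), round(b0, 4))       # 7.1915 39.8105
```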


Solution Continued

• The equation for the line of best fit: $\hat{y} = 39.81 + 7.19x$

• The variance of y about the regression line:

$$s_e^2 = \frac{\sum y^2 - b_0 \sum y - b_1 \sum xy}{n - 2} = \frac{1410485 - (39.81)(3699) - (7.1915)(174163)}{8} = \frac{10734.5955}{8} = 1341.8244$$

Note: Extra decimal places are often needed for this type of calculation.
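To illustrate the note about extra decimal places, the sketch below evaluates the computational formula twice for the advertising data: once with the rounded coefficients 39.81 and 7.1915, and once with unrounded coefficients. The unrounded result, about 1341.76, agrees with the definitional formula based on residuals and with the residual mean square of 1342 in the Minitab output; the gap from 1341.8244 is round-off. Variable names and the helper `s_e_squared` are hypothetical.

```python
import math

adv = [40, 55, 35, 50, 43, 60, 52, 39, 47, 38]
sales = [289, 423, 250, 400, 335, 470, 408, 320, 415, 389]

n = len(adv)
sum_x, sum_y = sum(adv), sum(sales)
sum_y2 = sum(y * y for y in sales)
sum_xy = sum(x * y for x, y in zip(adv, sales))
ss_x = sum(x * x for x in adv) - sum_x ** 2 / n
ss_xy = sum_xy - sum_x * sum_y / n

# Hypothetical helper: (sum y^2 - b0 * sum y - b1 * sum xy) / (n - 2)
def s_e_squared(b0, b1):
    return (sum_y2 - b0 * sum_y - b1 * sum_xy) / (n - 2)

# Unrounded least-squares coefficients
b1 = ss_xy / ss_x
b0 = (sum_y - b1 * sum_x) / n

full = s_e_squared(b0, b1)
rounded = s_e_squared(39.81, 7.1915)  # coefficients as rounded in the example

# Definitional formula: residuals e = y - y_hat, with n - 2 degrees of freedom
from_residuals = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(adv, sales)) / (n - 2)

print(round(rounded, 4))                         # 1341.8244
print(round(full, 2), round(from_residuals, 2))  # 1341.76 1341.76
print(round(math.sqrt(full), 2))                 # 36.63
```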


Illustration

• Scatter diagram, regression line, and random errors as line segments:

[Figure: scatter diagram of Sales (250 to 500) versus Advertising Costs (35 to 65) with the regression line and the random errors drawn as line segments]


Minitab Output

Regression Analysis

The regression equation is
C2 = 39.8 + 7.19 C1

Predictor    Coef     StDev    T      P
Constant     39.81    69.11    0.58   0.580
C1           7.191    1.484    4.84   0.001

S = 36.63    R-Sq = 74.6%    R-Sq(adj) = 71.4%

Analysis of Variance

Source            DF   SS      MS      F       P
Regression         1   31491   31491   23.47   0.001
Residual Error     8   10734   1342
Total              9   42225
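The Minitab summary can be reproduced from the raw advertising data. This sketch assumes the standard definitions SS(y) = SSR + SSE, F = SSR/MSE (1 and n − 2 degrees of freedom), R-Sq = SSR/SS(y), and R-Sq(adj) = 1 − MSE/[SS(y)/(n − 1)]; these definitions are not stated on the slides, and the variable names are our own.

```python
import math

adv = [40, 55, 35, 50, 43, 60, 52, 39, 47, 38]
sales = [289, 423, 250, 400, 335, 470, 408, 320, 415, 389]

n = len(adv)
x_bar, y_bar = sum(adv) / n, sum(sales) / n
ss_x = sum((x - x_bar) ** 2 for x in adv)
ss_y = sum((y - y_bar) ** 2 for y in sales)  # total sum of squares
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(adv, sales))
b1 = ss_xy / ss_x
b0 = y_bar - b1 * x_bar

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(adv, sales))  # residual SS
ssr = ss_y - sse                              # regression SS
mse = sse / (n - 2)                           # residual mean square, s_e^2
f_stat = ssr / mse                            # regression has 1 degree of freedom
r_sq = ssr / ss_y
r_sq_adj = 1 - mse / (ss_y / (n - 1))
s = math.sqrt(mse)

print(round(ssr), round(sse), round(ss_y))    # 31491 10734 42225
print(round(f_stat, 2), round(100 * r_sq, 1), round(100 * r_sq_adj, 1), round(s, 2))
# 23.47 74.6 71.4 36.63
```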

Inferences Concerning the Slope of the Regression Line

• Confidence Interval for β1: a 1 − α confidence interval estimate for the population slope of the line of best fit.

• Hypothesis Test for β1: tests the null hypothesis β1 = 0, that the slope of the line of best fit is equal to 0; that is, that the line is of no use in predicting y for a given value of x.


Sampling Distribution of the Slope b1

Assume: Random samples of size n are repeatedly taken from a bivariate population.

1. b1 has a sampling distribution that is approximately normal.

2. The mean of b1 is β1.

3. The variance of b1 is $\sigma_{b_1}^2 = \frac{\sigma_\epsilon^2}{\sum (x - \bar{x})^2}$, provided there is no lack of fit.


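Standard Error of Regression

The standard error of regression (slope) is $\sigma_{b_1}$ and is estimated by $s_{b_1}$.

Estimator for $\sigma_{b_1}^2$:

$$s_{b_1}^2 = \frac{s_e^2}{\sum (x - \bar{x})^2} = \frac{s_e^2}{\sum x^2 - \frac{(\sum x)^2}{n}} = \frac{s_e^2}{\mathrm{SS}(x)}$$

Example (continued): For the advertising costs and sales data:

$$s_{b_1}^2 = \frac{s_e^2}{\mathrm{SS}(x)} = \frac{1341.8244}{608.9} = 2.2037$$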
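The arithmetic for the standard error of the slope is short enough to check directly; this sketch simply plugs in the values from the example (the variable names are illustrative).

```python
import math

s_e_sq = 1341.8244  # variance of y about the regression line, from the example
ss_x = 608.9        # SS(x) for the advertising costs, from the example

s_b1_sq = s_e_sq / ss_x    # estimated variance of the slope b1
s_b1 = math.sqrt(s_b1_sq)  # estimated standard error of the slope

print(round(s_b1_sq, 4), round(s_b1, 4))  # 2.2037 1.4845
```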


Inferences About Slope Continued

Assumptions for inferences about the slope parameter β1: The set of (x, y) ordered pairs forms a random sample, and the y-values at each x have a normal distribution. Since the population standard deviation is unknown and is replaced with the sample standard deviation, the t-distribution will be used with n − 2 degrees of freedom.

Confidence Interval Procedure: The 1 − α confidence interval for β1 is given by

$$b_1 \pm t(n - 2, \alpha/2) \cdot s_{b_1}$$


Example: Find the 95% confidence interval for the population slope β1 for the advertising costs and sales example.

Solution:

1. Population Parameter of Interest: The slope, β1, of the line of best fit for the population

2. The Confidence Interval Criteria
a. Assumptions: The ordered pairs form a random sample, and we will assume the y-values (sales) at each x (advertising costs) have a mounded distribution.
b. Test statistic: t with df = 10 − 2 = 8
c. Confidence level: 1 − α = 0.95

3. Sample Evidence: n = 10, $b_1 = 7.1915$, $s_{b_1}^2 = 2.2037$

Solution Continued

4. The Confidence Interval
a. Confidence coefficient: t(df, α/2) = t(8, 0.025) = 2.31
b. Interval:

$$b_1 \pm t(n - 2, \alpha/2) \cdot s_{b_1} = 7.1915 \pm (2.31)\sqrt{2.2037} = 7.1915 \pm (2.31)(1.4845) = 7.1915 \pm 3.4292$$

which gives the interval 3.762 to 10.621.

5. The Results: The slope of the line of best fit of the population from which the sample was drawn is between 3.762 and 10.621 with 95% confidence.
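A minimal sketch of the interval computation, assuming the values from this example and the t-table value t(8, 0.025) = 2.31: the margin of error is that t-value multiplied by the standard error $s_{b_1} = \sqrt{2.2037}$. Variable names are illustrative.

```python
import math

b1 = 7.1915               # sample slope, from the example
s_b1 = math.sqrt(2.2037)  # standard error of the slope, about 1.4845
t_crit = 2.31             # t(8, 0.025), the table value used in the example

margin = t_crit * s_b1    # margin of error = t * s_b1
lower, upper = b1 - margin, b1 + margin

print(round(margin, 4), round(lower, 3), round(upper, 3))  # 3.4292 3.762 10.621
```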

Hypothesis-Testing Procedure

1. The null hypothesis is always Ho: β1 = 0.

2. Use Student's t-distribution with df = n − 2.

3. The test statistic: $t^* = \frac{b_1 - \beta_1}{s_{b_1}}$


Example: In the previous example, is the slope of the line of best fit significant enough to show that advertising cost is useful in predicting first-year sales? Use α = 0.05.

Solution:

1. The Set-up
a. Population parameter of concern: The parameter of concern is β1, the slope of the line of best fit for the population.
b. The null and alternative hypotheses:
Ho: β1 = 0 (x is of no use in predicting y)
Ha: β1 > 0 (we expect sales to increase as costs increase)

Solution Continued

2. The Hypothesis Test Criteria
a. Assumptions: The ordered pairs form a random sample, and we will assume the y-values (sales) at each x (advertising costs) have a mounded distribution.
b. Test statistic: t* with df = n − 2 = 8
c. Level of significance: α = 0.05

3. The Sample Evidence
a. Sample information: n = 10, $b_1 = 7.1915$, $s_{b_1}^2 = 2.2037$
b. Calculate the value of the test statistic:

$$t^* = \frac{b_1 - \beta_1}{s_{b_1}} = \frac{7.1915 - 0.0}{\sqrt{2.2037}} = 4.8444$$
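The sketch below computes t* from the example's values and, in place of a t-table lookup, approximates the one-tailed p-value by integrating the t density with Simpson's rule. This numerical approach is a substitute for the book's table method, and the helper names `t_pdf` and `upper_tail_p` are hypothetical. The result should fall between 0.0005 and 0.001, consistent with P < 0.001.

```python
import math

# Hypothetical helper: Student's t density, using log-gamma for stability
def t_pdf(t, df):
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + t * t / df) ** (-(df + 1) / 2)

# Hypothetical helper: P(T > t*) for t* >= 0, as 0.5 minus the area
# from 0 to t*, computed by Simpson's rule (steps must be even)
def upper_tail_p(t_star, df, steps=2000):
    h = t_star / steps
    total = t_pdf(0.0, df) + t_pdf(t_star, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

b1, s_b1_sq, df = 7.1915, 2.2037, 8
t_star = (b1 - 0.0) / math.sqrt(s_b1_sq)  # test statistic under Ho: beta1 = 0
p = upper_tail_p(t_star, df)              # one-tailed, since Ha: beta1 > 0
t_crit = 1.86                             # t(8, 0.05), from the classical approach

print(round(t_star, 4))           # 4.8444
print(p < 0.05, t_star > t_crit)  # True True
```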

Solution Continued

4. The Probability Distribution (p-Value Approach)
a. The p-value: P = P(t* > 4.8444, with df = 8) < 0.001
b. The p-value is smaller than the level of significance, α.

~ or ~

4. The Probability Distribution (Classical Approach)
a. Critical value: t(8, 0.05) = 1.86
b. t* is in the critical region.

5. The Results
a. Decision: Reject Ho.
b. Conclusion: At the 0.05 level of significance, there is evidence to suggest the slope of the line of best fit is greater than zero. The evidence indicates there is a linear relationship and that advertising cost (x) is useful in predicting first-year sales (y).

Confidence Interval Estimates for Regression

• Use the line of best fit to make predictions

• Predict the population mean y-value at a given x

• Predict the individual y-value selected at random that will occur at a given value of x

• The best point estimate, or prediction, for both is $\hat{y}$.


Notation & Background

Notation:
1. Mean of the population y-values at a given value of x: $\mu_{y|x_0}$
2. The individual y-value selected at random for a given value of x: $y_{x_0}$

Background:
1. Recall the development of confidence intervals for the population mean μ when the variance was known and when the variance was estimated.
2. The confidence interval for $\mu_{y|x_0}$ and the prediction interval for $y_{x_0}$ are constructed in a similar fashion.
3. $\hat{y}$ replaces $\bar{x}$ as the point estimate.
4. The sampling distribution of $\hat{y}$ is normal.

Background Continued

5. The standard deviation in both cases is computed by multiplying the square root of the variance of the error by an appropriate correction factor.

6. The line of best fit passes through the centroid $(\bar{x}, \bar{y})$. Consider a confidence interval for the slope β1:
• If we draw lines with slopes equal to the extremes of that confidence interval through the centroid, the value of $\hat{y}$ fluctuates considerably for different values of x. (See the figure on the next slide.)
• It is reasonable to expect a wider confidence interval as we consider values of x further from $\bar{x}$.
• We need a correction factor to adjust for the distance between x0 and $\bar{x}$.
• This factor must also adjust for the variation of the y-values about $\hat{y}$.

Confidence Interval for Slope

[Figure: scatter diagram of Sales (250 to 500) versus Advertising Costs (35 to 65) with two lines drawn through the centroid $(\bar{x}, \bar{y})$, whose slopes are the endpoints of the confidence interval for β1]


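Confidence Interval

Confidence interval for the mean value of y at a given value of x, $\mu_{y|x_0}$:

$$\hat{y} \pm t(n - 2, \alpha/2) \cdot s_e \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum (x - \bar{x})^2}} = \hat{y} \pm t(n - 2, \alpha/2) \cdot s_e \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\mathrm{SS}(x)}}$$

The quantity $s_e \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\mathrm{SS}(x)}}$ is the standard error of $\hat{y}$.

Notes:
1. The numerator of the second term under the radical sign is the square of the distance of x0 from $\bar{x}$.
2. The denominator is closely related to the variance of x and has a standardizing effect on this term.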


Example: It is believed that the amount of nitrogen fertilizer used per acre has a direct effect on the amount of wheat produced. The data below show the amount of nitrogen fertilizer used per test plot and the amount of wheat harvested per test plot.

Pounds of        100 Pounds of    Pounds of        100 Pounds of
Fertilizer (x)   Wheat (y)        Fertilizer (x)   Wheat (y)
30               14               74               20
36                9               76               24
41               18               81               29
49               16               88               35
53               23               93               34
55               17               94               39
60               28               101              28
65               33               109              33

a. Find the line of best fit.
b. Construct a 95% confidence interval for the mean amount of wheat harvested for 45 pounds of fertilizer.

Solution

• Using Minitab, the line of best fit: $\hat{y} = 4.42 + 0.298x$

Confidence Interval:

1. Population Parameter of Interest: The mean amount of wheat produced for 45 pounds of fertilizer, $\mu_{y|x=45}$.

2. The Confidence Interval Criteria
a. Assumptions: The ordered pairs form a random sample, and the y-values at each x have a mounded distribution.
b. Test statistic: t with df = 16 − 2 = 14
c. Confidence level: 1 − α = 0.95

3. Sample Information: $s_e^2 = 25.97$, $s_e = \sqrt{25.97} = 5.096$, and $\hat{y}_{x=45} = 4.42 + 0.298(45) = 17.83$


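Solution Continued

4. The Confidence Interval:

$$\hat{y} \pm t(n - 2, \alpha/2) \cdot s_e \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\mathrm{SS}(x)}} = 17.83 \pm (2.14)(5.096)\sqrt{\frac{1}{16} + \frac{(45 - 69.06)^2}{8746.94}}$$

$$= 17.83 \pm (2.14)(5.096)\sqrt{0.0625 + 0.0662} = 17.83 \pm (2.14)(5.096)(0.3587) = 17.83 \pm 3.91$$

13.92 to 21.74 is the 95% confidence interval for $\mu_{y|x=45}$.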
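As a check on the hand calculation, this Python sketch recomputes the line of best fit from the 16 data pairs and builds the interval with the same t-value, 2.14. The variable names are illustrative; intermediate values are carried unrounded, so they differ slightly from the rounded figures above, but the endpoints agree to two decimals.

```python
import math

# Wheat example: pounds of fertilizer (x) and 100 pounds of wheat (y)
fert = [30, 36, 41, 49, 53, 55, 60, 65, 74, 76, 81, 88, 93, 94, 101, 109]
wheat = [14, 9, 18, 16, 23, 17, 28, 33, 20, 24, 29, 35, 34, 39, 28, 33]

n = len(fert)
x_bar, y_bar = sum(fert) / n, sum(wheat) / n
ss_x = sum((x - x_bar) ** 2 for x in fert)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(fert, wheat))
b1 = ss_xy / ss_x
b0 = y_bar - b1 * x_bar

# s_e from the residuals, with n - 2 degrees of freedom
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(fert, wheat))
s_e = math.sqrt(sse / (n - 2))

x0 = 45
y_hat = b0 + b1 * x0
t_crit = 2.14  # t(14, 0.025), the table value used in the example
half_width = t_crit * s_e * math.sqrt(1 / n + (x0 - x_bar) ** 2 / ss_x)
lower, upper = y_hat - half_width, y_hat + half_width

print(round(b0, 2), round(b1, 3))        # 4.42 0.298
print(round(lower, 2), round(upper, 2))  # 13.92 21.74
```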

Confidence Belts for $\mu_{y|x_0}$

• Confidence interval: green vertical line
• Confidence interval belt: upper and lower boundaries of all 95% confidence intervals

[Figure: Wheat (10 to 45) versus Fertilizer (30 to 120) showing the line of best fit and the upper and lower boundaries of the confidence belt for $\mu_{y|x_0}$]

Prediction Interval

Prediction interval of the value of a single randomly selected y:

$$\hat{y} \pm t(n - 2, \alpha/2) \cdot s_e \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\mathrm{SS}(x)}}$$

Example: Find the 95% prediction interval for the amount of wheat harvested for 45 pounds of fertilizer.

Solution:

1. Population Parameter of Interest: $y_{x=45}$, the amount of wheat harvested for 45 pounds of fertilizer

Solution Continued

2. The Prediction Interval Criteria

a. Assumptions: The ordered pairs form a random sample, and the y-values at each x have a mounded distribution.

b. Test statistic: t with df = 16 − 2 = 14

c. Confidence level: 1 − α = 0.95

3. Sample Information: $s_e^2 = 25.97$, $s_e = \sqrt{25.97} = 5.096$, and $\hat{y}_{x=45} = 4.42 + 0.298(45) = 17.83$


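Solution Continued

4. The Prediction Interval:

$$\hat{y} \pm t(n - 2, \alpha/2) \cdot s_e \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\mathrm{SS}(x)}} = 17.83 \pm (2.14)(5.096)\sqrt{1 + \frac{1}{16} + \frac{(45 - 69.06)^2}{8746.94}}$$

$$= 17.83 \pm (2.14)(5.096)\sqrt{1 + 0.0625 + 0.0662} = 17.83 \pm (2.14)(5.096)\sqrt{1.1287} = 17.83 \pm (2.14)(5.096)(1.0624) = 17.83 \pm 11.5859$$

6.24 to 29.42 is the 95% prediction interval for $y_{x=45}$.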
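Recomputing from the 16 wheat data pairs, this sketch adds the extra 1 under the radical to obtain the prediction interval for a single y-value at x0 = 45. It assumes t(14, 0.025) = 2.14 as in the example; variable names are illustrative, and intermediate values are carried unrounded.

```python
import math

fert = [30, 36, 41, 49, 53, 55, 60, 65, 74, 76, 81, 88, 93, 94, 101, 109]
wheat = [14, 9, 18, 16, 23, 17, 28, 33, 20, 24, 29, 35, 34, 39, 28, 33]

n = len(fert)
x_bar, y_bar = sum(fert) / n, sum(wheat) / n
ss_x = sum((x - x_bar) ** 2 for x in fert)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(fert, wheat))
b1 = ss_xy / ss_x
b0 = y_bar - b1 * x_bar
s_e = math.sqrt(sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(fert, wheat)) / (n - 2))

x0 = 45
y_hat = b0 + b1 * x0
t_crit = 2.14  # t(14, 0.025), the table value used in the example
# The leading 1 under the radical accounts for the scatter of a single y about its mean
half_width = t_crit * s_e * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / ss_x)
lower, upper = y_hat - half_width, y_hat + half_width

print(round(half_width, 2))              # 11.59
print(round(lower, 2), round(upper, 2))  # 6.24 29.42
```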

Prediction Belts for $y_{x_0}$

[Figure: Wheat (10 to 45) versus Fertilizer (30 to 120) showing the line of best fit, x0 = 45, and the upper and lower boundaries of the 95% prediction intervals on individual y-values at any x]

Precautions

1. The regression equation is meaningful only in the domain of the x variable studied. Estimation outside this domain is risky; it assumes the relationship between x and y is the same outside the domain of the sample data.

2. The results of one sample should not be used to make inferences about a population other than the one from which the sample was drawn

3. Correlation (or association) does not imply causation. A significant regression does not imply x causes y to change. Most common problem: missing, or third, variable effect.

13.6 ~ Understanding the Relationship Between Correlation & Regression

• We have considered correlation and regression analysis

• When do we use these techniques?

• Is there any duplication of work?

Remarks

1. The primary use of the linear correlation coefficient is in answering the question “Are these two variables related?”

2. The linear correlation coefficient may be used to indicate the usefulness of x as a predictor of y (if the linear model is appropriate). The test concerning the slope of the regression line (Ho: β1 = 0) tests the same basic concept.

3. Lack-of-fit test: Is the linear model appropriate? Consider the scatter diagram.
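To see that the two tests agree, the sketch below computes both test statistics for the advertising data: $t = r\sqrt{n-2}/\sqrt{1-r^2}$ from the correlation coefficient, and $t^* = b_1/s_{b_1}$ from the slope. The two are algebraically identical (a standard result not derived on these slides), so they match up to floating-point error. The variable names are our own.

```python
import math

adv = [40, 55, 35, 50, 43, 60, 52, 39, 47, 38]
sales = [289, 423, 250, 400, 335, 470, 408, 320, 415, 389]

n = len(adv)
x_bar, y_bar = sum(adv) / n, sum(sales) / n
ss_x = sum((x - x_bar) ** 2 for x in adv)
ss_y = sum((y - y_bar) ** 2 for y in sales)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(adv, sales))

# Test statistic built from the linear correlation coefficient
r = ss_xy / math.sqrt(ss_x * ss_y)
t_from_r = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Test statistic built from the slope: t* = (b1 - 0) / s_b1
b1 = ss_xy / ss_x
b0 = y_bar - b1 * x_bar
s_e_sq = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(adv, sales)) / (n - 2)
s_b1 = math.sqrt(s_e_sq / ss_x)
t_from_slope = b1 / s_b1

print(round(r, 4), round(t_from_r, 4), round(t_from_slope, 4))
```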

Conclusions

1. Linear correlation and regression measure different characteristics. It is possible to have a strong linear correlation and still have the wrong model.

2. Regression analysis should be used to answer questions about the relationship between two variables:
a. What is the relationship?
b. How are the two variables related?