
Chapter 12

Simple Linear Regression

Chapter Outline

• Simple Linear Regression Model
• Least Squares Method
• Coefficient of Determination
• Model Assumptions
• Testing for Significance


Managerial decisions often are based on the relationship between two or more variables.

Regression analysis can be used to develop an equation showing how the variables are related.

The variable being predicted is called the dependent variable and is denoted by y.

The variables being used to predict the value of the dependent variable are called the independent variables and are denoted by x.


Simple linear regression involves one independent variable and one dependent variable.

The relationship between the two variables is approximated by a straight line (hence, the ‘linear’ regression).

Regression analysis involving two or more independent variables is called multiple regression (covered in the next chapter).

Simple Linear Regression Model

The equation that describes how y is related to x and an error term is called the regression model.

The simple linear regression model is:

$$y = \beta_0 + \beta_1 x + \epsilon$$

where:
$\beta_0$ and $\beta_1$ are called parameters of the model,
$\epsilon$ is a random variable called the error term.

Simple Linear Regression Equation

The simple linear regression equation is:

$$E(y) = \beta_0 + \beta_1 x$$

• The graph of the regression equation is a straight line.

• $\beta_0$ is the y intercept of the regression line.

• $\beta_1$ is the slope of the regression line.

• $E(y)$ is the expected value of y for a given value of x.

Please note that both $\beta_0$ and $\beta_1$ are population parameters, depicting the true relationship between y and x.


Example: Stock Market Risk

The systematic risk of the stock market (a common risk shared by all stocks) has different impacts on different stocks. Stocks that are more sensitive to systematic risk are riskier. We can conduct a regression analysis to estimate the sensitivity of an individual stock to the systematic market risk. The next slide shows the data for a sample of the 20 most recent quarterly returns of Netflix and SPY (an index fund that tracks the S&P 500).

Simple Linear Regression Example: Stock Market Risk (data)

Quarter    SPY       NFLX        Quarter    SPY       NFLX
2009Q1     0.0630    0.2537      2011Q3    -0.0246   -0.6914
2009Q2     0.1366   -0.0302      2011Q4     0.0530    0.4644
2009Q3     0.0531    0.2164      2012Q1     0.0698   -0.3333
2009Q4     0.0426    0.1646      2012Q2    -0.0104   -0.2906
2010Q1     0.1109    0.5888      2012Q3     0.0320    0.3938
2010Q2    -0.0674    0.0369      2012Q4     0.0666    1.0853
2010Q3     0.0803    0.6925      2013Q1     0.0714    0.3076
2010Q4     0.0917    0.2334      2013Q2     0.0621    0.1315
2011Q1     0.0649    0.0868      2013Q3     0.0470    0.3190
2011Q2    -0.0474    0.1432      2013Q4     0.0191    0.2693

Simple Linear Regression Example: Stock Market Risk (Scatter Diagram)

[Figure: scatter diagram "Quarterly Gross Returns of SPY and NETFLIX, Inc.", with SPY returns on the horizontal axis, NFLX returns on the vertical axis, and a trend line.]


Example: Stock Market Risk

From the scatter diagram, we observe the following:

1. The points are scattered around the trend line, indicating that the relationship between the returns of SPY and Netflix is not perfect.

2. The trend line has a positive slope, indicating that the relationship is positive, i.e. as the returns of SPY go up, the returns of Netflix tend to go up too.

3. The vertical distance from a point to the trend line is the difference between the actual return of Netflix and its estimated value, given an actual return of SPY. This difference is simply the estimated error, similar to y – E(y).


Positive Linear Relationship

[Figure: regression line of $E(y)$ against x with intercept $\beta_0$; the slope $\beta_1$ is positive.]

Negative Linear Relationship

[Figure: regression line of $E(y)$ against x with intercept $\beta_0$; the slope $\beta_1$ is negative.]

No Relationship

[Figure: horizontal regression line of $E(y)$ against x with intercept $\beta_0$; the slope $\beta_1$ is 0.]

Estimated Simple Linear Regression Equation

The estimated simple linear regression equation:

$$\hat{y} = b_0 + b_1 x$$

• The graph is called the estimated regression line.

• $b_0$ is the y intercept of the estimated regression line.

• $b_1$ is the slope of the estimated regression line.

• $\hat{y}$ is the estimated value of y for a given value of x.

Please note that $b_0$ and $b_1$ are sample estimates of $\beta_0$ and $\beta_1$, respectively, depicting the estimated sample relationship between y and x.

Estimation Process

[Diagram: the estimation process as a cycle of four boxes.]

• Regression Model: $y = \beta_0 + \beta_1 x + \epsilon$; Regression Equation: $E(y) = \beta_0 + \beta_1 x$; Unknown Parameters: $\beta_0$, $\beta_1$

• Sample Data: pairs $(x_1, y_1), \ldots, (x_n, y_n)$

• Estimated Regression Equation: $\hat{y} = b_0 + b_1 x$; Sample Statistics: $b_0$, $b_1$

• $b_0$ and $b_1$ provide estimates of $\beta_0$ and $\beta_1$

Least Squares Method

Least Squares Criterion

$$\min \sum (y_i - \hat{y}_i)^2$$

where:

$y_i$ = observed value of the dependent variable for the ith observation

$\hat{y}_i$ = estimated value of the dependent variable for the ith observation


Least Squares Criterion

$$\min \sum (y_i - \hat{y}_i)^2$$

• $y_i - \hat{y}_i$ is the estimated error for the ith observation.

• Taking the square means that it is the magnitude of the error, not its sign (positive or negative), that matters.

• The purpose of the Least Squares Criterion is to find the $b_0$ and $b_1$ that minimize the sum of the squared estimated errors for all the observations in the sample, i.e. the best-fit straight line (the one with the smallest overall error) that approximates the relationship between y and x.


Slope for the Estimated Regression Equation

$$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$$

where:

$x_i$ = value of independent variable for ith observation

$y_i$ = value of dependent variable for ith observation

$\bar{x}$ = average value of independent variable

$\bar{y}$ = average value of dependent variable


y-Intercept for the Estimated Regression Equation

$$b_0 = \bar{y} - b_1 \bar{x}$$
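The slope and intercept formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the slides' own computation: the function names `least_squares` and `sse` and the five-point toy data set are hypothetical. The last step nudges the fitted coefficients and confirms that every nudged line has a larger sum of squared errors, which is what the least squares criterion asks for.

```python
def least_squares(xs, ys):
    """Return (b0, b1) from the closed-form least squares formulas."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx             # slope
    b0 = y_bar - b1 * x_bar    # y intercept
    return b0, b1


def sse(b0, b1, xs, ys):
    """Sum of squared estimated errors for the line y_hat = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical toy data (not from the slides).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = least_squares(xs, ys)
best = sse(b0, b1, xs, ys)

# SSE of every line obtained by nudging the intercept and/or the slope.
nudged = [
    sse(b0 + d0, b1 + d1, xs, ys)
    for d0 in (-0.1, 0.0, 0.1)
    for d1 in (-0.1, 0.0, 0.1)
    if (d0, d1) != (0.0, 0.0)
]
print(b0, b1, best, min(nudged))
```

Because the sum of squared errors is a convex quadratic in $b_0$ and $b_1$, any move away from the fitted pair can only increase it.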

Simple Linear Regression Example: Stock Market Risk

Quarterly Returns of SPY (x)    Quarterly Returns of Netflix (y)
0.0630                          0.2537
0.1366                         -0.0302
...                             ...
0.0470                          0.3190
0.0191                          0.2693

$\sum x = 0.9143$    $\sum y = 4.0419$

$\bar{x} = 0.0457$    $\bar{y} = 0.2021$

Estimated Regression Equation

Slope for the Estimated Regression Equation

$$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{0.1407}{0.0490} = 2.87$$

y-Intercept for the Estimated Regression Equation

$$b_0 = \bar{y} - b_1 \bar{x} = 0.2021 - 2.87(0.0457) = 0.07$$

Estimated Regression Equation

$$\hat{y} = 0.07 + 2.87x$$
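As a check on the arithmetic, the same calculation can be run on the 20 quarterly returns listed on the data slide. This is a sketch in plain Python; the variable names and the `predict` helper are illustrative choices, not part of the original deck.

```python
# Quarterly returns copied from the data slide (2009Q1 through 2013Q4).
spy = [0.0630, 0.1366, 0.0531, 0.0426, 0.1109, -0.0674, 0.0803, 0.0917,
       0.0649, -0.0474, -0.0246, 0.0530, 0.0698, -0.0104, 0.0320, 0.0666,
       0.0714, 0.0621, 0.0470, 0.0191]
nflx = [0.2537, -0.0302, 0.2164, 0.1646, 0.5888, 0.0369, 0.6925, 0.2334,
        0.0868, 0.1432, -0.6914, 0.4644, -0.3333, -0.2906, 0.3938, 1.0853,
        0.3076, 0.1315, 0.3190, 0.2693]

n = len(spy)
x_bar = sum(spy) / n
y_bar = sum(nflx) / n

# Numerator and denominator of the slope formula.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(spy, nflx))
sxx = sum((x - x_bar) ** 2 for x in spy)

b1 = sxy / sxx            # estimated slope
b0 = y_bar - b1 * x_bar   # estimated y intercept


def predict(x):
    """Estimated Netflix return for a given SPY return (hypothetical helper)."""
    return b0 + b1 * x


print(f"b1 = {b1:.2f}, b0 = {b0:.2f}")
```

The printed values should agree with the slide's 2.87 and 0.07 after rounding. Calling `predict(0.05)` gives the estimated Netflix return for a quarter in which SPY returns 5%.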

Estimated Regression Line – Stock Market Risk Example

[Figure: scatter plot of NFLX returns (vertical axis) against SPY returns (horizontal axis) with the estimated regression line $\hat{y} = 0.07 + 2.87x$.]

Coefficient of Determination

• Relationship Among SST, SSR, SSE

SST = SSR + SSE

$$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$$

where:

SST = total sum of squares (i.e. total variability of y)

SSR = sum of squares due to regression (i.e. the variability of y that is explained by regression)

SSE = sum of squares due to error (i.e. the variability of y that cannot be explained by regression)


The coefficient of determination is:

$$r^2 = \text{SSR}/\text{SST}$$

$r^2$ represents the percentage of total variability of y that is explained by regression.


$$r^2 = \text{SSR}/\text{SST} = 0.404/2.741 = 0.147$$

The regression relationship is actually weak. Only 14.7% of the variability in the returns of Netflix can be explained by the linear relationship between the market returns (SPY) and the returns of Netflix.
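The three sums of squares and $r^2$ can be reproduced from the data slide. The sketch below assumes nothing beyond the formulas already given; the variable names are illustrative.

```python
# Quarterly returns copied from the data slide (2009Q1 through 2013Q4).
spy = [0.0630, 0.1366, 0.0531, 0.0426, 0.1109, -0.0674, 0.0803, 0.0917,
       0.0649, -0.0474, -0.0246, 0.0530, 0.0698, -0.0104, 0.0320, 0.0666,
       0.0714, 0.0621, 0.0470, 0.0191]
nflx = [0.2537, -0.0302, 0.2164, 0.1646, 0.5888, 0.0369, 0.6925, 0.2334,
        0.0868, 0.1432, -0.6914, 0.4644, -0.3333, -0.2906, 0.3938, 1.0853,
        0.3076, 0.1315, 0.3190, 0.2693]

n = len(spy)
x_bar = sum(spy) / n
y_bar = sum(nflx) / n

# Least squares estimates.
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(spy, nflx))
      / sum((x - x_bar) ** 2 for x in spy))
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * x for x in spy]

sst = sum((y - y_bar) ** 2 for y in nflx)               # total variability
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)            # explained by regression
sse = sum((y - yh) ** 2 for y, yh in zip(nflx, y_hat))  # unexplained

r2 = ssr / sst
print(f"SST = {sst:.3f}, SSR = {ssr:.3f}, SSE = {sse:.3f}, r2 = {r2:.3f}")
```

Up to rounding, the output should match the slide's SSR = 0.404 and SST = 2.741, and SSR plus SSE should equal SST to within floating-point error.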

Sample Correlation Coefficient

$$r_{xy} = (\text{sign of } b_1)\sqrt{\text{Coefficient of Determination}} = (\text{sign of } b_1)\sqrt{r^2}$$

where: $b_1$ = the slope of the estimated regression equation $\hat{y} = b_0 + b_1 x$

Sample Correlation Coefficient

$$r_{xy} = (\text{sign of } b_1)\sqrt{r^2}$$

The sign of $b_1$ in the equation $\hat{y} = 0.07 + 2.87x$ is “+”.

$$r_{xy} = +\sqrt{0.147} = 0.384$$
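The sign-and-square-root rule is a one-liner in code. The sketch below uses the rounded values from the slides, so the result can differ from 0.384 in the last digit; the function name `correlation_from_r2` is a hypothetical choice.

```python
import math


def correlation_from_r2(r2, b1):
    """Sample correlation: square root of r^2, carrying the sign of the slope b1."""
    return math.copysign(math.sqrt(r2), b1)


r_xy = correlation_from_r2(0.147, 2.87)          # slide values
r_negative = correlation_from_r2(0.147, -2.87)   # same fit strength, negative slope
print(round(r_xy, 3), round(r_negative, 3))
```

The second call shows why the sign has to be taken from $b_1$: $r^2$ alone cannot distinguish a positive relationship from a negative one.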

Assumptions About the Error Term

$$y = \beta_0 + \beta_1 x + \epsilon$$

1. The error $\epsilon$ is a random variable with mean of zero.

2. The variance of $\epsilon$, denoted by $\sigma^2$, is the same for all values of the independent variable.

3. The values of $\epsilon$ are independent.

4. The error $\epsilon$ is a normally distributed random variable.
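One way to see what the four assumptions mean is to simulate data that satisfies them. In the sketch below the "true" parameters ($\beta_0$ = 0.07, $\beta_1$ = 2.87, $\sigma$ = 0.36), the range of x values, and the sample size are all assumptions chosen for illustration; they are not estimates endorsed by the slides.

```python
import random

random.seed(0)  # fixed seed so the illustration is repeatable

# Hypothetical "true" parameters for the simulation.
beta0, beta1, sigma = 0.07, 2.87, 0.36
n = 10_000

xs = [random.uniform(-0.10, 0.15) for _ in range(n)]

# Independent, normally distributed errors with mean zero and the same
# variance sigma^2 for every x (assumptions 1 through 4).
errors = [random.gauss(0.0, sigma) for _ in range(n)]
ys = [beta0 + beta1 * x + e for x, e in zip(xs, errors)]

mean_error = sum(errors) / n
var_error = sum((e - mean_error) ** 2 for e in errors) / (n - 1)
print(f"mean of errors = {mean_error:.4f}, variance of errors = {var_error:.4f}")
```

With a large sample the mean of the simulated errors sits close to zero and their variance close to $\sigma^2$, which is the behavior the assumptions describe.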

Test for Significance

To test for a significant regression relationship, we must conduct a hypothesis test to determine whether the value of $\beta_1$ (the slope) is zero. In the model $y = \beta_0 + \beta_1 x + \epsilon$, $\beta_1$ determines the relationship between y and x.

Two tests are commonly used: the t test and the F test.

Both the t test and F test require an estimate of $\sigma^2$, the variance of $\epsilon$ in the regression model.


• An Estimate of $\sigma^2$

The mean square error (MSE) provides the estimate (the sample variance $s^2$) of $\sigma^2$.

$$s^2 = \text{MSE} = \text{SSE}/(n - 2)$$

where:

$$\text{SSE} = \sum (y_i - \hat{y}_i)^2 = \sum (y_i - b_0 - b_1 x_i)^2$$


• An Estimate of $\sigma$

• To estimate $\sigma$ we take the square root of $s^2$.

• The resulting s is called the standard error of the estimate.

$$s = \sqrt{\text{MSE}} = \sqrt{\frac{\text{SSE}}{n - 2}}$$

Test for Significance: t Test

Hypotheses

$$H_0: \beta_1 = 0$$

$$H_a: \beta_1 \neq 0$$

Test Statistic

$$t = \frac{b_1}{s_{b_1}} \quad \text{where} \quad s_{b_1} = \frac{s}{\sqrt{\sum (x_i - \bar{x})^2}}$$


Rejection Rule

Reject $H_0$ if p-value < $\alpha$, or $t < -t_{\alpha/2}$, or $t > t_{\alpha/2}$

where: $t_{\alpha/2}$ is based on a t distribution with n - 2 degrees of freedom; n is the number of observations in the regression; 2 is the number of parameters ($\beta_0$ and $\beta_1$) in the regression.


1. Determine the hypotheses.

$$H_0: \beta_1 = 0$$

$$H_a: \beta_1 \neq 0$$

2. Specify the level of significance.

$$\alpha = .05$$

3. Calculate the test statistic.

$$t = \frac{b_1}{s_{b_1}} = \frac{2.87}{1.63} = 1.76$$


4. Determine whether to reject $H_0$.

p-Value approach: t = 1.76 provides an area of .0473 in the upper tail. Hence, the p-value is 2 × 0.0473 = 0.0946. Since the p-value is larger than 0.05, we will not reject $H_0$.

Critical Value approach: For $\alpha$ = 5%, the critical value is 2.1 (a two-tailed test). Since our test statistic t = 1.76 is less than 2.1, we will not reject $H_0$.
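The t test steps can be traced numerically from the rounded summary values quoted in the slides (slope 2.87, SSE 2.337, $\sum (x_i - \bar{x})^2$ = 0.0490, n = 20). The sketch below is one way to do it with the standard library only: the p-value is approximated by integrating the t density with Simpson's rule, which is a stand-in for the statistical table or software the slides presumably used. The function names and the step count are assumptions.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value, using Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3      # P(0 < T < |t|)
    return 2 * (0.5 - central_half)   # area in both tails


# Rounded summary values taken from the slides.
n = 20
b1 = 2.87
sse = 2.337
sxx = 0.0490
alpha = 0.05
t_crit = 2.1   # two-tailed critical value quoted in the slides

s = math.sqrt(sse / (n - 2))   # standard error of the estimate
s_b1 = s / math.sqrt(sxx)      # standard error of the slope
t_stat = b1 / s_b1
p_value = two_tailed_p(t_stat, n - 2)

reject_by_p = p_value < alpha
reject_by_critical_value = abs(t_stat) > t_crit
print(f"s_b1 = {s_b1:.2f}, t = {t_stat:.2f}, p-value = {p_value:.4f}")
```

Both approaches lead to the same decision, as they must: the p-value is below $\alpha$ exactly when the test statistic falls beyond the critical value.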

Confidence Interval for $\beta_1$

We can use a 95% confidence interval for $\beta_1$ to test the hypotheses just used in the t test.

$H_0$ is rejected if the hypothesized value of $\beta_1$ is not included in the confidence interval for $\beta_1$.


• The form of a confidence interval for $\beta_1$ is:

$$b_1 \pm t_{\alpha/2}\, s_{b_1}$$

where $b_1$ is the point estimator, $t_{\alpha/2}\, s_{b_1}$ is the margin of error, and $t_{\alpha/2}$ is the t value providing an area of $\alpha/2$ in the upper tail of a t distribution with n - 2 degrees of freedom.


Rejection Rule

Reject $H_0$ if 0 is not included in the confidence interval for $\beta_1$.

95% Confidence Interval for $\beta_1$

$$b_1 \pm t_{\alpha/2}\, s_{b_1} = 2.87 \pm 2.1(1.63) = 2.87 \pm 3.42$$

or -0.55 to 6.29

Conclusion

0 is included in the confidence interval. Do not reject $H_0$.
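The interval and the decision rule take only a few lines to express. The sketch below plugs in the rounded values from the slides; `slope_confidence_interval` is a hypothetical helper name.

```python
def slope_confidence_interval(b1, s_b1, t_crit):
    """Return (lower, upper) for b1 plus or minus t_crit * s_b1."""
    margin = t_crit * s_b1   # margin of error
    return b1 - margin, b1 + margin


# Values quoted in the slides: slope, its standard error, and t for 18 df.
lower, upper = slope_confidence_interval(b1=2.87, s_b1=1.63, t_crit=2.1)

hypothesized_slope = 0.0
reject_h0 = not (lower <= hypothesized_slope <= upper)
print(f"{lower:.2f} to {upper:.2f}, reject H0: {reject_h0}")
```

Because the interval is built from the same $t_{\alpha/2}$ and $s_{b_1}$ as the two-tailed t test, checking whether it contains 0 always gives the same decision as that test.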

Test for Significance: F Test

Hypotheses

$$H_0: \beta_1 = 0$$

$$H_a: \beta_1 \neq 0$$

Test Statistic

$$F = \text{MSR}/\text{MSE}$$

Please note that the hypotheses of the F test are the same as those of the t test, which is always the case for a simple linear regression (where there is only one independent variable).


Rejection Rule

Reject $H_0$ if p-value < $\alpha$ or $F > F_\alpha$

where: $F_\alpha$ is based on an F distribution with 1 degree of freedom in the numerator and n - 2 degrees of freedom in the denominator

ANOVA Table for A Regression Analysis

Source of Variation    Sum of Squares    Degrees of Freedom    Mean Square             F              p-Value
Regression             SSR               k - 1                 MSR = SSR/(k - 1)       F = MSR/MSE
Error                  SSE               n_T - k               MSE = SSE/(n_T - k)
Total                  SST               n_T - 1

k is the number of parameters in a regression (k = 2 for a simple linear regression).

n_T is the number of observations.

ANOVA Table for A Regression Analysis

Stock Market Risk Example

Source of Variation    Sum of Squares    Degrees of Freedom    Mean Square    F       p-Value
Regression             0.404             1                     0.404          3.11    0.095
Error                  2.337             18                    0.13
Total                  2.741             19


1. Determine the hypotheses.

$$H_0: \beta_1 = 0$$

$$H_a: \beta_1 \neq 0$$

2. Specify the level of significance.

$$\alpha = .05$$

3. Calculate the test statistic.

$$F = \text{MSR}/\text{MSE} = 0.404/0.13 = 3.11$$

The relationship between the F value and the t value is $F = t^2$, which is only true for simple linear regressions.


4. Determine whether to reject $H_0$.

p-Value approach: F = 3.11 provides an area of .0946 in the upper tail. Hence, the p-value is 0.0946. Since the p-value is larger than 0.05, we will not reject $H_0$.

Critical Value approach: For $\alpha$ = 5%, the critical value is 4.41. Since our test statistic F = 3.11 is less than 4.41, we will not reject $H_0$.
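The ANOVA arithmetic behind the F test can be sketched from the rounded sums of squares in the table. The critical value 4.41 and the t statistic 1.76 are taken from the slides rather than computed; the variable names are illustrative.

```python
# Rounded values from the ANOVA slide.
n = 20        # number of observations
k = 2         # number of parameters (beta_0 and beta_1)
ssr = 0.404   # sum of squares due to regression
sse = 2.337   # sum of squares due to error

msr = ssr / (k - 1)   # mean square regression, 1 degree of freedom
mse = sse / (n - k)   # mean square error, n - 2 degrees of freedom
f_stat = msr / mse

f_crit = 4.41         # critical value quoted in the slides for alpha = 5%
reject_h0 = f_stat > f_crit

t_stat = 1.76         # t statistic from the t test slides
print(f"MSE = {mse:.2f}, F = {f_stat:.2f}, t^2 = {t_stat ** 2:.2f}, reject H0: {reject_h0}")
```

The small gap between F and $t^2$ in the output comes from rounding in the slide values; with unrounded inputs the two agree for a simple linear regression.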

Some Cautions about the Interpretation of Significance Tests

Just because we are able to reject $H_0$: $\beta_1 = 0$ and demonstrate statistical significance does not enable us to conclude that there is a linear relationship between x and y.

Rejecting $H_0$: $\beta_1 = 0$ and concluding that the relationship between x and y is significant does not enable us to conclude that a cause-and-effect relationship is present between x and y.
