1
Chapter 12
Simple Linear Regression
2
Chapter Outline
Simple Linear Regression Model
Least Squares Method
Coefficient of Determination
Model Assumptions
Testing for Significance
3
Simple Linear Regression
Managerial decisions often are based on the relationship between two or more variables.
Regression analysis can be used to develop an equation showing how the variables are related.
The variable being predicted is called the dependent variable and is denoted by y. The variables being used to predict the value of the dependent variable are called the independent variables and are denoted by x.
4
Simple Linear Regression
Simple linear regression involves one independent variable and one dependent variable.
The relationship between the two variables is approximated by a straight line (hence, "linear" regression).
Regression analysis involving two or more independent variables is called multiple regression (covered in the next chapter).
5
Simple Linear Regression Model
The equation that describes how y is related to x and an error term is called the regression model.
The simple linear regression model is:
$$y = \beta_0 + \beta_1 x + \varepsilon$$
where $\beta_0$ and $\beta_1$ are called the parameters of the model, and $\varepsilon$ is a random variable called the error term.
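The Python sketch below simulates observations from the model. It separates the systematic part $\beta_0 + \beta_1 x$ from the random error $\varepsilon$.

Several things here are hypothetical and exist only for this illustration:

- the parameter values BETA0 = 1.0, BETA1 = 2.0 and SIGMA = 0.5;
- the function name `simulate`;
- the choice of normally distributed errors with mean zero and constant variance.

```python
import random

# Hypothetical population parameters, chosen only for illustration.
BETA0, BETA1, SIGMA = 1.0, 2.0, 0.5


def simulate(xs, seed=0):
    """Draw y = BETA0 + BETA1*x + eps for each x, with eps ~ Normal(0, SIGMA**2)."""
    rng = random.Random(seed)
    return [BETA0 + BETA1 * x + rng.gauss(0.0, SIGMA) for x in xs]


xs = [i / 100 for i in range(10_000)]
ys = simulate(xs)

# The error term is whatever separates y from the line: eps = y - (BETA0 + BETA1*x).
errors = [y - (BETA0 + BETA1 * x) for x, y in zip(xs, ys)]
mean_error = sum(errors) / len(errors)
var_error = sum((e - mean_error) ** 2 for e in errors) / (len(errors) - 1)
print(f"mean error ~ {mean_error:.4f}, error variance ~ {var_error:.4f}")
```

No simulated y lies exactly on the line. Across many draws, however, the errors average close to zero, and their variance is close to SIGMA**2 = 0.25.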
6
Simple Linear Regression Equation
The simple linear regression equation is:
$$E(y) = \beta_0 + \beta_1 x$$
• The graph of the regression equation is a straight line.
• $\beta_0$ is the y-intercept of the regression line.
• $\beta_1$ is the slope of the regression line.
• $E(y)$ is the expected value of y for a given value of x.
Note that both $\beta_0$ and $\beta_1$ are population parameters, describing the true relationship between y and x.
7
Simple Linear Regression
Example: Stock Market Risk
The systematic risk of the stock market (a common risk shared by all stocks) affects different stocks differently. Stocks that are more sensitive to systematic risk are riskier. We can conduct a regression analysis to estimate the sensitivity of an individual stock to systematic market risk. The next slide shows the data for a sample of the 20 most recent quarterly returns of Netflix and of SPY (an index fund that tracks the S&P 500).
8
Simple Linear Regression
Example: Stock Market Risk (data)
Quarter | SPY | NFLX
2009Q1 | 0.0630 | 0.2537
2009Q2 | 0.1366 | -0.0302
2009Q3 | 0.0531 | 0.2164
2009Q4 | 0.0426 | 0.1646
2010Q1 | 0.1109 | 0.5888
2010Q2 | -0.0674 | 0.0369
2010Q3 | 0.0803 | 0.6925
2010Q4 | 0.0917 | 0.2334
2011Q1 | 0.0649 | 0.0868
2011Q2 | -0.0474 | 0.1432
2011Q3 | -0.0246 | -0.6914
2011Q4 | 0.0530 | 0.4644
2012Q1 | 0.0698 | -0.3333
2012Q2 | -0.0104 | -0.2906
2012Q3 | 0.0320 | 0.3938
2012Q4 | 0.0666 | 1.0853
2013Q1 | 0.0714 | 0.3076
2013Q2 | 0.0621 | 0.1315
2013Q3 | 0.0470 | 0.3190
2013Q4 | 0.0191 | 0.2693
9
Simple Linear Regression
Example: Stock Market Risk (Scatter Diagram)
[Figure: Quarterly Gross Returns of SPY and NETFLIX, Inc. Scatter diagram of NFLX (vertical axis) against SPY (horizontal axis), with a trend line.]
10
Simple Linear Regression
Example: Stock Market Risk
From the scatter diagram, we observe the following:
1. The points are scattered around the trend line, indicating that the relationship between the returns of SPY and Netflix is not perfect.
2. The trend line has a positive slope, indicating that the relationship is positive, i.e., as the returns of SPY go up, the returns of Netflix tend to go up too.
3. The vertical distance between a point and the trend line is the difference between the actual return of Netflix and its estimated value, given the actual return of SPY. This difference is the estimated error, the sample counterpart of y – E(y).
11
Simple Linear Regression Equation
Positive Linear Relationship
[Figure: regression line plotted with E(y) on the vertical axis and x on the horizontal axis; intercept $\beta_0$; slope $\beta_1$ is positive.]
12
Simple Linear Regression Equation
Negative Linear Relationship
[Figure: regression line plotted with E(y) on the vertical axis and x on the horizontal axis; intercept $\beta_0$; slope $\beta_1$ is negative.]
13
Simple Linear Regression Equation
No Relationship
[Figure: horizontal regression line plotted with E(y) on the vertical axis and x on the horizontal axis; intercept $\beta_0$; slope $\beta_1$ is 0.]
14
Estimated Simple Linear Regression Equation
The estimated simple linear regression equation:
$$\hat{y} = b_0 + b_1 x$$
• The graph is called the estimated regression line.
• $b_0$ is the y-intercept of the estimated regression line.
• $b_1$ is the slope of the estimated regression line.
• $\hat{y}$ is the estimated value of y for a given value of x.
Note that $b_0$ and $b_1$ are sample estimates of $\beta_0$ and $\beta_1$, respectively, describing the estimated relationship between y and x in the sample.
15
Estimation Process
Regression Model: $y = \beta_0 + \beta_1 x + \varepsilon$
Regression Equation: $E(y) = \beta_0 + \beta_1 x$
Unknown Parameters: $\beta_0, \beta_1$
Sample Data: $(x_1, y_1), \ldots, (x_n, y_n)$
Sample Statistics: $b_0, b_1$
Estimated Regression Equation: $\hat{y} = b_0 + b_1 x$
$b_0$ and $b_1$ provide estimates of $\beta_0$ and $\beta_1$.
16
Least Squares Method
Least Squares Criterion
$$\min \sum (y_i - \hat{y}_i)^2$$
where:
$y_i$ = observed value of the dependent variable for the ith observation
$\hat{y}_i$ = estimated value of the dependent variable for the ith observation
17
Least Squares Method
Least Squares Criterion
$$\min \sum (y_i - \hat{y}_i)^2$$
• $y_i - \hat{y}_i$ is the estimated error for the ith observation.
• Squaring $y_i - \hat{y}_i$ means that the magnitude of the error matters, not its sign (positive or negative).
• The least squares criterion finds the $b_0$ and $b_1$ that minimize the sum of the squared estimated errors over all observations in the sample. The result is the best-fitting straight line (the one with the smallest overall squared error) approximating the relationship between y and x.
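The criterion can be checked numerically. The Python sketch below uses a hypothetical helper, `sse`, which computes $\sum (y_i - \hat{y}_i)^2$ for any candidate line. It compares lines on a small made-up dataset. For these five points the least squares line is $\hat{y} = 2.2 + 0.6x$, a value we worked out by hand from the slope and intercept formulas.

```python
def sse(b0, b1, xs, ys):
    """Sum of squared estimated errors, sum of (y_i - yhat_i)**2, for yhat = b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Made-up toy data, used only to illustrate the criterion.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

best = sse(2.2, 0.6, xs, ys)   # least squares line for these data
other = sse(2.0, 0.7, xs, ys)  # an arbitrary competing line

# Nudging either coefficient away from (2.2, 0.6) never lowers the SSE.
nudged = [sse(2.2 + d0, 0.6 + d1, xs, ys)
          for d0 in (-0.1, 0.0, 0.1) for d1 in (-0.05, 0.0, 0.05)]
print(round(best, 4), round(other, 4), round(min(nudged), 4))  # prints: 2.4 2.55 2.4
```

Because each error is squared, positive and negative errors cannot cancel. A line that passes above some points and below others is still penalized for every miss.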
18
Least Squares Method
Slope for the Estimated Regression Equation
$$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$$
where:
$x_i$ = value of the independent variable for the ith observation
$y_i$ = value of the dependent variable for the ith observation
$\bar{x}$ = average value of the independent variable
$\bar{y}$ = average value of the dependent variable
19
Least Squares Method
y-Intercept for the Estimated Regression Equation
$$b_0 = \bar{y} - b_1 \bar{x}$$
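The two formulas translate directly into code. This Python sketch uses a hypothetical function name, `least_squares`. It computes $b_1$ from deviations about the means, then sets $b_0 = \bar{y} - b_1\bar{x}$.

As a worked case, consider a small made-up dataset with:

- $\bar{x} = 3$ and $\bar{y} = 4$;
- $\sum (x_i - \bar{x})(y_i - \bar{y}) = 6$;
- $\sum (x_i - \bar{x})^2 = 10$.

Then $b_1 = 6/10 = 0.6$ and $b_0 = 4 - 0.6(3) = 2.2$.

```python
def least_squares(xs, ys):
    """Return (b0, b1) for the estimated regression equation yhat = b0 + b1*x."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    # Numerator and denominator of the slope formula.
    s_xy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    s_xx = sum((x - xbar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    b0 = ybar - b1 * xbar  # forces the line through the point (xbar, ybar)
    return b0, b1


b0, b1 = least_squares([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"yhat = {b0:.1f} + {b1:.1f}x")  # prints: yhat = 2.2 + 0.6x
```

The intercept formula guarantees that the estimated regression line passes through the point of means $(\bar{x}, \bar{y})$.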
20
Simple Linear Regression
Example: Stock Market Risk
Quarterly Returns of SPY (x) | Quarterly Returns of Netflix (y)
0.0630 | 0.2537
0.1366 | -0.0302
... | ...
0.0470 | 0.3190
0.0191 | 0.2693
$\sum x = 0.9143$ | $\sum y = 4.0419$
$\bar{x} = 0.0457$ | $\bar{y} = 0.2021$
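The full sample can be entered as two Python lists and the summary values recomputed. This is a minimal sketch; the list names `spy` and `nflx` are our own.

Summing the four-decimal returns exactly as printed in the data table gives $\sum y = 4.0417$, not the 4.0419 shown on the slide. The difference is at the rounding level and does not change $\bar{y} = 0.2021$.

```python
# Quarterly returns, 2009Q1 through 2013Q4, in chronological order (from the data table).
spy = [0.0630, 0.1366, 0.0531, 0.0426, 0.1109, -0.0674, 0.0803, 0.0917, 0.0649, -0.0474,
       -0.0246, 0.0530, 0.0698, -0.0104, 0.0320, 0.0666, 0.0714, 0.0621, 0.0470, 0.0191]
nflx = [0.2537, -0.0302, 0.2164, 0.1646, 0.5888, 0.0369, 0.6925, 0.2334, 0.0868, 0.1432,
        -0.6914, 0.4644, -0.3333, -0.2906, 0.3938, 1.0853, 0.3076, 0.1315, 0.3190, 0.2693]

n = len(spy)
sum_x, sum_y = sum(spy), sum(nflx)
xbar, ybar = sum_x / n, sum_y / n
print(f"n = {n}, sum x = {sum_x:.4f}, sum y = {sum_y:.4f}")
print(f"xbar = {xbar:.4f}, ybar = {ybar:.4f}")
```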
21
Estimated Regression Equation
Slope for the Estimated Regression Equation
$$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{0.1407}{0.0490} = 2.87$$
y-Intercept for the Estimated Regression Equation
$$b_0 = \bar{y} - b_1 \bar{x} = 0.2021 - 2.87(0.0457) = 0.07$$
Estimated Regression Equation
$$\hat{y} = 0.07 + 2.87x$$
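The arithmetic on this slide can be reproduced from the data table. This is a minimal Python sketch; the list and variable names are our own.

Computed from the four-decimal returns, the slope numerator is about 0.1408 rather than the 0.1407 shown. Even so, $b_1$ still rounds to 2.87 and $b_0$ to 0.07.

```python
# Quarterly returns, 2009Q1 through 2013Q4 (from the data table).
spy = [0.0630, 0.1366, 0.0531, 0.0426, 0.1109, -0.0674, 0.0803, 0.0917, 0.0649, -0.0474,
       -0.0246, 0.0530, 0.0698, -0.0104, 0.0320, 0.0666, 0.0714, 0.0621, 0.0470, 0.0191]
nflx = [0.2537, -0.0302, 0.2164, 0.1646, 0.5888, 0.0369, 0.6925, 0.2334, 0.0868, 0.1432,
        -0.6914, 0.4644, -0.3333, -0.2906, 0.3938, 1.0853, 0.3076, 0.1315, 0.3190, 0.2693]

n = len(spy)
xbar, ybar = sum(spy) / n, sum(nflx) / n

# Numerator and denominator of the least squares slope.
s_xy = sum((x - xbar) * (y - ybar) for x, y in zip(spy, nflx))
s_xx = sum((x - xbar) ** 2 for x in spy)

b1 = s_xy / s_xx
b0 = ybar - b1 * xbar
print(f"S_xy = {s_xy:.4f}, S_xx = {s_xx:.4f}")
print(f"yhat = {b0:.2f} + {b1:.2f}x")
```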
22
Estimated Regression Line – Stock Market Risk Example
[Figure: chart titled "Expected Regression Equation": scatter diagram of NFLX (vertical axis) against SPY (horizontal axis) with the estimated regression line $\hat{y} = 0.07 + 2.87x$.]
23
Coefficient of Determination
• Relationship Among SST, SSR, SSE
SST = SSR + SSE
$$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$$
where:
SST = total sum of squares (i.e., the total variability of y)
SSR = sum of squares due to regression (i.e., the variability of y that is explained by the regression)
SSE = sum of squares due to error (i.e., the variability of y that cannot be explained by the regression)
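The identity can be checked on made-up data. This Python sketch computes the three sums of squares directly for five hypothetical points whose least squares line is $\hat{y} = 2.2 + 0.6x$.

The identity holds here because $\hat{y}$ comes from a least squares line with an intercept. For an arbitrary line, the cross term $\sum (\hat{y}_i - \bar{y})(y_i - \hat{y}_i)$ does not vanish, so SST = SSR + SSE fails in general.

```python
# Made-up toy data and its least squares line yhat = 2.2 + 0.6x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = 2.2, 0.6

ybar = sum(ys) / len(ys)
yhat = [b0 + b1 * x for x in xs]

sst = sum((y - ybar) ** 2 for y in ys)               # total variability of y
ssr = sum((yh - ybar) ** 2 for yh in yhat)           # explained by the regression
sse = sum((y - yh) ** 2 for y, yh in zip(ys, yhat))  # left unexplained
print(f"SST = {sst:.1f}, SSR = {ssr:.1f}, SSE = {sse:.1f}")  # prints: SST = 6.0, SSR = 3.6, SSE = 2.4
```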
24
Coefficient of Determination
The coefficient of determination is:
$$r^2 = \text{SSR}/\text{SST}$$
$r^2$ represents the proportion of the total variability of y that is explained by the regression.
25
Coefficient of Determination
$$r^2 = \text{SSR}/\text{SST} = 0.404/2.741 = 0.147$$
The regression relationship is actually weak. Only 14.7% of the variability in the returns of Netflix can be explained by the linear relationship between the market returns (SPY) and the returns of Netflix.
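SST, SSR and $r^2$ for the example can be recomputed from the data table. This is a minimal Python sketch; the names are our own.

Carrying full precision gives $r^2 \approx 0.1476$, which rounds to 0.148 rather than 0.147. The slide's value comes from dividing the rounded sums, 0.404/2.741. The interpretation is unchanged.

```python
# Quarterly returns, 2009Q1 through 2013Q4 (from the data table).
spy = [0.0630, 0.1366, 0.0531, 0.0426, 0.1109, -0.0674, 0.0803, 0.0917, 0.0649, -0.0474,
       -0.0246, 0.0530, 0.0698, -0.0104, 0.0320, 0.0666, 0.0714, 0.0621, 0.0470, 0.0191]
nflx = [0.2537, -0.0302, 0.2164, 0.1646, 0.5888, 0.0369, 0.6925, 0.2334, 0.0868, 0.1432,
        -0.6914, 0.4644, -0.3333, -0.2906, 0.3938, 1.0853, 0.3076, 0.1315, 0.3190, 0.2693]

n = len(spy)
xbar, ybar = sum(spy) / n, sum(nflx) / n
b1 = (sum((x - xbar) * (y - ybar) for x, y in zip(spy, nflx))
      / sum((x - xbar) ** 2 for x in spy))
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * x for x in spy]

sst = sum((y - ybar) ** 2 for y in nflx)    # total sum of squares
ssr = sum((yh - ybar) ** 2 for yh in yhat)  # sum of squares due to regression
r_squared = ssr / sst
print(f"SSR = {ssr:.3f}, SST = {sst:.3f}, r^2 = {r_squared:.4f}")
```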
26
Sample Correlation Coefficient
$$r_{xy} = (\text{sign of } b_1)\sqrt{\text{Coefficient of Determination}} = (\text{sign of } b_1)\sqrt{r^2}$$
where: $b_1$ = the slope of the estimated regression equation $\hat{y} = b_0 + b_1 x$
27
Sample Correlation Coefficient
$$r_{xy} = (\text{sign of } b_1)\sqrt{r^2}$$
The sign of $b_1$ in the equation $\hat{y} = 0.07 + 2.87x$ is "+".
$$r_{xy} = +\sqrt{0.147} = 0.384$$
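This Python sketch checks the formula numerically. It attaches the sign of $b_1$ to $\sqrt{r^2}$ and compares the result with the usual Pearson formula. In simple linear regression the two should agree.

The shorthand $S_{xy} = \sum (x_i-\bar{x})(y_i-\bar{y})$, $S_{xx}$ and $S_{yy}$ is our own. The sketch uses $r^2 = S_{xy}^2/(S_{xx}S_{yy})$, an algebraically equivalent form of SSR/SST.

```python
import math

# Quarterly returns, 2009Q1 through 2013Q4 (from the data table).
spy = [0.0630, 0.1366, 0.0531, 0.0426, 0.1109, -0.0674, 0.0803, 0.0917, 0.0649, -0.0474,
       -0.0246, 0.0530, 0.0698, -0.0104, 0.0320, 0.0666, 0.0714, 0.0621, 0.0470, 0.0191]
nflx = [0.2537, -0.0302, 0.2164, 0.1646, 0.5888, 0.0369, 0.6925, 0.2334, 0.0868, 0.1432,
        -0.6914, 0.4644, -0.3333, -0.2906, 0.3938, 1.0853, 0.3076, 0.1315, 0.3190, 0.2693]

n = len(spy)
xbar, ybar = sum(spy) / n, sum(nflx) / n
s_xy = sum((x - xbar) * (y - ybar) for x, y in zip(spy, nflx))
s_xx = sum((x - xbar) ** 2 for x in spy)
s_yy = sum((y - ybar) ** 2 for y in nflx)

b1 = s_xy / s_xx
r_squared = s_xy ** 2 / (s_xx * s_yy)            # equals SSR/SST here
r_xy = math.copysign(math.sqrt(r_squared), b1)   # attach the sign of b1

pearson = s_xy / math.sqrt(s_xx * s_yy)          # textbook Pearson formula, for comparison
print(f"r_xy = {r_xy:.3f}, Pearson r = {pearson:.3f}")
```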
28
Assumptions About the Error Term
$$y = \beta_0 + \beta_1 x + \varepsilon$$
1. The error $\varepsilon$ is a random variable with a mean of zero.
2. The variance of $\varepsilon$, denoted by $\sigma^2$, is the same for all values of the independent variable.
3. The values of $\varepsilon$ are independent.
4. The error $\varepsilon$ is a normally distributed random variable.
29
Test for Significance
To test for a significant regression relationship, we must conduct a hypothesis test to determine whether the value of $\beta_1$ (the slope) is zero.
Two tests are commonly used: the t test and the F test.
Both the t test and the F test require an estimate of $\sigma^2$, the variance of $\varepsilon$ in the regression model.
$$y = \beta_0 + \beta_1 x + \varepsilon$$
$\beta_1$ determines the relationship between y and x.
30
Test for Significance
• An Estimate of $\sigma^2$
The mean square error (MSE), denoted $s^2$, provides the estimate of $\sigma^2$:
$$s^2 = \text{MSE} = \text{SSE}/(n - 2)$$
where:
$$\text{SSE} = \sum (y_i - \hat{y}_i)^2 = \sum (y_i - b_0 - b_1 x_i)^2$$
31
Test for Significance
An Estimate of $\sigma$
$$s = \sqrt{\text{MSE}} = \sqrt{\frac{\text{SSE}}{n - 2}}$$
• To estimate $\sigma$ we take the square root of $s^2$.
• The resulting s is called the standard error of the estimate.
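For the stock market risk example, the estimates of $\sigma^2$ and $\sigma$ can be computed directly. This Python sketch refits the line from the data table, then divides SSE by n - 2 = 18; the variable names are our own. SSE comes out near 2.34, MSE near 0.13 and s near 0.36.

```python
import math

# Quarterly returns, 2009Q1 through 2013Q4 (from the data table).
spy = [0.0630, 0.1366, 0.0531, 0.0426, 0.1109, -0.0674, 0.0803, 0.0917, 0.0649, -0.0474,
       -0.0246, 0.0530, 0.0698, -0.0104, 0.0320, 0.0666, 0.0714, 0.0621, 0.0470, 0.0191]
nflx = [0.2537, -0.0302, 0.2164, 0.1646, 0.5888, 0.0369, 0.6925, 0.2334, 0.0868, 0.1432,
        -0.6914, 0.4644, -0.3333, -0.2906, 0.3938, 1.0853, 0.3076, 0.1315, 0.3190, 0.2693]

n = len(spy)
xbar, ybar = sum(spy) / n, sum(nflx) / n
b1 = (sum((x - xbar) * (y - ybar) for x, y in zip(spy, nflx))
      / sum((x - xbar) ** 2 for x in spy))
b0 = ybar - b1 * xbar

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(spy, nflx))
mse = sse / (n - 2)  # s^2: two parameters (b0 and b1) were estimated from the data
s = math.sqrt(mse)   # standard error of the estimate
print(f"SSE = {sse:.4f}, MSE = {mse:.4f}, s = {s:.4f}")
```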
32
Test for Significance: t Test
Hypotheses
$$H_0: \beta_1 = 0$$
$$H_a: \beta_1 \neq 0$$
Test Statistic
$$t = \frac{b_1}{s_{b_1}}$$
where
$$s_{b_1} = \frac{s}{\sqrt{\sum (x_i - \bar{x})^2}}$$
33
Test for Significance: t Test
Rejection Rule
Reject $H_0$ if p-value $< \alpha$ or $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$
where $t_{\alpha/2}$ is based on a t distribution with n - 2 degrees of freedom; n is the number of observations in the regression, and 2 is the number of parameters ($\beta_0$ and $\beta_1$) in the regression.
34
Test for Significance: t Test
1. Determine the hypotheses.
$$H_0: \beta_1 = 0 \qquad H_a: \beta_1 \neq 0$$
2. Specify the level of significance.
$$\alpha = .05$$
3. Calculate the test statistic.
$$t = \frac{b_1}{s_{b_1}} = \frac{2.87}{1.63} = 1.76$$
35
Test for Significance: t Test
4. Determine whether to reject H0.
p-Value approach: t = 1.76 provides an area of .0473 in the upper tail. Hence, the p-value is 2(0.0473) = 0.0946. Since the p-value is larger than 0.05, we do not reject $H_0$.
Critical Value approach: For $\alpha$ = 5%, the critical value is 2.1 (a two-tailed test, using $t_{.025}$ with 18 degrees of freedom). Since our test statistic t = 1.76 lies between -2.1 and 2.1, we do not reject $H_0$.
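Steps 1 to 4 can be reproduced in Python using only the standard library. The standard library has no t distribution, so the critical value T_CRIT = 2.101 ($t_{.025}$ with 18 degrees of freedom) is copied from a t table. For that reason the sketch uses the critical value approach rather than computing a p-value.

Without intermediate rounding, $s_{b_1} \approx 1.63$ and $t \approx 1.77$. The slide's 1.76 comes from dividing the rounded values 2.87 and 1.63. Either way, t falls between -2.101 and 2.101.

```python
import math

# Quarterly returns, 2009Q1 through 2013Q4 (from the data table).
spy = [0.0630, 0.1366, 0.0531, 0.0426, 0.1109, -0.0674, 0.0803, 0.0917, 0.0649, -0.0474,
       -0.0246, 0.0530, 0.0698, -0.0104, 0.0320, 0.0666, 0.0714, 0.0621, 0.0470, 0.0191]
nflx = [0.2537, -0.0302, 0.2164, 0.1646, 0.5888, 0.0369, 0.6925, 0.2334, 0.0868, 0.1432,
        -0.6914, 0.4644, -0.3333, -0.2906, 0.3938, 1.0853, 0.3076, 0.1315, 0.3190, 0.2693]
T_CRIT = 2.101  # t_{.025} with 18 degrees of freedom, hard-coded from a t table

n = len(spy)
xbar, ybar = sum(spy) / n, sum(nflx) / n
s_xx = sum((x - xbar) ** 2 for x in spy)
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(spy, nflx)) / s_xx
b0 = ybar - b1 * xbar

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(spy, nflx))
s = math.sqrt(sse / (n - 2))   # standard error of the estimate
s_b1 = s / math.sqrt(s_xx)     # estimated standard deviation of b1
t = b1 / s_b1

reject = t < -T_CRIT or t > T_CRIT
print(f"s_b1 = {s_b1:.2f}, t = {t:.2f}, reject H0: {reject}")
```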
36
Confidence Interval for $\beta_1$
We can use a 95% confidence interval for $\beta_1$ to test the hypotheses just used in the t test.
$H_0$ is rejected if the hypothesized value of $\beta_1$ is not included in the confidence interval for $\beta_1$.
37
Confidence Interval for $\beta_1$
• The form of a confidence interval for $\beta_1$ is:
$$b_1 \pm t_{\alpha/2} s_{b_1}$$
where $t_{\alpha/2}$ is the t value providing an area of $\alpha/2$ in the upper tail of a t distribution with n - 2 degrees of freedom.
$b_1$ is the point estimator, and $t_{\alpha/2} s_{b_1}$ is the margin of error.
38
Confidence Interval for $\beta_1$
Rejection Rule
Reject $H_0$ if 0 is not included in the confidence interval for $\beta_1$.
95% Confidence Interval for $\beta_1$
$$b_1 \pm t_{\alpha/2} s_{b_1} = 2.87 \pm 2.1(1.63) = 2.87 \pm 3.42$$
or -0.55 to 6.29
Conclusion
0 is included in the confidence interval. Do not reject $H_0$.
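The interval can be computed the same way. This sketch recomputes $s_{b_1}$ from the data table. The standard library has no t distribution, so it uses the tabulated $t_{.025} = 2.101$ (18 degrees of freedom) as a hard-coded constant. With unrounded inputs the interval is roughly -0.55 to 6.29, matching the slide.

```python
import math

# Quarterly returns, 2009Q1 through 2013Q4 (from the data table).
spy = [0.0630, 0.1366, 0.0531, 0.0426, 0.1109, -0.0674, 0.0803, 0.0917, 0.0649, -0.0474,
       -0.0246, 0.0530, 0.0698, -0.0104, 0.0320, 0.0666, 0.0714, 0.0621, 0.0470, 0.0191]
nflx = [0.2537, -0.0302, 0.2164, 0.1646, 0.5888, 0.0369, 0.6925, 0.2334, 0.0868, 0.1432,
        -0.6914, 0.4644, -0.3333, -0.2906, 0.3938, 1.0853, 0.3076, 0.1315, 0.3190, 0.2693]
T_CRIT = 2.101  # t_{.025} with 18 degrees of freedom, hard-coded from a t table

n = len(spy)
xbar, ybar = sum(spy) / n, sum(nflx) / n
s_xx = sum((x - xbar) ** 2 for x in spy)
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(spy, nflx)) / s_xx
b0 = ybar - b1 * xbar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(spy, nflx))
s_b1 = math.sqrt(sse / (n - 2)) / math.sqrt(s_xx)

margin = T_CRIT * s_b1                       # margin of error
lower, upper = b1 - margin, b1 + margin
contains_zero = lower <= 0 <= upper          # if True, do not reject H0: beta1 = 0
print(f"95% CI for beta1: ({lower:.2f}, {upper:.2f}); contains 0: {contains_zero}")
```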
39
Test for Significance: F Test
Hypotheses
$$H_0: \beta_1 = 0$$
$$H_a: \beta_1 \neq 0$$
Test Statistic
$$F = \text{MSR}/\text{MSE}$$
Note that the hypotheses of the F test are the same as those of the t test. This is always the case in simple linear regression, where there is only one independent variable.
40
Test for Significance: F Test
Rejection Rule
Reject $H_0$ if p-value $< \alpha$ or $F > F_\alpha$
where $F_\alpha$ is based on an F distribution with 1 degree of freedom in the numerator and n - 2 degrees of freedom in the denominator.
41
ANOVA Table for a Regression Analysis
Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F | p-Value
Regression | SSR | k - 1 | MSR = SSR/(k - 1) | MSR/MSE |
Error | SSE | $n_T$ - k | MSE = SSE/($n_T$ - k) | |
Total | SST | $n_T$ - 1 | | |
k is the number of parameters in a regression.
$n_T$ is the number of observations.
42
ANOVA Table for a Regression Analysis
Stock Market Risk Example
Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F | p-Value
Regression | 0.404 | 1 | 0.404 | 3.11 | 0.095
Error | 2.337 | 18 | 0.13 | |
Total | 2.741 | 19 | | |
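The ANOVA table can be rebuilt from the data. This sketch uses k = 2 parameters and n = 20 observations; the row layout and names are our own. It computes SSR and SSE directly from the fitted values rather than by subtraction, so the printout also confirms SST = SSR + SSE up to rounding.

Unrounded, F is about 3.12. The table's 3.11 is 0.404/0.13 computed from rounded inputs.

```python
# Quarterly returns, 2009Q1 through 2013Q4 (from the data table).
spy = [0.0630, 0.1366, 0.0531, 0.0426, 0.1109, -0.0674, 0.0803, 0.0917, 0.0649, -0.0474,
       -0.0246, 0.0530, 0.0698, -0.0104, 0.0320, 0.0666, 0.0714, 0.0621, 0.0470, 0.0191]
nflx = [0.2537, -0.0302, 0.2164, 0.1646, 0.5888, 0.0369, 0.6925, 0.2334, 0.0868, 0.1432,
        -0.6914, 0.4644, -0.3333, -0.2906, 0.3938, 1.0853, 0.3076, 0.1315, 0.3190, 0.2693]

n, k = len(spy), 2  # k parameters: beta0 and beta1
xbar, ybar = sum(spy) / n, sum(nflx) / n
b1 = (sum((x - xbar) * (y - ybar) for x, y in zip(spy, nflx))
      / sum((x - xbar) ** 2 for x in spy))
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * x for x in spy]

ssr = sum((yh - ybar) ** 2 for yh in yhat)
sse = sum((y - yh) ** 2 for y, yh in zip(nflx, yhat))
sst = sum((y - ybar) ** 2 for y in nflx)

df_reg, df_err, df_tot = k - 1, n - k, n - 1
msr, mse = ssr / df_reg, sse / df_err
f_stat = msr / mse

rows = [("Regression", ssr, df_reg, msr), ("Error", sse, df_err, mse), ("Total", sst, df_tot, None)]
for name, ss, df, ms in rows:
    ms_text = "" if ms is None else f"{ms:.3f}"
    print(f"{name:<10} {ss:8.3f} {df:4d} {ms_text:>8}")
print(f"F = {f_stat:.2f}")
```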
43
Test for Significance: F Test
1. Determine the hypotheses.
$$H_0: \beta_1 = 0 \qquad H_a: \beta_1 \neq 0$$
2. Specify the level of significance.
$$\alpha = .05$$
3. Calculate the test statistic.
$$F = \text{MSR}/\text{MSE} = 0.404/0.13 = 3.11$$
The F value and the t value are related by $F = t^2$, a relationship that holds only in simple linear regression.
44
Test for Significance: F Test
4. Determine whether to reject H0.
p-Value approach: F = 3.11 provides an area of .0946 in the upper tail. Hence, the p-value is 0.0946. Since the p-value is larger than 0.05, we do not reject $H_0$.
Critical Value approach: For $\alpha$ = 5%, the critical value is 4.41 ($F_{.05}$ with 1 and 18 degrees of freedom). Since our test statistic F = 3.11 is less than 4.41, we do not reject $H_0$.
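The identity $F = t^2$ can be confirmed numerically. This sketch computes both statistics from the data table and applies the critical value approach. The standard library has no F distribution, so F_CRIT = 4.41 ($F_{.05}$ with 1 and 18 degrees of freedom) is hard-coded from an F table.

```python
import math

# Quarterly returns, 2009Q1 through 2013Q4 (from the data table).
spy = [0.0630, 0.1366, 0.0531, 0.0426, 0.1109, -0.0674, 0.0803, 0.0917, 0.0649, -0.0474,
       -0.0246, 0.0530, 0.0698, -0.0104, 0.0320, 0.0666, 0.0714, 0.0621, 0.0470, 0.0191]
nflx = [0.2537, -0.0302, 0.2164, 0.1646, 0.5888, 0.0369, 0.6925, 0.2334, 0.0868, 0.1432,
        -0.6914, 0.4644, -0.3333, -0.2906, 0.3938, 1.0853, 0.3076, 0.1315, 0.3190, 0.2693]
F_CRIT = 4.41  # F_{.05} with 1 and 18 degrees of freedom, hard-coded from an F table

n = len(spy)
xbar, ybar = sum(spy) / n, sum(nflx) / n
s_xx = sum((x - xbar) ** 2 for x in spy)
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(spy, nflx)) / s_xx
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * x for x in spy]

ssr = sum((yh - ybar) ** 2 for yh in yhat)
mse = sum((y - yh) ** 2 for y, yh in zip(nflx, yhat)) / (n - 2)

t = b1 / math.sqrt(mse / s_xx)  # t test statistic
f_stat = (ssr / 1) / mse        # MSR / MSE, with 1 regression degree of freedom
reject = f_stat > F_CRIT
print(f"F = {f_stat:.3f}, t^2 = {t ** 2:.3f}, reject H0: {reject}")
```

Because $F = t^2$ and $F_{.05} = t_{.025}^2$ for these degrees of freedom ($2.101^2 \approx 4.41$), the F test and the two-tailed t test always reach the same decision in simple linear regression.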
45
Some Cautions about the Interpretation of Significance Tests
Just because we are able to reject $H_0: \beta_1 = 0$ and demonstrate statistical significance does not enable us to conclude that the relationship between x and y is linear.
Rejecting $H_0: \beta_1 = 0$ and concluding that the relationship between x and y is significant does not enable us to conclude that a cause-and-effect relationship is present between x and y.
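The first caution can be illustrated with made-up data that lie exactly on the curve $y = x^2$. In this Python sketch the linear fit is highly significant, even though the true relationship is not linear. The curvature shows up in the pattern of residual signs, not in the t test.

The data are hypothetical and unrelated to the Netflix example. The critical value 2.306 ($t_{.025}$ with 8 degrees of freedom) is hard-coded from a t table.

```python
import math

# Made-up data that follow an exact curve, y = x**2, with no random error at all.
xs = list(range(1, 11))
ys = [x ** 2 for x in xs]
T_CRIT = 2.306  # t_{.025} with 8 degrees of freedom, hard-coded from a t table

n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
s_xx = sum((x - xbar) ** 2 for x in xs)
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / s_xx
b0 = ybar - b1 * xbar
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

mse = sum(e ** 2 for e in residuals) / (n - 2)
t = b1 / math.sqrt(mse / s_xx)

print(f"yhat = {b0:.1f} + {b1:.1f}x, t = {t:.1f}, significant: {abs(t) > T_CRIT}")
# prints: yhat = -22.0 + 11.0x, t = 12.3, significant: True

# Residual signs trace a U shape (+ at both ends, - in the middle): a sign of curvature.
print(["+" if e > 0 else "-" for e in residuals])
# prints: ['+', '+', '-', '-', '-', '-', '-', '-', '+', '+']
```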