The Simple Linear Regression Model
Transcript of The Simple Linear Regression Model
Slide 1
The Simple Linear Regression Model
Simple Linear Regression Model
$y = \beta_0 + \beta_1 x + \varepsilon$
Simple Linear Regression Equation
$E(y) = \beta_0 + \beta_1 x$
Estimated Simple Linear Regression Equation
$\hat{y} = b_0 + b_1 x$
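Slide 2
The Least-Squares Line (Best Prediction Line)
Among all straight lines drawn through the data points of a scatter diagram, the one that minimizes the sum of squared prediction errors is called the least-squares line, and the method is called the Least Square Method.
What is the sum of squared errors? Let $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ be $n$ data points, and let $\hat{y} = b_0 + b_1 x$ be the line used to predict $Y$ from $X$. When $X = x_1$, the difference $y_1 - \hat{y}_1$ between the observed value $y_1$ and the predicted value $\hat{y}_1 = b_0 + b_1 x_1$ is called the prediction error. The sum of squared errors is defined as
$f(b_0, b_1) = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2$
The values of $b_0, b_1$ that minimize $f$ are found with the calculus method for locating maxima and minima.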
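Slide 3
The Least-Squares Line
$\dfrac{\partial f(b_0, b_1)}{\partial b_0} = 0$
$\dfrac{\partial f(b_0, b_1)}{\partial b_1} = 0$
Solving this system of simultaneous equations gives:
$b_1 = \dfrac{\sum x_i y_i - (\sum x_i \sum y_i)/n}{\sum x_i^2 - (\sum x_i)^2/n}$
$b_0 = \bar{y} - b_1 \bar{x}$
The least-squares line is therefore
$\hat{y} = b_0 + b_1 x = \bar{y} - b_1 \bar{x} + b_1 x = \bar{y} + b_1 (x - \bar{x})$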
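The step from the two zero-derivative conditions to the closed-form solution can be written out as follows. This is a standard derivation sketched for reference; it uses only the symbols $f$, $b_0$, $b_1$, $x_i$, $y_i$ and $n$ from the slides.

```latex
\begin{aligned}
\frac{\partial f}{\partial b_0} &= -2\sum_{i=1}^{n} (y_i - b_0 - b_1 x_i) = 0
  &&\Longrightarrow\quad n\,b_0 + b_1 \sum x_i = \sum y_i \\
\frac{\partial f}{\partial b_1} &= -2\sum_{i=1}^{n} x_i\,(y_i - b_0 - b_1 x_i) = 0
  &&\Longrightarrow\quad b_0 \sum x_i + b_1 \sum x_i^2 = \sum x_i y_i
\end{aligned}
```

Dividing the first equation by $n$ gives $b_0 = \bar{y} - b_1 \bar{x}$; substituting that into the second equation and solving for $b_1$ gives the slope formula.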
Slide 4
Example: Reed Auto Sales
Simple Linear Regression
Reed Auto periodically has a special week-long sale. As part of the advertising campaign, Reed runs one or more television commercials during the weekend preceding the sale. Data from a sample of 6 previous sales are shown below.

Number of TV Ads    Number of Cars Sold
1                   14
3                   24
2                   18
1                   17
3                   27
2                   22
Slide 5
Example: Reed Auto Sales
Slope for the Estimated Regression Equation
$b_1 = \dfrac{264 - (12)(122)/6}{28 - (12)^2/6} = \dfrac{20}{4} = 5$
y-Intercept for the Estimated Regression Equation
$b_0 = \bar{y} - b_1 \bar{x} = 20.333 - 5(2) = 10.333$
Estimated Regression Equation
$\hat{y} = 10.333 + 5x$
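The slope and intercept can be checked with a few lines of Python. This is a minimal sketch that applies the sum-based formulas to the six Reed Auto observations; the function name `least_squares` and the variable names are illustrative and do not come from the slides.

```python
# Reed Auto Sales sample from the slides: TV ads run and cars sold.
tv_ads = [1, 3, 2, 1, 3, 2]
cars_sold = [14, 24, 18, 17, 27, 22]


def least_squares(x, y):
    """Return (b0, b1) for the least-squares line y_hat = b0 + b1 * x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)
    # Slope: [sum(xy) - sum(x)sum(y)/n] / [sum(x^2) - (sum x)^2/n]
    b1 = (sum_xy - sum_x * sum_y / n) / (sum_xx - sum_x ** 2 / n)
    # Intercept: y_bar - b1 * x_bar
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1


b0, b1 = least_squares(tv_ads, cars_sold)
print(f"b1 = {b1:.3f}, b0 = {b0:.3f}")
```

With n = 6 the sums are 12, 122, 264 and 28, so the slope is 20/4 = 5 and the intercept is 20.333 - 10 = 10.333.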
Slide 6
Example: Reed Auto Sales
Scatter Diagram
[Figure: scatter diagram of Cars Sold (vertical axis, 0 to 30) against TV Ads (horizontal axis, 0 to 4)]
Slide 7
The Coefficient of Determination
Relationship Among SST, SSR, SSE
SST = SSR + SSE
$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$
Coefficient of Determination
$r^2 = \text{SSR}/\text{SST}$
where:
SST = total sum of squares
SSR = sum of squares due to regression
SSE = sum of squares due to error
Slide 8
The Coefficient of Determination
Definition: $r^2 = \text{SSR}/\text{SST}$ expresses the proportion of the variation in $Y$ that is explained by $X$.
• The larger $r^2$ is, the better the least-squares line fits the data.
• $1 - r^2$ is the proportion of the total variation (SST) that cannot be explained by $X$.
Example: Reed Auto Sales
$r^2 = \text{SSR}/\text{SST} = 100/117.333 = .852273$
• 85.2% of the variation in the number of cars sold can be explained by the number of TV ads, and 14.8% cannot.
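The three sums of squares behind this ratio can be reproduced directly from the data. The sketch below assumes the slope b1 = 5 from the earlier slide and recomputes the intercept from the sample means; variable names are illustrative.

```python
# Reed Auto Sales sample from the slides.
x = [1, 3, 2, 1, 3, 2]
y = [14, 24, 18, 17, 27, 22]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = 5.0                   # slope from the slides
b0 = y_bar - b1 * x_bar    # intercept, about 10.333

y_hat = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # due to regression
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # due to error

r2 = ssr / sst
print(f"SST = {sst:.3f}, SSR = {ssr:.3f}, SSE = {sse:.3f}, r^2 = {r2:.6f}")
```

The identity SST = SSR + SSE holds here up to floating-point rounding: 117.333 = 100 + 17.333.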
Slide 9
The Correlation Coefficient
Sample Correlation Coefficient
$r_{xy} = (\text{sign of } b_1)\sqrt{\text{Coefficient of Determination}}$
$r_{xy} = (\text{sign of } b_1)\sqrt{r^2}$
where:
$b_1$ = the slope of the estimated regression equation $\hat{y} = b_0 + b_1 x$
Slide 10
Example: Reed Auto Sales
Sample Correlation Coefficient
$r_{xy} = (\text{sign of } b_1)\sqrt{r^2}$
The sign of $b_1$ in the equation $\hat{y} = 10.333 + 5x$ is "+".
$r_{xy} = +\sqrt{0.852273}$
$r_{xy} = +.923186$
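As a cross-check, the signed square root of $r^2$ can be compared with the correlation computed directly from the deviations. The direct formula $S_{xy}/\sqrt{S_{xx} S_{yy}}$ is the usual Pearson definition; it is added here for comparison and is not stated on the slides.

```python
import math

# Reed Auto Sales sample from the slides.
x = [1, 3, 2, 1, 3, 2]
y = [14, 24, 18, 17, 27, 22]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)   # equals SST
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = sxy / sxx        # least-squares slope
r2 = b1 * sxy / syy   # SSR / SST, because SSR = b1 * Sxy

# Slide formula: r_xy = (sign of b1) * sqrt(r^2)
r_from_r2 = math.copysign(math.sqrt(r2), b1)

# Pearson definition (assumption: added only as a cross-check)
r_direct = sxy / math.sqrt(sxx * syy)

print(f"r from r^2 = {r_from_r2:.6f}, r direct = {r_direct:.6f}")
```

Both routes give about +0.923186 for the Reed Auto data.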
Slide 11
Model Assumptions
Assumptions About the Error Term $\varepsilon$
• The error $\varepsilon$ is a random variable with mean of zero.
• The variance of $\varepsilon$, denoted by $\sigma^2$, is the same for all values of the independent variable.
• The values of $\varepsilon$ are independent.
• The error $\varepsilon$ is a normally distributed random variable.
Slide 12
Testing for Significance
To test for a significant regression relationship, we must conduct a hypothesis test to determine whether the value of $\beta_1$ is zero.
Two tests are commonly used:
• t Test
• F Test
Both tests require an estimate of $\sigma^2$, the variance of $\varepsilon$ in the regression model.
Slide 13
Testing for Significance
An Estimate of $\sigma^2$
The mean square error (MSE) provides the estimate of $\sigma^2$, and the notation $s^2$ is also used.
$s^2 = \text{MSE} = \text{SSE}/(n - 2)$
where:
$\text{SSE} = \sum (y_i - \hat{y}_i)^2 = \sum (y_i - b_0 - b_1 x_i)^2$
Slide 14
Testing for Significance
An Estimate of $\sigma$
• To estimate $\sigma$ we take the square root of $s^2$.
• The resulting $s$ is called the standard error of the estimate.
$s = \sqrt{\text{MSE}} = \sqrt{\dfrac{\text{SSE}}{n - 2}}$
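For the Reed Auto data these two estimates can be computed as follows. This is a small sketch that assumes the fitted slope b1 = 5 from the earlier slides; the names `mse` and `s` mirror the slide notation.

```python
import math

# Reed Auto Sales sample from the slides.
x = [1, 3, 2, 1, 3, 2]
y = [14, 24, 18, 17, 27, 22]

n = len(x)
b1 = 5.0                           # slope from the slides
b0 = sum(y) / n - b1 * sum(x) / n  # intercept, about 10.333

# SSE = sum of squared residuals (y_i - b0 - b1 * x_i)^2
sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

mse = sse / (n - 2)   # s^2, the estimate of sigma^2 (n - 2 = 4 d.f.)
s = math.sqrt(mse)    # standard error of the estimate

print(f"SSE = {sse:.3f}, MSE = {mse:.3f}, s = {s:.4f}")
```

This gives SSE of about 17.333, MSE of about 4.333 and s of about 2.0817, which are the values the later significance tests rely on.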
Slide 15
Testing for Significance: t Test
Hypotheses
$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$
Test Statistic
$t = \dfrac{b_1}{s_{b_1}}$ where $s_{b_1} = \dfrac{s}{\sqrt{\sum (x_i - \bar{x})^2}}$
Rejection Rule
Reject $H_0$ if $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$
where $t_{\alpha/2}$ is based on a t distribution with $n - 2$ degrees of freedom.
Slide 16
Example: Reed Auto Sales
t Test
• Hypotheses
$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$
• Rejection Rule
For $\alpha = .05$ and d.f. = 4, $t_{.025} = 2.776$
Reject $H_0$ if $|t| > 2.776$
• Test Statistic
$t = 5/1.0408 = 4.804$
• Conclusion
Reject $H_0$
• P-value
$2P\{T > 4.804\} = 0.0086 < 0.05$
Reject $H_0$
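The test statistic can be reproduced from the raw data. In this sketch the critical value 2.776 is copied from the slide instead of being computed, so that only the Python standard library is needed; the variable names are illustrative.

```python
import math

# Reed Auto Sales sample from the slides.
x = [1, 3, 2, 1, 3, 2]
y = [14, 24, 18, 17, 27, 22]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))   # standard error of the estimate

s_b1 = s / math.sqrt(sxx)      # standard error of the slope
t_stat = b1 / s_b1

T_CRIT = 2.776                 # t_.025 with 4 d.f., taken from the slide
reject_h0 = abs(t_stat) > T_CRIT

print(f"s_b1 = {s_b1:.4f}, t = {t_stat:.3f}, reject H0: {reject_h0}")
```

The two-sided rule is written as `abs(t_stat) > T_CRIT`, which is equivalent to rejecting when t is below -2.776 or above 2.776.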
Slide 17
Confidence Interval for $\beta_1$
We can use a 95% confidence interval for $\beta_1$ to test the hypotheses just used in the t test.
$H_0$ is rejected if the hypothesized value of $\beta_1$ is not included in the confidence interval for $\beta_1$.
Slide 18
Confidence Interval for $\beta_1$
The form of a confidence interval for $\beta_1$ is:
$b_1 \pm t_{\alpha/2}\, s_{b_1}$
where
$b_1$ is the point estimate
$t_{\alpha/2}\, s_{b_1}$ is the margin of error
$t_{\alpha/2}$ is the t value providing an area of $\alpha/2$ in the upper tail of a t distribution with $n - 2$ degrees of freedom
Slide 19
Example: Reed Auto Sales
Rejection Rule
Reject $H_0$ if 0 is not included in the confidence interval for $\beta_1$.
95% Confidence Interval for $\beta_1$
$b_1 \pm t_{\alpha/2}\, s_{b_1} = 5 \pm 2.776(1.0408) = 5 \pm 2.89$
or 2.11 to 7.89
Conclusion
Reject $H_0$
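The interval arithmetic is short enough to verify directly. The sketch below takes b1 = 5, s_b1 = 1.0408 and t = 2.776 from the slides as given inputs; the helper name `slope_confidence_interval` is illustrative only.

```python
def slope_confidence_interval(b1, s_b1, t_crit):
    """Return (lower, upper) for b1 +/- t_crit * s_b1."""
    margin = t_crit * s_b1
    return b1 - margin, b1 + margin


# Values taken from the Reed Auto Sales slides.
lower, upper = slope_confidence_interval(b1=5.0, s_b1=1.0408, t_crit=2.776)

# H0: beta_1 = 0 is rejected when 0 lies outside the interval.
reject_h0 = not (lower <= 0.0 <= upper)

print(f"95% CI for beta_1: {lower:.2f} to {upper:.2f}, reject H0: {reject_h0}")
```

Because the whole interval lies above zero, the conclusion matches the t test.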
Slide 20
Testing for Significance: F Test
Hypotheses
$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$
Test Statistic
$F = \text{MSR}/\text{MSE}$
Rejection Rule
Reject $H_0$ if $F > F_{\alpha}$
where $F_{\alpha}$ is based on an F distribution with 1 d.f. in the numerator and $n - 2$ d.f. in the denominator.
Slide 21
Example: Reed Auto Sales
F Test
• Hypotheses
$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$
• Rejection Rule
For $\alpha = .05$ and d.f. = 1, 4: $F_{.05} = 7.709$
Reject $H_0$ if $F > 7.709$.
• Test Statistic
$F = \text{MSR}/\text{MSE} = 100/4.333 = 23.077$
• Conclusion
We can reject $H_0$.
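The F statistic follows from the sums of squares, with MSR = SSR/1 because there is one independent variable. The sketch below also checks that F equals the square of the t statistic, which holds in simple linear regression; that comparison is an added cross-check, not something the slides state. The critical value 7.709 is copied from the slide.

```python
# Reed Auto Sales sample from the slides.
x = [1, 3, 2, 1, 3, 2]
y = [14, 24, 18, 17, 27, 22]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

msr = ssr / 1          # 1 numerator degree of freedom
mse = sse / (n - 2)    # n - 2 denominator degrees of freedom
f_stat = msr / mse

F_CRIT = 7.709         # F_.05 with (1, 4) d.f., taken from the slide
reject_h0 = f_stat > F_CRIT

# Cross-check (assumption: added for illustration): F = t^2
t_squared = b1 ** 2 * sxx / mse

print(f"F = {f_stat:.3f}, t^2 = {t_squared:.3f}, reject H0: {reject_h0}")
```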
Slide 22
Some Cautions about the Interpretation of Significance Tests
Rejecting $H_0: \beta_1 = 0$ and concluding that the relationship between $x$ and $y$ is significant does not enable us to conclude that a cause-and-effect relationship is present between $x$ and $y$.
Just because we are able to reject $H_0: \beta_1 = 0$ and demonstrate statistical significance does not enable us to conclude that there is a linear relationship between $x$ and $y$.
Slide 23
Using the Estimated Regression Equation for Estimation and Prediction
Confidence Interval Estimate of $E(y_p)$
$\hat{y}_p \pm t_{\alpha/2}\, s_{\hat{y}_p}$
Prediction Interval Estimate of $y_p$
$\hat{y}_p \pm t_{\alpha/2}\, s_{\text{ind}}$
where the confidence coefficient is $1 - \alpha$ and $t_{\alpha/2}$ is based on a t distribution with $n - 2$ d.f.
$s_{\hat{y}_p}$ is the standard error of the estimate of $E(y_p)$
$s_{\text{ind}}$ is the standard error of the individual estimate of $y_p$
Slide 24
Standard Errors of Estimate of $E(y_p)$ and $y_p$
$s_{\hat{y}_p} = s\sqrt{\dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$
$s_{\text{ind}} = s\sqrt{1 + \dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$
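Slide 25
Variances of the Estimators of $E(y_p)$ and $y_p$
$E(y_p) = \beta_0 + \beta_1 x_0$
$y_p = \beta_0 + \beta_1 x_0 + \varepsilon$
Variance of $\bar{y}$: $\sigma_{\bar{y}}^2 = \dfrac{\sigma^2}{n}$
Variance of $b_1$: $\sigma_{b_1}^2 = \dfrac{\sigma^2}{\sum (x_i - \bar{x})^2}$, estimated by $s_{b_1}^2$ where $s_{b_1} = \dfrac{s}{\sqrt{\sum (x_i - \bar{x})^2}}$
Variance of $b_1(x_0 - \bar{x})$: $\sigma_{b_1(x_0 - \bar{x})}^2 = (x_0 - \bar{x})^2 \sigma_{b_1}^2 = \dfrac{(x_0 - \bar{x})^2 \sigma^2}{\sum (x_i - \bar{x})^2}$
Variance of the estimator of $E(y_p)$: $\mathrm{Var}(\hat{y}) = \mathrm{Var}[\bar{y} + b_1(x_0 - \bar{x})] = \dfrac{\sigma^2}{n} + \dfrac{(x_0 - \bar{x})^2 \sigma^2}{\sum (x_i - \bar{x})^2}$
Variance of the estimator of $y_p$: $\mathrm{Var}(\hat{y}) + \sigma^2$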
Slide 26
Example: Reed Auto Sales
Point Estimation
If 3 TV ads are run prior to a sale, we expect the mean number of cars sold to be:
$\hat{y} = 10.333 + 5(3) = 25.333$ cars
Confidence Interval for $E(y_p)$
95% confidence interval estimate of the mean number of cars sold when 3 TV ads are run is:
$25.333 \pm 3.730$ = 21.603 to 29.063 cars
Prediction Interval for $y_p$
95% prediction interval estimate of the number of cars sold in one particular week when 3 TV ads are run is:
$25.333 \pm 6.878$ = 18.455 to 32.211 cars
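Both margins can be reproduced from the standard-error formulas for the estimate of the mean and for an individual prediction. The sketch below evaluates them at 3 TV ads, taking the t value 2.776 (4 d.f.) from the earlier slides; the helper name `interval_margins` and its parameter names are illustrative.

```python
import math

# Reed Auto Sales sample from the slides.
x = [1, 3, 2, 1, 3, 2]
y = [14, 24, 18, 17, 27, 22]
T_CRIT = 2.776  # t_.025 with n - 2 = 4 d.f., taken from the slides


def interval_margins(x, y, x0, t_crit):
    """Return (y_hat, margin for E(y_p), margin for y_p) at x = x0."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    leverage = 1 / n + (x0 - x_bar) ** 2 / sxx
    s_mean = s * math.sqrt(leverage)      # standard error for E(y_p)
    s_ind = s * math.sqrt(1 + leverage)   # standard error for individual y_p
    return b0 + b1 * x0, t_crit * s_mean, t_crit * s_ind


y_hat, ci_margin, pi_margin = interval_margins(x, y, x0=3, t_crit=T_CRIT)
ci_low, ci_high = y_hat - ci_margin, y_hat + ci_margin
pi_low, pi_high = y_hat - pi_margin, y_hat + pi_margin

print(f"point estimate: {y_hat:.3f}")
print(f"95% CI for E(y_p): {ci_low:.3f} to {ci_high:.3f}")
print(f"95% PI for y_p:    {pi_low:.3f} to {pi_high:.3f}")
```

The prediction interval is wider because it adds the variance of a single week's error term to the variance of the estimated mean.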