The Simple Linear Regression Model



Transcript of The Simple Linear Regression Model

Page 1: The Simple Linear Regression Model

The Simple Linear Regression Model

Simple Linear Regression Model

$y = \beta_0 + \beta_1 x + \varepsilon$

Simple Linear Regression Equation

$E(y) = \beta_0 + \beta_1 x$

Estimated Simple Linear Regression Equation

$\hat{y} = b_0 + b_1 x$

Page 2: The Simple Linear Regression Model

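The Least Squares Line (Best Prediction Line)

Among all straight lines through the data points of a scatter diagram, the one that minimizes the sum of squared prediction errors is called the least squares line, and the procedure that finds it is called the least squares method.

What is the sum of squared errors? Let $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ be $n$ data points, and let $\hat{y} = b_0 + b_1 x$ be the line used to predict $Y$ from $X$. When $X = x_1$, the difference $y_1 - \hat{y}_1$ between the observed value $y_1$ and the predicted value $\hat{y}_1 = b_0 + b_1 x_1$ is called the prediction error. The sum of squared errors is defined as

$f(b_0, b_1) = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2$

The values of $b_0, b_1$ that minimize $f$ are found with the calculus method for locating maxima and minima.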

Page 3: The Simple Linear Regression Model

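The Least Squares Line

$\dfrac{\partial f(b_0, b_1)}{\partial b_0} = 0, \qquad \dfrac{\partial f(b_0, b_1)}{\partial b_1} = 0$

Solving this system of simultaneous equations gives:

$b_1 = \dfrac{\sum x_i y_i - (\sum x_i \sum y_i)/n}{\sum x_i^2 - (\sum x_i)^2/n}$

$b_0 = \bar{y} - b_1 \bar{x}$

The least squares line is therefore

$\hat{y} = b_0 + b_1 x = \bar{y} - b_1 \bar{x} + b_1 x = \bar{y} + b_1 (x - \bar{x})$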
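The step from the two partial-derivative conditions to the closed-form solution can be filled in as follows. This is a sketch of the standard derivation written in the slide's notation; it does not appear on the original slides.

```latex
\begin{align*}
\frac{\partial f}{\partial b_0} &= -2\sum_{i=1}^{n} (y_i - b_0 - b_1 x_i) = 0
  &&\Rightarrow\quad n b_0 + b_1 \sum x_i = \sum y_i \\
\frac{\partial f}{\partial b_1} &= -2\sum_{i=1}^{n} x_i (y_i - b_0 - b_1 x_i) = 0
  &&\Rightarrow\quad b_0 \sum x_i + b_1 \sum x_i^2 = \sum x_i y_i
\end{align*}
% Dividing the first equation by n gives b_0 = \bar{y} - b_1 \bar{x}.
% Substituting this into the second equation and using \bar{x} = \sum x_i / n:
\[
b_1 \left( \sum x_i^2 - \frac{(\sum x_i)^2}{n} \right)
  = \sum x_i y_i - \frac{\sum x_i \sum y_i}{n}
\]
```

Dividing both sides of the last line by the bracketed term gives the slope formula, provided the $x_i$ are not all equal (otherwise the bracket is zero).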

Page 4: The Simple Linear Regression Model

Example: Reed Auto Sales

Simple Linear Regression

Reed Auto periodically has a special week-long sale. As part of the advertising campaign, Reed runs one or more television commercials during the weekend preceding the sale. Data from a sample of 6 previous sales are shown below.

Number of TV Ads    Number of Cars Sold
1                   14
3                   24
2                   18
1                   17
3                   27
2                   22

Page 5: The Simple Linear Regression Model

Example: Reed Auto Sales

Slope for the Estimated Regression Equation

$b_1 = \dfrac{264 - (12)(122)/6}{28 - (12)^2/6} = \dfrac{20}{4} = 5$

y-Intercept for the Estimated Regression Equation

$b_0 = 20.333 - 5(2) = 10.333$

Estimated Regression Equation

$\hat{y} = 10.333 + 5x$
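The slope and intercept arithmetic can be checked with a few lines of Python. This is a minimal sketch using the six Reed Auto observations and the sum formulas for $b_1$ and $b_0$; the names `ads`, `cars`, and `least_squares_line` are illustrative and not part of the original slides.

```python
# Reed Auto sample from the slides: TV ads run (x) and cars sold (y).
ads = [1, 3, 2, 1, 3, 2]
cars = [14, 24, 18, 17, 27, 22]


def least_squares_line(xs, ys):
    """Return (b0, b1) for the least squares line y_hat = b0 + b1 * x."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    # Slope: (sum xy - sum x * sum y / n) / (sum x^2 - (sum x)^2 / n)
    b1 = (sum_xy - sum_x * sum_y / n) / (sum_xx - sum_x ** 2 / n)
    # Intercept: y_bar - b1 * x_bar
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1


b0, b1 = least_squares_line(ads, cars)
print(f"b1 = {b1:.3f}, b0 = {b0:.3f}")
```

Note that the divisor is the sample size $n = 6$, which is what makes the numerator 20 and the denominator 4.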

Page 6: The Simple Linear Regression Model

Example: Reed Auto Sales

Scatter Diagram

[Figure: scatter diagram of Cars Sold (vertical axis, 0 to 30) against TV Ads (horizontal axis, 0 to 4)]

Page 7: The Simple Linear Regression Model

The Coefficient of Determination

Relationship Among SST, SSR, SSE

SST = SSR + SSE

$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$

Coefficient of Determination

$r^2 = \text{SSR}/\text{SST}$

where:

SST = total sum of squares
SSR = sum of squares due to regression
SSE = sum of squares due to error

Page 8: The Simple Linear Regression Model

The Coefficient of Determination

Definition: $r^2 = \text{SSR}/\text{SST}$ is the proportion of the variation in $Y$ that is explained by $X$.

• The larger $r^2$ is, the more closely the least squares line fits the data.
• $1 - r^2$ is the proportion of the total variation (SST) that cannot be explained by $X$.

Example: Reed Auto Sales

$r^2 = \text{SSR}/\text{SST} = 100/117.333 = .852273$

• 85.2% of the variation in car sales can be explained by the number of TV ads, and 14.8% cannot.
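The three sums of squares and $r^2$ for the Reed Auto data can be reproduced as below. This is a hedged sketch: it fits the line in deviation form, which is algebraically equivalent to the sum formulas, and all variable names are illustrative.

```python
# Reed Auto sample from the slides: TV ads run (x) and cars sold (y).
ads = [1, 3, 2, 1, 3, 2]
cars = [14, 24, 18, 17, 27, 22]

n = len(ads)
x_bar = sum(ads) / n
y_bar = sum(cars) / n

# Least squares estimates in deviation form.
sxx = sum((x - x_bar) ** 2 for x in ads)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ads, cars))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * x for x in ads]

# Total, regression, and error sums of squares.
sst = sum((y - y_bar) ** 2 for y in cars)
ssr = sum((y_hat - y_bar) ** 2 for y_hat in fitted)
sse = sum((y - y_hat) ** 2 for y, y_hat in zip(cars, fitted))
r_squared = ssr / sst

print(f"SST = {sst:.3f}, SSR = {ssr:.3f}, SSE = {sse:.3f}")
print(f"r^2 = {r_squared:.6f}")
```

The identity SST = SSR + SSE holds here up to floating-point rounding, which is a quick sanity check on the fitted line.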

Page 9: The Simple Linear Regression Model

The Correlation Coefficient

Sample Correlation Coefficient

$r_{xy} = (\text{sign of } b_1)\sqrt{\text{Coefficient of Determination}} = (\text{sign of } b_1)\sqrt{r^2}$

where:

$b_1$ = the slope of the estimated regression equation $\hat{y} = b_0 + b_1 x$

Page 10: The Simple Linear Regression Model

Example: Reed Auto Sales

Sample Correlation Coefficient

$r_{xy} = (\text{sign of } b_1)\sqrt{r^2}$

The sign of $b_1$ in the equation $\hat{y} = 10.333 + 5x$ is “+”.

$r_{xy} = +\sqrt{0.852273} = +.923186$
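As a cross-check, the sketch below computes $r_{xy}$ two ways: from the sign of $b_1$ and $\sqrt{r^2}$ as on the slide, and from the usual direct formula $S_{xy}/\sqrt{S_{xx} S_{yy}}$. The direct formula is not on the slides and is brought in here only for comparison; variable names are illustrative.

```python
import math

# Reed Auto sample from the slides: TV ads run (x) and cars sold (y).
ads = [1, 3, 2, 1, 3, 2]
cars = [14, 24, 18, 17, 27, 22]

n = len(ads)
x_bar = sum(ads) / n
y_bar = sum(cars) / n

sxx = sum((x - x_bar) ** 2 for x in ads)
syy = sum((y - y_bar) ** 2 for y in cars)  # this is SST
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ads, cars))

b1 = sxy / sxx
# SSR = sum (y_hat_i - y_bar)^2 = b1^2 * Sxx, because y_hat_i - y_bar = b1 * (x_i - x_bar).
ssr = b1 ** 2 * sxx
r_squared = ssr / syy

# Slide formula: attach the sign of the slope to the square root of r^2.
r_from_r_squared = math.copysign(math.sqrt(r_squared), b1)

# Direct sample correlation formula, for comparison.
r_direct = sxy / math.sqrt(sxx * syy)

print(f"r (via r^2) = {r_from_r_squared:+.6f}")
print(f"r (direct)  = {r_direct:+.6f}")
```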

Page 11: The Simple Linear Regression Model

Model Assumptions

Assumptions About the Error Term $\varepsilon$

• The error $\varepsilon$ is a random variable with mean of zero.
• The variance of $\varepsilon$, denoted by $\sigma^2$, is the same for all values of the independent variable.
• The values of $\varepsilon$ are independent.
• The error $\varepsilon$ is a normally distributed random variable.

Page 12: The Simple Linear Regression Model

Testing for Significance

To test for a significant regression relationship, we must conduct a hypothesis test to determine whether the value of $\beta_1$ is zero.

Two tests are commonly used:

• t Test
• F Test

Both tests require an estimate of $\sigma^2$, the variance of $\varepsilon$ in the regression model.

Page 13: The Simple Linear Regression Model

Testing for Significance

An Estimate of $\sigma^2$

The mean square error (MSE) provides the estimate of $\sigma^2$, and the notation $s^2$ is also used.

$s^2 = \text{MSE} = \text{SSE}/(n - 2)$

where:

$\text{SSE} = \sum (y_i - \hat{y}_i)^2 = \sum (y_i - b_0 - b_1 x_i)^2$

Page 14: The Simple Linear Regression Model

Testing for Significance

An Estimate of $\sigma$

• To estimate $\sigma$ we take the square root of $s^2$, the estimate of $\sigma^2$.
• The resulting $s$ is called the standard error of the estimate.

$s = \sqrt{\text{MSE}} = \sqrt{\dfrac{\text{SSE}}{n - 2}}$
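For the Reed Auto data, the mean square error and the standard error of the estimate work out as in the sketch below. The names are illustrative; the only inputs are the six observations from the example.

```python
import math

# Reed Auto sample from the slides: TV ads run (x) and cars sold (y).
ads = [1, 3, 2, 1, 3, 2]
cars = [14, 24, 18, 17, 27, 22]

n = len(ads)
x_bar = sum(ads) / n
y_bar = sum(cars) / n
sxx = sum((x - x_bar) ** 2 for x in ads)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ads, cars))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# SSE = sum of squared residuals y_i - (b0 + b1 * x_i).
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(ads, cars))

# Two parameters (b0, b1) are estimated, so SSE has n - 2 degrees of freedom.
mse = sse / (n - 2)
s = math.sqrt(mse)

print(f"SSE = {sse:.3f}, MSE = {mse:.3f}, s = {s:.4f}")
```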

Page 15: The Simple Linear Regression Model

Testing for Significance: t Test

Hypotheses

$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$

Test Statistic

$t = \dfrac{b_1}{s_{b_1}}$ where $s_{b_1} = \dfrac{s}{\sqrt{\sum (x_i - \bar{x})^2}}$

Rejection Rule

Reject $H_0$ if $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$

where $t_{\alpha/2}$ is based on a t distribution with $n - 2$ degrees of freedom.

Page 16: The Simple Linear Regression Model

Example: Reed Auto Sales

t Test

• Hypotheses
  $H_0: \beta_1 = 0$
  $H_a: \beta_1 \neq 0$
• Rejection Rule
  For $\alpha = .05$ and d.f. = 4, $t_{.025} = 2.776$
  Reject $H_0$ if $t < -2.776$ or $t > 2.776$
• Test Statistic
  $t = 5/1.0408 = 4.804$
• Conclusion
  Reject $H_0$
• P-value
  $2P\{T > 4.804\} = 0.0086 < 0.05$
  Reject $H_0$
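The t statistic and the two-sided p-value can be reproduced with the standard library alone. In this sketch the critical value 2.776 is copied from the slide rather than computed, and the p-value is obtained by numerically integrating the Student t density with Simpson's rule, which is an assumption of this example and not necessarily how the slide's figure was produced. The helpers `t_density` and `two_sided_p_value` are hypothetical names.

```python
import math

# Reed Auto sample from the slides: TV ads run (x) and cars sold (y).
ads = [1, 3, 2, 1, 3, 2]
cars = [14, 24, 18, 17, 27, 22]

n = len(ads)
x_bar = sum(ads) / n
y_bar = sum(cars) / n
sxx = sum((x - x_bar) ** 2 for x in ads)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ads, cars))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(ads, cars))
s = math.sqrt(sse / (n - 2))   # standard error of the estimate
s_b1 = s / math.sqrt(sxx)      # standard error of the slope
t_stat = b1 / s_b1


def t_density(t, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p_value(t, df, steps=2000):
    """Return 2 * P(T > |t|), integrating the density on [0, |t|] with Simpson's rule."""
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_density(0.0, df) + t_density(upper, df)
    for k in range(1, steps):
        weight = 4 if k % 2 else 2
        total += weight * t_density(k * h, df)
    area = total * h / 3  # P(0 < T < |t|)
    return 2 * (0.5 - area)


df = n - 2
p_value = two_sided_p_value(t_stat, df)
t_crit = 2.776  # t_.025 with 4 d.f., taken from the slide
reject_h0 = abs(t_stat) > t_crit

print(f"s_b1 = {s_b1:.4f}, t = {t_stat:.3f}, p-value = {p_value:.4f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

Both decision routes agree here: the statistic exceeds the critical value, and the p-value falls below .05.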

Page 17: The Simple Linear Regression Model

Confidence Interval for $\beta_1$

We can use a 95% confidence interval for $\beta_1$ to test the hypotheses just used in the t test.

$H_0$ is rejected if the hypothesized value of $\beta_1$ is not included in the confidence interval for $\beta_1$.

Page 18: The Simple Linear Regression Model

Confidence Interval for $\beta_1$

The form of a confidence interval for $\beta_1$ is:

$b_1 \pm t_{\alpha/2}\, s_{b_1}$

where
$b_1$ is the point estimate
$t_{\alpha/2}\, s_{b_1}$ is the margin of error
$t_{\alpha/2}$ is the t value providing an area of $\alpha/2$ in the upper tail of a t distribution with $n - 2$ degrees of freedom

Page 19: The Simple Linear Regression Model

Example: Reed Auto Sales

Rejection Rule

Reject $H_0$ if 0 is not included in the confidence interval for $\beta_1$.

95% Confidence Interval for $\beta_1$

$b_1 \pm t_{\alpha/2}\, s_{b_1} = 5 \pm 2.776(1.0408) = 5 \pm 2.89$

or 2.11 to 7.89

Conclusion

Reject $H_0$
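The interval for the slope can be recomputed as follows. This sketch takes the critical value 2.776 from the slide as a constant and uses illustrative variable names.

```python
import math

# Reed Auto sample from the slides: TV ads run (x) and cars sold (y).
ads = [1, 3, 2, 1, 3, 2]
cars = [14, 24, 18, 17, 27, 22]

n = len(ads)
x_bar = sum(ads) / n
y_bar = sum(cars) / n
sxx = sum((x - x_bar) ** 2 for x in ads)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ads, cars))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(ads, cars))
s_b1 = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)  # standard error of the slope

t_crit = 2.776  # t_.025 with n - 2 = 4 d.f., taken from the slide
margin = t_crit * s_b1
lower, upper = b1 - margin, b1 + margin

# H0: beta_1 = 0 is rejected when 0 lies outside the interval.
reject_h0 = not (lower <= 0 <= upper)

print(f"95% CI for beta_1: {lower:.2f} to {upper:.2f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```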

Page 20: The Simple Linear Regression Model

Testing for Significance: F Test

Hypotheses

$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$

Test Statistic

$F = \text{MSR}/\text{MSE}$

Rejection Rule

Reject $H_0$ if $F > F_{\alpha}$

where $F_{\alpha}$ is based on an F distribution with 1 d.f. in the numerator and $n - 2$ d.f. in the denominator.

Page 21: The Simple Linear Regression Model

Example: Reed Auto Sales

F Test

• Hypotheses
  $H_0: \beta_1 = 0$
  $H_a: \beta_1 \neq 0$
• Rejection Rule
  For $\alpha = .05$ and d.f. = 1, 4: $F_{.05} = 7.709$
  Reject $H_0$ if $F > 7.709$.
• Test Statistic
  $F = \text{MSR}/\text{MSE} = 100/4.333 = 23.077$
• Conclusion
  We can reject $H_0$.
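The F statistic for the Reed Auto data can be recomputed as below. The sketch assumes MSR = SSR/1, since simple linear regression has one independent variable, and takes the critical value 7.709 from the slide. It also checks the known fact that, with a single predictor, $F$ equals the square of the t statistic for the slope. Variable names are illustrative.

```python
import math

# Reed Auto sample from the slides: TV ads run (x) and cars sold (y).
ads = [1, 3, 2, 1, 3, 2]
cars = [14, 24, 18, 17, 27, 22]

n = len(ads)
x_bar = sum(ads) / n
y_bar = sum(cars) / n
sxx = sum((x - x_bar) ** 2 for x in ads)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ads, cars))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * x for x in ads]
ssr = sum((y_hat - y_bar) ** 2 for y_hat in fitted)
sse = sum((y - y_hat) ** 2 for y, y_hat in zip(cars, fitted))

msr = ssr / 1        # one independent variable, so 1 numerator d.f.
mse = sse / (n - 2)  # n - 2 denominator d.f.
f_stat = msr / mse

f_crit = 7.709  # F_.05 with (1, 4) d.f., taken from the slide
reject_h0 = f_stat > f_crit

# With one predictor, F equals t^2 where t = b1 / s_b1.
t_stat = b1 / (math.sqrt(mse) / math.sqrt(sxx))

print(f"F = {f_stat:.3f}, t^2 = {t_stat ** 2:.3f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```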

Page 22: The Simple Linear Regression Model

Some Cautions about the Interpretation of Significance Tests

Rejecting $H_0: \beta_1 = 0$ and concluding that the relationship between x and y is significant does not enable us to conclude that a cause-and-effect relationship is present between x and y.

Just because we are able to reject $H_0: \beta_1 = 0$ and demonstrate statistical significance does not enable us to conclude that there is a linear relationship between x and y.

Page 23: The Simple Linear Regression Model

Using the Estimated Regression Equation for Estimation and Prediction

Confidence Interval Estimate of $E(y_p)$

$\hat{y}_p \pm t_{\alpha/2}\, s_{\hat{y}_p}$

Prediction Interval Estimate of $y_p$

$\hat{y}_p \pm t_{\alpha/2}\, s_{\text{ind}}$

where the confidence coefficient is $1 - \alpha$ and $t_{\alpha/2}$ is based on a t distribution with $n - 2$ d.f.

$s_{\hat{y}_p}$ is the standard error of the estimate of $E(y_p)$

$s_{\text{ind}}$ is the standard error of the individual estimate of $y_p$

Page 24: The Simple Linear Regression Model

Standard Errors of Estimate of $E(y_p)$ and $y_p$

$s_{\hat{y}_p} = s\sqrt{\dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$

$s_{\text{ind}} = s\sqrt{1 + \dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$

Here $x_0$ is the value of $x$ at which the estimate or prediction is made.

Page 25: The Simple Linear Regression Model

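Variances of the Estimators of $E(y_p)$ and $y_p$

$E(y_p) = \beta_0 + \beta_1 x_0$

$y_p = \beta_0 + \beta_1 x_0 + \varepsilon$

Variance of $\bar{y}$:

$\sigma_{\bar{y}}^2 = \dfrac{\sigma^2}{n}$

Variance of $b_1$:

$\sigma_{b_1}^2 = \dfrac{\sigma^2}{\sum (x_i - \bar{x})^2}$, estimated by $s_{b_1}^2$ where $s_{b_1} = \dfrac{s}{\sqrt{\sum (x_i - \bar{x})^2}}$

Variance of $b_1 (x_0 - \bar{x})$:

$\sigma_{b_1}^2 (x_0 - \bar{x})^2 = \dfrac{\sigma^2 (x_0 - \bar{x})^2}{\sum (x_i - \bar{x})^2}$

Variance of the estimator of $E(y_p)$:

$\mathrm{Var}(\hat{y}) = \mathrm{Var}[\bar{y} + b_1 (x_0 - \bar{x})] = \dfrac{\sigma^2}{n} + \dfrac{\sigma^2 (x_0 - \bar{x})^2}{\sum (x_i - \bar{x})^2}$

Variance of the estimator of $y_p$:

$\mathrm{Var}(\hat{y}) + \sigma^2$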

Page 26: The Simple Linear Regression Model

Example: Reed Auto Sales

Point Estimation

If 3 TV ads are run prior to a sale, we expect the mean number of cars sold to be:

$\hat{y} = 10.333 + 5(3) = 25.333$ cars

Confidence Interval for $E(y_p)$

95% confidence interval estimate of the mean number of cars sold when 3 TV ads are run is:

$25.333 \pm 3.730$ = 21.603 to 29.063 cars

Prediction Interval for $y_p$

95% prediction interval estimate of the number of cars sold in one particular week when 3 TV ads are run is:

$25.333 \pm 6.878$ = 18.455 to 32.211 cars
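Both interval margins can be reproduced from the standard error formulas for the mean response and for an individual prediction. The sketch below evaluates them at 3 TV ads, with the critical value 2.776 taken from the earlier t test slide as a constant; variable names such as `x0`, `ci_margin`, and `pi_margin` are illustrative.

```python
import math

# Reed Auto sample from the slides: TV ads run (x) and cars sold (y).
ads = [1, 3, 2, 1, 3, 2]
cars = [14, 24, 18, 17, 27, 22]

n = len(ads)
x_bar = sum(ads) / n
y_bar = sum(cars) / n
sxx = sum((x - x_bar) ** 2 for x in ads)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ads, cars))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(ads, cars))
s = math.sqrt(sse / (n - 2))  # standard error of the estimate

x0 = 3                  # number of TV ads at which to estimate and predict
y_hat = b0 + b1 * x0    # point estimate

# Standard error for the mean response E(y_p) and for an individual y_p.
leverage = 1 / n + (x0 - x_bar) ** 2 / sxx
s_mean = s * math.sqrt(leverage)
s_ind = s * math.sqrt(1 + leverage)

t_crit = 2.776  # t_.025 with n - 2 = 4 d.f., taken from the slides
ci_margin = t_crit * s_mean
pi_margin = t_crit * s_ind

print(f"point estimate: {y_hat:.3f}")
print(f"95% CI for E(y_p): {y_hat - ci_margin:.3f} to {y_hat + ci_margin:.3f}")
print(f"95% PI for y_p:    {y_hat - pi_margin:.3f} to {y_hat + pi_margin:.3f}")
```

The prediction interval is wider because it adds the variance of a single week's error term to the uncertainty in the estimated mean.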