Hypothesis tests for slopes in multiple linear regression model Using the general linear test and sequential sums of squares


An example


Study on heart attacks in rabbits

• An experiment in 32 anesthetized rabbits subjected to an infarction (“heart attack”)

• Three experimental groups:
  – Hearts cooled to 6 °C within 5 minutes of occluding the artery (“early cooling”)
  – Hearts cooled to 6 °C within 25 minutes of occluding the artery (“late cooling”)
  – Hearts not cooled at all (“no cooling”)


Study on heart attacks in rabbits

• Measurements made at the end of the experiment:
  – Size of the infarct area (in grams)
  – Size of the region at risk for infarction (in grams)

• Primary research question:
  – Does the mean size of the infarcted area differ among the three treatment groups (no cooling, early cooling, late cooling) when controlling for the size of the region at risk for infarction?


A potential regression model

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \epsilon_i$$

where

• $y_i$ is the size of the infarcted area (in grams) of rabbit i

• $x_{i1}$ is the size of the region at risk (in grams) of rabbit i

• $x_{i2}$ = 1 if rabbit i received early cooling, 0 if not

• $x_{i3}$ = 1 if rabbit i received late cooling, 0 if not

and the independent error terms $\epsilon_i$ follow a normal distribution with mean 0 and equal variance $\sigma^2$.


The estimated regression function

[Figure: size of infarcted area (grams) versus size of area at risk (grams), with fitted lines for the Early, Late, and Control groups.]

The regression equation is InfSize = - 0.135 + 0.613 AreaSize - 0.243 X2 - 0.0657 X3
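A minimal Python sketch of how the indicator coding turns the single fitted equation above into three parallel lines, one per treatment group. It uses the rounded coefficients printed above; the names `GROUP_INDICATORS` and `fitted_infarct_size` are hypothetical, chosen only for illustration.

```python
# Rounded coefficients from the fitted equation above:
# InfSize = -0.135 + 0.613 AreaSize - 0.243 X2 - 0.0657 X3
B0, B1, B2, B3 = -0.135, 0.613, -0.243, -0.0657

# Hypothetical helper: map each treatment group to its (x2, x3) indicator pair.
GROUP_INDICATORS = {
    "early": (1, 0),    # x2 = 1 for early cooling
    "late": (0, 1),     # x3 = 1 for late cooling
    "control": (0, 0),  # no cooling is the baseline group
}


def fitted_infarct_size(area_at_risk: float, group: str) -> float:
    """Estimated mean infarct size (grams) for a rabbit in the given group."""
    x2, x3 = GROUP_INDICATORS[group]
    return B0 + B1 * area_at_risk + B2 * x2 + B3 * x3


if __name__ == "__main__":
    for g in ("early", "late", "control"):
        # Same slope (0.613) in every group; only the intercept shifts.
        print(g, round(fitted_infarct_size(1.0, g), 4))
```

At an area at risk of 1.0 g this gives estimated mean infarct sizes of about 0.235 g (early), 0.412 g (late) and 0.478 g (no cooling): the slope on area at risk is shared, and β2 and β3 are the vertical shifts of the early and late lines relative to the no-cooling line.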


Possible hypothesis tests for slopes

#1. Is the regression model containing all three predictors useful in predicting the size of the infarct?

$$H_0: \beta_1 = \beta_2 = \beta_3 = 0$$

$$H_A: \text{at least one } \beta_i \neq 0$$

#2. Is the size of the infarct significantly (linearly) related to the area of the region at risk?

$$H_0: \beta_1 = 0$$

$$H_A: \beta_1 \neq 0$$


Possible hypothesis tests for slopes

#3. (Primary research question) Is the size of the infarct area significantly (linearly) related to the type of treatment after controlling for the size of the region at risk for infarction?

$$H_0: \beta_2 = \beta_3 = 0$$

$$H_A: \beta_i \neq 0 \text{ for at least one } i \in \{2, 3\}$$


Linear regression’s general linear test

An aside


Three basic steps

• Define a (larger) full model.

• Define a (smaller) reduced model.

• Use an F statistic to decide whether or not to reject the smaller reduced model in favor of the larger full model.


The full model

For simple linear regression, the full model is:

$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i$$

The full model (or unrestricted model) is the model thought to be most appropriate for the data.


The full model

[Figure: college entrance test score versus high school GPA.]

$$E(Y) = \beta_0 + \beta_1 x$$

$$Y_i = \beta_0 + \beta_1 x_i + \epsilon_i$$


The full model

[Figure: grade point average versus height (inches).]

$$E(Y) = \beta_0 + \beta_1 x$$

$$Y_i = \beta_0 + \beta_1 x_i + \epsilon_i$$


The reduced model

The reduced model (or restricted model) is the model described by the null hypothesis H0.

For simple linear regression, the null hypothesis is H0: β1 = 0. Therefore, the reduced model is:

$$Y_i = \beta_0 + \epsilon_i$$


The reduced model

[Figure: college entrance test score versus high school GPA, with the horizontal reduced-model line.]

$$E(Y) = \beta_0$$

$$Y_i = \beta_0 + \epsilon_i$$


The reduced model

[Figure: grade point average versus height (inches), with the horizontal reduced-model line.]

$$E(Y) = \beta_0$$

$$Y_i = \beta_0 + \epsilon_i$$


The general linear test approach

• “Fit the full model” to the data.
  – Obtain least squares estimates of β0 and β1.
  – Determine the error sum of squares, “SSE(F).”

• “Fit the reduced model” to the data.
  – Obtain the least squares estimate of β0.
  – Determine the error sum of squares, “SSE(R).”
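A minimal pure-Python sketch of these two fitting steps for simple linear regression. The toy data `xs` and `ys` are made up for illustration; the full model uses the usual closed-form least squares estimates, and the least squares estimate of β0 in the reduced model is simply the sample mean.

```python
# Hypothetical toy data (not from any study in this section).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]


def fit_full(x, y):
    """Least squares estimates (b0, b1) for the full model y = b0 + b1*x + error."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def fit_reduced(y):
    """Least squares estimate of b0 for the reduced model y = b0 + error: the sample mean."""
    return sum(y) / len(y)


b0, b1 = fit_full(xs, ys)
sse_full = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(xs, ys))   # SSE(F)

b0_reduced = fit_reduced(ys)
sse_reduced = sum((yi - b0_reduced) ** 2 for yi in ys)                 # SSE(R) = SSTO

print(f"SSE(F) = {sse_full:.4f}, SSE(R) = {sse_reduced:.4f}")
```

Because the reduced model is the full model with β1 forced to 0, SSE(R) can never be smaller than SSE(F).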


The general linear test approach

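[Figure: grade point average versus height (inches), with the fitted full-model and reduced-model lines.]

$$\hat{y}_F = 2.95 + 0.001x \qquad \hat{y}_R = \bar{y} = 3.015$$

$$SSE(F) = \sum (y_i - \hat{y}_i)^2 = 7.5028 \qquad SSE(R) = \sum (y_i - \bar{y})^2 = 7.5035$$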


The general linear test approach

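[Figure: mortality versus latitude (at center of state), with the fitted full-model and reduced-model lines.]

$$\hat{y}_F = 389 - 5.98x \qquad \hat{y}_R = \bar{y} = 152.88$$

$$SSE(F) = \sum (y_i - \hat{y}_i)^2 = 17173 \qquad SSE(R) = \sum (y_i - \bar{y})^2 = 53637$$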


The general linear test approach

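• Compare SSE(R) and SSE(F).

• SSE(R) is always larger than (or the same as) SSE(F), because the full model contains every parameter of the reduced model, so its least squares fit can do no worse.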

– If SSE(F) is close to SSE(R), then variation around fitted full model regression function is almost as large as variation around fitted reduced model regression function.

– If SSE(F) and SSE(R) differ greatly, then the additional parameter(s) in the full model substantially reduce the variation around the fitted regression function.


How close is close?

The test statistic is a function of SSE(R)-SSE(F):

$$F^* = \frac{\left(SSE(R) - SSE(F)\right) / (df_R - df_F)}{SSE(F) / df_F}$$

The degrees of freedom (dfR and dfF) are those associated with the reduced and full model error sum of squares, respectively.

Reject H0 if F* is large (or if the P-value is small).
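The statistic can be written as a small Python function, sketched here under the assumption that the reduced model is nested in the full model. The function name `general_linear_f` is hypothetical; it returns only F*, and turning F* into a P-value needs an F distribution with df_R − df_F and df_F degrees of freedom (for example from a statistics library).

```python
def general_linear_f(sse_r: float, df_r: int, sse_f: float, df_f: int) -> float:
    """General linear test statistic F* for a reduced model nested in a full model.

    sse_r, df_r: error sum of squares and its degrees of freedom, reduced model
    sse_f, df_f: error sum of squares and its degrees of freedom, full model
    """
    if not (df_r > df_f > 0):
        raise ValueError("need df_R > df_F > 0 (the full model has extra parameters)")
    extra_ss_per_df = (sse_r - sse_f) / (df_r - df_f)  # drop in SSE per extra parameter
    mse_full = sse_f / df_f                             # error mean square of the full model
    return extra_ss_per_df / mse_full


# Hypothetical numbers, for illustration only:
# SSE(R) = 120 on 19 df and SSE(F) = 80 on 18 df give F* = (40 / 1) / (80 / 18).
print(round(general_linear_f(120.0, 19, 80.0, 18), 4))  # 9.0
```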


But for simple linear regression, it’s just the same F test as before

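$$F^* = \frac{\left(SSE(R) - SSE(F)\right) / (df_R - df_F)}{SSE(F) / df_F}$$

with

$$SSE(R) = SSTO, \quad df_R = n - 1 \qquad SSE(F) = SSE, \quad df_F = n - 2$$

so

$$F^* = \frac{(SSTO - SSE) / \left((n-1) - (n-2)\right)}{SSE / (n-2)} = \frac{SSR / 1}{SSE / (n-2)} = \frac{MSR}{MSE}$$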


The formal F-test for slope parameter β1

Null hypothesis H0: β1 = 0

Alternative hypothesis HA: β1 ≠ 0

Test statistic:

$$F^* = \frac{MSR}{MSE}$$

P-value: the probability of getting an F* statistic at least as large as the one observed, if the null hypothesis is true.

The P-value is determined by comparing F* to an F distribution with 1 numerator degree of freedom and n-2 denominator degrees of freedom.


Example: Alcoholism and muscle strength?

• Report on strength tests for a sample of 50 alcoholic men
  – x = total lifetime dose of alcohol (kg per kg of body weight)
  – y = strength of the deltoid muscle in the man’s non-dominant arm


Fit the reduced model

[Figure: strength versus alcohol, “Reduced Model Fit”.]

$$\hat{y}_R = \bar{y} = 20.164 \qquad SSE(R) = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 = 1224.32$$


Fit the full model

[Figure: strength versus alcohol, “Full Model Fit”.]

$$\hat{y}_F = 26.37 - 0.3x \qquad SSE(F) = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = 720.27$$


The ANOVA table

Analysis of Variance

Source      DF       SS       MS        F      P
Regression   1   504.04  504.040  33.5899  0.000
Error       48   720.27   15.006
Total       49  1224.32

Here SSE(R) = SSTO and SSE(F) = SSE.

Because the P-value (reported as 0.000) is below any usual significance level, there is a statistically significant linear association between lifetime alcohol dose and arm strength.
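As a quick check, the general linear test statistic can be recomputed in Python from the sums of squares and degrees of freedom shown above (n = 50). It agrees with the printed 33.5899 up to rounding in the displayed sums of squares.

```python
# Values read from the alcohol / arm-strength fits above (n = 50 men).
n = 50
sse_reduced, df_reduced = 1224.32, n - 1   # SSE(R) = SSTO
sse_full, df_full = 720.27, n - 2          # SSE(F) = SSE

# General linear test: extra SS per extra parameter over the full model's MSE.
f_star = ((sse_reduced - sse_full) / (df_reduced - df_full)) / (sse_full / df_full)
print(round(f_star, 2))  # 33.59
```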


Sequential (or extra) sums of squares

Another aside


What is a sequential sum of squares?

• It can be viewed in either of two ways:
  – It is the reduction in the error sum of squares (SSE) when one or more predictor variables are added to the model.
  – Or, it is the increase in the regression sum of squares (SSR) when one or more predictor variables are added to the model.


Notation

• The error sum of squares (SSE) and regression sum of squares (SSR) depend on what predictors are in the model.

• So the notation records which variables are in the model:
  – SSE(X1) denotes the error sum of squares when X1 is the only predictor in the model
  – SSR(X1, X2) denotes the regression sum of squares when X1 and X2 are both in the model


Notation

• The sequential sum of squares of adding:
  – X2 to the model in which X1 is the only predictor is denoted SSR(X2 | X1)
  – X1 to the model in which X2 is the only predictor is denoted SSR(X1 | X2)
  – X1 to the model in which X2 and X3 are predictors is denoted SSR(X1 | X2, X3)
  – X1 and X2 to the model in which X3 is the only predictor is denoted SSR(X1, X2 | X3)


Allen Cognitive Level (ACL) Study

• David and Riley (1990) investigated the relationship of the ACL test to the level of psychopathology in a set of 69 patients in a hospital psychiatry unit:
  – Response y = ACL score
  – x1 = vocabulary (Vocab) score on the Shipley Institute of Living Scale
  – x2 = abstraction (Abstract) score on the Shipley Institute of Living Scale
  – x3 = score on the Symbol-Digit Modalities Test (SDMT)


Regress y = ACL on x1 = Vocab

The regression equation is ACL = 4.23 + 0.0298 Vocab ...

Analysis of Variance
Source          DF       SS      MS     F      P
Regression       1   2.6906  2.6906  4.47  0.038
Residual Error  67  40.3590  0.6024
Total           68  43.0496

$$SSR(X_1) = 2.6906 \qquad SSE(X_1) = 40.3590 \qquad SSTO(X_1) = 43.0496$$


Regress y = ACL on x1 = Vocab and x3 = SDMT

The regression equation is ACL = 3.85 - 0.0068 Vocab + 0.0298 SDMT ...

Analysis of Variance
Source          DF       SS      MS      F      P
Regression       2  11.7778  5.8889  12.43  0.000
Residual Error  66  31.2717  0.4738
Total           68  43.0496

Source  DF   Seq SS
Vocab    1   2.6906
SDMT     1   9.0872

$$SSR(X_1, X_3) = 11.7778 \qquad SSE(X_1, X_3) = 31.2717 \qquad SSTO(X_1, X_3) = 43.0496$$


The sequential sum of squares SSR(X3 | X1)

SSR(X3 | X1) is the reduction in the error sum of squares when X3 is added to the model in which X1 is the only predictor:

$$SSR(X_3 \mid X_1) = SSE(X_1) - SSE(X_1, X_3)$$

$$SSR(X_3 \mid X_1) = 40.3590 - 31.2717 = 9.0873$$


The sequential sum of squares SSR(X3 | X1)

SSR(X3 | X1) is the increase in the regression sum of squares when X3 is added to the model in which X1 is the only predictor:

$$SSR(X_3 \mid X_1) = SSR(X_1, X_3) - SSR(X_1)$$

$$SSR(X_3 \mid X_1) = 11.7778 - 2.6906 = 9.0872$$
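Both views of SSR(X3 | X1) can be checked side by side in a few lines of Python, using the sums of squares printed in the two ACL fits above (X1 = Vocab, X3 = SDMT). The variable names are illustrative; the two results differ only in the fourth decimal, because the printed sums of squares are rounded.

```python
# Sums of squares from the ACL output above (X1 = Vocab, X3 = SDMT).
sse_x1, ssr_x1 = 40.3590, 2.6906          # model with Vocab only
sse_x1_x3, ssr_x1_x3 = 31.2717, 11.7778   # model with Vocab and SDMT

# View 1: reduction in SSE when X3 is added to the model containing X1.
via_sse = sse_x1 - sse_x1_x3
# View 2: increase in SSR when X3 is added to the model containing X1.
via_ssr = ssr_x1_x3 - ssr_x1

print(round(via_sse, 4), round(via_ssr, 4))  # 9.0873 9.0872
```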


The sequential sum of squares SSR(X3 | X1)


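In the “Seq SS” column of the fit that enters Vocab first and then SDMT, the Vocab row is SSR(X1) = 2.6906 and the SDMT row is SSR(X3 | X1) = 9.0872.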


Regress y = ACL on x3 = SDMT

(The order in which predictors are added determines the “Seq SS” you get.)

The regression equation is ACL = 3.75 + 0.0281 SDMT ...

Analysis of Variance
Source          DF      SS      MS      F      P
Regression       1  11.680  11.680  24.95  0.000
Residual Error  67  31.370   0.468
Total           68  43.050

$$SSR(X_3) = 11.680 \qquad SSE(X_3) = 31.370 \qquad SSTO(X_3) = 43.050$$


Regress y = ACL on x3 = SDMT and x1 = Vocab


The regression equation is ACL = 3.85 + 0.0298 SDMT - 0.0068 Vocab ...

Analysis of Variance
Source          DF       SS      MS      F      P
Regression       2  11.7778  5.8889  12.43  0.000
Residual Error  66  31.2717  0.4738
Total           68  43.0496

Source  DF   Seq SS
SDMT     1  11.6799
Vocab    1   0.0979

$$SSR(X_1, X_3) = 11.7778 \qquad SSE(X_1, X_3) = 31.2717 \qquad SSTO(X_1, X_3) = 43.0496$$
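The dependence on entry order can be reproduced with a short pure-Python sketch. It fits each model by solving the normal equations and defines the Seq SS of a predictor as the drop in SSE when it enters the model. The data are made up for illustration (they are not the ACL data), and `solve`, `sse`, and `sequential_ss` are hypothetical helper names; a statistics package would normally report the Seq SS column directly.

```python
def solve(a, b):
    """Solve the square system a @ beta = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def sse(y, predictors):
    """Error sum of squares after regressing y on an intercept plus the given columns."""
    cols = [[1.0] * len(y)] + predictors
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
    beta = solve(xtx, xty)
    fitted = [sum(b * col[i] for b, col in zip(beta, cols)) for i in range(len(y))]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def sequential_ss(y, named_predictors):
    """Seq SS for each predictor, entered in the order given: SSE(before) - SSE(after)."""
    result, entered = [], []
    previous = sse(y, [])  # intercept-only model, so SSE = SSTO
    for name, column in named_predictors:
        entered.append(column)
        current = sse(y, entered)
        result.append((name, previous - current))
        previous = current
    return result


# Hypothetical data with two correlated predictors.
x1 = [2.0, 4.0, 5.0, 7.0, 8.0, 10.0]
x3 = [1.0, 3.0, 3.0, 6.0, 7.0, 8.0]
y = [3.1, 5.0, 5.8, 8.4, 8.9, 11.2]

print(sequential_ss(y, [("x1", x1), ("x3", x3)]))  # SSR(X1), then SSR(X3 | X1)
print(sequential_ss(y, [("x3", x3), ("x1", x1)]))  # SSR(X3), then SSR(X1 | X3)
```

The individual Seq SS values change with the order, but in either order they add up to SSR(X1, X3), the regression sum of squares of the model containing both predictors.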


The sequential sum of squares SSR(X1 | X3)

SSR(X1 | X3) is the reduction in the error sum of squares when X1 is added to the model in which X3 is the only predictor:

$$SSR(X_1 \mid X_3) = SSE(X_3) - SSE(X_1, X_3)$$

$$SSR(X_1 \mid X_3) = 31.370 - 31.2717 = 0.0983$$


The sequential sum of squares SSR(X1 | X3)

SSR(X1 | X3) is the increase in the regression sum of squares when X1 is added to the model in which X3 is the only predictor:

$$SSR(X_1 \mid X_3) = SSR(X_1, X_3) - SSR(X_3)$$

$$SSR(X_1 \mid X_3) = 11.7778 - 11.680 = 0.0978$$


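In the “Seq SS” column of the fit that enters SDMT first and then Vocab, the SDMT row is SSR(X3) = 11.6799 and the Vocab row is SSR(X1 | X3) = 0.0979.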


More sequential sums of squares (Regress y on x3, x1, x2)

The regression equation is ACL = 3.95 + 0.0274 SDMT - 0.0174 Vocab + 0.0122 Abstract ...

Analysis of Variance
Source          DF       SS      MS     F      P
Regression       3  12.3009  4.1003  8.67  0.000
Residual Error  65  30.7487  0.4731
Total           68  43.0496

Source    DF   Seq SS
SDMT       1  11.6799
Vocab      1   0.0979
Abstract   1   0.5230

$$SSR(X_3) = 11.6799 \qquad SSR(X_1 \mid X_3) = 0.0979 \qquad SSR(X_2 \mid X_1, X_3) = 0.5230$$


Two- (or three- or more-) degree of freedom sequential sums of squares


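$$SSR(X_1 \mid X_3) = 0.0979 \qquad SSR(X_2 \mid X_1, X_3) = 0.5230$$

$$SSR(X_1, X_2 \mid X_3) = SSR(X_1 \mid X_3) + SSR(X_2 \mid X_1, X_3) = 0.0979 + 0.5230 = 0.6209$$

$$SSR(X_1, X_2 \mid X_3) = SSE(X_3) - SSE(X_1, X_2, X_3) = 31.370 - 30.7487 = 0.6213$$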
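The two routes to this two-degree-of-freedom sequential sum of squares can be compared in Python, using the values printed in the ACL output above. The small gap comes only from rounding in the printed sums of squares.

```python
# Values from the "Regress y on x3, x1, x2" output above.
sse_x3 = 31.370               # SSE(X3)
sse_x1_x2_x3 = 30.7487        # SSE(X1, X2, X3)
ssr_x1_given_x3 = 0.0979      # SSR(X1 | X3), one df
ssr_x2_given_x1_x3 = 0.5230   # SSR(X2 | X1, X3), one df

# Two-df sequential SS computed directly from the error sums of squares ...
direct = sse_x3 - sse_x1_x2_x3
# ... and as the sum of the two one-df sequential pieces.
by_pieces = ssr_x1_given_x3 + ssr_x2_given_x1_x3

print(round(direct, 4), round(by_pieces, 4))  # 0.6213 0.6209
```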


The hypothesis tests for the slopes



Testing all slope parameters are 0

Full model

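$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \epsilon_i$$

$$SSE(F) = SSE, \qquad df_F = n - 4$$

Reduced model

$$Y_i = \beta_0 + \epsilon_i$$

$$SSE(R) = SSTO, \qquad df_R = n - 1$$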


Testing all slope parameters are 0

The general linear test statistic:

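$$F^* = \frac{\left(SSE(R) - SSE(F)\right) / (df_R - df_F)}{SSE(F) / df_F}$$

becomes the usual overall F-test:

$$F^* = \frac{(SSTO - SSE) / \left((n-1) - (n-4)\right)}{SSE / (n-4)} = \frac{SSR / 3}{SSE / (n-4)} = \frac{MSR}{MSE}$$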


Testing all slope parameters are 0

Use the overall F-test and P-value reported in the ANOVA table.

The regression equation is InfSize = - 0.135 + 0.613 AreaSize - 0.243 X2 - 0.0657 X3 ...

Analysis of Variance
Source          DF       SS       MS      F      P
Regression       3  0.95927  0.31976  16.43  0.000
Residual Error  28  0.54491  0.01946
Total           31  1.50418

$$H_0: \beta_1 = \beta_2 = \beta_3 = 0$$

$$H_A: \text{at least one } \beta_i \neq 0$$
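As a sketch of where the 16.43 comes from, the overall F statistic can be recomputed in Python from the printed sums of squares, with n = 32 rabbits and three slope parameters in the full model.

```python
# Rabbit study ANOVA above: n = 32 rabbits, full model with 3 slope parameters.
n, num_slopes = 32, 3
ssr, sse = 0.95927, 0.54491

msr = ssr / num_slopes             # numerator df = (n - 1) - (n - 4) = 3
mse = sse / (n - num_slopes - 1)   # denominator df = n - 4 = 28
f_star = msr / mse
print(round(f_star, 2))  # 16.43
```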


Testing whether one slope is 0, say β1 = 0

Full model

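$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \epsilon_i$$

$$SSE(F) = SSE(X_1, X_2, X_3), \qquad df_F = n - 4$$

Reduced model

$$y_i = \beta_0 + \beta_2 x_{i2} + \beta_3 x_{i3} + \epsilon_i$$

$$SSE(R) = SSE(X_2, X_3), \qquad df_R = n - 3$$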


Testing whether one slope is 0, say β1 = 0

The general linear test statistic:

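$$F^* = \frac{\left(SSE(R) - SSE(F)\right) / (df_R - df_F)}{SSE(F) / df_F}$$

becomes a partial F-test:

$$F^* = \frac{SSR(X_1 \mid X_2, X_3) / 1}{SSE(X_1, X_2, X_3) / (n-4)} = \frac{MSR(X_1 \mid X_2, X_3)}{MSE(X_1, X_2, X_3)}$$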


Equivalence of the t-test to the partial F-test for one slope

Since there is only one numerator degree of freedom in the partial F-test for one slope, it is equivalent to the t-test.

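$$t_{(n-p)}^2 = F_{(1,\, n-p)}$$

where p is the number of parameters in the full model (here p = 4, so n − p = 28).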

The t-test is a test for the marginal significance of the x1 predictor after x2 and x3 have been taken into account.


The regression equation is InfSize = - 0.135 - 0.2430 X2 - 0.0657 X3 + 0.613 AreaSize

Predictor      Coef   SE Coef      T      P
Constant    -0.1345    0.1040  -1.29  0.206
X2         -0.24348   0.06229  -3.91  0.001
X3         -0.06566   0.06507  -1.01  0.322
AreaSize     0.6127    0.1070   5.72  0.000

S = 0.1395   R-Sq = 63.8%   R-Sq(adj) = 59.9%

Analysis of Variance
Source          DF       SS       MS      F      P
Regression       3  0.95927  0.31976  16.43  0.000
Residual Error  28  0.54491  0.01946
Total           31  1.50418

Source    DF   Seq SS
X2         1  0.29994
X3         1  0.02191
AreaSize   1  0.63742


Equivalence of the t-test to the partial F-test

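$$F^* = \frac{SSR(X_1 \mid X_2, X_3) / 1}{SSE(X_1, X_2, X_3) / (n-4)} = \frac{0.63742}{0.01946} = 32.7554$$

where SSR(X1 | X2, X3) = 0.63742 is the Seq SS for AreaSize, entered last.

The t-test:

$$t^* = 5.72 \quad \text{and} \quad P = 0.000 \; (P < 0.001)$$

The partial F-test:

F distribution with 1 DF in numerator and 28 DF in denominator
x          P( X <= x )
32.7554    1.0000

so the P-value is 1 − 1.0000, which is 0.000 to the displayed precision.

$$(t^*)^2 = 5.72^2 = 32.7184 \approx F^*$$

The small gap reflects rounding of t* to two decimals.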
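The equivalence can be checked numerically in Python with the values printed above: the partial F statistic for AreaSize, computed from its Seq SS (entered last) and the full model's MSE, is close to the square of its reported t statistic.

```python
# From the rabbit output above (predictors entered as X2, X3, AreaSize).
ssr_x1_given_x2_x3 = 0.63742   # Seq SS for AreaSize, entered last
mse_full = 0.01946             # MSE(X1, X2, X3), 28 df
t_star = 5.72                  # reported t statistic for AreaSize

# Partial F-test with 1 numerator degree of freedom.
f_star = (ssr_x1_given_x2_x3 / 1) / mse_full
print(round(f_star, 4), round(t_star ** 2, 4))  # 32.7554 32.7184
```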

The same model, with AreaSize entered first

The regression equation is InfSize = - 0.135 + 0.613 AreaSize - 0.243 X2 - 0.0657 X3

Predictor      Coef   SE Coef      T      P
Constant    -0.1345    0.1040  -1.29  0.206
AreaSize     0.6127    0.1070   5.72  0.000
X2         -0.24348   0.06229  -3.91  0.001
X3         -0.06566   0.06507  -1.01  0.322

S = 0.1395   R-Sq = 63.8%   R-Sq(adj) = 59.9%

Analysis of Variance
Source          DF       SS       MS      F      P
Regression       3  0.95927  0.31976  16.43  0.000
Residual Error  28  0.54491  0.01946
Total           31  1.50418

Source    DF   Seq SS
AreaSize   1  0.62492
X2         1  0.31453
X3         1  0.01981


Testing whether two slopes are 0, say β2 = β3 = 0

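Full model

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \epsilon_i$$

$$SSE(F) = SSE(X_1, X_2, X_3), \qquad df_F = n - 4$$

Reduced model

$$y_i = \beta_0 + \beta_1 x_{i1} + \epsilon_i$$

$$SSE(R) = SSE(X_1), \qquad df_R = n - 2$$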


Testing whether two slopes are 0, say β2 = β3 = 0

The general linear test statistic:

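$$F^* = \frac{\left(SSE(R) - SSE(F)\right) / (df_R - df_F)}{SSE(F) / df_F}$$

becomes a partial F-test:

$$F^* = \frac{SSR(X_2, X_3 \mid X_1) / 2}{SSE(X_1, X_2, X_3) / (n-4)} = \frac{MSR(X_2, X_3 \mid X_1)}{MSE(X_1, X_2, X_3)}$$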



Testing whether β2 = β3 = 0

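$$F^* = \frac{SSR(X_2, X_3 \mid X_1) / 2}{SSE(X_1, X_2, X_3) / (n-4)} = \frac{(0.31453 + 0.01981) / 2}{0.01946} = 8.59$$

Here SSR(X2, X3 | X1) = SSR(X2 | X1) + SSR(X3 | X1, X2), the sum of the Seq SS for X2 and X3 when AreaSize is entered first.

F distribution with 2 DF in numerator and 28 DF in denominator
x         P( X <= x )
8.5900    0.9988

$$P = 1 - 0.9988 = 0.0012 < 0.05$$

So H0: β2 = β3 = 0 is rejected at the 0.05 level: after controlling for the size of the region at risk, infarct size is significantly related to the type of treatment.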
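A minimal Python sketch of the same calculation, including the P-value. Instead of a statistics library it uses the closed-form upper tail of an F distribution with 2 numerator degrees of freedom, P(F > f) = (1 + 2f/df_den)^(−df_den/2), which holds only for that numerator df; the variable names are illustrative.

```python
# Partial F-test of H0: beta2 = beta3 = 0 in the rabbit study (values from the output above).
ssr_x2_given_x1 = 0.31453       # Seq SS for X2 (AreaSize entered first)
ssr_x3_given_x1_x2 = 0.01981    # Seq SS for X3
mse_full = 0.01946              # MSE(X1, X2, X3)
df_den = 28                     # n - 4

f_star = ((ssr_x2_given_x1 + ssr_x3_given_x1_x2) / 2) / mse_full

# Closed-form upper tail for an F distribution with 2 numerator df:
# P(F > f) = (1 + 2 f / df_den) ** (-df_den / 2)
p_value = (1 + 2 * f_star / df_den) ** (-df_den / 2)
print(round(f_star, 2), round(p_value, 4))  # 8.59 0.0012
```

For other numerator degrees of freedom a library routine such as SciPy's `scipy.stats.f.sf(x, dfn, dfd)` would be the usual choice.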