Chapter 3 Multiple Linear Regression
Ray-Bing Chen
Institute of Statistics
National University of Kaohsiung
3.1 Multiple Regression Models
• Multiple regression model: involves more than one regressor variable.
• Example: the yield in pounds of conversion in a chemical process depends on the temperature x1 and the catalyst concentration x2:
y = β0 + β1x1 + β2x2 + ε
• E(y) = 50 + 10x1 + 7x2
• The response y may be related to k regressor or predictor variables (the multiple linear regression model):
y = β0 + β1x1 + β2x2 + … + βkxk + ε
• The parameter βj represents the expected change in the response y per unit change in xj when all of the remaining regressor variables are held constant.
• Multiple linear regression models are often used as empirical models or approximating functions (the true model is unknown).
• The cubic model:
y = β0 + β1x + β2x² + β3x³ + ε
• The model with interaction effects:
y = β0 + β1x1 + β2x2 + β12x1x2 + ε
• Any regression model that is linear in the parameters is a linear regression model, regardless of the shape of the surface that it generates.
• The second-order model with interaction:
y = β0 + β1x1 + β2x2 + β11x1² + β22x2² + β12x1x2 + ε
3.2 Estimation of the Model Parameters
3.2.1 Least-squares Estimation of the Regression Coefficients
• n observations (n > k)
• Assume:
– The error term ε satisfies E(ε) = 0 and Var(ε) = σ².
– The errors are uncorrelated.
– The regressor variables x1, …, xk are fixed.
• The sample regression model:
yi = β0 + β1xi1 + … + βkxik + εi, i = 1, …, n
• The least-squares function:
S(β0, β1, …, βk) = Σi εi² = Σi (yi − β0 − Σj βjxij)²
• The normal equations are obtained by setting the partial derivatives ∂S/∂βj to zero; the least-squares estimators β̂0, β̂1, …, β̂k are their solution.
• Matrix notation: y = Xβ + ε, where y is n × 1, X is n × p (p = k + 1), β is p × 1, and ε is n × 1.
• The least-squares function:
S(β) = (y − Xβ)'(y − Xβ)
Minimizing S(β) gives the least-squares estimator β̂ = (X'X)⁻¹X'y.
• The fitted model corresponding to the levels of the regressor variables x:
ŷ = Xβ̂ = X(X'X)⁻¹X'y = Hy
• The hat matrix H = X(X'X)⁻¹X' is symmetric and idempotent, i.e. H' = H and H² = H.
• H is an orthogonal projection matrix.
• Residuals: e = y − ŷ = (I − H)y
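These formulas are easy to verify numerically. The sketch below uses NumPy on simulated data (the data, seed, and dimensions are illustrative assumptions, not the chapter's delivery-time example):

```python
import numpy as np

# Simulated data for illustration only (not the book's data set)
rng = np.random.default_rng(0)
n, k = 20, 2
p = k + 1
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])   # n x p model matrix
y = X @ np.array([50.0, 10.0, 7.0]) + rng.normal(scale=2.0, size=n)

# Least-squares estimator: beta_hat = (X'X)^{-1} X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Hat matrix H = X (X'X)^{-1} X', fitted values, residuals
H = X @ np.linalg.solve(X.T @ X, X.T)
y_hat = H @ y
e = y - y_hat

# H is symmetric and idempotent, and the residuals satisfy X'e = 0
assert np.allclose(H, H.T)
assert np.allclose(H @ H, H)
assert np.allclose(X.T @ e, 0, atol=1e-7)
```

Solving the normal equations with `np.linalg.solve` rather than forming (X'X)⁻¹ explicitly is the numerically safer choice; the hat matrix is built here only to check its projection properties.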
• Example 3.1 The Delivery Time Data
– y: the delivery time
– x1: the number of cases of product stocked
– x2: the distance walked by the route driver
– Consider the model y = β0 + β1x1 + β2x2 + ε
3.2.2 A Geometrical Interpretation of Least Squares
• y = (y1, …, yn)' is the vector of observations.
• X contains p (p = k + 1) column vectors, each n × 1, i.e. X = (1, x1, …, xk).
• The column space of X is called the estimation space.
• Any point in the estimation space is of the form Xβ.
• Minimize the squared distance S(β) = (y − Xβ)'(y − Xβ).
• Normal equations: X'(y − Xβ̂) = 0, i.e. X'Xβ̂ = X'y
3.2.3 Properties of the Least-Squares Estimators
• Unbiased estimator:
E(β̂) = E((X'X)⁻¹X'y) = E((X'X)⁻¹X'(Xβ + ε)) = β
• Covariance matrix:
Cov(β̂) = σ²(X'X)⁻¹
• Let C = (X'X)⁻¹; then Var(β̂j) = σ²Cjj and Cov(β̂i, β̂j) = σ²Cij.
• The LSE is the best linear unbiased estimator.
• LSE = MLE under the normality assumption.
3.2.4 Estimation of σ²
• Residual sum of squares:
SSRes = e'e = (y − Xβ̂)'(y − Xβ̂)
= y'y − 2β̂'X'y + β̂'X'Xβ̂
= y'y − β̂'X'y
• The degrees of freedom: n − p.
• The unbiased estimator of σ² is the residual mean square:
σ̂² = MSRes = SSRes/(n − p)
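A quick numerical check of the SSRes identity above, again on simulated data (all values below are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 30, 2
p = k + 1
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=1.5, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat

# SS_Res = e'e = y'y - beta_hat' X'y  (the two computations must agree)
ss_res = y @ y - beta_hat @ (X.T @ y)
assert np.isclose(ss_res, e @ e)

# Unbiased estimate of sigma^2: the residual mean square
ms_res = ss_res / (n - p)
```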
• Example 3.2 The Delivery Time Data
• Both estimates are in a sense correct, but they depend heavily on the choice of model.
• The model with the smaller residual mean square would be the better choice.
3.2.5 Inadequacy of Scatter Diagrams in Multiple Regression
• For the simple linear regression, the scatter diagram is an important tool in analyzing the relationship between y and x.
• However, it may not be as useful in multiple regression. Consider
– y = 8 − 5x1 + 12x2
– The y vs. x1 plot does not exhibit any apparent relationship between y and x1.
– The y vs. x2 plot indicates a linear relationship with slope 8.
• In this case, constructing scatter diagrams of y vs. xj (j = 1, 2, …, k) can be misleading.
• If there is only one dominant regressor (or a few), or if the regressors operate nearly independently, the matrix of scatterplots is most useful.
3.2.6 Maximum-Likelihood Estimation
• The model is y = Xβ + ε with ε ~ N(0, σ²I).
• The likelihood function and log-likelihood function:
L(β, σ²) = (2πσ²)^(−n/2) exp(−(y − Xβ)'(y − Xβ)/(2σ²))
ln L(β, σ²) = −(n/2)ln(2π) − (n/2)ln(σ²) − (y − Xβ)'(y − Xβ)/(2σ²)
• The MLE of β is the least-squares estimator β̂, and the MLE of σ² is
σ̃² = (y − Xβ̂)'(y − Xβ̂)/n
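The MLE of σ² divides by n rather than n − p. A small sketch (simulated data, illustrative seed and sizes) confirming that SSRes/n maximizes the profile log-likelihood at β̂:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([2.0, 3.0]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
ss_res = (y - X @ beta_hat) @ (y - X @ beta_hat)
sigma2_mle = ss_res / n   # MLE divides by n, not n - p

def log_lik(sigma2):
    # Profile log-likelihood with beta fixed at beta_hat
    return -0.5 * n * np.log(2 * np.pi * sigma2) - ss_res / (2 * sigma2)

# The MLE should beat every other candidate value on a grid around it
grid = np.linspace(0.5 * sigma2_mle, 2.0 * sigma2_mle, 201)
best = grid[np.argmax([log_lik(s) for s in grid])]
assert abs(best - sigma2_mle) <= grid[1] - grid[0]
```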
3.3 Hypothesis Testing in Multiple Linear Regression
• Questions:
– What is the overall adequacy of the model?
– Which specific regressors seem important?
• Assume the errors are independent and follow a normal distribution with mean 0 and variance σ².
3.3.1 Test for Significance of Regression
• Determine if there is a linear relationship between y and the regressors xj, j = 1, 2, …, k.
• The hypotheses are
H0: β1 = β2 = … = βk = 0
H1: βj ≠ 0 for at least one j
• ANOVA identity: SST = SSR + SSRes
• SSR/σ² ~ χ²(k), SSRes/σ² ~ χ²(n − k − 1), and SSR and SSRes are independent.
• The test statistic:
F0 = (SSR/k) / (SSRes/(n − k − 1)) = MSR/MSRes ~ F(k, n − k − 1) under H0
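The ANOVA partition and the F statistic can be computed directly. The sketch below uses simulated data (seed, coefficients, and sizes are assumptions, not the book's example); note that SSR is obtained two equivalent ways:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 25, 2
p = k + 1
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([10.0, 4.0, -3.0]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
ss_res = y @ y - beta_hat @ (X.T @ y)
ss_t = y @ y - n * y.mean() ** 2              # corrected total sum of squares
ss_r = ss_t - ss_res                           # via the partition SST = SSR + SSRes
ss_r_direct = beta_hat @ (X.T @ y) - n * y.mean() ** 2   # direct formula
assert np.isclose(ss_r, ss_r_direct)

# Overall F statistic for H0: beta_1 = ... = beta_k = 0
f0 = (ss_r / k) / (ss_res / (n - k - 1))
```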
• The expected mean squares are
E(MSRes) = σ²
E(MSR) = σ² + β*'Xc'Xcβ*/k
where β* = (β1, …, βk)' and Xc is the centered regressor matrix with elements xij − x̄j.
• Under H1, F0 follows a noncentral F distribution with k and n − k − 1 degrees of freedom and noncentrality parameter
λ = β*'Xc'Xcβ*/σ²
• ANOVA table:
Source of Variation   Sum of Squares   d.f.        Mean Square   F0
Regression            SSR              k           MSR           MSR/MSRes
Residual              SSRes            n − k − 1   MSRes
Total                 SST              n − 1
• Example 3.3 The Delivery Time Data
• R² and adjusted R²
– R² always increases when a regressor is added to the model, regardless of the value of the contribution of that variable.
– The adjusted R²:
R²adj = 1 − (SSRes/(n − p)) / (SST/(n − 1))
– The adjusted R² will only increase on adding a variable to the model if the addition of the variable reduces the residual mean square.
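Both points can be seen numerically. In the sketch below (simulated data; the noise regressor, seed, and sizes are assumptions) a pure-noise column is appended to a one-regressor model, and R² never decreases:

```python
import numpy as np

def fit_r2(X, y):
    """Return (R^2, adjusted R^2) for a least-squares fit of y on X."""
    n, p = X.shape
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    ss_res = y @ y - beta @ (X.T @ y)
    ss_t = y @ y - n * y.mean() ** 2
    r2 = 1 - ss_res / ss_t
    r2_adj = 1 - (ss_res / (n - p)) / (ss_t / (n - 1))
    return r2, r2_adj

rng = np.random.default_rng(4)
n = 30
x1 = rng.normal(size=n)
y = 2 + 3 * x1 + rng.normal(size=n)
X1 = np.column_stack([np.ones(n), x1])
X2 = np.column_stack([X1, rng.normal(size=n)])   # add a pure-noise regressor

r2_small, adj_small = fit_r2(X1, y)
r2_big, adj_big = fit_r2(X2, y)
# R^2 cannot decrease when a regressor is added
assert r2_big >= r2_small
```

Adjusted R² may go either way here, which is exactly the point: it rises only when the added variable reduces MSRes.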
3.3.2 Tests on Individual Regression Coefficients
• For an individual regression coefficient:
– H0: βj = 0 vs. H1: βj ≠ 0
– Let Cjj be the j-th diagonal element of (X'X)⁻¹. The test statistic:
t0 = β̂j / √(σ̂²Cjj) = β̂j / se(β̂j) ~ t(n − k − 1) under H0
– This is a partial or marginal test because the estimate of the regression coefficient depends on all of the other regressors in the model.
– This test measures the contribution of xj given the other regressors in the model.
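Computing the t statistics directly from (X'X)⁻¹, as a sketch on simulated data (coefficients and seed are assumptions; the second slope is set to zero so its t statistic should be small):

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 25, 2
p = k + 1
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 5.0, 0.0]) + rng.normal(size=n)   # beta_2 = 0 by design

C = np.linalg.inv(X.T @ X)                  # C = (X'X)^{-1}
beta_hat = C @ X.T @ y
ss_res = y @ y - beta_hat @ (X.T @ y)
ms_res = ss_res / (n - p)

# t_0 for each coefficient: beta_hat_j / sqrt(sigma2_hat * C_jj)
se = np.sqrt(ms_res * np.diag(C))
t0 = beta_hat / se
```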
• Example 3.4 The Delivery Time Data
• Tests on a subset of regressors: partition the coefficient vector as β = (β1', β2')', where β2 contains the last r coefficients, so that the model becomes y = Xβ + ε = X1β1 + X2β2 + ε; test H0: β2 = 0 vs. H1: β2 ≠ 0.
• For the full model, the regression sum of squares is
SSR(β) = β̂'X'y (p degrees of freedom)
• Under the null hypothesis, the regression sum of squares for the reduced model y = X1β1 + ε is
SSR(β1) = β̂1'X1'y, where β̂1 = (X1'X1)⁻¹X1'y
• The reduced model has p − r degrees of freedom.
• The regression sum of squares due to β2 given that β1 is already in the model:
SSR(β2|β1) = SSR(β) − SSR(β1)
• This is called the extra sum of squares due to β2, with p − (p − r) = r degrees of freedom.
• The test statistic:
F0 = (SSR(β2|β1)/r) / MSRes ~ F(r, n − p) under H0
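The extra-sum-of-squares computation can be sketched as follows (simulated data; the coefficient values and seed are assumptions for illustration):

```python
import numpy as np

def ss_reg(X, y):
    """Regression sum of squares SSR(beta) = beta_hat' X'y."""
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    return beta @ (X.T @ y)

rng = np.random.default_rng(6)
n = 30
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept + x1
x2 = rng.normal(size=n)
X = np.column_stack([X1, x2])                            # full model adds x2
y = X @ np.array([1.0, 2.0, 4.0]) + rng.normal(size=n)

p, r = X.shape[1], 1
ss_res_full = y @ y - ss_reg(X, y)

# Extra sum of squares due to beta_2 given beta_1
ss_extra = ss_reg(X, y) - ss_reg(X1, y)
f0 = (ss_extra / r) / (ss_res_full / (n - p))
assert ss_extra >= 0    # adding regressors can never reduce SSR
```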
• If β2 ≠ 0, F0 follows a noncentral F distribution with noncentrality parameter
λ = (1/σ²) β2'X2'[I − X1(X1'X1)⁻¹X1']X2β2
• Multicollinearity: when X2 lies nearly in the column space of X1, this test has almost no power!
• This test has maximal power when X1 and X2 are orthogonal to one another!
• Partial F test: given the regressors in X1, measure the contribution of the regressors in X2.
• Consider y = β0 + β1x1 + β2x2 + β3x3 + ε.
SSR(β1|β0, β2, β3), SSR(β2|β0, β1, β3), and SSR(β3|β0, β1, β2) are single-degree-of-freedom sums of squares.
• SSR(βj|β0, …, βj−1, βj+1, …, βk): the contribution of xj as if it were the last variable added to the model.
• This F test is equivalent to the t test.
• SST = SSR(β1, β2, β3|β0) + SSRes
• SSR(β1, β2, β3|β0) = SSR(β1|β0) + SSR(β2|β1, β0) + SSR(β3|β1, β2, β0)
• Example 3.5 Delivery Time Data
3.3.3 Special Case of Orthogonal Columns in X
• Model: y = Xβ + ε = X1β1 + X2β2 + ε
• Orthogonal columns: X1'X2 = 0
• The normal equations (X'X)β̂ = X'y then separate into blocks:

[ X1'X1     0    ] [ β̂1 ]   [ X1'y ]
[   0     X2'X2  ] [ β̂2 ] = [ X2'y ]

• Therefore
β̂1 = (X1'X1)⁻¹X1'y and β̂2 = (X2'X2)⁻¹X2'y
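The block decoupling above can be demonstrated numerically. In this sketch (simulated data; the construction of an orthogonal second block via projection is an illustrative assumption), the joint fit splits exactly into two separate fits:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 20
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])

# Build X2 orthogonal to X1 by projecting a random vector off X1's column space
z = rng.normal(size=n)
x2 = z - X1 @ np.linalg.solve(X1.T @ X1, X1.T @ z)
X2 = x2[:, None]
assert np.allclose(X1.T @ X2, 0)    # X1'X2 = 0

y = rng.normal(size=n)
X = np.hstack([X1, X2])

beta_joint = np.linalg.solve(X.T @ X, X.T @ y)
beta1 = np.linalg.solve(X1.T @ X1, X1.T @ y)
beta2 = np.linalg.solve(X2.T @ X2, X2.T @ y)

# With orthogonal blocks, fitting jointly equals fitting each block alone
assert np.allclose(beta_joint, np.concatenate([beta1, beta2]))
```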
3.3.4 Testing the General Linear Hypothesis
• Let T be an m × p matrix with rank(T) = r; test H0: Tβ = 0.
• Full model: y = Xβ + ε, with
SSRes(FM) = y'y − β̂'X'y (n − p degrees of freedom)
• Reduced model: y = Zγ + ε, where Z is an n × (p − r) matrix and γ is a (p − r) × 1 vector. Then
γ̂ = (Z'Z)⁻¹Z'y
SSRes(RM) = y'y − γ̂'Z'y (n − p + r degrees of freedom)
• The difference SSH = SSRes(RM) − SSRes(FM) has r degrees of freedom; SSH is called the sum of squares due to the hypothesis H0: Tβ = 0.
• The test statistic:
F0 = (SSH/r) / (SSRes(FM)/(n − p)) ~ F(r, n − p) under H0
• Another form:
F0 = [(Tβ̂)'[T(X'X)⁻¹T']⁻¹(Tβ̂)/r] / [SSRes(FM)/(n − p)]
• For H0: Tβ = c vs. H1: Tβ ≠ c,
F0 = [(Tβ̂ − c)'[T(X'X)⁻¹T']⁻¹(Tβ̂ − c)/r] / [SSRes(FM)/(n − p)] ~ F(r, n − p) under H0
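The two routes to F0 agree: the reduced-versus-full comparison and the quadratic form in Tβ̂ give the same statistic. A sketch on simulated data (the hypothesis H0: β1 = β2, seed, and sizes are assumptions chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(8)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 2.0, 2.0]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
ss_res_fm = y @ y - beta_hat @ (X.T @ y)

# H0: beta_1 = beta_2, i.e. T beta = 0 with T = [0, 1, -1] and r = 1
T = np.array([[0.0, 1.0, -1.0]])
r = 1

# Quadratic-form version of F0
middle = np.linalg.inv(T @ np.linalg.inv(X.T @ X) @ T.T)
v = T @ beta_hat
f_quad = float(v @ middle @ v) / r / (ss_res_fm / (n - p))

# Reduced-model version: under H0 the model is y = gamma_0 + gamma_1 (x1 + x2) + error
Z = np.column_stack([np.ones(n), X[:, 1] + X[:, 2]])
gamma_hat = np.linalg.solve(Z.T @ Z, Z.T @ y)
ss_res_rm = y @ y - gamma_hat @ (Z.T @ y)
f_red = ((ss_res_rm - ss_res_fm) / r) / (ss_res_fm / (n - p))

assert np.isclose(f_quad, f_red)
```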