Multiple linear regression


(Simple) Multiple linear regression

Multiple regression

• One response (dependent) variable: Y
• More than one predictor (independent) variable: X1, X2, X3, etc.
  – number of predictors = p
• Number of observations = n


Multiple regression - graphical interpretation

[Figure: scatterplots of Y against X1 and Y against X2; data from Multiple regression graphical explanation.syd]

Two possible single-variable models:

1) $y_i = \beta_0 + \beta_1 x_{i1} + \varepsilon_i$

2) $y_i = \beta_0 + \beta_2 x_{i2} + \varepsilon_i$

Which is a better fit?

Fitting each single-variable model separately:

[Figure: Y against X1 with fitted line (P = 0.02, r² = 0.67); Y against X2 (P = 0.61, r² = 0.00)]
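As a cross-check, here is a minimal Python sketch (assuming numpy and scipy are available) that fits both single-variable models to the six observations tabulated in the worked example that follows; the slide's statistics come from a Systat analysis, so the reported values may not match this output exactly.

```python
import numpy as np
from scipy import stats

# The six observations from the worked example later in this section
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([11.5, 9.25, 9.25, 11.2, 11.9, 8.0])
y = np.array([4.0, 3.0, 5.0, 9.0, 11.5, 9.0])

# Fit each single-variable model and report its fit
for name, x in (("X1", x1), ("X2", x2)):
    fit = stats.linregress(x, y)
    print(f"Y on {name}: slope={fit.slope:.2f}, r2={fit.rvalue**2:.2f}, P={fit.pvalue:.3f}")
```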


Multiple regression - graphical interpretation

Multiple regression graphical explanation.syd

Perhaps a multiple regression model would fit better:

$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i$

[Figure: Y against X1 with the fitted line $\hat{y}_i = b_0 + b_1 x_{i1}$ and the residual marked for each point; Y against X2]

The fitted ("expected") values come from the simple regression $\hat{y}_i = b_0 + b_1 x_{i1}$; each residual is $y_i - \hat{y}_i$:

X1   Y      Expected   Residual   X2
1    4.0     3.02       0.98      11.5
2    3.0     4.58      -1.58       9.25
3    5.0     6.14      -1.14       9.25
4    9.0     7.70       1.30      11.2
5    11.5    9.26       2.24      11.9
6    9.0    10.82      -1.82       8.0

Plotting the residuals from the Y-on-X1 fit against X2 shows whether X2 can account for the leftover variation:

[Figure: residuals of the Y-on-X1 regression ($\hat{y}_i = b_0 + b_1 x_{i1}$) plotted against X2]
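A sketch of that residual check, continuing the snippet above: regress Y on X1, then look at the residuals against X2.

```python
# Residuals from the Y-on-X1 fit
fit1 = stats.linregress(x1, y)
resid = y - (fit1.intercept + fit1.slope * x1)

# If X2 explains the leftover variation, this regression will show it
fit_resid = stats.linregress(x2, resid)
print(f"residuals on X2: slope={fit_resid.slope:.2f}, P={fit_resid.pvalue:.3f}")
```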


Multiple regression - graphical interpretation

Perhaps a multiple regression model would fit better:

$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i$

estimated by

$\hat{y}_i = b_0 + b_1 x_{i1} + b_2 x_{i2}$
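A minimal sketch of estimating b0, b1 and b2 by ordinary least squares with numpy (continuing the snippet above):

```python
# Design matrix with an intercept column: columns are 1, x1, x2
X = np.column_stack([np.ones_like(x1), x1, x2])

# Least-squares estimates of b0, b1, b2
b, rss, rank, sv = np.linalg.lstsq(X, y, rcond=None)
print("b0, b1, b2 =", np.round(b, 3))
```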

Multiple regression - statistics and partial residual plots

Multiple regression 1.syd

[Figure: schematic of predictors X1–X4 and response Y]

Overall model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4$


Simple regression results

Multiple regression 1.syd

[Figure: schematic of predictors X1–X4 and response Y]

Model                          P-value
$y = \beta_0 + \beta_1 x_1$    <0.00001
$y = \beta_0 + \beta_1 x_2$    0.366
$y = \beta_0 + \beta_1 x_3$    0.0127
$y = \beta_0 + \beta_1 x_4$    0.580

Multiple regression - statistics

Full model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4$ (Multiple regression 1.syd)

P-values based on simple regressions: X1: 0.0001, X2: 0.366, X3: 0.0127, X4: 0.580

• Akaike (corrected) Information Criterion (lower is better)
• Bayesian Information Criterion (lower is better)


Multiple regression - partial residual plots

Multiple regression 1.syd

Full model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4$

Model fitted (one predictor omitted)                        Partial residual
$y = \beta_0 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4$     Ypartial(1)
$y = \beta_0 + \beta_1 x_1 + \beta_3 x_3 + \beta_4 x_4$     Ypartial(2)
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_4 x_4$     Ypartial(3)
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$     Ypartial(4)

[Figure: partial residuals Ypartial(1)–Ypartial(4) plotted against X1–X4, and the raw data (Y) plotted against X1–X4 for comparison]
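A sketch of the construction the table above describes: the partial residual for a predictor is the set of residuals from the model that omits it. The data here are synthetic stand-ins for Multiple regression 1.syd.

```python
import numpy as np

def partial_residual(y, X, j):
    """Residuals of y regressed on every predictor column of X except column j."""
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    return y - A @ b

# Synthetic example data (stand-in for the slide's dataset)
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))                       # predictors X1..X4
y = 3 + 2 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=30)

y_partial_1 = partial_residual(y, X, 0)            # Ypartial(1): Y adjusted for X2, X3, X4
```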


Regression models

Linear model:

$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \varepsilon_i$

Sample equation:

$\hat{y}_i = b_0 + b_1 x_{i1} + b_2 x_{i2} + \dots$

Partial regression coefficients

• H0: $\beta_1 = 0$
• The partial population regression coefficient (slope) for Y on X1, holding all other X's constant, equals zero
• Example: assume Y = bird abundance, X1 = patch area, and X2 = year
  – the slope of the regression of Y against patch area, holding year constant, equals 0


Multiple regression plane

[Figure: regression plane of bird abundance against years and patch area]

Testing H0: $\beta_i = 0$

• Use partial t-tests: $t = b_i / SE_{b_i}$ (see the sketch below)
• Compare with the t-distribution on the residual degrees of freedom, $n - p - 1$ for p predictors (which reduces to the familiar n − 2 in simple regression)
• Separate t-test for each partial regression coefficient in the model
• Usual logic of t-tests:
  – reject H0 if P < 0.05 (again, this is convention – don't feel tied to it)
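A minimal sketch of these partial t-tests (a standard OLS construction in plain numpy/scipy, not the slide's Systat output):

```python
import numpy as np
from scipy import stats

def partial_t_tests(X, y):
    """Partial t-tests for an OLS fit; X must already include an intercept column."""
    n, k = X.shape                                  # k = number of fitted parameters
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    s2 = resid @ resid / (n - k)                    # residual mean square
    se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
    t = b / se
    p = 2 * stats.t.sf(np.abs(t), df=n - k)         # two-tailed P-values
    return b, se, t, p
```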


Overall regression model

• H0: $\beta_1 = \beta_2 = \dots = 0$ (all population slopes equal zero)
• A test of whether the overall regression equation is significant
• Use the ANOVA F-test, which compares:
  – variation explained by the regression
  – unexplained (residual) variation
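Concretely, the usual form of that F-statistic (standard OLS theory; the slide itself does not show the formula):

$$F = \frac{SS_{\text{regression}} / p}{SS_{\text{residual}} / (n - p - 1)}$$

compared against an F-distribution with p and n − p − 1 degrees of freedom.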

Assumptions

• Normality and homogeneity of variance for the response variable (previously discussed)
• Independence of observations (previously discussed)
• Linearity (previously discussed)
• No collinearity (a big deal in multiple regression)


Collinearity

• Collinearity: predictors are correlated
• Assumption of no collinearity: predictor variables are uncorrelated with (i.e. independent of) each other
• Effect of collinearity: estimates of the $\beta_i$'s and their significance tests become unreliable

Checks for collinearity

• Correlation matrix and/or SPLOM between predictors
• Tolerance for each predictor:
  – 1 − r² for the regression of that predictor on all the others
  – if tolerance is low (near 0.1), collinearity is a problem
• VIF (variance inflation factor) values: 1/tolerance – look for large values (>10)
• Condition indices (not in JMP – Pro):
  – greater than 15: be cautious
  – greater than 30: a serious problem
• Look at all the indicators to determine the extent of collinearity (a sketch of the tolerance/VIF computation follows this list)
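A sketch of computing tolerance and VIF directly from the definitions above (plain numpy; X is a predictor matrix without an intercept column):

```python
import numpy as np

def tolerance_and_vif(X):
    """Tolerance (1 - r2 of each predictor on the others) and VIF (1/tolerance)."""
    n, p = X.shape
    results = []
    for j in range(p):
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])
        b, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        resid = X[:, j] - A @ b
        r2 = 1 - resid @ resid / np.sum((X[:, j] - X[:, j].mean()) ** 2)
        tol = 1 - r2
        results.append((tol, 1 / tol))
    return results
```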


Scatterplots

• Scatterplot matrix (SPLOM): pairwise plots for all variables
• Example: build a multiple regression model to predict total employment using six independent variables (see Longley.syd):
  – MODEL total = CONSTANT + deflator + gnp + unemployment + armforce + population + time

[Figure: SPLOM of DEFLATOR, GNP, UNEMPLOY, ARMFORCE, POPULATN, TIME. Looking at the relationships between the predictor variables, you can immediately see collinearity problems]



Condition indices

Dimension:        1         2         3          4          5           6            7
Condition index:  1.00000   9.14172   12.25574   25.33661   230.42395   1048.08030   43275.04738

Dependent Variable: TOTAL
N: 16
Multiple R: 0.998
Squared Multiple R: 0.995
Adjusted Squared Multiple R: 0.992
Standard Error of Estimate: 304.854

Effect     Coefficient    Std Error     Std Coef   Tolerance   t          P(2 Tail)
CONSTANT   -3.48226E+06   8.90420E+05    0.00000   .           -3.91080   0.00356
DEFLATOR   15.06187       84.91493       0.04628   0.00738      0.17738   0.86314
GNP        -0.03582       0.03349       -1.01375   0.00056     -1.06952   0.31268
UNEMPLOY   -2.02023       0.48840       -0.53754   0.02975     -4.13643   0.00254
ARMFORCE   -1.03323       0.21427       -0.20474   0.27863     -4.82199   0.00094
POPULATN   -0.05110       0.22607       -0.10122   0.00251     -0.22605   0.82621
TIME       1829.15146     455.47850      2.47966   0.00132      4.01589   0.00304

Tolerance and condition indices (Longley.syz)

Variance inflation factor (VIF):

Confidence Intervals for Regression Coefficients (95.0%)

Effect     Coefficient      Lower             Upper             VIF
CONSTANT   -3.482259E+06    -5.496529E+06     -1.467988E+06     .
DEFLATOR   15.061872        -177.029036       207.152780        135.532438
GNP        -0.035819        -0.111581         0.039943          1,788.513483
UNEMPLOY   -2.020230        -3.125067         -0.915393         33.618891
ARMFORCE   -1.033227        -1.517949         -0.548505         3.588930
POPULATN   -0.051104        -0.562517         0.460309          399.151022
TIME       1,829.151465     798.787513        2,859.515416      758.980597
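A sketch of how such condition indices can be computed: scale the design-matrix columns to unit length and take ratios of singular values. statsmodels ships the Longley data, used here as a stand-in for Longley.syz; scaling conventions differ between packages, so the values may not match the Systat output exactly.

```python
import numpy as np
import statsmodels.api as sm

data = sm.datasets.longley.load_pandas()
X = sm.add_constant(data.exog)                     # design matrix with intercept

# Scale each column to unit length, then compute singular values
Xs = X / np.sqrt((X ** 2).sum(axis=0))
sv = np.linalg.svd(Xs.to_numpy(), compute_uv=False)
print("condition indices:", np.round(sv.max() / sv, 1))
```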


Solutions to collinearity

• Simplest: drop redundant (correlated) predictors
• Principal components regression: potentially useful

Best model?

• The model that best fits the data with the fewest predictors
• Criteria for comparing the fit of different models:
  – r²: generally unsuitable
  – adjusted r²: better
  – Mallows' Cp: better
  – AIC: best – lower values indicate better fit


Explained variance

$r^2$ is the proportion of variation in Y explained by the linear relationship with X1, X2, etc.:

$$r^2 = \frac{SS_{\text{Regression}}}{SS_{\text{Total}}}$$

Screening models

• All subsets – recommended
  – many models if there are many predictors (a big problem)
• Automated stepwise selection: forward, backward, stepwise
  – NOT recommended unless you get the same model both ways
• Check AIC values
• Hierarchical partitioning – contribution of each predictor to r²


Model comparison (simple version)

• Fit the full model:
  – $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots$
• Fit reduced models (e.g.):
  – $y = \beta_0 + \beta_2 x_2 + \beta_3 x_3 + \dots$
• Compare

Multiple regression 1.syd

[Figure: SPLOM of X1–X4 and Y for the full model $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4$]

Any evidence of collinearity?

Model Building


Again, check for collinearity.

Compare models using AIC

• Model 1: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4$
  – AIC 78.67
  – corrected AIC 85.67
• Model 2: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$
  – AIC 77.06
  – corrected AIC 81.67


Formally: Akaike information criterion (AIC, AICc)

AIC: $n\left[\ln(2\pi\hat{\sigma}^2) + 1\right] + 2(k+1)$

AICc: $n\left[\ln(2\pi\hat{\sigma}^2) + 1\right] + 2(k+1) + \dfrac{2(k+1)(k+2)}{n-k-2}$

Sometimes the following equation is used: $\mathrm{AIC} = 2k + n\ln(\mathrm{RSS}/n)$

where
k = number of fitted parameters
n = number of observations
$\hat{\sigma}^2$ = residual sum of squares (RSS) / n

AICc is corrected for small sample size. A lower score means a better fit.
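These formulas translate directly into code; a minimal sketch, assuming the residual sum of squares from an OLS fit is already in hand:

```python
import math

def aic_aicc(rss, n, k):
    """Gaussian AIC and small-sample AICc for a least-squares fit with k fitted parameters."""
    sigma2 = rss / n                                # ML estimate of residual variance
    aic = n * (math.log(2 * math.pi * sigma2) + 1) + 2 * (k + 1)
    aicc = aic + 2 * (k + 1) * (k + 2) / (n - k - 2)
    return aic, aicc
```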

Model Selection


How important is each predictor variable to the model?

Compare models – sequential sums of squares

For reference, the output from the full model $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4$; the contribution of each predictor is the increase in the model's fit when it is added:

Model                                                                    Adjusted r²   Contribution to model r²
$y = \beta_0 + \beta_1 x_1$                                              0.96844        0.96844
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$                                0.99587        0.02743
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$                  0.99974        0.00387
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4$    0.99973       -0.00001
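The adjusted r² column penalizes each added predictor; a minimal sketch of the standard formula (not shown on the slide):

```python
def adjusted_r2(r2, n, p):
    """Adjusted r-squared for n observations and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)
```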

(Simple) Non-linear regression models


Non-linear regression

• Use when you cannot easily linearize a relationship (that is clearly non-linear)
• One response (dependent) variable: Y
• One predictor (independent) variable: X1
• Non-linear functions (of many types)

Regression models

Linear model:

$y_i = \beta_0 + \beta_1 x_1 + \varepsilon_i$

Non-linear model (one of many possible):

$y_i = \beta_0 + \beta_1 x_1^2 + \varepsilon_i$


Non-linear regression

• What is the hypothesis?
  – This is a very big question – let's come back to it
• What does r² mean?
  – In linear regression it is the explained variance divided by the total variance
  – In non-linear regression it is the same, but the explained variance can be calculated in two ways:
    • Raw r²: based on the uncorrected total sum of squares, $\sum y_i^2$
    • Mean-corrected r²: based on $\sum (y_i - \bar{y})^2$

[Figure: scatterplot of Y against X showing a clearly non-linear relationship]


Non-linear regression (for example)

$y = a + b\,e^{cx}$

What are the hypotheses?

Non-linear regression (many models might be adequate)

What are the hypotheses?

• Exponential 2p: $Y = a\,e^{bX}$
• Exponential 3p: $Y = a + b\,e^{cX}$
• Polynomial cubic: $Y = a + bX + cX^2 + dX^3$
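A sketch of fitting the 3-parameter exponential with scipy.optimize.curve_fit; the synthetic data and the starting values p0 are assumptions for illustration, not the slide's dataset.

```python
import numpy as np
from scipy.optimize import curve_fit

def exp3p(x, a, b, c):
    """Exponential 3p: Y = a + b*exp(c*X)."""
    return a + b * np.exp(c * x)

# Synthetic data resembling the slide's curved scatterplot
rng = np.random.default_rng(1)
x = np.linspace(0, 16, 40)
y = 5 + 2 * np.exp(0.2 * x) + rng.normal(scale=2.0, size=x.size)

params, cov = curve_fit(exp3p, x, y, p0=(1.0, 1.0, 0.1))
print("a, b, c =", np.round(params, 3))
```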


What are the hypotheses? Each model carries a hypothesis for each of its parameters:

• Exponential 2p: $Y = a\,e^{bX}$ – parameters a, b
• Exponential 3p: $Y = a + b\,e^{cX}$ – parameters a, b, c
• Polynomial cubic: $Y = a + bX + cX^2 + dX^3$ – parameters a, b, c, d

Comparing regression models

• Evaluate the assumptions – sometimes (as in the examples here) there are violations
• Simple (but not always correct): compare adjusted r²
• Problem: what counts?
  – Particularly problematic when the models differ in the number of estimated parameters
• One solution: compare the added fit to the expected added fit (given the increased number of parameters)
  – One major restriction: models that are 'nested' are easier to compare
  – "Nested" means the general form is the same, or can be made the same simply by modifying parameter values



Multiple and Non-Linear Regression

• Be careful!
• Know what your hypotheses are
• Understand how to build models to test your hypotheses
• Understand statistical output – you may be misled if you don't