Multiple linear regression


(Simple) Multiple linear regression

Multiple regression

• One response (dependent) variable: Y
• More than one predictor (independent) variable: X1, X2, X3, etc.
  – number of predictors = p
• Number of observations = n


Multiple regression - graphical interpretation

[Figure: scatterplots of Y against X1 and Y against X2; data from Multiple regression graphical explanation.syd]

Two possible single-variable models:

1) $y_i = \beta_0 + \beta_1 x_{i1} + \varepsilon_i$

2) $y_i = \beta_0 + \beta_2 x_{i2} + \varepsilon_i$

Which is a better fit?

Fitting each single-variable model separately:

[Figure: Y against X1 with fitted line (P = 0.02, r² = 0.67); Y against X2 (P = 0.61, r² = 0.00)]
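As a cross-check, here is a minimal Python sketch (assuming numpy and scipy are available) that fits both single-variable models to the six observations tabulated in the worked example that follows; the slide's statistics come from a Systat analysis, so the reported values may not match this output exactly.

```python
import numpy as np
from scipy import stats

# The six observations from the worked example later in this section
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([11.5, 9.25, 9.25, 11.2, 11.9, 8.0])
y = np.array([4.0, 3.0, 5.0, 9.0, 11.5, 9.0])

# Fit each single-variable model and report its fit
for name, x in (("X1", x1), ("X2", x2)):
    fit = stats.linregress(x, y)
    print(f"Y on {name}: slope={fit.slope:.2f}, r2={fit.rvalue**2:.2f}, P={fit.pvalue:.3f}")
```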


Multiple regression - graphical interpretation

Multiple regression graphical explanation.syd

Perhaps a multiple regression model would fit better:

$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i$

[Figure: Y against X1 with the fitted line $\hat{y}_i = b_0 + b_1 x_{i1}$ and the residual marked for each point; Y against X2]

The fitted ("expected") values come from the simple regression $\hat{y}_i = b_0 + b_1 x_{i1}$; each residual is $y_i - \hat{y}_i$:

X1   Y      Expected   Residual   X2
1    4.0     3.02       0.98      11.5
2    3.0     4.58      -1.58       9.25
3    5.0     6.14      -1.14       9.25
4    9.0     7.70       1.30      11.2
5    11.5    9.26       2.24      11.9
6    9.0    10.82      -1.82       8.0

Plotting the residuals from the Y-on-X1 fit against X2 shows whether X2 can account for the leftover variation:

[Figure: residuals of the Y-on-X1 regression ($\hat{y}_i = b_0 + b_1 x_{i1}$) plotted against X2]
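A sketch of that residual check, continuing the snippet above: regress Y on X1, then look at the residuals against X2.

```python
# Residuals from the Y-on-X1 fit
fit1 = stats.linregress(x1, y)
resid = y - (fit1.intercept + fit1.slope * x1)

# If X2 explains the leftover variation, this regression will show it
fit_resid = stats.linregress(x2, resid)
print(f"residuals on X2: slope={fit_resid.slope:.2f}, P={fit_resid.pvalue:.3f}")
```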


Multiple regression - graphical interpretation

Perhaps a multiple regression model would fit better:

$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i$

estimated by

$\hat{y}_i = b_0 + b_1 x_{i1} + b_2 x_{i2}$
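A minimal sketch of estimating b0, b1 and b2 by ordinary least squares with numpy (continuing the snippet above):

```python
# Design matrix with an intercept column: columns are 1, x1, x2
X = np.column_stack([np.ones_like(x1), x1, x2])

# Least-squares estimates of b0, b1, b2
b, rss, rank, sv = np.linalg.lstsq(X, y, rcond=None)
print("b0, b1, b2 =", np.round(b, 3))
```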

Multiple regression - statistics and partial residual plots

Multiple regression 1.syd

[Figure: schematic of predictors X1–X4 and response Y]

Overall model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4$


Simple regression results

Multiple regression 1.syd

[Figure: schematic of predictors X1–X4 and response Y]

Model                          P-value
$y = \beta_0 + \beta_1 x_1$    <0.00001
$y = \beta_0 + \beta_1 x_2$    0.366
$y = \beta_0 + \beta_1 x_3$    0.0127
$y = \beta_0 + \beta_1 x_4$    0.580

Multiple regression - statistics

Full model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4$ (Multiple regression 1.syd)

P-values based on simple regressions: X1: 0.0001, X2: 0.366, X3: 0.0127, X4: 0.580

• Akaike (corrected) Information Criterion (lower is better)
• Bayesian Information Criterion (lower is better)


Multiple regression - partial residual plots

Multiple regression 1.syd

Full model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4$

Model fitted (one predictor omitted)                        Partial residual
$y = \beta_0 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4$     Ypartial(1)
$y = \beta_0 + \beta_1 x_1 + \beta_3 x_3 + \beta_4 x_4$     Ypartial(2)
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_4 x_4$     Ypartial(3)
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$     Ypartial(4)

[Figure: partial residuals Ypartial(1)–Ypartial(4) plotted against X1–X4, and the raw data (Y) plotted against X1–X4 for comparison]
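A sketch of the construction the table above describes: the partial residual for a predictor is the set of residuals from the model that omits it. The data here are synthetic stand-ins for Multiple regression 1.syd.

```python
import numpy as np

def partial_residual(y, X, j):
    """Residuals of y regressed on every predictor column of X except column j."""
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    return y - A @ b

# Synthetic example data (stand-in for the slide's dataset)
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))                       # predictors X1..X4
y = 3 + 2 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=30)

y_partial_1 = partial_residual(y, X, 0)            # Ypartial(1): Y adjusted for X2, X3, X4
```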


Regression models

Linear model:

$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \varepsilon_i$

Sample equation:

$\hat{y}_i = b_0 + b_1 x_{i1} + b_2 x_{i2} + \dots$

Partial regression coefficients

• H0: $\beta_1 = 0$
• The partial population regression coefficient (slope) for Y on X1, holding all other X's constant, equals zero
• Example: assume Y = bird abundance, X1 = patch area, and X2 = year
  – the slope of the regression of Y against patch area, holding year constant, equals 0


Multiple regression plane

[Figure: regression plane of bird abundance against years and patch area]

Testing H0: $\beta_i = 0$

• Use partial t-tests: $t = b_i / SE_{b_i}$ (see the sketch below)
• Compare with the t-distribution on the residual degrees of freedom, $n - p - 1$ for p predictors (which reduces to the familiar n − 2 in simple regression)
• Separate t-test for each partial regression coefficient in the model
• Usual logic of t-tests:
  – reject H0 if P < 0.05 (again, this is convention – don't feel tied to it)
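A minimal sketch of these partial t-tests (a standard OLS construction in plain numpy/scipy, not the slide's Systat output):

```python
import numpy as np
from scipy import stats

def partial_t_tests(X, y):
    """Partial t-tests for an OLS fit; X must already include an intercept column."""
    n, k = X.shape                                  # k = number of fitted parameters
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    s2 = resid @ resid / (n - k)                    # residual mean square
    se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
    t = b / se
    p = 2 * stats.t.sf(np.abs(t), df=n - k)         # two-tailed P-values
    return b, se, t, p
```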


Overall regression model

• H0: $\beta_1 = \beta_2 = \dots = 0$ (all population slopes equal zero)
• A test of whether the overall regression equation is significant
• Use the ANOVA F-test, which compares:
  – variation explained by the regression
  – unexplained (residual) variation
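Concretely, the usual form of that F-statistic (standard OLS theory; the slide itself does not show the formula):

$$F = \frac{SS_{\text{regression}} / p}{SS_{\text{residual}} / (n - p - 1)}$$

compared against an F-distribution with p and n − p − 1 degrees of freedom.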

Assumptions

• Normality and homogeneity of variance for the response variable (previously discussed)
• Independence of observations (previously discussed)
• Linearity (previously discussed)
• No collinearity (a big deal in multiple regression)


Collinearity

• Collinearity: predictors are correlated
• Assumption of no collinearity: predictor variables are uncorrelated with (i.e. independent of) each other
• Effect of collinearity: estimates of the $\beta_i$'s and their significance tests become unreliable

Checks for collinearity

• Correlation matrix and/or SPLOM between predictors
• Tolerance for each predictor:
  – 1 − r² for the regression of that predictor on all the others
  – if tolerance is low (near 0.1), collinearity is a problem
• VIF (variance inflation factor) values: 1/tolerance – look for large values (>10)
• Condition indices (not in JMP – Pro):
  – greater than 15: be cautious
  – greater than 30: a serious problem
• Look at all the indicators to determine the extent of collinearity (a sketch of the tolerance/VIF computation follows this list)
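A sketch of computing tolerance and VIF directly from the definitions above (plain numpy; X is a predictor matrix without an intercept column):

```python
import numpy as np

def tolerance_and_vif(X):
    """Tolerance (1 - r2 of each predictor on the others) and VIF (1/tolerance)."""
    n, p = X.shape
    results = []
    for j in range(p):
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])
        b, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        resid = X[:, j] - A @ b
        r2 = 1 - resid @ resid / np.sum((X[:, j] - X[:, j].mean()) ** 2)
        tol = 1 - r2
        results.append((tol, 1 / tol))
    return results
```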


Scatterplots

• Scatterplot matrix (SPLOM): pairwise plots for all variables
• Example: build a multiple regression model to predict total employment using six independent variables (see Longley.syd):
  – MODEL total = CONSTANT + deflator + gnp + unemployment + armforce + population + time

[Figure: SPLOM of DEFLATOR, GNP, UNEMPLOY, ARMFORCE, POPULATN, TIME. Looking at the relationships between the predictor variables, you can immediately see collinearity problems]



Condition indices

Dimension:        1         2         3          4          5           6            7
Condition index:  1.00000   9.14172   12.25574   25.33661   230.42395   1048.08030   43275.04738

Dependent Variable: TOTAL
N: 16
Multiple R: 0.998
Squared Multiple R: 0.995
Adjusted Squared Multiple R: 0.992
Standard Error of Estimate: 304.854

Effect     Coefficient    Std Error     Std Coef   Tolerance   t          P(2 Tail)
CONSTANT   -3.48226E+06   8.90420E+05    0.00000   .           -3.91080   0.00356
DEFLATOR   15.06187       84.91493       0.04628   0.00738      0.17738   0.86314
GNP        -0.03582       0.03349       -1.01375   0.00056     -1.06952   0.31268
UNEMPLOY   -2.02023       0.48840       -0.53754   0.02975     -4.13643   0.00254
ARMFORCE   -1.03323       0.21427       -0.20474   0.27863     -4.82199   0.00094
POPULATN   -0.05110       0.22607       -0.10122   0.00251     -0.22605   0.82621
TIME       1829.15146     455.47850      2.47966   0.00132      4.01589   0.00304

Tolerance and condition indices (Longley.syz)

Variance inflation factor (VIF):

Confidence Intervals for Regression Coefficients (95.0%)

Effect     Coefficient      Lower             Upper             VIF
CONSTANT   -3.482259E+06    -5.496529E+06     -1.467988E+06     .
DEFLATOR   15.061872        -177.029036       207.152780        135.532438
GNP        -0.035819        -0.111581         0.039943          1,788.513483
UNEMPLOY   -2.020230        -3.125067         -0.915393         33.618891
ARMFORCE   -1.033227        -1.517949         -0.548505         3.588930
POPULATN   -0.051104        -0.562517         0.460309          399.151022
TIME       1,829.151465     798.787513        2,859.515416      758.980597
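A sketch of how such condition indices can be computed: scale the design-matrix columns to unit length and take ratios of singular values. statsmodels ships the Longley data, used here as a stand-in for Longley.syz; scaling conventions differ between packages, so the values may not match the Systat output exactly.

```python
import numpy as np
import statsmodels.api as sm

data = sm.datasets.longley.load_pandas()
X = sm.add_constant(data.exog)                     # design matrix with intercept

# Scale each column to unit length, then compute singular values
Xs = X / np.sqrt((X ** 2).sum(axis=0))
sv = np.linalg.svd(Xs.to_numpy(), compute_uv=False)
print("condition indices:", np.round(sv.max() / sv, 1))
```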


Solutions to collinearity

• Simplest: drop redundant (correlated) predictors
• Principal components regression: potentially useful

Best model?

• The model that best fits the data with the fewest predictors
• Criteria for comparing the fit of different models:
  – r²: generally unsuitable
  – adjusted r²: better
  – Mallows' Cp: better
  – AIC: best – lower values indicate better fit


Explained variance

$r^2$ is the proportion of variation in Y explained by the linear relationship with X1, X2, etc.:

$$r^2 = \frac{SS_{\text{Regression}}}{SS_{\text{Total}}}$$

Screening models

• All subsets – recommended
  – many models if there are many predictors (a big problem)
• Automated stepwise selection: forward, backward, stepwise
  – NOT recommended unless you get the same model both ways
• Check AIC values
• Hierarchical partitioning – contribution of each predictor to r²


Model comparison (simple version)

• Fit the full model:
  – $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots$
• Fit reduced models (e.g.):
  – $y = \beta_0 + \beta_2 x_2 + \beta_3 x_3 + \dots$
• Compare

Multiple regression 1.syd

[Figure: SPLOM of X1–X4 and Y for the full model $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4$]

Any evidence of collinearity?

Model Building


Again, check for collinearity.

Compare models using AIC

• Model 1: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4$
  – AIC 78.67
  – corrected AIC 85.67
• Model 2: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$
  – AIC 77.06
  – corrected AIC 81.67


Formally: Akaike information criterion (AIC, AICc)

AIC: $n\left[\ln(2\pi\hat{\sigma}^2) + 1\right] + 2(k+1)$

AICc: $n\left[\ln(2\pi\hat{\sigma}^2) + 1\right] + 2(k+1) + \dfrac{2(k+1)(k+2)}{n-k-2}$

Sometimes the following equation is used: $\mathrm{AIC} = 2k + n\ln(\mathrm{RSS}/n)$

where
k = number of fitted parameters
n = number of observations
$\hat{\sigma}^2$ = residual sum of squares (RSS) / n

AICc is corrected for small sample size. A lower score means a better fit.
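These formulas translate directly into code; a minimal sketch, assuming the residual sum of squares from an OLS fit is already in hand:

```python
import math

def aic_aicc(rss, n, k):
    """Gaussian AIC and small-sample AICc for a least-squares fit with k fitted parameters."""
    sigma2 = rss / n                                # ML estimate of residual variance
    aic = n * (math.log(2 * math.pi * sigma2) + 1) + 2 * (k + 1)
    aicc = aic + 2 * (k + 1) * (k + 2) / (n - k - 2)
    return aic, aicc
```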

Model Selection


How important is each predictor variable to the model?

Compare models – sequential sums of squares

For reference, the output from the full model $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4$; the contribution of each predictor is the increase in the model's fit when it is added:

Model                                                                    Adjusted r²   Contribution to model r²
$y = \beta_0 + \beta_1 x_1$                                              0.96844        0.96844
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$                                0.99587        0.02743
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$                  0.99974        0.00387
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4$    0.99973       -0.00001
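The adjusted r² column penalizes each added predictor; a minimal sketch of the standard formula (not shown on the slide):

```python
def adjusted_r2(r2, n, p):
    """Adjusted r-squared for n observations and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)
```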

(Simple) Non-linear regression models


Non-linear regression

• Use when you cannot easily linearize a relationship (that is clearly non-linear)
• One response (dependent) variable: Y
• One predictor (independent) variable: X1
• Non-linear functions (of many types)

Regression models

Linear model:

$y_i = \beta_0 + \beta_1 x_1 + \varepsilon_i$

Non-linear model (one of many possible):

$y_i = \beta_0 + \beta_1 x_1^2 + \varepsilon_i$


Non-linear regression

• What is the hypothesis?
  – This is a very big question – let's come back to it
• What does r² mean?
  – In linear regression it is the explained variance divided by the total variance
  – In non-linear regression it is the same, but the explained variance can be calculated in two ways:
    • Raw r²: based on the uncorrected total sum of squares, $\sum y_i^2$
    • Mean-corrected r²: based on $\sum (y_i - \bar{y})^2$

[Figure: scatterplot of Y against X showing a clearly non-linear relationship]


Non-linear regression (for example)

$y = a + b\,e^{cx}$

What are the hypotheses?

Non-linear regression (many models might be adequate)

What are the hypotheses?

• Exponential 2p: $Y = a\,e^{bX}$
• Exponential 3p: $Y = a + b\,e^{cX}$
• Polynomial cubic: $Y = a + bX + cX^2 + dX^3$
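A sketch of fitting the 3-parameter exponential with scipy.optimize.curve_fit; the synthetic data and the starting values p0 are assumptions for illustration, not the slide's dataset.

```python
import numpy as np
from scipy.optimize import curve_fit

def exp3p(x, a, b, c):
    """Exponential 3p: Y = a + b*exp(c*X)."""
    return a + b * np.exp(c * x)

# Synthetic data resembling the slide's curved scatterplot
rng = np.random.default_rng(1)
x = np.linspace(0, 16, 40)
y = 5 + 2 * np.exp(0.2 * x) + rng.normal(scale=2.0, size=x.size)

params, cov = curve_fit(exp3p, x, y, p0=(1.0, 1.0, 0.1))
print("a, b, c =", np.round(params, 3))
```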


What are the hypotheses? Each model carries a hypothesis for each of its parameters:

• Exponential 2p: $Y = a\,e^{bX}$ – parameters a, b
• Exponential 3p: $Y = a + b\,e^{cX}$ – parameters a, b, c
• Polynomial cubic: $Y = a + bX + cX^2 + dX^3$ – parameters a, b, c, d

Comparing regression models

• Evaluate the assumptions – sometimes (as in the examples here) there are violations
• Simple (but not always correct): compare adjusted r²
• Problem: what counts?
  – Particularly problematic when the models differ in the number of estimated parameters
• One solution: compare the added fit to the expected added fit (given the increased number of parameters)
  – One major restriction: models that are 'nested' are easier to compare
  – "Nested" means the general form is the same, or can be made the same simply by modifying parameter values



Multiple and Non-Linear Regression

• Be careful!
• Know what your hypotheses are
• Understand how to build models to test your hypotheses
• Understand statistical output – you may be misled if you don't