Transcript of Mult reg

Page 1: Mult reg

Multiple Regression

Goals
Implementation
Assumptions

Page 2: Mult reg

Goals of Regression

Description
Inference
Prediction (Forecasting)

Page 3: Mult reg

Examples

Page 4: Mult reg

Why is there a need for more than one predictor variable?

As the examples above show:

More than one variable may influence a response variable.
The predictors may themselves be correlated.
What is the independent contribution of each variable to explaining the variation in the response variable?

Page 5: Mult reg

Three fundamental aspects of linear regression

Model selection – What is the most parsimonious set of predictors that explains the most variation in the response variable?
Evaluation of assumptions – Have we met the assumptions of the regression model?
Model validation

Page 6: Mult reg

The multiple regression model

Express a p-variable regression model as a series of equations.
The p equations, condensed into matrix form, give the familiar general linear model.
The coefficients are known as partial regression coefficients.

Page 7: Mult reg

The p-variable Regression Model

$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_p X_{pi} + \varepsilon_i$$

This model gives the expected value of Y conditional on the fixed values of X2, X3, ..., Xp, plus error.

$\beta_1$ – Intercept
$\beta_2, \ldots, \beta_p$ – Partial regression slope coefficients
$\varepsilon_i$ – Residual term associated with the ith observation

Page 8: Mult reg

Matrix Representation

Regression model is best described as a system of equations:

$$\begin{aligned}
Y_1 &= \beta_1 + \beta_2 X_{21} + \beta_3 X_{31} + \cdots + \beta_p X_{p1} + \varepsilon_1 \\
Y_2 &= \beta_1 + \beta_2 X_{22} + \beta_3 X_{32} + \cdots + \beta_p X_{p2} + \varepsilon_2 \\
&\;\;\vdots \\
Y_n &= \beta_1 + \beta_2 X_{2n} + \beta_3 X_{3n} + \cdots + \beta_p X_{pn} + \varepsilon_n
\end{aligned}$$

Page 9: Mult reg

We can re-write these equations:

$$\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} =
\begin{bmatrix}
1 & X_{21} & X_{31} & \cdots & X_{p1} \\
1 & X_{22} & X_{32} & \cdots & X_{p2} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & X_{2n} & X_{3n} & \cdots & X_{pn}
\end{bmatrix}
\begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_p \end{bmatrix} +
\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}$$

$$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$$

$$(n \times 1) \qquad (n \times p)\,(p \times 1) \qquad (n \times 1)$$
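As a small illustration of this layout, here is a sketch in Python/NumPy, using made-up data, of how a design matrix X with a leading column of 1's for the intercept might be assembled:

```python
import numpy as np

# Hypothetical data: n = 5 observations on two predictors (X2, X3).
X2 = np.array([1.2, 2.4, 3.1, 4.0, 5.5])
X3 = np.array([0.3, 0.1, 0.7, 0.5, 0.9])

# Design matrix X: the first column of 1's carries the intercept term,
# so X is n x p with p = 3 columns here.
X = np.column_stack([np.ones(len(X2)), X2, X3])
print(X.shape)  # (5, 3)
```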

Page 10: Mult reg

Summary of Terms

Y = n × 1 column vector of observations on the response variable

X = n × p matrix containing the n observations on the p − 1 independent variables X2, ..., Xp; the first column of 1's represents the intercept term

β = p × 1 column vector of unknown parameters β1, β2, ..., βp, where β1 is the intercept term and β2, ..., βp are the partial regression coefficients

ε = n × 1 column vector of residuals εi

Page 11: Mult reg

A Partial Regression Model

Burst = 1.21 + 2.1 Femur Length − 0.25 Tail Length + 1.0 Toe Velocity

Burst is the response variable, 1.21 is the intercept, 2.1, −0.25, and 1.0 are partial regression coefficients, and Femur Length, Tail Length, and Toe Velocity are the predictor variables.
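Using the coefficients shown above with some hypothetical measurements (the predictor values below are invented purely for illustration), a fitted prediction would be computed as:

```python
# Coefficients from the example model above; predictor values are made up.
femur_length, tail_length, toe_velocity = 4.0, 10.0, 2.5
burst = 1.21 + 2.1 * femur_length - 0.25 * tail_length + 1.0 * toe_velocity
print(burst)  # 1.21 + 8.4 - 2.5 + 2.5 = 9.61
```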

Page 12: Mult reg

Assumption 1. Expected value of the residual vector is 0

$$E(\boldsymbol{\varepsilon}) = E\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$

Page 13: Mult reg

Assumption 2. There is no correlation between the ith and jth residual terms.

$$E(\varepsilon_i \varepsilon_j) = 0, \quad i \neq j$$

Page 14: Mult reg

Assumption 3. The residuals exhibit constant variance.

$$E(\boldsymbol{\varepsilon}\boldsymbol{\varepsilon}') = \sigma^2 \mathbf{I}$$

Page 15: Mult reg

Assumption 4. The covariance between the X's and the residual terms is 0. This is usually satisfied if the predictor variables are fixed and non-stochastic.

$$\mathrm{cov}(\mathbf{X}, \boldsymbol{\varepsilon}) = 0$$

Page 16: Mult reg

Assumption 5. The rank of the data matrix X is p, the number of columns, and p < n, the number of observations. There are no exact linear relationships among the X variables (the assumption of no multicollinearity).

$$r(\mathbf{X}) = p$$

Page 17: Mult reg

If these assumptions hold, then the OLS estimators are in the class of unbiased linear estimators, and they are also the minimum-variance estimators in that class.

Page 18: Mult reg

What does it mean to be BLUE? And why does this matter? It allows us to compute a number of statistics from OLS estimation.

Page 19: Mult reg

An estimator $\hat{\beta}$ is the best linear unbiased estimator (BLUE) of $\beta$ iff it is:

Linear
Unbiased, i.e., $E(\hat{\beta}) = \beta$
Minimum variance in the class of all linear unbiased estimators

The unbiased and minimum-variance properties mean that OLS estimators are efficient estimators.

If one or more of the conditions are not met, then the OLS estimators are no longer BLUE.

Page 20: Mult reg

Does it matter?

Yes: it means we require an alternative method for characterizing the association between our Y and X variables.

Page 21: Mult reg

OLS Estimation

The sample-based counterpart to the population regression model is:

$$\mathbf{Y} = \mathbf{X}\mathbf{b} + \mathbf{e}$$

OLS requires choosing the values of b such that the error sum-of-squares (SSE) is as small as possible.

Page 22: Mult reg

The Normal Equations

$$SSE = \mathbf{e}'\mathbf{e} = (\mathbf{Y} - \mathbf{Xb})'(\mathbf{Y} - \mathbf{Xb})$$

We need to differentiate SSE with respect to the unknowns (b). This yields p simultaneous equations in p unknowns, also known as the Normal Equations.

Page 23: Mult reg

Matrix form of the Normal Equations

$$\mathbf{X}'\mathbf{X}\mathbf{b} = \mathbf{X}'\mathbf{Y}$$

Page 24: Mult reg

The solution for the “b’s”

It should be apparent how to solve for the unknown parameters: pre-multiply both sides by the inverse of X′X.

$$(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X}\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$$

Page 25: Mult reg

Solution Continued

From the properties of inverses we note that:

$$(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X} = \mathbf{I}$$

$$\mathbf{I}\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$$

$$\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$$

This is the fundamental outcome of OLS theory
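A minimal Python/NumPy sketch of this result, using simulated data and solving the normal equations directly rather than forming the inverse explicitly (the two are algebraically equivalent), might look like:

```python
import numpy as np

def ols_coefficients(X, Y):
    """Solve the normal equations X'X b = X'Y for b.

    np.linalg.solve is numerically safer than computing (X'X)^{-1},
    but gives the same b = (X'X)^{-1} X'Y."""
    return np.linalg.solve(X.T @ X, X.T @ Y)

# Hypothetical example with p = 3 (intercept + 2 predictors).
rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
Y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.3, size=n)

b = ols_coefficients(X, Y)
print(b)  # estimates should land close to [1.0, 2.0, -0.5]
```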

Page 26: Mult reg

Assessment of "Goodness-of-Fit": use the R² statistic.

R² represents the proportion of variability in the response variable that is accounted for by the regression model.

0 ≤ R² ≤ 1. A good fit of the model means that R² will be close to one; a poor fit means that R² will be near 0.

Page 27: Mult reg

R2 – Multiple Coefficient of Determination

$$R^2 = 1 - \frac{(\mathbf{Y} - \hat{\mathbf{Y}})'(\mathbf{Y} - \hat{\mathbf{Y}})}{(\mathbf{Y} - \bar{Y})'(\mathbf{Y} - \bar{Y})}$$

Alternative expressions:

$$R^2 = 1 - \frac{SSE}{SST} \qquad\qquad R^2 = \frac{SSR}{SST}$$
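A small sketch of the SSE/SST form of R², assuming observed and fitted values are available as NumPy arrays (the numbers below are made up):

```python
import numpy as np

def r_squared(Y, Y_hat):
    """R^2 = 1 - SSE/SST, with SST measured about the mean of Y."""
    sse = np.sum((Y - Y_hat) ** 2)           # (Y - Yhat)'(Y - Yhat)
    sst = np.sum((Y - np.mean(Y)) ** 2)      # (Y - Ybar)'(Y - Ybar)
    return 1.0 - sse / sst

# Tiny made-up example: observed vs. fitted values.
Y = np.array([3.0, 5.0, 7.0, 9.0])
Y_hat = np.array([3.2, 4.8, 7.1, 8.9])
print(r_squared(Y, Y_hat))
```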

Page 28: Mult reg

Critique of R² in Multiple Regression

R² is inflated by increasing the number of parameters in the model.
One should also analyze the residual values from the model (MSE).
Alternatively, use the adjusted R².

Page 29: Mult reg

Adjusted R2

$$\bar{R}^2 = 1 - \frac{(\mathbf{Y} - \hat{\mathbf{Y}})'(\mathbf{Y} - \hat{\mathbf{Y}})/(n - p)}{(\mathbf{Y} - \bar{Y})'(\mathbf{Y} - \bar{Y})/(n - 1)}$$

For $p \ge 1$, $\bar{R}^2 \le R^2$.

Page 30: Mult reg

How does adjusted R-square work?

The total sum-of-squares is fixed, because it is independent of the number of variables.
The numerator, SSE, decreases as the number of variables increases.
R² is therefore artificially inflated by adding explanatory variables to the model.
Use adjusted R² to compare different regression models: adjusted R² takes into account the number of predictors in the model.
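A minimal sketch of the adjusted R² computation from the formula above, where p is assumed to count every estimated coefficient including the intercept:

```python
import numpy as np

def adjusted_r_squared(Y, Y_hat, p):
    """Adjusted R^2 = 1 - [SSE/(n - p)] / [SST/(n - 1)];
    p counts all estimated coefficients, intercept included."""
    n = len(Y)
    sse = np.sum((Y - Y_hat) ** 2)
    sst = np.sum((Y - np.mean(Y)) ** 2)
    return 1.0 - (sse / (n - p)) / (sst / (n - 1))

# Made-up example with p = 2 (intercept plus one predictor).
Y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])
Y_hat = np.array([3.2, 4.8, 7.1, 8.9, 11.1])
print(adjusted_r_squared(Y, Y_hat, p=2))
```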

Page 31: Mult reg

Statistical Inference and Hypothesis Testing

Our goal may be: 1) hypothesis testing and 2) interval estimation.

Hence we will need to impose distributional assumptions on the residuals.

It turns out the probability distribution of the OLS estimators depends on the probability distribution of the residuals, ε.

Page 32: Mult reg

Recount of Assumptions

Normality – this means the elements of b are normally distributed.
The b's are unbiased.
If these hold, then we can perform several hypothesis tests.

Page 33: Mult reg

ANOVA Approach

Decomposition of the total sums-of-squares into components relating explained variance (regression) and unexplained variance (error).

Page 34: Mult reg

ANOVA Table

Source of Variation | Sums-of-Squares | df | Mean Square | F-ratio
Regression | $\mathbf{b}'\mathbf{X}'\mathbf{Y} - n\bar{Y}^2$ | p − 1 | $(\mathbf{b}'\mathbf{X}'\mathbf{Y} - n\bar{Y}^2)/(p - 1)$ | MSR/MSE
Residual | $\mathbf{Y}'\mathbf{Y} - \mathbf{b}'\mathbf{X}'\mathbf{Y}$ | n − p | $(\mathbf{Y}'\mathbf{Y} - \mathbf{b}'\mathbf{X}'\mathbf{Y})/(n - p)$ |
Total | $\mathbf{Y}'\mathbf{Y} - n\bar{Y}^2$ | n − 1 | |
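A sketch of these ANOVA quantities in Python/NumPy, assuming a design matrix X (with a leading column of 1's), response Y, and OLS estimates b as defined on the earlier slides:

```python
import numpy as np

def regression_anova(X, Y, b):
    """ANOVA decomposition for an OLS fit with coefficient vector b."""
    n, p = X.shape
    ybar = Y.mean()
    ssr = b @ X.T @ Y - n * ybar**2    # regression SS: b'X'Y - n*Ybar^2
    sse = Y @ Y - b @ X.T @ Y          # residual SS:   Y'Y - b'X'Y
    sst = Y @ Y - n * ybar**2          # total SS:      Y'Y - n*Ybar^2
    msr = ssr / (p - 1)
    mse = sse / (n - p)
    return {"SSR": ssr, "SSE": sse, "SST": sst,
            "df": (p - 1, n - p, n - 1), "F": msr / mse}
```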

Page 35: Mult reg

Test of Null Hypothesis

Tests the null hypothesis:

$$H_0: \beta_2 = \beta_3 = \cdots = \beta_p = 0$$

The null hypothesis is known as a joint or simultaneous hypothesis, because it compares the values of all the $\beta_i$ simultaneously. This tests the overall significance of the regression model.

Page 36: Mult reg

The F-test statistic and R2 vary directly

$$F = \frac{(\mathbf{b}'\mathbf{X}'\mathbf{Y} - n\bar{Y}^2)/(p - 1)}{(\mathbf{Y}'\mathbf{Y} - \mathbf{b}'\mathbf{X}'\mathbf{Y})/(n - p)} = \frac{SSR/(p - 1)}{SSE/(n - p)}$$

$$F = \frac{SSR/(p - 1)}{(SST - SSR)/(n - p)} = \frac{(SSR/SST)/(p - 1)}{(1 - SSR/SST)/(n - p)}$$

$$F = \frac{R^2/(p - 1)}{(1 - R^2)/(n - p)}$$
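A quick numeric check, with made-up sums-of-squares, that the SSR/SSE form and the R² form of F agree:

```python
def f_from_sums(ssr, sse, n, p):
    return (ssr / (p - 1)) / (sse / (n - p))

def f_from_r2(r2, n, p):
    return (r2 / (p - 1)) / ((1.0 - r2) / (n - p))

ssr, sse, n, p = 80.0, 20.0, 30, 4        # made-up values; R^2 = 0.8
print(f_from_sums(ssr, sse, n, p))        # both print the same F
print(f_from_r2(ssr / (ssr + sse), n, p))
```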

Page 37: Mult reg

Tests of Hypotheses about the true β

Assume the regression coefficients are normally distributed:

$$\mathbf{b} \sim N\!\left(\boldsymbol{\beta},\; \sigma^2 (\mathbf{X}'\mathbf{X})^{-1}\right)$$

$$\mathrm{cov}(\mathbf{b}) = E(\mathbf{b} - \boldsymbol{\beta})(\mathbf{b} - \boldsymbol{\beta})' = \sigma^2 (\mathbf{X}'\mathbf{X})^{-1}$$

The estimate of $\sigma^2$ is $s^2$:

$$s^2 = \frac{(\mathbf{Y} - \mathbf{Xb})'(\mathbf{Y} - \mathbf{Xb})}{n - p}$$

Page 38: Mult reg

Test Statistic

$$t = \frac{b_i - \beta_i}{s\sqrt{c_{ii}}}$$

where $c_{ii}$ is the element in the ith row and ith column of $(\mathbf{X}'\mathbf{X})^{-1}$. This statistic follows a t distribution with n − p df.

The 100(1 − α)% confidence interval is obtained from:

$$b_i \pm t_{\alpha/2,\, n-p}\; s\sqrt{c_{ii}}$$
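A sketch of these coefficient tests in Python, assuming SciPy is available for the t quantile; X, Y, and b are as defined on the earlier slides, and the null value of each coefficient is taken to be zero:

```python
import numpy as np
from scipy import stats

def coefficient_tests(X, Y, b, alpha=0.05):
    """t statistics (for H0: beta_i = 0) and 100(1-alpha)% CIs for each b_i."""
    n, p = X.shape
    resid = Y - X @ b
    s2 = resid @ resid / (n - p)            # estimate of sigma^2
    cov_b = s2 * np.linalg.inv(X.T @ X)     # s^2 (X'X)^{-1}
    se = np.sqrt(np.diag(cov_b))            # s * sqrt(c_ii)
    t = b / se
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - p)
    ci = np.column_stack([b - t_crit * se, b + t_crit * se])
    return t, ci
```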

Page 39: Mult reg

Model Comparisons

Our interest is in parsimonious modeling: we seek a minimum set of X variables to predict variation in the Y response variable.

The goal is to reduce the number of predictor variables to arrive at a more parsimonious description of the data.

Does leaving out one of the b's significantly diminish the variance explained by the model?

Compare a saturated model to an unsaturated model; note there are many possible unsaturated models.

Page 40: Mult reg

General Philosophy

Let SSE(r) designate the error sum-of-squares for the reduced model; SSE(r) ≥ SSE(f).
The saturated (full) model will contain p parameters; the reduced model will contain k < p parameters.
If we assume the errors are normally distributed with mean 0 and variance σ², then we can compare the two models.

Page 41: Mult reg

Model Comparison

Compare the saturated model with the reduced model, using the SSE terms as the basis for comparison:

$$F = \frac{[SSE(r) - SSE(f)]/(p - k)}{SSE(f)/(n - p)}$$

This statistic follows an F-distribution with (p − k), (n − p) df. Hence, if $F_{obs} > F_{critical}$ we reject the reduced model as a parsimonious model: the dropped $b_i$ must be included in the model.
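A minimal sketch of this nested-model comparison, assuming the two error sums-of-squares have already been computed and using SciPy for the critical F value:

```python
from scipy import stats

def nested_f_test(sse_reduced, sse_full, p, k, n, alpha=0.05):
    """F = [(SSE(r) - SSE(f)) / (p - k)] / [SSE(f) / (n - p)].
    Rejecting H0 means the dropped predictors should stay in the model."""
    f_obs = ((sse_reduced - sse_full) / (p - k)) / (sse_full / (n - p))
    f_crit = stats.f.ppf(1 - alpha, dfn=p - k, dfd=n - p)
    return f_obs, f_crit, f_obs > f_crit
```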

Page 42: Mult reg

How Many Predictors to Retain? A Short Course in Model Selection

Several options:

Sequential selection: Backward Selection, Forward Selection, Stepwise Selection
All possible subsets: MAXR, MINR, RSQUARE, ADJUSTED RSQUARE, CP

Page 43: Mult reg

Sequential Methods

Forward, stepwise, and backward selection procedures entail "partialling-out" the predictor variables, based on the partial correlation coefficient:

$$r_{12.3} = \frac{r_{12} - r_{13}\, r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}}$$
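A small sketch of the first-order partial correlation formula above; the pairwise correlations used in the example call are made up:

```python
import math

def partial_corr(r12, r13, r23):
    """r_{12.3}: correlation of variables 1 and 2 with variable 3 held constant."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13**2) * (1 - r23**2))

# Made-up pairwise correlations, purely for illustration.
print(partial_corr(0.60, 0.40, 0.50))
```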

Page 44: Mult reg

Forward Selection

A "build-up" procedure: add predictors until the "best" regression model is obtained.

Page 45: Mult reg

Outline of Forward Selection

1) No variables are included in the regression equation.
2) Calculate the correlations of all predictors with the dependent variable.
3) Enter the predictor variable with the highest correlation into the regression model if its corresponding partial F-value exceeds a predetermined threshold.
4) Calculate the regression equation with that predictor.
5) Select the predictor variable with the highest partial correlation to enter next.

Page 46: Mult reg

Forward Selection Continued

Compare the partial F-test value (called FH, also known as the "F-to-enter") to a predetermined tabulated F-value (called FC).

If FH > FC, include the variable with the highest partial correlation and return to step 5.
If FH < FC, stop and retain the regression equation as calculated.
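A bare-bones Python sketch of the forward-selection loop just described; the F-to-enter threshold and the least-squares call are illustrative choices, not a reproduction of any particular package's implementation:

```python
import numpy as np

def forward_selection(X_candidates, Y, f_to_enter=4.0):
    """Greedy forward selection using a partial F ("F-to-enter") threshold.

    X_candidates: n x m array of candidate predictors (no intercept column).
    Returns the selected column indices in their order of entry."""
    n, m = X_candidates.shape
    selected = []
    sse_current = np.sum((Y - Y.mean()) ** 2)   # SSE of intercept-only model

    while len(selected) < m:
        best = None
        for j in range(m):
            if j in selected:
                continue
            cols = selected + [j]
            X = np.column_stack([np.ones(n)] +
                                [X_candidates[:, c] for c in cols])
            b, *_ = np.linalg.lstsq(X, Y, rcond=None)
            sse_new = np.sum((Y - X @ b) ** 2)
            p = X.shape[1]
            # Partial F for adding one variable to the current model.
            f_partial = (sse_current - sse_new) / (sse_new / (n - p))
            if best is None or f_partial > best[1]:
                best = (j, f_partial, sse_new)
        if best is None or best[1] < f_to_enter:
            break                               # F-to-enter not met: stop
        selected.append(best[0])
        sse_current = best[2]
    return selected
```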

Page 47: Mult reg

Backward Selection

A "deconstruction" approach:

1) Begin with the saturated (full) regression model.
2) Compute the drop in R² as a consequence of eliminating each predictor variable, and the partial F-test value; treat each variable as if it were the last to enter the regression equation.
3) Compare the lowest partial F-test value (designated FL) to the critical value of F (designated FC).
a. If FL < FC, remove the variable, recompute the regression equation using the remaining predictor variables, and return to step 2.
b. If FL ≥ FC, adopt the regression equation as calculated.

Page 48: Mult reg

Stepwise Selection

1) Calculate the correlations of all predictors with the response variable.
2) Select the predictor variable with the highest correlation. Regress Y on Xi. Retain the predictor if there is a significant F-test value.
3) Calculate the partial correlations of all variables not in the equation with the response variable. Select as the next predictor to enter the one with the highest partial correlation. Call this predictor Xj.
4) Compute the regression equation with both Xi and Xj entered. Retain Xj if its partial F-value exceeds the tabulated F with (1, n − 2 − 1) df.
5) Now determine whether Xi warrants retention. Compare its partial F-value as if Xj had been entered into the equation first.

Page 49: Mult reg

Stepwise Continued

Retain Xi if its F-value exceeds the tabulated F value.
Enter a new variable Xk. Compute the regression with three predictors. Compute partial F-values for Xi, Xj, and Xk. Determine whether any should be retained by comparing the observed partial F with the critical F.
6) Retain the regression equation when no other predictor can be entered or removed from the model.

Page 50: Mult reg

All possible subsets

All-subsets regression computes the possible 1-, 2-, 3-, ... variable models given some optimality criterion.

This requires the use of an optimality criterion, e.g., Mallows' Cp:

$$C_p = \frac{s^2 (n - p)}{\hat{\sigma}^2} - (n - 2p), \qquad p = k + 1$$

where $s^2$ is the residual variance for the reduced model and $\hat{\sigma}^2$ is the residual variance for the full model.
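A one-function sketch of the Cp computation as given above, with s² taken from the candidate (reduced) model and σ̂² from the full model:

```python
def mallows_cp(s2_reduced, sigma2_full, n, p):
    """C_p = s^2 (n - p) / sigma_hat^2 - (n - 2p);
    models with C_p close to p are preferred."""
    return s2_reduced * (n - p) / sigma2_full - (n - 2 * p)
```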

Page 51: Mult reg

Mallows' Cp

Measures total squared error. Choose the model where Cp ≈ p.