© 2008 Thomson South-Western. All Rights Reserved.


Chapter 15 Multiple Regression

- Multiple Regression Model
- Least Squares Method
- Multiple Coefficient of Determination
- Model Assumptions
- Testing for Significance
- Using the Estimated Regression Equation for Estimation and Prediction
- Qualitative Independent Variables


Multiple Regression Model

The equation that describes how the dependent variable $y$ is related to the independent variables $x_1, x_2, \ldots, x_p$ and an error term is:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \epsilon$$

where $\beta_0, \beta_1, \beta_2, \ldots, \beta_p$ are the parameters, and $\epsilon$ is a random variable called the error term.


Multiple Regression Equation

The equation that describes how the mean value of $y$ is related to $x_1, x_2, \ldots, x_p$ is:

$$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$$
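The difference between the model (which includes $\epsilon$) and the regression equation (which gives only the mean $E(y)$) can be illustrated with a short Python sketch. It assumes, purely for illustration, hypothetical parameter values $\beta_0 = 2$, $\beta_1 = 0.5$, $\beta_2 = -1$ and normally distributed errors with standard deviation `sigma`; the function names `mean_response` and `draw_y` are our own, not from any library.

```python
import random


def mean_response(beta, x):
    """E(y) = beta_0 + beta_1*x_1 + ... + beta_p*x_p (the regression equation)."""
    if len(beta) != len(x) + 1:
        raise ValueError("beta needs one entry per x_i plus the intercept beta_0")
    return beta[0] + sum(b_i * x_i for b_i, x_i in zip(beta[1:], x))


def draw_y(beta, x, sigma, rng):
    """One y from the model: E(y) plus a normal error with mean 0 and std dev sigma."""
    return mean_response(beta, x) + rng.gauss(0.0, sigma)


# Hypothetical parameters, chosen only for illustration.
beta = [2.0, 0.5, -1.0]  # beta_0, beta_1, beta_2
x = [4.0, 1.5]           # x_1, x_2
rng = random.Random(42)  # seeded so the draws are reproducible

print(mean_response(beta, x))  # 2.5  (= 2.0 + 0.5*4.0 - 1.0*1.5)
draws = [draw_y(beta, x, sigma=1.0, rng=rng) for _ in range(5)]
print(draws)  # five y values scattered around E(y) = 2.5
```

With `sigma=0.0` every draw equals $E(y)$; the error term is what makes individual $y$ values differ from the mean.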


Estimated Multiple Regression Equation

A simple random sample is used to compute sample statistics $b_0, b_1, b_2, \ldots, b_p$ that are used as the point estimators of the parameters $\beta_0, \beta_1, \beta_2, \ldots, \beta_p$.

$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_p x_p$$


Estimation Process

1. The multiple regression model, $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \epsilon$, and the multiple regression equation, $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$, involve the unknown parameters $\beta_0, \beta_1, \beta_2, \ldots, \beta_p$.
2. Sample data are collected: observed values of $x_1, x_2, \ldots, x_p$ and $y$.
3. The sample data are used to compute the estimated multiple regression equation, $\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_p x_p$, whose sample statistics are $b_0, b_1, b_2, \ldots, b_p$.
4. The sample statistics $b_0, b_1, b_2, \ldots, b_p$ provide estimates of $\beta_0, \beta_1, \beta_2, \ldots, \beta_p$.


Least Squares Method

Least Squares Criterion

$$\min \sum (y_i - \hat{y}_i)^2$$

Computation of Coefficient Values

The formulas for the regression coefficients $b_0, b_1, b_2, \ldots, b_p$ involve the use of matrix algebra. We will rely on computer software packages to perform the calculations.
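As a rough sketch of what such software computes: the least squares coefficients solve the normal equations $(X^\top X)\,b = X^\top y$, where $X$ is the $n \times (p+1)$ matrix of predictor values with a leading column of 1s for $b_0$. The pure-Python function below (the name `fit_least_squares` is hypothetical) forms these equations and solves them by Gaussian elimination. It is meant for small examples only; statistical packages typically use more numerically stable factorizations such as QR instead of forming $X^\top X$ directly.

```python
def fit_least_squares(X, y):
    """Return [b0, b1, ..., bp] minimizing the sum of (y_i - yhat_i)^2.

    X is a list of n rows [x1, ..., xp]; y is a list of n observed values.
    Builds and solves the normal equations (X'X) b = X'y, where a leading
    column of 1s is added to X so that b0 (the intercept) is estimated too.
    """
    A = [[1.0] + [float(v) for v in row] for row in X]  # design matrix
    k = len(A[0])                                       # p + 1 coefficients
    # Augmented matrix [X'X | X'y].
    M = [[sum(r[i] * r[j] for r in A) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(A, y))] for i in range(k)]
    # Forward elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda idx: abs(M[idx][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("X'X is singular; check for perfectly collinear predictors")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, k):
            factor = M[r][col] / M[col][col]
            M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    # Back substitution.
    b = [0.0] * k
    for i in range(k - 1, -1, -1):
        b[i] = (M[i][k] - sum(M[i][j] * b[j] for j in range(i + 1, k))) / M[i][i]
    return b


# Noise-free toy data generated from y = 1 + 2*x1 - 3*x2 (made up for illustration).
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1, 3, -2, 0, 2]
b = fit_least_squares(X, y)
print([round(v, 6) for v in b])  # [1.0, 2.0, -3.0]
```

Because these toy data contain no error term, the fitted coefficients reproduce the generating values; with real, noisy data the $b_i$ are only estimates of the $\beta_i$.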


Interpreting the Coefficients

In multiple regression analysis, we interpret each regression coefficient as follows: $b_i$ represents an estimate of the change in $y$ corresponding to a 1-unit increase in $x_i$ when all other independent variables are held constant.
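A small numeric check makes the "held constant" wording concrete. Using hypothetical estimated coefficients $b_0 = 2$, $b_1 = 0.5$, $b_2 = -1.5$ (not from any real data set), raising $x_1$ by one unit while leaving $x_2$ unchanged moves $\hat{y}$ by exactly $b_1$, and likewise for $x_2$ and $b_2$. The helper name `predict` is our own.

```python
def predict(b, x):
    """yhat = b0 + b1*x1 + ... + bp*xp for one observation x = [x1, ..., xp]."""
    return b[0] + sum(b_i * x_i for b_i, x_i in zip(b[1:], x))


# Hypothetical estimated coefficients b0, b1, b2.
b = [2.0, 0.5, -1.5]

base = [10.0, 4.0]    # x1 = 10, x2 = 4
x1_up = [11.0, 4.0]   # x1 up by 1 unit, x2 held constant
x2_up = [10.0, 5.0]   # x2 up by 1 unit, x1 held constant

print(predict(b, x1_up) - predict(b, base))  # 0.5  (= b1)
print(predict(b, x2_up) - predict(b, base))  # -1.5 (= b2)
```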


Multiple Coefficient of Determination

Relationship Among SST, SSR, SSE

SST = SSR + SSE

$$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$$

where:
SST = total sum of squares
SSR = sum of squares due to regression
SSE = sum of squares due to error
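The decomposition can be checked numerically. The sketch below computes SST, SSR, and SSE from observed and fitted values, along with the multiple coefficient of determination $R^2 = \text{SSR}/\text{SST}$, the proportion of the variability in $y$ explained by the estimated regression equation. The data are a made-up toy example: a least squares fit of $y$ on a single $x = 1, 2, 3, 4$ gives $b_0 = 0$ and $b_1 = 1.1$, small enough to verify by hand. The identity SST = SSR + SSE is guaranteed when the fitted values come from a least squares fit that includes an intercept $b_0$; the function name `sums_of_squares` is hypothetical.

```python
def sums_of_squares(y, y_hat):
    """Return (SST, SSR, SSE) for observed y and least squares fitted values y_hat."""
    y_bar = sum(y) / len(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)                # total sum of squares
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)            # due to regression
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # due to error
    return sst, ssr, sse


# Toy data (made up): least squares on x = [1, 2, 3, 4] gives b0 = 0, b1 = 1.1,
# so the fitted values are 1.1 * x.
y = [1.0, 3.0, 2.0, 5.0]
y_hat = [1.1, 2.2, 3.3, 4.4]

sst, ssr, sse = sums_of_squares(y, y_hat)
r_squared = ssr / sst
print(round(sst, 6), round(ssr, 6), round(sse, 6))  # 8.75 6.05 2.7
print(round(r_squared, 4))                          # 0.6914
```

In a multiple regression the same function applies unchanged; `y_hat` would simply come from the estimated equation with all $p$ independent variables.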


Assumptions About the Error Term

- The error $\epsilon$ is a random variable with mean of zero.
- The variance of $\epsilon$, denoted by $\sigma^2$, is the same for all values of the independent variables.
- The values of $\epsilon$ are independent.
- The error $\epsilon$ is a normally distributed random variable reflecting the deviation between the $y$ value and the expected value of $y$ given by $\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$.


Testing for Significance

In simple linear regression, the F and t tests provide the same conclusion.

In multiple regression, the F and t tests have different purposes.


Testing for Significance: F Test

The F test is used to determine whether a significant relationship exists between the dependent variable and the set of all the independent variables.

The F test is referred to as the test for overall significance.


Testing for Significance: t Test

If the F test shows overall significance, the t test is used to determine whether each of the individual independent variables is significant.

A separate t test is conducted for each of the independent variables in the model.

We refer to each of these t tests as a test for individual significance.


Testing for Significance: F Test

Hypotheses
$H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0$
$H_a$: One or more of the parameters is not equal to zero.

Test Statistic
$F = \text{MSR}/\text{MSE}$, where $\text{MSR} = \text{SSR}/p$ and $\text{MSE} = \text{SSE}/(n - p - 1)$

Rejection Rule
Reject $H_0$ if $p$-value $< \alpha$ or if $F > F_\alpha$, where $F_\alpha$ is based on an F distribution with $p$ d.f. in the numerator and $n - p - 1$ d.f. in the denominator.
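The F test can be sketched in a few lines once the equation has been fitted. The function below (hypothetical name `f_test`, standard library only) fits by least squares, computes $\text{MSR} = \text{SSR}/p$ and $\text{MSE} = \text{SSE}/(n - p - 1)$, and compares $F$ with a critical value $F_\alpha$ that the caller supplies, for example from an F table. Computing the $p$-value would need an F-distribution CDF (for instance `scipy.stats.f.sf`, if SciPy is available), which this sketch does not include. The made-up toy data use two centered, uncorrelated predictors so the results can be checked by hand: $b = (3, 1.5, 1)$, SSR = 13, SSE = 1, hence $F = (13/2)/(1/1) = 6.5$ with 2 and 1 d.f.

```python
def fit_least_squares(X, y):
    """Solve the normal equations (X'X) b = X'y, with a column of 1s added for b0."""
    A = [[1.0] + [float(v) for v in row] for row in X]
    k = len(A[0])
    M = [[sum(r[i] * r[j] for r in A) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(A, y))] for i in range(k)]
    for col in range(k):  # forward elimination with partial pivoting
        piv = max(range(col, k), key=lambda idx: abs(M[idx][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, k):
            factor = M[r][col] / M[col][col]
            M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    b = [0.0] * k
    for i in range(k - 1, -1, -1):  # back substitution
        b[i] = (M[i][k] - sum(M[i][j] * b[j] for j in range(i + 1, k))) / M[i][i]
    return b


def f_test(X, y, f_critical):
    """Overall F test of H0: beta_1 = ... = beta_p = 0 (f_critical supplied by caller)."""
    n, p = len(y), len(X[0])
    b = fit_least_squares(X, y)
    y_hat = [b[0] + sum(bj * xj for bj, xj in zip(b[1:], row)) for row in X]
    y_bar = sum(y) / n
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # sum of squares due to regression
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # sum of squares due to error
    msr = ssr / p                                          # MSR = SSR / p
    mse = sse / (n - p - 1)                                # MSE = SSE / (n - p - 1)
    f_stat = msr / mse
    return f_stat, (p, n - p - 1), f_stat > f_critical


# Made-up toy data: x1 and x2 are centered and uncorrelated, n = 4, p = 2.
X = [[-1, -1], [1, -1], [-1, 1], [1, 1]]
y = [1, 3, 2, 6]
f_stat, df, reject = f_test(X, y, f_critical=199.5)  # F_.05 with 2 and 1 d.f., from an F table
print(f_stat, df, reject)  # 6.5 (2, 1) False
```

Because 6.5 < 199.5, $H_0$ is not rejected at $\alpha = .05$ here; with only 1 degree of freedom for error, this tiny sample has very little power.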


Testing for Significance: t Test

Hypotheses
$H_0: \beta_i = 0$
$H_a: \beta_i \neq 0$

Test Statistic

$$t = \frac{b_i}{s_{b_i}}$$

where $s_{b_i}$ is the estimated standard deviation of $b_i$.

Rejection Rule
Reject $H_0$ if $p$-value $< \alpha$ or if $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$, where $t_{\alpha/2}$ is based on a t distribution with $n - p - 1$ degrees of freedom.
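For the individual t tests, $s_{b_i}$ is the square root of MSE times the $i$th diagonal element of $(X^\top X)^{-1}$. The pure-Python sketch below (hypothetical names `invert` and `t_tests`) computes $b_i$, $s_{b_i}$, and $t = b_i / s_{b_i}$ for each independent variable; the critical value $t_{\alpha/2}$ is an input (12.706 for $\alpha = .05$ with 1 d.f., from a t table). The made-up toy data use two centered, uncorrelated predictors, so $(X^\top X)^{-1}$ is diagonal with entries 0.25 and MSE = 1, giving $s_{b_1} = s_{b_2} = 0.5$, $t_1 = 3$, and $t_2 = 2$ by hand.

```python
import math


def invert(M):
    """Inverse of a square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(M)
    A = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(k)]
         for i, row in enumerate(M)]
    for col in range(k):
        piv = max(range(col, k), key=lambda idx: abs(A[idx][col]))
        if abs(A[piv][col]) < 1e-12:
            raise ValueError("X'X is singular (perfectly collinear predictors?)")
        A[col], A[piv] = A[piv], A[col]
        pivot_value = A[col][col]
        A[col] = [v / pivot_value for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    return [row[k:] for row in A]


def t_tests(X, y):
    """For each x_i return (b_i, s_bi, t) with t = b_i / s_bi; d.f. = n - p - 1."""
    n, p = len(y), len(X[0])
    A = [[1.0] + [float(v) for v in row] for row in X]  # leading 1s for b0
    k = p + 1
    xtx = [[sum(r[i] * r[j] for r in A) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(A, y)) for i in range(k)]
    inv = invert(xtx)                                    # (X'X)^-1
    b = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    y_hat = [sum(bj * aj for bj, aj in zip(b, r)) for r in A]
    mse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat)) / (n - p - 1)
    results = []
    for i in range(1, k):                                # skip the intercept b0
        s_bi = math.sqrt(mse * inv[i][i])                # estimated std dev of b_i
        results.append((b[i], s_bi, b[i] / s_bi))
    return results


# Made-up toy data: x1 and x2 are centered and uncorrelated, n = 4, p = 2.
X = [[-1, -1], [1, -1], [-1, 1], [1, 1]]
y = [1, 3, 2, 6]
t_critical = 12.706  # t_.025 with n - p - 1 = 1 d.f., from a t table
for i, (b_i, s_bi, t) in enumerate(t_tests(X, y), start=1):
    print(f"x{i}: b = {b_i}, s_b = {s_bi}, t = {t}, reject H0: {abs(t) > t_critical}")
# x1: b = 1.5, s_b = 0.5, t = 3.0, reject H0: False
# x2: b = 1.0, s_b = 0.5, t = 2.0, reject H0: False
```

With a single independent variable the same code gives $t^2 = F$, which is why the two tests agree in simple linear regression.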


Testing for Significance: Multicollinearity

The term multicollinearity refers to the correlation among the independent variables.

When the independent variables are highly correlated (say, $|r| > .7$), it is not possible to determine the separate effect of any particular independent variable on the dependent variable.
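The $|r| > .7$ rule of thumb can be applied by computing the sample correlation for every pair of independent variables. In the sketch below the column names `x1`, `x2`, `x3` and their values are made up, and the functions `pearson_r` and `flag_collinear` are our own. Pairwise correlations are only a first screen: a predictor can be nearly a linear combination of several others without any single pair exceeding the threshold.

```python
import math
from itertools import combinations


def pearson_r(a, b):
    """Sample correlation coefficient between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def flag_collinear(columns, threshold=0.7):
    """Return (name_i, name_j, r) for every predictor pair with |r| > threshold."""
    flagged = []
    for (name_i, col_i), (name_j, col_j) in combinations(columns.items(), 2):
        r = pearson_r(col_i, col_j)
        if abs(r) > threshold:
            flagged.append((name_i, name_j, r))
    return flagged


# Hypothetical predictor columns; x2 is close to 2 * x1 by construction.
predictors = {
    "x1": [1, 2, 3, 4, 5],
    "x2": [2.1, 3.9, 6.2, 8.0, 9.9],
    "x3": [5, 1, 4, 2, 3],
}
flagged = flag_collinear(predictors)
for name_i, name_j, r in flagged:
    print(f"{name_i} and {name_j}: r = {r:.3f}")  # x1 and x2: r = 0.999
```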


Testing for Significance: Multicollinearity

Every attempt should be made to avoid including independent variables that are highly correlated.

If the estimated regression equation is to be used only for predictive purposes, multicollinearity is usually not a serious problem.


Using the Estimated Regression Equation for Estimation and Prediction

The procedures for estimating the mean value of $y$ and predicting an individual value of $y$ in multiple regression are similar to those in simple regression.

We substitute the given values of $x_1, x_2, \ldots, x_p$ into the estimated regression equation and use the corresponding value of $\hat{y}$ as the point estimate.
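Obtaining the point estimate is plain substitution. The sketch below assumes a hypothetical estimated equation $\hat{y} = 3 + 1.5x_1 + 1.0x_2$ (not from the slides) and new values $x_1 = 2$, $x_2 = 0.5$; `point_estimate` is an illustrative name.

```python
def point_estimate(b, x_new):
    """Substitute x_new = [x1, ..., xp] into yhat = b0 + b1*x1 + ... + bp*xp."""
    if len(b) != len(x_new) + 1:
        raise ValueError("need one coefficient per predictor plus the intercept b0")
    return b[0] + sum(b_i * x_i for b_i, x_i in zip(b[1:], x_new))


# Hypothetical estimated equation: yhat = 3 + 1.5*x1 + 1.0*x2
b = [3.0, 1.5, 1.0]
x_new = [2.0, 0.5]
y_hat = point_estimate(b, x_new)
print(y_hat)  # 6.5  (= 3 + 1.5*2.0 + 1.0*0.5)
```

The same $\hat{y}$ serves as the point estimate of the mean value of $y$ and as the prediction of an individual $y$ at those $x$ values. The two differ in their interval estimates (the prediction interval for an individual value is wider), which require the standard errors that regression software reports.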


Qualitative Independent Variables

In many situations we must work with qualitative independent variables such as gender (male, female), method of payment (cash, check, credit card), etc.

For example, $x_2$ might represent gender where $x_2 = 0$ indicates male and $x_2 = 1$ indicates female.

In this case, $x_2$ is called a dummy or indicator variable.
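Coding a qualitative variable amounts to building 0/1 columns. The sketch below (hypothetical function `dummy_code`) reproduces the gender coding above and extends it, as an illustration, to the three-level method of payment. It follows the common convention of using one fewer dummy variable than there are categories, with the first listed level as the baseline coded all zeros; including a dummy for every category alongside the intercept would make the columns perfectly collinear.

```python
def dummy_code(values, levels):
    """Encode a qualitative variable as len(levels) - 1 indicator (0/1) columns.

    levels[0] is the baseline and is coded all zeros; every other level
    gets its own column that is 1 for that level and 0 otherwise.
    """
    unknown = set(values) - set(levels)
    if unknown:
        raise ValueError(f"unexpected categories: {sorted(unknown)}")
    return [[1 if v == level else 0 for level in levels[1:]] for v in values]


# Gender coded as in the text: male -> 0, female -> 1 (one dummy column).
gender = ["male", "female", "female", "male"]
print(dummy_code(gender, ["male", "female"]))  # [[0], [1], [1], [0]]

# A three-level variable needs two dummies; "cash" is an arbitrary baseline here.
payment = ["cash", "check", "credit card", "cash"]
print(dummy_code(payment, ["cash", "check", "credit card"]))
# [[0, 0], [1, 0], [0, 1], [0, 0]]
```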