Multiple Regression

Slide 1

Department of Quantitative Methods & Information Systems

Multiple Regression

ECON 504

Chapter 3

Dr. Mohammad Zainal

Fall 2013

Slide 2

Multiple Regression

Multiple Regression Model

Least Squares Method

Multiple Coefficient of Determination

Model Assumptions

Testing for Significance

Using the Estimated Regression Equation for Estimation and Prediction

Categorical Independent Variables

Residual Analysis

Logistic Regression

Slide 3

Multiple Regression

In this chapter we continue our study of regression analysis by considering situations involving two or more independent variables.

This subject area, called multiple regression analysis, enables us to consider more factors and thus obtain better estimates than are possible with simple linear regression.

Slide 4

Multiple Regression Model

The equation that describes how the dependent variable y is related to the independent variables x1, x2, . . . , xp and an error term is:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon$$

where $\beta_0, \beta_1, \beta_2, \ldots, \beta_p$ are the parameters, and $\varepsilon$ is a random variable called the error term.

Slide 5

Multiple Regression Equation

The equation that describes how the mean value of y is related to x1, x2, . . . , xp is:

$$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$$

Slide 6

Estimated Multiple Regression Equation

A simple random sample is used to compute sample statistics b0, b1, b2, . . . , bp that are used as the point estimators of the parameters $\beta_0, \beta_1, \beta_2, \ldots, \beta_p$.

$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_p x_p$$

Slide 7

Estimation Process

Multiple Regression Model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon$

Multiple Regression Equation: $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$

Unknown parameters are $\beta_0, \beta_1, \beta_2, \ldots, \beta_p$

Sample Data: observed values of x1, x2, . . . , xp and y

Estimated Multiple Regression Equation: $\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_p x_p$

Sample statistics are b0, b1, b2, . . . , bp

b0, b1, b2, . . . , bp provide estimates of $\beta_0, \beta_1, \beta_2, \ldots, \beta_p$

Slide 8

Least Squares Method

Least Squares Criterion

$$\min \sum (y_i - \hat{y}_i)^2$$

Computation of Coefficient Values

The formulas for the regression coefficients b0, b1, b2, . . . , bp involve the use of matrix algebra. We will rely on computer software packages to perform the calculations.

Slide 9

Least Squares Method

Computation of Coefficient Values

The formulas for the regression coefficients b0, b1, b2, . . . , bp involve the use of matrix algebra. We will rely on computer software packages to perform the calculations.

The emphasis will be on how to interpret the computer output rather than on how to make the multiple regression computations.

Slide 10

Multiple Regression Model

Example: Programmer Salary Survey

A software firm collected data for a sample of 20 computer programmers. A suggestion was made that regression analysis could be used to determine if salary was related to the years of experience and the score on the firm’s programmer aptitude test.

The years of experience, score on the aptitude test, and corresponding annual salary ($1000s) for a sample of 20 programmers are shown on the next slide.

Slide 11

Multiple Regression Model

Exper. (Yrs.)   Test Score   Salary ($000s)
 4               78          24.0
 7              100          43.0
 1               86          23.7
 5               82          34.3
 8               86          35.8
10               84          38.0
 0               75          22.2
 1               80          23.1
 6               83          30.0
 6               91          33.0
 9               88          38.0
 2               73          26.6
10               75          36.2
 5               81          31.6
 6               74          29.0
 8               87          34.0
 4               79          30.1
 6               94          33.9
 3               70          28.2
 3               89          30.0

Slide 12

Multiple Regression Model

Suppose we believe that salary (y) is related to the years of experience (x1) and the score on the programmer aptitude test (x2) by the following regression model:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$$

where
y = annual salary ($000)
x1 = years of experience
x2 = score on programmer aptitude test

Slide 13

Solving for the Estimates of $\beta_0, \beta_1, \beta_2$

Input Data

x1    x2    y
 4    78   24
 7   100   43
 .     .    .
 .     .    .
 3    89   30

Computer Package for Solving Multiple Regression Problems

Least Squares Output

b0 =
b1 =
b2 =
R2 =
etc.

Slide 14

Solving for the Estimates of $\beta_0, \beta_1, \beta_2$

Regression Equation Output

Predictor     Coef      SE Coef   T        p
Constant      3.17394   6.15607   0.5156   0.61279
Experience    1.4039    0.19857   7.0702   1.9E-06
Test Score    0.25089   0.07735   3.2433   0.00478

Slide 15

Estimated Regression Equation

SALARY = 3.174 + 1.404(EXPER) + 0.251(SCORE)

Note: Predicted salary will be in thousands of dollars.
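The slides leave the least squares computation to a software package. As a rough illustration only, the sketch below fits the same model by solving the normal equations (X'X)b = X'y in plain Python, using the 20 observations from the programmer salary table. The helper names `solve` and `least_squares` are made up for this example and are not part of any package mentioned in the slides.

```python
# Programmer salary data: (experience in years, test score, salary in $000s).
DATA = [
    (4, 78, 24.0), (7, 100, 43.0), (1, 86, 23.7), (5, 82, 34.3), (8, 86, 35.8),
    (10, 84, 38.0), (0, 75, 22.2), (1, 80, 23.1), (6, 83, 30.0), (6, 91, 33.0),
    (9, 88, 38.0), (2, 73, 26.6), (10, 75, 36.2), (5, 81, 31.6), (6, 74, 29.0),
    (8, 87, 34.0), (4, 79, 30.1), (6, 94, 33.9), (3, 70, 28.2), (3, 89, 30.0),
]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def least_squares(rows):
    """Return [b0, b1, ..., bp] minimizing the sum of squared residuals."""
    X = [[1.0] + list(r[:-1]) for r in rows]  # leading 1 for the intercept b0
    y = [r[-1] for r in rows]
    k = len(X[0])
    xtx = [[sum(xi[a] * xi[b] for xi in X) for b in range(k)] for a in range(k)]
    xty = [sum(xi[a] * yi for xi, yi in zip(X, y)) for a in range(k)]
    return solve(xtx, xty)  # normal equations: (X'X) b = X'y


b0, b1, b2 = least_squares(DATA)
print(f"SALARY = {b0:.3f} + {b1:.3f}(EXPER) + {b2:.3f}(SCORE)")
```

Statistical software typically uses more numerically robust methods than the normal equations; for a small, well-behaved data set like this one the difference should be negligible.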

Slide 16

Interpreting the Coefficients

In multiple regression analysis, we interpret each regression coefficient as follows:

bi represents an estimate of the change in y corresponding to a 1-unit increase in xi when all other independent variables are held constant.

Slide 17

Interpreting the Coefficients

b1 = 1.404

Salary is expected to increase by $1,404 for each additional year of experience (when the variable score on programmer aptitude test is held constant).

Slide 18

Interpreting the Coefficients

b2 = 0.251

Salary is expected to increase by $251 for each additional point scored on the programmer aptitude test (when the variable years of experience is held constant).

Slide 19

Multiple Coefficient of Determination

Relationship Among SST, SSR, SSE

SST = SSR + SSE

$$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$$

where:
SST = total sum of squares
SSR = sum of squares due to regression
SSE = sum of squares due to error

Slide 20

Multiple Coefficient of Determination

ANOVA Output

Analysis of Variance

SOURCE           DF   SS         MS        F       P
Regression        2   500.3285   250.164   42.76   0.000
Residual Error   17   99.45697   5.850
Total            19   599.7855

SSR = 500.3285 (Regression row); SST = 599.7855 (Total row)

Slide 21

Multiple Coefficient of Determination

R2 = SSR/SST

R2 = 500.3285/599.7855 = .83418

Slide 22

Adjusted Multiple Coefficient of Determination

Adding independent variables, even ones that are not statistically significant, causes the prediction errors to become smaller, thus reducing the sum of squares due to error, SSE.

Because SSR = SST – SSE, when SSE becomes smaller, SSR becomes larger, causing R2 = SSR/SST to increase.

The adjusted multiple coefficient of determination compensates for the number of independent variables in the model.

Slide 23

Adjusted Multiple Coefficient of Determination

$$R_a^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}$$

$$R_a^2 = 1 - (1 - .834179)\,\frac{20 - 1}{20 - 2 - 1} = .814671$$
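The two formulas can be checked directly from the ANOVA sums of squares. The snippet below is a minimal sketch; the function names are illustrative and the inputs are the SSR and SST values from the ANOVA output, with n = 20 observations and p = 2 independent variables.

```python
def r_squared(ssr, sst):
    """Multiple coefficient of determination: R^2 = SSR / SST."""
    return ssr / sst


def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for n observations and p independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


r2 = r_squared(ssr=500.3285, sst=599.7855)
r2_adj = adjusted_r_squared(r2, n=20, p=2)
print(f"R2 = {r2:.5f}, adjusted R2 = {r2_adj:.6f}")
```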

Slide 24

Assumptions About the Error Term $\varepsilon$

The error $\varepsilon$ is a random variable with mean of zero.

The variance of $\varepsilon$, denoted by $\sigma^2$, is the same for all values of the independent variables.

The values of $\varepsilon$ are independent.

The error $\varepsilon$ is a normally distributed random variable reflecting the deviation between the y value and the expected value of y given by $\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$.

Slide 25

Testing for Significance

In simple linear regression, the F and t tests provide the same conclusion.

In multiple regression, the F and t tests have different purposes.

Slide 26

Testing for Significance: F Test

The F test is referred to as the test for overall significance.

The F test is used to determine whether a significant relationship exists between the dependent variable and the set of all the independent variables.

Slide 27

Testing for Significance: t Test

If the F test shows an overall significance, the t test is used to determine whether each of the individual independent variables is significant.

A separate t test is conducted for each of the independent variables in the model.

We refer to each of these t tests as a test for individual significance.

Slide 28

Testing for Significance: F Test

Hypotheses

$H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0$
$H_a$: One or more of the parameters is not equal to zero.

Test Statistics

F = MSR/MSE

Rejection Rule

Reject H0 if p-value < $\alpha$ or if F > $F_\alpha$, where $F_\alpha$ is based on an F distribution with p d.f. in the numerator and n - p - 1 d.f. in the denominator.

Slide 29

F Test for Overall Significance

Hypotheses

$H_0: \beta_1 = \beta_2 = 0$
$H_a$: One or both of the parameters is not equal to zero.

Rejection Rule

For $\alpha$ = .05 and d.f. = 2, 17; F.05 = 3.59
Reject H0 if p-value < .05 or F > 3.59

Slide 30

F Test for Overall Significance

ANOVA Output

Analysis of Variance

SOURCE           DF   SS         MS        F       P
Regression        2   500.3285   250.164   42.76   0.000
Residual Error   17   99.45697   5.850
Total            19   599.7855

The P value in the Regression row (0.000) is the p-value used to test for overall significance.

Slide 31

F Test for Overall Significance

Test Statistics

F = MSR/MSE = 250.16/5.85 = 42.76

Conclusion

p-value < .05, so we can reject H0. (Also, F = 42.76 > 3.59)
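The F statistic can be rebuilt from the sums of squares in the ANOVA table, since MSR = SSR/p and MSE = SSE/(n - p - 1). The sketch below assumes the values quoted above; `f_statistic` is an illustrative name, and the critical value 3.59 is simply copied from the rejection rule rather than computed.

```python
F_CRITICAL = 3.59  # F.05 with 2 and 17 degrees of freedom, as quoted in the rejection rule


def f_statistic(ssr, sse, n, p):
    """Return (MSR, MSE, F) for a model with p independent variables and n observations."""
    msr = ssr / p              # mean square due to regression
    mse = sse / (n - p - 1)    # mean square due to error
    return msr, mse, msr / mse


msr, mse, f = f_statistic(ssr=500.3285, sse=99.45697, n=20, p=2)
reject_h0 = f > F_CRITICAL
print(f"MSR = {msr:.3f}, MSE = {mse:.3f}, F = {f:.2f}, reject H0: {reject_h0}")
```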

Slide 32

Testing for Significance: t Test

Hypotheses

$$H_0: \beta_i = 0 \qquad H_a: \beta_i \neq 0$$

Test Statistics

$$t = \frac{b_i}{s_{b_i}}$$

Rejection Rule

Reject H0 if p-value < $\alpha$ or if t < $-t_{\alpha/2}$ or t > $t_{\alpha/2}$, where $t_{\alpha/2}$ is based on a t distribution with n - p - 1 degrees of freedom.

Slide 33

t Test for Significance of Individual Parameters

Hypotheses

$$H_0: \beta_i = 0 \qquad H_a: \beta_i \neq 0$$

Rejection Rule

For $\alpha$ = .05 and d.f. = 17, t.025 = 2.11
Reject H0 if p-value < .05, or if t < -2.11 or t > 2.11

Slide 34

t Test for Significance of Individual Parameters

Regression Equation Output

Predictor     Coef      SE Coef   T        p
Constant      3.17394   6.15607   0.5156   0.61279
Experience    1.4039    0.19857   7.0702   1.9E-06
Test Score    0.25089   0.07735   3.2433   0.00478

The T and p values in the Experience row are the t statistic and p-value used to test for the individual significance of “Experience”.

Slide 35

t Test for Significance of Individual Parameters

Test Statistics

$$t = \frac{b_1}{s_{b_1}} = \frac{1.4039}{.1986} = 7.07$$

$$t = \frac{b_2}{s_{b_2}} = \frac{.25089}{.07735} = 3.24$$

Conclusions

Reject both $H_0: \beta_1 = 0$ and $H_0: \beta_2 = 0$. Both independent variables are significant.
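As a quick check, each t statistic is just the coefficient divided by its standard error. The sketch below uses the Coef and SE Coef columns of the regression output and the critical value 2.11 quoted in the rejection rule; the dictionary layout is only an illustrative choice.

```python
T_CRITICAL = 2.11  # t.025 with 17 degrees of freedom, copied from the rejection rule

# (coefficient, standard error) pairs from the regression equation output.
COEFFICIENTS = {
    "Constant": (3.17394, 6.15607),
    "Experience": (1.4039, 0.19857),
    "Test Score": (0.25089, 0.07735),
}

t_stats = {name: b / se for name, (b, se) in COEFFICIENTS.items()}
significant = {name: abs(t) > T_CRITICAL for name, t in t_stats.items()}

for name, t in t_stats.items():
    print(f"{name:<12} t = {t:6.2f}  significant: {significant[name]}")
```

The intercept is included only for completeness; the slides test the two slope coefficients.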

Slide 36

Testing for Significance: Multicollinearity

The term multicollinearity refers to the correlation among the independent variables.

When the independent variables are highly correlated (say, |r| > .7), it is not possible to determine the separate effect of any particular independent variable on the dependent variable.

Slide 37

Testing for Significance: Multicollinearity

Every attempt should be made to avoid including independent variables that are highly correlated.

If the estimated regression equation is to be used only for predictive purposes, multicollinearity is usually not a serious problem.
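One simple way to apply the |r| > .7 rule of thumb is to compute the sample correlation between pairs of independent variables. The sketch below does this for years of experience and test score from the programmer salary data; `sample_correlation` is a hypothetical helper written for this illustration.

```python
from math import sqrt

# Independent variables from the programmer salary survey (20 observations).
EXPER = [4, 7, 1, 5, 8, 10, 0, 1, 6, 6, 9, 2, 10, 5, 6, 8, 4, 6, 3, 3]
SCORE = [78, 100, 86, 82, 86, 84, 75, 80, 83, 91, 88, 73, 75, 81, 74, 87, 79, 94, 70, 89]


def sample_correlation(x, y):
    """Pearson sample correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


r = sample_correlation(EXPER, SCORE)
highly_correlated = abs(r) > 0.7  # rule of thumb from the slide
print(f"r = {r:.3f}, highly correlated: {highly_correlated}")
```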

Slide 38

Using the Estimated Regression Equation for Estimation and Prediction

The procedures for estimating the mean value of y and predicting an individual value of y in multiple regression are similar to those in simple regression.

We substitute the given values of x1, x2, . . . , xp into the estimated regression equation and use the corresponding value of $\hat{y}$ as the point estimate.
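As a small illustration of the substitution step, the sketch below plugs values into the estimated equation SALARY = 3.174 + 1.404(EXPER) + 0.251(SCORE). The inputs (4 years of experience, test score 78) are taken from the first row of the sample data purely as an example; `predict_salary` is an illustrative name.

```python
B0, B1, B2 = 3.174, 1.404, 0.251  # rounded coefficients from the estimated regression equation


def predict_salary(exper_years, test_score):
    """Point estimate of annual salary in $000s."""
    return B0 + B1 * exper_years + B2 * test_score


estimate = predict_salary(exper_years=4, test_score=78)
print(f"Predicted salary: {estimate:.3f} ($000s)")
```

The same number serves as the point estimate of the mean salary for all such programmers and as the point prediction for one individual; only the interval estimates differ.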

Slide 39

Using the Estimated Regression Equation for Estimation and Prediction

The formulas required to develop interval estimates for the mean value of y and for an individual value of y are beyond the scope of the textbook.

Software packages for multiple regression will often provide these interval estimates.

Slide 40

Categorical Independent Variables

In many situations we must work with categorical independent variables such as gender (male, female), method of payment (cash, check, credit card), etc.

For example, x2 might represent gender where x2 = 0 indicates male and x2 = 1 indicates female.

In this case, x2 is called a dummy or indicator variable.

Slide 41

Categorical Independent Variables

Example: Programmer Salary Survey

As an extension of the problem involving the computer programmer salary survey, suppose that management also believes that the annual salary is related to whether the individual has a graduate degree in computer science or information systems.

The years of experience, the score on the programmer aptitude test, whether the individual has a relevant graduate degree, and the annual salary ($000) for each of the sampled 20 programmers are shown on the next slide.

Slide 42

Categorical Independent Variables

Exper. (Yrs.)   Test Score   Degr.   Salary ($000s)
 4               78          No      24.0
 7              100          Yes     43.0
 1               86          No      23.7
 5               82          Yes     34.3
 8               86          Yes     35.8
10               84          Yes     38.0
 0               75          No      22.2
 1               80          No      23.1
 6               83          No      30.0
 6               91          Yes     33.0
 9               88          Yes     38.0
 2               73          No      26.6
10               75          Yes     36.2
 5               81          No      31.6
 6               74          No      29.0
 8               87          Yes     34.0
 4               79          No      30.1
 6               94          Yes     33.9
 3               70          No      28.2
 3               89          No      30.0

Slide 43

Estimated Regression Equation

$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3$$

where:

y = annual salary ($1000)
x1 = years of experience
x2 = score on programmer aptitude test
x3 = 0 if individual does not have a graduate degree
     1 if individual does have a graduate degree

x3 is a dummy variable

Slide 44

Categorical Independent Variables

ANOVA Output

Analysis of Variance

SOURCE           DF   SS         MS        F       P
Regression        3   507.8960   169.299   29.48   0.000
Residual Error   16   91.8895    5.743
Total            19   599.7855

R2 = 507.896/599.7855 = .8468 (previously, R Square = .8342)

$$R_a^2 = 1 - (1 - .8468)\,\frac{20 - 1}{20 - 3 - 1} = .8181$$

(previously, Adjusted R Square = .815)

Slide 45

Categorical Independent Variables

Regression Equation Output

Predictor     Coef    SE Coef   T       p
Constant      7.945   7.382     1.076   0.298
Experience    1.148   0.298     3.856   0.001
Test Score    0.197   0.090     2.191   0.044
Grad. Degr.   2.280   1.987     1.148   0.268

Grad. Degr. is not significant (p = 0.268).

Slide 46

More Complex Categorical Variables

If a categorical variable has k levels, k - 1 dummy variables are required, with each dummy variable being coded as 0 or 1.

For example, a variable with levels A, B, and C could be represented by x1 and x2 values of (0, 0) for A, (1, 0) for B, and (0, 1) for C.

Care must be taken in defining and interpreting the dummy variables.

Slide 47

More Complex Categorical Variables

For example, a variable indicating level of education could be represented by x1 and x2 values as follows:

Highest Degree   x1   x2
Bachelor’s       0    0
Master’s         1    0
Ph.D.            0    1
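The k - 1 coding rule can be sketched in a few lines. The function below is a hypothetical helper (not taken from any library) that treats the first level as the baseline, so it reproduces the education table above: the baseline gets all zeros and every other level gets a single 1.

```python
LEVELS = ["Bachelor's", "Master's", "Ph.D."]  # first level acts as the baseline


def dummy_code(value, levels=LEVELS):
    """Return k - 1 dummy variables (0 or 1) for a categorical value with k levels."""
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    # One indicator per non-baseline level; the baseline maps to all zeros.
    return tuple(1 if value == level else 0 for level in levels[1:])


for level in LEVELS:
    print(level, dummy_code(level))
```

With this coding, each dummy coefficient is read as the expected difference from the baseline level, holding the other variables constant.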

Slide 48

Residual Analysis

For simple linear regression the residual plot against $\hat{y}$ and the residual plot against x provide the same information.

In multiple regression analysis it is preferable to use the residual plot against $\hat{y}$ to determine if the model assumptions are satisfied.

Slide 49

Standardized Residual Plot Against $\hat{y}$

Standardized residuals are frequently used in residual plots for purposes of:
• Identifying outliers (typically, standardized residuals < -2 or > +2)
• Providing insight about the assumption that the error term $\varepsilon$ has a normal distribution

The computation of the standardized residuals in multiple regression analysis is too complex to be done by hand.

Excel’s Regression tool can be used.

Slide 50

Standardized Residual Plot Against $\hat{y}$

Residual Output

Observation   Predicted Y   Residuals   Standard Residuals
1             27.89626      -3.89626    -1.771707
2             37.95204       5.047957    2.295406
3             26.02901      -2.32901    -1.059048
4             32.11201       2.187986    0.994921
5             36.34251      -0.54251    -0.246689
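The outlier rule of thumb (standardized residual below -2 or above +2) is easy to apply to this output. The sketch below uses only the five rows shown; the tuple layout and variable names are illustrative.

```python
# (observation, predicted y, residual, standardized residual) from the residual output.
ROWS = [
    (1, 27.89626, -3.89626, -1.771707),
    (2, 37.95204, 5.047957, 2.295406),
    (3, 26.02901, -2.32901, -1.059048),
    (4, 32.11201, 2.187986, 0.994921),
    (5, 36.34251, -0.54251, -0.246689),
]

# Flag observations whose standardized residual falls outside [-2, +2].
outliers = [obs for obs, _, _, std_resid in ROWS if abs(std_resid) > 2]
print("Possible outliers:", outliers)
```

Among the rows shown, only observation 2 is flagged.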

Slide 51

Standardized Residual Plot Against $\hat{y}$

[Figure: Standardized Residual Plot. Vertical axis: standardized residuals (-3 to 3). Horizontal axis: predicted salary (0 to 50). One point is labeled as an outlier.]

Slide 52

Logistic Regression

Logistic regression can be used to model situations in which the dependent variable, y, may only assume two discrete values, such as 0 and 1.

In many ways logistic regression is like ordinary regression. It requires a dependent variable, y, and one or more independent variables.

The ordinary multiple regression model is not applicable.

Slide 53

Logistic Regression

Logistic Regression Equation

The relationship between E(y) and x1, x2, . . . , xp is better described by the following nonlinear equation.

$$E(y) = \frac{e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p}}{1 + e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p}}$$

Slide 54

Logistic Regression

Interpretation of E(y) as a Probability in Logistic Regression

If the two values of y are coded as 0 or 1, the value of E(y) provides the probability that y = 1 given a particular set of values for x1, x2, . . . , xp.

$$E(y) = \text{estimate of } P(y = 1 \mid x_1, x_2, \ldots, x_p)$$

Slide 55

Logistic Regression

Estimated Logistic Regression Equation

A simple random sample is used to compute sample statistics b0, b1, b2, . . . , bp that are used as the point estimators of the parameters $\beta_0, \beta_1, \beta_2, \ldots, \beta_p$.

$$\hat{y} = \frac{e^{b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_p x_p}}{1 + e^{b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_p x_p}}$$

Slide 56

Logistic Regression

Example: Simmons Stores

Simmons’ catalogs are expensive and Simmons would like to send them to only those customers who have the highest probability of making a $200 purchase using the discount coupon included in the catalog.

Simmons’ management thinks that annual spending at Simmons Stores and whether a customer has a Simmons credit card are two variables that might be helpful in predicting whether a customer who receives the catalog will use the coupon to make a $200 purchase.

Slide 57

Logistic Regression

Example: Simmons Stores

Simmons conducted a study by sending out 100 catalogs, 50 to customers who have a Simmons credit card and 50 to customers who do not have the card. At the end of the test period, Simmons noted for each of the 100 customers:

1) the amount the customer spent last year at Simmons,
2) whether the customer had a Simmons credit card, and
3) whether the customer made a $200 purchase.

A portion of the test data is shown on the next slide.

Slide 58

Logistic Regression

Simmons Test Data (partial)

Customer   Annual Spending ($1000), x1   Simmons Credit Card, x2   $200 Purchase, y
 1         2.291                         1                         0
 2         3.215                         1                         0
 3         2.135                         1                         0
 4         3.924                         0                         0
 5         2.528                         1                         0
 6         2.473                         0                         1
 7         2.384                         0                         0
 8         7.076                         0                         0
 9         1.182                         1                         1
10         3.345                         0                         0

Slide 59

Logistic Regression

Simmons Logistic Regression Table (using Minitab)

                                                   Odds    95% CI
Predictor   Coef      SE Coef   Z       p       Ratio   Lower   Upper
Constant    -2.1464   0.5772    -3.72   0.000
Spending     0.3416   0.1287     2.66   0.008   1.41    1.09    1.81
Card         1.0987   0.4447     2.47   0.013   3.00    1.25    7.17

Log-Likelihood = -60.487
Test that all slopes are zero: G = 13.628, DF = 2, P-Value = 0.001

Slide 60

Logistic Regression

Simmons Estimated Logistic Regression Equation

$$\hat{y} = \frac{e^{-2.1464 + 0.3416 x_1 + 1.0987 x_2}}{1 + e^{-2.1464 + 0.3416 x_1 + 1.0987 x_2}}$$

Slide 61

Logistic Regression

Using the Estimated Logistic Regression Equation

• For customers that spend $2000 annually and do not have a Simmons credit card:

$$\hat{y} = \frac{e^{-2.1464 + 0.3416(2) + 1.0987(0)}}{1 + e^{-2.1464 + 0.3416(2) + 1.0987(0)}} = 0.1880$$

• For customers that spend $2000 annually and do have a Simmons credit card:

$$\hat{y} = \frac{e^{-2.1464 + 0.3416(2) + 1.0987(1)}}{1 + e^{-2.1464 + 0.3416(2) + 1.0987(1)}} = 0.4099$$
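The two probabilities can be reproduced with a few lines of Python. This is a minimal sketch using the coefficients from the Simmons logistic regression table; `purchase_probability` is an illustrative name, and spending is entered in thousands of dollars to match the units of x1.

```python
from math import exp

# Coefficients from the Simmons logistic regression table.
B0, B1, B2 = -2.1464, 0.3416, 1.0987


def purchase_probability(spending_thousands, has_card):
    """Estimated probability of a $200 purchase for the given spending and card status."""
    z = B0 + B1 * spending_thousands + B2 * (1 if has_card else 0)
    return exp(z) / (1 + exp(z))


p_no_card = purchase_probability(2, has_card=False)
p_card = purchase_probability(2, has_card=True)
print(f"No card: {p_no_card:.4f}, card: {p_card:.4f}")
```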

Slide 62

Logistic Regression

Testing for Significance

Hypotheses

$H_0: \beta_1 = \beta_2 = 0$
$H_a$: One or both of the parameters is not equal to zero.

Test Statistics

$$z = \frac{b_i}{s_{b_i}}$$

Rejection Rule

Reject H0 if p-value < $\alpha$

Slide 63

Logistic Regression

Testing for Significance

Conclusions

For independent variable x1: z = 2.66 and the p-value is .008 < .05. Hence, $\beta_1 \neq 0$. In other words, x1 is statistically significant.

For independent variable x2: z = 2.47 and the p-value is .013 < .05. Hence, $\beta_2 \neq 0$. In other words, x2 is also statistically significant.

Slide 64

Logistic Regression

Odds in Favor of an Event Occurring

$$\text{odds} = \frac{P(y = 1 \mid x_1, x_2, \ldots, x_p)}{P(y = 0 \mid x_1, x_2, \ldots, x_p)} = \frac{P(y = 1 \mid x_1, x_2, \ldots, x_p)}{1 - P(y = 1 \mid x_1, x_2, \ldots, x_p)}$$

Odds Ratio

$$\text{Odds Ratio} = \frac{\text{odds}_1}{\text{odds}_0}$$

Slide 65

Logistic Regression

Estimated Probabilities

                 Annual Spending
Credit Card   $1000    $2000    $3000    $4000    $5000    $6000    $7000
Yes           0.3305   0.4099   0.4943   0.5791   0.6594   0.7315   0.7931
No            0.1413   0.1880   0.2457   0.3144   0.3922   0.4759   0.5610

The $2000 column was computed earlier.

Slide 66

Logistic Regression

Comparing Odds

Suppose we want to compare the odds of making a $200 purchase for customers who spend $2000 annually and have a Simmons credit card to the odds of making a $200 purchase for customers who spend $2000 annually and do not have a Simmons credit card.

$$\text{estimate of odds}_1 = \frac{.4099}{1 - .4099} = .6946$$

$$\text{estimate of odds}_0 = \frac{.1880}{1 - .1880} = .2315$$

$$\text{Estimate of odds ratio} = \frac{.6946}{.2315} = 3.00$$
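The odds comparison can be sketched as follows, using the two estimated probabilities for customers who spend $2000 annually. The last line relies on a standard property of logistic regression that the slides do not state explicitly: the odds ratio for a one-unit change in a variable equals e raised to its coefficient, which agrees with the 3.00 reported for Card in the Minitab table. The function name `odds` is illustrative.

```python
from math import exp


def odds(p):
    """Odds in favor of an event with probability p."""
    return p / (1 - p)


odds_card = odds(0.4099)      # spends $2000 annually, has a Simmons credit card
odds_no_card = odds(0.1880)   # spends $2000 annually, no Simmons credit card
odds_ratio = odds_card / odds_no_card

# Same quantity from the Card coefficient directly: exp(b2).
odds_ratio_from_coef = exp(1.0987)
print(f"odds ratio = {odds_ratio:.2f}, exp(b2) = {odds_ratio_from_coef:.2f}")
```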