Copyright © 2014, 2011 Pearson Education, Inc. 1 Chapter 25 Categorical Explanatory Variables.

Chapter 25 Categorical Explanatory Variables

25.1 Two-Sample Comparisons

Does Wal-Mart discriminate against female employees? Are they paid less than men?

Use multiple regression with a categorical explanatory variable representing gender to analyze pay data.

Regression analysis can adjust the comparison between men and women to account for other variables that may affect pay.

25.1 Two-Sample Comparisons

Example: Mid-Level Managers’ Salaries

The average salary for women is $140,000 and the average salary for men is $144,700.

25.1 Two-Sample Comparisons

Example: Mid-Level Managers’ Salaries

The 95% confidence interval for the difference in mean salaries (men minus women) is $740 to $8,590. Since 0 is not in this interval, the difference is statistically significant.
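As a rough illustration of how such an interval can be computed, here is a minimal sketch. The salary values are hypothetical (the transcript does not include the data), and the interval uses a large-sample normal quantile rather than the t quantile a textbook analysis would typically use, so treat it as an approximation.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def diff_in_means_ci(group_a, group_b, level=0.95):
    """Approximate CI for mean(group_a) - mean(group_b), unequal variances."""
    diff = mean(group_a) - mean(group_b)
    # Standard error of the difference between two independent sample means.
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    # Normal quantile (assumption: large-sample approximation, not a t quantile).
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff - z * se, diff + z * se


# Hypothetical salaries in thousands of dollars (not the textbook data).
men = [150.0, 138.0, 147.0, 152.0, 141.0, 149.0, 144.0, 146.0]
women = [139.0, 143.0, 136.0, 142.0, 138.0, 141.0, 140.0, 137.0]

low, high = diff_in_means_ci(men, women)
# The difference is called significant when 0 lies outside the interval.
significant = not (low <= 0.0 <= high)
print(f"95% CI: ({low:.2f}, {high:.2f}); significant: {significant}")
```

With samples this small, a t-based interval would be somewhat wider than the one computed here.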

Assume conditions for inference are satisfied.

25.1 Two-Sample Comparisons

Confounding Variables

Without a randomized experiment, we must be careful about lurking variables (e.g., experience) that could account for the significant difference between average salaries.

Experience is a confounding variable if it is correlated with salary and the two groups (men and women) differ with regard to experience.

25.1 Two-Sample Comparisons

Subsets and Confounding

Restrict analysis to a subset of cases with matching levels of the confounding variable (e.g., compare men and women with 5 years of experience).

25.1 Two-Sample Comparisons

Subsets and Confounding

The 95% confidence interval for the difference in average salaries between men and women within the subset of managers with 5 years of experience includes 0 (the difference is not significant).

However, the standard error of the difference is much larger than in the full sample; the cases in the subset do not produce a precise estimate.

25.2 Analysis of Covariance

Regression on Subsets

What about the difference between average salaries for managers with 2, 10, or 15 years of experience?

Analysis of covariance: regression that combines categorical and numerical explanatory variables; adjusts the comparison of means for the effects of confounding variables.

25.2 Analysis of Covariance

Regression on Subsets

Simple regressions fit separately to men and women show that estimated salary rises faster with experience for women than for men.

25.2 Analysis of Covariance

Combining Regressions

Combining the separate regressions for men and women requires a dummy variable identifying whether a manager is male or female (Group = 1 for men; Group = 0 for women).

It also requires the interaction term Group × Years. An interaction term is the product of two explanatory variables in a regression model.
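Written out, the combined model might take the following form. The coefficient symbols β0 through β3 are notation assumed here for illustration; the transcript does not show the slide's own equation.

```latex
\[
\text{Salary} = \beta_0 + \beta_1\,\text{Years} + \beta_2\,\text{Group}
              + \beta_3\,(\text{Group} \times \text{Years}) + \varepsilon
\]
\[
\text{Group} = 0 \text{ (women)}:\quad
\text{Salary} = \beta_0 + \beta_1\,\text{Years} + \varepsilon
\]
\[
\text{Group} = 1 \text{ (men)}:\quad
\text{Salary} = (\beta_0 + \beta_2) + (\beta_1 + \beta_3)\,\text{Years} + \varepsilon
\]
```

Setting the dummy variable to 0 or 1 recovers one line per group, each with its own intercept and slope.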

25.2 Analysis of Covariance

Interpreting Coefficients

The equation for the group coded as 0 in the dummy variable forms a baseline for comparison.

The slope of the dummy variable is the difference between estimated intercepts in the simple regressions. The slope of the interaction is the difference between estimated slopes in the simple regressions.
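The relationship between the combined fit and the two separate fits can be checked numerically. The sketch below uses hypothetical data (years of experience and salary in thousands of dollars) and a small hand-written least-squares solver, so it stays within the Python standard library; the helper names are illustrative, not from the textbook.

```python
def simple_fit(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def solve(a, b):
    """Solve the square system a @ coef = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - tail) / m[r][r]
    return coef


def multiple_fit(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data, not the textbook's.
years_w, salary_w = [2, 5, 8, 12, 15], [128.0, 133.0, 139.0, 146.0, 151.0]
years_m, salary_m = [3, 6, 9, 11, 16], [135.0, 137.0, 141.0, 142.0, 148.0]

b0_w, b1_w = simple_fit(years_w, salary_w)
b0_m, b1_m = simple_fit(years_m, salary_m)

# Columns: intercept, Years, Group (1 = men, 0 = women), Group x Years.
rows = [[1.0, float(x), 0.0, 0.0] for x in years_w]
rows += [[1.0, float(x), 1.0, float(x)] for x in years_m]
b0, b_years, b_group, b_inter = multiple_fit(rows, salary_w + salary_m)

print(f"baseline (women): intercept {b0:.3f}, slope {b_years:.3f}")
print(f"Group coefficient {b_group:.3f} vs intercept gap {b0_m - b0_w:.3f}")
print(f"interaction {b_inter:.3f} vs slope gap {b1_m - b1_w:.3f}")
```

The baseline coefficients should match the fit for the group coded 0, and the two remaining coefficients should match the gaps between the separate fits, up to rounding error.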

25.3 Checking Conditions

The scatterplot reveals a weak linear association between Salary and Years.

Some caution is necessary regarding lurking variables (e.g., educational background or business aptitude).

25.3 Checking Conditions

Checking for Similar Variances

Plot the residuals against the fitted values.

Compare side-by-side boxplots of the residuals for each group. The similar variance condition is violated if the IQR of one group's residuals is more than twice the IQR of the other's.
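The IQR rule of thumb can also be applied numerically. This is a minimal sketch with hypothetical residuals; the function names are illustrative, and the quartiles follow the default method of Python's `statistics.quantiles`, which may differ slightly from the convention used by a given statistics package.

```python
from statistics import quantiles


def iqr(values):
    """Interquartile range using the default quartile method of statistics.quantiles."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1


def similar_variances(residuals_by_group, max_ratio=2.0):
    """Return True if no group's residual IQR exceeds max_ratio times another's."""
    iqrs = [iqr(r) for r in residuals_by_group.values()]
    return max(iqrs) <= max_ratio * min(iqrs)


# Hypothetical residuals (thousands of dollars) from a fitted regression.
residuals = {
    "men": [-6.0, -3.5, -1.0, 0.5, 1.5, 2.0, 4.0, 5.5],
    "women": [-5.0, -2.5, -1.5, 0.0, 1.0, 2.5, 3.0, 4.5],
}
print({group: iqr(r) for group, r in residuals.items()})
print("similar variances:", similar_variances(residuals))
```

A numerical check like this complements the boxplots rather than replacing them, since plots can also reveal outliers and skewness.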

25.3 Checking Conditions

The similar variance condition is satisfied.

Examining the normal quantile plot confirms that the residuals are nearly normal.

25.4 Interactions and Inference

Principle of marginality: if the interaction is statistically significant, retain it as well as both of its components regardless of their level of significance.

If the interaction is not statistically significant, remove it from the regression and re-estimate the equation. A model without an interaction term is simpler to interpret since the lines fit to the groups are parallel.
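The decision rule above can be summarized as a small helper. The function name and the 0.05 significance threshold are assumptions made for illustration; the slides do not state a specific level.

```python
def terms_to_retain(p_interaction, alpha=0.05):
    """Apply the principle of marginality to a model with one interaction."""
    if p_interaction < alpha:
        # Significant interaction: keep it and both of its components,
        # whatever the p-values of the components themselves.
        return ["Years", "Group", "Group x Years"]
    # Otherwise drop the interaction; the model must then be re-estimated.
    return ["Years", "Group"]


print(terms_to_retain(0.01))
print(terms_to_retain(0.40))
```

The helper only chooses the terms. After an interaction is dropped, the coefficients and p-values of the remaining terms change and have to be read from the re-estimated model.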

25.4 Interactions and Inference

Interactions and Collinearity

An interaction in a multiple regression introduces collinearity (see the large VIF for Group × Years).
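For reference, the variance inflation factor of the j-th explanatory variable is usually defined as follows, where R_j² is the R² obtained by regressing that variable on the other explanatory variables (the subscript notation is assumed here, not taken from the slides).

```latex
\[
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}
\]
```

Because the interaction is built from Group and Years, those two variables tend to predict it well, so its R_j² is close to 1 and its VIF is large.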

25.4 Interactions and Inference

Interactions and Collinearity

Since the interaction in this example is not significant, remove it and re-estimate the multiple regression model (MRM).

25.4 Interactions and Inference

Parallel Fits

The slope for Group estimates the difference between the intercepts for male and female managers.

The coefficient of the dummy variable (1.024, with salary measured in thousands of dollars) means that the line for men is shifted up from the line for women by $1,024 at all levels of experience.
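The constant shift follows directly from the form of the model without an interaction. In the sketch below, b0 and b1 stand for the estimated intercept and Years slope, whose values do not appear in this transcript, and salary is assumed to be in thousands of dollars.

```latex
\[
\widehat{\text{Salary}} = b_0 + b_1\,\text{Years} + 1.024\,\text{Group}
\]
\[
\widehat{\text{Salary}}_{\text{men}} - \widehat{\text{Salary}}_{\text{women}}
  = (b_0 + b_1\,\text{Years} + 1.024) - (b_0 + b_1\,\text{Years}) = 1.024
\]
```

The Years terms cancel, so the gap between the fitted lines does not depend on experience.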

25.4 Interactions and Inference

Parallel Fits

The t-statistic and associated p-value (0.6193) for the slope of Group indicate that it is not statistically significant.

This model finds no statistically significant difference between the average salaries of male and female managers when comparing managers with equal years of experience.

4M Example 25.1: PRIMING IN ADVERTISING

Motivation

FedEx introduced the Courier Pak using two waves of promotion: an ad to raise awareness (i.e., priming) and a visit to existing clients by a sales rep. Management has two questions: (1) How many shipments were generated by a typical one-hour contact with the sales rep? (2) Was the promotion more effective for clients who were already aware of the Courier Pak?

4M Example 25.1: PRIMING IN ADVERTISING

Method

Based on data from 125 customers, fit a multiple regression with a categorical variable. The response is number of shipments using Courier Pak. The explanatory variables are the amount of time spent with the client by a sales rep and a dummy variable indicating whether or not the client was aware of the Courier Pak. The interaction between the explanatory variables is included.

4M Example 25.1: PRIMING IN ADVERTISING

Method

Scatterplot with lines fit separately for each group (clients aware of Courier Pak shown in green).

4M Example 25.1: PRIMING IN ADVERTISING

Method

The association within each group appears linear. The scatterplot suggests an interaction because the slopes appear different. The interaction indicates whether prior awareness of Courier Paks affects how the sales rep visit influenced the client.

4M Example 25.1: PRIMING IN ADVERTISING

Mechanics – Estimate Model

4M Example 25.1: PRIMING IN ADVERTISING

Mechanics – Check Conditions

Nothing in the plots suggests dependence. The similar variance condition is satisfied.

4M Example 25.1: PRIMING IN ADVERTISING

Mechanics – Check Conditions

Similar variances confirmed.

4M Example 25.1: PRIMING IN ADVERTISING

Mechanics – Check Conditions

Nearly normal condition is satisfied.

4M Example 25.1: PRIMING IN ADVERTISING

Mechanics

Based on the F-statistic, we can conclude that the model explains statistically significant variation in the response. The interaction between awareness and hours of contact is statistically significant. Following the principle of marginality, we retain both components of the interaction, including Aware, in the model.

The interaction implies that the gap between the lines gets wider as the number of contact hours increases.

4M Example 25.1: PRIMING IN ADVERTISING

Message

Priming produces a statistically significant increase in the subsequent use of Courier Paks when followed by a visit from a sales rep. Each additional hour of contact with a sales rep produces about 4.3 more uses of the Courier Paks with priming than without priming.

25.5 Regression with Several Groups

Example: Estimating Store Sales

Explanatory variables are median household income in surrounding community, size of the local population, and market (urban, suburban, rural).

The response is sales in dollars per square foot.

25.5 Regression with Several Groups

Scatterplot Matrix

Rural – red; Suburban – green; Urban – blue

Association within each group appears linear.

25.5 Regression with Several Groups

Example: Estimating Store Sales

In general, to distinguish J groups requires J-1 dummy variables.

For this example, use two dummy variables:

Suburban Dummy = 1 if suburban, 0 otherwise

Urban Dummy = 1 if urban, 0 otherwise

Note that rural locations are coded 0, 0.
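The coding scheme can be sketched as a small function. The function name and the dictionary keys are hypothetical; the choice of rural as the omitted baseline level follows the slide.

```python
def market_dummies(market, baseline="rural", levels=("rural", "suburban", "urban")):
    """Encode a market label as J - 1 dummy variables, omitting the baseline level."""
    if market not in levels:
        raise ValueError(f"unknown market: {market!r}")
    return {
        f"{level}_dummy": int(market == level)
        for level in levels
        if level != baseline
    }


for market in ("rural", "suburban", "urban"):
    print(market, market_dummies(market))
```

Three groups yield two dummy variables, and the baseline group is the one for which every dummy equals 0.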

25.5 Regression with Several Groups

Example: Estimating Store Sales

The interpretation of the estimates is similar to the interpretation of models with two groups.

Coefficients associated with dummy variables reflect differences of stores in other locations compared to rural stores.

25.5 Regression with Several Groups

Estimating Sales for Rural Stores

The estimated equation for baseline comparison (stores located in a rural location) is

Estimated Sales ($/SqFt) = -388.6992 + 0.0097 Income + 0.2401 Population

25.5 Regression with Several Groups

Estimating Sales for Urban Stores

Consider stores in an urban location. Estimated sales are given by

Estimated Sales ($/SqFt) = (-388.6992 + 468.8654) + (0.0097 - 0.0053) Income + 0.2401 Population

Estimated Sales ($/SqFt) = 80.1662 + 0.0044 Income + 0.2401 Population
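The arithmetic behind the rural and urban equations can be reproduced in a few lines. The coefficients are the ones shown in the equations above; the suburban terms do not appear in this transcript, so the sketch covers only rural and urban stores. The function names and the example inputs are hypothetical.

```python
# Coefficients as they appear in the fitted equations above.
INTERCEPT, B_INCOME, B_POPULATION = -388.6992, 0.0097, 0.2401
# Urban dummy and Urban x Income interaction (suburban terms are not shown).
B_URBAN, B_URBAN_INCOME = 468.8654, -0.0053


def estimated_sales(income, population, urban=False):
    """Estimated sales ($/SqFt) for a rural (baseline) or urban store."""
    d = 1 if urban else 0
    return (
        INTERCEPT + B_URBAN * d
        + (B_INCOME + B_URBAN_INCOME * d) * income
        + B_POPULATION * population
    )


def urban_gap(income):
    """Urban minus rural estimated sales at a given income level."""
    return B_URBAN + B_URBAN_INCOME * income


# Hypothetical inputs; the units of Population are not stated in the transcript.
print(estimated_sales(60_000, 400, urban=False))
print(estimated_sales(60_000, 400, urban=True))
print(urban_gap(60_000))
```

Because the interaction coefficient is negative, the urban advantage computed by `urban_gap` shrinks as income rises.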

25.5 Regression with Several Groups

Interpretation of Results

Sales at a given income are higher in urban stores than in rural stores, but do not grow as fast with increases in income.

Population has the same effect in every location because the model does not include an interaction term between Population and dummy variables for location.

Best Practices

Be thorough in your search for confounding variables.

Consider interactions.

Choose an appropriate baseline group.

Write out the fits for separate groups.

Best Practices (Continued)

Be careful interpreting the coefficient of the dummy variable.

Check for comparable variances in the groups.

Use color-coding or different plot symbols to identify subsets of observations in plots.

Pitfalls

Don’t use too many dummy variables.

Don’t confuse interaction with correlation.

Don’t think that you have adjusted for all of the confounding factors.

Don’t confuse the different types of slopes.

Don’t forget to check the conditions of the MRM.