Copyright © 2014, 2011 Pearson Education, Inc. 1 Chapter 25 Categorical Explanatory Variables.

Post on 26-Dec-2015

Chapter 25 Categorical Explanatory Variables

25.1 Two-Sample Comparisons

Does Wal-Mart discriminate against female employees? Are they paid less than men?

Use multiple regression with a categorical explanatory variable representing gender to analyze pay data.

Regression analysis can adjust the comparison between men and women to account for other variables that may affect pay.

25.1 Two-Sample Comparison

Example: Mid-Level Managers’ Salaries

The average salary for women is $140,000 and the average salary for men is $144,700.

25.1 Two-Sample Comparison

Example: Mid-Level Managers’ Salaries

The 95% confidence interval for the difference in mean salaries is $740 to $8,590. Since 0 is not in this interval, the difference is statistically significant.

Assume conditions for inference are satisfied.
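
A rough way to reproduce this kind of interval from raw data is sketched below. It is not the textbook's calculation: it assumes independent samples, uses unpooled standard errors, and substitutes a normal critical value for the t value (reasonable only for large samples). The function names and the salary figures (in thousands of dollars) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def diff_mean_ci(a, b, level=0.95):
    """Approximate confidence interval for mean(a) - mean(b)."""
    se = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))  # unpooled SE
    z = NormalDist().inv_cdf(0.5 + level / 2)                   # about 1.96 for 95%
    diff = mean(a) - mean(b)
    return diff - z * se, diff + z * se

def excludes_zero(ci):
    """True when 0 lies outside the interval (difference is significant)."""
    low, high = ci
    return not (low <= 0 <= high)

# Hypothetical salaries in $000, for illustration only.
men = [150, 142, 147, 139, 151, 145, 148, 143]
women = [138, 141, 136, 144, 139, 137, 142, 140]
ci = diff_mean_ci(men, women)
print(ci, excludes_zero(ci))
```

Applied to the interval on the slide, `excludes_zero((740, 8590))` is true, which is the sense in which the difference is called significant.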

25.1 Two-Sample Comparison

Confounding Variables

Without a randomized experiment, we must be careful about lurking variables (e.g., experience) that could account for the significant difference between average salaries.

Experience is a confounding variable if it is correlated with salary and the two groups (men and women) differ with regard to experience.

25.1 Two-Sample Comparison

Subsets and Confounding

Restrict analysis to a subset of cases with matching levels of the confounding variable (e.g., compare men and women with 5 years of experience).

25.1 Two-Sample Comparison

Subsets and Confounding

The 95% confidence interval for the difference in average salaries between men and women within the subset of managers with 5 years of experience includes 0 (the difference is not significant).

However, the standard error of the difference is much larger; the cases in the subset do not produce a precise estimate.
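
The loss of precision follows largely from the smaller sample. As a minimal sketch, assuming both groups share a common standard deviation, the standard error of the difference is sd * sqrt(1/n1 + 1/n2). The group sizes and standard deviation below are hypothetical, not the chapter's data.

```python
from math import sqrt

def se_difference(sd, n_men, n_women):
    """Standard error of a difference in means, assuming a common SD."""
    return sd * sqrt(1 / n_men + 1 / n_women)

# Hypothetical sizes: all managers versus a subset one tenth as large.
se_all = se_difference(12.0, 100, 60)
se_subset = se_difference(12.0, 10, 6)
print(se_all, se_subset)
```

Shrinking both groups by a factor of 10 inflates the standard error by sqrt(10), so an interval from the subset is roughly three times as wide.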

25.2 Analysis of Covariance

Regression on Subsets

What about the difference between average salaries for managers with 2, 10, or 15 years of experience?

Analysis of covariance: regression that combines categorical and numerical explanatory variables; adjusts the comparison of means for the effects of confounding variables.

25.2 Analysis of Covariance

Regression on Subsets

Simple regressions fit separately to men and women show that estimated salary rises faster with experience for women than for men.

25.2 Analysis of Covariance

Combining Regressions

Combining the separate regressions for men and women requires a dummy variable identifying whether a manager is male or female (Group = 1 for men; Group = 0 for women).

Combining them also requires the interaction term Group × Years. An interaction term is the product of two explanatory variables in a regression model.
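
The two derived columns can be built as in the sketch below. The function name and the "M"/"F" labels are assumptions made for illustration; the coding (1 for men, 0 for women) follows the slide.

```python
def design_row(sex, years):
    """Return [1, Years, Group, Group x Years] for one manager."""
    group = 1 if sex == "M" else 0          # dummy variable: 1 = men, 0 = women
    return [1.0, years, group, group * years]

rows = [design_row("M", 5), design_row("F", 5)]
print(rows)  # [[1.0, 5, 1, 5], [1.0, 5, 0, 0]]
```

For women the interaction column is always 0, so their fitted line uses only the intercept and the Years slope.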

25.2 Analysis of Covariance

Interpreting Coefficients

The equation for the group coded as 0 in the dummy variable forms a baseline for comparison.

The slope of the dummy variable is the difference between estimated intercepts in the simple regressions. The slope of the interaction is the difference between estimated slopes in the simple regressions.
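
The relationship between the combined model and the separate fits can be checked numerically. The sketch below uses hypothetical data placed exactly on two lines (salary in thousands of dollars) so the arithmetic is easy to follow; `simple_fit` is an illustrative helper, not a function from the text.

```python
from statistics import mean

def simple_fit(x, y):
    """Least-squares (intercept, slope) for a simple regression of y on x."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

# Hypothetical data: women on 130 + 1.0 * Years, men on 135 + 0.5 * Years.
years_w, salary_w = [2, 5, 8, 12], [132.0, 135.0, 138.0, 142.0]
years_m, salary_m = [3, 6, 10, 15], [136.5, 138.0, 140.0, 142.5]

b0_w, b1_w = simple_fit(years_w, salary_w)   # baseline group (Group = 0)
b0_m, b1_m = simple_fit(years_m, salary_m)   # Group = 1

# Coefficients of the combined regression with the interaction term.
combined = {
    "Intercept": b0_w,               # baseline intercept
    "Years": b1_w,                   # baseline slope
    "Group": b0_m - b0_w,            # difference between intercepts
    "Group x Years": b1_m - b1_w,    # difference between slopes
}
print(combined)
```

Here the Group coefficient is 5 and the interaction coefficient is -0.5: the line for men starts 5 higher but rises 0.5 per year more slowly than the baseline line for women.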

25.3 Checking Conditions

The scatterplot reveals a linear (weak) association between Salary and Years.

Some caution is necessary regarding lurking variables (e.g., educational background or business aptitude).

25.3 Checking Conditions

Checking for Similar Variances

Plot the residuals on the fitted values.

Compare side-by-side boxplots of the residuals for each group. The similar variance condition is violated if the IQR in one boxplot is more than twice the length of the other.
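
The IQR rule of thumb can be written as a short check. This is a sketch with hypothetical residuals; the quartiles come from Python's `statistics.quantiles`, whose default method may differ slightly from the one used by a given statistics package.

```python
from statistics import quantiles

def iqr(values):
    """Interquartile range: third quartile minus first quartile."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1

def similar_variances(res_a, res_b, max_ratio=2.0):
    """False when one group's IQR is more than max_ratio times the other's."""
    a, b = iqr(res_a), iqr(res_b)
    return max(a, b) <= max_ratio * min(a, b)

# Hypothetical residuals for two groups.
print(similar_variances([-2, -1, 0, 1, 2], [-3, -1, 0, 1, 3]))     # True
print(similar_variances([-2, -1, 0, 1, 2], [-10, -5, 0, 5, 10]))   # False
```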

25.3 Checking Conditions

The similar variance condition is satisfied.

Examining the normal quantile plot confirms that the residuals are nearly normal.

25.4 Interactions and Inference

Principle of marginality: if the interaction is statistically significant, retain it as well as both of its components regardless of their level of significance.

If the interaction is not statistically significant, remove it from the regression and re-estimate the equation. A model without an interaction term is simpler to interpret since the lines fit to the groups are parallel.
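
The decision rule in these two paragraphs amounts to the small sketch below. The 0.05 threshold and the term names are assumptions chosen for illustration; the slides do not fix a significance level.

```python
def terms_to_keep(p_interaction, alpha=0.05):
    """Apply the principle of marginality to a two-group model."""
    if p_interaction < alpha:
        # Significant interaction: keep it and both components,
        # whatever the p-values of Years and Group are.
        return ["Years", "Group", "Group x Years"]
    # Otherwise drop the interaction and re-estimate with parallel lines.
    return ["Years", "Group"]

print(terms_to_keep(0.01))  # ['Years', 'Group', 'Group x Years']
print(terms_to_keep(0.40))  # ['Years', 'Group']
```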

25.4 Interactions and Inference

Interactions and Collinearity

An interaction in a multiple regression introduces collinearity (see the large VIF for Group × Years).
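
To see why the interaction has a large VIF, recall that the VIF of an explanatory variable is 1 / (1 - R²), where R² comes from regressing that variable on the other explanatory variables. The sketch below computes it with a small hand-written least-squares solver; the helper names and the ten data points are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def r_squared(rows, y):
    """R^2 from regressing y on the given rows plus an intercept."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    beta = solve(XtX, Xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in X]
    ybar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1 - sse / sst

def vif(columns, j):
    """VIF of columns[j]: regress it on all the other columns."""
    n = len(columns[0])
    others = [[col[i] for k, col in enumerate(columns) if k != j] for i in range(n)]
    return 1 / (1 - r_squared(others, columns[j]))

# Hypothetical data: five women (Group = 0) and five men (Group = 1).
years = [1, 3, 5, 7, 9, 2, 4, 6, 8, 10]
group = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
interaction = [g * y for g, y in zip(group, years)]

print(vif([years, group, interaction], 2))  # about 6.5 for these data
```

Because Group × Years equals Years for one group and 0 for the other, it is largely predictable from Group and Years, which is what drives the VIF up.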

25.4 Interactions and Inference

Interactions and Collinearity

Since the interaction in this example is not significant, remove it and re-estimate the MRM.

25.4 Interactions and Inference

Parallel Fits

The slope for Group estimates the difference between the intercepts for male and female managers.

The coefficient of the dummy variable (1.024, with salary measured in thousands of dollars) means that the line for men is shifted up from the line for women by $1,024 at all levels of experience.

25.4 Interactions and Inference

Parallel Fits

The t-statistic and associated p-value (0.6193) for the slope of Group indicate that it is not statistically significant.

This model finds no statistically significant difference between the average salaries of male and female managers when comparing managers with equal years of experience.

4M Example 25.1: PRIMING IN ADVERTISING

Motivation

FedEx introduced the Courier Pak using two waves of promotion: an ad to raise awareness (i.e., priming) and a visit to existing clients by a sales rep. Management has two questions: (1) How many shipments were generated by a typical one-hour contact with a sales rep? (2) Was the promotion more effective for clients who were already aware of the Courier Pak?

4M Example 25.1: PRIMING IN ADVERTISING

Method

Based on data from 125 customers, fit a multiple regression with a categorical variable. The response is number of shipments using Courier Pak. The explanatory variables are the amount of time spent with the client by a sales rep and a dummy variable indicating whether or not the client was aware of the Courier Pak. The interaction between the explanatory variables is included.

4M Example 25.1: PRIMING IN ADVERTISING

Method

Scatterplot with lines fit separately for each group (clients aware of Courier Pak shown in green).

4M Example 25.1: PRIMING IN ADVERTISING

Method

The association within each group appears linear. The scatterplot suggests an interaction because the slopes appear different. The interaction indicates whether prior awareness of Courier Paks affects how the sales rep visit influenced the client.

4M Example 25.1: PRIMING IN ADVERTISING

Mechanics – Estimate Model

4M Example 25.1: PRIMING IN ADVERTISING

Mechanics – Check Conditions

Nothing in the plots suggests dependence. The similar variance condition is satisfied.

4M Example 25.1: PRIMING IN ADVERTISING

Mechanics – Check Conditions

Similar variances confirmed.

4M Example 25.1: PRIMING IN ADVERTISING

Mechanics – Check Conditions

The nearly normal condition is satisfied.

4M Example 25.1: PRIMING IN ADVERTISING

Mechanics

Based on the F-statistic, we can conclude that the model explains statistically significant variation. The interaction between awareness and hours of contact is statistically significant. Following the principle of marginality, we retain both components of the interaction, including Aware, regardless of their individual significance.

The interaction implies that the gap between the lines gets wider as the number of contact hours increases.

4M Example 25.1: PRIMING IN ADVERTISING

Message

Priming produces a statistically significant increase in the subsequent use of Courier Paks when followed by a visit from a sales rep. Each additional hour of contact with a sales rep produces about 4.3 more uses of the Courier Paks with priming than without priming.

25.5 Regression with Several Groups

Example: Estimating Store Sales

Explanatory variables are median household income in surrounding community, size of the local population, and market (urban, suburban, rural).

The response is sales in dollars per square foot.

25.5 Regression with Several Groups

Scatterplot Matrix

Rural – red
Suburban – green
Urban – blue

Association within each group appears linear.

25.5 Regression with Several Groups

Example: Estimating Store Sales

In general, distinguishing J groups requires J - 1 dummy variables.

For this example, use two dummy variables:
Suburban Dummy = 1 if suburban, 0 otherwise
Urban Dummy = 1 if urban, 0 otherwise

Note that rural locations would be coded 0, 0.
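
The coding generalizes to any number of groups: pick a baseline and create one 0/1 column for every other level. The function below is an illustrative sketch, not code from the text.

```python
def dummy_code(labels, baseline):
    """Code J groups with J - 1 dummy variables; the baseline is all zeros."""
    levels = sorted(set(labels) - {baseline})
    rows = [[1 if label == level else 0 for level in levels] for label in labels]
    return levels, rows

levels, rows = dummy_code(["rural", "urban", "suburban", "rural"], baseline="rural")
print(levels)  # ['suburban', 'urban']
print(rows)    # [[0, 0], [0, 1], [1, 0], [0, 0]]
```

Adding a third dummy for rural stores would make the three columns sum to the intercept column, which is why only J - 1 are used.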

25.5 Regression with Several Groups

Example: Estimating Store Sales

The interpretation of the estimates is similar to the interpretation of models with two groups.

Coefficients associated with dummy variables reflect differences of stores in other locations compared to rural stores.

25.5 Regression with Several Groups

Estimating Sales for Rural Stores

The estimated equation for baseline comparison (stores located in a rural location) is

Estimated Sales ($/SqFt) = -388.6992 + 0.0097 Income + 0.2401 Population

25.5 Regression with Several Groups

Estimating Sales for Urban Stores

Consider stores in an urban location. Adding the urban coefficients to the baseline, the estimated sales are given by

Estimated Sales ($/SqFt) = (-388.6992 + 468.8654) + (0.0097 - 0.0053) Income + 0.2401 Population

Estimated Sales ($/SqFt) = 80.1662 + 0.0044 Income + 0.2401 Population

25.5 Regression with Several Groups

Interpretation of Results

Sales at a given income are higher in urban stores than in rural stores, but they do not grow as fast with increases in income.

Population has the same effect in every location because the model does not include an interaction term between Population and dummy variables for location.
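
The urban equation and this interpretation follow from adding the urban adjustments to the rural baseline, as the sketch below shows. The coefficients are the ones quoted on the slides; the dictionary layout and function name are assumptions, and the income and population passed to `predict` must be in the same units as the original data, which the slides do not state.

```python
# Rural (baseline) equation and the urban adjustments quoted on the slides.
baseline = {"Intercept": -388.6992, "Income": 0.0097, "Population": 0.2401}
urban_adjust = {"Intercept": 468.8654, "Income": -0.0053}  # Urban dummy, Urban x Income

# No Population adjustment: the model has no Population interaction.
urban = {term: round(coef + urban_adjust.get(term, 0.0), 4)
         for term, coef in baseline.items()}
print(urban)  # {'Intercept': 80.1662, 'Income': 0.0044, 'Population': 0.2401}

def predict(equation, income, population):
    """Estimated sales ($/SqFt) from one group's fitted equation."""
    return (equation["Intercept"] + equation["Income"] * income
            + equation["Population"] * population)

# Income at which the urban and rural lines would cross (same population).
crossover_income = urban_adjust["Intercept"] / -urban_adjust["Income"]
print(round(crossover_income))
```

The two lines cross at an income near 88,500, so the urban advantage applies to incomes below that level; whether incomes that high occur in the data is not stated on the slides.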

Best Practices

Be thorough in your search for confounding variables.

Consider interactions.

Choose an appropriate baseline group.

Write out the fits for separate groups.

Best Practices (Continued)

Be careful interpreting the coefficient of the dummy variable.

Check for comparable variances in the groups.

Use color-coding or different plot symbols to identify subsets of observations in plots.

Pitfalls

Don’t use too many dummy variables.

Don’t confuse interaction with correlation.

Don’t think that you have adjusted for all of the confounding factors.

Don’t confuse the different types of slopes.

Don’t forget to check the conditions of the MRM.