Example 11.3 Possible Gender Discrimination in Salary at Fifth National Bank of Springfield

Modeling Possibilities

Objective

To use StatPro’s multiple regression procedure to analyze whether the bank discriminates against females in terms of salary.

BANK.XLS

The Fifth National Bank of Springfield is facing a gender-discrimination suit. The charge is that its female employees receive substantially smaller salaries than its male employees.

The bank’s employee database is listed in this file. Here is a partial list of the data.

Variables

For each of the 208 employees, the data set includes the following variables:

– EducLev: education level, a categorical variable with categories 1 (finished high school), 2 (finished some college courses), 3 (obtained a bachelor’s degree), 4 (took some graduate courses) and 5 (obtained a graduate degree)

– JobGrade: a categorical variable indicating the current job level, the possible levels being from 1-6 (6 is highest)

– YrHired: year employee was hired

– YrBorn: year employee was born

– Gender: a categorical variable with values “Female” and “Male”

Variables -- continued

– YrsPrior: number of years of work experience at another bank prior to working at Fifth National

– PCJob: a dummy variable with value 1 if the employee’s current job is computer-related and value 0 otherwise

– Salary: current annual salary in thousands of dollars

Do the data provide evidence that females are discriminated against in terms of salary?

Naïve Approach

A naïve approach to the problem is to compare the average salaries of the males and females.

The average of all salaries is $39,922, the average female salary is $37,210, and the average male salary is $45,505.

The difference between the averages is statistically significant. The females are definitely earning less on average, but perhaps there is a reason.

The question is whether the difference between the average salaries is still evident after taking other attributes into account. This is a perfect task for regression.
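As a rough illustration of the naïve comparison, the following Python sketch computes the average salary for each gender and the gap between them. The records are made-up placeholders, not rows from BANK.XLS; only the column names Gender and Salary come from the data set, and the helper name is hypothetical.

```python
from statistics import mean

# Hypothetical records for illustration only; these are NOT rows from BANK.XLS.
# Salary is in thousands of dollars, as in the data set description.
employees = [
    {"Gender": "Female", "Salary": 32.0},
    {"Gender": "Female", "Salary": 38.0},
    {"Gender": "Female", "Salary": 41.0},
    {"Gender": "Male", "Salary": 40.0},
    {"Gender": "Male", "Salary": 50.0},
]


def average_salary_by_gender(rows):
    """Return {gender: mean salary} for a list of employee records."""
    groups = {}
    for row in rows:
        groups.setdefault(row["Gender"], []).append(row["Salary"])
    return {gender: mean(salaries) for gender, salaries in groups.items()}


averages = average_salary_by_gender(employees)
gap = averages["Male"] - averages["Female"]
print(averages, gap)
```

A comparison like this ignores every attribute other than gender, which is exactly the limitation the regression approach is meant to address.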

Dummy Variables

Some potential explanatory variables are categorical and cannot be measured on a quantitative scale.

However, we often need to use these variables because they are related to the response variable.

The trick is to create dummy variables, also called indicator or 0-1 variables.

These are variables that indicate the category a given observation is in.

Dummy Variables -- continued

To create dummy variables, we can use an IF statement or StatPro’s Dummy variable procedure.

The Dummy variable procedure is usually easier, particularly when there are multiple categories.

Once the dummy variables are created, we can combine categories, if we like, by simply adding the corresponding dummy columns to get the dummy for the new, combined category.
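The same logic can be sketched outside the spreadsheet. The Python fragment below is an illustration only: the data values are invented, the IF-style rule stands in for the spreadsheet IF statement, and the combined-category name `ed_4_or_5` is a hypothetical label rather than anything StatPro produces.

```python
# Hypothetical values for six employees (not taken from BANK.XLS).
educ_lev = [1, 3, 5, 2, 4, 5]
gender = ["Female", "Male", "Female", "Female", "Male", "Male"]

# IF-style rule: the dummy is 1 when the observation is in the category, else 0.
female = [1 if g == "Female" else 0 for g in gender]

# One dummy column per education category, named Ed_1 ... Ed_5.
ed_dummies = {
    f"Ed_{level}": [1 if value == level else 0 for value in educ_lev]
    for level in range(1, 6)
}

# Combining categories: adding two dummy columns gives the dummy for the
# merged category (here, education level 4 or 5).
ed_4_or_5 = [a + b for a, b in zip(ed_dummies["Ed_4"], ed_dummies["Ed_5"])]

print(female)
print(ed_4_or_5)
```

Because each employee falls in exactly one education category, the sum of two dummy columns is still a 0-1 column.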

Regression Analysis

In this example we create dummy variables for Gender and EducLev.

Then we can run a regression analysis with Salary as the response variable, using any combination of numerical and dummy explanatory variables.

We must follow two rules:

– We shouldn’t use any of the original categorical variables that the dummies are based on.

– We should use one less dummy than the number of categories for any categorical variable.

Regression Analysis -- continued

This second rule is a technical one. If we violate it, the software will give us an error message.

For example, of the five education dummies Ed_1 to Ed_5, any four can be used. The omitted dummy then corresponds to the reference category.

As we will see, the interpretations of the dummy variable coefficients are all relative to this reference category.
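The reason behind the second rule can be sketched in a few lines of Python. When a regression includes a constant term, the full set of dummies for a categorical variable adds up to the column of ones that represents the constant, so one dummy is redundant. The category labels below are invented for illustration, and a three-category variable is used only to keep the example short.

```python
# Hypothetical category labels for a 3-category variable (illustration only).
categories = [1, 2, 3, 2, 1, 3]
levels = sorted(set(categories))

# Full set of dummies, one column per category.
dummies = {lvl: [1 if c == lvl else 0 for c in categories] for lvl in levels}

# The constant term is a column of ones. Each row belongs to exactly one
# category, so the dummy columns add up to that same column of ones.
intercept = [1] * len(categories)
row_sums = [sum(dummies[lvl][i] for lvl in levels) for i in range(len(categories))]
is_redundant = row_sums == intercept

# Dropping one dummy (here the first level, which becomes the reference
# category) removes the exact linear dependence.
reference = levels[0]
kept = {lvl: col for lvl, col in dummies.items() if lvl != reference}

print(is_redundant, sorted(kept))
```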

To get used to dummy variables in regression analysis we will proceed in several stages.

Regression Analysis -- continued

We first estimate a regression equation with only one explanatory variable, the Female dummy.

The output is shown in this table. The resulting equation is

Predicted Salary = 45.505 - 8.296 Female

Regression Analysis -- continued

To interpret this equation, recall that Female has only two possible values, 0 and 1. If we substitute 1, the predicted salary equals 37.209, and if we substitute 0, the predicted salary is 45.505.

These are the average salaries of females and males. Therefore the interpretation of the -8.296 coefficient of the Female dummy variable is straightforward: it is the difference between the average female salary and the average male salary, in thousands of dollars.
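That a regression on a single 0-1 dummy reproduces the two group averages can be checked with a small least-squares calculation. The sketch below uses invented salaries rather than the BANK.XLS data, and `simple_ols` is a hypothetical helper written for this illustration, not a StatPro routine.

```python
from statistics import mean

# Hypothetical salaries in $1000s (not from BANK.XLS); female is a 0-1 dummy.
female = [1, 1, 1, 0, 0]
salary = [32.0, 38.0, 41.0, 40.0, 50.0]


def simple_ols(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


intercept, slope = simple_ols(female, salary)
male_mean = mean([s for f, s in zip(female, salary) if f == 0])
female_mean = mean([s for f, s in zip(female, salary) if f == 1])

# The intercept matches the male average; intercept + slope matches the
# female average, so the slope is the difference between the two averages.
print(intercept, slope, male_mean, female_mean)
```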

Regression Analysis -- continued

The above equation tells only part of the story; it ignores all information except for gender.

We expand this equation by adding the experience variables. The output is shown in this table.

Regression Analysis -- continued

The corresponding equation is

Predicted Salary = 35.492 + 0.988 YrsExper + 0.131 YrsPrior - 8.080 Female

It is useful to write two separate equations, one for females and one for males:

Females: Predicted Salary = 27.412 + 0.988 YrsExper + 0.131 YrsPrior

Males: Predicted Salary = 35.492 + 0.988 YrsExper + 0.131 YrsPrior

We interpret the coefficient -8.080 of the Female dummy variable as the average salary disadvantage for females relative to males after controlling for job experience. But there is still more of the story to tell.
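The two separate equations differ only in their intercepts, which a short Python sketch makes explicit. The coefficients are those of the separate female and male equations in the slide; the function name and the experience values used for the check (10 and 2 years) are arbitrary choices for illustration.

```python
# Coefficients from the slide's equations (Salary is in $1000s).
INTERCEPT = 35.492
B_YRS_EXPER = 0.988
B_YRS_PRIOR = 0.131
B_FEMALE = -8.080


def predicted_salary(yrs_exper, yrs_prior, female):
    """Predicted salary in $1000s; female is the 0-1 dummy."""
    return (INTERCEPT + B_YRS_EXPER * yrs_exper
            + B_YRS_PRIOR * yrs_prior + B_FEMALE * female)


# Setting the dummy to 1 or 0 gives the two intercepts.
female_intercept = predicted_salary(0, 0, female=1)
male_intercept = predicted_salary(0, 0, female=0)

# The male-female gap is the same at every experience level.
gap = predicted_salary(10, 2, female=0) - predicted_salary(10, 2, female=1)

print(round(female_intercept, 3), round(male_intercept, 3), round(gap, 3))
```

Because the model has no interaction terms, the female and male equations are parallel: the slopes on YrsExper and YrsPrior are shared, and only the intercept shifts.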

Regression Analysis -- continued

We next add education level to the equation by including four of the five education level dummies. Although any four could be used, we use Ed_2 to Ed_5, so that the lowest level becomes the reference category.

We would expect this to lead to positive coefficients for these dummies, which are easier to interpret.

The resulting output is shown in the table on the next slide.

Regression Analysis -- continued

The estimated regression equation is now

Predicted Salary = 26.613 + 1.033 YrsExper + 0.362 YrsPrior - 4.501 Female + 0.160 Ed_2 + 4.765 Ed_3 + 7.320 Ed_4 + 11.770 Ed_5

There are now two categorical variables involved, gender and educational level.

However, we can still write a separate equation for any combination of categories by setting the dummies to the appropriate values.

Page 18: Example 11.3 Possible Gender Discrimination in Salary at Fifth National Bank of Springfield

11.1 | 11.2 | 11.1a | 11.2a | 11.2b | 11.3a | 11.4 | 11.3b | 11.5 | 11.6

Regression Analysis -- continued

For example, the equation for females at the fifth education level is found by setting Female = 1 and Ed_5 = 1 and setting the other education dummies equal to 0. The resulting equation is

Predicted Salary = 33.882 + 1.033 YrsExper + 0.362 YrsPrior

We interpret this equation as follows:

– For either gender and any education level, the expected increase in salary for one extra year of experience with Fifth National is $1033; the expected increase in salary for one extra year of prior experience with another bank is $362.

Page 19: Example 11.3 Possible Gender Discrimination in Salary at Fifth National Bank of Springfield

11.1 | 11.2 | 11.1a | 11.2a | 11.2b | 11.3a | 11.4 | 11.3b | 11.5 | 11.6

Regression Analysis -- continued

– The coefficients of the education dummies indicate the average increase in salary an employee can expect relative to the reference (lowest) education level.

– The key coefficient, the negative $4501 for females, indicates the average salary disadvantage for females relative to males, given that they have the same experience levels and the same education levels.
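Setting the dummies to the appropriate values amounts to adding the relevant coefficients to the constant term. The Python sketch below does this for any gender and education combination, using the coefficients reported in the estimated equation. The dictionary layout and the function name `group_intercept` are hypothetical conveniences, not part of StatPro's output.

```python
# Coefficients from the estimated equation (Salary is in $1000s).
COEF = {
    "Intercept": 26.613, "YrsExper": 1.033, "YrsPrior": 0.362,
    "Female": -4.501,
    "Ed_2": 0.160, "Ed_3": 4.765, "Ed_4": 7.320, "Ed_5": 11.770,
}


def group_intercept(female, educ_lev):
    """Intercept of the salary equation for one gender/education combination."""
    value = COEF["Intercept"]
    if female:
        value += COEF["Female"]
    if educ_lev != 1:  # level 1 is the reference category, so it adds nothing
        value += COEF[f"Ed_{educ_lev}"]
    return round(value, 3)


# Females at the fifth education level, as in the worked example.
print(group_intercept(female=True, educ_lev=5))

# At any fixed education level, the male-female difference is the Female coefficient.
print(round(group_intercept(False, 3) - group_intercept(True, 3), 3))
```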

One further explanation for gender differences in salary might be job grade. Perhaps females tend to be in lower job grades, which would help explain why they get lower salaries on average.

Page 20: Example 11.3 Possible Gender Discrimination in Salary at Fifth National Bank of Springfield

11.1 | 11.2 | 11.1a | 11.2a | 11.2b | 11.3a | 11.4 | 11.3b | 11.5 | 11.6

Regression Analysis -- continued

One way to check this is with a pivot table, as shown below, where we put job grade in the row area, gender in the column area, and request counts, displayed as percentages of columns.

Clearly, females tend to be concentrated at the lower job grades.
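For readers working outside a spreadsheet, the same kind of table (counts shown as percentages of each column) can be sketched in Python. This is not the slide's pivot table: the (JobGrade, Gender) pairs are invented for illustration and do not reproduce the bank's actual distribution, and the function name is hypothetical.

```python
from collections import Counter

# Hypothetical (JobGrade, Gender) pairs for illustration; not from BANK.XLS.
records = [
    (1, "Female"), (1, "Female"), (2, "Female"), (3, "Female"),
    (1, "Male"), (3, "Male"), (5, "Male"), (6, "Male"),
]


def column_percentages(pairs):
    """Percent of each gender (column) that falls in each job grade (row)."""
    cell_counts = Counter(pairs)
    column_totals = Counter(gender for _, gender in pairs)
    return {
        (grade, gender): 100.0 * count / column_totals[gender]
        for (grade, gender), count in cell_counts.items()
    }


table = column_percentages(records)
for (grade, gender), pct in sorted(table.items()):
    print(grade, gender, pct)
```

Percentages of columns, rather than raw counts, make the two genders comparable even when the groups differ in size.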

Page 21: Example 11.3 Possible Gender Discrimination in Salary at Fifth National Bank of Springfield

11.1 | 11.2 | 11.1a | 11.2a | 11.2b | 11.3a | 11.4 | 11.3b | 11.5 | 11.6

Regression Analysis -- continued

This certainly helps to explain why females get lower salaries on average.

We can go one step further to see the effect of job grade on salary by including the dummies for job grade in the equation, along with the other variables we have included so far.

As with the education dummies, we use the lowest job grade as the reference category and include only the five dummies for the other categories.

Page 22: Example 11.3 Possible Gender Discrimination in Salary at Fifth National Bank of Springfield

11.1 | 11.2 | 11.1a | 11.2a | 11.2b | 11.3a | 11.4 | 11.3b | 11.5 | 11.6

Regression Analysis -- continued

While we’re at it, we add the other two potential explanatory variables to the equation: Age, coded as 95 minus YrBorn, and HasPCJob, a dummy based on the PCJob categorical variable.

The regression output is shown on the next slide.

As expected, the coefficients of the job grade dummies are all positive, and they increase as the job grade increases – it pays to be in the higher job grades.

Page 23: Example 11.3 Possible Gender Discrimination in Salary at Fifth National Bank of Springfield

11.1 | 11.2 | 11.1a | 11.2a | 11.2b | 11.3a | 11.4 | 11.3b | 11.5 | 11.6

Regression Analysis -- continued

Page 24: Example 11.3 Possible Gender Discrimination in Salary at Fifth National Bank of Springfield

11.1 | 11.2 | 11.1a | 11.2a | 11.2b | 11.3a | 11.4 | 11.3b | 11.5 | 11.6

Regression Analysis -- continued

The effect of age appears to be minimal, and there appears to be a “bonus” of close to $5000 for having a PC-related job.

The R² value has now increased to 76.5%, and the penalty for being a female has decreased to $2555 – still large but not as large as before.

However, even if this penalty, the coefficient of Female in this last equation, is considered “small,” is it convincing evidence against the argument for gender discrimination?

Page 25: Example 11.3 Possible Gender Discrimination in Salary at Fifth National Bank of Springfield

11.1 | 11.2 | 11.1a | 11.2a | 11.2b | 11.3a | 11.4 | 11.3b | 11.5 | 11.6

Regression Analysis -- continued

We believe the answer is “no.”

We have used variations in job grades to reduce the penalty for being female. But the remaining question is then: Why are females predominantly in the low job grades?

Perhaps this is the real source of gender discrimination.

Perhaps management is not advancing the females as quickly as it should, which naturally results in lower salaries for females.