Testing hypotheses using CATEGORICAL VARIABLES


Hypothesis: Higher income persons drive more expensive cars

• Independent variable: Income, measured categorically (nominal variable)
  – Two values: low income and high income
  – Income is measured by where a car is parked: student lot (low income) and faculty-staff lot (high income)
• Dependent variable: Car value, measured categorically (ordinal variable)
  – 1, 2, 3, 4 or 5 (1 = cheapest, 5 = most expensive)
• Sampling
  – Stratified, disproportionate, systematic random sampling of 10 cars from a student lot, and 10 cars from a faculty lot
• Coding
  – Income is automatically coded by a car’s location (faculty-staff or student lot)
  – A 5-level categorical measure is used to code car values


Step 1: Coding

Hypothesis: Higher income persons drive more expensive cars

Team A

                      DV - Car value
IV - Income            1    2    3    4    5     n
LOW (student lot)      2    2    0    3    3    10
HIGH (F/S lot)         3    1    1    1    4    10

[Charts: distribution of car value for the student lot and for the faculty/staff lot]

For the purposes of this class, always place the values of the DV along the horizontal axis, and the values of the IV along the vertical axis.

Each value of the DV has its own column. Each value of the IV has its own row.


Step 1: Coding

Hypothesis: Higher income persons drive more expensive cars

Team B

                      DV - Car value
IV - Income            1    2    3    4    5     n
LOW (student lot)      6    4    0    0    0    10
HIGH (F/S lot)         5    1    2    2    0    10

[Charts: distribution of car value for the student lot and for the faculty/staff lot]

For the purposes of this class, always place the values of the DV along the horizontal axis, and the values of the IV along the vertical axis.

Each value of the DV has its own column. Each value of the IV has its own row.


Step 2: Percentaging

For accurate analysis, frequencies must be converted to percentages. Convert each row separately so the cells add to 100 percent.

Team A

                      DV - Car value
IV - Income            1    2    3    4    5     n
LOW (student lot)      2    2    0    3    3    10
HIGH (F/S lot)         3    1    1    1    4    10

                      DV - Car value
IV - Income            1    2    3    4    5     %
LOW (student lot)    20%  20%   0%  30%  30%  100%
HIGH (F/S lot)       30%  10%  10%  10%  40%  100%

Team B

                      DV - Car value
IV - Income            1    2    3    4    5     n
LOW (student lot)      6    4    0    0    0    10
HIGH (F/S lot)         5    1    2    2    0    10

                      DV - Car value
IV - Income            1    2    3    4    5     %
LOW (student lot)    60%  40%   0%   0%   0%  100%
HIGH (F/S lot)       50%  10%  20%  20%   0%  100%
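The percentaging step can be sketched in a few lines of Python. This is only an illustration: the function name `row_percentages` and the dictionary layout are assumptions made for this example, and the counts are the Team A and Team B frequencies from the slides.

```python
def row_percentages(table):
    """Convert each row of frequencies to percentages of that row's own total."""
    result = {}
    for label, counts in table.items():
        total = sum(counts)  # the row's n (10 cars per lot here)
        result[label] = [round(100 * c / total) for c in counts]
    return result


# Frequencies for car values 1..5, one row per value of the IV (income).
team_a = {"LOW (student lot)": [2, 2, 0, 3, 3], "HIGH (F/S lot)": [3, 1, 1, 1, 4]}
team_b = {"LOW (student lot)": [6, 4, 0, 0, 0], "HIGH (F/S lot)": [5, 1, 2, 2, 0]}

for name, table in (("Team A", team_a), ("Team B", team_b)):
    print(name)
    for label, pcts in row_percentages(table).items():
        print(f"  {label}: {pcts} (sums to {sum(pcts)}%)")
```

Each row is divided by its own total, not by the grand total, which is what makes the two lots comparable even if their sample sizes differ.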


Step 3: Analysis

Switch values of the independent variable. Does the distribution of car values change? If so, is the difference in the predicted direction?

Team A

                      DV - Car value
IV - Income            1    2    3    4    5     %
LOW (student lot)    20%  20%   0%  30%  30%  100%
HIGH (F/S lot)       30%  10%  10%  10%  40%  100%

• Forty percent of the cars in the student lot are value 1 and 2. Same for the F/S lot.
• There are differences between rows in values 3-5, but they seem minimal.

Team B

                      DV - Car value
IV - Income            1    2    3    4    5     %
LOW (student lot)    60%  40%   0%   0%   0%  100%
HIGH (F/S lot)       50%  10%  20%  20%   0%  100%

• All the cars in the student lot are value 1 and 2.
• But forty percent of the cars in the F/S lot are value 3 and 4.
• As we “switch” values of the IV from low to high income, the proportion of expensive cars substantially increases. The direction of the effect is consistent with the hypothesis.
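One way to make the “switch” concrete is to compare, row by row, the share of cars at the expensive end of the scale. The sketch below is illustrative only: treating values 3-5 as “more expensive” is an assumption made for this example, not a rule from the slides, and `share_in_values` is a hypothetical helper name.

```python
def share_in_values(counts, values):
    """Percent of a row's cars whose car value (1-5) falls in `values`."""
    total = sum(counts)
    chosen = sum(counts[v - 1] for v in values)  # car value v sits at index v - 1
    return 100 * chosen / total


teams = {
    "Team A": {"LOW": [2, 2, 0, 3, 3], "HIGH": [3, 1, 1, 1, 4]},
    "Team B": {"LOW": [6, 4, 0, 0, 0], "HIGH": [5, 1, 2, 2, 0]},
}
upper = (3, 4, 5)  # assumption: values 3-5 count as the "more expensive" cars

for team, rows in teams.items():
    low = share_in_values(rows["LOW"], upper)
    high = share_in_values(rows["HIGH"], upper)
    print(f"{team}: LOW {low:.0f}%, HIGH {high:.0f}%, change {high - low:+.0f} points")
```

With this cutoff, Team A shows no change between lots while Team B shows a 40-point increase, which matches the reading in the bullets. A different cutoff could give a different picture, so the choice should be stated whenever this shortcut is used.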


Another example

Hypothesis: poverty → crime

• IV Poverty is measured by income, DV crime by arrests
  – Income has two values, low and high
  – Arrests has two values, never arrested and arrest record
• To test the hypothesis, switch from one category of the IV to the other.
  – Does the distribution of cases along the DV change substantially?
  – If so, is the change in the hypothesized direction?

               Never Arrested   Arrest Record
Low Income          80%              20%        100%
High Income         20%              80%        100%

Distribution flip-flops in an unexpected direction. High income persons seem much more likely to have an arrest record. The hypothesis is rejected.

               Never Arrested   Arrest Record
Low Income          50%              50%        100%
High Income         50%              50%        100%

Distribution remains the same. There seems to be no connection between income and arrest record. The hypothesis is rejected.

               Never Arrested   Arrest Record
Low Income          20%              80%        100%
High Income         80%              20%        100%

Distribution flip-flops in the expected direction. High income persons seem much less likely to have an arrest record. The hypothesis is confirmed.
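The three scenarios can be read mechanically by looking at how the “arrest record” percentage moves when we switch from low to high income. The following is a minimal sketch; `arrest_gap` and `read_result` are hypothetical names, and the sketch looks only at direction, leaving the judgment of whether a change is “substantial” to the analyst.

```python
def arrest_gap(table):
    """Percentage-point change in 'arrest record' when switching low -> high income."""
    return table["high"]["arrest_record"] - table["low"]["arrest_record"]


def read_result(gap):
    # The hypothesis (poverty -> crime) predicts a LOWER arrest share at high income,
    # i.e. a negative gap. Magnitude is deliberately not judged here.
    if gap == 0:
        return "no change: hypothesis rejected"
    if gap < 0:
        return "expected direction: hypothesis confirmed"
    return "unexpected direction: hypothesis rejected"


scenarios = [
    {"low": {"never": 80, "arrest_record": 20}, "high": {"never": 20, "arrest_record": 80}},
    {"low": {"never": 50, "arrest_record": 50}, "high": {"never": 50, "arrest_record": 50}},
    {"low": {"never": 20, "arrest_record": 80}, "high": {"never": 80, "arrest_record": 20}},
]

for table in scenarios:
    gap = arrest_gap(table)
    print(f"gap {gap:+d} points -> {read_result(gap)}")
```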


Cranking it up a notch with “elaboration analysis”

Hypothesis: position on police force determines job stress

                   Job Stress
Position           Low   High     n
Sergeant            30     60    90
Patrol officer      86     24   110

Source: Fitzgerald

                   Job Stress
Position           Low   High
Sergeant           33%    67%  100%
Patrol officer     78%    22%  100%

• Hmmm, interesting! Sergeants are more stressed than patrol officers.
• But is it possible that another variable - one closely associated with position - either mediates the relationship with job stress or is the real driving force? In other words…

Position on police force → other variable → job stress

OR

other variable → job stress, and other variable → position on police force


Elaboration analysis - using first-order partial tables to analyze the effect of a “control” variable

• So…what variables might be associated with position and with job stress?

– Data indicates that females are less likely to be police supervisors.

– The literature review also suggests that males and females may have different stress responses

• Let’s “elaborate” (dig deeper)

– Does the effect of position on job stress hold regardless of gender?

• Gender is used as a “control” variable. We will test the original, “zero-order” relationship between position and job stress, “controlling” for each value of gender.

– Gender is categorical, so we keep using tables

• Create one table just like the one we originally designed (position → job stress) for each value of the control variable, gender

– One table for males, another for females

– Each table is identical to the zero-order table, except it only includes cops of that gender

• These tables are called “first order partial tables” because they represent our first attempt to introduce a “control” variable.

– Each table is “partial” - only part of the sample - because it only includes cases with a certain value of the control variable


Original “zero-order” tables

                   Job Stress
Position           Low   High     n
Sergeant            30     60    90
Patrol officer      86     24   110

                   Job Stress
Position           Low   High
Sergeant           33%    67%  100%
Patrol officer     78%    22%  100%

First order partial tables - one for each value of the control variable

                   Job Stress - 130 male officers
Position           Low   High     n
Sergeant            14     46    60
Patrol officer      58     12    70

                   Job Stress - 130 male officers
Position           Low   High
Sergeant           23%    77%  100%
Patrol officer     83%    17%  100%

                   Job Stress - 70 female officers
Position           Low   High     n
Sergeant            16     14    30
Patrol officer      28     12    40

                   Job Stress - 70 female officers
Position           Low   High
Sergeant           53%    47%  100%
Patrol officer     70%    30%  100%
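A quick way to check a set of partial tables is to confirm that they add back up to the zero-order table, then percentage each one by row. The sketch below uses the frequencies from the tables above; the helper name `percent_rows` and the dictionary layout are assumptions made for this example.

```python
# (low stress, high stress) counts for each position
zero_order = {"Sergeant": (30, 60), "Patrol officer": (86, 24)}
male = {"Sergeant": (14, 46), "Patrol officer": (58, 12)}
female = {"Sergeant": (16, 14), "Patrol officer": (28, 12)}


def percent_rows(table):
    """Row percentages, rounded to whole numbers as on the slides."""
    return {
        row: tuple(round(100 * c / sum(cells)) for c in cells)
        for row, cells in table.items()
    }


# The partial tables split the sample by gender, so cell by cell they
# must sum to the zero-order table.
for row in zero_order:
    combined = tuple(m + f for m, f in zip(male[row], female[row]))
    assert combined == zero_order[row], f"partial tables do not sum for {row}"

for name, table in (("All", zero_order), ("Male", male), ("Female", female)):
    print(name, percent_rows(table))
```

If the assertion fails, a case was miscoded or dropped when the sample was split, and the percentages in the partial tables should not be trusted until that is fixed.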


Does the zero-order relationship between position and job stress persist for males?

Zero-order table, all cops

                   Job Stress
Position           Low   High
Sergeant           33%    67%  100%
Patrol officer     78%    22%  100%

First-order partial table, male cops

                   Job Stress - Male officers
Position           Low   High
Sergeant           23%    77%  100%
Patrol officer     83%    17%  100%

No, the percentages aren’t exactly the same. But, overall, the relationship in the first-order partial table is in the same direction as in the zero-order table, perhaps stronger. Most male sergeants report being highly stressed, and most male patrol officers report low stress. For male officers, the data are consistent with the hypothesis that higher position leads to more job stress.


OUTCOME: SPECIFICATION

Does the zero-order relationship between position and job stress persist for females?

Zero-order table, all cops

                   Job Stress
Position           Low   High
Sergeant           33%    67%  100%
Patrol officer     78%    22%  100%

                   Job Stress - Female officers
Position           Low   High
Sergeant           53%    47%  100%
Patrol officer     70%    30%  100%

                   Job Stress - Male officers
Position           Low   High
Sergeant           23%    77%  100%
Patrol officer     83%    17%  100%

Knowing that officers were male didn’t change our opinion about the effects of position on job stress. So for male officers, the “zero-order” relationship between position and job stress holds. But knowing that officers were female gave us a new insight. Only 47% of female sergeants report being highly stressed, a far smaller proportion than the 77% of male sergeants.

So our opinion of the effects of position on job stress is moderated by one value of the control variable, female. Knowing that a supervisor is female tells us something we didn’t know.


First-order partial analysis: three outcomes

• Doing a first-order partial analysis yields three possible interpretive outcomes:

– Specification (prior example): The zero-order relationship persists for some but not all values of the new variable. Coding this variable teaches us something.

– Replication (next example): The original relationship from the zero-order table persists at both values of the new variable. Coding for the new variable teaches us nothing.

– Explanation (final example): The zero-order relationship is not present at any value of the new variable. The apparent effect of the original independent variable - the one in the hypothesis - has been completely “explained away.”

We just covered specification. Let’s turn to the other two possible outcomes of elaboration analysis.


PRACTICAL EXERCISE

Hypothesis: Higher rank → Less cynicism

• Sample of 100 officers and 100 supervisors

― Twenty officers scored low on cynicism; 80 were high cynicism

― Fifty supervisors scored low on cynicism; 50 were high cynicism

• Build a (zero-order) frequency table, then convert it to percentages

• Be sure to place the categories of the dependent variable in columns, and the categories of the independent variable in rows


PRACTICAL EXERCISE

Hypothesis: Higher rank → Less cynicism

• According to our literature review, a variable associated with rank – gender – may affect cynicism.

• Let’s “control” for gender. We get data on cynicism for officers and supervisors, broken down by gender:

MALES

Officers: 10 low cynicism, 50 high cynicism

Supervisors: 35 low, 35 high

FEMALES

Officers: 10 low, 30 high

Supervisors: 15 low, 15 high

• Create first-order partial tables for gender, convert tables to percentages, and analyze the results...
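One possible way to work this part of the exercise is to compare, in each table, the share of officers and of supervisors who score high on cynicism. The sketch below is only an aid for checking your own tables: `high_gap` is a hypothetical helper, and summarizing each table by a single percentage-point gap is an assumption of this example, not a method prescribed by the slides.

```python
def high_gap(table):
    """Officers' minus supervisors' percent scoring HIGH on cynicism."""
    def pct_high(low, high):
        return 100 * high / (low + high)

    return pct_high(*table["officers"]) - pct_high(*table["supervisors"])


# (low, high) cynicism counts taken from the exercise
zero_order = {"officers": (20, 80), "supervisors": (50, 50)}
males = {"officers": (10, 50), "supervisors": (35, 35)}
females = {"officers": (10, 30), "supervisors": (15, 15)}

for name, table in (("all", zero_order), ("males", males), ("females", females)):
    print(f"{name}: officers exceed supervisors by {high_gap(table):.1f} points")
```

If the gap in each partial table stays close to the zero-order gap, that reads like replication; whether a given difference counts as “close” remains a judgment call.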


PRACTICAL EXERCISE

Hypothesis: Higher rank → Less cynicism

• But the literature suggests that still another variable associated with rank – time on the job – may affect cynicism.

• Let’s “control” for time on the job. Here’s the data:

LESS THAN FIVE YEARS ON THE JOB

Officers: 0 low cynicism, 75 high cynicism

Supervisors: 2 low cynicism, 40 high cynicism

FIVE YEARS OR MORE ON THE JOB

Officers: 20 low, 5 high

Supervisors: 48 low, 10 high

• Create first-order partial tables for time on the job, convert tables to percentages, and analyze the results...
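For the time-on-the-job control, a short script can percentage both partial tables and show how far apart officers and supervisors remain within each group. This is a sketch for checking hand-built tables; the names `percent_rows` and `high_gap` and the data layout are assumptions of this example.

```python
# (low, high) cynicism counts from the exercise
under_five = {"officers": (0, 75), "supervisors": (2, 40)}
five_plus = {"officers": (20, 5), "supervisors": (48, 10)}


def percent_rows(table):
    """Row percentages rounded to whole numbers."""
    return {
        rank: tuple(round(100 * c / sum(cells)) for c in cells)
        for rank, cells in table.items()
    }


def high_gap(table):
    """Officers' minus supervisors' percent scoring HIGH on cynicism (unrounded)."""
    share = {rank: 100 * high / (low + high) for rank, (low, high) in table.items()}
    return share["officers"] - share["supervisors"]


for name, table in (("< 5 years", under_five), ("5+ years", five_plus)):
    print(name, percent_rows(table), f"gap {high_gap(table):.1f} points")
```

Within each time-on-the-job group the gap between ranks shrinks to a few points, compared with 30 points in the zero-order table (80% of officers versus 50% of supervisors scoring high). That pattern is what the slides call explanation.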


But isn’t this too “loosey-goosey”?

                Cynicism - Males
Rank            Low   High     n
Officers         10     50    60
Supervisors      35     35    70
                             130

                Cynicism - Males
Rank            Low   High
Officers        17%    83%  100%
Supervisors     50%    50%  100%

• Assume there is a relationship between variables. When we “switch” the value of the IV, will the change in the DV always be this obvious?

• No. And when the DV has multiple categories, such as in our parking lot exercise, visually discerning an effect can be impossible. Bottom line - changes in percentage are not enough.

• Great. Now what?

• Fortunately, we can use the cell frequencies to calculate a statistic known as “Chi-square”, χ². This statistic assigns a numerical measure to the relationship between variables. We then look up that number in a table to determine if it is large enough to be statistically “significant.”

• All we need is the original frequency table?

• We use the table to build a second table, which projects what the frequencies would be if there was NO relationship between variables. We then compare the two frequency tables. More on that during the third part of the semester!
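As a preview of that comparison, the sketch below builds the “no relationship” table from the row and column totals and then computes the standard Pearson chi-square statistic, the sum of (observed − expected)² / expected over all cells. The slides do not spell out the formula, so treat this as an assumption about what the later material covers; the function names are hypothetical, and looking the result up in a significance table is left for later.

```python
def expected_frequencies(observed):
    """Cell counts we would expect if the row and column variables were unrelated."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # expected cell = row total * column total / grand total
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(observed):
    """Pearson chi-square: sum of (O - E)^2 / E over all cells.

    Assumes every expected count is greater than zero.
    """
    expected = expected_frequencies(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Cynicism - Males: rows are officers and supervisors, columns are low and high
males = [[10, 50], [35, 35]]
print(expected_frequencies(males))
print(round(chi_square(males), 2))
```

A table with identical row distributions, such as the 50%/50% income example, gives a chi-square of zero because the observed and expected tables coincide.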