MEASUREMENT ISSUES

MEASUREMENT ISSUES LEONIE HUDDY STONY BROOK UNIVERSITY [email protected]



Page 1: MEASUREMENT ISSUES

MEASUREMENT ISSUES
LEONIE HUDDY
STONY BROOK UNIVERSITY
[email protected]

Page 2: MEASUREMENT ISSUES

OUTLINE
I. MEASUREMENT ERROR
1. Definitions & Sources
2. Need for reliable measurement of DVs and moderators in survey experiments
• Examples: measurement of moderator variables
II. Using Experiments to Validate Measurement
1. Example: Racial Resentment
III. Cross-National Measurement

Page 3: MEASUREMENT ISSUES

I. MEASUREMENT ERROR
1. DEFINITIONS (ALWIN, 2007)
Measurement error: error that occurs when the observed value differs from the true value (systematically or at random).
Bias: the measure differs in systematic ways from its true value.
Reliability: the measure is free of random measurement error.
Validity: the measure captures the right concept. Several forms of validity may need to be assessed to ensure valid measurement:
• Face validity (looks right on the surface)
• Discriminant validity (differs from what it should)
• Convergent validity (goes with what it should)
• Predictive validity (predicts what it is supposed to)
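These definitions are often formalized with the classical true-score model. The notation here ($X$ for the observed score, $T$ for the true score, $E$ for random error, $\rho_{XX'}$ for reliability) is an assumption added for illustration and does not appear on the original slides:

```latex
X = T + E, \qquad \operatorname{Cov}(T, E) = 0

\rho_{XX'} = \frac{\operatorname{Var}(T)}{\operatorname{Var}(X)}
           = \frac{\operatorname{Var}(T)}{\operatorname{Var}(T) + \operatorname{Var}(E)}
```

Under this sketch, random error ($\operatorname{Var}(E) > 0$) lowers reliability, whereas a constant bias shifts the mean of $X$ without changing $\rho_{XX'}$.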

Page 4: MEASUREMENT ISSUES

2. SOURCES OF MEASUREMENT ERROR (ALWIN)

Bias                  Variance
-interviewer bias     -interviewer error variance
-respondent bias      -respondent error variance
-instrument bias      -instrument error variance
-mode bias            -mode error variance

Page 5: MEASUREMENT ISSUES

3. RELIABLE MEASUREMENT OF THE DEPENDENT VARIABLE (DV)
The major problem with measurement error in the DV is VARIABILITY: measurement error makes it more difficult to identify significant treatment effects.
• It is important to include multiple measures of the DV to reduce measurement error and increase measurement reliability
• Many experimental studies focus more on the manipulation than on the DV
Bias in the DV (a systematic under- or overestimate) does not bias the estimated relationship with an independent variable, provided the bias is unrelated to that variable.
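The two claims on this slide can be illustrated with a small simulation. This is a minimal sketch, not taken from the slides: the function name `simulate_dv`, the effect size, the noise levels, and the seed are all hypothetical choices. It compares a single-item DV with a four-item average, and then adds a constant bias to every item.

```python
import random
import statistics

def simulate_dv(n_per_group, n_items, effect=0.3, item_noise_sd=1.0, bias=0.0, seed=1):
    """Return (difference in means, standard error) for a simulated
    two-group experiment. Each DV item equals the latent outcome plus a
    constant bias plus independent random error; the observed DV is the
    mean of the items. All parameter values are illustrative assumptions."""
    rng = random.Random(seed)
    groups = {0: [], 1: []}
    for treated in (0, 1):
        for _ in range(n_per_group):
            latent = effect * treated + rng.gauss(0, 0.5)
            items = [latent + bias + rng.gauss(0, item_noise_sd) for _ in range(n_items)]
            groups[treated].append(statistics.mean(items))
    diff = statistics.mean(groups[1]) - statistics.mean(groups[0])
    se = (statistics.variance(groups[1]) / n_per_group
          + statistics.variance(groups[0]) / n_per_group) ** 0.5
    return diff, se

diff_1, se_1 = simulate_dv(500, n_items=1)                 # single-item DV
diff_4, se_4 = simulate_dv(500, n_items=4)                 # four-item DV scale
diff_biased, _ = simulate_dv(500, n_items=4, bias=2.0)     # same data, biased items

print(f"1 item:  diff={diff_1:.3f}, se={se_1:.3f}")
print(f"4 items: diff={diff_4:.3f}, se={se_4:.3f}")
print(f"4 items with constant bias: diff={diff_biased:.3f}")
```

In this sketch the multi-item DV yields a smaller standard error for the treatment effect, and the constant bias leaves the estimated difference unchanged because it affects both groups equally.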

Page 6: MEASUREMENT ISSUES

4. RELIABLE MEASUREMENT OF EXPERIMENTAL MODERATORS
Experimental effects in political science are frequently heterogeneous. Hypothetical examples include:
• The effect of elite partisan cues in a framing study depends on partisan identity (direction and identity strength)
• The effects of new information about a government policy may depend on existing levels of political sophistication
• The effect of exposure to a more or less generous welfare policy depends on a respondent’s left/right political ideology
The reliable measurement of moderators (and their correct theoretical model specification) will increase the likelihood of detecting heterogeneous experimental effects.
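A short simulation can show why moderator reliability matters. The sketch below is an illustration under stated assumptions, not a procedure from the slides: the helper names `slope` and `interaction_estimate`, the true interaction of 1.0, and the noise levels are hypothetical. With a randomized binary treatment, the treatment-by-moderator interaction equals the difference between the moderator slopes in the treated and control groups, so it can be estimated with two simple regressions.

```python
import random
import statistics

def slope(x, y):
    """Ordinary least squares slope of y on x (simple regression)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def interaction_estimate(n_per_group, moderator_noise_sd, true_interaction=1.0, seed=7):
    """Estimate the treatment x moderator interaction as the difference in
    moderator slopes between treated and control groups. The moderator is
    observed with random error of the given standard deviation."""
    rng = random.Random(seed)
    slopes = {}
    for treated in (0, 1):
        observed_m, y = [], []
        for _ in range(n_per_group):
            m = rng.gauss(0, 1)  # true moderator score
            y.append(true_interaction * treated * m + rng.gauss(0, 1))
            observed_m.append(m + rng.gauss(0, moderator_noise_sd))
        slopes[treated] = slope(observed_m, y)
    return slopes[1] - slopes[0]

clean = interaction_estimate(2000, moderator_noise_sd=0.0)  # perfectly reliable moderator
noisy = interaction_estimate(2000, moderator_noise_sd=1.0)  # reliability of about .5

print(f"reliable moderator:   {clean:.2f}")
print(f"unreliable moderator: {noisy:.2f}")
```

In this setup the interaction estimated with the unreliable moderator is attenuated toward zero, roughly in proportion to the moderator's reliability, which makes a real heterogeneous effect harder to detect.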

Page 7: MEASUREMENT ISSUES

A. MODERATOR MEASUREMENT EXAMPLE: PARTISAN IDENTITY VS. TRADITIONAL PID STRENGTH; HUDDY, MASON & AAROE
Threat: an experimental blog statement suggesting that Democrats or Republicans will lose the upcoming election; the message comes from either the same party or the other main party.
Sample statements in the Democratic threat manipulation (from the other party):
• I love watching Democrats delude themselves! They’re talking a big game, but look closer and they know they’re in trouble.
• America clearly wants Republican leadership, and the Democrats are running in circles desperately trying to convince themselves that anyone in America trusts them!
• People don’t trust Democrats and they don’t like their politics.
• They lost a lot of credibility over their years of flip-flopping, it's going to take more than a couple of years to get it back.

Page 8: MEASUREMENT ISSUES

MULTI-ITEM PARTISAN IDENTITY VS. TRADITIONAL PID STRENGTH; HUDDY, MASON & AAROE
Partisan Identity Scale
• “How important is being a [Democrat/Republican/Independent] to you?”
• “How well does the term [Democrat/Republican/Independent] describe you?”
• “When talking about [Democrats/Republicans/Independents], how often do you use ‘we’ instead of ‘they’?”
• “To what extent do you think of yourself as being a [Democrat/Republican/Independent]?”

Traditional PID Strength
• “Generally speaking, do you think of yourself as a Democrat, a Republican, or an Independent?”
• “Are you a strong or not so strong Democrat/Republican?”
• IF INDEPENDENT: “Do you think of yourself as closer to the Republican party or closer to the Democratic party?”

Page 9: MEASUREMENT ISSUES

MULTI-ITEM PARTISAN IDENTITY VS. PID STRENGTH: PREDICTING ANGRY REACTIONS TO THREAT; HUDDY, MASON & AAROE

                        Merged Blog Study          Student Sample
                        Dem          Rep           Dem          Rep
Traditional Strength    .004 (.03)   -.06 (.08)    -.04 (.07)   .05 (.16)
Partisan Identity       -.03 (.06)   -.24 (.14)    .00 (.20)    -.48 (.41)
Partisan Threat         .09 (.05)    -.04 (.11)    -.41 (.16)   -.15 (.34)
Strength X Threat       .02 (.04)    -.03 (.10)    -.07 (.12)   -.08 (.28)
Identity X Threat       .36 (.08)    .59 (.19)     .96 (.31)    .44 (.66)
N                       1568         252           145          38

Page 10: MEASUREMENT ISSUES

B. MODERATOR MEASUREMENT EXAMPLE: CANDIDATE SKIN COLOR, RACIAL PREJUDICE AND SOCIAL DESIRABILITY; TERKILDSEN 1993
• Subjects were assigned to read about a light- or dark-skinned black candidate
• Subjects were part of the Louisville, KY jury pool
• Measured self-monitoring (the tendency to distort true beliefs in response to social norms) AND racial prejudice as factors that moderate the experimental treatment
• Both are measured as multi-item scales to reduce measurement error
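Reliability figures for multi-item scales such as these are commonly computed as Cronbach's alpha. The slides do not say which coefficient was reported, so alpha is an assumption here, and the toy data are hypothetical rather than drawn from the study. A minimal sketch:

```python
import statistics

def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)
    """
    k = len(rows[0])
    item_vars = [statistics.variance([r[i] for r in rows]) for i in range(k)]
    total_var = statistics.variance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical toy data: 5 respondents x 3 items (not from the study).
toy = [[1, 2, 1], [2, 2, 3], [3, 4, 3], [4, 4, 5], [5, 5, 4]]
alpha = cronbach_alpha(toy)
print(f"alpha = {alpha:.2f}")
```

Alpha rises when items covary strongly relative to their individual variances, which is why adding well-correlated items to a scale tends to reduce the share of random error in the total score.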

Page 11: MEASUREMENT ISSUES

QUESTION WORDING-SELF MONITORING; TERKILDSEN (1993)
C. Self-monitoring scale (abridged version): Respondents were asked to indicate if "each statement is true or false as it applies to you." Scale reliability was .74.

F 1. I can only argue for ideas which I already believe.

T 2. When I am uncertain how to act in social situations I look to the behavior of others.

T 3. I laugh more when I watch a comedy with others than when alone.

F 4. I would not change or modify my opinions in order to please someone else or win favor.

T 5. I am not always the person I appear to be.

F 6. My behavior is usually an expression of my true attitudes and beliefs.

F 7. I am not particularly good at making other people like me.

T 8. I can look anyone in the eye and tell a lie.

Scoring indicates responses for high self-monitors. Respondents received a 1 when they agreed with a high self-monitor's response and a 0 when they disagreed.
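The scoring rule described above can be sketched as follows. The key reproduces the T/F letters printed beside items 1 to 8; the function name and the option of dividing by the number of items to obtain a 0-1 score are assumptions for illustration.

```python
# Keyed responses for high self-monitors, items 1-8, as printed on the slide.
HIGH_SM_KEY = ["F", "T", "T", "F", "T", "F", "F", "T"]

def self_monitoring_score(responses, proportion=False):
    """Score 1 for each answer that matches the high self-monitor key, else 0.

    responses: list of eight "T"/"F" answers in item order.
    proportion=True rescales the count to 0-1 (an assumption about how a
    0-1 coding could be obtained, not a documented step).
    """
    if len(responses) != len(HIGH_SM_KEY):
        raise ValueError("expected one response per item")
    count = sum(1 for given, keyed in zip(responses, HIGH_SM_KEY) if given == keyed)
    return count / len(HIGH_SM_KEY) if proportion else count

example = ["F", "T", "F", "F", "T", "T", "F", "T"]  # hypothetical respondent
print(self_monitoring_score(example))                 # raw count of keyed answers
print(self_monitoring_score(example, proportion=True))
```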

Page 12: MEASUREMENT ISSUES

QUESTION WORDING-RACIAL PREJUDICE; TERKILDSEN (1993)
D. Racial Prejudice (adapted from the General Social Survey): "Please rate black Americans on each scale provided using any number between 1 and 5." A "don't know" option was furnished. Scale reliability was .85. The endpoints of the six scales were labeled as follows:
1. Rich-Poor
2. Intelligent-Unintelligent
3. Hard-working-Lazy
4. Prone to Violence-Not Prone to Violence
5. Prefer to be self-supporting-Prefer to live off welfare
6. Patriotic-Unpatriotic
Item four is reverse coded.
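Reverse coding on a 1-5 scale maps a rating x to 6 - x, so that all six items point in the same direction. The sketch below is illustrative only: the function names are hypothetical, and both the handling of "don't know" answers (skipped) and the rescaling to the -.5 to +.5 range are assumptions, since the slides do not describe either step.

```python
REVERSED_ITEMS = {4}  # item four is reverse coded, per the slide

def prejudice_score(ratings):
    """Mean of the answered items after reverse coding.

    ratings: dict mapping item number (1-6) to a 1-5 rating, or None for
    "don't know". Returns None if no item was answered. Skipping missing
    answers is an assumption, not a documented rule.
    """
    coded = []
    for item, value in ratings.items():
        if value is None:
            continue
        coded.append(6 - value if item in REVERSED_ITEMS else value)
    return sum(coded) / len(coded) if coded else None

def rescale(mean_score):
    """Map a 1-5 mean onto -.5 to +.5 (one possible rescaling, assumed)."""
    return (mean_score - 3) / 4

respondent = {1: 3, 2: 3, 3: 3, 4: 1, 5: 3, 6: None}  # hypothetical answers
score = prejudice_score(respondent)
print(score, rescale(score))
```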

Page 13: MEASUREMENT ISSUES

RACE AND SKIN-TONE OF POLITICAL CANDIDATES, VOTE FOR GOVERNOR ON 1-4 SCALE; TERKILDSEN, 1993

                      Light-skinned       Dark-skinned        White
                      black candidate     black candidate     candidate
Prejudice             -1.5 (1.2) *        -4.10 (1.5) ***     1.2 (1.0)
Self-monitor (SM)     -.74 (.41) **       -0.25 (.46)         -.29 (.31)
Prejudice X SM        .69 (2.4)           9.3 (3.5) ***       -2.9 (2.4)
# of cases            100                 109                 109

*** p<.01, one-tailed test; ** p<.05, one-tailed test; * p<.10, one-tailed test
Prejudice is coded from -.5 to +.5; SM is coded from 0-1.
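One way to read the Prejudice X SM term is to compute the slope of prejudice at different levels of self-monitoring. The sketch below assumes the dark-skinned black candidate column reports a linear model with a product term, which the slide does not state explicitly; the function name is hypothetical.

```python
# Coefficients from the dark-skinned black candidate column of the table.
B_PREJUDICE = -4.10
B_INTERACTION = 9.3

def prejudice_slope(self_monitoring, b_prejudice=B_PREJUDICE, b_interaction=B_INTERACTION):
    """Conditional slope of prejudice at a given self-monitoring value (0-1):
    b_prejudice + b_interaction * SM."""
    return b_prejudice + b_interaction * self_monitoring

low_sm = prejudice_slope(0.0)   # lowest self-monitors
high_sm = prejudice_slope(1.0)  # highest self-monitors
print(f"slope at SM=0: {low_sm:.2f}; slope at SM=1: {high_sm:.2f}")
```

Under that assumption, the negative association between prejudice and support for the dark-skinned candidate is concentrated among low self-monitors and disappears or reverses among high self-monitors, consistent with the idea that high self-monitors adjust their reported vote to social norms.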

Page 14: MEASUREMENT ISSUES

5. MEASUREMENT OF THE TREATMENT EFFECT
• Emotional ads study (Weber): The Campaign Ads Study (2007) examined the emotional impact of experimentally altered campaign ads on political attitudes and participation.
• 4 ads were designed to manipulate anger, anxiety, sadness, and enthusiasm
• Respondents completed a battery of emotion questions (3 questions per emotion) after the treatment
• In this study, the ads have heterogeneous effects and do not alter emotions cleanly. This raises questions about how to assess the effects of the treatment. At a minimum, the treatment needs to be measured well.

Page 15: MEASUREMENT ISSUES

MANIPULATION CHECKS-EMOTIONAL ADS; TOP PANEL-SMIS ADULT SAMPLE, BOTTOM PANEL-STUDENTS (WEBER)
[Figure: two bar charts. Panel titles: Reported Anger (vertical axis 0 to 0.6) and Reported Anxiety (vertical axis 0 to 0.5). Horizontal axis in both panels: ad condition (Enthusiasm, Sadness, Anxiety, Anger).]

Page 16: MEASUREMENT ISSUES

II. USING EXPERIMENTS TO VALIDATE KEY VARIABLES: Racial Resentment (Feldman and Huddy 2005)
Controversy over the measurement and conception of racial prejudice in political research
A. Overt Prejudice: belief that blacks are inherently inferior to whites.
B. New Racism: resentment at the special treatment of blacks.
• symbolic racism (Kinder and Sears)
• modern racism (McConahay)
• racial resentment (Kinder and Sanders)

Page 17: MEASUREMENT ISSUES

NEW RACISM IS CONTROVERSIAL
• It is an excellent predictor of white racial policy attitudes
• But some argue that the items may be too close to the racial policies they are supposed to predict (e.g., Schuman 2000; Sniderman and Tetlock 1986)
• Its conceptualization makes it difficult to distinguish resentment from individualism (Sniderman et al. 2000).

Racial Resentment Items
(1) “Irish, Italians, Jewish and many other minorities overcame prejudice and worked their way up. Blacks should do the same without any special favors.”
(2) “Over the past few years blacks have gotten less than they deserve.”
(3) “It's really a matter of some people not trying hard enough; if blacks would only try harder they could be just as well off as whites.”
(4) “Generations of slavery and discrimination have created conditions that make it difficult for blacks to work their way out of the lower class.”

Page 18: MEASUREMENT ISSUES

DATA: NEW YORK STATE RACIAL ATTITUDES SURVEY
• RDD telephone interview of New York state residents (late 2000-2001)
• 760 white, non-Hispanic, non-Asian respondents
• Survey conducted by the Center for Survey Research at Stony Brook University

College Scholarship Experiment (similar to a program adopted by some universities to replace race-based affirmative action in college admissions). Respondents were randomly assigned to one of 8 conditions.
• “To what extent do you favor providing special college scholarships for _____ students who score in the top fifteen percent of their school class, even if their school’s grades are not in the top fifteen percent nationally?”
• The eight conditions referred to white, black, poor white, poor black, middle class white, middle class black, poor, and middle class students.
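The design can be sketched as a question template with eight fill-ins and a random draw per respondent. The condition labels and question wording come from the slide; the function name, the seed, and the use of simple random assignment (rather than, say, blocked assignment) are assumptions, since the slides do not describe the randomization procedure.

```python
import random

CONDITIONS = [
    "white", "black", "poor white", "poor black",
    "middle class white", "middle class black", "poor", "middle class",
]

QUESTION = (
    "To what extent do you favor providing special college scholarships "
    "for {group} students who score in the top fifteen percent of their "
    "school class, even if their school's grades are not in the top "
    "fifteen percent nationally?"
)

def assign_conditions(respondent_ids, seed=2000):
    """Assign each respondent to one of the eight conditions at random
    (simple random assignment; an assumption about the procedure)."""
    rng = random.Random(seed)
    return {rid: rng.choice(CONDITIONS) for rid in respondent_ids}

assignment = assign_conditions(range(760))
wording = QUESTION.format(group=assignment[0])
print(wording)
```

Because only the group label varies across conditions, any difference in support between conditions can be attributed to the race and class of the named beneficiaries.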

Page 19: MEASUREMENT ISSUES
Page 20: MEASUREMENT ISSUES

PREDICTIONS CONCERNING RACIAL RESENTMENT: IDEOLOGY OR PREJUDICE?
• Racial resentment as prejudice: should only predict opposition to policies targeted at black students
• Racial resentment as ideology: should promote opposition to programs for all students regardless of race
• Or does the meaning of racial resentment vary with left-right (liberal-conservative) ideology?
  • Racial for liberals (only affects their opposition to programs for black students)
  • Ideological for conservatives (predicts opposition to programs for all students)

Page 21: MEASUREMENT ISSUES

Probability of Support for Scholarships by Racial Resentment: POLITICAL LIBERALS
[Figure: line plot. Horizontal axis: Racial Resentment (0 to 1). Vertical axis: Probability of Support (0 to 1). Lines: Poor White, Poor Black, Middle Class White, Middle Class Black.]

Page 22: MEASUREMENT ISSUES

Probability of Support for Scholarships by Racial Resentment: Race-by-Class Conditions, POLITICAL CONSERVATIVES
[Figure: line plot. Horizontal axis: Racial Resentment (0 to 1). Vertical axis: Probability of Support (0 to 1). Lines: Poor White, Poor Black, Middle Class White, Middle Class Black.]

Page 23: MEASUREMENT ISSUES

III. CROSS-NATIONAL SURVEYS: METHODS TO DEVELOP COMPARABLE QUESTIONS
1. Sequential: Developed in one context and then exported to another; the survey is simply translated without adaptation for the other context
• Example: Eurobarometer; questions are usually developed in French and English first and then translated for other countries
Does not allow for pre-testing in other languages. Other countries are stuck with what has been developed initially.
• Example of problems: the ISSP could not ask Japanese respondents whether their earnings were “just” or “fair” because this is inappropriate in the Japanese context.
Harkness: all items should be carefully exported. It may be easier to discard “bad” items in long psychological batteries because there are others. It may be more difficult in social science questionnaires in which a concept is measured by only one or two items.

Page 24: MEASUREMENT ISSUES

2. Parallel Development: Combine expertise from many countries and develop the survey in a single language
• e.g., the ESS, which is written and developed in English first; the ISSP is also developed by a multicultural group, and everyone votes on the final questionnaire
The survey is then subject to multicultural testing before it is finalized.
Advance translation: some questions are translated before the questionnaire is completed in order to identify problems. Such translations do not have to be perfect but are designed to bring up obvious problems.
Overall, this approach is better than the sequential approach but is time-consuming and involves complex coordination.

Page 25: MEASUREMENT ISSUES

3. SIMULTANEOUS
Decentering: a draft questionnaire is produced in one language and the final version is produced in two. In the decentering phase, specific cultural references are also removed.
Typically applied when studying only 2 cultures; ensures that questions are truly comparable. This technique has been used on existing instruments, but it may create very bland items.
An alternative is to have some core common questions and some country-specific questions, but then the country-specific questions are difficult to compare.

Page 26: MEASUREMENT ISSUES

REFERENCES
Alwin, Duane F. 2007. Margins of Error: A Study of Reliability in Survey Measurement.
Groves, Robert M. 1989. Survey Errors and Survey Costs. New York: Wiley.
Weber, Lavine, Federico
Lavine
Tourangeau, Roger, Lance Rips, and Kenneth Rasinski. 2000. The Psychology of Survey Response. New York: Cambridge University Press.
Snyder, Mark, and Steven W. Gangestad. 1986. “On the Nature of Self-Monitoring: Matters of Assessment, Matters of Validity.” Journal of Personality and Social Psychology, 51: 125-139.
Feldman, Stanley, and Leonie Huddy. 2005. “Racial Resentment and White Opposition to Race-Conscious Programs: Principles or Prejudice?” American Journal of Political Science, 49 (1): 168-183.
Huddy, Leonie, and Anna Gunthorsdottir. 2000. “The Persuasive Effects of Emotive Visual Imagery: Superficial Manipulation or A Deepening of Conviction?” Political Psychology, 21: 745-778.
Harkness, Janet. 2003. “Questionnaire Translation.” In Janet Harkness, Fons J. R. Van De Vijver, and Peter de Mohler, Cross-Cultural Survey Methods. Hoboken, NJ: John Wiley and Sons. pp. 35-56.
Huddy, Mason & Aaroe
Schuman

Page 27: MEASUREMENT ISSUES

Sniderman & Tetlock