CARMA Internet Research Module: Survey Reduction

Survey Reduction Techniques
CARMA Internet Research Module
Jeffrey Stanton


CARMA Summer Workshop Module

Transcript of CARMA Internet Research Module: Survey Reduction

Page 1

Survey Reduction Techniques

CARMA Internet Research Module
Jeffrey Stanton

Page 2

Primary Goal: Reduce Administration Time

Secondary goals:
– Reduce perceived administration time
– Increase the engagement of the respondent with the experience of completing the instrument; lock in interest and excitement from the start
– Reduce the extent of missing and erroneous data due to carelessness, rushing, test forms that are hard to use, etc.
– Increase the respondents' ease of experience (maybe even enjoyment!) so that they will persist to the end AND respond again next year (or whenever the next survey comes out)

Conclusions?
– Make the survey SEEM as short and compact as possible
– Streamline the WHOLE EXPERIENCE from the first call for participation all the way to the end of the final page of the instrument
– Focus test-reduction efforts on the easy stuff before diving into the nitty-gritty statistical stuff


Page 3


Example item:

Please choose the option that most closely fits how you describe yourself. Please select only one of the two options:

Female [ ]    Male [ ]

Page 4

Instruction Reduction

Fewer than 4% of respondents make use of printed instructions: Novick and Ward (2006, ACM SIGDOC)

Comprehension of instructions only influences novice performance on surveys: Catrambone (1990, HCI)

Instructions on average are written five grade levels above the average grade level of respondents; 23% of respondents failed to understand at least one element of the instructions: Spandorfer et al. (1995, Annals of EM)

Unless you are working with a special/unusual population, you can assume that respondents know how to complete Likert scales and other common response formats without instructions

Most people don’t read instructions anyway. When they do, the instructions often don’t help them respond any better!

If your response format is so novel that people require instructions, then you bear a substantial burden to pilot test in order to ensure that people comprehend the instructions and respond appropriately. Otherwise, do not take the risk!


Page 5

Archival Demographics

Most survey projects seek to subdivide the population into meaningful groups:
– gender, race/ethnicity, age
– managers and non-managers
– exempt and non-exempt
– part time and full time
– unit and departmental affiliations

Demographic data often comprise one page, 5-15 questions, and 1-3 minutes of administration time per respondent

Self-completed demographic data frequently contain missing fields or intentional mistakes


Page 6

Archival Demographics

For the sake of anonymity, these data can be de-identified up front and attached to a randomly generated (alphanumeric) code; in other words, have the demographic form contain a code, and match that code to the survey.

Respondents should feel like demographics are not serving to identify them in their survey responses.

You could offer respondents two choices:
1) match (or automatically fill in) some/all demographic data using the code number provided in your invitation email (or on a paper letter);
2) fill in the demographic data themselves (on web-based surveys, a reveal can branch respondents to the demographics page). A minimal sketch of the code-matching idea follows below.
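The module does not prescribe an implementation, but a minimal Python sketch of generating the random alphanumeric codes might look like this; the alphabet choice, code length, and function name are assumptions for illustration, not part of the module.

import secrets
import string

# Alphabet for the random alphanumeric codes; dropping visually
# ambiguous characters (0/O, 1/I) helps codes survive paper letters.
ALPHABET = "".join(c for c in string.ascii_uppercase + string.digits
                   if c not in "01IO")

def generate_codes(n_respondents, length=8):
    """Generate n_respondents unique random alphanumeric codes."""
    codes = set()
    while len(codes) < n_respondents:
        codes.add("".join(secrets.choice(ALPHABET) for _ in range(length)))
    return sorted(codes)

# One table links code -> demographics (held separately by the
# researcher); survey responses carry only the code, never a name.
codes = generate_codes(500)

Using the secrets module rather than random is a deliberate choice here: a guessable code could let someone link responses back to identities, defeating the anonymity goal.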


Page 7

Eligibility, Skip Logic, and Branching

Eligibility: If a survey has eligibility requirements, the screening questions should be placed at the earliest possible point in the survey. (Eligibility requirements can appear in instructions, but this should not be the sole method of screening out ineligible respondents.)

Skip Logic: Skip logic actually shortens the survey by setting aside questions for which the respondent is ineligible.

Branching: Branching may not shorten the survey, but it can improve the user experience by offering questions specifically focused on the respondent's demographic or reported experience. (See the sketch below.)

Illustration credit: Vovici.com
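To make the three mechanisms concrete, here is a minimal Python sketch; the questions and field names are hypothetical, not from the module.

def ask(prompt):
    return input(prompt + " ").strip().lower()

def administer():
    answers = {}

    # Eligibility screen: placed first, so ineligible respondents
    # exit immediately instead of discovering it mid-survey.
    answers["employed"] = ask("Are you currently employed? (y/n)")
    if answers["employed"] != "y":
        return None  # screened out at the earliest possible point

    # Skip logic: questions the respondent is ineligible for are
    # set aside entirely, shortening the survey for that person.
    answers["manages"] = ask("Do you supervise other employees? (y/n)")
    if answers["manages"] == "y":
        answers["span"] = ask("How many direct reports do you have?")

    # Branching: does not shorten the survey, but tailors the wording
    # to the respondent's reported experience.
    if answers["manages"] == "y":
        answers["sat"] = ask("How satisfied are you with managing your team?")
    else:
        answers["sat"] = ask("How satisfied are you with your day-to-day work?")
    return answers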

Page 8

Implications: Eligibility, Skip Logic, and Branching

Ever answer a survey where you knew that your answer would predict how many questions you would have to answer after that? For example: "How many hotel chains have you been to in the last year?"

If users can predict that their eligibility, the survey skip logic, or survey branching will lead to longer, more complex, or more difficult or tedious responses, they may:
a) Abandon the survey
b) Back up and change their answer to the conditional question so that it leads to less work (if the interface permits it)

Branch design should try not to imply what the user would have experienced in another branch. Paths through the survey should avoid causing considerably more work for some respondents than for others.

Page 9

Panel Designs and/or Multiple Administration

Panel designs measure the same respondents on multiple occasions. Typically, either a) predictors are gathered at an early point in time and outcomes are gathered at a later point in time, or b) both predictors and outcomes are measured at every time point (there are variations on these two themes).

Panel designs are based on maturation and/or intervention processes that require the passage of time. Examples: career aspirations over time, person-organization fit over time, training before/after.

Minimally, panel designs can help mitigate (though not solve) the problem of common method bias; e.g., by the time respondents answer a criterion measure at time 2, they tend to have forgotten how they responded at time 1.


Page 10

Panel Designs and/or Multiple Administration

Survey designers can apply the logic of panel designs to their own surveys: sometimes you have to collect a large number of variables (no measure shortening is possible), and it is impractical to do so in a single administration.

Generally speaking, it is better to have many short, pleasant survey administrations with a cumulative "work time lost" of an hour than one long, grinding hour-long survey. The former can get you happier and less fatigued respondents and, hopefully, better data.

In the limit, consider the implications of a "Today's Poll" approach to measuring climate, stress, satisfaction, or other attitudinal variables: one question per day, every day….


Page 11

Unobtrusive Behavioral Observation

Surveys appear convenient and relatively inexpensive in and of themselves; however, the cumulative work time lost across all respondents may be quite large. Methods that assess social variables through observations of overt behavior rather than self-report can provide indications of stress, satisfaction, organizational citizenship, intent to quit, and other psychologically and organizationally relevant variables.

Examples:
• Cigarette breaks over time (frequency, number of incumbents per day)
• Garbage (weight of trash before/after a recycling program)
• Social media usage (tweets, blog posts, Facebook)
• Wear of floor tiles
• Absenteeism or tardiness records
• Incumbent, team, and department production quality and quantity measures

Page 12

Unobtrusive Behavioral Observation

Most unobtrusive observations must be conducted over time:
– Establish a baseline for the behavior.
– Examine subsequent time periods for changes/trends over time. (See the sketch below.)

Generally, this is much more labor-intensive data collection than surveys. Results should be cross-validated with other types of evidence.
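A minimal sketch of the baseline-then-compare logic in Python; the weekly counts and the 2-standard-deviation flag rule are assumptions for illustration.

from statistics import mean, stdev

# Hypothetical weekly counts of an observed behavior (e.g., cigarette
# breaks per week), six baseline weeks then six follow-up weeks.
weekly_counts = [41, 38, 44, 40, 39, 42,   # baseline period
                 33, 30, 28, 31, 27, 25]   # post-intervention period
baseline, followup = weekly_counts[:6], weekly_counts[6:]

# Establish the baseline level and its variability.
base_mean, base_sd = mean(baseline), stdev(baseline)

# Flag follow-up weeks that fall far outside the baseline range.
for week, count in enumerate(followup, start=len(baseline) + 1):
    z = (count - base_mean) / base_sd
    flag = "  <-- shift" if abs(z) > 2 else ""
    print(f"week {week}: {count} (z = {z:+.1f}){flag}")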


Page 13

Scale Reduction and One-item Measures

Standard scale construction calls for "sampling the construct domain" with items that tap into different aspects of the construct and refer to various content areas. Scales with more items can include a larger sample of the behaviors or topics relevant to the construct.


[Diagram: overlap of Item Content with the Construct Domain]
– RELEVANT: measuring what you want to measure
– CONTAMINATED: measuring what you don't want to measure
– DEFICIENT: not measuring what you want to measure

Page 14

Scale Reduction and One-item Measures

When fewer items are used, by necessity they must be either:
– more general in wording to obtain full coverage (hopefully), or
– more narrow, focusing on a subset of behaviors/topics

Internal consistency reliability reinforces this trade-off: As the number of items gets smaller, inter-item correlation must rise to maintain a given level of internal consistency.
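The trade-off can be quantified with the standardized form of coefficient alpha; solving for the mean inter-item correlation shows how much the correlations must climb as items are dropped (a sketch in LaTeX notation; the worked numbers assume a target alpha of .80):

\alpha = \frac{k\,\bar{r}}{1 + (k - 1)\,\bar{r}}
\qquad\Longrightarrow\qquad
\bar{r} = \frac{\alpha}{k - \alpha\,(k - 1)}

For \alpha = .80, a k = 10 item scale needs \bar{r} \approx .29, while a k = 3 item scale needs \bar{r} \approx .57.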

However, scales with fewer than 3-5 items rarely achieve acceptable internal consistency without simply becoming alternative wordings of the same questions.

Discussion: How many of you have taken a measure where you were being asked the same question again and again? Your reactions? Why was this done?

The one-item solution: A one-item measure usually "covers" a construct only if it is highly non-specific. A one-item measure has a measurable reliability (see Wanous & Hudy, ORM, 2001), but the concept of internal consistency is meaningless for it.

Discuss: A one-item knowledge measure vs. a one-item job satisfaction measure.


Page 15

One-item Measure Literature

Nagy (2002): used single-item measures of each of the five JDI job satisfaction facets and found correlations between .60 and .72 with the full-length versions of the JDI scales

Patrician (2004): reviewed single-item graphical representation scales, the so-called "faces" scales

Shamir & Kark (2004): proposed a single-item graphic scale for organizational identification

Oshagbemi (1999): found that single-item job satisfaction scales systematically overestimate workers' job satisfaction

Loo (2002): single-item measures work best on "homogeneous" constructs


Page 16

Scale Reduction: Technical Considerations

Items can be struck from a scale based on three different sets of qualities:
1. Internal item qualities refer to properties of items that can be assessed in reference to other items on the scale or the scale's summated scores.
2. External item qualities refer to connections between the scale (or its individual items) and other constructs or indicators.
3. Judgmental item qualities refer to issues that require subjective judgment and/or are difficult to assess in isolation from the context in which the scale is administered.

Literature review suggests that the most widely used method for item selection in scale reduction is some form of internal consistency maximization

Corrected item-total correlations provide diagnostic information about internal consistency. In scale reduction efforts, item-total correlations have been employed as a basis for retaining items for a shortened scale version
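A minimal Python sketch of these internal-consistency diagnostics, assuming NumPy is available and using a hypothetical item-response matrix:

import numpy as np

def cronbach_alpha(items):
    """Coefficient alpha for an (n_respondents x k_items) array."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def corrected_item_totals(items):
    """Correlate each item with the sum of the OTHER items."""
    total = items.sum(axis=1)
    return np.array([
        np.corrcoef(items[:, j], total - items[:, j])[0, 1]
        for j in range(items.shape[1])
    ])

# Hypothetical 1-5 Likert responses: 200 respondents x 8 items.
rng = np.random.default_rng(1)
latent = rng.normal(size=(200, 1))
items = np.clip(np.round(3 + latent + rng.normal(size=(200, 8))), 1, 5)

print("alpha:", round(cronbach_alpha(items), 2))
print("corrected item-totals:", corrected_item_totals(items).round(2))

In a reduction effort, items with the lowest corrected item-total correlations would be the usual candidates for removal, subject to the redundancy caveats discussed on the next pages.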

Factor analysis is another technique that, when used for scale reduction, can lead to increased internal consistency, assuming one chooses items that load strongly on a dominant factor


Page 17

Scale Reduction II

Despite their prevalence, there are important limitations of scale reduction techniques that maximize internal consistency. Choosing items to maximize internal consistency leads to item sets highly redundant in appearance, narrow in content, and potentially low in validity.

High internal consistency often signifies a failure to adequately sample content from all parts of the construct domain

To obtain high values of coefficient alpha, a scale developer need only write a set of items that paraphrase each other or are antonyms of one another. One can expect an equivalent result (i.e., high redundancy) from using the analogous approach in scale reduction, that is, excluding all items but those highly similar in content.


Page 18

Scale Reduction III

• IRT provides an alternative strategy for scale reduction that does not focus on maximizing internal consistency. (The model behind the a and b parameters is sketched below.)
– One should retain items that are highly discriminating (i.e., moderate to large values of a) and attempt to include items with a range of item thresholds (i.e., b) that adequately cover the expected range of the trait in measured individuals.
– IRT analysis for scale reduction can be complex and does not provide a definitive answer to the question of which items to retain; rather, it provides evidence for which items might work well together to cover the trait range.
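For reference, the a and b parameters referred to above correspond to the two-parameter logistic (2PL) IRT model (a sketch in LaTeX notation):

P_j(\theta) = \frac{1}{1 + e^{-a_j(\theta - b_j)}}

Here a_j is the item's discrimination (the slope of the response curve at its threshold) and b_j is the item's threshold (the trait level \theta at which the probability of endorsement is .5); retaining items whose b_j values span the expected range of \theta is what covers the trait range.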

• Relating items to external criteria provides a viable alternative to internal consistency and other internal qualities.
– Because correlations vary across different samples, instruments, and administration contexts, an item that predicts an external criterion best in one sample may not do so in another.
– Choosing items to maximize a relation with an external criterion runs the risk of decreasing discriminant validity between the measures of the two constructs.


Page 19

Scale Reduction IV

The overarching goal of any scale reduction project should be to closely replicate the pattern of relations established within the construct's nomological network. In evaluating any given item's relations with external criteria, one should seek moderate correlations with a variety of related scales (i.e., convergent validity) and low correlations with a variety of unrelated measures. (See the sketch below.)
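A minimal Python sketch of this convergent/discriminant screening; the data are hypothetical and the .30 and .10 cutoffs are assumptions for illustration, not values from the module.

import numpy as np

rng = np.random.default_rng(3)
n = 250
trait = rng.normal(size=n)

# Hypothetical data: 6 candidate items, one related (convergent)
# criterion, and one unrelated (discriminant) criterion.
items = trait[:, None] + rng.normal(size=(n, 6))
related = trait + rng.normal(size=n)   # should correlate moderately
unrelated = rng.normal(size=n)         # should correlate near zero

for j in range(items.shape[1]):
    r_conv = np.corrcoef(items[:, j], related)[0, 1]
    r_disc = np.corrcoef(items[:, j], unrelated)[0, 1]
    verdict = "keep" if (abs(r_conv) >= .30 and abs(r_disc) <= .10) else "review"
    print(f"item {j}: convergent r = {r_conv:+.2f}, "
          f"discriminant r = {r_disc:+.2f} -> {verdict}")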

Researchers may also need to examine other criteria beyond statistical relations to determine which items should remain in an abbreviated scale.

Such judgmental qualities include clarity of expression, relevance to a particular respondent population, the semantic redundancy of an item's content with other items, the perceived invasiveness of an item, and an item's "face" validity. Items lacking apparent relevance, or that are highly redundant with other items on the scale, may be viewed negatively by respondents.

To the extent that judgmental qualities can be used to select items with face validity, both the reactions of constituencies and the motivation of respondents may be enhanced.

A simple strategy for retention that does not require IRT analysis: stepwise regression (see the sketch below).
– Rank-ordered item inclusion produces an "optimal" reduced-length scale that accounts for a nearly maximal proportion of variance in its own full-length summated scale score.
– Order of entry into the stepwise regression is a rank-order proxy indicating item goodness.
– Empirical results show that this method performs as well as a brute-force combinatorial scan of item combinations; the method can also be combined with human judgment to pick items from among the top-ranked items (but not in strict ranking order).
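A minimal Python sketch of that stepwise idea, implemented as plain NumPy forward selection (no stepwise-regression package assumed); the data and function name are hypothetical.

import numpy as np

def forward_item_ranking(items):
    """Rank items by order of entry into a forward-stepwise regression
    predicting the full-length summated scale score."""
    n, k = items.shape
    full_score = items.sum(axis=1)
    chosen, remaining = [], list(range(k))
    while remaining:
        best_j, best_r2 = None, -1.0
        for j in remaining:
            # Fit intercept + already-chosen items + candidate item j.
            X = np.column_stack([np.ones(n)] +
                                [items[:, c] for c in chosen + [j]])
            beta, *_ = np.linalg.lstsq(X, full_score, rcond=None)
            resid = full_score - X @ beta
            r2 = 1 - resid.var() / full_score.var()
            if r2 > best_r2:
                best_j, best_r2 = j, r2
        chosen.append(best_j)
        remaining.remove(best_j)
        print(f"step {len(chosen)}: add item {best_j}, "
              f"cumulative R^2 = {best_r2:.3f}")
    return chosen  # earlier entry = better item "goodness" rank

rng = np.random.default_rng(7)
items = rng.normal(size=(300, 10)) + rng.normal(size=(300, 1))
ranking = forward_item_ranking(items)

Early-entering items are the strongest candidates for the short form; per the slide, human judgment can then swap among the top-ranked items.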


Page 20

Trade-offs with Reduced Surveys

The shorter the survey…
– the higher the response rate
– the less work time that is lost?
– the higher the chance that one or more constructs will perform poorly if the measures are not well established/developed?
– the less information that may be obtained about each respondent and their score on a given construct?
– the more you have to sell its meaningfulness to decision makers who will act on the results


Page 21

Bibliography

Binning, J. F., & Barrett, G. V. (1989). Validity of personnel decisions: A conceptual analysis of the inferential and evidential bases. Journal of Applied Psychology, 74, 478-494.

Catrambone, R. (1990). Specific versus general procedures in instructions. Human-Computer Interaction, 5, 49-93.

Dillman, D. A., Smyth, J. D., & Christian, L. M. (2008). Internet, mail, and mixed-mode surveys: The tailored design method. Hoboken, NJ: Wiley.

Donnellan, M. B., Oswald, F. L., Baird, B. M., & Lucas, R. E. (2006). The Mini-IPIP scales: Tiny-yet-effective measures of the Big Five factors of personality. Psychological Assessment, 18, 192-203.

Emons, W. H. M., Sijtsma, K., & Meijer, R. R. (2007). On the consistency of classification using short scales. Psychological Methods, 12, 105-120.

Girard, T. A., & Christiansen, B. K. (2008). Clarifying problems and offering solutions for correlated error when assessing the validity of selected-subtest short forms. Psychological Assessment, 20, 76-8.

Hinkin, T. R. (1995). A review of scale development practices in the study of organizations. Journal of Management, 21, 967-988.

Levy, P. (1968). Short-form tests: A methodological review. Psychological Bulletin, 69, 410-416.

Loo, R. (2002). A caveat on using single-item versus multiple-item scales. Journal of Managerial Psychology, 17, 68-75.

Lord, F. M. (1965). A strong true-score theory, with applications. Psychometrika, 30, 239-270.

Nagy, M. S. (2002). Using a single item approach to measure facet job satisfaction. Journal of Occupational and Organizational Psychology, 75, 77-86.

Novick, D. G., & Ward, K. (2006). Why don't people read the manual? Paper presented at SIGDOC '06: Proceedings of the 24th Annual ACM International Conference on Design of Communication.

Oshagbemi, T. (1999). Overall job satisfaction: How good are single versus multiple-item measures? Journal of Managerial Psychology, 14, 388-403.

Patrician, P. A. (2004). Single-item graphic representational scales. Nursing Research, 53, 347-352.

Shamir, B., & Kark, R. (2004). A single item graphic scale for the measurement of organizational identification. Journal of Occupational and Organizational Psychology, 77, 115-123.


Page 22

Bibliography (Continued)

Smith, G. T., McCarthy, D. M., & Anderson, K. G. (2000). On the sins of short-form development. Psychological Assessment, 12, 102-111.

Spandorfer, J. M., Karras, D. J., Hughes, L. A., & Caputo, C. (1995). Comprehension of discharge instructions by patients in an urban emergency department. Annals of Emergency Medicine, 25, 71-74.

Stanton, J. M., Sinar, E. F., Balzer, W. K., & Smith, P. C. (2002). Issues and strategies for reducing the length of self-report scales. Personnel Psychology, 55, 167-194.

Wanous, J. P., & Hudy, M. J. (2001). Single-item reliability: A replication and extension. Organizational Research Methods, 4, 361-375.

Widaman, K. F., Little, T. D., Preacher, K. J., & Sawalani, G. M. (2011). On creating and using short forms of scales in secondary research. In K. H. Trzesniewski, M. B. Donnellan, & R. E. Lucas (Eds.), Secondary data analysis: An introduction for psychologists (pp. 39-61). Washington, DC: American Psychological Association.
