4/24/2010
Guest Lectures to Peking University (Beijing, P.R. China)
Main lecture: Measurement Equivalence (ME) of Paper-and-pencil and Online Organizational Surveys
… and after the break…
Additional lecture: Using Ad Hoc Measures For Response Styles. A Cautionary Note.
Prof. Dr. Alain De Beuckelaer
April 28th 2010
INTRODUCING COMPARATIVE (CROSS-CULTURAL) SURVEY METHODOLOGY
Typical Research Topics
Guiding principle throughout all research stages: establishing (cross-cultural/group) equivalence, that is, minimizing (comparative) bias
Research topics:
• Sample design
• Design of questionnaires/survey instruments
• Translation and adaptation of survey instruments
• Pretesting (translated) surveys
• Interviewer recruitment, selection and training
• Monitoring interviewer quality (e.g., using paradata)
• Harmonization of data collection
• Harmonization of survey and statistical data (after data collection)
• Statistical adjustment for sample design
• Psychometric quality assessment of (multi-item) scales used, e.g., checking measurement equivalence of scales across populations
• Quantifying and correcting for biasing effects (e.g., response styles)
• Analysis of survey and statistical data
Selection of My Publications
On measurement invariance of scales across nations:
De Beuckelaer, A., Lievens, F., & Swinnen, G. (2007). Measurement equivalence in the
conduct of a global organizational survey across six cultural regions.
Journal of Occupational and Organizational Psychology, 80, 575-600.
On measurement invariance of scales across modes of data collection:
De Beuckelaer, A., & Lievens, F. (2009). Measurement equivalence of paper-and-pencil
and online organizational surveys: A large scale examination in 16 countries.
Applied Psychology: An International Review, 58, 336-361.
On the biasing effect of scale items not exhibiting measurement invariance
across (cultural) groups:
De Beuckelaer, A., & Swinnen, G. (in press). Biased latent variable mean comparisons due to
measurement non-invariance: A simulation study. Forthcoming in E. Davidov, P. Schmidt,
& J. Billiet (Eds.), Methods and applications in cross-cultural analysis. Taylor & Francis.
On the quantification and correction for response styles
(one important source of cross-cultural bias):
De Beuckelaer, A., Weijters, B., & Rutten, A. (in press). Using ad hoc measures for response styles.
A cautionary note. Forthcoming in Quality and Quantity.
Main Lecture: Measurement Equivalence (ME) of Paper-and-pencil and Online Organizational Surveys
Overview
1. Introduction
2. Method
3. Results
4. Reflection
5. Questions and discussion
1. Introduction
‘Mixed-mode organizational surveys’
Online (OL) and paper-and-pencil (PP) surveying combined in organizational surveys
OL surveys
Advantages: less costly, faster responses, greater flexibility in survey design, wider geographical reach, no [human] coding errors, less sensitive to question-order effects, and more complete in terms of information provided (various references to journal papers)
Disadvantages: higher non-response rates, higher probability of dishonest answers, potential technological problems, decreased item reliability (higher measurement error), possibility of multiple submissions, no full coverage of all occupational groups represented within the organization (various references to journal papers)
Organizational surveys partly rely on OL surveying because of:
(1) Increased efficiency of the data collection process
(2) Elimination of human coding errors
(3) Cost reductions
Research question
Is mixing OL and PP surveys acceptable from a ‘methodological point of view’?
In other words, is measurement equivalence (ME) between both modes of
data collection ensured?
Job level and mode of data collection (across all countries)
Job level      % Online   % Paper-and-pencil
Lowest         26.7%      73.3%
Intermediate   76.4%      23.6%
Highest        66.4%      33.6%
Danger of sample bias (e.g., higher-level managers prefer to answer online, whereas lower-level managers may not be able to use computers at work)
Direct implication for analyses: ME assessment before and after controlling for job-level differences
2. Method
Sample
N=52,461 managers; k=16 countries; in alphabetical order:
Australia, Brazil, P.R. China, Czech Republic, France, Germany, Netherlands, Nigeria, Pakistan, Puerto Rico, Russian Federation, Spain, Sweden, UK, US, Vietnam
Overall response rate: 86% (across countries)
Measurement Instrument
Five factors: F1: team commitment (3 items); F2: supervisor support (3 items); F3: goal clarity (3 items); F4: decision making (2 items); F5: environmental and societal responsibility (2 items)
Scale: 5-point Likert-type agree/disagree rating scale
Analysis method
Part A: Ordinary covariance-structure (CS) model to check the plausibility of the
hypothesized five-factor model (i.e., construct validity assessment)
Part B: Mean-and-covariance structure (MACS) analyses; three nested models, from least to most restrictive:
• Form invariance model (least restrictive)
  Type of equivalence: identical pattern of salient and non-salient factor loadings
  Implication: the meaning of the factors is ‘roughly the same’
• Metric invariance model
  Type of equivalence: factor loadings identical across OL and PP surveys
  Implication: the extent to which indicators capture changes in the underlying construct(s) is identical!
• Scalar invariance model (most restrictive)
  Type of equivalence: factor loadings and indicator intercepts identical across OL and PP surveys
  Implication: estimated (mode-specific) factor mean scores may be compared in a meaningful way!
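Because the three models are nested, adjacent models are commonly compared with a chi-square difference (likelihood-ratio) test. The slides do not say which comparison criterion was used, so the sketch below only illustrates the Δχ² logic. It assumes plain ML chi-square values (no scaled or robust correction) and uses hypothetical fit values. With samples in the thousands, Δχ² flags even trivial misfit, which is why it is often complemented by changes in approximate fit indices.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df (closed form)."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-h) * sum_{i=0}^{df/2 - 1} h^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= h / i
            total += term
        return math.exp(-h) * total
    # Odd df: erfc(sqrt(h)) + exp(-h) * sum_{i=1}^{(df-1)/2} h^(i - 1/2) / Gamma(i + 1/2)
    total = math.erfc(math.sqrt(h))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-h) * h ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def chi2_difference_test(chi2_restricted, df_restricted, chi2_free, df_free):
    """Compare a more restrictive model (e.g., scalar) with the less restrictive model it is nested in (e.g., metric)."""
    d_chi2 = chi2_restricted - chi2_free
    d_df = df_restricted - df_free
    if d_df <= 0:
        raise ValueError("the restricted model must have more degrees of freedom")
    return d_chi2, d_df, chi2_sf(d_chi2, d_df)


# Hypothetical fit values for one country, NOT taken from the study.
d_chi2, d_df, p = chi2_difference_test(120.0, 60, 110.0, 52)
print(f"delta chi2 = {d_chi2:.1f}, delta df = {d_df}, p = {p:.3f}")  # p = 0.265
```

A non-significant p (here about .27) means the added equality constraints do not significantly worsen fit, so the more restrictive model would be retained.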
3. Results
Part A: CS model to test the five-factor structure
Test results: The five-factor structure fits reasonably well in all 16 countries under study
(RMSEA slightly too high in Sweden [.062] and the US [.064];
TLI slightly too low in some countries [but no other problems!])
Implication: All 16 countries will be further examined in part B of the analysis
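The fit indices quoted above can be computed from the model chi-square. The sketch below uses the standard single-group point estimate of RMSEA and the Tucker-Lewis index. The cutoffs (.06 for RMSEA, .95 for TLI) are common rules of thumb assumed here, not values stated on the slide, and the chi-square inputs are hypothetical. Some programs use n instead of n − 1 or adjust RMSEA in multi-group models, so reported values can differ slightly.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """Point estimate of RMSEA for a single-group model: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def tli(chi2_model: float, df_model: int, chi2_null: float, df_null: int) -> float:
    """Tucker-Lewis index: compares the chi2/df ratio of the model with that of the independence (null) model."""
    ratio_null = chi2_null / df_null
    ratio_model = chi2_model / df_model
    return (ratio_null - ratio_model) / (ratio_null - 1.0)


# Hypothetical values, NOT from the study (13 items -> 78 df for the independence model).
RMSEA_CUTOFF, TLI_CUTOFF = 0.06, 0.95  # assumed rules of thumb
fit_rmsea = rmsea(chi2=170.0, df=55, n=1001)
fit_tli = tli(chi2_model=170.0, df_model=55, chi2_null=3000.0, df_null=78)
print(f"RMSEA = {fit_rmsea:.3f} (acceptable: {fit_rmsea <= RMSEA_CUTOFF})")
print(f"TLI = {fit_tli:.3f} (acceptable: {fit_tli >= TLI_CUTOFF})")
```

With these toy numbers RMSEA is fine while TLI falls just short of its cutoff, the kind of mixed picture the slide describes for some countries.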
Part B: nested MACS models to test for ME across OL and PP surveys
Test results: Scalar invariance of the survey instrument is established in all countries except P.R. China and France.
In P.R. China and France, the metric invariance model did fit.
Implication: In most countries (14 out of 16), mixing OL and PP surveys does no harm! (i.e., scalar equivalence is established)
After controlling for job level by means of a ‘matched samples approach’, the data of 13 countries* were re-analyzed. The analyses led to virtually the same overall conclusion (i.e., strong support for scalar equivalence). After controlling for job level, scalar invariance across OL and PP surveys was also established in France.
*The sample size (after matching) was too small (N<90) in 3 countries (i.e., P.R. China, Puerto Rico, and Vietnam)
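The slides do not describe how the matching was done. One simple way to remove the job-level imbalance is exact stratified matching: within each job level, randomly subsample the larger mode group down to the size of the smaller one, so OL and PP end up with the same job-level mix. The sketch below does this; `match_on_job_level` is a hypothetical helper, and treating N as the size of the whole matched sample is an assumption.

```python
import random
from collections import defaultdict


def match_on_job_level(respondents, seed=None):
    """Exact stratified matching on job level (hypothetical helper).

    respondents: iterable of (respondent_id, mode, job_level), mode in {"OL", "PP"}.
    Within every job level, the larger mode group is randomly subsampled to the
    size of the smaller one.
    """
    rng = random.Random(seed)
    cells = defaultdict(lambda: {"OL": [], "PP": []})
    for rid, mode, level in respondents:
        cells[level][mode].append(rid)
    matched = []
    for level in sorted(cells):
        ol, pp = cells[level]["OL"], cells[level]["PP"]
        k = min(len(ol), len(pp))
        matched += [(rid, "OL", level) for rid in rng.sample(ol, k)]
        matched += [(rid, "PP", level) for rid in rng.sample(pp, k)]
    return matched


MIN_MATCHED_N = 90  # from the slide; ASSUMPTION: N refers to the whole matched sample

# Toy data (NOT from the study): 2 OL + 6 PP at the lowest level, 5 OL + 1 PP at the intermediate level.
toy = ([(f"a{i}", "OL", "lowest") for i in range(2)]
       + [(f"b{i}", "PP", "lowest") for i in range(6)]
       + [(f"c{i}", "OL", "intermediate") for i in range(5)]
       + [("d0", "PP", "intermediate")])
matched = match_on_job_level(toy, seed=1)
print(len(matched), "matched respondents; large enough:", len(matched) >= MIN_MATCHED_N)
```

Matching discards data, which is why small countries can fall below a usable sample size after this step.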
4. Reflection
Contribution made
First study to assess ME of an organizational survey across two modes of data collection (i.e., OL and PP) in a large number of countries (k=16).
Result: good news! (i.e., mixing these modes is not problematic from a methodological point of view)
Limitations
Results are instrument- and organization-specific! They are not generalizable to other organizational surveys!
Other modes of data collection (e.g., telephone interviewing) are not considered.
Within-country variations (e.g., ethnic groups) are not considered!
Additional lecture: Using Ad Hoc Measures For Response Styles. A Cautionary Note.
Overview
1. Introduction
2. Method
3. Results / interpretation
4. Implications for cross-cultural research
5. Questions and discussion
6. References
1. Introduction
Response styles (RS) in survey responses lead to substantial bias in cross-cultural comparisons,
especially when Likert-type (agree/disagree) scales are used to rate survey items
(e.g., Billiet & McClendon, 2000; Smith, 2004; Van Herk et al., 2004; Harzing, 2006).
Adequate quantification of (and correction for) RS is therefore required.
In this study, the focus is on two types of RS which are known to bias cross-cultural
comparisons severely (Cheung & Rensvold, 2000):
Acquiescence (ARS): respondents’ tendency to agree (say yes) regardless of item content
Extreme (ERS): respondents’ tendency to pick the extreme category points
of the rating scale regardless of item content
1. Introduction
Many well-cited papers have identified potential determinants of RS.
They tap into individual-level [I] and societal-level [S] variables influencing ARS
and ERS.
For instance:
Winkler et al., 1982 [I, ARS]; Greenleaf, 1992, JMR [I, ARS]; Greenleaf, 1992, POQ [I, ERS];
Marin et al., 1992 [I, ARS & ERS]; Watson, 1992 [I, ARS];
Johnson et al., 2005 [I & S, ARS & ERS]; Harzing, 2006 [S, ARS & ERS] (and some more …)
1. Introduction
What about adequate quantification of ARS / ERS?
General principles:
• Based on a set of items#, construct an ‘index’ for every RS
(i.e., the absolute or relative frequency of prototypical responses [e.g., ‘ARS or ERS answers’])
• When quantifying ARS, apply a ‘weighting scheme’ (e.g., equal or differential weighting)
to the various levels of agreement
#If survey items which are used for substantive purposes (i.e., to measure the theoretical
constructs under study) are also used to derive indices of RS, the measures of RS
are confounded with content! To control for ARS (but not for ERS), one may design/use ‘balanced
scales’ (see Billiet & McClendon, 2000), but this is often not feasible in practice (e.g.,
complexity of the ‘balancing’ task; need to use existing, non-balanced scales).
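The two general principles and the footnote's claim about balanced scales can be made concrete in a few lines. This is a minimal sketch assuming 1 to 5 coding, with 4 and 5 as the agreement categories and 1 and 5 as the extreme ones; the differential weights are hypothetical. The second half shows why balancing controls ARS but not ERS: reverse-coding cancels uniform agreement, but an extreme responder stays extreme after reverse-coding.

```python
# Response codes: 1 = strongly disagree ... 5 = strongly agree (5-point Likert-type scale).

EQUAL_WEIGHTS = {4: 1.0, 5: 1.0}         # 'agree' and 'strongly agree' count equally
DIFFERENTIAL_WEIGHTS = {4: 0.5, 5: 1.0}  # hypothetical differential scheme


def ars_index(responses, weights=EQUAL_WEIGHTS):
    """Weighted relative frequency of agreement responses (acquiescence index)."""
    return sum(weights.get(r, 0.0) for r in responses) / len(responses)


def ers_index(responses, low=1, high=5):
    """Relative frequency of responses in either extreme category (extreme-response index)."""
    return sum(r in (low, high) for r in responses) / len(responses)


def reverse_code(r, low=1, high=5):
    return low + high - r


def content_score(responses, reverse_keyed):
    """Mean content score after reverse-coding the negatively worded items of a balanced scale."""
    scored = [reverse_code(r) if rev else r for r, rev in zip(responses, reverse_keyed)]
    return sum(scored) / len(scored)


answers = [4, 5, 4, 3, 5, 2, 1, 4]  # one respondent, eight items (toy data)
print(ars_index(answers), ars_index(answers, DIFFERENTIAL_WEIGHTS), ers_index(answers))  # 0.625 0.4375 0.375

balanced = [False, True, False, True]  # items 2 and 4 are reverse-keyed
print(content_score([4, 4, 4, 4], balanced))  # pure yes-saying cancels out -> scale midpoint 3.0
print(content_score([5, 1, 5, 1], balanced))  # positive attitude, extreme style -> 5.0
print(content_score([4, 2, 4, 2], balanced))  # same attitude, moderate style -> 4.0
```

The last two respondents hold the same attitude but get different content scores, so balancing leaves ERS bias untouched.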
1. Introduction
The danger of a confound is present in many# of these earlier studies, including
two of the most recent publications (i.e., Johnson et al., 2005; Harzing, 2006)!
#: all studies except for Winkler et al., 1982 (balanced scale [12*2 items]) and Watson, 1992 (separate set of 7 items)
The confound may have led to wrong statistical conclusions regarding the determinants of
response styles, and may explain some conflicting findings across studies / samples (e.g., positive
and negative effects of education on ARS and ERS [across 4 samples in Marin et al., 1992];
a significantly positive versus a non-significant effect of age on ARS [Winkler et al., 1982 and
Greenleaf, 1992, JMR versus Johnson et al., 2005]).
1. Introduction
To reduce the possible impact of confounding between ‘content’ and (response) ‘style’,
one may consider:
Option #1: Select a large number of content-related items which are heterogeneous in
content (i.e., showing low inter-item correlations)
Does this really solve the problem?
Option #2: Quantify RS based on a (maximally heterogeneous, separate) random sample
of survey items drawn from a wide range of multi-item measures (RIRS
approach introduced by Weijters, JAMS, in press; RIRS = Representative
Indicators for Response Styles)
Option #2 is superior to option #1 as the probability of a confound between content
and style is reduced to (about) zero.
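The sampling idea behind option #2 can be sketched as follows. The slides only say that items are drawn at random from a wide range of multi-item measures; taking at most one item per scale is our assumed operationalization of ‘maximally heterogeneous’, and the item pool is a placeholder, not the Handbook of Marketing Scales.

```python
import random


def rirs_sample(item_pool, n_items, seed=None):
    """Draw n_items items for RS measurement, at most one per multi-item scale.

    item_pool: dict mapping a scale name to the list of its item identifiers.
    One item per scale (an assumption) keeps shared content from driving
    the correlations among the sampled items.
    """
    if n_items > len(item_pool):
        raise ValueError("need at least as many distinct scales as items")
    rng = random.Random(seed)
    scales = rng.sample(sorted(item_pool), n_items)  # random scales, no repeats
    return [(scale, rng.choice(item_pool[scale])) for scale in scales]


# Hypothetical pool: 40 scales with 4 items each (names are placeholders).
pool = {f"scale{s:02d}": [f"scale{s:02d}_item{i}" for i in range(1, 5)] for s in range(1, 41)}
selection = rirs_sample(pool, 30, seed=2010)
print(len(selection), "items from", len({scale for scale, _ in selection}), "different scales")
```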
1. Introduction
Nice! But does it really matter?
In this empirical study, we examine between-method convergent validity of:
1. The ‘traditional approach’ (i.e., ‘ad hoc measure’: same items are used to measure
content and style)
2. The RIRS approach (i.e., response style indicators derived from a separate set of
randomly selected [heterogeneous] items)
Failure to establish between-method convergent validity (as manifested by clearly
distinct RS quantifications) would increase our doubts about the adequacy of the
‘traditional approach’.
1. Introduction
For the purpose of statistical testing, we define the following
functional null hypotheses:
H1a: The mean frequency of acquiescent responses (ARS) as calculated by both methods is identical.
H1b: The mean frequency of extreme responses (ERS) as calculated by both methods is identical.
H2a: The correlation between measures of ARS as calculated by both methods is high# and positive.
H2b: The correlation between measures of ERS as calculated by both methods is high# and positive.
#: criteria for the evaluation of correlations:
[.00, .20) = ‘negligible’; [.20, .40) = ‘low’; [.40, .60) = ‘moderate’; [.60, .80) = ‘marked’;
[.80, 1.00] = ‘high’ (see Franzblau, 1958, Ch. 7)
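The verbal criteria above amount to a simple lookup on the size of the correlation. In this sketch the bins are treated as half-open, which is our assumption about how boundary values are classified; the sign is ignored here, since H2a and H2b state the positive-direction requirement separately.

```python
def franzblau_label(r):
    """Verbal label for the size of a correlation, using the cut points listed above."""
    size = abs(r)
    if size > 1:
        raise ValueError("a correlation lies between -1 and 1")
    for upper, label in ((0.20, "negligible"), (0.40, "low"), (0.60, "moderate"), (0.80, "marked")):
        if size < upper:
            return label
    return "high"


for r in (0.184, 0.046, 0.605, 0.481):
    print(r, franzblau_label(r))
```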
2. Method
Survey Measures Used (questionnaire in English)
Part 1.
To mimic use of the ‘traditional approach’ (all items are used to measure both content and style):
Fifteen survey items were taken from the Food-Related Lifestyle (FRL) measure
(see Grunert et al., 1993; Scholderer et al., 2004). These items comprise the following dimensions:
Dimension #1: consumption situation: snacks versus meals (3 items);
Dimension #2: consumption situation: social event (3 items);
Dimension #3: purchasing motives: self-fulfilment in food (3 items);
Dimension #4: purchasing motives: security (3 items);
Dimension #5: purchasing motives: social relationships (3 items).
All items are measured on 5-point Likert-type agree/disagree scales; the applicability of the FRL
measure in The Netherlands has been demonstrated by Laros (2006).
2. Method
Part 2.
To mimic use of the ‘RIRS approach’:
Thirty individual items were randomly selected from the multi-item measures presented in the
Handbook of Marketing Scales (Bearden & Netemeyer, 1999) and split into two sets of 15 items
(odd- and even-numbered items).
RS (ARS and ERS) were quantified twice (i.e., separately for the two sets of items).
(To control for possible differential effects [if present] of boredom and fatigue, about half of the respondents
completed part 1 before part 2, whereas the other half completed part 2 before part 1.)
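The odd/even split and the counterbalanced order of parts can be sketched as below. The slides do not say how respondents were allocated to the two orders; random allocation of half the sample to each order is an assumption, and the item names are placeholders.

```python
import random


def split_odd_even(items):
    """Split an ordered item list into odd-numbered (1st, 3rd, ...) and even-numbered (2nd, 4th, ...) items."""
    return items[0::2], items[1::2]


def assign_part_order(respondent_ids, seed=None):
    """Randomly assign half of the respondents to 'part 1 first', the rest to 'part 2 first' (assumed scheme)."""
    rng = random.Random(seed)
    ids = list(respondent_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    return {rid: ("part 1", "part 2") if pos < half else ("part 2", "part 1")
            for pos, rid in enumerate(ids)}


items = [f"item{n:02d}" for n in range(1, 31)]  # the 30 randomly selected items, in selection order
odd_set, even_set = split_odd_even(items)
orders = assign_part_order(range(1, 151), seed=42)  # 150 respondents
print(len(odd_set), len(even_set), sum(o[0] == "part 1" for o in orders.values()))  # 15 15 75
```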
At the end of the survey, some demographic variables such as age and sex were collected
(i.e., Part 3 of the survey).
2. Method
Sample
N=150 students (Radboud University - Mgt. School) participating in English-taught management
programs (i.e., with sufficient mastery of the English language)
Some statistics:
on average 23½ years of age (SD=1.8)
about 60% female and 40% male
3. Results
Table 1: Mean proportion of ARS and ERS responses
Mean proportion of   Traditional approach   RIRS approach        Binomial test on equal mean proportions
ARS responses        .432                   .424 (odd items)     p=.57
                                            .431 (even items)    p=.93
ERS responses        .274                   .208 (odd items)     p<.01
                                            .189 (even items)    p<.01
Notes. ARS quantification: ‘agree’ and ‘very much agree’ responses are equally weighted. The basis for calculating
proportions is 2,250 (15 items × 150 respondents) for each of: (1) the traditional approach, (2) the RIRS approach – odd items,
and (3) the RIRS approach – even items.
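The exact test behind the last column is not specified beyond ‘binomial test’. As an illustration only, the sketch below swaps in a two-sided two-proportion z-test with a pooled proportion on the published (rounded) proportions, with 2,250 responses per approach. It treats the two sets of responses as independent, which they are not (the same 150 respondents produced both), so it is not the authors' test.

```python
import math


def two_proportion_z_test(p1, p2, n1, n2):
    """Two-sided z-test for equal proportions, using the pooled proportion under H0 (independent samples assumed)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


N = 15 * 150  # 2,250 responses per approach
rows = {"ARS odd": (0.432, 0.424), "ARS even": (0.432, 0.431),
        "ERS odd": (0.274, 0.208), "ERS even": (0.274, 0.189)}
results = {name: two_proportion_z_test(trad, rirs, N, N) for name, (trad, rirs) in rows.items()}
for name, (z, p) in results.items():
    print(f"{name}: z = {z:.2f}, p = {p:.3g}")
```

Because the inputs are rounded and the design is paired, the p-values only approximate Table 1, but the pattern is the same: the ARS differences are non-significant and the ERS differences are clearly significant. A test that respects the pairing would work on per-respondent differences instead.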
3. Results
Conclusions from Table 1
H1a is NOT rejected
H1b is rejected
As far as the mean proportion of ERS responses is concerned,
between-method convergent validity is NOT established.
3. Results
Table 2: Pearson correlation of ARS and ERS responses (across methods)
Traditional approach   RIRS approach – odd items                 RIRS approach – even items
ARS responses          .184 [p=.024] (negligible [almost low])   .046 [p=.572] (negligible)
ERS responses          .605 [p<.001] (marked)                    .481 [p<.001] (moderate)
Important note: The Pearson correlation between “RIRS – odd” and “RIRS – even” is .141 (negligible) for ARS and .747 (marked) for ERS!
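Table 2 reports Pearson correlations between per-respondent RS indices with p-values for n = 150. The slides do not say how p was obtained (an exact t-test on n − 2 df is the usual choice). The sketch below instead uses Fisher's z approximation, a swapped-in technique that for r = .184 and n = 150 gives p ≈ .024, close to the reported value.

```python
import math


def pearson_r(x, y):
    """Pearson correlation of two equally long sequences (e.g., per-respondent ARS indices from two methods)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def r_p_value(r, n):
    """Approximate two-sided p-value for H0: rho = 0 via Fisher's z transformation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


for r in (0.184, 0.046, 0.605, 0.481, 0.141):
    print(f"r = {r:.3f}, n = 150 -> p ~ {r_p_value(r, 150):.3f}")
```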
3. Results
Conclusions from Table 2
H2a is rejected
As far as ARS responses are concerned, between-method convergent validity as measured by a Pearson correlation is NOT established.
Even within the RIRS approach, ARS quantifications are not stable!
H2b is not rejected
3. Results
The decision as to how one quantifies RS seems to be
a crucial one …
… as low levels of convergent validity are obtained between …
the traditional approach of using ‘ad hoc’ measures
and
the more sophisticated RIRS approach
3. Results / Interpretation
It is plausible that:
ARS is mainly determined by the interaction of respondent
and item characteristics; this would explain the instability
of ARS quantifications (even within the RIRS method!)
This finding is in line with Greenleaf (1992, JMR) who does
not find ARS that generalizes across different pairs of attitudes
and behaviors.
3. Results / Interpretation
ERS is mainly determined by the interaction of
respondent and response categories
(which were fixed in this study).
This interpretation is in line with Arce-Ferrer (2006) who relates the tendency
to pick the extreme category points of a rating scale with the meaning that
respondents attach to the response categories.
4. Implications for cross-cultural research
This empirical study casts further doubt on the validity of research findings
as obtained in earlier research on the determinants of RS.
So,
“Can we trust the outcomes of earlier studies, in particular those in
which ‘ad hoc’ measures were used to quantify RS?”
is still a legitimate question to address!
5. Questions & discussion