
4/24/2010


Guest Lectures to Peking University (Beijing, P.R. China)

Main lecture: Measurement Equivalence (ME) of Paper-and-pencil and Online Organizational Surveys

… and after the break…

Additional lecture: Using Ad Hoc Measures For Response Styles. A Cautionary Note.

Prof. Dr. Alain De Beuckelaer

April 28th 2010

INTRODUCING COMPARATIVE (CROSS-CULTURAL) SURVEY METHODOLOGY


Typical Research Topics

Guiding principle throughout all research stages: establishing (cross-cultural/group) equivalence, that is, minimizing (comparative) bias

Research topics:
• Sample design
• Design of a questionnaire/survey instrument
• Translation and adaptation of survey instruments
• Pretesting (translated) surveys
• Interviewer recruitment, selection and training
• Monitoring interviewer quality (e.g., using paradata)
• Harmonization of data collection
• Harmonization of survey and statistical data (after data collection)
• Statistical adjustment for sample design
• Psychometric quality assessment of (multi-item) scales used, e.g., checking measurement equivalence of scales across populations
• Quantifying and correcting for biasing effects (e.g., response styles)
• Analysis of survey and statistical data


Selection of My Publications

On measurement invariance of scales across nations:

De Beuckelaer, A., Lievens, F., & Swinnen, G. (2007). Measurement equivalence in the conduct of a global organizational survey across six cultural regions. Journal of Occupational and Organizational Psychology, 80, 575-600.

On measurement invariance of scales across modes of data collection:

De Beuckelaer, A., & Lievens, F. (2009). Measurement equivalence of paper-and-pencil and online organizational surveys: A large scale examination in 16 countries. Applied Psychology: An International Review, 58, 336-361.

On the biasing effect of scale items not exhibiting measurement invariance across (cultural) groups:

De Beuckelaer, A., & Swinnen, G. (in press). Biased latent variable mean comparisons due to measurement non-invariance: A simulation study. Forthcoming in E. Davidov, P. Schmidt, & J. Billiet (Eds.), Methods and applications in cross-cultural analysis. Taylor & Francis.

On the quantification of and correction for response styles (one important source of cross-cultural bias):

De Beuckelaer, A., Weijters, B., & Rutten, A. Using ad hoc measures for response styles. A cautionary note. Forthcoming in Quality and Quantity.


Main Lecture: Measurement Equivalence (ME) of Paper-and-pencil and Online Organizational Surveys


Overview

1. Introduction

2. Method

3. Results

4. Reflection

5. Questions and discussion

ME of Paper-and-Pencil and Online Surveys


1. Introduction

‘Mixed-mode organizational surveys’

Online (OL) and paper-and-pencil (PP) surveying combined in organizational surveys

OL surveys

Advantages: less costly, faster responses, greater flexibility in survey design, wider geographical reach, no [human] coding errors, less sensitive to question-order effects, and more complete in terms of information provided (various references to journal papers)

Disadvantages: higher non-response rates, higher probability of dishonest answers, potential technological problems, decreased item reliability (higher measurement error), possibility of multiple submissions, no full coverage of all occupational groups represented within the organization (various references to journal papers)


Organizational surveys partly rely on OL surveying because of:

(1) Increased efficiency of the data collection process

(2) Elimination of human coding errors

(3) Cost-reductions



Research question

Is mixing OL and PP surveys acceptable from a ‘methodological point of view’?

In other words, is measurement equivalence (ME) between both modes of data collection ensured?


Job level and mode of data collection (across all countries)

Job level      % Online   % Paper-and-pencil
Lowest         26.7%      73.3%
Intermediate   76.4%      23.6%
Highest        66.4%      33.6%

Danger of sample bias (e.g., higher-level managers prefer to answer online, whereas lower-level managers may not be able to use computers at work)

Direct implication for analyses: ME assessment before and after controlling for job-level differences



2. Method

Sample

N=52,461 managers; k=16 countries; in alphabetical order:

Australia, Brazil, P.R. China, Czech Republic, France, Germany, Netherlands, Nigeria, Pakistan, Puerto Rico, Russian Federation, Spain, Sweden, UK, US, Vietnam

Overall response rate: 86% (across countries)

Measurement Instrument

Five factors: F1: team commitment (3 items); F2: supervisor support (3 items); F3: goal clarity (3 items); F4: decision making (2 items); F5: environmental and societal responsibility (2 items)

Scale: 5-point Likert-type agreement/disagreement rating scale

2. Method

Analysis method

Part A: Ordinary covariance-structure (CS) model to check the plausibility of the hypothesized five-factor model (i.e., construct validity assessment)

Part B: Mean and Covariance Structure (MACS) analyses; three nested models:

Form invariance model (least restrictive)
  Type of equivalence: Identical pattern of salient and non-salient factor loadings
  Implication: Meaning of the factors is ‘roughly the same’

Metric invariance model
  Type of equivalence: Factor loadings identical across OL and PP surveys
  Implication: The extent to which indicators capture changes in the underlying construct(s) is identical!

Scalar invariance model (most restrictive)
  Type of equivalence: Factor loadings and indicator intercepts identical across OL and PP surveys
  Implication: Estimated (mode-specific) factor mean scores may be compared in a meaningful way!
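The three nested models above can be written compactly in standard MACS notation. The symbols below (x, τ, Λ, ξ, δ, κ and the mode index g) are conventional choices made for this sketch; they do not appear on the slides, and the residuals are assumed to have mean zero.

```latex
% g indexes the mode of data collection (g = OL, PP)
\begin{align*}
x^{(g)} &= \tau^{(g)} + \Lambda^{(g)}\,\xi^{(g)} + \delta^{(g)}
  && \text{measurement model in mode } g \\
E\!\left[x^{(g)}\right] &= \tau^{(g)} + \Lambda^{(g)}\,\kappa^{(g)}
  && \text{implied item means, with } \kappa^{(g)} = E\!\left[\xi^{(g)}\right] \\[4pt]
\text{Form invariance:}\quad
  & \Lambda^{(OL)} \text{ and } \Lambda^{(PP)} \text{ share the same pattern of zero and non-zero loadings} \\
\text{Metric invariance:}\quad
  & \Lambda^{(OL)} = \Lambda^{(PP)} \\
\text{Scalar invariance:}\quad
  & \Lambda^{(OL)} = \Lambda^{(PP)} \ \text{and}\ \tau^{(OL)} = \tau^{(PP)}
\end{align*}
```

Under scalar invariance the intercepts cancel, so a difference in expected item scores between modes equals Λ(κ^(OL) − κ^(PP)); that is, it can only come from a difference in factor means. This is why factor means may be compared only once this model holds.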


3. Results

Part A: CB-based model to test the five-factor structure

Test results: The five-factor structure fits reasonably well in the 16 countries under study (RMSEA slightly too high in Sweden [.062] and the US [.064]; TLI slightly too low in some countries [but no other problems!])

Implication: All 16 countries will be further examined in part B of the analysis

3. Results

Part B: nested MACS models to test for ME across OL and PP surveys

Test results: Scalar invariance of the survey instrument is established in all countries except P.R. China and France.

In P.R. China and France the metric invariance model did fit.

Implication: In most countries (14 out of 16) mixing OL and PP surveys does no harm! (i.e., scalar equivalence is established)


3. Results

After controlling for job level by means of a ‘matched samples approach’, the data of 13 countries* were re-analyzed. The analyses led to virtually the same overall conclusion (i.e., strong support for scalar equivalence). After controlling for job level, scalar invariance across OL and PP surveys was also established in France.

*The sample size (after matching) was too small (N<90) in 3 countries (i.e., P.R. China, Puerto Rico, and Vietnam)

4. Reflection

Contribution made

First study to assess ME of an organizational survey across two modes of data collection (i.e., OL and PP) in a large number of countries (k=16).

Result: good news! (i.e., mixing these modes is not problematic from a methodological point of view)

Limitations

Results are instrument- and organization-specific! They are not generalizable to other organizational surveys!

Other modes of data collection (e.g., telephone interviewing) are not considered.

Within-country variations (e.g., ethnic groups) are not considered!


Additional lecture: Using Ad Hoc Measures For Response Styles. A Cautionary Note.


Overview

1. Introduction

2. Method

3. Results / interpretation

4. Implications for cross-cultural research

5. Questions and discussion

6. References

Ad Hoc Measures For Response Styles


1. Introduction

Response styles (RS) in survey response lead to substantial bias in cross-cultural comparisons, especially when Likert-type (agree/disagree) scales are used to rate survey items (e.g., Billiet & McClendon, 2000; Smith, 2004; Van Herk et al., 2004; Harzing, 2006).

Adequate quantification of (and correction for) RS is required.

In this study, the focus is on two types of RS which are known to bias cross-cultural comparisons severely (Cheung & Rensvold, 2000):

Acquiescence (ARS): respondents’ tendency to agree (say yes) regardless of item content

Extreme response style (ERS): respondents’ tendency to pick the extreme category points of the rating scale regardless of item content


1. Introduction

Many well-cited papers have identified potential determinants of RS.

They tap into individual [I]-level and societal [S]-level variables influencing ARS and ERS.

For instance:

Winkler et al., 1982 [I, ARS]; Greenleaf, 1992, JMR [I, ARS]; Greenleaf, 1992, POQ [I, ERS]; Marin et al., 1992 [I, ARS & ERS]; Watson, 1992 [I, ARS]; Johnson et al., 2005 [I & S, ARS & ERS]; Harzing, 2006 [S, ARS & ERS] (and some more …)


1. Introduction

What about adequate quantification of ARS / ERS?

General principles:

• Based on a set# of items, construct ‘indices’ for every RS (i.e., absolute or relative frequency of prototypical responses [e.g., ‘ARS or ERS answers’])

• When quantifying ARS, apply a ‘weighting scheme’ (e.g., equal or differential weighting) for various levels of agreement

#If survey items which are used for substantive purposes (i.e., to measure the theoretical constructs under study) are also used to derive indices of RS, the measures of RS are confounded! To control for ARS (but not for ERS) one may design/use ‘balanced scales’ (see Billiet & McClendon, 2000), but this is often not feasible in practice (e.g., complexity of the ‘balancing’ task; need to use existing, non-balanced scales).
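The two general principles can be illustrated with a minimal Python sketch. It assumes a 1-to-5 coding of the rating scale (1 = very much disagree, 5 = very much agree), which the slides do not specify; the function names and the example answers are hypothetical.

```python
from typing import Sequence

# Assumed coding of the 5-point agree/disagree scale (not given on the slides):
# 1 = very much disagree, 2 = disagree, 3 = neutral, 4 = agree, 5 = very much agree.
AGREE, STRONGLY_AGREE = 4, 5
EXTREME_CATEGORIES = (1, 5)


def ars_index(responses: Sequence[int], strong_weight: float = 1.0) -> float:
    """Weighted share of agreeing answers for one respondent.

    strong_weight=1.0 weights 'agree' and 'very much agree' equally;
    a larger value is one possible differential weighting (the result is
    then a weighted score per item rather than a proportion).
    """
    if not responses:
        raise ValueError("at least one response is required")
    score = 0.0
    for r in responses:
        if r == STRONGLY_AGREE:
            score += strong_weight
        elif r == AGREE:
            score += 1.0
    return score / len(responses)


def ers_index(responses: Sequence[int]) -> float:
    """Share of answers in the two extreme categories for one respondent."""
    if not responses:
        raise ValueError("at least one response is required")
    return sum(r in EXTREME_CATEGORIES for r in responses) / len(responses)


# Hypothetical answers of one respondent to 15 items.
answers = [5, 4, 3, 2, 1, 4, 4, 5, 3, 2, 1, 3, 4, 5, 2]
print(ars_index(answers))       # equal weighting
print(ers_index(answers))
print(ars_index(answers, 2.0))  # 'very much agree' counts double
```

Whether such an index is confounded with content depends entirely on which items are passed in, not on the arithmetic itself.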


1. Introduction

The danger of confound is present in many# of these earlier studies, including two of the most recent publications (i.e., Johnson et al., 2005; Harzing, 2006)!

#: all studies except for Winkler et al., 1982 (balanced scale [12*2 items]) and Watson, 1992 (separate set of 7 items)

The confound may have led to wrong statistical conclusions regarding the determinants of response styles, and may explain some conflicting findings across studies / samples (e.g., + and – effect of education on ARS and ERS [across 4 samples in Marin et al., 1992]; significantly positive and non-significant effect of age on ARS [Winkler et al., 1982 and Greenleaf, 1992, JMR versus Johnson et al., 2005]).


1. Introduction

To reduce the possible impact of confounding between ‘content’ and (response) ‘style’ one may consider:

Option #1: Select a large number of content-related items which are heterogeneous in content (i.e., showing low inter-item correlations)

Does this really solve the problem?

Option #2: Quantify RS based on a (maximally heterogeneous, separate) random sample of survey items drawn from a wide range of multi-item measures (RIRS approach introduced by Weijters, JAMS, in press; RIRS = Representative Indicators for Response Styles)

Option #2 is superior to option #1 as the probability of a confound between content and style is reduced to (about) zero.


1. Introduction

Nice! But does it really matter?

In this empirical study, we examine between-method convergent validity of:

1. The ‘traditional approach’ (i.e., ‘ad hoc measure’: same items are used to measure content and style)

2. The RIRS approach (i.e., response style indicators derived from a separate set of randomly selected [heterogeneous] items)

Failure to establish between-method convergent validity (as manifested by clearly distinct RS quantifications) will increase our doubt about the adequacy of the ‘traditional approach’.


1. Introduction

For the purpose of statistical testing we define the following functional null-hypotheses:

H1a: The mean frequency of acquiescent responses (ARS) as calculated by both methods is identical.

H1b: The mean frequency of extreme responses (ERS) as calculated by both methods is identical.

H2a: The correlation between measures of ARS as calculated by both methods is high# and positive.

H2b: The correlation between measures of ERS as calculated by both methods is high# and positive.

#: criteria for the evaluation of correlations:

[.00, .20) = ‘negligible’; [.20, .40) = ‘low’; [.40, .60) = ‘moderate’; [.60, .80) = ‘marked’; [.80, 1.00] = ‘high’ (see Franzblau, 1958, Ch. 7)
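The evaluation criteria for correlations can be expressed as a small lookup, sketched below in Python. The function name is hypothetical, the intervals are read as half-open (lower bound included), and taking the absolute value is an assumption of this sketch, since the bands are only given for non-negative values.

```python
def franzblau_label(r: float) -> str:
    """Verbal label for a correlation, using the bands quoted on the slide."""
    size = abs(r)  # assumption: the sign is ignored when judging strength
    if not 0.0 <= size <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    if size < 0.20:
        return "negligible"
    if size < 0.40:
        return "low"
    if size < 0.60:
        return "moderate"
    if size < 0.80:
        return "marked"
    return "high"


for r in (0.184, 0.046, 0.605, 0.481):
    print(r, franzblau_label(r))
```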


2. Method

Survey Measures Used (questionnaire in English)

Part 1.

To mimic use of the ‘traditional approach’ (all items are used to measure both content and style):

Fifteen survey items were taken from the Food-Related Lifestyle (FRL) measure (see Grunert et al., 1993; Scholderer et al., 2004). These items cover the following dimensions:

Dimension #1: consumption situation: snacks versus meals (3 items);
Dimension #2: consumption situation: social event (3 items);
Dimension #3: purchasing motives: self-fulfilment in food (3 items);
Dimension #4: purchasing motives: security (3 items);
Dimension #5: purchasing motives: social relationships (3 items).

All items are measured on 5-point Likert-type agree/disagree scales; the applicability of the FRL measure in The Netherlands has been demonstrated by Laros (2006).


2. Method

Part 2.

To mimic use of the ‘RIRS approach’:

Thirty individual items were randomly selected from the multi-item measures presented in the Handbook of Marketing Scales (Bearden & Netemeyer, 1999) and split into two sets of 15 items (odd- and even-numbered items).

RS (ARS and ERS) were quantified twice (i.e., separately for the 2 sets of items).

(To control for possible differential effects [if present] of boredom and fatigue, about half of the respondents completed part 1 before part 2, whereas the other half completed part 2 before part 1.)

At the end of the survey, some demographic variables such as age and sex were collected (i.e., Part 3 of the survey).
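The item selection for Part 2 can be sketched in a few lines of Python. This is a simplified illustration, not the procedure actually used: the item pool, its labels, the seed, and the rule that items are numbered in the order they are drawn are all assumptions made here.

```python
import random
from typing import Hashable, List, Optional, Sequence, Tuple


def draw_rirs_items(
    item_pool: Sequence[Hashable],
    n_items: int = 30,
    seed: Optional[int] = None,
) -> Tuple[List[Hashable], List[Hashable]]:
    """Draw n_items at random (without replacement) and split them in two.

    Items are numbered 1..n_items in the order drawn; odd-numbered items form
    the first set and even-numbered items the second.
    """
    rng = random.Random(seed)
    drawn = rng.sample(list(item_pool), n_items)
    odd_set = drawn[0::2]   # items numbered 1, 3, 5, ...
    even_set = drawn[1::2]  # items numbered 2, 4, 6, ...
    return odd_set, even_set


# Hypothetical pool: 40 scales with 5 items each, identified by made-up labels.
pool = [f"scale{s:02d}_item{i}" for s in range(1, 41) for i in range(1, 6)]
odd_items, even_items = draw_rirs_items(pool, n_items=30, seed=2010)
print(len(odd_items), len(even_items))
```

Quantifying RS separately on the two halves is what later makes it possible to check whether the RIRS indices are stable within the method itself.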


2. Method

Sample

N=150 students (Radboud University - Mgt. School) participating in English-taught mgt. programs (sufficient mastery of the English language)

Some statistics:

on average 23½ years of age (SD=1.8)

about 60% female and 40% male


3. Results

Table 1: Mean proportion of ARS and ERS responses

Mean proportion of   Traditional approach   RIRS approach       Binomial test on equal mean proportions
ARS responses        .432                   .424 (odd items)    p=.57
                                            .431 (even items)   p=.93
ERS responses        .274                   .208 (odd items)    p<.01
                                            .189 (even items)   p<.01

Notes. ARS quantification: ‘agree’ and ‘very much agree’ responses are equally weighted; the basis for calculating proportions is 2,250 (15 items * 150 respondents) for: (1) the traditional approach, (2) the RIRS approach – odd, and (3) the RIRS approach – even.
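The slide does not describe how the binomial test was set up, so the following Python sketch uses a pooled two-proportion z-test as a stand-in to show the kind of comparison involved. It treats the two sets of 2,250 responses as independent samples, which is a simplification (the same 150 respondents answered both parts), so its p-values need not match those in Table 1.

```python
import math
from typing import Tuple


def two_proportion_z_test(p1: float, p2: float, n1: int, n2: int) -> Tuple[float, float]:
    """Two-sided z-test for equal proportions, treating the samples as independent."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail area
    return z, p_value


N = 15 * 150  # 2,250 item responses per approach, as stated in the table notes
z_ars, p_ars = two_proportion_z_test(0.432, 0.424, N, N)  # traditional vs. RIRS odd
z_ers, p_ers = two_proportion_z_test(0.274, 0.208, N, N)  # traditional vs. RIRS odd
print(f"ARS: z={z_ars:.2f}, p={p_ars:.3f}")
print(f"ERS: z={z_ers:.2f}, p={p_ers:.2g}")
```

Even this rough stand-in points in the same direction as the table: the ARS proportions are statistically indistinguishable, whereas the ERS proportions clearly differ.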


3. Results

Conclusions from Table 1

H1a is NOT rejected

H1b is rejected

As far as the mean proportion of ERS responses is concerned, between-method convergent validity is NOT established.


3. Results

Table 2: Pearson correlation of ARS and ERS responses (across methods)

Traditional approach   RIRS approach - Odd items                 RIRS approach - Even items
ARS responses          .184 [p=.024] (negligible [almost low])   .046 [p=.572] (negligible)
ERS responses          .605 [p<.001] (marked)                    .481 [p<.001] (moderate)

Important note: The Pearson correlation between “RIRS – Odd” and “RIRS – Even” is .141 (negligible) for ARS and .747 (marked) for ERS!


3. Results

Conclusions from Table 2

H2a is rejected

As far as ARS responses are concerned, between-method convergent validity as measured by a Pearson correlation is NOT established.

Even within the RIRS approach ARS quantifications are not stable!

H2b is not rejected


3. Results

The decision as to how one quantifies RS seems to be a crucial one, as low levels of convergent validity are obtained between the traditional approach of using ‘ad hoc’ measures and the more sophisticated RIRS approach.


3. Results / Interpretation

It is plausible that:

ARS is mainly determined by the interaction of respondent and item characteristics; this would explain the instability of ARS quantifications (even within the RIRS method!)

This finding is in line with Greenleaf (1992, JMR), who does not find an ARS that generalizes across different pairs of attitudes and behaviors.


3. Results / Interpretation

ERS is mainly determined by the interaction of respondent and response categories (which were fixed in this study).

This interpretation is in line with Arce-Ferrer (2006), who relates the tendency to pick the extreme category points of a rating scale to the meaning that respondents attach to the response categories.

4. Implications for cross-cultural research

This empirical study casts further doubt on the validity of research findings obtained in earlier research on the determinants of RS.

So,

“Can we trust the outcomes of earlier studies, in particular those in which ‘ad hoc’ measures were used to quantify RS?”

is still a legitimate question to address!


5. Questions & discussion
