
Impact on Reliability of Mixing Modes in Longitudinal Studies

Alexandru Cernat

Abstract

Mixed mode designs are increasingly important in surveys. Furthermore, large longitudinal studies are either moving to or considering such a design. In this context our knowledge of the impact of mixing modes on data quality indicators in longitudinal studies is absent. This study tries to ameliorate this situation by taking advantage of a quasi-experimental design in a longitudinal survey. Using a model that estimates reliability for repeated measures, the quasi-simplex model, 35 variables are analysed by comparing a unimode CAPI design to a sequential CATI-CAPI one. Results show no differences in reliabilities and stabilities across mode designs, either in the wave when the switch was made or in subsequent waves. Implications and limitations are discussed.

Surveys have become an influential presence in modern society, being essential for policy, academic and marketing research and the mass media. In this context, declining response rates are threatening external validity (Tourangeau, 2004). In response, data collecting agencies are looking both to old solutions, such as increasing the number of contacts, and to newer ones, such as tailoring the design (Dillman et al., 2008) or using social media (Groves, 2011). These procedures inevitably increase the price and the complexity of the survey process. In parallel, the economic downturn adds pressure on survey agencies to decrease the overall price of surveys.

Mixing modes has an essential role in this process as it potentially offers a solution for decreasing overall cost without threatening data quality. This is done by maximizing the use of cheaper modes of response while using the more expensive ones to interview the hard to contact or unwilling respondents. In addition, mixing modes may lead to different coverage and non-response biases that, theoretically, may compensate each other. But although mixing modes offers a good theoretical solution for saving costs, its impact on data quality, especially measurement, is still marred with unknowns.

Similar social pressures apply to longitudinal studies, and they are therefore also considering mixing modes as a solution. The British Cohort Studies (e.g., the National Child Development Study) and Understanding Society are such examples (Couper, 2012), the former already collecting data using mixed modes while the latter is considering it. Unfortunately there are still many unknowns regarding mixing modes in this context. One important risk of this survey design in longitudinal studies is the potential increase of long-term attrition (Lynn, 2012) and its subsequent impact both on external validity and power. Additionally, mixing modes can lead to (different) measurement bias. This may, in turn, cause measurement inequivalence compared both with previous waves and with different modes.


The relative lack of research regarding the impact of mixing modes on reliability is also an issue that becomes even more important for longitudinal studies. While cross-sectional research concentrates on the bias of the modes, this represents only a part of the measurement issue. Different reliabilities in mixed-mode designs may be a threat to the longitudinal comparability of panel studies.

Furthermore, reliability is an essential component of overall validity (Lord and Novick, 1968) as it attenuates the relationship with other criterion variables. Empirically distinguishing between these two concepts would help us both understand the processes resulting from mixing modes and find possible solutions to minimize the differences across mode designs.

The present paper aims to tackle part of these issues by analysing the impact of mixing modes on data quality in a longitudinal study using a quasi-experimental design. The Understanding Society Innovation Panel (USIP), a nationally representative longitudinal study aimed at conducting methodological experiments, included a mixed mode design in its second wave. Here a sequential mixed mode design using Computer Assisted Telephone Interviewing (CATI) – Computer Assisted Personal Interviewing (CAPI) was randomly allocated and compared to a CAPI unimode design. This context gives the opportunity to use models that take advantage of the longitudinal character of the data (i.e., Quasi–Markov Simplex Models (QMSM) and Latent Markov Chains (LMC)) in order to compare the reliability of the two mode designs. In this context we define reliability as the proportion of variance of the observed items that is due to the true score, as opposed to random error (Lord and Novick, 1968).

1 Theory and hypotheses

1.1 Impact of mixing modes and reliability

Mixing modes in surveys is becoming an increasingly important topic as it may offer some of the solutions needed in the present context. There are three main reasons why mixing modes is attractive. Firstly, it can decrease coverage error if the different modes reach different populations. A similar effect is obtained by minimizing non-response error. This is done by starting with a cheaper mode and sequentially using the more expensive modes to convert the hard to contact or unwilling respondents (De Leeuw, 2005). This would result in more representative samples, as people who would not be reached by a certain mode would be included in the survey through the other one. By using a combination of modes it is also believed that we could reduce costs by interviewing as many people as possible with the cheaper modes.

Modes can be mixed at various stages of the survey in order to achieve different goals. De Leeuw (2005) highlights three essential stages at which these can be implemented: recruitment, response and follow-up. The most important one for our purposes is the second, as the mode used in this phase leads to the most important measurement effects. Combining these phases and the types of mode results in a wide variety of possible approaches that try to minimize costs, nonresponse and measurement bias.

Although mixing modes is attractive for the reasons listed above, it also introduces heterogeneity that can affect data quality and substantive results. A large number of studies have tried to compare the modes and explain the differences found between them, but there are still many unknowns regarding the mechanisms through which these appear. Tourangeau et al. (2000) provide one possible framework for understanding these. They propose three main psychological mechanisms through which modes lead to different responses. The first one is impersonality, which is affected by the respondents' perceived risk of exposing themselves due to the presence of others. The second dimension is the perceived legitimacy of the survey and of the interviewer, while the last one is the cognitive burden that each mode inflicts on the respondent.

When evaluating the impact of mixing modes on measurement, the analysis usually concentrates either on missing data or on response styles such as acquiescence, primacy/recency or non-differentiation (Roberts, 2007; Betts and Lound, 2010; Dex and Gumy, 2011, for an overview). While response styles are important, reliability is an aspect that is often ignored in the mixed mode literature. As mentioned in the introduction, reliability is an important part of the overall validity of the measurement (Lord and Novick, 1968) as it can attenuate the relationship with other (criterion) variables. Thus, many of the differences in covariances that can be found between mode designs may be due to different percentages of random error rather than bias per se. This may prove to be an important distinction if we aim to understand the mechanisms that are leading to biased responses in different mode designs.

Furthermore, reliability is essential for longitudinal surveys. If different mode designs are implemented during the lifetime of a panel study, the different reliability coefficients across modes can lead to an artificial increase or decrease of change. This, in turn, has effects on the substantive results provided by the data. Understanding the level of reliability and the differences between modes on this indicator would help us comprehend to what degree this is an important issue.

Considering the present theoretical framework, the reliability of the data can be influenced by four distinct factors. Three of these are driven, at least partially, by the interviewer while one is due to the overall data collection process. The first is the direct effect of collecting data in an alternative mode that increases the respondent burden while decreasing motivation. An example of this is CATI, which uses only the auditory communication channel, thus increasing the burden on the respondent (De Leeuw, 2005). In addition, the distance to the interviewer, both physical and social, means that the respondent is less invested in the completion of the questionnaire, leading to lower quality data and more drop-offs. Telephone interviews also have, on average, a shorter duration compared to CAPI, causing additional cognitive burden. All these effects can lead to an increased number of mistakes when responding to questionnaires using CATI and thus to different degrees of reliability across modes.

Linked with this is the process of satisficing (Krosnick, 1991). In this framework, respondents who have lost the motivation to complete the questionnaire in an optimized way will choose to bypass some of the mental steps needed in the response process. Satisficing can be either weak, such as selection of the first category or acquiescence, or strong, like social desirability or the random coin flip (Krosnick, 1991). The last of these effects will introduce random error due to motivation, burden and/or respondent ability. Considering the literature that explains differences between modes, we expect those that use only the auditory channel to produce both higher levels of burden and lower levels of motivation compared to those that use both the auditory and visual channels. Thus, we would expect an increase in random error and, as a result, a decrease in reliability through the effect of the mental random coin flip as well.

The third mechanism through which reliability can be influenced by mixing modes in longitudinal studies is panel conditioning. This is the process through which subjects change their responses due to the exposure to repeated measures in time. This results in increased reliability and stability of items and a decrease in item nonresponse (Sturgis et al., 2009; Chang and Krosnick, 2009). Thus, changing the mode of interview may lead to a decrease of this effect if the mode change leads to the practice of a different cognitive task. If this is true then the reliability for the mixed mode design should be smaller in subsequent waves.

The last factor leading to lower reliability in a mixed-mode design is the overall increase of survey complexity. This, in turn, can lead to an increase in errors both during the fieldwork and during the processing of the data. If this is true then we would expect differences in reliability between the two mode designs especially in the waves when we have multiple modes and less so in subsequent waves. Table 1 summarizes these effects.

Table 1: Mixed modes effects on reliability

Cause                 Mechanism                                    Waves affected
Direct                Burden and motivation                        When modes are mixed
Strong satisficing    Mental coin flip                             When modes are mixed
Panel conditioning    Changing cognitive tasks                     Subsequent waves
Survey complexity     Errors in data collection and processing     When modes are mixed

Relatively few studies in the mixed modes literature have concentrated on quality indicators like reliability or validity (Jackle et al., 2006; Chang and Krosnick, 2009; Revilla, 2010, 2011; Vannieuwenhuyze and Revilla). For example, Revilla (2010) found small mean differences in the reliabilities of items measuring dimensions such as political trust, social trust or satisfaction using an MTMM design. The highest difference was found between the CATI and Computer Assisted Web Interview designs in the political trust model. Unfortunately these results are confounded with selection effects. A similar approach was applied using an instrumental variable that aimed to bypass this issue (Vannieuwenhuyze and Revilla). Although some methodological limitations remain, initial results show low to medium measurement effects and bigger selection effects. Similarly, a difference in reliability was found by Chang and Krosnick (2009) when comparing a telephone design with two internet panels. The present analysis will contribute to the literature by adding a new analytical model that takes advantage of the longitudinal data and offers an estimation of reliability.

1.2 Reliability in panel data

In order to evaluate the effect of the mixed mode design on data quality we will concentrate on evaluating the impact on reliability. Using Classical Test Theory (CTT) we can define reliability as the percentage of variance of the observed variable that is due to the true score, as opposed to variance caused by random error (Lord and Novick, 1968).


There are a number of models that aim to separate random measurement error and true scores, such as the Multitrait–Multimethod approach (Campbell and Fiske, 1959), multiple-indicator Confirmatory Factor Analysis (CFA; Bollen, 1989) or the Quasi–Markov Simplex Model (Heise, 1969; Wiley and Wiley, 1970; Alwin, 2007).

Considering the characteristics of our data, four waves of panel data, we concentrate on the strand of literature that tries to estimate reliability using repeated measures as opposed to multiple items (Alwin, 2007). A first attempt at assessing reliability using these kinds of measures was made by Lord and Novick (1968), who highlighted that by using two parallel measures we could estimate reliability. By parallel measures the authors referred to measures that have equal true scores and equal variances of the random errors. In such a case the correlation between the two measures is a correct estimate of reliability. As the authors highlight themselves (Lord and Novick, 1968, p. 134), the model assumes the absence of memory, practice, fatigue or change in true scores. Especially the change in true scores and memory effects make this estimation of reliability unfeasible for most social science applications.

In order to overcome the assumptions of the test-retest approach, a series of models that take into account the change in time of the true scores were put forward. They usually assume an autoregressive change in time where the true score Ti is influenced only by Ti−1 and no other previous measures. As a result, these models need at least three waves to be identified. In addition, they still need to make the assumption of equal variance of random error in order to be computed (Wiley and Wiley, 1970; van de Pol and Langeheine, 1990). On the other hand, they offer two important advantages (Alwin, 2007, p. 103). Firstly, they are able to separate random error from the specific variance of the true score. Secondly, under certain conditions, they can rule out systematic error as long as it is not stable in time.

In the next subsections we will present two such models. While they are conceptually similar, imposing comparable assumptions and leading to estimates of reliability, they were developed from distinct statistical traditions and for different types of variables. Thus, while the QMSM can be used for continuous and ordinal variables by considering the true score continuous, the LMC model has been developed to deal with categorical variables and views the true scores as discrete.

1.2.1 Quasi–Markov Simplex Model

The QMSM is composed of two parts. The first one, the measurement component, is based on CTT and assumes that the observed score Ai is caused by a true score, Ti, and random measurement error, εi. The impact of the true score on the observed variable is estimated with a regression slope λii. The relationships in the case of a four-wave model are:

A1 = λ11T1 + ε1 (1)

A2 = λ22T2 + ε2 (2)

A3 = λ33T3 + ε3 (3)

A4 = λ44T4 + ε4 (4)

[Figure 1: Quasi–Markov Simplex Model for four waves — path diagram with true scores T1–T4 linked by stabilities β21, β32, β43, observed variables A1–A4 linked to them by loadings λ11–λ44, measurement errors ε1–ε4 and disturbances d2–d4.]

In addition to the measurement part, the model includes a structural dimension which models the relationships between the true scores. As a result of the auto-regressive (Markov simplex) change in time of the true scores we have the following equations:

T2 = β21T1 + d2 (5)

T3 = β32T2 + d3 (6)

T4 = β43T3 + d4 (7)

where βi,i−1 is the regression slope of Ti on Ti−1 and di is the disturbance term. The former can be interpreted as the stability in time of the true score, while the variance of the latter can be interpreted as the specific variance of the true score at each wave. The model can be seen in Figure 1.

In order to identify the model we need to make two assumptions. The first one constrains the unstandardized λii to be equal to 1:

λ11 = λ22 = λ33 = λ44 = 1 (8)

In addition, we constrain the variances of the random errors, θi, to be equal in time (Wiley and Wiley, 1970):

θ1 = θ2 = θ3 = θ4 = θ (9)

Although the two assumptions have different roles, they are both needed for identification purposes. The first one (8) is necessary in order to give a scale to the latent variables (Bollen, 1989) and is standard practice in the CFA framework. The second assumption (9) was proposed by Wiley and Wiley (1970) in their seminal paper. The authors suggest that this assumption is theoretically sound as the random error is a product of the measurement instrument and not of the population. Although this assumption has been previously criticised (e.g. Alwin, 2007, p. 107), it is still less restrictive than that proposed by Heise (1969), who assumed that reliability is equal in time.
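
To make the model concrete, the sketch below simulates data from a four-wave QMSM under assumptions (8) and (9), with loadings fixed to 1 and a common error variance. It is only an illustration of the data-generating process; the parameter values are arbitrary and are not estimates from the USIP.

    import numpy as np

    rng = np.random.default_rng(42)
    n = 2000                      # number of respondents (arbitrary)
    beta = [0.7, 0.8, 0.9]        # stabilities beta21, beta32, beta43 (arbitrary)
    psi = [1.0, 0.5, 0.4, 0.3]    # var(T1) and disturbance variances psi22, psi33, psi44
    theta = 0.6                   # common random-error variance (assumption 9)

    # Structural part: autoregressive true scores T1..T4 (equations 5-7)
    T = np.empty((n, 4))
    T[:, 0] = rng.normal(0.0, np.sqrt(psi[0]), n)
    for t in range(1, 4):
        T[:, t] = beta[t - 1] * T[:, t - 1] + rng.normal(0.0, np.sqrt(psi[t]), n)

    # Measurement part: A_t = T_t + error, loadings fixed to 1 (equations 1-4, assumption 8)
    A = T + rng.normal(0.0, np.sqrt(theta), (n, 4))

    # Empirical reliability at each wave: share of observed variance due to the true score
    print(np.var(T, axis=0) / np.var(A, axis=0))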


Given the previous equations and the definition of reliability in CTT, the percentage of variance explained by the true score (Lord and Novick, 1968), we propose the following measures of reliability for each of the four waves:

κ1 = 1 − θ / (ψ11 + θ) (10)

κ2 = 1 − θ / (β21²ψ11 + ψ22 + θ) (11)

κ3 = 1 − θ / (β32²(β21²ψ11 + ψ22) + ψ33 + θ) (12)

κ4 = 1 − θ / (β43²(β32²(β21²ψ11 + ψ22) + ψ33) + ψ44 + θ) (13)

where κi represents reliability, ψ11 is the variance of the true score T1 and ψ22, ψ33 and ψ44 are the variances of the disturbance terms. These equations highlight that the total variance at a given time is a combination of random error, specific true score variance and the variance carried over from the true scores of previous waves. These formulas will be used in order to evaluate the impact of the mixed modes on reliability at different waves.
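
As an illustration of equations (10)–(13), the short function below computes the four wave-specific reliabilities from the model parameters. It is a sketch only; the values in the example call are arbitrary and are not estimates reported in this paper.

    def qmsm_reliabilities(theta, psi11, psi, beta):
        # Wave-specific reliabilities for a four-wave QMSM (equations 10-13).
        # theta : common random-error variance
        # psi11 : variance of the true score at wave 1
        # psi   : disturbance variances (psi22, psi33, psi44)
        # beta  : stability coefficients (beta21, beta32, beta43)
        true_var = psi11                      # true-score variance at wave 1
        kappas = [1 - theta / (true_var + theta)]
        for b, p in zip(beta, psi):
            true_var = b ** 2 * true_var + p  # true-score variance at the next wave
            kappas.append(1 - theta / (true_var + theta))
        return kappas

    # Example with arbitrary values
    print(qmsm_reliabilities(theta=0.6, psi11=1.0, psi=(0.5, 0.4, 0.3), beta=(0.7, 0.8, 0.9)))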

1.2.2 Latent Markov Chain

While the QMSM provides a reliability estimate for continuous and ordered variables, it cannot do so in the case of discrete, unordered variables. In this case a more appropriate model would need to take into account each cell of the variables. Such a model was applied to reliability analyses in panel data by Clogg and Manning (1996) and can be considered a Latent Markov Chain model based on the Langeheine and van de Pol (2009) typology. For simplicity we will consider all variables, be they observed or latent, as dichotomous, although the model can easily be extended to variables with more categories.

Let i, j, k and l be the levels of a dichotomous variable A measured at four points in time: A1, A2, A3 and A4. By levels we refer to the observed response to the item (e.g., answering 'yes' may be level 1 and 'no' level 2). The cell probability (ijkl) is denoted by πA1A2A3A4(ijkl). The observed tabulation of A1, A2, A3 and A4 can be explained by a latent variable, X, that has T levels. Thus πA1A2A3A4X(ijklt) represents the probability of a cell (ijklt) in an indirectly observed contingency table. The model can be written as:

πA1A2A3A4(ijkl) = Σ(t=1..T) πA1A2A3A4X(ijklt) (14)

The first assumption of such a model is called local independence (Lazarsfeld and Henry, 1968). This implies that once we have controlled for the latent variable there is no relationship between the observed variables:

πA1A2A3A4X(ijklt) = πX(t) πA1|X=t(i) πA2|X=t(j) πA3|X=t(k) πA4|X=t(l) (15)

where πX(t) is the probability that X = t, πA1|X=t(i) is the probability that A1 = i conditional on X = t (i.e., Pr(A1 = i | X = t)) and so on.


For the moment the model is analogous to a CFA with one latent and four observed variables. The main difference between the two is that the latent class model does not make any assumption about the distribution of the variables. This model can be extended to an autoregressive one with four latent variables:

πA1A2A3A4(ijkl) = Σ(t1=1..T) Σ(t2=1..T) Σ(t3=1..T) Σ(t4=1..T) πX1(t1) πA1|X1=t1(i) πX2|X1=t1(t2) πA2|X2=t2(j) πX3|X2=t2(t3) πA3|X3=t3(k) πX4|X3=t3(t4) πA4|X4=t4(l) (16)

where X1–X4 are the true scores at the four time points, πAi|Xi=ti is the measurement model (i.e., the relationship between the latent variable and the observed variable at time i) and πXi|Xi−1=ti−1(ti) are the transition probabilities from time i − 1 to i (i.e., the stability in time of the true score).
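
A small sketch of equation (16) for a dichotomous item may help fix ideas: the joint probability of an observed response pattern is obtained by summing over all configurations of the latent states. The parameter values below are invented for illustration (they are not estimates from the analyses), and the transition and measurement probabilities are held constant over time, anticipating the time-homogeneity constraints introduced below.

    import itertools
    import numpy as np

    pi_x1 = np.array([0.6, 0.4])              # initial true-score distribution pi_X1
    trans = np.array([[0.9, 0.1],             # transition probabilities pi_Xi|Xi-1
                      [0.2, 0.8]])
    emit = np.array([[0.95, 0.05],            # measurement probabilities pi_Ai|Xi (rows: X, columns: A)
                     [0.10, 0.90]])

    def cell_probability(observed):
        # Joint probability of an observed pattern (i, j, k, l), summing over X1..X4
        total = 0.0
        for states in itertools.product(range(2), repeat=4):
            p = pi_x1[states[0]] * emit[states[0], observed[0]]
            for t in range(1, 4):
                p *= trans[states[t - 1], states[t]] * emit[states[t], observed[t]]
            total += p
        return total

    # The probabilities of all 16 response patterns sum to one:
    print(sum(cell_probability(obs) for obs in itertools.product(range(2), repeat=4)))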

The reliability in this context can be calculated using the conditional odds ratio between Xi and Ai (ΘAiXi):

ΘAiXi = [πAi|Xi=1(1) πAi|Xi=2(2)] / [πAi|Xi=1(2) πAi|Xi=2(1)] (17)

where ΘAiXi gives the odds ratio of correct predictions versus incorrect ones.

This can be transformed using Yule's Q into a measure of association similar to R², i.e., a proportional reduction in error (Clogg and Manning, 1996; Coenders and Saris, 2000; Alwin, 2007):

QAiXi = (ΘAiXi − 1) / (ΘAiXi + 1) (18)

[Figure 2: Latent Markov Chain with four waves — path diagram with latent true scores at the four waves linked by transition probabilities ΠX2X1, ΠX3X2, ΠX4X3, and observed variables A1–A4 linked to them by measurement probabilities ΠA1X1–ΠA4X4.]


In order to identify these models two important constraints are needed. The first one is time-homogeneity of the latent transition probabilities (Alwin, 2007; van de Pol and Langeheine, 1990):

ΠX2X1 = ΠX3X2 = ΠX4X3 = ΠXXt−1 (19)

where ΠXiXi−1 are matrices with the transition probabilities of the true scores from one time point to another. The second assumption is that of equal reliabilities over time (Alwin, 2007), where ΠAiXi are the matrices of conditional probabilities linking the observed and the latent variables:

ΠA1X1 = ΠA2X2 = ΠA3X3 = ΠA4X4 = ΠAX (20)

These assumptions are also needed for identification purposes and they imply that, unlike the QMSM, we can only have one estimate of reliability and one of stability for each mode design when using the LMC. And, while the two models both give a similar estimate of reliability, the assumption of equal reliabilities in time of the LMC (20) is theoretically different from the assumption of equal error variance in time of the QMSM (9). As a result, we will not compare the reliabilities of the two models.

One possible risk of the LMC approach is the resulting high value of the reliabilities. Alwin (2007) highlights that in this kind of model reliability is also a product of the number of categories the variable has. Thus for items with two categories we expect high reliability. This is not a limitation of the method as long as it can discriminate the mode design effect on reliability and stability.

Concluding the presentation of the two analytical approaches, we would also like to highlight that despite the similarity between the QMSM and the LMC, both conceptually and in one of their assumptions, they are two distinct approaches that come from different statistical traditions (Alwin, 2007). In this paper we see this as an advantage as it gives us two different ways of identifying the impact of mixing modes on measurement.

Furthermore, while we believe that reliability is an important quality indicator, as it shows the percentage of variance that is due to the true score as opposed to random error, we also need to highlight that it ignores the part of the variance that is bias. Thus, while a considerable part of the mixed mode literature discusses types of systematic errors that manifest differently between modes, such as acquiescence, primacy/recency or social desirability (Roberts, 2007; Betts and Lound, 2010; Dex and Gumy, 2011, for an overview), the two models used here, the QMSM and the LMC, ignore bias as long as it is stable in time. Keeping in mind this limitation, we propose two hypotheses.

1.3 Hypotheses

As already motivated in section 1.1, there are three main reasons why mixing modes would directly lead to a decrease in reliability. Firstly, using a mode that leads to an increase in burden and a decrease in motivation for the respondent will lead to more mistakes and inconsistencies. Furthermore, these will decrease reliability through strong satisficing (Krosnick, 1991). Due to the loss of motivation and the higher cognitive burden, respondents will have a higher tendency to do a mental coin flip. Lastly, the overall increase of complexity of the mixed mode design as opposed to a unimode one will lead to an increase in random errors.


H1: Overall, the reliability and stability are smaller for the CATI–CAPI design compared to the CAPI design in wave 2.

An additional impact of mixing modes on reliability is possible in subsequent waves. This possibility is important for longitudinal studies as it threatens comparability with previous waves even if the mode switch was temporary. One possible mechanism through which this may take place is panel conditioning. The change of mode leads to a different type of cognitive task which, in turn, stops the increase of reliability in subsequent waves.

H2: A difference in reliability will also be found in subsequent waves after mixing modes.

2 Methodology

2.1 Data

The USIP is a yearly panel study that started in 2008 and is financed by the UK Economic and Social Research Council. The survey is used for methodological experiments. It uses a stratified and geographically clustered sample in order to represent England, Scotland and Wales. Using the Postcode Address File, it applied systematic random sampling after stratifying for the density of the manual and non-manual population in order to select 120 sectors. Within each of these sectors 23 addresses were selected. The total number of selected addresses was 2,760. In wave 4 a refreshment sample of 849 households was added to the sampling design. All residents over 16 were interviewed using Computer Assisted Personal Interviews. In the present analyses we will be using waves 1–4, which were collected between 2008 and 2011. Wave 1 had an initial household level response rate of 59.5%, followed by response rates of 72.7%, 66.7% and 64%, respectively, for the subsequent waves (McFall et al., 2012). The individual sample size for the full interview varies from a maximum of 2384 in wave 1 to a minimum of 1621 in wave 3.

One of the characteristics that were manipulated in the experiments of the USIP is the mode design. For example, wave two of the survey implemented a CATI–CAPI sequential mixed mode design for two thirds of the sample, while for the remaining third a CAPI unimode design was used. Furthermore, the sequential design was equally divided into an 'early transfer' group and a 'late transfer' group. In the former, if one individual from the household refused or was unable or unwilling to participate over the telephone, the entire household was transferred to a CAPI interview; in the latter, such a transfer was made only after trying to interview all adults from the household using CATI (Burton et al., 2010). While this design decision is interesting, we will consider the two CATI approaches together and will refer to them as the CATI–CAPI mixed mode design, as opposed to the CAPI unimode design.

Because the allocation to the mode design was randomized, we can consider the resulting data as having a quasi-experimental design. Using the notation introduced by Campbell and Stanley (1963), we can represent the data as seen in Table 2. The two groups have the same mode design with the exception of wave 2, when the CATI–CAPI sequential design was introduced for a portion of the sample. In addition, the two groups are randomized; as a result they should be comparable, and all differences between them should be caused by the mixed mode design.


Table 2: Quasi-experimental design of mixed modes in USIP

Group    W1   W2     W3   W4
R F2F    O1   O2     O3   O4
R TEL    O1   X O2   O3   O4

In order to evaluate the impact of the mixed-mode design on the reliability of the items, we selected all the items that were measured in the USIP in all four waves. A Stata .ado that automatically compares the names of the variables across the four waves was used. Additional rules for selecting variables were applied: all variables that had fewer than 100 cases for each wave on the pooled data were eliminated, as were variables that are not the direct result of data collection (e.g., weighting) and variables without variance (i.e., one category with 100% of cases).
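
A rough illustration of these selection rules is sketched below. This is not the original Stata .ado; the file names are hypothetical, and the rule excluding variables that are not direct results of data collection would still require manual screening.

    import pandas as pd

    # Hypothetical per-wave files (the original selection used a Stata .ado)
    waves = [pd.read_stata(f"usip_w{w}.dta") for w in range(1, 5)]

    # Keep only variables measured in all four waves
    common = set(waves[0].columns).intersection(*[set(w.columns) for w in waves[1:]])

    selected = []
    for var in sorted(common):
        # Drop variables with fewer than 100 observed cases in a wave...
        if any(w[var].notna().sum() < 100 for w in waves):
            continue
        # ...and variables without variance (one category covering 100% of cases)
        if any(w[var].nunique(dropna=True) < 2 for w in waves):
            continue
        selected.append(var)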

After this selection and the elimination of nominal variables, a total of 46 variables remained. Out of these, 18 will be analysed using the QMSM while 28 dummy variables will be analysed using the LMC. While the dummy variables cover a wide range of topics, from beliefs and self-description to income and job related questions, the metric and ordinal ones are less varied. The ordinal variables are mainly composed of the SF12, a health scale that measures both physical and psychological well-being (Ware et al., 2007). The continuous variables, on the other hand, measure total income (net and gross), self-description (namely height and weight) and the number of hours worked in a typical week. All 46 variables will be analysed with the two methods presented above in order to estimate differences in reliability and stability between the two mode designs.

Table 3: Descriptive statistics of variables

          Beliefs/attitudes   Household   Income   Job   Other   Self-description   Sum
Dummy     1                   8           2        9     6       2                  28
Metric    0                   0           2        1     0       2                  5
Ordinal   0                   0           0        0     1       12                 13
Sum       1                   8           4        10    7       16                 46

The data management and part of the analyses were carried out in Stata 12. The bulk of the analyses were done in Mplus 7 using the runmplus .ado.

2.2 Analytical method

For both types of analytical approaches we will use the BIC to compare the different models:

BIC = −2 ln(L) + k ln(n) (21)

where k is the number of free parameters and n is the sample size. This information criterion controls both for sample size and model complexity. Moreover, it does not assume the models are nested and it can be used consistently both for the QMSM and the LMC.
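
For illustration, equation (21) and the model comparison based on it can be written in a few lines. The log-likelihoods, parameter counts and sample size below are made up; the rule is simply that the lowest BIC marks the preferred model.

    import numpy as np

    def bic(log_likelihood, k, n):
        # Bayesian Information Criterion as in equation (21); lower values indicate better fit
        return -2 * log_likelihood + k * np.log(n)

    # Hypothetical example: choose among nested models (log-likelihood, free parameters)
    models = {"Model 1": (-8150.2, 14), "Model 2": (-8151.0, 12), "Model 3": (-8151.8, 10)}
    n = 1788
    best = min(models, key=lambda name: bic(*models[name], n))
    print(best, {name: round(bic(*models[name], n), 1) for name in models})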


Before exploring the ways in which modes influence measurement, we need to highlight an important caveat. While theoretically it makes sense to distinguish between measurement and selection effects in mode differences, these are harder to distinguish empirically. Relatively few articles have tried to do this (Vannieuwenhuyze et al., 2010; Lugtig et al., 2011; Vannieuwenhuyze and Loosveldt, 2012; Buelens et al., 2012). Usually they do so either through a very complex design (e.g. Buelens et al., 2012) or by using a number of assumptions (e.g. Lugtig et al., 2011; Vannieuwenhuyze and Loosveldt, 2012). In order to simplify the analyses we will not distinguish between measurement and selection effects. Using the random allocation to mode, we can estimate with certainty the total mode effect. Thus, differences between the two mode designs in reliability can be seen as a total mixed mode impact that includes the two effects and their possible interaction.

2.2.1 Quasi–Markov Simplex Model

The QMSM models will be analysed in a sequential order, from the most general, least restricted, to the most constrained model. The first model (Model 1) assumes that the unstandardised loadings are equal to one (8) and that random measurement error is equal in time (9) within mode design. Thus, nothing is constrained to be equal across the two mode designs. The next four models stem from the definitions of the reliabilities for the four time points. As a result, Model 2 assumes that ψ11 and θ are equal across modes. If this is true, then the reliability for wave one (κ1) is equal across modes. Model 3 adds to this the constraint that β21 and ψ22 are equal across modes, thus implying that κ1 and κ2 are equal across modes. The last two models, Model 4 and Model 5, follow a similar logic and constrain β32 and ψ33, respectively β43 and ψ44, to be equal across the two mode designs. Because we expect the biggest differences in wave two, the constraints of Model 3 should lead to a significant worsening of fit (an increase in BIC). If, on the other hand, the best fitting model is Model 5, then both reliability and stability are equal across modes. Normally, we could use Model 2 as a randomization test: if the selection of the two groups was indeed random, we would not expect significant differences for ψ11 and θ1 across mode designs. Unfortunately, due to the assumption of equal random measurement error in time (9), θ1 is 'contaminated' by the random measurement errors of the rest of the time points.
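
As a compact summary of this sequence (our own shorthand, not Mplus syntax), the parameters constrained to be equal across the two mode designs in each nested model are:

    # Within every model, loadings are fixed to 1 and error variances are equal over time
    # within mode design (assumptions 8 and 9); the lists give the cross-mode equality constraints.
    nested_models = {
        "Model 1": [],                                               # nothing equal across modes
        "Model 2": ["psi11", "theta"],                               # implies equal kappa1
        "Model 3": ["psi11", "theta", "beta21", "psi22"],            # adds equal kappa2
        "Model 4": ["psi11", "theta", "beta21", "psi22",
                    "beta32", "psi33"],                              # adds equal kappa3
        "Model 5": ["psi11", "theta", "beta21", "psi22",
                    "beta32", "psi33", "beta43", "psi44"],           # adds equal kappa4
    }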

Although the QMSM represents one of the best models we have for measuring reliability with repeated items, it is marred with estimation issues. Two of these are negative variances for some of the variables and standardised stability coefficients over one (Jagodzinski and Kuhnel, 1987; Van der Veld and Saris, 2003). While Coenders et al. (1999) and Jagodzinski and Kuhnel (1987) explore the possible causes of these issues, we propose a different solution here. Instead of estimating with Maximum Likelihood methods we will be using Bayesian estimation. This has the advantage that it needs smaller sample sizes and does not result in unacceptable coefficients. While these are important advantages, the Bayesian estimation has two important drawbacks: it cannot use weights and it cannot do multigroup comparisons. The latter is especially important as we aim to compare the two mode groups. In order to bypass this issue we have taken advantage of the fact that this estimation can deal with missing data using the Full Information procedure (Enders, 2010). Thus, using this approach all the information in the data is used for the analysis. We can take advantage of this and model two parallel QMSM for the two groups, although there are no common cases, by imposing the lack of any relationship between them.[1] We will be using the Bayesian implementation in Mplus 7 with the following parameters: four chains, thinning of five, convergence criterion of 0.01, a maximum of 70,000 iterations and a minimum of 30,000 (Muthen and Muthen, 2012), in order to sequentially impose the restrictions presented previously.

2.2.2 Latent Markov Chain

The estimation procedure for the LMC will include three distinct models. The models once again go from the least restrictive to the most restrictive. Thus, Model 1 will assume that both the transition probabilities and the reliabilities are equal in time within mode design, (19)-(20). Model 2 imposes the additional restriction that the reliability is the same for the two modes:

ΠAX,TEL = ΠAX,F2F (22)

Model 3 adds the assumption that the transitions are equal both in time and between modes:

ΠXXt−1,TEL = ΠXXt−1,F2F (23)

[Figure 3: Multigroup Latent Markov Chain with four waves — the path diagram of Figure 2 with a Mode grouping variable; transition probabilities ΠXiXi−1 and measurement probabilities ΠAiXi are constrained equal over time within mode design.]

[1] Analyses were carried out to compare the Bayesian approach with Maximum Likelihood (with and without weights and a balanced sample). The models resulted in similar estimates of reliability and stability.


Thus, comparing the three models using the BIC, we are able to see which model best fits the data. If Model 1 is the best fitting one, then we conclude that both the reliabilities and the transition probabilities from one wave to another (i.e., the stabilities) are different across modes. On the other hand, if Model 3 is the best fitting one, we can assume that both the reliability and the stability are equal across the two mode designs. If Model 2 is the best fitting one, we can assume that the reliabilities are equal but the stabilities of the true scores are not.

In order to estimate the models we will be using Robust Maximum Likelihood estimation with a maximum of 500 iterations and with random starts: 200 initial stage random starts and 20 final stage optimizations. In order to be consistent we will use no weights but we will use the Full Information procedure.

3 Analysis and Results

Previous research has highlighted that the QMSM is an unstable model and can sometimes either not converge or give out of bounds coefficients (e.g. Jagodzinski and Kuhnel, 1987; Van der Veld and Saris, 2003). While using the Bayesian approach bypassed most of these issues, it did prove problematic for three of the continuous variables: two items measuring income and one measuring weight. Although the models converged when analysed by mode design, our parallel Markov chain approach did not lead to convergence even when increasing the maximum number of iterations or the thinning coefficient. While we could compare the reliabilities and stabilities across modes, we would not be able to use the same approach as that presented in section 2.2.1. As a result, these three variables will be ignored in the following analyses. Similar issues have arisen in the case of the LMC. Out of the initial 28 items, eight had convergence issues, Mplus printing either a non-positive definite first-order derivative product matrix or a non-positive definite Fisher information matrix. One of the solutions proposed, increasing the number of random starts, did not prove successful in any of the models. The problematic items were concentrated on two main topics. Four of them were measuring attributes linked with the household and were derived from household level information. Three of the items were measuring job related aspects, such as whether respondents are full-time or part-time employed, while the last item asked whether the respondent had a UK driving licence. These eight variables will also be ignored in the following analyses. Thus, our actual variable sample size is 35: 13 ordinal variables, two continuous variables and 20 dichotomous variables.

Table 4 presents the total sample sizes used for the two types of analyses. Because of the Full Information procedure the sample sizes for the QMSM are relatively high, with a median of 1790 and a minimum of 1020. On the other hand, the sample sizes are somewhat smaller for the LMC, reaching 298 cases for a variable measuring the use of childcare facilities, but still with a median of 1750 individuals used per analysis.

3.1 Quasi–Markov Simplex Model

Concentrating on the 15 ordered variables, 12 of them measure health-related aspects while the other three measure height, number of work hours and when respondents last weighed themselves.


Table 4: Descriptive statistics of sample sizes

          Ordered variables   Dummy variables
Min.      1020                298
1st Qu.   1786                995
Median    1788                1754
Mean      1728                1410
3rd Qu.   1788                1788
Max.      1788                1788

Each of these items was analysed five times, each time imposing a new constraint as presented in section 2.2.1, thus resulting in 75 models. Within each variable we compared the BIC of the five models, a decrease of this coefficient indicating an improvement in the model while controlling for sample size and model complexity.

Figure 4: Mean BIC improvement in model

Figure 4 presents the mean BIC improvement of the model as constraints are added. Thus, the mean BIC improvement of moving from Model 1 to Model 2 is around 33, while adding the constraints of Model 5 to Model 4 leads to a further mean BIC improvement of 27. Overall we can observe that each constraint leads to an improvement of fit and usually Model 5 proves to be the best fitting one. This implies that there is no difference between the two mode designs in reliability or stability for ordered variables.

Table 5 presents the exceptions to the consistent decrease in BIC with the additional constraints. If we look in the sequence of models for the best fitting one and consider that as the best representation of the data, then Height is the only variable that does not have Model 5 as the best fitting one. In this case Model 2 appears to be the best fitting one. Thus, for Height either the reliability in wave 2 or the stability to wave 2 is different between the two mode designs. Looking in more detail at the estimates of Model 2 for height, we observe that while the reliabilities are very similar, 0.974 for the unimode design versus 0.976 for the mixed mode one, the difference in the stability of the true score from wave one to wave two is bigger, being 0.966 for the former and 0.997 for the latter.


Table 5: BIC differences within variables

Variable    Model     BIC       Difference
Height      Model 1   16328.1   0.0
            Model 2   16323.0   5.1
            Model 3   16337.3   -14.2
            Model 4   16323.3   13.9
            Model 5   16336.3   -13.0
Job hours   Model 1   20655.6   0.0
            Model 2   20647.1   8.4
            Model 3   20664.1   -16.9
            Model 4   20638.8   25.2
            Model 5   20633.4   5.4
SF4b        Model 1   13226.1   0.0
            Model 2   13215.8   10.3
            Model 3   13204.6   11.2
            Model 4   13208.0   -3.3
            Model 5   13195.0   13.0
SF5         Model 1   16473.1   0.0
            Model 2   16473.3   -0.2
            Model 3   16443.6   29.7
            Model 4   16431.7   11.9
            Model 5   16427.9   3.8

[Figure 5: Mean reliability of ordered variables (Model 1) — line plot of mean reliability (0–1) at waves 1–4, by mode design (F2f vs. Tel).]

Looking in more detail at the reliability, we observe very small differences between the groups; thus the previous result of no decrease in goodness of fit with the constraints is less surprising. We also observe a moderate level of reliability for the items. In addition, Figure 6 shows the mean of the stabilities in time. Here we also find very small differences between the groups, with an overall increase of stability in time. This is an expected result and can be explained both in terms of panel conditioning (Sturgis et al., 2009) and as selection in time of 'good' respondents.

[Figure 6: Mean stability, continuous variables (Model 1) — line plot of mean stability (0–1) at waves 2–4, by mode design (F2f vs. Tel).]

3.2 Latent Markov Chain

In addition to the QMSM models we have analysed 20 dichotomous variables. For each of these we have run the three models presented in Section 2.2.2. Overall, similar results have been found. On average, the constraints of Model 2 (equal reliabilities across the two mode designs) bring an improvement in BIC of 20. A similar improvement takes place when the additional constraint of equal stabilities across mode designs is imposed.

Looking at the stabilities and reliabilities we find similar results as in the case of the QMSM. The models indicate high reliabilities that are consistent across the two groups. Thus, for both of them the mean reliability is 0.98. A similar result can be found in the case of stability. On average the mixed-mode group had a stability of 10.6 while the one for the unimode design was 9.4 on a log-odds scale. As the BIC results indicate, this difference does not hold up and the best fitting model is one that constrains them to be equal.


[Figure 7: Mean BIC improvement in model, dummy variables — bar plot of BIC improvement (0–40) for the Model 1–2 and Model 2–3 comparisons.]

4 Conclusions and discussion

Section 1.1 argued that mixing modes will have a detrimental impact on reliability, especially when one of the modes included brings additional respondent burden and leads to decreased motivation. The results of our analyses do not confirm this hypothesis. In the case of the QMSM we have found only one variable out of 14 that did not indicate Model 5 as the best fitting one. A similar result was found when using the LMC. Here Model 3 was always the best fitting one, indicating once again that stability and reliability are equal between modes. This implies that for almost all the variables analysed here the reliabilities and stabilities were equal across modes.

By using the QMSM we were also able to analyse the impact of mixing modes on reliability in subsequent waves. We argued in section 1.1 that mixing modes may lead to a decrease (or lack of increase) in reliability compared to a unimode design. One potential explanation for such an effect is panel conditioning, the mixing of modes leading to a different type of cognitive task that, in turn, would decrease the impact of training. Our results do not support this hypothesis. Not only do we not observe any difference in reliabilities between the two mode designs in waves 3 and 4, we also do not observe an overall increase in reliability. The result of no effects on reliability in subsequent waves is the first one of its kind and may indicate that, at least on this dimension, longitudinal reliability, panel studies are 'safe' from mode effects.

While the results are not final and further replications are needed, these results indicate that reliability may not be the main threat to cross mode design comparisons. If these results are replicated, then selection (Lynn, 2012; Vannieuwenhuyze and Revilla) and response styles may prove to be more important issues than reliability. While our analyses show that random error is the same in the two mode designs, the same cannot be claimed about systematic error that is stable in time. In order to capture these, alternative methods are needed, such as MTMM or the modelling of response styles.

The study has also contributed to the QMSM field as we have proposed two important solutions to some of the estimation issues that have marred the method (Jagodzinski and Kuhnel, 1987; Van der Veld and Saris, 2003). Firstly, we have proposed Bayesian estimation as a way to avoid out of bounds coefficients. This has proved successful as all the models that used this approach converged with coefficients inside the theoretical limits. In addition, we have found a solution to the lack of multi-group comparison when using this estimation method. Taking advantage of the Full Information method used for missing data, we have modelled two parallel Markov chains and imposed that no correlation exists between them. This has proved successful for all but three items. Although these converged when analysed by mode, they did not when using this method. More research is needed to understand exactly why this happened.

A series of limitations of the study need to be highlighted. Firstly, we do not make the distinction between selection and measurement effects but talk about the total effect of mixing modes. Using the random allocation to the design, which can also be seen as an instrumental variable, we are able to show with certainty that these are the total effects of mixing modes. These results should be valid as long as the measurement and selection effects do not impact reliability in different directions.

Additionally, we believe that the analysis should be extended to cover more sensitive and attitudinal questions. These may prove more susceptible to mode effects, as highlighted by the variable measuring height. Additionally, analysis of subgroups, such as those with low cognitive abilities or language skills, may lead to larger differences. Similar extensions should also be made to different types of mixed-mode designs and different cultural backgrounds.

Lastly, the high levels of reliability and stability in the LMC bring doubts regarding its usefulness as an instrument for measuring the reliability and stability of dichotomous variables. While it is very attractive due to the lack of distributional assumptions, it may also prove not sensitive enough to find differences across groups, especially where big differences are not obvious.


References

Alwin, D. F. (2007). The Margins of Error: A Study of Reliability in Survey Measurement. Wiley-Blackwell.

Betts, P. and Lound, C. (2010). The application of alternative modes of data collection on UK government social surveys. Literature review and consultation with national statistical institutes. Office for National Statistics, pages 1–83.

Bollen, K. (1989). Structural Equations with Latent Variables. Wiley-Interscience Publication, New York.

Buelens, B., van der Laan, J., Schouten, B., van den Brakel, J., Burger, J., and Klausch, T. (2012). Disentangling mode-specific selection and measurement bias in social surveys. Discussion paper, Statistics Netherlands, pages 1–29.

Burton, J., Laurie, H., and Uhrig, N. (2010). Understanding Society Innovation Panel wave 2: results from methodological experiments. Understanding Society Working Paper Series, (04):1–34.

Campbell, D. T. and Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56(2):81–105.

Campbell, D. T. and Stanley, J. (1963). Experimental and Quasi-Experimental Designs for Research. Wadsworth Publishing, 1 edition.

Chang, L. and Krosnick, J. A. (2009). National surveys via RDD telephone interviewing versus the internet: comparing sample representativeness and response quality. Public Opinion Quarterly, 73(4):641–678.

Clogg, C. and Manning, W. (1996). Assessing reliability of categorical measurements using latent class models. In Eye, A. v. and Clogg, C., editors, Categorical Variables in Developmental Research: Methods of Analysis, pages 169–182. Academic Press Inc.

Coenders, G. and Saris, W. E. (2000). Testing nested additive, multiplicative, and general Multitrait-Multimethod models. Structural Equation Modeling: A Multidisciplinary Journal, 7(2):219–250.

Coenders, G., Saris, W. E., Batista-Foguet, J. M., and Andreenkova, A. (1999). Stability of three-wave simplex estimates of reliability. Structural Equation Modeling: A Multidisciplinary Journal, 6(2):135–157.

Couper, M. (2012). Assessment of innovations in data collection technology for Understanding Society. Technical report, Economic and Social Research Council.

De Leeuw, E. D. (2005). To mix or not to mix data collection modes in surveys. Journal of Official Statistics, 21(5):233–255.


Dex, S. and Gumy, J. (2011). On the experience and evidence about mixing modes of data collection in large-scale surveys where the web is used as one of the modes in data collection. National Center for Research Methods Review Paper, pages 1–74.

Dillman, D. A., Smyth, J. D., and Christian, L. M. (2008). Internet, Mail, and Mixed-Mode Surveys: The Tailored Design Method. Wiley, 3 edition.

Enders, C. K. (2010). Applied Missing Data Analysis. The Guilford Press, New York, 1 edition.

Groves, R. M. (2011). Three eras of survey research. Public Opinion Quarterly, 75(5):861–871.

Heise, D. R. (1969). Separating reliability and stability in test-retest correlation. American Sociological Review, 34(1):93–101.

Jackle, A., Roberts, C., and Lynn, P. (2006). Telephone versus Face-to-Face interviewing: Mode effects on data quality and likely causes. Report on phase II of the ESS-Gallup mixed mode methodology project. ISER Working Paper, (41):1–88.

Jagodzinski, W. and Kuhnel, S. M. (1987). Estimation of reliability and stability in Single-Indicator Multiple-Wave models. Sociological Methods & Research, 15(3):219–258.

Krosnick, J. (1991). Response strategies for coping with the cognitive demands of attitude measures in surveys. Applied Cognitive Psychology, 5(3):213–236.

Langeheine, R. and van de Pol, F. (2009). Latent Markov chains. In Hagenaars, J. and McCutcheon, A., editors, Applied Latent Class Analysis, pages 304–341. Cambridge University Press, 1 edition.

Lazarsfeld, P. F. and Henry, N. (1968). Latent Structure Analysis. Houghton, Mifflin.

Lord, F. M. and Novick, M. R. (1968). Statistical Theories of Mental Test Scores. Addison-Wesley Publishing Company, Inc.

Lugtig, P. J., Lensvelt-Mulders, G. J. L. M., Frerichs, R., and Greven, F. (2011). Estimating nonresponse bias and mode effects in a mixed mode survey. International Journal of Market Research, 53(5):669–686.

Lynn, P. (2012). Mode-switch protocols: how a seemingly small design difference can affect attrition rates and attrition bias. ISER Working Paper, (28):1–17.

McFall, S., Burton, J., Jackle, A., Lynn, P., and Uhrig, N. (2012). Understanding Society – the UK household longitudinal study, innovation panel, waves 1–4, user manual. Technical report, University of Essex, Colchester.

Muthen, L. and Muthen, B. (2012). Mplus User's Guide. Seventh Edition. Muthen & Muthen, Los Angeles, CA.


Revilla, M. (2010). Quality in unimode and Mixed-Mode designs: A Multitrait-Multimethod approach. Survey Research Methods, 4(3):151–164.

Revilla, M. (2011). Impact of the mode of data collection on the quality of survey questions depending on respondents' characteristics. RECSM Working Paper, (21):1–25.

Roberts, C. (2007). Mixing modes of data collection in surveys: A methodological review. NCRM Methods Review Papers.

Sturgis, P., Allum, N., and Brunton-Smith, I. (2009). Attitudes over time: The psychology of panel conditioning. In Lynn, P., editor, Methodology of Longitudinal Surveys, pages 113–126. Wiley, Chichester.

Tourangeau, R. (2004). Survey research and societal change. Annual Review of Psychology, 55:775–801. PMID: 14744234.

Tourangeau, R., Rips, L. J., and Rasinski, K. (2000). The Psychology of Survey Response. Cambridge University Press, 1 edition.

van de Pol, F. and Langeheine, R. (1990). Mixed Markov latent class models. Sociological Methodology, 20:213.

Van der Veld, W. and Saris, W. (2003). A new framework and model for the survey response process: unifying P. Converse, C. Achen, J. Zaller, and S. Feldman. Pages 1–29, Marburg.

Vannieuwenhuyze, J., Loosveldt, G., and Molenberghs, G. (2010). A method for evaluating mode effects in mixed-mode surveys. Public Opinion Quarterly, 74(5):1027–1045.

Vannieuwenhuyze, J. T. A. and Loosveldt, G. (2012). Evaluating relative mode effects in Mixed-Mode surveys: Three methods to disentangle selection and measurement effects. Sociological Methods & Research, 42(1):82–104.

Vannieuwenhuyze, J. T. A. and Revilla, M. Evaluating relative mode effects on data quality in Mixed-Mode surveys. Under review.

Ware, J., Kosinski, M., Turner-Bowker, D. M., and Gandek, B. (2007). User's Manual for the SF-12v2 Health Survey. QualityMetric, Incorporated.

Wiley, D. and Wiley, J. (1970). The estimation of measurement error in panel data. American Sociological Review, 35(1):112–117.
