Statistical Models for Enhancing

Cross-Population Comparability

A. Tandon, C.J.L. Murray, J.A. Salomon, and G. King

Global Programme on Evidence for Health Policy Discussion Paper No. 42

World Health Organization, Geneva, Switzerland

January 23, 2002

1 Introduction

Measuring the health state of individuals is important for the evaluation of health interventions, monitoring individual health progress, and as a critical step in measuring the health of populations. Self-report responses in household survey data are widely used for assessing the non-fatal health status of populations. These data typically take the form of ordered categorical (ordinal) responses. Over the past three decades, there has been great progress in developing instruments to measure the multiple domains of health that are reliable and demonstrate within-population validity [31],[22].

One key analytical issue is that these self-report ordinal responses are not comparable across populations, primarily because of response category cut-point shifts. Conceptualizing the observed responses as resulting from a mapping between an underlying unobserved latent variable (e.g., ability on the domain of mobility) and categorical response categories, cut-points are threshold levels on the latent variable that characterize the transition from one observed categorical response to the next. If cut-points differ systematically across populations, or even across sociodemographic groups within a population, then the observed ordinal responses are not cross-population comparable since they will not imply the same level on the underlying latent variable that we are trying to measure (Figure 1). Another way of characterizing this problem is that, for the same level of the latent variable on any given domain, the probability of an individual responding in any given response category is different across populations. This issue of cross-population comparability is not limited to health surveys: it is of equal relevance to self-report surveys on responsiveness of health systems, as well as to numerous other questions that rely on ordinal responses.
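The effect of a cut-point shift can be illustrated with a minimal sketch. The cut-point values and the function name `categorize` below are hypothetical and chosen only for illustration; they do not come from the survey or from the simulation described later.

```python
from bisect import bisect_right

def categorize(latent, cut_points):
    """Map a latent value to a response category 1..len(cut_points)+1.

    Category k is returned when cut_points[k-2] <= latent < cut_points[k-1],
    treating the outermost thresholds as minus and plus infinity.
    """
    return bisect_right(cut_points, latent) + 1

# Hypothetical cut-points on an arbitrary latent mobility scale.
cuts_pop_a = [-2.0, -1.0, 0.0, 1.0]
cuts_pop_b = [-1.0, 0.0, 1.0, 2.0]  # every threshold shifted upward by 1

latent_mobility = 0.5  # the same true level in both populations
response_a = categorize(latent_mobility, cuts_pop_a)
response_b = categorize(latent_mobility, cuts_pop_b)
print(response_a, response_b)
```

Under these assumed thresholds, the same latent level yields a different observed category in each population, which is the comparability problem the text describes.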

One example of self-report health data comes from the WHO Multi-Country Household Survey Study on Health and Responsiveness [28]. The main self-report question on the domain of mobility is: “Overall in the past 30 days, how much difficulty did you have with moving around?” Respondents are asked to classify themselves using one of five response categories: “1=Extreme/Cannot do; 2=Severe difficulty; 3=Moderate difficulty; 4=Mild difficulty; 5=No difficulty.” We can hypothesize that cut-points may vary between populations because of different cultural or other expectations for domains of health. Cut-points are also likely to vary within a cultural or sociodemographic group. The cut-points for older individuals may shift as their expectations for a domain diminish with age. Men may be more likely to deny declines in health so that their cut-points may be systematically shifted as compared to women. Contact with health services may influence expectations for a domain and thus shift cut-points [20].

[Figure 1: Mapping from unobserved latent variable to observed response categories. Columns A, B, and C are drawn along a latent mobility scale, with cut-points separating the response categories N = None, Mi = Mild, Mo = Moderate, S = Severe, E = Extreme.]

Empirical examples suggesting cross-population cut-point shifts in health surveys abound [23]. For instance, in Australian national health surveys comparing the self-reported health status of Aboriginals with that of the general population, only around 12% of the Aboriginal population characterized their own health status as fair or poor, while more than 20% of the general population rated their health in these low categories. By any other major indicator of mortality and morbidity, the Aboriginal population fares much worse than the general population, which suggests that there may be important differences in the interpretation of categorical responses in the different sub-populations due to shifts in response category cut-points. Residents of the state of Kerala in India, which has the lowest rates of infant and child mortality and the highest rates of literacy in India, consistently report the highest incidences of morbidity in the country [19].

The object of this document is to elaborate on several statistical models used in the analysis of survey data. First, we focus on off-the-shelf models that are widely available as part of any standard statistical software. In particular, we demonstrate the problems of inference that arise from these standard methods when the underlying data are not cross-population comparable. In later sections, we introduce methods that modify these standard routines to enhance the cross-population comparability of survey analyses.


2 Models for Analyzing Ordinal Survey Responses

We begin by describing the application of existing statistical models for the analysis of ordinal survey data. These models serve as the building blocks for the methodological innovations introduced in subsequent sections. In particular, the focus is on two off-the-shelf methods: (a) the ordered probit model (widely used by econometricians and other social scientists), and (b) the partial credit model (from psychometrics). Both these models are used in the analysis of ordered categorical response data. The partial credit model is a multiple-category generalization of the Rasch model and is part of a large body of literature, often referred to as item response theory (IRT), which has its roots in educational testing using standardized exams.

One needs to be careful, though, in using these standard models in the analysis of data that may not be cross-population comparable. In other words, if there are good reasons to believe that respondents saying they are in “good” health in Ethiopia and in Denmark mean very different things in terms of an underlying latent variable measure, then the use of these methods without correction may lead to very misleading conclusions regarding the actual levels of health in these two populations. In order to better demonstrate this point, and to subsequently introduce some methodological innovations dealing with cross-population comparability, a simulated dataset is utilized. The simulated dataset consists of 1,000 respondents each from two hypothetical populations (countries A and B) for which the level of health on a domain, say mobility, is to be estimated based on self-report categorical responses to three questions (one core question, and two auxiliary questions). These questions are:¹

Main Question: “Overall in the past 30 days, how much difficulty did you have moving around?”

Auxiliary Question 1: “Overall in the past 30 days, how much difficulty did you have standing for long periods such as 30 minutes?”

Auxiliary Question 2: “Overall in the past 30 days, how much difficulty did you have climbing several flights of stairs or walking up a steep hill?”

Each of the questions asks the respondents to pick one of five responses:

1 = “Extreme/Cannot Do”

2 = “Severe”

3 = “Moderate”

4 = “Mild”

5 = “None”

Since this is simulated data, the true mobility levels are known for each respondent. This enables a comparison of the estimated mobility levels versus truth for the different models.

¹ The questions mirror those in the WHO Multi-Country Study.

The simulated data is generated based on the assumption that true mobility is a function of age, sex, education, and country of residence for each respondent. An individual-level random effect term is also added to represent other individual-specific unobserved factors that might affect mobility. Table 1 reports the mean age, education level, and sex distribution in the simulated sample.

Table 1: Descriptive statistics (simulated data)

Country   Mean Age   Mean Education   Female   N
A         38.72      4.72             500      1,000
B         38.63      7.33             492      1,000

In addition, the simulation allows cut-points for each question to differ by sociodemographic group. The response category cut-points are generated as functions of age, sex, education, and country of residence. Figure 2 plots the distribution of the simulated “observed” categorical responses for the three questions for countries A and B.² At first glance, the distribution of self-report responses in the two countries does not look very different.
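The structure of such a simulation can be sketched as follows. This is not the authors' data-generating code: every coefficient, the covariate ranges, the base cut-points, and the function name `simulate_respondent` are hypothetical placeholders. Only the overall design (true mobility and cut-points both depending on age, sex, education, and country, plus a random effect and a question-level error term) follows the description in the text.

```python
import random

def simulate_respondent(rng, country):
    """Draw one hypothetical respondent and a five-category response."""
    age = rng.randint(18, 80)
    female = rng.random() < 0.5
    educ = rng.randint(0, 16)
    # True mobility: covariates plus an individual-level random effect.
    # All coefficients below are illustrative assumptions.
    mobility = (60.0 - 0.5 * age + 1.0 * educ - 2.0 * (not female)
                + (5.0 if country == "B" else 0.0) + rng.gauss(0.0, 10.0))
    # Cut-points shift with the same covariates (here, a common shift).
    shift = 0.8 * educ - 0.2 * age + (12.0 if country == "B" else 0.0)
    cuts = [base + shift for base in (0.0, 10.0, 20.0, 30.0)]
    # The response is driven by mobility plus question-level noise
    # (variance 15 is assumed here, i.e. standard deviation sqrt(15)).
    perceived = mobility + rng.gauss(0.0, 15.0 ** 0.5)
    response = 1 + sum(perceived >= c for c in cuts)
    return {"country": country, "age": age, "female": female,
            "educ": educ, "mobility": mobility, "response": response}

rng = random.Random(0)
sample = [simulate_respondent(rng, c) for c in ("A", "B") for _ in range(1000)]
```

Because the country term enters both the latent mean and the cut-points, a higher true level in one country need not show up as a more favourable response distribution.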

[Figure 2: Distribution of responses for three self-report questions in countries A and B. Six panels (main question, auxiliary question 1, and auxiliary question 2, for Country A and for Country B); horizontal axis: response category (1 to 5); vertical axis: fraction (0 to 1).]

In the next two sub-sections, these data are analyzed using both the ordered probit model and the Rasch-based partial credit model. It is assumed that the data analyst has access to the self-report categorical responses as well as standard demographic variables such as age, sex, education, and country of residence for each of the respondents. The goal is to estimate mobility levels in the two simulated populations using these data. In later sections, we introduce models that allow response category cut-points also to be functions of covariates. In such models, the direction of shift for the response category cut-points is also of substantive interest (e.g., to test the hypotheses that more educated respondents have higher cut-points indicative of higher norms, or that older individuals respond based on norms for their age category, and so on). Of course, such models can also be used for testing hypotheses relating to causal inferences and other tests of statistical significance.

² In generating the categorical responses, a stochastic error term with a variance ranging from 15 to 25 units was used (assumed different across questions, with auxiliary question 2 being the “noisiest” question).

2.1 The Ordered Probit Model

The ordered probit model assumes there is an unobserved latent variable $Y_i^*$ (mobility) distributed with mean $\mu_i$ and variance 1, where $i$ refers to the respondent.³ The mean level of the latent variable is a function of individual-level sociodemographic characteristics such as age, sex, education, and country of residence,

$$Y_i^* \sim N(\mu_i, 1), \qquad i = 1, \ldots, N$$

$$\mu_i = Z_i'\beta.$$

Let $y_i$ be the observed categorical response of individual $i$ to the main self-report question. The ordered probit model stipulates an observation mechanism such that:

$$y_i = k \quad \text{if } \tau_{k-1} \le Y_i^* < \tau_k; \quad \text{for } \tau_0 = -\infty,\ \tau_5 = \infty,\ \forall\, i \text{ and } k = 1, \ldots, 5.$$

Also, it follows from the set-up of the model that $\tau_1 < \tau_2 < \tau_3 < \tau_4$. Given this structure, the probabilities of responding in any given category $k = 1, \ldots, 5$, conditional on a vector of covariates $Z_i$, can be derived as:

$$\Pr(y_i = k) = \begin{cases}
F(\tau_1 - Z_i'\beta), & k = 1 \\
F(\tau_2 - Z_i'\beta) - F(\tau_1 - Z_i'\beta), & k = 2 \\
F(\tau_3 - Z_i'\beta) - F(\tau_2 - Z_i'\beta), & k = 3 \\
F(\tau_4 - Z_i'\beta) - F(\tau_3 - Z_i'\beta), & k = 4 \\
1 - F(\tau_4 - Z_i'\beta), & k = 5,
\end{cases} \tag{1}$$

where $F(\cdot)$ is the standard normal cumulative distribution function.

If the observations are assumed independent across individuals, then the likelihood function is simply the product of the probabilities of observing each value of $y_i$ in the dataset. Estimates of the $\beta$ vector as well as the cut-points $\tau_k$ may then be obtained using maximum likelihood methods. It is important to note that the standard ordered probit model assumes the same set of cut-points for the entire sample. Table 2 reports the results from a run of the ordered probit model for our simulated data for the main question in both countries.
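The category probabilities in equation (1) and the resulting log-likelihood can be written compactly. The sketch below is illustrative rather than the estimation routine used for Table 2: the function names are our own, the standard normal CDF is built from `math.erf`, and the two-covariate example respondents are hypothetical. The cut-points and the two coefficients used in the example are the Table 2 estimates.

```python
import math

def norm_cdf(x):
    """Standard normal CDF, the function F(.) in equation (1)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ordered_probit_probs(mu, taus):
    """Return [Pr(y = 1), ..., Pr(y = len(taus) + 1)] for latent mean mu."""
    cdf = [0.0] + [norm_cdf(t - mu) for t in taus] + [1.0]
    return [cdf[k] - cdf[k - 1] for k in range(1, len(cdf))]

def log_likelihood(beta, taus, Z, y):
    """Log-likelihood under independence; Z holds one covariate row per respondent."""
    total = 0.0
    for z_i, y_i in zip(Z, y):
        mu_i = sum(b * z for b, z in zip(beta, z_i))
        total += math.log(ordered_probit_probs(mu_i, taus)[y_i - 1])
    return total

# Cut-points and two coefficients (Age 60+, Country B) taken from Table 2.
taus = [-1.612, -1.335, -1.01, -0.365]
beta = [-0.498, -0.232]
# Hypothetical respondents: covariate rows are [age 60+, country B] dummies.
Z = [[1, 1], [0, 0]]
y = [4, 5]
probs = ordered_probit_probs(sum(b * z for b, z in zip(beta, Z[0])), taus)
ll = log_likelihood(beta, taus, Z, y)
```

A maximum likelihood routine would search over `beta` and `taus` to maximize `log_likelihood`; note that a single `taus` vector is shared by every respondent, which is exactly the assumption questioned in this paper.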

³ Since the latent variable is unobserved, the variance of the latent variable conditional on determinants is arbitrarily set to 1 in the ordered probit model. In addition, in order to identify the model, the constant term is set to 0. These conventions produce a scale that is unique up to any positive affine transformation, i.e., the latent scale has so-called interval properties.

Table 2: Estimation results: ordered probit

Variable         Coefficient   (Std. Err.)
Age 30-44        -0.079        (0.065)
Age 45-59        -0.166        (0.077)
Age 60+          -0.498        (0.088)
Male             -0.062        (0.053)
1 < Educ ≤ 6      0.124        (0.091)
6 < Educ ≤ 11     0.245        (0.096)
Educ > 11         0.344        (0.113)
Country B        -0.232        (0.056)
τ1               -1.612        (0.102)
τ2               -1.335        (0.1)
τ3               -1.01         (0.098)
τ4               -0.365        (0.096)

Figure 3 plots the cut-points estimated from the ordered probit model versus true cut-points for the main question. Because the true cut-points may vary across individuals but the model assumes that they are fixed, each predicted cut-point is associated with a range of different true values. Figure 4 is a plot of true mobility versus estimated average mobility using the standard ordered probit model. As reported in the graph, the R-squared value is only about 0.011. Not only does the ordered probit model predict the mean mobility poorly, it also predicts that the average mobility is lower in country B (see coefficient on country B in Table 2) even though the true level of mobility is higher in country B in the simulated data. The basic point of this simulation experiment is simple: if there are significant cut-point shifts in the underlying data-generating mechanism then using standard procedures such as the ordered probit model to analyze the data can be very misleading.

[Figure 3: Predicted versus true cut-points: ordered probit for main question. Horizontal axis: predicted cut-points; vertical axis: true cut-points; series: first, second, third, and fourth cut-point.]

Since the ordered probit model is a probability model, we can also obtain the predicted probabilities of responding in each of the five categories for the main question, given any particular level on the underlying latent variable scale (Figure 5). We have used only the main question for analyzing the data using the ordered probit model. One way to analyze multiple questions using this model would be to pool the data and allow for a dummy variable per question (since the cut-points will be assumed to be the same for all questions). However, doing this will yield a different mean value of the latent variable per question for each individual. Running the model in this way is potentially confusing, since we assume that an individual has a single value on the latent variable of interest that informs answers to all three questions, but this procedure would allow estimates of this latent variable to differ by question.

[Figure 4: Predicted versus true mobility: ordered probit for main question. Horizontal axis: predicted mobility; vertical axis: true mobility. R-squared = .011, RMSE = 21.264.]

[Figure 5: Predicted probabilities: ordered probit for main question. Horizontal axis: latent mobility scale; vertical axis: probability (0 to 1); curves Pr(k = 1) through Pr(k = 5).]

2.2 The Partial Credit Model

A second model that is often used in the analysis of ordinal data is the partial credit model from item response theory. This is basically a polytomous extension of the binary-response Rasch model [16],[17],[18].⁴ Suppose there are $N$ respondents, each answering $J$ questions on a given domain. Individual $i = 1, \ldots, N$ chooses response category $k = 1, \ldots, 5$ for question $j = 1, \ldots, J$. The partial credit model conceptualizes the ordinal nature of the categorical data as a series of dichotomies or “steps”.⁵ These dichotomies are modeled such that the probability that a respondent chooses response category $k$, given the choice between response category $k$ or $k - 1$, is:

$$\phi_{ij}^{k} = \frac{\Pr(y_{ij} = k)}{\Pr(y_{ij} = k-1) + \Pr(y_{ij} = k)} = \frac{\exp(\beta_i - \delta_j^{k-1})}{1 + \exp(\beta_i - \delta_j^{k-1})}, \qquad k = 2, \ldots, 5.$$

Here, $\Pr(y_{ij} = k)$ is the probability that individual $i$ responds in category $k$ for question $j$, and $\phi_{ij}^{k}$ is the corresponding probability of responding in category $k$ conditional on responding either in category $k-1$ or $k$. $\beta_i$ is the “ability” of individual $i$, and $\delta_j^{k-1}$ is the “difficulty” associated with the $(k-1)$-th step in question $j$, that is, the step from category $k-1$ to category $k$. In other words, the probability of responding in category $k$, conditional on responding either in category $k-1$ or $k$, is modeled as a positive function of a person’s ability and a negative function of the difficulty for the question category. Making use of the condition that the probabilities of responding in a category must sum to 1 across all five categories for each individual $i$ and question $j$, i.e.,

$$\Pr(y_{ij} = 1) + \Pr(y_{ij} = 2) + \Pr(y_{ij} = 3) + \Pr(y_{ij} = 4) + \Pr(y_{ij} = 5) = 1,$$

a general expression for the probability of responding in the $k$-th category (where $k = 1, \ldots, 5$) can be derived:

$$\Pr(y_{ij} = k) = \frac{\exp\left[(k-1)\beta_i - \sum_{m=0}^{k-1} \delta_j^{m}\right]}{\sum_{s=1}^{5} \exp\left[(s-1)\beta_i - \sum_{m=0}^{s-1} \delta_j^{m}\right]},$$

where, for notational convenience, $\sum_{m=0}^{0} \delta_j^{m} \equiv 0$ (that is, $\delta_j^{0} \equiv 0$). For the case of five categories, the probabilities of responding in each category can be written as:

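$$\Pr(y_{ij} = k) = \begin{cases}
1/A, & k = 1 \\
\exp(\beta_i - \delta_j^1)/A, & k = 2 \\
\exp(2\beta_i - \delta_j^1 - \delta_j^2)/A, & k = 3 \\
\exp(3\beta_i - \delta_j^1 - \delta_j^2 - \delta_j^3)/A, & k = 4 \\
\exp(4\beta_i - \delta_j^1 - \delta_j^2 - \delta_j^3 - \delta_j^4)/A, & k = 5,
\end{cases} \tag{2}$$

where $A$ is the expression

$$A \equiv 1 + \exp(\beta_i - \delta_j^1) + \exp(2\beta_i - \delta_j^1 - \delta_j^2) + \exp(3\beta_i - \delta_j^1 - \delta_j^2 - \delta_j^3) + \exp(4\beta_i - \delta_j^1 - \delta_j^2 - \delta_j^3 - \delta_j^4).$$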
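Equation (2) can be computed by accumulating the exponents one step at a time. The following is a minimal sketch; the function name `partial_credit_probs` and the ability value are our own choices, and the step difficulties are the main-question estimates from Table 3 (with the first difficulty fixed at zero for identification).

```python
import math

def partial_credit_probs(beta, deltas):
    """Return [Pr(y = 1), ..., Pr(y = len(deltas) + 1)] from equation (2).

    beta is the ability; deltas[m-1] is the difficulty of the m-th step.
    """
    exponents = [0.0]  # category 1 contributes exp(0) = 1 to A
    for delta in deltas:
        # Each further category adds one more (beta - delta) term.
        exponents.append(exponents[-1] + beta - delta)
    weights = [math.exp(e) for e in exponents]
    A = sum(weights)
    return [w / A for w in weights]

main_deltas = [0.0, -0.795, -0.933, -0.544]  # main question, Table 3
probs = partial_credit_probs(0.5, main_deltas)

# When ability equals a step difficulty, the two adjacent categories
# around that step are equally likely.
at_first_step = partial_credit_probs(main_deltas[0], main_deltas)
```

The last two lines check the interpretation of a difficulty parameter as the point on the latent scale where two adjacent categories have equal probability.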

⁴ The Rasch model is a fixed-effect logit model and can also be reformulated as a quasi-symmetry loglinear model [27],[8].

⁵ In this sense, the partial credit model can be viewed as an adjacent category logit model.

For a fixed number of questions, the unconditional estimation of the likelihood function yields difficulty parameters that are inconsistent [16],[3]. Consistent estimates of the difficulty parameters can be obtained by conditioning on the raw score (i.e., on the sum of responses across questions for each individual). So, for example, the conditional probability that a person responds in category 2 for all 3 questions is calculated as the joint probability divided by the probability of getting a raw score $r$ of 6 across the questions:

$$\frac{\Pr(y_{i1} = 2)\,\Pr(y_{i2} = 2)\,\Pr(y_{i3} = 2)}{\Pr(r = 6)}$$

The likelihood written in this manner is free of the ability parameter $\beta$. Once the difficulty parameters have been estimated using the conditional approach, estimates of $\beta_r$ can be obtained using the unconditional likelihood derived from:

$$\Pr(y_{ij} = k) = \begin{cases}
1/A, & k = 1 \\
\exp(\beta_r - \hat{\delta}_j^1)/A, & k = 2 \\
\exp(2\beta_r - \hat{\delta}_j^1 - \hat{\delta}_j^2)/A, & k = 3 \\
\exp(3\beta_r - \hat{\delta}_j^1 - \hat{\delta}_j^2 - \hat{\delta}_j^3)/A, & k = 4 \\
\exp(4\beta_r - \hat{\delta}_j^1 - \hat{\delta}_j^2 - \hat{\delta}_j^3 - \hat{\delta}_j^4)/A, & k = 5.
\end{cases}$$

The notation changes to $\beta_r$ because this method requires only one estimate of ability for every possible sum score of responses across all questions.
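The claim that conditioning on the raw score removes the ability parameter can be checked numerically. In the sketch below, the step difficulties for the three questions are hypothetical values, and the function names are our own; the conditional probability is computed by brute-force enumeration of all response patterns with the same raw score, which is feasible for three five-category questions but is not how a production routine would do it.

```python
import math
from itertools import product

def partial_credit_probs(beta, deltas):
    """Category probabilities from equation (2) for one question."""
    exponents = [0.0]
    for delta in deltas:
        exponents.append(exponents[-1] + beta - delta)
    weights = [math.exp(e) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]

def conditional_pattern_prob(pattern, beta, item_deltas):
    """Pr(response pattern | raw score): joint probability over Pr(raw score)."""
    n_categories = len(item_deltas[0]) + 1

    def joint(responses):
        prob = 1.0
        for resp, deltas in zip(responses, item_deltas):
            prob *= partial_credit_probs(beta, deltas)[resp - 1]
        return prob

    raw_score = sum(pattern)
    same_score = [p for p in product(range(1, n_categories + 1), repeat=len(pattern))
                  if sum(p) == raw_score]
    return joint(pattern) / sum(joint(p) for p in same_score)

# Hypothetical step difficulties for three questions (illustration only).
item_deltas = [[0.0, -0.8, -0.9, -0.5],
               [0.2, 0.2, 1.3, -1.3],
               [1.6, 0.7, 1.8, 1.3]]
low = conditional_pattern_prob((2, 2, 2), beta=-1.0, item_deltas=item_deltas)
high = conditional_pattern_prob((2, 2, 2), beta=2.0, item_deltas=item_deltas)
```

The two values `low` and `high` agree up to floating-point error, because every pattern with raw score 6 carries the same ability term and the same normalizing constants, so both cancel in the ratio.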

In the partial credit model, the difficulty parameters are points on the latent variable scale where the probabilities of responding in one category or the next are equal. Alternatively, the difficulty parameters are points where the probability of responding in category $k$, conditional on responding in categories $k-1$ or $k$, is 0.5. The ability parameters can be thought of as estimates of the individual’s underlying latent variable. The estimates of ability levels can be compared to true mobility for the simulated data to assess the performance of this model. This simple version of the partial credit model assumes that the difficulty parameters do not vary by sociodemographic characteristics, which (in the language of psychometrics) is akin to saying that it assumes there is no “differential item functioning”.

Table 3 reports the difficulty parameters for the simulated data obtained by running the conditional likelihood procedure in STATA (for identification, $\delta^1$ is set to zero for the main question).⁶ Figure 6 plots the estimated ability parameters versus the true mobility. As with the ordered probit model, Figure 7 reports the predicted probabilities from the model for given values of ability. The predicted probabilities are quite similar to those that are predicted by the ordered probit model (Figure 5). As the value of the latent variable increases, the probability of responding in the lowest category becomes small and the probability of responding in higher categories increases.

The partial credit model does better than the ordered probit model in predicting the true level of mobility. The R-squared value is much higher than that of the ordered probit model. However, the comparison between the two models is not entirely fair since we only use one question for the ordered probit model and all three questions in the partial credit model.

In the formulation introduced here, the partial credit model uses no extraneous information (i.e., covariates such as sex, age, and education) in the estimation of the abilities. In the next subsection, we present an alternative specification of the model that includes covariates.

Table 3: Estimation results: two-stage partial credit

Parameter   Variable        Coefficient   (Std. Err.)
δ1          Dummy Aux 1      0.207        (0.183)
δ1          Dummy Aux 2      1.615        (0.178)
δ2          Dummy Aux 1      0.225        (0.186)
δ2          Dummy Aux 2      0.723        (0.183)
δ2          Main question   -0.795        (0.267)
δ3          Dummy Aux 1      1.277        (0.154)
δ3          Dummy Aux 2      1.797        (0.151)
δ3          Main question   -0.933        (0.187)
δ4          Dummy Aux 1     -1.267        (0.11)
δ4          Dummy Aux 2      1.291        (0.131)
δ4          Main question   -0.544        (0.175)

[Figure 6: Predicted versus true mobility: two-stage partial credit. Horizontal axis: predicted mobility; vertical axis: true mobility. R-squared = .221, RMSE = 17.437.]

[Figure 7: Predicted probabilities: two-stage partial credit for main question. Horizontal axis: latent mobility scale; vertical axis: probability (0 to 1); curves Pr(k = 1) through Pr(k = 5).]

⁶ Estimates of the difficulty and ability parameters using STATA were of the same magnitude as those obtained using IRT software such as WINMIRA and RUMM.

2.3 The Partial Credit Model with Covariates

The partial credit model can be reformulated so that instead of having a dummy variable per individual ($\beta_i$), variables such as age, sex, education, and country of residence can be introduced. Such a modification to the partial credit model is especially useful in the analysis of health survey data given that sociodemographic variables are usually collected in such surveys. Equation (2) with covariates can be written as the probability that individual $i$ responds in category $k$ for each of the questions $j$, conditional on a vector of covariates $Z_i$:

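$$\Pr(y_{ij} = k) = \begin{cases}
1/A, & k = 1 \\
\exp(Z_i'\beta - \delta_j^1)/A, & k = 2 \\
\exp(2Z_i'\beta - \delta_j^1 - \delta_j^2)/A, & k = 3 \\
\exp(3Z_i'\beta - \delta_j^1 - \delta_j^2 - \delta_j^3)/A, & k = 4 \\
\exp(4Z_i'\beta - \delta_j^1 - \delta_j^2 - \delta_j^3 - \delta_j^4)/A, & k = 5,
\end{cases} \tag{3}$$

where $A$ is the expression

$$A \equiv 1 + \exp(Z_i'\beta - \delta_j^1) + \exp(2Z_i'\beta - \delta_j^1 - \delta_j^2) + \exp(3Z_i'\beta - \delta_j^1 - \delta_j^2 - \delta_j^3) + \exp(4Z_i'\beta - \delta_j^1 - \delta_j^2 - \delta_j^3 - \delta_j^4).$$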

Assuming independence across observations and questions, estimates can be computed using maximum likelihood. The mean predicted level of mobility versus truth is plotted in Figure 8 and the estimates are in Table 4.
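Under the independence assumption, the log-likelihood for equation (3) is a double sum over respondents and questions. The sketch below is illustrative only: the function names, the two hypothetical respondents, and the reuse of the main-question difficulties for all three questions are assumptions made to keep the example short. The two coefficients and the difficulties are taken from Table 4.

```python
import math

def partial_credit_probs(theta, deltas):
    """Category probabilities from equation (3), with theta = Z_i' beta."""
    exponents = [0.0]
    for delta in deltas:
        exponents.append(exponents[-1] + theta - delta)
    weights = [math.exp(e) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]

def log_likelihood(beta, item_deltas, Z, Y):
    """Sum of log Pr(y_ij) over respondents i and questions j."""
    total = 0.0
    for z_i, y_i in zip(Z, Y):
        theta_i = sum(b * z for b, z in zip(beta, z_i))
        for y_ij, deltas in zip(y_i, item_deltas):
            total += math.log(partial_credit_probs(theta_i, deltas)[y_ij - 1])
    return total

beta = [-0.336, -0.075]  # Table 4: Age 60+, Country B
# Assumption: main-question difficulties from Table 4 reused for all questions.
item_deltas = [[0.272, -0.747, -1.124, -1.202]] * 3
Z = [[0, 0], [1, 1]]        # hypothetical [age 60+, country B] dummies
Y = [[5, 4, 5], [3, 2, 4]]  # hypothetical responses to the three questions
ll = log_likelihood(beta, item_deltas, Z, Y)
```

Maximizing this function jointly over `beta` and `item_deltas` gives the maximum likelihood estimates; any general-purpose optimizer could be used for that step.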


Table 4: Estimation results: partial credit with covariates

Parameter   Variable        Coefficient   (Std. Err.)
Mean        Age 30-44       -0.134        (0.024)
Mean        Age 45-59       -0.203        (0.028)
Mean        Age 60+         -0.336        (0.032)
Mean        Male            -0.077        (0.019)
Mean        1 < Educ ≤ 6     0.049        (0.033)
Mean        6 < Educ ≤ 11    0.109        (0.034)
Mean        Educ > 11        0.16         (0.041)
Mean        Country B       -0.075        (0.02)
δ1          Dummy Aux 1      0.274        (0.185)
δ1          Dummy Aux 2      1.261        (0.163)
δ1          Main question    0.272        (0.144)
δ2          Dummy Aux 1      0.092        (0.185)
δ2          Dummy Aux 2     -0.076        (0.166)
δ2          Main question   -0.747        (0.14)
δ3          Dummy Aux 1      1.261        (0.151)
δ3          Dummy Aux 2      1.247        (0.126)
δ3          Main question   -1.124        (0.1)
δ4          Dummy Aux 1     -1.319        (0.109)
δ4          Dummy Aux 2      0.746        (0.099)
δ4          Main question   -1.202        (0.066)

The mean level of the estimated latent variable that is plotted in Figure 8 does not account for the fact that the deterministic variation in the latent variable will be imperfectly captured by the limited set of included covariates. In the absence of a random effect, the model will overestimate the amount of stochastic variability in the data. The next subsection introduces a method for accounting for this by using Bayes’ theorem to estimate the predicted mobility.

[Figure 8: Predicted versus true mobility: partial credit with covariates. Horizontal axis: predicted mobility; vertical axis: true mobility. R-squared = .055, RMSE = 20.787.]

2.4 Random Effects and Latent Variable Estimation using Bayes’ Theorem

If there is an individual-level random effect in the data (i.e., when covariates in our model do not capture all the systematic variation in the latent variable), then there remains information content in the set of responses across questions for each individual that has not been fully exploited. The partial credit model with covariates and a random effect $\nu_i$ with mean zero and variance $\sigma_\nu^2$ can be written out as follows:

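$$\Pr(y_{ij} = k) = \begin{cases}
1/A, & k = 1 \\
\exp[(Z_i'\beta + \nu_i) - \delta_j^1]/A, & k = 2 \\
\exp[2(Z_i'\beta + \nu_i) - \delta_j^1 - \delta_j^2]/A, & k = 3 \\
\exp[3(Z_i'\beta + \nu_i) - \delta_j^1 - \delta_j^2 - \delta_j^3]/A, & k = 4 \\
\exp[4(Z_i'\beta + \nu_i) - \delta_j^1 - \delta_j^2 - \delta_j^3 - \delta_j^4]/A, & k = 5,
\end{cases} \tag{4}$$

where $A$ is the expression

$$A \equiv 1 + \exp[(Z_i'\beta + \nu_i) - \delta_j^1] + \exp[2(Z_i'\beta + \nu_i) - \delta_j^1 - \delta_j^2] + \exp[3(Z_i'\beta + \nu_i) - \delta_j^1 - \delta_j^2 - \delta_j^3] + \exp[4(Z_i'\beta + \nu_i) - \delta_j^1 - \delta_j^2 - \delta_j^3 - \delta_j^4].$$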

In order to exploit the information content in the set of responses we can make use of Bayes’ theorem to obtain estimates of the mean level of mobility conditional on the observed set of responses. That is, we can estimate $\Pr(\mu_i \mid y_i)$ using Bayes’ formula:

$$\Pr(\mu_i \mid y_i) = \frac{\Pr(y_i \mid \mu_i)\,\Pr(\mu_i)}{\int \Pr(y_i \mid \mu_i)\,\Pr(\mu_i)\, d\mu_i}, \tag{5}$$

where $y_i$ represents the vector of categorical responses on all questions for individual $i$. The way this can be implemented is as follows. First, we use the model with a random effect and estimate all the parameters including the variance of the random effect. This estimate of the variance can be used to simulate 100 different values of $\mu_i$ around the predicted $Z_i'\beta$ of the latent variable for each individual in the sample. Hence, for each simulated value of $\mu_i$, $\Pr(\mu_i)$ can be calculated. $\Pr(y_i \mid \mu_i)$ can be calculated using the probability specifications given in equation (4). Integrating over all simulated values of $\mu_i$ for each individual gives us the denominator of equation (5).
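The procedure just described can be sketched as a numerical approximation to equation (5). Two choices below are assumptions of this sketch rather than the authors' implementation: the 100 values of the latent variable are placed on an evenly spaced grid within four standard deviations of the predicted mean instead of being drawn at random, and the prior is taken to be normal. The function names, the difficulties, and the example inputs are hypothetical.

```python
import math

def partial_credit_probs(theta, deltas):
    """Category probabilities from equation (4), with theta = Z_i' beta + nu_i."""
    exponents = [0.0]
    for delta in deltas:
        exponents.append(exponents[-1] + theta - delta)
    weights = [math.exp(e) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]

def posterior_mean(z_beta, sigma, responses, item_deltas, n_points=100):
    """Approximate E[mu_i | y_i] by weighting grid points with prior x likelihood."""
    lo = z_beta - 4.0 * sigma
    step = 8.0 * sigma / (n_points - 1)
    numerator = 0.0
    denominator = 0.0  # approximates the integral in equation (5)
    for g in range(n_points):
        mu = lo + g * step
        # Normal prior density around z_beta, up to a constant that cancels.
        prior = math.exp(-0.5 * ((mu - z_beta) / sigma) ** 2)
        likelihood = 1.0
        for y, deltas in zip(responses, item_deltas):
            likelihood *= partial_credit_probs(mu, deltas)[y - 1]
        numerator += mu * prior * likelihood
        denominator += prior * likelihood
    return numerator / denominator

# Hypothetical inputs for one respondent.
item_deltas = [[0.3, -0.7, -1.1, -1.2]] * 3
healthy = posterior_mean(z_beta=0.0, sigma=1.0, responses=[5, 5, 5], item_deltas=item_deltas)
limited = posterior_mean(z_beta=0.0, sigma=1.0, responses=[1, 1, 1], item_deltas=item_deltas)
```

Two respondents with identical covariates but different response patterns now receive different predicted values: the estimate is pulled above the covariate-based mean for the respondent reporting no difficulty on every question and below it for the respondent reporting extreme difficulty.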

In the absence of a model that estimates the variance of this individual-specific random effect, one can assume that the random effect captures about 50% of the variation in estimated variance of the error term. Under this assumption, the Bayesian prediction of mobility conditional on the observed pattern of responses is plotted in Figure 9 for the partial credit model with covariates.⁷

[Figure 9: Predicted versus true mobility: partial credit with covariates (Bayesian). Horizontal axis: predicted mobility; vertical axis: true mobility. R-squared = .334, RMSE = 17.441.]

It is quite remarkable that the Bayesian correction substantially improves the estimation of mobility (Figure 9) when compared with the estimation of abilities using the two-step conditional procedure for the partial credit model (Figure 6), as judged by the R-squared values. In other words, if the goal of the analyst is to estimate the underlying latent variable, then a modification of the partial credit model that allows for covariates and a random effect outperforms the simple version of the partial credit model.

2.5 Ordered Probit versus Partial Credit

We have introduced two basic types of models that are widely used in the analysis of categorical data, namely the ordered probit model and the partial credit model (with ability dummies and with covariates). Fundamentally, both models assume some sort of latent variable that gives rise to an observation mechanism governed by probabilities given in equations (1) and (2). Viewed this way, the two models are quite similar, differing only in the functional form of the data generating mechanism and in their approach to modeling the probabilities: the ordered probit model works with the cumulative probability function, whereas the partial credit model focuses on adjacent categories.

Footnote 7: We have developed working versions of the models with random effects. However, they are very slow to run and we are currently trying to improve the speed of estimation.

Apart from poor predictions of the underlying latent variable, both the ordered probit and the partial credit models suffer from the problem that one cannot allow the response category cut-points ($\tau$'s), or the so-called difficulty parameters ($\delta$'s), to be functions of the same covariates as the mean value of the latent variable. This is because there will be a clear identification problem if one does so: in the absence of additional exogenous information, neither model will be able to detect whether the effects of the covariates are on the mean value of the latent variable or on the cut-points or difficulties. This is easy to see from the equations for the predicted probabilities [equations (1) and (2)]. This is likely to be a serious shortcoming of both models in estimating cross-population comparable differences in the latent variable of interest. In simple terms, these models do not allow for a world in which the Danish not only have a higher health status, but also have different expectations for their health status relative to Ethiopians.
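The identification problem can be illustrated with a small numerical check. This sketch assumes the standard ordered probit form, in which each category probability is a difference of normal CDFs evaluated at cut-point minus mean; the function names and all numbers are hypothetical.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ordered_probit_probs(mu, cuts):
    """Pr(y = k) for k = 1..len(cuts)+1 with a unit-variance normal error."""
    cdf = [0.0] + [norm_cdf(c - mu) for c in cuts] + [1.0]
    return [cdf[k + 1] - cdf[k] for k in range(len(cuts) + 1)]

# Group 1: mean 0 with one set of cut-points.
base = ordered_probit_probs(mu=0.0, cuts=[-1.5, -0.5, 0.5, 1.5])
# Group 2: mean raised by 0.7 AND every cut-point raised by 0.7.
shifted = ordered_probit_probs(mu=0.7, cuts=[-0.8, 0.2, 1.2, 2.2])

# Largest difference between the two sets of category probabilities.
max_gap = max(abs(a - b) for a, b in zip(base, shifted))
```

Because only the differences between cut-points and mean enter the probabilities, `max_gap` is zero up to rounding: the self-report data alone cannot tell a shift in the mean from an equal shift in all cut-points.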

In the next section, we introduce an innovation to both the ordered probit and partial credit models that allows for the introduction of exogenous information in the form of vignettes. Analyzing the self-report questions in conjunction with responses to vignettes allows us to identify the model such that the same set of covariates can be used to assess differences in the mean level of the underlying latent variable as well as in cut-points or difficulties.

3 Vignettes

We now introduce the use of vignettes as a means of correcting self-report responses in order to make them cross-population comparable. A vignette is a description of a concrete level of ability on a given domain that respondents are asked to evaluate with relation to the same main question and on the same categorical response scale as the main self-report question [24]. The vignette fixes the level of ability such that variations in categorical responses are attributable to variations in response category cut-points. This introduction of exogenous information in the form of responses to vignettes allows us to identify the effects of a set of sociodemographic covariates (such as age, sex, education, country of residence, etc.) on both the level of the underlying latent variable that is being estimated as well as on the cut-points (in the ordered probit version of the model) and difficulties (in the partial credit version of the model) (see footnote 8).
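A toy example may help fix the idea that a vignette holds ability constant so that only cut-points vary. The sketch below ignores the stochastic term for simplicity; the latent level, the two sets of cut-points, and the function name `category` are all hypothetical.

```python
from bisect import bisect_right

def category(latent, cuts):
    """Return k in 1..5 such that tau_{k-1} <= latent < tau_k."""
    return bisect_right(cuts, latent) + 1

vignette_level = 0.3                      # same described ability for everyone
cuts_a = [-1.5, -0.5, 0.5, 1.5]           # hypothetical cut-points, group A
cuts_b = [-0.5, 0.0, 0.25, 1.0]           # hypothetical cut-points, group B

rating_a = category(vignette_level, cuts_a)   # 3
rating_b = category(vignette_level, cuts_b)   # 4
```

Since the described level of ability is identical, the difference between the two ratings can only come from the cut-points, which is the information the vignette component of the models exploits.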

In the WHO Multi-Country Study, there are six vignettes for the domain of mobility, each designed to capture a different level of ability on this domain. The vignettes are:

Vignette 1: [Paul] is an active athlete who runs long distance races of 20 kilometers twice a week and engages in soccer with no problems.

Vignette 2: [Mary] has no problems with moving around or using her hands, arms and legs. She jogs 4 kilometers twice a week without any problems.

Vignette 3: [Rob] is able to walk distances of up to 200 meters without any problems but feels breathless after walking one kilometer or climbing up more than one flight of stairs. He has no problems with day-to-day physical activities, such as carrying food from the market.

Vignette 4: [Margaret] feels chest pain and gets breathless after walking distances of up to 200 meters, but is able to do so without assistance. Bending and lifting objects such as groceries produces pain.

Vignette 5: [Louis] is able to move his arms and legs, but requires assistance in standing up from a chair or walking around the house. Any bending is painful and lifting is impossible.

Vignette 6: [David] is paralyzed from the neck down. He is confined to bed and must be fed and bathed by somebody else.

Footnote 8: An alternative method to set a comparable scale such that response category cut-point differences can be recovered is to use measured tests [26].

Respondents are asked to classify each of these vignettes on the same five-point response category scale as the main question. So, for each individual, we not only have categorical responses to their self-report main question and several auxiliary questions, but we also have their categorical responses to a set of vignettes (ranging in number from six to eight across the different domains for health and responsiveness in the WHO Multi-Country Study).

In order to introduce statistical models designed around the use of vignettes, we have extended the simulated data set to include hypothetical ratings of seven mobility vignettes in countries A and B by assigning “true” mobility scores to the different vignettes and assuming that individuals will use the categorical response scale the same way in assessing vignettes as they do in assessing their own levels of mobility on the main question. This assumption is critical for the estimation of the models, as discussed below.

The simulated vignette ratings for the two countries are summarized in Figures 10 and 11. Each graph shows the distribution of categorical responses for the set of vignettes (lighter colors signifying worse responses). The vignettes are ranked from 1 to 7 in decreasing order of ability: i.e., vignette 1 refers to a higher level of mobility than vignette 2, vignette 2 is higher than vignette 3, and so on. From these graphs, it is clear that there are important differences in the cut-points between country A and country B. At lower levels of mobility, respondents in country B are more likely to characterize a vignette unfavorably than respondents in country A. In addition, the compression of the middle categories in country B suggests cut-points that are more narrowly spaced than those in country A.

The types of variation in vignette ratings that we have generated in the simulated dataset closely parallel the variation observed in actual data from the WHO Multi-Country Survey Study. In a later section, we show the response distributions for China versus those for India for mobility vignettes. In the following sections we describe how variants of the ordered probit model and partial credit model may be used in conjunction with vignette ratings in order to characterize these systematic cut-point differences more precisely. Both models are modified such that: (a) information from responses to vignettes is introduced in the likelihood function, and (b) cut-points and difficulties are allowed to be functions of the same covariates as those used in the estimation of the mean value of the latent variable.

3.1 Hierarchical Ordered Probit Model (HOPIT)

[Figure 10: Distribution of vignette responses for country A. Percentage distribution of categorical responses (0% to 100%) for each of vignettes 1 to 7.]

[Figure 11: Distribution of vignette responses for country B. Percentage distribution of categorical responses (0% to 100%) for each of vignettes 1 to 7.]

The hierarchical ordered probit (HOPIT) model is a modification of the standard ordered probit model described earlier. In order to incorporate information on vignette ratings and multiple questions, the expanded model has several components to the likelihood function: the first component refers to estimation of cut-points using responses to vignettes, and the second component utilizes responses on the self-report main question. The remaining components are for auxiliary questions. In formal terms, the first component of the likelihood function assumes there is an unobserved latent variable $Y^{v*}_{ij}$ distributed with mean $\mu^v_{ij}$ and variance 1. Here, $i$ refers to the respondent, $j$ refers to the vignette number, and the $v$ superscript indicates that this refers to the vignette component of the model. In mathematical terms,

\[
Y^{v*}_{ij} \sim N(\mu^v_{ij}, 1), \quad i = 1, \ldots, N; \; j = 1, \ldots, V
\]

\[
\mu^v_{ij} = J_i'\alpha,
\]

where $J_i$ is a vector of indicator variables for each of $V - 1$ vignettes. Letting $y^v_{ij}$ denote the observed categorical response by individual $i$ to vignette $j$, the observation mechanism is defined as follows:

\[
y^v_{ij} = k \quad \text{if } \tau^{k-1}_i \le Y^{v*}_{ij} < \tau^k_i; \quad \text{for } \tau^0_i = -\infty,\ \tau^5_i = \infty,\ \forall\, i, j \text{ and } k = 1, \ldots, 5.
\]

In addition, the cut-points are allowed to be functions of covariates:

\[
\tau^k_i = X_i'\gamma^k.
\]

As before, $\tau^1_i < \tau^2_i < \tau^3_i < \tau^4_i$.

The second component of the likelihood function utilizes information from the respondent's main self-report question (the one that is tied to the vignettes) and assumes there is an unobserved latent variable $Y^{s*}_i$ distributed with mean $\mu^s_i$ and variance $\sigma^2$. Here, the $s$ superscript indicates that this component refers to self-report questions. This formulation is slightly different from the standard ordered probit model: since we are allowing the vignettes to drive the cut-point estimation, this second component of the likelihood function has more in common with an interval regression model (i.e., an ordered probit model with known cut-points). Since the cut-point estimation is being driven by vignettes and the scale is set by the first estimation component, we are now able to obtain estimates of the variance of the latent variable (i.e., there is no need to set the variance equal to 1 as before). In mathematical terms, the model is:

\[
Y^{s*}_i \sim N(\mu^s_i, \sigma^2), \quad i = 1, \ldots, N
\]

\[
\mu^s_i = Z_i'\beta.
\]

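Let $y^s_i$ be the observed categorical responses on the self-report such that:

\[
y^s_i = k \quad \text{if } \tau^{k-1}_i \le Y^{s*}_i < \tau^k_i; \quad \text{for } \tau^0_i = -\infty,\ \tau^5_i = \infty,\ \forall\, i \text{ and } k = 1, \ldots, 5.
\]

Similarly for the auxiliary questions, let $a^j_i$ be the observed categorical responses on the $j$-th auxiliary question such that:

\[
a^j_i = k \quad \text{if } \tau^{j,k-1}_i \le Y^{s*}_i < \tau^{j,k}_i; \quad \text{for } \tau^{j,0}_i = -\infty,\ \tau^{j,5}_i = \infty,\ \forall\, i \text{ and } k = 1, \ldots, 5,
\]

and

\[
\tau^{j,k}_i = X_i'\gamma^{j,k}.
\]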

[Figure 12: Predicted versus true mobility: HOPIT. Scatter plot of true mobility (vertical axis) against predicted mobility (horizontal axis); R-squared = .495, RMSE = 15.186.]

It is assumed that $Y^{s*}_i$ and $Y^{s*}_{i'}$ are independent $\forall\, i \ne i'$, conditional on $X_i$. $Y^{v*}_{ij}$ and $Y^{s*}_i$ are independent $\forall\, i, j$, conditional on $X_i$, $J_i$ and $Z_i$. The probabilities associated with the observed responses to vignettes, the main question, and the auxiliary questions can be computed as in equation (1) with the adjustment for cut-point shifts being functions of covariates. The likelihood function can be written using these probabilities as three separate components. The three components of the likelihood function are additive in logs and can be jointly maximized to yield the parameter estimates. There is explicit parametric dependence between the different components of the likelihood function. The cut-points to be estimated from the vignettes component are the same as those in the main question component. In addition, $\mu^s_i$ is the same for both the main question and all the auxiliary questions. This ensures that the estimated cut-points for both the main question and the auxiliary questions are on the same scale to enable meaningful comparisons.
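The way the components share parameters can be sketched as a log-likelihood function. This is an illustrative outline under simplifying assumptions, not the estimation code used for the results reported here: the auxiliary-question components are omitted, the ordering of the cut-points is not enforced, and the function names and data layout are hypothetical.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def dot(x, coef):
    return sum(a * b for a, b in zip(x, coef))

def category_logprob(k, mean, sd, cuts):
    """log Pr(tau_{k-1} <= Y* < tau_k) for Y* ~ N(mean, sd^2), k = 1..5."""
    bounds = [-math.inf] + list(cuts) + [math.inf]
    upper = norm_cdf((bounds[k] - mean) / sd)
    lower = norm_cdf((bounds[k - 1] - mean) / sd)
    return math.log(max(upper - lower, 1e-300))   # guard against log(0)

def hopit_loglik(alpha, beta, gamma, sigma, vignette_rows, self_rows):
    """Vignette component plus main-question component, added in logs.

    vignette_rows: tuples (J, X, y), J being the vignette indicator vector.
    self_rows:     tuples (Z, X, y), Z being the covariates of the mean.
    gamma:         four coefficient vectors, one per cut-point."""
    total = 0.0
    for J, X, y in vignette_rows:
        cuts = [dot(X, g) for g in gamma]          # tau_i^k = X_i' gamma^k
        total += category_logprob(y, dot(J, alpha), 1.0, cuts)
    for Z, X, y in self_rows:
        cuts = [dot(X, g) for g in gamma]          # same gamma as the vignettes
        total += category_logprob(y, dot(Z, beta), sigma, cuts)
    return total

# Tiny made-up data set: two vignette ratings and one self-report.
gamma = [[-1.0, 0.2], [0.0, 0.2], [1.0, 0.2], [2.0, 0.2]]
vignette_rows = [([1.0, 0.0], [1.0, 0.0], 3), ([0.0, 1.0], [1.0, 1.0], 2)]
self_rows = [([1.0, 0.5], [1.0, 0.0], 4)]
loglik = hopit_loglik([0.5, -0.5], [0.1, 0.3], gamma, 1.2,
                      vignette_rows, self_rows)
```

The point of the sketch is that the same `gamma` enters both loops: the vignette ratings pin down the cut-points, which is what frees `sigma` and `beta` to be estimated from the self-report. In practice the function would be passed to a numerical optimizer.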

Tables 5 to 9 in the Annex report the results of the estimation. Figure 12 plots the estimates of the mean level versus truth. The R-squared for the prediction has improved when compared with the simple ordered probit model as well as with the partial credit models with and without covariates. Figure 13 reports the true versus estimated cut-points for the main question. These differ by sociodemographic group in that they are also functions of the same covariates (age, sex, education, and country of residence) as the mean level of mobility. As can be seen, the model is able to recover the cut-point differences quite well. Figures 14 and 15 report the comparison of estimated cut-points to truth for the two auxiliary questions. The recovery here is not quite as good as that for the main question. This is to be expected since the information in the vignettes is directly driving the main question cut-points, whereas the estimation of the cut-points for the auxiliary questions is more indirect and is not anchored to the cut-points derived from vignette responses. The estimation of the latent variable using Bayes' theorem (Figure 16) improves the R-squared quite significantly, yielding estimates of mobility that are quite close to the true mobility levels in the underlying simulated data.

[Figure 13: Predicted versus true cut-points: HOPIT main question. True cut-points (vertical axis) against predicted cut-points (horizontal axis), with separate series for the first, second, third, and fourth cut-points.]

[Figure 14: Predicted versus true cut-points: HOPIT auxiliary question 1. True cut-points (vertical axis) against predicted cut-points (horizontal axis), with separate series for the first, second, third, and fourth cut-points.]

[Figure 15: Predicted versus true cut-points: HOPIT auxiliary question 2. True cut-points (vertical axis) against predicted cut-points (horizontal axis), with separate series for the first, second, third, and fourth cut-points.]

[Figure 16: Predicted versus true mobility: HOPIT (Bayesian). Scatter plot of true mobility (vertical axis) against predicted mobility (horizontal axis); R-squared = .729, RMSE = 11.139.]

3.2 Hierarchical Partial Credit Model

In analogy to the HOPIT model, we implement the use of vignettes in exactly the same way for the Rasch-based partial credit model. We allow for responses to vignettes to set the difficulty levels and estimate differences across sociodemographic groups in the first component of the likelihood function. In the other components of the likelihood, we utilize information from the main and auxiliary questions. The logic is the same as before: we are using information on difficulty parameters from responses on vignettes to allow us to have covariates that affect both the mean level of the estimated latent variable as well as the difficulty parameters. For all the vignette questions, i.e., for $j = 1, \ldots, V$:

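\[
\Pr(y^v_i = k) =
\begin{cases}
1/A, & k = 1 \\
\exp(J_i'\alpha - \delta^1_i)/A, & k = 2 \\
\exp(2J_i'\alpha - \delta^1_i - \delta^2_i)/A, & k = 3 \\
\exp(3J_i'\alpha - \delta^1_i - \delta^2_i - \delta^3_i)/A, & k = 4 \\
\exp(4J_i'\alpha - \delta^1_i - \delta^2_i - \delta^3_i - \delta^4_i)/A, & k = 5,
\end{cases}
\qquad (6)
\]

where $J_i$ is a vector of indicator variables for each of $V - 1$ vignettes, and $A$ is the expression

\[
A \equiv 1 + \exp(J_i'\alpha - \delta^1_i) + \exp(2J_i'\alpha - \delta^1_i - \delta^2_i) + \exp(3J_i'\alpha - \delta^1_i - \delta^2_i - \delta^3_i) + \exp(4J_i'\alpha - \delta^1_i - \delta^2_i - \delta^3_i - \delta^4_i)
\]

and

\[
\delta^k_i = X_i'\beta^k.
\]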

Similarly, the probabilities for the main question (the one which is tied to the vignettes):

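\[
\Pr(y^s_i = k) =
\begin{cases}
1/A, & k = 1 \\
\exp(Z_i'\beta - \delta^1_i)/A, & k = 2 \\
\exp(2Z_i'\beta - \delta^1_i - \delta^2_i)/A, & k = 3 \\
\exp(3Z_i'\beta - \delta^1_i - \delta^2_i - \delta^3_i)/A, & k = 4 \\
\exp(4Z_i'\beta - \delta^1_i - \delta^2_i - \delta^3_i - \delta^4_i)/A, & k = 5,
\end{cases}
\qquad (7)
\]

where $Z_i$ is a vector of individual-level covariates, and $A$ is the expression

\[
A \equiv 1 + \exp(Z_i'\beta - \delta^1_i) + \exp(2Z_i'\beta - \delta^1_i - \delta^2_i) + \exp(3Z_i'\beta - \delta^1_i - \delta^2_i - \delta^3_i) + \exp(4Z_i'\beta - \delta^1_i - \delta^2_i - \delta^3_i - \delta^4_i)
\]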

And for the j-th auxiliary question:

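\[
\Pr(y^s_{ij} = k) =
\begin{cases}
1/A, & k = 1 \\
\exp(Z_i'\beta - \delta^1_{ij})/A, & k = 2 \\
\exp(2Z_i'\beta - \delta^1_{ij} - \delta^2_{ij})/A, & k = 3 \\
\exp(3Z_i'\beta - \delta^1_{ij} - \delta^2_{ij} - \delta^3_{ij})/A, & k = 4 \\
\exp(4Z_i'\beta - \delta^1_{ij} - \delta^2_{ij} - \delta^3_{ij} - \delta^4_{ij})/A, & k = 5,
\end{cases}
\qquad (8)
\]

where $Z_i$ is a vector of individual-level covariates, and $A$ is the expression

\[
A \equiv 1 + \exp(Z_i'\beta - \delta^1_{ij}) + \exp(2Z_i'\beta - \delta^1_{ij} - \delta^2_{ij}) + \exp(3Z_i'\beta - \delta^1_{ij} - \delta^2_{ij} - \delta^3_{ij}) + \exp(4Z_i'\beta - \delta^1_{ij} - \delta^2_{ij} - \delta^3_{ij} - \delta^4_{ij})
\]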
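As a rough illustration of how equations (6) and (7) share the covariate-dependent difficulties, the sketch below computes the five category probabilities for a vignette and for a self-report of the same respondent. The function names, the covariate vector, and every coefficient value are hypothetical.

```python
import math

def dot(x, coef):
    return sum(a * b for a, b in zip(x, coef))

def pcm_probs(location, deltas):
    """Five category probabilities: weight_k = exp((k-1)*location - sum of first k-1 deltas)."""
    logits, running = [0.0], 0.0
    for k, delta in enumerate(deltas, start=1):
        running += delta
        logits.append(k * location - running)
    top = max(logits)                       # stabilise the exponentials
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)                    # plays the role of A
    return [w / total for w in weights]

def hierarchical_pcm_probs(location, X, difficulty_coefs):
    """Difficulties delta_i^k = X_i' beta^k, then the partial credit probabilities."""
    deltas = [dot(X, b) for b in difficulty_coefs]
    return pcm_probs(location, deltas)

X = [1.0, 1.0]                                   # intercept and a country dummy
difficulty_coefs = [[-1.0, 0.4], [-0.3, 0.4], [0.3, 0.4], [1.0, 0.4]]

# Vignette: location is J_i' alpha; self-report: location is Z_i' beta.
vignette_probs = hierarchical_pcm_probs(dot([0.0, 1.0], [0.8, -0.6]), X, difficulty_coefs)
self_probs = hierarchical_pcm_probs(dot([1.0, 0.5], [0.2, 0.3]), X, difficulty_coefs)
flat_probs = pcm_probs(0.0, [0.0, 0.0, 0.0, 0.0])   # all categories equally likely
```

Both calls use the same `difficulty_coefs`, mirroring the way the vignette component sets the difficulties that the main-question component then reuses.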

[Figure 17: Predicted versus true mobility: partial credit model. Scatter plot of true mobility (vertical axis) against predicted mobility (horizontal axis); R-squared = .487, RMSE = 15.287.]

[Figure 18: Predicted versus true mobility: partial credit model (Bayesian). Scatter plot of true mobility (vertical axis) against predicted mobility (horizontal axis); R-squared = .683, RMSE = 12.019.]

Tables 10 to 14 in the Annex report the results of this estimation. Figures 17 and 18 show the predicted mobility versus the true mobility before and after the Bayesian correction. The pre-Bayesian R-squared value for predicted mobility obtained from the hierarchical partial credit model (.487) is similar in magnitude to that obtained using the HOPIT model (.495). The post-Bayesian R-squared is slightly higher for HOPIT (.729) than for the hierarchical partial credit model (.683). This may result from the fact that the hierarchical partial credit model, in the way we have formulated it, does not estimate the variance of the stochastic term. This constraint will inhibit the model from fitting the data as well as it could if the variance were included as a parameter.


4 Goodness-of-Fit

Assessing goodness-of-fit for categorical data is not straightforward. One can compute a simple count-$R^2$, which is the proportion of responses correctly predicted for a given sample. For ordinal data, the predicted categorical response would be the one associated with the maximum predicted probability. Other options include a pseudo-$R^2$ measure, which in software such as Stata is a likelihood-based comparison of the model with all the parameters to one with only the intercept [12]. Rasch-based models use measures of fit such as “outfit” and “infit”: “outfit” is a chi-square test based on the sum of the standardized deviations of observed from expected values of a response. “Infit” is also a chi-square test, which utilizes an information-weighted sum that adjusts for extreme responses using weights [32].
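The count-$R^2$ described above is simple enough to write down directly. The function name and the toy probabilities below are illustrative assumptions only.

```python
def count_r2(observed, predicted_probs):
    """Share of respondents whose observed category has the highest predicted probability."""
    hits = 0
    for y, probs in zip(observed, predicted_probs):
        # Categories are numbered from 1, list positions from 0.
        predicted = max(range(len(probs)), key=probs.__getitem__) + 1
        hits += (predicted == y)
    return hits / len(observed)

observed = [1, 3, 5, 2]
predicted_probs = [
    [0.50, 0.20, 0.10, 0.10, 0.10],   # predicts 1 (correct)
    [0.10, 0.20, 0.40, 0.20, 0.10],   # predicts 3 (correct)
    [0.05, 0.05, 0.10, 0.45, 0.35],   # predicts 4 (observed 5)
    [0.20, 0.40, 0.20, 0.10, 0.10],   # predicts 2 (correct)
]
fit = count_r2(observed, predicted_probs)   # 0.75
```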

In order to assess model fit, a standard likelihood ratio test can be used. These tests compare the log-likelihood value of the full model with a constrained version of the same model (i.e., a model that is nested within the full model) to assess the contribution of the dropped covariates to the likelihood function. Assume $L_0$ is the log-likelihood value associated with the full model and $L_1$ is the log-likelihood value of the constrained model. Then, under the null hypothesis that the constraints hold, $-2(L_1 - L_0)$ is asymptotically distributed $\chi^2$ with $d_0 - d_1$ degrees of freedom, where $d_0$ and $d_1$ are the model degrees of freedom associated with the full and the constrained models, respectively [12].
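A likelihood ratio test of this kind can be sketched with the standard library alone. The chi-square tail probability below uses the closed-form expressions available for integer degrees of freedom; the function names and the log-likelihood values in the example are made up for illustration.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
        return math.exp(-half) * terms
    extra = sum(half ** (i - 0.5) / math.gamma(i + 0.5)
                for i in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * extra

def likelihood_ratio_test(loglik_full, loglik_constrained, df_full, df_constrained):
    """Return the statistic -2(L1 - L0), its degrees of freedom, and the p-value."""
    stat = -2.0 * (loglik_constrained - loglik_full)
    df = df_full - df_constrained
    return stat, df, chi2_sf(stat, df)

# Hypothetical fit: dropping four covariates lowers the log-likelihood by 10.
stat, df, p_value = likelihood_ratio_test(-1200.0, -1210.0, 12, 8)
```

With these made-up numbers the statistic is 20 on 4 degrees of freedom, so the constrained model would be rejected at conventional significance levels.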

5 Unidimensionality

Both the HOPIT model and the Rasch-based models in IRT assume some form of unidimensionality. In formal terms, unidimensionality can be defined as the assumption that any dependence between different questions tapping into a given domain is solely due to the existence of a single underlying latent trait. Tests of unidimensionality are often based on uncovering this assumed factor that underlies observed responses to multiple questions. Mathematically, the assumption of unidimensionality can be worked out by assuming responses to all questions on a given domain are tapping this latent trait.

In the WHO Multi-Country Study, test-retest data are available from a subsample of respondents who were revisited and administered the survey questionnaire for a second time. This availability of test-retest data can be used to design a test of unidimensionality. Suppose we get latent variable estimates from two separate questions on any given domain, $Y^*_1$ and $Y^*_2$. Each of these estimates of the latent variable represents some measure of “truth” with error. That is, if truth were denoted by $Y^*_{\text{true}}$, then:

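\[
Y^*_{1,\text{test}} = Y^*_{\text{true}} + \epsilon_{1,\text{test}}
\]

\[
Y^*_{1,\text{retest}} = Y^*_{\text{true}} + \epsilon_{1,\text{retest}}
\]

and

\[
Y^*_{2,\text{test}} = Y^*_{\text{true}} + \epsilon_{2,\text{test}}
\]

\[
Y^*_{2,\text{retest}} = Y^*_{\text{true}} + \epsilon_{2,\text{retest}}
\]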


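Here, $\epsilon_1$ and $\epsilon_2$ are the question-specific error terms for both test and retest questions, $\epsilon_1 \sim N(0, \sigma^2_{\epsilon_1})$, $\epsilon_2 \sim N(0, \sigma^2_{\epsilon_2})$. The correlation coefficient $\rho$ between the measured $Y^*$'s is:

\[
\rho = \frac{\mathrm{cov}(Y^*_1, Y^*_2)}{\sigma_{Y^*_1}\,\sigma_{Y^*_2}} \qquad (9)
\]

Rewriting (9),

\[
\rho = \frac{\mathrm{cov}(Y^*_1, Y^*_2)}{\sigma_{Y^*_1}\,\sigma_{Y^*_2}}
= \frac{\mathrm{cov}(Y^*_1, Y^*_2)}{\sqrt{\sigma^2_{Y^*_{\text{true}}} + \sigma^2_{\epsilon_1}}\,\sqrt{\sigma^2_{Y^*_{\text{true}}} + \sigma^2_{\epsilon_2}}} \qquad (10)
\]

Similarly,

\[
\rho^* = \frac{\mathrm{cov}(Y^*_{\text{true}}, Y^*_{\text{true}})}{\sigma_{Y^*_{\text{true}}}\,\sigma_{Y^*_{\text{true}}}} \qquad (11)
\]

Dividing (11) by (10),

\[
\frac{\rho^*}{\rho} = \frac{\sqrt{\sigma^2_{Y^*_{\text{true}}} + \sigma^2_{\epsilon_1}}\,\sqrt{\sigma^2_{Y^*_{\text{true}}} + \sigma^2_{\epsilon_2}}}{\sigma_{Y^*_{\text{true}}}\,\sigma_{Y^*_{\text{true}}}},
\]

since $\mathrm{cov}(Y^*_1, Y^*_2) = \mathrm{cov}(Y^*_{\text{true}}, Y^*_{\text{true}})$ if the error terms are assumed to be uncorrelated. Therefore,

\[
\rho^* = \sqrt{\frac{\sigma^2_{Y^*_{\text{true}}} + \sigma^2_{\epsilon_1}}{\sigma^2_{Y^*_{\text{true}}}}} \cdot \sqrt{\frac{\sigma^2_{Y^*_{\text{true}}} + \sigma^2_{\epsilon_2}}{\sigma^2_{Y^*_{\text{true}}}}} \cdot \rho = 1
\]

where $\sigma^2_{\epsilon_i} = \mathrm{var}(Y_{i,\text{test}} - Y_{i,\text{retest}})/2$ for $i = 1, 2$. Given that both $\sigma^2_{Y^*_{\text{true}}} = \mathrm{cov}(Y^*_1, Y^*_2)$ and $\rho$ are observed, the above expression should equal 1. This can form the basis of a test of unidimensionality using information from test-retest data.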
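The quantity $\rho^*$ can be computed from test-retest estimates as in the following sketch. It checks the algebra on simulated data in which a single latent trait drives both questions; the function names, the sample size, and the error standard deviations are illustrative assumptions, and no sampling distribution or formal test statistic is derived here.

```python
import math
import random

def cov(a, b):
    """Sample covariance (cov(a, a) is the sample variance)."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)

def rho_star(y1_test, y1_retest, y2_test, y2_retest):
    """Error-corrected correlation; close to 1 under unidimensionality."""
    d1 = [a - b for a, b in zip(y1_test, y1_retest)]
    d2 = [a - b for a, b in zip(y2_test, y2_retest)]
    var_e1 = cov(d1, d1) / 2.0                  # sigma^2 of epsilon_1
    var_e2 = cov(d2, d2) / 2.0                  # sigma^2 of epsilon_2
    var_true = cov(y1_test, y2_test)            # sigma^2 of Y*_true
    rho = var_true / math.sqrt(cov(y1_test, y1_test) * cov(y2_test, y2_test))
    return (math.sqrt((var_true + var_e1) / var_true)
            * math.sqrt((var_true + var_e2) / var_true) * rho)

# Simulated data: one latent trait, independent question-specific errors.
rng = random.Random(0)
truth = [rng.gauss(0.0, 1.0) for _ in range(4000)]
y1_test = [t + rng.gauss(0.0, 0.5) for t in truth]
y1_retest = [t + rng.gauss(0.0, 0.5) for t in truth]
y2_test = [t + rng.gauss(0.0, 0.7) for t in truth]
y2_retest = [t + rng.gauss(0.0, 0.7) for t in truth]
ratio = rho_star(y1_test, y1_retest, y2_test, y2_retest)
```

In data where the two questions tap different traits, `cov(y1_test, y2_test)` would fall short of the common-trait variance and the ratio would drift away from 1.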

6 Discussion

One of the key conclusions of this paper is that adjustments are needed to make survey results comparable across populations. In particular, when categorical variables are involved, analyses must account for differences in response category cut-points. There is considerable evidence that suggests that response category cut-points are different across countries. Therefore, until variation in cut-points is addressed, one must start from a presumption that results are not comparable across populations.

The problem of cross-population comparability also appears to apply within populations across different socio-economic and demographic groups. This has important implications for the measurement of inequality, which may be greater or smaller than measured before taking into account response category cut-point shifts. It also has critical implications for comparisons over time. Cut-points may systematically shift over time (e.g., due to rising income, education, and health norms) so long-term trends may be difficult to assess without correction.


7 Annex

Table 5: Estimation results: HOPIT

Variable        Coefficient   (Std. Err.)

Vignettes
Ivignette 2       -0.035       (0.146)
Ivignette 3       -4.033       (0.117)
Ivignette 4       -5.116       (0.122)
Ivignette 5       -5.341       (0.123)
Ivignette 6       -7.458       (0.175)
Ivignette 7       -7.643       (0.195)

Mean
Age 30-44         -0.488       (0.085)
Age 45-59         -0.715       (0.1)
Age 60+           -1.656       (0.113)
Male               0.174       (0.068)
1<Educ≤6           0.185       (0.115)
6<Educ≤11          0.332       (0.122)
Educ>11            0.521       (0.147)
Country B          0.996       (0.074)
Intercept         -2.985       (0.166)

log(s)             0.061       (0.043)

Table 6: Estimation results: HOPIT $\tau^1$

Entries are coefficients with standard errors in parentheses.

Variable       Main question      Auxiliary question 1   Auxiliary question 2
Age 30-44      -0.504 (0.046)     -0.384 (0.132)         -0.099 (0.113)
Age 45-59      -0.569 (0.054)     -0.544 (0.156)         -0.159 (0.134)
Age 60+        -1.282 (0.062)     -1.172 (0.170)         -0.768 (0.155)
Male            0.25 (0.037)       0.227 (0.106)          0.464 (0.092)
1<Educ≤6        0.079 (0.061)      0.056 (0.175)          0.249 (0.157)
6<Educ≤11       0.072 (0.065)      0.161 (0.185)          0.242 (0.166)
Educ>11         0.129 (0.079)      0.196 (0.227)          0.395 (0.197)
Country B       1.296 (0.041)      0.928 (0.113)          1.260 (0.099)
Intercept      -4.662 (0.134)     -4.312 (0.223)         -3.777 (0.202)

Table 7: Estimation results: HOPIT $\tau^2$

Entries are coefficients with standard errors in parentheses.

Variable       Main question      Auxiliary question 1   Auxiliary question 2
Age 30-44      -0.441 (0.048)     -0.356 (0.125)         -0.152 (0.112)
Age 45-59      -0.551 (0.056)     -0.402 (0.145)         -0.224 (0.133)
Age 60+        -1.283 (0.063)     -1.232 (0.164)         -0.845 (0.155)
Male            0.25 (0.038)       0.271 (0.100)          0.460 (0.092)
1<Educ≤6        0.053 (0.062)     -0.117 (0.165)          0.324 (0.157)
6<Educ≤11       0.059 (0.066)     -0.005 (0.175)          0.346 (0.165)
Educ>11         0.072 (0.081)      0.104 (0.212)          0.458 (0.196)
Country B       1.259 (0.043)      0.839 (0.107)          1.258 (0.098)
Intercept      -4.399 (0.134)     -3.922 (0.210)         -3.579 (0.201)

Table 8: Estimation results: HOPIT $\tau^3$

Entries are coefficients with standard errors in parentheses.

Variable       Main question      Auxiliary question 1   Auxiliary question 2
Age 30-44      -0.395 (0.051)     -0.271 (0.118)         -0.120 (0.113)
Age 45-59      -0.537 (0.059)     -0.388 (0.138)         -0.250 (0.136)
Age 60+        -1.160 (0.065)     -1.262 (0.158)         -0.883 (0.162)
Male            0.227 (0.040)      0.217 (0.095)          0.345 (0.093)
1<Educ≤6        0.089 (0.065)     -0.093 (0.159)          0.206 (0.160)
6<Educ≤11       0.079 (0.070)      0.031 (0.168)          0.259 (0.168)
Educ>11         0.136 (0.087)      0.073 (0.204)          0.367 (0.200)
Country B       1.252 (0.046)      0.836 (0.101)          1.235 (0.100)
Intercept      -4.074 (0.135)     -3.611 (0.203)         -2.945 (0.203)

Table 9: Estimation results: HOPIT $\tau^4$

Entries are coefficients with standard errors in parentheses.

Variable       Main question      Auxiliary question 1   Auxiliary question 2
Age 30-44      -0.371 (0.059)     -0.285 (0.115)         -0.097 (0.118)
Age 45-59      -0.526 (0.069)     -0.371 (0.135)         -0.221 (0.143)
Age 60+        -1.095 (0.074)     -1.263 (0.155)         -0.842 (0.178)
Male            0.188 (0.046)      0.290 (0.093)          0.371 (0.098)
1<Educ≤6        0.054 (0.076)      0.004 (0.157)          0.161 (0.170)
6<Educ≤11       0.090 (0.081)      0.081 (0.166)          0.118 (0.178)
Educ>11         0.199 (0.101)      0.127 (0.200)          0.339 (0.212)
Country B       1.223 (0.057)      0.822 (0.100)          1.263 (0.106)
Intercept      -3.364 (0.139)     -3.498 (0.201)         -2.461 (0.212)

Table 10: Estimation results: hierarchical partial credit

Variable        Coefficient   (Std. Err.)

Vignettes
Ivignette 2       -0.181       (0.349)
Ivignette 3       -4.975       (0.264)
Ivignette 4       -5.665       (0.265)
Ivignette 5       -5.809       (0.265)
Ivignette 6       -8.535       (0.362)
Ivignette 7       -8.814       (0.394)

Mean
Age 30-44         -0.29        (0.066)
Age 45-59         -0.461       (0.074)
Age 60+           -1.052       (0.079)
Male               0.096       (0.051)
1<Educ≤6           0.108       (0.082)
6<Educ≤11          0.198       (0.088)
Educ>11            0.333       (0.111)
Country B          0.563       (0.056)
Intercept         -4.327       (0.277)

Table 11: Estimation results: hierarchical partial credit $\delta^1$

Entries are coefficients with standard errors in parentheses.

Variable       Main question      Auxiliary question 1   Auxiliary question 2
Age 30-44      -0.708 (0.126)     -0.306 (0.326)          0.225 (0.215)
Age 45-59      -0.614 (0.151)     -0.910 (0.352)          0.222 (0.258)
Age 60+        -1.196 (0.176)     -0.515 (0.427)          0.022 (0.320)
Male            0.233 (0.100)     -0.034 (0.251)          0.339 (0.179)
1<Educ≤6        0.177 (0.161)      0.558 (0.363)         -0.161 (0.346)
6<Educ≤11       0.150 (0.171)      0.669 (0.395)         -0.322 (0.357)
Educ>11         0.394 (0.223)      0.429 (0.502)         -0.017 (0.419)
Country B       1.383 (0.110)      0.827 (0.266)          0.799 (0.188)
Intercept      -3.664 (0.316)     -3.961 (0.481)         -3.038 (0.446)

Table 12: Estimation results: hierarchical partial credit $\delta^2$

Entries are coefficients with standard errors in parentheses.

Variable       Main question      Auxiliary question 1   Auxiliary question 2
Age 30-44      -0.042 (0.158)     -0.477 (0.333)         -0.486 (0.235)
Age 45-59      -0.188 (0.187)     -0.037 (0.365)         -0.558 (0.287)
Age 60+        -0.841 (0.204)     -0.998 (0.466)         -1.111 (0.366)
Male            0.175 (0.123)      0.455 (0.260)          0.408 (0.198)
1<Educ≤6       -0.153 (0.199)     -0.451 (0.383)          0.649 (0.370)
6<Educ≤11      -0.082 (0.213)     -0.402 (0.416)          0.754 (0.383)
Educ>11        -0.330 (0.273)      0.258 (0.532)          0.693 (0.452)
Country B       0.396 (0.135)      0.300 (0.275)          0.688 (0.209)
Intercept      -5.569 (0.341)     -4.512 (0.497)         -5.606 (0.464)

Table 13: Estimation results : hierarchical partial credit δ3

Variable Coefficient (Std. Err.)

Main question

Age 30-44 -0.195 (0.134)Age 45-59 -0.331 (0.156)Age 60+ -0.529 (0.159)Male 0.143 (0.102)1<Educ≤6 0.195 (0.168)6<Educ≤11 0.076 (0.180)Educ>11 0.161 (0.224)Country B 0.775 (0.114)Intercept -5.62 (0.322)

Auxiliary question 1

Age 30-44 -0.014 (0.304)Age 45-59 -0.449 (0.350)Age 60+ -1.168 (0.417)Male -0.414 (0.244)1<Educ≤6 -0.504 (0.444)6<Educ≤11 -0.289 (0.468)Educ>11 -0.423 (0.572)Country B 0.556 (0.235)Intercept -3.342 (0.545)

Auxiliary question 2

Age 30-44 -0.089 (0.205)Age 45-59 -0.354 (0.253)Age 60+ -0.855 (0.329)Male -0.150 (0.175)1<Educ≤6 -0.044 (0.288)6<Educ≤11 0.289 (0.306)Educ>11 0.156 (0.359)Country B 0.512 (0.184)Intercept -4.00 (0.398)

Table 14: Estimation results: hierarchical partial credit δ4

Variable        Coefficient (Std. Err.)

Main question

Age 30-44      -0.167 (0.110)
Age 45-59      -0.286 (0.127)
Age 60+        -0.526 (0.133)
Male            0.044 (0.085)
1<Educ≤6       -0.069 (0.139)
6<Educ≤11      -0.002 (0.148)
Educ>11         0.101 (0.186)
Country B       0.665 (0.096)
Intercept      -5.130 (0.298)

Auxiliary question 1

Age 30-44      -0.086 (0.243)
Age 45-59      -0.005 (0.274)
Age 60+        -0.588 (0.321)
Male            0.536 (0.196)
1<Educ≤6        0.497 (0.379)
6<Educ≤11       0.396 (0.397)
Educ>11         0.365 (0.466)
Country B       0.343 (0.203)
Intercept      -7.336 (0.482)

Auxiliary question 2

Age 30-44       0.060 (0.198)
Age 45-59       0.023 (0.249)
Age 60+        -0.386 (0.347)
Male            0.308 (0.171)
1<Educ≤6        0.028 (0.291)
6<Educ≤11      -0.293 (0.307)
Educ>11         0.133 (0.359)
Country B       0.856 (0.181)
Intercept      -4.947 (0.397)
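
As a reading aid for the estimation tables, the following minimal sketch shows one common way to gauge the precision of a reported coefficient: divide it by its standard error and form an approximate 95% interval. It assumes the usual large-sample normal approximation for the estimates, which the tables themselves do not state. The function name `wald_summary` and the dictionary layout are illustrative choices; only the three "Country B" entries are copied from Table 14.

```python
# "Country B" coefficients and standard errors copied from Table 14 (δ4).
country_b = {
    "Main question": (0.665, 0.096),
    "Auxiliary question 1": (0.343, 0.203),
    "Auxiliary question 2": (0.856, 0.181),
}


def wald_summary(coef, se, z_crit=1.96):
    """Return the coefficient/SE ratio and an approximate 95% interval.

    Assumes the estimate is approximately normally distributed
    (an assumption of this sketch, not a statement from the tables).
    """
    z = coef / se
    interval = (coef - z_crit * se, coef + z_crit * se)
    return z, interval


# Summarise each question's Country B effect on the cut-point.
summaries = {name: wald_summary(c, s) for name, (c, s) in country_b.items()}

for name, (z, (lo, hi)) in summaries.items():
    print(f"{name}: z = {z:.2f}, approx. 95% interval = [{lo:.3f}, {hi:.3f}]")
```

Under this approximation, the Country B shift in δ4 is estimated precisely for the main question (ratio of about 6.9), while for auxiliary question 1 the ratio is about 1.7 and the approximate interval includes zero.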

References

[1] Andrich, D. (1988), Rasch Models for Measurement, Newbury Park, California: Sage Publications.

[2] Bjorner, J.B., and J.E. Ware (1998), “Using Modern Psychometric Methods to Measure Health Outcomes,” Medical Outcomes Trust Monitor, 3:2-3.

[3] Chamberlain, G. (1980), “Analysis of Covariance with Qualitative Data,” Review of Economic Studies 47:225-238.

[4] Clark, A.E. (1997), “Job Satisfaction and Gender: Why Are Women So Happy at Work?” Labour Economics 4:341-372.

[5] Frechette, G.R. (2001), “Random-Effects Ordered Probit,” Stata Technical Bulletin 59:23-26.

[6] Groot, W., and H.M. van den Brink (1999), “Job Satisfaction and Preference Drift,” Economics Letters 63:363-367.

[7] Groot, W. (2000), “Adaptation and Scale of Reference Bias in Self-Assessments of Quality of Life,” Journal of Health Economics 19:403-420.

[8] Kelderman, H. (1984), “Loglinear Rasch Model Tests,” Psychometrika 49:223-245.

[9] Keller, S.D., J.E. Ware, B. Gandek et al. (1998), “Testing the Equivalence of Translations of Widely Used Response Choice Labels,” Journal of Clinical Epidemiology 51(11):933-944.

[10] Kerkhofs, M., and M. Lindeboom (1995), “Subjective Health Measures and State Dependent Reporting Errors,” Health Economics 4:221-235.

[11] Long, J.S. (1997), Regression Models for Categorical and Limited Dependent Variables, Thousand Oaks, California: Sage Publications.

[12] Long, J.S., and J. Freese (2001), Regression Models for Categorical Dependent Variables using STATA, College Station, Texas: STATA Press.

[13] Ludlow, L.H., and S.M. Haley (1995), “Rasch Model Logits: Interpretation, Use, and Transformation,” Educational and Psychological Measurement 55(6):967-975.

[14] MacIntosh, R. (1998), “Global Attitude Measurement: An Assessment of the World Values Survey Postmaterialism Scale,” American Sociological Review 63(1):452-464.

[15] Martin, E.A., P.C. Campanelli, and R.E. Fay (1991), “An Application of Rasch Analysis to Questionnaire Design: Using Vignettes to Study the Meaning of Work in the Current Population Survey,” The Statistician 40:265-276.

[16] Masters, G.N. (1982), “A Rasch Model for Partial Credit Scoring,” Psychometrika 47(2):149-174.

[17] Masters, G.N. (1985), “A Comparison of Latent Trait and Latent Class Analyses of Likert-Type Data,” Psychometrika 50(1):69-82.

[18] Masters, G.N., and B.D. Wright (1997), “The Partial Credit Model,” in van der Linden, W. and R.K. Hambleton (eds.), Handbook of Modern Item Response Theory, New York: Springer-Verlag.

[19] Murray, C.J.L. (1996), “Epidemiology and Morbidity Transitions in India,” in Das-Gupta, M., L.C. Chen, and T.N. Krishnan (eds.), Health, Poverty and Development in India, Delhi: Oxford University Press.

[20] Murray, C.J.L., A. Tandon, J. Salomon, C.D. Mathers, and R. Sadana (2001), “Cross-Population Comparability of Evidence for Health Policy,” Global Programme on Evidence for Health Policy Discussion Paper, Geneva: World Health Organization.

[21] Powers, D.A., and Y. Xie (2000), Statistical Methods for Categorical Data Analysis, San Diego, California: Academic Press.

[22] Rabin, R., and F. deCharro (2001), “EQ-5D: a measure of health status from the EuroQol Group,” Annals of Medicine 33:337-343.

[23] Sadana, R., C.D. Mathers, A.D. Lopez, C.J.L. Murray, and K. Iburg (2001), “Cross-Population Comparability,” in Murray, C.J.L. and J. Frenk (eds.), Health System Performance: Concepts, Measurement, and Determinants, Geneva: World Health Organization.

[24] Salomon, J.A., A. Tandon, C.J.L. Murray (2001), “Using Vignettes to Improve Cross-Population Comparability of Health Surveys: Concepts, Design and Evaluation Techniques,” Global Programme on Evidence for Health Policy Discussion Paper No. 41, Geneva: World Health Organization.

[25] Samejima, F. (1997), “Graded Response Model,” in van der Linden, W. and R.K. Hambleton (eds.), Handbook of Modern Item Response Theory, New York: Springer-Verlag.

[26] Tandon, A., S. Chatterji, B. Ustun, J.A. Salomon, and C.J.L. Murray (2001), “Cross-Validation of Cut-Point Estimation Using Measured Tests and Vignettes: The Case of Vision,” Global Programme on Evidence for Health Policy Discussion Paper, Geneva: World Health Organization.

[27] Tjur, T. (1982), “A Connection Between Rasch’s Item Analysis Model and a Multiplicative Poisson Model,” Scandinavian Journal of Statistics 9:23-30.

[28] Ustun, T.B., S. Chatterji, M. Villanueva et al. (2001), “WHO Multi-Country Household Survey Study on Health and Responsiveness 2000-2001,” Global Programme on Evidence for Health Policy Working Paper, Geneva: World Health Organization.

[29] van der Linden, W., and R.K. Hambleton (eds.) (1997), Handbook of Modern Item Response Theory, New York: Springer-Verlag.

[30] von Davier, M. (2001), WINMIRA User Manual, Kiel: IPN.

[31] Ware, J.E., and B. Gandek (1998), “Overview of the SF-36 Health Survey and the International Quality of Life Assessment (IQOLA) Project,” Journal of Clinical Epidemiology 51(11):903-912.

[32] Wright, B.D., and M. Mok (2000), “Rasch Models Overview,” Journal of Applied Measurement 1(1):83-106.

[33] Wright, B.D., and G.N. Masters (1982), Rating Scale Analysis: Rasch Measurement, Chicago: MESA Press.
