Impact evaluation of multiple overlapping programs under a conditional independence assumption


Research in Economics 63 (2009) 27–54
www.elsevier.com/locate/rie

Impact evaluation of multiple overlapping programs under a conditional independence assumption

Nguyen Viet Cuong∗

Faculty of Trade Economics, National Economics University, 207 Giai Phong Street, Hanoi, Post code: 04, Viet Nam

Received 26 July 2007; accepted 5 October 2008

Abstract

Under the assumption of conditional independence between potential outcomes and program assignment, program impacts measured by the Average Treatment Effect (ATE) and the Average Treatment Effect on the Treated (ATT) can be identified and estimated using cross-section regression or propensity score matching (PSM). The traditional impact literature often deals with the impact evaluation of a single program. In reality, one can participate in several programs simultaneously and the programs may be correlated. This paper discusses cross-section regression and PSM methods in this general context. It is shown that under the PSM method, the impact of a program of interest can be measured as a weighted average of program impacts on groups with different program statuses. Estimation of impacts of multiple overlapping programs is illustrated using Monte Carlo simulation and an empirical example of impact measurement of international and internal remittances in Vietnam.

© 2008 University of Venice. Published by Elsevier Ltd. All rights reserved.

Keywords: Treatment effect; Impact evaluation; Multiple programs; Matching; Propensity score

1. Introduction

The main objective of impact evaluation is to assess the extent to which a program has changed outcomes of subjects.¹ The average impact of a program on a group of subjects is defined as the difference between their outcome in the status of the program and their outcome in the status of no-program. However, for each subject, we are not able to observe the two potential outcomes at the same time. For example, for a participant in a program, we can observe her outcome in the presence of the program, but we cannot observe her outcome if she had not participated in the program, i.e., the outcome in the absence of the program. This missing data problem can be solved if the assumption of conditional independence of treatment and potential outcomes holds (Rubin, 1977). Under this assumption, the program impact can be estimated by traditional cross-section regression and matching methods. The idea of the matching method is to compare the outcomes of participants and non-participants who have a similar distribution of conditioning pre-treatment variables.

Matching by conditioning variables becomes difficult when there are a large number of these variables. Rosenbaum and Rubin (1983) proved that the program impact can be identified conditional on the probability of being assigned to the program (the so-called propensity score). Thus, multidimensional matching can be achieved by matching based on the propensity score instead of the conditioning variables.

∗ Tel.: +84 904159258; fax: +84 4 38693369. E-mail addresses: c [email protected], [email protected].

1 In the literature on impact evaluation, the broader term “treatment” is sometimes used instead of program/project to refer to an intervention whose impact is evaluated.

1090-9443/$ - see front matter © 2008 University of Venice. Published by Elsevier Ltd. All rights reserved. doi:10.1016/j.rie.2008.10.001

N.V. Cuong / Research in Economics 63 (2009) 27–54

The literature on program impact evaluation often neglects other programs that simultaneously affect participants and non-participants of the program in question. Imbens (1999) and Lechner (2001) extend the method of propensity score matching (PSM) to multiple mutually exclusive programs. Frolich (2002) discusses different impact evaluation methods, including those based on the conditional independence assumption (CIA), in a similar context. However, in reality the programs are often overlapping. Some people can join several programs at the same time. For example, in the evaluation of a micro-credit program that is provided by a bank, the participants and non-participants in the program can receive credit from other sources such as private lenders, relatives and other credit institutions. Without taking into account the impacts of the other programs, the estimation of the impact of the program of interest can be biased.

This paper extends the CIA-based methods, namely cross-section regression and PSM, to this more general context in which people may participate in several programs simultaneously. It is shown that the impact of a particular program can be identified and estimated using the methods of cross-section regression and PSM. Under the matching method, the impact of a program can be measured as a weighted average of impacts of the program on groups with various program statuses. Evidence from Monte Carlo simulation shows that this matching method can lead to lower mean-squared-error (MSE) than matching that simply uses variables of participation in other programs as conditioning variables.

The paper is organized as follows. The second section discusses the methods of cross-section regression and PSM in impact evaluation of a single program. The third section extends the methods to the case of multiple overlapping programs. Simulation results are presented in the fourth section. The fifth section illustrates estimation of impacts of international and internal remittances on per capita expenditure using the Vietnam Household Living Standard Survey 2004. Finally, the sixth section concludes.

2. Impact evaluation of a single program

2.1. Problems and parameters of interest

The main objective of impact evaluation of a program is to assess the extent to which the program has changed outcomes of subjects. To make the definition of “impact” more explicit, suppose that some people in population P are assigned to a program, and denote D as a binary variable for participation in the program, i.e. D equals 1 if one participates in the program, and D equals 0 otherwise. Further let Y0 and Y1 denote the potential outcomes corresponding to the states of no-program and program, respectively.²

The impact of the program on the outcome of person i is measured by the following difference:

∆i = Y1i − Y0i . (2.1)

This is the difference between the outcome of the person when she participates in the program and the potential outcome of that person when she does not participate in the program. The problem is that we cannot observe both outcomes in Eq. (2.1) for one person. The unobservable outcome is called the counterfactual.

It is almost impossible to estimate the program impact for each person (Heckman et al., 1999), since we never know the counterfactual outcome. However, an average program impact can be estimated for a group of subjects. The two most popular parameters are the Average Treatment Effect (ATE) and the Average Treatment Effect on the Treated (ATT).³

ATE is the expected impact of the program on a person who is randomly selected and assigned into the program. It is defined as⁴:

ATE = E(∆) = E(Y1 − Y0) = E(Y1) − E(Y0). (2.2)

2 Y0 and Y1 can be vectors of outcomes, but for simplicity let us consider a single outcome of interest.

3 There are other parameters such as the local average treatment effect, the marginal treatment effect, or even the effect of “treatment on non-treated”, which measures what impact the program would have on the non-participants if they had participated in the program, etc.

4 For simplicity, subscript i is dropped in some formulas.


ATT is the expected impact of the program on the actual participants:

ATT = E(∆|D = 1) = E(Y1 − Y0|D = 1) = E(Y1|D = 1) − E(Y0|D = 1). (2.3)

More generally, we can allow these parameters to vary across observed variables X, since one might be interested in program impact on certain groups that are specified by the X variables:

ATE(X) = E(∆|X) = E(Y1|X) − E(Y0|X), (2.4)

and

ATT(X) = E(∆|X, D = 1) = E(Y1|X, D = 1) − E(Y0|X, D = 1). (2.5)

Estimation of ATE and ATT is not straightforward, since there are some components that we cannot observe directly. Eq. (2.2) can be rewritten as:

ATE = E(Y1) − E(Y0) = [E(Y1|D = 1) − E(Y0|D = 1)] Pr(D = 1)
      + [E(Y1|D = 0) − E(Y0|D = 0)] Pr(D = 0), (2.6)

where Pr(D = 1) and Pr(D = 0) are the proportions of participants and non-participants in the program, respectively. The first term in (2.6) is the ATT multiplied by the proportion of participants, while the second term is the Average Treatment Effect on the Non-Treated (ATNT) multiplied by the proportion of non-participants. The ATNT measures the effect that the non-participants would have gained if they had participated in the program:

ATNT = E(Y1|D = 0) − E(Y0|D = 0). (2.7)

The problem in measuring ATE and ATT is that the counterfactual terms E(Y1|D = 0) and E(Y0|D = 1) are not observed and cannot be estimated directly. Different methods have been devised to estimate ATE and ATT under certain assumptions on how the program is assigned to people in the population and how the outcomes are determined. This paper will discuss methods of regression and matching which rely on the conditional independence assumption (CIA).
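The decomposition in (2.6) can be checked on simulated data, where both potential outcomes are known by construction. The following is a minimal sketch, not part of the paper: the data-generating process, the coefficient values and all variable names are assumptions chosen only for illustration. It also shows why a naive comparison of observed means differs from ATT when participation depends on X.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical data-generating process (an assumption for illustration only).
x = rng.normal(size=n)
y0 = 1.0 + 0.5 * x + rng.normal(size=n)        # potential outcome without the program
y1 = y0 + 2.0 + 1.0 * x                        # potential outcome with the program
d = (x + rng.normal(size=n) > 0).astype(int)   # participation depends on x

# In a simulation both potential outcomes are visible, so the parameters are direct means.
ate = np.mean(y1 - y0)
att = np.mean((y1 - y0)[d == 1])
atnt = np.mean((y1 - y0)[d == 0])
p1 = d.mean()

# Eq. (2.6): ATE = ATT * Pr(D = 1) + ATNT * Pr(D = 0)
decomposed = att * p1 + atnt * (1 - p1)

# With real data only one outcome per person is observed. The naive difference
# in observed means mixes the program impact with selection on x.
y = np.where(d == 1, y1, y0)
naive = y[d == 1].mean() - y[d == 0].mean()

print(f"ATE={ate:.3f}  ATT={att:.3f}  ATNT={atnt:.3f}  "
      f"decomposed={decomposed:.3f}  naive={naive:.3f}")
```

In this assumed design participants have higher X on average, so ATT exceeds ATNT, and the naive difference overstates ATT because E(Y0|D = 1) is larger than E(Y0|D = 0).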

2.2. Impact evaluation under the conditional independence assumption

A popular way to discuss program impact evaluation is to use a model of potential outcome equations (Heckman et al., 1999; Heckman, 2005), in which the potential outcomes Y0 and Y1 are expressed as functions of conditioning variables, X:

Y0 = α0 + Xβ0 + ε0, (2.8)

Y1 = α1 + Xβ1 + ε1. (2.9)

In fact, Y0 and Y1 can be any functions of X, not necessarily linearly or parametrically specified, and all identification strategies presented in this paper are still valid when Y0 and Y1 are non-linear functions of X. The assumption of a linear function is made to simplify the description of the regression methods. For the matching method, no assumption is imposed on the functional form of the outcomes.

Substituting (2.8) and (2.9) into (2.4) and (2.5), we get the conditional parameters, ATE(X) and ATT(X):

ATE(X) = (α1 − α0) + X(β1 − β0) + E(ε1 − ε0|X), (2.10)

ATT(X) = (α1 − α0) + X(β1 − β0) + E(ε1 − ε0|X, D = 1). (2.11)

Without additional assumptions, (2.10) and (2.11) cannot be identified since they contain the unobserved terms, E(ε1 − ε0|X) and E(ε1 − ε0|X, D = 1). The key assumption to identify the parameters using cross-section regressions or matching methods is the conditional independence assumption.

Assumption 2.1 (CIA).

Y0, Y1 ⊥ D|X (A.2.1)


The assumption states that once conditioned on the variables X, the potential outcomes Y0 and Y1 are independent of the program assignment. In Rosenbaum and Rubin (1983), this assumption is called ignorability of treatment or conditional independence.⁵

To estimate the conditional parameters using the regression method, it is required that X be exogenous in the potential outcome equations:

Assumption 2.2.

E(ε0|X) = E(ε1|X) = 0. (A.2.2)

Under assumptions (A.2.1) and (A.2.2), the unobserved terms in the parameters ATE(X) and ATT(X) vanish:

E(ε1 − ε0|X, D = 1) = E(ε1 − ε0|X) = E(ε1|X)− E(ε0|X) = 0. (2.12)

The two parameters ATE(X) and ATT(X) are the same, and they can be estimated without bias if the coefficients α0, α1, β0, β1 are estimated. It should be noted that the observed outcome equation can be written using a switching model (Quandt, 1972) as follows:

Y = DY1 + (1− D)Y0 = α0 + Xβ0 + D [(α1 − α0)+ X (β1 − β0)]+ [(ε1 − ε0)D + ε0] . (2.13)

The coefficients in (2.13) can be estimated without bias using OLS, since the error term has the conventional property

E [D(ε1 − ε0)+ ε0|X, D] = DE(ε1 − ε0|X, D)+ E(ε0|X, D) = 0. (2.14)
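As an illustration of how (2.13) can be taken to data, the sketch below regresses the observed outcome on a constant, X, D and D·X by OLS, and then averages the fitted conditional impact as in (2.18) and (2.19). It is only a sketch under assumed conditions: the simulated design, the parameter values and the helper name `ate_x` are hypothetical, and assignment is made to depend on X alone so that (A.2.1) and (A.2.2) hold by construction.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Hypothetical parameters of Eqs. (2.8)-(2.9); the values are assumptions.
a0, b0, a1, b1 = 1.0, 0.5, 3.0, 1.5
x = rng.normal(size=n)
# Assignment depends on x only, so the CIA holds by construction.
d = (rng.uniform(size=n) < 1 / (1 + np.exp(-x))).astype(float)
y0 = a0 + b0 * x + rng.normal(size=n)
y1 = a1 + b1 * x + rng.normal(size=n)
y = d * y1 + (1 - d) * y0                      # observed outcome, Eq. (2.13)

# Regressors 1, X, D, D*X carry the coefficients a0, b0, (a1 - a0), (b1 - b0).
Z = np.column_stack([np.ones(n), x, d, d * x])
coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
alpha_diff, beta_diff = coef[2], coef[3]

def ate_x(x_value):
    """Estimated ATE(X) = ATT(X) at the given covariate value(s)."""
    return alpha_diff + x_value * beta_diff

ate_hat = np.mean(ate_x(x))             # Eq. (2.18): average over all units
att_hat = np.mean(ate_x(x[d == 1]))     # Eq. (2.19): average over participants

print(f"alpha1-alpha0={alpha_diff:.3f}  beta1-beta0={beta_diff:.3f}  "
      f"ATE={ate_hat:.3f}  ATT={att_hat:.3f}")
```

Although ATE(X) and ATT(X) coincide at every X, the unconditional ATE and ATT differ here because participants have a different distribution of X from the population as a whole.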

Matching is a non-parametric method to estimate ATE(X) and ATT(X) under the CIA. We have:

ATE(X) = E(Y1|X) − E(Y0|X) = E(Y1|X, D = 1) − E(Y0|X, D = 0), (2.15)

ATT(X) = E(Y1|X, D = 1) − E(Y0|X, D = 1) = E(Y1|X, D = 1) − E(Y0|X, D = 0). (2.16)

As a result, ATE(X) and ATT(X) are the same, and they can be estimated by comparing the outcome of the participants and the outcome of a so-called comparison group, which comprises subjects who do not participate in the program but have variables X identical to those of the participants. Thus the matching method assumes the existence of such a comparison group. This assumption is called common support. Let p(X) denote the propensity score, the conditional probability of participating in the program, given the pre-treatment variables X. Then, the common support assumption can be stated formally as follows:

Assumption 2.3.

0 < p(X) = P(D = 1|X) < 1. (A.2.3)

Compared with parametric regression, matching relaxes the assumption of exogeneity of X (A.2.2) at the cost of the assumption of common support (A.2.3).

The difficulty in the matching method is how to find matched non-participants for the participants when there are many variables X. A popular solution is proposed by Rosenbaum and Rubin (1983), who show that if the potential outcomes are independent of the program assignment given the variables X, then they are also independent of the program assignment given the propensity score.⁶

Proposition 2.1 (Rosenbaum and Rubin, 1983).

Y0, Y1 ⊥ D|X ⇒ (Y0, Y1) ⊥ D|p(X),

where p(X) = Pr(D = 1|X) = E(D|X).

5 The assumption is sometimes stated in a weaker version called the conditional mean independence assumption, i.e., E(Y0|X, D) = E(Y0|X) and E(Y1|X, D) = E(Y1|X). In addition, if the parameter of interest is ATT, the required assumption is E(Y0|X, D) = E(Y0|X) (or we only need Y0 ⊥ D|X instead of Y0, Y1 ⊥ D|X). In the paper we often mention the CIA, since the conditional mean independence assumptions, which involve the expectation terms, are a bit abstract and difficult to interpret.

6 Other matching methods are subclassification (Cochran and Chambers, 1965; Cochran, 1968), and covariate matching (Rubin, 1979, 1980).


Using this proposition, ATE(X) and ATT(X) are rewritten as:

ATE(X) = ATT(X) = E(Y1|p(X), D = 1) − E(Y0|p(X), D = 0). (2.17)

Thus non-participants are matched with the participants based on the propensity score. Once the comparison group is constructed, the parameters of program impact can be estimated by comparing the outcomes of the comparison and treatment groups.

Finally, the unconditional parameters, ATE and ATT, can be identified and estimated by simply taking the expectation of the conditional parameters, ATE(X) and ATT(X), since:

$$\mathrm{ATE} = E_X\big(\mathrm{ATE}(X)\big) = \int_X \mathrm{ATE}(X)\, dF(X), \tag{2.18}$$

$$\mathrm{ATT} = E_{X|D=1}\big(\mathrm{ATT}(X)\big) = \int_{X|D=1} \mathrm{ATT}(X)\, dF(X|D=1), \tag{2.19}$$

where F(.) and F(.|D = 1) are the distribution functions of X and X|D = 1, respectively.

3. Impact evaluation in multiple correlated programs under the conditional independence assumption

For illustration of the ideas, this section discusses impact evaluation in the case of two programs under the CIA. Estimation of program impacts in the case of multiple programs is very similar and is presented in Appendix B.

3.1. Parameters of interest

Suppose that there are two programs that are assigned to some people in the population. Denote D as a vector variable of program participation for a person. D contains two binary elements, d1 and d2, i.e.,

$$D = \begin{pmatrix} d_1 \\ d_2 \end{pmatrix},$$

where d1 = 1 if the person receives program 1, and d1 = 0 otherwise; similarly d2 = 1 if the person receives program 2, and d2 = 0 otherwise. As a result, the set of the potential treatments has 4 values:

$$\Omega_D = \left\{ \begin{pmatrix} 1 \\ 1 \end{pmatrix};\ \begin{pmatrix} 1 \\ 0 \end{pmatrix};\ \begin{pmatrix} 0 \\ 1 \end{pmatrix};\ \begin{pmatrix} 0 \\ 0 \end{pmatrix} \right\}. \tag{3.1}$$

Further, let Y denote the observed value of an outcome of interest. This variable equals one of the potential outcomes in $\Omega_{Y^P_D} = \{Y_{11}; Y_{10}; Y_{01}; Y_{00}\}$. These potential outcomes correspond to the values of the participation variable D. The potential outcome $Y^P_D$ can be written as a function of observed variables X and unobserved variables ε:

Y11 = α11 + Xβ11 + ε11,

Y10 = α10 + Xβ10 + ε10,

Y01 = α01 + Xβ01 + ε01,

Y00 = α00 + Xβ00 + ε00.⁷

The observed outcome can be written in terms of the potential outcomes as follows:

Y = d1d2Y11 + d1(1− d2)Y10 + (1− d1)d2Y01 + (1− d1)(1− d2)Y00

= d1d2(Y11 − Y10 − Y01 + Y00) + d1(Y10 − Y00) + d2(Y01 − Y00) + Y00

= d1d2(α12 + Xβ12 + ε12)+ d1(α1 + Xβ1 + ε1)+ d2(α2 + Xβ2 + ε2)+ α0 + Xβ0 + ε0, (3.2)

7 In some equations, the superscript “P” is dropped for simplicity.


where:

α0 = α00,

α1 = α10 − α00,

α2 = α01 − α00,

α12 = α11 − α10 − α01 + α00,

β0 = β00,

β1 = β10 − β00,

β2 = β01 − β00,

β12 = β11 − β10 − β01 + β00,

ε0 = ε00,

ε1 = ε10 − ε00,

ε2 = ε01 − ε00,

ε12 = ε11 − ε10 − ε01 + ε00.

This notation has two advantages. Firstly, the subscripts indicate which program variables multiply each term. For example, α12 is the linear combination of α parameters that is multiplied by d1d2, while α1 is the linear combination of α parameters that is multiplied by d1. Secondly, it allows for simple algebra when there are more than two programs (see Appendix B), since the parameters are related as follows:

α1 = α10 − α00 = α10 − α0,

α2 = α01 − α00 = α01 − α0,

α12 = α11 − α10 − α01 + α00 = α11 − α1 − α2 − α0.

We will focus on the impact of program d1. The discussion of program d2 is the same. The impact of program d1 on a person is equal to:

∆i (d1) = Yd1=1,d2,X,ε − Yd1=0,d2,X,ε = d2(α12 + Xβ12 + ε12)+ (α1 + Xβ1 + ε1). (3.3)
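The algebra behind (3.2) and (3.3) can be verified mechanically for one person. The sketch below uses made-up potential-outcome values (an assumption for illustration; the function names are hypothetical as well) and checks that the switching form and the interaction form of (3.2) return the same observed outcome for every program status, and that (3.3) gives Y10 − Y00 when d2 = 0 and Y11 − Y01 when d2 = 1.

```python
import itertools

# Hypothetical potential outcomes of one person, keyed by (d1, d2). Assumed numbers.
Y = {(1, 1): 9.0, (1, 0): 6.0, (0, 1): 4.0, (0, 0): 3.0}

def switching_form(d1, d2):
    """First line of Eq. (3.2): each status selects its own potential outcome."""
    return (d1 * d2 * Y[1, 1] + d1 * (1 - d2) * Y[1, 0]
            + (1 - d1) * d2 * Y[0, 1] + (1 - d1) * (1 - d2) * Y[0, 0])

def interaction_form(d1, d2):
    """Second line of Eq. (3.2): baseline, two main terms and one interaction."""
    return (d1 * d2 * (Y[1, 1] - Y[1, 0] - Y[0, 1] + Y[0, 0])
            + d1 * (Y[1, 0] - Y[0, 0]) + d2 * (Y[0, 1] - Y[0, 0]) + Y[0, 0])

def impact_d1(d2):
    """Eq. (3.3): impact of program d1 for a person with the given d2 status."""
    return d2 * (Y[1, 1] - Y[1, 0] - Y[0, 1] + Y[0, 0]) + (Y[1, 0] - Y[0, 0])

for d1, d2 in itertools.product([0, 1], repeat=2):
    assert switching_form(d1, d2) == interaction_form(d1, d2) == Y[d1, d2]

print(impact_d1(0), impact_d1(1))
```

The impact of d1 therefore depends on the person's status in d2 whenever the interaction term is non-zero, which is why participation in d2 cannot be ignored in general.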

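The conditional parameters ATE(X) and ATT(X) for d1 are expressed as follows:

$$\begin{aligned} \mathrm{ATE}_1(X) &= E[\Delta_i(d_1)\,|\,X] = E\left(Y_{d_1=1,d_2,X,\varepsilon} - Y_{d_1=0,d_2,X,\varepsilon}\,|\,X\right) \\ &= \left[E(d_2|X)(\alpha_{12} + X\beta_{12}) + \alpha_1 + X\beta_1\right] + \left[E(d_2\varepsilon_{12}|X) + E(\varepsilon_1|X)\right], \end{aligned} \tag{3.4}$$

$$\begin{aligned} \mathrm{ATT}_1(X) &= E[\Delta_i(d_1)\,|\,X, d_1 = 1] = E\left(Y_{d_1=1,d_2,X,\varepsilon} - Y_{d_1=0,d_2,X,\varepsilon}\,|\,X, d_1 = 1\right) \\ &= \left[E(d_2|X, d_1 = 1)(\alpha_{12} + X\beta_{12}) + \alpha_1 + X\beta_1\right] + \left[E(d_2\varepsilon_{12}|X, d_1 = 1) + E(\varepsilon_1|X, d_1 = 1)\right]. \end{aligned} \tag{3.5}$$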

Similar to the case of a single program, ATE1(X) and ATT1(X) are not identified without additional assumptions, since (3.4) and (3.5) contain unobserved components. It should be noted that there are two possibilities for correlation between d1 and d2. In the first case, d1 is correlated with d2 but once conditional on X, they are independent of each other, i.e.,

d1 ⊥ d2|X. (3.6)

In this case the impact of program d1 can be estimated as in the case of a single program, i.e., program d2 can be ignored provided that all variables X are controlled for.

In the second case, there is correlation between d1 and d2 even after conditioning on X. This can be the case if participation in one program affects participation in the other program. For example, people getting vocational training might be more eager to borrow micro-credit given their characteristics X.


To identify the program impacts, the CIA is expressed as follows:

Assumption 3.1.

$$Y_{11}, Y_{10}, Y_{01}, Y_{00} \perp D \mid X, \quad \text{where } D = \begin{pmatrix} d_1 \\ d_2 \end{pmatrix}. \tag{A.3.1}$$

We allow correlation between d1 and d2 given X, thus the identification assumption that the methods rely on is assumption (A.3.1). Under assumption (A.3.1), the program impacts can be identified parametrically using the OLS regression method, and non-parametrically using the matching method. The unconditional parameters, ATE and ATT, can then be identified and estimated due to (2.18) and (2.19).

3.2. Linear regression method

Although regression can be estimated in any form, e.g., linear or non-linear, or even non-parametric, for simplicity this section shows how to estimate the program impacts parametrically using the linear regression method.

Similar to the case of a single program, we need an assumption on exogeneity of X in the potential outcome equations, i.e.:

Assumption 3.2.

E (ε11|X) = E (ε10|X) = E (ε01|X) = E (ε00|X) = 0. (A.3.2)

Proposition 3.1. Under the assumptions (A.3.1) and (A.3.2), the linear regression produces unbiased estimators of all the conditional and unconditional parameters, ATE1(X), ATT1(X), ATE1 and ATT1.

The proof is very simple. Firstly, the program impact parameters are identified under assumptions (A.3.1) and (A.3.2), since:

[E(d2ε12|X)+ E(ε1|X)] = 0, (3.7)

[E(d2ε12|X, d1 = 1) + E(ε1|X, d1 = 1)] = 0.⁸ (3.8)

Secondly, parameters α12, β12, α1, β1 are estimated unbiasedly from Eq. (3.2). Rewrite (3.2) as:

Y = α0 + Xβ0 + d1d2(α12 + Xβ12)+ d1(α1 + Xβ1)+ d2(α2 + Xβ2)+ (d1d2ε12 + d1ε1 + d2ε2 + ε0) (3.9)

in which the error term has the following conventional property:

E(d1d2ε12 + d1ε1 + d2ε2 + ε0|X, d2, d1) = d1d2 E(ε12|X, d2, d1)+ d1 E(ε1|X, d2, d1)

+ d2 E(ε2|X, d2, d1)+ E(ε0|X, d2, d1)

= 0.

Thus unbiased estimators of AT E1(X) and AT T 1(X) are:

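$$\widehat{\mathrm{ATE}}_1(X) = \left[(\hat{\alpha}_{12} + X\hat{\beta}_{12})\hat{E}(d_2|X) + \hat{\alpha}_1 + X\hat{\beta}_1\right], \tag{3.10}$$

$$\widehat{\mathrm{ATT}}_1(X) = \left[(\hat{\alpha}_{12} + X\hat{\beta}_{12})\hat{E}(d_2|X, d_1 = 1) + \hat{\alpha}_1 + X\hat{\beta}_1\right], \tag{3.11}$$

where the estimate of E(d2|X) can be the sample mean of the variable d2 for given X. The unconditional parameters are estimated using (2.18) and (2.19).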
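A minimal sketch of the regression estimators (3.10) and (3.11) follows. It is not the paper's simulation design: the binary covariate, the participation probabilities, the coefficient values and the helper names `ate1` and `att1` are assumptions made for illustration, and a single error term is used for all four potential outcomes, so that ε12 = ε1 = ε2 = 0 and the CIA holds by construction. Participation in d2 is made more likely when d1 = 1, so condition (3.6) fails.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Hypothetical design (assumption): binary X; d2 is more likely when d1 = 1, even given X.
x = rng.integers(0, 2, size=n).astype(float)
d1 = (rng.uniform(size=n) < 0.3 + 0.3 * x).astype(float)
d2 = (rng.uniform(size=n) < 0.2 + 0.2 * x + 0.4 * d1).astype(float)

# Assumed true coefficients of the interaction form, Eq. (3.9).
a0, b0, a1, b1, a2, b2, a12, b12 = 1.0, 0.5, 2.0, 1.0, 1.0, 0.0, 1.5, 0.5
eps = rng.normal(size=n)    # independent of (d1, d2) given X
y = (a0 + b0 * x + d1 * d2 * (a12 + b12 * x) + d1 * (a1 + b1 * x)
     + d2 * (a2 + b2 * x) + eps)

# OLS on Eq. (3.9): columns 1, X, d1d2, d1d2*X, d1, d1*X, d2, d2*X.
Z = np.column_stack([np.ones(n), x, d1 * d2, d1 * d2 * x, d1, d1 * x, d2, d2 * x])
c, *_ = np.linalg.lstsq(Z, y, rcond=None)
a12_h, b12_h, a1_h, b1_h = c[2], c[3], c[4], c[5]

def ate1(xv):
    """Eq. (3.10), with E(d2|X) replaced by the sample mean of d2 at X = xv."""
    e_d2 = d2[x == xv].mean()
    return (a12_h + xv * b12_h) * e_d2 + a1_h + xv * b1_h

def att1(xv):
    """Eq. (3.11), with E(d2|X, d1 = 1) replaced by the sample mean among d1 = 1."""
    e_d2 = d2[(x == xv) & (d1 == 1)].mean()
    return (a12_h + xv * b12_h) * e_d2 + a1_h + xv * b1_h

for xv in (0.0, 1.0):
    print(f"X={xv:.0f}: ATE1={ate1(xv):.3f}  ATT1={att1(xv):.3f}")
```

In this assumed design the population values are ATE1(0) = 2.48, ATT1(0) = 2.90, ATE1(1) = 4.28 and ATT1(1) = 4.60, so the two conditional parameters differ at each X because d1 and d2 remain correlated given X.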

8 The proofs of (3.7) and (3.8) are presented in Appendix A.

There are two points that should be noted. Firstly, Eq. (3.9) allows for the overlap between participation in program d1 and participation in program d2. If the two programs are mutually exclusive, d1d2 will be equal to zero. Secondly, unlike in the case of a single program, ATE1(X) is not necessarily equal to ATT1(X). These two parameters are the same if the following equation holds:

E(d2|X) = E(d2|X, d1 = 1). (3.12)

(3.12) holds if condition (3.6) is satisfied, i.e., d1 and d2 are independent conditional on X .
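The role of condition (3.12) can be made explicit with a short derivation. It uses only (3.4), (3.5), (3.7) and (3.8), plus the law of total expectation; nothing beyond the symbols already defined is assumed. Subtracting (3.4) from (3.5) after the error terms vanish gives

$$\mathrm{ATT}_1(X) - \mathrm{ATE}_1(X) = \left[E(d_2|X, d_1 = 1) - E(d_2|X)\right](\alpha_{12} + X\beta_{12}).$$

Since $E(d_2|X) = E(d_2|X, d_1 = 1)\Pr(d_1 = 1|X) + E(d_2|X, d_1 = 0)\Pr(d_1 = 0|X)$, this can be rewritten as

$$\mathrm{ATT}_1(X) - \mathrm{ATE}_1(X) = \Pr(d_1 = 0|X)\left[E(d_2|X, d_1 = 1) - E(d_2|X, d_1 = 0)\right](\alpha_{12} + X\beta_{12}).$$

The gap is therefore zero when participation in d2 does not depend on d1 given X, which is condition (3.12), or when the interaction term α12 + Xβ12 is zero at that value of X.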

3.3. Matching method

Suppose we are interested in the impact of program d1. The program impact is measured by the parameters ATE1(X) and ATT1(X) as follows:

ATE1(X) = E(Yd1=1 − Yd1=0|X), (3.13)

ATT1(X) = E(Yd1=1 − Yd1=0|X, d1 = 1). (3.14)

To express the two parameters in terms of the four potential outcomes, we rearrange (3.14):

ATT1(X) = E(Yd1=1|X, d1 = 1) − E(Yd1=0|X, d1 = 1)

= [E(Y11|X, d1 = 1, d2 = 1)Pr(d2 = 1|X, d1 = 1)+ E(Y10|X, d1 = 1, d2 = 0)

× Pr(d2 = 0|X, d1 = 1)]− [E(Y01|X, d1 = 1, d2 = 1)Pr(d2 = 1|X, d1 = 1)

+ E(Y00|X, d1 = 1, d2 = 0)Pr(d2 = 0|X, d1 = 1)]

= [E(Y11|X, d1 = 1, d2 = 1)− E(Y01|X, d1 = 1, d2 = 1)] Pr(d2 = 1|X, d1 = 1)

+ [E(Y10|X, d1 = 1, d2 = 0)− E(Y00|X, d1 = 1, d2 = 0)] Pr(d2 = 0|X, d1 = 1). (3.15)

It is worth noting two points in (3.15). Firstly, (3.15) allows for the overlap between participation in program d1 and participation in program d2. If the two programs are mutually exclusive, then Pr(d2 = 1|X, d1 = 1) will be equal to 0, and Pr(d2 = 0|X, d1 = 1) will be equal to 1. In this case the implementation of the matching method is similar to the case of a single binary program, taking into account that the comparison group should exclude those who participate in program d2.

Secondly, (3.15) allows for correlation between d1 and d2 given X. If the two programs are uncorrelated given X (i.e., condition (3.6) holds), then:

E(Y11|X, d1 = 1, d2 = 1) = E(Y10|X, d1 = 1, d2 = 0),

E(Y01|X, d1 = 1, d2 = 1) = E(Y00|X, d1 = 1, d2 = 0).

As a result, the estimation of the program impacts is similar to the case of a single program.

Similarly, the average treatment effect on the non-treated conditional on X is written as follows:

ATNT1(X) = E(Yd1=1|X, d1 = 0) − E(Yd1=0|X, d1 = 0)

= [E(Y11|X, d1 = 0, d2 = 1)− E(Y01|X, d1 = 0, d2 = 1)] Pr(d2 = 1|X, d1 = 0)

+ [E(Y10|X, d1 = 0, d2 = 0)− E(Y00|X, d1 = 0, d2 = 0)] Pr(d2 = 0|X, d1 = 0). (3.16)

ATE1(X) can be expressed in terms of the potential outcomes Y11, Y10, Y01, Y00 using Eq. (2.6) and the results from (3.15) and (3.16):

ATE1(X) = ATT1(X) Pr(d1 = 1|X) + ATNT1(X) Pr(d1 = 0|X). (3.17)
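A small worked example may help to fix ideas. All numbers below are hypothetical and chosen only for illustration. Suppose that at some X = x the impact of d1 is 5 for people with d2 = 1 and 3 for people with d2 = 0, and that these conditional impacts are the same for participants and non-participants in d1. Let Pr(d2 = 1|X, d1 = 1) = 0.6, Pr(d2 = 1|X, d1 = 0) = 0.2 and Pr(d1 = 1|X) = 0.5. Then (3.15), (3.16) and (3.17) give

$$\mathrm{ATT}_1(x) = 5 \times 0.6 + 3 \times 0.4 = 4.2,$$

$$\mathrm{ATNT}_1(x) = 5 \times 0.2 + 3 \times 0.8 = 3.4,$$

$$\mathrm{ATE}_1(x) = 4.2 \times 0.5 + 3.4 \times 0.5 = 3.8.$$

The three parameters differ only because participants in d1 are more likely to be in d2, where the impact of d1 is larger.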

In addition to assumption (A.3.1), to estimate ATE1(X) and ATT1(X) for program d1 the matching method requires that there be people who do not participate in program d1 but have an identical distribution of the X variables given program d2. This is the common support assumption:

Assumption 3.3.

0 < P(d1 = 1|X, d2 = 0) < 1,

0 < P(d1 = 1|X, d2 = 1) < 1, (A.3.3)

where P(d1 = 1|X, d2) is the conditional probability of being assigned the program d1 given the X variables and d2.


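This assumption can be written using the vector variable D:

$$0 < P(D = D^*|X) < 1 \quad \text{where } D^* \in \Omega_D = \left\{ \begin{pmatrix} 1 \\ 1 \end{pmatrix};\ \begin{pmatrix} 1 \\ 0 \end{pmatrix};\ \begin{pmatrix} 0 \\ 1 \end{pmatrix};\ \begin{pmatrix} 0 \\ 0 \end{pmatrix} \right\}.$$

However, the assumption is stated in the form (A.3.3) to emphasize that the program of interest is d1.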

Proposition 3.2. Under assumptions (A.3.1) and (A.3.3), the conditional and unconditional parameters, ATE1(X), ATT1(X), ATE1 and ATT1 are identified by the matching method.

This proposition results from assumptions (A.3.1) and (A.3.3), which allow the unobservable outcomes in (3.15) and (3.16) to equal the observable outcomes:

E(Y01|X, d1 = 1, d2 = 1) = E(Y01|X, d1 = 0, d2 = 1) (3.18)

E(Y00|X, d1 = 1, d2 = 0) = E(Y00|X, d1 = 0, d2 = 0) (3.19)

E(Y11|X, d1 = 0, d2 = 1) = E(Y11|X, d1 = 1, d2 = 1) (3.20)

E(Y10|X, d1 = 0, d2 = 0) = E(Y10|X, d1 = 1, d2 = 0). (3.21)

As a result, ATT1(X) and ATNT1(X) are identified. The parameter ATE1(X) is identified as in (3.17). The unconditional parameters ATE1 and ATT1 can be identified simply by taking the expectation of the conditional parameters over the range of the X variables and d2 as in Eqs. (2.18) and (2.19).⁹

To estimate the parameters, the non-participants of program d1 will be matched to participants of this program based on the closeness of the distance between the pre-treatment variables. The matching is performed for people who have the same program variable d2, i.e., the participants and matched non-participants have the same participation status in program d2.

Let nic denote the number of non-participants who are matched with participant i, and let w(i, j) be the weight attached to the outcome of each matched non-participant j, j = 1, . . . , nic. These weights are non-negative and sum up to 1, i.e.,

$$\sum_{j=1}^{n_{ic}} w(i, j) = 1.$$

Weights can be equal, e.g., as in n nearest-neighbor matching, or different, e.g., as in kernel matching.

The estimator of ATT1(X) at a given value x of the pre-treatment variables X is:

$$\widehat{\mathrm{ATT}}_1(X{=}x) = \frac{1}{n_{x1} + n_{x2}} \left\{ \sum_{d_2=0,\,X_i=x}^{n_{x1}} \left[ Y_{1i} - \sum_{j=1}^{n_{ic}} w(i, j)\,Y_{0j} \right] + \sum_{d_2=1,\,X_i=x}^{n_{x2}} \left[ Y_{1i} - \sum_{j=1}^{n_{ic}} w(i, j)\,Y_{0j} \right] \right\}, \tag{3.22}$$

where

• nx1 is the number of units who have d1 = 1; d2 = 0; X = x.
• nx2 is the number of units who have d1 = 1; d2 = 1; X = x.
• Y1i and Y0j are the observed outcomes of participant i and non-participant j with X = x.
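The estimator in (3.22) can be sketched for the simplest case, in which X is discrete so that matching on X is exact, and every matched non-participant receives an equal weight w(i, j) = 1/nic. The function name `att1_at_x`, the toy data and the equal-weight choice are assumptions made for illustration; kernel or nearest-neighbor weights would replace the simple mean of the controls.

```python
import numpy as np

def att1_at_x(y, d1, d2, x, x_value):
    """Eq. (3.22) with exact matching on X and d2 and equal weights w(i, j) = 1 / n_ic."""
    total, n_treated = 0.0, 0
    for status in (0, 1):                        # match only within the same d2 status
        cell = (x == x_value) & (d2 == status)
        treated = y[cell & (d1 == 1)]
        controls = y[cell & (d1 == 0)]
        if treated.size == 0:
            continue                             # no participants of d1 in this cell
        if controls.size == 0:
            raise ValueError("common support (A.3.3) fails in this cell")
        # Each participant is compared with the mean outcome of the matched non-participants.
        total += np.sum(treated - controls.mean())
        n_treated += treated.size
    if n_treated == 0:
        raise ValueError("no participants of d1 at this value of X")
    return total / n_treated                     # divide by n_x1 + n_x2

# Toy data, all with X = 0 (assumed numbers).
y = np.array([10.0, 12.0, 5.0, 7.0, 20.0, 9.0, 11.0])
d1 = np.array([1, 1, 0, 0, 1, 0, 0])
d2 = np.array([0, 0, 0, 0, 1, 1, 1])
x = np.zeros(7)

att_example = att1_at_x(y, d1, d2, x, 0.0)
print(att_example)
```

In the toy data the d2 = 0 cell has two participants with an average gap of 5 and the d2 = 1 cell has one participant with a gap of 10, so the estimate is (2 × 5 + 1 × 10)/3, which is the weighted average that (3.15) describes.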

9 If we are interested in the impact of the d1 program, we only need the assumptions which are specified by (3.18) through (3.21) to identify the program impact. Assumption (A.3.1) is a general (strong) one which is required to estimate impacts of any change in the program status on any group. For example, one can be interested in the joint impact of the two programs on the treated, which is defined as:

ATT12(X) = E(Y11 − Y00|X, d1 = 1, d2 = 1).

Then assumption (A.3.1) guarantees that E(Y00|X, d1 = 1, d2 = 1) = E(Y00|X, d1 = 0, d2 = 0) so that ATT12(X) can be identified.


To estimate ATNT1(X), each non-participant j is matched with njc participants based on the closeness of variables X. The formula of the estimator of ATNT1(X) is similar to (3.22), with the weighted outcome of the matched participants serving as the counterfactual for each non-participant, consistent with the definition in (3.16):

$$\widehat{\mathrm{ATNT}}_1(X{=}x) = \frac{1}{n'_{x1} + n'_{x2}} \left\{ \sum_{d_2=0,\,X_j=x}^{n'_{x1}} \left[ \sum_{i=1}^{n_{jc}} w(i, j)\,Y_{1i} - Y_{0j} \right] + \sum_{d_2=1,\,X_j=x}^{n'_{x2}} \left[ \sum_{i=1}^{n_{jc}} w(i, j)\,Y_{1i} - Y_{0j} \right] \right\}, \tag{3.23}$$

where

• n′x1 is the number of units who have d1 = 0; d2 = 0; X = x.
• n′x2 is the number of units who have d1 = 0; d2 = 1; X = x.
• w(i, j) is the weight attached to the outcome of participant i who is matched to non-participant j. The weights are also non-negative and sum up to 1.
• Y1i and Y0j are the observed outcomes of participant i and non-participant j with X = x.

The estimator of ATE1(X) is the weighted average of the estimators of ATT1(X) and ATNT1(X) according to formula (3.17).

Finally, the estimators of the unconditional parameters are:

$$\widehat{\mathrm{ATT}}_1 = \frac{1}{\sum I\{d_1 = 1;\ x \in S_X\}} \sum_{x \in S_X | d_1 = 1} \widehat{\mathrm{ATT}}_1(X{=}x),$$

$$\widehat{\mathrm{ATE}}_1 = \frac{1}{\sum I\{x \in S_X\}} \sum_{x \in S_X} \widehat{\mathrm{ATE}}_1(X{=}x),$$

where I{·} is an indicator function that equals 1 if the statement in {·} is true, and 0 otherwise; SX is the space of the X variables in the data sample.

3.4. Matching using the propensity score

As mentioned, a popular way to perform matching is to use the propensity score (Rosenbaum and Rubin, 1983). Proposition 2.1 extends straightforwardly to the case of two overlapping programs as follows:

Proposition 3.3.

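$$Y_{11}, Y_{10}, Y_{01}, Y_{00} \perp D \mid X \ \Rightarrow\ Y_{11}, Y_{10}, Y_{01}, Y_{00} \perp D \mid P(D|X),$$

where:

$$D = \begin{pmatrix} d_1 \\ d_2 \end{pmatrix},$$

$$P(D|X) = P(D = D^*|X) \quad \text{with } D^* \in \Omega_D = \left\{ \begin{pmatrix} 1 \\ 1 \end{pmatrix};\ \begin{pmatrix} 1 \\ 0 \end{pmatrix};\ \begin{pmatrix} 0 \\ 1 \end{pmatrix};\ \begin{pmatrix} 0 \\ 0 \end{pmatrix} \right\}.$$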

The proof is very similar to the case of one binary program in Rosenbaum and Rubin (1983).

The proposition means that if the CIA holds for the X variables, it also holds for the propensity scores. To perform propensity score matching, a multinomial model can be used to predict the conditional probability of being assigned to each of the four program statuses given X, i.e., P(d1 = 1, d2 = 1|X), P(d1 = 1, d2 = 0|X), P(d1 = 0, d2 = 1|X) and P(d1 = 0, d2 = 0|X). The propensity scores are selected according to the program statuses of the people being matched. For example, if we want to match people with program status d1 = 1, d2 = 1 to those with d1 = 1, d2 = 0, the probabilities P(d1 = 1, d2 = 1|X) and P(d1 = 1, d2 = 0|X) are used as the propensity scores. A drawback of this matching method is that the matching is performed using two propensity scores.

Since we focus on the impact of one program of interest (e.g., program d1) and use the estimators based on (3.22) and (3.23), we perform the matching with an exact match on participation in the other program (program d2 in this discussion). As a result, we do not need propensity score estimates from a multinomial model. More specifically, we can use a probit or logit model to predict P(d1 = 1|X) separately on the sample with d2 = 1 and on the sample with d2 = 0. The predicted probabilities can then be used as propensity scores to match participants with non-participants in the d1 program who have the same participation status in the d2 program.
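The matching rule described above (exact match on d2, propensity score match within each d2 group) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `att_exact_on_other_program` and the record keys `y`, `d1`, `d2` and `score` are hypothetical, the scores are taken as already estimated on the two d2 subsamples, and one nearest neighbor with replacement is used in place of the general weights w(i, j).

```python
from collections import defaultdict


def att_exact_on_other_program(units):
    """ATT of d1 by 1-nearest-neighbor matching on the score within d2 groups.

    `units` is a list of dicts with keys y, d1, d2 and score, where `score`
    is an estimate of P(d1 = 1 | X) fitted separately on the d2 = 0 and
    d2 = 1 subsamples (the fitting step is not shown here).
    """
    # Pool of non-participants in d1, split by their d2 status.
    controls = defaultdict(list)
    for u in units:
        if u["d1"] == 0:
            controls[u["d2"]].append(u)

    gaps = []
    for u in units:
        if u["d1"] != 1:
            continue
        pool = controls.get(u["d2"])
        if not pool:
            continue  # no comparison unit with the same d2 status
        # Nearest non-participant in terms of the propensity score.
        match = min(pool, key=lambda c: abs(c["score"] - u["score"]))
        gaps.append(u["y"] - match["y"])

    if not gaps:
        raise ValueError("no participant could be matched")
    return sum(gaps) / len(gaps)
```

Averaging the matched differences over all participants weights the two d2 groups by their shares among participants, which is the weighted-average structure the text describes.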

4. Results from Monte Carlo simulation

4.1. Simulation design

This section presents simulation results of measuring ATT using the regression and PSM methods when there are two overlapping programs. Suppose that two programs d1 and d2 are implemented simultaneously, and we are interested in the impact of program d1. Corresponding to the values of d1 and d2, there are 4 potential outcomes, which are expressed as functions of covariates X and error terms ε:

Y00 = 10+ X1 + X2 + ε00, (4.1)

Y10 = 10+ k X1 + k X2 + ε10, (4.2)

Y01 = 10+ k X1 + k X2 + ε01, (4.3)

Y11 = 10+ gX1 + k X2 + ε11, (4.4)

where X1 and X2 follow a bivariate normal distribution N(µ1, µ2, σ1, σ2, ρ) = N(15, 15, 5, 5, 0.5). Each error term follows a normal distribution N(µ, σ) = N(0, 5). The impacts of programs d1 and d2 are changed by varying the coefficients of X1 and X2 from (1, 1) to (k, g). Because the same coefficient k is specified in both (4.2) and (4.3), the ATE and ATT of d1 are the same as those of d2 when there is no correlation between d1 and d2. The values of g and k will be changed to examine the sensitivity of the estimates to the magnitude of the program impacts.

The assignment of programs d1 and d2 is designed in the two following scenarios.

Scenario 1: d1 and d2 are correlated, but once conditional on X1 they are independent:

W1 = X1 + u1,

d1 = 1 if W1 < W1*,

d1 = 0 otherwise,

and

W2 = X1 + u2,

d2 = 1 if W2 < W2*,

d2 = 0 otherwise,

where error terms u1 and u2 each follow a normal distribution N (µ, σ ) = N (0, 5).

Scenario 2: Conditional on X1, d1 and d2 are still correlated. This happens when participants in d1 are more strongly encouraged to participate in d2.

W1 = X1 + u1,

d1 = 1 if W1 < W1*,

d1 = 0 otherwise,

and

W2 = X1 − 10d1 + u2,


d2 = 1 if W2 < W2*,

d2 = 0 otherwise,

where error terms u1 and u2 each follow a normal distribution N (µ, σ ) = N (0, 5).
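The design in (4.1) through (4.4) and the two assignment scenarios can be sketched as a small data generator. Several details are assumptions rather than facts from the text: the second argument of N(·, 5) is read as a standard deviation, the cut-offs `w1_star` and `w2_star` are hypothetical defaults (the text does not report W1* and W2*), and the function names `simulate` and `phi` are illustrative.

```python
import math
import random


def simulate(n, k, g, scenario, w1_star=10.0, w2_star=10.0, seed=0):
    """Draw one sample of size n from the design in (4.1)-(4.4).

    Assumptions not stated in the text: sigma = 5 is a standard deviation,
    and the cut-offs w1_star and w2_star are hypothetical.
    """
    rng = random.Random(seed)
    rho = 0.5
    rows = []
    for _ in range(n):
        # Bivariate normal covariates with means 15, s.d. 5, correlation rho.
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x1 = 15 + 5 * z1
        x2 = 15 + 5 * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)
        # Program assignment: scenario 2 adds the -10*d1 shift to W2.
        d1 = int(x1 + rng.gauss(0, 5) < w1_star)
        shift = -10 * d1 if scenario == 2 else 0
        d2 = int(x1 + shift + rng.gauss(0, 5) < w2_star)
        # Potential outcomes (4.1)-(4.4), keyed by (d1, d2).
        y = {
            (0, 0): 10 + x1 + x2 + rng.gauss(0, 5),
            (1, 0): 10 + k * x1 + k * x2 + rng.gauss(0, 5),
            (0, 1): 10 + k * x1 + k * x2 + rng.gauss(0, 5),
            (1, 1): 10 + g * x1 + k * x2 + rng.gauss(0, 5),
        }
        rows.append({
            "x1": x1, "x2": x2, "d1": d1, "d2": d2,
            "y": y[(d1, d2)],                       # observed outcome
            "effect_d1": y[(1, d2)] - y[(0, d2)],   # unit-level impact of d1
        })
    return rows


def phi(rows):
    """Pearson correlation between the binary indicators d1 and d2."""
    n = len(rows)
    p1 = sum(r["d1"] for r in rows) / n
    p2 = sum(r["d2"] for r in rows) / n
    p12 = sum(r["d1"] * r["d2"] for r in rows) / n
    return (p12 - p1 * p2) / math.sqrt(p1 * (1 - p1) * p2 * (1 - p2))
```

With a cut-off of 10 and a standard deviation of 5, the implied participation rate in d1 is about 0.24, which is in line with the proportions reported in Table 1; the cut-offs actually used in the simulation are not given in the text. The true ATT in a simulated sample is the mean of `effect_d1` over rows with `d1 == 1`.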

4.2. Simulation results

Tables 1 and 2 present the simulation results for the estimation of the ATT of program d1 using different estimators. In each table, there are four panels corresponding to the values of g and k. There are two OLS regression estimators, one without interactions between X and d1, d2:

Y = β0 + β1 X1 + β2 X2 + β3d1 + β4d2 + β5d1d2 + ε,

and one with the interaction:

Y = β0 + β1 X1 + β2 X2 + β3d1 + β4d2 + β5d1d2 + β6 X1d1 + β7 X2d1 + β8 X1d2 + β9 X2d2 + ε.
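As a worked step, the ATT of d1 implied by the specification with interactions can be written out explicitly. The derivation below is a sketch that assumes the regression function is the conditional mean of the observed outcome, i.e. E(ε|X, d1, d2) = 0; it is not taken from the text.

```latex
\begin{aligned}
E(Y \mid X, d_1 = 1, d_2) - E(Y \mid X, d_1 = 0, d_2)
  &= \beta_3 + \beta_5 d_2 + \beta_6 X_1 + \beta_7 X_2, \\
ATT_1
  &= \beta_3 + \beta_5 E(d_2 \mid d_1 = 1)
   + \beta_6 E(X_1 \mid d_1 = 1) + \beta_7 E(X_2 \mid d_1 = 1).
\end{aligned}
```

In the specification without interactions between X and the program variables, the same expression holds with β6 = β7 = 0, so the ATT reduces to β3 + β5 E(d2 | d1 = 1).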

There are three methods of matching estimation using the metric of the propensity score. The first is matching using two covariates X1 and X2. The second is matching using three covariates X1, X2 and d2. The third uses the matching estimator given in (3.22). For each estimation strategy, there are three matching schemes to select non-participants and weight their outcomes, i.e., nearest-neighbor matching, three nearest-neighbors matching, and kernel matching with bandwidth of 0.01.
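The three matching schemes differ only in the weights they attach to the non-participants matched to a given participant. A minimal sketch follows; the function name `matching_weights` and the scheme labels are hypothetical, and the Epanechnikov kernel is an assumption, since the text reports the bandwidth but not the kernel function.

```python
def matching_weights(score_i, control_scores, scheme="nn1", bandwidth=0.01):
    """Weights over non-participants for one participant; they sum to 1.

    scheme: "nn1" (nearest neighbor), "nn3" (three nearest neighbors)
    or "kernel". The Epanechnikov kernel is an assumption.
    """
    dist = [abs(s - score_i) for s in control_scores]
    if scheme in ("nn1", "nn3"):
        k = 1 if scheme == "nn1" else 3
        # Indices of the k closest non-participants, equally weighted.
        nearest = sorted(range(len(dist)), key=dist.__getitem__)[:k]
        return [1.0 / len(nearest) if j in nearest else 0.0
                for j in range(len(dist))]
    if scheme == "kernel":
        # Epanechnikov shape; the constant 0.75 cancels after normalization.
        raw = [max(0.0, 1.0 - (d / bandwidth) ** 2) for d in dist]
        total = sum(raw)
        if total == 0.0:
            raise ValueError("no non-participant within the bandwidth")
        return [r / total for r in raw]
    raise ValueError("unknown scheme: " + scheme)
```

With a bandwidth as narrow as 0.01, a participant may have no non-participant inside the window, especially in small samples; the sketch raises an error in that case, whereas an applied estimator would have to drop or otherwise handle such participants.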

Table 1 presents the results under scenario 1. In terms of MSE, the regression methods perform best, since the models are correctly specified. When the values of the coefficients g and k are small, matching method 3 gives a slightly smaller MSE than matching methods 1 and 2. As the values of g and k increase, the difference in MSE between the three matching methods increases: method 3 results in the smallest MSE, followed by method 2, and method 1 gives the largest MSE. This pattern holds regardless of sample size and matching scheme. In addition, compared with methods 1 and 2, method 3 works very well when the sample size is small (i.e., n = 250 and n = 500) and the kernel matching estimator is used. This result suggests that matching method 3 should be used when the impacts of d1 and d2 are expected to be large.

The results of scenario 2 are presented in Table 2. Again, the regression methods perform better than matching, especially in small samples. Matching method 1 produces a very large MSE, since it is a biased estimator. This is evidence that, although participation in program d2 does not affect participation in program d1, failing to control for program d2 leads to bias in measuring the impact of program d1 if the two programs are correlated.

Matching method 3 has a slightly smaller MSE than matching method 2 when the values of g and k are small. As the values of g and k increase, method 3 results in a much lower MSE than method 2, especially with small sample sizes and the kernel matching scheme.
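The two performance measures reported in Tables 1 and 2 can be computed from the replication results as sketched below. The function name `impact_ratio_and_mse` is hypothetical, and it is an assumption that the true ATT is evaluated separately in each replication; the table notes only define IM as the mean ratio of the impact estimate over the true impact.

```python
def impact_ratio_and_mse(estimates, true_impacts):
    """Mean ratio of estimate to true impact (IM) and mean squared error.

    Both arguments hold one value per Monte Carlo replication.
    """
    pairs = list(zip(estimates, true_impacts))
    im = sum(e / t for e, t in pairs) / len(pairs)
    mse = sum((e - t) ** 2 for e, t in pairs) / len(pairs)
    return im, mse
```

The two measures capture different things: estimates that scatter symmetrically around the truth give an IM close to 1 while still producing a sizeable MSE, which is why the tables report both.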

5. An empirical example

This section illustrates the estimation of the impacts of international and internal remittances on the per capita expenditure of the receiving households in Vietnam. The parameter of interest is ATT. Households in Vietnam receive a large amount of international and internal remittances annually. Some households receive both international and internal remittances at the same time. The receipt of remittances is expected to increase household consumption. To measure the impact of the remittance receipt, we use the Vietnam Household Living Standard Survey (VHLSS) of 2004. The survey was conducted by the General Statistical Office of Vietnam with technical support from the World Bank. The survey covers 9188 households and is representative at the regional level.

Table 3 presents the percentage and the number of households receiving remittances in the 2004 VHLSS. It shows that 563 and 7825 sampled households receive international and internal remittances, respectively. There are 435 sampled households receiving both international and internal remittances.

Table 4 presents impact estimates of the receipts of international and internal remittances using the regression and propensity score matching methods. To estimate the impacts by OLS regression, per capita consumption expenditure


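Table 1. Mean impact ratio and MSE for two correlated programs: scenario 1.

Model parameter: k = 1.3; g = 1.5

| Measurement | n = 250 | n = 500 | n = 1000 | n = 3000 |
|---|---|---|---|---|
| ATT | 4.932 | 4.800 | 4.895 | 4.897 |
| Observed outcome | 42.849 | 42.795 | 42.846 | 42.829 |
| Proportion with D1 = 1 | 0.241 | 0.240 | 0.240 | 0.240 |
| Proportion with D2 = 1 | 0.241 | 0.240 | 0.240 | 0.240 |
| Correlation between D1 and D2 | 0.303 | 0.305 | 0.305 | 0.306 |

| Estimator | IM (n = 250) | MSE (n = 250) | IM (n = 500) | MSE (n = 500) | IM (n = 1000) | MSE (n = 1000) | IM (n = 3000) | MSE (n = 3000) |
|---|---|---|---|---|---|---|---|---|
| Regression, without interaction | 0.978 | 0.879 | 0.983 | 0.407 | 0.964 | 0.242 | 0.962 | 0.100 |
| Regression, with interaction | 0.998 | 0.906 | 0.992 | 0.436 | 0.974 | 0.244 | 0.972 | 0.089 |
| 1 nearest-neighbor matching, Method 1 | 0.954 | 3.780 | 0.973 | 1.945 | 0.959 | 0.926 | 0.984 | 0.375 |
| 1 nearest-neighbor matching, Method 2 | 0.966 | 3.393 | 0.970 | 1.881 | 0.955 | 0.959 | 0.981 | 0.340 |
| 1 nearest-neighbor matching, Method 3 | 0.909 | 3.257 | 0.942 | 1.707 | 0.959 | 0.906 | 0.978 | 0.300 |
| 3 nearest-neighbors matching, Method 1 | 0.891 | 2.912 | 0.936 | 1.309 | 0.945 | 0.721 | 0.973 | 0.257 |
| 3 nearest-neighbors matching, Method 2 | 0.888 | 2.615 | 0.937 | 1.258 | 0.942 | 0.676 | 0.973 | 0.226 |
| 3 nearest-neighbors matching, Method 3 | 0.820 | 2.797 | 0.904 | 1.138 | 0.926 | 0.664 | 0.966 | 0.213 |
| Kernel matching (bandwidth = 0.01), Method 1 | 1.220 | 3.192 | 1.163 | 1.465 | 1.081 | 0.574 | 1.032 | 0.197 |
| Kernel matching (bandwidth = 0.01), Method 2 | 1.225 | 2.939 | 1.158 | 1.244 | 1.076 | 0.519 | 1.032 | 0.163 |
| Kernel matching (bandwidth = 0.01), Method 3 | 1.129 | 3.509 | 1.090 | 1.167 | 1.049 | 0.457 | 1.018 | 0.134 |

Model parameter: k = 1.5; g = 1.8

| Measurement | n = 250 | n = 500 | n = 1000 | n = 3000 |
|---|---|---|---|---|
| ATT | 8.066 | 8.033 | 8.008 | 8.053 |
| Observed outcome | 44.709 | 44.692 | 44.678 | 44.706 |
| Proportion with D1 = 1 | 0.240 | 0.240 | 0.239 | 0.240 |
| Proportion with D2 = 1 | 0.240 | 0.241 | 0.240 | 0.240 |
| Correlation between D1 and D2 | 0.304 | 0.304 | 0.304 | 0.305 |

| Estimator | IM (n = 250) | MSE (n = 250) | IM (n = 500) | MSE (n = 500) | IM (n = 1000) | MSE (n = 1000) | IM (n = 3000) | MSE (n = 3000) |
|---|---|---|---|---|---|---|---|---|
| Regression, without interaction | 0.958 | 1.002 | 0.965 | 0.562 | 0.967 | 0.277 | 0.966 | 0.161 |
| Regression, with interaction | 0.973 | 0.976 | 0.977 | 0.526 | 0.978 | 0.236 | 0.976 | 0.123 |
| 1 nearest-neighbor matching, Method 1 | 0.941 | 5.953 | 0.970 | 3.030 | 0.982 | 1.374 | 0.993 | 0.467 |
| 1 nearest-neighbor matching, Method 2 | 0.937 | 4.472 | 0.979 | 2.393 | 0.981 | 1.219 | 0.995 | 0.419 |
| 1 nearest-neighbor matching, Method 3 | 0.897 | 4.595 | 0.953 | 2.084 | 0.971 | 0.959 | 0.988 | 0.334 |
| 3 nearest-neighbors matching, Method 1 | 0.919 | 3.952 | 0.952 | 2.073 | 0.967 | 0.939 | 0.985 | 0.335 |
| 3 nearest-neighbors matching, Method 2 | 0.906 | 3.125 | 0.951 | 1.594 | 0.967 | 0.759 | 0.985 | 0.263 |
| 3 nearest-neighbors matching, Method 3 | 0.862 | 3.365 | 0.921 | 1.491 | 0.951 | 0.700 | 0.979 | 0.228 |
| Kernel matching (bandwidth = 0.01), Method 1 | 1.212 | 5.699 | 1.139 | 2.817 | 1.085 | 1.125 | 1.039 | 0.335 |
| Kernel matching (bandwidth = 0.01), Method 2 | 1.206 | 5.033 | 1.141 | 2.375 | 1.084 | 0.867 | 1.039 | 0.263 |
| Kernel matching (bandwidth = 0.01), Method 3 | 1.083 | 3.508 | 1.065 | 1.438 | 1.049 | 0.595 | 1.024 | 0.184 |

Model parameter: k = 1.8; g = 2.1

| Measurement | n = 250 | n = 500 | n = 1000 | n = 3000 |
|---|---|---|---|---|
| ATT | 12.075 | 12.161 | 12.040 | 12.148 |
| Observed outcome | 47.345 | 47.324 | 47.346 | 47.354 |
| Proportion with D1 = 1 | 0.239 | 0.239 | 0.240 | 0.240 |
| Proportion with D2 = 1 | 0.240 | 0.239 | 0.242 | 0.240 |
| Correlation between D1 and D2 | 0.304 | 0.306 | 0.308 | 0.305 |

| Estimator | IM (n = 250) | MSE (n = 250) | IM (n = 500) | MSE (n = 500) | IM (n = 1000) | MSE (n = 1000) | IM (n = 3000) | MSE (n = 3000) |
|---|---|---|---|---|---|---|---|---|
| Regression, without interaction | 0.945 | 1.648 | 0.953 | 0.909 | 0.955 | 0.580 | 0.957 | 0.379 |
| Regression, with interaction | 0.959 | 1.322 | 0.963 | 0.683 | 0.965 | 0.423 | 0.967 | 0.251 |
| 1 nearest-neighbor matching, Method 1 | 0.959 | 8.217 | 0.984 | 4.006 | 0.993 | 2.229 | 0.995 | 0.732 |
| 1 nearest-neighbor matching, Method 2 | 0.952 | 6.067 | 0.973 | 3.219 | 0.987 | 1.515 | 0.996 | 0.582 |
| 1 nearest-neighbor matching, Method 3 | 0.926 | 4.892 | 0.963 | 2.410 | 0.980 | 1.233 | 0.991 | 0.395 |
| 3 nearest-neighbors matching, Method 1 | 0.936 | 6.168 | 0.975 | 2.715 | 0.985 | 1.484 | 0.989 | 0.526 |
| 3 nearest-neighbors matching, Method 2 | 0.934 | 3.805 | 0.967 | 1.913 | 0.979 | 0.942 | 0.989 | 0.347 |
| 3 nearest-neighbors matching, Method 3 | 0.888 | 4.378 | 0.939 | 1.862 | 0.968 | 0.812 | 0.984 | 0.266 |
| Kernel matching (bandwidth = 0.01), Method 1 | 1.223 | 12.339 | 1.156 | 5.595 | 1.097 | 2.522 | 1.041 | 0.632 |
| Kernel matching (bandwidth = 0.01), Method 2 | 1.218 | 10.153 | 1.148 | 4.459 | 1.091 | 1.818 | 1.040 | 0.452 |
| Kernel matching (bandwidth = 0.01), Method 3 | 1.072 | 5.609 | 1.061 | 1.908 | 1.047 | 0.823 | 1.022 | 0.234 |

Model parameter: k = 2.1; g = 2.5

| Measurement | n = 250 | n = 500 | n = 1000 | n = 3000 |
|---|---|---|---|---|
| ATT | 16.557 | 16.528 | 16.594 | 16.618 |
| Observed outcome | 50.016 | 50.087 | 50.082 | 50.075 |
| Proportion with D1 = 1 | 0.241 | 0.240 | 0.240 | 0.240 |
| Proportion with D2 = 1 | 0.240 | 0.240 | 0.240 | 0.240 |
| Correlation between D1 and D2 | 0.308 | 0.305 | 0.305 | 0.305 |

| Estimator | IM (n = 250) | MSE (n = 250) | IM (n = 500) | MSE (n = 500) | IM (n = 1000) | MSE (n = 1000) | IM (n = 3000) | MSE (n = 3000) |
|---|---|---|---|---|---|---|---|---|
| Regression, without interaction | 0.954 | 2.121 | 0.955 | 1.250 | 0.954 | 1.030 | 0.955 | 0.676 |
| Regression, with interaction | 0.966 | 1.363 | 0.965 | 0.809 | 0.967 | 0.589 | 0.966 | 0.391 |
| 1 nearest-neighbor matching, Method 1 | 0.987 | 12.143 | 0.985 | 6.430 | 0.988 | 3.286 | 0.992 | 1.155 |
| 1 nearest-neighbor matching, Method 2 | 0.974 | 8.933 | 0.990 | 4.593 | 0.995 | 2.412 | 0.994 | 0.804 |
| 1 nearest-neighbor matching, Method 3 | 0.956 | 5.398 | 0.969 | 2.660 | 0.985 | 1.452 | 0.994 | 0.454 |
| 3 nearest-neighbors matching, Method 1 | 0.968 | 8.927 | 0.975 | 4.688 | 0.983 | 2.433 | 0.991 | 0.787 |
| 3 nearest-neighbors matching, Method 2 | 0.965 | 5.152 | 0.971 | 2.416 | 0.983 | 1.496 | 0.990 | 0.440 |
| 3 nearest-neighbors matching, Method 3 | 0.917 | 4.631 | 0.948 | 2.152 | 0.973 | 0.983 | 0.987 | 0.308 |
| Kernel matching (bandwidth = 0.01), Method 1 | 1.238 | 23.626 | 1.157 | 10.168 | 1.091 | 4.152 | 1.039 | 1.136 |
| Kernel matching (bandwidth = 0.01), Method 2 | 1.231 | 20.282 | 1.155 | 8.904 | 1.092 | 3.387 | 1.039 | 0.717 |
| Kernel matching (bandwidth = 0.01), Method 3 | 1.074 | 7.078 | 1.062 | 2.609 | 1.044 | 1.104 | 1.021 | 0.306 |

IM: mean ratio of the impact estimate over the true impact.
n is number of observations.
Number of replications: 500.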

is assumed to have the following functional forms:

ln(Y ) = β0 + Xβ1 + d1β2 + d2β3 + ε,

ln(Y ) = β0 + Xβ1 + d1β2 + ε,

ln(Y ) = β0 + Xβ1 + d2β3 + ε,

where ln(Y) is the logarithm of per capita expenditure; X are household characteristics; d1 and d2 are binary variables indicating the receipt of international and internal remittances, respectively. The X variables include regional dummy variables, urban/rural, household composition and assets. The list of the X variables and the OLS regression results are presented in Table C.1 of Appendix C.

For the propensity score matching, there are three methods to estimate the impacts. The first is matching using the X variables. The second is matching using the X and program variables, i.e. d1 and d2. The third uses the matching


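Table 2. Mean impact ratio and MSE for two correlated programs: scenario 2.

Model parameter: k = 1.3; g = 1.5

| Measurement | n = 250 | n = 500 | n = 1000 | n = 3000 |
|---|---|---|---|---|
| ATT | 3.284 | 3.290 | 3.308 | 3.280 |
| Observed outcome | 42.297 | 42.303 | 42.286 | 42.272 |
| Proportion with D1 = 1 | 0.241 | 0.240 | 0.241 | 0.240 |
| Proportion with D2 = 1 | 0.227 | 0.226 | 0.227 | 0.226 |
| Correlation between D1 and D2 | 0.754 | 0.755 | 0.755 | 0.755 |

| Estimator | IM (n = 250) | MSE (n = 250) | IM (n = 500) | MSE (n = 500) | IM (n = 1000) | MSE (n = 1000) | IM (n = 3000) | MSE (n = 3000) |
|---|---|---|---|---|---|---|---|---|
| Regression, without interaction | 0.950 | 2.554 | 0.940 | 0.226 | 0.932 | 0.616 | 0.930 | 0.246 |
| Regression, with interaction | 1.030 | 3.014 | 0.973 | 0.198 | 0.967 | 0.597 | 0.967 | 0.203 |
| 1 nearest-neighbor matching, Method 1 | 2.312 | 17.374 | 2.218 | 16.043 | 2.193 | 15.554 | 2.215 | 15.865 |
| 1 nearest-neighbor matching, Method 2 | 0.900 | 9.707 | 1.002 | 0.686 | 0.958 | 2.045 | 0.992 | 0.582 |
| 1 nearest-neighbor matching, Method 3 | 0.866 | 6.904 | 0.990 | 0.521 | 0.929 | 1.566 | 0.964 | 0.528 |
| 3 nearest-neighbors matching, Method 1 | 2.253 | 14.679 | 2.194 | 15.341 | 2.153 | 14.282 | 2.202 | 15.464 |
| 3 nearest-neighbors matching, Method 2 | 0.758 | 7.338 | 0.973 | 0.486 | 0.929 | 1.344 | 0.970 | 0.406 |
| 3 nearest-neighbors matching, Method 3 | 0.726 | 6.863 | 0.963 | 0.333 | 0.897 | 1.155 | 0.951 | 0.334 |
| Kernel matching (bandwidth = 0.01), Method 1 | 2.929 | 28.542 | 2.298 | 18.037 | 2.399 | 20.617 | 2.301 | 18.027 |
| Kernel matching (bandwidth = 0.01), Method 2 | 1.444 | 6.788 | 1.027 | 0.414 | 1.081 | 1.290 | 1.025 | 0.333 |
| Kernel matching (bandwidth = 0.01), Method 3 | 1.108 | 6.794 | 1.002 | 0.273 | 1.002 | 0.869 | 0.991 | 0.236 |

Model parameter: k = 1.5; g = 1.8

| Measurement | n = 250 | n = 500 | n = 1000 | n = 3000 |
|---|---|---|---|---|
| ATT | 5.266 | 5.260 | 5.227 | 5.230 |
| Observed outcome | 43.766 | 43.761 | 43.740 | 43.768 |
| Proportion with D1 = 1 | 0.239 | 0.239 | 0.239 | 0.240 |
| Proportion with D2 = 1 | 0.225 | 0.225 | 0.225 | 0.226 |
| Correlation between D1 and D2 | 0.752 | 0.758 | 0.754 | 0.755 |

| Estimator | IM (n = 250) | MSE (n = 250) | IM (n = 500) | MSE (n = 500) | IM (n = 1000) | MSE (n = 1000) | IM (n = 3000) | MSE (n = 3000) |
|---|---|---|---|---|---|---|---|---|
| Regression, without interaction | 0.945 | 2.615 | 0.911 | 1.736 | 0.928 | 0.857 | 0.931 | 0.342 |
| Regression, with interaction | 0.989 | 2.876 | 0.951 | 1.614 | 0.962 | 0.702 | 0.965 | 0.233 |
| 1 nearest-neighbor matching, Method 1 | 2.278 | 44.531 | 2.257 | 42.849 | 2.257 | 43.195 | 2.269 | 44.006 |
| 1 nearest-neighbor matching, Method 2 | 0.956 | 9.136 | 0.957 | 4.939 | 0.955 | 2.447 | 0.999 | 0.801 |
| 1 nearest-neighbor matching, Method 3 | 0.898 | 8.430 | 0.913 | 4.423 | 0.974 | 2.058 | 0.991 | 0.584 |
| 3 nearest-neighbors matching, Method 1 | 2.250 | 41.065 | 2.231 | 40.399 | 2.252 | 42.483 | 2.254 | 42.872 |
| 3 nearest-neighbors matching, Method 2 | 0.884 | 7.425 | 0.894 | 4.036 | 0.950 | 1.856 | 0.979 | 0.572 |
| 3 nearest-neighbors matching, Method 3 | 0.847 | 6.742 | 0.876 | 3.008 | 0.940 | 1.336 | 0.972 | 0.370 |
| Kernel matching (bandwidth = 0.01), Method 1 | 2.720 | 75.047 | 2.559 | 63.849 | 2.456 | 57.008 | 2.350 | 49.565 |
| Kernel matching (bandwidth = 0.01), Method 2 | 1.375 | 8.810 | 1.148 | 3.696 | 1.064 | 1.740 | 1.024 | 0.491 |
| Kernel matching (bandwidth = 0.01), Method 3 | 1.055 | 8.671 | 0.997 | 2.644 | 1.019 | 1.122 | 1.002 | 0.292 |

Model parameter: k = 1.8; g = 2.1

| Measurement | n = 250 | n = 500 | n = 1000 | n = 3000 |
|---|---|---|---|---|
| ATT | 7.018 | 7.098 | 7.051 | 7.042 |
| Observed outcome | 45.703 | 45.698 | 45.663 | 45.693 |
| Proportion with D1 = 1 | 0.239 | 0.242 | 0.240 | 0.240 |
| Proportion with D2 = 1 | 0.225 | 0.227 | 0.226 | 0.226 |
| Correlation between D1 and D2 | 0.755 | 0.749 | 0.754 | 0.755 |

| Estimator | IM (n = 250) | MSE (n = 250) | IM (n = 500) | MSE (n = 500) | IM (n = 1000) | MSE (n = 1000) | IM (n = 3000) | MSE (n = 3000) |
|---|---|---|---|---|---|---|---|---|
| Regression, without interaction | 0.892 | 4.696 | 0.900 | 2.311 | 0.896 | 1.392 | 0.900 | 0.767 |
| Regression, with interaction | 0.926 | 4.165 | 0.932 | 1.892 | 0.930 | 0.974 | 0.935 | 0.443 |
| 1 nearest-neighbor matching, Method 1 | 2.576 | 114.75 | 2.481 | 107.54 | 2.497 | 110.28 | 2.504 | 111.82 |
| 1 nearest-neighbor matching, Method 2 | 0.955 | 14.142 | 0.964 | 6.556 | 0.965 | 2.869 | 0.984 | 0.971 |
| 1 nearest-neighbor matching, Method 3 | 0.896 | 13.929 | 0.937 | 5.256 | 0.949 | 2.532 | 0.978 | 0.785 |
| 3 nearest-neighbors matching, Method 1 | 2.552 | 109.60 | 2.467 | 104.52 | 2.484 | 107.71 | 2.496 | 110.51 |
| 3 nearest-neighbors matching, Method 2 | 0.873 | 13.271 | 0.909 | 4.950 | 0.950 | 2.053 | 0.972 | 0.691 |
| 3 nearest-neighbors matching, Method 3 | 0.832 | 10.659 | 0.893 | 3.338 | 0.930 | 1.455 | 0.969 | 0.499 |
| Kernel matching (bandwidth = 0.01), Method 1 | 3.047 | 188.06 | 2.810 | 157.47 | 2.705 | 141.69 | 2.595 | 125.42 |
| Kernel matching (bandwidth = 0.01), Method 2 | 1.498 | 21.826 | 1.200 | 6.489 | 1.068 | 2.173 | 1.013 | 0.581 |
| Kernel matching (bandwidth = 0.01), Method 3 | 1.003 | 11.925 | 0.998 | 2.970 | 0.990 | 1.191 | 0.993 | 0.331 |

Model parameter: k = 2.1; g = 2.5

| Measurement | n = 250 | n = 500 | n = 1000 | n = 3000 |
|---|---|---|---|---|
| ATT | 9.530 | 9.439 | 9.551 | 9.536 |
| Observed outcome | 47.750 | 47.766 | 47.788 | 47.779 |
| Proportion with D1 = 1 | 0.240 | 0.239 | 0.239 | 0.240 |
| Proportion with D2 = 1 | 0.226 | 0.226 | 0.225 | 0.226 |
| Correlation between D1 and D2 | 0.754 | 0.755 | 0.755 | 0.755 |

| Estimator | IM (n = 250) | MSE (n = 250) | IM (n = 500) | MSE (n = 500) | IM (n = 1000) | MSE (n = 1000) | IM (n = 3000) | MSE (n = 3000) |
|---|---|---|---|---|---|---|---|---|
| Regression, without interaction | 0.895 | 5.788 | 0.894 | 3.284 | 0.901 | 2.032 | 0.900 | 1.279 |
| Regression, with interaction | 0.947 | 4.572 | 0.934 | 2.138 | 0.938 | 1.234 | 0.936 | 0.643 |
| 1 nearest-neighbor matching, Method 1 | 2.591 | 218.83 | 2.568 | 213.13 | 2.534 | 211.74 | 2.528 | 211.66 |
| 1 nearest-neighbor matching, Method 2 | 0.943 | 16.232 | 0.953 | 7.744 | 0.978 | 3.651 | 0.991 | 1.136 |
| 1 nearest-neighbor matching, Method 3 | 0.930 | 12.843 | 0.936 | 5.958 | 0.971 | 2.608 | 0.982 | 0.885 |
| 3 nearest-neighbors matching, Method 1 | 2.569 | 209.68 | 2.562 | 210.29 | 2.530 | 209.77 | 2.523 | 210.13 |
| 3 nearest-neighbors matching, Method 2 | 0.888 | 14.143 | 0.932 | 5.849 | 0.954 | 2.892 | 0.982 | 0.727 |
| 3 nearest-neighbors matching, Method 3 | 0.861 | 11.994 | 0.910 | 4.118 | 0.949 | 1.853 | 0.972 | 0.488 |
| Kernel matching (bandwidth = 0.01), Method 1 | 3.051 | 353.52 | 2.899 | 308.65 | 2.737 | 270.06 | 2.619 | 237.12 |
| Kernel matching (bandwidth = 0.01), Method 2 | 1.460 | 31.838 | 1.198 | 8.592 | 1.078 | 3.134 | 1.023 | 0.647 |
| Kernel matching (bandwidth = 0.01), Method 3 | 1.054 | 11.051 | 1.001 | 3.159 | 1.002 | 1.395 | 0.996 | 0.299 |

IM: mean ratio of the impact estimate over the true impact.
n is number of observations.
Number of replications: 500.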

estimator given in (3.22). For each estimation strategy, there are three matching schemes to select non-participants and weight their outcomes, i.e. nearest-neighbor matching, three nearest-neighbors matching, and kernel matching with bandwidth of 0.01. The probit regressions which are used to estimate the propensity scores are reported in Table C.1 of Appendix C. They show that the receipt of international remittances is negatively correlated with the receipt of internal remittances.

To examine the sensitivity of the impact estimates to the X variables, we use two models. Model 1 includes a large set of variables X, while Model 2 uses only the variables X which are statistically significant in the probit regressions of the program variables. In Table C.1, only the results from Model 1 are presented.

It should be noted that remittances are treated as exogenous transfers in this empirical example. This assumption is also made in several studies which analyze the effect of remittances on inequality, for example Stark et al. (1986, 1988) and Adams (1989). Although this assumption is questioned in the recent literature on migration and


Table 3. Households receiving international and internal remittances.

| | | Households not receiving internal remittances | Households receiving internal remittances | Total by row |
|---|---|---|---|---|
| Households not receiving international remittances | % | 12.0 | 80.9 | 92.9 |
| | No. obs. | 1235 | 7390 | 8625 |
| Households receiving international remittances | % | 1.7 | 5.4 | 7.1 |
| | No. obs. | 128 | 435 | 563 |
| Total by column | % | 13.7 | 86.4 | 100 |
| | No. obs. | 1363 | 7825 | 9188 |

Note: the percentages are estimated using the sampling weights of the 2004 VHLSS.
Source: Estimation from the 2004 VHLSS.

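Table 4. Impact estimates of the receipts of international and internal remittances on logarithm of per capita expenditure (ATT).

Impact of receipt of international remittances

| Methods | Model 1: Estimate | Model 1: Std. err. | Model 1: T-stat. | Model 2: Estimate | Model 2: Std. err. | Model 2: T-stat. |
|---|---|---|---|---|---|---|
| Regression, do not control internal remittances | 0.318 | 0.023 | 13.943 | 0.341 | 0.027 | 12.770 |
| Regression, control internal remittances | 0.323 | 0.023 | 14.087 | 0.348 | 0.027 | 13.020 |
| 1 nearest-neighbor matching, Method 1 | 0.330 | 0.048 | 6.931 | 0.307 | 0.042 | 7.291 |
| 1 nearest-neighbor matching, Method 2 | 0.333 | 0.048 | 6.874 | 0.304 | 0.042 | 7.203 |
| 1 nearest-neighbor matching, Method 3 | 0.329 | 0.039 | 8.476 | 0.298 | 0.037 | 8.145 |
| 3 nearest-neighbors matching, Method 1 | 0.333 | 0.041 | 8.091 | 0.301 | 0.043 | 7.023 |
| 3 nearest-neighbors matching, Method 2 | 0.321 | 0.037 | 8.620 | 0.297 | 0.042 | 7.006 |
| 3 nearest-neighbors matching, Method 3 | 0.331 | 0.032 | 10.352 | 0.304 | 0.039 | 7.869 |
| Kernel matching (bandwidth = 0.01), Method 1 | 0.357 | 0.021 | 17.038 | 0.336 | 0.023 | 14.597 |
| Kernel matching (bandwidth = 0.01), Method 2 | 0.352 | 0.022 | 15.707 | 0.342 | 0.024 | 14.235 |
| Kernel matching (bandwidth = 0.01), Method 3 | 0.350 | 0.019 | 18.608 | 0.353 | 0.021 | 16.812 |

Impact of receipt of internal remittances

| Methods | Model 1: Estimate | Model 1: Std. err. | Model 1: T-stat. | Model 2: Estimate | Model 2: Std. err. | Model 2: T-stat. |
|---|---|---|---|---|---|---|
| Regression, do not control international remittances | 0.021 | 0.017 | 1.215 | 0.039 | 0.019 | 2.060 |
| Regression, control international remittances | 0.041 | 0.017 | 2.467 | 0.060 | 0.018 | 3.270 |
| 1 nearest-neighbor matching, Method 1 | 0.017 | 0.035 | 0.479 | 0.021 | 0.031 | 0.671 |
| 1 nearest-neighbor matching, Method 2 | 0.040 | 0.028 | 1.428 | 0.041 | 0.034 | 1.203 |
| 1 nearest-neighbor matching, Method 3 | 0.039 | 0.027 | 1.436 | 0.039 | 0.028 | 1.398 |
| 3 nearest-neighbors matching, Method 1 | 0.025 | 0.027 | 0.933 | 0.026 | 0.027 | 0.937 |
| 3 nearest-neighbors matching, Method 2 | 0.047 | 0.035 | 1.367 | 0.039 | 0.029 | 1.380 |
| 3 nearest-neighbors matching, Method 3 | 0.040 | 0.027 | 1.480 | 0.042 | 0.030 | 1.398 |
| Kernel matching (bandwidth = 0.01), Method 1 | 0.033 | 0.018 | 1.909 | 0.031 | 0.016 | 1.938 |
| Kernel matching (bandwidth = 0.01), Method 2 | 0.051 | 0.020 | 2.508 | 0.049 | 0.023 | 2.130 |
| Kernel matching (bandwidth = 0.01), Method 3 | 0.054 | 0.019 | 2.902 | 0.052 | 0.020 | 2.559 |

For the PSM method, standard errors are calculated using bootstrap with 500 replications.
Source: Estimation from the 2004 VHLSS.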

remittances, it is made in this example so that the OLS regression and matching methods relying on single cross-section data can be used.


Table 4 shows that the impact estimates of international remittances are much higher than those of internal remittances. For example, according to the OLS regression in Model 1, the receipts of international and internal remittances increase per capita expenditure by around 32% and 4%, respectively. The estimates from the OLS regression and matching methods are quite similar. Models 1 and 2 also give rather similar results.
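The percentages quoted above read the coefficients on the remittance dummies as log points. As a hedged aside, the exact percentage change implied by a dummy coefficient in a log-linear regression can be computed as follows; the helper name `percent_effect` is hypothetical and the conversion is a standard property of log-linear models, not a calculation taken from the paper.

```python
import math


def percent_effect(beta):
    """Exact percentage change in Y implied by a dummy coefficient
    in a regression with ln(Y) as the dependent variable."""
    return 100.0 * (math.exp(beta) - 1.0)
```

For a coefficient of 0.323 the exact figure is roughly 38%, against 32% under the log-point reading; for a coefficient of 0.041 the two readings are nearly the same.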

For the propensity score matching, the nearest-neighbor and three nearest-neighbors matching schemes do not produce statistically significant estimates of the impacts of internal remittances. Among the three matching methods, method 3 tends to produce estimates with lower standard errors and higher t-statistics, especially for the kernel matching. In contrast, method 1, which does not control for the other remittance type, tends to yield higher standard errors of the estimates.
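The standard errors of the matching estimates come from a bootstrap with 500 replications. A minimal sketch of such a procedure is given below; the function name `bootstrap_se` is hypothetical, and it is an assumption that whole households are resampled with replacement and that the full estimator is rerun on each resample (the text does not say whether the propensity score is re-estimated in each replication).

```python
import random
import statistics


def bootstrap_se(units, estimator, replications=500, seed=0):
    """Bootstrap standard error of `estimator` applied to `units`.

    `estimator` is any function mapping a list of units to a number,
    for example a matching estimator of the ATT.
    """
    rng = random.Random(seed)
    n = len(units)
    draws = []
    for _ in range(replications):
        # Resample n units with replacement and re-estimate.
        sample = [units[rng.randrange(n)] for _ in range(n)]
        draws.append(estimator(sample))
    # Standard deviation of the estimates across replications.
    return statistics.stdev(draws)
```

An estimator passed to this function should be able to handle resamples in which some program status is rare or absent, which can happen in small strata.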

6. Conclusions

When measuring the impact of a program, one should keep in mind that participants and non-participants can take part in other simultaneous programs. If the correlation between selection into the program and selection into the other programs vanishes once we condition on the variables X, one can ignore those other programs. However, the simulation shows that controlling for participation in the other programs leads to some gain in efficiency in terms of MSE. If the correlation remains even after conditioning on the X variables, neglecting the other programs will lead to biased estimation of the impact of the program of interest. The PSM method can be used to measure the program impact in this case. In the paper, the matching estimator is written as a weighted average of program impacts on groups with different program statuses. In other words, it combines propensity score matching on the X variables with exact matching on participation in the other programs. The simulation shows that when the impacts of the programs are large, this PSM method leads to a lower MSE than other PSM estimators. The example of measuring the impacts of international and internal remittances also shows that this PSM method tends to result in lower standard errors, especially for the kernel matching scheme.

Appendix A. Proof of (3.7) and (3.8)

The assumption (A.3.1) is equivalent to:

ε11, ε10, ε01, ε00 ⊥ D|X

As a result:

[E(d2ε12|X)+ E(ε1|X)] = E(d2ε12|X)

= E(ε12|X, d2 = 1)Pr(d2 = 1|X)

= [E(ε12|X, d2 = 1, d1 = 1)Pr(d1 = 1|X, d2 = 1)

+ E(ε12|X, d2 = 1, d1 = 0)Pr(d1 = 0|X, d2 = 1)] Pr(d2 = 1|X)

= [E(ε12|X)Pr(d1 = 1|X, d2 = 1)

+ E(ε12|X)Pr(d1 = 0|X, d2 = 1)] Pr(d2 = 1|X)

= 0.

Similarly, we will have

[E(d2ε12|X, d1 = 1)+ E(ε1|X, d1 = 1)] = 0.

Appendix B. The case of multiple overlapping binary programs

Parameters of interest

Now suppose that there are m programs that are assigned to subjects in population P. Denote participation in the programs by a vector variable D:

$$
D = (d_1, d_2, \ldots, d_m)
$$


where $d_k$ is a variable that equals 1 if a subject participates in program k, and 0 otherwise. Subjects who do not participate in any program will have the value of the vector D equal to D = (0, 0, . . . , 0). In contrast, subjects who participate in all the programs will have the value of the vector D equal to D = (1, 1, . . . , 1). The set of potential treatments has $2^m$ values:

$$
\Omega_D = \left\{ \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}; \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}; \ldots; \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix} \right\}.
$$

Corresponding to each value of the vector variable D, there is a potential outcome, denoted by $Y^P_D$. Thus, for each subject there are $2^m$ potential outcomes. However, we are able to observe only one of these outcomes, depending on the realization of the vector variable D.
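To make the size of the treatment set concrete, the participation vectors can be enumerated directly. This is a small illustrative sketch; the function name `treatment_states` is hypothetical.

```python
from itertools import product


def treatment_states(m):
    """All 2**m participation vectors D = (d1, ..., dm)."""
    return list(product((0, 1), repeat=m))
```

With three programs this yields eight vectors, from (0, 0, 0) to (1, 1, 1), one for each potential outcome of a subject.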

The potential outcomes can be written as functions of the observed variables X and unobserved variable ε:

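$$
Y^P_D = \alpha_D + X\beta_D + \varepsilon_D. \tag{B.1}
$$

For example, when there are three programs, i.e. m = 3, the potential outcomes are as follows:

$$
Y^P_{d_1,d_2,d_3} = \alpha_{d_1,d_2,d_3} + X\beta_{d_1,d_2,d_3} + \varepsilon_{d_1,d_2,d_3},
$$

which can be written more specifically as eight equations:

$$
\begin{aligned}
Y^P_{000} &= \alpha_{000} + X\beta_{000} + \varepsilon_{000}, \\
Y^P_{100} &= \alpha_{100} + X\beta_{100} + \varepsilon_{100}, \\
Y^P_{010} &= \alpha_{010} + X\beta_{010} + \varepsilon_{010}, \\
Y^P_{001} &= \alpha_{001} + X\beta_{001} + \varepsilon_{001}, \\
Y^P_{110} &= \alpha_{110} + X\beta_{110} + \varepsilon_{110}, \\
Y^P_{101} &= \alpha_{101} + X\beta_{101} + \varepsilon_{101}, \\
Y^P_{011} &= \alpha_{011} + X\beta_{011} + \varepsilon_{011}, \\
Y^P_{111} &= \alpha_{111} + X\beta_{111} + \varepsilon_{111}.
\end{aligned}
$$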

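Similar to (3.2), the observed outcome can be written in terms of the $2^m$ potential outcomes and the program variables, and hence in terms of the variables X and ε, as follows:

$$
\begin{aligned}
Y ={}& \sum_i d_i(\alpha_i + X\beta_i + \varepsilon_i) + \sum_{i \neq j} d_i d_j(\alpha_{ij} + X\beta_{ij} + \varepsilon_{ij}) + \sum_{i \neq j \neq \cdots \neq h} d_i d_j d_h(\alpha_{ijh} + X\beta_{ijh} + \varepsilon_{ijh}) \\
&+ \cdots + (\alpha_{12 \ldots m} + X\beta_{12 \ldots m} + \varepsilon_{12 \ldots m}) \prod_i d_i + (\alpha_{D=0} + X\beta_{D=0} + \varepsilon_{D=0})
\end{aligned} \tag{B.2}
$$

where $\alpha_i = \alpha_{d_i=1,\, D \setminus d_i = 0} - \alpha_{D=0}$, with $D \setminus d_i$ denoting the vector of the program variables not including $d_i$. The parameter $\alpha_{d_i=1,\, D \setminus d_i = 0}$ belongs to the equation of the potential outcome with participation in program $d_i$ only. And:

$$
\alpha_{ij} = \alpha_{d_i=1,\, d_j=1,\, D \setminus d_i, d_j = 0} - \alpha_i - \alpha_j - \alpha_{D=0},
$$

$$
\ldots
$$

$$
\alpha_{12 \ldots m} = \alpha_{D=1} - \underbrace{\sum_{i \neq j \neq k} \alpha_{ij \ldots k}}_{m-1} - \cdots - \sum_{i \neq j} \alpha_{ij} - \sum_i \alpha_i - \alpha_{D=0},
$$

and the notation is similar for β and ε. It should be noted that in this section, i (i = 1, . . . , m) denotes program i, not observation i.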
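The recursive definitions of the interaction terms can be checked numerically: summing the terms that are switched on for a participation vector D should return the intercept of the corresponding potential outcome. The sketch below does this for arbitrary m; the function names `interaction_terms` and `rebuild` are hypothetical, programs are indexed from 0 in the code, and only the intercepts α are handled (β and ε follow the same pattern).

```python
from itertools import combinations, product


def interaction_terms(alpha):
    """Map potential-outcome intercepts, keyed by D, to the terms of (B.2).

    `alpha` maps each tuple D in {0,1}^m to its intercept. The result maps
    each set S of programs to alpha_S; the empty set holds alpha_{D=0}.
    """
    m = len(next(iter(alpha)))
    terms = {frozenset(): alpha[(0,) * m]}
    for size in range(1, m + 1):
        for s in combinations(range(m), size):
            d = tuple(1 if i in s else 0 for i in range(m))
            # Subtract every lower-order term contained in S, including
            # alpha_{D=0}, from the intercept with exactly S switched on.
            lower = sum(v for t, v in terms.items() if t < frozenset(s))
            terms[frozenset(s)] = alpha[d] - lower
    return terms


def rebuild(terms, d):
    """Intercept implied by (B.2) for the participation vector d."""
    on = frozenset(i for i, v in enumerate(d) if v)
    return sum(v for t, v in terms.items() if t <= on)
```

For m = 3 the function reproduces the eight definitions listed for the three-program case, for instance the term for programs 1 and 2 equals α110 − α1 − α2 − α0.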


For example, with three programs (m = 3), Eq. (B.2) becomes:

Y = d1(α1 + Xβ1 + ε1)+ d2(α2 + Xβ2 + ε2)+ d3(α3 + Xβ3 + ε3)

+ d1d2(α12 + Xβ12 + ε12)+ d2d3(α23 + Xβ23 + ε23)+ d1d3(α13 + Xβ13 + ε13)

+ d1d2d3(α123 + Xβ123 + ε123)+ (α0 + Xβ0 + ε0)

where:

α0 = α000,

α1 = α100 − α0 = α100 − α000,

α2 = α010 − α0 = α010 − α000,

α3 = α001 − α0 = α001 − α000,

α12 = α110 − α1 − α2 − α0,

α23 = α011 − α2 − α3 − α0,

α13 = α101 − α1 − α3 − α0,

α123 = α111 − α12 − α23 − α13 − α1 − α2 − α3 − α0,

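and the notation is similar for β and ε.

Suppose that we are interested in the impacts of program k, which are measured by the two parameters:

$$
\begin{aligned}
\mathrm{ATE}_k(X) ={}& E(Y^P_{d_k=1}|X) - E(Y^P_{d_k=0}|X) \\
={}& E\left( Y_{d_k=1,X,\tilde{D},\varepsilon} - Y_{d_k=0,X,\tilde{D},\varepsilon} \,\middle|\, X \right) \\
={}& \alpha_k + X\beta_k + (\alpha_{ik} + X\beta_{ik}) \sum_{i \neq k} E(d_i|X) + (\alpha_{ijk} + X\beta_{ijk}) \sum_{i \neq k;\, i \neq j,\, j \neq k} E(d_i d_j|X) \\
&+ \cdots + (\alpha_{12 \ldots m} + X\beta_{12 \ldots m})\, E\left( \prod_{i \neq k} d_i \,\middle|\, X \right) \\
&+ \left[ E(\varepsilon_k|X) + \sum_{i \neq k} E(d_i \varepsilon_{ik}|X) + \sum_{i \neq k;\, i \neq j,\, j \neq k} E(d_i d_j \varepsilon_{ijk}|X) + \cdots + E\left( \varepsilon_{12 \ldots m} \prod_{i \neq k} d_i \,\middle|\, X \right) \right],
\end{aligned} \tag{B.3}
$$

$$
\begin{aligned}
\mathrm{ATT}_k(X) ={}& E(Y^P_{d_k=1}|X, d_k = 1) - E(Y^P_{d_k=0}|X, d_k = 1) \\
={}& E\left( Y_{d_k=1,X,\tilde{D},\varepsilon} - Y_{d_k=0,X,\tilde{D},\varepsilon} \,\middle|\, X, d_k = 1 \right) \\
={}& \alpha_k + X\beta_k + (\alpha_{ik} + X\beta_{ik}) \sum_{i \neq k} E(d_i|X, d_k = 1) + (\alpha_{ijk} + X\beta_{ijk}) \sum_{i \neq k;\, i \neq j,\, j \neq k} E(d_i d_j|X, d_k = 1) \\
&+ \cdots + (\alpha_{12 \ldots m} + X\beta_{12 \ldots m})\, E\left( \prod_{i \neq k} d_i \,\middle|\, X, d_k = 1 \right) \\
&+ \left[ E(\varepsilon_k|X, d_k = 1) + \sum_{i \neq k} E(d_i \varepsilon_{ik}|X, d_k = 1) + \sum_{i \neq k;\, i \neq j,\, j \neq k} E(d_i d_j \varepsilon_{ijk}|X, d_k = 1) + \cdots + E\left( \varepsilon_{12 \ldots m} \prod_{i \neq k} d_i \,\middle|\, X, d_k = 1 \right) \right]
\end{aligned} \tag{B.4}
$$

where $\tilde{D} = D \setminus d_k$, i.e. $\tilde{D}$ is the vector of program variables not including the $d_k$ program.

In a more general case, one can estimate the impact of a treatment state $D = D_g$ relative to a treatment state $D = D_h$:

$$
\mathrm{ATE}_{gh}(X) = E(Y^P_{D=D_g}|X) - E(Y^P_{D=D_h}|X), \tag{B.5}
$$

$$
\mathrm{ATT}_{gh}(X) = E(Y^P_{D=D_g}|X, D = D_g) - E(Y^P_{D=D_h}|X, D = D_g). \tag{B.6}
$$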


However, the interpretation of (B.5) and (B.6) is complicated and less practical. For simplicity, we focus on the impact of a particular program, e.g., program $d_k$.

Regression method

Identification of the impact parameters of the program $d_k$ is not straightforward, since there are unobserved terms in (B.3) and (B.4). Similar to the case of two binary programs, to identify the parameters we need the CIA assumption (see footnote 10):

Assumption 1.

$$
Y^P_D \perp D \mid X. \tag{A.1}
$$

In addition, we need the assumption of exogeneity of the X variables in all the equations of potential outcomes, i.e.:

Assumption 2.

$$
E(\varepsilon_D|X) = 0. \tag{A.2}
$$

Proposition 1. Under assumptions (A.1) and (A.2), the regression method produces unbiased estimators of all the conditional and unconditional parameters $\mathrm{ATE}_k(X)$, $\mathrm{ATT}_k(X)$, $\mathrm{ATE}_k$ and $\mathrm{ATT}_k$.

The proof is similar to the case of two binary programs. The unobserved terms in (B.3) and (B.4) are equal to 0. In addition, the error term in Eq. (B.2) has the conventional property that E(ε|X, D) = 0. Thus the conditional parameters $\mathrm{ATE}_k(X)$ and $\mathrm{ATT}_k(X)$ are estimated based on (B.3) and (B.4) using the coefficient estimates from Eq. (B.2). Once the conditional parameters are estimated, the unconditional ones can also be estimated using Eqs. (2.18) and (2.19). It should be noted that the observed outcome Y can be any function of X, and the identification assumptions and estimation strategy are the same as in the case of the linear function.

Matching method

In addition to the CIA, the matching method requires the assumption of common support to identify the impact parameters:

Assumption 3.

$$
0 < P(d_k = 1|X, \tilde{D}) < 1. \tag{A.3}
$$

$P(d_k = 1|X, \tilde{D})$ is the conditional probability of participating in program $d_k$ given the X variables and the other program variables $\tilde{D} = D \setminus d_k$. It is required that there be subjects who do not participate in program $d_k$ but have the same X variables and the same participation statuses in the other programs (not including program $d_k$) as the participants of program $d_k$.
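In a sample, the common support requirement can be checked within each combination of the other programs' statuses. The sketch below flags participants in program k that lack comparable non-participants; the function name `off_support`, the record keys `d` and `score`, and the min-max rule are assumptions chosen for illustration (the min-max rule is one common convention, not a procedure described in the text), and programs are indexed from 0.

```python
from collections import defaultdict


def off_support(units, k):
    """Participants in program k lacking comparable non-participants.

    Each unit is a dict with `d` (tuple of 0/1 program indicators) and
    `score` (an estimate of P(d_k = 1 | X, other programs)).
    """
    # Scores of non-participants in k, grouped by their other program statuses.
    ranges = defaultdict(list)
    for u in units:
        if u["d"][k] == 0:
            others = u["d"][:k] + u["d"][k + 1:]
            ranges[others].append(u["score"])

    flagged = []
    for u in units:
        if u["d"][k] != 1:
            continue
        others = u["d"][:k] + u["d"][k + 1:]
        scores = ranges.get(others)
        # Flag if the stratum has no non-participants or the score lies
        # outside the range spanned by their scores.
        if not scores or not (min(scores) <= u["score"] <= max(scores)):
            flagged.append(u)
    return flagged
```

A stratum with no non-participants at all is the starkest violation: no matching scheme can supply a counterfactual for its participants.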

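It should be noted that the common support can be stated in terms of the probability of being assigned the treatment variable D, i.e.:

$$
0 < P(D = D^*|X) < 1,
$$

where

$$
D^* \in \Omega_D = \left\{ \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}; \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}; \ldots; \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix} \right\}.
$$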

Proposition 2. Under assumptions (A.1) and (A.3), the conditional and unconditional parameters $\mathrm{ATE}_k(X)$, $\mathrm{ATT}_k(X)$, $\mathrm{ATE}_k$ and $\mathrm{ATT}_k$ for program $d_k$ are identified by the matching method.

10 We can require a weaker assumption, “conditional mean independence”, in order to identify the program impact parameters: $E(Y^P(D)|X, D) = E(Y^P(D)|X)$.


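Proof. Similar to (3.15), $\mathrm{ATT}_k(X)$ is written as follows:

$$
\begin{aligned}
\mathrm{ATT}_k(X) ={}& E(Y^P_{d_k=1}|X, d_k = 1) - E(Y^P_{d_k=0}|X, d_k = 1) \\
={}& \sum_{D_g \in \Omega_{\tilde{D}}} \left[ E(Y^P_{\tilde{D}=D_g,\, d_k=1}|X, \tilde{D} = D_g, d_k = 1) - E(Y^P_{\tilde{D}=D_g,\, d_k=0}|X, \tilde{D} = D_g, d_k = 1) \right] \Pr(\tilde{D} = D_g|X, d_k = 1),
\end{aligned} \tag{B.7}
$$

and $\mathrm{ATNT}_k(X)$:

$$
\begin{aligned}
\mathrm{ATNT}_k(X) ={}& E(Y^P_{d_k=1}|X, d_k = 0) - E(Y^P_{d_k=0}|X, d_k = 0) \\
={}& \sum_{D_g \in \Omega_{\tilde{D}}} \left[ E(Y^P_{\tilde{D}=D_g,\, d_k=1}|X, \tilde{D} = D_g, d_k = 0) - E(Y^P_{\tilde{D}=D_g,\, d_k=0}|X, \tilde{D} = D_g, d_k = 0) \right] \Pr(\tilde{D} = D_g|X, d_k = 0)
\end{aligned} \tag{B.8}
$$

where $\Omega_{\tilde{D}}$ is the set of potential treatments (programs) $\tilde{D} = D \setminus d_k$.

There are unobserved terms in (B.7) and (B.8), i.e., $E(Y^P_{\tilde{D}=D_g,\, d_k=1}|X, \tilde{D} = D_g, d_k = 0)$ and $E(Y^P_{\tilde{D}=D_g,\, d_k=0}|X, \tilde{D} = D_g, d_k = 1)$. However, under assumptions (A.1) and (A.3), we have:

$$
E(Y^P_{\tilde{D}=D_g,\, d_k=1}|X, \tilde{D} = D_g, d_k = 0) = E(Y^P_{\tilde{D}=D_g,\, d_k=1}|X, \tilde{D} = D_g, d_k = 1), \tag{B.9}
$$

$$
E(Y^P_{\tilde{D}=D_g,\, d_k=0}|X, \tilde{D} = D_g, d_k = 1) = E(Y^P_{\tilde{D}=D_g,\, d_k=0}|X, \tilde{D} = D_g, d_k = 0). \tag{B.10}
$$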

Substitute (B.9) and (B.10) into the conditional parameters of AT T k(X) and AT N T k(X), we can identify theseparameters since all the terms are observed. The parameter AT Ek(X) is the weighted average of the AT T k(X) andAT N T k(X). The unconditional parameters are also identified by formulas (2.18) and (2.19).

To estimate the program impacts, the participants of program $d_k$ are matched to the non-participants based on the closeness of the distance in the X variables. In addition, the matching is performed among people who have the same statuses in the other programs, $\tilde D = D \setminus d_k$. The estimator of $ATT_k(X)$ has a similar form as in the case of two programs, i.e., formula (3.22), in which the sample mean outcomes of the participants are estimators of $E(Y^P_{\tilde D = D_g,\, d_k=1} \mid X, \tilde D = D_g, d_k = 0)$, and the sample mean outcomes of the matched non-participants are estimators of $E(Y^P_{\tilde D = D_g,\, d_k=0} \mid X, \tilde D = D_g, d_k = 1)$. The estimator of $ATNT_k(X)$ has a formula similar to (3.23). Finally, the estimator of $ATE_k(X)$ is the weighted average of the estimators of $ATT_k(X)$ and $ATNT_k(X)$.
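The matching procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the estimator used in the paper: it uses one-to-one nearest-neighbour matching with replacement on a squared Euclidean distance, it returns the ATT averaged over the participants rather than $ATT_k(X)$ at a given $X$, and the dictionary keys (`dk`, `others`, `x`, `y`) and the data are hypothetical.

```python
from collections import defaultdict


def squared_distance(a, b):
    """Squared Euclidean distance between two covariate tuples."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def att_matched_within_strata(units):
    """Nearest-neighbour matching estimate of the ATT of program d_k.

    Each unit is a dict with hypothetical keys:
      "dk"     -> 1 if the unit participates in program d_k, else 0
      "others" -> tuple with the statuses of all other programs
      "x"      -> tuple of conditioning covariates X
      "y"      -> observed outcome
    Participants are compared only with non-participants sharing "others".
    """
    strata = defaultdict(lambda: {1: [], 0: []})
    for unit in units:
        strata[unit["others"]][unit["dk"]].append(unit)

    differences = []
    for groups in strata.values():
        if not groups[0]:
            continue  # no comparison unit with the same other-program statuses
        for treated in groups[1]:
            match = min(
                groups[0],
                key=lambda c: squared_distance(treated["x"], c["x"]),
            )
            differences.append(treated["y"] - match["y"])

    if not differences:
        raise ValueError("no participant could be matched")
    # Averaging over matched participants weights each stratum by its
    # share among participants, mirroring the weights in (B.7).
    return sum(differences) / len(differences)


# Illustrative data (made up): program d_k and one other program.
sample = [
    {"dk": 1, "others": (0,), "x": (1.0,), "y": 5.0},
    {"dk": 0, "others": (0,), "x": (0.9,), "y": 3.0},
    {"dk": 0, "others": (0,), "x": (5.0,), "y": 10.0},
    {"dk": 1, "others": (1,), "x": (2.0,), "y": 9.0},
    {"dk": 0, "others": (1,), "x": (2.1,), "y": 6.0},
    {"dk": 0, "others": (0,), "x": (2.0,), "y": 100.0},
]
att_estimate = att_matched_within_strata(sample)
```

In the made-up sample, the last non-participant has exactly the same covariate value as the second participant but a different status in the other program, so it is never used as that participant's match. This is the restriction that distinguishes the multiple-program case from single-program matching.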

B.1. Matching using the propensity score

Proposition 2.1 is extended to the case of multiple overlapping programs as follows:

Proposition 3. $Y^P_D \perp D \mid X \;\Rightarrow\; Y^P_D \perp D \mid P(D \mid X)$.

As a result, if the CIA holds for the X variables, it also holds for the propensity score. Since we focus on the impact of a program of interest, e.g., program $d_k$, and use the estimators based on (B.7) and (B.8), we will state the proposition in a different way which emphasizes a program of interest.

Proposition 3′. $Y^P_D \perp d_k \mid X, \tilde D \;\Rightarrow\; Y^P_D \perp d_k \mid P(d_k = 1 \mid X, \tilde D)$,

where $\tilde D = D \setminus d_k$, i.e., $\tilde D$ does not include $d_k$. The proof is very similar to the case of one binary program in Rosenbaum and Rubin (1983).
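To illustrate how Proposition 3′ might be used in practice, the sketch below estimates the score $P(d_k = 1 \mid X, \tilde D)$ and then matches on it. Several choices here are assumptions made for illustration only: the score is modelled with a logit fitted by plain gradient ascent, matching is one-to-one nearest neighbour with replacement, and the function names and data are hypothetical.

```python
import math


def predict(coefs, row):
    """Estimated propensity score P(d_k = 1 | row) for one unit."""
    index = coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))
    return 1.0 / (1.0 + math.exp(-index))


def fit_logit(features, labels, steps=2000, learning_rate=0.1):
    """Fit a logit model by gradient ascent on the average log-likelihood."""
    coefs = [0.0] * (len(features[0]) + 1)  # intercept first, then slopes
    for _ in range(steps):
        gradient = [0.0] * len(coefs)
        for row, label in zip(features, labels):
            residual = label - predict(coefs, row)
            gradient[0] += residual
            for j, value in enumerate(row, start=1):
                gradient[j] += residual * value
        coefs = [
            c + learning_rate * g / len(labels)
            for c, g in zip(coefs, gradient)
        ]
    return coefs


def att_matched_on_score(scores, dk, outcomes):
    """Match each participant to the non-participant with the closest score."""
    controls = [i for i, d in enumerate(dk) if d == 0]
    treated = [i for i, d in enumerate(dk) if d == 1]
    gaps = []
    for i in treated:
        j = min(controls, key=lambda c: abs(scores[c] - scores[i]))
        gaps.append(outcomes[i] - outcomes[j])
    return sum(gaps) / len(gaps)


# Illustrative data (made up): each row is (x, status of the other program),
# so the score conditions on both X and the other program, as in 3'.
features = [(0.0, 0), (0.5, 1), (1.0, 0), (1.5, 1),
            (2.0, 0), (2.5, 1), (3.0, 0), (3.5, 1)]
dk = [0, 1, 0, 0, 1, 1, 0, 1]
outcomes = [10.0 + 2.0 * d for d in dk]  # constant effect of 2 by construction

coefs = fit_logit(features, dk)
scores = [predict(coefs, row) for row in features]
att_estimate = att_matched_on_score(scores, dk, outcomes)
```

The other program's status enters the score model as an ordinary regressor, so a single scalar score replaces the exact match on $\tilde D$ and the distance in X. A probit or any other binary-response model could be substituted for the logit without changing the matching step.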

Appendix C. Regression results

See Table C.1.


Table C.1
Regression results.

Dependent variables, by column:
(1) to (3): Logarithm of per capita expenditure.
(4) to (7): Receipt of international remittances (d1 = 1), estimated for: all households; all households; households with internal remittances (d2 = 1); households without internal remittances (d2 = 0).
(8) to (11): Receipt of internal remittances (d2 = 1), estimated for: all households; all households; households with international remittances (d1 = 1); households without international remittances (d1 = 0).

Each row lists the coefficients of one explanatory variable from left to right across columns (1) to (11), with robust standard errors in brackets. A row with fewer than eleven entries has no estimate reported in the remaining columns.

Receipt of international remittances: 0.3226*** [0.0229]; 0.3179*** [0.0228]; −0.4647*** [0.0736]
Receipt of internal remittances: 0.0407** [0.0165]; 0.0209 [0.0172]; −0.4109*** [0.0653]
Urban areas: −0.3447*** [0.0182]; −0.3441*** [0.0182]; −0.3631*** [0.0185]; −0.3624*** [0.0653]; −0.3597*** [0.0660]; −0.4041*** [0.0737]; −0.1985 [0.1416]; 0.0942 [0.0625]; 0.0675 [0.0634]; −0.17 [0.1726]; 0.0953 [0.0663]

Dummy regional variables
Red River Delta: Omitted
North East: −0.1038*** [0.0194]; −0.1073*** [0.0195]; −0.1145*** [0.0196]; −0.3994*** [0.1106]; −0.4459*** [0.1129]; −0.3662*** [0.1256]; −0.6887*** [0.2426]; −0.3576*** [0.0791]; −0.3759*** [0.0796]; −0.1408 [0.3675]; −0.3787*** [0.0821]
North West: −0.2819*** [0.0343]; −0.2936*** [0.0339]; −0.2936*** [0.0344]; −0.3880** [0.1964]; −0.5278*** [0.2002]; −0.4704* [0.2548]; −0.6180* [0.3234]; −0.9266*** [0.1079]; −0.9394*** [0.1083]; −0.9342 [0.6452]; −0.9297*** [0.1110]
North Central Coast: −0.1655*** [0.0210]; −0.1666*** [0.0210]; −0.1625*** [0.0214]; 0.12 [0.0977]; 0.106 [0.0982]; 0.1329 [0.1086]; −0.0156 [0.2185]; −0.1392 [0.0848]; −0.1348 [0.0855]; −0.0444 [0.2652]; −0.1432 [0.0899]
South Central Coast: −0.0075 [0.0247]; −0.0079 [0.0248]; −0.0042 [0.0252]; 0.0898 [0.0992]; 0.0739 [0.0996]; 0.0654 [0.1070]; 0.0724 [0.2328]; −0.0569 [0.0986]; −0.0498 [0.0992]; −0.227 [0.2528]; −0.0274 [0.1052]
Central Highlands: −0.0968*** [0.0329]; −0.0959*** [0.0329]; −0.1013*** [0.0329]; −0.1629 [0.1438]; −0.1539 [0.1452]; −0.1179 [0.1571]; −0.3152 [0.3723]; 0.1231 [0.1275]; 0.1146 [0.1284]; 0.3974 [0.4394]; 0.1125 [0.1332]
North East South: 0.3677*** [0.0236]; 0.3678*** [0.0236]; 0.3944*** [0.0243]; 0.4743*** [0.0857]; 0.4664*** [0.0860]; 0.4213*** [0.0964]; 0.6187*** [0.1864]; −0.0396 [0.0893]; 0.0092 [0.0889]; −0.2263 [0.2152]; 0.07 [0.0950]
Mekong River Delta: 0.1072*** [0.0205]; 0.1077*** [0.0206]; 0.1172*** [0.0208]; 0.2477*** [0.0868]; 0.2557*** [0.0871]; 0.3319*** [0.0932]; −0.2122 [0.2228]; 0.0326 [0.0835]; 0.0453 [0.0841]; 0.4196* [0.2481]; 0.0102 [0.0880]

Ratio of hh. members <16 years: −0.4883*** [0.0300]; −0.4879*** [0.0300]; −0.4913*** [0.0307]; −0.052 [0.1486]; −0.0357 [0.1480]; −0.0118 [0.1669]; 0.0141 [0.3296]; 0.0583 [0.1111]; 0.0519 [0.1103]; 0.4756 [0.3966]; 0.0147 [0.1142]
Ratio of hh. members >60 years: −0.4428*** [0.0335]; −0.4382*** [0.0334]; −0.4374*** [0.0346]; −0.013 [0.1548]; 0.0512 [0.1558]; 0.1767 [0.1681]; −0.505 [0.4758]; 0.6236*** [0.1323]; 0.6250*** [0.1326]; 1.3762*** [0.4783]; 0.5445*** [0.1362]
Age of household head: 0.0055*** [0.0007]; 0.0055*** [0.0007]; 0.0055*** [0.0007]; 0.0035 [0.0032]; 0.0037 [0.0033]; 0.0014 [0.0036]; 0.0122 [0.0086]; 0.0039 [0.0024]; 0.004 [0.0024]; −0.0067 [0.0087]; 0.0048* [0.0025]
Household size: 0.1295*** [0.0047]; 0.1297*** [0.0047]; 0.1313*** [0.0048]; 0.0281 [0.0181]; 0.0307* [0.0180]; 0.0357* [0.0195]; 0.0207 [0.0428]; 0.0158 [0.0146]; 0.0183 [0.0144]; 0.0446 [0.0486]; 0.0169 [0.0146]

Head without education degree: Omitted
Head with primary school degree: 0.1498*** [0.0158]; 0.1502*** [0.0158]; 0.1588*** [0.0161]; 0.2002** [0.0811]; 0.2002** [0.0808]; 0.1720* [0.0891]; 0.3567* [0.1976]; 0.0276 [0.0579]; 0.0441 [0.0572]; −0.3154 [0.2237]; 0.0721 [0.0580]
Head with lower-secondary school: 0.2256*** [0.0182]; 0.2269*** [0.0182]; 0.2407*** [0.0187]; 0.3303*** [0.0933]; 0.3434*** [0.0934]; 0.3502*** [0.1006]; 0.3496 [0.2195]; 0.1247** [0.0623]; 0.1483** [0.0616]; −0.0221 [0.2412]; 0.1512** [0.0631]
Head with upper secondary school: 0.3188*** [0.0238]; 0.3195*** [0.0238]; 0.3419*** [0.0250]; 0.4840*** [0.1142]; 0.4857*** [0.1145]; 0.4588*** [0.1297]; 0.6289*** [0.2422]; 0.048 [0.0894]; 0.0858 [0.0883]; −0.2534 [0.2678]; 0.1076 [0.0921]
Head with technical degree: 0.3515*** [0.0247]; 0.3523*** [0.0247]; 0.3694*** [0.0251]; 0.4442*** [0.1102]; 0.4420*** [0.1103]; 0.3522*** [0.1238]; 0.7927*** [0.2437]; 0.054 [0.0821]; 0.0854 [0.0817]; −0.5333* [0.2745]; 0.1450* [0.0867]
Head with post-secondary school: 0.4960*** [0.0332]; 0.4961*** [0.0331]; 0.5170*** [0.0337]; 0.4717*** [0.1339]; 0.4589*** [0.1337]; 0.4059*** [0.1451]; 0.7067** [0.3232]; −0.0399 [0.1354]; −0.004 [0.1347]; −0.5297 [0.3434]; 0.0802 [0.1416]

No spouse: Omitted
Spouse without education degree: 0.2161*** [0.0706]; 0.2147*** [0.0701]; 0.1832*** [0.0690]; −0.6719** [0.3241]; −0.6702** [0.3316]; −0.5505 [0.3381]; −0.1811 [0.3836]; −0.0852 [0.2914]; −0.1533 [0.2949]; −0.0636 [0.3858]; −0.2617 [0.2979]
Spouse with primary school degree: 0.3128*** [0.0707]; 0.3123*** [0.0702]; 0.2812*** [0.0692]; −0.6373* [0.3263]; −0.6249* [0.3336]; −0.481 [0.3401]; −0.2609 [0.3892]; 0.0095 [0.2966]; −0.0586 [0.2995]; 0.1672 [0.3897]; −0.1841 [0.3002]
Spouse with lower-secondary: 0.3431*** [0.0707]; 0.3423*** [0.0701]; 0.3102*** [0.0690]; −0.6653** [0.3237]; −0.6600** [0.3313]; −0.547 [0.3384]; −0.1755 [0.3799]; −0.021 [0.2888]; −0.0886 [0.2926]; −0.1562 [0.3899]; −0.1953 [0.2963]
Spouse with upper secondary: 0.4299*** [0.0722]; 0.4288*** [0.0717]; 0.4022*** [0.0705]; −0.5387 [0.3309]; −0.5412 [0.3381]; −0.5122 [0.3442]; 0.0613 [0.4007]; −0.055 [0.2901]; −0.1099 [0.2936]; −0.3649 [0.4152]; −0.1758 [0.3008]
Spouse with technical degree: 0.4991*** [0.0716]; 0.5000*** [0.0712]; 0.4658*** [0.0705]; −0.7091** [0.3295]; −0.6856** [0.3360]; −0.5525 [0.3400]; −0.2707 [0.3966]; 0.1899 [0.2826]; 0.1215 [0.2853]; 0.0853 [0.4511]; 0.018 [0.2882]
Spouse with post-secondary school: 0.6312*** [0.0739]; 0.6314*** [0.0737]; 0.6174*** [0.0725]; −0.3009 [0.3234]; −0.2945 [0.3321]; −0.1326 [0.3239]; 0.076 [0.2743]; 0.0383 [0.2772]; −0.1594 [0.5118]; −0.0416 [0.2860]

Head leaders/managers: Omitted
Head professionals/technicians: −0.0318 [0.0439]; −0.0335 [0.0440]; −0.0326 [0.0446]; 0.0503 [0.1751]; 0.0256 [0.1753]; 0.014 [0.1870]; 0.2517 [0.5429]; −0.232 [0.1557]; −0.2303 [0.1566]; −0.6948 [0.6388]; −0.2282 [0.1630]
Head clerks/service workers: −0.0647 [0.0455]; −0.0682 [0.0455]; −0.0667 [0.0456]; 0.0679 [0.1748]; 0.0142 [0.1755]; −0.0319 [0.1881]; 0.3146 [0.5549]; −0.4246*** [0.1534]; −0.4221*** [0.1536]; −1.0569* [0.6384]; −0.3628** [0.1587]
Head agriculture/forestry/fishery: −0.1866*** [0.0391]; −0.1887*** [0.0391]; −0.1796*** [0.0395]; 0.2530* [0.1472]; 0.2143 [0.1480]; 0.1354 [0.1575]; 0.579 [0.4941]; −0.2780** [0.1293]; −0.2616** [0.1303]; −1.1835* [0.6088]; −0.2063 [0.1325]
Head skilled/machine operators: −0.0899** [0.0411]; −0.0922** [0.0411]; −0.0936** [0.0415]; −0.0517 [0.1645]; −0.0967 [0.1654]; −0.1632 [0.1785]; 0.14 [0.5241]; −0.2902** [0.1390]; −0.2921** [0.1399]; −1.1082* [0.6230]; −0.2392* [0.1433]
Head unskilled workers: −0.0739* [0.0408]; −0.0762* [0.0408]; −0.0707* [0.0414]; 0.1369 [0.1571]; 0.0976 [0.1579]; 0.0364 [0.1685]; 0.3222 [0.5147]; −0.3035** [0.1342]; −0.2938** [0.1351]; −0.9622 [0.6174]; −0.2430* [0.1384]
Head not working: −0.0878** [0.0422]; −0.0908** [0.0422]; −0.0711* [0.0429]; 0.3346** [0.1583]; 0.2901* [0.1600]; 0.2723 [0.1710]; 0.3728 [0.5223]; −0.4097*** [0.1406]; −0.3800*** [0.1421]; −0.9993 [0.6143]; −0.3504** [0.1476]

Spouse leaders/managers: Omitted
Spouse professionals/technicians: −0.116 [0.0714]; −0.1166 [0.0710]; −0.1143 [0.0697]; 0.0424 [0.3282]; 0.0408 [0.3359]; −0.1665 [0.3377]; −0.2789 [0.4138]; −0.084 [0.2814]; −0.0661 [0.2851]; 0.3139 [0.5412]; −0.0131 [0.2914]
Spouse clerks/service workers: −0.036 [0.0729]; −0.0347 [0.0725]; −0.0059 [0.0717]; 0.5891* [0.3230]; 0.6030* [0.3308]; 0.535 [0.3321]; −0.2574 [0.5418]; 0.1069 [0.2964]; 0.1631 [0.3011]; 0.7208* [0.4076]; 0.1691 [0.3073]
Spouse agriculture/forestry/fishery: −0.2984*** [0.0684]; −0.2981*** [0.0679]; −0.2831*** [0.0666]; 0.278 [0.3154]; 0.2717 [0.3230]; 0.1773 [0.3264]; −0.3622 [0.4178]; −0.0008 [0.2807]; 0.0364 [0.2845]; 0.2759 [0.3711]; 0.1224 [0.2880]
Spouse skilled/machine operators: −0.1770** [0.0704]; −0.1785** [0.0698]; −0.1671** [0.0685]; 0.2258 [0.3293]; 0.2027 [0.3365]; 0.068 [0.3481]; −0.2605 [0.4396]; −0.1777 [0.2921]; −0.1483 [0.2957]; −0.0719 [0.2984]
Spouse unskilled workers: −0.1504** [0.0692]; −0.1511** [0.0687]; −0.1359** [0.0673]; 0.346 [0.3211]; 0.3237 [0.3282]; 0.193 [0.3317]; −0.1467 [0.4201]; −0.1097 [0.2787]; −0.0718 [0.2827]; −0.1402 [0.3577]; 0.0316 [0.2886]
Spouse not working: −0.1480** [0.0695]; −0.1481** [0.0690]; −0.1297* [0.0676]; 0.4195 [0.3200]; 0.409 [0.3275]; 0.2778 [0.3317]; −0.0586 [0.4060]; −0.0595 [0.2865]; −0.0139 [0.2902]; 0.1415 [0.3665]; 0.0684 [0.2946]

Land for annual crops (ha): 0.0545*** [0.0088]; 0.0539*** [0.0088]; 0.0527*** [0.0090]; −0.0414 [0.0397]; −0.0488 [0.0414]; −0.0188 [0.0436]; −0.1637* [0.0928]; −0.0645** [0.0314]; −0.0682** [0.0317]; 0.0845 [0.1415]; −0.0747** [0.0321]
Land for perennial crops (ha): 0.0450*** [0.0165]; 0.0448*** [0.0163]; 0.0450*** [0.0166]; 0.0124 [0.0301]; 0.0116 [0.0317]; 0.0206 [0.0298]; −0.071 [0.1020]; −0.0202 [0.0254]; −0.021 [0.0253]; 0.0281 [0.0670]; −0.0282 [0.0283]
Water surface for aquaculture (ha): 0.1240*** [0.0238]; 0.1216*** [0.0239]; 0.1204*** [0.0240]; −0.0702 [0.0798]; −0.1015 [0.0806]; −0.1004 [0.1013]; −0.0134 [0.1281]; −0.2178*** [0.0660]; −0.2226*** [0.0660]; −0.421 [0.3190]; −0.2075*** [0.0670]

Observations: 9188; 9188; 9188; 9188; 9188; 7825; 1356; 9188; 9188; 560; 8625
R-squared, columns (1) to (3): 0.59; 0.59; 0.57
Pseudo-R-squared, columns (4) to (11): 0.10; 0.11; 0.10; 0.19; 0.04; 0.05; 0.10; 0.04

Robust standard errors in brackets.
* Significant at 10%, ** significant at 5%, *** significant at 1%.
Source: Estimation from the 2004 VHLSS.


References

Adams, R., 1989. Workers remittances and inequality in rural Egypt. Economic Development and Cultural Change 38 (1), 45–71.

Cochran, W.G., 1968. The effectiveness of adjustment by subclassification in removing bias in observational studies. Biometrics 24, 295–313.

Cochran, W.G., Chambers, S.P., 1965. The planning of observational studies of human populations. Journal of the Royal Statistical Society. Series A (General) 128 (2), 234–266.

Frolich, M., 2002. Program evaluation with multiple treatments. Discussion Paper 2002–17, Department of Economics, University of St. Gallen.

Heckman, J., 2005. The scientific model of causality. Sociological Methodology 35, 1–97.

Heckman, J., Lalonde, R., Smith, J., 1999. The economics and econometrics of active labor market programs. In: Ashenfelter, O., Card, D. (Eds.), Handbook of Labor Economics, vol. 3. Elsevier Science, Amsterdam.

Imbens, G., 1999. The role of the propensity score in estimating dose-response functions. NBER Technical Working Paper 237.

Lechner, M., 2001. Identification and estimation of causal effects of multiple treatments under the conditional independence assumption. In: Lechner, M., Pfeiffer, F. (Eds.), Econometric Evaluation of Labour Market Policies. Physica-Verlag, Heidelberg.

Quandt, R., 1972. Methods for estimating switching regressions. Journal of the American Statistical Association 67 (338), 306–310.

Rosenbaum, P., Rubin, D., 1983. The central role of the propensity score in observational studies for causal effects. Biometrika 70 (1), 41–55.

Rubin, D., 1977. Assignment to a treatment group on the basis of a covariate. Journal of Educational Statistics 2 (1), 1–26.

Rubin, D., 1979. Using multivariate sampling and regression adjustment to control bias in observational studies. Journal of the American Statistical Association 74, 318–328.

Rubin, D., 1980. Bias reduction using Mahalanobis-metric matching. Biometrics 36 (2), 293–298.

Stark, O., Taylor, J.E., Yitzhaki, S., 1986. Remittances and inequality. Economic Journal 96 (383), 722–740.

Stark, O., Taylor, J.E., Yitzhaki, S., 1988. Migration, remittances and inequality: a sensitivity analysis using the extended Gini index. Journal of Development Economics 28, 309–322.