
Microdata, Heterogeneity And The Evaluation of Public Policy

James J. Heckman*

University of Chicago and The American Bar Foundation

Bank of Sweden Nobel Memorial Lecture in Economic Sciences

December 8, 2000

Stockholm, Sweden

* This research was supported by NSF 97-09-873 and NICHD-40-4043-000-85-261 and grants from the American Bar Foundation.

On behalf of all economists who analyze microeconomic data and who use microeconometrics to unite theory and evidence and evaluate policy interventions of all kinds, I accept my portion of the Bank of Sweden Prize in Economic Sciences in Memory of Alfred Nobel for the year 2000.

The field of microeconometrics emerged in the past forty years to aid economists in designing and evaluating public policies and in testing economic theories. It is a truly scientific field within economics that links theory to data. A new and powerful research tool has been created to complement, and improve on, introspection, anecdotal evidence and time series evidence - the traditional sources of verification in economics.

Research in microeconometrics is data driven. The availability of new forms of data has raised challenges and opportunities that have stimulated all of the important developments in the field. Research in the field is also policy driven. Questions of economic policy that can be addressed with data motivate much of the research in this field. Research questions in this field are also motivated by the desire to test and implement new models developed in economic theory.

This lecture has several goals. I briefly define the subfield of economics known as microeconometrics that has been recognized by the Nobel Committee for the first time this year. By providing this background, I can place my own work and that of my fellow microeconometricians in the context of the larger field of economics. I summarize some of the main contributions of this field and their relevance to economics more generally. I demonstrate the close relationship between the statistical problem of missing data and the more general problem of constructing counterfactuals and evaluating public policies that has been a central theme of my work in microeconometrics. I also summarize the development of robust approaches to estimating and testing models and to evaluating alternative policies.

The early literature in this field made strong functional form and distributional assumptions. The sources of identification of these models were not clearly understood. A major new emphasis in this literature is on delineating what is convenient from what is essential in identifying crucial economic and policy parameters, so that we can separate what is in the data from what we put into it.

Answering different economic questions usually requires making different assumptions. I document developments in the field that have moved it away from the possibly overambitious goal of developing general purpose models that answer a variety of policy questions, but only under strong assumptions, to the more empirically cautious position that characterizes the literature today. I also consider some of the major empirical findings from the micro literature and how they have changed the way we think about policies and the economy. The principal empirical finding is that people are diverse and that diversity and heterogeneity have important implications for how we think about economic life. Finally, I discuss directions for the future.

1. Microeconometrics: A Definition

Econometrics is a branch of economics that unites economic theory with statistical methods to interpret economic data and evaluate social programs. It is an empirically oriented field that complements economic theory. Economic theory plays an integral role in the application of econometric methods because the data almost never speak for themselves, especially when there are missing data and missing counterfactual states. Econometric methods use economic theory to guide the construction of counterfactuals and to provide a discipline on empirical research in economics.

It is easy to forget how recent is the practice of systematic empirical research in economics. A major development of Twentieth Century economics was the creation of a large data base that can be used to describe the economy, to test theories about it and to evaluate public policy. Prior to the Twentieth Century, economics was largely a deductive discipline that drew on anecdotal observations and on introspection to test theory and evaluate public policies.

Alfred Marshall's theoretically fruitful notion of the "Representative Firm" and the "Representative Consumer" was firmly rooted in theory by the time economists began the systematic collection and analysis of aggregate economic data. It also played a crucial role in Keynesian aggregative analysis. Economic phenomena were summarized by means or medians, and distributional details were swept aside as unimportant.

Econometrics as developed by Tinbergen, Frisch, and the Cowles Commission pioneers produced statistically rigorous and economically justified methods for using aggregate data to measure business cycles and to build models that could be the basis for a scientific, empirically based, approach to public policy.¹ Using linear equation systems, this body of scholars developed a framework for analyzing causal models and policy counterfactuals. For the first time, causation was distinguished from correlation in a systematic theoretical way that could be empirically implemented.

Despite these substantial intellectual contributions, empirical results from these methods proved to be disappointing. Almost from the outset, aggregate time series data were perceived to be weak and empirical macro models were perceived as ineffective in testing theories and producing policy advice (Morgan, 1990). With a few notable exceptions, macroeconometricians turned to the use of statistical time series methods where the link to economic theory was usually weak.²

Early on, Guy Orcutt (1952), who had developed novel time series methods, recognized the fragility of aggregate economic time series. He advocated a program of combining micro and macrodata to produce a more credible description of actual economic processes and to test alternative economic theories.

¹ Schultz (1938) pioneered the development of empirical models of consumer demand but data limitations forced him to rely on time series aggregates for most of his empirical research.

² See, however, the important work of Hansen and Sargent (1980, 1990), Hansen and Singleton (1982) and Fair (1976, 1994), which constitutes an exception to this rule. Heckman (2000) discusses this development.

At the time he set forth his views, the microdata base was small, computers had limited power, and a whole host of econometric problems that arose in using microdata to estimate behavioral relationships were not understood, much less solved. Nonetheless, Orcutt's vision was a bold one and he helped set into motion the forces that produced modern microeconometrics.

Microeconometrics extends Cowles econometrics in important ways by using the individual as the unit of observation and by recognizing the diversity of actual economic life more fully. Microeconometrics also extended the Cowles theory to build richer economic models where heterogeneity of agents plays a fundamental role and where the equations being estimated are more closely linked to individual data and individual choice models.

At its heart, economic theory is about individuals (or families or firms) and their interactions in markets or other social settings. The data needed to test the micro theory are microdata. The econometric literature on the aggregation problem (Theil, 1954; see Green, 1964, Fisher, 1970, and Hildenbrand, 1986, for surveys) demonstrated the fragility of aggregate data for inferring either the size or the sign of micro relationships. In the end, this literature produced negative results and demonstrated the importance of using microdata as the building block of an empirically based economic science. This literature provided a major motivation for the collection and analysis of microeconomic data.

An even greater motivation was the growth of the modern welfare state and the ensuing demand for information about the characterization, causation, and solutions to social problems, and the public demand for the objective evaluation of social programs directed toward specific groups. Application of the principles of the Cowles paradigm and its extensions by Theil (1957, 1962) gave rise to a demand for structural estimation based on micro data.

2. Economic Policy, Economic Models and Econometric Policy Evaluation

To provide some background on developments in the literature, it is useful to consider the various evaluation problems that routinely arise in evaluating the welfare state. Two conceptually distinct policy evaluation questions were often confused in the early literature, and the confusion lingers in textbooks and popular discussions. Their careful separation is a major development in this field and is a major theme of my talk. Those questions are: (1) "What is the effect of a program in place on participants and nonparticipants compared to no program at all?" This is what is now called the "treatment effect" problem. The second and more ambitious question is: (2) "What is the likely effect of a new program or an old program applied to a new environment?" The second question raises the same problems as arise in estimating the demand for a new good.³ Its answer requires structural estimation.

It is easier to answer the first question than the second, although the early literature attempted to answer both by estimating structural models.

³ This question is discussed in basic papers by Gorman (1956, 1980), Quandt (1966, 1970), Lancaster (1966, 1970) and McFadden (1974, 1975), among others.

A major development in policy evaluation research to which I have contributed has been clarification of the exact conditions that must be satisfied to answer both types of questions.

The goal of structural econometric estimation is to solve a variety of decision problems. Those decision problems entail such distinct tasks as (a) evaluating the effectiveness of an existing policy, (b) projecting the likely effectiveness of a policy in an environment different from the one where it was experienced, or (c) forecasting the effects of a new policy, never previously experienced.⁴

Additional benefits of structural models are that they can be used to test economic theory and make quantitative statements about the relative importance of causes within a theory.⁵ In addition, structural models based on invariant parameters can be compared across empirical studies. In this way, empirical knowledge cumulates. However, for certain important classes of decision problems, knowledge of structural parameters is unnecessary. This is fortunate because recovering structural parameters is not an easy task. Identifying minimal conditions on data and models for answering specific policy questions is a major goal of recent research on econometric policy evaluation.

⁴ Marschak (1953) stressed these features of structural estimation.

⁵ Similar issues arise in estimating the demand for new goods. Structural methods can be used to estimate the parameters of demand equations in a given economic environment, in forecasting the demand for goods in a different environment, and in forecasting the demand for a new good never previously consumed. Knowledge of the parameters of demand functions is crucial in testing alternative theories of consumer demand and measuring the strength of complementarities and substitution among goods.

The modern treatment effect literature in economics takes as its main goal the estimation of one or another treatment effect parameter, not the full range of parameters pursued in structural econometrics, although the precise question being answered is often not clear.

The Cowles distinctions between endogenous and exogenous variables, and the later distinctions among weak, strong and superexogeneity developed in the literature on estimating structural parameters and policy forecasting (Engle, Hendry and Richard, 1983), are irrelevant in identifying treatment effects. By focusing on one particular decision problem, the treatment effect literature achieves its objectives under weaker and hence more credible conditions than are invoked in the structural econometrics literature. At the same time, the parameters so generated are less readily transported to different environments to estimate the effects of the same policy in a different setting or the effects of a new policy, and they are difficult to compare across studies. The treatment effect literature has to be extended to make such projections and comparisons and, unsurprisingly, the extensions are nonparametric versions of the assumptions used by structural econometricians.⁶

To make my discussion specific, but at the same time keep it simple, consider the prototypical problem of determining the impact of taxes and welfare on labor supply. This problem motivated the early literature on evaluating the welfare state and remains an important policy problem down to this day.

⁶ This point is developed more fully in Heckman and Vytlacil (2000a,b, 2001).

Following the conventional theory of consumer demand as empirically implemented by Henry Schultz (1938), write an interior-solution labor supply equation for hours of work h in terms of the wage W and other variables, including assets, demographic structure and the like. Denote these other variables by X. Let ε denote an unobservable. In the most general form for h,

(1) h = h(W, X, ε).

Assume for simplicity that h is differentiable in all of its arguments. Equation (1) is a Marshallian causal function.⁷ Its derivatives produce the ceteris paribus effect of a change in the argument being varied on h. Suppose that we wish to evaluate the effect of a change in a proportional wage tax on labor supply. A proportional wage tax at rate t makes the after-tax wage W(1 − t). Assume that agents correctly perceive the tax, and ignore any general equilibrium effects of the tax. In the language of treatment effects, the treatment effect or "causal effect" of a tax change on labor supply, defined at the individual level, is h(W(1 − t), X, ε) − h(W(1 − t′), X, ε) for the same person subject to two different tax rates, t and t′.

An additively separable version of the Marshallian causal function (1) is

(2) h = h(W, X) + ε,  E(ε) = 0.

This version enables the analyst to define the ceteris paribus effects of W and X on h without having to know the level of the unobservable ε, which is unknown to the econometrician.

⁷ See Heckman (2000) or Heckman and Vytlacil (2001) for a rigorous definition of Marshallian causal functions.

A structural version of (1) is

(1a) h = h(W, X, ε; θ),

where θ is a low-dimensional parameter that generates h. A structural version of (2) is

(2a) h = h(W, X; θ) + ε.

The structural parameters reduce the dimensionality of the identification problem from that of identifying an infinite-dimensional function to that of identifying a finite set of parameters. They play a crucial role in forecasting the effects of an old policy on different populations and in forecasting the effects of a new policy. A linear-in-parameters representation of h is

(3) h = α′X + β ln W + ε,

where we adopt a semilog specification to represent models widely used in the literature on labor supply (see Killingsworth, 1983).
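To make the role of the parametric structure in (3) concrete, here is a minimal numerical sketch, not taken from the lecture, of the semilog model with a proportional tax entering through the after-tax wage W(1 − t). The parameter values and distributions below are hypothetical and serve only to illustrate the individual-level treatment effect defined above.

```python
# Hypothetical illustration of equation (3), h = a'X + b*ln(W) + eps, with a
# proportional tax t entering through the after-tax wage W(1 - t).
import numpy as np

rng = np.random.default_rng(0)
n = 5
alpha, beta = 10.0, 2.0              # hypothetical structural parameters
X = rng.normal(1.0, 0.2, size=n)     # observed covariate (a scalar index here)
W = rng.lognormal(2.5, 0.4, size=n)  # pre-tax wages
eps = rng.normal(0.0, 3.0, size=n)   # unobservable, E(eps) = 0

def hours(t):
    """Labor supply under tax rate t: equation (3) evaluated at the after-tax wage W(1 - t)."""
    return alpha * X + beta * np.log(W * (1 - t)) + eps

# Individual-level treatment effect of moving from tax rate t0 to t1:
# h(W(1 - t1), X, eps) - h(W(1 - t0), X, eps).
t0, t1 = 0.0, 0.2
print(hours(t1) - hours(t0))                     # identical across people...
print(beta * (np.log(1 - t1) - np.log(1 - t0)))  # ...and equal to beta*ln((1 - t1)/(1 - t0))
```

Under the additively separable semilog form the effect of the tax change is β[ln(1 − t1) − ln(1 − t0)] for every individual; heterogeneous responses require the unobservable to enter nonseparably, as in (1).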

Following Marschak (1953), it is useful to distinguish three different policy problems. (1) The case where tax t has been implemented in the past and we wish to forecast the effects of the tax in a population with the same distribution of (X, ε) variables as prevailed when historical measurements of tax variation were made. (2) A second case where tax t has been implemented in the past but we wish to project the effects of the same tax onto a different population of (X, ε) variables. (3) A case where the tax has never been implemented and we wish to forecast the effect of a tax either on an initial population used to estimate (1) or on a different population.

Suppose that the goal of the analysis is to determine the effect of taxes on average labor supply in a relevant population with distribution G(W, X, ε). In case 1, we have data from the same population for which we wish to construct a forecast. Suppose we observe different tax regimes. Persons face tax rate tj in regime j, j = 1, ..., J. In the sample from each regime we can identify

(4) E(h | W, X, tj) = ∫ h(W(1 − tj), X, ε) dG(ε | X, W, tj).

For the entire population this function is

(5) E(h | tj) = ∫ h(W(1 − tj), X, ε) dG(ε, X, W | tj).

This function is assumed to apply to the target population. Knowledge of (4) or (5) from the historical data can be projected into all future periods provided the joint distributions of the data are temporally invariant. If one regime has been experienced in the past, lessons from it apply to the future, provided that the same h(·) and G(·) prevail. No counterfactual state need be constructed. No knowledge of any Marshallian causal function or structural parameter is required to do policy analysis for case one. It is not necessary to break apart (4) or (5) to isolate h or G.
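A small sketch of case 1, under a hypothetical data-generating process of my own construction, illustrates why no structural decomposition is needed: the sample analogues of (4) and (5) are computed regime by regime directly from the observed data.

```python
# "Case 1" policy analysis with hypothetical data: estimate (4) and (5) within each
# observed tax regime directly, without separating h(.) from G(.).
import numpy as np

rng = np.random.default_rng(1)

def sample_regime(t, n=100_000):
    """Simulate one observed regime; the underlying model is hypothetical."""
    X = rng.normal(1.0, 0.2, size=n)
    W = rng.lognormal(2.5, 0.4, size=n)
    eps = rng.normal(0.0, 3.0, size=n)
    h = 10.0 * X + 2.0 * np.log(W * (1 - t)) + eps
    return W, X, h

for t in (0.0, 0.1, 0.2):          # observed regimes t_j
    W, X, h = sample_regime(t)
    print(t, h.mean())             # sample analogue of E(h | t_j) in (5)
    # E(h | W, X, t_j) in (4) could likewise be estimated by any regression of h on
    # (W, X) within the regime; forecasting the same regime for the same population
    # is then a matter of reading these objects off the historical data.
```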

Case two resembles case one except for one crucial difference. Because we are now projecting the same policy onto a different population, it is necessary to break (4) or (5) into its components and determine h(W(1 − tj), X, ε) separately from G(ε, X, W, t).

The assumptions required to project the effects of the old policy in a new regime are as follows.

(1) Knowledge of h(·) is needed on the new population. This may entail determination of h on a different support from that used to determine h in an initial sample, if the target population has a different support than the original source population. At this stage, structural estimation comes into its own. It sometimes enables us to extrapolate h from the source population to the target population. A completely nonparametric solution to this problem is impossible, even if we adopt the structural additive separability assumption (2a), unless the supports of the target and source populations coincide.

Some structure must be placed on h even if (2a) characterizes the labor supply model. Parametric structure (3) is traditional in the labor supply literature, and versions of a linear-in-parameters model dominate applied econometric research.⁸

(2) Knowledge of G(·) for the target population is also required. In this context, exogeneity enters as a crucial facilitating assumption. If we define exogeneity by

(A-1) (X, W, T) ⊥⊥ ε,

then

G(ε | X, W, T) = G(ε).⁹

⁸ The assumption that h(W, X) is real analytic, so that it can be extended to other domains, is another structural assumption. This assumption is exploited in Heckman and Singer (1984) to solve a censoring problem in duration analysis.

⁹ There are many definitions of this term. Assumption (A-1) is often supplemented by the additional assumption that the distribution of X does not depend on the parameters of the model (e.g., θ in (2a) or (1a)). See Engle, Hendry and Richard (1983).

In this case, if we assume that the distribution of unobservables is the same in the sample as in the forecast or target regime, G(ε) = G′(ε), where G′(ε) is the distribution of unobservables in the target population, we can project to a new population using the relationship

E(h | W, X, tj) = ∫ h(W(1 − tj), X, ε) dG(ε),

provided we can determine h(·) over the new support of X. If, however, G′ ≠ G, then G′ must somehow be determined. This entails some structural assumptions to determine the relationship between G and G′.

In the third case, where no tax has previously been introduced, knowledge of the target population is required. Taxes operate through the term W(1 − t). If there is no wage variation in samples extracted from the past, there is no way to identify the effect of taxes on labor supply since, by assumption, t = 0 and it is not possible to determine the effect of the first argument on labor supply. The problem is only worse if we assume that taxes operate on labor supply independently of wages. Then, even if there is wage variation, it is impossible to identify tax effects or project them to a new population.¹⁰

¹⁰ If wages vary in the pre-sample period, it may not be necessary to decompose (4) into h and G, or to do structural estimation, in order to estimate the effect of taxes on labor supply in a regime that introduces taxes for the first time. If the support of W(1 − t) ≡ W* in the target regime is contained in the support of W in the historical regime, and the conditional distributions of W* and W given (X, ε) are the same, and the supports of (X, ε) are the same in both regimes, then knowledge of (4) over the support of W in the historical regime is enough to determine the effect of taxes in the target regime. More precisely, letting "historical" denote the past data and "target" denote the target population for projection, we may write these assumptions as:

(a) Support(W*) in the target regime ⊆ Support(W) in the historical regime;
(b) F(W* | X, ε) in the target regime = F(W | X, ε) in the historical regime;
(c) Support(X, ε) in the target regime = Support(X, ε) in the historical regime;

where W* = W(1 − t) for random variables W defined in the new regime, so that W* in the target regime corresponds to W in the historical regime. In this case, no structural estimation is required to forecast the effect of taxes on labor supply in the target population. A fully nonparametric policy evaluation is possible, estimating (4) or (5) nonparametrically (and not decomposing E(h | W, X) into the h(·) and G(·) components). Under assumption (a), we may find a counterpart value of W(1 − t) = W* in the target population to insert in the nonparametric version of (4) (or (5)). If these conditions are not met, it is necessary to build up the G and the h functions over the new supports using the appropriate distributions. We enter the realm where structural estimation is required, either to extend the support of the h(·) functions or to determine G(W | X, ε) or both. It is still necessary to determine the relationship between W and X in the target population.
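The following sketch, with a hypothetical model and hypothetical numbers of my own, illustrates the point of footnote 10: when wages vary historically and the after-tax wage W(1 − t) stays inside the historical support of W, the effect of a first-time tax can be forecast by evaluating a nonparametric estimate of E(h | W) at the after-tax wage, with no structural decomposition (conditioning on X is suppressed to keep the sketch short).

```python
# Forecasting a never-implemented proportional tax from untaxed historical data,
# using only a nonparametric regression of hours on wages (hypothetical model).
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
X = rng.normal(1.0, 0.2, size=n)
W = rng.lognormal(2.5, 0.5, size=n)                     # historical, untaxed wages
h = 10.0 * X + 2.0 * np.log(W) + rng.normal(0, 3, n)    # hypothetical true model

# Crude estimate of m(w) = E(h | W = w) by binning on ln W.
edges = np.quantile(np.log(W), np.linspace(0, 1, 41))
bins = np.clip(np.digitize(np.log(W), edges) - 1, 0, 39)
m_hat = np.array([h[bins == b].mean() for b in range(40)])

def forecast_mean_hours(t):
    """Forecast average hours under tax t by evaluating m_hat at after-tax wages W(1 - t)."""
    b = np.clip(np.digitize(np.log(W * (1 - t)), edges) - 1, 0, 39)
    return m_hat[b].mean()         # valid because W(1 - t) stays in the historical support

print(forecast_mean_hours(0.2))                        # nonparametric forecast
print((10.0 * X + 2.0 * np.log(W * 0.8)).mean())       # truth under the hypothetical model
```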

In the optimistic era of the 1960s and 1970s, estimation of policy-invariant structural parameters on micro data became a central goal of policy-oriented econometric analysis, in order to consider the effects of old policies in new environments and the possible effects of new policies never tried. A second demand for estimating structural models, independent of policy analysis, was to test economic theory. Labor economics in particular had been enriched by the application of neoclassical theory to the labor market.

A third demand for structural estimation arose from the need to synthesize and interpret the flood of microdata that began to pour into economics in the mid-1950s. The advent of micro surveys, coupled with the introduction of the computer and the development and dissemination of multiple regression methods by Goldberger (1964) and Theil (1957, 1962, 1970), made it possible to produce hundreds, if not thousands, of regressions quickly. The resulting flood of numbers was difficult to interpret or to use to test theories or create an informed policy consensus.

A demand for low-dimensional, economically interpretable models to summarize the growing mountains of micro data was created, and there was a growing recognition that standard regression methods did not capture all of the features of the data, nor did they provide a framework for interpreting the data within well-posed economic models.

3. New Features of Micro Data

The new data revealed patterns and features that were not easily rationalized by standard models of consumer demand and labor supply, nor well modeled by conventional regression analysis. Important dimensions of heterogeneity and diversity that are masked in macro data were uncovered. These findings challenged the standard econometric tool kit of the 1960s.

Inspection of cross-section data reveals that otherwise observationally identical people make different choices, earn different wages and hold different levels and compositions of asset portfolios. Inspection of these data reveals the inadequacy of the traditional representative agent paradigm.¹¹ Table 1 presents a typical sample of data on labor supply. A considerable fraction of people do not work, and we do not observe wages for these people. The R² (measure of explained fit) of any micro relationship is typically low. Accounting for these unobservables is important in interpreting the evidence for making policy decisions.

¹¹ Lancaster (1966, 1970), Quandt (1966, 1970), McFadden (1974) and Domencich and McFadden (1975) were among the first to question the empirical validity of the representative agent empirical paradigm. See Kirman (1992) for a recent assessment of the representative agent paradigm.

Different assessments of the unobservables have different effects on the interpretation of the evidence. For example, is joblessness due to unobserved tastes for leisure on the part of workers or to a failure of the market to generate wage offers?¹² Are women transients in the labor market, or do some women (or most) have a long-term attachment to it?¹³

There are additional problems with using these data that are much less apparent in analyses of time series data. Wages are missing for nonworkers. How can one estimate the effect of wages on labor supply using equation (3) if wages are available only for workers? How can one relate the various dimensions of labor supply (hours of work; whether or not to work; number of periods worked) in order to do counterfactual policy analysis?

In confronting the new data, a variety of econometric problems arose: (a) accounting for discreteness of outcome variables; (b) rationalizing choices made at both the extensive and intensive margins (models for discrete choice and for discrete and continuous choices) within a common structural model; and (c) accounting for systematically missing data where prices or wages are missing because of choices made by individuals.

Focusing solely on the statistical aspects of microeconometrics, however, obscures its basic contributions. After all, many statisticians worried about some of these problems.

¹² Flinn and Heckman (1982) analyze this question and show the difficulty of resolving it using data on market choices.

¹³ Heckman and Willis (1977) and Heckman (1981) analyze this question.

Models with discrete outcomes were analyzed by Goodman (1970), Fienberg (1974), Haberman (1974) and Bishop, Fienberg and Holland (1975), although it was economists who pioneered the study of models with jointly determined discrete and continuous outcomes (Heckman, 1974a,b) and models with systematically missing data (Gronau, 1974; Heckman, 1974a,b, 1976a,b, 1979).¹⁴ An important contribution of microeconometrics was to clarify the limitations of these statistical frameworks for estimating economic models, making causal distinctions and solving both versions of the policy evaluation problem.

Unlike the models developed by statisticians, the class of microeconometric models developed to exploit and interpret the new sources of microdata emphasized the role of economics and causal frameworks in interpreting evidence, in establishing causal relationships and in constructing counterfactuals, whether they were counterfactual missing wages in the analysis of female labor supply or counterfactual policy states. Research in microeconometrics demonstrated that it was necessary to be careful in accounting for the sources of manifest differences among apparently similar individuals. Different assumptions about the sources of unobserved heterogeneity have a profound effect on the estimation and economic interpretation of empirical evidence, on the evaluation of programs in place, and on the use of data to forecast new policies and to assess the effect of transporting existing policies to new environments.

¹⁴ See the 1983 JASA discussion for an assessment of the originality of the work of econometricians in analyzing models for data not missing at random.

Heterogeneity due to unmeasured variables became an important topic in this literature because its manifestations were so evident in the data and the consequences of ignoring it turned out to be so profound. The problem became even more apparent as panel micro data became available and it was possible to observe persistent differences over time for the same persons.

4. Potential Outcomes, Counterfactuals and Selection Bias

My initial efforts in the field of microeconometrics were focused on building models to capture the central features of data like those displayed in Table 1 within a well-posed choice-theoretic model that could also be used to address structural policy evaluation problems (Question 2 problems). I was inspired by the work of Mincer (1962) on female labor supply and was challenged by the opportunity of building a precise econometric framework for analyzing the various dimensions of female labor supply and their relationship with wages. In accomplishing this task, I drew on two sets of econometric tools that were available, and my attempts to fuse these tools into a common research instrument produced both frustration and discovery.

The two sets of tools available to me were (1) classical Cowles Commission simultaneous equations theory and (2) models of discrete choice originating in mathematical psychology that were introduced into economics by Quandt (1956, 1970) and McFadden (1973, 1975).

My goal was to unite these two literatures in order to produce an economically motivated, low-dimensional, simultaneous equations model with both discrete and continuous endogenous variables that accounted for systematically missing wages for nonworkers and for the different dimensions of labor supply within a common framework, that could explain female labor supply, and that could be the basis for a rigorous analysis of policies never previously implemented.

The standard model of labor supply embodied in equations (1), (2) or (3) is not adequate to account for the data in Table 1. Neither is Cowles econometrics. Under standard conditions, Cowles methods can account for the correlation between W and ε in equation (3), assuming that wages are available for everyone. Such correlation can arise from measurement error in wages or because of common unobservables in the wage and labor supply equations (e.g., more motivated people work more and have higher wages, and we do not observe motivation). Cowles methods do not tell us what to do when wages are missing, how to account for nonworkers, or how to relate the decision to work to the decision on hours of work.

In a series of papers written in the period 1972-1975 (Heckman 1972, 1974a,b, 1976a,b, 1978),¹⁵ I developed the following index model of potential outcomes to unite Cowles econometrics and discrete choice theory, as well as to unify the disjointed and scattered literature on sample selection, truncation and limited dependent variables that characterized the literature of the day.

¹⁵ The earliest papers were published in 1974 (Heckman, 1974a,b) but were widely circulated before then. Heckman (1976b, 1978) was actually written in 1973 and widely circulated at that time to senior econometricians (e.g., G. S. Maddala and others). My students used this analysis in their own research (see, e.g., Nelson and Olson, 1978).

Following the literature in mathematical psychology and discrete choice as synthesized and extended by McFadden (1974, 1981), define

(7) Yi = Yi(Xi, Ui),  i = 1, ..., I,

as latent random variables reflecting potential outcomes. In the context of discrete choice, the Yi are latent utilities associated with choice i, and they depend on both observed (Xi) and unobserved (Ui) characteristics. Within each choice i, the level of the utility may vary. More generally, following the Cowles program, and in particular Haavelmo (1943), (7) may represent any potential outcome, including wages, hours of work and the like. Equations (1) or (7) are Marshallian causal relationships that tell us how hypothetical outcome Yi varies as we manipulate the arguments on the right-hand side.

Depending on the context, the Yi may be directly observed or only their manifestations may be observed. In models of discrete choice, the Yi are never observed, but we observe argmax_i {Yi}. In the more general class of models I considered, some of the Yi can be observed under certain conditions.
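As a purely illustrative aside (not part of the lecture), the pure discrete-choice case of (7) can be sketched in a few lines: the latent utilities are never seen, and the data record only which alternative maximizes them. The coefficient and error distribution below are hypothetical.

```python
# Latent-utility discrete choice: only argmax_i {Yi} is observed (hypothetical parameters).
import numpy as np

rng = np.random.default_rng(7)
n, n_choices = 100_000, 3
X = rng.normal(size=(n, n_choices))    # observed attributes of each alternative
U = rng.gumbel(size=(n, n_choices))    # unobserved tastes
Y = 0.7 * X + U                        # latent utilities Yi = Yi(Xi, Ui); never observed

choice = Y.argmax(axis=1)              # all that the analyst records
print(np.bincount(choice) / n)         # observed choice shares
# With iid Gumbel errors, the choice probabilities conditional on X take the
# conditional logit form exp(0.7*Xi) / sum_j exp(0.7*Xj) (McFadden, 1974).
```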

To consider these models in the most elementary setting, I present a model with three potential outcome functions. The literature analyzes models with many potential outcomes. Write the potential outcomes in additively separable form as

(8) Y0 = μ0(X) + U0,
    Y1 = μ1(X) + U1,
    Y2 = μ2(X) + U2.

These are latent variables that may be only imperfectly observed. In the context of the neoclassical theory of labor supply, the theory of search, and the theory of consumer demand, the reservation wage or reservation price at zero hours of work (zero demand for the good) plays a central role. It informs us what price it takes to induce someone to work the first hour or buy the first unit of a good. Denote this potential reservation wage function by Y0. Let Y1 be the market wage function, what the market offers. Ignoring any fixed costs of work, a person works (D = 1) if

(9) Y1 ≥ Y0 ⟺ D = 1;

otherwise the person does not work. Potential hours of work Y2 are generated from the same preferences that produce the reservation wage function, so Y2 and Y0 are generated by a common set of parameters. In my 1974a paper, I produced a class of simple, tractable functional forms where

(10a) Y0 = ln R = log reservation wage,
(10b) Y1 = ln W = log market wage,
(10c) Y2 = (1/γ)(ln W − ln R),  γ > 0,

and observed hours of work are written as

h = Y2 · 1(ln W ≥ ln R) = (1/γ)(ln W − ln R) · 1(ln W ≥ ln R).

Proportional taxes or transfers t introduce another source of variation into these equations, so that in place of W one uses the after-tax wage W(1 − t). The unobservables U1 and U0 account for why otherwise observationally identical people (same X) make different choices.¹⁶
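A short simulation, with hypothetical parameter values of my own choosing, shows how the reservation-wage model (10a)-(10c) generates the kind of data pattern described for Table 1: a nontrivial fraction of people do not work, hours are zero for them, and their wages are never observed.

```python
# Simulating the reservation-wage model (10a)-(10c) with hypothetical parameters.
import numpy as np

rng = np.random.default_rng(8)
n = 100_000
lnW = 2.5 + rng.normal(0.0, 0.5, n)    # Y1: log market wage
lnR = 2.4 + rng.normal(0.0, 0.6, n)    # Y0: log reservation wage
gamma = 0.05

D = lnW >= lnR                                      # work decision, equation (9)
hours = np.where(D, (lnW - lnR) / gamma, 0.0)       # observed hours
observed_wage = np.where(D, np.exp(lnW), np.nan)    # wages are missing for nonworkers

print("fraction working:        ", D.mean())
print("mean hours among workers:", hours[D].mean())
print("share of missing wages:  ", np.isnan(observed_wage).mean())
```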

Closely related to this model is the pioneering model of Roy (1951) on self-selection in the labor market, which was rediscovered in the 1970s. His model is a version of the model for index functions just presented.¹⁷ From equation (8), Y0 and Y1 are potential outcomes and Y2 is a latent utility:

(11) Y2 ≥ 0 ⟺ D = 1, and Y1 is observed;
     Y2 < 0 ⟺ D = 0, and Y0 is observed.

¹⁶ In Heckman (1974b) I present a more explicit structural model of labor supply, child care and wages that develops, among other things, the first rigorous econometric framework for analyzing the effect of progressive taxes on labor supply and the effect of informal markets on labor supply. In that paper, I characterize preferences by the marginal rate of substitution function, generate Y0 and Y2 from the consumer indifference curves, and produce Y2 from a solution of the consumer first-order conditions and the budget constraint. In that model the unobservables affecting preferences translate into variation across consumers (or workers) in the slopes of indifference curves. Characterizing consumer preferences by the slopes of indifference curves facilitates the analysis of labor supply with kinked income tax schedules and provides a more flexible class of preferences than is produced by simple linear or semilog specifications of labor supply equations.

The analysis of progressive taxes for the convex case appears in an appendix to Heckman (1974b). Because of space limitations, the editor, T. W. Schultz, requested a condensed presentation. The formal analysis was published later in Heckman and MaCurdy (1981, 1985). Hausman (1980, 1985) extends this analysis to the nonconvex case.

¹⁷ Roy develops an economic model of income inequality and sorting but does not consider any of the econometric issues arising from his model.

Thus observed Y is

(12) Y = D·Y1 + (1 − D)·Y0.

In the original Roy model, Y2 = Y1 − Y0. In the Generalized Roy Model, Y2 is more freely specified but may depend on Y1 and Y0.

These models of potential outcomes contain several distinct ideas. (1) As in the Cowles Commission analyses, there is a hypothetical superpopulation of potential outcomes defined by the possible values assumed by the Yj for ceteris paribus changes in the X and the U. These are models of Marshallian causal functions, usually represented by low-dimensional structural models to facilitate forecasting of policy effects to new regimes. (2) Unlike the Cowles models, but like the models for discrete choice, some of the latent variables are not observed (e.g., ln R is not observed but is sometimes elicited by a questionnaire). (3) Unlike either the Cowles model or the discrete choice model, some of the latent variables are observed, but only as a consequence of choices, i.e., they are observed selectively. Thus we observe (ln W − ln R), up to scale, only if ln W ≥ ln R (D = 1). We observe wages only if ln W ≥ ln R (D = 1). This selective sampling of potential outcomes gives rise to the problem of selection bias. We only observe selected subsamples of the latent population variables. In the context of the Roy model, we observe Y0 or Y1 but not both.

If there were no unobservables in the model, this selective sampling would not be a cause for any concern. Conditioning on X, we would obtain unbiased estimates of the missing outcomes for those who do not work using the outcomes of those who do work.
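With unobservables present, however, selection matters. The following simulation of the original Roy model, with hypothetical parameter values, shows the problem anticipated in the next paragraph: the observed (selected) outcomes are not representative of the latent population of potential outcomes.

```python
# A minimal Roy-model simulation, equations (8)-(12), with hypothetical parameters.
import numpy as np

rng = np.random.default_rng(3)
n = 1_000_000

X = rng.normal(0.0, 1.0, size=n)
U0, U1 = rng.multivariate_normal([0, 0], [[1.0, 0.5], [0.5, 1.0]], size=n).T

Y0 = 1.0 + 0.5 * X + U0           # mu0(X) + U0 (e.g., log reservation wage)
Y1 = 1.2 + 0.8 * X + U1           # mu1(X) + U1 (e.g., log market wage)
Y2 = Y1 - Y0                      # original Roy model: the latent index
D = (Y2 >= 0).astype(int)         # selection rule (9)/(11)

Y = D * Y1 + (1 - D) * Y0         # equation (12): only one potential outcome is seen

print("population mean of Y1: ", Y1.mean())
print("mean of Y1 among D = 1:", Y1[D == 1].mean())   # differs: selection bias
```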

Yet the data in Table 1, which are typical, reveal that the observables explain only a small fraction of the variance of virtually all microeconomic variables. Accounting for observables typically explains only a tiny fraction of the variance of outcomes. It is necessary to account for heterogeneity in preferences and for selective sampling on unobservables. As a consequence of selection rule (9), in general the wages and hours we observe are a selected sample of the potential outcomes from the larger population.¹⁸ Accounting for this is a major issue if we seek to estimate structural relationships (the parameters of the causal functions) or describe the world of potential outcomes (equations such as (10a)-(10c)). This gives rise to the problem of selection bias. Before turning to this topic, let me briefly discuss how the analysis of discrete choice and mixed continuous-discrete choice challenged conventional Cowles econometrics and demonstrated the inadequacy of conventional statistical models for making causal distinctions.

¹⁸ If (9) applies, then there must be selection bias in observing wages or reservation wages except for degenerate cases (Heckman, 1993). The selection in potential hours is an immediate consequence of (9), since D = 1 ⟺ ln W − ln R ≥ 0.

Coherency Conditions and Causality

The theory of discrete choice and mixed discrete-continuous choice challenged the received Cowles paradigm by linking econometrics more closely to choice and decision processes. Consider a prototypical simultaneous equations model for two endogenous variables (Y1, Y2) written as a function of an exogenous variable X:

(13) α11 Y1 + α12 Y2 = Xβ1 + U1,
     α21 Y1 + α22 Y2 = Xβ2 + U2,
     E(U1, U2 | X) = (0, 0).

The Cowles group developed an elaborate theory for identification and estimation in this class of models when Y1 and Y2 are continuous variables of the type that appear in conventional market equilibria or interior-solution demand equations.

How well does this theory transport to settings where (Y1, Y2) are discrete or mixed discrete-continuous? These questions were addressed in papers by Amemiya (1974), Heckman (1976b, 1978), Gouriéroux, Laffont and Monfort (1980) and Schmidt (1981), in different settings. First note that if Y1 and Y2 are discrete, (U1, U2) cannot be continuous. In particular, conventional normality assumptions for (U1, U2) are inappropriate. Redefining (13) as a model where Y1 and Y2 are latent variables with

D1 = 1(Y1 > 0) and D2 = 1(Y2 > 0)

preserves the Cowles paradigm.¹⁹ However, the behavioral content of the model is not clear, and it involves relationships among latent variables which are difficult to motivate with a precise theory. This model is more an analogy with conventional simultaneous equations methods than a model that has a firm economic motivation.²⁰

¹⁹ This model was developed in Heckman (1973, 1976b, 1978) and Mallar (1977).

²⁰ The tobit version of this model was developed in Heckman (1973, 1976b, 1978) and by my student Larry Olson working with Forrest Nelson (1978).

I also investigated a model with a clearer behavioral foundation. In this model, dummy variables shift the equations determining the latent indices. Thus (Y1, Y2) are latent variables and

(14a) Y1 = γ11 D1 + γ12 D2 + Xβ1 + U1,
      Y2 = γ21 D1 + γ22 D2 + Xβ2 + U2,

where

(14b) D1 = 1(Y1 ≥ 0) and D2 = 1(Y2 ≥ 0).

This model has the strange feature that Yi causes Di but Di also causes Yi. Since the two are mechanically related through (14b), it seems more natural to set γ11 = γ22 = 0. But even in this case, we encounter logical problems. D1 = 0 can coexist with Y1 > 0. This produces models with negative probabilities or probabilities that exceed one. Mechanical application of Cowles methods breaks down. To rule out these pathologies for a general (U1, U2) requires

(15) γ12 γ21 = 0 (coherency condition).

This condition seems to rule out true simultaneity and to force models into a recursive framework. In fact, it eliminates bad economic models. In every application, the coherency condition has a clear economic interpretation (see, e.g., Heckman, 1976b; Ransom, 1988; and Blundell and Smith, 1994). Heckman and MaCurdy (1985) present a comprehensive examination of the coherency condition.
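The logical problem and the role of condition (15) can be checked numerically. The sketch below, with hypothetical coefficients, counts, for each draw of (U1, U2), how many configurations of (D1, D2) are consistent with the system (14a)-(14b) when γ11 = γ22 = 0: violating (15) leaves some draws with no consistent configuration or with two, while imposing it yields exactly one for every draw.

```python
# Counting consistent (D1, D2) configurations in (14a)-(14b) with gamma11 = gamma22 = 0.
import numpy as np

rng = np.random.default_rng(4)

def solution_shares(g12, g21, b1=0.0, b2=0.0, n=200_000):
    """Return the shares of draws of (U1, U2) with 0, 1 and 2 consistent (D1, D2) pairs."""
    U1, U2 = rng.normal(size=(2, n))
    counts = np.zeros(n, dtype=int)
    for d1 in (0, 1):
        for d2 in (0, 1):
            ok1 = d1 == (g12 * d2 + b1 + U1 >= 0)   # consistency of the D1 equation
            ok2 = d2 == (g21 * d1 + b2 + U2 >= 0)   # consistency of the D2 equation
            counts += ok1 & ok2
    return np.bincount(counts, minlength=3) / n

print(solution_shares(g12=1.0, g21=1.0))    # violates (15): some draws have two solutions
print(solution_shares(g12=1.0, g21=-1.0))   # violates (15): some draws have none
print(solution_shares(g12=1.0, g21=0.0))    # satisfies (15): exactly one solution always
```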

The model (14a)-(14b) allows one to make the spurious vs. true causality distinctions central to Cowles econometrics, but with more general types of endogenous variables. Assuming that γ11 = γ12 = γ22 = 0, a true effect of D1 on D2 exists if γ21 ≠ 0, while a spurious effect arises if D1 is not independent of U2. A properly reformulated Cowles model can still make the crucial Cowles distinctions between causal and statistical associations, and methods for doing so were developed in the literature.²¹

This type of analysis also illustrates the benefits of the econometric approach linking statistics to economics. Log-linear models for analyzing discrete data developed by statisticians (see, e.g., Goodman, 1970, and Bishop, Fienberg and Holland, 1975) cannot distinguish the γ21 effect from the effect of dependence between U2 and D1, and so cannot be used to make causal distinctions (Heckman, 1978). Panel data extensions of these models can be used to distinguish whether past occurrences of an event causally affect the probability of occurrence of future events or just stand in for unmeasured components, i.e., they can address causal questions in a dynamic setting, a topic I consider in Section 6.

Before turning to dynamic issues, let me consider the selection problem more generally and review the body of work on it.

²¹ Heckman (1976b) estimates (14a) and (14b) under the coherency condition. Y1 is an index of sentiment in favor of passing an anti-discrimination law and Y2 is a measure of its impact, such as on wages or employment.

5. Selection Bias and Missing Data

The problem of selection bias in economic and social statistics is more general than I have described it in Section 4 and potentially arises when a rule other than simple random sampling is used to sample the underlying population that is the object of interest. The distorted representation of a true population in a measured sample as a consequence of a sampling rule is the essence of the selection problem. The inferential problem is to recover features of the hypothetical population from the observed sample (see Figure 1).

Distorting selection rules may be the outcome of decisions of sample survey statisticians, in addition to the economic self-selection decisions previously discussed where, as a consequence of self-selection, we only observe subsets of a population of potential outcomes (e.g., Y0 or Y1 in the Roy model). The early work in econometrics focused on self-selection decisions and their consequences for estimating structural equations, but the selection problem is more general.

A random sample of a population produces a description of the population distribution of characteristics that has desirable properties. One attractive feature of a random sample, generated by the known rule that all individuals are equally likely to be sampled, is that it produces a description of the population distribution of characteristics that becomes increasingly accurate as sample size expands. It provides a full enumeration of the models of potential outcomes presented in the previous sections.

A sample selected by any rule not equivalent to random sampling produces a description of the population distribution of characteristics that does not accurately describe the true population distribution of characteristics, no matter how big the sample size. Unless the rule by which the sample is selected is known or can be recovered from the observed population, the selected sample cannot be used to produce an accurate description of the underlying population. For certain sampling rules, even knowledge of the rule generating the sample does not suffice to recover the population distribution from the sampled distribution.

Two characterizations of the selection problem are fruitful. The first, which originates in statistics, involves characterizing the sampling rule depicted in Figure 1 as applying a weighting to the hypothetical population distribution. The second, which originates in econometrics, explicitly treats the selection problem as a missing data problem and, in its essence, uses observables to impute unobservables.

(i) Weighted Distributions

Any selection bias model can be described by the following model of weighted distributions. Let Y be a vector of outcomes of interest and let X be a vector of "control" or "explanatory" variables. The population distribution of (Y, X) is F(y, x). To simplify the exposition we assume that the density is well defined and write it as f(y, x).

Any sampling rule can be interpreted as producing a non-negative weighting function ω(y, x) that alters the population density. Let (Y*, X*) denote the sampled random variables. The density of the sampled data, g(y*, x*), may be written as

(16) g(y*, x*) = ω(y*, x*) f(y*, x*) / ∫ ω(y*, x*) f(y*, x*) dy* dx*,

where the denominator of the expression is introduced to make the density g(y*, x*) integrate to one, as is required for proper densities.

Sampling schemes for which ω(y, x) = 0 for some values of (Y, X) create special problems because not all values of (Y, X) are sampled. For samples in which ω(y, x) = 0 for a non-negligible proportion of the population, it is useful to consider two cases. A truncated sample is one for which the probability of observing the sample from the larger random sample is not known. For such a sample, (16) is the density of all the sampled Y and X values. A censored sample is one for which this probability is known or can be consistently estimated.

In many problems in economics, attention focuses on f(y | x), the conditional density of Y given X = x. In such problems, knowledge of the population distribution of X is of no direct interest. If samples are selected solely on the x variables ("selection on the exogenous variables"), ω(y, x) = ω(x) and there is no problem about using selected samples to make valid inference about the population conditional density. Sampling on both y and x is termed general stratified sampling.
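A small numerical sketch of the weighting idea in (16), using length-biased sampling with a weight of my own choosing, ω(y) = y, shows both the distortion and how a known, strictly positive weight can be undone by reweighting, anticipating the inversion in (17) below.

```python
# Length-biased sampling as a weighted distribution: g(y) proportional to y*f(y).
import numpy as np

rng = np.random.default_rng(5)
population = rng.lognormal(0.0, 0.75, size=2_000_000)    # draws from f(y)

# Keep each unit with probability proportional to y: the sampled data follow (16)
# with w(y) = y.
keep = rng.uniform(0.0, population.max(), size=population.size) < population
sample = population[keep]

print("population mean:       ", population.mean())
print("selected-sample mean:  ", sample.mean())          # biased upward by the weighting
# Because w is known and positive, reweight by 1/w(y) = 1/y (the sample analogue
# of inverting the weighting) to recover the population mean.
w_inv = 1.0 / sample
print("reweighted sample mean:", np.sum(w_inv * sample) / np.sum(w_inv))
```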

From a sample of data, it is not possible to recover the true density f(y, x) without knowledge of the weighting rule. On the other hand, if the weighting rule ω(y*, x*) is known, the density of the sampled data g(y*, x*) is known, the support of (y, x) is known and ω(y, x) is nonzero, then f(y, x) can always be recovered, because

(17) g(y*, x*) / ω(y*, x*) = f(y*, x*) / ∫ ω(y*, x*) f(y*, x*) dy* dx*,

and by hypothesis both the numerator and the denominator of the left-hand side are known, and we know that ∫ f(y*, x*) dy* dx* = 1, so it is possible to determine ∫ ω(y*, x*) f(y*, x*) dy* dx*.

Choice-based sampling, length-biased sampling and all other sampling plans with known non-negative weights, or with weights that can be estimated without having to estimate the full structure of the model, are fundamentally easier to correct for than selection on unobservables, where the weights are not known and must be estimated as part of the model.²²

The requirements that (a) the support of (y, x) is known and (b) ω(y, x) is nonzero are not innocuous. In many important problems in economics, requirement (b) is not satisfied: the sampling rule excludes observations for certain values of (y, x), and hence it is impossible, without invoking further assumptions, to determine the population distribution of (Y, X) at those values.

²² Selection with known weights has been studied under the rubric of Horvitz-Thompson estimators since the mid-1950s; Rao (1963, 1986) summarizes this research in statistics. Important contributions to the choice-based sampling literature in economics were made by Manski and Lerman (1977), Manski and McFadden (1981) and Cosslett (1981). Imbens (1993) is an important recent addition to this literature. Length-biased sampling is analytically equivalent and has been studied since the late nineteenth century by Danish actuaries. See also Sheps and Menken (1973) and Trivedi (1984). Heckman and Singer (1985) extend the classical analysis of length-biased sampling in duration analysis to consider models with unobservables dependent across spells and time-varying variables.

Page 33: Microdata, Heterogeneity And The Evaluation of Public Policy · The Evaluation of Public Policy James J. Heckman¤ University of Chicago and The American Bar Foundation Bank of Sweden

invoking strong assumptions, to determine whether the fact that data are missing at certain

y; x values is due to the sampling plan or that the population density has no support at

those values. Using this framework, Heckman (1986) analyzes a variety of sampling plans

of interest in economics, showing what assumptions they make about the weights and

the model to solve the inferential problem of going from the observed population to the

hypothetical population.

(ii) A Regression Representation of the Selection Problem

A regression version of the selection problem, originating in the work of Gronau (1974), Heckman (1976a,b, 1978) and Lewis (1974), starts from the Roy model, using (8), and assuming (U0, U1, U2) ⊥⊥ (X, Z). It is also closely related to Telser's characterization of simultaneous equations bias in a conventional Cowles system like (13). (See Telser, 1964 and Heckman, 1978, 2000.) We use Z to denote variables that affect choices while the X affect outcomes. There may be variables in common in X and Z. We observe Y (see (12)). Then

18(a) E(Y | X, Z, D = 1) = E(Y1 | Z, X, D = 1) = μ1(X) + E(U1 | Z, X, D = 1)

and

18(b) E(Y | X, Z, D = 0) = E(Y0 | X, Z, D = 0) = μ0(X) + E(U0 | Z, X, D = 0).

Define P(z) = Pr(D = 1 | Z = z). As a consequence of decision rule (11), in Heckman (1980) I demonstrate that under general conditions we may always write these expressions as


19(a) E(Y | X, Z, D = 1) = μ1(X) + K1(P(z))

19(b) E(Y | X, Z, D = 0) = μ0(X) + K0(P(z))

where K1(P(z)) and K0(P(z)) are "control functions" or bias functions as defined by Heckman and Robb (1985, 1986) and depend on Z only through P. The functional forms of the K depend on specific distributional assumptions. See Heckman and MaCurdy, 1985, for a catalogue of examples.
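For concreteness, in the textbook bivariate-normal case - writing the participation rule, purely for illustration, as D = 1 when Zγ + V ≥ 0 with V a standard normal error, so that P(z) = Φ(Zγ) - the control functions reduce to the familiar inverse Mills ratio terms

K1(P(z)) = E(U1 | D = 1, Z = z) = σ1V φ(Φ⁻¹(P(z))) / P(z),

K0(P(z)) = E(U0 | D = 0, Z = z) = −σ0V φ(Φ⁻¹(P(z))) / (1 − P(z)),

where φ and Φ are the standard normal density and distribution functions and σ1V, σ0V are the covariances of the outcome errors with V. These expressions also display the limiting behavior noted in the next paragraph: K1(P) → 0 as P(z) → 1 and K0(P) → 0 as P(z) → 0.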

P(z) measures the selection and K1 is the selection bias. The value of P is related to the magnitude of the selection bias. As samples become more representative, P(z) → 1 and K1(P) → 0 (alternatively, as P(z) → 0, K0(P) → 0). See Figure 2. We can compute the population mean of Y1 in samples with little selection. In general, regressions on selected samples are biased for μ1(x); we confuse the selection bias term with the function of interest. If there are variables in Z not in X, regressions on selected samples would indicate that they "belong" in the regression. Representations 19(a) and 19(b) are the basis for an entire econometric literature on selection bias in regression functions.23

The control functions relate the missing data (the U0 and U1) to observables. Under a variety of assumptions, it is possible to form these functions up to unknown parameters, identify the μ0(x), μ1(x) and the unknown parameters from regression analysis, and control for selection bias. (See Heckman, 1976a, Heckman and Robb, 1985, 1986 and Heckman and Vytlacil, 2001.)

23 Heckman, Ichimura, Smith and Todd (1998) present methods for testing the suitability of this expression in a semiparametric setting.

Representations (18) and (19) were originally derived assuming that the U were jointly normally distributed:

(A-2) (U0, U1, U2) ∼ N(0, Σ).

This assumption, coupled with the assumption

(A-3) (U0, U1, U2) ⊥⊥ (X, Z),

produces precise functional forms for K1 and K0. For censored samples, a two step estimation procedure was developed: (1) estimate P(z) from data on the decision to work or not, and (2) using the estimated P(z), form K1(P(z)) and K0(P(z)) up to unknown parameters. Then 19(a) and 19(b) could be estimated using regression. This produced a convenient expression linear in the parameters when μ1(x) = xβ1 and μ0(x) = xβ0.24 A direct one step regression procedure was developed for truncated samples. (See Heckman and Robb, 1985.) Equations 19(a) and 19(b) became the basis for an entire literature which generalized and extended the early models, and remains active to this day.
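The mechanics of the two-step procedure can be illustrated with a stylized Python simulation under (A-2) and (A-3); the data generating process, variable names and use of the statsmodels library are assumptions of the sketch, not part of the original analysis. It shows the bias of least squares on the selected sample and how the inverse Mills ratio control function removes it.

import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

rng = np.random.default_rng(1)
n = 50_000

# Simulated censored sample: Y = b0 + b1*x + U1 is observed only when D = 1,
# with D = 1 if 0.5*x + z + V >= 0 and (U1, V) correlated standard normals.
x = rng.normal(size=n)
z = rng.normal(size=n)                                   # shifts participation, not outcomes
U1, V = rng.multivariate_normal([0, 0], [[1.0, 0.6], [0.6, 1.0]], size=n).T
d = (0.5 * x + z + V >= 0).astype(int)
y = 1.0 + 2.0 * x + U1                                   # true coefficients are (1, 2)

# Least squares on the selected sample alone is biased because E(U1 | x, z, D = 1) != 0.
X_out = sm.add_constant(x)
print("selected-sample OLS:", sm.OLS(y[d == 1], X_out[d == 1]).fit().params)

# Step 1: probit for P(z), then form the normal-theory control function (inverse Mills ratio).
X_part = sm.add_constant(np.column_stack([x, z]))
probit = sm.Probit(d, X_part).fit(disp=0)
index = X_part @ probit.params
mills = norm.pdf(index) / norm.cdf(index)

# Step 2: regress Y on X and the control function using the selected observations only.
X_step2 = np.column_stack([X_out, mills])[d == 1]
print("two-step estimates (b0, b1, cov(U1,V)):", sm.OLS(y[d == 1], X_step2).fit().params)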

(iii) Empirical Results From The Model and Their Effects on Economic Analysis

This model provided a useful framework for investigating microeconomic phenomena in selected samples. Versions of it have been applied to a variety of problems in economics besides investigations of labor supply and wages.

24 Corrections for using P(z) estimated in a first stage are given in Heckman (1979) and Newey and McFadden (1994). Assumptions (A-2) and (A-3) were also used to estimate the model by maximum likelihood, as in the papers of Heckman (1974a,b). The early literature was not clear about the sources of identification, whether exclusion restrictions were needed and the role of normality.

These ideas shape the way we interpret economic and social data and gauge the effectiveness of social policy. Consider, for example, the important question of whether there has been improvement in the economic status of African Americans. As depicted in Figure 3a, the median black-white male wage ratio increased in the US over the period 1940-1980 and then stabilized. (See the dark curve in Figure 3a.) This statistic is widely cited as justification for a whole set of social policies put into place in this period. Over the same period, blacks were withdrawing from the labor force (P(z) is going down), and hence from the statistics used to measure wages, at a much greater rate than were whites. (See Figure 3b.) Correcting for the selective withdrawal of low wage black workers from employment reduces and virtually eliminates black male economic progress compared to whites and challenges optimistic assessments of African-American economic progress.25

These issues have much wider generality. They affect the way we think about inequality and the effects on employment and welfare of alternative ways of organizing the labor market. In European discussions, the low wage, high inequality U.S. labor market is often compared unfavorably to high wage, low inequality European labor markets. These comparisons founder on the same issues that arise in discussing black-white wage gaps. In Europe, the unemployed and the nonemployed are not counted in computing the wage measures. This understates wage inequality and overstates wage levels for the entire population by only counting the workers. A recent paper by Blundell, Stoker and Reed (1999) indicates the importance of the selection problem in the English context. The English data reveal a growth in the real wages of workers over the period 1978-1994. (See the top curve in Figure 4(a).) At the same time, the proportion of persons working has declined (see Figure 4(b)), and accounting for dropouts reduces the level and rate of growth of real wages. Similar adjustments widen wage inequality.

25 Butler and Heckman (1977) first raised this issue. Subsequent research by Brown (1982), Juhn (1998), Chandra (1999) and Heckman, Lyons and Todd (1999) verifies the importance of accounting for dropouts.

Accounting for selection also affects measures of wage variability over the cycle (Bils, 1986). Low wage persons drop out of the work force statistics in recessions and return in booms. Changing composition partly offsets measured wage variability. Thus measured wages appear to exhibit "too little variability" over the business cycle. When a Roy model of comparative advantage is estimated, the argument becomes more subtle. Over the cycle, not only is there entry and exit from the workforce but there are movements of workers across sectors within the workforce. Thus measured wages are not the price of labor services. In addition to the simple selection effect, measured wages include the weighting placed on different mixes of skills used by workers over the cycle. (Heckman and Sedlacek, 1985.)

Accounting for both the extensive and the intensive margin affects our view of labor markets. As another example, consider equation (2). We only observe the labor supply for workers. To focus on the selection problem, assume, contrary to fact, that wages are observed for everyone, and ignore any endogeneity in wages (so W ⊥⊥ ε). Then, letting D = 1 denote work, the observed labor supply conditional on W and X is

(20) E(h | W, X, D = 1) = h(W, X) + E(ε | W, X, D = 1),

where the Marshallian labor supply parameter, or causal parameter for wages, ∂h/∂W, is the ceteris paribus change in labor supply due to a change in wages. Compensating for income effects, we can construct a utility constant labor supply function from this to use to conduct a welfare analysis, e.g. to compute measures of consumer surplus. But a labor supply function fit on a selected sample of workers identifies two wage effects: the Marshallian effect and a compositional or selection effect due to entry and exit from the work force. Thus

(21) ∂E(h | W, X, D = 1)/∂W = ∂h(W, X)/∂W + ∂E(ε | W, X, D = 1)/∂W.

The second term is a selection effect, or compositional effect, arising from the change in the composition of the unobservables due to the entry or exit of people into the workforce induced by the wage change. This is not a ceteris paribus change corresponding to the parameters of classical consumer theory. Equation (21) does not tell us how much a current worker would change her labor supply when wages change. However, it does inform us of what an in-sample wage change would predict for average labor supply. It answers a treatment effect question. Under proper conditions on the support of W it can be used to estimate the within-sample effects of taxes on labor supply.26 Moreover, if there are variables that determine participation but do not determine hours of work, they would show up in estimated hours of work equations.
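A toy simulation makes the decomposition in (21) concrete; the functional forms and numbers below are chosen only for illustration. The slope of observed hours on wages in the selected sample of workers mixes the causal Marshallian slope with the compositional term.

import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Hypothetical linear labor supply h = 5 + 2*W + eps, with work (D = 1) only when desired
# hours clear a fixed-cost threshold; participation therefore depends on W and eps.
W = rng.uniform(5, 15, size=n)
eps = rng.normal(scale=8, size=n)
h_star = 5 + 2 * W + eps
d = h_star >= 30

# The causal (Marshallian) slope is 2 by construction. The slope of observed hours on W
# among workers adds the compositional term in (21): low-wage workers are a select group.
Wk, hk = W[d], h_star[d]
slope = np.cov(Wk, hk, ddof=1)[0, 1] / np.var(Wk, ddof=1)
print("slope in the selected sample:", round(slope, 2), "  causal slope: 2.0")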

Aggregate labor supply elasticities are inclusive of the effects of entry into and exit from the workforce as well as the effects of movement along a Marshallian labor supply curve. This simple observation has had substantial effects on the specification, estimation and interpretation of labor supply in macroeconomic models.

In the early 1980s, a literature in macroeconomics arose claiming that aggregate labor supply elasticities were too small and wage movements were too large for a neoclassical model of the labor market to explain the U.S. time series data. I have already discussed why measured wage variation understates the variation in the price of labor. In Heckman (1983), I go on to note that the macro literature focused exclusively on the interior solution labor supply component - the first term on the right of (21) - and ignored the entry and exit of workers from the labor force. Since half of the aggregate labor supply movements are at the extensive margin (Coleman, 1983, 1984), where the labor supply elasticity is higher, the standard 1980s calculations understated the true aggregate labor supply elasticity and hence understated the ability of a neoclassical labor supply model to account for fluctuations in aggregate labor supply. Accounting for choices at the extensive margin changed the way macroeconomists perceived and modeled the labor market. (See, e.g., Hansen, 1985, and Rogerson, 1986.)

26 The required condition is (a) in Footnote 9 of Section 2.

Empirical developments in the labor supply literature reinforced this conclusion. Early on, the evidence called into question the empirical validity of model (10). Fixed costs of work make it unlikely that the index for hours of work is as tightly linked to the participation decision as that model suggests. When workers jump into the labor market, they tend to work a large number of hours, not a small number of hours as (10) suggests if the U are normally distributed. Heckman (1976a, 1980) and Cogan (1981) proposed a more general model in which the participation and hours of work equations are less tightly linked. This produces an even greater elasticity for the second term in equation (21). The evidence also called into question the validity of the normality assumption, especially for hours of work data. Inspection of hours distributions from many countries reveals spiking at standard hours of work.27 This led to developments to relax the normality assumption.

(iv) Identification

Much of the econometric literature on the selection problem combines discussions of identification with discussions of estimation in solving the inferential problem of going from observed samples to hypothetical populations.28 It is analytically useful to distinguish the conditions required to identify the selection model from ideal data from the numerous practical and important problems of estimating the model. Understanding the sources of identification of a model is essential to understanding how much of what we are getting out of an empirical model is a consequence of what we put into it.

27 See the articles in the Journal of Human Resources, 1990, special issue on taxes and labor supply.

28 See Heckman, 2000, for one precise definition of identification.

At a conference at ETS in 1985 that brought together economists and statisticians, and provided some useful contrasts in points of view on causal modelling (see Wainer, 1986; reissued 2000), Holland (1986) used the law of iterated expectations to write the conditional distribution of an outcome, say Y1, on X in the following form:

(22) F(Y1 | X) = F(Y1 | X, D = 1) Pr(D = 1 | X) + F(Y1 | X, D = 0) Pr(D = 0 | X).

From the analysis of (11) and (12), we observe Y1 only if D = 1. In a censored sample, we can identify F(Y1 | X, D = 1) and Pr(D = 1 | X), and hence Pr(D = 0 | X). We do not observe Y1 when D = 0. Hence, we do not identify F(Y1 | X). In independent work, Smith and Welch (1986) made a similar decomposition of conditional means (replace F with E).

Holland questioned how one could identify the model, and briefly compared selection models with other approaches. Smith and Welch and some of the authors at the ETS conference discussed how to bound F(Y1 | X) (or E(Y1 | X)) by placing bounds on the missing components F(Y1 | X, D = 0) (respectively E(Y1 | X, D = 0)). A clear precedent for this idea was the work of Peterson (1976), who developed nonparametric bounds for the competing risks model of duration analysis, which is mathematically identical to the Roy model of equations (11) and (12).29

Heckman and Honoré (1990) consider the identification of the Roy model under a variety of conditions. We establish that under normality the model is identified even if there are no regressors, so there are no exclusion restrictions. We establish that the model is identified (up to subscripts) even if we observe only Y, but do not know whether it is a Y1 or a Y0. The original normality assumption used in selection models was powerful medicine indeed.30 We develop a nonparametric, non-normal model and establish conditions under which variation in regressors over time or across people can identify the model nonparametrically. One can replace distributional assumptions with variation in the data to identify the Roy version of the selection model. Heckman and Smith (1998) extend this line of analysis to the Generalized Roy model. It turns out that decision rule (11) plays a crucial role in securing identification of the selection model. In a more general case, even with substantial variation in regressors across persons or over time, only partial identification of the full selection model is possible. When models are not identified, it is still possible to bound crucial parameters, and an entire literature has grown up elaborating this idea. See Appendix A for a discussion of this literature.
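To give the flavor of the bounding idea, consider a worst-case construction for a bounded outcome; the Beta design and the numbers are illustrative only. When Y1 is known to lie in [0, 1], the unidentified piece E(Y1 | D = 0) can be replaced by the endpoints of that interval, which brackets E(Y1) without any assumption on the selection rule.

import numpy as np

rng = np.random.default_rng(3)
n = 100_000

# Toy censored sample with a bounded outcome: Y1 in [0, 1], observed only when D = 1,
# and selection favors high values of Y1.
y1 = rng.beta(2, 2, size=n)
d = rng.uniform(size=n) < 0.3 + 0.6 * y1

p = d.mean()                      # Pr(D = 1): identified in a censored sample
mean_treated = y1[d].mean()       # E(Y1 | D = 1): identified

# E(Y1) = E(Y1 | D=1) Pr(D=1) + E(Y1 | D=0) Pr(D=0); bound the unobserved term by 0 and 1.
lower = mean_treated * p + 0.0 * (1 - p)
upper = mean_treated * p + 1.0 * (1 - p)
print(f"bounds on E(Y1): [{lower:.3f}, {upper:.3f}]   true value: {y1.mean():.3f}")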

29 The competing risks model replaces Max(Y0, Y1) with Min(Y0, Y1).

30 Powerful, but testable. The model is overidentified. See, for example, the tests by Bera, Jarque and Lee (1984) for tests of distributional assumptions within a class of limited dependent variable models.


6. Microdynamics and Panel Data: Heterogeneity vs. State Dependence and

Life Cycle Labor Supply

The initial microdata were largely cross sections. Thus early work on discrete choice, limited dependent variables, and models with mixed continuous-discrete endogenous variables was cross sectional in nature and focused exclusively on explaining variation over people at a point in time. This gave rise to multiple interpretations of the sources of the unobservables in (7) and (8). The random utility models interpreted these as temporally independent preference shocks (McFadden, 1973), especially when discrete choice was considered. Other interpretations were (a) systematic variations in unobserved preferences that were stable over time and (b) omitted characteristics of choices and agents which may or may not be stable over time.31

With the advent of panel data in labor economics, an accomplishment due in large part to James N. Morgan and his group at the Institute For Survey Research at the University of Michigan, it was possible to explore these sources of variation more systematically. The issue was especially important in the study of female labor supply.32

Mincer (1962) used an implicit version of the random utility model to argue that cross section labor force participation data could be used to estimate Hicks-Slutsky income and substitution effects. His idea was that h in equation (1) measured the fraction of lifetime that people worked and that, if time were perfectly substitutable over time, the timing of labor supply was irrelevant and could be determined by the draw of a coin. Then a regression of labor force participation rates on W could identify a Hicks-Slutsky wage effect.33 Ben Porath (1973) interpreted labor force participation as a corner solution and showed that a regression of labor force participation rates on wages would identify parameters from a distribution of tastes for work, and not the Hicks-Slutsky substitution effect (i.e. it would define the parameters of Pr(D = 1 | X) from equation (9)).

31 Heckman and Snyder (1997) consider the history of these ideas.

32 See Stafford et al. (2000).

This issue is also important in understanding employment and unemployment data. A frequently noted empirical regularity is that those who have been unemployed in the past, or have worked in the past, are more likely to be unemployed (or to work) in the future. Is this due to the effect of being unemployed or working - a treatment effect - or is it a manifestation of a stable trait, e.g. some people are lazier than others, i.e. unobservables are persistent? An entire theory of macro policy was built around the premise that promoting work through macro policies would foster higher levels of employment (Phelps, 1972).

In a series of papers, I use panel data to investigate these issues. One set of studies builds on the model of equations 10(a)-10(c) but places them in a life cycle setting. My work on life cycle labor supply (Heckman, 1974, 1976c) demonstrated that the marginal utility of wealth constant ("Frisch") demand functions were the relevant concept for analyzing the evolution of labor supply over the life cycle in environments of perfect certainty or with complete contingent claims markets. Building on this work, Tom MaCurdy and I (1980), drawing on independent work by MaCurdy (1978, 1981), formulated and estimated a life cycle version of the model of equations (10) that interpreted one of the key unobservables in the model as the marginal utility of wealth, λ. In the economic settings we assumed, λ is a stable unobservable derived from economic theory. The models we developed extended, for the first time, models for limited dependent variables, systematically missing data and joint continuous-discrete endogenous variables to a panel setting.34 Our evidence suggests that a synthesis of the views of Ben Porath and Mincer was appropriate, and a pure random utility specification was inappropriate. This framework has been extended to account for human capital and uncertainty in important papers by Altug and Miller (1990, 1998).

33 Heckman, 1978, provides a formal analysis and the relationship to the random utility model studied by McFadden.

In related work, I generalized the static cross sectional models of discrete choice to a dynamic setting, and used this generalization to address the problem of heterogeneity vs. state dependence. The fundamental problem can be understood most simply by considering the following urn schemes (Heckman, 1981).

In the first scheme there are I individuals who possess urns with the same content of red and black balls. On T independent trials individual i draws a ball and then puts it back in his urn. If a red ball is drawn at trial t, person i experiences the event (e.g. is employed, is unemployed, etc.). If a black ball is drawn, person i does not experience the event. This model corresponds to a simple Bernoulli model and captures the essential idea underlying the choice process in McFadden's (1973) work on discrete choice. From data generated by this urn scheme, one would not observe the empirical regularity that persons who experience the event in the past are more likely to experience the event in the future. Irrespective of their event histories, all people have the same probability of experiencing the event.

34 Deaton et al. (1985) adopt this idea for repeated cross section data.

A second urn scheme generates data that would give rise to a measured effect of past events on current events solely due to heterogeneity. In this model, individuals possess distinct urns which differ in their composition of red and black balls. As in the first model, sampling is done with replacement. However, unlike the first model, information concerning an individual's past experience of the event provides information useful in locating the position of the individual in the population distribution of urn compositions. The person's past record can be used to estimate the person-specific urn composition. The conditional probability that individual i experiences the event at time t is a function of his past experience of the event. The contents of each urn are unaffected by actual outcomes and in fact are constant. There is no true state dependence.

The third urn scheme generates data characterized by true state dependence. In this model individuals start out with identical urns. On each trial, the contents of the urn change as a consequence of the outcome of the trial. For example, if a person draws a red ball, and experiences the event, additional new red balls are added to his urn. Subsequent outcomes are affected by previous outcomes because the choice set for subsequent trials is altered as a consequence of experiencing the event.

A variant of the third urn scheme can be constructed that corresponds to a renewal model. In this scheme, new red balls are added to an individual's urn on successive drawings of red balls until a black ball is drawn, and then all of the red balls added during the most recent continuous run of red draws are removed from the urn. The composition of the urn is then the same as it was before the first red ball in the run was drawn. A model corresponding to fixed costs of labor force entry is a variant of the renewal scheme in which new red balls are added to an individual's urn only on the first draw of a red ball in any run of red draws.

The crucial feature that distinguishes the third scheme from the second is that the contents of the urn (the choice set) are altered as a consequence of previous experience. The key point is not that the choice set changes across trials but that it changes in a way that depends on previous outcomes of the choice process. To clarify this point, it is useful to consider a fourth urn scheme that corresponds to models with more general types of heterogeneity, to be introduced more formally below.


In this model individuals start out with identical urns, exactly as in the first urn scheme. After each trial, but independent of the outcome of the trial, the contents of each person's urn are changed by discarding a randomly selected portion of balls and replacing the discarded balls with a randomly selected group of balls from a larger urn (say, with a very large number of balls of both colors). Assuming that the individual urns are not completely replenished on each trial, information about the outcomes of previous trials is useful in forecasting the outcomes of future trials, although the information from a previous trial declines with its remoteness in time. As in the second and third urn models, previous outcomes give information about the contents of each urn. Unlike the second model, the fourth model is a scheme in which the information depreciates, since the contents of the urn are changed in a random fashion. Unlike in the third model, the contents of the urn do not change as a consequence of any outcome of the choice process.
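The second and third schemes can be contrasted in a few lines of illustrative simulation (the particular probabilities are arbitrary), anticipating the runs-test logic discussed below. Both schemes produce persistence in the raw data, but holding the number of past events fixed, the arrangement of those events matters only under state dependence.

import numpy as np

rng = np.random.default_rng(4)
people, T = 20_000, 10

# Scheme 2 (heterogeneity only): each person has a fixed person-specific probability p_i.
p_i = rng.beta(2, 2, size=people)
scheme2 = rng.uniform(size=(people, T)) < p_i[:, None]

# Scheme 3 (state dependence): everyone starts at 0.5; experiencing the event raises the
# next period's probability, as if red balls were added to the urn after a red draw.
scheme3 = np.zeros((people, T), dtype=bool)
p = np.full(people, 0.5)
for t in range(T):
    scheme3[:, t] = rng.uniform(size=people) < p
    p = 0.5 + 0.3 * scheme3[:, t]

for name, data in [("heterogeneity ", scheme2), ("state depend. ", scheme3)]:
    lag, cur = data[:, :-1].ravel(), data[:, 1:].ravel()
    raw_gap = cur[lag].mean() - cur[~lag].mean()          # persistence in the raw data
    past, last, final = data[:, :-1].sum(1), data[:, -2], data[:, -1]
    keep = past == 5                                      # condition on the count of past events
    cond_gap = final[keep & last].mean() - final[keep & ~last].mean()
    print(name, "raw persistence:", round(raw_gap, 2),
          " gap holding past count fixed:", round(cond_gap, 2))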

In the literature on female labor force participation, models of extreme homogeneity (corresponding to urn model one) and of extreme heterogeneity (corresponding to urn model two with urns either all red or all black) are presented in a paper by Ben Porath (1973), which is a comment on Mincer's model (1962) of female labor supply. Ben Porath notes that cross-section data on female participation are consistent with either extreme model. Heckman and Willis (1977) estimate a model of heterogeneity in female labor force participation probabilities that is a probit analogue of urn model two.

Urn model three is of special interest. It is consistent with human capital theory, and with other models that stress the impact of prior work experience on current work choices. Human capital investment acquired through on the job training may generate structural state dependence. Fixed costs incurred by labor force entrants may also generate structural state dependence as a renewal process. So may spell-specific human capital. This urn model is also consistent with psychological choice models in which, as a consequence of receiving a stimulus of work, women's preferences are altered so that labor force activity is reinforced (Atkinson, Bower, and Crothers, 1965).

Panel data can be used to discriminate among these models. For example, an implication of the second urn model is that the probability that a woman participates does not change with her labor force experience. An implication of the third model in the general case is that participation probabilities change with work experience. One method for discriminating between these two models utilizes individual labor force histories of sufficient length to estimate the probability of participation in different subintervals of the life cycle. If the estimated probabilities for a given woman do not differ at different stages of the life cycle, there is no evidence of structural state dependence.

Heckman (1981) develops a class of discrete data stochastic processes that generalize the discrete choice model of McFadden (1973) to a dynamic setting. That set up is sufficiently general to test among all four urn schemes and presents a framework for dynamic discrete choice. Heckman and Singer (1985) present an explicit generalization of the McFadden model in which Weibull shocks arrive at Poisson arrival times. Dagsvik (1994) presents a generalization to a continuous time setting.

Heckman (1978) shows that it is possible to use runs tests to distinguish between heterogeneity and state dependence. The intuition behind the test is that just the total number of past occurrences of an event, and not the order of occurrence of past events, is relevant to predicting the current probability of an outcome if a pure heterogeneity explanation is appropriate. If state dependence matters, then the timing of past events matters. This insight is the basis for recent work by Honoré and Kyriazidou (1998) analyzing discrete data models with lagged dependent variables. It is also used by Chiappori and Heckman (2000) in distinguishing adverse selection from moral hazard models. Heckman and Borjas (1980) present a related analysis for continuous time duration analysis and apply it to the important policy question of whether past unemployment causes future unemployment or whether past unemployment is just a signal of a propensity to be unemployed. For young men in the U.S., they find that the latter story is more appropriate: the experience of unemployment has no lasting effects on future employment. Heckman (1981) analyzes female employment, and finds that for older married women (45-59), past employment raises future employment even controlling for unobservables. Layard, Jackman and Nickell (1994) summarize the evidence on true state dependence in unemployment histories.

Table 2, from Heckman (1981), reveals that heterogeneity and temporal persistence are important features of the data on the labor force participation of women. The table records the work history of women over three year periods, with "1" denoting working in the year and "0" not working. If the classical random utility model characterized the data, all the rows should be roughly equal, which they clearly are not. Those who work tend to persist in work while those who do not work persist in not working. This is true even after conditioning on observables. Temporally persistent unobservables are a central feature of microeconomic data which affect how we estimate models and interpret data.

Early work on this topic accounted for persistent heterogeneity using parametric distributions for the unobservables.35 An important issue was whether it was possible to distinguish heterogeneity from state dependence without using parametric structure. Using an analysis of runs tests, Heckman (1978) showed it was possible to test between the two sources of dependence. The open question was whether it was possible to measure the relative contributions the way Heckman (1981a,b) had done using parametric structures. Elbers and Ridder (1982) showed that this was logically possible in the context of a proportional hazard model in duration analysis but did not provide an estimation algorithm.

35 For an overview of mixing models in demography, see Sheps and Menken (1973). Lancaster (1979, 1990) and Lancaster and Nickell (1980) apply these models to analyze unemployment spells.


Heckman and Singer (1984) extend their analysis to develop a consistent estimation procedure. Cameron and Heckman (1998) and Hansen, Heckman and Vytlacil (2000) extend and generalize this analysis to discrete time models with both discrete and continuous outcome variables. A major empirical finding in the work of Heckman and Singer, replicated in numerous subsequent studies, is that distributions for unobservables can be approximated by finite mixtures of "types".
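A stylized illustration of the finite-mixture idea (a toy two-type Bernoulli design, not the Heckman-Singer estimator itself): a small EM routine recovers a discrete distribution of "types" from binary panel outcomes.

import numpy as np

rng = np.random.default_rng(5)
people, T, K = 5_000, 8, 2

# Toy panel of binary outcomes generated from two latent types with different
# participation probabilities; the finite mixture is what the estimator tries to recover.
true_p = np.array([0.25, 0.75])
types = rng.integers(0, 2, size=people)
y = rng.uniform(size=(people, T)) < true_p[types][:, None]
counts = y.sum(axis=1)                                  # events per person

# EM for a K-type mixture of Bernoullis, a crude stand-in for richer mixing distributions.
shares, p = np.full(K, 1.0 / K), np.array([0.3, 0.6])   # starting values
for _ in range(200):
    # E-step: posterior type probabilities given each person's history.
    like = p**counts[:, None] * (1 - p)**(T - counts[:, None])
    post = shares * like
    post /= post.sum(axis=1, keepdims=True)
    # M-step: update type shares and type-specific event probabilities.
    shares = post.mean(axis=0)
    p = (post * counts[:, None]).sum(axis=0) / (post.sum(axis=0) * T)

print("estimated shares:", np.round(shares, 2), " estimated type probabilities:", np.round(p, 2))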

The field of discrete dynamic choice has progressed enormously and is an active area of research. Flinn and Heckman (1982) present the first rigorous structural model of discrete dynamic choice in the context of a search model of unemployment. They investigate the nonparametric identifiability of the search model and present a class of identification theorems that have also proved useful in the context of identifying auction models. (See, e.g., Vuong (1989), Paarsch and Donald (1989), Hong (1998).) Later work in dynamic discrete choice by Pakes (1984), Rust (1986), Eckstein and Wolpin (1989, 1998), Keane and Wolpin (1996, 1998) and others has focused more exclusively on computational methods for parametric models.36 See Rust (1996) for an overview of this work.

36 Related research on auctions by Athey and Haile (2000) and others considers nonparametric identification more seriously.

7. Treatment Effects

As economists explored the sources of identification in structural econometric models that account for heterogeneity and self selection, and the sensitivity of estimates from these models to alternative identifying assumptions, they began to pursue other approaches to more focused versions of the questions addressed by structural econometric models. One response to the difficulties inherent in structural estimation is to focus on policy evaluation questions like question 1 of Section 2, and to forego the interpretability, transportability and comparability of estimates obtained from structural models. This gives rise to the literature on treatment effects, which seeks to evaluate policies in place and not to evaluate policies never previously experienced or old policies transported to new environments. This literature trades ambition for credibility. By not linking estimates to tightly specified economic models, it often forsakes the use of economics to aid in justifying the validity of alternative identifying assumptions and in interpreting evidence.

The treatment effect literature approaches the problem of policy evaluation in the same way that biostatisticians approach the problem of evaluating a drug. Outcomes of persons exposed to a policy are compared to outcomes of those who are not. The analogy is more than a little strained in the context of evaluating many social policies because in a modern economy the outcomes of persons are linked through markets and other forms of social interaction. This gives rise to a distinction between those "directly" affected and those only "indirectly" affected. Thus those who attend college because of a tuition subsidy program are directly affected. The rest of society is indirectly affected by the cost of taxation to finance the subsidy and by the effects of an expansion of the stock of education on the prices of educated and uneducated labor.37

The treatment effect literature investigates a class of policies with partial coverage, so there is a "treatment" and a "control" group. It is not helpful in evaluating policies that have universal participation. It finesses general equilibrium problems by assuming that the outcomes of nonparticipants are the same as what they would experience in the absence of the policy. (Heckman and Smith, 1998.) I discuss these general equilibrium questions further in Section 8.

Models for treatment effects relax many of the controversial assumptions in the structural econometrics literature and yet can still be used to answer Question 1. To summarize developments in this literature I draw on my recent work with Vytlacil (1999, 2000a,b). It is not the most general possible framework for treatment effects, but it is an analytically convenient one that can be used to unify the literature on treatment effects, identify treatment effects, introduce interpretable economic parameters and link that literature to the older literature in structural econometrics that is based on index models.

For simplicity, consider models in which there are two potential outcomes (Y0, Y1). Let D = 1 if a person i receives "treatment" (e.g. goes to college). Let D = 0 denote nonreceipt (e.g. does not go to college). Define Y as a measured outcome variable. It may be discrete or continuous. Observed outcomes are

(23) Y = DY1 + (1 − D)Y0

24(a) Y1 = μ1(X, U1)

24(b) Y0 = μ0(X, U0)

and the potential outcomes are general nonlinear and nonseparable functions of their arguments. No explicit distributional assumptions are made about (U0, U1), and X may be "endogenous" in the sense of the Cowles econometricians.

37 See Heckman and Smith (1998). Lewis (1963) addresses these general equilibrium effects in the context of evaluating the impact of unionism on the economy, and introduces the notion of direct and indirect effects.

To link this set up to the literature on structural microeconometrics, characterize the decision rule for program participation by an index model:

(25) D* = μD(Z) − UD,  D = 1 if D* ≥ 0, D = 0 otherwise,

where Z and X are observed and (U1, U0, UD) are unobserved. Z may include the X. This rule embodies the same choice mechanism as (11), but now it is applied to a more general setting.

As previously noted, the early literature in structural econometrics made distributional independence and functional form assumptions like (A-2) and (A-3), with μ1(X) and μ0(X) assumed to be linear in X. The recent structural literature seeks to relax these assumptions but retains the objective of estimating at least some structural parameters to identify important economic parameters and answer certain counterfactual questions. The treatment effect literature is more radical. It seeks to address a narrow class of policy evaluation questions without identifying any structural parameter. In this literature Question 1 is completely detached from Question 2.

In a cross sectional setting, it is usually impossible to know both Y0 and Y1 for anyone.38 We encounter the same type of sampling problem as depicted in Figure 1, where we can determine F(Y1 | X, D = 1) and F(Y0 | X, D = 0) but not F(Y1 | X) or F(Y0 | X). Hence attention focuses on certain average effects. A variety of different treatment parameters have been identified in this literature. These parameters are all the same if responses to "treatment" are either identical for people with the same observables or if ex ante responses are the same. These parameters rarely answer interesting economic questions, a point to which I return below.

38 In a stationary panel data setting, these may be observed by following the same persons over time. See Heckman and Smith (1998).

One intuitive parameter widely used in biostatistics is the average treatment effect conditional on X:

(P-1) ATE(X) = E(Y1 − Y0 | X).

This is the effect of picking someone out of the population at random, conditional on X, and examining the effect of treatment. Because of purposive self selection, this parameter

rarely corresponds to any economic policies in place. Another parameter that is more often estimated is the effect of treatment on the treated:

(P-2) TT(X, D = 1) = E(Y1 − Y0 | X, D = 1).

This is the effect of treatment on those who receive it, compared to what they would experience without treatment.

The marginal treatment effect, introduced into this literature by Anders Björklund and Robert Moffitt (1987),

(P-3) MTE(X, UD = uD) = E(Y1 − Y0 | X, UD = uD),

is the average gain from participation in the program for people at the margin of indifference (UD = uD) between participating in it or not. It is the average willingness to pay for Y1 (compared to Y0) of people with characteristics X and unobservable UD. Under conditions specified in Heckman and Vytlacil (1999, 2000a,b), the Björklund-Moffitt parameter can be used to generate all of the treatment parameters in the literature, including a new class of policy-relevant treatment parameters introduced in Heckman and Vytlacil (2001) and defined below.

To simplify the analysis, suppose that we can write 24(a) and 24(b) in additively separable form as

25(a) Y1 = μ1(X) + U1

and

25(b) Y0 = μ0(X) + U0.

The U1 and U0 are left freely specified. One way to obtain this representation is to work with 24(a) and 24(b) directly, interpret μ1(X) and μ0(X) as the conditional expectations of Y1 and Y0 given X, and reinterpret the errors as the residuals from conditional expectations. Then in this setting

ATE(X = x) = μ1(x) − μ0(x)

TT(X = x, D = 1) = μ1(x) − μ0(x) + E(U1 − U0 | X = x, D = 1)

MTE(X = x, UD = uD) = μ1(x) − μ0(x) + E(U1 − U0 | X = x, UD = uD).

Then if U1 = U0 (so there is no heterogeneity among agents conditional on X), or if (U1 − U0) ⊥⊥ UD, which is equivalent to (U1 − U0) ⊥⊥ D, then all of these parameters are the same.39 In addition, if we assume that (U0, U1) ⊥⊥ X, these are structural parameters as defined in conventional econometrics. Appendix (A-2) presents expressions for these parameters in the context of the classical normal selection model.
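A short simulation conveys why the parameters separate once agents select on their gains (this is case (C-3) introduced below); the data generating process here is purely illustrative.

import numpy as np

rng = np.random.default_rng(6)
n = 400_000

# Illustrative generalized Roy model: the gain Y1 - Y0 enters the participation decision,
# so participants are a self-selected group with above-average gains.
U0 = rng.normal(size=n)
U1 = U0 + rng.normal(size=n)                 # idiosyncratic gain component U1 - U0 ~ N(0, 1)
y0 = 1.0 + U0                                # mu0(x) = 1.0
y1 = 1.2 + U1                                # mu1(x) = 1.2, so ATE = 0.2
cost = rng.normal(scale=0.7, size=n)         # net cost of participating
d = (y1 - y0 - cost >= 0)                    # participate when the gain exceeds the cost

print("ATE:", round((y1 - y0).mean(), 3))    # about 0.2
print("TT :", round((y1 - y0)[d].mean(), 3)) # larger, because of selection on gains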

The distinctive features of the treatment effect literature are that (1) (U1, U0) are not required to be independent of X, (2) no attempt is made to identify the μ1 or μ0 functions directly, and (3) no distributional assumptions are made. Parameters (P-1) - (P-3) describe a program in place for a particular population, not the wider class of policy evaluation questions considered in Question 2.

39 Strictly speaking, mean independence is all that is required.


Heckman and Vytlacil (2000) establish that it is possible to recover these parameters under the following conditions, together with suitable conditions on the support of the latent variables:

(A-4) μD(Z) is a nondegenerate random variable conditional on X;

(A-5) UD is absolutely continuous with respect to Lebesgue measure;

(A-6) (U0, U1, UD) is independent of Z conditional on X; these random variables may be dependent on X;

(A-7) Y1 and Y0 have finite first moments;

and

(A-8) 1 > Pr(D = 1 | X = x) > 0 for every x ∈ Supp(X).

Assumptions (A-4) and (A-6) are "instrumental variable" assumptions: there is at least one instrument that determines participation in the program but not outcomes. Assumptions (A-5) and (A-7) are technical assumptions made primarily to simplify the analysis. Assumption (A-8) is the assumption that the population contains both a treatment and a control group for each X.

There are no Cowles Commission exogeneity requirements concerning X. A counterfactual "no feedback" condition corresponding to the classical superexogeneity assumption of structural econometrics is required for interpretability, so that conditioning on X does not mask the effects of D. Letting Xd denote the value of X if D is set to d, a sufficient condition that rules out feedback from D to X is:

(A-9) X1 = X0 a.e.

This condition is not strictly required to formulate an evaluation model, but it enables an analyst who conditions on X to capture the "full effect" of X.40

40 Heckman, Ichimura, Smith and Todd (1998) develop a version of this condition for a Roy model. See also Heckman, LaLonde and Smith (1999). These conditions are discussed extensively in Heckman and Vytlacil (2001).

Define P(z) as the probability of receiving treatment: P(z) ≡ Pr(D = 1 | Z = z) = FUD(μD(z)). Without loss of generality, we may write UD ∼ Unif[0,1], so that μD(z) = P(z). (Das and Newey, 1998.) If D* = ν(z) − V, we can always reparameterize the model so that μD(z) = FV(ν(z)) and UD = FV(V).

A simple regression representation of this model serves to illustrate what is familiar and what is distinctive about it. Substituting 25(a) and 25(b) into (23), we obtain the following expression:

Y = Y0 + D(Y1 − Y0).

Define Δ = μ1(x) − μ0(x) + U1 − U0. Then

(27) Y = μ0(X) + ΔD + U0,


where the coefficient on D is a random coefficient that may be correlated with D. Such dependence is produced if UD depends on (U1 − U0). Let Δ̄ = E(Δ | X) = ATE and Δ − Δ̄ = (U1 − U0). We may write

(28) Y = μ0(x) + Δ̄D + (U1 − U0)D + U0.

It is useful to distinguish three cases. The first case arises when responses to treatment are the same for everyone given X:

(C-1) U1 = U0 and Δ is a constant.

In this case MTE does not depend on UD and E(Δ | X, D = 1) = E(Δ | X) = Δ.

The second case arises when responses to treatment vary among people, but decisions to take treatment are not based on these variable responses:

(C-2) U1 ≠ U0; Δ varies among people but E(Δ | X = x, UD = uD) = μ1(x) − μ0(x) does not depend on uD.

Then E(Δ | X, D = 1) = E(Δ | X). In this case returns to participation vary ex post but are the same on average for all persons with the same values of X. Like the case with a common coefficient, marginal and average effects coincide. The third case arises when responses to treatment vary among people and decisions to take treatment are based on this variation:

(C-3) U1 ≠ U0; Δ varies among people and E(Δ | X = x, UD = uD) = μ1(x) − μ0(x) + E(U1 − U0 | X = x, UD = uD) depends on uD.

In this case E(Δ | X = x, D = 1) ≠ E(Δ | X = x) and people sort into treatment status based at least in part on unobserved gains. More succinctly, the three cases are:

(C-1) Δ is the same for everyone, (U1 − U0) = 0, and ATE(X) = TT(X) = MTE(X) = Δ = Δ̄.

(C-2) Δ varies and E(Δ | X = x, D = d) = E(Δ | X). There is no sorting into the program on ex post gains from it, so

E(U1 − U0 | D = 1, X) = 0

ATE(X) = TT(X) = MTE(X) = E(Δ | X) = Δ̄.

(C-3) Δ varies among people,

E(U1 − U0 | X = x, D = d) ≠ E(U1 − U0 | X = x)

E(U1 − U0 | D = 1, X = x) ≠ 0.

There is sorting into the program based in part on the ex post unobserved gains from it, and

ATE(X) ≠ TT(X), ATE(X) ≠ MTE(X), TT(X) ≠ MTE(X)

ATE(X) = Δ̄

TT(X) = Δ̄ + E(U1 − U0 | X, D = 1)

MTE(X) = Δ̄ + E(U1 − U0 | X, UD = uD).

As a consequence of heterogeneity in responses to treatment on which agents act, there is no single "effect" of treatment but, rather, a variety of effects. This represents a radical departure from the policy invariant structural parameters that are the hallmark of Cowles econometrics. The unity and simplicity of the structural literature in producing parameters that can be transported and compared across economic environments appears to be lost in the literature on treatment effects. In addition, it is not clear what precise economic questions these parameters answer, or why economists should be interested in them.

(ii) Relationships Among the Parameters

Fortunately, first impressions are false. Some unification is possible and some comparability of parameters across environments is possible if assumptions (A-4) - (A-8) characterize the model. As a consequence of the index structure, all of the treatment parameters can be generated from the MTE, the marginal willingness to pay function, so that the various treatment parameters can be given a well defined economic interpretation.41

41 These relationships were first presented in Heckman and Vytlacil (1998a,b, 1999 and 2000).

Let Δ = Y1 − Y0, the individual level treatment effect. Then

ATE(x) = ∫_0^1 MTE(X = x, UD = uD) duD

TT(x, P(z), D = 1) = [1/P(z)] ∫_0^{P(z)} MTE(X = x, UD = uD) duD

LATE(x, uD, u′D) = [∫_{u′D}^{uD} MTE(X = x, UD = u) du] / [P(z) − P(z′)]

TT(x, D = 1) = ∫_0^1 MTE(X = x, UD = uD) gx(uD) duD.42
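These relationships are easy to verify numerically for any posited MTE. The sketch below uses an illustrative normal-quantile parameterization (not an estimate from data) and recovers ATE and TT(P(z)) by averaging the MTE over the appropriate ranges of uD.

import numpy as np
from scipy.stats import norm

# An assumed MTE in a normal-selection parameterization: linear in the normal quantile of uD.
ate_true, slope = 0.2, -0.4
mte = lambda u: ate_true + slope * norm.ppf(u)

u = np.linspace(1e-4, 1 - 1e-4, 200_000)                 # grid on (0, 1), endpoints excluded
print("ATE        :", round(mte(u).mean(), 3))           # average of MTE over [0, 1] -> 0.2

p = 0.3                                                  # a value of the propensity score P(z)
u_below = np.linspace(1e-4, p, 200_000)
print("TT at P=0.3:", round(mte(u_below).mean(), 3))     # (1/P) * integral of MTE over [0, P]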

(iii) Identifying the Parameters: The MTE As A Unifying Device

Recovering these evaluation parameters from data requires fewer assumptions than are required to recover structural parameters. But there are still nontrivial problems that take us back to the selection problem. Focusing on means, we observe

E(Y | X = x, D = 1) = E(Y1 | X = x, D = 1)

(e.g. earnings of college graduates) and

E(Y | X = x, D = 0) = E(Y0 | X = x, D = 0)

(e.g. earnings of high school graduates).

42 gx(u) = [1 − FP(Z)|X(u | x)] / ∫_0^1 [1 − FP(Z)|X(t | x)] dt = SP(Z)(u | x) / E(P | x).

Using the earnings of high school graduates to proxy what college graduates would earn is problematic. Comparing the mean earnings of the two groups,

E(Y1 | X = x, D = 1) − E(Y0 | X = x, D = 0) = E(Y1 − Y0 | X = x, D = 1) [the TT(X) term] + {E(Y0 | X = x, D = 1) − E(Y0 | X = x, D = 0)} [the selection bias term].

The second term is the difference between what college graduates would earn if they were high school graduates and what high school graduates would earn. If there were no unobservables, or if fortuitously conditioning on X eliminated differences in unobservables, as is assumed by statisticians who advocate the method of matching, then the selection bias term vanishes. Yet the poor fit of most microdata equations suggests that the assumption of no unobservables is unacceptable. Reliance on matching is an act of faith. The classical model of ability bias assumes that the selection bias is positive, and the Roy model is inconsistent with matching. (See Heckman and Vytlacil, 2001.)

Consider applying the method of instrumental variables. Economists have been using instrumental variables (IV) for over 70 years.43 The intuition supporting the instrumental variables method is widely understood: it mimics experimental variation by using instrumental variable variation. In regression model (28) under case (C-1), IV is a solution to the problem that D is correlated with the error term U0.

43 See the history in Morgan (1990). The earliest use was by Wright (1927).

Consider using P(z) as an instrument for D. Suppose that we seek to estimate the parameter treatment on the treated (TT) in case (C-3). Suppress the explicit dependence on X to simplify the notation. Recall that under assumptions (A-4)-(A-8) this parameter is

(28) TT = E(Δ | Z, D = 1) = E(Δ | P(z) ≥ UD) = μ1 − μ0 + E(U1 − U0 | P(z) ≥ UD) = Δ̄ + E(U1 − U0 | P(z) ≥ UD).

In cases (C-1) and (C-2) the final term in the last two expressions is zero. From (28), we may write

(29) E(Y | Z = z) = μ0 + [Δ̄ + E(U1 − U0 | P(z) ≥ UD)] P(z).

If E(U1 − U0 | P(z) ≥ UD) = 0, as in case (C-1) or (C-2), then we may apply the logic of instrumental variables and write, for two values z, z′ such that P(z) ≠ P(z′),

E(Y | Z = z) = μ0 + [Δ̄ + E(U1 − U0 | P(z) ≥ UD)] P(z)

E(Y | Z = z′) = μ0 + [Δ̄ + E(U1 − U0 | P(z′) ≥ UD)] P(z′),

and subtract the bottom equation from the top to obtain

(30) [E(Y | Z = z) − E(Y | Z = z′)] / [P(z) − P(z′)] = Δ̄ + [E(U1 − U0 | P(z) ≥ UD) P(z) − E(U1 − U0 | P(z′) ≥ UD) P(z′)] / [P(z) − P(z′)].

If E(U1 − U0 | P(z) ≥ UD) = 0, IV identifies "the" effect of treatment, Δ̄, and all mean treatment parameters are the same. In the more general case, the method only identifies the combination of parameters in (30).

In cases (C-1) and (C-2), P(z) enters (29) linearly, multiplying a bracketed term that does not vary with P(z). In case (C-3), P(z) enters the expression in two places. Thus a test of whether (C-1) or (C-2) is valid is a test of the linearity of (29) in terms of P(z). This is also a test of the validity of the standard IV procedure for identifying Δ̄. (See Figure 6.)

show this, write (29) in equivalent form:

E(Y j Z = z) = ¹0 + ¹¢P (z)

+1Z

¡1

P (z)Z

0

(U1 ¡ U0)f(U1 ¡ U0 j UD = uD)duDd(U1 ¡ U0):

Di¤erentiate with respect to P (z) to obtain

@E(Y j Z = z)@P (z)

=

¹¢ +1Z

¡1

(U1 ¡ U0)f(U1 ¡ U0 j UD = P (z))d(U1 ¡ U0) =MTE:

From knowledge of the MTE, we can recover all of the treatment parameters by integrating up the MTE so identified, using the relationships given in subsection (ii). That is, from the derivative of E(Y \mid P = p) with respect to p we can recover all of the parameters of the model. This is the method of local instrumental variables introduced into this literature by Heckman and Vytlacil (1999, 2000, 2001).
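The sketch below illustrates the local IV idea on simulated data: E(Y | P = p) is approximated by a quartic polynomial in p, a deliberately crude stand-in for the local or semiparametric estimators used in practice, and its derivative is compared with the MTE implied by the simulated model. The data-generating process and the polynomial degree are illustrative assumptions.

```python
# Sketch of local IV: estimate E(Y | P = p), differentiate in p, read off MTE(p).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
n = 1_000_000
V = rng.normal(size=n)
UD = norm.cdf(V)
U1_minus_U0 = -0.5 * V + rng.normal(scale=0.3, size=n)
P = rng.uniform(0.05, 0.95, size=n)
D = (P >= UD).astype(int)
Y = rng.normal(size=n) + D * (1.0 + U1_minus_U0)

coefs = np.polyfit(P, Y, deg=4)              # crude approximation of E(Y | P = p)
dcoefs = np.polyder(coefs)                   # its derivative in p
grid = np.linspace(0.1, 0.9, 9)
mte_hat = np.polyval(dcoefs, grid)
mte_true = 1.0 - 0.5 * norm.ppf(grid)        # implied by this simulation design
for u, est, true in zip(grid, mte_hat, mte_true):
    print(f"u_D = {u:.1f}   MTE estimate = {est:5.2f}   true MTE = {true:5.2f}")
```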


Using the MTE function, we can organize all of the econometric estimators in the evaluation literature on the basis of whether or not they allow for selection into the program being evaluated on the basis of unobservable gains, and hence whether they are consistent with selection models. See Figure 7. Matching and conventional IV estimators assume no selection on the gains. Selection models, LATE and LIV allow for selection on unobservable gains and hence allow for the Roy model to characterize choices.

(iv) Policy Relevant Treatment Parameters

The conventional treatment parameters defined in section (ii) are justified on intuitive grounds. The link to cost-benefit analysis and interpretable economic frameworks is obscure. Heckman (1997), Heckman and Smith (1998) and Heckman and Vytlacil (2000) develop the relationship between these parameters and the requirements of cost-benefit analysis.

Ignoring general equilibrium effects, TT is one ingredient for determining whether or not a given program should be shut down or retained. It tells us whether the persons participating in a program benefit in gross terms. It is necessary to account for costs to conduct a proper cost-benefit analysis. MTE estimates the gross gain from a marginal expansion of a program and must be compared with marginal cost to be used as an evaluation parameter. ATE considers whether a person selected at random would benefit from participation in a program. It is difficult to justify this parameter as being relevant for


cost-benefit analysis except when the MTE is constant in U_D, so that ex ante gains from the program are identical for all people with the same X.

A more direct approach to defining parameters is to postulate the policy question or decision problem of interest and to derive the parameter that answers it (Heckman and Vytlacil, 2001). Taking this approach does not in general produce the conventional treatment parameters. Consider a class of policies that affect P, the probability of participation in a program, but do not directly affect MTE. An example in the context of analyzing the returns to schooling would be policies that change tuition or distance to school but do not directly affect the gross returns to schooling.44 Define P as the baseline probability with density f_P. Define P^* as the probability produced under an alternative policy regime with density f_{P^*}. Comparing the policies using mean outcomes, we obtain

31(a)   E(Y \mid \text{under policy}^*) - E(Y \mid \text{under baseline}) = \int_0^1 \omega(u)\, MTE(u)\, du,

where the policy weights are

31(b)   \omega(u) = F_P(u) - F_{P^*}(u),

where F_P(u) is the distribution of P evaluated at u. As demonstrated in Heckman and Vytlacil (2000a,b), conventional linear IV weights MTE(u) differently and so usually does not recover the parameter of interest. The ATE weight is 1 and the TT weight is given in footnote 37. Figure 8 plots the weights for ATE, TT and the shape for conventional linear IV when P(z) is an instrument.

44 Recall that we abstract from general equilibrium effects in this paper.
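To show how 31(a)-31(b) can be used in practice, the sketch below builds empirical cdfs for a baseline and a policy-shifted distribution of P, forms the weights \omega(u), and integrates them against an MTE. The beta-distributed baseline, the hypothetical policy (a uniform increase in P capped at one) and the assumed MTE are all illustrative choices, not estimates from the lecture.

```python
# Sketch of the policy comparison in 31(a)-31(b): weight MTE(u) by F_P(u) - F_{P*}(u).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
P_base = rng.beta(2, 2, size=200_000)              # baseline propensity scores (assumed)
P_star = np.minimum(P_base + 0.05, 1.0)            # hypothetical policy raises everyone's P

grid = np.linspace(0.001, 0.999, 1_000)
F_base = np.searchsorted(np.sort(P_base), grid) / P_base.size   # empirical cdf F_P(u)
F_star = np.searchsorted(np.sort(P_star), grid) / P_star.size   # empirical cdf F_{P*}(u)
omega = F_base - F_star                            # policy weights, eq. 31(b)

mte = 1.0 - 0.5 * norm.ppf(grid)                   # an assumed declining MTE(u)
policy_effect = np.sum(omega * mte) * (grid[1] - grid[0])        # eq. 31(a), Riemann sum
print(f"E(Y | policy*) - E(Y | baseline) = {policy_effect:.4f}")
```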

The MTE plays the role of a policy-invariant functional, analogous to the role of a policy-invariant parameter in structural econometrics. A better approach to policy evaluation than mechanically estimating conventional treatment parameters and hoping that they answer interesting economic questions is to estimate the MTE and weight it by the appropriate weight determined by how the policy changes the distribution of P. An alternative approach is to produce a policy-weighted instrument based on a specific choice of \omega(u). Heckman and Vytlacil (2000, 2001a,b) develop both approaches.

(v) An Example

Figure 8, taken from the research of Carneiro, Heckman and Vytlacil (2000), plots the estimated MTE as a function of u_D for the returns to education for a sample of white males in the United States in the late 1980s and early 1990s. Also plotted are the weighting functions for the MTE that are implicit in defining TT and ATE and in using conventional linear IV with P(z) as an instrument to estimate "the" effect of schooling on earnings. The fact that the MTE is rising means that conventional methods of matching and linear IV do not identify TT or ATE.

Figure 9, also taken from Carneiro, Heckman and Vytlacil (2000), considers the policy weights on the MTE for three different policies defined at the base of the figure. Conventional IV weights the MTE in a way that is close to the weighting required to evaluate policy III, but it is far off the mark in evaluating policies I and II. The agreement between the IV weights for the MTE and the policy III weights is fortuitous and cannot be counted on. Social experiments that randomize people out of a program at the point where they apply and are accepted into it estimate TT under full compliance. (See Heckman, LaLonde and Smith, 1999; Heckman and Vytlacil, 2001.) Such experiments do not accurately evaluate any of the three policies considered in Figure 9.

(vi) Extrapolating To New Populations

If we seek to apply the estimated MTE to new populations, where the dependence between X and U is different from that in the sample used to estimate it, or where the support of (X, U_D) is different from that used in the estimation sample, it is necessary to make the same types of independence and support assumptions discussed in Section 2. This point has been recognized by econometricians since the work of Haavelmo (1943) and Marschak (1953). The treatment effect literature avoids structural assumptions by evading the questions addressed by structural econometrics.

(vii) Evidence From The Literature on Treatment Effects

Despite its limitations, the literature on treatment effects has produced a large number

of important studies of economic policies. The literature is too vast to summarize in this

lecture.


I briefly sketch some of the main empirical results from the literature on evaluating active labor market policies, drawing on my survey of this field written with LaLonde and Smith (Heckman, LaLonde and Smith, 1999). Looking across countries and over time, most short-term job training and labor market intervention schemes are ineffective in promoting long-term wage growth and employment. Application of very basic empirical methods shows that comparing post-program outcomes of trainees with preprogram outcome measures - the common practice in many evaluations - dramatically overstates the gains from participation in training. This occurs because trainees typically experience a decline in employment prior to entering training and training is a form of job search.45 Much of the post-program improvement in outcome measures would have occurred in the absence of the program.

Basic principles, such as using a comparison group of nontrainees that is comparable in terms of geographic location, labor market histories and the questionnaires administered to samples of trainees, go a long way toward eliminating selection bias in evaluating social programs. However, such matching does not eliminate all of the bias, and accounting for selection on unobservables is important in getting accurate estimates of the commonly used "treatment on the treated" parameter.46 (See Heckman, Ichimura, Smith and Todd, 1996, 1998, and Heckman, LaLonde and Smith, 1999.)

45 Ashenfelter (1978) documented that trainees suffer a decline in earnings prior to enrollment in the program. Heckman, Ichimura, Smith and Todd (1998), Heckman and Smith (1998) and Heckman, LaLonde and Smith (1999) demonstrate that it is employment dynamics, not earnings dynamics, that affect enrollment into the program.

46 An influential study by LaLonde (1986) cast doubt on the ability of nonexperimental econometric methods to evaluate social programs. Subsequent analysis by Heckman, Ichimura, Smith and Todd (1996, 1998) demonstrated that LaLonde's findings are generated by comparing incomparable people. His trainees live in different labor markets, are administered different questionnaires and have different X values than his comparison group. Making trainees comparable to nontrainees eliminates some, but not all, of LaLonde's bias. See the important papers by Smith and Todd (2000, 2001), who reanalyze LaLonde's data and demonstrate that matching methods do not eliminate empirically important components of LaLonde's selection bias. For problems with social experiments see Heckman (1992), Heckman and Smith (1993, 1995), Manski (1996), and Heckman, LaLonde and Smith (1999).


8. Uniting Macro and Microeconometrics:

General Equilibrium Policy Evaluation

Microeconomic data have greatly enriched our understanding of the diversity and heterogeneity in modern economies. Microeconometrics has played an important role in organizing and interpreting the wealth of microeconomic data produced in the past 50 years. The findings from micro research have begun to influence the development of macroeconomic theory, which is slowly weaning itself from the representative agent paradigm. I have already discussed how recognition of choices at the extensive margin has altered macroeconomic discussions of the labor market. Numerous other examples could be presented of how findings from micro data have affected the development of macro theory. (See the discussion in Browning, Hansen and Heckman, 1999.)

Microeconometric methods have been developed for evaluating a host of social programs put in place by the modern welfare state. Yet the scope of microeconometrics alone is necessarily limited. Many programs, like the tuition subsidy programs discussed in Section



7, are national in character and likely have general equilibrium effects. Expanding the stock of educated people is likely to reduce the return to educated labor. Reducing taxes on labor expands the supply of labor and reduces its return. Partial equilibrium methods can only go so far in evaluating the full impacts of large-scale public programs. The treatment effect methodology is completely ineffective in analyzing programs with universal coverage.

A synthesis of macro and micro approaches is required to analyze policies instituted at the national level. Cross-sectional variation cannot identify the effects of prices and interest rates that are common across persons. Yet accounting for the feedback of capital markets on human capital investment decisions is essential in investigating the full effects of national policies on skill formation. Moreover, cross-sectional variation in wages, prices, or interest rates is unlikely to be exogenous, and this raises separate issues.

This synthesis has just begun, but the first results from this research program are promising. The goal of this line of research is to develop an empirically grounded general equilibrium theory that will improve on calibration as a source of estimates for the parameters of general equilibrium models, and that will provide a rigorous empirical and theoretical foundation for evaluating large-scale social programs like educational subsidies that alter prices, and programs, like social security reforms, that have universal coverage.

In Heckman, Lochner and Taber (1998a,b,c, 1999), we present a prototype for such a synthesis of micro and macro data. We formulate and estimate a perfect foresight, dynamic general equilibrium model of skill formation that generalizes the Roy model to a full dynamic setting with endogenous skill accumulation. The model generalizes the framework of Auerbach and Kotlikoff (1987) to account for endogenous skill formation. This model accounts for heterogeneity and self-selection in the labor market, and explains rising wage inequality in the U.S. over the past two decades. The results from this analysis suggest that failure to account for general equilibrium effects in the fashion of the treatment effect literature overstates, by an order of magnitude, the effects of tuition reductions on college enrollment.47

9. Summary and Conclusions

In the past half century, economics has been enriched by an infusion of vast new resources of microeconomic data. These data have opened the eyes of the profession to the

diversity and heterogeneity of economic life. They have enabled economists to understand

the nature of many important social problems.

At the same time, these data have challenged traditional econometric methodologies.

Problems that appear to be unimportant in examining aggregate averages become central

in analyzing micro data.

47 Studies by Calmfors (1994), Davidson and Woodbury (1993, 1994) and Forsund and Krueger (1998) demonstrate the importance of accounting for displacement in evaluating various active labor market policies. See the survey in Heckman, LaLonde and Smith (1999).


These problems, and the policy concerns that gave rise to the extensive collection of

microeconomic data, gave birth to modern microeconometrics. This field unites economics

and statistics to produce interpretable summaries of microdata, to test theories and to

address basic counterfactual economic policy questions. The important contributions of

this field have been to clarify which questions can be settled by the available data and

which data are required to settle basic policy questions.


Appendix A-1

Bounding and Sensitivity Analysis

Starting from (20) or its version for conditional means, the papers by Smith and Welch (1986), Holland (1986) and Glynn, Laird and Rubin (1986) characterize the selection problem more generally without the index structure, and offer Bayesian and classical methods for performing sensitivity analyses.

Selection on observables (Barnow, Cain and Goldberger, 1980; Heckman and Robb, 1985) solves this problem of selection by assuming that Y_1 \perp\!\!\!\perp D \mid X, so that F(Y_1 \mid X, D = 1) = F(Y_1 \mid X). This is the assumption that drives matching models. It is inconsistent with the Roy model of self-selection (Heckman and Vytlacil, 2001a,b). More generally, F(Y_1 \mid X, D = 0) is not identified in the population.

Various approaches to bounding this distribution, or moments of the distribution, have been proposed in the literature. To illustrate these ideas in the simplest possible setting, let g(Y_1 \mid X, D = 1) be the density of outcomes (e.g., wages) for persons who work (D = 1 corresponds to working); we have a censored sample.

Missing is g(Y_1 \mid X, D = 0), the density of wages for nonworkers. In order to estimate E(Y_1 \mid X), Smith and Welch (1986) use the law of iterated expectations to


obtain

E(Y_1 \mid X) = E(Y_1 \mid X, D = 1) \Pr(D = 1 \mid X) + E(Y_1 \mid X, D = 0) \Pr(D = 0 \mid X).

To estimate the left-hand side of this expression, it is necessary to obtain information on the missing component E(Y_1 \mid X, D = 0). Smith and Welch propose and implement bounds on E(Y_1 \mid X, D = 0), e.g.

Y^L \le E(Y_1 \mid X, D = 0) \le Y^U,

where Y^U is an upper bound and Y^L is a lower bound.1 Using this information, they construct the bounds

E(Y_1 \mid X, D = 1) \Pr(D = 1 \mid X) + Y^L \Pr(D = 0 \mid X) \le E(Y_1 \mid X)
      \le E(Y_1 \mid X, D = 1) \Pr(D = 1 \mid X) + Y^U \Pr(D = 0 \mid X).

By doing a sensitivity analysis, they produce a range of values for E(Y_1 \mid X) that is explicitly dependent on the range of values assumed for E(Y_1 \mid X, D = 0). Later work by Manski (1989, 1990, 1994, 1995), Horowitz and Manski (1995) and Robins (1989) develops this type of analysis more systematically for a variety of models.
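A minimal arithmetic sketch of this construction, with every number chosen purely for illustration, follows:

```python
# Sketch of the Smith-Welch bound on E(Y1 | X) when wages are observed only for workers.
p_work = 0.7                              # Pr(D = 1 | X), estimable from the sample
mean_wage_workers = 15.0                  # E(Y1 | X, D = 1), estimable from observed wages
Y_L, Y_U = 5.0, 15.0                      # assumed range for nonworkers' latent wages

lower = mean_wage_workers * p_work + Y_L * (1 - p_work)
upper = mean_wage_workers * p_work + Y_U * (1 - p_work)
print(f"E(Y1 | X) lies in [{lower:.2f}, {upper:.2f}]")   # [12.00, 15.00]
```

Varying Y_L and Y_U over plausible ranges traces out the sensitivity analysis described above.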

Glynn, Laird, and Rubin (1986) present a sensitivity analysis for distributions using Bayesian methods under a variety of different assumptions about G(Y_1 \mid X, D = 0) to

1In their problem there are plausible ranges of wages which dropouts can earn.


determine a range of values of G(Y_1 \mid X). Holland (1986) proposes a more classical sensitivity analysis that varies the ranges of the parameters of the models. Rosenbaum (1995) discusses a variety of sensitivity analyses.

The objective of these analyses of bounds and the Bayesian and classical sensitivity

analyses is to clearly separate what is known from what is conjectured about the data, and

to explore the sensitivity of reported estimates to the assumptions used to secure them.

Manski (1990, 1994) and Heckman and Vytlacil (2000) demonstrate the extra restrictions

that come from using index models to produce bounds on outcomes.

Much of the theoretical analysis presented in the recent literature is nonparametric, although practical experience in statistics and econometrics demonstrates that high-dimensional nonparametric estimation is not feasible for most sample sizes available in cross-sectional econometrics. Some form of structure must be imposed to get reliable nonparametric estimates. However, feasible parametric versions of these methods run the risk of imposing false parametric structure.2

2 The methods used in the bounding literature depend critically on the choice of conditioning variables X. In principle, all possible choices of the conditioning variables should be explored, especially in computing bounds for all possible models, although in practice this is never done. If it were, the range of estimates produced by the bounds would be substantially larger than the wide bounds already reported in this literature.


Appendix A-2

Representations Within The Classical Normal Model

For the classical normal model of selection bias, these treatment parameters may be written as

MTE(x, v_D) = \mu_1(x) - \mu_0(x) - \left[\frac{\sigma_{1V} - \sigma_{0V}}{\sigma_{VV}}\right] \Phi^{-1}(v_D),

where \Phi is the cdf of the standard normal distribution and \Phi^{-1}(v_D) = u_D. For a linear index structure, D^* = Z\eta - V_D,

MTE(x, v_D = Z\eta) = \mu_1(x) - \mu_0(x) - \left[\frac{\sigma_{1V} - \sigma_{0V}}{\sigma_{VV}}\right] Z\eta.

ATE is the difference in means: \Delta^{ATE}(x) = \mu_1(x) - \mu_0(x). Using a control function (Heckman and Robb, 1985, 1986),

E(U_1 - U_0 \mid P(z) > U_D) = K(P(z)) = \frac{\sigma_{1V} - \sigma_{0V}}{\sigma_{VV}} \cdot \frac{\exp\{-\tfrac{1}{2}(\Phi^{-1}(P(z)))^2\}}{P(z)},

we may write treatment on the treated given P(z) as

TT(x, P(z)) = \mu_1(x) - \mu_0(x) + K(P(z)).
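These formulas translate directly into a few lines of code. In the sketch below the covariance values are illustrative assumptions, and the control function is written with the standard normal density so that it integrates consistently with the MTE expression; it differs from the exponential form above only by the 1/\sqrt{2\pi} normalizing constant.

```python
# Sketch of the classical normal-selection formulas: MTE(x, u_D) and the control
# function K(P(z)) for treatment on the treated.  Covariance values are illustrative.
from scipy.stats import norm

mu1_minus_mu0 = 1.0
sigma_1V, sigma_0V, sigma_VV = 0.8, 0.2, 1.0       # Cov(U1,V), Cov(U0,V), Var(V) (assumed)
slope = (sigma_1V - sigma_0V) / sigma_VV

def mte(u_D):
    """MTE at quantile u_D of the selection error."""
    return mu1_minus_mu0 - slope * norm.ppf(u_D)

def tt_given_p(p):
    """Treatment on the treated for participants with propensity score p."""
    K = slope * norm.pdf(norm.ppf(p)) / p          # inverse-Mills-ratio control function
    return mu1_minus_mu0 + K

print([round(mte(u), 2) for u in (0.1, 0.5, 0.9)])  # e.g. [1.77, 1.0, 0.23]
print(round(tt_given_p(0.5), 2))                    # e.g. 1.48
```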


References

Abramazar and Schmidt (1982),
Ahn and Powell (1993),
Altug and Miller (1998),
Altug and Miller (1990),
Amemiya, Takeshi (1984), "Tobit Models: A Survey," Journal of Econometrics, 24:3-61.
Amemiya, Takeshi (1985), Advanced Econometrics, (Cambridge, MA: Harvard University Press).
Amman, Hans, David Kendrick and John Rust (1996), Handbook of Computational Economics: Volume 1, (Amsterdam: North Holland).
Andrews, D. and M. Schafgans (1996), "Semiparametric Estimation of a Sample Selection Model," Cowles Foundation Discussion Paper No. 1119 (Yale University).
Ashenfelter, Orley (1978), "Estimating the Effect of Training Programs on Earnings," Review of Economics and Statistics 6(1):47-57.
Auerbach, Allen and Lawrence Kotlikoff (1987), Dynamic Fiscal Policy, (Cambridge: Cambridge University Press).
Balke, A. (1995), "Probabilistic Counterfactuals: Semantics, Computation, and Applications," Technical Report R-242 (UCLA).
Balke, A. and J. Pearl (1993), "Nonparametric Bounds on Causal Effects from Partial Compliance Data," Technical Report R-199 (UCLA).
Balke, Alexander and Judea Pearl (1989), "Bounds on Treatment Effects from Studies with Imperfect Compliance," Journal of the American Statistical Association XCII:1171-1176.
Barnow, B., G. Cain, and A. Goldberger (1980), "Issues in the Analysis of Selectivity Bias," in: E. Stromsdorfer and G. Farkas, eds., Evaluation Studies, Volume 5, (Sage Publications, Beverly Hills, CA), 290-317.
Barros, Ricardo Paes (1988), "Minimal Just-Identifying Assumptions in Nonparametric Selection Models," unpublished manuscript, (Rio de Janeiro: IPEA).
Ben Porath (1973),
Bera, A., C. Jarque and L. Lee (1984), "Testing the Normality Assumption in Limited Dependent Variable Models," International Economic Review 25(3):563-578.
Billingsley, Patrick (1986), Probability and Measure, Second Edition, (New York: John Wiley).
Bishop, Y., S. Fienberg and P. Holland (1975), Discrete Multivariate Analysis, (Cambridge: MIT Press).
Björklund, Anders (1989), "Evaluations of Training Programs: Experiences and Suggestions for Future Research," Discussion Paper 89 (Wissenschaftszentrum, Berlin).


Björklund, Anders (1994), "Evaluations of Swedish Labor Market Policy," International Journal of Manpower 15(5, part 2):16-31.
Björklund, Anders and Robert Moffitt (1987), "Estimation of Wage Gains and Welfare Gains in Self-Selection Models," Review of Economics and Statistics 69(1):42-49.
Björklund, A. and H. Regner (1996), "Experimental Evaluation of European Labour Market Policy," in: G. Schmid, J. O'Reilly, and K. Schumann, eds., International Handbook of Labour Market Policy and Evaluation, (Edward Elgar, Aldershot, UK), 89-114.
Blundell, Richard and Richard J. Smith (1994), "Coherency and Estimation in Simultaneous Models with Censored or Qualitative Dependent Variables," Journal of Econometrics 64:355-373.
Blundell, Richard, Alan Duncan, and Costas Meghir (1998), "Estimating Labor Supply Response Using Tax Reforms," Econometrica LXVI:827-862.
Bock, R.D. and R. Jones (1968), The Measurement and Prediction of Judgment and Choice, (San Francisco: Holden-Day).
Bonnal, L., D. Fougere, and A. Serandon (1997), "Evaluating the Impact of French Employment Policies on Individual Labour Market Histories," Review of Economic Studies 64(4):683-713.
Borjas, G. and J. Heckman (1980), "Does Unemployment Cause Future Unemployment? Definitions, Questions and Answers From a Continuous Time Model of Heterogeneity and State Dependence," Economica 47(187):247-283.
Bound, J., C. Brown, G. Duncan, and W. Rodgers (1994), "Evidence on the Validity of Cross-Sectional and Longitudinal Labor Market Data," Journal of Labor Economics 12(3):345-368.
Browning, Martin, Lars Hansen, and James Heckman (1999), "Micro Data and General Equilibrium Models," in J. B. Taylor and M. Woodford, eds., Handbook of Macroeconomics: Volume 1, Chapter 8, 543-633.
Burtless, Gary and Jerry Hausman (1978), "The Effect of Taxation on Labor Supply: Evaluating the Gary Negative Income Tax Experiment," Journal of Political Economy 86(6):1103-1131.
Butler, R. and J. Heckman (1977), "Government's Impact on the Labor Market Status of Black Americans: A Critical Review," in: Equal Rights and Industrial Relations, (Industrial Relations Research Association, Madison, WI), 235-281.
Byron, R. and Anil K. Bera (1983), "Least Squares Approximations to Unknown Regression Functions, A Comment," International Economic Review 24(1):255-260.
Cain, G. (1975), "Regression and Selection Models to Improve Nonexperimental Comparisons," in: C. Bennett and A. Lumsdaine, eds., Evaluation and Experiment, (Academic Press, New York), 297-317.


Cain, Glenn and Harold Watts (1973), "Summary and Overview," in Glenn Cain and Harold Watts, eds., Income Maintenance and Labor Supply, (Chicago: Markham).
Calmfors, L. (1994), "Active Labor Market Policy and Unemployment - A Framework For The Analysis of Crucial Design Features," OECD Economic Studies 22(1):7-47.
Cameron, S. and J. Heckman (1998), "Life Cycle Schooling and Dynamic Selection Bias: Models and Evidence For Five Cohorts of American Males," Journal of Political Economy 106(2):262-333.
Campbell, D. and J. Stanley (1963), "Experimental and Quasi-Experimental Designs for Research on Teaching," in: N. Gage, ed., Handbook of Research on Teaching, (Rand McNally, Chicago, IL), 171-246.
Campbell, D. and J. Stanley (1966), Experimental and Quasi-Experimental Designs For Research, (Rand McNally, Chicago, IL).
Card, David (1999), "The Causal Effect of Education on Earnings," in Orley Ashenfelter and David Card, eds., Handbook of Labor Economics, Volume 3, (Amsterdam: North Holland).
Chamberlain, Gary (1984), "Panel Data," in Z. Griliches and M. Intriligator, eds., Handbook of Econometrics, (North-Holland, Amsterdam, Netherlands), 1248-1318.
Chiappori, Andre and J. Heckman (2000),
Cochran, W. and D. Rubin (1973), "Controlling Bias in Observational Studies," Sankhya 35:417-446.
Cogan, J. (1981), "Fixed Costs and Labor Supply," Econometrica 49(4):945-963.
Coleman, Thomas (1984), "Two Essays on the Labor Market," University of California, unpublished Ph.D. dissertation.
Coleman, Thomas (1981), "Dynamic Models of Labor Supply," University of Chicago, unpublished manuscript.
Cosslett, Steven (1983), "Distribution-Free Maximum Likelihood Estimator of the Binary Choice Model," Econometrica 51(3):765-782.
Cosslett, Steven (1981), "Maximum Likelihood Estimation From Choice Based Samples," Econometrica.
Cox, David Roxbee (1959), The Design of Experiments, (New York: John Wiley).
Dagsvisk (1994), Econometrica
Davidson, C. and S. Woodbury (1993), "The Displacement Effect of Reemployment Bonus Programs," Journal of Labor Economics 11(4):575-605.
Davidson, C. and S. Woodbury (1995), "Wage Subsidies for Dislocated Workers," unpublished manuscript (W.E. Upjohn Institute for Employment Research, Kalamazoo, MI).
Dawkins, Christina, T.N. Srinivasan and John Whalley (1999), "Calibration," in James Heckman and Edward Leamer, eds., Handbook of Econometrics: Volume 5, (Amsterdam: North Holland).


Domencich, Thomas and David McFadden (1975), Urban Travel Demand, (Amsterdam: North Holland).
Eberwein, C., J. Ham, and R. LaLonde (1997), "The Impact of Classroom Training on the Employment Histories of Disadvantaged Women: Evidence From Experimental Data," Review of Economic Studies 64(4):655-682.
Eckstein and Wolpin (1989),
Eckstein and Wolpin (1996),
Elders, C. and G. Ridder (1982), Review of Economic Studies
Engle, Hendry and Richard (1983),
Epstein, Roy (1987), A History of Econometrics, (Amsterdam: North Holland).
Fair, Ray (1994), Testing Macroeconometric Models, (Cambridge: Harvard University Press).
Fienberg (1974),
Fisher, Franklin (1966), The Identification Problem in Econometrics, (New York: McGraw Hill).
Fisher, Ronald Aylmer (1935), Design of Experiments, (New York: Hafner).
Fisher, W. (1970), Aggregation in Economics, (Princeton: Princeton University Press).
Flinn, Christopher (1984), "Behavioral Models of Wage Growth and Job Change Over the Life Cycle," University of Chicago, unpublished Ph.D. dissertation.
Flinn, Christopher and James Heckman (1982), "New Methods for Analyzing Structural Models of Labor Force Dynamics," Journal of Econometrics 18(1):115-168.
Galles, David and Judea Pearl (1998), "An Axiomatic Characterization of Counterfactuals," Foundations of Science, 151-182.
Glynn, Robert, Nan Laird and Donald Rubin (1986), "Selection Modeling vs. Mixture Modeling with Nonignorable Response," in Howard Wainer, ed., Drawing Inferences from Self-Selected Samples, (New York: Springer-Verlag).
Goldberger, Arthur (1983), "Abnormal Selection Bias," in S. Karlin, Takeshi Amemiya and L. Goodman, eds., Studies in Econometrics, Time Series and Multivariate Statistics, (New York: Wiley).
Goldberger, Arthur (1964), Econometric Theory, (New York: John Wiley).
Goldberger, Arthur (1972), Selection Bias in Evaluating Treatment Effects, Discussion Paper No. 123-172, (Institute for Research on Poverty, University of Wisconsin).
Goldberger, Arthur (1983),
Goodman (1970),
Gorman, W. (1956),
Gorman, W. (1980), "A Possible Procedure for Analyzing Quality Differentials in the Egg-Market," Review of Economic Studies, 47:843-856.


Gourieroux, C., J. J. Laffont and A. Monfort (1980), "Coherency Conditions in Simultaneous Linear Equation Models With Endogenous Switching Regimes," Econometrica, 48:675-695.
Green, H.A.J. (1964), Aggregation In Economics, (Princeton: Princeton University Press).
Gronau, Reuben (1974), "Wage Comparisons - A Selectivity Bias," Journal of Political Economy 82(6):1119-1144.
Haavelmo, Trygve (1943), " ", Econometrica, XI,
Haavelmo, Trygve (1944), "The Probability Approach in Econometrics," Econometrica XII, Supplement.
Haberman, S. (1974), Analysis of Frequency Data, (Chicago: University of Chicago Press).
Haberman, S. (1978), Analysis of Qualitative Data, (New York: Academic Press, I and II).
Hahn, J., P. Todd and W. van der Klaauw (1998), "Estimation of Treatment Effects with a Quasi-experimental Regression-Discontinuity Design with Application to Evaluating the Effect of Federal Antidiscrimination Laws on Minority Employment in Small U.S. Firms," unpublished manuscript (University of Pennsylvania).
Ham, J. and R. LaLonde (1996), "The Effect of Sample Selection and Initial Conditions in Duration Models: Evidence From Experimental Data," Econometrica 64(1):175-205.
Hamermesh, D. (1971), Economic Aspects of Manpower Training Programs, (Lexington Books, Lexington, MA).
Hansen, Lars and James Heckman (1996), "The Empirical Foundations of Calibration," Journal of Economic Perspectives, X:87-104.
Hansen, Lars and Thomas Sargent (1980), "Formulating and Estimating Dynamic Linear Rational Expectations Models," Journal of Economic Dynamics and Control, 7-46.
Hansen, Lars and Thomas Sargent (1991), "Two Difficulties in Interpreting Vector Autoregressions," in Rational Expectations Econometrics, (Boulder, Colorado: Westview Press).
Hausman, Jerry (1980), "The Effects of Wages, Taxes, and Fixed Costs on Women's Labor Force Participation," Journal of Public Economics 14:161-194.
Heckman, James (1974a), "Shadow Prices, Market Wages and Labor Supply," Econometrica, 42(4), 679-694.
Heckman, James (1974b), "Effects of Child-Care Programs on Women's Work Effort," Journal of Political Economy. Reprinted in T.W. Schultz, ed., Economics of the Family: Marriage, Children, and Human Capital, (University of Chicago Press), 491-518.
Heckman, James (1974c), "Life Cycle Consumption and Labor Supply: An Explanation of the Relationship Between Income and Consumption over the Life Cycle," American Economic Review 6(1):188-194.


Heckman, James (1976a), "The Common Structure of Statistical Models of Truncation, Sample Selection and Limited Dependent Variables and A Simple Estimator for Such Models," Annals of Economic and Social Measurement, 5(4):475-492.
Heckman, James (1976b), "Simultaneous Equations Models with Continuous and Discrete Endogenous Variables and Structural Shifts," in: S. Goldfeld and R. Quandt, eds., Studies in Nonlinear Estimation, (Ballinger, Cambridge, MA).
Heckman, James (1978a), "Dummy Endogenous Variables in a Simultaneous Equations System," Econometrica 46(4):931-959.
Heckman, James (1978b), "A Partial Survey of Recent Research on the Labor Supply of Women," AEA Papers and Proceedings. Invited paper, presented to the American Economic Association, (New York).
Heckman, James (1979), "Sample Selection Bias as a Specification Error," Econometrica, 47(1):153-161.
Heckman, James (1980), "Addendum to Sample Selection Bias As A Specification Error," in: E. Stromsdorfer and G. Farkas, eds., Evaluation Studies Review Annual, Volume 5, (Sage, San Francisco, CA), 970-995.
Heckman, James (1981a), "Statistical Models for Discrete Panel Data," in C. Manski and D. McFadden, eds., Structural Analysis of Discrete Data with Economic Applications, (Cambridge: MIT Press).
Heckman, James (1981b), "Heterogeneity and State Dependence," in S. Rosen, ed., Studies in Labor Markets, (Chicago: University of Chicago Press), 91-138.
Heckman, James (1986),
Heckman, James (1990), "Varieties of Selection Bias," American Economic Review, 80(2):313-318.
Heckman, James (1997), "Instrumental Variables: A Study of Implicit Behavioral Assumptions in One Widely Used Estimator," Journal of Human Resources 32(3):441-461.
Heckman, James (2000), "Causal Parameters and Policy Analysis in Economics: A Twentieth Century Retrospective," Quarterly Journal of Economics, 45-97.
Heckman, James and Bo Honoré (1989), "The Identifiability Of The Competing Risks Model," Biometrika 76(2):325-330.
Heckman, James and Bo Honoré (1990), "The Empirical Content of the Roy Model," Econometrica LVIII:1121-1149.
Heckman, James and George Borjas (1980),
Heckman, James, Mark Killingsworth and Thomas MaCurdy (1981), "Empirical Evidence on Static Labour Supply Models: A Survey of Recent Developments," in Z. Hornstein, J. Grice and A. Webb, eds., The Economics of the Labour Market, (London: Her Majesty's Stationery Office), 75-122.


Heckman, James, Robert LaLonde and Jeffrey Smith (1999), "The Economics and Econometrics of Active Labor Market Programs," in Orley Ashenfelter and David Card, eds., Handbook of Labor Economics, Chapter 31, (Amsterdam: North Holland).
Heckman, James and Thomas MaCurdy (1986), "Labor Econometrics," in: Z. Griliches and M. Intriligator, eds., Handbook of Econometrics, (North-Holland, Amsterdam), 1917-1977.
Heckman, James and Thomas MaCurdy (1985),
Heckman, James and Thomas MaCurdy (1981), "New Methods For Estimating Labor Supply Functions," in R. Ehrenberg, ed., Research in Labor Economics, Volume 4, (Greenwich, Conn.: JAI Press).
Heckman, James and Thomas MaCurdy (1980),
Heckman, James and Richard Robb (1985a), "Alternative Methods for Evaluating the Impact of Interventions," in: J. Heckman and B. Singer, eds., Longitudinal Analysis of Labor Market Data, (Cambridge University Press for Econometric Society Monograph Series, New York, NY), 156-246.
Heckman, James and Richard Robb (1985b), "Alternative Methods for Evaluating the Impact of Interventions: An Overview," Journal of Econometrics 30(1-2):239-267.
Heckman, J. and R. Robb (1986a), "Alternative Methods for Solving the Problem of Selection Bias in Evaluating The Impact of Treatments on Outcomes," in: H. Wainer, ed., Drawing Inferences From Self-Selected Samples, (Springer-Verlag, Berlin, Germany), 63-107.
Heckman, James and Richard Robb (1986b), "Alternative Identifying Assumptions in Econometric Models of Selection Bias," in: G. Rhodes, ed., Advances in Econometrics, Volume 5, (JAI Press, Greenwich, CT), 243-287.
Heckman, James and Guilherme Sedlacek (1985), "Heterogeneity, Aggregation and Market Wage Functions: An Empirical Model of Self-Selection in the Labor Market," Journal of Political Economy 98(6):1077-1125.
Heckman, James and Burton Singer (1984), "A Method for Minimizing the Impact of Distributional Assumptions in Econometric Models for Duration Data," Econometrica 52(2):271-320.
Heckman, James and Burton Singer (1985), "Econometric Analysis of Longitudinal Data," in Z. Griliches and M. Intriligator, eds., Handbook of Econometrics, Volume 3, (Amsterdam: North-Holland).
Heckman, James, Neil Hohmann, Michael Khoo and Jeffrey Smith (1998), "Substitution and Dropout Bias in Social Experiments: Evidence From an Influential Social Experiment," unpublished manuscript (University of Chicago).
Heckman, James and Joseph Hotz (1989), "Choosing Among Alternative Methods of Estimating the Impact of Social Programs: The Case of Manpower Training," Journal of the American Statistical Association 84(408):862-874.
Heckman, James, Hidehiko Ichimura, Jeffrey Smith and Petra Todd (1996), "Sources of Selection Bias in Evaluating Social Programs: An Interpretation of Conventional Measures and Evidence on the Effectiveness of Matching as a Program Evaluation Method," Proceedings of the National Academy of Sciences 93(23):13416-13420.
Heckman, James, Hidehiko Ichimura, Jeffrey Smith and Petra Todd (1998), "Characterizing Selection Bias Using Experimental Data," Econometrica 66(5):1017-1098.
Heckman, James, Hidehiko Ichimura, and Petra Todd (1997), "Matching As An Econometric Evaluation Estimator: Evidence From Evaluating A Job Training Program," Review of Economic Studies 64(4):605-654.
Heckman, James, Hidehiko Ichimura, and Petra Todd (1998), "Matching As An Econometric Evaluation Estimator," Review of Economic Studies 65(2):261-294.
Heckman, James, Lance Lochner, Jeffrey Smith and Christopher Taber (1997), "The Effects of Government Policies on Human Capital Investment and Wage Inequality," Chicago Policy Review 1(2):1-40.
Heckman, James, Lance Lochner and Christopher Taber (1998a), "Explaining Rising Wage Inequality: Explorations With A Dynamic General Equilibrium Model of Labor Earnings with Heterogeneous Agents," Review of Economic Dynamics 1(1):1-58.
Heckman, James, Lance Lochner and Christopher Taber (1998b), "General Equilibrium Treatment Effects: A Study of Tuition Policy," American Economic Review 88(2):381-386.
Heckman, James and Jeffrey Smith (1993), "Assessing The Case For Randomized Evaluation of Social Programs," in: K. Jensen and P.K. Madsen, eds., Measuring Labour Market Measures, (Ministry of Labour, Copenhagen, Denmark), 35-96.
Heckman, James and Jeffrey Smith (1995), "Assessing The Case for Social Experiments," Journal of Economic Perspectives 9(2):85-100.
Heckman, James and Jeffrey Smith (1998a), "Evaluating The Welfare State," in: S. Strom, ed., Econometrics and Economics in the 20th Century: The Ragnar Frisch Centenary, (Cambridge University Press for Econometric Society Monograph Series, New York, NY), forthcoming.
Heckman, James and Edward Vytlacil (1998), "Instrumental Variables Methods for the Correlated Random Coefficient Model: Estimating the Average Rate of Return to Schooling when the Return is Correlated with Schooling," Journal of Human Resources XXXIII:974-987.
Heckman, James and Edward Vytlacil (1999), "Local Instrumental Variables and Latent Variable Models For Identifying and Bounding Treatment Effects," Proceedings of the National Academy of Sciences 96:4730-4734.
Heckman, James and Edward Vytlacil (2000a), "The Relationship Between Treatment Parameters Within A Latent Variable Framework," Economics Letters 66:33-39.


Heckman, James and Edward Vytlacil (2000b),
Heckman, James and Edward Vytlacil (2001), "Structural Equations, Treatment Effects and Econometric Policy Evaluation," forthcoming, Econometrica.
Heckman, James and Edward Vytlacil (2001b),
Heckman, James, Jeffrey Smith, and Nancy Clements (1997), "Making The Most Out of Programme Evaluations and Social Experiments: Accounting for Heterogeneity in Programme Impacts," Review of Economic Studies 64(4):487-535.
Heckman, James and Robert Willis (1977), "A Beta Logistic Model for Analysis of Sequential Labor Force Participation by Married Women," Journal of Political Economy 85:27-58.
Hendry, David and Jean-Francois Richard (1982), "On the Formulation of Empirical Models in Dynamic Econometrics," Annals of Applied Econometrics XX:3-34.
Hildenbrand (1986),
Holland, Paul (1988), "Causal Inference, Path Analysis, and Recursive Structural Equation Models," in: C. Clogg, ed., Sociological Methodology, (American Sociological Association, Washington, DC), 449-484.
Hong (1998),
Honoré, B. and E. Kyriazidou (1998), "Panel Data Discrete Choice Models With Lagged Dependent Variables," unpublished manuscript (University of Chicago).
Horowitz and Manski (1995),
Hotz, Joseph and Robert Miller (1987), "A Dynamic Model of Fertility and Labor Supply," Econometrica,
Jorgenson, Dale (1963), "Capital Theory and Investment Behavior," American Economic Review LIII:247-259.
Killingsworth, Mark (1983), Labor Supply, (Cambridge: Cambridge University Press).
Knight, Frank (1921), Risk, Uncertainty and Profit, (New York: Houghton Mifflin Co).
LaLonde, Robert (1986), "Evaluating the Econometric Evaluations of Training Programs with Experimental Data," American Economic Review, 76(4):604-620.
LaLonde, Robert (1995), "The Promise of Public Sector-Sponsored Training Programs," Journal of Economic Perspectives 9(2):149-168.
Lancaster, Kelvin (1966), "A New Approach To Consumer Theory," Journal of Political Economy 74:132-157.
Lancaster, Kelvin (1970),
Lancaster, Kelvin (1971), Consumer Demand: A New Approach, (New York: Columbia University).
Lancaster, T. (1990), Econometric Analysis of Transition Data, (Cambridge University Press for Econometric Society Monograph Series, Cambridge).


Lee, L.F. (1978), "Unionism and Wage Rates: A Simultaneous Equations Model with Qualitative and Limited Dependent Variables," International Economic Review 19:415-433.
Lee, L.F. (1981), "Simultaneous Equation Models with Discrete and Censored Variables," in C. Manski and D. McFadden, eds., Structural Analysis of Discrete Data with Econometric Applications, (MIT Press, Cambridge, MA).
Lee, L.F. (1982), "Some Approaches to the Correction of Selectivity Bias," Review of Economic Studies 49:355-372.
Lee, L. (1983), "Generalized Econometric Models with Selectivity," Econometrica 51(2):507-512.
Lewis, H. Gregg (1986), Unionism and Relative Wages Effects: A Survey, (Chicago: University of Chicago Press).
Lewbel, A. (1998), "Semiparametric Qualitative Response Model Estimation with Instrumental Variables and Unknown Heteroskedasticity," unpublished manuscript (Brandeis University).
Lord, F. and M. Novick (1968), Statistical Theories of Mental Test Scores, (Reading: Addison-Wesley Publishing Company).
Lucas, Robert (1976), "Econometric Policy Evaluation: A Critique," The Phillips Curve and Labor Markets, Carnegie Rochester Conference Series on Public Policy, 19-46.
Lucas, Robert and Thomas Sargent (1981), Essays on Rational Expectations and Econometric Practice, (Minneapolis: University of Minnesota Press).
MaCurdy, Thomas (1982), "The Use of Time Series Processes to Model the Error Structure of Earnings in a Longitudinal Data Analysis," Journal of Econometrics 18(1):83-114.
MaCurdy, Thomas (1981), Journal of Political Economy
McFadden, Daniel (1985), "Econometric Analysis of Qualitative Response Models," in K.J. Arrow and M.D. Intriligator, eds., Handbook of Econometrics, (North-Holland), II, Ch. 24.
McFadden, Daniel (1981),
McFadden, Daniel (1975),
McFadden, Daniel (1974), "Conditional Logit Analysis of Qualitative Choice Behavior," in: P. Zarembka, ed., Frontiers in Econometrics, (New York: Academic Press).
McFadden, Daniel (1973a),
Malar (1977),
Manski, Charles and S. Lerman (1977), "The Estimation of Choice Probabilities From Choice-Based Samples," Econometrica 45(8):1977-1988.
Manski, Charles and Daniel McFadden (1981), "Alternative Estimators and Sample Designs for Discrete Choice Analysis," in: C. Manski and D. McFadden, eds., Structural Analysis of Discrete Data With Econometric Applications, (MIT Press, Cambridge, MA), 1-50.
Manski, Charles (1990),
Manski, Charles (1994),
Manski, Charles (1995), The Identification Problem in the Social Sciences, (Cambridge: Harvard University Press).
Marschak, Jacob and William Andrews (1944), "Random Simultaneous Equations and the Theory of Production," Econometrica XII:143-205.
Marschak, Jacob (1953), "Economic Measurements for Policy and Prediction," in William Hood and Tjalling Koopmans, eds., Studies in Econometric Method, (New York: John Wiley), 1-26.
Marshall, Alfred (1920), Principles of Economics, Ninth Edition, (London: McMillan), 36.
Matzkin, R. (1992), "Nonparametric and Distribution-Free Estimation of the Binary Threshold Crossing and The Binary Choice Models," Econometrica 60(2):239-270.
Matzkin, R. (1993), "Nonparametric Identification and Estimation of Polychotomous Choice Models," Journal of Econometrics 58(1-2):137-168.
Mincer, Jacob (1962), "Labor Force Participation of Married Women," in H. Gregg Lewis, ed., Aspects of Labor Economics, (Princeton: Princeton University Press).
Mincer, Jacob (1974), Schooling, Experience and Earnings, (New York: Columbia University Press).
Morgan, Mary (1990), The History of Econometric Ideas, (Cambridge: Cambridge University Press).
Olson, R. (1980), "A Least Squares Correction for Selectivity Bias," Econometrica 48:1815-1820.
Orcutt, Guy (1952), "Towards Partial Redirection of Econometrics," Review of Economics and Statistics XXXIV:195-200.
Orcutt, Alice and Guy Orcutt (1968), "Experiments for Income Maintenance Policies," American Economic Review LVIII:754-772.
Pessino (1990),
Phelps (1972),
Powell, J. (1989),
Powell, J. (1994), "Estimation of Semiparametric Models," in: R. Engle and D. McFadden, eds., Handbook of Econometrics, Volume 4, (North-Holland, Amsterdam, Netherlands), 2443-2521.
Powell, J. (2000),
Quandt, Richard (1956), "A Probabilistic Model of Consumer Behavior," Quarterly Journal of Economics 79:507-536.


Quandt, Richard and W. Baumol (1966), "The Demand For Abstract Transport Modes: Theory and Measurement," Journal of Regional Science 6:13-26.
Quandt, Richard (1970), The Demand for Travel, (Lexington, Mass.: D.C. Heath).
Quandt, Richard (1972), "Methods for Estimating Switching Regressions," Journal of the American Statistical Association 67(338):306-310.
Quandt, Richard (1988), The Economics of Disequilibrium, (Basil Blackwell, Oxford, UK).
Ransom (1988),
Rao, C. R. (1986), "Weighted Distributions," in: S. Fienberg, ed., A Celebration of Statistics, (Springer-Verlag, Berlin).
Rao, C. R. (1965), "On Discrete Distributions Arising Out of Methods of Ascertainment," in G. Patil, ed., Classical and Contagious Distributions, (Calcutta: Pergamon Press).
Robins, Jamie (1989), "The Analysis of Randomized and Non-Randomized AIDS Treatment Trials Using a New Approach to Causal Inference in Longitudinal Studies," in L. Sechrest, H. Freeman, and A. Mulley, eds., Health Service Research Methodology: A Focus on AIDS, (U.S. Public Health Service, Washington, DC), 113-150.
Robinson, C., J. (1989), Journal of Political Economy,
Robinson, P. (1996), "The Role and Limits of Active Labour Market Policy," European Union Institute Working Paper RSC No. 96/27.
Robinson, C. and N. Tomes (1982), "Self Selection and Interprovincial Migration in Canada," Canadian Journal of Economics 15(3):474-502.
Rosenbaum, Paul (1995), Observational Studies, (Springer-Verlag, Leipzig, Germany).
Rosenbaum, Paul and D. Rubin (1983), "The Central Role of the Propensity Score in Observational Studies for Causal Effects," Biometrika 70(1):41-55.
Roy, A. D. (1951), "Some Thoughts on the Distribution of Earnings," Oxford Economic Papers 3:135-146.
Rubin, Donald (1978), "Bayesian Inference for Causal Effects: The Role of Randomization," Annals of Statistics 6:34-58.
Rust, John (1986),
Rust, John (1984), "Maximum Likelihood Estimation of Controlled Discrete Choice Processes," SSRI No. 8407, University of Wisconsin.
Sattinger, M. (1975), "Comparative Advantage and Distributions of Earnings and Abilities," Econometrica.
Sattinger, M. (1978), "Comparative Advantage In Individuals," Review of Economics and Statistics, 259-267.
Sattinger, M. (1980), Capital and The Distribution of Labor Earnings, (North Holland).


Schmidt, Peter (1981), "Constraints on the Parameters in Simultaneous Tobit and Probit Models," in C. Manski and D. McFadden, eds., Structural Analysis of Discrete Data with Econometric Applications, (MIT Press, Cambridge, MA), 1-50.
Schultz, Henry (1938), The Theory and Measurement of Demand, (University of Chicago Press: Chicago, IL).
Stigler (1986),
Strauss, R. and P. Schmidt (1976), "The Effects of Unions on Earnings and Earnings on Unions: A Mixed Logit Approach," International Economic Review 17(1):204-212.
Theil, H. (1954),
Theil, H. (1957),
Theil, H. (1962),
Theil, H. (1970),
Tunali (2000),
van der Klaauw, W. (1997), "A Regression-Discontinuity Evaluation of the Effect of Financial Aid Offers on College Enrollment," unpublished manuscript (New York University).
Viverberg, William (1993), "Measuring the Unidentified Parameter of the Roy Model of Selectivity," Journal of Econometrics LVII:69-90.
Wales, T.J. and A.D. Woodland (1979), "Labour Supply and Progressive Taxes," Review of Economic Studies 46:83-95.
Wallis, Kenneth (1980), "Econometric Implications of the Rational Expectations Hypothesis," Econometrica XLVIII:49-74.
Willis, Robert and Sherwin Rosen (1979), "Education and Self Selection," Journal of Political Economy 87:S7-S36.
Willis, Robert (1977),
Wolpin, Kenneth (1984), "An Estimable Dynamic Stochastic Model of Fertility and Child Mortality," Journal of Political Economy 92.
Wright, Philip Green (1928), The Tariff on Animal and Vegetable Oils, (New York: MacMillan).
Wright, Sewall (1934), "The Method of Path Coefficients," Annals of Mathematical Statistics, V:161-215.
